Sunday, October 10, 2021

What kind of product are you making?

First know everything, then you can automate it! Also, if you can express your problem in numbers, I can tell you if they’re going up or down.

A freeform exploration product attempts to enable a customer to achieve understanding and express the problem in numbers. Features may include user-editable schema (traditional database and ETL products) or schema on the fly (Splunk or Elastic), loose typing, and a “unix is user friendly” workflow experience. In some markets this sort of product can be so far beyond usability that it is more of a platform for partners to build products on. Many products rightfully do not expose their use of MS-SQL, Oracle, Mongo, or Redis to their users. 

Those products can be thought of as market-targeted solutions. They come to the customer with an opinion about what problem is being solved and how that solution should work. They notably have predefined schema and strict typing (good for performance and safety). One should expect a polished wizards-n-workflows experience with no unnecessary options.
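To make the contrast concrete, here is a minimal sketch of the two styles. The supply chain fields and the ShipmentCount class are hypothetical, invented for illustration rather than taken from any real product.

```python
from dataclasses import dataclass

# Freeform exploration: schema on the fly. Any record shape is accepted,
# and the user decides later which fields matter and what type they are.
freeform_events = [
    {"sku": "A-100", "qty": "12", "dock": "north"},
    {"item": "A-100", "count": 11},  # different field names, still stored
]


# Market-targeted solution: the product arrives with an opinion about
# what is being measured. Hypothetical schema for a theft-reduction product.
@dataclass(frozen=True)
class ShipmentCount:
    sku: str
    shipped: int
    received: int

    def __post_init__(self) -> None:
        # Strict typing: reject anything that is not an integer count.
        for name in ("shipped", "received"):
            value = getattr(self, name)
            if not isinstance(value, int) or isinstance(value, bool):
                raise TypeError(f"{name} must be an int, got {type(value).__name__}")

    @property
    def shrinkage(self) -> int:
        # The one number this opinionated product exists to report.
        return self.shipped - self.received


count = ShipmentCount(sku="A-100", shipped=12, received=11)
print(count.shrinkage)
```

The freeform list accepts both records without complaint, which is the point and also the burden: someone still has to decide that "qty" and "count" mean the same thing.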

So, as a development team with a project starting up, you might need to ask yourself: which one of those supports the business outcome this project is looking for?

Unless you are starting a company from scratch, the first approach of solving hard, general problems is probably not what you want to tackle. And even when you are starting from scratch to build a new platform… there are a lot of reasons why most startups fail. Time lost to analysis paralysis at general problems is one. Inability to describe your value add to the market of customers with problems is an even bigger one. If I’m trying to reduce theft in my supply chain, I probably don’t want to start with defining schemas in a raw data management tool.

And if you’re in a team at an existing company, chances are extremely slim that your charter includes solving general problems. Much more likely is a charter to deliver a solution which maximally leverages your existing platform to capture entry in a new market.

Saturday, October 2, 2021

Declaring Idea Bankruptcy

It’s obvious that your R&D team can’t do everything at once, right?

It’s obvious that items lower on the backlog aren’t going to happen unless they displace something higher on the backlog, right?

And yet. Lots of those items are people’s beloved ideas. Good ideas, that would make the product better, open new business opportunities, solve real problems. 

Good ideas are going to be rejected for a lack of capacity to develop and exploit them. This makes a lot of people sad, so product managers can fall into a trap. Instead of taking action, they ignore the backlog as it grows to thousands and thousands of items. And now, it’s no longer functional. Instead of a todo list, you have a swamp. People who actually need a backlog start doing their own in shadow IT, the roadmap moves into spreadsheets, and your organization no longer has visibility between planning and execution.

There’s a solution to this: Automatic bankruptcy. Any tickets that aren't touched in six months get resolved “future consideration” with a friendly note to reopen if it's still relevant.

Let’s look at how this works at each level of activity:

  • Tasks are the easiest: a task is just a development note of things that ought to be done. When they don’t get done quickly, they’re forgotten. A reminder at six months lets the unimportant and already-done tickets close easily, and prompts the developers to do the work if it’s still important. There’s even an argument to silently close these, though I don’t agree with that idea.
  • Bugs are also pretty easy. Bug tickets are regularly opened without sufficient information to act on, leading to either stopped communication or out-of-band investigation. This produces a steady flow of junk tickets. Real problems get fixed way before six months, if only because they hit some executive’s inbox. So a bug with six months of inactivity is highly unlikely to be acted on. Shut it down. Maybe the watchers will reopen with new information.
  • Stories (may also be called Improvements or New Features) are where it gets tough: this is where you’re discussing people’s ideas and problems. Maybe the filer is losing hours per week because this story isn’t done. Maybe they're going to lose some deals. Maybe they’re just insulted that you aren’t acting on their advice. Nevertheless, six months of inactivity is a strong signal: your team has not had bandwidth for this. Don’t delay any further: it’s the Product Manager's job to determine if this is truly a priority and make it happen, or to let the filer down as easily as possible. A PM who avoids this difficult conversation is doing no favors to anyone.
  • Epics are easy, just collections of stories and bugs, everyone forgets they exist as soon as MVP is shipped. This is housekeeping on the same level as tasks.
  • Initiatives: now for the hardest one. Initiatives may not be in the same backlog at all, maybe you’re using post-its or a spreadsheet or a product management specific tool. Still, you’re looking at the same problem as the story, just writ large. Should we field a product to enter that market? Maybe it's a good idea, that's great. But, there are other things that won’t be done now, and the point of this process is to include opportunity costs in decision making. Or maybe some people have been working the idea out-of-band to prove it out. If customers don’t really want it or you can't really build it, then it stops. Lots of initiatives quietly die, either untried or stopped after investigation. Just like in the development backlog, they clog up your vision, until you no longer know what your plan is. As a product leader, if you're trying to tell the e-staff and board that there's ten years of work in your todo list, expect to hear some questions.

It really doesn’t matter if we’re discussing board-facing initiatives or developer-facing tasks or the stories and bugs between them. After six months of inactivity: everything must either die or be defended.

Doing it in JIRA:

  1. Meet with R&D and customer support leadership so they know what you're about to do and why you're doing it. Get them at least to disagree and commit. It's critical to note that resolution isn't permanent -- if the team is willing to keep reopening a ticket, maybe you need to just do it. The goal is to make communication happen.
  2. Edit the relevant shared schemas and add a resolution type of “Future Consideration”. For instance, you might have four schemas: initiatives, software projects, SaaS projects, and services projects. Make sure all software development projects use the same schema or else life sucks too much for you to follow any of this advice.
  3. Announce to the organization that you're starting this process.
  4. Make a saved search like this: updatedDate <= startofday(-180d) and resolution=unresolved. You'll probably need to start with a selection of projects and gradually expand to everything as you train the organization to this new reality.
  5. Schedule it for once a week on your most communication friendly day.
  6. Bulk action, Transition, Resolve, Future Consideration, Resolution Comment: "Automatically resolved after six months of inactivity. Please reopen if still valid." 
    • You'll want a keyboard shortcut for that phrase, just like "What is the problem you are trying to solve?"
  7. Leave the Send Email button checked on, and politely engage with the conversations that result. 
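
The selection logic behind the saved search in step 4 and the bulk action in step 6 can be sketched in plain Python. This is an illustration only, not a JIRA integration: the Ticket class and both function names are hypothetical, and the cutoff is meant to mirror startofday(-180d).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

STALE_AFTER = timedelta(days=180)
RESOLUTION = "Future Consideration"
COMMENT = ("Automatically resolved after six months of inactivity. "
           "Please reopen if still valid.")


@dataclass
class Ticket:  # hypothetical stand-in for an issue record
    key: str
    issue_type: str
    updated: datetime
    resolution: Optional[str] = None
    comment: Optional[str] = None


def stale_tickets(tickets: List[Ticket], now: datetime) -> List[Ticket]:
    """Unresolved tickets last updated on or before the start of today minus 180 days."""
    cutoff = datetime(now.year, now.month, now.day) - STALE_AFTER
    return [t for t in tickets if t.resolution is None and t.updated <= cutoff]


def declare_bankruptcy(tickets: List[Ticket], now: datetime) -> List[str]:
    """Resolve every stale ticket and return the keys that were touched."""
    resolved = []
    for ticket in stale_tickets(tickets, now):
        ticket.resolution = RESOLUTION
        ticket.comment = COMMENT
        resolved.append(ticket.key)
    return resolved


backlog = [
    Ticket("PROD-1", "Story", datetime(2021, 3, 1)),
    Ticket("PROD-2", "Bug", datetime(2021, 9, 1)),
    Ticket("PROD-3", "Task", datetime(2021, 1, 1), resolution="Done"),
]
touched = declare_bankruptcy(backlog, now=datetime(2021, 10, 2, 9, 0))
print(touched)
```

Note that the rule does not look at issue type at all: tasks, bugs, stories, and epics get the same six month test, which matches the argument above.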

Saturday, September 18, 2021

The regrettable features you have to do

Sometimes as a product manager you get a feature request that’s fun and challenging and moves your company forward. Then there are requests that just make you feel like a sad clown: stuff that doesn’t fit your plan at all. It’s hard work to ignore the nay-sayers and make a new thing. The product team and engineering have worked together to create a different and better approach to a class of problems… Congratulations! But there are features you may need to build even though you don’t want to.

Compromise between your product differentiation and market fit comes in three flavors: legitimate needs that you hadn’t recognized or accounted for, design culture that you’re hoping to change, and technology designs that you don’t address.

A legit need might be regulatory compliance that your design doesn’t account for well, or a use case where you were too optimistic about your solution’s fit. This feature request is legitimate, and the regrettable part is wholly on you and your team. You have to find a way to solve it or accept a limit on who you can sell to. That story is beyond the scope of this post.

But design culture… here you’re trying to change behavior. You think your product has a better answer, and you need customers to take a leap of faith. If you have an interesting idea you’ll probably be able to find early adopters, but these customers have a lot on the line. It’s only natural that they want the safety of familiarity or a backup plan, and you may need to offer a bridge feature.

Lastly, technology design choices are anything but simple: some design domains just aren’t good fits, or aren’t within your company’s scope. Another source of compromise is design fashion. Ideas cycle in and out of fashion as the memory of failure fades. This can mean surges of industry excitement about concepts that you don’t personally agree with. Always begin with questioning your own assumptions, because conditions do change. Sometimes the idea that fizzled last time or the time before succeeds in the next iteration.

No matter what you think of design fashion, it’s not likely that you’ll be able to refuse to play the game. Your company is on the competitive field, and you have to have a response, or else your response is going to be whatever your sales people think of. So, research the problem space and ask what has changed. If there is new opportunity and you can capitalize on it, this isn’t a regrettable feature. That story is also beyond the scope of this post. But if you do your research and you don’t see how this time is any different, then you need to find an answer which minimizes investment while maximizing information return. 

That is the same regrettable feature answer as when the idea in question is not new fashion at all, but just not a good fit: a concept that your company doesn’t and/or won’t play in. If it’s not a good fit, then you shouldn’t build it, but you can’t ignore everything that isn’t a good fit.

So, regrettable features: bridges from old patterns to new, fashionable experiments, and requirements you don’t want to or can’t satisfy. What can be done by partnering, adding content, or remarketing what you already have?

Partnership can take a couple of flavors: total outsource of the problem to a chosen vendor, or meeting the community at an interface. The total outsource is least effort on your part, but it only works if the partner is a de facto winner in the space. You don’t make success by stacking failures together. So if that obvious winner hasn’t taken all, you need to make a clear interface that treats them all the same. Set your terms, define your boundaries, pick a couple of partners to go to market with, and see what happens. You’ve got a feature, but it’s stripped to the minimum.

Adding content is another option for vendors who have a strong enough boundary between platform and content. If you have a community of customers, field engineers, and pro service partners building content, then you can solve problems without engineering and support contracts. This approach arguably has a limited shelf life because some customers will push back on the unsupported nature of roll-your-own or second party content. That point can be far in the future though, and you’ll certainly get some real world data about what customers want.

Remarketing is the toughest option, saved for last; only the very lucky vendors can shake the puzzle box and get a better picture. But if you can package your components differently to resolve a problem, it's a lot faster to do that work with legal and sales ops now instead of after engineering is done.

Saturday, September 4, 2021

Capitalizing and Operating a Software Business

“As I have noted in the past, this is why the venture capital model that was developed to support silicon so seamlessly switched to supporting software: both entail huge up-front costs to produce zero marginal cost goods, which means capped downside and theoretically infinite upside.” That Ben Thompson quote is from a non-public newsletter, but here's a similar public post.

"The most common high level concepts associated with lean product development are: 1, Creation of re-usable knowledge. Knowledge is created and maintained so that it can be leveraged for successive products or iterations." Lean product development - Wikipedia 

I particularly like the table on that page, which breaks products into needed, wanted, and wished for.

  1. A needed product has a broad requirement which is stable and commoditized
  2. A wanted product has a specialist requirement with future potential for a wider range of markets
  3. A wished for product has an unrealized requirement, needing to be introduced to market 

The concept of a modern, lean software business is to iterate development and improve product market fit until you have satisficed every market you can reach. 

Customers are trained to expect this as well. When you buy subscription model access to a software thing, you expect that thing to evolve and improve, right? Even if it’s commoditized, if you’re paying monthly you expect benefit for that recurring bite. Even if your vendor seems to follow a model closer to that of a car lease, there are still new models every couple of quarters and strong pressures to upgrade into those releases.

With the possible exception of accidental outcomes from failed acquisitions or private equity conversions, there are few places where enterprise software truly reaches “finished” and stops development. If people are still assigned to the product and people are still using it, which is to say if the product matters at all, then developers are receiving customer input and itching to fix things. Even if feature development does completely stop, bugs are impossible to prevent or detect perfectly. Additionally, the complexity of real world usage makes bug discovery and resulting impacts unpredictable. Therefore, any software thing a vendor is still taking support subscription money for has some level of ongoing maintenance requirement. A vendor that ignores this requirement is taking a risk.

This means the pivot from value creation to value extraction which the VC driven portion of the software industry (e.g. all of it) expects is sort of broken, right? Well, not exactly... but the model is not perfect either. The key is to think about that chip factory. It isn’t actually free to operate: regardless of the mystic processes that go into making sand into computational power, there are all the obvious inputs of any physical plant. Electricity, water, materials, people and their safety and comfort requirements, not to mention salaries. So there are ongoing costs, of course; it’s just that those costs are dwarfed by the profits of providing in-demand chips. Cost-benefit ratio is so good that it might as well be zero once the plant’s spin up costs are recovered. Similarly, the cost of keeping Milton in the basement working on product maintenance is negligible if the recurring revenue is high enough.

The goal is to keep costs linear while making revenues exponential. If you can do that, you fit the VC model and are a good bet. If you look like you can do that for a while, but can’t, you may still fit the bubble economy model because there’s a sucker born every minute. If you can’t do either, you probably won’t get funded by a VC.
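
As a rough illustration of that shape, the sketch below adds a fixed monthly cost on top of an up-front investment and compounds monthly revenue, then reports the first month in which cumulative revenue catches cumulative cost. Every number and name here is a made-up assumption for the arithmetic, not data about any real business.

```python
from typing import Optional


def first_breakeven_month(
    upfront_cost: float,
    cost_per_month: float,
    starting_revenue: float,
    monthly_growth: float,
    horizon: int = 120,
) -> Optional[int]:
    """First month where cumulative revenue >= cumulative cost, else None.

    Costs are linear (a constant added each month); revenue is
    exponential (multiplied by 1 + monthly_growth each month).
    """
    total_cost = upfront_cost
    total_revenue = 0.0
    revenue = starting_revenue
    for month in range(1, horizon + 1):
        total_cost += cost_per_month
        total_revenue += revenue
        if total_revenue >= total_cost:
            return month
        revenue *= 1 + monthly_growth
    return None


# Hypothetical figures: revenue doubling monthly overtakes linear costs quickly.
print(first_breakeven_month(100, 10, 5, 1.0))
# Flat revenue below the monthly cost never recovers within the horizon.
print(first_breakeven_month(100, 10, 5, 0.0))
```

The second call is the unfunded case: with no growth, linear costs win forever.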

Event Suppression Sucks

I’ve always hated the concept of event suppression in security products.

Let’s start with some definitions of suppression, and where better than product documentation?

There are two common reasons for this feature:

The first: “I don't want to see this thing in my console of actionable items because I don't have the time, knowledge, perspective, or priority to do anything with it right now.” There is nothing inaccurate in this behavior, but suppressed alerts are a mismatched solution to this problem. A better answer is multi-level event generation. The system should recognize that some events are not worthy of human attention. Low importance events should go into a statistical model or an audit log, not an analyst’s workbench. The further left (earlier) in the event creation and processing pipeline this happens, the better. This design results in better performance and scalability, which means more signal, less noise per dollar spent. Suppressing generated events at the end of the pipeline is wasteful design that throttles the system’s capacity.

The second is “This rule is wrong and I can't edit it, but I need to get rid of its alerts.” This situation is tougher because it’s where internal or external regulations are driving behaviors. The rule book says to generate alerts no one is going to look at, so we do. The proper answer is still multi level alerts. “Proper” in this case means most efficiently using human and compute time. However, vendor and customer will also need to explain and negotiate with the rule-quoting gatekeepers in order to demonstrate that the rules are not broken. Sending less important events into a separate channel is functionally no different than suppression, but the effect is far more efficient because the separate channel isn’t indexed for rapid and continuous human access. Instrument that channel and monitor it as a production service and you’re good to disable event generation from incorrect or non-actionable rules.
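
A minimal sketch of multi-level event generation follows. It assumes a hypothetical 0 to 10 severity score and two hypothetical thresholds; the class names and channel names are illustrative and do not come from any particular security product.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List


class Level(Enum):
    ACTIONABLE = "actionable"    # goes to the analyst's workbench
    AUDIT = "audit"              # kept in a cheap log, not indexed for humans
    STATISTICAL = "statistical"  # only counted, never stored individually


@dataclass(frozen=True)
class Event:
    rule: str
    severity: int  # hypothetical 0-10 scale


class Pipeline:
    def __init__(self, actionable_at: int = 7, audit_at: int = 3) -> None:
        self.actionable_at = actionable_at
        self.audit_at = audit_at
        self.workbench: List[Event] = []
        self.audit_log: List[Event] = []
        self.stats: Counter = Counter()

    def classify(self, event: Event) -> Level:
        if event.severity >= self.actionable_at:
            return Level.ACTIONABLE
        if event.severity >= self.audit_at:
            return Level.AUDIT
        return Level.STATISTICAL

    def ingest(self, event: Event) -> Level:
        # Routing happens at ingest, as far left as possible, so low value
        # events never reach the workbench and never need suppressing.
        level = self.classify(event)
        if level is Level.ACTIONABLE:
            self.workbench.append(event)
        elif level is Level.AUDIT:
            self.audit_log.append(event)
        else:
            self.stats[event.rule] += 1
        return level


pipeline = Pipeline()
for event in [Event("malware-detected", 9), Event("login-failed", 4),
              Event("port-scan", 1), Event("port-scan", 1)]:
    pipeline.ingest(event)
print(len(pipeline.workbench), len(pipeline.audit_log), dict(pipeline.stats))
```

Nothing is discarded here, which is what lets you answer the rule-quoting gatekeepers: the audit channel and the counters still exist and can be monitored as a production service.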

There is a sub-form of that second reason: the analysts don’t have permission to edit rules to fix them or change event generation. “I don’t want to look at this but I can’t stop it so suppression is the answer.”  Maybe this is product immaturity, maybe it’s organizational failure, but it’s still a problem to fix. As a vendor, if your product supports a better option that your customers won’t use, you’ve got a customer input collection tour to do. This is a case where technology can’t solve a people problem (Edwards’ Law), so it needs person to person communication and possibly a professional services engagement. Customers who can adjust to the better model will be more effective and efficient than those who don’t, but you may still need to offer support to those who can’t. The design job therefore is to discourage suppression without preventing it.

Security is full of whataboutism, but most of the desired data is really low value, so pass it into a low value, low cost pipeline. Don’t leave it clogging up your SOC.

Saturday, May 29, 2021

What Should Go Into a CMDB


It’s not every day that information technology work leads you into philosophy, but designing a configuration management database will do it. Spend a little while thinking about what is known or even knowable about the services you’re trying to provide, maybe you’ll end up asking “what does existence even mean?” Fear not, there are some practical guiding principles to follow. 

First, some background: what is the purpose of a Configuration Management Data Base (CMDB)? Why are we even trying to align Configuration Items (CIs) to the Services that they provide? The intention is to provide visibility into what affects what. The purpose of that visibility is to ensure reliability of changes, by understanding what is affected and planning in advance to minimize service interruption.

Of course, that goal is not simple to achieve. If it were really simple, it would just be a notebook or a spreadsheet, right? But the modern organization has a complex web of interdependent systems. A flat sheet of items misses so much of reality that it fails to make life any easier. On the other end of the complexity spectrum, there’s a very careful and intentional attempt to describe everything important. Let’s pretend that’s possible for a moment, which I doubt… it’s still unlikely that your organization can do it at a justifiable cost. If the system fails to provide return on investment, then it’s scrapped with good reason.

So, the CMDB should only track what is truly necessary to affordably meet the goal of visibility driving change safety. Common advice is to begin with a map of CIs to service offerings. But, is that a detailed 1:1 map? Sort of, maybe. That’s a fine directional goal, but you should not insist on or try to attain a complete CI to service offering mapping on day one. You might start off asking “which service offering does this CI map to in the service catalog”, but it’s all too easy to wander into epistemology. Business leaders are asking for things that can’t be easily made visible and so you end up doing service mappings by proxy… which doesn’t work. 

If you can’t draw a clear line from “this attribute on this entity maps to that attribute on that entity to support this metric indicating that service”, then it’s dreams going into the system, not measurements. So only put things in that you have to because it’s obvious that they work to support a service. For instance, let’s say we’re talking about a job processing farm. The NAS volumes support the VMs that perform the jobs that test the designs. You can say “this volume supports that job” clearly, but you will struggle to say “this NAS failure is costing that business unit an on-time delivery of that deal” because you don’t have sufficient context in your software. In an organization with sufficient complexity to be considering a CMDB, the business map isn’t clear and the technology map is only a little better. 

What is a CI anyway, and how are you going to identify it? Is a group of ephemeral containers providing services worth a CI each, or one for the service, or none? If you’ve got a big expensive server and you replace its motherboard so the serials and MACs change, do you update all your records? What if you just changed its hostname? Does it still support the same services that it used to? How do you reference the systems you run on a cloud Infrastructure as a Service (IaaS)? Do you need a CI for each of the Software as a Service (SaaS) functions that you use?

The simplest answer to those questions is to manually define and track systems, ignoring identifying attributes like serials and MACs. However, this leaves recalls, warranty tracking, and depreciation out of the CMDB and in another tool, reducing the cost justification for the CMDB. Maybe that works for your organization, but it’s a decision that has got to be done across the organization so that you can present meaningful numbers in your KPIs and compliance audits.

What types of devices should go into the database? Does a network switch count, since it’s critical for making your organization’s software service operate? How about the AC power or HVAC systems that are critical for keeping the switch working? How about the roof that keeps the rain off the switch? Again, the philosophy of everything being interconnected is fascinating but you’ve got a job to do. So, a CI is any item that you can actively manage. I take an absolutist approach here: if there is not a software tool providing visibility and control to the device, then it is not a CI worth tracking.

My advice is to draw a CI line at the management interface. Only put a thing into the data system if it has an agent on it or an interface that enables a management system that you own to collect data. Again, three quarters of my career is in those agents, so call it self-serving if you like. But if the device can’t update your systems on its own, then someone has a job to maintain the “time-saving” tool. If you can afford to allocate people to maintaining data cleanliness in a CMDB so ITAM reports make sense and ITSM service requests go smoothly, that’s great, but it sounds like a questionable use of resources at any scale. As Corey Quinn writes, “What people lose sight of is that infrastructure, in almost every case, costs less than payroll.”
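
That rule is simple enough to write down as a predicate. The sketch below is one possible reading of it; the Device fields and the example inventory are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Device:  # hypothetical inventory record
    name: str
    has_agent: bool = False
    # Name of an interface a management system you own can poll,
    # e.g. "snmp" or "api"; None means nothing can collect from it.
    management_interface: Optional[str] = None


def is_ci(device: Device) -> bool:
    """A device is a CI only if it can report on itself without a person."""
    return device.has_agent or device.management_interface is not None


inventory = [
    Device("build-vm-01", has_agent=True),
    Device("core-switch", management_interface="snmp"),
    Device("roof"),
]
cis = [d.name for d in inventory if is_ci(d)]
print(cis)
```

The roof stays out of the database, however important it is to the switch.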

Next, treat every collected attribute like it’s a personal insult. For every report, ask how little can you possibly collect and still solve the problem. Is there already data collected that will work? This is the land of practically unbounded high velocity data sets, and scale is going to be a problem. Habits built with a small number of entities will fall apart quickly as your organization grows. 

So that’s CIs… now for the services that they provide. Definitionally, a service offering is something your stakeholders and customers can directly use. But, remember that the point of the CMDB is to de-risk changes, meaning you have to map things that might get changed to that service. Service: Our self-hosted website is up. CIs to provide that: dozens of servers, plus all sorts of unknowables like network and power and HVAC and a functioning civil society. That said, while a one-to-many service:CI map is more complex, it’s the only model that is at all realistic. Business users requesting changes and reporting problems should have no need to select CIs and suggest solutions, so the complexity of looking down the tree from service offering shouldn’t matter. IT operators requesting changes and reporting problems do need to see what those changes affect, and looking up the tree to affected services is theoretically useful. However, those two statements rely on an accurate service:CI map, and that accuracy is sorely lacking. More likely, the business requestor is aware of problematic CIs because they’ve been troubleshooting the problem on their own, and the IT operator is not aware of affected services because the CI has been multi-tasked or repurposed. Therefore, incident triage and handling often include preliminary discovery of the functional map, if possible.

Exceptions to that grim state of affairs can exist: just as I recommend the definition of a CI is limited to an automatically discoverable entity, I also recommend that the definition of a service is limited to machine readable labels. Tagging or labeling CIs with the services that they support allows the CI data collection mechanism to be used to support CI to service mapping. Better, this allows an organization to begin with manual process (apply this label to anything you spin up with this account in these availability zones) and then grow to automation (if spinning up from this image and running this process, then include that label). That way the organization does not have to begin with perfect in order to get somewhat better.
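
Here is a small sketch of label-driven mapping, assuming the CI collector already reports a set of service labels per CI. The CI names, labels, image name, and process name are all hypothetical; the auto_label rule stands in for the "if spinning up from this image and running this process" step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def auto_label(image: str, processes: Iterable[str]) -> Set[str]:
    """Automation stage: derive service labels from what the CI is running.

    The image and process names below are made-up examples.
    """
    labels = set()
    if image == "job-runner-image" and "design-test-worker" in processes:
        labels.add("design-testing")
    return labels


def cis_by_service(ci_labels: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Look down the tree: service offering -> CIs that support it."""
    result = defaultdict(set)
    for ci, labels in ci_labels.items():
        for label in labels:
            result[label].add(ci)
    return dict(result)


def affected_services(ci_labels: Dict[str, Set[str]],
                      changed: Iterable[str]) -> Set[str]:
    """Look up the tree: CIs about to change -> services at risk."""
    services: Set[str] = set()
    for ci in changed:
        services |= ci_labels.get(ci, set())
    return services


# Labels as collected from each CI; the first two were applied manually,
# the third comes from the automated rule.
ci_labels = {
    "nas-vol-7": {"design-testing"},
    "web-01": {"website"},
    "job-vm-12": auto_label("job-runner-image", ["design-test-worker", "sshd"]),
}
print(cis_by_service(ci_labels))
print(affected_services(ci_labels, ["nas-vol-7", "web-01"]))
```

Because the map is rebuilt from whatever labels the collector sees, an unlabeled or repurposed CI shows up as a gap in the data rather than as a stale manual entry.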

VMBlog Post on Decentralization

Linking to this piece I wrote for VMblog:

Why Decentralized Work Calls for Decentralized Data