Sunday, December 19, 2021

Enterprise Roshambo

Ever wish there was a simple game to explain how complex organizations make decisions? You’re in luck! Roshambo, also known as rock-paper-scissors, explains it all. There are a few productive hours in each day, and three conflicting ways to spend them. The game explains how they will be prioritized.

Default rules: in enterprise roshambo, Compliance beats Security, and in most organizations Operations beats Compliance. In short: Operations beats Compliance beats Security.

  • Should we deploy a new patch? Security says yes, Compliance says not necessary yet, Operations says it’s risky: patch isn’t deployed.
  • Should we deploy an old patch? Security says yes, Compliance says yes, Operations says it’s risky: patch is deployed in a carefully scheduled maintenance window.
  • Should we alter scope of a Compliance audit if Operations asks? Yes.
  • Should we disable or uninstall a Security tool if Operations asks? Yes.

The exceptions are highly regulated environments, such as government agencies, food and pharmaceuticals, and some commercial finance. Compliance beats Operations beats Security there because failure to follow the law stops (via government intervention) or slows (via budget-disrupting fines) the complex organization’s mission.

  • Should we interrupt Operations to ensure the medicine meets the needs of Compliance? Yes.
  • Should we interrupt Operations to deploy a patch for Security? Not unless Compliance says so.

An emergency can temporarily change the rules of the game. How they change depends on the emergency. For instance, Operations change freezes and Compliance change control boards are put on hold during a Security zero day response. Security beats Operations beats Compliance, until the emergency is resolved.

If the emergency is in compliance, such as threats of a crippling fine or loss of a major customer, then the lightly-regulated organization can temporarily act like a highly-regulated one. Compliance beats Operations beats Security.

Friday, November 26, 2021

Total Compensation

Career decisions are complicated, but here are a few models that might help. A salaried job is more than a simple exchange of your time for their money: you will be giving it some headspace in most of your waking moments. This conversation is not relevant to contract or hourly jobs. Still, the model of exchanging time for money is a useful starting point, so let’s start with the three forms of money that you’ll be offered in a salaried position.

  • Unvested stock options are lottery tickets. 
  • Cash is cash. 
  • Titles are free.

An ownership stake in the company is a common starting point in negotiation. We will all work together to make the company grow larger and therefore every dollar in options is potentially worth thousands in future money. Likewise, every lottery ticket in the gutter was once potentially worth thousands or more. They say lotteries are a tax on people who are bad at math; stock options aren’t quite that bad, but they’re certainly not guaranteed income. The form of the ownership doesn’t really matter: Restricted Stock Units (RSUs) are better than Incentive Stock Options (ISOs) but the whole thing is silly if the company doesn’t grow. You may think the risk of stocks is reduced when a company is past its startup years. I can’t help but note that lots of employees lost out in MCI and Enron’s glorious flameouts, while my own turn of the century Intel ISOs remained essentially worthless for twenty years.

Cash, on the other hand, is immediately liquid and can be exchanged for goods and services from coast to coast. Cash comes in two amounts: More Than Enough, and Less Than Enough. 

  • More Than Enough means that you do not worry about providing your dependents with their needs. Health and life insurance is part of cash. 
  • Less Than Enough means that you worry. If you do not receive enough, then you are going to devote part of your headspace to handling this financial pressure. 

As an employee, you should never accept a salaried job that includes financial pressure. If you’re going to accept financial pressure, then you’d better be holding a double-digit ownership stake because you’re not an involved employee any more, you’re a committed owner. See the story of the Chicken and the Pig for graphic illustration of the difference between employee and owner.

The final form of compensation is title. A salaried position has responsibilities and blast radius. The scope and radius can be commensurate with the size of the organization, but it’s not a hard rule. Individuals may be hired to try new ideas, or to revitalize major functions, or to solve long standing problems. The organization’s easy answer is to fit your title into the existing system and hierarchy as another one of a role that already exists, but it’s all open to negotiation. It costs the organization very little to stick “Senior”, “Chief”, “Principal”, or “Head of” in front of your role. That extra fillip of title can help you tell a better story on your resume, or it can be useless. A Vice President in a large software company might report to the CEO and lead a thousand people, or they might fly around to conferences evangelizing vaporware and control nothing more than their own expense account. Titles are free.

Those three things are the tangibles that you as a salaried employee can get, but now let’s talk about your mental and spiritual compensation. Paloma Medina’s excellent BICEPS model is a great resource for thinking about what a role does for your happiness. No matter how much money you’re making or gambling, a salaried role should meet your needs for Belonging, Improvement, Choice, Equality, Predictability, and Significance. Really, just go read the link. A salaried role is at minimum two thirds of your waking hours but realistically something more like four fifths… if you’re not getting what you need, then you’re devoting that time to a thing that leaves you unhappy. That’s not a sustainable thing to do with one's life.

One last thing to consider if you’re looking at your current role or a potential role through this lens of compensation and happiness: 

  • Change is constant, but tempo is not.

All organizations will always be changing and that change will affect your happiness, both positively and negatively. At the very least, this can be a challenge to Predictability, and for some people that’s a very big desire. The next organization you join? They’re going to alter the deal also, because the world changes and we must all respond. What is not the same across all organizations is the pace at which change is accepted and implemented. Some organizations lean into change, while some lean back. Asking how changes have been handled in the past might help you find a team that supports your needs.

Thursday, November 25, 2021

Thoughts on Logging

  1. What should go into a log? 
  2. Enough, but not too much.

Those two phrases are contradictory and context dependent. This is why logging has different levels. They may be expressed as numbers or words. Each more verbose level is inclusive of the less verbose levels. Sematext has a nice in-depth overview.
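
To make the "inclusive" part concrete, here is a minimal sketch using Python's standard logging module. The helper name and the logger names are made up for the example; the level names are Python's own, which differ slightly from other ecosystems (Critical stands in for Fatal, and there is no built-in Trace).

```python
import io
import logging

def emitted_at(threshold: int) -> list[str]:
    """Return the level names that get through when the logger is set to `threshold`."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s"))

    logger = logging.getLogger(f"demo.levels.{threshold}")  # hypothetical logger name
    logger.setLevel(threshold)
    logger.propagate = False          # keep the demo output out of the root logger
    logger.handlers.clear()
    logger.addHandler(handler)

    # Try every standard level, least severe first.
    for level in (logging.DEBUG, logging.INFO, logging.WARNING,
                  logging.ERROR, logging.CRITICAL):
        logger.log(level, "message")
    return stream.getvalue().split()

print(emitted_at(logging.ERROR))  # only the quiet levels
print(emitted_at(logging.DEBUG))  # everything, including the quiet levels
```

Setting the threshold to a more verbose level never removes anything that a less verbose level would have written; it only adds to it.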

I can’t count the number of times that initial log collection didn’t uncover the root cause of a problem. I’ve also disabled or crashed the target system by turning logging too high. Balance is the key.

At the one extreme, Fatal is “don’t log unless you’re dying”, which presumes a lot about the application’s ability to realize it’s dying. A more common option is Error (only log problems) or Info (log what you’re doing, not how).

At the other extreme, Trace level logging is preemptive print-debugging. Each important function in each routine, when did it start, what was the input, what was the outcome. If issuing a special build for a customer to troubleshoot is a common occurrence, or something that you wish you could do, then Trace logging is for you. If the application still performs acceptably in production with Trace on, you’re probably not logging enough. Consider putting a built-in timer into the application to shut Trace off after a set amount of time.
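
One possible shape for that built-in timer, sketched in Python. Python's logging module has no Trace level, so the numeric level 5 and the name TRACE are assumptions for this example, as is the function name.

```python
import logging
import threading

TRACE = 5  # assumed custom level below DEBUG (10); not part of the standard library
logging.addLevelName(TRACE, "TRACE")

def enable_trace_for(logger: logging.Logger, seconds: float,
                     fallback: int = logging.INFO) -> threading.Timer:
    """Turn on TRACE now and schedule a return to `fallback` after `seconds`."""
    logger.setLevel(TRACE)
    timer = threading.Timer(seconds, logger.setLevel, args=(fallback,))
    timer.daemon = True  # do not keep the process alive just for the timer
    timer.start()
    return timer

# Usage: trace for fifteen minutes, then drop back to INFO automatically.
# enable_trace_for(logging.getLogger("myapp"), 15 * 60)
```

Returning the timer lets the caller cancel it if someone turns Trace off by hand before the deadline.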

Each line of a log should start with the date and time. UTC time is best. Logging in the local time of the device is regrettable, but acceptable if you’ll never need to correlate the events from one device with another one in another time zone. Logging in local time will cost unexpected effort and cause problems, but sometimes it can’t be avoided.

The time format should be ISO8601. Unix epoch is not ideal because people can’t read it. The format of your favorite locale is not ideal because people will get confused. ISO8601 or GTFO.
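
As a sketch of what that looks like in practice, this Python formatter stamps every record with a UTC ISO 8601 time. The class name and the line layout are illustrative choices, not a standard.

```python
import logging
from datetime import datetime, timezone

class UtcIsoFormatter(logging.Formatter):
    """Prefix every record with a UTC ISO 8601 timestamp."""

    def formatTime(self, record, datefmt=None):
        # record.created is seconds since the epoch; convert it explicitly to UTC
        stamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return stamp.isoformat(timespec="milliseconds").replace("+00:00", "Z")

formatter = UtcIsoFormatter("%(asctime)s %(levelname)s %(message)s")

handler = logging.StreamHandler()
handler.setFormatter(formatter)
# Attach `handler` to whichever logger the application uses.
```

The epoch value is still what the machine records; the formatter only changes what people and parsers read.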

Entities should have identifiers so they can be traced. This is pretty simple if you’re the only application of your type running on a single physical system. The thing writing the log is the entity that matters and the hostname is plenty of ID. It gets deep if you’re an elastically scaling micro service on multiple cloud providers. Is the thing writing the log what matters any more? You might even architecturally be able to say that you don’t care and it’s all going to wash out at the services level… as long as you never have to debug or do forensics.

Tasks (or threads, or forked processes, or containers) should have identifiers so they can be traced. This is a remarkably deep problem; don’t let perfect be the enemy of good. Again, your architecture might suggest this isn’t necessary, and that suggestion may be correct in many circumstances.

Each line of the log should include a single event with time, entities, and tasks all recorded up front. Some of that feels repetitive and like you would only need to emit it once per file. That would be fine if you never need to correlate with the logs from other systems in order to trace a transaction.

It may be tempting to log an entire transaction as a single event, but that presumes your application will stay alive long enough to see the transaction neatly complete. It is better to log the start and end of a transaction as separate events. 

It is also tempting to use Event IDs instead of language to describe what is happening, since “404” takes less space than “Page not found.” A human readable string should be included as well so that humans can learn your IDs by use. This also allows you to potentially translate your event ID into different languages for use by different humans. You should never use translated strings without event IDs though; this is annoying. Why? Because it makes things harder for organizations that operate in many languages. Microsoft did not keep this behavior when upgrading Windows, but customers and log analysis vendors have still needed to build separate Windows log parsers for every language they'll support.
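
Pulling the last few points together, here is a rough Python sketch of a log line that carries time, entity, task, event ID, and a readable string, with the start and end of a transaction written as separate events. The event numbers, field names, and key=value layout are all invented for illustration.

```python
import socket
import uuid
from datetime import datetime, timezone
from typing import Optional

# Hypothetical event catalog: numeric ID -> human-readable English text.
EVENTS = {
    1001: "Transaction started",
    1002: "Transaction finished",
}

def format_event(event_id: int, task_id: str, detail: str = "",
                 host: Optional[str] = None,
                 now: Optional[datetime] = None) -> str:
    """Build one self-describing line: time, entity, task, event ID, text."""
    now = now or datetime.now(timezone.utc)   # callers are assumed to pass UTC
    host = host or socket.gethostname()       # the entity writing the log
    stamp = now.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    line = (f"{stamp} host={host} task={task_id} "
            f"event={event_id} msg=\"{EVENTS[event_id]}\"")
    return f"{line} {detail}".rstrip()

task = uuid.uuid4().hex                       # one ID shared by both events
print(format_event(1001, task, "order=42"))             # start is its own event
print(format_event(1002, task, "order=42 status=ok"))   # and so is the end
```

If the process dies between the two calls, the start line is already on disk, and the missing end line is itself a useful signal. Translating the log means swapping the catalog text while the event number stays the same.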

Writing a log to a file is the obvious path, and there’s a whole world of tooling for rerouting that file output into searchable storage. The simplicity of a file or cloud bucket is a huge win; if you’re up at all, you can probably write a file; if you can’t write to a file, the reason is probably obvious and more widespread than a subtle problem in your program. However, there are also reasons to write to a structured data system. Log data is at least semistructured, and using a structured system allows indexing and search. There’s cost down this road though; either the commercial cost of a tool or service, or the performance cost of a database. Furthermore, the log consumer now has to get into the database, or your product has to provide an auditing API. In my opinion databases are rarely a good idea for log storage.

Some other thoughts on logging:

Product Manager to Product Ratio

How many chucks could a woodchuck chuck if a woodchuck could chuck wood? It depends on the structure of the organization's tech stack. A highly structured tech stack provides a format that you can build repetitive products with. Structure makes the design obvious and lets the developers work faster, which means more gets done with less effort. 

Every problem that you're solving is just a case of “fit this into our model and then solve it”, and you can do a lot with a little. For instance, in many systems management products the problem is to recognize endpoint state and take action, and the interface is to show state and guide to action. In many data analytics products the problem is to collect data and show informative panels like counts of types. If you make sponges, the easy use cases are all some form of "clean something". Given the commonality of these goals, a highly structured tech stack can be produced to make common tasks simple. In my experience this might be an average of three devs per product and a one to three PM to product ratio. I’ve seen a lot higher: averages of one to five and one to six in two of the teams I’ve been on.

On the other hand, a lot of structure can be constrictive, and a more flexible approach has benefits. If you treat every problem as a new solution to discover and build from first principles, maybe you’ll come up with new shortcuts to success. But you’ll also be unable to depend on preexisting structure. In my estimation you'll need at least five or six devs per product and a one to one PM relationship.

I've run over a dozen very similar products by myself and I've been taxed to capacity by a single product… it's all about how much structure the development platform provides. I'm defining product as an installable/removable module that’s complex enough to need a semi-dedicated development team. A content pack is not a product.

If you’re not sure how to describe the amount of structure in your organization, you can assess it in reverse by asking about past performance. If getting a new product to usable and salable is a one quarter job for under three people, you're probably very structured. The first task is to learn the structure, and studying the existing products will help. If it's more like a year, plan accordingly and lean into product management process.

Sunday, October 10, 2021

What kind of product are you making?

First know everything, then you can automate it! Also, if you can express your problem in numbers, I can tell you if they’re going up or down.

A freeform exploration product attempts to enable a customer to achieve understanding and express the problem in numbers. Features may include user-editable schema (traditional database and ETL products) or schema on the fly (Splunk or Elastic), loose typing, and a “unix is user friendly” workflow experience. In some markets this sort of product can be so far beyond usability that it is more of a platform for partners to build products on. Many products rightfully do not expose their use of MS-SQL, Oracle, Mongo, or Redis to their users. 

Those products can be thought of as market-targeted solutions. They come to the customer with an opinion about what problem is being solved and how that solution should work. They notably have predefined schema and strict typing (good for performance and safety). One should expect a polished wizards-n-workflows experience with no unnecessary options.

So, as a development team with a project starting up, you might need to ask yourself: which one of those supports the business outcome this project is looking for?

Unless you are starting a company from scratch, the first approach of solving hard, general problems is probably not what you want to tackle. And even when you are starting from scratch to build a new platform… there are a lot of reasons why most startups fail. Time lost to analysis paralysis at general problems is one. Inability to describe your value add to the market of customers with problems is an even bigger one. If I’m trying to reduce theft in my supply chain, I probably don’t want to start with defining schemas in a raw data management tool.

And if you’re in a team at an existing company, chances are extremely slim that your charter includes solving general problems. Much more likely is a charter to deliver a solution which maximally leverages your existing platform to capture entry in a new market.

Saturday, October 2, 2021

Declaring Idea Bankruptcy

It’s obvious that your R&D team can’t do everything at once, right?

It’s obvious that items lower on the backlog aren’t going to happen unless they displace something higher on the backlog, right?

And yet. Lots of those items are people’s beloved ideas. Good ideas, that would make the product better, open new business opportunities, solve real problems. 

Good ideas are going to be rejected for a lack of capacity to develop and exploit them. This makes a lot of people sad, so product managers can fall into a trap. Instead of taking action, they ignore the backlog as it grows to thousands and thousands of items. And now, it’s no longer functional. Instead of a todo list, you have a swamp. People who actually need a backlog start doing their own in shadow IT, the roadmap moves into spreadsheets, and your organization no longer has visibility between planning and execution.

There’s a solution to this: Automatic bankruptcy. Any tickets that aren't touched in six months get resolved “future consideration” with a friendly note to reopen if it's still relevant.

Let’s look at how this works at each level of activity:

  • Tasks are the easiest: a task is just a development note of things that ought to be done. When they don’t get done quickly, they’re forgotten. A reminder at six months lets the unimportant and already done tickets close easily, and prompts the developers to do it if it’s still important. There’s even an argument to silently close these, though I don’t agree with that idea.
  • Bugs are also pretty easy. Bug tickets are regularly opened without sufficient information to act on, leading to either stopped communication or out-of-band investigation. This produces a steady flow of junk tickets. Real problems get fixed way before six months, if only because they hit some executive’s inbox. So a bug with six months of inactivity is highly unlikely to be acted on. Shut it down. Maybe the watchers will reopen with new information.
  • Stories (may also be called Improvements or New Features) are where it gets tough: this is where you’re discussing people’s ideas and problems. Maybe the filer is losing hours per week because this story isn’t done. Maybe they're going to lose some deals. Maybe they’re just insulted that you aren’t acting on their advice. Nevertheless, six months of inactivity is a strong signal: your team has not had bandwidth for this. Don’t delay any further: it’s the Product Manager's job to determine if this is truly a priority and make it happen, or to let the filer down as easily as possible. A PM who avoids this difficult conversation is doing no favors to anyone.
  • Epics are easy, just collections of stories and bugs, everyone forgets they exist as soon as MVP is shipped. This is housekeeping on the same level as tasks.
  • Initiatives: now for the hardest one. Initiatives may not be in the same backlog at all; maybe you’re using post-its or a spreadsheet or a product management specific tool. Still, you’re looking at the same problem as the story, just writ large. Should we field a product to enter that market? Maybe it's a good idea, and that's great. But there are other things that won’t be done now, and the point of this process is to include opportunity costs in decision making. Or maybe some people have been working the idea out-of-band to prove it out. If customers don’t really want it or you can't really build it, then it stops. Lots of initiatives quietly die, either untried or stopped after investigation. Just like in the development backlog, they clog up your vision, until you no longer know what your plan is. As a product leader, if you're trying to tell the e-staff and board that there's ten years of work in your todo list, expect to hear some questions.

It really doesn’t matter if we’re discussing board-facing initiatives or developer-facing tasks or the stories and bugs between them. After six months of inactivity: everything must either die or be defended.

Doing it in JIRA:

  1. Meet with R&D and customer support leadership so they know what you're about to do and why you're doing it. Get at least to disagree and commit. It's critical to note that resolution isn't permanent -- if the team is willing to keep reopening this ticket, maybe you need to just do it. The goal is to make communication happen.
  2. Edit the relevant shared schemas and add a resolution type of “Future Consideration”. For instance, you might have four schemas: initiatives, software projects, SaaS projects, and services projects. Make sure all software development projects use the same schema or else life sucks too much for you to follow any of this advice.
  3. Announce to the organization that you're starting this process.
  4. Make a saved search like this: updatedDate <= startofday(-180d) and resolution=unresolved. You'll probably need to start with a selection of projects and gradually expand to everything as you train the organization to this new reality.
  5. Schedule it for once a week on your most communication friendly day.
  6. Bulk action, Transition, Resolve, Future Consideration, Resolution Comment: "Automatically resolved after six months of inactivity. Please reopen if still valid." 
    • You'll want a keyboard shortcut for that phrase, just like "What is the problem you are trying to solve?"
  7. Leave the Send Email button checked on, and politely engage with the conversations that result. 
Update: Matt Schellhas has a relevant post
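
For anyone who wants to dry-run the rule outside JIRA before scheduling it, the saved search and the six month cutoff can be sketched in Python. This does not call any JIRA API; the ticket dictionaries, project keys, and function names are hypothetical, and the cutoff mirrors startOfDay(-180d) only approximately because time zones are ignored.

```python
from datetime import datetime, timedelta, timezone

RESOLUTION_COMMENT = ("Automatically resolved after six months of inactivity. "
                      "Please reopen if still valid.")

def build_jql(projects: list[str], days: int = 180) -> str:
    """Compose the saved-search JQL, limited to a starting set of projects."""
    scope = ", ".join(projects)
    return (f"project in ({scope}) AND updatedDate <= startOfDay(-{days}d) "
            "AND resolution = Unresolved")

def stale_tickets(tickets: list[dict], now: datetime, days: int = 180) -> list[str]:
    """Return keys of unresolved tickets not updated since the cutoff."""
    # Midnight at the start of the day, `days` days ago.
    cutoff = (now - timedelta(days=days)).replace(
        hour=0, minute=0, second=0, microsecond=0)
    return [t["key"] for t in tickets
            if t["resolution"] is None and t["updated"] <= cutoff]

# Usage with made-up project keys: start narrow, widen the list over time.
print(build_jql(["CORE", "WEB"]))
```

Running the list past R&D and support leadership before the first bulk action is a cheap way to find the tickets someone will want to defend.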

Saturday, September 18, 2021

The regrettable features you have to do

Sometimes as a product manager you get a feature request that’s fun and challenging and moves your company forward. Then there’s requests that just make you feel like a sad clown: stuff that doesn’t fit your plan at all. It’s hard work to ignore the nay-sayers and make a new thing. The product team and engineering have worked together to create a different and better approach to a class of problems… Congratulations! But there’s features you may need to build even though you don’t want to.

Compromise between your product differentiation and market fit comes in three flavors: legitimate needs that you hadn’t recognized or accounted for, design culture that you’re hoping to change, and technology designs that you don’t address.

A legit need might be regulatory compliance that your design doesn’t account for well, or a use case where you were too optimistic about your solution’s fit. This feature request is legitimate, and the regrettable part is wholly on you and your team. You have to find a way to solve it or accept a limit on who you can sell to. That story is beyond the scope of this post.

But design culture… here you’re trying to change behavior. You think your product has a better answer, and you need customers to take a leap of faith. If you have an interesting idea you’ll probably be able to find early adopters, but these customers have a lot on the line. It’s only natural that they want the safety of familiarity or a backup plan, and you may need to offer a bridge feature.

Lastly, technology design choices are anything but simple: some design domains just aren’t good fits, or aren’t within your company’s scope. Another source of compromise is design fashion. Ideas cycle in and out of fashion as the memory of failure fades. This can mean surges of industry excitement about concepts that you don’t personally agree with. Always begin with questioning your own assumptions, because conditions do change. Sometimes the idea that fizzled last time or the time before succeeds in the next iteration.

No matter what you think of design fashion, it’s not likely that you’ll be able to refuse to play the game. Your company is on the competitive field, and you have to have a response, or else your response is going to be whatever your sales people think of. So, research the problem space and ask what has changed. If there is new opportunity and you can capitalize on it, this isn’t a regrettable feature. That story is also beyond the scope of this post. But if you do your research and you don’t see how this time is any different, then you need to find an answer which minimizes investment while maximizing information return. 

That is the same regrettable feature answer as when the idea in question is not new fashion at all, but just not a good fit: a concept that your company doesn’t and/or won’t play in. If it’s not a good fit, then you shouldn’t build it, but you can’t ignore everything that isn’t a good fit.

So, regrettable features: bridges from old patterns to new, fashionable experiments, and requirements you don’t want to or can’t satisfy. What can be done by partnering, adding content, or remarketing what you already have?

Partnership can take a couple of flavors:  total outsource of the problem to a chosen vendor, or meeting the community at an interface. The total outsource is least effort on your part, but it only works if the partner is a de facto winner in the space. You don’t make success by stacking failures together. So if that obvious winner hasn’t taken all, you need to make a clear interface that treats them all the same. Set your terms, define your boundaries, pick a couple of partners to go to market with, and see what happens. You’ve got a feature, but it’s stripped to the minimum.

Adding content is another option for vendors who have a strong enough boundary between platform and content. If you have a community of customers, field engineers, and pro service partners building content, then you can solve problems without engineering and support contracts. This approach arguably has a limited shelf life because some customers will push back on the unsupported nature of roll-your-own or second party content. That point can be far in the future though, and you’ll certainly get some real world data about what customers want.

Remarketing is the toughest option, saved for last; only the very lucky vendors can shake the puzzle box and get a better picture. But if you can package your components differently to resolve a problem, it's a lot faster to do that work with legal and sales ops now instead of after engineering is done.

Saturday, September 4, 2021

Capitalizing and Operating a Software Business

“As I have noted in the past, this is why the venture capital model that was developed to support silicon so seamlessly switched to supporting software: both entail huge up-front costs to produce zero marginal cost goods, which means capped downside and theoretically infinite upside.” That Ben Thompson quote is from a non-public newsletter, but here's a similar public post.

"The most common high level concepts associated with lean product development are: 1, Creation of re-usable knowledge. Knowledge is created and maintained so that it can be leveraged for successive products or iterations." Lean product development - Wikipedia 

I particularly like the table on that page, which breaks products into needed, wanted, and wished for.

  1. A needed product has a broad requirement which is stable and commoditized
  2. A wanted product has a specialist requirement with future potential for a wider range of markets
  3. A wished for product has an unrealized requirement, needing to be introduced to market 

The concept of a modern, lean software business is to iterate development and improve product market fit until you have satisficed every market you can reach. 

Customers are trained to expect this as well. When you buy subscription model access to a software thing, you expect that thing to evolve and improve, right? Even if it’s commoditized, if you’re paying monthly you expect benefit for that recurring bite. Even if your vendor seems to follow a model closer to that of a car lease, there are still new models every couple of quarters and strong pressures to upgrade into those releases.

With the possible exception of accidental outcomes from failed acquisitions or private equity conversions, there are few places where enterprise software truly reaches “finished” and stops development. If people are still assigned to the product and people are still using it, which is to say if the product matters at all: then developers are receiving customer input and itching to fix things. Even if feature development does completely stop, bugs are impossible to prevent or detect perfectly. Additionally, the complexity of real world usage makes bug discovery and resulting impacts unpredictable. Therefore, any software thing a vendor is still taking support subscription money for has some level of ongoing maintenance requirement. A vendor that ignores this requirement is taking a risk.

This means the pivot from value creation to value extraction which the VC driven portion of the software industry (i.e. all of it) expects is sort of broken, right? Well, not exactly... but the model is not perfect either. The key is to think about that chip factory. It isn’t actually free to operate: regardless of the mystic processes that go into making sand into computational power, there are all the obvious inputs of any physical plant. Electricity, water, materials, people and their safety and comfort requirements, not to mention salaries. So there are ongoing costs, of course, it’s just that those costs are dwarfed by the profits of providing in-demand chips. Cost-benefit ratio is so good that it might as well be zero once the plant’s spin up costs are recovered. Similarly, the cost of keeping Milton in the basement working on product maintenance is negligible if the recurring revenue is high enough.

The goal is to keep costs linear while making revenues exponential. If you can do that, you fit the VC model and are a good bet. If you look like you can do that for a while, but can’t, you may still fit the bubble economy model because there’s a sucker born every minute. If you can’t do either, you probably won’t get funded by a VC.

Event Suppression Sucks

I’ve always hated the concept of event suppression in security products.

Let’s start with some definitions of suppression, and where better than product documentation?

There’s two common reasons for this feature:

The first: “I don't want to see this thing in my console of actionable items because I don't have the time, knowledge, perspective, or priority to do anything with it right now.” There is nothing inaccurate in this behavior, but suppressed alerts are a mismatched solution to this problem. A better answer is multi-level event generation. The system should recognize that some events are not worthy of human attention. Low importance events should go into a statistical model or an audit log, not an analyst’s workbench. The further left (earlier) in the event creation and processing pipeline this happens, the better. This design results in better performance and scalability, which means more signal, less noise per dollar spent. Suppressing generated events at the end of the pipeline is wasteful design that throttles the system’s capacity.
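
A minimal Python sketch of that multi-level idea: the routing decision is made when the event is generated, so low importance events never reach the analyst queue in the first place. The 0 to 10 importance scale, the threshold, and the class and field names are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class EventRouter:
    """Route events by importance at generation time instead of suppressing later."""
    alert_threshold: int = 7                          # hypothetical 0-10 scale
    workbench: list = field(default_factory=list)     # actionable, analyst-facing
    audit_log: list = field(default_factory=list)     # cheap, append-only channel
    stats: Counter = field(default_factory=Counter)   # per-rule counts for monitoring

    def emit(self, rule: str, importance: int, detail: str) -> None:
        self.stats[rule] += 1                         # every event feeds the statistics
        event = {"rule": rule, "importance": importance, "detail": detail}
        if importance >= self.alert_threshold:
            self.workbench.append(event)
        else:
            self.audit_log.append(event)

router = EventRouter()
router.emit("dns-lookup", 2, "host=web01")
router.emit("priv-esc", 9, "user=svc-backup")
print(len(router.workbench), len(router.audit_log))
```

Nothing is thrown away here; the low importance events are still counted and still written, just somewhere that does not cost analyst attention or fast indexed storage.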

The second is “This rule is wrong and I can't edit it, but I need to get rid of its alerts.” This situation is tougher because it’s where internal or external regulations are driving behaviors. The rule book says to generate alerts no one is going to look at, so we do. The proper answer is still multi-level alerts. “Proper” in this case means most efficiently using human and compute time. However, vendor and customer will also need to explain and negotiate with the rule-quoting gatekeepers in order to demonstrate that the rules are not broken. Sending less important events into a separate channel is functionally no different than suppression, but the effect is far more efficient because the separate channel isn’t indexed for rapid and continuous human access. Instrument that channel and monitor it as a production service and you’re good to disable event generation from incorrect or non-actionable rules.

There is a sub-form of that second reason: the analysts don’t have permission to edit rules to fix them or change event generation. “I don’t want to look at this but I can’t stop it so suppression is the answer.”  Maybe this is product immaturity, maybe it’s organizational failure, but it’s still a problem to fix. As a vendor, if your product supports a better option that your customers won’t use, you’ve got a customer input collection tour to do. This is a case where technology can’t solve a people problem (Edwards’ Law), so it needs person to person communication and possibly a professional services engagement. Customers who can adjust to the better model will be more effective and efficient than those who don’t, but you may still need to offer support to those who can’t. The design job therefore is to discourage suppression without preventing it.

Security is full of whataboutism, but most of the desired data is really low value, so pass it into a low value, low cost pipeline. Don’t leave it clogging up your SOC.

Saturday, May 29, 2021

What Should Go Into a CMDB


It’s not every day that information technology work leads you into philosophy, but designing a configuration management database will do it. Spend a little while thinking about what is known or even knowable about the services you’re trying to provide, and maybe you’ll end up asking “what does existence even mean?” Fear not, there are some practical guiding principles to follow.

First, some background: what is the purpose of a Configuration Management Data Base (CMDB)? Why are we even trying to align Configuration Items (CIs) to the Services that they provide? The intention is to provide visibility into what affects what. The purpose of that visibility is to ensure reliability of changes, by understanding what is affected and planning in advance to minimize service interruption.

Of course, that goal is not simple to achieve. If it were really simple, it would just be a notebook or a spreadsheet, right? But the modern organization has a complex web of interdependent systems. A flat sheet of items misses so much of reality that it fails to make life any easier. On the other end of the complexity spectrum, there’s a very careful and intentional attempt to describe everything important. Let’s pretend that’s possible for a moment, which I doubt… it’s still unlikely that your organization can do it at a justifiable cost. If the system fails to provide return on investment, then it’s scrapped with good reason.

So, the CMDB should only track what is truly necessary to affordably meet the goal of visibility driving change safety. Common advice is to begin with a map of CIs to service offerings. But, is that a detailed 1:1 map? Sort of, maybe. That’s a fine directional goal, but you should not insist on or try to attain a complete CI to service offering mapping on day one. You might start off asking “which service offering does this CI map to in the service catalog”, but it’s all too easy to wander into epistemology. Business leaders are asking for things that can’t be easily made visible and so you end up doing service mappings by proxy… which doesn’t work. 

If you can’t draw a clear line from “this attribute on this entity maps to that attribute on that entity to support this metric indicating that service”, then it’s dreams going into the system, not measurements. So only put things in that you have to because it’s obvious that they work to support a service. For instance, let’s say we’re talking about a job processing farm. The NAS volumes support the VMs that perform the jobs that test the designs. You can say “this volume supports that job” clearly, but you will struggle to say “this NAS failure is costing that business unit an on-time delivery of that deal” because you don’t have sufficient context in your software. In an organization with sufficient complexity to be considering a CMDB, the business map isn’t clear and the technology map is only a little better.

What is a CI anyway, and how are you going to identify it? Is a group of ephemeral containers providing services worth a CI each, or one for the service, or none? If you’ve got a big expensive server and you replace its motherboard so the serials and MACs change, do you update all your records? What if you just changed its hostname? Does it still support the same services that it used to? How do you reference the systems you run on a cloud Infrastructure as a Service (IaaS)? Do you need a CI for each of the Software as a Service (SaaS) functions that you use?

The simplest answer to those questions is to manually define and track systems, ignoring identifying attributes like serials and MACs. However, this leaves recalls, warranty tracking, and depreciation out of the CMDB and in another tool, reducing the cost justification for the CMDB. Maybe that works for your organization, but it’s a decision that has to be made across the organization so that you can present meaningful numbers in your KPIs and compliance audits.

What types of devices should go into the database? Does a network switch count, since it’s critical for making your organization’s software service operate? How about the AC power or HVAC systems that are critical for keeping the switch working? How about the roof that keeps the rain off the switch? Again, the philosophy of everything being interconnected is fascinating but you’ve got a job to do. So, a CI is any item that you can actively manage. I take an absolutist approach here: if there is not a software tool providing visibility and control to the device, then it is not a CI worth tracking.

My advice is to draw a CI line at the management interface. Only put a thing into the data system if it has an agent on it or an interface that enables a management system that you own to collect data. Again, three quarters of my career is in those agents, so call it self-serving if you like. But if the device can’t update your systems on its own, then someone has a job to maintain the “time-saving” tool. If you can afford to allocate people to maintaining data cleanliness in a CMDB so ITAM reports make sense and ITSM service requests go smoothly, that’s great, but it sounds like a questionable use of resources at any scale. As Corey Quinn writes, “What people lose sight of is that infrastructure, in almost every case, costs less than payroll.”

Next, treat every collected attribute like it’s a personal insult. For every report, ask how little you can possibly collect and still solve the problem. Is there already data collected that will work? This is the land of practically unbounded high velocity data sets, and scale is going to be a problem. Habits built with a small number of entities will fall apart quickly as your organization grows.

So that’s CIs… now for the services that they provide. Definitionally, a service offering is something your stakeholders and customers can directly use. But, remember that the point of the CMDB is to de-risk changes, meaning you have to map things that might get changed to that service. Service: Our self-hosted website is up. CIs to provide that: dozens of servers, plus all sorts of unknowables like network and power and HVAC and a functioning civil society. That said, while a one-to-many service:CI map is more complex, it’s the only model that is at all realistic. Business users requesting changes and reporting problems should have no need to select CIs and suggest solutions, so the complexity of looking down the tree from service offering shouldn’t matter. IT operators requesting changes and reporting problems do need to see what those changes affect, and looking up the tree to affected services is theoretically useful. However, those two statements rely on an accurate service:CI map, and that accuracy is sorely lacking. More likely, the business requestor is aware of problematic CIs because they’ve been troubleshooting the problem on their own, and the IT operator is not aware of affected services because the CI has been multi-tasked or repurposed. Therefore, incident triage and handling often include preliminary discovery of the functional map, if possible.

Exceptions to that grim state of affairs can exist: just as I recommend the definition of a CI is limited to an automatically discoverable entity, I also recommend that the definition of a service is limited to machine readable labels. Tagging or labeling CIs with the services that they support allows the CI data collection mechanism to be used to support CI to service mapping. Better, this allows an organization to begin with manual process (apply this label to anything you spin up with this account in these availability zones) and then grow to automation (if spinning up from this image and running this process, then include that label). That way the organization does not have to begin with perfect in order to get somewhat better.
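
As a rough sketch of that label-driven mapping, assume each discovered CI arrives as a dictionary with an `id`, a few collected attributes, and an optional set of `labels`. The field names, the rule format, and the three helper functions below are hypothetical, chosen only to show the manual-then-automated progression described above.

```python
def apply_label_rules(ci, rules):
    """Return a copy of the CI with a label added for every matching rule.

    A rule is an assumed structure: {"match": {attr: value, ...}, "label": str}.
    Labels that were applied by hand are kept as they are.
    """
    labels = set(ci.get("labels", ()))
    for rule in rules:
        if all(ci.get(attr) == value for attr, value in rule["match"].items()):
            labels.add(rule["label"])
    return {**ci, "labels": labels}


def services_affected(ci_ids, inventory):
    """Look up the tree: which services do these CIs support?"""
    wanted = set(ci_ids)
    return sorted(
        {label for ci in inventory if ci["id"] in wanted for label in ci["labels"]}
    )


def cis_for_service(service, inventory):
    """Look down the tree: which CIs carry this service label?"""
    return sorted(ci["id"] for ci in inventory if service in ci["labels"])
```

The manual stage is simply someone setting `labels` on a CI at creation time. The automated stage is a rule such as `{"match": {"image": "web-base", "process": "nginx"}, "label": "website"}` run over the inventory on each collection cycle. Both feed the same lookups, so the map improves without a redesign.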

VMBlog Post on Decentralization

Linking to this piece I wrote for VMblog:

Why Decentralized Work Calls for Decentralized Data

Sunday, May 9, 2021

Planning your Year

The plan is maybe accurate, but probably not. The act of planning is a useful time to step back, evaluate strategic position and rethink investment. Financial targets are a mechanism for doing that. If the target is thirty percent growth, does the product org have a realistic answer for how that will happen? If that answer is couched in details like “in eleven months we’ll ship the grand frobulator version”, then it’s maybe not so realistic.

For a non-SaaS software offering it takes a quarter for product changes to affect sales numbers, and a year to know if you’re going to get renewals or just have a flash in the pan. It takes a quarter to deliver a change to an existing product. So if you have to make product changes to affect this year’s numbers and you’re just planning them now in May, you’ve got all your eggs in Q4. It’s key to realize that product time scales are very different from sales time scales. And if your team isn’t already big enough or properly skilled to deliver the features, you’re even worse off because hiring and developing engineers is even slower than making features.

Whatever plan you offer for making product in the second half of this year is setting up for next year’s sales. Why can’t you just parallelize those efforts? Well, customers are rightly suspicious of features that are planned but not shipped. The world of enterprise software sales is also still relatively small and interconnected, so your own team cares about their reputations. They won’t be happy selling sizzle unachievably ahead of steak.

Software as a Service doesn’t change those dynamics as much as people might assume that it does, but it does help to smooth out a spiky cash flow. Whether you’re selling term or perm, on-prem or as a service, trying to move your current revenue numbers with expected product features is very prone to forecast failure.

Using a Technical Edge in Products

"Don't worry about people stealing an idea. If it's original, you will have to ram it down their throats." -- Howard Aiken

Innovative technologies are different than what came before. Product buying patterns are based on what has come before. If you have an innovative technology you will need to bend your customer’s stated needs and your technology’s capabilities to fit each other. Product market fit is not easy or pretty.

“But wait,” you might be thinking, “isn’t listening to the customer’s need the most important thing in product management?” Well, sort of. It’s not listening to what the customer says they need, it’s listening to what they need. The customer’s stated need is shaped by several components:

  • Their actual need to do something, such as pass a compliance audit or roll out a new service
  • Their experiences with other products that they’ve used for this purpose in the past
  • Their team’s appetite for innovation versus surety
  • Their purchasing process.

That last one is extra challenging; if your product fits into the buyer’s signing authority, then great, but if you’ve got to make it through an RFP-gated procurement office, your innovative product is going to be checklist compared with the last twenty years of status quo. The only way through is for your buyer to spend their political capital and force the issue.

If your innovative technology can help the customer solve their real need to do something faster and cheaper than legacy technology, then lean into that and give your buyer the weapons they need. You can get past experience and conservatism and even a purchasing department if you’re really helping the customer save money and time.

I’ll spare the full list of examples, but there are many; trust me that technologies now considered boring infrastructure, such as gratuitous ARP NIC teaming, were once radical and difficult to sell.

Two Types of Questioning

Answers to questions can easily fit into two flavors: operationalized and free-form. Classify the use cases: there are the questions you know how to ask, and the questions you don’t know to ask yet.

A question that you know how to ask is operationalized. You’re looking for yes, no, or broken, or perhaps a count. The operational nature of the question means you can improve operations of your system by using the questions. 

You may need to use a series of operationalized questions to drill down to success.

* A good operationalized pattern uses multiple questions to reduce the target set at each step: “are there systems with problems?” -> “there are systems with problems” -> “Show me the systems with the problems” -> “here are the types of problems on the systems with the problems” -> click “Show me the ones with the problem I know how to fix” -> “here are the systems with that problem” -> “deploy a fix for that problem”. This is good because it’s efficient: each step is small, and each step is hitting fewer targets. Whether your problem domain is management, computation, or surveying humans, you'll use fewer resources if you ask fewer questions of fewer targets.

* A bad operationalized pattern is “give me all the data and I’ll search for answers in it”. This is misuse of an effective tool: search through raw data is powerful for discovering what you don’t know to ask, but it can be the wrong tool for daily repetition tasks. It works, but it costs more time and money than necessary.
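
The good pattern can be sketched as a funnel in which each question only goes to the targets that survived the previous one. This is an illustration under assumptions: `funnel` is a hypothetical helper, and the fleet is modeled as a plain dictionary from system name to the set of problems it reports.

```python
def funnel(systems, fixable_problem):
    """Narrow the target set one small question at a time.

    systems: dict mapping system name -> set of problem names (assumed shape).
    Returns the final targets plus the size of the target set at each step.
    """
    steps = []

    # Question 0: the whole fleet is the starting target set.
    targets = list(systems)
    steps.append(("all systems", len(targets)))

    # Question 1: which systems have any problem at all?
    targets = [name for name in targets if systems[name]]
    steps.append(("systems with problems", len(targets)))

    # Question 2: of those, which have the problem I know how to fix?
    targets = [name for name in targets if fixable_problem in systems[name]]
    steps.append((f"systems with {fixable_problem}", len(targets)))

    return targets, steps
```

The `steps` list is the point of the exercise: the count should shrink at every stage, and the fix is only deployed to the last, smallest set.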

Noted, it is possible to take the progressive questions pattern entirely too far, as is shown by Microsoft “click to see more” Teams. A forced wizard flow where it isn’t necessary is an anti-pattern. Progressive disclosure of necessary data can become an anti-pattern.

A question that you don’t know how to ask is free-form. You’re looking for weirdness, patterns, outliers, intuition. Is there an anomalous behavior pattern on a subset of systems? That’s hard to answer without a big data lake and a Stats101 textbook, so you stream data at the lake and see what kind of stuff can be found. Algorithms can help, but you’ll also certainly need human analysts. And the findings from that data lake, you will probably want to convert to operationalized questions.
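
For a taste of the Stats101 end of this, here is one simple way to look for outliers in a per-system metric. It is a sketch, not a recommendation: the technique is the modified z-score based on the median absolute deviation, the 3.5 cutoff is the conventional default for that score, and the `outliers` function and its input shape are assumptions for illustration.

```python
from statistics import median


def outliers(metric_by_system, threshold=3.5):
    """Return systems whose metric is far from the fleet's median.

    metric_by_system: dict mapping system name -> numeric value (assumed shape).
    Uses the modified z-score, 0.6745 * |x - median| / MAD, which is less
    distorted by the outliers themselves than a mean and standard deviation.
    """
    values = list(metric_by_system.values())
    if not values:
        return []
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # No spread to measure against; this method cannot rank anything.
        return []
    return sorted(
        name
        for name, value in metric_by_system.items()
        if 0.6745 * abs(value - center) / mad > threshold
    )
```

A finding from a function like this is a candidate, not an answer. An analyst still has to decide whether the odd system is interesting, and if it is, the check is a good candidate to become an operationalized question.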

This is a lifecycle of discovery, a process of learning. Operationalized questions grow stale over time, and need to be replaced. Part of the job as an analyst is to maintain the tools.

The Mystical Art of Getting New Things Done

The process isn’t actually mystical: it’s selling an idea to everyone who must work together to execute it. People who would use it. People who will fund it. People who will build it. People who will sell it. Eventually, people who will buy it.

Tell people what you want to do. Lots of people. Product people, engineering people, sales, marketing, finance. Whoever will listen. Even if they're not helping you move the ball forward, they're helping you practice and learn. Explain why it needs to be done, how it will work, what will indicate that it’s working, and who would need to do it. Use a written artifact (document or slide deck) to summarize what you’ve been saying.

Answer their questions. If they ask things you can’t answer, get back to them when you can in writing, and update your written artifact.

Ask everyone you talk to who else you should talk to. Keep going until you’ve got resources. Keep going until they build the right thing. Keep going until customers use it. Keep going until the goals are met.

So far so good, right? Except, that is the process for people who’ve already got the expectation that they drive new ideas to executed success. Product leaders. Maybe that's a product manager in your organization, or an engineering manager, or a service owner. If that isn’t your role, then you’re also adding a second sales job.

First, you need to understand what you want. Are you asking to have the job of executing your idea? Or are you asking someone else to execute on your idea?

If you want the job, it’s actually much simpler to proceed. You’re explaining why you are the right person to make this happen. Details of how to proceed will depend on your organization. Do you have to get the role before you can act the part? Then you need to interview for the role. Does your organization reward acting outside of your lane? Then you need to make sure everyone involved sees that you’re doing so, acting as an unpaid product manager (presumably without abandoning the responsibilities you are paid for). Regardless of the mechanics, you’re effectively campaigning for a job as a product leader so you need to do what is necessary in your organization to get that job along with execution of your idea.

If you don’t want that for your career, then you are seeking a product leader to own your idea. Again, a second thread from driving the idea to execution. Thing is, few product leaders are sitting around bereft of ideas and trying to figure out what to have the engineers build. So, why should your idea be what gets done now? Another challenge you’re going to need to consider is ownership. Are you truly ready to give up ownership of your idea, or do you want to retain some stake? Can you find an accommodation with your product leader partner?

Friday, April 23, 2021

Data Value and Volume are Inversely Proportional

In 2006, Clive Humby coined the phrase “Data is the new oil”. This is often misinterpreted as “Data powers the economy”, particularly by folks who sell data processing and storage, but it’s useful to see what someone who actually uses data says. In 2013 Michael Palmer, of the Association of National Advertisers, expanded on Humby’s quote: “Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analysed for it to have value." 

Much like refined oil (or processed ore for that matter), the volume of material is reduced as the value is increased. This concept should be intuitive to anyone who’s ever manually sifted a pile of data into a thin report, but it’s sometimes lost in contexts where locating a needle in the haystack is the analyst’s goal. Adding human effort reduces volume.

Corollary: if the project requires massive amounts of data storage, it might be worth asking how much value is going to be in there. Is the purpose to store as cheaply as possible and rarely retrieve? Or maybe the plan is to figure out value later.

There’s an interesting confluence of partially aligned incentives between people who have to retain large amounts of data, people who want to retain and access large amounts of data for analytics, and people who just want answers to problems today. “Storage and compute are practically free”, which is why Amazon Web Services is worth 1.5 trillion USD in 2021. If you don't have to pay the bills, collecting raw data for real time analytics sounds great. Otherwise, you'll need to consider this project's needs for data modeling.

A Simplistic View of Venture Capital

I have a lot of conversations with people. 1:1’s, interviews, &c. When the opportunity arises, sometimes those who’ve done startups will talk about their experiences.

There’s sometimes a flag in those conversations about startups that raises my hackles. It’s a description of the money raised as if that were a success metric. “I was at blah”, or “I founded a thing called foo”, and “we raised dollars!” 

I think it’s from accepting the Venture Capital narrative as presented on Shark Tank: that businesses are competing for the pool of dollars, and that the VC’s are judges of innovation. I also think that is a false and self-serving narrative. 

Venture capitalists are salespeople. They are selling a financial product, basically a fancy mortgage. You give away future equity in your house or company for upfront cash today. Good on you, salespeople who can make buying your product seem like a competitive win. 

It is true that you have to do your homework in order to get VC investment, and it's more work than a mortgage: a business plan that discusses how you are going to financially swing your journey to product market fit. You also can’t get a mortgage if you don’t fill in a bunch of paperwork with indications on how you’ll pay it back.

The problem with this analogy is that you don’t have to convince a bank that your idea for buying a house is both excitingly novel and fairly safe, you just have to fit the expected cultural and financial parameters. On second thought, given the history of redlining, the analogy between mortgages and venture capital seems pretty solid.

Evidence of purchasing a product is not a symbol of success or status, it is a symbol of in-group membership plus basic competence. 

Back to the interview context, use of this trope is a yellow flag that the candidate may have trouble recognizing or accepting that a thing hasn’t worked. If they can’t point at anything more positive than raising money, their business failed. So they’re spinning a story at me, nothing wrong with that. But I need to see an indication that they can launch a product, so I’m going to keep asking questions.

Update with this excellent thread 

Product Tempo versus Deal Tempo

Going from sales engineering to product management  was a jarring transition at first, because it represented a change in tempo. As a sales engineer, I lived by the fiscal quarter. While a given customer might be a multi-quarter job to convert, my sales partners and I had money to make and were judged on a quarterly basis. Our bread and butter was the quarterly cycle of identifying opportunities and closing them. I wrote software to help my customers and learned some lessons that way, but the jump to product management in enterprise development organizations still surprised me.  

Shock number one: resourcing. From the field, you can see numbers like the allocation of headcount to engineering, and you can easily imagine that there are dozens of people working on a product. From the factory, you’re more aware that a team of dozens is rare, and a given product probably has one to six actual humans working on it — if you’re lucky. A platform company can do an amazing amount of value creation with small teams, but they can't do everything.

Shock number two: here comes the flood. All products have an incoming stream of tickets. Customers and partners have support requests, bug reports, and feature enhancement asks. Maybe it’s a small trickle, but if the product is successful, there’s at least dozens a day, maybe hundreds. You can try to offload it on someone else, but rigorous process is the only effective answer in my opinion. As a field engineer it can seem that a good idea is its own justification. As a PM, that good idea is only a starting point.

Shock number three: tempo. As the title of this post indicates, product time scales are very different from sales time scales. As a field engineer making free software, development feels fast. You write some stuff, it works in your lab, you post it for some friendly users, iteration chugs along as fast as you can go. As an R&D scrum team the process is much slower. You have more platforms to support. You can't rely on some software licenses. You have to think about adversarial use cases and product security. You have to coordinate your activity with the rest of your team in order to build sustainable and supportable software. Bug fixes can go pretty quick, but as a rule of thumb, adding a meaningful feature takes a quarter. Adding a minor feature takes at least a month.

Here’s some advice for sales engineers and similar field characters — of course you’ve probably heard it before, summarized as “Sell what’s on the truck.”

  • If you’re expecting a product change to save your deal you’re making an account management mistake. “Sell the sizzle not the steak” is about making your product look good, not about building a deal on dreams.
  • Shipping product is for dinner, roadmaps are for dessert. The roadmap’s function is to demonstrate that the company is still investing in this product, not to establish a schedule of future feature deliveries.
  • That said, there is a counterpoint: If you don’t sell it, it will never be fixed. Selling the products is essential to the success of a software company, and if customers don’t use the products then they won’t sell very well. Without sales there’s no one validating features work, no one finding problems, and no reason to fix what’s already known to be suboptimal. So you’ve got to have faith that what the factory says will work can be made to work.

Followed by advice for product managers coming from the field, which you may have heard as Eisenhower’s quote: "In preparing for battle I have always found that plans are useless, but planning is indispensable."

  • Your tempo has changed, not your activity. Where you were preparing a customer to buy or renew a product (quarterly tempo), you are now preparing the company to invest or maintain investment in that product (yearly tempo).
  • Your scale has changed, not your motivation. Where you were helping a customer to use the product successfully, you’re now helping all customers. Where you were helping a friend with their deal, you’re now enabling the entire field to position your product correctly. Where you were convincing a product leader to prioritize a bug fix, you’re now convincing an engineering leader to allocate person hours.
  • The future is an unknown country. There are only three time buckets that matter. The present has what you’ve recently shipped and the things that engineers are working on now. There is a good chance it will work. There is what you want to do next, which you’re currently designing with your engineering counterparts. There’s a 50/50 chance this will eventually ship. Everything else is in the backlog and does not matter. You can draw a pretty picture of a roadmap with releases for that stuff, but it’s fiction. Its purpose is to build excitement and set up resourcing for next year.

Summarizing, watch out for this offset in tempos, because it can cause inaccurate expectations.

  • It takes a quarter for enterprise software product changes to affect sales numbers, because smart field people won’t sell it until it exists. 
  • It takes a quarter to deliver a change to an existing product because teams of engineers have to build sustainably. So if you have to make product changes with existing engineer allocations to affect this year’s numbers and you’re just planning them at the end of Q1,  you’re going to deliver them in Q3 and you’ve got all your eggs in Q4. 
  • Whatever plan you offer for making new product without allocated engineers is setting up for next year’s sales at best, assuming you convince leadership to rob an existing product or hire new engineers starting now.
  • SaaS doesn’t change those dynamics as much as people might assume that it does, but it does help to smooth out a spiky cash flow.
  • Acquiring a company instead of building software provides a faster bump to your sales numbers because you can just claim the new company’s numbers as part of your own; but it also hurts your ability to deliver software later on.

Sunday, March 7, 2021

What does a Director of Product Management do?

I’ve written up some thoughts for the regular work of product managers and product management interns, but have not yet written about the next level up. What’s day to day like when you’re not just helping one team? What makes a director level product manager?

Just like the definition of what a Product Manager even does, the definition of Director level work is different in different organizations. The easiest way to define the role across companies is by blast radius. A product manager can bring more benefit or do more damage than a developer or salesperson. A product moving in the wrong direction takes months or quarters to discover and quarters to years to fix. A director of PM might be driving multiple products, or driving a new business line, or pivoting the company. In other words, their efforts take that blast radius observation up a few notches. So, what do they do?

First: research the market, build strategy, and make plans. What do you think the path to improved product market fit is? Where do you think the biggest missed opportunity is? How good is the data backing these opinions? How many iterative steps can the work be broken into?  What are the indicators of success or failure at each step? How much investment will your plan take? How are you going to make money with it? This work is mainly done in an iterative set of documents and spreadsheets, with regular output of presentations and backing spreadsheets.

Meanwhile, find a consensus. Who needs to be convinced to support this plan? What would convince them? Can you drive a disagree-and-commit if they aren’t convinced, or do you need to adjust your strategy? This work is done in non-stop meetings, one-on-one and small groups, interleaved with pitches to interested customers, analysts, and leaders. Also you’ll need to make regular excitement-building presentations to larger groups.

As soon as that process has kicked off, you’ll also need to deal with salespeople selling your idea ahead of its existence. This is a great opportunity to prove the hypothesis of your strategy; if it doesn’t resonate, you don’t have any customers to talk to. If it does, you’re fighting to hold sales teams back from the revenue recognition cliff they’re trying to race over. 

Directors of PM don’t have to do people lead work, but it’s not uncommon either. So you may also be managing a team: helping with communication, escalations, training, expenses, PTO approval, hiring, performance management, firing. Oh and of course, filling any gaps in that roster either by doing the work yourself, finding someone to do it, or finding a way it can be left undone.

That brings us to the last duty: saying “no” even more than you did before. As a Director of PM, you’ll start to receive the product ideas and complaints for everything even vaguely related to your domain. The problem is, most of those ideas aren’t well-formed enough to work on and even the good ones go onto a slush pile. The capacity of a scrum team is far lower than most people expect, and they’re probably already staring at a backlog that would keep them busy for the next five years if they did it all. Having influence over more teams doesn't change this reality. On a product feature level, the individual product manager kills ideas, but as a director you’re their rubber ducky, tie-breaker, and bad cop. You are also doing the same thing at a product line or business unit level. You may also need to put your own ideas and projects on ice to make room for new inbounds.

To sum up, the communication and coordination needs go up and the amount of detail work goes down but doesn’t go away.

Tuesday, March 2, 2021

Why I Don’t Like the Squad Model

Development teams have a fondness for trying new models… for better or for worse. I’m not a fan of the current excitement around the squad model. 

What is it

Spotify is the purest example, but the model owes a great deal to Stanley McChrystal’s Team of Teams, as well as a bit of LeSS. The basic idea is that teams are tightly aligned to their teammates, but loosely coupled to their projects. Teams will then shift to where work is needed, motivated to do the right thing by their shared purpose and awareness of shared strategic goals. Add a sprinkle of LeSS or tribes alignment for legibility, and you’ve got a model for change from “machine-like” structures to “organism-like” ones.

Why is it attractive

“Work expands so as to fill the time available for its completion” - Parkinson’s Law. Consequently there is no actively growing development team that feels they have enough resources to do everything they want. Of course, not all development teams are actively growing, and so some teams may have less pressure on this axis. Nevertheless there are lots of teams that can’t allocate sufficient resources to the things they want to get done. 

When you can’t allocate resources to everything you want to do, you’ve naturally got some projects that aren’t getting done. Maybe there are products that need working on, or maybe you have features that can’t be completed or bugs that can’t be fixed. Maybe they’re process requirements like content production, quality assurance, documentation or tech debt reduction. Whatever it is, it’s something painful. If it weren’t painful, it wouldn’t matter, but it's a safe bet there's enough pain to justify considering a big change.

Here are your options. 

  • Option 0: do nothing. You may continue to do without the missing projects, but obviously that’s really painful and no one is excited by it. Something must be done, right?
  • Option 1: incrementalism. You can also try to get the project done with your existing team. You announce that Everyone’s responsible for the project, or you encourage weekend warrior efforts. Or maybe you defer work on all the other projects, with varying degrees of formality.
  • Option 2: radical change. You increase resources so you can resolve the problem. There are lots of tools in that box, but none of them are cheap: more hiring, consultants, acquisitions, or talking your user and partner community into doing the work. Nothing in this list is guaranteed or fast.

The squad model makes it look like your organization is rationally choosing Option 1. It’s not really robbing Peter to pay Paul; there are textbooks and TED Talks. Kidding aside, there are some legitimate benefits to the model.

The genuinely positive aspects

  • It reduces the effect of Conway’s Law. Teams are no longer as focused on their product; they're more focused on their role in the company's success.
  • It allows for high team cohesion within the squad by encouraging cultural bonds, presuming that the company's culture is sufficiently supportive.
  • A nice corollary of that: it doesn’t tie a development team to the market failures of the product they work on. It can suck to see a development team beaten down over failures they can’t control. In a squad model, the team can be judged on something other than eventual product market fit, and may move on to another project.

The mildly pernicious aspect

That movement between projects may sound familiar to anyone who’s ever worked in a professional services shop. Pro services team members go to where the jobs are, and they don’t work on many things that don’t have revenue attached. Bench work and research projects are sometimes an option, but not too much of either. In an enterprise software company, this model leads to a terrible challenge: products have been sold, but there’s no steady maintenance work being done on them. At best, a product is returned to when its lack of maintenance becomes a problem, but at worst the product simply rots. 

Software is sold as if it were wine, but in practice it ages like fish. Without attention, it starts producing problems. A company may choose to say that nothing is wrong with this. Aside from one’s ability to sleep at night, this may not work out so well. If the product was sold to customers, there’s likely to be a contract requiring maintenance. One can argue this is less of an issue with products that are offered as free supporting material to a commercial product or service, but practically speaking there’s not a lot of difference. 

The nasty failure modes

All of that is just a model allowing morality and economics to play out with Parkinson’s Law, though, and there’s nothing specific to the squad model in it. Squad model is simply a mechanism, potentially allowing pernicious behavior but not forcing it. Here are the ways that I think this model can actively fail the software development organization.

  • Product managers and executives are shielded from their responsibility to kill a product that isn’t working. A poorly performing product isn’t EOLed, it’s just deprived of any development time. “We aren’t putting Old Yeller down, we’re just not feeding him and waiting for Nature to take its course.” Killing a product that isn’t selling is one of the harder tasks in product management, and a team that doesn’t really have to is going to find a way to avoid it. Squad model keeps false hope alive for bad products. Surely it would succeed if it just got some resources? And surely those resources are just around the corner, at the next planning cycle. And so the zombie product shuffles along.
  • Engineers can’t focus on long term improvements or tech debt reduction. Of course everyone complains about this when they don’t do squad model, but the squad model actively makes this situation worse by adding churn costs. At best, developers are coming and going in squad-sized chunks but a continual skeleton crew is always on deck. At worst, the product lies fallow between development cycles. For each introduction of people, the cycle of course looks like this:
    • Spend a sprint or two spinning up to understand the subject matter domain, product design, existing codebase, and plan of attack
    • Bandaid the immediate problem the squad was brought in for, more or less well
    • Spend a sprint or two patching the worst bugs and documenting your work before you move on to something else. 
  • This isn’t enterprise software any more, it’s professional services. Why’s that bad? Because enterprise software businesses are exponential money makers, but professional services businesses are linear money makers. It’s an oversimplification of course, but based on the observation that services are inherently customized to the client while products are generalized. In the squad model, the projects that the customers buy are now the clients, but the developers are still moving to where the dollars are today instead of skating to the future puck.
  • It doesn’t take long for leadership to spot these failure modes, so it also doesn’t take long for correctives to be applied in order to smooth out churn and allow for maintenance. The result is that engineering is split into haves and have-nots. Some squads will be put onto those long-term work items that suffer from shifting resources. Maybe it’s called a Sustaining team (probably lower status), or maybe it’s a Platform team (probably high status), but the effect is the same: parts of the world have to keep working as they did before the re-organization and those engineers are now on an island. Is this expressed as a pendulum swing back towards the prior organization, or as a bold stride towards a new future that simply looks a little familiar? Regardless, the development organization now has another source of cognitive dissonance to discuss.
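
The churn cost in the second bullet can be put in rough numbers. What follows is a toy calculation, not data: it assumes two sprints of spin-up and two sprints of wrap-up per rotation (the upper end of “a sprint or two”), and the function name and the engagement lengths are made up for illustration.

```python
def productive_fraction(engagement_sprints, spin_up=2, wrap_up=2):
    """Share of a squad's time on a product that goes to the actual fix.

    spin_up and wrap_up are assumed values, taken from the upper end of
    "a sprint or two" in the cycle described above.
    """
    if engagement_sprints <= 0:
        raise ValueError("engagement_sprints must be positive")
    overhead = spin_up + wrap_up
    # An engagement shorter than the overhead leaves no time for the fix.
    return max(engagement_sprints - overhead, 0) / engagement_sprints


# Hypothetical engagement lengths, in sprints.
for sprints in (4, 6, 12, 26):
    share = productive_fraction(sprints)
    print(f"{sprints:>2} sprints on the product: {share:.0%} spent on the fix")
```

Under these assumptions a six-sprint rotation leaves only a third of its time for the fix itself, and the share only climbs as the squad stays put longer, which is the opposite of what the model encourages.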

At best, it’s not worse

As we often say in the business, “it’s software, you can do anything with sufficient time and money.” None of what I’ve described is necessarily fatal to a development organization, and indeed some might welcome a splash of internal cognitive dissonance to distract from their external issues. But in my opinion the model isn’t great at meeting its stated goal, which means it’s not a useful way to allocate resources. Ostrom’s Law states that “A resource arrangement that works in practice can work in theory”, but I strongly suspect that’s exactly what we’re seeing here. The massive productivity made possible by enterprise software development allows for a wide variety of otherwise non-productive resource arrangements to maintain effectiveness, regardless of their actual merits or efficiency at producing more quality software.