Sunday, April 17, 2022

Gambling or Groceries?



Following on the post about sustaining software, here’s an opposing argument. 

Go big or go home. Deliver shocking value. Focus attention on exponential results instead of linear ones. Leverage your investment into the biggest possible return.

All of those exciting phrases are exciting because they mean increasing risk. Some are drawn to that risk, preferring to be a dead eagle than a live turkey. “No one ever achieved greatness by playing it safe” said Harry Gray. There’s truth to this statement: taking risk does not guarantee a reward, but a high degree of risk avoidance can guarantee the absence of reward.

And so, the question becomes how to balance pursuit of risk against safe bets, which luckily is the sort of thing that businesses have been thinking about for a long time. Unfortunately, some of the tools available to the finance department may not work as well for a product team. For instance, diverse asset allocation makes a ton of sense for a financial advisor, and is a legitimate company goal up to the point of market saturation, but becomes less great when you’re spreading your development efforts across unique and unrelated product efforts. The essence of strategy is making choices, after all, and it's potentially hard to be unified behind a single strategy as well as diversified in your investments. Still, the idea of isolating risk budget from non-risk budget has merit.

Therefore we think of gambling versus groceries, or more prosaically, Research and Development. The gambling side of the house does research into ideas that become new products, and the groceries side of the house develops and maintains the products that already exist. A very sensible model, but one that has been known to produce a house divided. If internal political pressures or external market drivers shift an R&D organization into full groceries or full gambling, then another organization will most likely be formed or funded to solve the other need.

Saturday, April 16, 2022

When to use Design Discovery Patterns


 

People love one size fits all patterns and models, but it is usually better to use the correct tool for the job at hand.

So you need to solve a problem. You’re going to build a solution to solve the problem with. It’s going to have users, who will configure it, give it their problems, and get solutions out of it. You’ve written some product requirements, you’ve got personas, you’ve got ideas. Then you start talking with other people and are faced with a dilemma:

* Do you Double Diamond explore the whole space of potential?

* Or do you “thin steel thread” to a Minimum Viable Product, then expand as demand drives?

The designer: “You’re suggesting an approach that covers left-to-right languages, but I see in your meeting notes that some of your customers do have operations in countries with right-to-left languages, and are you sure you won’t have need for that some day?” 

The architect: “I see you’re thinking under a dozen simultaneous admin-level users but are you sure it won’t need to be architected for thousands?” 

The engineering manager: “There’s an export data function here, what if instead of raw data dump we support PDF with an accessible color template?”

This is double-diamond thinking, and each of those ideas represents a quarter of work that you didn’t account for in your timeline.

Alternatively, the project manager comes in with “why don’t we do a non-working prototype and see what people do with it?”

And the team lead suggests: “if we constrain to a single type of conversion, we could get this done by end of the quarter.”

This is thin steel thread, developing an MVP (Minimum Viable Product), and each of those ideas is removing a big chunk of what you wanted from the first release (in the non-working case, the chunk is “everything”). Digression: a popular model for explaining MVP is the “from scooter to bike to car” blog post — which I have to say I hate, because it’s not an effective metaphor. Making and selling physical objects that don’t optimally solve a problem is tremendously wasteful, especially if you take the problems of disposal into account. Making and selling software that doesn’t optimally solve a problem is considerably less wasteful. The problem is that while software ages like milk, the cost of producing it is considerably more expensive than milking a cow.

All that said, the George Box quote that “all models are wrong, some are useful" is certainly applicable here. Both models are the right choice at different times, and the distinction is in whether your development team has the right tools in their bench for solving the problem that needs solving. 

To do the thin steel thread trick efficiently, you need to be using off the shelf components. It doesn’t matter if they are in house or OEM or open source, it only matters if your dev team is knowledgeable and comfortable with the components. If the engineers aren’t using tools that they’re comfortable with, then they aren’t able to efficiently produce successful solutions to problems. So, first job is to get the toolbox you’ll need, and that’s a double diamond job. In order to build or select and learn a good set of tools, you have to know the whole problem space that you’re going to work in.

A second factor that leads to is domain expertise, which can be a product manager’s problem. Developers spend a lot of time learning software engineering, and not the problem space that your customer is trying to buy a solution for. The purpose of these design discovery patterns is to build a minimum level of expertise in the domain, sufficient to produce useful software tools. Like any tool, they are limited to the use case that their designer intended (consciously or otherwise) and may not function properly in others scenarios. 

Saturday, February 26, 2022

Continued Improvement in Software Products

What’s more valuable to the software vendor: improving what you’ve already delivered, or building something new?

At first it might seem like building a new thing will have the highest return on investment. After all, new customer growth being equal, the finished product is only going to get support, renewal, and expansion dollars from existing customers. A new product could be sold for full price to existing customers, so more potential. This view neglects the sad fact that software ages like milk, and requires constant attention to maintain its value. An enterprise software product exists in an ecosystem of supply chains, changing standards, and data flows with adjacent products. The ecosystem changes continually, and a product that doesn’t keep up weakens and eventually dies. There are exceptions, but by and large when products stop working, customers stop paying for them.

This is why growing and sustaining software can actually have higher business value for an enterprise vendor than new projects, because it is churn prevention. The easy benchmark in sales is every dollar churned is two you have to make in new business. But in enterprise businesses that's more like 1:5 or 1:10. Why? There’s a strong chance that product failures don't just churn a product, they churn the entire relationship. Failing to maintain or add features to software that your customers are still using leads to customer churn. 

You could argue that's not true or not important if you're on a one-and-done perm license model… you’re only risking your support contract renewals, reputation, and any expansion opportunity, but you’ve sold the software and that’s all done. "Just risking reputation" or "won't get expansions" ought to give any company serious pause, but this straw man position is that it's okay to ignore software maintenance if you primarily sell perpetual licenses. However, perpetual licenses make a lumpy revenue stream. If your business is primarily living on them, then your CFO is probably pushing for a recurring subscription model such as SaaS right now, to smooth out the lumpiness. And if you’re a SaaS, you are not very likely to keep subscriptions if the software starts breaking and you’re not fulfilling your end of the contract.

New projects have potential to disrupt industry and make 10x or 100x annual ROI. Every baby also has potential to be an Olympic athlete or famous musician. The odds are against a particular new project instantly becoming a world-changing 10X, and pretty good for a much more modest rate of return that gradually improves with the application of more effort. New products that don't immediately fail are often legit businesses that could steadily grow into serious revenue generators, if they’re sustained at the appropriate tempo. Leaving them unsustained or not adding features hurts the vendor. Or they can be shut down, leaving disappointed customers and opportunities for competitors.

Squad development models can exacerbate this form of “almost made it” product failure, by encouraging shift of resources away from product that is “finished” or not driving 10x right now. Another way to produce a stagnant product is to build with assumptions that someone else will take care of content, but then never build the partnership conditions for that to happen.

Sustaining and feature growth teams are churn prevention teams, and they're very probably making more money for the company than higher profile projects. The product manager’s challenge is to make that visible and exciting. A new product with its own license has an easy path to show exponential growth; after all, it’s not hard to double and triple your yearly sales when the starting point was zero. It takes at least a year in enterprise software to recognize if that early growth will hit a plateau. Sales of big ticket software take time to complete, license bundle definitions might have changed, and feedback or demand might be muddled by context loss. Arguments for how much money an existing product makes can get bogged down in soft accounting of pull-through dollars.

Anecdata can be a powerful tool in this situation. A financial argument is always your foundation, but you can support it with positive customer quotes about what the product means to their businesses. It can be tempting to go negative and pull quotes indicating intention to churn if features aren’t built: my advice is to avoid engaging your own leadership’s fight-or-flight response. Tell a positive story of account building, value retention, and license expansions instead.

Since you’re making an argument about the future, using a probabilistic model can help reveal the opportunity (or opportunity cost). Douglas Hubbard’s How To Measure Anything has a good set of examples to follow. The inputs are current and projected business per product, probability of partial churn, probability of complete churn, and planned cost of sustaining engineering over the same time frame.

Update: more thoughts here

Sunday, December 19, 2021

Enterprise Roshambo


Ever wish there was a simple game to explain how complex organizations make decisions? You’re in luck! Roshambo, also known as rock-paper-scissors, explains it all. There are a few productive hours in each day, and three conflicting ways to spend them. The game explains how they will be prioritized.

Default rules: in enterprise roshambo, Compliance beats Security.  Operations beats Compliance in most organizations. Operations beats Compliance beats Security.

  • Should we deploy a new patch? Security says yes, Compliance says not necessary yet, Operations says it’s risky: patch isn’t deployed.
  • Should we deploy an old patch? Security says yes, Compliance says yes, Operations says it’s risky: patch is deployed in a carefully scheduled maintenance window.
  • Should we alter scope of a Compliance audit if Operations asks? Yes.
  • Should we disable or uninstall a Security tool if Operations asks? Yes.

The exceptions are highly regulated environments, such as government agencies, food and pharmaceuticals, some commercial finance. Compliance beats Operations beats Security there because failure to follow the law stops (via government intervention) or slows (via budget-disrupting fines) the complex organization’s mission.

  • Should we interrupt Operations to ensure the medicine meets the needs of Compliance? Yes.
  • Should we interrupt Operations to deploy a patch for Security? Not unless Compliance says so.

An emergency can temporarily change the rules of the game. How they change depends on the emergency. For instance, Operations change freezes and Compliance change control boards are put on hold during a Security zero day response. Security beats Operations beats Compliance, until the emergency is resolved.

If the emergency is in compliance, such as threats of a crippling fine or loss of a major customer, then the lightly-regulated organization can temporarily act like a highly-regulated one. Compliance beats Operations beats Security.

Friday, November 26, 2021

Total Compensation


 Career decisions are complicated, but here are a few models to work with which might help. A salaried job is more than a simple exchange of your time for their money: you will be giving it some headspace in most of your waking moments. This conversation is not relevant to contract or hourly jobs. Still, the model of exchanging time for money is a useful starting point, so let’s start with the three forms of money that you’ll be offered in a salaried position.

  • Unvested stock options are lottery tickets. 
  • Cash is cash. 
  • Titles are free.

An ownership stake in the company is a common starting point in negotiation. We will all work together to make the company grow larger and therefore every dollar in options is potentially worth thousands in future money. Likewise, every lottery ticket in the gutter was once potentially worth thousands or more. They say lotteries are a tax on people who are bad at math; stock options aren’t quite that bad, but they’re certainly not guaranteed income. The form of the ownership doesn’t really matter: Restricted Stock Units (RSUs) are better than Incentive Stock Options (ISOs) but the whole thing is silly if the company doesn’t grow. You may think the risk of stocks is reduced when a company is past its startup years. I can’t help but note that lots of employees lost out in MCI and Enron’s glorious flameouts, while my own turn of the century Intel ISOs remained essentially worthless for twenty years.

Cash, on the other hand, is immediately liquid and can be exchanged for goods and services from coast to coast. Cash comes in two amounts: More Than Enough, and Less Than Enough. 

  • More Than Enough means that you do not worry about providing your dependents with their needs. Health and life insurance is part of cash. 
  • Less Than Enough means that you worry. If you do not receive enough, then you are going to devote part of your headspace to handling this financial pressure. 
As an employee, you should never accept a salaried job that includes financial pressure. If you’re going to accept financial pressure, then you’d better be holding a double-digit ownership stake because you’re not an involved employee any more, you’re a committed owner. See the story of the Chicken and the Pig for graphic illustration of the difference between employee and owner.

The final form of compensation is title. A salaried position has responsibilities and blast radius. The scope and radius can be commensurate with the size of the organization, but it’s not a hard rule. Individuals may be hired to try new ideas, or to revitalize major functions, or to solve long standing problems. The organization’s easy answer is to fit your title into the existing system and hierarchy as another one of a role that already exists, but it’s all open to negotiation. It costs the organization very little to stick “Senior”, “Chief”, “Principal”, or “Head of” in front of your role. That extra filip of title can help you tell a better story on your resume, or it can be useless. A Vice President in a large software company might report to the CEO and lead a thousand people, or they might fly around to conferences evangelizing vaporware and control nothing more than their own expense account. Titles are free.

Those three things are the tangibles that you as a salaried employee can get, but now let’s talk about your mental and spiritual compensation. Paloma Medina’s excellent BICEPS model is a great resource for thinking about what a role does for your happiness. No matter how much money you’re making or gambling, a salaried role should meet your needs for Belonging, Improvement, Choice, Equality, Predictability, and Significance. Really, just go read the link. A salaried role is at minimum two thirds of your waking hours but realistically something more like four fifths… if you’re not getting what you need, then you’re devoting that time to a thing that leaves you unhappy. That’s not a sustainable thing to do with one's life.

One last thing to consider if you’re looking at your current role or a potential role through this lens of compensation and happiness: 

  • Change is constant, but tempo is not.

All organizations will always be changing and that change will affect your happiness, both positively and negatively. At the very least, this can be a challenge to Predictability, and for some people that’s a very big desire. The next organization you join? They’re going to alter the deal also, because the world changes and we must all respond. What is not the same across all organizations is the pace at which change is accepted and implemented. Some organizations lean into change, while some lean back. Asking how changes have been handled in the past might help you find a team that supports your needs.

Thursday, November 25, 2021

Thoughts on Logging



  1. What should go into a log? 
  2. Enough, but not too much.

Those two phrases are contradictory and context dependent. This is why logging has different levels. They may be expressed as numbers or words. Each more verbose level is inclusive of the less verbose levels. Sematext has a nice in-depth overview

I can’t count the number of times that initial log collection didn’t uncover the root cause of a problem. I’ve also disabled or crashed the target system by turning logging too high. Balance is the key.

At the one extreme, Fatal is “don’t log unless you’re dying” which is awfully presumptive that the application will be able to realize it’s dying. A more common option is Error (only log problems) or Info (log what you’re doing, not how).

At the other extreme, Trace level logging is preemptive print-debugging. Each important function in each routine, when did it start, what was the input, what was the outcome. If issuing a special build for a customer to troubleshoot is a common occurrence, or something that you wish you could do, then Trace logging is for you. If the application still performs acceptably in production with Trace on, you’re probably not logging enough. Consider putting a built-in timer into the application to shut Trace off after a set amount of time.

Each line of a log should start with the date and time. UTC time is best. Logging in the local time of the device is regrettable, but acceptable if you’ll never need to correlate the events from one device with another one in another time zone. Logging in local time will cost unexpected effort and cause problems, but sometimes it can’t be avoided.

The time format should be ISO8601. Unix epoch is not ideal because people can’t read it. The format of your favorite locale is not ideal because people will get confused. ISO8601 or GTFO.

Entities should have identifiers so they can be traced. This is pretty simple if you’re the only application of your type running on a single physical system. The thing writing the log is the entity that matters and the hostname is plenty of ID. It gets deep if you’re an elastically scaling micro service on multiple cloud providers. Is the thing writing the log what matters any more? You might even architecturally be able to say that you don’t care and it’s all going to wash out at the services level… as long as you never have to debug or do forensics.

Tasks (or threads, or forked processes, or containers) should have identifiers so they can be traced. This is a remarkably deep problem; don’t let perfect be the enemy of good. Again, your architecture might suggest this isn’t necessary, and that suggestion may be correct in many circumstances.

Each line of the log should include a single event with time, entities, and tasks all recorded up front. Some of that feels repetitive and like you would only need to emit it once per file. That would be fine if you never need to correlate with the logs from other systems in order to trace a transaction.

It may be tempting to log an entire transaction as a single event, but that presumes your application will stay alive long enough to see the transaction neatly complete. It is better to log the start and end of a transaction as separate events. 

It is also tempting to use Event IDs instead of language to describe what is happening, since “404” takes less space than “Page not found.” A human readable string should be included as well so that humans can learn your IDs by use. This also allows you to potentially translate your event ID into different languages for use by different humans. You should never use translated strings without event IDs though, this is annoying. Why? Because it makes things harder for organizations that operate in many languages. Microsoft did not keep this behavior when upgrading Windows, but customers and log analysis vendors have still needed to build separate Windows log parsers for every language they'll support.

Writing a log to a file is the obvious path, and there’s a whole world of tooling for rerouting that file output into searchable storage. The simplicity of a file or cloud bucket is a huge win; if you’re up at all, you can probably write a file; if you can’t write to a file, the reason is probably obvious and more widespread than a subtle problem in your program. However, there are also reasons to write to a structured data system. Log data is at least semistructured, and using a structured system allows indexing and search. There’s cost down this road though; either the commercial cost of a tool or service, or the performance cost of a database. Furthermore, the log consumer now has to get into the database, or your product has to provide an auditing API. In my opinion databases are rarely a good idea for log storage.

Some other thoughts on logging:

Product Manager to Product Ratio

How many chucks could a woodchuck chuck if a woodchuck could chuck wood? It depends on the structure of the organization's tech stack. A highly structured tech stack provides a format that you can build repetitive products with. Structure makes the design obvious and lets the developers work faster, which means more gets done easier. 

Every problem that you're solving is just a case of “fit this into our model and then solve it”, and you can do a lot with a little. For instance, in many systems management products the problem is to recognize endpoint state and take action, and the interface is to show state and guide to action. In many data analytics products the problem is to collect data and show informative panels like counts of types. If you make sponges, the easy use cases are all some form of "clean something". Given the commonality of these goals, a highly structured tech stack can be produced to make common tasks simple. In my experience this might be an average of three devs per product and a one to three PM to product ratio. I’ve seen a lot higher: averages of one to five and one to six in two of the teams I’ve been on.

On the other hand, a lot of structure can be constrictive, and a more flexible approach has benefits. If you treat every problem as a new solution to discover and build from first principles, maybe you’ll come up with new shortcuts to success. But you’ll also be unable to depend on preexisting structure. In my estimation you'll need at least five or six devs per product and a one to one PM relationship.

I've run over a dozen very similar products by myself and I've been taxed to capacity by a single product… it's all about how much structure the development platform provides. I'm defining product as an installable/removable module that’s complex enough to need semi-dedicated development team. A content pack is not a product.

If you’re not sure how to describe the amount of structure in your organization, you can assess it in reverse by asking about past performance. If getting a new product to usable and salable is a one quarter job for under three people, you're probably very structured. The first task is to learn the structure, and studying the existing products will help. If it's more like a year, plan accordingly and lean into product management process.