Sunday, December 19, 2021

Enterprise Roshambo

Ever wish there was a simple game to explain how complex organizations make decisions? You’re in luck! Roshambo, also known as rock-paper-scissors, explains it all. There are a few productive hours in each day, and three conflicting ways to spend them. The game explains how they will be prioritized.

Default rules: in enterprise roshambo, Compliance beats Security.  Operations beats Compliance in most organizations. Operations beats Compliance beats Security.

  • Should we deploy a new patch? Security says yes, Compliance says not necessary yet, Operations says it’s risky: patch isn’t deployed.
  • Should we deploy an old patch? Security says yes, Compliance says yes, Operations says it’s risky: patch is deployed in a carefully scheduled maintenance window.
  • Should we alter scope of a Compliance audit if Operations asks? Yes.
  • Should we disable or uninstall a Security tool if Operations asks? Yes.

The exceptions are highly regulated environments, such as government agencies, food and pharmaceuticals, some commercial finance. Compliance beats Operations beats Security there because failure to follow the law stops (via government intervention) or slows (via budget-disrupting fines) the complex organization’s mission.

  • Should we interrupt Operations to ensure the medicine meets the needs of Compliance? Yes.
  • Should we interrupt Operations to deploy a patch for Security? Not unless Compliance says so.

An emergency can temporarily change the rules of the game. How they change depends on the emergency. For instance, Operations change freezes and Compliance change control boards are put on hold during a Security zero day response. Security beats Operations beats Compliance, until the emergency is resolved.

If the emergency is in compliance, such as threats of a crippling fine or loss of a major customer, then the lightly-regulated organization can temporarily act like a highly-regulated one. Compliance beats Operations beats Security.

Friday, November 26, 2021

Total Compensation

 Career decisions are complicated, but here are a few models to work with which might help. A salaried job is more than a simple exchange of your time for their money: you will be giving it some headspace in most of your waking moments. This conversation is not relevant to contract or hourly jobs. Still, the model of exchanging time for money is a useful starting point, so let’s start with the three forms of money that you’ll be offered in a salaried position.

  • Unvested stock options are lottery tickets. 
  • Cash is cash. 
  • Titles are free.

An ownership stake in the company is a common starting point in negotiation. We will all work together to make the company grow larger and therefore every dollar in options is potentially worth thousands in future money. Likewise, every lottery ticket in the gutter was once potentially worth thousands or more. They say lotteries are a tax on people who are bad at math; stock options aren’t quite that bad, but they’re certainly not guaranteed income. The form of the ownership doesn’t really matter: Restricted Stock Units (RSUs) are better than Incentive Stock Options (ISOs) but the whole thing is silly if the company doesn’t grow. You may think the risk of stocks is reduced when a company is past its startup years. I can’t help but note that lots of employees lost out in MCI and Enron’s glorious flameouts, while my own turn of the century Intel ISOs remained essentially worthless for twenty years.

Cash, on the other hand, is immediately liquid and can be exchanged for goods and services from coast to coast. Cash comes in two amounts: More Than Enough, and Less Than Enough. 

  • More Than Enough means that you do not worry about providing your dependents with their needs. Health and life insurance is part of cash. 
  • Less Than Enough means that you worry. If you do not receive enough, then you are going to devote part of your headspace to handling this financial pressure. 
As an employee, you should never accept a salaried job that includes financial pressure. If you’re going to accept financial pressure, then you’d better be holding a double-digit ownership stake because you’re not an involved employee any more, you’re a committed owner. See the story of the Chicken and the Pig for graphic illustration of the difference between employee and owner.

The final form of compensation is title. A salaried position has responsibilities and blast radius. The scope and radius can be commensurate with the size of the organization, but it’s not a hard rule. Individuals may be hired to try new ideas, or to revitalize major functions, or to solve long standing problems. The organization’s easy answer is to fit your title into the existing system and hierarchy as another one of a role that already exists, but it’s all open to negotiation. It costs the organization very little to stick “Senior”, “Chief”, “Principal”, or “Head of” in front of your role. That extra filip of title can help you tell a better story on your resume, or it can be useless. A Vice President in a large software company might report to the CEO and lead a thousand people, or they might fly around to conferences evangelizing vaporware and control nothing more than their own expense account. Titles are free.

Those three things are the tangibles that you as a salaried employee can get, but now let’s talk about your mental and spiritual compensation. Paloma Medina’s excellent BICEPS model is a great resource for thinking about what a role does for your happiness. No matter how much money you’re making or gambling, a salaried role should meet your needs for Belonging, Improvement, Choice, Equality, Predictability, and Significance. Really, just go read the link. A salaried role is at minimum two thirds of your waking hours but realistically something more like four fifths… if you’re not getting what you need, then you’re devoting that time to a thing that leaves you unhappy. That’s not a sustainable thing to do with one's life.

One last thing to consider if you’re looking at your current role or a potential role through this lens of compensation and happiness: 

  • Change is constant, but tempo is not.

All organizations will always be changing and that change will affect your happiness, both positively and negatively. At the very least, this can be a challenge to Predictability, and for some people that’s a very big desire. The next organization you join? They’re going to alter the deal also, because the world changes and we must all respond. What is not the same across all organizations is the pace at which change is accepted and implemented. Some organizations lean into change, while some lean back. Asking how changes have been handled in the past might help you find a team that supports your needs.

Thursday, November 25, 2021

Thoughts on Logging

  1. What should go into a log? 
  2. Enough, but not too much.

Those two phrases are contradictory and context dependent. This is why logging has different levels. They may be expressed as numbers or words. Each more verbose level is inclusive of the less verbose levels. Sematext has a nice in-depth overview

I can’t count the number of times that initial log collection didn’t uncover the root cause of a problem. I’ve also disabled or crashed the target system by turning logging too high. Balance is the key.

At the one extreme, Fatal is “don’t log unless you’re dying” which is awfully presumptive that the application will be able to realize it’s dying. A more common option is Error (only log problems) or Info (log what you’re doing, not how).

At the other extreme, Trace level logging is preemptive print-debugging. Each important function in each routine, when did it start, what was the input, what was the outcome. If issuing a special build for a customer to troubleshoot is a common occurrence, or something that you wish you could do, then Trace logging is for you. If the application still performs acceptably in production with Trace on, you’re probably not logging enough. Consider putting a built-in timer into the application to shut Trace off after a set amount of time.

Each line of a log should start with the date and time. UTC time is best. Logging in the local time of the device is regrettable, but acceptable if you’ll never need to correlate the events from one device with another one in another time zone. Logging in local time will cost unexpected effort and cause problems, but sometimes it can’t be avoided.

The time format should be ISO8601. Unix epoch is not ideal because people can’t read it. The format of your favorite locale is not ideal because people will get confused. ISO8601 or GTFO.

Entities should have identifiers so they can be traced. This is pretty simple if you’re the only application of your type running on a single physical system. The thing writing the log is the entity that matters and the hostname is plenty of ID. It gets deep if you’re an elastically scaling micro service on multiple cloud providers. Is the thing writing the log what matters any more? You might even architecturally be able to say that you don’t care and it’s all going to wash out at the services level… as long as you never have to debug or do forensics.

Tasks (or threads, or forked processes, or containers) should have identifiers so they can be traced. This is a remarkably deep problem; don’t let perfect be the enemy of good. Again, your architecture might suggest this isn’t necessary, and that suggestion may be correct in many circumstances.

Each line of the log should include a single event with time, entities, and tasks all recorded up front. Some of that feels repetitive and like you would only need to emit it once per file. That would be fine if you never need to correlate with the logs from other systems in order to trace a transaction.

It may be tempting to log an entire transaction as a single event, but that presumes your application will stay alive long enough to see the transaction neatly complete. It is better to log the start and end of a transaction as separate events. 

It is also tempting to use Event IDs instead of language to describe what is happening, since “404” takes less space than “Page not found.” A human readable string should be included as well so that humans can learn your IDs by use. This also allows you to potentially translate your event ID into different languages for use by different humans. You should never use translated strings without event IDs though, this is annoying. Why? Because it makes things harder for organizations that operate in many languages. Microsoft did not keep this behavior when upgrading Windows, but customers and log analysis vendors have still needed to build separate Windows log parsers for every language they'll support.

Writing a log to a file is the obvious path, and there’s a whole world of tooling for rerouting that file output into searchable storage. The simplicity of a file or cloud bucket is a huge win; if you’re up at all, you can probably write a file; if you can’t write to a file, the reason is probably obvious and more widespread than a subtle problem in your program. However, there are also reasons to write to a structured data system. Log data is at least semistructured, and using a structured system allows indexing and search. There’s cost down this road though; either the commercial cost of a tool or service, or the performance cost of a database. Furthermore, the log consumer now has to get into the database, or your product has to provide an auditing API. In my opinion databases are rarely a good idea for log storage.

Some other thoughts on logging:

Product Manager to Product Ratio

How many chucks could a woodchuck chuck if a woodchuck could chuck wood? It depends on the structure of the organization's tech stack. A highly structured tech stack provides a format that you can build repetitive products with. Structure makes the design obvious and lets the developers work faster, which means more gets done easier. 

Every problem that you're solving is just a case of “fit this into our model and then solve it”, and you can do a lot with a little. For instance, in many systems management products the problem is to recognize endpoint state and take action, and the interface is to show state and guide to action. In many data analytics products the problem is to collect data and show informative panels like counts of types. If you make sponges, the easy use cases are all some form of "clean something". Given the commonality of these goals, a highly structured tech stack can be produced to make common tasks simple. In my experience this might be an average of three devs per product and a one to three PM to product ratio. I’ve seen a lot higher: averages of one to five and one to six in two of the teams I’ve been on.

On the other hand, a lot of structure can be constrictive, and a more flexible approach has benefits. If you treat every problem as a new solution to discover and build from first principles, maybe you’ll come up with new shortcuts to success. But you’ll also be unable to depend on preexisting structure. In my estimation you'll need at least five or six devs per product and a one to one PM relationship.

I've run over a dozen very similar products by myself and I've been taxed to capacity by a single product… it's all about how much structure the development platform provides. I'm defining product as an installable/removable module that’s complex enough to need semi-dedicated development team. A content pack is not a product.

If you’re not sure how to describe the amount of structure in your organization, you can assess it in reverse by asking about past performance. If getting a new product to usable and salable is a one quarter job for under three people, you're probably very structured. The first task is to learn the structure, and studying the existing products will help. If it's more like a year, plan accordingly and lean into product management process.

Sunday, October 10, 2021

What kind of product are you making?

First know everything, then you can automate it! Also, if you can express your problem in numbers, I can tell you if they’re going up or down.

A freeform exploration product attempts to enable a customer to achieve understanding and express the problem in numbers. Features may include user-editable schema (traditional database and ETL products) or schema on the fly (Splunk or Elastic), loose typing, and a “unix is user friendly” workflow experience. In some markets this sort of product can be so far beyond usability that it is more of a platform for partners to build products on. Many products rightfully do not expose their use of MS-SQL, Oracle, Mongo, or Redis to their users. 

Those products can be thought of as market-targeted solutions. They come to the customer with an opinion about what problem is being solved and how that solution should work. They notably have predefined schema and strict typing (good for performance and safety). One should expect a polished wizards-n-workflows experience with no unnecessary options.

So, as a development team with a project starting up, you might need to ask yourself: which one of those supports the business outcome this project is looking for?

Unless you are starting a company from scratch, the first approach of solving hard, general problems is probably not what you want to tackle. And even when you are starting from scratch to build a new platform… there’s a lot of reasons why most startups fail. Time lost to analysis paralysis at general problems is one. Inability to describe your value add to the market of customers with problems is an ever bigger one. If I’m trying to reduce theft in my supply chain, I probably don’t want to start with defining schemas in a raw data management tool.

And if you’re in a team at an existing company, chances are extremely slim that your charter includes solving general problems. Much more likely is a charter to deliver a solution which maximally leverages your existing platform to capture entry in a new market.

Saturday, October 2, 2021

Declaring Idea Bankruptcy

It’s obvious that your R&D team can’t do everything at once, right?

It’s obvious that items lower on the backlog aren’t going to happen unless they displace something higher on the backlog, right?

And yet. Lots of those items are people’s beloved ideas. Good ideas, that would make the product better, open new business opportunities, solve real problems. 

Good ideas are going to be rejected for a lack of capacity to develop and exploit them. This makes a lot of people sad, so product managers can fall into a trap. Instead of taking action, they ignore the backlog as it grows to thousands and thousands of items. And now, it’s no longer functional. Instead of a todo list, you have a swamp. People who actually need a backlog start doing their own in shadow IT, the roadmap moves into spreadsheets, and your organization no longer has visibility between planning and execution.

There’s a solution to this: Automatic bankruptcy. Any tickets that aren't touched in six months get resolved “future consideration” with a friendly note to reopen if it's still relevant.

Let’s look at how this works at each level of activity:

  • Tasks are the easiest— a task is just a development note of things that ought to be done. When they don’t get done quickly, they’re forgotten. A reminder at six months lets the unimportant and already done tickets close easily, and prompts the developers to do it if it’s still important. There’s even an argument to silently close these, though I don’t agree with that idea.
  • Bugs are also pretty easy. Bug tickets are regularly opened without sufficient information to act on, leading to either stopped communication or out-of-band investigation. This produces a steady flow of junk tickets. Real problems get fixed way before six months, if only because they hit some executive’s inbox. So a bug with six months of inactivity is highly unlikely to be acted on. Shut it down. Maybe the watchers will reopen with new information.
  • Stories (may also be called Improvements or New Features) are where it gets tough: this is where you’re discussing people’s ideas and problems. Maybe the filer is losing hours per week because this story isn’t done. Maybe they're going to lose some deals. Maybe they’re just insulted that you aren’t acting on their advice. Nevertheless, six months of inactivity is a strong signal: your team has not had bandwidth for this. Don’t delay any further: it’s the Product Manager's job to determine if this is truly a priority and make it happen, or to let the filer down as easily as possible. A PM who avoids this difficult conversation is doing no favors to anyone.
  • Epics are easy, just collections of stories and bugs, everyone forgets they exist as soon as MVP is shipped. This is housekeeping on the same level as tasks.
  • Initiatives: now for the hardest one. Initiatives may not be in the same backlog at all, maybe you’re using post-its or a spreadsheet or a product management specific tool. Still, you’re looking at the same problem as the story, just writ large. Should we field a product to enter that market? Maybe it's a good idea, that's great. But, there’s other things that won’t be done now, and the point of this process is to include opportunity costs in decision making. Or maybe there's been some out-of-band people working the idea to prove it out. If customers don’t really want it or you can't really build it, then it stops. Lots of initiatives quietly die, either untried or stopped after investigation. Just like in the development backlog, they clog up your vision, until you no longer know what your plan is. As a product leader, if you're trying to tell the e-staff and board that there's ten years of work in your todo list, expect to hear some questions.

It really doesn’t matter if we’re discussing board-facing initiatives or developer-facing tasks or the stories and bugs between them. After six months of inactivity: everything must either die or be defended.

Doing it in JIRA:

  1. Meet with R&D and customer support leadership so they know what you're about to do and why you're doing it. Get at least to disagree and commit. It's critical to note that resolution isn't permanent -- if the team is willing to keep reopening this ticket, maybe you need to just do it. The goal is to make communication happen.
  2. Edit the relevant shared schemas and add a resolution type of “Future Consideration”. For instance, you might have four schemas: initiatives, software projects, SaaS projects, and services projects. Make sure all software development projects use the same schema or else life sucks too much for you to follow any of this advice.
  3. Announce to the organization that you're starting this process.
  4. Make a saved search like this: updatedDate <= startofday(-180d) and resolution=unresolved. You'll probably need to start with a selection of projects and gradually expand to everything as you train the organization to this new reality.
  5. Schedule it for once a week on your most communication friendly day.
  6. Bulk action, Transition, Resolve, Future Consideration, Resolution Comment: "Automatically resolved after six months of inactivity. Please reopen if still valid." 
    • You'll want a keyboard shortcut for that phrase, just like "What is the problem you are trying to solve?"
  7. Leave the Send Email button checked on, and politely engage with the conversations that result. 

Saturday, September 18, 2021

The regrettable features you have to do

Sometimes as a product manager you get a feature request that’s fun and challenging and moves your company forward. Then there’s requests that just make you feel like a sad clown: stuff that doesn’t fit your plan at all. It’s hard work to ignore the nay-sayers and make a new thing. The product team and engineering have worked together to create a different and better approach to a class of problems… Congratulations! But there’s features you may need to build even though you don’t want to.

Compromise between your product differentiation and market fit comes in three flavors: legitimate needs that you hadn’t recognized or accounted for, design culture that you’re hoping to change, and technology designs that you don’t address.

A legit need might be regulatory compliance that your design doesn’t account for well, or a use case where you were too optimistic about your solution’s fit. This feature request is legitimate, and the regrettable part is wholly on you and your team. You have to find a way to solve it or accept a limit on who you can sell to. That story is beyond the scope of this post.

But design culture… here you’re trying to change behavior. You think your product has a better answer, and you need customers to take a leap of faith. If you have an interesting idea you’ll probably be able to find early adopters, but these customer have a lot on the line. It’s only natural that they want the safety of familiarity or a backup plan, and you may need to offer a bridge feature

Lastly, technology design choices are anything but simple: some design domains just aren’t good fits, or aren’t within your company’s scope. Another source of compromise is design fashion. Ideas cycle in and out of fashion as the memory of failure fades. This can mean surges of industry excitement about concepts that you don’t personally agree with. Always begin with questioning your own assumptions, because conditions do change. Sometimes the idea that fizzled last time or the time before succeeds in the next iteration.

No matter what you think of design fashion, it’s not likely that you’ll be able to refuse to play the game. Your company is on the competitive field, and you have to have a response, or else your response is going to be whatever your sales people think of. So, research the problem space and ask what has changed. If there is new opportunity and you can capitalize on it, this isn’t a regrettable feature. That story is also beyond the scope of this post. But if you do your research and you don’t see how this time is any different, then you need to find an answer which minimizes investment while maximizing information return. 

That is the same regrettable feature answer as when the idea in question is not new fashion at all, but just not a good fit: a concept that your company doesn’t and/or won’t play in. If it’s not a good fit, then you shouldn’t build it, but you can’t ignore everything that isn’t a good fit.

So, regrettable features: bridges from old patterns to new, fashionable experiments, and requirements you don’t want to or can’t satisfy. What can be done by partnering, adding content, or remarketing what you already have?

Partnership can take a couple of flavors:  total outsource of the problem to a chosen vendor, or meeting the community at an interface. The total outsource is least effort on your part, but it only works if the partner is a de facto winner in the space. You don’t make success by stacking failures together. So if that obvious winner hasn’t taken all, you need to make a clear interface that treats them all the same. Set your terms, define your boundaries, pick a couple of partners to go to market with, and see what happens. You’ve got a feature, but it’s stripped to the minimum.

Adding content is another option for vendors who have a strong enough boundary between platform and content. If you have a community of customers, field engineers, and pro service partners building content, then you can solve problems without engineering and support contracts. This approach arguably has a limited shelf life because some customers will push back on the unsupported nature of roll-your-own or second party content. That point can be far in the future though, and you’ll certainly get some real world data about what customers want.

Remarketing is the toughest option, saved for last; only the very lucky vendors can shake the puzzle box and get a better picture. But if you can package your components differently to resolve a problem, it's a lot faster to do that work with legal and sales ops now instead of after engineering is done.