Saturday, October 5, 2019

Scripts for Adulting


  1. Hello, I’ve been admitted to the 2019 class and I have a question about my high school grades. Can you help? My reference number is #######. * Get the dates and account numbers together ahead of time.
  2. I’m going to get a bad grade in a class, or possibly a withdrawal. * Just the facts! They don’t care what happened.
  3. will this affect my acceptance to university?
  4. does it make a difference if I take the bad grade or the withdrawal?
  5. are there recommended steps I should take?
  6. what was your name? * In case you need to explain where you got advice later.
  7. thank you!


I’ve found that writing little scripts like that really helped my kids with their adulting conversations as they went through high school and into college. My daughter was very upset about the class, but it wasn’t relevant to her major so there was no point in discussing how or why the bad grade was happening.

Plan out what you’ve got to say, plot a path that avoids your own emotional hot buttons, and gather the materials you can anticipate needing.

It’s a useful tool for managers as well. Tough conversations are part of the career. If you go in prepared, they are a little less tough.


  1. the company is making a change. * Just the facts.
  2. what’s the reasoning, quick outline of process. * Why is this happening.
  3. how does it impact this team. * Most positive spin possible.
  4. how does it impact you. * Simply your opinion of the reasoning and outcome, and how you came to accept that it was acceptable. If it’s not acceptable, save that for the separate communication where you announce your resignation.
  5. summarize: what’s happening, impact to this team, what should everyone do next.


If you’ve got lots of time to prepare, you might even think through some likely interactions, but that can backfire by helping you spiral back into emotional territory. The goal is to be able to communicate the facts and save your feelings for a different conversation.

Wednesday, August 28, 2019

Platform and Partners, Round Two

After reviewing this post on platforms and partnerships, there’s more to dig into. By definition, you can’t cross the Bill Gates line by yourself, but who should you be seeking partnership with? Developers who consult or consultants who develop? What tools should you build for them?

At the end of that article, I felt that free-form coding was required. My reasoning is that the platform vendor cannot predict valuable use cases well enough to produce the toolkit that a consultant would need. This is not a condemnation of the toolkit or the consultant. Rather, it is a recognition that high-value jobs require deep linkage to the customer’s processes and data systems, meaning that they are complex and customer specific. This means you’ll need consulting service shops to achieve them, not development shops.

Consulting services partners only make linear contributions to your bottom line, though. Managing and supporting them therefore needs to be a linear cost, and that implies keeping their toolkit minimal and simple.

The most elegant and efficient way to reach this state is to not provide a special toolkit to service partners at all; instead, partners work with the same toolkit that your own development teams use. Imagine a company in which every team’s functionality is available via service interfaces that are designed to eventually be public. Such a company is not only using Conway’s Law for good, it is enabling partners by enabling itself. This doesn’t eliminate partner-vendor squabbling, but it can keep the tenor focused on more easily resolved questions. It’s easier to answer “we want a bigger slice of these deals” (single variable, money) than “we want an easier and more flexible development toolkit” (what do easy and flexible even mean?).

“APIs everywhere” as a partner service model also generates the maximum value for a development partner, who is now unconstrained. They may plug in to your stack anywhere and create value in any way. However, this is not an unalloyed good. Where many services partners are constrained to a single platform vendor (or at least a preferred vendor per use case), the development partner has a more flexible destiny. They are also more inclined to risk, since their business model rests on big upfront investments with uncertain but hopefully exponential rewards. If the platform vendor’s stack is completely open, a development vendor can easily subvert the vendor’s intention, and is far more likely to try it than a services partner. A few interesting examples: Elastic’s fight with AWS, AirBNB’s uneasy relationship with listing indexers, and Twitter’s on-again-off-again stance towards third parties. One might use an analogy: services partners for dependable, steady growth, development partners for high risk, potentially explosive growth. This can be a helpful model in deciding which vendors to support, but isn’t as helpful when deciding what toolkit to ship to them.

It’s worth picking apart the difference between technical support of a model and legal support of a model. Open APIs are clearly beneficial as a technical choice: internal and external teams are on the same footing, allowing maximal value to customers for minimal effort. The downsides of the model are in business risk. Remediation of that risk is a business problem, and the resolution is a partnership contract requirement backed by technical enforcement via access keys. That’s obviously not an option for a fully open source system, but I can’t say I’d advise a fully open source approach to any platform business anyway.
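To make that split concrete, here’s a minimal sketch of access-key enforcement, in Python with made-up partner names and scopes (not any particular vendor’s API): everyone gets the same “APIs everywhere” surface, and the contract shows up as the scopes granted to a key.

```python
# A sketch of contract enforcement via access keys. Partner names, key
# strings, and scope names are hypothetical.
PARTNER_KEYS = {
    "key-acme-services": {"scopes": {"tickets.read", "tickets.write"}},
    "key-globex-dev":    {"scopes": {"tickets.read", "billing.read"}},
}

def authorize(api_key: str, required_scope: str) -> bool:
    """Allow the call only if the key exists and its contract grants the scope."""
    partner = PARTNER_KEYS.get(api_key)
    return partner is not None and required_scope in partner["scopes"]

# A development partner probing outside its contract simply gets a refusal.
assert authorize("key-acme-services", "tickets.write")
assert not authorize("key-globex-dev", "tickets.write")
```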

Licensing models, self-service style

In my other two posts about licensing, I suggested that flat rate pricing is best for customers, but impossible in enterprise sales because of the variable and high costs of making a sale.

Those costs are difficult to understand if you haven’t been exposed to them before, but they are all too real. Weeks spent negotiating a price are only the start; weeks spent negotiating contract language are simply part of the process. What about indemnification? Can the vendor insure the customer against potential supply chain threats for the foreseeable future? It’s simply a matter of cost... and that insurance policy is now part of pricing.

What will happen to the deal if the vendor is purchased by another company? Can the customer audit the vendor’s source code? If the vendor goes insolvent, does the customer get to keep the source code? Yes, I have seen a customer organization running their own version of a formerly commercial product a decade after the vendor threw in the towel.

I was once involved in a contract between two industry titans that included a minimally disguised barter of services, and one of those services was sold to a third company as soon as the ink was dry. The cost to make and then keep that sale was... not small.

So as a vendor, there is a reasonable pressure to force your cost of sale down, and there is a clear goal: the almost zero cost clickwrap contract. Simply set your terms, disallow negotiation, and let the dollars roll in. It’s the ultimate expression of flat-rate pricing.

This is a fine approach for what I like to call lifestyle businesses: if you just need enough money for you and your cat to live happily, then sell away. The catch is that the most lucrative potential customers literally can’t buy from your business because of the potential risk. You’re probably good to go if your addressable market is consumers and your price fits on a credit card, but big business is off the table.

Wait! Singleton users and small teams buy in this model all the time! Expense report reimbursement is open to question, but no one cares if the price is low enough. A frustrated employee may just eat a few dollars for a productivity enhancing tool. The clickwrap model gets extremely blurry around personal computing appliances. I’m writing this in Bear on my iPhone; how is my employer to distinguish it from work I do with and for the company on the same device with the same app? (In my case, I use different editors for different roles.) Corporate and government legal departments try to draw a clear line, but IT struggles to implement that line, and a clickwrap vendor is therefore always in danger of being pinched by changes in policy. Shadow IT is no place to make big money.

However, shadow IT does have some astounding success stories: Amazon Web Services is the obvious example, but Balsamiq, Basecamp, and Glitch (FKA Fog Creek) come to mind as well. If the official channels cannot support a use case and the need is great, then people will find a way.

Sunday, August 25, 2019

Put PICA on Notable Events


For every notable event, the analyst adds a little PICA.

What’s a notable event? It’s a record that something happened, or an alert that something is expected to happen. It theoretically requires some form of response, from “read and move on” to “read and acknowledge” to “follow this run book” to “alert the [managers|Red Team|President] and [start the clock|increase logging|take cover]”. A notable event may be an Incident or Event in ITIL terms, a Ticket in bug tracker or fry cook terms, or simply grist for a machine learning mill.

What is PICA? An acronym borrowed from the Dallas News by Clayton Christensen.
* Perspective: what is the importance of this event to the organization’s goals? Does it affect security posture? A service level objective? Is it a compliance breach?
* Insight: what is the cascade potential for the risk represented by this event? Does it require immediate remediation or is it just a counter to be watched?
* Context: Is this event a one-off, or is it common? Is it more common for the grouping than the overall organization?
* Analysis: is this type of event occurring more or less frequently than in the past?

For a significant incident, the value of PICA is clear: The SAN is almost full. My perspective tells me that systems are going to stop working, and my insight into those systems lets me understand the knock-on events across my organization. I know the context, why we need these systems to fulfill our mission and why that is important, and I use my analytical skills to determine a course of action.
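As a sketch of what “adding a little PICA” might look like in a tool, here’s that SAN event as a record carrying its PICA annotations. The field names are hypothetical, not any particular ticketing schema.

```python
# A hypothetical notable-event record with PICA annotations attached.
from dataclasses import dataclass

@dataclass
class NotableEvent:
    summary: str
    perspective: str   # importance to the organization's goals
    insight: str       # cascade potential and urgency of remediation
    context: str       # one-off or common, here or everywhere
    analysis: str      # trend over time and recommended course of action

san_full = NotableEvent(
    summary="SAN capacity at 97%",
    perspective="Storage exhaustion halts order processing (SLO breach).",
    insight="Dependent databases and services will fail within hours.",
    context="First occurrence this quarter; not seen on other arrays.",
    analysis="Growth trend says act now: archive cold data, expand the pool.",
)
print(san_full.insight)
```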

However, not every firewall-rule alert in a SOC or breakfast ticket in a diner requires a great deal of insight. As a developer, I see your low-impact typo ticket and I fix the bug.

There is still a need for PICA on these low-or-no impact notable events. Perspective: they still consume human attention, wasting the most expensive resource in the environment.  Insight: this kind of alert is ripe for automation, and a fine place to use a machine learning algorithm. Context: Reducing the flow of useless alerts makes important ones stand out better. Analysis: cost-benefit calculation suggests spending this much time to eliminate that noise.
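Here’s a small sketch of the automation that analysis points to, in Python with made-up alert categories: suppress the repeats of known low-value alerts so the notable ones keep the analyst’s attention.

```python
# Sketch of noise suppression: surface a known low-value alert once, then
# count further repeats instead of paging anyone. Categories are made up.
from collections import Counter

LOW_VALUE = {"firewall_rule_hit", "login_success"}
seen = Counter()

def triage(alert_type: str) -> str:
    seen[alert_type] += 1
    if alert_type in LOW_VALUE and seen[alert_type] > 1:
        return "suppress"   # a counter to be watched, not an interruption
    return "notify"         # still deserves human attention

for a in ["firewall_rule_hit", "firewall_rule_hit", "disk_nearly_full"]:
    print(a, triage(a))
```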

Managing the Unmanageable

I’ve been thinking off and on about containers (FKA partitions, zones, jails, virtualized apps) and mobile ecosystems for a few years. These technologies have gone through several iterations, and different implementations have different goals, but there is an overlap in the currently extant and growing versions. Hold containers, iOS/Android, and MDM-plus-AppStore enabled laptops together and look at the middle of the diagram: 1) management is done in the surrounding systems, not in the daily use artifact; 2) management needs are minimized by simplicity.

A container is built, run, and deleted. There is no “manage”. To change or fix it, you go upstream in the process. A phone app may be installed or uninstalled, but it will take care of updating itself from someone else’s activities upstream in the process, just like a container. Users and admins don’t patch them, instead vendors push updated versions into an infrastructure that automatically does the needful. Even the infrastructure around the app or container, firewall policies, routing policies, device controls, all the policies and configuration that make the system secure and effective are also managed centrally and pushed into place.

This vision of abstracted management has attractions from many perspectives, which are obvious enough that I won’t waste time repeating them. It is also frustrating to teams tasked with monitoring and managing to existing standards of compliance. The new model is for computing appliances and services, and does not fit well with the current model of managing general purpose operating systems. It’s arguable if the computing appliance model can apply to general purpose computers at all; it’s theoretically possible to lock one down sufficiently but the result isn’t better than a mobile device. This attempt failed in the BYOD (Bring Your Own Device) laptop cycle, but the idea of being able to add and remove “appliance mode” on a general purpose device hasn’t died and only time will tell. BYOD seems to be working just great for phones, after all.

The power of systems management tools comes from the philosophy of the general purpose operating system. Programs run with each other in a shared environment which fosters their working together to serve one or many users. Users, including administrators, can remotely do whatever they need via networking. In the primordial slime of the business opportunity called systems management, administrators would use remote shells to script their desires into being, pulling packages into place when needed. Much has changed, but the fundamentals of these tools remain the same: a remote program with privileges, command and control networking, and a file movement tool.

The new model does not allow these fundamentals. We aren’t running as root on the remote host anymore. While mobile and laptop systems retain broader abilities, in the strictest container models even communication and files are only allowed to come from one place. There are exceptions as a matter of theory, but organizations that embrace the philosophy are going to prefer blocking those exceptions. And they will be right, because running visibility and control agent programs in a container or a mobile app sucks. Not only does it increase the weight and computational complexity of the target, it does so for no good reason; the fabric and philosophy of the new model are designed to prevent anything useful being done from this vantage point. Your process is not supposed to worry about other processes. As a user, you’re supposed to worry about your service fulfilling its purpose, not management functions.

This philosophy is not a comfort to compliance auditors, some infosec teams, or traditional systems administrators (hi, BOFH and PFY). It sounds too much like developers sitting in an ivory tower and announcing that they have handled everything just fine, a priori. Even if they say “devops” and “SRE” a lot. But at the end of the day, organizations are regularly accepting a similar statement from their everything-as-a-service vendors, and not many can fully resist the new model’s siren song. Still, a new computing model cannot ignore law, finance, and customary process. The result is a grudging middle ground of management APIs, allowing a minimum viable level of visibility and control into the new model.

These APIs do not restore management fundamentals; they only allow you to log, to measure states, and to initiate change within the new model’s parameters. Posit that breaking the new model’s rules is going to fail, immediately or eventually. A management vendor is therefore in a jail cell, and has to differentiate from inside it when offering visibility and control for computing appliances. Windows CE was the last gasp of general purpose operating systems for appliance-friendly use cases (Linux may appear to be an exception, but the deployed instances used in appliances are hardly sporting full Unix shells). From here out, endpoints are either full general purpose machines, devices managed in the mobile model, or a handful of frozen kiosk and VDI images. Servers are a mass of general purpose machines, mostly on virtualization, sometimes delivered as a service, with an explosively growing segment of service oriented app virtualization.
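As one illustration of living inside those parameters, here’s a sketch using the Kubernetes Python client (assuming a reachable cluster and the kubernetes package installed). It can observe state and ask the control plane for changes, but there’s no remote shell, no file push, and no privileged agent inside the containers.

```python
# API-constrained visibility: list workloads and read declared state only.
from kubernetes import client, config

config.load_kube_config()   # or load_incluster_config() when running in-cluster
core = client.CoreV1Api()

for pod in core.list_pod_for_all_namespaces(watch=False).items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)

# "Initiating change" means asking the control plane, not touching the host:
# delete a pod and let its controller recreate it from the desired state.
# core.delete_namespaced_pod(name="example-pod", namespace="default")
```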

A new type of management agent is born for these API-driven appliance models. Maybe it’s implemented in “sidecar” containers or as “MDM approved” apps, or maybe it lives fully in the cloud, maybe it’s the focus of a new vendor or the side project of an established one. There will certainly be pronouncements that it brings new value to the use case. Doesn’t matter how it’s implemented or marketed though, it’s accessing the same APIs as everyone else. Its best efforts are limited to “me-too”. Differentiation is either in costly and difficult up-stack integration, or a capital-burning race to open sourced commoditization.

A customer who wants single pane of glass visibility is left with few options: build their own analytics, invest in data lake technologies, or buy extensions to their main management tools. Almost all select two of the three for resilience.

It may make for an unpleasant experience in the management tool, where this ghost of management is fit into the same console and mental model as the vendor’s full-powered capabilities. “Here is your domain, in which you can do what is needed to ensure your organization’s mission! Except on these special systems where you know a lot less and can’t do much of anything.” Customer expectations are sort of hit but kind of missed, and no one is very happy. Some vendors can sell “know less and do less” alongside “full visibility and control” for the same price. Others may adjust the license model instead.

So, is the single pane of glass worth a cognitively dissonant user experience? Or does the customer split their visibility and control tools and buy something else to glue things back together, moving that dissonance higher up the stack? Because there will surely be dissonance when clicking for action in tool A has to go through tool B’s brokerage into Tool C for execution.

There is a useful comparison to minority or legacy operating systems. Management and visibility tools universally reduce their capabilities on platforms that aren’t as important to their customers, so very few are excellent on Solaris, AIX, or HP/UX. The important difference is that a vendor’s reduced AIX capabilities are a matter of choice. If the market demanded, the vendor could eventually resolve the problem. A management vendor cannot change the operating model of an entire ecosystem, so computing appliances are not like legacy computing. But there is an analogy in that the tools do not align perfectly with customer needs, leaving gaps to fill with people and process.

If we imagine a perfectly amazing management tool for AIX that doesn’t integrate with the tools used for Linux and Windows, the choice becomes clearer. Customers don’t require visibility and control for operating systems or computing models, but rather for business functions and services. Buying different tools for different systems can be a required stop gap, but it’s not a goal in itself. Therefore, a single product, single pane of glass approach wins over a multi-product, best of breed approach. The remaining question is therefore one of approach: do you use an endpoint-centric vendor that was born from visibility and control, or a data-centric vendor that was born from searching and correlation? The answer lies in your organization’s willingness to supplement tools with labor. A data lake can have great visibility, but it has no native control, meaning another gap to cross before even hitting the API gaps in the new computing model.

The goal of the new model is to minimize and ultimately remove management entirely. As long as it is unsuccessful in this goal, there will be rough edges between the new model and the old. Those edges bias towards the old model consuming the new.

Sunday, August 18, 2019

Everything I know about the IT business is from Godzilla Against Mechagodzilla


Which movie is this? There are a few, so it’s important to disambiguate! https://en.wikipedia.org/wiki/Godzilla_Against_Mechagodzilla



Not to be confused with https://en.wikipedia.org/wiki/Godzilla_vs._Mechagodzilla (which I can only recommend to the fevered or otherwise hallucinating) or even https://en.wikipedia.org/wiki/Godzilla_vs._Mechagodzilla_II (pretty fun, but not a sufficient scaffold for understanding one’s career). To be fair, https://en.wikipedia.org/wiki/Pacific_Rim_(film) would probably also work for this exercise. Let’s get started!

When problems become apparent enough to need resolution, they’re often all-consuming.



Your old plans for resolution are for your old problems, and might not work any more.



Someone or something is going to take a fall. New attempts to fix this culture are laudable and I hope they succeed.



A new solution is much more likely than fixes to the old solution.



People struggle to put rationality ahead of emotions. This fellow blames people instead of technology for past failures.



New solution projects always have surprises and grow to cost more than anyone expects.



They also still go into the field untested against outside context problems...



... and bad things can happen when testing in production.



Metrics data is fine, but user interfaces should always put the bottom line up front.



Give an analyst an alert, and they’ll want all the data.



Someone always ends up taking one for the team.



Stated differently, if everyone only performs to the stated requirements then the project won’t be successful.



Inelegant solutions need a lot of power.



Partial success is better than complete failure.



Analysts will grow fond of their tools, even if the ultimate outcome was only partially successful.



Tweetise here: https://twitter.com/puercomal/status/1163117265543294976?s=21

Friday, August 2, 2019

Enterprise Business Metrics


So you’ve launched a product... Is the product selling? How’s ASP (Average Sales Price) after discounting? Deal size? Cost of sales? Are there measurable predictors for losses? Are there ways to accelerate or increase the wins? If your license is per term instead of perpetual, is your product getting renewed?

And if your product is one of many...  are your wins correlated with the wins for other products? What would happen if they were bundled? Does your product cannibalize something else the company sells? How would you know? Do ELAs hurt or help your product’s adoption?

Companies ask these questions because they need to manage the business. Tautology, right? So let’s be blunt: you can’t get investment or spend investment without some way to predict how you’re doing, and you can’t decide if you’re going to continue an investment if you can’t see how that investment is performing. As a product leader, your paycheck is coming more or less directly from that investment and these questions have a ring of immediacy to them.

And if your product is a consumer-facing direct sales widget, you may be facing some mysteries (aren’t we all), but the numbers are probably relatively clear. Unless it’s sold through retail partners. Enterprise software sales though... when your product starts at “new car” and can cost up to “Central Park penthouse”, measuring performance starts getting strangely difficult.

Enterprise software is sold to customers who don’t always want to be clear about how much they are willing to pay or when they are willing to pull the trigger. At the very least, this means that the deal data is unclear and may change for reasons that don’t involve your product.

Enterprise software is sold by sales people, and sales people are maximizing their compensation plan and pipeline. At the very least, this means that entering the data you want into Salesforce is pretty low on their priority list. At the extreme, it can mean a variety of bad behaviors, particularly if the sales person is not actually competent. They might have good reasons for sandbagging deals or inflating pipeline, or they might have heard it was a good idea somewhere and misunderstood the reasons... but all sorts of craziness can happen in an enterprise sales team.

Setting aside the truly bizarre behavior of a failing team, sales leadership might try a bunch of mechanisms to deal with the normal lack of clarity. Favorites include:
* Dedicated people who force the deal to make sense. They might be called something like “sales operations” or “deal desk” or “contract specialists”, or the function might be overloaded onto an inside sales team. The resulting organization is simply fatter than before, because there are still customers and salespeople with their own motivations and context sitting between the data and the organization.
* Punitive policies: the deal won’t be booked or the sales person won’t be paid if all the reporting isn’t done in a correct and timely fashion. This is an amusing game of chicken because the company willing to a) not sell product or b) risk a lawsuit over a principle of report quality has got their priorities seriously backward. Actually going through with such a threat is a great way to lose customers and sales people, which really reduces your sales numbers.
* Rewarding policies: the sales person will get a toy or points toward the yearly club or public recognition for doing their reporting in a correct and timely fashion. Again, simply amusing, because this data is not worth an incentive large enough to motivate a sales person worth hiring. A good sales person in enterprise software makes very large amounts of money. You can play on their sense of camaraderie and you can ask them to be diligent, but those factors are present without the incentive. In order to incent, the prize has to mean something in relation to their compensation, and that number is not small. Furthermore, the sales person’s job is to make the customer’s organization complete the purchase, and the incentive has no impact at all on the customer. Is it supposed to make the sales person work harder? Then why isn’t it simply paid to them as part of their total compensation? Are they supposed to give the incentive to the customer’s purchasing department? Sorry, that’s illegal bribery.

Given the ineffectiveness of these interventions, why do companies pursue them? Any generalization will miss a lot of examples, but I am fond of two explanations: the manager who is more comfortable with spreadsheets than conversations, and the manager who isn’t sure what to do, so they do something that they understand.

So we return to thinking about what the deal data is worth... there’s a quote popularly attributed to Charles Babbage, “Errors using inadequate data are much less than those using no data at all.” If the data in Salesforce can be considered directionally correct but untrustworthy in detail, it is still useful for the purposes listed above. You can manage with it. You’re working with a string and a rock instead of a titanium yardstick and a laser level, but a lot of buildings have been erected like this. Improving the quality of your sales measurement tools is worth very little when compared with double-checking their results: talking about specific deals with sales people and customers can be remarkably illuminating.

Working with a Coach

Executive coaching and mentorship is an interesting part of modern business, and sometimes people are not prepared for taking advantage of it. Here are some notes on the purpose and value.

So you’re working with a coach to get better at execution... what are you going to say?

As a potential mentee, start with outside boundaries, the Overton window. A corporate coach is not a therapist; the need is to find performance problems and optimize teamwork. More Dave Righetti, less Sigmund Freud. You may uncover reasons to work with a therapist, because leadership work involves you as a person far more than individual contributor work does. Your emotional resources are going to be used when you lead, inspire, console, and support others. The coach is not the person to help with personal maintenance, but they might be able to show you what you need. Fixing those problems is work to do elsewhere.

Within what you’re willing to discuss, locate the secrets. Why don’t you want that opinion known? Is the reason rational and explainable to an outside coach? Is that reason a you problem or an organization problem?
Ideally you’re not trying to discuss anything that your team couldn’t already know, and the question is how to be more effective in communicating your thinking, persuading others to agree, or accepting that your opinion didn’t prevail and the team is going elsewhere. If the situation doesn’t look like that, there’s something to fix. Is it you, or is it the organization? Is it reconcilable, or will it end in parting ways? Might as well focus on figuring that out before you bother with unpacking whatever secret has led to this realization.

Now, back to the coach. This person fits into one of three boxes: A mentor who is helping you for charity and their own growth, a coach that your organization paid for, or a coach that you paid for.

Mentors may or may not be encouraged by the organization, and there may or may not be a formal mentorship program; in my opinion those parameters only affect the degree to which the mentor is actually willing and able to help. A mentor who is able to see problems and help you fix them is helpful, regardless of what structure the organization provides or does not provide. Notably, a mentor’s time is limited and you need to aim for efficiency. If you don’t have a specific and achievable improvement goal in mind, you’re not getting mentorship, you’re having coffee with a coworker.

The existence of an organization-paid coach may open some questions: why is this person assigned to me? Are they being hired for others as well, or am I singled out? You may feel like you’re being prepared, for greatness or for unemployment. It’s best to assume this is a positive investment on the organization’s part in order to make you (and potentially others) a better leader. There is little to be gained from indulging in paranoia. However, it is fair to note that a person hired by the organization has loyalty to the organization as well as to you. You should think, and communicate with the coach, about what information you are willing to share. Again, time is limited, but you probably don’t know what the budget is; it’s best to get what you need fast.

A coach that you pay opens a different kind of risk: they are potentially seeing the organization through your eyes alone. Even if they’re able to attend your meetings as your guest, they are always getting a filtered experience. Ideally the fact that you’re paying will help you focus, but that’s not always the case; you should plan for a fixed number of sessions up front and clearly state your immediate and achievable goal.

A good coach will see small and large blockers, and they’ll sound trivially obvious. You may even already know you have these issues. “Discussing the problem and constraints first is good for engineers, but bad for executives; start with your proposal instead of how you got there.” “Don’t cross your arms and look away when you’re asked a question, you look defensive.” It’s not helpful to discuss the existence of the blocker; move on to their recommendation, and sincerely try that recommendation at least three times so you’ll know if it was the right thing.

Bad outcomes can occur in mentorship and coaching, but by far the more common result is an ineffectual series of meetings. If the mentor is unable to provide helpful advice, or if the mentee is unable to act on the provided advice, or if the actual problem is outside of the mentee’s control, then coaching is really only relevant to a future situation where the mentee is hopefully able to use what they learn.

Sunday, July 28, 2019

Team Scars


You know those signs in every shared kitchen? “Please rinse your dishes and put them in the dishwasher”, “please don’t leave food in the fridge over the weekend”, &c? They happen because people were piling dishes in the sink and letting food rot. Something had to be done, and the sign was the result. You may meet people that are imagining signs like that around unexpected aspects of their work. It can be surprising because these signs are invisible.

“Ticket validation should be a separate stage in the workflow”. Why? Why not? This isn’t an objective proposition, it’s a reaction to shipped bugs that more careful validation might have caught. So a manager adds a step in JIRA... which is completely pointless unless the team agrees to actually do more testing. Just as likely, the developers now resolve and validate where they used to just resolve, without changing anything about shipped quality. Dang, the kitchen is still dirty! So maybe a developer speaks up.

“Please write unit tests for all of your code,” says Alice. She’s sick of being on call for dumb logic errors. Like washing the dishes, this is a shared code hygiene issue, and hard to argue with. A unit test is an alarm that the code might no longer be working as intended. It costs little, and it’s helpful. A policy that you should have one at check-in? That’s a helpful reminder. But a policy without enforcement is really a guideline, so maybe Bob wants to talk carrots and sticks. Now there’s resentment, because Charlie doesn’t agree with the goal. Worse, they may be right.

Each team member brings their past experiences to this conversation, and can easily spend a lot of time talking past each other. I’ve had one very bad experience and several mediocre ones with test-driven development teams, so I’m biased against the concept even though I appreciate the theory. I have to focus on being constructive and evaluating each new proposal for TDD on its own merits.

That’s how to do the job... what about who’s doing what job? Defining the actual roles that team members are going to play is an even bigger minefield. “Product managers should be technical enough to engage in code reviews” ...or is it “Product managers should be so non-technical that they can’t follow the standup”? “Teams should have dedicated quality assurance engineers” versus “Developers should take full responsibility for code quality.” There are software people out there who’ll passionately argue for any of those statements. That passion might come from recently reading an Agile textbook, but it’s more likely to be from remembered experience. “Darlene was really helpful, she’d even fix bugs between customer calls!” “Eddie was kind of a seagull, but he was too scared to bother us much so that was okay.” “Frankie gave clear direction and then got out of the way.”

As long as someone is doing the various tasks that need doing, the team can still be successful no matter what. But the larger organization wants to see repeatable results and clear accountability, so it’s not great to have paper PMs writing code while paper engineers define requirements. Furthermore, in software culture the work occupies most of our time and headspace. Our job function and personal identity can become tightly interwoven, which makes questioning of that function feel like an attack. “What would you say you do here?” Sometimes a well-intentioned role re-definition leads to angry resignations, as when a field engineering team is asked to do more consultative selling and less field repair work.

What to do and who will do it. These are all people problems: mental models that need communication, a shared consensus that needs building, a plan that needs to be followed. The only way to get there is to talk it out. Woe betide the team imagining one-size-fits-all technical solutions.

Sunday, June 16, 2019

Changing the Company


I’ve written a bit about mishandled change attempts — everyone loves a little schadenfreude, and failure is easier to spot than success.

This does not mean I think it’s wrong to change: you can’t keep selling buggy whips or spellcheckers when the market for them disappears. Let’s set aside the strategic question of recognizing that the need for change is there: it’s somewhat data driven and somewhat emotional and a huge topic on its own. But tactically speaking, once the decision is made, how to proceed?

First, an inflection point must be identified. Can the current business remain profitable long enough to keep the company alive while the new business starts up? If yes, then you’re planning for a relatively smooth transition. The new business takes off, the old business lands, management feels profound relief and it’s high-fives all around.

Product: If your company can gradually transition into a new form, then launching new products is a great way to get there. The most obvious example of transformation via product is Apple. Famously vertical, Apple has gradually shifted from computers to music players to compute appliances, but always from the same brand and always offering the same core value proposition. Amazon Web Services appears to be on a similar trajectory. If their original business of rented server instances sunsets in favor of pure server-less, it would be a smooth transition and represent very little change in the vendor-customer relationship.

Business unit: If a more intense change is needed, you may need to start the new business as a separate vertical unit. Microsoft has been fairly successful at this with Azure, while many others have done worse (such as Intel Online Services). A business unit is ideally a separate business. Some fail to actually separate, and remain dependent on the parent until they fold back in. Some fail because they blatantly compete with the parent instead of outside challengers. A successful business unit helps its parent to produce a new business. Azure is a good example of this — it enables Microsoft to continue selling the same value chain into a changing enterprise marketplace. The same approach can be seen in enterprise software companies that sell their products as a managed service, such as Atlassian.

If the window for change is too short though, and the company can’t survive on existing products, then you’re planning for a rough ride. “Any landing you walk away from is a good one” in this case. Company wide transformations mean stopping the old business and starting a new one. There are not a lot of examples of success, but one really stands out: Netflix pulled this off, after a nasty misstep. Good luck if this is your situation, I’d love to hear of another success!

Wednesday, June 12, 2019

Multi-tenancy in platforms

If you’ve built a monolithic enterprise product, it is not sensible to convert it to multi-tenancy. You can get to a managed service provider (MSP) model, but you’re not going to get to software as a service (SaaS).

Often no one wants to discuss reasoning at all, because the need to convert your business to a different model is taken as an imperative which overrides any reasoning. Present problems are ignored because the future state is too valuable to pass up. Unfortunately, “skating to the puck” in this case is leaving the rink, and the result looks like disrupting yourself.

But what about customers who demand multi-tenancy? There are very few customers who actually need multi-tenancy features. Let’s take a moment to clarify what these features are, because a lot of folks confuse multi-tenancy with role-based access control (RBAC).

Multi-tenancy lets you operate a single instance of a product for multiple groups of people, keeping content, capabilities, and configurations hidden from users who aren’t members of the correct group. Multi-tenancy features allow for a super administrator who can configure which tenants are part of which environments. They allow for tenant administrators who can configure which users and groups are part of a single tenant. They allow for tenant users who do the job that the software is for.
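To make the distinction from RBAC concrete, here’s a minimal sketch in Python with hypothetical tenants and roles; it’s illustrative of the feature set, not any particular product. Tenant isolation comes first, and roles only matter within a tenant.

```python
# Sketch of the roles multi-tenancy implies: a super administrator manages
# tenants, tenant administrators manage their own users, and every data
# access is filtered by tenant before any role check. Names are made up.
RECORDS = [
    {"tenant": "business-unit-a", "id": 1, "data": "A's config"},
    {"tenant": "business-unit-b", "id": 2, "data": "B's config"},
]

USERS = {
    "root":  {"role": "super_admin",  "tenant": None},
    "alice": {"role": "tenant_admin", "tenant": "business-unit-a"},
    "bob":   {"role": "tenant_user",  "tenant": "business-unit-b"},
}

def visible_records(username: str):
    user = USERS[username]
    if user["role"] == "super_admin":
        return RECORDS
    return [r for r in RECORDS if r["tenant"] == user["tenant"]]

print([r["id"] for r in visible_records("alice")])   # [1] -- never sees B's data
```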

Most customers do not need multi-tenancy features for themselves, they need to be tenants and they hope that the vendor is using modern cloud techniques to deliver features cheaply. Maybe they want administrivia to be separated or hidden away so that the result is delivered as a service. This doesn’t mean the customer wants multi-tenancy. It means the customer wants your software as a service.

The exception is the customer who plans to provide your software to their own customers as a service. This customer does want multi-tenancy features: they want to manage the access rights of business units A, B, and C. This customer is not going to be happy with a stack of singleton instances of the software. This customer needs a professional services development partner, or perhaps a different software vendor. They are asking the vendor to sell a product that allows configuration and maintenance of a multi-tenant environment.

The whole conversation rests on an assumption that the resulting software environment will be simpler and cheaper to set up and use than a stack of singleton instances. I see no evidence for that assumption when the software in question was not designed from the ground up as a multi-tenant application. It’s far better for the vendor to offer their software in singleton mode as a managed service. This effort will inevitably produce the tools necessary to support and automate software installs, which the company can decide to sell to selected customers if they like; but those tools do not need to be exposed to all customers.

Sunday, June 2, 2019

Why is open source content rare?

Open source community incentives are biased to prefer developers over content creators.

Open source communities are particularly prone to this failure mode. After all, the developers in the community are all doing their work for valid reasons, so why wouldn’t content creators join them? Hot take: the incentives are different.

Open source development is a resume-building value add for the developer. They’re publishing concrete proof of their ability to write working code. In some cases that code even solves interesting problems. In the best cases the developer is proving that they can work in a distributed team.

This effect continues for a dedicated developer writing content, but that developer isn’t always in a good position to write content without the help of customer-facing consultants, engineers, and analysts.

For a developer, the social reward for providing quality content is not the same as the reward for providing quality code. You might think this is driven by a technical difference. Isn’t writing a configuration file or a test file easier than solving an engineering problem in a compiled language?

Well, maybe. For instance, writing content that reliably and optimally finds all of the vulnerable Java engines across an entire organization is far harder than any whiteboard coding test. (Hint one: a JRE doesn’t have to be registered with the operating system in order to operate. Hint two: crawling the file system is very costly. Hint three: you can’t rely on OS indexing features being enabled.)
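To show why that’s genuinely hard, here’s a sketch of one slice of the problem in Python (standard library only, with assumed conventional install paths): find Java runtimes without crawling the whole file system. Real content would also have to inspect running processes, per-user installs, and Windows registries, and bound its own resource use.

```python
# Sketch: look for java executables under a short list of likely roots,
# with a depth limit instead of a full file-system crawl. The paths listed
# here are assumptions, not a complete inventory of where JREs can hide.
import os

LIKELY_ROOTS = [
    "/usr/lib/jvm",
    "/usr/java",
    "/opt",
    "/Library/Java/JavaVirtualMachines",
]

def find_java_binaries(max_depth: int = 4):
    hits = []
    for root in LIKELY_ROOTS:
        if not os.path.isdir(root):
            continue
        base_depth = root.rstrip(os.sep).count(os.sep)
        for dirpath, dirnames, filenames in os.walk(root):
            if dirpath.count(os.sep) - base_depth >= max_depth:
                dirnames[:] = []   # prune descent; crawling is the real cost
                continue
            if "java" in filenames:
                hits.append(os.path.join(dirpath, "java"))
    return hits

print(find_java_binaries())
```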

 Worse, the risk level is higher for the developer writing content: the content is an incomplete starting point, the user has to learn more to be successful, and the failure potential is increased. So the developer’s risk-reward ratio is skewed away from writing content and towards writing engines.

What about professional service consultants? Don’t they spend every billable hour writing content? They sure do, and billable is the key word there. They’ll only release their work to open source when it’s no longer a competitive edge: too commonplace or esoteric to be regularly valuable. Again, misaligned incentives blocking open source content.


Sunday, May 26, 2019

Supporting Ancient Software

With another round of fixes to Windows XP, the time is ripe for bloviating about supporting ancient stuff. Every software vendor has to decide what to do about supporting what they used to ship, as well as the broader ecosystem around them. Operating systems, databases, service providers. Maximize use of your new features, minimize maintenance of your old ones. Maximize the number of potential customers, minimize the amount of development time required.  Keeping support for old systems looks like it’s on the maximizing side at first, but it’s an exponentially scaled problem when combined with your own features. It’s worth considering how the vendors of those ecosystem parts do things.

Why do customers stay on an antiquated platform? Perhaps they can’t afford the upgrade job, or perhaps they’re focused elsewhere and willing to accept the risk. For a software vendor, the former is a questionable customer; landing and keeping them may be profitable, but it won’t be great margin. Ah, but the latter... a vendor can charge the latter appropriately for the work to be done through a special one-off development effort. Welcome to the world of extended support contracts.

“Oh come now”, one might say, “that is not charitable at all!” And it’s true, there are nuances: many customers depend on equipment that cannot be upgraded. It was sold as a unified system, its vendor will not provide an upgrade at all or at an affordable cost, and its vendor will not support updates to the system. This sucks. What is the manufacturer or hospital or university to do, fund a new robot or MRI or TEM vendor? And yet from the vendor’s perspective, the predicament of customers who can’t upgrade is not distinguishable from the customers who won’t. They’re still stuck on the dead branch, forced to pay what the market will bear or take the risk of going unpatched. Once again, we are in the world of extended support contracts.

So there’s patches for the dead and unsupported OS from time to time. Who makes them?

I suppose it’s possible that there’s an XP engineering team at Microsoft sitting around on mothballs waiting for the opportunity to fix this stuff, but I’m guessing that is not the case. I think it’s highly unlikely that these patches ever come entirely from a vendor’s internal development teams, because it would be wasteful to maintain the systems and processes to produce two different levels of supportability for a single product, much less maintain a dead product. It would be doubly expensive to pull developers off of the current Windows line into a one-off effort to fix the dead product. More likely, when it breaks badly enough to need fixing, a new development team parachutes in, figures it out, and posts a patch. I’d bet that development team is outsourced, too, at least to another team within the vendor.

That would mean every patch is a special snowflake, provided by giving source access to a services team that charges to sustain it. The vendor collects extra support contracts from X customers to pay for the super smokejumper team, recognizes that revenue every month, and about once a quarter has a patch built. Not hard to make that into a profitable, high-margin business. In fact, if a vendor kept this in-sourced and gambled on one or two developers to maintain their knowledge, they could even defer the cost of the super smokejumper team for quite some time.

A third party software vendor has the opportunity to make the same decision, of course; should they spend their developer time on extending support to old software, or new? The answer is driven by their customers, in theory — but the vendor must evaluate the value of each decision. For a vendor with a small customer base, each customer demanding an oddity can be a significant percentage of revenue potential. For a vendor with a large customer base, each oddity request can have a significant number of requesters. What’s not clear is the associated margin opportunity versus opportunity cost. Worse, there won’t be requests for the obvious choices, because they’re obvious, and a PM would be mistaken to ignore them just because customers never had to ask.

If the vendor embraces the requested oddity, putting aside the non-requested mainstream, the customer should theoretically pay extra for their decision to stay on the old platform; otherwise the vendor is eating their own opportunity cost. The dollars spent on patching old stuff or extending features to old stuff are taken directly from the budget to do new work with. And since most vendors don’t have internal permission to use external super smokejumpers, they’re pulling developers off of (say) Mongo support to build (say) DB/2 support. This adds context-switch costs to the overall pain load.

Adding salt to this wound, many vendors end up giving the customer a hefty discount while bending over backwards to provide one-off snowflake features, robbing their future Peter to pay the present Paul. It’s an easy decision to make when the company’s leaders allow profit-making to be deferred into the invisible future.

Sunday, May 5, 2019

Land and Expand Packaging Decisions


Subsets of packaged content are needed in different system classes. If you're pursuing a land-and-expand model, then you need to have a way to expand. One way is to ship a static monolith with features turned off. Another is to ship dynamic add-ons to your base product. 

Teams make these dynamic vs static decisions early, see https://www.monkeynoodle.org/2018/06/its-not-platform-without-partners.html. If the business went dynamic, then content names and versions that are already deployed must be visible. If on-prem with high availability & fault tolerance goals, this can be remarkably challenging.

During installation, upgrade, or removal operations, the admin must fully understand the infrastructure and know more about the internal workings of the packaged content than anyone desires. Proceeding without understanding produces unpredictable installs and high support burden.

Any enterprise vendor with this problem decides: hide the complexity and offer one big package (fully static linking, or shipping the whole product as a service), or expose the complexity and offer separate packages for every role? Plus regional availability problems in cloud.

Option 0 (do nothing): You might say, "this is a relatively infrequent problem; when a customer goes to distributed component infrastructure, we train heavily and plan for dedicated support allotments."

Option 1 (incremental): design the infrastructure so each component can announce what roles it uses, design the package so each file in it is associated with a role, design the package installer to install files that match roles. User repeats desired action on every component.
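A minimal sketch of what Option 1 looks like, with a hypothetical manifest format and role names: each packaged file is tagged with a role, and the installer lays down only the files whose roles match the component it’s running on.

```python
# Sketch of role-filtered installation (Option 1). Manifest format, file
# paths, and role names are all hypothetical.
PACKAGE_MANIFEST = {
    "name": "detections-pack",
    "version": "2.3.1",
    "files": [
        {"path": "searches/java_vuln.conf", "role": "search_head"},
        {"path": "parsers/java_vuln.xml",   "role": "indexer"},
        {"path": "inputs/jre_inventory.sh", "role": "forwarder"},
    ],
}

def files_for_component(manifest, component_roles):
    """Return only the files this component should install."""
    return [f["path"] for f in manifest["files"] if f["role"] in component_roles]

# The admin repeats this on every component; Option 2 below wraps the same
# logic in a central deployer that enforces it everywhere.
print(files_for_component(PACKAGE_MANIFEST, {"search_head"}))   # ['searches/java_vuln.conf']
```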

Option 2 (radical): As above, but a separate deployer policy enforcement service ensures packages are installed, updated, and removed from all infrastructure. User commits desired action once on the policy tool. This is easiest for Cloud-only organizations.


For a sick sort of fun, look at how many times operating systems and programming languages have recreated this wheel since 2000.

Sunday, March 24, 2019

What does your product disallow?

Product design sometimes opens an interesting can of worms: things that may be possible to do, but which the designer didn’t intend. Do you hide these paths or not? Will your user ultimately be frustrated, or satisfied?

The answer depends on whether your product’s design accurately sets and meets expectations for the majority of its user base. Let’s try a quadrant.

Vertical axis: complexity level of the typical user’s actual need
Horizontal axis: designer’s assumption of typical user’s need

Lower left: If the task that the user’s trying is simple and the designer has assumed this to be true, a prescriptive interface that hides everything but the happy path makes sense. Don’t offer options you haven’t planned for, and narrowly design for specific use cases. For example, the basic note taking app Bear has few options, easy discoverability, and a low bar to entry. I’m writing this article in it on an iPhone with a folding keyboard, and it’s a good tool for this task.

Upper right: If the user’s actual need is highly complex and the designer realizes this, then a wide open toolbox interface makes the most sense. Guard rails on the user’s experience are as likely to produce complaints as relief. The vim text editor is hard to use correctly without training, and its design makes no pretense towards friendly or easy. If I want to anonymize and tokenize a gigabyte of log files, vim is a good tool.

Upper left: If the user is doing something complex and the product does not support this complexity, they are unlikely to be happy using it to complete the task. I would find it very difficult to write a script or process logs with Bear on an iPhone. It does not support me or aid me in that task because its assumptions do not correctly align with what I will need to do that task. I’ll be frustrated in doing this task, but I’m still allowed to try.

Lower Right: If the user’s needs are low complexity and the designer is assuming a high complexity need, the product is going to be very frustrating. For instance, using the vim text editor to take simple notes in a meeting is possible, but a user who is not familiar with the editor will struggle with its modes and may not even know how to save their file and exit.

Alignment is alignment, straightforward enough. The choices made in products for moments of non-alignment are more interesting. If the product overshoots the user’s need it frustrates with a lack of clarity. The user struggles to see if the product is able to do the task, exploring the interface and searching for an answer: should they invest more time into learning the product, or switch products? If the product undershoots the user’s need, it’s clearer, and the user moves on quickly.

So far so good with text editing. What about an enterprise-scale policy enforcement tool? For much of my career I’ve worked with tools that empower the enterprise to see what’s true and make it better. Some products have focused harder on different aspects of this mission, but everything I’ve ever sold has been able to cause massive damage if misused. What’s more, it’s not theoretical: some customer has done that damage, and all enterprise software vendors have off-the-record stories. That includes the everything as a service folks, of course, and commonly enough that stories about those accidents are public knowledge.

And yet, all of these products or services overshoot the complexity target and err on the side of flexibility. They may offer use-case specific wizards for specific tasks as extra cost add ons, but you can always get to the platform’s full capabilities if you’ve got administrative rights.

Why is that? People have often heard me say the flippant phrase “we sell chainsaws, up to the user to be careful”, but why does that resonate? As a vendor it might look like abdication of responsibilities... but it is a free market, in which make-X-easier startups fail every day. I think the reason is that protecting people from themselves is not a good look. It’s far more effective to produce the powerful product for complex stories, allow full access to that power, and add easier tools as extra cost options.

Saturday, January 26, 2019

Entities and Attributes


Quadrant models are useful organizing tools. Let’s use one to look at the problem of managing the attributes of entities in systems visibility. I’m not expecting to solve the problem, just usefully describe the playing field.

Horizontal axis:
* persistent entities with changing attributes
* ephemeral entities with static attributes

Vertical axis:
* Set the relationship at index time
* Set the relationship at search time

Let’s start with the old school entities model. Once upon a time managed computers were modular, high value devices. A server or desktop would be repaired or upgraded by replacing components. If its use case went away, the device would be repurposed to another use case. It did not vanish until accounting was certain it had fully depreciated and could be sold, donated, or scrapped. This state of affairs persists at the high end, so it’s still worth considering. Someone’s racking and stacking some pricey hardware to make things go.

Top Left: Persistent entities, Index time relationships


The computer (let’s call it THESEUS) has a timeline of footprints. Business Analytics teams can see it in their Enterprise Resource Planning (ERP) systems, contract tracking, and accounting systems. Facilities knows it by its power draw and heat load. Security Operations has interest in it, and their agents come and go with the vicissitudes of fortune. Lights On Doors Open (LODO) Operations cares the most about it, and tracks it closely as it serves each purpose of its lifecycle.

Each group’s view into the computer’s function is limited by their immediate needs. Most of the time, the various teams are happy with their limited view into this entity. They can set any needed attribute-to-entity mappings at index time, when data about the device is collected. Changes don’t matter much, and can be manually updated or ignored.
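For a concrete picture, here’s a minimal sketch of index-time enrichment. The ingest function, asset record, and field names are hypothetical, not any particular product’s schema; the point is only that attributes are stamped onto the event at collection time.

```python
# A minimal sketch of index-time enrichment: a hypothetical ingest step stamps
# each event with whatever the asset record says right now. All names and
# fields are illustrative.
ASSET_RECORD = {
    "owner": "LODO Operations",
    "location": "Rack 12, DC-East",
    "cost_center": "CC-4401",
}

def ingest(event: dict) -> dict:
    """Copy the current asset attributes onto the event as it is indexed."""
    event.update({f"asset_{key}": value for key, value in ASSET_RECORD.items()})
    return event

stored = ingest({"hostname": "THESEUS", "msg": "disk temperature warning"})
print(stored)
# If THESEUS later changes racks or owners, this stored event still carries
# the old values: manually update them, or ignore the drift.
```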

Bottom Left: Persistent entities, Search time relationships


This works until change spills over between groups: for instance, if a missed recall for a faulty part leads to a hardware failure that starts a fire, Facilities and LODO will be equally interested in how they could have better coordinated with Business Analytics functions. “Where was this ball dropped?” The answer is often “changes in reality were lost because we don’t keep proper track of entities.” Of course no one states it like that. Rather, they say “we received the recall and sent it to the point of contact on record.” This scenario plays out in security as well, when incident responders can’t find out if an attacked device is safe to restart, or when a monitoring tool alert is how they learn that DevOps is rolling out a new service. These misses in visibility drive folks towards mapping attributes to entities at search time. Of course, no one says it like that; they say “we’re bringing updated data into our visibility tools.”

There’s a dirty secret in those tools though. Actively mapping attributes to entities at search time is a hard problem to scale, and it gets harder still if you want to maintain that awareness across past records as well as the present. Few systems can handle “this was SVR42 before Tuesday, now it’s SVR69”. Add the wrinkle that the entity’s behavior has changed, so the old model is still good for old records but a new model needs to be started, and most tools give up and start a new entity record. Sorry, administrators and analysts, here are the tools for pruning stale entities from the system, good luck!
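To make the SVR42/SVR69 problem concrete, here’s a sketch of search-time resolution with validity windows. The lookup table and helper are hypothetical, not a real product’s format; they just show why every historical record needs to be matched against the name’s time range, which is what’s hard to scale.

```python
# Search-time mapping with validity windows, so that "this was SVR42 before
# Tuesday, now it's SVR69" resolves correctly for old records.
from datetime import datetime
from typing import Optional

# (entity_id, name, valid_from, valid_to) as half-open intervals
NAME_HISTORY = [
    ("entity-007", "SVR42", datetime(2018, 1, 1), datetime(2019, 1, 22)),
    ("entity-007", "SVR69", datetime(2019, 1, 22), datetime.max),
]

def resolve(name: str, at: datetime) -> Optional[str]:
    """Return the entity id that carried this name at the given moment."""
    for entity_id, known_as, start, end in NAME_HISTORY:
        if known_as == name and start <= at < end:
            return entity_id
    return None

# An event from before the rename and one from after map to the same entity.
assert resolve("SVR42", datetime(2019, 1, 20)) == "entity-007"
assert resolve("SVR69", datetime(2019, 1, 25)) == "entity-007"
```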

Bottom Right: Ephemeral entities, Search time relationships


And so, a sea change: what if that whole set of reality-based problems is outsourced, and the organization uses ephemeral virtual devices built from static configurations to perform its tasks? Amazon and Microsoft still have to worry about physical hardware, but the rest of us can just inject prebuilt software bundles into a management tool and let the load balancers figure it out. As long as logging is done properly and auditing can still be supported, this can be a great answer. The unpredictable nature of this world has precipitated a tsunami of heisenbugs, but unifying development and operations reduces the lag time for diagnosing those. Furthermore, the attribute-to-entity relationship really doesn’t matter; who cares what address or hostname a function was served from? All that matters is service level objectives and agreements: success, failure, completion time, and resource consumption. It’s a pretty ideal solution for anything where the entity is disposable: simply stop tracking it at all, and use temporary search-time relationships based on the functions that were served to maintain visibility.
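A small sketch of what that looks like in practice, keyed on the function served rather than the machine serving it. The field names and the record_request helper are illustrative assumptions, not a prescribed format.

```python
# Sketch: with disposable instances, key the record to the function served
# rather than to the machine serving it. Field names are illustrative only.
import json
import time
import uuid

def record_request(service: str, outcome: str, duration_ms: float, cpu_ms: float) -> str:
    """Emit an SLO-centric event; the host identity is incidental metadata."""
    event = {
        "service": service,            # what was served: the durable key
        "outcome": outcome,            # success or failure
        "duration_ms": duration_ms,    # completion time
        "cpu_ms": cpu_ms,              # resource consumption
        "instance": str(uuid.uuid4()), # disposable; meaningless next week
        "ts": time.time(),
    }
    return json.dumps(event)

print(record_request("checkout-api", "success", 184.2, 31.0))
```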

Top Right: Ephemeral entities, Index time relationships


Although... that requires the analyst to know what they need to track up front. If the image doesn’t emit enough data attributes to answer a question, you’re out of luck. That’s annoying for internal visibility, but the internal folks aren’t the only ones asking questions. A hypothetical: there was an instance on Tuesday, let’s call it EPHEMERIDES. Interpol would like to know what it was doing at 10 AM UTC because it was apparently exploited and used for evil deeds. Or maybe not? Who knows? In the long-lived server world, we would have been dumping all output into a central system and could sort through it on demand, but now we just know that it was doing its intended job within acceptable parameters. That’s all we’d decided to monitor. If we’re not proactively tracking the organization’s activities from its infrastructure, we’ll have to track something else to achieve visibility.

Why bother? Well, let’s talk about how you’re going to prove compliance with data privacy regulations or due-diligence security assurances when you can’t say what happened where last Tuesday. “Trust us, we’re pretty sure it didn’t do anything bad in the few hours it was alive” may not wash in court. An easy solution is to dump whatever you can from these devices into the cheapest storage possible, with some index-time identifiers to make it (hopefully) retrievable later. And if that’s not possible? Oh well, at least we tried and the fines will probably be reduced.
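As one possible shape for that “dump it somewhere cheap” fallback, here’s a sketch. The bucket name, key layout, and use of boto3 are assumptions rather than a prescribed design; the point is that the object key carries enough index-time identity (date, image, instance) to make the data findable when Interpol comes asking about Tuesday.

```python
# Cheap archival with index-time identifiers baked into the object key.
import datetime
import gzip

import boto3  # assumes AWS credentials and region are already configured

s3 = boto3.client("s3")

def archive_logs(instance_id: str, image_id: str, raw_logs: bytes) -> str:
    day = datetime.date.today().isoformat()
    key = f"raw/{day}/{image_id}/{instance_id}.log.gz"  # identifiers live in the key
    s3.put_object(
        Bucket="cheap-log-archive",   # hypothetical bucket
        Key=key,
        Body=gzip.compress(raw_logs),
        StorageClass="GLACIER",       # the cheapest storage possible
    )
    return key
```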

* Top left is the legacy, a revolution against the mainframe. It’s servers-as-pets, deploy-then-configure thinking.
* Bottom left is the legacy incremented, insufficiently.
* Bottom right is the future, servers-as-cattle, configure-then-deploy thinking. A revolution against the Wintel and Lintel server world, of course.
* Top right is the future incremented, insufficiently.

Hopefully this has been fun and useful!

Sunday, January 13, 2019

How to manage a Proof of Concept



POCs as a concept are a response to customers getting oversold. As a vendor, we’d rather skip the whole thing and trust our sales team to scope properly. As a customer, we’d rather not spend time testing instead of doing. Sometimes they have to be done though, and it’s best for everyone to do it right. Right = tightly scoped and time-boxed.

An ideal POC should look like a well-planned professional services engagement. The goal is established in writing before anyone gets on a plane. Infrastructure testing and a go/no-go call Friday. Fly Monday. Kick-off meeting Tuesday morning, installation the rest of the day. Wednesday and Thursday, go through the list of use cases and check them all off. Friday morning meeting to get the verbal, fly home, and spend the next week with procurement instead of kicking the tires.

You should spend more time helping a customer or vendor define use cases up front than you allow for the POC itself. If they can’t define use cases, you might still have a deal, but you’ve established that the product is not worth any actual dollars. That’s bad for the vendor obviously, but it also means the customer can’t get any internal attention for this project. Real use cases mean business value, and business value means time and money allocations. If there is demonstrable value, there is easy justification for a fair price.

Given my one-week frame, you’ve got a maximum of 16 hours for use cases. This is a bit more time than a circa-2018 Nicolas Cage binge. If it can be remote, great! Travel time can turn into work time for a maximum of 32 hours. That’s a playthrough of Far Cry 5. Planning ahead of time lets both sides think about how long each step will take. Estimate how long each use case will take to demonstrate, then double or triple that time. If you don’t need those hours, you’ll have time to get creative after the real work is done. Both sides should bring a punch list of extra things they’d like to show off or see.
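A back-of-the-envelope version of that budgeting, if it helps: the use cases and hour estimates below are made up for illustration, while the 16/32-hour ceilings and the 2-3x padding factor are the ones argued for above.

```python
# Budget check for the one-week POC frame: 16 on-site hours, or 32 if the
# travel days become remote work days. Estimates are illustrative.
use_case_estimates_hours = {
    "data onboarding": 2.0,
    "dashboards": 1.0,
    "alerting": 1.5,
    "role-based access": 1.0,
}

def padded_total(estimates: dict, factor: float = 3.0) -> float:
    """Total demo time after multiplying every estimate by the padding factor."""
    return sum(estimates.values()) * factor

budget_hours = 16  # use 32 if the POC is remote
total = padded_total(use_case_estimates_hours)
print(f"padded estimate: {total:.1f}h against a {budget_hours}h budget")
if total > budget_hours:
    print("trim the list, go remote, or extend the engagement")
```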

This ideal model can have a couple of interesting wrinkles based on product maturity though. A young company with a single product has a straightforward agenda, but a mature company with many products on a shared platform has to pick and choose. Marketing being what it is, the customer’s excitement is also centered on the newest, highest-risk stuff! The reality is that these are things that haven’t been done before, at least by the feet on the ground, so they take even more time.

The only way to be successful in that case is to compartmentalize the platform use cases from the shiny new use cases, so that you’re exercising the new stuff on a solid foundation. Everyone will be thankful in the end.

One last note on why this matters to customers: I’m describing the approach of quality field personnel, which is specifically intended to cast a product in its best light. This is good for customers because it makes the sale easy to explain and process. However, it is also how a pro sales team gets their crap product over the line, beating out weak sales teams with potentially better solutions. If you care about the quality of the solution you’re going to be living with, it’s in your interest to understand and manage the POC process.