Sunday, March 24, 2019

What does your product disallow?

Product design sometimes opens an interesting can of worms: things that may be possible to do, but which the designer didn’t intend. Do you hide these paths or not? Will your user ultimately be frustrated, or satisfied?

The answer depends on whether your product’s design accurately sets and meets expectations for the majority of its user base. Let’s try a quadrant.

Vertical axis: complexity level of the typical user’s actual need
Horizontal axis: designer’s assumption of typical user’s need

Lower left: If the task the user is attempting is simple and the designer has assumed this to be true, a prescriptive interface that hides everything but the happy path makes sense. Don’t offer options you haven’t planned for, and narrowly design for specific use cases. For example, the basic note-taking app Bear has few options, easy discoverability, and a low barrier to entry. I’m writing this article in it on an iPhone with a folding keyboard, and it’s a good tool for this task.

Upper right: If the user’s actual need is highly complex and the designer realizes this, then a wide open toolbox interface makes the most sense. Guard rails on the user’s experience are as likely to produce complaints as relief. The vim text editor is hard to use correctly without training, and its design makes no pretense of being friendly or easy. If I want to anonymize and tokenize a gigabyte of log files, vim is a good tool.

Upper left: If the user is doing something complex and the product does not support this complexity, they are unlikely to be happy using it to complete the task. I would find it very difficult to write a script or process logs with Bear on an iPhone. It does not support me or aid me in that task because its assumptions do not correctly align with what I will need to do that task. I’ll be frustrated in doing this task, but I’m still allowed to try.

Lower right: If the user’s needs are low complexity and the designer has assumed a high complexity need, the product is going to be very frustrating. For instance, using the vim text editor to take simple notes in a meeting is possible, but a user who is not familiar with the editor will struggle with its modes and may not even know how to save their file and exit.

Alignment is alignment, straightforward enough. The choices made in products for moments of non-alignment are more interesting. If the product overshoots the user’s need it frustrates with a lack of clarity. The user struggles to see if the product is able to do the task, exploring the interface and searching for an answer: should they invest more time into learning the product, or switch products? If the product undershoots the user’s need, it’s clearer, and the user moves on quickly.

So far so good with text editing. What about an enterprise-scale policy enforcement tool? For much of my career I’ve worked with tools that empower the enterprise to see what’s true and make it better. Some products have focused harder on different aspects of this mission, but everything I’ve ever sold has been able to cause massive damage if misused. What’s more, it’s not theoretical: some customer has done that damage, and all enterprise software vendors have off-the-record stories. That includes the everything-as-a-service folks, of course, and commonly enough that stories about those accidents are public knowledge.

And yet, all of these products or services overshoot the complexity target and err on the side of flexibility. They may offer use-case-specific wizards as extra-cost add-ons, but you can always get to the platform’s full capabilities if you’ve got administrative rights.

Why is that? People have often heard me say the flippant phrase “we sell chainsaws, up to the user to be careful”, but why does that resonate? As a vendor it might look like an abdication of responsibility... but it is a free market, in which make-X-easier startups fail every day. I think the reason is that protecting people from themselves is not a good look. It’s far more effective to produce the powerful product for complex stories, allow full access to that power, and add easier tools as extra-cost options.

Saturday, January 26, 2019

Entities and Attributes

Quadrant models are useful organizing tools. Let’s use one to look at the problem of managing the attributes of entities in systems visibility. I’m not expecting to solve the problem, just usefully describe the playing field.

Vertical axis:
* Long running entities with changing attributes (top)
* Ephemeral entities with static attributes (bottom)

Horizontal axis:
* Set the relationship at index time (left)
* Set the relationship at search time (right)

Let’s start with the old school entities model. Once upon a time managed computers were modular, high value devices. A server or desktop would be repaired or upgraded by replacing components. If its use case went away, the device would be repurposed to another use case. It did not vanish until accounting was certain it had fully depreciated and could be sold, donated, or scrapped. This state of affairs persists at the high end, so it’s still worth considering. Someone’s racking and stacking some pricey hardware to make things go.

Top Left: Long running entities, Index time relationships

The computer (let’s call it THESEUS) has a timeline of footprints. Business Analytics teams can see it in their Enterprise Resource Planning (ERP) systems, contract tracking, and accounting systems. Facilities knows it by its power draw and heat load. Security Operations has interest in it, and their agents come and go with the vicissitudes of fortune. Lights On Doors Open (LODO) Operations cares the most about it, and tracks it closely as it serves each purpose of its lifecycle.

Each group’s view into the computer’s function is limited by their immediate needs. Most of the time, the various teams are happy with their limited view into this entity. They are able to set any needed attribute-to-entity mappings at index time, when data about the device is collected. Changes don’t much matter, and can be manually updated or ignored.

Top Right: Long running entities, Search time relationships

This works until change spills over between groups: for instance, if a missed recall for a faulty part leads to a hardware failure that starts a fire, Facilities and LODO will be equally interested in how they could have better coordinated with Business Analytics functions. “Where was this ball dropped?” The answer is often “changes in reality were lost because we don’t keep proper track of entities.” Of course no one states it like that. Rather, they say “we received the recall and sent it to the point of contact on record.” This scenario plays out in security as well, when incident responders can’t find out if an attacked device is safe to restart, or when a monitoring tool alert is how they learn that DevOps is rolling out a new service. These misses in visibility drive folks towards mapping attributes to entities at search time. Of course, no one says it like that; they say “we’re bringing updated data into our visibility tools.”

There’s a dirty secret in those tools though. Actively mapping attributes to entities entirely at search time is a hard problem to scale, and it gets even harder to do if you want to maintain that awareness into past records as well as the present. Few systems can handle “this was SVR42 before Tuesday, now it’s SVR69”. Add in that behavior has changed and the old model is still good for old records but a new model needs to be started, and most tools give up and start a new entity record. Sorry administrators and analysts, here are the tools for pruning stale entities from the system, good luck!
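To make the rename problem concrete, here is a minimal sketch of a search-time lookup that keeps the history of which name pointed at which entity. Everything in it is an assumption for illustration: the `NameInterval` structure, the `asset-0001` identifier, the sample records, and the specific Tuesday are all made up, and no particular product works this way.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class NameInterval:
    """One period during which an entity was known by a given hostname."""
    entity_id: str            # stable identifier for the device (hypothetical)
    hostname: str             # name that appears in the raw records
    start: datetime           # inclusive
    end: Optional[datetime]   # exclusive; None means "still current"


def resolve_entity(history, hostname, when):
    """Return the entity that owned `hostname` at time `when`, or None."""
    for interval in history:
        if interval.hostname != hostname:
            continue
        if interval.start <= when and (interval.end is None or when < interval.end):
            return interval.entity_id
    return None


def search_by_entity(records, history, entity_id):
    """Apply the mapping at search time, record by record."""
    return [
        r for r in records
        if resolve_entity(history, r["host"], r["time"]) == entity_id
    ]


# Hypothetical data: the device was renamed on Tuesday, 2019-01-22.
RENAMED = datetime(2019, 1, 22)
HISTORY = [
    NameInterval("asset-0001", "SVR42", datetime(2018, 1, 1), RENAMED),
    NameInterval("asset-0001", "SVR69", RENAMED, None),
]

# The records were indexed with nothing but the hostname they reported.
RECORDS = [
    {"time": datetime(2019, 1, 21, 9, 0), "host": "SVR42", "msg": "disk warning"},
    {"time": datetime(2019, 1, 23, 9, 0), "host": "SVR69", "msg": "disk failure"},
]

if __name__ == "__main__":
    for record in search_by_entity(RECORDS, HISTORY, "asset-0001"):
        print(record["time"], record["host"], record["msg"])
```

Note that the lookup runs once per record per search, against a history table that someone has to keep accurate. That is the scaling cost described above, and this sketch does not even attempt the behavior-model half of the problem.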

Lower Right: Ephemeral entities, Search time Relationships

And so a sea change: what if that whole set of reality-based problems is outsourced, and the organization uses ephemeral virtual devices based on static configurations to perform its tasks? Amazon and Microsoft still have to worry about physical hardware, but the rest of us can just inject prebuilt software bundles into a management tool and let the load balancers figure it out. As long as logging is done properly and auditing can still be supported, this can be a great answer. The unpredictable nature of this world has precipitated a tsunami of heisenbugs, but unifying development and operations reduces the lag time for diagnosing those. Furthermore, the attribute-to-entity relationship really doesn’t matter; who cares what address or hostname a function was served from? All that matters is service level objectives and agreements: success, failure, completion time, and resource consumption. It’s a pretty ideal solution for anything where the entity is disposable: simply stop tracking it at all, and use temporary search time relationships based on the functions that were served to maintain visibility.

Lower Left: Ephemeral entities, Index time relationships

Although... that requires the analyst to know what they need to track up front. If the image doesn’t issue enough data attributes to answer a question, you’re out of luck. That’s annoying for internal visibility, but the internal folks aren’t the only ones asking questions. A hypothetical: there was an instance on Tuesday, let’s call it EPHEMERIDES. Interpol would like to know what it was doing at 10 AM UTC because it was apparently exploited and used for evil deeds. Or maybe not? Who knows? In the long-lived server world, we would have been dumping all output into a central system and could sort through it on demand, but now we just know that it was doing its intended job within acceptable parameters. That’s all we’d decided to monitor. If we’re not proactively tracking the organization’s activities from its infrastructure, we’ll have to track something else to achieve visibility. Why bother? Well, let’s talk about how you’re going to prove compliance with data privacy regulations or due diligence security assurances when you can’t say what happened where last Tuesday. “Trust us, we’re pretty sure it didn’t do anything bad in the few hours it was alive” may not wash in court. An easy solution is to dump whatever you can from these devices into the cheapest storage possible, with some index-time identifiers to make it hopefully retrievable later. And if that’s not possible? Oh well, at least we tried and the fines will probably be reduced.
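The "dump it with some index-time identifiers" idea can be sketched in a few lines. This is an illustration under assumptions, not a recommendation of a format: the field names (`instance_id`, `image_id`, `collected_at`), the JSON Lines layout, the `img-2019-01` image name, and the sample log lines are all hypothetical.

```python
import json
from datetime import datetime, timezone


def stamp_record(raw_line, instance_id, image_id, collected_at=None):
    """Wrap one raw output line with identifiers fixed at index time."""
    collected_at = collected_at or datetime.now(timezone.utc)
    return {
        "instance_id": instance_id,   # which ephemeral entity produced it
        "image_id": image_id,         # which static configuration it ran
        "collected_at": collected_at.isoformat(),
        "raw": raw_line,              # payload is stored as-is, unparsed
    }


def to_archive_lines(raw_lines, instance_id, image_id, collected_at=None):
    """Serialize stamped records as JSON Lines for cheap bulk storage."""
    return [
        json.dumps(stamp_record(line, instance_id, image_id, collected_at),
                   sort_keys=True)
        for line in raw_lines
    ]


def find_instance_activity(archive_lines, instance_id, time_prefix):
    """Later retrieval: filter on the identifiers, not on the payload."""
    hits = []
    for line in archive_lines:
        record = json.loads(line)
        if (record["instance_id"] == instance_id
                and record["collected_at"].startswith(time_prefix)):
            hits.append(record["raw"])
    return hits


if __name__ == "__main__":
    when = datetime(2019, 1, 22, 10, 15, tzinfo=timezone.utc)
    archive = to_archive_lines(
        ["GET /health 200", "POST /upload 201"],
        "EPHEMERIDES", "img-2019-01", when,
    )
    # What was EPHEMERIDES doing during the 10 AM UTC hour on Tuesday?
    print(find_instance_activity(archive, "EPHEMERIDES", "2019-01-22T10"))
```

The point of the design is that nothing about the payload has to be understood when it is stored. The identifiers are the only structure, which is exactly why retrieval is "hopefully" possible and not guaranteed to be useful.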

* Top left is the legacy, revolution against the mainframe. It’s servers as pets, deploy then configure thinking.
* Top right, legacy incremented, insufficiently.
* Bottom right is the future, servers as cattle, configure then deploy thinking. Revolution against the Wintel and Lintel server world, of course.
* Bottom left, the future incremented, insufficiently.

Hopefully this has been fun and useful!

Sunday, January 13, 2019

How to manage a Proof of Concept

POCs as a concept are a response to customers getting oversold. As a vendor we’d rather skip the whole thing and trust our sales team to scope properly. As a customer we’d rather not spend time testing instead of doing. Sometimes they have to be done though, and it’s best for everyone to do it right. Right = tightly scoped and timeframed.

An ideal POC should look like a well-planned professional services engagement. The goal is established in writing before anyone gets on a plane. Infrastructure testing and go/no-go call Friday. Fly Monday. Kickoff meeting Tuesday morning, installation rest of day. Wednesday and Thursday, go through the list of use cases and check them all off. Friday morning meeting to get the verbal, fly home, spend the next week with procurement instead of kicking the tires.

You should spend more time helping a customer or vendor define use cases up front than you allow for the POC. If they can’t define use cases, you might still have a deal, but you’ve established that the product is not worth any actual dollars. That’s bad for the vendor obviously, but it also means the customer can’t get any internal attention for this project. Real use cases mean business value, and business value means time and money allocations. If there is demonstrable value, there is easy justification for a fair price.

Given my one week frame, you’ve got a maximum of 16 hours for use cases: two working days, Wednesday and Thursday. This is a bit more time than a circa 2018 Nicolas Cage binge. If it can be remote, great! Travel time can turn into work time for a maximum of 32 hours. That’s a playthrough of Far Cry 5. Planning ahead of time lets both sides think about how long each step will take. Estimate how long each use case will take to demonstrate, then double or triple that time. If you don’t need those hours, you’ll have time to get creative after the real work is done. Both sides should bring a punch list of extra things they’d like to show off or see.
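The padding arithmetic is simple enough to sketch. The use case names and hour estimates below are invented for illustration, and `poc_budget` is a hypothetical helper, not part of any tool. The 16 and 32 hour limits are the ones from the paragraph above.

```python
ONSITE_HOURS = 16.0   # Wednesday and Thursday on site
REMOTE_HOURS = 32.0   # travel days reclaimed as work time


def poc_budget(estimates_hours, padding, available_hours):
    """Pad each estimate and compare the total to the hours available."""
    padded = {name: hours * padding for name, hours in estimates_hours.items()}
    total = sum(padded.values())
    return {
        "padded": padded,
        "total": total,
        "slack": available_hours - total,   # negative means over budget
        "fits": total <= available_hours,
    }


# Hypothetical use cases with first-guess estimates, in hours.
USE_CASES = {
    "ingest firewall logs": 1.5,
    "build an operations dashboard": 2.0,
    "alert on failed logins": 1.0,
    "demonstrate role-based access": 1.0,
}

if __name__ == "__main__":
    for padding in (2.0, 3.0):
        for label, hours in (("on site", ONSITE_HOURS), ("remote", REMOTE_HOURS)):
            result = poc_budget(USE_CASES, padding, hours)
            print(f"x{padding:.0f} {label}: total {result['total']:.1f}h, "
                  f"slack {result['slack']:.1f}h, fits={result['fits']}")
```

With these made-up numbers, doubling fits the on-site window with five hours to spare for the punch list, while tripling only fits if the POC is remote. That is the kind of answer you want before anyone books a flight.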

This ideal model can have a couple of interesting wrinkles based on product maturity though. A young company with a single product has a straightforward agenda, but a mature company with many products on a shared platform has to pick and choose. Marketing being what it is, the customer’s excitement is also centered on the newest, highest-risk stuff! The reality is that these are things that haven’t been done before, at least by the feet on the ground, so they take even more time.

The only way to be successful in that case is to compartmentalize the platform use cases from the new shiny use cases so that you’re using the new stuff on a solid foundation. Everyone will be thankful in the end.

One last note on why this matters to customers: I’m describing the approach of quality field personnel, which is specifically intended to cast a product in its best light. This is good for customers because it makes the sale easy to explain and process. However, this is also how a pro sales team gets their crap product over the line to beat out weak sales teams with potentially better solutions. If you care about the quality of the solution you’re going to be living with, it’s in your interest to understand and manage the POC process.