Sunday, August 25, 2019

Managing the Unmanageable

I’ve been thinking off and on about containers (FKA partitions, zones, jails, virtualized apps) and mobile ecosystems for a few years. These technologies have gone through several iterations, and different implementations have different goals, but there is an overlap in the currently extant and growing versions. Hold containers, iOS/Android, and MDM-plus-AppStore enabled laptops together and look at the middle of the diagram: 1: management is done in the surrounding systems, not in the daily-use artifact. 2: management needs are minimized by simplicity.

A container is built, run, and deleted. There is no “manage”. To change or fix it, you go upstream in the process. A phone app may be installed or uninstalled, but it will take care of updating itself from someone else’s activities upstream in the process, just like a container. Users and admins don’t patch them; instead, vendors push updated versions into an infrastructure that automatically does the needful. Even the infrastructure around the app or container (firewall policies, routing policies, device controls, all the policies and configuration that make the system secure and effective) is managed centrally and pushed into place.
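The build, run, delete lifecycle can be sketched in a few lines of Python. This is only an illustration under assumptions: `Image`, `Runtime`, `fix_upstream`, and `roll_out` are hypothetical names made up for this sketch, not any real container runtime's API.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Image:
    """An immutable build artifact (hypothetical; fields are illustrative)."""
    name: str
    version: str


class Runtime:
    """Hypothetical runtime that can only run and delete instances."""

    def __init__(self):
        self.running = {}

    def run(self, instance_id, image):
        self.running[instance_id] = image

    def delete(self, instance_id):
        del self.running[instance_id]


def try_patch_in_place(image, new_version):
    """There is no 'manage': mutating a built artifact fails, so this returns False."""
    try:
        image.version = new_version
    except FrozenInstanceError:
        return False
    return True


def fix_upstream(image, new_version):
    """The fix happens upstream: build a new artifact, leave the old one untouched."""
    return replace(image, version=new_version)


def roll_out(runtime, instance_id, new_image):
    """The surrounding infrastructure deletes the old instance and runs the new one."""
    runtime.delete(instance_id)
    runtime.run(instance_id, new_image)
```

The point of the sketch is what is missing: there is no method for changing a running instance, so every fix is a rebuild followed by a replacement.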

This vision of abstracted management has attractions from many perspectives, which are obvious enough that I won’t waste time repeating them. It is also frustrating to teams tasked with monitoring and managing to existing standards of compliance. The new model is for computing appliances and services, and does not fit well with the current model of managing general purpose operating systems. It’s arguable whether the computing appliance model can apply to general purpose computers at all; it’s theoretically possible to lock one down sufficiently, but the result isn’t better than a mobile device. This attempt failed in the BYOD (Bring Your Own Device) laptop cycle, but the idea of being able to add and remove “appliance mode” on a general purpose device hasn’t died, and only time will tell whether it can work. BYOD seems to be working just great for phones, after all.

The power of systems management tools comes from the philosophy of the general purpose operating system. Programs run with each other in a shared environment which fosters their working together to serve one or many users. Users, including administrators, can remotely do whatever they need via networking. In the primordial slime of the business opportunity called systems management, administrators would use remote shells to script their desires into being, pulling packages into place when needed. Much has changed, but the fundamentals of these tools remain the same: a remote program with privileges, command and control networking, and a file movement tool.

The new model does not allow these fundamentals. We aren’t running as root in the remote host anymore. While mobile and laptop systems retain broader abilities, in the strictest container models even communication and files are only allowed to come from one place. There are exceptions as a matter of theory, but organizations that embrace the philosophy are going to prefer blocking those exceptions. And they will be right, because running visibility and control agent programs in a container or a mobile app sucks. Not only does it increase the weight and computational complexity of the target, it does so for no good reason; the fabric and philosophy of the new model are designed to prevent anything useful being done from this vantage point. Your process is not supposed to worry about other processes. As a user, you’re supposed to worry about your service fulfilling its purpose, not management functions.

This philosophy is not a comfort to compliance auditors, some infosec teams, or traditional systems administrators (hi, BOFH and PFY). It sounds too much like developers sitting in an ivory tower and announcing that they have handled everything just fine, a priori. Even if they say “devops” and “SRE” a lot. But at the end of the day, organizations are regularly accepting a similar statement from their everything-as-a-service vendors, and not many can fully resist the new model’s siren song. Still, a new computing model cannot ignore law, finance, and customary process. The result is a grudging middle ground of management APIs, allowing a minimum viable level of visibility and control into the new model.

These APIs do not restore management fundamentals; they only allow you to log, to measure states, and to initiate change within the new model’s parameters. Posit that breaking the new model’s rules is going to fail, immediately or eventually. A management vendor is therefore in a jail cell, and has to differentiate from inside when offering visibility and control for computing appliances. Windows CE was the last gasp of general purpose operating systems for appliance-friendly use cases (Linux may appear to be an exception, but the deployed instances used in appliances are hardly sporting full Unix shells). From here on out, endpoints are full general purpose machines, a mobile approach, or a handful of frozen kiosk and VDI images. Servers are a mass of general purpose machines, mostly on virtualization, sometimes delivered as a service, with an explosively growing segment of service-oriented app virtualization.
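To make the limits of such management APIs concrete, here is a minimal Python sketch of the three verbs named above: log, measure state, and initiate change within the platform's parameters. Everything in it is an assumption for illustration; `Appliance`, `ManagementAPI`, and their fields are hypothetical and do not correspond to any real MDM or container orchestration API.

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    """Hypothetical managed target; the platform decides which versions are allowed."""
    running_version: str
    allowed_versions: frozenset
    log: list = field(default_factory=list)


class ManagementAPI:
    """Hypothetical API surface: logs, state, and change requests, nothing more.

    Deliberately absent: a remote shell, a privileged agent, arbitrary file
    movement. Those are the fundamentals the new model does not allow.
    """

    def __init__(self, appliance):
        self._appliance = appliance

    def read_logs(self):
        # Visibility: return a copy so callers cannot edit the record.
        return list(self._appliance.log)

    def measure_state(self):
        # Visibility: report state, but only the state the platform exposes.
        return {"running_version": self._appliance.running_version}

    def request_change(self, version):
        # Control: a change is accepted only inside the platform's parameters.
        if version not in self._appliance.allowed_versions:
            self._appliance.log.append(f"rejected change to {version}")
            return False
        self._appliance.running_version = version
        self._appliance.log.append(f"changed to {version}")
        return True
```

Every vendor calling this kind of interface gets the same three verbs, which is why differentiating from inside it is so hard.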

A new type of management agent is born for these API-driven appliance models. Maybe it’s implemented in “sidecar” containers or as “MDM approved” apps, or maybe it lives fully in the cloud; maybe it’s the focus of a new vendor or the side project of an established one. There will certainly be pronouncements that it brings new value to the use case. It doesn’t matter how it’s implemented or marketed, though; it’s accessing the same APIs as everyone else. Its best efforts are limited to “me-too”. Differentiation is either in costly and difficult up-stack integration, or a capital-burning race to open sourced commoditization.

A customer who wants single pane of glass visibility is left with few options: build their own analytics, invest in data lake technologies, or buy extensions to their main management tools. Almost all select two of the three for resilience.

This can make for an unpleasant experience in the management tool, where this ghost of management is fit into the same console and mental model as a full-powered vendor’s real capabilities. “Here is your domain, in which you can do what is needed to ensure your organization’s mission! Except on these special systems where you know a lot less and can’t do much of anything.” Customer expectations are sort of hit but kind of missed, and no one is very happy. Some vendors can sell “know less and do less” alongside “full visibility and control” for the same price. Others may adjust the license model instead.

So, is the single pane of glass worth a cognitively dissonant user experience? Or does the customer split their visibility and control tools and buy something else to glue things back together, moving that dissonance higher up the stack? Because there will surely be dissonance when clicking for action in tool A has to go through tool B’s brokerage into tool C for execution.

There is a useful comparison to minority or legacy operating systems. Management and visibility tools universally reduce their capabilities on platforms that aren’t as important to their customers, so very few are excellent on Solaris, AIX, or HP-UX. The important difference is that a vendor’s reduced AIX capabilities are a matter of choice. If the market demanded, the vendor could eventually resolve the problem. A management vendor cannot change the operating model of an entire ecosystem, so computing appliances are not like legacy computing. But there is an analogy in that the tools do not align perfectly with customer needs, leaving gaps to fill with people and process.

If we imagine a perfectly amazing management tool for AIX that doesn’t integrate with the tools used for Linux and Windows, the choice becomes clearer. Customers don’t require visibility and control for operating systems or computing models, but rather for business functions and services. Buying different tools for different systems can be a required stopgap, but it’s not a goal in itself. Therefore, a single product, single pane of glass approach wins over a multi-product, best of breed approach. The remaining question is which kind of vendor to use: an endpoint-centric vendor that was born from visibility and control, or a data-centric vendor that was born from searching and correlation? The answer lies in your organization’s willingness to supplement tools with labor. A data lake can have great visibility, but it has no native control, meaning another gap to cross before even hitting the API gaps in the new computing model.

The goal of the new model is to minimize and ultimately remove management entirely. As long as it is unsuccessful in this goal, there will be rough edges between the new model and the old. Those edges bias towards the old model consuming the new.