Saturday, May 29, 2021

What Should Go Into a CMDB

 

It’s not every day that information technology work leads you into philosophy, but designing a configuration management database will do it. Spend a little while thinking about what is known or even knowable about the services you’re trying to provide, and you may end up asking “what does existence even mean?” Fear not: there are some practical guiding principles to follow.

First, some background: what is the purpose of a Configuration Management Database (CMDB)? Why are we even trying to align Configuration Items (CIs) to the Services that they provide? The intention is to provide visibility into what affects what. The purpose of that visibility is to make changes reliable, by understanding what is affected and planning in advance to minimize service interruption.

Of course, that goal is not simple to achieve. If it were really simple, it would just be a notebook or a spreadsheet, right? But the modern organization has a complex web of interdependent systems. A flat sheet of items misses so much of reality that it fails to make life any easier. On the other end of the complexity spectrum, there’s a very careful and intentional attempt to describe everything important. Let’s pretend that’s possible for a moment, which I doubt… it’s still unlikely that your organization can do it at a justifiable cost. If the system fails to provide return on investment, then it’s scrapped with good reason.

So, the CMDB should only track what is truly necessary to affordably meet the goal of visibility driving change safety. Common advice is to begin with a map of CIs to service offerings. But, is that a detailed 1:1 map? Sort of, maybe. That’s a fine directional goal, but you should not insist on or try to attain a complete CI to service offering mapping on day one. You might start off asking “which service offering does this CI map to in the service catalog”, but it’s all too easy to wander into epistemology. Business leaders are asking for things that can’t be easily made visible and so you end up doing service mappings by proxy… which doesn’t work. 

If you can’t draw a clear line from “this attribute on this entity maps to that attribute on that entity to support this metric indicating that service”, then it’s dreams going into the system, not measurements. So only put things in when it’s obvious that they support a service. For instance, let’s say we’re talking about a job processing farm. The NAS volumes support the VMs that perform the jobs that test the designs. You can say “this volume supports that job” clearly, but you will struggle to say “this NAS failure is costing that business unit an on-time delivery of that deal” because you don’t have sufficient context in your software. In an organization with sufficient complexity to be considering a CMDB, the business map isn’t clear and the technology map is only a little better.
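To make the job farm example concrete, here is a minimal sketch of the “this volume supports that job” chain as recorded edges. The CI names and the `affected_by` helper are hypothetical, invented for illustration; the point is only that the traversal can answer what was explicitly written down and nothing more.

```python
from collections import defaultdict

# Hypothetical CI names; each edge reads "supporter -> thing it supports".
SUPPORTS = [
    ("nas-vol-01", "vm-farm-a"),
    ("nas-vol-02", "vm-farm-b"),
    ("vm-farm-a", "job-design-test"),
    ("vm-farm-b", "job-design-test"),
]


def build_index(edges):
    """Index the edges so each supporter maps to what it directly supports."""
    index = defaultdict(set)
    for supporter, supported in edges:
        index[supporter].add(supported)
    return index


def affected_by(ci, index):
    """Return everything reachable from ci by following recorded edges."""
    seen = set()
    stack = [ci]
    while stack:
        current = stack.pop()
        for downstream in index.get(current, ()):
            if downstream not in seen:
                seen.add(downstream)
                stack.append(downstream)
    return seen


INDEX = build_index(SUPPORTS)
print(sorted(affected_by("nas-vol-01", INDEX)))
```

Nothing in this structure knows about business units or deals, which is the limit described above: the system can only report relationships that something measured and recorded.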

What is a CI anyway, and how are you going to identify it? Is a group of ephemeral containers providing services worth a CI each, or one for the service, or none? If you’ve got a big expensive server and you replace its motherboard so the serials and MACs change, do you update all your records? What if you just changed its hostname? Does it still support the same services that it used to? How do you reference the systems you run on a cloud Infrastructure as a Service (IaaS)? Do you need a CI for each of the Software as a Service (SaaS) functions that you use?

The simplest answer to those questions is to manually define and track systems, ignoring identifying attributes like serials and MACs. However, this leaves recalls, warranty tracking, and depreciation out of the CMDB and in another tool, reducing the cost justification for the CMDB. Maybe that works for your organization, but it’s a decision that has to be made consistently across the organization so that you can present meaningful numbers in your KPIs and compliance audits.
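As a rough sketch of that simplest answer, a CI record can carry a manually assigned identifier that never depends on hardware details, with serials and hostnames kept as ordinary mutable attributes. The `ConfigurationItem` class, its field names, and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConfigurationItem:
    # Manually assigned identity; never derived from serial, MAC, or hostname.
    ci_id: str
    # Descriptive attributes that are allowed to change over the CI's life.
    attributes: Dict[str, str] = field(default_factory=dict)

    def update_attributes(self, **changes: str) -> None:
        """Record changed hardware or naming details without touching ci_id."""
        self.attributes.update(changes)


server = ConfigurationItem("ci-0001", {"hostname": "build-01", "serial": "OLD123"})

# A motherboard swap and a rename change the attributes, not the identity.
server.update_attributes(serial="NEW456", hostname="build-01b")
print(server.ci_id, server.attributes)
```

Service mappings that reference `ci_id` survive the motherboard swap, but the trade-off described above still applies: the serial is now just a note, so warranty and recall lookups need another source of truth.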

What types of devices should go into the database? Does a network switch count, since it’s critical for making your organization’s software service operate? How about the AC power or HVAC systems that are critical for keeping the switch working? How about the roof that keeps the rain off the switch? Again, the philosophy of everything being interconnected is fascinating but you’ve got a job to do. So, a CI is any item that you can actively manage. I take an absolutist approach here: if there is not a software tool providing visibility and control to the device, then it is not a CI worth tracking.

My advice is to draw a CI line at the management interface. Only put a thing into the data system if it has an agent on it or an interface that enables a management system that you own to collect data. Admittedly, three quarters of my career has been spent in those agents, so call it self-serving if you like. But if the device can’t update your systems on its own, then someone has a job maintaining the “time-saving” tool. If you can afford to allocate people to maintaining data cleanliness in a CMDB so ITAM reports make sense and ITSM service requests go smoothly, that’s great, but it sounds like a questionable use of resources at any scale. As Corey Quinn writes, “What people lose sight of is that infrastructure, in almost every case, costs less than payroll.”
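The management-interface rule is simple enough to express as a filter. This is only a sketch: the `Device` fields and the inventory entries are hypothetical, and in practice the two flags would come from whatever your discovery tooling reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    has_agent: bool = False           # an agent you run is installed on it
    has_management_api: bool = False  # your tooling can poll it for data


def is_trackable_ci(device: Device) -> bool:
    """A device qualifies only if it can report on itself to a system you own."""
    return device.has_agent or device.has_management_api


inventory = [
    Device("web-01", has_agent=True),
    Device("core-switch-01", has_management_api=True),
    Device("server-room-roof"),  # important, but nothing can poll it
]

tracked = [d.name for d in inventory if is_trackable_ci(d)]
print(tracked)
```

The roof drops out not because it is unimportant, but because keeping its record current would be somebody’s manual job.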

Next, treat every collected attribute like it’s a personal insult. For every report, ask how little you can possibly collect and still solve the problem. Is there already data collected that will work? This is the land of practically unbounded high velocity data sets, and scale is going to be a problem. Habits built with a small number of entities will fall apart quickly as your organization grows.

So that’s CIs… now for the services that they provide. Definitionally, a service offering is something your stakeholders and customers can directly use. But, remember that the point of the CMDB is to de-risk changes, meaning you have to map things that might get changed to that service. Service: Our self-hosted website is up. CIs to provide that: dozens of servers, plus all sorts of unknowables like network and power and HVAC and a functioning civil society. That said, while a one-to-many service:CI map is more complex, it’s the only model that is at all realistic. Business users requesting changes and reporting problems should have no need to select CIs and suggest solutions, so the complexity of looking down the tree from service offering shouldn’t matter. IT operators requesting changes and reporting problems do need to see what those changes affect, and looking up the tree to affected services is theoretically useful. However, those two statements rely on an accurate service:CI map, and that accuracy is sorely lacking. More likely, the business requestor is aware of problematic CIs because they’ve been troubleshooting the problem on their own, and the IT operator is not aware of affected services because the CI has been multi-tasked or repurposed. Therefore, incident triage and handling often include preliminary discovery of the functional map, if possible.
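A minimal sketch of that one-to-many map, with both directions of lookup, might look like the following. The service and CI names are invented for illustration. Note that `web-02` appears under two services, which is the multi-tasking case that makes looking up the tree unreliable when the map is stale.

```python
# Hypothetical service-to-CI map: each service offering lists the CIs behind it.
SERVICE_TO_CIS = {
    "public-website": {"web-01", "web-02", "db-01"},
    "internal-wiki": {"web-02", "db-02"},
}


def cis_for_service(service):
    """Look down the tree: which CIs sit behind this service offering?"""
    return set(SERVICE_TO_CIS.get(service, set()))


def services_for_ci(ci):
    """Look up the tree: which service offerings does a change to this CI touch?"""
    return {service for service, cis in SERVICE_TO_CIS.items() if ci in cis}


print(sorted(cis_for_service("internal-wiki")))
print(sorted(services_for_ci("web-02")))
```

Both lookups are only as good as the dictionary behind them. If `web-02` was repurposed and nobody updated the map, `services_for_ci` will confidently return the wrong answer.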

Exceptions to that grim state of affairs can exist: just as I recommend the definition of a CI is limited to an automatically discoverable entity, I also recommend that the definition of a service is limited to machine readable labels. Tagging or labeling CIs with the services that they support allows the CI data collection mechanism to be used to support CI to service mapping. Better, this allows an organization to begin with a manual process (apply this label to anything you spin up with this account in these availability zones) and then grow to automation (if spinning up from this image and running this process, then include that label). That way the organization does not have to begin with perfect in order to get somewhat better.
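The two stages of labeling can be sketched as small rule functions applied to whatever attributes discovery already collects. Everything here is hypothetical: the attribute keys (`account`, `zone`, `image`, `processes`), the rule contents, and the label values are assumptions for illustration, not any particular cloud provider’s or vendor’s tagging API.

```python
def account_and_zone_rule(ci):
    """Stage one: codify the manual convention (this account, these zones)."""
    if ci.get("account") == "web-team" and ci.get("zone") in {"zone-a", "zone-b"}:
        return "public-website"
    return None


def image_and_process_rule(ci):
    """Stage two: infer the label from the image and a running process."""
    if ci.get("image") == "web-base" and "nginx" in ci.get("processes", ()):
        return "public-website"
    return None


def label_services(ci, rules):
    """Return the CI's existing labels plus any label a rule contributes."""
    labels = set(ci.get("labels", ()))
    for rule in rules:
        label = rule(ci)
        if label is not None:
            labels.add(label)
    return labels


RULES = [account_and_zone_rule, image_and_process_rule]

discovered = {"name": "web-07", "image": "web-base", "processes": ["nginx", "sshd"]}
print(label_services(discovered, RULES))
```

Because the rules only add labels, an organization can start with hand-applied labels, add the account rule, and later add the image rule without throwing away earlier work.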