Sunday, December 23, 2018

Moving the transformation point of data

There’s a pattern that has become common knowledge, perhaps on its way to received wisdom. Endpoints pass their raw data off to storage as quickly as possible. Analysts then do their work against that storage using map-reduce processors, automated and/or ad hoc.

This pattern has many benefits and is correct for many use cases. Ephemeral endpoints, such as elastically scaling containers, are able to emit data before it is lost. Machines under attack are able to emit data before the attacker deletes or corrupts it. Best of all, the analyst can explore the data, learn to ask smarter questions, and gradually improve the quality of their work.

The pattern also has a downside: staggering cost. That cost shows up as network load, storage, compute resources, license costs, service costs, and analyst time. It can all be worthwhile during the process of discovering a new root cause or isolating a security threat, but this is not an ideal way to perform operationalized tasks.

What if more of the analysis work is pushed to the endpoint where the raw data is generated?

That does not work for use cases where raw data must be saved, but those cases do not have to be as prevalent as assumed. It’s only like this because we work with data systems that fight against producing information.

We haul dozens of low fidelity lines around for every actual event, generating gigs of noise from every device to sort through. The distance from a given unit of event or metric data to the business impact it represents is huge; just ask anyone who’s collected data for a SIEM.

Bulk raw data collection is similar to running all your applications with logging level at Debug: it’s the right move when you need it, but a waste on the average day.

Alternatives are out there. The most obvious is metrics: it’s hard to argue that a single raw measurement is valuable. Generating a periodic histogram is higher signal, and a smaller data set to boot. Performing metrics analysis at the point of collection produces a better result for less effort.

“checking... is more efficient to do on the host as opposed to querying against a central metrics store... there was no need to store host metrics with that retention and resolution.”
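To make the histogram idea concrete, here is a minimal sketch of an endpoint reducing one reporting interval of raw measurements to bucket counts before anything leaves the host. The bucket bounds, the `summarize` function, and the field names are all assumptions made up for illustration, not part of any particular agent or product.

```python
from bisect import bisect_left

# Hypothetical bucket upper bounds in milliseconds, chosen only for illustration.
BUCKET_BOUNDS_MS = [10, 50, 100, 500, 1000]


def summarize(samples_ms):
    """Reduce one interval of raw latency samples to a small histogram."""
    # One counter per bound, plus a final overflow bucket.
    counts = [0] * (len(BUCKET_BOUNDS_MS) + 1)
    for value in samples_ms:
        # bisect_left finds the first bound that is >= value,
        # so bucket i holds samples with value <= BUCKET_BOUNDS_MS[i].
        counts[bisect_left(BUCKET_BOUNDS_MS, value)] += 1
    labels = [f"le_{bound}" for bound in BUCKET_BOUNDS_MS]
    labels.append(f"gt_{BUCKET_BOUNDS_MS[-1]}")
    return {"count": len(samples_ms), "buckets": dict(zip(labels, counts))}


if __name__ == "__main__":
    # The endpoint would emit this summary instead of the raw samples.
    print(summarize([3, 10, 11, 75, 2000]))
```

However many raw samples arrive in the interval, the emitted record stays the same size, which is where the network and storage savings would come from.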

What about events? It’s true that a single event could be the needle in a haystack that explains a critical situation. It’s also true that these events can be lost to system failure or malicious activity. Unfortunately, it’s also true that these events are often nearly impossible to derive state from. That stream of debug events is great for forensics, but poor for analytics.

What if the endpoint were able to regularly report state in business-friendly terms? Less DURSLEy, more CAPS. What would that require?

Administrators would need to describe the transformations that they need. Ideally, that’s writing configuration for an engine, just like they’re doing in analysis tools today. However, it would also work to write scripts; this could even be the optimal approach when native tools are faster than a cross-platform engine.

However the work is done, endpoints would need to be able to perform calculations to turn raw data into information. That almost certainly requires a data cache on the endpoint, so cache management becomes a probable requirement.
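As a rough sketch of what that could look like, the class below keeps a bounded cache of raw events on the endpoint and reduces it to a small state report on demand. The class name, the event shape (a service name plus a success flag), and the error-rate summary are all hypothetical choices for illustration. Cache management here is nothing more than a fixed-size buffer that drops the oldest events.

```python
from collections import deque


class EndpointStateCache:
    """Hypothetical bounded cache of raw events, reduced to state on demand."""

    def __init__(self, max_events=1000):
        # deque(maxlen=...) silently discards the oldest entry when full,
        # which stands in for real cache management in this sketch.
        self._events = deque(maxlen=max_events)

    def record(self, service, ok):
        """Store one raw event: which service it concerns and whether it succeeded."""
        self._events.append((service, ok))

    def report(self):
        """Turn the cached raw events into a per-service state summary."""
        totals = {}
        for service, ok in self._events:
            seen, failed = totals.get(service, (0, 0))
            totals[service] = (seen + 1, failed + (0 if ok else 1))
        return {
            service: {"events": seen, "error_rate": failed / seen}
            for service, (seen, failed) in totals.items()
        }


if __name__ == "__main__":
    cache = EndpointStateCache(max_events=3)
    for ok in (True, False, False, False):
        cache.record("login", ok)
    # Only the three most recent events remain in the cache.
    print(cache.report())
```

A real implementation would also need persistence and an eviction policy tied to the reporting schedule; the fixed-size buffer is only the simplest stand-in.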

Of course, bidirectional data distribution, an execution shell, and configuration management are required. Those are baseline requirements for any data collection mechanism though, even if some deployments shortcut them by baking everything into a golden image. Ditto for system identification and disambiguation.

So there are some gnarly problems in the infrastructure required to do endpoint-based business analysis, but nothing new, and no true showstoppers. The biggest challenge is at the point of highest value: converting data to business knowledge instead of gathering data for its own sake.

Tuesday, December 18, 2018

Conflicts at Work

Work conflicts aren’t fun, but they come with the territory. Here’s a quick field guide for recognizing the type of conflict being observed.

conflict level 4: we disagree on a tactical implementation approach. I think we should write this in language foo and you think language bar. I think the configuration options should be in a vertical accordion and you think a horizontal tab bar. This level of disagreement is a mix of ego and misunderstanding. The more senior employee should start asking questions. "Why do you think that approach is better?" "Where have you seen it done that way before?" "What's the benefit that the user would see?" Ideally one or the other person will be convinced. If not, then go to the first shared manager for a decision that both parties will disagree-and-commit to.

conflict level 3: we disagree on a tactical ownership situation. My team and your team are both trying to solve the same problem. Both of us have sunk costs. This level of disagreement is an upstream fail that you now get to fix. The team leads need to review product market fit and decide: keep one solution, merge the solutions, keep both solutions, or kill both. Again, someone has to open this conversation, so the more senior employee should start.

conflict level 2: we disagree on role definition. I think the task at hand is my job, but you think it's yours. You think I should be doing something I'm not. I think you should be doing something you're not. Everyone around us is confused by mixed messaging and inconsistent results. We have to discuss who's doing which tasks in detail. If we can't fix it in one meeting, we have to involve our respective managers and get a decision made. One or both of us must disagree-and-commit ASAP.

conflict level 1: we disagree on strategic direction. I think the mission is wrong for the team and/or the company, and you are committed to doing it. You think my team is a waste of resources that needs to be redirected or disbanded. Again, we need to go to management for resolution, but one of us is probably looking for a job after that meeting. That’s assuming management gave a clear answer and someone wasn’t convinced of course! If the problem was just postponed, you might persist at conflict level 1 until you can’t bring yourself to walk into the office any more.

conflict level 0: we annoy each other. We speak in different ways, we don’t share common assumptions, we would rather not interact but we have to work together. This level of disagreement is easy, right? All ego? Just put the two individuals into a different context and encourage them to work out their differences? Sure, if everyone involved is dead certain this isn’t actually a diversity and inclusion issue. There’s a big difference between “you think I’m a jerk” and “you think I’m an unworthy human being”, but the two opinions can be hard to distinguish in constrained work scenarios where it’s not okay to have unsavory opinions. Maybe one or both people in this situation are working from something deeper than annoyance. Asking individual employees to work out their racist, sexist, and classist prejudices with each other is a recipe for disaster. If one of them has power over the other, that disaster is on roller skates.

Some common themes:

  • Conflict is going to persist until someone decides to stop it.
  • The more senior employee should act first and most graciously.
  • If the problem is beyond the scope of the people in conflict, bring in someone who can resolve it.

And a warning: Disagree-and-commit agreements don’t always stick. If a person really cares so much that they’re willing to fight and can’t honestly work to further the other point of view, then this is the hill they’ll die on. Ideally they’d change teams or companies if the decision went against them. In our non-ideal world, changing jobs might not be an option and bad outcomes may ensue.