Sunday, January 13, 2019

How to manage a Proof of Concept

POCs as a concept are a response to customers getting oversold. As a vendor, we’d rather skip the whole thing and trust our sales team to scope properly. As a customer, we’d rather not spend time testing instead of doing. Sometimes they have to be done though, and it’s best for everyone to do it right. Right = tightly scoped and time-boxed.

An ideal POC should look like a well-planned professional services engagement. The goal is established in writing before anyone gets on a plane. Infrastructure testing and a go/no-go call Friday. Fly Monday. Kickoff meeting Tuesday morning, installation the rest of the day. Wednesday and Thursday, go through the list of use cases and check them all off. Friday morning meeting to get the verbal, fly home, spend the next week with procurement instead of kicking the tires.

You should spend more time helping a customer or vendor define use cases up front than you allow for the POC itself. If they can’t define use cases, you might still have a deal, but you’ve established that the product is not worth any actual dollars. That’s bad for the vendor obviously, but it also means the customer can’t get any internal attention for this project. Real use cases mean business value, and business value means time and money allocations. If there is demonstrable value, there is easy justification for a fair price.

Given my one-week frame, you’ve got a maximum of 16 hours for use cases. This is a bit more time than a circa-2018 Nicolas Cage binge. If it can be remote, great! Travel time can turn into work time for a maximum of 32 hours. That’s a play-through of Far Cry 5. Planning ahead of time lets both sides think about how long each step will take. Estimate how long each use case will take to demonstrate, then double or triple that time. If you don’t need those hours, you’ll have time to get creative after the real work is done. Both sides should bring a punch list of extra things they’d like to show off or see.

This ideal model can have a couple of interesting wrinkles based on product maturity though. A young company with a single product has a straightforward agenda, but a mature company with many products on a shared platform has to pick and choose. Marketing being what it is, the customer’s excitement is also centered on the newest, highest-risk stuff! The reality is that these are things that haven’t been done before, at least by the feet on the ground, so they take even more time.

The only way to be successful in that case is to compartmentalize the platform use cases from the new shiny use cases so that you’re using the new stuff on a solid foundation. Everyone will be thankful in the end.

One last note on why this matters to customers: I’m describing the approach of quality field personnel, which is specifically intended to cast a product in its best light. This is good for customers because it makes the sale easy to explain and process. However, it’s also how a pro sales team gets their crap product over the line to beat out weak sales teams with potentially better solutions. If you care about the quality of the solution you’re going to be living with, it’s in your interest to understand and manage the POC process.

Sunday, December 23, 2018

Moving the transformation point of data

There’s a pattern that has become common knowledge, perhaps on its way to received wisdom. Endpoints pass their raw data off to storage as quickly as possible. Analysts then do their work against that storage using map-reduce processing, automated and/or ad hoc.

This pattern has many benefits and is correct for many use cases. Ephemeral endpoints, such as elastically scaling containers, are able to emit data before it is lost. Machines under attack are able to emit data before the attacker deletes or corrupts it. Best of all, the analyst can explore the data, learn to ask smarter questions, and gradually improve the quality of their work.

The pattern also has a downside: staggering cost. Network load, storage, compute resources, license costs, service costs, and analyst time. It can all be worthwhile during the process of discovering a new root cause or isolating a security threat, but this is not an ideal way to perform operationalized tasks.

What if more of the analysis work is pushed to the endpoint where the raw data is generated?

That does not work for use cases where raw data must be saved, but those cases do not have to be as prevalent as assumed. It’s only like this because we work with data systems that fight against producing information.

We haul dozens of low-fidelity lines around for every actual event, generating gigs of noise from every device to sort through. The distance from a given unit of event or metric data to the business impact it represents is huge; just ask anyone who’s collected data for a SIEM.

Bulk raw data collection is similar to running all your applications with logging level at Debug: it’s the right move when you need it, but a waste on the average day.

Alternatives are out there. The most obvious is metrics: it’s hard to argue that a single raw measurement is valuable. Generating a periodic histogram is higher signal, and a smaller data set to boot. Performing metrics analysis at the point of collection produces a better result for less effort.

“checking... is more efficient to do on the host as opposed to querying against a central metrics store... there was no need to store host metrics with that retention and resolution.”
https://eng.uber.com/observability-at-scale/
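To make that concrete, here’s a minimal sketch of endpoint-side aggregation in Python. The bucket boundaries and the fake sensor are invented for illustration, not any particular agent’s API; the point is that one histogram record leaves the endpoint instead of thousands of raw samples.

    import random  # stands in for a real sensor

    BUCKETS_MS = [1, 5, 10, 50, 100, 500, 1000]  # bucket upper bounds, in ms

    def summarize(samples_ms):
        """Collapse one interval's raw samples into a single histogram record."""
        counts = [0] * (len(BUCKETS_MS) + 1)  # +1 for the overflow bucket
        for sample in samples_ms:
            for i, bound in enumerate(BUCKETS_MS):
                if sample <= bound:
                    counts[i] += 1
                    break
            else:
                counts[-1] += 1  # larger than every bound
        return {"buckets_ms": BUCKETS_MS, "counts": counts, "n": len(samples_ms)}

    # 10,000 fake latency samples reduced to one small record before shipping.
    samples = [random.expovariate(1 / 20) for _ in range(10_000)]
    print(summarize(samples))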

What about events? It’s true that a single event could be the needle in a haystack that explains a critical situation. It’s also true that these events can be lost to system failure or malicious activity. Unfortunately, it’s also true that these events are often nearly impossible to derive state from. That stream of debug events is great for forensics, but poor for analytics.

What if the endpoint were able to regularly report state in business-friendly terms? Less DURSLEy, more CAPS (http://www.monkeynoodle.org/2018/11/dursley-and-caps.html)? What would that require?

Administrators would need to describe the transformations they’re after. Ideally that’s writing configuration for an engine, just like they’re doing in analysis tools today. However, it would also work to write scripts; this could even be the optimal approach when native tools are faster than a cross-platform engine.

However the work is done, endpoints would need to be able to perform calculations to turn raw data into information. That almost certainly requires a data cache on the endpoint, so cache management becomes a probable requirement.
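A minimal sketch of that cache-plus-transform idea, assuming a very simple agent. The event shape and report fields are hypothetical, and a real engine would read the transformation from administrator-written configuration rather than hard-coding it:

    from collections import deque

    class EndpointReporter:
        """Keep a bounded window of raw events; reduce it to one state report."""

        def __init__(self, max_events=5_000):
            # Cache management via a bounded deque: old events fall off the end.
            self.cache = deque(maxlen=max_events)

        def observe(self, event):
            self.cache.append(event)  # raw data stays local to the endpoint

        def report_state(self):
            # The transformation an administrator would describe in config.
            orders = sum(1 for e in self.cache if e["type"] == "order")
            errors = sum(1 for e in self.cache if e["type"] == "error")
            return {
                "orders_seen": orders,
                "known_problems": errors,
                "ordering_possible": errors == 0,
            }

    agent = EndpointReporter()
    agent.observe({"type": "order", "id": 1})
    agent.observe({"type": "error", "msg": "db timeout"})
    print(agent.report_state())  # one small record instead of an event stream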

Of course, bidirectional data distribution, an execution shell, and configuration management are required. Those are baseline requirements for any data collection mechanism though, even if some shortcut them by baking configuration into a golden image. Ditto for system identification and disambiguation.

So there are some gnarly problems in the infrastructure required to do endpoint-based business analysis, but nothing new, and no true showstoppers. The biggest challenge is at the point of highest value: converting data to business knowledge instead of gathering data for its own sake. https://open.nytimes.com/talking-technology-nick-rockwell-charity-majors-2acad1690dcf

Tuesday, December 18, 2018

Conflicts at Work



Work conflicts aren’t fun, but they come with the territory. Here’s a quick field guide for recognizing the type of conflict being observed.

conflict level 4: we disagree on a tactical implementation approach. I think we should write this in language foo and you think language bar. I think the configuration options should be in a vertical accordion and you think a horizontal tab bar. This level of disagreement is a mix of ego and misunderstanding. The more senior employee should start asking questions. "Why do you think that approach is better?" "Where have you seen it done that way before?" "What's the benefit that the user would see?" Ideally one or the other person will be convinced. If not, then go to the first shared manager for a decision that both parties will disagree-and-commit to.

conflict level 3: we disagree on a tactical ownership situation. My team and your team are both trying to solve the same problem. Both of us have sunk costs. This level of disagreement is an upstream fail that you now get to fix. The team leads need to review product-market fit and decide: keep one solution, merge the solutions, keep both solutions, or kill both. Again, someone has to open this conversation, so the more senior employee should start.

conflict level 2: we disagree on role definition. I think the task at hand is my job, but you think it's yours. You think I should be doing something I'm not. I think you should be doing something you're not. Everyone around us is confused by mixed messaging and inconsistent results. We have to discuss who's doing which tasks in detail. If we can't fix it in one meeting, we have to involve our respective managers and get a decision made. One or both of us must disagree-and-commit ASAP.

conflict level 1: we disagree on strategic direction. I think the mission is wrong for the team and/or the company, and you are committed to doing it. You think my team is a waste of resources that needs to be redirected or disbanded. Again, we need to go to management for resolution, but one of us is probably looking for a job after that meeting. That’s assuming management gave a clear answer and someone wasn’t convinced, of course! If the problem was just postponed, you might persist at conflict level 1 until you can’t bring yourself to walk into the office any more.

conflict level 0: we annoy each other. We speak in different ways, we don’t share common assumptions, we would rather not interact but we have to work together. This level of disagreement is easy, right? All ego? Just put the two individuals into a different context and encourage them to work out their differences? Sure, if everyone involved is dead certain this isn’t actually a diversity and inclusion issue. There’s a big difference between “you think I’m a jerk” and “you think I’m an unworthy human being”, but the two opinions can be hard to distinguish in constrained work scenarios where it’s not okay to have unsavory opinions. Maybe one or both people in this situation are working from something deeper than annoyance. Asking individual employees to work out their racist, sexist, and classist prejudices with each other is a recipe for disaster. If one of them has power over the other, that disaster is on roller skates.

Some common themes:

  • Conflict is going to persist until someone decides to stop it.
  • The more senior employee should act first and most graciously.
  • If the problem is beyond the scope of the people in conflict, bring in someone who can resolve it.
And a warning: Disagree-and-commit agreements don’t always stick. If a person really cares so much that they’re willing to fight and can’t honestly work to further the other point of view, then this is the hill they’ll die on. Ideally they’d change teams or companies if the decision went against them. In our non-ideal world, changing jobs might not be an option and bad outcomes may ensue.

Friday, November 23, 2018

Growing the Company

Recent conversations on going public have reminded me that some assume taking a company public is inherently, completely good, and necessary to being Important in the Industry. Here are a few reasons why that is not always true, noting that I am not a financial professional.

Posit that the natural course of a successful company is to achieve a monopoly. Oligopoly will do in a pinch, but the ideal scenario for a company is to take all the cookies.  This is generally viewed as a bad thing, so societies might pass laws or enact breakups to prevent it.

That needs a government actor, and current theory holds that government is bad. One should create market forces that enable good outcomes via greed and invisible fairy hands. To the degree that theory admits monopolies are bad, public markets seem to be the anti-monopoly agent.

Public markets love growth. Vastly oversimplified, there are two types of investments: safety and growth. Bonds and stocks. Monopoly provides safety, whether through bonds or dividends, but it has no growth. A startup provides growth opportunities, but it is not safe.

As an individual investor or fund, this is all fine. Select the balance of safety and risk that makes sense for your goals, and all will be well. As long as there are opportunities. But if companies achieve their goals, there will just be a few safe monopolies and no growth.

Now let’s play Sim Captain of Wall Street and manage the balance of safety and growth opportunities. The first lever you might try is mergers and acquisitions. Encourage the monopolies to buy each other and form massive conglomerates with a few basic shared functions.

The outcome is socially fascinating, in that it appears to have encouraged the growth of functional careers like project management. Abstracting a role across the units of Buy-N-Large is good prep for considering that role as an abstract function for any organization.

However, it’s tough to argue that the resulting conglomerates have become growth investments. Jamming a bunch of unrelated businesses into a holding entity doesn’t increase productivity.

A more cynical lever exists in the tech industry: encourage the monopolies to self-disrupt. If a company jumps in a new direction, one of two things will happen: it will succeed and produce new growth for itself, or fail and produce new growth opportunities for other companies.

Once initial investments are recovered, there’s almost no way to lose in encouraging a mature, successful company to try crazy risks.

Looking at this as Sim Company Leader, I don’t see how farming the market to increase growth helps me get a monopoly. It’s great to get windfall money and lower-interest loans, but I don’t want to lose control. I may not have a choice though: early investors expect their paydays.

What if I could table-flip the market though? It would be distracting from the attain-a-monopoly game... but after going public, I might be in a mood to gamble on an acquisition or a new product architecture.

It’s all fun and games until someone loses their job, but this cycle, if it’s real, creates higher-opportunity jobs by creating duplicative roles across many smaller companies. Not so many gold-watch careers though.

Tweetise

Sunday, November 18, 2018

DURSLEy and CAPS

Monitoring and metrics! Theoretically any system that a human cares about could be monitored with these four patterns:

  • LETS
  • USE
  • RED 
  • SLED (can’t find where I saw this now, but it’s the same stuff)

I’m hardly the first to notice there’s overlap... https://medium.com/devopslinks/how-to-monitor-the-sre-golden-signals-1391cadc7524 is a good starting point to read from. I haven’t seen these compressed to a single metric set yet, probably from not looking hard enough. Or because “DURSLEy” is too dumb for real pros.


  • Duration: How long are things taking to complete?
  • Utilization: How many resources are used?
  • Rate: How many things are happening now?
  • Saturation: How many resources are left?
  • Latency: How long do things wait to start?
  • Errors: Are there known problems?
  • Yes: We’re done

These are popular metrics to monitor because they can be easily built up from existing sensors. They provide functional details of a service, in data that is fairly easy to derive information from.
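As a toy illustration, the whole DURSLEy set can fall out of request data a service already has. Everything below (the records, the window, the capacity figure) is invented for the example:

    # (queue_wait_ms, service_ms, failed) per request, from existing sensors
    requests = [(2, 40, False), (5, 35, False), (1, 90, True), (3, 42, False)]
    WINDOW_S = 60    # measurement window, seconds
    CAPACITY = 100   # requests we could handle per window (assumed from load tests)

    n = len(requests)
    duration = sum(svc for _, svc, _ in requests) / n       # Duration: time to complete
    utilization = n / CAPACITY                              # Utilization: resources used
    rate = n / WINDOW_S                                     # Rate: happening now
    saturation = 1 - utilization                            # Saturation: resources left
    latency = sum(wait for wait, _, _ in requests) / n      # Latency: time waiting to start
    errors = sum(1 for _, _, failed in requests if failed)  # Errors: known problems
    print(duration, utilization, rate, saturation, latency, errors)  # Yes: we're done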

In an ideal world, those metrics are measuring “things” and “resources” that are directly applicable to the business need. Sales made. Units produced.

In a less ideal world, machine-readable metrics are often used as a proxy for value, because they are easier to measure. CPU load consumed. Amount of traffic routed.

In the best of all possible worlds, the report writer is working directly with business objectives. CAPS is a metric set that uses business-level input to provide success indicators of a service, producing knowledge and wisdom from data and information.


  • Capacity: How much can we do for customers now?
  • Availability: Can customers use the system now?
  • Performance: Are customers getting a good experience now?
  • Scalability: How many more customers could we handle now?

These metrics present the highest value to the organization, particularly when they can be tied to insight about root cause and remediation. That is notably not easy to do, but far more valuable than yet another CPU metric.
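Here’s a hedged sketch of the difference: CAPS answers come from business-level probes and business inputs, not host counters. check_order() stands in for a real synthetic transaction, and every threshold and number below is an assumption for illustration:

    import time

    def check_order():
        """Hypothetical synthetic transaction: place and cancel a test order."""
        start = time.monotonic()
        ok = True  # imagine a real end-to-end call against the ordering path here
        return ok, (time.monotonic() - start) * 1000

    results = [check_order() for _ in range(5)]
    ok_count = sum(1 for ok, _ in results if ok)
    worst_ms = max(ms for _, ms in results)

    ORDERS_PER_MIN_NOW = 120   # from business telemetry (assumed)
    ORDERS_PER_MIN_MAX = 400   # from load testing (assumed)

    caps = {
        "capacity_orders_per_min": ORDERS_PER_MIN_MAX,       # how much can we do now?
        "available": ok_count == len(results),               # can customers use it now?
        "performant": worst_ms < 500,                        # good experience now?
        "scalability_headroom": ORDERS_PER_MIN_MAX - ORDERS_PER_MIN_NOW,  # room for more customers
    }
    print(caps)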

Report writers can build meaningful KPIs and SLOs from CAPS metrics. KPIs and SLOs built from DURSLEy metrics are also useful, but they have to be used as abstractions of the organization’s actual mission.

Examples: the number of tents deployed to a disaster area is a CAPS metric, but any measure of resources consumed by deploying those tents is a DURSLEy metric. Synthetic transactions showing ordering is possible: CAPS. Load metrics showing all components are idle: DURSLEy.

Tweetise

Saturday, November 10, 2018

Licensing thoughts, round two


Tweetise.

License Models Suck got a lot of interesting conversations started, so it’s time to revisit from the customer’s perspective. Let’s also be clear: this is enterprise sales with account reps and engineers; self-service models are for another day.

As a vendor, I see the options I describe as clearly different; but as a customer, I just want to buy the thing I need at a price that works. “Works” here means “fits in the budget for that function” and “costs less than building it myself or buying it elsewhere”.

A price model has to work when growth or decline happens. As a customer, I build a spreadsheet model to see whether the deal would quit working under some reasonably likely future scenarios. If it passes that analysis, fine. I don’t care if the model is good or bad for the vendor.
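That spreadsheet, sketched as code. The price function and the scenarios are made-up examples; the point is to check every plausible future against the budget before signing:

    def annual_cost(units, unit_price=100, platform_fee=50_000):
        # Hypothetical price model: flat platform fee plus per-unit charge.
        return platform_fee + units * unit_price

    BUDGET = 250_000
    scenarios = {"decline": 500, "flat": 1_000, "growth": 3_000, "acquisition": 10_000}

    for name, units in scenarios.items():
        cost = annual_cost(units)
        verdict = "fits" if cost <= BUDGET else "deal quits working"
        print(f"{name}: {units} units -> ${cost:,} ({verdict})")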

So, the obvious question: why doesn’t flat rate pricing rule the world? It’s certainly the easiest thing to model and describe! Answer: organizations are internally subdivided.

The customer may work at BigCo, and BigCo may use some of the vendor’s products, but the customer doesn’t need to buy for all of BigCo. They need to solve the problem in front of them. Charging them a flat BigCo price for that problem doesn’t work.

What’s more, the customer can’t do anything to make it work. Maybe they can help the sales team pivot this into a top-down BigCo-wide deal, but that’s going to take a long time and require all sorts of political capital and organizational skill that not every customer has.

This is easy to solve, right? Per-unit pricing is the answer! Only, we’re talking enterprise sales and products that require hand-holding. The vendor has a spreadsheet model too, and that model doesn’t work if a sales team isn’t producing enough revenue per transaction.

If the customer’s project isn’t big enough, then the deal won’t work with per-unit pricing. In response, the vendor will drop deals that are too small, set minimum deal size floors for their products, or make product bundles that force larger purchases.

If the customer has no control over the number of units, a per unit price might as well be a flat rate. There’s no natural price elasticity, and the only way to construct a deal is through discounting.

Why not get unnatural then? Just scale the price into bands! You want 10 of these? That’s $10,000 each. You want 10,000 of these? That’s $10 each. Why not sell the customer what they want?

Because the cost to execute a deal and support a customer is variable and difficult to model, and the more complex a pricing model is, the less clarity you have into whether your business is profitable and healthy.
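To see why, here’s the banding from above as a sketch, using the toy numbers from the previous paragraph. Every band boundary is one more assumption the vendor’s revenue model has to carry, and discounting on top multiplies the cases to reason about:

    # (max_quantity, unit_price) bands; quantities above the last band keep its price
    BANDS = [(10, 10_000), (100, 1_000), (1_000, 100), (10_000, 10)]

    def unit_price(quantity):
        for max_qty, price in BANDS:
            if quantity <= max_qty:
                return price
        return BANDS[-1][1]

    print(unit_price(10))      # $10,000 each
    print(unit_price(10_000))  # $10 each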

The knock-on effects from that non-clarity are profound, because they affect anything that involves planning for the future. It’s more difficult to raise capital or get loans, negotiate partnerships, or hire and retain talent.

And so we mostly see fairly simple pricing systems in mid-sized enterprise software vendors. I’m most familiar with “platform with a unit price, less expensive add-ons locked to the same unit quantity.”

This pricing works for the middle of the bell curve, but small customers are underserved while large customers negotiate massive discounts or all-you-can-eat agreements that can hurt the vendor.

Sunday, October 28, 2018

Phases of Data Modeling

Say that you want to use some data to answer a question. You’ve got a firewall, it’s emitting logs, and you make a dashboard in your logging tool to show its status. Maybe even alert when something bad happens. You’ve worked with this firewall tech for a few years and you’re pretty familiar with it.

You’ve built a tool at Phase 1. A subject matter expert with data can use pretty much anything to be successful at Phase 1. That dashboard may not make a lot of sense to anyone else, but it works for you because you’ve seen that when the top right panel turns red, the firewall is close to crashing. You know that the middle left panel is a boring counter of failed attackers, while the middle right panel is bad news if it goes above 3.

One day your team gets a new member who’s interested in firewalls and they start asking questions. You improve the dashboard in response to their questions, and other teams start to notice. Some more improvements and you can share your dashboard with the community. Maybe it gets you a talk at a conference. This is a Phase 2 tool. People don’t need to know as much as you do about that firewall to get value from your dashboard.

So far so good... but now you start to get some tougher questions. “Can I use this in my SIEM?” Or “can you do the same thing for this other firewall?” Now you’re getting asked to put this data into a common information model.

This is a Phase 3 problem: “simply” understand the data sources and use cases well enough to describe a minimal abstraction layer between them. There is some good news here, because Phase 3 tools are hard to do and therefore worth money. Why? Well, let’s look at the process (a sketch of the mapping step follows the list):

1. Read the information model of the logging or security product in question and understand what it’s looking for. There’s no point in modeling data it can’t use.
2. Find events in your data that line up with the events that the product can understand. Make sure they’re presenting all of the fields necessary, figure out how you’ll deal with any gaps, and describe the events properly.
3. Test that it works, then start over with the next event. Continue until you’ve gotten everything the model covers now.
4. Decide if it’s worth it and/or possible to extend the model and build the rest of the possible use cases.
5. Decide if it’s worth rethinking your Phase 1 and Phase 2 problems in light of the Phase 3 work (probably not).
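Here’s what step 2 might look like for a single event, sketched in Python. The field names on both sides are invented for illustration; a real common information model (Splunk CIM, Elastic ECS, OCSF) defines its own schema:

    COMMON_FIELDS = {"action", "src_ip", "dest_ip", "dest_port", "rule"}

    def to_common_model(raw):
        """Translate a hypothetical vendor firewall event into the common model."""
        event = {
            "action": {"DENY": "blocked", "PERMIT": "allowed"}.get(raw.get("disposition"), "unknown"),
            "src_ip": raw.get("source"),
            "dest_ip": raw.get("destination"),
            "dest_port": raw.get("dport"),
            "rule": raw.get("policy_name"),
        }
        missing = COMMON_FIELDS - {k for k, v in event.items() if v is not None}
        if missing:
            event["tags"] = ["incomplete"]  # step 2: decide how to handle gaps
        return event

    print(to_common_model({"disposition": "DENY", "source": "10.0.0.5",
                           "destination": "93.184.216.34", "dport": 443}))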

This is tedious work that requires some domain knowledge. That doesn’t mean you should wait until a domain-knowledgeable wizard comes along... domain knowledge is gained through trial and error. Try to build this thing! When it doesn’t work, you can use this framework to find and fix the problem.

Let’s also consider a common product design mistake. When using this perspective, it’s easy to think that the phases are a progression through levels, like apprentice to journeyman to master. Instead, these phases are mental modes that a given user might switch between several times in a working session.

I’m fairly proficient with data modeling, but that doesn’t make me a master of every use case that might need modeled data. An incident response security analyst may be amazing at detecting malicious behavior in the logs of an infrastructure device, but that doesn’t mean they actually understand what the affected device does.

This distinction is important when product designs put artificial barriers between phases of use, preventing the analyst from accessing help they need in the places they need it, or preventing them from moving beyond help they don’t need. More on product design next week.

Not a tweetise, just a link