Friday, November 23, 2018

Growing the Company

Recent conversations on going public have reminded me that some assume taking a company public is inherently, completely good, and necessary to being Important in the Industry. Here are a few reasons why that is not always true, noting that I am not a financial professional.

Posit that the natural course of a successful company is to achieve a monopoly. Oligopoly will do in a pinch, but the ideal scenario for a company is to take all the cookies. This is generally viewed as a bad thing, so societies might pass laws or enact breakups to prevent it.

That needs a government actor, and current theory holds that government is bad. One should create market forces that enable good outcomes via greed and invisible fairy hands. To the degree that theory admits monopolies are bad, public markets seem to be the anti-monopoly agent.

Public markets love growth. Vastly oversimplified, there are two types of investments: safety and growth. Bonds and stocks. Monopoly provides safety, whether through bonds or dividends, but it has no growth. A startup provides growth opportunities, but it is not safe.

As an individual investor or fund, this is all fine. Select the balance of safety and risk that makes sense for your goals, and all will be well, as long as there are opportunities. But if companies achieve their goals, there will just be a few safe monopolies and no growth.

Now let’s play Sim Captain of Wall Street and manage the balance of safety and growth opportunities. The first lever you might try is mergers and acquisitions: encourage the monopolies to buy each other and form massive conglomerates with a few basic shared functions.

The outcome is socially fascinating, in that it appears to have encouraged the growth of functional careers like project management. Abstracting a role across the units of Buy-N-Large is good prep for considering that role as an abstract function for any organization.

However, it’s tough to argue that the resulting conglomerates have become growth investments. Jamming a bunch of unrelated businesses into a holding entity doesn’t increase productivity.

A more cynical lever exists in the tech industry: encourage the monopolies to self-disrupt. If a company jumps in a new direction, one of two things will happen: it will succeed and produce new growth for itself, or fail and produce new growth opportunities for other companies.

Once initial investments are recovered, there’s almost no way to lose in encouraging a mature, successful company to try crazy risks.

Looking at this as Sim Company Leader, I don’t see how farming the market to increase growth helps me get a monopoly. It’s great to get windfall money and lower-interest loans, but I don’t want to lose control. I may not have a choice though: early investors expect their paydays.

What if I could table-flip the market, though? It would be a distraction from the attain-a-monopoly game... but after going public, I might be in a mood to gamble on an acquisition or a new product architecture.

It’s all fun and games until someone loses their job, but this cycle, if it’s real, creates higher-opportunity jobs by duplicating roles across many smaller companies. Not so many gold-watch careers, though.

Tweetise

Sunday, November 18, 2018

DURSLEy and CAPS

Monitoring and metrics! Theoretically any system that a human cares about could be monitored with these four patterns:

  • LETS
  • USE
  • RED 
  • SLED (I can’t find where I saw this now, but it’s the same stuff)

I’m hardly the first to notice there’s overlap... https://medium.com/devopslinks/how-to-monitor-the-sre-golden-signals-1391cadc7524 is a good starting point to read from. I haven’t seen these compressed to a single metric set yet, probably from not looking hard enough. Or because “DURSLEy” is too dumb for real pros.


  • Duration: How long are things taking to complete?
  • Utilization: How many resources are used?
  • Rate: How many things are happening now?
  • Saturation: How many resources are left?
  • Latency: How long do things wait to start?
  • Errors: Are there known problems?
  • Yes: We’re done

These are popular metrics to monitor because they can be easily built up from existing sensors. They provide functional details of a service, in data that is fairly easy to derive information from.
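
For instance, here’s a minimal sketch of how the DURSLEy set could be built for one reporting window from simple request records. The record fields, worker counts, and function shape are all invented here, not any particular tool’s schema:

from dataclasses import dataclass

@dataclass
class Request:
    queued_at: float   # when the work arrived
    started_at: float  # when processing began
    ended_at: float    # when processing finished
    ok: bool           # did it succeed?

def dursley(requests, window_seconds, workers_busy, workers_total):
    # Each key answers one question from the list above.
    n = len(requests)
    return {
        "duration": sum(r.ended_at - r.started_at for r in requests) / n if n else 0.0,
        "utilization": workers_busy / workers_total,
        "rate": n / window_seconds,
        "saturation": (workers_total - workers_busy) / workers_total,  # resources left
        "latency": sum(r.started_at - r.queued_at for r in requests) / n if n else 0.0,
        "errors": sum(1 for r in requests if not r.ok),
    }  # Yes: we're done.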

In an ideal world, those metrics are measuring “things” and “resources” that are directly applicable to the business need. Sales made. Units produced.

In a less ideal world, machine readable metrics are often used as a proxy to value, because they are easier to measure. CPU load consumed. Amount of traffic routed.

In the best of all possible worlds, the report writer is working directly with business objectives. CAPS is a metric set that uses business-level input to provide success indicators of a service, producing knowledge and wisdom from data and information.


  • Capacity: How much can we do for customers now?
  • Availability: Can customers use the system now?
  • Performance: Are customers getting a good experience now?
  • Scalability: How many more customers could we handle now?

These metrics present the highest value to the organization, particularly when they can be tied to insight about root cause and remediation. That is notably not easy to do, but far more valuable than yet another CPU metric.

Report writers can build meaningful KPIs and SLOs from CAPS metrics. KPIs and SLOs built from DURSLEy metrics are also useful, but they have to be used as abstractions of the organization’s actual mission.

Examples: the number of tents deployed to a disaster area is a CAPS metric, but any measure of resources consumed by deploying those tents is a DURSLEy metric. Synthetic transactions showing ordering is possible: CAPS. Load metrics showing all components are idle: DURSLEy.
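
To make the distinction concrete, a CAPS-style availability indicator might be reduced from synthetic transaction results like this. The boolean result list and the 99.9 target are invented for illustration:

def availability_slo(synthetic_results, target_pct=99.9):
    # synthetic_results: one boolean per synthetic ordering transaction
    attempted = len(synthetic_results)
    succeeded = sum(synthetic_results)
    availability = 100.0 * succeeded / attempted if attempted else 0.0
    return {"availability_pct": availability,
            "target_pct": target_pct,
            "met": availability >= target_pct}

# availability_slo([True, True, False, True]) -> 75.0% available, SLO missed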

Tweetise

Saturday, November 10, 2018

Licensing thoughts, round two


Tweetise.

License Models Suck got a lot of interesting conversations started, so it’s time to revisit the topic from the customer’s perspective. Let’s also be clear: this is enterprise sales with account reps and engineers; self-service models are for another day.

As a vendor, I see the options I described as clearly different; as a customer, I just want to buy the thing I need at a price that works. “Works” here means “fits in the budget for that function” and “costs less than building it myself or buying it elsewhere”.

A price model has to work when growth or decline happens. As a customer, I build a spreadsheet model to see whether the deal would quit working under some reasonably likely future scenarios. If it passes that analysis, fine. I don’t care if the model is good or bad for the vendor.
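
A minimal sketch of that spreadsheet logic, with every number invented; the only point is to stress the price model against reasonably likely futures:

def annual_cost(units, unit_price, platform_fee):
    return platform_fee + units * unit_price

budget = 250_000  # assumed yearly budget for this function
scenarios = {"decline": 800, "flat": 1_000, "growth": 1_500, "boom": 3_000}

for name, units in scenarios.items():
    cost = annual_cost(units, unit_price=180, platform_fee=50_000)
    verdict = "works" if cost <= budget else "quits working"
    print(f"{name:>8}: ${cost:,} -> {verdict}")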

So, the obvious question: why doesn’t flat rate pricing rule the world? It’s certainly the easiest thing to model and describe! Answer: organizations are internally subdivided.

The customer may work at BigCo, and BigCo may use some of the vendor’s products, but the customer doesn’t need to buy for all of BigCo. They need to solve the problem in front of them. Charging them a flat BigCo price for that problem doesn’t work.

What’s more, the customer can’t do anything to make it work. Maybe they can help the sales team pivot this into a top-down BigCo-wide deal, but that’s going to take a long time and require all sorts of political capital and organizational skill that not every customer has.

This is easy to solve, right? Per-unit pricing is the answer! Only, we’re talking enterprise sales and products that require hand-holding. The vendor has a spreadsheet model too, and that model doesn’t work if a sales team isn’t producing enough revenue per transaction.

If the customer’s project isn’t big enough, then the deal won’t work with per-unit pricing. In response, the vendor will drop deals that are too small, set minimum deal size floors for their products, or make product bundles that force larger purchases.

If the customer has no control over the number of units, a per unit price might as well be a flat rate. There’s no natural price elasticity, and the only way to construct a deal is through discounting.

Why not get unnatural then? Just scale the price into bands! You want 10 of these? That’s $10,000 each. You want 10,000 of these? That’s $10 each. Why not sell the customer what they want?
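
Before answering, here’s a minimal sketch of the banded schedule just described; the middle band is an invented example:

# Each band is (minimum quantity, unit price), checked from largest down.
BANDS = [(10_000, 10), (1_000, 1_000), (10, 10_000)]

def unit_price(quantity):
    for minimum, price in BANDS:
        if quantity >= minimum:
            return price
    raise ValueError("below minimum deal size")

print(unit_price(10))      # 10000
print(unit_price(10_000))  # 10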

Because the cost to execute a deal and support a customer is variable and difficult to model, and the more complex a pricing model is, the less clarity you have into whether your business is profitable and healthy.

The knock-on effects from that non-clarity are profound, because they affect anything that involves planning for the future: it’s more difficult to raise capital or get loans, negotiate partnerships, or hire and retain talent.

And so we mostly see fairly simple pricing systems in mid-sized enterprise software vendors. I’m most familiar with “platform with a unit price, less expensive add-ons locked to the same unit quantity.”

This pricing works for the middle of the bell curve, but small customers are underserved while large customers negotiate massive discounts or all-you-can-eat agreements that can hurt the vendor.

Sunday, October 28, 2018

Phases of Data Modeling

Say that you want to use some data to answer a question. You’ve got a firewall, it’s emitting logs, and you make a dashboard in your logging tool to show its status. Maybe even alert when something bad happens. You’ve worked with this firewall tech for a few years and you’re pretty familiar with it.

You’ve built a tool at Phase 1. A subject matter expert with data can use pretty much anything to be successful at Phase 1. That dashboard may not make a lot of sense to anyone else, but it works for you because you’ve seen that when the top right panel turns red, the firewall is close to crashing. You know that the middle left panel is a boring counter of failed attackers, while the middle right panel is bad news if it goes above 3.

One day your team gets a new member who’s interested in firewalls and they start asking questions. You improve the dashboard in response to their questions, and other teams start to notice. Some more improvements and you can share your dashboard with the community. Maybe it gets you a talk at a conference. This is a Phase 2 tool. People don’t need to know as much as you do about that firewall to get value from your dashboard.

So far so good... but now you start to get some tougher questions. “Can I use this in my SIEM?” Or “can you do the same thing for this other firewall?” Now you’re getting asked to put this data into a common information model.

This is a Phase 3 problem: understanding the data sources and use cases well enough to describe a minimalist abstraction layer between them. There is some good news here, because Phase 3 tools are hard to do and therefore worth money. Why? Well, let’s look at the process (a short sketch of step 2 follows the list):

1. Read the information model of the logging or security product in question and understand what it’s looking for. There’s no point in modeling data it can’t use.
2. Find events in your data that line up with the events that the product can understand. Make sure they’re presenting all of the fields necessary, figure out how you’ll deal with any gaps, and describe the events properly.
3. Test that it works, then start over with the next event. Continue until you’ve gotten everything the model covers now.
4. Decide if it’s worth it and/or possible to extend the model and build the rest of the possible use cases.
5. Decide if it’s worth rethinking your Phase 1 and Phase 2 problems in light of the Phase 3 work (probably not).
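
Here’s the promised sketch of step 2, lining up one raw firewall event with the fields a common information model expects. The raw format and the model field names are both invented for illustration:

raw = "2018-10-28T10:15:00 fw01 DENY src=10.1.2.3 dst=8.8.8.8 dport=53"

def to_model(line):
    ts, host, action, *pairs = line.split()
    fields = dict(p.split("=") for p in pairs)
    return {
        "time": ts,
        "device": host,
        "action": "blocked" if action == "DENY" else "allowed",
        "src_ip": fields.get("src"),
        "dest_ip": fields.get("dst"),
        "dest_port": fields.get("dport"),
        "app": None,  # a gap: this firewall doesn't log it, so decide how to handle that
    }

print(to_model(raw))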

This is tedious work that requires some domain knowledge. That doesn’t mean you should wait until a domain-knowledgeable wizard comes along... domain knowledge is gained through trial and error. Try to build this thing! When it doesn’t work, you can use this framework to find and fix the problem.

Let’s also consider a common product design mistake. When using this perspective, it’s easy to think that the phases are a progression through levels, like apprentice to journeyman to master. Instead, these phases are mental modes that a given user might switch between several times in a working session.

I’m fairly proficient with data modeling, but that doesn’t make me a master of every use case that might need modeled data. An incident response security analyst may be amazing at detecting malicious behavior in the logs of an infrastructure device, but that doesn’t mean they actually understand what the affected device does.

This distinction is important when product designs put artificial barriers between phases of use, preventing the analyst from accessing help they need in the places they need it, or preventing them from moving beyond help they don’t need. More on product design next week.

Not a tweetise, just a link

Sunday, September 30, 2018

Weekly Status

Tweetise

People are creatures of habit, and effective work is produced by grooming useful habits. Here’s a quick write up of a useful habit: the weekly status report.

I haven’t always written these, and I haven’t always worked for people who’ve wanted to receive them, but I’ve been at my most effective when I was writing and discussing them.

A weekly report of your status is a distillation of the most important things that have happened in the last few days. It’s also an agenda for the next week, and a chance to reflect. It can also help you actually have a weekend, because you’re closing the books on Friday.

How to work this magic? You’ll need a text editor. I’m also fond of a cloud service for syncing text documents. You’ll need a communication tool too: email, Slack, or a wiki.

The document: a simple text document with no formatting.

Hi,

Meta:
* 1 line about you. Happy? Sick? Overworked?

$project:
* 1-3 single line statements of status affecting events.
* Started X
* Y Ongoing
* Finished Z
* Last release, date, purpose
* Next release, ETA, purpose
* The goal after that

*Repeat as needed.*

Thanks,
$me

Every Friday when I’m about ready to call the day done, I open this document and replace last week’s material with this week’s. I reflect on how I’m doing and how that presents. Same items not moving? Can’t stand looking at this any more? I need help and this is my chance to ask.

Sync: If it’s possible to put this text block in a cloud sync service, then it’s possible to do this on your phone while riding to the airport or standing in the boarding line. That’s remarkably useful. The big thing is to see what you wrote last week.

A push-based communication is ideal, because the recipients aren’t going to look at a web page. They’re all too used to safe and boring status, so don’t be boring. Email or Slack work. Skip the formatting and pictures. Just the status.

I’ve been in teams that used wikis or Evernote for status updates, and it can work, but it’s notably worse; those are the teams where a lot more phone calls were needed. There’s a reason those tools all send email notifications.

Finally, who to send your status to? Your manager is supposed to be thrilled to get a concise, timely, and accurate ping of status. However, folks sometimes fall short of ideals, and that doesn’t have to stop you from doing this work for yourself.

Given sufficient tuning and need, the weekly status can go to your teammates, your direct reports, or a cross-functional group. I do think it’s important to send it to someone, otherwise it’s a diary. But as in any writing, think of the audience.

Sunday, September 23, 2018

Community

Tweetise.

So you’re a software company, and you want to have a community. What next?

“Why community” is a great place to start: the stated reasons and budget are often somewhere in marketing, but the community is equally important for customer support. Community is where soft guidelines are communicated, FAQs are advertised, and newcomers are made welcome.

All of that means reduced customer support costs, because the folks that are answering these questions aren’t on your payroll. Note that this also means you don’t have a lot of control over what they say, so we’ll dig into that in a bit.

A software community is a forum for discussions about your software and the problems that it solves. This may take many forms, non-exclusively. Asynchronous email lists (Mailman) and fora (Lithium). Synchronous channels like Slack, or face-to-face user groups and conferences.

In an ideal world these are all options on the table, but there’s a very definite cost gradient to consider. The more synchronous you get, the more it costs for fewer people; but they get better results. Support may be a major beneficiary, but they have no budget power.

Marketing is the team paying for this if anyone does, so the dollars are entirely dependent on the community’s ability to meet marketing’s agenda. That can be an issue for the types of folks who offer free support for someone else’s software.

Who are those community members, anyway? They are wonderful gems. Customers, pro service partners, maybe internal employees who just can’t get enough. They’re putting “spare time” into your support forum because they care about people being successful, with your product.

They’re also doing work for themselves, building a community reputation. They’re the pool you’ll hire from as you grow. In the meantime, are you offering them a path to stay with you? Certifications? Awards? Where’s the public recognition of their effort?

Unfortunately, people are people and those nobly motivated activities might get blurred by bad behavior. While solving your problems, your community may also air views on race, sex, religion, politics. Fights happen. Do you even know, and are you prepared to keep the peace?

Moderation is absolutely required if you don’t want your community to turn into a cesspool. And so we return to the question of budget. Moderation means people, and people gotta eat, and quality people expect quality pay and tools for their job.

At a tiny scale, your company is able to do this work “on the side”. Just like the social engineering of people and project management, your star employees quietly shoulder it all while you congratulate yourself on not actually needing those functions.

Don’t kid yourself; there’s someone taking care of the social work you’re not seeing, and you’d better recognize their contribution before it stops. Keeping people working well together doesn’t just happen.

At a massive scale, there’s so much moderation and so much community that tiny and medium communities are forming around the main communities. If you’re getting a B-Sides, you’ve got a whole new set of problems.

The medium sized scale is where things are toughest. Big enough to truly need part-time or full-time paid help, but small enough to question that need and try to half-ass it. So, for those in that boat, let’s consider what a successful community looks like.

New users are welcomed and their problems are answered correctly. People are free to be themselves, but bigotry and bullying are not tolerated. Thorny problems get redirected to proper channels. Fights are resolved promptly without collateral damage.

The stars of the community are recognized and rewarded, regardless of where their paychecks originate. They keep magnifying your reach because they’re feeling good about doing that.

If that doesn’t sound like your community, you might be better off shutting it down until you hire someone to do it right. Buying tools isn’t going to help.

Sunday, September 16, 2018

Security Logging

Tweetise form.

Security logging is interesting. Detecting security and compliance issues means uncovering nasty little leakages of unintentional or surprising information all over. When you add a powerful security tool to the environment, it starts to shine light into dark corners.

No one expects that temporary file of sensitive data or the password in a script to be recorded. Credential tokens start safe, but get copied to unsafe paths. They’re not intentional flaws, but rather hygiene issues.

If a tool detects security hygiene issues, the responding team must decide if they believe the tool or not, and then what to do about it. As a vendor planning that security tool, figuring out which way the customer team will go is an existential crisis.

Obviously, if the customer doesn’t believe the tool, that sale isn’t made or that renewal doesn’t happen. Less obviously, even if the customer does believe the tool, success is not guaranteed. The social angles are too complex for today’s thread.

The logical path for tool developers is to log any data, offending or otherwise. It’s impossible to describe every possible problem scenario and filter objectionable material. Even getting the low-hanging fruit is bad: it builds an expectation that the tool solves hard problems too.

Worse, if the tool does not record the raw data and only records that a user did a prohibited thing at place and time... then the tool won’t be trusted. The user doesn’t remember doing a bad thing, and now it’s human versus log. Human wins.

So financial pressure leads to security tools logging everything they see. This is not ideal, because it can mean worsening the security situation by logging and transmitting sensitive tidbits. Instead of searching every mattress in town, our raccoon-masked baddie can rob the bank.

Because belief is ahead of action in the customer’s decision path, data collection problems are true of failing security tools as well as successful ones. Everyone wants to be trusted, so everyone records at high fidelity.

Encrypt all the things is then used to protect these high value stores. I’m reminded of the DRM problem though... the data has to be in usable form to get used, so there’s always an exposure somewhere. Makes you wonder how many SOCs have extra folks listening in.

Sunday, September 9, 2018

Disrupting Ourselves

Tweetise here

Let’s talk about some received wisdom: “disrupt your own market before someone else does it to you”. Sensible advice: complacency can kill. Except disruption is generally a pioneering activity, and the survival rate for pioneers is lower than for copycats.

Corporate blindspots being what they are, this style of transition is more often a new company’s opportunity to disrupt an existing market. When done internally, it’s as disruptive as calving a new company.

Still, let’s assume our company has decided to change. Further assume that we’re not completely altering business model from vertical integration to horizontal commoditization or vice versa. That takes executive team guidance, but I generally write about technology companies.

There are many architects with opinions on horizontal versus vertical technology stacks. Worse, they win budget to shift the stack under the rubric of self-disruption. Horizontal and vertical both work, so a team can start anywhere on the cycle and shift to the next step.


Moving from vertical to horizontal:
* Identify functional components
* Abstract those components with APIs (sketched after this list)
* Replace the ones that can’t elastically scale
* Start writing large checks to your IaaS of choice
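
A minimal sketch of the “abstract those components with APIs” step, assuming the component is blob storage; the class names are invented:

from abc import ABC, abstractmethod

class BlobStore(ABC):
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class LocalDiskStore(BlobStore):
    # The existing vertical component, wrapped behind the new API.
    # Keys are treated as file paths for the sake of the sketch.
    def put(self, key, data):
        with open(key, "wb") as f:
            f.write(data)
    def get(self, key):
        with open(key, "rb") as f:
            return f.read()

# An elastically scaling implementation can later replace LocalDiskStore
# without touching callers, and the checks to your IaaS begin.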

That’s all fairly straightforward for a new project, but if you’ve got an existing customer base there are some challenges.
* Maintain performance and quality while complicating architecture
* Decide to expose or hide the APIs… Who’s embracing and extending whom?

Worst of all:
* Does the license and business model still work after this change, or do you need to revisit product market fit?
* Backwards compatibility... well if you’re not Microsoft, let’s all have a good laugh over that one.

Moving from horizontal to vertical:
* Identify painful integrations that need consolidating.
* Define interfaces where your solution will tie into the rest of the world.
* Execute. Ease of purchase, use, and assurance: the buyer must feel confident they didn’t make a mistake here.

There’s no lack of startup memoirs. Doing it from within a company is gnarlier, disrupting your own existing system. Professional services and the partner community are going to ask some tough questions. Sales and marketing might not be thrilled about rewriting the playbook.

Transition is reimplementation of capabilities, meaning forward progress slows or halts for at least a year. Strong support in a fat Q2 evaporates in the following lean Q1. Teams that mismanage their planning find their work going into the bitbucket, along with some executives.

To forestall that reckoning, leadership spends significant effort badmouthing existing product: hopelessly outdated, unscalable, and just bad. This is easy and successful; and therefore the worst damage of the entire process. It burns the boats and commits the company.

Once “Something must be done” is accepted wisdom, all manner of crazy can be considered reasonable. Add some sunk costs and it takes a major crisis to reset direction.

Monday, September 3, 2018

Engines and fuel - who writes quality content?

Tweetise.

In software, everyone wants to build engines, and no one wants to make fuel. A platform for executing content has high potential leverage and lots of vendors make those. The expected community of fuel makers rarely materializes.

Content for software engines breaks down along two axes: simplicity versus complexity, and generality versus specificity to the customer’s environment. Half of the resulting quadrants are unsuitable for sharing communities, because their content is not general.

Simple and customer specific: a list of assets and identities. Vendors clearly can’t do these at all, so they make management tools. This quadrant is an obvious dead zone for content.

Complex and customer specific: personnel onboard and termination processes. Again, dead zone.

Sad times occur when companies try to operate in one of the dead zones: for example, the process automation engine. A hypothetical vendor faces years of customer issues root caused to process failures, so they decide to help customers succeed by automating the process.

Turns out that the customers who think about process already have 20 of these on the shelf. The customers who don’t? Some aren’t interested, and some want to be told what they should be doing. They need fuel, and the vendor can’t give it to them without professional services.

Complex and general: compliance tests for common off the shelf (COTS) solutions. This is where in-house content teams are justified; their success is measured in lower sales cycle times and professional services spend. Those metrics are hard to defend, but that’s another story.

Compliance auditing is an excellent place to observe this type of content in the market. Anything that can execute script on an endpoint and return results can be used to check compliance with (say) the PCI dirty dozen.

You’d be mad to do this with a tool that doesn’t already have some content though. Who wants to maintain the scripts to determine if all your endpoint agents are getting their updates? So customer demand for content is high.
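
As a sketch, one unit of that fuel might look like the following: a check script that any “run a script on an endpoint, return a result” engine could execute. The state-file path and the seven-day threshold are invented, and a real compliance pack would ship dozens of these:

import json, os, time

STATE_FILE = "/var/lib/agent/last_update.json"  # assumed agent state location
MAX_AGE_SECONDS = 7 * 24 * 3600                 # assumed freshness policy

def check_agent_updates():
    if not os.path.exists(STATE_FILE):
        return {"check": "agent_updates", "status": "fail", "reason": "no state file"}
    with open(STATE_FILE) as f:
        last = json.load(f).get("last_update", 0)
    age = time.time() - last
    status = "pass" if age < MAX_AGE_SECONDS else "fail"
    return {"check": "agent_updates", "status": status, "age_seconds": int(age)}

print(json.dumps(check_agent_updates()))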

The supply side is more challenging. The engine vendor makes some content because they need it, but the work is harder than it appears and they’re eager to share the load. Why can’t they? There are four paths they might try.

1: Pay their own developers to write content. Mostly gets them what they want, at high cost. 2: Pay others to write it, through outsourcing or spiffs. Mostly gets less than they want, at high cost. 3: Market a partner program and SDK. Mostly gets nothing, at low cost.

4: Do nothing and hope for the best. In a strong social community, this can actually work great. Participants learn new skills, improve their employability and social standing, and genuinely make the world a bit better. Without that community, the vendor had better pay up.

The strongest motivation to make complex content for an engine to execute is if you own that engine or make a living with it. Next is if you improve your individual standing with this work. The weakest motivation is held by other software companies seeking marketing synergy.

Which brings us to the last quadrant, simple and general: a malicious IP address blacklist. This is where entire companies are justified through attractive-looking profit margins; their success can be measured in the usual metrics of sales.

The threat intelligence market is a recent example of this effect. TI comes from four sources: internal teams, paid vendors, private interest communities, and open source communities. In the first three, employees produce quality content for real world use cases.

Ask your friendly neighborhood security expert which TI sources they prefer, and I expect the answer will look like @sroberts’: https://medium.com/ctisc/intelligence-collection-priorities-10cd4c3e1b9d

Taking responsibility for TI content leads to increasing risk avoidance as well, further reducing its value. Over time the developer facing a flood of support tickets will err on the side of caution, accept more false positives, and add caveat emptor warnings.

Another interesting factor in these models is the mean time to maintenance. Threat intel needs analysis of fast-moving criminal groups and rapidly decays in value. Compliance content relies on analysis of slow-moving components and can last for years of low maintenance costs.

I think that this dichotomy in maintenance cost holds across most examples in the simple to complex axis. Connectivity drivers are complex and last for a long time. VM or container definitions are simple wrappers around complex content and last for a short time.

The requirement for maintenance defines whether the vendor offers support for the content, which in turn defines many customers’ willingness to depend on it and consultants’ willingness to provide it.

Playing it out, higher maintenance content is less supported, and therefore more likely to be used by risk-embracing customers with strong internal teams and bias towards community software. Lower maintenance, higher support, more services dollars.

Sunday, August 26, 2018

Line Product Management Process

Tweetise (thanks @djpiebob!)

I have some issues with the concept of “automating” or “scaling” product management, which I went into in this blog post: http://www.monkeynoodle.org/2018/03/automating-ers-through-support-is-crap.html — what I haven’t written up is what I do use.

This is the process for directly running a product or multiple products; leading a team that runs products requires a different set of tools, which I’ll go into some other time.

It’s pretty old school! I use whatever is available for shared documents, [Confluence|Wiki|Google docs], to keep a freeform record of customer contacts. During a meeting I take notes on my phone (Apple Notes) or my laptop (BBEdit), depending on the need to avoid keyboard noise.

ASAP after the meeting I rewrite them into the shared doc and share the rewritten result with interested Slack teams. I’ve also tried SFDC call notes and direct to JIRA, but found it impossible to correlate and review across customers and projects.

The first raw notes document is a mess of shorthand, repetitions, action items and factoids for me, and acronyms that may only make sense to me. The second is still just notes, but more readable by other team members. This is the citation bibliography for everything else.

I might use it for competitive research as well, or I might put competitive notes in a separate document if they get too big. I‘ll also break the customer notes doc off into new docs every X pages or Y months, which can be useful for seeing changes in the market requirements.

I regularly re-read those notes and research items, looking for common threads, opportunities, and leverage points. I start copying these into a summary at the top of the shared notes document, and I use them to produce more structured requirements docs.

I need a Market Requirements Document (MRD): What is the problem, what industries are affected, how much money is available, what are the titles and responsibilities of the Champions, Gatekeepers, and Buyers?

I need a Product Requirements Document (PRD): What would we build for that market? What features would it need, and who would those features serve? I usually write up enough high level features for two or three major releases before I start trying to decide what might get done.

For a small project I‘ll combine the MRD and PRD. The PRD will be used to produce JIRA epics and stories. This means rewriting and converting from tool to tool, which means doing the creative work of refining, sifting, correlating, synthesizing, and sorting ideas.

The development team is introduced to these drafts as well, and we start to refine them together. Whiteboards, wireframes, and flowcharts start happening here. Maybe some prototype code.

I rewrite the epics and stories of the PRD in greater detail every time I touch them. I also clone them, move them from story to epic, throw them away and start over. Tickets are free, roadmaps are predictive estimates, and the backlog is a playground.

Change tracking, prioritization, progress reporting, workload sizing, and release estimation are driven from JIRA data, often processed in Splunk or Excel for presentation in [PowerPoint|Keynote].

Idea accountability and closing the loop with customers is not tracked in JIRA. That’s my responsibility to take care of, which I do by reviewing the customer notes document whenever I have contact with the customer or their sales team.

The system I suggest requires a lot of work. The PM must open themselves to as many sources of input as possible and work to reduce the firehose to sensible, high-leverage ideas for engineering to implement.

Centralization is critical so that the PM’s work is visible and can be taken over by another PM. Some sort of tool helps, but the specific tool chosen doesn’t matter as long as it doesn’t get in the way. The more workflow a tool suggests, the more it’s going to get in the way.

Moving ideas from tool to tool at each stage is actually very helpful. Putting a technical barrier between input and output that requires human brainpower to push things through is analogous to transcribing from longhand notes to an essay in a text editor.

People are excellent at doing all sorts of creative work, but they’re also excellent at avoiding work and justifying results. Getting work done requires forming useful habits, and critically rewriting your own work is one of those.

There are a number of complaints that come up with this conversation, which I synthesize to “that process can’t scale!” As I understand the argument: “As a PM I want to offload portions of the workflow to an automatic system or a process that other teams do so that I can do more”.

Or the more pernicious: “As a PM I want to point other people at automated systems so that they don’t have to interact with me to get what they want”. As an introvert, I do sympathize with this position, but not very much, so let’s drop that one.

The work of doing product management is not automation friendly. Software is eating the world, and as product managers we are the chefs preparing that meal. It’s only natural to look at our own job, see a process and think “that can be automated too!”

It’s not true though, because software can only eat the things that are expressed in numbers without losing context. The computer can’t understand context. People have to do it, so the product opportunity is in personal productivity tools, not team aids.

Handling scale as a PM means managing the number and scope of projects, changing the balance of anecdata and metrics, and avoiding all the easy work that blocks this process with a false feeling of accomplishment.

Saturday, August 25, 2018

English degree, Tech Career

Also tweeted.

What is the value of an English degree in a technology career?

I graduated from UC Berkeley with a degree in English Literature, focus on American poetry. My thesis was on Emily Dickinson. I’ve been working in information technology ever since. So I’m biased on this subject.

I’m hardly the only person with this kind of career path, and I realize how lucky I’ve been. I didn’t always, though. I faceplanted on an interview softball about my education several years ago.

I was interviewing with a rather prestigious company that was riding an amazing wave. They’d recruited me, so I was feeling good. Then: “Tell me about your English degree” and I started digging a hole. I had unwittingly internalized the view of humanities as useless.

Lesson 0: Have something positive to say about every word in your resume. Even if it’s something that your industry stereotypes.

Now hopefully less stupid, I have some thoughts about what the degree has done for me. The English degree taught me to read critically, synthesize information, and write clearly. I use these skills all day, every day.

In the classical education paradigm, this was called Logic and Rhetoric. (https://en.wikipedia.org/wiki/Trivium). @ckindel has posted an excellent update of this mental toolbox here http://ceklog.kindel.com/2018/07/08/tools-to-achieve-clarity-of-thought/ (the linked articles are all worthwhile).

There are two power tools learned in the English degree that are not directly discussed there: academic papers and poetry.

Economic expression of ideas in standard persuasive forms is key to good writing. An academic paper’s standard form provides two leverage points. It helps you write. Writer’s block is defeated by words on paper, and the form gives you words, showing the gaps that remain. 

Form helps the reader accelerate. Look at the humble 5 paragraph essay. Thesis, three arguments, conclusion. Tell ‘em, show ‘em, tell ‘em again. A skilled reader processes this in seconds, while a less structured rant is a more challenging experience. 

Academic papers also ask the author to focus on quality. Because each sentence will be questioned, each sentence must carry its weight. The Twitter editor adds a similar value to one’s writing.

In a 10 page thesis or a 100 page dissertation, a product requirements document, or an engineering design discussion, writing has a job and every word is in service to that job.

When you take an English degree, you’re writing several 10 page papers a week, and working on longer papers at the same time. This is quite similar to the workload for product managers.

Economic expression of emotion via poetry is the second power tool of the English major. A strictly rational approach to the requirements above is acceptable or even desirable in some contexts, but overall insufficient. 

@brenebrown writes, "We want to believe that we are thinking, rational people and on occasion tangle with emotion, flick it out of the way, and go back to thinking. That is not the truth. The truth is we are emotional beings who on occasion think."

Because a PM must communicate with humans, we need to be able to engage emotions with our language. “Maximizing emotional load of each word through musical awareness” is a rather soulless description of poetry, but it’ll do for function.

Like the mental habits of engineering for scale... these are part of a toolbox that the English degree provides. Reading thousands of pages per week has turned out to be useful in modern life as well.

Sunday, August 12, 2018

Merger & Acquisition Failures

Also available on Twitter.

Sometimes when two companies love each other very much... Companies buy other companies. Maybe it’s to pump marketshare or shut down competition. Sounds like a boring transaction as long as regulators don’t mind. Or maybe it’s to get technology and people.

Those are exciting projects, full of hope and dreams. And yet, so much of the time the technology is shelved and the people quit. Why is that? Because acquisition alters the delicate chemistry of teamwork and product-market-fit.

Maybe the acquired company continues to operate as a wholly owned subsidiary and little changes for a long time. Or maybe the acquired company is quickly integrated into the mother ship so that all that goodness can be utilized ASAP.

I’m no expert on corporate acquisition, but I’ve had a front row seat for some of these. A few of them could even be called successful. Let’s generalize heavily from after-hours conversations that clearly have nothing to do with any of my previous employers.

The fateful day has come! Papers are signed, the Tent of Secrecy is taken down, and the press release is out. Acquiring teams are answering questions and testing defenses. They’ve got to retain key team members, integrate technology, and align the sales teams before blood spills.

At the same time, they’ve drawn political attention and are certainly facing some negative buzz. In a really challenging environment, they’re also facing coup attempts. M&A is as hard as launching companies, so it’s easy for others to snipe at.

Meanwhile, acquired teams are all over the emotional map. Excited, sad, suddenly rich, furious at how little they’re getting. Are friends now redundant, immediately or “soon”? Who's reviewing retention plans on a key team members list, and who's not: it won’t be private for long.

After an acquisition one might assume headhunter attention. When better to check in on someone’s emotional state and promise greener grass? Churn commences. The network starts to buzz, people are distracted, and some leave. Of course, lots stay!

And maybe the folks that stay for retention bonus are a little more conservative. Bird in the hand, part of a bigger safer company, and there’s so much opportunity because everyone else in the big company is beat down and burned out. Sour like old pickles.

It seems that more engineers and salespeople make it through the acquisition. The acquired executives disappear into other projects or out of the company. Resting and vesting, pivoted into something new, but unlikely they’re still guiding their old team. Who is?

The middle managers who stay all drift up to fill recently vacated executive slots, where they either grow or flame out. Their attention is diffused into new teams and new problems. PMO steps in heavily, since the acquired company didn’t have one. Devs are largely on their own.

Nature abhors a vacuum, and someone steps in to fill this one. With luck they maintain product-market-fit and mesh with internal requirements. Or they fail and introduce interpersonal conflict to boot. Is this the end of the acquisition road? Or does engineering lead itself?

There’s bugs to fix, and everyone knows what the old company was planning. The acquiring company has lots of new requirements too, like ADA compliance and Asian language support. Who needs customer and market input anyway? After a while the old roadmap is consumed.

There’s layoffs of course, and new requirements keep coming. “Please replace these old frameworks with a more modern workalike.” “Please rescue this customer with a technical fix for their social problems.” “Please do something about our new global vaporware initiative.”

The challenge of doing more with less is sort of fun. There’s some friends left. And the acquired person feels big. They talk with important customers and executives and they can spend more time on their home life. More folks have left, the remaining acquirees are authorities.

But the recruiter calls stopped. A temporary market slowdown, or is it personal? Can they get a job outside of the big company anymore? So they reach out and do a few interviews, pass up on some lower-paying opportunities, get shot down by something cool.

Better take more projects in the big company. By now the tech they came in with has lost its shine. Put a cucumber in brine long enough and it’s just another pickle. They’re helping new engineers with weird tools and pursuing new hobbies.

The street cred of being from an acquisition is gone, and they’re neck deep in big dull projects. “Lipstick this pig until it looks fashionable.” “Squeeze more revenue from the customer base.” “Tie yourself into the latest silly vaporware.”

Or even “Propose an acquisition to enter a new market with.” If this is success, who needs competition? When the game is no fun but you have to keep playing, people will change the rules — and that is whycome politics suck. Good luck out there, but don't stay too safe.

Sunday, July 22, 2018

Tools and the Analyst

also posted as a Twitter thread

Let’s say I’m responsible for a complex system. I might have a lot of titles, but for a big part of my job I’m an analyst of that system. I need tools to help me see into it and change its behavior. As an analyst with a tool, I have some generic use cases the tool needs to meet.

  • Tell me how things are right now
    • What is the state?
    • Is it changing?
  • Tell me how things have been over time?
    • What is the state?
    • Is there any change in progress?
    • Is the state normal?
    • Is the state good/bad/indifferent/unknown?
  • Tell me what I'm supposed to know
    • What is important?
    • What should I mitigate?
    • What can I ignore?
  • Alert me when something needs me
    • What is the problem?
    • What is the impact?
    • Are there any suggested actions?
  • How much can I trust this tool?
    • Do I see outside context changes reflected in it?
    • How does the information it gives me compare with what I see in other tools?
  • How much can I share this tool?
    • Do I understand it well enough to teach it?
    • Can I defend it?

As a generic set of use cases, this is equivalent to the old sysadmin joke, “go away or I will replace you with a small shell script”. A tool that can provide that level of judgement is also capable of doing the analyst’s job. So a lot of tools stop well short of that lofty goal and let the user fill in a great deal; a minimal sketch follows the list below.

  • Alert me when a condition is met
  • Tell me how things are right now
  • Tell me how things have been over time?
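
A minimal sketch of how little that asks of the tool; the metric values and the threshold are invented, and the judgment stays with the analyst:

history = [0.62, 0.71, 0.68, 0.93]  # "how things have been over time"
current = history[-1]               # "how things are right now"
THRESHOLD = 0.90                    # the analyst's judgment, not the tool's

if current > THRESHOLD:             # "alert me when a condition is met"
    print(f"ALERT: metric at {current}. Is that bad? The tool has no idea.")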

Maybe the analyst can tie the tool’s output to something else that tries to fill in more meaningful answers, or maybe they just do all of that in their own heads. This is fine at the early adopter end of Geoffrey Moore’s chasm, and many vendors will stare blankly at you if you ask for more.

After all, their customers are using it now! And besides, how could they add intelligence when they don’t know how you want to use their tool? They don’t know your system. But let’s get real: the relationships between customers, vendors, tools, analysts, and systems are not stable.

The system will change, the customer’s goals will change, and the analyst won’t stay with this tool. Even if everything else stays stable, experienced analysts move on to new problems and are replaced by new folks who need to learn.

The result is that tools mature and their user communities shift, growing into mainstream adopters and becoming a norm instead of an outlier. By the time your tool is being introduced to late adopters, it needs to be able to teach a green analyst how to do the job at hand.

How’s that going to work? Here’s a few ideas:

0: ignore the problem. There’s always a cost:benefit analysis to doing something, and nature abhors a vacuum. If a vendor does nothing, perhaps the customer will find it cost-effective to solve the problem instead.
Look at open source software packages aimed at narrow user communities, such as email transfer. Learning to use the tools is a rite of passage to doing the job. This only works because of email-hosting services though.
Because email is generally handled by a 3rd party today, the pool of organizations looking at open source mail transfer agents is self-selected to shops that can take the time to learn the tools.

1: ship with best practices. If the product is aimed at a larger user community, ignoring the problem won't work well. Another approach is to build in expected norms, like the spelling and grammar checkers in modern office suites.
An advanced user will chafe and may turn these features off, but the built-in and automated nature has potential to improve outcomes across the board. That potential is not always realized though, as users can still ignore the tool’s advice.
An outcome of embarrassing typos is one thing, but an outcome of service outage is another. Since there is risk, vendors are incentivized to provide anodyne advice and false-positive prone warnings, which analysts rapidly learn to ignore.

2: invest into a services community and partner ecosystem. No one can teach as well as a person who learned first. Some very successful organizations build passionate communities of educators, developers, and deployment engineers.
Organizations with armies of partners have huge reach compared with more narrowly scoped organizations. However, an army marches on its stomach and all these people have to be paid. The overall cost and complexity for a customer goes up in-line with ecosystem size.

3: invest into machine intelligence. If the data has few outside context problems, a machine intelligence approach can help the analyst answer qualitative questions about the data they’re seeing from the system. Normal, abnormal: no prob! Good, bad: maybe.
It takes effort and risk is not eliminated, so it’s best to think of this as a hybrid between the best-practice and services approaches. Consultants need to help with the implementation at any given customer, and the result is a best practice that needs regular re-tuning.

Perhaps we are seeing a reason why most technology vendors don’t last as independent entities very long.

Monday, July 2, 2018

Dev and Test with Customer Data Samples

The short answer is don’t do it. Accepting customer data samples will only lead to sorrow.


  • Not even if they gave it to you on purpose so you can develop for their use case or troubleshoot their problem.
  • The person who wants you to do that may not be fully authorized to give you that data and you may not be fully authorized to keep it. What if both of you had to explain this transaction to your respective VPs? In advance of a press conference?
  • Even if the customer has changed the names, it’s often possible or even easy to triangulate from outside context and reconstruct damaging data. That’s if they did it right, turning Alice into Bob consistently. More likely they wiped all names and replaced them with XXXXXX, in which case you may be lucky, but probably have garbage instead of data.
  • Even if everyone in the transaction today understands everything that is going on… Norton’s Law is going to get you. Someone else will find this sample after you're gone and do a dumb.
  • Instead of taking data directly, work with your customer to produce a safe sample.


REDUCE THE DATA

At first, you may look at a big data problem as a Volume or Velocity issue, but those are scaling issues that are easily dealt with later. Variety is the hardest part of the equation, so it should be handled first.

Are you working with machine-created logs or human-created documents? 


Logs


  1. Find blocks of data that will trigger concern. Since we do not care about recreating a realistic log stream, we can choose to focus only on these events. If we do want the log stream to be realistic for a sales demo, we will need to consider the non-concerning events too, but finding the parts with sensitive data helps you prioritize.

  2. Identify single line events.
    • TIME HOST Service Started Status OK
  3. Identify multi-line transactional events.
    • TIME HOST Received message 123 from Alice to Bob
    • TIME HOST Processed message 123 Status Clean
    • TIME HOST Delivered message 123 to Bob from Alice
  4. Copy the minimum amount of data necessary to produce a trigger into a new file.


Documents


  1. Find individual blocks of data that will trigger concern: names, identifiers, health information, financial information.

  2. Find patterns and sequences of concerning data. For instance a PDF of a government form is a recognizable container of data, so the header is sufficient indicator that you’ve got a problem file. A submitted Microsoft Word resume might have any format, though. 

  3. Copy the minimum amount of data necessary to produce a trigger into a new file.


TOKENIZE THE DATA

Simply replacing the content of all sensitive data fields with XXXXXX will work for a single event, but it destroys context. Transactions make no sense, interactions between systems make no sense, and it’s impossible to use the data sample for anything but demoware. If you need to produce data for developing or testing a real product, you need transactions and interactions.


  1. In the new files, replace the sensitive data blocks with tokens. 
    1. Use a standard format that can be clearly identified and has a beginning and end, such as ###VARIABLE###.
    2. Be descriptive with your variables: ###USER_EMAIL### or ###PROJECT_NAME### or ###PASSWORD_TEXT### make more sense than ###EMADDR###, ###PJID###, or ###PWD###. Characters are cheap. Getting confused is costly.
    3. Note that you may need to use a sequence number to distinguish multiple actors or attributes. For example, an email transaction has a minimum of two accounts involved, so ###EMAIL_ACCOUNT### is insufficient. ###EMAIL_ACCOUNT_1### and ###EMAIL_ACCOUNT_2### will produce better sample data. 

  2. Use randomly generated text or lorem ipsum to replace non-triggering data as needed. Defining “as needed” can seem more art than science, but as a rule of thumb it’s less important in logs than documents. (A sketch of the tokenizing pass follows this list.)

GENERATE THE TEST FILES

Raw samples as above are now suitable for processing in a tool like EventGen or GoGen. This allows you to safely produce any desired Volume, Velocity, and Variety of realistic events without directly using customer data or creating a triangulation problem.

Sunday, June 24, 2018

From Enterprise to Cloud, Badly

Also posted as a Twitter thread

Sometimes enterprise companies try to go cloudy for bad reasons with bad outcomes. I’m going to talk about those instead of well-planned initiatives with good outcomes because it’s more fun.

So you’ve got an enterprise product and you’re facing pressure to go to the Cloud… after all, that’s regular recurrent revenue instead of up-front tranches, and très chic as well! Unfortunately, there are more ways to fail than there are to succeed. Let’s review.

What is the real motivation? Is it to better serve existing customers or to capture new customers? Companies can rationalize an emotional desire to do something with anecdotal data from cherry-picked customers. It is sad to see reality reasserted by subsequent events.

Where is the data suggesting that existing customers need this, and are you really sure that you’ve captured what they want and planned a move that will help them? Have you all considered that you’re changing the compliance picture for both parties?

Maybe the answer is that you don’t care about existing customers and will leave them on existing product while pursuing new customers. Assuming that the product market fit of your current successful company can be recreated in a new market on a new platform shows some hubris.

If it’s for a new market, what is the plan to break into that market? Does your brand help you there at all? Do you have people who can do it? Is there a plan to avoid losing your existing market? Are you building a new team, or adding this job to your existing one?

Selling enterprise solutions into a market that doesn’t already use them is a very tough job, starting with explaining why there is a problem and ending with why you have a solution. Your enterprise company solves a problem. Does this cloud market have that problem too?

Are you planning to continue producing the on-prem variant or going cloud-only? The textbook answer is to build for one and back-port to the other, which is terrible. The resulting products are suboptimal for one or both environments.

Option: build for on-prem first. Your cloud product is customer one for a new set of features you’re adding to on-prem. Fully automated configurability via API (Application Programming Interface) and/or CLI (Command Line Interface) and/or conf files. Fully UX'ed (User Experience) configuration for admin CRUD (Create Read Update Delete) functions. Really solid RBAC (Role Based Access Control) and AAA (Authentication, Authorization, and Access).

The features required by a Cloud deployment are interesting to the largest enterprise customers, and not so important to anyone else. They want those features because they’re operating a service for internal teams to consume. You’re making your product ready for use by an MSP (Managed Service Provider).

In fact, you probably are one now. There’s a hosted variant of your product, and a team operating it, staffed with non-developers. It's a services company inside of a products company. Welcome to the SHATTERDOME.

This sucks. Everything’s a ticket, ops hates your product, your customers hate ops, and the MSP is not as profitable as the pure software side. Worst, your best customers have jumped into it with both feet because you're owning their ops problems now. Suboptimal.

Another option is to select partners to run this MSP business, but that produces so many weird problems. Loss of customer control, coopetition, revenue recognition, channel stuffing, product release planning, blame games at every step of customer acquisition and retention.

A service-partner MSP model also gives competitors and analysts a pain point to focus on, and opens weird new threat surfaces. But it keeps your software product company's motivations and responsibilities crystal clear! That clarity may not be as obvious to the customer.

That option sucks, so let’s do cloud first and back-port to on-prem. Now you’re building a work-alike second version of your old product, new code with parallel functionality. Contractually you’ve got to keep the old product going, but the A Team has moved on. Hire accordingly.

Cloud first is great! There's all this IaaS (Infrastructure as a Service) stuff you can use! Which doesn't exist on-prem, and differs between platforms. That's okay, there's all these abstraction layers! Which add expense and hurt performance. Do this enough and you're shipping the old product in a VM (Virtual Machine).

Wait a minute. The goals of cloudy micro-service engineering are to enable cost-optimization via component standardization, load-shifting, and multi-tenancy. So we're not going to ship our Cloud product until it does those! Good plan.

Functions only cost money when they're used, customers are getting what they need, done. There's a lot more code and companies between your functionality and the hardware though, and that means more opportunities for failure. Say hi to Murphy.

Object-oriented programming versus functional programming comes to mind. Every micro-service is a black box with APIs describing how it fits into the larger system. Harder to troubleshoot, easier to scale, costly to design and maintain. The right tool for some jobs.

For instance, it’s brilliant for sharing resources between low utilization customers. It’s fabulous for ensuring that AAA and RBAC are designed correctly at the foundation. It’s just right for scaling dependent processes with uneven requirements.

Does that mean it's better for streaming or batching? YMMV (Your Mileage May Vary). What problems can it solve? Anything, given sufficient time and effort. Just like OOP vs FP, the engineering isn't very important as long as it's in service to a business need.

Multi-tenancy assumes large numbers of low-interaction customers. Someone who doesn’t use your product heavily and will self-service their own management needs is a perfect fit, and if you have lots of those someones you can make a profit.

If that isn't what your business looks like, it's silly to hope new customers will appear fast enough to sustain the old business. If your customers are migrating to cloud for SaaS, then you do want this new Cloudy product so you can follow them. Otherwise, why would you do it?

36 person-months later… “The new product is misaligned with our customers!” “Fix sales and marketing?” “We could go back to the drawing board.” “If only we had the same architecture and customer base as a company with a different product market fit!”

Disrupting your own product before someone else does is fine, unless you're doing it too early. It might make more sense to just invest in someone else's company and revisit this idea later.

Monday, June 4, 2018

It's not a platform without partners

Also posted as a Twitter thread

What are the major decisions that a platform needs to make in order to balance incentivizing development vs. maintaining quality and control over their 3rd party app marketplace?

Let's look at this on three scales, in which the right answer for a given team is somewhere between two unrealistic and absolutist extremes.

First decision scale: Allow freeform development or provide a limited toolkit.

A lot of platform vendors assume that everyone building things for them is a developer, because they developed something. These vendors plan for developer support that they can't build, or build stuff that goes unused.

If the people solving problems on a platform are making a living by selling software that they wrote, they're developers. The platform should not proscribe their toolchain choices. They need a freeform environment that lets them do anything, and they don't want safety lines.

They need thorough documentation far more than they need anything else. Seriously, just direct resources to development and tech writing.

If the people solving problems on a platform are making a living by selling or using software that someone else wrote, they are not developers. Call them consultants, integrators, PS, admins, engineers, or architects instead.

Consultants who develop have different needs than developers who consult. They may want to teach their customers to fish, or they may want to be fishmongers, but they're not often trying to create new seafood concepts.

They want an easy way to connect components together to increase value. They want an easy and popular language with lots of community support, libraries of common functions, and simple guardrails that keep things safe and reasonable.

Second decision scale: Allow content dependencies and component reuse or force monolithic apps.

The chaos of extensibility, DLL Hell, a rich ecosystem of shared responsibility and global namespace? A platform that enables connectivity and dependency opens the door to expansion, competition, and growth, at the cost of instability.

The control of stability, bloated monoliths, a statically linked walled garden of singleton solutions? A platform that encourages safety and stability is easy to depend on, at the cost of expensive, repetitive efforts to reach limited solutions.

To consider this difference with more realism and less extremism, compare the Win32 ecosystem of the aughts with the iOS ecosystem of the teens (and note that the former added monolithic containers while the latter added sharing interfaces).

Third decision scale: Allow partners to write closed source or force them to be open source.

The verbs in that sentence are not accidental. A platform has to offer at least some support for closed source, by keeping the source away from the user. Perhaps it goes farther and supports licensing, or brokers purchases for the partners, or not.

Of course, a partner can always choose to post their source code. If the platform only supports a scripting language and the user can just read the JS or PY files, then the partner doesn't have a choice: it's open source.

This scale decides if the possible business models are based on selling software, or selling services. Another way to say that: partners in the ecosystem can grow exponentially at high risk, or linearly at lower risk.

I've matrixed these scales and I don't have great examples for all of the possible combinations, but I do have a suspicion that going above the Bill Gates Line needs freeform development... more thought on that later.

Monday, May 7, 2018

GDPR is great for Facebook and Google

Also posted as a Twitter thread

GDPR is going to be great for Facebook and Google.

"Over time, all data approaches deleted, or public." -- Norton's Law. See http://idlewords.com/talks/haunted_by_data.htm by Pinboard and https://www.youtube.com/watch?v=NKpuX_yzdYs by AndrewYNg for more background and viewpoints.

Picture two types of data store, public and private. If your store is private, you can use it to advantage as suggested by Andrew Ng's talk. If your store is public, everyone can use it and advantage is created in other ways.

Say a company wants to live dangerously and creates a private store of personally identifiable information about people. Say that company suffers a cybersecurity incident and the store of data becomes public. Say that there is no long term negative impact on that company.

Say that this pattern happens over and over again for many years. A sensible executive might infer that it's not so dangerous to live dangerously.

Of course I'm talking about credit card number storage in the early days of web retail, not anything modern :) For all of its issues, PCI changed the picture of credit card storage by putting real financial penalties on the problem.

Now companies either perform the minimum due diligence, with separate cardholder data environments and regular audits, or they outsource card-handling to another company that is focused on this problem.

Putting more and more credit card data into a single store obviously creates a watering hole problem, but it also allows focusing protective efforts. Overall it's a net good. Until that third party hits a rough patch, but entropy is what it is.

Since GDPR has the same impact on a broader set of personal data, it seems likely that the same outcome will eventually occur. Either protect the data yourself, or outsource the problem to a broker.

The broker needs to provide analytics tools so you can do all the market and product research you wanted the data for. It would also be handy if they'd take care of AAA, minimizing the impact of change (name, address, legal status, &c).

And who's in a great position to do all those things already? Google and Facebook.

Monday, April 16, 2018

Crash notifier

Say you're on OSX and working with some C++ software that might crash when bugs are found... and say you don't want to tail or search the voluminous logs just to know whether there are crashes to investigate. The crashes will show up in the Console application, but leaving it open destroys your performance.

The programs I care about all have names starting with XM. This one-liner pops a notification and then opens the crash report in Console, which I can close when I'm done.

Jacks-MacBook-Pro:SCM-Framework jackcoates$ crontab -l | grep Crash
*/60 * * * * find ~/Library/Logs/DiagnosticReports -cmin -60 -name 'xm*' -exec osascript -e 'display notification "{}" with title "Extremely Crashy!"' \; -exec open {} \;

Thursday, April 12, 2018

Where's the Product in AI?

Also posted as a Twitter thread

AI & ML products are harder than they look

AI tech is obviously overhyped, and conflated with ideas from science fiction and religion. I prefer using terms like Machine Intelligence and Cognitive Computing, just to avoid the noise. But if we strip away the most unrealistic stuff, there are some interesting paths forward.

The biggest problem is in defining strong semantic paths from the available data to valid use cases. Many approaches founder on assumptions that the data contains value, that the use case can be solved with the data, or that producer and consumer of data use terms the same way.

Given a strong data system, there is a near term opportunity to build AI-powered toolsets that help customers learn and use the data systems that are available. This is a services heavy business with tight integration to data collection and storage.

This has to be human intelligence driven and therefore services-heavy though, because the data and use cases are not similar between budget-owning organizations. There is data system similarity on low-value stories, but high-value stuff is specific to an organization.

That services work should lead to the real opportunity for cognitive computing, which is augmenting human intelligence in narrow fields. If there is room to abstract the data system, there's room to normalize customers to a tool. Then you've got a product plan, similar to SIEMs.

Put products into fields where the data exists, use cases are clear, the past predicts the future, pattern matching and cohort grouping are effective, the problem has enough value in it to justify effort, and outside context problems don't completely derail your model. Simple!

If you can describe the world in numbers without losing important context, then I can express complex relationships between the numbers.

There's a question being begged though... given a data system that successfully models, how much did the advanced system improve over a simpler approach? Is the DNN just figuring out that 95%-ile outliers are interesting?

If a problem can be solved with machine intelligence, great. If the same problem could be solved with basic statistics, that's cheaper to build, operate, and maintain. It'll be interesting to see how this all shakes out.
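As a back-of-the-envelope check, the simple baseline is embarrassingly cheap to build. Assuming a file of one numeric value per line (values.txt is hypothetical), a 95th-percentile outlier detector is two lines of shell:

p95=$(sort -n values.txt | awk '{a[NR]=$1} END {i=int(NR*0.95); if (i<1) i=1; print a[i]}')
awk -v p95="$p95" '$1 > p95' values.txt

If the expensive model's alerts mostly match that list, the extra layers aren't earning their keep.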

Update: an interesting take on this from Benedict Evans: https://www.ben-evans.com/benedictevans/2018/06/22/ways-to-think-about-machine-learning-8nefy

Update: and another from Raffael Marty: https://raffy.ch/blog/2018/08/07/ai-ml-in-cybersecurity-why-algorithms-are-dangerous/

Wednesday, April 11, 2018

Splunk Apps and Add-ons

see Twitter for discussion

What are Splunk Apps and Add-ons? What's the difference?


If you're still confused... it's not just you. The confusion traces back to fundamental disagreements on approach that are encoded into every product the company has ever shipped, so it's tough to recommend a meaningful change.


Splunk apps are folders in $SPLUNK_HOME/etc/apps. They're containers that you put Splunk objects into. You can put anything in them: code, knowledge management configuration, dashboard elements, libraries, binaries, images, whatever. If you just want to put some stuff together and run it on your laptop, you're done at this point. Put things in a folder for organization. Or don't. Whatever.
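For orientation, a typical app folder looks something like this (my_app is a placeholder, and none of the subfolders are mandatory for a laptop-only app):

$SPLUNK_HOME/etc/apps/my_app/
    default/     shipped .conf files and dashboard XML
    local/       admin overrides; never ship this folder
    metadata/    default.meta, which controls object permissions
    bin/         scripts and modular inputs
    appserver/   static assets for the UI
    lookups/     CSV lookup tables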

If you want to distribute components in a large environment, if you want to depend on shared components, if you want to avoid huge multi-function monoliths, then you start dividing apps into different types. This is why you see the terms "App" and "Add-on" in Splunk. The App refers to the visible front-end app that a user will interact with. The Add-on refers to administrator-only components. This is where the Splexicon definitions start and stop.

There are multiple types of Add-ons. Their definitions are not entirely well established, and have come and gone in official documentation. Right now, it's here, but don't be surprised if that breaks:


Since I helped to write these definitions in the first place, I feel confident in stating what they should be. However, these rules are breached as often as they are observed, and Splunk themselves are the most likely to ignore all of this guidance. If you want to follow the best possible practice, buy Kyle Smith's book and read that. Here are the possible types:

IA: Input Add-on

This includes and configures data collection inputs only. In practice, these are rare and the functionality is usually stuffed into a TA.
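A minimal IA could be little more than an inputs.conf; this sketch uses an invented vendor log path and sourcetype:

[monitor:///var/log/myvendor/access.log]
sourcetype = myvendor:access
index = main
disabled = 0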


TA: Technology Add-on

This includes and configures knowledge management objects. In practice, many TAs also include data collection inputs. A TA would be able to translate the field names provided by a vendor to field names expected by your users, as well as recognize and tag specific event types.
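Here's a sketch of that translation using an invented myvendor:access sourcetype: props.conf aliases the vendor's field name, while eventtypes.conf and tags.conf handle the recognizing and tagging.

props.conf:
[myvendor:access]
FIELDALIAS-src = srcip AS src

eventtypes.conf:
[myvendor_authentication]
search = sourcetype=myvendor:access

tags.conf:
[eventtype=myvendor_authentication]
authentication = enabled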


SA: Supporting Add-on

This includes supporting libraries and searches needed to manage a class of data. Let's say we're building a security monitor and considering whether authentication attempts seem malicious or not. An SA could include lookup and summary generators to normalize and aggregate the data from many authentication systems and ask generic questions for reporting and alerting; a sketch of such a generator follows the list below.

  • Example: https://splunkbase.splunk.com/app/1621/ 
  • Goes on Search Heads
  • You should absolutely have savedsearches.conf
  • It would make sense to include lookups and some dashboards, prebuilt panels, modular alerts, modular visualizations
  • Some SAs include all the IA stuff mentioned above.
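For instance, the summary generator in an SA's savedsearches.conf might look like this (the stanza name, schedule, and summary index are invented for illustration):

[Summary - Authentication Attempts]
search = tag=authentication | stats count by action, src, user
cron_schedule = */60 * * * *
enableSched = 1
dispatch.earliest_time = -65m@m
dispatch.latest_time = now
action.summary_index = 1
action.summary_index._name = summary_auth

The scheduled search rolls raw authentication events into a summary index, so anything built on top can ask who failed to log in where without re-scanning the raw data.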

DA: Domain Add-on

This includes supporting libraries and searches needed to manage a domain of problems. Let's say we're considering PCI requirement 4, focused on antivirus software being present, configured, and not reporting infections. A DA might include lookup and summary generators to prepare those answers, dashboards to investigate further, and correlation searches to alert on problems.
  • Example: https://splunkbase.splunk.com/app/2897/ (the "dirty dozen" PCI requirements that can be measured from machine data are each represented with a DA)
  • Goes on Search Heads
  • You should absolutely include dashboards, prebuilt panels, modular alerts, modular visualizations
  • It would make sense to include lookups and savedsearches.conf

And so finally, the App.

App

The front end that ties it all together and makes it usable. If it's done well, users have no idea that everything before this point was ever involved. This goes on search heads only.


Wednesday, March 21, 2018

Windex: Find Splunk apps that have index time operations

If a Splunk app has index-time operations, it has to be installed on the first heavy forwarder or indexer to perform those operations on the data that's coming in. If it doesn't have those operations, then it only needs to be installed on the search head to perform its search time operations on the data that's found.
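For concreteness, here's what the difference looks like in a props.conf (the sourcetype and transform names are hypothetical):

[myvendor:access]
# index-time: must run on the first heavy forwarder or indexer
TRANSFORMS-set_host = myvendor_set_host
# search-time: only needs to live on the search head
EXTRACT-user = user=(?<user>\w+)

...and the matching transforms.conf stanza:

[myvendor_set_host]
REGEX = host=(\w+)
DEST_KEY = MetaData:Host
FORMAT = host::$1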

Simple, right?

There is no comprehensive list of index-time operations.


So a few years ago I got annoyed after asking for such a list for the hundredth time or so, and I banged out a script that would answer the question. One caution is that there might be new index time operations since I wrote the script.

#!/bin/bash

# Script to figure out if index-time extractions are done.
# Run "./windex.sh | sort | uniq"
# Note that Bash is required.

# Online at https://pastebin.com/JVPsqcCV

# TODO: command line argument to set path instead of hard-coding ./splunk/etc/apps
# TODO: print the offending line number too?

echo "These add-ons have index-time field extractions."
echo "================================================"

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setadefaulthostforaninput
# Add-on sets host field.

echo "-----------------------------"
echo "Add-ons which set host field:"
echo "-----------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v \.old
## Question at hand ##
#| xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#"
## the resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v \.old | xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| uniq


# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Bypassautomaticsourcetypeassignment
# Add-on sets sourcetype field.

echo "---------------------------------------------------------------------------"
echo "Add-ons which set sourcetype field (ignoring the old school eventgen ones):"
echo "---------------------------------------------------------------------------"

## Sets sourcetype at all ##
## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old
## Question at hand ##
#| xargs egrep '^sourcetype|sourcetype::' | grep -v "#"
## Resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq | sort

## Sets sourcetype for the old school eventgen ##
## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old
## Question at hand ##
# | xargs grep -A1 -e "^\[source::.*\]"| grep sourcetype
## the resulting list of add-ons ##
# | awk '{FS="/"; print $4}'| uniq

## In the first list but not in the second list ##
# comm -23 <(list1) <(list2)

comm -23 <(find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old | xargs egrep '^sourcetype|sourcetype::' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| sort | uniq) <(find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old | xargs grep -A1 -e "^\[source::.*\]"| grep sourcetype | awk 'BEGIN {FS="/"}; {print $4}' | sort | uniq)

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureindex-timefieldextraction
# Add-on uses TRANSFORMS- statement in props.conf.

echo "-------------------------------------------------"
echo "Add-ons which use an explicit TRANSFORMS- stanza:"
echo "-------------------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^TRANSFORMS- | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^TRANSFORMS- | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Extractfieldsfromfileswithstructureddata
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on uses indexed extractions

echo "--------------------------------------"
echo "Add-ons which use Indexed Extractions:"
echo "--------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^INDEXED_EXTRACTIONS -e FIELD_DELIMITER | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs egrep '^INDEXED_EXTRACTIONS|FIELD_DELIMITER' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Handleeventtimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/HowSplunkextractstimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on sets timestamp

echo "--------------------------------"
echo "Add-ons which assign timestamps:"
echo "--------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^TIME_FORMAT | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^TIME_FORMAT | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureeventlinebreaking
# Add-on sets line breaking

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "--------------------------------------"
echo "Add-ons which configure line breaking:"
echo "--------------------------------------"

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Abouteventsegmentation
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setthesegmentationforeventdata
# Add-on sets segmentation behavior

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^SEGMENTATION | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "-------------------------------------------"
echo "Add-ons which configure event segmentation:"
echo "-------------------------------------------"

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^SEGMENTATION | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "==============================================================="
echo "That's all as of 6.3 (Ember). Future Splunks may change things."