Sunday, August 12, 2018

Merger & Acquisition Failures

Also available on Twitter.

Sometimes when two companies love each other very much... Companies buy other companies. Maybe it’s to pump marketshare or shut down competition. Sounds like a boring transaction as long as regulators don’t mind. Or maybe it’s to get technology and people.

Those are exciting projects, full of hope and dreams. And yet, so much of the time the technology is shelved and the people quit. Why is that? Because acquisition alters the delicate chemistry of teamwork and product-market-fit.

Maybe the acquired company continues to operate as a wholly own subsidiary and little changes for a long time. Or maybe the acquired company is quickly integrated into the mother ship so that all that goodness can be utilized ASAP.

I’m no expert on corporate acquisition, but I’ve had a front row seat for some of these. A few of them could even be called successful. Let’s generalize heavily from after-hours conversations that clearly have nothing to do with any of my previous employers.

The fateful day has come! Papers are signed, the Tent of Secrecy is taken down, and the press release is out. Acquiring teams are answering questions and testing defenses. They’ve got to retain key team members, integrate technology, and align the sales teams before blood spills.

At the same time, they’ve drawn political attention and are certainly facing some negative buzz. In a really challenging environment, they’re also facing coup attempts. M&A is as hard as launching companies, so it’s easy for others to snipe at.

Meanwhile, acquired teams are all over the emotional map. Excited, sad, suddenly rich, furious at how little they’re getting. Are friends now redundant, immediately or “soon”? Who's reviewing retention plans on a key team members list, and who's not: it won’t be private for long.

After an acquisition one might assume headhunter attention. When better to check in on someone’s emotional state and promise greener grass? Churn commences. The network starts to buzz, people are distracted, and some leave.  Of course, lots stay!

And maybe the folks that stay for retention bonus are a little more conservative. Bird in the hand, part of a bigger safer company, and there’s so much opportunity because everyone else in the big company is beat down and burned out. Sour like old pickles.

It seems that there’s more engineers and salespeople that make it through the acquisition. The acquired executives disappear into other projects or out of the company. Resting and vesting, pivoted into something new, but unlikely they’re still guiding their old team. Who is?

The middle managers who stay all drift up to fill recently vacated executive slots, where they either grow or flame out. Their attention is diffused into new teams and new problems. PMO steps in heavily, since the acquired company didn’t have one. Devs are largely on their own.

Nature abhors a vacuum, and someone steps in to fill this one. With luck they maintain product-market-fit and mesh with internal requirements. Or they fail and introduce interpersonal conflict to boot. Is this is the end of the acquisition road? Or does engineering lead itself?

There’s bugs to fix, and everyone knows what the old company was planning. The acquiring company has lots of new requirements too, like ADA compliance and Asian language support. Who needs customer and market input anyway? After a while the old roadmap is consumed.

There’s layoffs of course, and new requirements keep coming. “Please replace these old frameworks with a more modern workalike.” “Please rescue this customer with a technical fix for their social problems.” “Please do something about our new global vaporware initiative.”

The challenge of doing more with less is sort of fun. There’s some friends left. And the acquired person feels big. They talk with important customers and executives and they can spend more time on their home life. More folks have left, the remaining acquirees are authorities.

But the recruiter calls stopped. A temporary market slowdown, or is it personal? Can they get a job outside of the big company anymore? So they reach out and do a few interviews, pass up on some lower-paying opportunities, get shot down by something cool.

Better take more projects in the big company. By now the tech they came in with has lost its shine. Put a cucumber in brine long enough and it’s just another pickle. They’re helping new engineers with weird tools and pursuing new hobbies.

The street cred of being from an acquisition is gone, and they’re neck deep in big dull projects. “Lipstick this pig until it looks fashionable.” “Squeeze more revenue from the customer base.” “Tie yourself into the latest silly vaporware.”

Or even “Propose an acquisition to enter a new market with.” If this is success, who needs competition? When the game is no fun but you have to keep playing, people will change the rules — and that is whycome politics suck. Good luck out there, but don't stay too safe.

Sunday, July 22, 2018

Tools and the Analyst

also posted as a Twitter thread

Let’s say I’m responsible for a complex system. I might have a lot of titles, but for a big part of my job I’m an analyst of that system. I need tools to help me see into it and change its behavior. As an analyst with a tool, I have some generic use cases the tool needs to meet.

  • Tell me how things are right now
    • What is the state?
    • Is it changing?
  • Tell me how things have been over time?
    • What is the state?
    • Is there any change in progress?
    • Is the state normal?
    • Is the state good/bad/indifferent/unknown?
  • Tell me what I'm supposed to know
    • What is important?
    • What should I mitigate?
    • What can I ignore?
  • Alert me when something needs me
    • What is the problem?
    • What is the impact?
    • Are there any suggested actions?
  • How much can I trust this tool?
    • Do I see outside context changes reflected in it?
    • How does the information it gives me compare with what I see in other tools?
  • How much can I share this tool?
    • Do I understand it well enough to teach it?
    • Can I defend it?
As a generic set of use cases, this is equivalent to the old sysadmin joke, “go away or I will replace you with a small shell script”. A tool that can provide that level of judgement is also capable of doing the analyst’s job. So a lot of tools stop well short of that lofty goal and let the user fill in a great deal.

  • Alert me when a condition is met
  • Tell me how things are right now
  • Tell me how things have been over time?

Maybe the analyst can tie the tool’s output to something else that tries to fill in more meaningful answers, or maybe they just do all of that in their own heads. This is fine at the early adopter end of Geoffrey Moore’s chasm, and many vendors will stare blankly at you if you ask for more.

After all, their customers are using it now! And besides, how could they add intelligence, they don’t know how you want to use their tool? They don’t know your system. But let’s get real, the relationships between customers, vendors, tools, analysts, and systems are not stable.

The system will change, the customer’s goals will change, and the analyst won’t stay with this tool. Even if everything else stays stable, experienced analysts move on to new problems and are replaced by new folks who need to learn.

The result is that tools mature and their user communities shift, growing into mainstream adopters and becoming a norm instead of an outlier. By the time your tool is being introduced to late adopters, it needs to be able to teach a green analyst how to do the job at hand.

How’s that going to work? Here’s a few ideas:

0: ignore the problem. There’s always a cost:benefit analysis to doing something, and nature abhors a vacuum. If a vendor does nothing, perhaps the customer will find it cost-effective to solve the problem instead.
Look at open source software packages aimed into narrow user communities, such as email transfer. Learning to use the tools is a rite of passage to doing the job. This only works because of email-hosting services though.
Because email is generally handled by a 3rd party today, the pool of organizations looking at open source mail transfer agents is self-selected to shops that can take the time to learn the tools.

1: ship with best practices. If the product is aimed at a larger user community, ignoring the problem won't work well. Another approach is to build in expected norms, like the spelling and grammar checkers in modern office suites.
An advanced user will chafe and may turn these features off, but the built-in and automated nature has potential to improve outcomes across the board. That potential is not always realized though, as users can still ignore the tool’s advice.
An outcome of embarrassing typos is one thing, but an outcome of service outage is another. Since there is risk, vendors are incentivized to provide anodyne advice and false-positive prone warnings, which analysts rapidly learn to ignore.

2: invest into a services community and partner ecosystem. No one can teach well as a person who learned first. Some very successful organizations build passionate communities of educators, developers, and deployment engineers.
Organizations with armies of partners have huge reach compared with more narrowly scoped organizations. However, an army marches on its stomach and all these people have to be paid. The overall cost and complexity for a customer goes up in-line with ecosystem size.

3: invest into machine intelligence. If the data has few outside context problems, a machine intelligence approach can help the analyst answer qualitative questions about the data they’re seeing from the system. Normal, abnormal: no prob! Good, bad: maybe.
It takes effort and risk is not eliminated, so it’s best to think of this as a hybrid between the best-practice and services approaches. Consultants need to help with the implementation at any given customer, and the result is a best practice that needs regular re-tuning.

Perhaps we are seeing a reason why most technology vendors don’t last as independent entities very long.

Monday, July 2, 2018

Dev and Test with Customer Data Samples

The short answer is don’t do it. Accepting customer data samples will only lead to sorrow.


  • Not even if they gave it to you on purpose so you can develop for their use case or troubleshoot their problem.
  • The person who wants you to do that may not be fully authorized to give you that data and you may not be fully authorized to keep it. What if both of you had to explain this transaction to your respective VPs? In advance of a press conference?
  • Even if the customer has changed the names, it’s often possible or even easy to triangulate from outside context and reconstruct damaging data. That’s if they did it right, turning Alice into Bob consistently. More likely they wiped all names and replaced them with XXXXXX, in which case you may be lucky, but probably have garbage instead of data.
  • Even if everyone in the transaction today understands everything that is going on… Norton’s Law is going to get you. Someone else will find this sample after you're gone and do a dumb.
  • Instead of taking data directly, work with your customer to produce a safe sample.


REDUCE THE DATA

At first, you may look at a big data problem as a Volume or Velocity issue, but those are scaling issues that are easily dealt with later. Variety is the hardest part of the equation, so it should be handled first.

Are you working with machine-created logs or human-created documents? 


Logs


  1. Find blocks of data that will trigger concern. Since we do not care about recreating a realistic log stream, we can chose to focus only on these events. If we do want the log stream to be realistic for a sales demo, we will need to consider the non-concerning events too, but finding the parts with sensitive data helps you prioritize.

  2. Identify single line events.
    • TIME HOST Service Started Status OK
  3. Identify multi-line transactional events.
    • TIME HOST Received message 123 from Alice to Bob
    • TIME HOST Processed message 123 Status Clean
    • TIME HOST Delivered message 123 to Bob from Alice
  4. Copy the minimum amount of data necessary to produce a trigger into a new file.


Documents


  1. Find individual blocks of data that will trigger concern: names, identifiers, health information, financial information.

  2. Find patterns and sequences of concerning data. For instance a PDF of a government form is a recognizable container of data, so the header is sufficient indicator that you’ve got a problem file. A submitted Microsoft Word resume might have any format, though. 

  3. Copy the minimum amount of data necessary to produce a trigger into a new file.


TOKENIZE THE DATA

Simply replacing the content of all sensitive data fields with XXXXXX will work for a single event, but it destroys context. Transactions make no sense, interactions between systems make no sense, and it’s impossible to use the data sample for anything but demoware. If you need to produce data for developing or testing a real product, you need transactions and interactions.


  1. In the new files, replace the sensitive data blocks with tokens. 
    1. Use a standard format that can be clearly identified and has a beginning and end, such as ###VARIABLE###.
    2. Be descriptive with your variables: ###USER_EMAIL### or ###PROJECT_NAME### or ###PASSWORD_TEXT### make more sense than ###EMADDR###, ###PJID###, or ###PWD###. Characters are cheap. Getting confused is costly.
    3. Note that you may need to use a sequence number to distinguish multiple actors or attributes. For example, an email transaction has a minimum of two accounts involved, so ###EMAIL_ACCOUNT### is insufficient. ###EMAIL_ACCOUNT_1### and ###EMAIL_ACCOUNT_2### will produce better sample data. 

  2. Use randomly generated text or lorem ipsum to replace non-triggering data as needed.
 Defining "as needed" can seem more art than science, but as a rule of thumb it's less important in logs than documents.

GENERATE THE TEST FILES

Raw samples as above are now suitable for processing in a tool like EventGen or GoGen. This allows you to safely produce any desired Volume, Velocity, and Variety of realistic events without directly using customer data or creating a triangulation problem.

Sunday, June 24, 2018

From Enterprise to Cloud, Badly

Also posted as a Twitter thread

Sometimes enterprise companies try to go cloudy for bad reasons with bad outcomes. I’m going to talk about those instead of well-planned initiatives with good outcomes because it’s more fun.

So you’ve got an enterprise product and you’re facing pressure to go to the Cloud… after all, that’s regular recurrent revenue instead of up-front tranches, and très chic as well! Unfortunately, there are more ways to fail than there are to succeed. Let’s review.

What is the real motivation? Is it to better serve existing customers or to capture new customers? Companies can rationalize an emotional desire to do something with anecdotal data from cherry-picked customers. It is sad to see reality reasserted by subsequent events.

Where is the data suggesting that existing customers need this, and are you really sure that you’ve captured what they want and planned a move that will help them? Have you all considered that you’re changing the compliance picture for both parties?

Maybe the answer is that you don’t care about existing customers and will leave them on existing product while pursuing new customers. Assuming that the product market fit of your current successful company can be recreated in a new market on a new platform shows some hubris.

If it’s for a new market, what is the plan to break into that market? Does your brand help you there at all? Do you have people who can do it? Is there a plan to avoid losing your existing market? Are you building a new team, or adding this job to your existing one?

Selling enterprise solutions into a market that doesn’t already use them is a very tough job, starting with explaining why there is a problem and ending with why you have a solution. Your enterprise company solves a problem. Does this cloud market have that problem too?

Are you planning to continue producing the on-prem variant or going cloud-only? The textbook answer is to build for one and back-port to the other, which is terrible. The resulting products are suboptimal for one or both environments.

Option: build for on-prem first. Your cloud product is customer one for a new set of features you’re adding to on-prem. Fully automated configurability via API (Application Programming Interface) and/or CLI (Command Line Interface) and/or conf files. Fully UX'ed (User Experience) configuration for admin CRUD (Create Read Update Delete) functions. Really solid RBAC (Role Based Access Control) and AAA (Authentication, Authorization, and Access).

The features required by a Cloud deployment are interesting to the largest enterprise customers, and not so important to anyone else. They want those features because they’re operating a service for internal teams to operate. You’re making your product ready for use by an MSP (Managed Service Provider).

In fact, you probably are one now. There’s a hosted variant of your product, and a team operating it, staffed with non-developers. It's a services company inside of a products company. Welcome to the SHATTERDOME.

This sucks. Everything’s a ticket, ops hates your product, your customers hate ops, and the MSP is not as profitable as the pure software side. Worst, your best customers have jumped into it with both feet because you're owning their ops problems now. Suboptimal.

Another option is to select partners to run this MSP business, but that produces so many weird problems. Loss of customer control, coopetition, revenue recognition, channel stuffing, product release planning, blame games at every step of customer acquisition and retention.

A service-partner MSP model also gives competitors and analysts a pain point to focus on and open weird new threat surfaces. But it keeps your software product company’s motivations and responsibilities crystal clear! That clarity may not be as obvious to the customer.

That option sucks, so let’s do cloud first and back-port to on-prem. Now you’re building a work-alike second version of your old product, new code with parallel functionality. Contractually you’ve got to keep the old product going, but the A Team has moved on. Hire accordingly.

Cloud first is great! There's all this IaaS (Infrastructure as a Service) stuff you can use! Which doesn't exist on-prem, and differs between platforms. That's okay, there's all these abstraction layers! Which add expense and hurt performance. Do this enough and you're shipping the old product in a VM (Virtual Machine).

Wait a minute. The goals of cloudy micro-service engineering are to enable cost-optimization via component standardization, load-shifting, and multi-tenancy. So we're not going to ship our Cloud product until it does those! Good plan.

Functions are costing money when they’re used, customers are getting what they need, done. There’s a lot more code and companies between your functionality and the hardware though, and that means more opportunities for failure. Say hi to Murphy.

Object-oriented programming versus functional programming comes to mind. Every micro-service is a black box with APIs describing how it fits into the larger system. Harder to troubleshoot, easier to scale, costly to design and maintain. The right tool for some jobs.

For instance, it’s brilliant for sharing resources between low utilization customers. It’s fabulous for ensuring that AAA and RBAC are designed correctly at the foundation. It’s just right for scaling dependent processes with uneven requirements.

Does that means it’s better for streaming or batching? YMMV (Your Mileage May Vary). What problems can it solve? Anything, given sufficient time and effort. Just like OOP vs FP, the engineering isn’t very important as long as it’s in service to a business need.

Multi-tenancy assumes large numbers of low-interaction customers. Someone who doesn’t use your product heavily and will self-service their own management needs is a perfect fit, and if you have lots of those someones you can make a profit.

If that isn’t what your business looks like, it’s silly to hope new customers will appear fast enough for the old business. If you customers are migrating to cloud for SaaS, then you do want this new Cloudy product so you can follow them. Otherwise, why would you do it.

36 person-months later… “The new product is misaligned with our customers!” “Fix sales and marketing?” “We could go back to the drawing board.” “If only we had the same architecture and customer base as a company with a different product market fit!”

Disrupting your own product before someone else does is fine, unless you're doing it too early. It might make more sense to just invest in some one else’s company and revisit this idea later.

Monday, June 4, 2018

It's not a platform without partners

Also posted as a Twitter thread

What are the major decisions that a platform needs to make in order to balance incentivizing development vs. maintaining quality and control over their 3rd party app marketplace?

Let's look at this on three scales, in which the right answer for a given team is somewhere between two unrealistic and absolutist extremes.

First decision scale: Allow freeform development or provide a limited toolkit.

A lot of platform vendors assume that everyone building things for them is a developer, because they developed something. These vendors plan for developer support that they can't build, or build stuff that goes unused.

If the people solving problems on a platform are making a living by selling software that they wrote, they're developers. The platform should not proscribe their toolchain choices. They need a freeform environment that lets them do anything, and they don't want safety lines.

They need thorough documentation far more than they need anything else. Seriously, just direct resources to development and tech writing.

If the people solving problems on a platform are making a living by selling or using software that someone else wrote, they are not developers. Call them consultants, integrators, PS, admins, engineers, or architects instead.

Consultants who develop have different needs than developers who consult. They may want to teach their customers to fish, or they may want to be fishmongers, but they're not often trying to create new seafood concepts.

They want an easy way to connect components together to increase value. They want an easy and popular language with lots of community support, libraries of common functions, and simple guardrails that keep things safe and reasonable.

Second decision scale: Allow content dependencies and component reuse or force monolithic apps.

The chaos of extensibility, DLL Hell, a rich ecosystem of shared responsibility and global namespace? A platform that enables connectivity and dependency opens the door to expansion, competition, and growth, at the cost of instability.

The control of stability, bloated monoliths, a statically linked walled garden of singleton solutions? A platform that encourages safety and stability is easy to depend on, at the cost of expensive, repetitive efforts to reach limited solutions.

To consider this difference with more realism and less extremism, compare the Win32 ecosystem of the aughts with the IOS ecosystem of the teens (and note that the former added monolithic containers while the latter added sharing interfaces).

Third decision scale: Allow partners to write closed source or force them to be open source.

The verbs in that are not accidental. A platform has to offer a least some support for closed source, by keeping the source away from the user. Perhaps it goes farther and supports licensing, or brokers purchase for the partners, or not.

Of course, a partner can always choose to post their source code. If the platform only supports an scripting language and the user can just read the JS or PY files, then the partner doesn't have a choice: it's open source.

This scale decides if the possible business models are based on selling software, or selling services. Another way to say that: partners in the ecosystem can grow exponentially at high risk, or linearly at lower risk.

I've matrixed these scales and I don't have great examples for all of the possible combinations, but I do have a suspicion that going above the Bill Gates Line needs freeform development... more thought on that later.

Monday, May 7, 2018

GDPR is great for Facebook and Google

Also posted as a Twitter thread

GDPR is going to be great for Facebook and Google.

"Over time, all data approaches deleted, or public." -- Norton's Law. See http://idlewords.com/talks/haunted_by_data.htm by Pinboard and https://www.youtube.com/watch?v=NKpuX_yzdYs by AndrewYNg for more background and viewpoints.

Picture two types of data store, public and private. If your store is private, you can use it to advantage as suggested by Andrew Ng's talk. If your store is public, everyone can use it and advantage is created in other ways.

Say a company wants to live dangerously and creates a private store of personally identifiable information about people. Say that company suffers a cybersecurity incident and the store of data becomes public. Say that there is no long term negative impact on that company.

Say that this pattern happens over and over again for many years. A sensible executive might infer that it's not so dangerous to live dangerously.

Of course I'm talking about credit card number storage in the early days of web retail, not anything modern :) For all of its issues, PCI changed the picture of credit card storage by putting real financial penalties on the problem.

Now companies either perform the minimum due diligence, with separate cardholder data environments and regular audits, or they outsource card-handling to another company that is focused on this problem.

Putting more and more credit card data into a single store obviously creates a watering hole problem, but it also allows focusing protective efforts. Overall it's a net good. Until that third party hits a rough patch, but entropy is what it is.

Since GDPR has the same impact on a broader set of personal data, it seems likely that the same outcome will eventually occur. Either protect the data yourself, or outsource the problem to a broker.

The broker needs to provide analytics tools so you can do all the market and product research you wanted the data for. It would also be handy if they'd take care of AAA, minimizing the impact of change (name, address, legal status, &c).

And who's in a great position to do all those things already? Google and Facebook.

Monday, April 16, 2018

Crash notifier

Say you're on OSX and working with some C++ software that might crash when bugs are found... and say you don't want to tail or search the voluminous logs just to know whether there's crashes to investigate. The crashes will show up in the Console application, but it destroys your performance to leave it open.

The programs I care about all have names starting with XM. This one-liner pops a notification and then opens the crash report in Console, which I can close when I'm done.

Jacks-MacBook-Pro:SCM-Framework jackcoates$ crontab -l | grep Crash
*/60 * * * * find ~/Library/Logs/DiagnosticReports -cmin -60 -name xm* -exec osascript -e 'display notification "{}" with title "Extremely Crashy!"' \; -exec open {} \;

Thursday, April 12, 2018

Where's the Product in AI?

Also posted as a Twitter thread

AI & ML products are harder than they look

AI tech is obviously overhyped, and conflating with ideas from science fiction and religion. I prefer using terms like Machine Intelligence and Cognitive Computing, just to avoid the noise. But if we strip away the most unrealistic stuff, there's some interesting paths forward.

The biggest problem is in defining strong semantic paths from the available data to valid use cases. Many approaches founder on assumptions that the data contains value, that the use case can be solved with the data, or that producer and consumer of data use terms the same way.

Given a strong data system, there is a near term opportunity to build AI-powered toolsets that help customers learn and use the data systems that are available. This is a services heavy business with tight integration to data collection and storage.

This has to be human intelligence driven and therefore services-heavy though, because the data and use cases are not similar between budget-owning organizations. There is data system similarity on low-value stories, but high-value stuff is specific to an organization.

That services work should lead to the real opportunity for cognitive computing, which is augmenting human intelligence in narrow fields. If there is room to abstract the data system, there's room to normalize customers to a tool. Then you've got a product plan, similar to SIEMs.

Put products into fields where the data exists, use cases are clear, the past predicts the future, pattern matching and cohort grouping are effective, the problem has enough value in it to justify effort, and outside context problems don't completely derail your model. Simple!

If you can describe the world in numbers without losing important context, then I can express complex relationships between the numbers.

There's a question being begged though... given a data system that successfully models, how much did the advanced system improve over a simpler approach? Is the DNN just figuring out that 95%-ile outliers are interesting?

If a problem can be solved with machine intelligence, great. If the same problem could be solved with basic statistics, that's cheaper to build, operate, and maintain. It'll be interesting to see how this all shakes out.

Update: an interesting take on this from Benedict Evans: https://www.ben-evans.com/benedictevans/2018/06/22/ways-to-think-about-machine-learning-8nefy

Update: and another from Raffael Marty: https://raffy.ch/blog/2018/08/07/ai-ml-in-cybersecurity-why-algorithms-are-dangerous/

Wednesday, April 11, 2018

Splunk Apps and Add-ons

see Twitter for discussion

What are Splunk Apps and Add-ons? What's the difference?


If you're still confused... it's not just you. The confusion roots back to fundamental disagreements on approach that are encoded into every product the company has ever shipped, so it's tough to recommend a meaningful change.


Splunk apps are folders in $SPLUNK_HOME/etc/apps. They're containers that you put splunk objects into. You can put anything in them: code, knowledge management configuration, dashboard elements, libraries, binaries, images, whatever. If you just want to put some stuff together and run it on your laptop, you're done at this point. Put things in a folder for organization. Or don't. Whatever.

If you want to distribute components in a large environment, if you want to depend on shared components, if you want to avoid huge multi-function monoliths, then you start dividing apps into different types. This is why you see the terms "App" and "Add-on" in Splunk. The App refers to the visible front-end app that a user will interact with. The Add-on refers to administrator-only components. This is where the Splexicon definitions start and stop.

There are multiple types of Add-ons. Their definitions are not entirely well established, and have come and gone in official documentation. Right now, it's here, but don't be surprised if that breaks:


Since I helped to write these definitions in the first place, I feel confident in stating what they should be. However, these rules are breached as often as they are observed, and Splunk themselves are the most likely to ignore all of this guidance. If you want to follow the best possible practice, buy Kyle Smith's book and read that. Here are the possible types:

IA: Input Add-on

This includes and configures data collection inputs only. In practice, these are rare and the functionality is usually stuffed into a TA.


TA: Technology Add-on

This includes and configures knowledge management objects. In practice, many TA's also include data collection inputs. A TA would be able to translate the field names provided by a vendor to field names expected by your users, as well as recognizing and tagging specific event types.


SA: Supporting Add-on

This includes supporting libraries and searches needed to manage a class of data. Let's say we're building a security monitor and considering whether authentication attempts seem malicious or not. An SA could include lookup and summary generators to normalize and aggregate the data from many authentication systems and ask generic questions for reporting and alerting.

  • Example: https://splunkbase.splunk.com/app/1621/ 
  • Goes on Search Heads
  • You should absolutely have savedsearches.conf
  • It would make sense to include lookups and some dashboards, prebuilt panels, modular alerts, modular visualizations
  • Some SA's include all the IA stuff mentioned above.

DA: Domain Add-on

This includes supporting libraries and searches needed to manage a domain of problems. Let's say we're considering PCI requirement 4, focused on antivirus software being present, configured, and not reporting infections. A DA might include lookup and summary generators to prepare those answers, dashboards to investigate further, and correlation searches to alert on problems.
  • Example: https://splunkbase.splunk.com/app/2897/ (the "dirty dozen" PCI requirements that can be measured from machine data are each represented with a DA)
  • Goes on Search Heads
  • You should absolutely include dashboards, prebuilt panels, modular alerts, modular visualizations
  • It would make sense to include lookups and savedsearches.conf
And so finally, the App.

App

The front end that ties it all together and makes it usable. If it's done well, users have no idea everything before this was ever involved. This goes on search heads only.


Wednesday, March 21, 2018

Windex: Find Splunk apps that have index time operations

If a Splunk app has index-time operations, it has to be installed on the first heavy forwarder or indexer to perform those operations on the data that's coming in. If it doesn't have those operations, then it only needs to be installed on the search head to perform its search time operations on the data that's found.

Simple right?

There is no comprehensive list of index-time operations.


So a few years ago I got annoyed after asking for such a list for the hundredth time or so, and I banged out a script that would answer the question. One caution is that there might be new index time operations since I wrote the script.

#!/bin/bash

# Script to figure out if index-time extractions are done.
# Run "./windex.sh | sort | uniq"
# Note that Bash is required.

# Online at https://pastebin.com/JVPsqcCV

# TODO: command line argument to set path instead of hard-coding ./splunk/etc/apps
# TODO: print the offending line number too?

echo "These add-ons have index-time field extractions."
echo "================================================"

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setadefaulthostforaninput
# Add-on sets host field.

echo "-----------------------------"
echo "Add-ons which set host field:"
echo "-----------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v \.old
## Question at hand ##
#| xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#"
## the resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq

find splunk/etc/apps/ -name *.conf | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v \.old | xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| uniq


# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Bypassautomaticsourcetypeassignment
# Add-on sets sourcetype field.

echo "---------------------------------------------------------------------------"
echo "Add-ons which set sourcetype field (ignoring the old school eventgen ones):"
echo "---------------------------------------------------------------------------"

## Sets sourcetype at all ##
## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v .\old
## Question at hand ##
#| xargs egrep '^sourcetype|sourcetype::' | grep -v "#"
## Resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq | sort

## Sets sourcetype for the old school eventgen ##
## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old
## Question at hand ##
# | xargs grep -A1 -e "^\[source::.*\]"| grep sourcetype
## the resulting list of add-ons ##
# | awk '{FS="/"; print $4}'| uniq

## In the first list but not in the second list ##
# comm -23 <(list1) <(list2)

comm -23 <(find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old | xargs egrep '^sourcetype|sourcetype::' | grep -v "#" | awk '{FS="/"; print $4}'| sort | uniq) <(find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old | xargs grep -A1 -e "^\[source::.*\]"| grep sourcetype | awk 'BEGIN {FS="/"}; {print $4}' | sort | uniq)

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureindex-timefieldextraction
# Add-on uses TRANSFORMS- statement in props.conf.

echo "-------------------------------------------------"
echo "Add-ons which use an explicit TRANSFORMS- stanza:"
echo "-------------------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^TRANSFORMS- | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^TRANSFORMS- | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Extractfieldsfromfileswithstructureddata
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on uses indexed extractions

echo "--------------------------------------"
echo "Add-ons which use Indexed Extractions:"
echo "--------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v old
## Question at hand ##
# | xargs grep -e ^INDEXED_EXTRACTIONS -e FIELD_DELIMITER | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs egrep '^INDEXED_EXTRACTIONS|FIELD_DELIMITER' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Handleeventtimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/HowSplunkextractstimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on sets timestamp

echo "--------------------------------"
echo "Add-ons which assign timestamps:"
echo "--------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v old
## Question at hand ##
# | xargs grep -e ^TIME_FORMAT | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^TIME_FORMAT | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureeventlinebreaking
# Add-on sets line breaking

## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v old
## Question at hand ##
# | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "--------------------------------------"
echo "Add-ons which configure line breaking:"
echo "--------------------------------------"

find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Abouteventsegmentation
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setthesegmentationforeventdata
# Add-on sets segmentation behavior

## Relevant conf files ##
# find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v old
## Question at hand ##
# | xargs grep -e ^SEGMENTATION | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "-------------------------------------------"
echo "Add-ons which configure event segmentation:"
echo "-------------------------------------------"

find splunk/etc/apps/ -name *.conf | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^SEGMENTATION | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "==============================================================="
echo "That's all as of 6.3 (Ember). Future Splunks may change things."

Tuesday, March 6, 2018

Purge old reminders

Apple reminders are great! Easy to create, easy to check across multiple devices, easy to share with family or co-workers, easy to close when you've done the task. Until eventually you've got hundreds of completed reminders per list...

Why does everyone's phone suddenly say we need hundreds of packages of tortillas? Why am I being reminded to do things for a job I left years ago? And you suddenly realize there's a design constraint and no one has designed for this and the app starts misbehaving terribly.

Well the obvious answer is to stop using Apple reminders. But instead, I wrote some AppleScript to automatically delete old reminders. It works well enough that I forgot it existed until I happened to look at crontab for another reason.

purgeoldreminders.scpt


set monthago to (current date) - (30 * days)

tell application "Reminders"
set myLists to name of every list
repeat with thisList in myLists
tell list thisList
delete (every reminder whose completion date is less than monthago)
end tell
end repeat
end tell


Crontab

Jacks-MacBook-Pro:bin symlink jackcoates$ crontab -l | grep purge
20 20 */7 * * osascript ~/Library/Mobile\ Documents/com~apple~ScriptEditor2/Documents/purgeoldreminders.scpt
Jacks-MacBook-Pro:bin symlink jackcoates$

Sunday, March 4, 2018

Automating ERs through Support is Crap

Automating ERs through Support is Crap

Also posted as a Twitter thread

  • Enhancement request tickets filed through support are a sign that the product team is failing. Here’s why.
  • How does a product team know what to do? There are three forces at play to produce product market fit: customer requests, market analysis, and engineering research.
  • Customer requests are the purest and most immediate source of information. Customers know what works and doesn’t work because they use the product for practical ends. At most IT companies, nobody in marketing and engineering can say that.
  • But, customers also produce chaff. They may blame the product for externalities, they may “ask for faster horses”, they may miss the big picture and ask for bad ideas. They may even rant about a bad day with the product without any actionable outcomes.
  • Market analysis seems very remote from a customer’s real work, but it’s just as important. If you can’t explain what your TAM is, how your features address that, and a ballpark value for those features, you’re probably not long for this world.
  • Market analysis is also a trailing indicator though, especially if you’re relying on third parties. You will probably be too late to capture an opportunity if you let this be your only guide; you’re effectively aiming at the center of Gartner’s MQ instead of the right side.
  • Engineering research is the most important piece. What can your team build, at what cost, in what time? Does it solve the problem? Will we get paid for it? Are you innovating?
  • If the use case is valuable but you can’t build it at a price the market will bear, then it’s not a use case. If it is obviated by a coming change, then it’s not a use case. If it’s too difficult for customers to succeed with, then it’s not a use case.
  • Customers, Market, Engineering. A product team gets at these three information sources through a lot of mechanisms, but none of them should be through a game of telephone. Unfortunately, that is exactly what a ER support ticket produces.
  • The customer with an idea or problem files a ticket. Maybe a CSR follows up to ask for clarification, maybe not. Maybe the ticket is clear, maybe not. Sadly, it’s rarely clear or actionable.
  • The customer has a real problem, but does not have full context of the engineering and marketing requirements. They do not know roadmap or budget, design history or strategy or your internal Overton window.
  • Now what? The ticket's in a slush pile queue that the PM is supposed to go through. Or maybe it's auto converted into a ticket tracker story -- IOW, a slush pile with overhead. So far so good. With the best of intentions, a process is initiated, based on bug triage.
  • Customer ERs are then grouped like bugs: “Invalid”, “Cannot Reproduce”, “Already Fixed”, “Won’t Fix”,  and the rare “Open”. You’ll notice that 4/5ths of those mean NO. “WTF”, “Works for me”, “Yeah yeah”, and “we see and disagree”.
  • Only these aren’t bugs, they’re ideas, and most of those “no” answers produce hard feelings. No matter how cheery the automated email response is, it still feels like a shit sandwich to the customer who bothered to share.
  • Even worse, this process exists at all because of scale. No one starts down this road because things are fine, they do it because they’re already feeling overwhelmed. The triage meetings quickly devolve, fewer tickets are reviewed, and they may even stop happening at all.
  • What if an ER survives triage? Now you’re convinced the customer has a good suggestion. Odds are this ticket isn’t even near the level of detail that the team needs. No UX, security, integration, no idea how it will impact the plan. Just an idea that needs shaping and scheduling
  • If it’s a big idea, then it needs effort from the whole team. That means a bunch of established processes and artifacts which at best are linked to this ticket. You set up some customer meetings and discovery sessions, and the ER is forgotten in the rush of productive work.
  • If it’s a small idea, needing no real effort, it still needs to be scheduled. What are the odds that it’s more important than something else that needs doing? The ER sits on the slush pile. Or it’s scheduled into sprint at low priority, then punted or batch closed without action.
  • Worst of all are the medium sized ideas: too big to ignore, too small to work on right now. They just float along in the bathtub ring until they’re no longer valid and can finally be closed. Probably with a birthday cupcake for 1000 days in the queue without ever seeing a sprint.
  • Transforming ideas into software is creative work, and there's no good way to automate and manage that. The only thing that works is product teams communicating directly and frankly with users. Explain what you can and can’t do, learn what they need and want, look for leverage.
  • Don’t just give up and ask customer support to do that product work for you, it’s not their job.

License models suck

Also posted as a Twitter thread

Most license models suck in some way:
* Flat rate sucks for the vendor because it leaves money on the table from large customers. This can be acceptable for an inherently limited product model (e.g. per user with a bulk discount), but it is not ideal for scalable enterprise software.

* Per unit of a metric sucks for the customer because it's disincenting use of whatever metric you select.
- Charge by data ingested? Now there's a reason not to ingest data.
- Charge by CPUs used? Now there's a reason not to provision hardware.
- Charge by seat? Now there's a reason to withhold access.

* Indirect charges suck because they feel unfair. For instance, charging by the size of the customer's company only fits top-down sales models. Any indirect model other than a flat rate increases sales friction needlessly.

Per unit of a metric tends to win out of those models. It's easy to explain, Easy to measure and bill, easy to enforce if you choose to. But, it also produces some specific problems:

What if the units charged for have variable values to the customer? Then you're right back at the flat rate problems. Customer feels like they’re being charged for stuff they don't use, and vendor feels like they're selling at a lower cost than they could.

More insidious: a successful pricing model shapes a company. The metric that they charge for becomes the measurement that sales optimizes for, and that establishes what products and services the company can realistically pursue.

If you see a great idea fizzle and die at a vendor that should be an obvious technical fit... maybe it's their pricing model.

Metrics in Splunk, and Observability

Also posted as a Twitter thread
  • I've got some thoughts about Splunk and metrics for observability...
  • The event-first Splunk can now store metrics efficiently. That has potential: 1 dashboard, a single glass of pain.
  • I'm excited to see annotations and mcatalog; I'm hoping it allows resolution of a nasty problem with multi-source metric comparison.
  • Metrics are quantitative. "Your volume has N bytes free". Good? Bad? Quantitative metrics are almost entirely useless for decisions.
  • (I actually think they are useless. A triggered  metric like "DISK FULL, 10 periods" is an event, not a metric. Splitting hairs.)
  • Decisions from metrics need qualitative context. "Allocate more space now or later?" "How much more?" "What about budget & schedule?"
  • Quantitative data, qualitative context, quantitative decision. If the context is only in humans, then humans need training to use it.
  • "Fellow human, I teach you tool's contextual framework. It emits X metric, Y units, Z interval. Normal = A-S today. If X>N, runbook!"
  • Encode that into a KPI? Hasn't improved anything. Still breaks when change means normal is wrong. Human has to know context to fix.
  • Compare many KPIs? Not even feasible without qualitative metrics. "Q: Need more storage?" Looks at 4-tier hybrid hierarchy, "A: ???"
  • In Metrics Store's catalog, seems that unit size is unknown, but there’s periodicity & granularity? If the source gathered them?
  • Why don't sources just send context? Tools should compute useful values & compare metrics qualitatively. "Tier 3 is 95% full."
  • Contextual decisions could be automated. "Usage will exceed capacity during your vacation, I think we should buy more space now."
  • Data system problems could be seen. "Dashboard expects 15 metrics/period, now getting 3 from 1/6 of probes, & 1 OutOfCheese Error."
  • Answer to "Why don't you just" questions is "Why should I". Splunk can answer that. Where's CIM for Metrics? Real attributes and KPIs?
  • Determining importance of a metric needs context. "Disk full" is pitifully primitive. A service provider or vendor knows better KPIs.
  • Sure would be nice to have vendor-specific tools for detailed analysis and role-specific tools with Splunk awareness metrics.

Iron Triangle and release planning

Also posted as a Twitter thread
  • Airplane wifi is too slow to be productive, so I’m going to rant about the Iron Triangle… you can promise date or features, but not date and features. So, add features to your project, and you’ll lose time.
  • Subtract features, and you’ll probably only maintain time and still just meet your deadline, because work expands to fill the introduced gap. There is always more that really should be done. Add time to your project, and you’ll add features, but not as many as you think.
  • Of course, subtract time and you lose features. In my experience it’s utterly useless to talk about the resources edge of the triangle, because Brook’s Law… but hey, let’s go ahead. Add people, lose time and features. Subtract people, lose time and features.
  • Things can get weird though, because people are not frictionless spheres in a vacuum. Subtract the person who was a disruptive drag on productivity and you might turn a failing project into a success.
  • Iron Triangle concepts are pretty well understood in theory, but I don’t know that everyone grasps how they drive long term success and failure of a company’s product portfolio. To overly simplify complexity into a legible system: there’s 2 types of release: iteration & big bang.
  • Evolution or Revolution. Band-aid sculpting, or fire-and-sword. Iterations are the best, because you just deliver relentless improvement. Set times in stone, fit the features and tech debt around them.
  • This is ideal because you fit well into everyone else’s schedule, customers will find it easier to upgrade, and can usually go from vague promises to concrete realities in the time frames that management needs.
  • It’s non-ideal over the longest run though, because your project is not exciting and your resources are a political target. Iterative evolution doesn’t excite; rapid adaptation to a new paradigm excites.
  • And in a world where exciting new paradigms are getting promoted every week, there’s no lack of corporate raiders eyeballing your resources. If you’re going to iterate, you’d better make sure you’re making life easier for your adventurous colleagues trying to support new markets.
  • But what if you’re the one with a new paradigm to meet? Sometimes revolutionary products are the right answer and you’ve got to produce something new, working against the political headwind. Maybe you’re lucky enough to do it in a quiet skunkworks or stealth startup.
  • Most people don’t get the luxury of independent wealth devoted to exploring a new idea, and I wouldn’t bet on it. The Iron Triangle is a lot harder to deal with if you’re doing a big bang launch where you need support across the whole organization.
  • Because every use case and feature is hard to imagine, it’s another political football, and another threat surface for your project (as well as an opportunity for ultimate success!)
  • If you’re lucky, then you can turn this into an iteration process by constructing pre-release releases to friendly customers and internals. D’uh, Agile, Lean, amirite? Yeah, this is well-known and great for shipping working software. However lots of projects with these goals and methodologies never see the light of day, because this process is untenable without executive protection.
  • So, identify your sponsor early and don’t ever let them be surprised by the Iron Triangle. And here ends my rant.