Wednesday, March 21, 2018

Windex: Find Splunk apps that have index-time operations

If a Splunk app has index-time operations, it has to be installed on the first heavy forwarder or indexer in the data path, so those operations can be performed on the data that's coming in. If it doesn't have any, it only needs to be installed on the search head, where its search-time operations run on the data that's found.

Simple, right?

There is no comprehensive list of index-time operations.


So a few years ago, annoyed after asking for such a list for the hundredth time or so, I banged out a script that answers the question. One caution: there may be new index-time operations added since I wrote it.
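For reference, this is the shape of thing the script hunts for. A hypothetical index-time field extraction spans props.conf and transforms.conf; the sourcetype, class, and field names below are invented, but TRANSFORMS-, REGEX, FORMAT, and WRITE_META are the real settings involved.

```ini
# props.conf -- a hypothetical sourcetype wired to an index-time transform
[my:custom:sourcetype]
TRANSFORMS-site = extract_site_field

# transforms.conf -- the transform itself; WRITE_META = true makes it indexed
[extract_site_field]
REGEX = site=(\w+)
FORMAT = site::$1
WRITE_META = true
```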

#!/bin/bash

# Script to figure out if index-time extractions are done.
# Run "./windex.sh | sort | uniq"
# Note that Bash is required.

# Online at https://pastebin.com/JVPsqcCV

# TODO: command line argument to set path instead of hard-coding ./splunk/etc/apps
# TODO: print the offending line number too?

echo "These add-ons have index-time field extractions."
echo "================================================"

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setadefaulthostforaninput
# Add-on sets host field.

echo "-----------------------------"
echo "Add-ons which set host field:"
echo "-----------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v '\.old'
## Question at hand ##
#| xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#"
## the resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq

find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v '\.old' | xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq


# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Bypassautomaticsourcetypeassignment
# Add-on sets sourcetype field.

echo "---------------------------------------------------------------------------"
echo "Add-ons which set sourcetype field (ignoring the old school eventgen ones):"
echo "---------------------------------------------------------------------------"

## Sets sourcetype at all ##
## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v '\.old'
## Question at hand ##
#| xargs egrep '^sourcetype|sourcetype::' | grep -v "#"
## Resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq | sort

## Sets sourcetype for the old school eventgen ##
## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v '\.old'
## Question at hand ##
# | xargs grep -A1 -e "^\[source::.*\]"| grep sourcetype
## the resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

## In the first list but not in the second list ##
# comm -23 <(list1) <(list2)

comm -23 <(find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v '\.old' | xargs egrep '^sourcetype|sourcetype::' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | sort | uniq) <(find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v '\.old' | xargs grep -A1 -e "^\[source::.*\]" | grep sourcetype | awk 'BEGIN {FS="/"}; {print $4}' | sort | uniq)

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureindex-timefieldextraction
# Add-on uses TRANSFORMS- statement in props.conf.

echo "-------------------------------------------------"
echo "Add-ons which use an explicit TRANSFORMS- stanza:"
echo "-------------------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old'
## Question at hand ##
# | xargs grep -e ^TRANSFORMS- | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old' | xargs grep -e ^TRANSFORMS- | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Extractfieldsfromfileswithstructureddata
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on uses indexed extractions

echo "--------------------------------------"
echo "Add-ons which use Indexed Extractions:"
echo "--------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old'
## Question at hand ##
# | xargs grep -e ^INDEXED_EXTRACTIONS -e FIELD_DELIMITER | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old' | xargs egrep '^INDEXED_EXTRACTIONS|FIELD_DELIMITER' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Handleeventtimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/HowSplunkextractstimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on sets timestamp

echo "--------------------------------"
echo "Add-ons which assign timestamps:"
echo "--------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old'
## Question at hand ##
# | xargs grep -e ^TIME_FORMAT | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old' | xargs grep -e ^TIME_FORMAT | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureeventlinebreaking
# Add-on sets line breaking

## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old'
## Question at hand ##
# | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "--------------------------------------"
echo "Add-ons which configure line breaking:"
echo "--------------------------------------"

find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old' | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Abouteventsegmentation
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setthesegmentationforeventdata
# Add-on sets segmentation behavior

## Relevant conf files ##
# find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old'
## Question at hand ##
# | xargs grep -e ^SEGMENTATION | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "-------------------------------------------"
echo "Add-ons which configure event segmentation:"
echo "-------------------------------------------"

find splunk/etc/apps/ -name "*.conf" | grep Splunk_TA_ | grep default | grep props | grep -v '\.old' | xargs grep -e ^SEGMENTATION | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "==============================================================="
echo "That's all as of 6.3 (Ember). Future Splunks may change things."
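A side note on the comm -23 trick used in the sourcetype section: it's just set difference on sorted inputs. A minimal demonstration with made-up lists:

```shell
#!/bin/bash
# comm compares two sorted inputs; -2 suppresses lines unique to the
# second input and -3 suppresses lines common to both, so -23 leaves
# only the lines unique to the first input.
comm -23 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')
# prints: a
```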

Tuesday, March 6, 2018

Purge old reminders

Apple reminders are great! Easy to create, easy to check across multiple devices, easy to share with family or co-workers, easy to close when you've done the task. Until eventually you've got hundreds of completed reminders per list...

Why does everyone's phone suddenly say we need hundreds of packages of tortillas? Why am I being reminded to do things for a job I left years ago? Then you suddenly realize there's a scale constraint no one designed for, and the app starts misbehaving terribly.

Well, the obvious answer is to stop using Apple reminders. Instead, I wrote some AppleScript to automatically delete old reminders. It works well enough that I forgot it existed until I happened to look at my crontab for another reason.

purgeoldreminders.scpt


set monthago to (current date) - (30 * days)

tell application "Reminders"
    set myLists to name of every list
    repeat with thisList in myLists
        tell list thisList
            delete (every reminder whose completion date is less than monthago)
        end tell
    end repeat
end tell


Crontab

Jacks-MacBook-Pro:bin symlink jackcoates$ crontab -l | grep purge
20 20 */7 * * osascript ~/Library/Mobile\ Documents/com~apple~ScriptEditor2/Documents/purgeoldreminders.scpt
Jacks-MacBook-Pro:bin symlink jackcoates$

Sunday, March 4, 2018

Automating ERs through Support is Crap

Also posted as a Twitter thread

  • Enhancement request tickets filed through support are a sign that the product team is failing. Here’s why.
  • How does a product team know what to do? There are three forces at play to produce product market fit: customer requests, market analysis, and engineering research.
  • Customer requests are the purest and most immediate source of information. Customers know what works and doesn’t work because they use the product for practical ends. At most IT companies, nobody in marketing and engineering can say that.
  • But, customers also produce chaff. They may blame the product for externalities, they may “ask for faster horses”, they may miss the big picture and ask for bad ideas. They may even rant about a bad day with the product without any actionable outcomes.
  • Market analysis seems very remote from a customer’s real work, but it’s just as important. If you can’t explain what your TAM is, how your features address that, and a ballpark value for those features, you’re probably not long for this world.
  • Market analysis is also a trailing indicator though, especially if you’re relying on third parties. You will probably be too late to capture an opportunity if you let this be your only guide; you’re effectively aiming at the center of Gartner’s MQ instead of the right side.
  • Engineering research is the most important piece. What can your team build, at what cost, in what time? Does it solve the problem? Will we get paid for it? Are you innovating?
  • If the use case is valuable but you can’t build it at a price the market will bear, then it’s not a use case. If it is obviated by a coming change, then it’s not a use case. If it’s too difficult for customers to succeed with, then it’s not a use case.
  • Customers, Market, Engineering. A product team gets at these three information sources through a lot of mechanisms, but none of them should be a game of telephone. Unfortunately, that is exactly what an ER support ticket produces.
  • The customer with an idea or problem files a ticket. Maybe a CSR follows up to ask for clarification, maybe not. Maybe the ticket is clear, maybe not. Sadly, it’s rarely clear or actionable.
  • The customer has a real problem, but does not have full context of the engineering and marketing requirements. They do not know roadmap or budget, design history or strategy or your internal Overton window.
  • Now what? The ticket's in a slush pile queue that the PM is supposed to go through. Or maybe it's auto converted into a ticket tracker story -- IOW, a slush pile with overhead. So far so good. With the best of intentions, a process is initiated, based on bug triage.
  • Customer ERs are then grouped like bugs: “Invalid”, “Cannot Reproduce”, “Already Fixed”, “Won’t Fix”,  and the rare “Open”. You’ll notice that 4/5ths of those mean NO. “WTF”, “Works for me”, “Yeah yeah”, and “we see and disagree”.
  • Only these aren’t bugs, they’re ideas, and most of those “no” answers produce hard feelings. No matter how cheery the automated email response is, it still feels like a shit sandwich to the customer who bothered to share.
  • Even worse, this process exists at all because of scale. No one starts down this road because things are fine, they do it because they’re already feeling overwhelmed. The triage meetings quickly devolve, fewer tickets are reviewed, and they may even stop happening at all.
  • What if an ER survives triage? Now you’re convinced the customer has a good suggestion. Odds are this ticket isn’t even near the level of detail that the team needs. No UX, security, integration, no idea how it will impact the plan. Just an idea that needs shaping and scheduling.
  • If it’s a big idea, then it needs effort from the whole team. That means a bunch of established processes and artifacts which at best are linked to this ticket. You set up some customer meetings and discovery sessions, and the ER is forgotten in the rush of productive work.
  • If it’s a small idea, needing no real effort, it still needs to be scheduled. What are the odds that it’s more important than something else that needs doing? The ER sits on the slush pile. Or it’s scheduled into sprint at low priority, then punted or batch closed without action.
  • Worst of all are the medium sized ideas: too big to ignore, too small to work on right now. They just float along in the bathtub ring until they’re no longer valid and can finally be closed. Probably with a birthday cupcake for 1000 days in the queue without ever seeing a sprint.
  • Transforming ideas into software is creative work, and there's no good way to automate and manage that. The only thing that works is product teams communicating directly and frankly with users. Explain what you can and can’t do, learn what they need and want, look for leverage.
  • Don’t just give up and ask customer support to do that product work for you, it’s not their job.

License models suck

Also posted as a Twitter thread

Most license models suck in some way:
* Flat rate sucks for the vendor because it leaves money on the table from large customers. This can be acceptable for an inherently limited product model (e.g. per user with a bulk discount), but it is not ideal for scalable enterprise software.

* Per unit of a metric sucks for the customer because it disincentivizes use of whatever metric you select.
- Charge by data ingested? Now there's a reason not to ingest data.
- Charge by CPUs used? Now there's a reason not to provision hardware.
- Charge by seat? Now there's a reason to withhold access.

* Indirect charges suck because they feel unfair. For instance, charging by the size of the customer's company only fits top-down sales models. Any indirect model other than a flat rate increases sales friction needlessly.

Per unit of a metric tends to win out among those models. It's easy to explain, easy to measure and bill, easy to enforce if you choose to. But it also produces some specific problems:

What if the units charged for have variable values to the customer? Then you're right back at the flat rate problems. Customer feels like they’re being charged for stuff they don't use, and vendor feels like they're selling at a lower cost than they could.

More insidious: a successful pricing model shapes a company. The metric that they charge for becomes the measurement that sales optimizes for, and that establishes what products and services the company can realistically pursue.

If you see a great idea fizzle and die at a vendor that should be an obvious technical fit... maybe it's their pricing model.

Metrics in Splunk, and Observability

Also posted as a Twitter thread
  • I've got some thoughts about Splunk and metrics for observability...
  • The event-first Splunk can now store metrics efficiently. That has potential: 1 dashboard, a single glass of pain.
  • I'm excited to see annotations and mcatalog; I'm hoping it allows resolution of a nasty problem with multi-source metric comparison.
  • Metrics are quantitative. "Your volume has N bytes free". Good? Bad? Quantitative metrics are almost entirely useless for decisions.
  • (I actually think they are useless. A triggered metric like "DISK FULL, 10 periods" is an event, not a metric. Splitting hairs.)
  • Decisions from metrics need qualitative context. "Allocate more space now or later?" "How much more?" "What about budget & schedule?"
  • Quantitative data, qualitative context, quantitative decision. If the context is only in humans, then humans need training to use it.
  • "Fellow human, I teach you tool's contextual framework. It emits X metric, Y units, Z interval. Normal = A-S today. If X>N, runbook!"
  • Encode that into a KPI? Hasn't improved anything. Still breaks when change means normal is wrong. Human has to know context to fix.
  • Compare many KPIs? Not even feasible without qualitative metrics. "Q: Need more storage?" Looks at 4-tier hybrid hierarchy, "A: ???"
  • In Metrics Store's catalog, seems that unit size is unknown, but there’s periodicity & granularity? If the source gathered them?
  • Why don't sources just send context? Tools should compute useful values & compare metrics qualitatively. "Tier 3 is 95% full."
  • Contextual decisions could be automated. "Usage will exceed capacity during your vacation, I think we should buy more space now."
  • Data system problems could be seen. "Dashboard expects 15 metrics/period, now getting 3 from 1/6 of probes, & 1 OutOfCheese Error."
  • Answer to "Why don't you just" questions is "Why should I". Splunk can answer that. Where's CIM for Metrics? Real attributes and KPIs?
  • Determining importance of a metric needs context. "Disk full" is pitifully primitive. A service provider or vendor knows better KPIs.
  • Sure would be nice to have vendor-specific tools for detailed analysis and role-specific tools with Splunk awareness metrics.
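To make the quantitative-vs-qualitative point concrete, here's a minimal sketch of a tool carrying its own context; the function name and the 90% threshold are invented for illustration:

```shell
#!/bin/bash
# Turn a raw capacity metric into a qualitative statement.
# The threshold is context the tool, not the human, should carry.
qualify_capacity () {
  local used=$1 total=$2 threshold=${3:-90}
  local pct=$(( used * 100 / total ))
  if (( pct >= threshold )); then
    echo "ACT NOW: ${pct}% full, over the ${threshold}% threshold"
  else
    echo "OK: ${pct}% full"
  fi
}

qualify_capacity 95 100   # prints: ACT NOW: 95% full, over the 90% threshold
qualify_capacity 40 100   # prints: OK: 40% full
```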

Iron Triangle and release planning

Also posted as a Twitter thread
  • Airplane wifi is too slow to be productive, so I’m going to rant about the Iron Triangle… you can promise date or features, but not date and features. So, add features to your project, and you’ll lose time.
  • Subtract features, and you’ll probably only maintain time and still just meet your deadline, because work expands to fill the introduced gap. There is always more that really should be done. Add time to your project, and you’ll add features, but not as many as you think.
  • Of course, subtract time and you lose features. In my experience it’s utterly useless to talk about the resources edge of the triangle, because Brooks’s Law… but hey, let’s go ahead. Add people, lose time and features. Subtract people, lose time and features.
  • Things can get weird though, because people are not frictionless spheres in a vacuum. Subtract the person who was a disruptive drag on productivity and you might turn a failing project into a success.
  • Iron Triangle concepts are pretty well understood in theory, but I don’t know that everyone grasps how they drive long term success and failure of a company’s product portfolio. To overly simplify complexity into a legible system: there are two types of release: iteration & big bang.
  • Evolution or Revolution. Band-aid sculpting, or fire-and-sword. Iterations are the best, because you just deliver relentless improvement. Set times in stone, fit the features and tech debt around them.
  • This is ideal because you fit well into everyone else’s schedule, customers will find it easier to upgrade, and can usually go from vague promises to concrete realities in the time frames that management needs.
  • It’s non-ideal over the longest run though, because your project is not exciting and your resources are a political target. Iterative evolution doesn’t excite; rapid adaptation to a new paradigm excites.
  • And in a world where exciting new paradigms are getting promoted every week, there’s no lack of corporate raiders eyeballing your resources. If you’re going to iterate, you’d better make sure you’re making life easier for your adventurous colleagues trying to support new markets.
  • But what if you’re the one with a new paradigm to meet? Sometimes revolutionary products are the right answer and you’ve got to produce something new, working against the political headwind. Maybe you’re lucky enough to do it in a quiet skunkworks or stealth startup.
  • Most people don’t get the luxury of independent wealth devoted to exploring a new idea, and I wouldn’t bet on it. The Iron Triangle is a lot harder to deal with if you’re doing a big bang launch where you need support across the whole organization.
  • Because every use case and feature is hard to imagine, it’s another political football, and another threat surface for your project (as well as an opportunity for ultimate success!)
  • If you’re lucky, then you can turn this into an iterative process by shipping pre-releases to friendly customers and internal users. D’uh, Agile, Lean, amirite? Yeah, this is well-known and great for shipping working software. However, lots of projects with these goals and methodologies never see the light of day, because this process is untenable without executive protection.
  • So, identify your sponsor early and don’t ever let them be surprised by the Iron Triangle. And here ends my rant.