Monday, May 7, 2018

GDPR is great for Facebook and Google

Copying from Twitter to Blogger like carving a wheel from a rock

GDPR is going to be great for Facebook and Google.

"Over time, all data approaches deleted, or public." -- Norton's Law. See http://idlewords.com/talks/haunted_by_data.htm by Pinboard and https://www.youtube.com/watch?v=NKpuX_yzdYs by AndrewYNg for more background and viewpoints.

Picture two types of data store, public and private. If your store is private, you can use it to advantage as suggested by Andrew Ng's talk. If your store is public, everyone can use it and advantage is created in other ways.

Say a company wants to live dangerously and creates a private store of personally identifiable information about people. Say that company suffers a cybersecurity incident and the store of data becomes public. Say that there is no long-term negative impact on that company.

Say that this pattern happens over and over again for many years. A sensible executive might infer that it's not so dangerous to live dangerously.

Of course I'm talking about credit card number storage in the early days of web retail, not anything modern :) For all of its issues, PCI changed the picture of credit card storage by putting real financial penalties on the problem.

Now companies either perform the minimum due diligence, with separate cardholder data environments and regular audits, or they outsource card-handling to another company that is focused on this problem.

Putting more and more credit card data into a single store obviously creates a watering hole problem, but it also allows focusing protective efforts. Overall it's a net good. Until that third party hits a rough patch, but entropy is what it is.

Since GDPR has the same impact on a broader set of personal data, it seems likely that the same outcome will eventually occur. Either protect the data yourself, or outsource the problem to a broker.

The broker needs to provide analytics tools so you can do all the market and product research you wanted the data for. It would also be handy if they'd take care of AAA (authentication, authorization, and accounting), minimizing the impact of change (name, address, legal status, &c).

And who's in a great position to do all those things already? Google and Facebook.

Monday, April 16, 2018

Crash notifier

Say you're on OSX and working with some C++ software that might crash when bugs are found... and say you don't want to tail or search the voluminous logs just to know whether there are crashes to investigate. The crashes will show up in the Console application, but leaving Console open destroys your performance.

The programs I care about all have names starting with XM. This one-liner pops a notification and then opens the crash report in Console, which I can close when I'm done.

Jacks-MacBook-Pro:SCM-Framework jackcoates$ crontab -l | grep Crash
*/60 * * * * find ~/Library/Logs/DiagnosticReports -cmin -60 -name 'xm*' -exec osascript -e 'display notification "{}" with title "Extremely Crashy!"' \; -exec open {} \;

Thursday, April 12, 2018

Where's the Product in AI?

Copying from Twitter to Blogger like dabbing paint on a cave wall

AI tech is obviously overhyped, and conflated with ideas from science fiction and religion. I prefer using terms like Machine Intelligence and Cognitive Computing, just to avoid the noise. But if we strip away the most unrealistic stuff, there are some interesting paths forward.

The biggest problem is in defining strong semantic paths from the available data to valid use cases. Many approaches founder on assumptions: that the data contains value, that the use case can be solved with the data, or that the producers and consumers of the data use terms the same way.

Given a strong data system, there is a near-term opportunity to build AI-powered toolsets that help customers learn and use the data systems that are available. This is a services-heavy business with tight integration to data collection and storage.

This has to be human-intelligence-driven and therefore services-heavy though, because the data and use cases are not similar between budget-owning organizations. There is data system similarity on low-value stories, but the high-value stuff is specific to an organization.

That services work should lead to the real opportunity for cognitive computing, which is augmenting human intelligence in narrow fields. If there is room to abstract the data system, there's room to normalize customers to a tool. Then you've got a product plan, similar to SIEMs.

Put products into fields where the data exists, use cases are clear, the past predicts the future, pattern matching and cohort grouping are effective, the problem has enough value in it to justify effort, and outside context problems don't completely derail your model. Simple!

If you can describe the world in numbers without losing important context, then I can express complex relationships between the numbers.

There's a question being begged though... given a data system that successfully models, how much did the advanced system improve over a simpler approach? Is the DNN just figuring out that 95%-ile outliers are interesting?
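
To make that concrete, here's a toy shell sketch (metrics.txt is a hypothetical file with one numeric value per line) that computes that simple baseline:

# toy baseline: flag values above the 95th percentile
sort -n metrics.txt > sorted.txt
cutoff=$(awk -v n="$(wc -l < sorted.txt)" 'NR == int(n * 0.95) + 1 { print; exit }' sorted.txt)
awk -v c="$cutoff" '$1 > c { print "outlier:", $1 }' metrics.txt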

If a problem can be solved with machine intelligence, great. If the same problem could be solved with basic statistics, that's cheaper to build, operate, and maintain. It'll be interesting to see how this all shakes out.

Wednesday, April 11, 2018

Splunk Apps and Add-ons

What are Splunk Apps and Add-ons? What's the difference?


If you're still confused... it's not just you. The confusion traces back to fundamental disagreements on approach that are encoded into every product the company has ever shipped, so it's tough to recommend a meaningful change.


Splunk apps are folders in $SPLUNK_HOME/etc/apps. They're containers that you put Splunk objects into. You can put anything in them: code, knowledge management configuration, dashboard elements, libraries, binaries, images, whatever. If you just want to put some stuff together and run it on your laptop, you're done at this point. Put things in a folder for organization. Or don't. Whatever.
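
For reference, a typical app folder looks something like this (the app name is made up, and contents vary wildly):

$SPLUNK_HOME/etc/apps/my_app/
    default/     <- .conf files shipped with the app
    local/       <- site-specific overrides, never shipped
    bin/         <- scripts and binaries
    static/      <- icons and images
    metadata/    <- permissions (default.meta)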

If you want to distribute components in a large environment, if you want to depend on shared components, if you want to avoid huge multi-function monoliths, then you start dividing apps into different types. This is why you see the terms "App" and "Add-on" in Splunk. The App refers to the visible front-end app that a user will interact with. The Add-on refers to administrator-only components. This is where the Splexicon definitions start and stop.

There are multiple types of Add-ons. Their definitions are not entirely well established, and have come and gone in official documentation. Right now, it's here, but don't be surprised if that breaks:


Since I helped to write these definitions in the first place, I feel confident in stating what they should be. However, these rules are breached as often as they are observed, and Splunk themselves are the most likely to ignore all of this guidance. If you want to follow the best possible practice, buy Kyle Smith's book and read that. Here are the possible types:

IA: Input Add-on

This includes and configures data collection inputs only. In practice, these are rare and the functionality is usually stuffed into a TA.
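
For illustration, a minimal inputs.conf stanza an IA might ship (the path and sourcetype are made up):

# inputs.conf
[monitor:///var/log/vendor/appliance.log]
sourcetype = vendor:appliance
index = main
disabled = 0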


TA: Technology Add-on

This includes and configures knowledge management objects. In practice, many TA's also include data collection inputs. A TA would be able to translate the field names provided by a vendor to the field names expected by your users, as well as recognize and tag specific event types.
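
In conf terms, that translation and tagging might look like this sketch (all names are hypothetical): props.conf aliases the vendor's field name, while eventtypes.conf and tags.conf recognize and tag the event.

# props.conf
[vendor:appliance]
FIELDALIAS-src = src_interface AS src

# eventtypes.conf
[vendor_appliance_login]
search = sourcetype=vendor:appliance action=login

# tags.conf
[eventtype=vendor_appliance_login]
authentication = enabled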


SA: Supporting Add-on

This includes supporting libraries and searches needed to manage a class of data. Let's say we're building a security monitor and considering whether authentication attempts seem malicious or not. An SA could include lookup and summary generators to normalize and aggregate the data from many authentication systems and ask generic questions for reporting and alerting.

  • Example: https://splunkbase.splunk.com/app/1621/ 
  • Goes on Search Heads
  • You should absolutely have savedsearches.conf
  • It would make sense to include lookups and some dashboards, prebuilt panels, modular alerts, modular visualizations
  • Some SA's include all the IA stuff mentioned above.
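
As a sketch, one of those savedsearches.conf stanzas might look like this (the stanza name and search are hypothetical):

# savedsearches.conf
[Summary - Failed Authentication]
search = tag=authentication action=failure | stats count by src, user
cron_schedule = 0 * * * *
enableSched = 1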

DA: Domain Add-on

This includes supporting libraries and searches needed to manage a domain of problems. Let's say we're considering PCI requirement 4, focused on antivirus software being present, configured, and not reporting infections. A DA might include lookup and summary generators to prepare those answers, dashboards to investigate further, and correlation searches to alert on problems.
  • Example: https://splunkbase.splunk.com/app/2897/ (the "dirty dozen" PCI requirements that can be measured from machine data are each represented with a DA)
  • Goes on Search Heads
  • You should absolutely include dashboards, prebuilt panels, modular alerts, modular visualizations
  • It would make sense to include lookups and savedsearches.conf
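
A hypothetical correlation search for that antivirus requirement might look like this; alert.track surfaces the results as triggered alerts.

# savedsearches.conf
[PCI - Malware Infection Found]
search = tag=malware action=allowed | stats latest(_time) as last_seen by dest, signature
cron_schedule = */15 * * * *
enableSched = 1
alert.track = 1
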
And so finally, the App.

App

The front end that ties it all together and makes it usable. If it's done well, users have no idea everything before this was ever involved. This goes on search heads only.


Wednesday, March 21, 2018

Windex: Find Splunk apps that have index time operations

If a Splunk app has index-time operations, it has to be installed on the first heavy forwarder or indexer to perform those operations on the data that's coming in. If it doesn't have those operations, then it only needs to be installed on the search head to perform its search-time operations on the data that's found.

Simple, right?

There is no comprehensive list of index-time operations.


So a few years ago I got annoyed after asking for such a list for the hundredth time or so, and I banged out a script that would answer the question. One caution: there may be new index-time operations added since I wrote the script.

#!/bin/bash

# Script to figure out if index-time extractions are done.
# Run "./windex.sh | sort | uniq"
# Note that Bash is required.

# Online at https://pastebin.com/JVPsqcCV

# TODO: command line argument to set path instead of hard-coding ./splunk/etc/apps
# TODO: print the offending line number too?

echo "These add-ons have index-time field extractions."
echo "================================================"

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setadefaulthostforaninput
# Add-on sets host field.

echo "-----------------------------"
echo "Add-ons which set host field:"
echo "-----------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v \.old
## Question at hand ##
#| xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#"
## the resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA | grep default | egrep 'inputs|props|transforms' | grep -v \.old | xargs egrep '^host|host::' | egrep -v '_host|host_' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| uniq


# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Bypassautomaticsourcetypeassignment
# Add-on sets sourcetype field.

echo "---------------------------------------------------------------------------"
echo "Add-ons which set sourcetype field (ignoring the old school eventgen ones):"
echo "---------------------------------------------------------------------------"

## Sets sourcetype at all ##
## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old
## Question at hand ##
#| xargs egrep '^sourcetype|sourcetype::' | grep -v "#"
## Resulting list of add-ons ##
#| awk 'BEGIN {FS="/"}; {print $4}'| uniq | sort

## Sets sourcetype for the old school eventgen ##
## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old
## Question at hand ##
# | xargs grep -A1 -e "^\[source::.*\]"| grep sourcetype
## the resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}'| uniq

## In the first list but not in the second list ##
# comm -23 <(list1) <(list2)

comm -23 <(find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old | xargs egrep '^sourcetype|sourcetype::' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| sort | uniq) <(find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | egrep 'inputs|props' | grep -v \.old | xargs grep -A1 -e "^\[source::.*\]"| grep sourcetype | awk 'BEGIN {FS="/"}; {print $4}' | sort | uniq)

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureindex-timefieldextraction
# Add-on uses TRANSFORMS- statement in props.conf.

echo "-------------------------------------------------"
echo "Add-ons which use an explicit TRANSFORMS- stanza:"
echo "-------------------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^TRANSFORMS- | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^TRANSFORMS- | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}'| uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Extractfieldsfromfileswithstructureddata
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on uses indexed extractions

echo "--------------------------------------"
echo "Add-ons which use Indexed Extractions:"
echo "--------------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^INDEXED_EXTRACTIONS -e FIELD_DELIMITER | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs egrep '^INDEXED_EXTRACTIONS|FIELD_DELIMITER' | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Handleeventtimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/HowSplunkextractstimestamps
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Admin/Propsconf
# Add-on sets timestamp

echo "--------------------------------"
echo "Add-ons which assign timestamps:"
echo "--------------------------------"

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^TIME_FORMAT | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^TIME_FORMAT | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Configureeventlinebreaking
# Add-on sets line breaking

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "--------------------------------------"
echo "Add-ons which configure line breaking:"
echo "--------------------------------------"

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^LINE_BREAKER -e ^SHOULD_LINEMERGE | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

# http://docs.splunk.com/Documentation/Splunk/6.3.0/Indexer/Indextimeversussearchtime
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Abouteventsegmentation
# -> http://docs.splunk.com/Documentation/Splunk/6.3.0/Data/Setthesegmentationforeventdata
# Add-on sets segmentation behavior

## Relevant conf files ##
# find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old
## Question at hand ##
# | xargs grep -e ^SEGMENTATION | grep -v "#"
## The resulting list of add-ons ##
# | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "-------------------------------------------"
echo "Add-ons which configure event segmentation:"
echo "-------------------------------------------"

find splunk/etc/apps/ -name '*.conf' | grep Splunk_TA_ | grep default | grep props | grep -v \.old | xargs grep -e ^SEGMENTATION | grep -v "#" | awk 'BEGIN {FS="/"}; {print $4}' | uniq

echo "==============================================================="
echo "That's all as of 6.3 (Ember). Future Splunks may change things."

Tuesday, March 6, 2018

Purge old reminders

Apple reminders are great! Easy to create, easy to check across multiple devices, easy to share with family or co-workers, easy to close when you've done the task. Until eventually you've got hundreds of completed reminders per list...

Why does everyone's phone suddenly say we need hundreds of packages of tortillas? Why am I being reminded to do things for a job I left years ago? And suddenly you realize there's a design constraint no one has designed for, and the app starts misbehaving terribly.

Well, the obvious answer is to stop using Apple reminders. But instead, I wrote some AppleScript to automatically delete old reminders. It works well enough that I forgot it existed until I happened to look at my crontab for another reason.

purgeoldreminders.scpt


set monthago to (current date) - (30 * days)

tell application "Reminders"
    set myLists to name of every list
    repeat with thisList in myLists
        tell list thisList
            delete (every reminder whose completion date is less than monthago)
        end tell
    end repeat
end tell
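
If you'd rather see what it would do before letting cron delete things, here's an untested dry-run sketch that only counts the reminders that would go:

set monthago to (current date) - (30 * days)
set doomed to 0

tell application "Reminders"
    repeat with thisList in (name of every list)
        tell list thisList
            -- count instead of delete
            set doomed to doomed + (count of (every reminder whose completion date is less than monthago))
        end tell
    end repeat
end tell

display dialog "Would delete " & doomed & " completed reminders"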


Crontab

Jacks-MacBook-Pro:bin symlink jackcoates$ crontab -l | grep purge
20 20 */7 * * osascript ~/Library/Mobile\ Documents/com~apple~ScriptEditor2/Documents/purgeoldreminders.scpt
Jacks-MacBook-Pro:bin symlink jackcoates$

Sunday, March 4, 2018

Automating ERs through Support is Crap



Here are some Twitter threads I've written that I can't easily find again on Twitter, so I'll copy them to Blogger like an old codger banging rocks together.

  • Enhancement request tickets filed through support are a sign that the product team is failing. Here’s why.
  • How does a product team know what to do? There are three forces at play to produce product market fit: customer requests, market analysis, and engineering research.
  • Customer requests are the purest and most immediate source of information. Customers know what works and doesn’t work because they use the product for practical ends. At most IT companies, nobody in marketing and engineering can say that.
  • But, customers also produce chaff. They may blame the product for externalities, they may “ask for faster horses”, they may miss the big picture and ask for bad ideas. They may even rant about a bad day with the product without any actionable outcomes.
  • Market analysis seems very remote from a customer’s real work, but it’s just as important. If you can’t explain what your TAM (total addressable market) is, how your features address that, and a ballpark value for those features, you’re probably not long for this world.
  • Market analysis is also a trailing indicator though, especially if you’re relying on third parties. You will probably be too late to capture an opportunity if you let this be your only guide; you’re effectively aiming at the center of Gartner’s MQ instead of the right side.
  • Engineering research is the most important piece. What can your team build, at what cost, in what time? Does it solve the problem? Will we get paid for it? Are you innovating?
  • If the use case is valuable but you can’t build it at a price the market will bear, then it’s not a use case. If it is obviated by a coming change, then it’s not a use case. If it’s too difficult for customers to succeed with, then it’s not a use case.
  • Customers, Market, Engineering. A product team gets at these three information sources through a lot of mechanisms, but none of them should be through a game of telephone. Unfortunately, that is exactly what an ER support ticket produces.
  • The customer with an idea or problem files a ticket. Maybe a CSR follows up to ask for clarification, maybe not. Maybe the ticket is clear, maybe not. Sadly, it’s rarely clear or actionable.
  • The customer has a real problem, but does not have full context of the engineering and marketing requirements. They do not know roadmap or budget, design history or strategy or your internal Overton window.
  • Now what? The ticket's in a slush pile queue that the PM is supposed to go through. Or maybe it's auto converted into a ticket tracker story -- IOW, a slush pile with overhead. So far so good. With the best of intentions, a process is initiated, based on bug triage.
  • Customer ERs are then grouped like bugs: “Invalid”, “Cannot Reproduce”, “Already Fixed”, “Won’t Fix”,  and the rare “Open”. You’ll notice that 4/5ths of those mean NO. “WTF”, “Works for me”, “Yeah yeah”, and “we see and disagree”.
  • Only these aren’t bugs, they’re ideas, and most of those “no” answers produce hard feelings. No matter how cheery the automated email response is, it still feels like a shit sandwich to the customer who bothered to share.
  • Even worse, this process exists at all because of scale. No one starts down this road because things are fine, they do it because they’re already feeling overwhelmed. The triage meetings quickly devolve, fewer tickets are reviewed, and they may even stop happening at all.
  • What if an ER survives triage? Now you’re convinced the customer has a good suggestion. Odds are this ticket isn’t even near the level of detail that the team needs. No UX, security, integration, no idea how it will impact the plan. Just an idea that needs shaping and scheduling.
  • If it’s a big idea, then it needs effort from the whole team. That means a bunch of established processes and artifacts which at best are linked to this ticket. You set up some customer meetings and discovery sessions, and the ER is forgotten in the rush of productive work.
  • If it’s a small idea, needing no real effort, it still needs to be scheduled. What are the odds that it’s more important than something else that needs doing? The ER sits on the slush pile. Or it’s scheduled into sprint at low priority, then punted or batch closed without action.
  • Worst of all are the medium sized ideas: too big to ignore, too small to work on right now. They just float along in the bathtub ring until they’re no longer valid and can finally be closed. Probably with a birthday cupcake for 1000 days in the queue without ever seeing a sprint.
  • Transforming ideas into software is creative work, and there's no good way to automate and manage that. The only thing that works is product teams communicating directly and frankly with users. Explain what you can and can’t do, learn what they need and want, look for leverage.
  • Don’t just give up and ask customer support to do that product work for you, it’s not their job.