Metrics in Splunk, and Observability
Here are some Twitter threads I've written that I can't easily find again on Twitter, so I'll copy them to Blogger like an old codger banging rocks together.
- I've got some thoughts about Splunk and metrics for observability...
- The event-first Splunk can now store metrics efficiently. That has potential: 1 dashboard, a single glass of pain.
- I'm excited to see annotations and mcatalog; I'm hoping they allow resolution of a nasty problem with multi-source metric comparison.
- Metrics are quantitative. "Your volume has N bytes free". Good? Bad? Quantitative metrics are almost entirely useless for decisions.
- (I actually think they are useless. A triggered metric like "DISK FULL, 10 periods" is an event, not a metric. Splitting hairs.)
- Decisions from metrics need qualitative context. "Allocate more space now or later?" "How much more?" "What about budget & schedule?"
- Quantitative data, qualitative context, quantitative decision. If the context is only in humans, then humans need training to use it.
- "Fellow human, I teach you tool's contextual framework. It emits X metric, Y units, Z interval. Normal = A-S today. If X>N, runbook!"
- Encode that into a KPI? Hasn't improved anything. Still breaks when change means normal is wrong. Human has to know context to fix.
- Compare many KPIs? Not even feasible without qualitative metrics. "Q: Need more storage?" Looks at 4-tier hybrid hierarchy, "A: ???"
- In the Metrics Store's catalog, it seems that unit size is unknown, but there's periodicity & granularity? If the source gathered them?
- Why don't sources just send context? Tools should compute useful values & compare metrics qualitatively. "Tier 3 is 95% full."
- Contextual decisions could be automated. "Usage will exceed capacity during your vacation, I think we should buy more space now."
- Data system problems could be seen. "Dashboard expects 15 metrics/period, now getting 3 from 1/6 of probes, & 1 OutOfCheese Error."
- Answer to "Why don't you just" questions is "Why should I". Splunk can answer that. Where's CIM for Metrics? Real attributes and KPIs?
- Determining importance of a metric needs context. "Disk full" is pitifully primitive. A service provider or vendor knows better KPIs.
- Sure would be nice to have vendor-specific tools for detailed analysis and role-specific tools with Splunk-aware metrics.
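
To make the "sources should send context" idea concrete, here's a rough sketch of a metric that carries its own unit, capacity, and sampling interval, so a tool can say "Tier 3 is 95% full" and guess when it runs out. Everything here is hypothetical illustration: `ContextualMetric`, its fields, the straight-line projection, and the 14-day horizon are my assumptions, not anything the Splunk Metrics Store provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContextualMetric:
    """A raw measurement plus the context needed to interpret it (hypothetical schema)."""
    name: str
    value: float      # amount used, in `unit`
    unit: str         # e.g. "bytes"
    capacity: float   # total available, same unit as `value`
    interval_s: int   # seconds between samples

    def percent_full(self) -> float:
        return 100.0 * self.value / self.capacity


def days_until_full(prev: ContextualMetric, curr: ContextualMetric) -> Optional[float]:
    """Straight-line projection from two consecutive samples; None if usage isn't growing."""
    growth_per_s = (curr.value - prev.value) / curr.interval_s
    if growth_per_s <= 0:
        return None
    return (curr.capacity - curr.value) / growth_per_s / 86400.0


def describe(prev: ContextualMetric, curr: ContextualMetric, horizon_days: float = 14.0) -> str:
    """Turn the quantitative sample into a qualitative statement a human can act on."""
    msg = f"{curr.name} is {curr.percent_full():.0f}% full."
    eta = days_until_full(prev, curr)
    if eta is not None and eta <= horizon_days:
        msg += f" Projected full in {eta:.1f} days; consider adding space now."
    return msg


# Two daily samples of a made-up storage tier.
yesterday = ContextualMetric("Tier 3", 900.0, "bytes", 1000.0, 86400)
today = ContextualMetric("Tier 3", 950.0, "bytes", 1000.0, 86400)
print(describe(yesterday, today))
```

The point isn't the arithmetic, it's that none of it works unless the source ships capacity, unit, and interval along with the number.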