Grafana just announced that their Assistant can learn your infrastructure before you even ask questions. Reading the announcement reminded me of a conversation I had last week with an engineering director, who said something that stuck: "Our AI tools are great at writing code, terrible at understanding why our systems break."
He's not wrong. I've watched countless demos where engineers show off AI generating Kubernetes manifests or autocompleting functions. But when that 3am alert fires and they need to understand why their checkout service is suddenly crawling, they're back to manually correlating logs, metrics, and traces like it's 2015.
The gap isn't in AI capability; it's in context. Your AI assistant doesn't know that your payment processor always hiccups during flash sales, or that your database connections pool differently in staging versus production. It can't tell you that the last three times this alert fired, it was because someone deployed during peak traffic hours.
This is where the observability industry is headed, and frankly, it's about time. We've spent years building better dashboards and alerts, but we haven't solved the fundamental problem: making our systems explain themselves in the moment of crisis.
The teams that figure this out first, the ones who teach their AI tools about their infrastructure's personality and not just its configuration, are going to have a massive advantage.
Here's my question: when your last major incident happened, how long did your team spend just figuring out what was broken versus actually fixing it?