My company has been through 3 different "LLM Observability" vendors, and each has failed to give us the one (simple) thing we want. We are willing to pay for this.
The ONLY thing we care about is the ability to:
- Log an LLM completion, then press a button that re-runs the exact same completion in a UI (the industry seems to call this the "playground"), exactly as it ran in production.
What we DO NOT care about:
- "datasets"
- "scores"
- "prompt enhancers"
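For concreteness, here is a minimal sketch of the log-and-replay idea, assuming the full request (model, messages, sampling parameters) is JSON-serializable and captured at call time. `CompletionRecord`, `log_completion`, `replay`, and the `call_model` callable are hypothetical names for illustration, not any vendor's API; `call_model` stands in for whatever client actually sends the request.

```python
import json
import uuid
from dataclasses import dataclass, asdict
from typing import Any, Callable, Dict, Tuple

ModelCall = Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class CompletionRecord:
    record_id: str
    request: Dict[str, Any]   # exact payload sent to the provider
    response: Dict[str, Any]  # what came back in production


def log_completion(store: Dict[str, str], request: Dict[str, Any],
                   call_model: ModelCall) -> CompletionRecord:
    """Call the model and persist the exact request next to its response."""
    response = call_model(request)
    record = CompletionRecord(str(uuid.uuid4()), request, response)
    # Serialize to JSON so the stored copy cannot be mutated later.
    store[record.record_id] = json.dumps(asdict(record))
    return record


def replay(store: Dict[str, str], record_id: str,
           call_model: ModelCall) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Re-send the stored request unchanged; return (original, rerun)."""
    saved = json.loads(store[record_id])
    rerun = call_model(saved["request"])
    return saved["response"], rerun
```

Note that re-sending an identical payload does not by itself guarantee an identical output, since sampling is usually nondeterministic; what this gives you is the same input as production, side by side with the original response.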
What does an "LLM completion" entail here? Are you talking about a sequence of prompts with files / MCP servers? Could you share a bit more? I have spent some time on this and have something that might be precisely what you are asking for...
When I think of LLM / agent observability, I think of some combination of OpenTelemetry and something like InfluxDB, but I don't think that's what you're asking for?
I am curious, what’s the point of re-running these interactions on a UI?
Reproduction I suppose. I would like the same things as OP too.
LLM outputs are qualitative; they can't really be scored automatically, and prompt enhancements tend to multiply bugs: a change can solve one problem but introduce a new one. It's more practical just to do it manually.
I'm sure if you ask Claude Code for exactly that, it will build what you want.
Tell it to create an API for the LLM data ingestion, then integrate with it from your software.
BTW, this is far from what an LLM observability tool will offer you. You are a bit confused about what O11Y is.
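The ingestion API suggested above could start as small as the sketch below, which uses only Python's standard library. Everything specific here is an assumption for illustration: the `/completions` path, the required `request` and `response` keys, the in-memory `RECORDS` list, and the `ingest` and `serve` helpers are hypothetical, not part of any existing tool.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, List, Tuple

RECORDS: List[dict] = []  # in-memory only; a real service would use a database


def ingest(body: bytes, records: List[dict]) -> Tuple[int, Dict[str, object]]:
    """Validate one logged completion and append it; return (status, reply)."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and undecodable bytes
        return 400, {"error": "body must be JSON"}
    if (not isinstance(payload, dict)
            or "request" not in payload or "response" not in payload):
        return 400, {"error": "expected 'request' and 'response' keys"}
    records.append(payload)
    return 201, {"id": len(records) - 1}


class IngestHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        if self.path != "/completions":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        status, reply = ingest(self.rfile.read(length), RECORDS)
        data = json.dumps(reply).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Run the ingestion endpoint locally until interrupted."""
    HTTPServer(("127.0.0.1", port), IngestHandler).serve_forever()
```

Your application would POST each completion's request and response to this endpoint after every call. Keeping the validation in a plain function (`ingest`) separate from the HTTP handler makes it easy to test without starting a server.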