A Claude Code session confabulated a nonexistent Python file, persisted against five truthful "does not exist" errors, then self-diagnosed as corrupted tool output. A reconstruction from the raw transcript, a corpus scan across 3,001 sessions on whether the failure is worse in Opus 4.8, and a model-independent mitigation.
#agents
3 posts tagged with #agents. View all tags
Why OAuth scopes aren't enough for autonomous LLM agents calling MCP tools, and how we wired Tenuo capability warrants end-to-end. Scope-gated rollout, two real bugs, multi-hop delegation, and an attack the warrant catches.
Six layers of security architecture for running LLM agents as daily drivers — every design decision with production stats and companion code.