A Claude Code session confabulated a nonexistent Python file, persisted against five truthful "does not exist" errors, then self-diagnosed as corrupted tool output. A reconstruction from the raw transcript, a corpus scan across 3,001 sessions on whether the failure is worse in Opus 4.8, and a model-independent mitigation.
#ai-security
3 posts tagged with #ai-security. View all tags
An empirical study of 10,080 prompt injection attempts across 8 models, 6 defense strategies, and 7 attack types. The results challenge common assumptions about prompt-level defenses.
An introduction to the flaws in security testing for AI-generated code.