Projects
Open-source projects and research tools focused on AI agent security, MCP-era OAuth, prompt-injection defense, and the infrastructure plumbing that makes secure AI possible.
How these projects fit together#
Almost everything below is downstream of the same question: how do you build systems that delegate real work to LLMs and autonomous agents without losing control of what they actually do? Each project is a different bite at that. Some are production systems with users. Some are research benchmarks I built because I wanted answers I couldn't find anywhere else. A few are infrastructure tooling that exists because secure AI development has dependencies that aren't yet off-the-shelf.
The work splits roughly along four lines:
- Agent security and sandboxing. Permission models, runtime monitoring, and the day-to-day infrastructure that decides what an autonomous agent is allowed to touch and how cleanly you can revoke that access later. This is the largest cluster of work and the one that feeds back most directly into what I ship at Dropbox.
- Prompt-injection defense. Reproducible benchmarks and tooling for measuring how LLM applications hold up against direct, indirect, and tool-chained injection. Not because I think the attack class can be eliminated, but because we shouldn't be shipping defenses we can't measure.
- OAuth and identity for AI. MCP and agent ecosystems shifted the principal model — the entity making a request is increasingly a model, not a person — and the older OAuth patterns weren't designed for that. The projects in this area focus on token lifecycle, scoping, and revocation that actually work for agent traffic.
- Personal infrastructure and research toolkits. Smaller, scoped tools that solve a specific problem I had — secure communications, research workflow plumbing, task management — and that I've open-sourced because the underlying problem isn't unique to me.
What to look for on each card#
The cards below show the project's title, status (active / maintained / experimental / archived), tech stack, a short description, and the achievements or design goals that mattered most to me. The Learn more link drops into a longer write-up that covers the motivation, the architecture decisions, and what worked or didn't. Where source is public, you'll see a GitHub link as well — most of these are MIT or Apache-2.0 and PRs are welcome.
If you're scanning for something specific:
- For production-grade work, look for
activestatus and a non-trivial tech stack. - For research artifacts, look for
experimentalstatus and benchmark/eval-style descriptions. - For reusable infrastructure, the tags will call out OAuth, MCP, sandboxing, or observability.
The projects that don't appear here are either internal to Dropbox or still pre-release. If you're working on something adjacent — agent sandboxing, MCP server hardening, prompt-injection benchmarking, OAuth for agentic systems — I'd love to compare notes. Email me from the contact page.
MCP OAuth Framework
productionAn OAuth 2.0 framework for protecting MCP servers. Ships as three pip-installable packages: auth server, resource server, and a runnable example.
- Authorization server with PKCE, dynamic client registration, and RFC 6749 errors
- Resource server with RFC 7662 introspection and SSRF protection
- Sliding-window rate limiting at the token endpoint, friction control on tool calls
- Runnable example with Claude Code, Claude Desktop, and Cursor configs
TaskManager
productionA task manager built around a real OAuth 2.0 auth server. Includes a Python SDK and MCP server, so my AI agents can manage tasks too.
- Full OAuth 2.0 authorization server with PKCE support
- Security testing suite with Vitest
- Python SDK with complete API coverage
- MCP server for integration with LLM frontends
SMS Communications Suite
productionSend and receive SMS through GSM modems. Includes CLI tools and libraries in both Go and Python.
- Cross-platform GSM modem interface (Go + Python)
- Interactive CLI chat interface
- Integrates with TaskManager for out-of-band alerts
- Solid error handling and connection management
ReMarkable Research Toolkit
productionTools for managing research papers on reMarkable tablets. Uses AI to classify and sort them automatically.
- AI-powered research paper classification
- Zero-config rmapi binary management
- Integrates with TaskManager for research workflows
- PDF processing pipeline with content validation
Prompt Injection Defense Benchmark
completedEmpirical benchmark across 8 LLMs, 6 defenses, and 7 attack types. 10,080 tests measuring whether the phrasing of system-prompt defenses actually changes injection rates.
- 10,080 tests across 8 models × 6 defenses × 7 attacks (30 runs per combo)
- Found a 5x phrasing effect: 'log and ignore' beats 'simply ignore' on weak models
- Combined defense (sandwich + XML + log-and-ignore) hits 1.0% injection rate
- Identified few-shot poisoning as the only attack that bypasses strong defenses (29.5%)