Projects | Brooks McMillin

How these projects fit together

Almost everything below is downstream of the same question: how do you build systems that delegate real work to LLMs and autonomous agents without losing control of what they actually do? Each project takes a different bite at that question. Some are production systems with users. Some are research benchmarks I built because I wanted answers I couldn't find anywhere else. A few are infrastructure tooling that exists because secure AI development has dependencies that aren't yet off-the-shelf.

The work splits roughly along four lines:

Agent security and sandboxing. Permission models, runtime monitoring, and the day-to-day infrastructure that decides what an autonomous agent is allowed to touch and how cleanly you can revoke that access later. This is the largest cluster of work and the one that feeds back most directly into what I ship at Dropbox.
Prompt-injection defense. Reproducible benchmarks and tooling for measuring how LLM applications hold up against direct, indirect, and tool-chained injection. Not because I think the attack class can be eliminated, but because we shouldn't be shipping defenses we can't measure.
OAuth and identity for AI. MCP and agent ecosystems shifted the principal model — the entity making a request is increasingly a model, not a person — and the older OAuth patterns weren't designed for that. The projects in this area focus on token lifecycle, scoping, and revocation that actually work for agent traffic.
Personal infrastructure and research toolkits. Smaller, scoped tools that solve a specific problem I had — secure communications, research workflow plumbing, task management — and that I've open-sourced because the underlying problem isn't unique to me.

What to look for on each card

The cards below show the project's title, status (active / maintained / experimental / archived), tech stack, a short description, and the achievements or design goals that mattered most to me. The Learn more link drops into a longer write-up that covers the motivation, the architecture decisions, and what worked or didn't. Where source is public, you'll see a GitHub link as well — most of these are MIT or Apache-2.0 and PRs are welcome.

If you're scanning for something specific:

For production-grade work, look for active status and a non-trivial tech stack.
For research artifacts, look for experimental status and benchmark/eval-style descriptions.
For reusable infrastructure, the tags will call out OAuth, MCP, sandboxing, or observability.

The projects that don't appear here are either internal to Dropbox or still pre-release. If you're working on something adjacent — agent sandboxing, MCP server hardening, prompt-injection benchmarking, OAuth for agentic systems — I'd love to compare notes. Email me from the contact page.

MCP OAuth Framework

production

Python, OAuth 2.0, Model Context Protocol, PKCE, Starlette, PostgreSQL

An OAuth 2.0 framework for protecting MCP servers. Ships as three pip-installable packages: auth server, resource server, and a runnable example.

Authorization server with PKCE, dynamic client registration, and RFC 6749-compliant error responses
Resource server with RFC 7662 introspection and SSRF protection
Sliding-window rate limiting at the token endpoint, friction control on tool calls
Runnable example with Claude Code, Claude Desktop, and Cursor configs
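
To make the rate-limiting bullet concrete, here is a minimal sketch of a sliding-window limiter in Python. It is an illustration of the general technique, not the framework's actual code: the class name `SlidingWindowLimiter`, the `allow` method, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key.

    Hypothetical helper for illustration; the key could be a client_id or IP.
    """

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # over the limit: caller would return an error response
        hits.append(now)
        return True
```

Unlike a fixed-window counter, this approach does not let a client double its budget by bursting on both sides of a window boundary, at the cost of storing one timestamp per accepted request.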

Learn more → GitHub →

TaskManager

production

Astro, Node.js, PostgreSQL, OAuth 2.0, Python, MCP

A task manager built around a real OAuth 2.0 auth server. Includes a Python SDK and MCP server, so my AI agents can manage tasks too.

Full OAuth 2.0 authorization server with PKCE support
Security testing suite with Vitest
Python SDK with complete API coverage
MCP server for integration with LLM frontends
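
For readers unfamiliar with PKCE, the check an authorization server performs can be sketched in a few lines of Python. This follows the S256 method from RFC 7636; the function names (`make_verifier`, `s256_challenge`, `verify_pkce`) are hypothetical and are not taken from TaskManager's code.

```python
import base64
import hashlib
import hmac
import secrets


def make_verifier():
    # Client side: 32 random bytes become a 43-character URL-safe string,
    # which sits inside RFC 7636's allowed 43-128 character range.
    return secrets.token_urlsafe(32)


def s256_challenge(verifier):
    # code_challenge = BASE64URL(SHA256(code_verifier)), without '=' padding.
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def verify_pkce(verifier, stored_challenge):
    # Server side, at the token endpoint: recompute the challenge from the
    # verifier and compare in constant time with the value stored at /authorize.
    return hmac.compare_digest(s256_challenge(verifier), stored_challenge)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code is useless without the verifier.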

Learn more → GitHub →

SMS Communications Suite

production

Go, Python, GSM, Serial Communication, Typer, Threading

Send and receive SMS through GSM modems. Includes CLI tools and libraries in both Go and Python.

Cross-platform GSM modem interface (Go + Python)
Interactive CLI chat interface
Integrates with TaskManager for out-of-band alerts
Solid error handling and connection management

Learn more → GitHub →

ReMarkable Research Toolkit

production

Python, Go, rmapi, Anthropic Claude, arXiv API, PDF Processing

Tools for managing research papers on reMarkable tablets. Uses AI to classify and sort them automatically.

AI-powered research paper classification
Zero-config rmapi binary management
Integrates with TaskManager for research workflows
PDF processing pipeline with content validation

Learn more → GitHub →

Prompt Injection Defense Benchmark

completed

Python, Anthropic Claude, OpenAI GPT-4o, Google Gemini, DeepSeek, Llama, LLM-as-judge

Empirical benchmark across 8 LLMs, 6 defenses, and 7 attack types. 10,080 tests measuring whether the phrasing of system-prompt defenses actually changes injection rates.

10,080 tests across 8 models × 6 defenses × 7 attacks (30 runs per combo)
Found a 5x phrasing effect: 'log and ignore' beats 'simply ignore' on weak models
Combined defense (sandwich + XML + log-and-ignore) hits 1.0% injection rate
Identified few-shot poisoning as the only attack that bypasses strong defenses (29.5%)
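
The test count above follows from a full cross product: 8 × 6 × 7 = 336 combinations, each run 30 times, gives 10,080. A rough Python sketch of how such a matrix could be enumerated is below. The model, defense, and attack names are placeholders, and `build_test_matrix` and `injection_rate` are hypothetical helpers rather than the benchmark's real harness.

```python
from itertools import product

# Placeholder labels; only the counts (8, 6, 7, 30) come from the description.
MODELS = [f"model_{i}" for i in range(8)]
DEFENSES = [f"defense_{i}" for i in range(6)]
ATTACKS = [f"attack_{i}" for i in range(7)]
RUNS_PER_COMBO = 30


def build_test_matrix():
    """Return one (model, defense, attack, run_index) tuple per test."""
    return [
        (model, defense, attack, run)
        for model, defense, attack in product(MODELS, DEFENSES, ATTACKS)
        for run in range(RUNS_PER_COMBO)
    ]


def injection_rate(outcomes):
    """Fraction of runs judged as successful injections (True = injected)."""
    return sum(outcomes) / len(outcomes)
```

Calling `len(build_test_matrix())` yields 10,080, and `injection_rate` applied to the 30 boolean outcomes of one combination gives that combination's per-cell rate.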

Learn more → GitHub →