Work

Projects I've shipped and how I think about the problems.

How I think

Three things I keep coming back to

Not a skills list. Just the ideas I find myself returning to, and the lens I tend to bring to most problems.

Security isn't separate from capability — it's part of it.

Most agentic systems are one misdirected tool call away from a serious problem. I spend a lot of time building the layer that makes the right behavior the only possible behavior. Not filtered after the fact — enforced at construction time. Once I started seeing the problem this way, I couldn't unsee it.

Most AI teams don't actually know how their agents perform.

Cost and latency get labeled 'optimize later,' and then later arrives. I build evaluation frameworks that measure what actually matters — capability, cost, and latency simultaneously — with hard gates on regression. The uncomfortable numbers have to count. Otherwise you're just telling yourself a story.

The enterprise integrations nobody talks about are the interesting part.

Connecting Salesforce and Workday to an AI agent sounds unglamorous. But that's where agents actually become useful to people doing real work. I've built integration pipelines for 40+ enterprise apps. The data is messy, the APIs are inconsistent, and it's genuinely hard. I like it more than I expected.

Projects

Things I've shipped

Three projects, each starting with something that was bothering me.

Security Infrastructure

MCP2MCP Proxy Gateway

What was going on

Every enterprise tool I touched had its own OAuth flow. Teams were writing a bespoke integration for every new MCP server, which meant days of manual work each time. Nobody was fixing it at the right layer.

What I figured out

If you build a proxy that handles RFC-compliant OAuth for any remote server, the integration cost drops to 'give me the URL.' That's the right abstraction. I went all-in on it: four discovery paths (RFCs 6750, 9728, 8414, and 8707), with Dynamic Client Registration (RFC 7591) as a fallback.
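Here is a minimal Go sketch of two pieces of that discovery, assuming the standard well-known locations. RFC 8414 and RFC 9728 both derive a metadata URL by inserting `/.well-known/<suffix>` between the host and the path of an https identifier. RFC 8707 scopes a token to one server with a `resource` request parameter. The helper names and hostnames are hypothetical. Several parts are left out: fetching the documents, reading the `resource_metadata` hint from a `WWW-Authenticate` challenge, and the Dynamic Client Registration fallback. The post also doesn't say in what order the proxy tries these paths.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// wellKnown builds a metadata URL by inserting "/.well-known/<suffix>" between
// the host and the path of an https identifier. RFC 8414 uses the suffix
// "oauth-authorization-server"; RFC 9728 uses "oauth-protected-resource".
// Any query or fragment is dropped, since these identifiers should carry none.
func wellKnown(identifier, suffix string) (string, error) {
	u, err := url.Parse(identifier)
	if err != nil {
		return "", err
	}
	if u.Scheme != "https" || u.Host == "" {
		return "", fmt.Errorf("expected an absolute https URL, got %q", identifier)
	}
	u.Path = "/.well-known/" + suffix + strings.TrimSuffix(u.Path, "/")
	u.RawPath = ""
	u.RawQuery = ""
	u.ForceQuery = false
	u.Fragment = ""
	return u.String(), nil
}

// withResource adds the RFC 8707 "resource" parameter to an authorization
// request so the issued token is audience-restricted to one server.
func withResource(authorizeURL, resource string) (string, error) {
	u, err := url.Parse(authorizeURL)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("resource", resource)
	u.RawQuery = q.Encode() // Encode sorts keys and percent-escapes values
	return u.String(), nil
}

func main() {
	server := "https://mcp.example.com/tools/v1" // hypothetical remote MCP server
	prm, _ := wellKnown(server, "oauth-protected-resource")
	asm, _ := wellKnown("https://auth.example.com", "oauth-authorization-server")
	authz, _ := withResource("https://auth.example.com/authorize?response_type=code", server)
	fmt.Println(prm)   // https://mcp.example.com/.well-known/oauth-protected-resource/tools/v1
	fmt.Println(asm)   // https://auth.example.com/.well-known/oauth-authorization-server
	fmt.Println(authz) // https://auth.example.com/authorize?resource=https%3A%2F%2Fmcp.example.com%2Ftools%2Fv1&response_type=code
}
```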

What happened

38+ servers in production, zero manual onboarding ops. Tool hallucinations became impossible — not because we filter bad calls, but because they can't happen by construction. That distinction matters.

Go · OAuth 2.0 · Firestore · GCS · CI/CD

Agent Evaluation

Agent Evaluation Framework

What was going on

I kept seeing evaluation reports where accuracy was 94% and teams were ready to ship — with no mention of cost or latency. Then the agent hits production and the bill arrives. Nobody had built a framework that made you look at all three at once.

What I figured out

Any regression on any axis should be a hard stop — you can't average away bad latency with good accuracy. So I built a 9-task deterministic harness that scores cost, latency, and capability simultaneously. Every axis gets a gate, not a weight.

What happened

Cut inference cost by 64% and wall-clock time by 57%, with zero capability regressions. The numbers held up. More importantly, the framework is now the gate that every model upgrade has to pass.

Python · Claude Agent SDK · LangChain · MLflow

Developer Tooling

aigw-mcp

What was going on

AI Gateway had a lot of power under the hood, but you had to click through a UI to get to any of it. Every config change was a human operation. Agents couldn't touch it.

What I figured out

The control plane APIs already existed — I just needed to expose them as MCP tools. A stateless Go server with no new infrastructure, no new auth, no new state. The surface area is exactly the protocol.

What happened

Agents can now configure their own traffic policies. That had never been possible before. It's a small thing on paper and a significant thing in practice.

Go · MCP Protocol · REST APIs