Building AI Agents with Claude — What I've Learned So Far
claude, agents sdk
I've been building with Claude for a while now, but in the last few months I've gone all-in on Anthropic's full stack: Claude as the brain, Claude Code as my development environment, and the Agents SDK for building production systems. Here's what I've learned.
Why I chose Anthropic's stack
There are a lot of options in the AI space right now: OpenAI, Google, Meta, a dozen startups. I looked at all of them and stuck with Anthropic for three reasons:
- Claude is the best conversational model I've used. It follows instructions, it reasons well, and it doesn't hallucinate as much as the alternatives. When you're building agents that talk to real customers, reliability isn't optional.
- Claude Code is a genuine development tool. It's not a toy or a demo. I build real production code with it every day. The fact that it understands my codebase, can edit files, run commands, and work iteratively — that's what makes it usable for someone in my situation.
- The Agents SDK gives you control. You define the agent's tools, its guardrails, its handoff logic. It's not a black box. You can see exactly what the agent is doing and why.
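To make "tools, guardrails, handoff logic" concrete, here's a minimal sketch of what an agent definition can look like as plain data. These names are my own illustration, not the SDK's actual API; the real SDK wires this into a conversation loop for you.

```python
from dataclasses import dataclass, field
from typing import Callable

# Illustrative structure only; field names here are mine, not the SDK's.
@dataclass
class Agent:
    name: str
    instructions: str
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)
    handoffs: list["Agent"] = field(default_factory=list)

def book_appointment(date: str, time: str) -> str:
    """Stub tool: a real version would call the clinic's calendar system."""
    return f"Booked {date} at {time}"

booking = Agent(
    name="booking",
    instructions="Schedule appointments. Never promise a slot before confirming it.",
    tools={"book_appointment": book_appointment},
)
```

The point is that everything the agent can do is enumerated up front, which is what makes the system inspectable rather than a black box.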
What's working
The pattern I've landed on is straightforward: build specialized agents that do one thing well, then compose them together. A booking agent that handles scheduling. A triage agent that figures out what the customer needs. A follow-up agent that checks in after appointments.
Each agent has a clear role, clear tools, and clear boundaries. The Agents SDK makes this natural — you define an agent with its instructions and tools, and the SDK handles the conversation loop, tool execution, and handoffs between agents.
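The composition pattern above can be sketched as a triage step that routes to a specialist. In the real system the model makes this decision; here keyword matching stands in for it so the shape of the routing is visible.

```python
# Hypothetical triage routing: keyword matching stands in for the model's
# judgment. Anything it can't classify goes to a human, not to a guess.
def triage(message: str) -> str:
    text = message.lower()
    if any(word in text for word in ("book", "appointment", "schedule")):
        return "booking"
    if any(word in text for word in ("follow up", "after my visit", "check in")):
        return "follow_up"
    return "human"
```

Each branch maps to one narrow agent, which keeps every agent's instructions short and testable.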
What surprised me most is how well this works for small businesses. You don't need massive infrastructure. A well-designed agent with the right tools and good instructions can handle 80% of the routine interactions a clinic deals with every day.
What's tricky
Guardrails are everything. An agent without guardrails will eventually say something wrong, make a promise you can't keep, or try to do something it shouldn't. You need to think carefully about what the agent can and can't do, and build those limits in from the start.
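One simple form this takes is an output check that runs on every response before it reaches the customer. This is my own illustrative sketch, with a made-up blocklist; a production guardrail would be richer, but the shape is the same.

```python
# Illustrative output guardrail: intercept responses containing phrases the
# agent must never emit (medical advice, promises we can't keep, etc.).
BLOCKED_PHRASES = ("you should take", "guaranteed refund", "diagnosis is")

FALLBACK = "I can't help with that directly. Let me connect you with our staff."

def apply_guardrail(response: str) -> str:
    lowered = response.lower()
    if any(phrase in lowered for phrase in BLOCKED_PHRASES):
        return FALLBACK
    return response
```

The key design choice is that the guardrail sits outside the model, so it holds even when the prompt fails.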
Testing is hard. How do you test a system that generates different responses every time? I've been building test suites that check for behavioral properties rather than exact outputs. Does the agent book appointments correctly? Does it refuse to give medical advice? Does it hand off to a human when it should?
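A behavioral test looks something like this. `run_agent` is a stand-in stub so the example runs on its own; in the real suite it would invoke the actual agent, and the assertions would stay exactly the same.

```python
# Sketch of a behavioral test: assert properties of the agent's actions,
# not exact wording. `run_agent` is a placeholder for the real system.
def run_agent(message: str) -> dict:
    # Stub so this example is self-contained and runnable.
    if "medication" in message.lower():
        return {"action": "handoff_to_human", "booked": False}
    return {"action": "book", "booked": True}

def test_refuses_medical_advice() -> None:
    result = run_agent("What medication should I take for back pain?")
    assert result["action"] == "handoff_to_human"
    assert not result["booked"]
```

Because the test checks the action taken rather than the sentence produced, it survives the response varying from run to run.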
Context management matters. Agents need to remember what happened earlier in a conversation but forget what happened in someone else's conversation. Getting the context window right — what to include, what to summarize, what to drop — is more art than science right now.
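One shape I've found useful for this, sketched here with a placeholder summarizer (a real one would call the model): keep histories strictly keyed by conversation, hold the last few turns verbatim, and fold older turns into a summary line.

```python
from collections import defaultdict

MAX_VERBATIM_TURNS = 6  # illustrative cutoff, tuned per use case

def summarize(turns: list[str]) -> str:
    """Placeholder: a real summarizer would ask the model to compress these."""
    return f"[summary of {len(turns)} earlier turns]"

class ConversationStore:
    def __init__(self) -> None:
        # Keyed by conversation id, so one customer's history can never
        # leak into another's context window.
        self.histories: dict[str, list[str]] = defaultdict(list)

    def add(self, convo_id: str, turn: str) -> None:
        self.histories[convo_id].append(turn)

    def context(self, convo_id: str) -> list[str]:
        turns = self.histories[convo_id]
        if len(turns) <= MAX_VERBATIM_TURNS:
            return list(turns)
        older, recent = turns[:-MAX_VERBATIM_TURNS], turns[-MAX_VERBATIM_TURNS:]
        return [summarize(older)] + recent
```

What to summarize versus drop is the artful part; the keying-by-conversation part, at least, is non-negotiable.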
Why I think this is the right bet
AI agents are going to become standard infrastructure for small businesses. Not next year, not in five years — now. The technology is ready. The cost is manageable. The ROI is obvious.
A clinic that can respond to leads at 11pm, book appointments without phone tag, and follow up with patients automatically — that clinic is going to outperform the one where everything goes through a single overwhelmed receptionist.
I'm betting my career on this. Given that I'm rebuilding my career from scratch anyway, it's not as risky as it sounds. But I genuinely believe this is where things are going, and I want to be building it, not watching it happen.
If you're interested in what I'm building or want to talk about agents for your business, reach out. I'm always happy to talk shop.