I built a mailing system for my personal team of AI agents. Thirty-six specialised agents, three coordination domains, two months running on my own machine. Before it, I was the courier between agent sessions, relaying every message by hand. Now the agents route their own mail.
Before agent-to-agent communication, I was the messenger
The mailing system has a rulebook for how messages flow between agents: what shape a message has, what types it can be, how a request becomes an acknowledgement and then a completion report. Claude Code, the platform I built this on, exposes hooks: small scripts that fire on file-system events. They enforce the rulebook at write time. Before any of it existed, I was the messenger.
I had a team of specialised AI agents, each running in its own session, each built for a different role. One agent for research, one for engineering planning, one for code review. The work flowed across them. A research finding had to reach the agent that would plan from it. A plan had to reach the agent that would review it. The output of one agent was the input of the next, almost every time.
But the agents were sealed off from each other. No shared memory, no shared event bus, no way to see what the others had done. When one finished its part of a workflow, the next had no way to know unless I told it. I read the first agent's output, opened the next agent's session, typed the relevant context in, waited for the reply, copied it back. That was the operating loop.
I was the latency layer between my own agents. Every coordination signal passed through me. I could not scale, could not step away, could not run more than a few agents at a time without being the limit.
Sessions were isolated by design. Nothing off-the-shelf fit the shape of the problem, because the shape of the problem was a team of language models talking to one human operator who was not supposed to be in the loop forever.
The mailing system took me out of the loop.
Where the design came from
I started from one question: how do these agents talk to each other without me in the middle? The most familiar transport pattern we have is mail, so that is what I built. Each agent got an outbox to write into, an inbox to read from, and a delivery layer between them. That was the seed.
The rest of the system (the router, the schema enforcement, the notification layer) was not designed up front. Each piece was built when the simpler version stopped working. The simpler version stopped working fast.
The system today, in six layers
The mailing system has six operational layers, built in the order they were needed. The scale stayed personal (one developer, one machine, thirty-six agents), but each layer was built to production discipline. They all sit on top of one file.
That file is the routing table: a single JSON document with every agent's name, inbox path, outbox path, and coordination domain. The router reads it to deliver mail. The wake server reads it to know which inboxes to watch. The health checks read it to know which agents to sweep. It is the address book the rest of the system assumes. One file, one source of truth. A stale entry in it produces dropped messages with no trace, which is why no other component is allowed to embed paths or names independently.
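As a rough illustration of the address-book idea, here is a minimal Python sketch of loading such a routing table and resolving a recipient. The JSON layout (an "agents" list with name, inbox, outbox and domain keys), the example agent names and paths, and the function names are all assumptions made for this sketch, not the actual file format.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class AgentEntry:
    name: str
    inbox: Path
    outbox: Path
    domain: str


def load_routing_table(text: str) -> dict[str, AgentEntry]:
    """Parse routing-table JSON into a name -> entry lookup (hypothetical layout)."""
    table: dict[str, AgentEntry] = {}
    for item in json.loads(text)["agents"]:
        entry = AgentEntry(
            name=item["name"],
            inbox=Path(item["inbox"]),
            outbox=Path(item["outbox"]),
            domain=item["domain"],
        )
        if entry.name in table:
            raise ValueError(f"duplicate agent name: {entry.name}")
        table[entry.name] = entry
    return table


def resolve_inbox(table: dict[str, AgentEntry], recipient: str) -> Path:
    """Raise on an unknown recipient rather than dropping the message silently."""
    try:
        return table[recipient].inbox
    except KeyError:
        raise LookupError(f"no routing entry for {recipient!r}") from None


# Illustrative entries only; names, paths and domains are made up.
EXAMPLE_TABLE = """
{"agents": [
  {"name": "research", "inbox": "agents/research/inbox",
   "outbox": "agents/research/outbox", "domain": "discovery"},
  {"name": "code-review", "inbox": "agents/code-review/inbox",
   "outbox": "agents/code-review/outbox", "domain": "engineering"}
]}
"""

table = load_routing_table(EXAMPLE_TABLE)
print(resolve_inbox(table, "code-review"))
```

Because every component resolves names through the same lookup, a missing entry surfaces as an error at the point of delivery instead of a message that quietly goes nowhere.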
1. Inboxes and outboxes. Alongside its outbox, each agent has three directories on the reading side: inbox/ for active mail, completed/ for closed Commissions, archive/ for everything else that has been read. The split is the flush rule made structural. A message that has been actioned does not sit in the live inbox to be loaded again the next time the agent opens it.
2. Wake channels. Notifications run on Claude Code's MCP (Model Context Protocol) channel feature. A small server watches every agent's inbox and fires a push event into active sessions when mail lands. The base read-path still works without it. This is an optimisation that turns polling into push, nothing more.
3. Hook-enforced schema. Every message carries YAML frontmatter: type, from, to, a commission-ref UUID for chain correlation. An outbox gate runs as a write hook and validates the frontmatter at the moment the file is written. A message without a valid type cannot leave the outbox at all. This is what turns the schema from a documentation artefact into a runtime contract.
[Figure 1]
4. Typed message lifecycle. Eight types: Commission, Acknowledgement, Completion Report, Notification, Finding, Amendment, Withdrawal, Bug. Each carries different semantics and a different lifecycle. The central one is the Commission chain. A Commission is a task assignment. The recipient sends an Acknowledgement to confirm receipt. The recipient executes. The recipient sends a Completion Report to close the loop. Closure has two parts that must happen in the same session: the report goes out, the original Commission moves to completed/. Neither part counts without the other.
[Figure 2]
5. Health checks. The system monitors itself. A network sweep walks every agent's registered directory and verifies the inbox structure, the hooks, the references. A dedicated check confirms the outbox router hook is registered in each agent's settings. This one was built after I found that 86% of agents were missing the hook entirely, which is the most common silent failure in the network. Messages sit in the outbox unrouted. No error, no retry, nothing in the log. A structural audit scans all outboxes for header consistency. A failure-modes catalogue documents what breaks and how to detect it.
6. Rules in agent skills. Some mailing system rules cannot live in hooks, because they govern model behaviour. The flush rule is an example: once a message has been read and acted on, it leaves the live inbox. Commissions move to completed/ after closure; Notifications move to archive/ after they are read. The live inbox holds outstanding work and nothing else. This rule is encoded as text in every agent's skill file, read at session start, applied as part of normal work. The reading side of the contract lives at this layer because deciding whether a message has been acted on is a semantic call, not a file-system event.
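The two-part closure from layer 4 and the flush rule from layer 6 come down to a couple of file moves. The Python sketch below shows one way the mechanics could look, assuming an agent directory that contains inbox/, completed/, archive/ and an outbox/ sibling. The function names, the "completion-" file prefix and the outbox location are hypothetical. In the real system the decision to close is made by the agent; this only illustrates the file operations that follow from it.

```python
import shutil
from pathlib import Path


def close_commission(agent_dir: Path, commission_name: str, report_body: str) -> Path:
    """Write the Completion Report and move the Commission to completed/ together."""
    commission = agent_dir / "inbox" / commission_name
    if not commission.is_file():
        raise FileNotFoundError(f"no live commission named {commission_name}")

    completed = agent_dir / "completed"
    outbox = agent_dir / "outbox"  # assumed location of the agent's outbox
    completed.mkdir(parents=True, exist_ok=True)
    outbox.mkdir(parents=True, exist_ok=True)

    # Part one: the report goes out.
    report = outbox / f"completion-{commission_name}"
    report.write_text(report_body, encoding="utf-8")

    # Part two: the original Commission leaves the live inbox.
    try:
        shutil.move(str(commission), str(completed / commission_name))
    except OSError:
        # Neither part counts without the other, so undo the report.
        report.unlink(missing_ok=True)
        raise
    return report


def flush_notification(agent_dir: Path, notification_name: str) -> Path:
    """Move a Notification that has been read from inbox/ to archive/."""
    archive = agent_dir / "archive"
    archive.mkdir(parents=True, exist_ok=True)
    target = archive / notification_name
    shutil.move(str(agent_dir / "inbox" / notification_name), str(target))
    return target
```

The rollback is best effort: if a write hook has already routed the report, deleting the local copy does not recall it, which is one reason the rule is stated as "same session" rather than enforced purely in code.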
To see the layers working together: an agent writes a typed message into its own outbox. The hook fires the moment the file is written. The gate validates the frontmatter. The router looks up the recipient in the routing table, assigns the next sequence number, and drops the file into the recipient's inbox. The wake server pushes a notification into the recipient's session if one is active. The next time the recipient runs its inbox check, the new file is at the top.
[Figure 3]
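The delivery step in that walk-through can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the router itself: the four-digit filename prefix as the sequence number, the choice to scan completed/ and archive/ so that numbers are not reused after a flush, and the function names are all hypothetical.

```python
import re
import shutil
from collections.abc import Iterable
from pathlib import Path

# Assumed naming scheme: delivered files start with a zero-padded sequence number.
SEQ_PREFIX = re.compile(r"^(\d{4})-")


def next_sequence(directories: Iterable[Path]) -> int:
    """Return one more than the highest sequence number found in the directories."""
    numbers = []
    for directory in directories:
        if not directory.is_dir():
            continue
        for path in directory.iterdir():
            match = SEQ_PREFIX.match(path.name)
            if match:
                numbers.append(int(match.group(1)))
    return max(numbers, default=0) + 1


def deliver(message: Path, recipient_dir: Path) -> Path:
    """Move a validated outbox file into the recipient's inbox under the next number."""
    inbox = recipient_dir / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    seq = next_sequence(
        [inbox, recipient_dir / "completed", recipient_dir / "archive"]
    )
    target = inbox / f"{seq:04d}-{message.name}"
    shutil.move(str(message), str(target))
    return target
```

This version is not safe under concurrent deliveries to the same inbox: two routers could read the same highest number. A single-machine setup with one router process sidesteps that, and anything larger would need a lock or an atomic counter.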
The system is not large. It runs on one machine. But the discipline applied to it (schema validation at the write boundary, automated hook checks, formal deprecation when something is replaced) can be applied to a multi-tenant production system.
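The automated hook check is the easiest of these to picture in code. The Python sketch below sweeps a list of agent directories and reports those whose settings file never mentions the router hook. The settings path (.claude/settings.json), the top-level "hooks" key and the hook script name passed in by the caller are assumptions for illustration. The real check may inspect the settings differently.

```python
import json
from pathlib import Path


def _mentions(node: object, needle: str) -> bool:
    """Recursively search nested JSON data for a string containing needle."""
    if isinstance(node, str):
        return needle in node
    if isinstance(node, dict):
        return any(_mentions(value, needle) for value in node.values())
    if isinstance(node, list):
        return any(_mentions(value, needle) for value in node)
    return False


def hook_registered(settings_path: Path, hook_script: str) -> bool:
    """True if the settings file has a hooks section that references hook_script."""
    try:
        settings = json.loads(settings_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return False  # a missing or unreadable settings file counts as unregistered
    if not isinstance(settings, dict):
        return False
    return _mentions(settings.get("hooks", {}), hook_script)


def agents_missing_hook(
    agent_dirs: list[Path],
    hook_script: str,
    settings_rel: str = ".claude/settings.json",  # assumed settings location
) -> list[Path]:
    """Return the agent directories where the router hook is not registered."""
    return [
        agent_dir
        for agent_dir in agent_dirs
        if not hook_registered(agent_dir / settings_rel, hook_script)
    ]
```

The useful property is that the check turns a silent failure (mail sitting unrouted in an outbox) into a list of names that can be acted on.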
What this signals
The discovery I did not expect was what hooks do beyond gating. Claude Code's write-time hook fires on every outbox file and validates the message frontmatter against the schema. When validation fails, the hook hands the failure back to the model, and the model rewrites until the schema is met. The gate became a refinement loop.
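To make the gate concrete, here is a minimal Python sketch of a frontmatter check that returns a list of human-readable errors, which is the kind of feedback a model can act on when rewriting. It assumes all four fields (type, from, to, commission-ref) are required on every message, and it uses a deliberately simplified "key: value" parser instead of a full YAML library. How the errors are handed back to the model depends on the hook mechanism and is not shown.

```python
import uuid

# The eight message types named in the article.
MESSAGE_TYPES = {
    "Commission", "Acknowledgement", "Completion Report", "Notification",
    "Finding", "Amendment", "Withdrawal", "Bug",
}
REQUIRED_FIELDS = ("type", "from", "to", "commission-ref")


def parse_frontmatter(text: str) -> dict[str, str]:
    """Read flat 'key: value' pairs between two '---' lines (not full YAML)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # the closing '---' never appeared


def validate_message(text: str) -> list[str]:
    """Return schema errors for a message; an empty list means it may leave."""
    fields = parse_frontmatter(text)
    if not fields:
        return ["missing or unterminated frontmatter block"]
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not fields.get(name)]
    message_type = fields.get("type")
    if message_type and message_type not in MESSAGE_TYPES:
        errors.append(f"unknown type: {message_type}")
    ref = fields.get("commission-ref")
    if ref:
        try:
            uuid.UUID(ref)
        except ValueError:
            errors.append("commission-ref is not a valid UUID")
    return errors
```

Returning every error at once, rather than stopping at the first, matters for the refinement loop: the model can fix the whole message in one rewrite instead of discovering problems one rejection at a time.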
I would start differently today. Research first: how electronic mail works, how communication protocols in mailing systems work, how contract handshakes work. Then build the system on already well-researched, well-proven methods rather than shape it from imagination.
What does not show up in the diagrams is the agent who owns the communication system. The rules and the protocols live in that agent's skill file; the agent carries the system's coherence over time. A diagram of inboxes and routers shows the operational layer. The steward is the layer that keeps it correct as the system evolves.