the box is a type, not a thing
we stopped writing flavor.yaml. an agent identity is now a type. invalid combinations cannot be constructed: the constructor is the validator.
the takeaway in one paragraph
An agent is just a particular combination of rules, skills, and a wrapper that wires them together. People usually capture that combination in a YAML file and call it a 'flavor.' We tried that for about an hour and realized the YAML would drift from the code the day after we wrote it. Instead, we made the identity a TYPE: a small piece of Python that lists what's in the agent and refuses to construct if the combination is incoherent. Want a coding agent? Construct one value. Want a research agent? Construct another. Want a research agent that uses a single-LLM-call orchestrator? It refuses to construct: the combination is incoherent, so it never becomes a thing you can run.
The box stays the same hardware. The substrate stays the same engine. What changes between identities is the type signature. We shipped the abstraction with two identities in the catalog. The coding agent runs exactly as before. The research agent is declared, but the substrate swap that actually flips between the two is the next step.
the trouble with flavors
When you have one agent, you hardcode its config. When you have two, you start writing a config file — flavor.yaml or settings.toml or a JSON spec. The file says 'these rules are on, that termination condition fires, this orchestrator is the wrapper.' For a week or two it feels clean. Then the code grows. Someone adds a new rule that depends on the orchestrator. Someone else renames a skill. The YAML doesn't know. The code and the config drift apart. You ship a configuration that says one thing and runs another.
The fix that keeps being rediscovered: don't have two sources of truth. Pick the one you can actually check. We picked the Python value. The identity is a class instance with a strict schema. If you write a combination of fields that doesn't make sense, the class refuses to instantiate. The error message tells you which constraint you violated. There is no separate file to keep in sync because there is no separate file.
what a type does for us
Each identity is a value of a single Box type. The Box has six axes: orchestrator (how it talks to the model), techs (the dispatch rules), skills (the methodology library), termination (what counts as done), ui (how a human interacts with it), persona (the voice). Constructing a Box runs three checks at the moment of construction:
- orchestrator × termination compatibility — a single-agent loop has nothing to synthesize, so combining it with a 'synthesis-coherent' termination is rejected before runtime
- skills × orchestrator capabilities — a skill that requires parallel synthesis cannot live in a single-agent box, because the orchestrator doesn't have the parallel sub-agent capability the skill demands
- rule × rule compatibility: for every pair of dispatch rules in the box, the compatibility table is checked; a pair measured destructive in every regime where we tried it rejects the whole identity
A Box that constructs successfully is a lawful identity by construction. There is no separate 'validate this YAML' step, because the validation IS the construction.
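To make "validation is construction" concrete, here is a minimal sketch of how the three checks could hang off a constructor. It is not our actual module: the capability names, the termination names other than 'synthesis-coherent', the rule names, and the `IncoherentBox` exception are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical lookup data; the real tables are not shown in this post.
ORCHESTRATOR_CAPS = {
    "single-agent-loop": frozenset(),
    "lead-with-sub-agents": frozenset({"parallel-synthesis"}),
}
TERMINATION_NEEDS = {
    "tests-pass": frozenset(),
    "synthesis-coherent": frozenset({"parallel-synthesis"}),
}
DESTRUCTIVE_PAIRS = {frozenset({"rule-x", "rule-y"})}


class IncoherentBox(ValueError):
    """Raised when a combination of axes violates a constraint."""


@dataclass(frozen=True)
class Skill:
    name: str
    requires: frozenset = frozenset()  # capabilities the skill demands


@dataclass(frozen=True)
class Box:
    orchestrator: str
    techs: tuple      # names of dispatch rules
    skills: tuple     # Skill values
    termination: str
    ui: str
    persona: str

    def __post_init__(self):
        if self.orchestrator not in ORCHESTRATOR_CAPS:
            raise IncoherentBox(f"unknown orchestrator {self.orchestrator!r}")
        if self.termination not in TERMINATION_NEEDS:
            raise IncoherentBox(f"unknown termination {self.termination!r}")
        caps = ORCHESTRATOR_CAPS[self.orchestrator]

        # Check 1: orchestrator x termination.
        missing = TERMINATION_NEEDS[self.termination] - caps
        if missing:
            raise IncoherentBox(
                f"termination {self.termination!r} needs {sorted(missing)}, "
                f"which {self.orchestrator!r} lacks"
            )

        # Check 2: skills x orchestrator capabilities.
        for skill in self.skills:
            lacking = skill.requires - caps
            if lacking:
                raise IncoherentBox(
                    f"skill {skill.name!r} needs {sorted(lacking)}"
                )

        # Check 3: rule x rule, every unordered pair.
        for a, b in combinations(self.techs, 2):
            if frozenset({a, b}) in DESTRUCTIVE_PAIRS:
                raise IncoherentBox(f"rules {a!r} and {b!r} are destructive")


# A coherent combination constructs; an incoherent one raises.
coding = Box("single-agent-loop", ("rule-a", "rule-b"), (Skill("tdd"),),
             "tests-pass", "cli", "terse")
try:
    Box("single-agent-loop", (), (), "synthesis-coherent", "cli", "scholarly")
except IncoherentBox as err:
    rejection = str(err)  # names the violated constraint
```

The dataclass is frozen so that a Box which passed its checks cannot be mutated into an unchecked state afterwards.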
the three ways two rules can combine
Once an identity is a value, the algebra over identities is the algebra over their parts — specifically over their rules. Two rules in the same box can compose in three different shapes, all of them matrix-shaped if you look closely:
| composition | what it does | matrix shape |
|---|---|---|
| sequential | rule A's output is rule B's input | matrix multiplication (inner dimension cancels) |
| parallel | both rules fire independently on the same request | tensor product (dimensions multiply) |
| gated | rule A's verdict decides whether rule B fires at all | a branch through a graph (sum of paths) |
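
The dimension bookkeeping in the table can be sketched in plain Python, with rules stood in for by small matrices as nested lists. This shows the shapes only, not how our rules are actually represented. The row-vector convention (input on the left) and the reading of gating as "verdict times B plus (1 - verdict) times identity" are assumptions chosen for the illustration.

```python
def sequential(a, b):
    """Matrix product: a is m x k, b is k x n, result is m x n."""
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def parallel(a, b):
    """Kronecker product: (m x n) with (p x q) gives (m*p) x (n*q)."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def gated(verdict, b):
    """Sum over two paths: the 'fire' path applies b, the 'skip' path is identity.

    verdict is 0 or 1; b must be square so the skip path has the same shape.
    """
    n = len(b)
    if any(len(row) != n for row in b):
        raise ValueError("gated rule must be square in this sketch")
    return [
        [verdict * b[i][j] + (1 - verdict) * (1 if i == j else 0) for j in range(n)]
        for i in range(n)
    ]


a = [[1, 0, 1], [0, 1, 0]]          # 2 x 3
b = [[1, 0], [0, 1], [1, 1]]        # 3 x 2
seq = sequential(a, b)              # 2 x 2: the inner dimension 3 cancels
par = parallel(a, b)                # 6 x 6: dimensions multiply
swap = [[0, 1], [1, 0]]
fired, skipped = gated(1, swap), gated(0, swap)
```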
The interesting part: each of these can be PROVED correct rather than just tested. The previous post showed how the parallel case proves out for the five pure-logic rules in our coding agent — every pair of them, all the way to the joint decision space, verified by construction. The sequential case is next; we have one obvious example to compile (the model-alias rule, sequenced with the retry-on-length rule). The gated case follows.
What matters for the type system: each pair of rules has a known composition shape, and that shape is the lookup key for the compatibility check. When you construct an identity, the type system walks every pair and asks the table: lawful, regime-dependent, destructive-everywhere? A destructive pair refuses to construct. A regime-dependent pair flows through but carries the caveat. A lawful pair is invisible — that's the point.
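A minimal sketch of that walk, under stated assumptions: the table is keyed by unordered rule pair rather than by composition shape, the rule names and entries are invented, and a pair missing from the table is treated as 'unmeasured' and allowed through. How unmeasured pairs should really be handled is a policy choice this sketch does not settle.

```python
from enum import Enum
from itertools import combinations


class Verdict(Enum):
    LAWFUL = "lawful"
    REGIME_DEPENDENT = "regime-dependent"
    DESTRUCTIVE = "destructive-everywhere"
    UNMEASURED = "unmeasured"


# Hypothetical entries; the real table is populated from measurements.
TABLE = {
    frozenset({"rule-a", "rule-b"}): Verdict.LAWFUL,
    frozenset({"rule-a", "rule-c"}): Verdict.REGIME_DEPENDENT,
    frozenset({"rule-b", "rule-d"}): Verdict.DESTRUCTIVE,
}


def walk_pairs(rules, table=TABLE):
    """Check every unordered pair; return the pairs that carry a caveat."""
    caveats = []
    for a, b in combinations(sorted(set(rules)), 2):
        verdict = table.get(frozenset({a, b}), Verdict.UNMEASURED)
        if verdict is Verdict.DESTRUCTIVE:
            # One destructive pair rejects the whole identity.
            raise ValueError(f"{a} x {b} is {verdict.value}")
        if verdict is Verdict.REGIME_DEPENDENT:
            # Flows through, but the caveat travels with the identity.
            caveats.append((a, b))
        # LAWFUL (and, by assumption here, UNMEASURED) pairs are invisible.
    return caveats


caveats = walk_pairs(["rule-a", "rule-b", "rule-c"])
```

Here `caveats` ends up holding the single regime-dependent pair, while `walk_pairs(["rule-b", "rule-d"])` raises.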
what we shipped
The Box type, the catalog with two entries, and a small command-line interface to inspect them — all in one module that didn't exist this morning. The coding agent runs exactly as before. The research agent is declared and validates; switching the substrate over to actually run it is a separate piece of work.
| piece | what it does |
|---|---|
| Box / Tech / Skill types | the schema · constructors run validators · invalid combinations refuse to instantiate |
| catalog (registry) | two identities listed as Python values · adding a new one is ~30 lines |
| interaction table | per-pair compatibility · backed by the construction-frame proofs from the previous note |
| box CLI | list / show / which / use / validate / dump — operates on the identity layer only · the running stack is untouched |
The substrate — the engine, the request proxy, the tech library on disk, the cycle and bench tooling — is unchanged. The new module imports nothing from the running stack and is imported by nothing in it. Deleting the new directory returns the box to today's behavior. The split is real, not nominal.
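For a sense of scale, a catalog plus two of the CLI subcommands might look roughly like the sketch below. The `Box` here is a two-field stand-in with no validators, and the field values for each identity are placeholders, not our real configuration. Only `list` and `show` are sketched.

```python
import argparse
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Box:
    # Stand-in: the real type has six axes and constructor checks.
    orchestrator: str
    termination: str


# The registry is just a dict of Python values; adding an identity is one entry.
CATALOG = {
    "coding": Box(orchestrator="single-agent-loop", termination="tests-pass"),
    "research": Box(orchestrator="lead-with-sub-agents",
                    termination="synthesis-coherent"),
}


def main(argv=None):
    parser = argparse.ArgumentParser(prog="box")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("list", help="list identities in the catalog")
    show = sub.add_parser("show", help="print one identity's axes")
    show.add_argument("name", choices=sorted(CATALOG))
    args = parser.parse_args(argv)

    if args.command == "list":
        lines = sorted(CATALOG)
    else:
        lines = [f"{axis}: {value}"
                 for axis, value in asdict(CATALOG[args.name]).items()]
    print("\n".join(lines))
    return lines
```

Calling `main(["list"])` prints the two identity names, and `main(["show", "research"])` prints that identity's axes. Nothing in the sketch imports from, or is imported by, a running stack, which is the property the split depends on.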
what we're not claiming
- Switching identities does not yet swap the running stack. The CLI writes a pointer. Actually reconfiguring the request proxy and the orchestrator daemon to match the active identity is the next concrete step. The pointer is meaningful — it tells the operator what they intend to be running — but it doesn't move bytes around yet.
- The compatibility table is small. We populated it from what we've measured directly: ten pairs verified by construction, one pair flagged regime-dependent from prior empirical work. Every other pair is 'unmeasured.' The table grows as the bench runs.
- Sequential and gated composition are not yet proved by construction the way the parallel case is. Proving them is the next horizon for the construction frame.
- The research agent's actual orchestrator — the lead-agent-with-sub-agents wrapper that would let it run — does not exist yet. The type signature exists; the daemon doesn't.
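The pointer from the first item can be as small as a one-line file. A sketch, assuming a hypothetical pointer path supplied by the caller and a hard-coded set of known identity names; `use` records intent and `which` reads it back, and neither touches a proxy or a daemon.

```python
from pathlib import Path

KNOWN_IDENTITIES = {"coding", "research"}  # stand-in for the catalog's keys


def use(name: str, pointer: Path) -> None:
    """Record which identity the operator intends to run."""
    if name not in KNOWN_IDENTITIES:
        raise KeyError(f"no such identity: {name!r}")
    pointer.parent.mkdir(parents=True, exist_ok=True)
    pointer.write_text(name + "\n")


def which(pointer: Path):
    """Return the recorded identity name, or None if nothing is recorded."""
    try:
        return pointer.read_text().strip() or None
    except FileNotFoundError:
        return None
```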
where this came from
Reading the open-source agent frameworks people are publishing made one thing obvious: everyone reaches for a flavor abstraction the moment they have more than one identity. Everyone also drifts. The shape of the fix turns out to be linguistic — the right primitive isn't a file format but a type. Mac Lane wrote down the algebraic structure for typed compositions in the 1940s. Sutton-Precup-Singh restated it for reinforcement learning options in 1999. Linear logic has been pointing at the same shape since Girard in 1987. Programming language type systems have been enforcing exactly these constraints for decades.
We are not inventing. We are using the right tool. The catalog of identities is a few Python values. The composition checks are a few constructor validators. The CLI is a hundred lines. The full system fits in your head. That's the test that matters.