Intuition Labs · φ research · 13 · measured · 17 may 2026

the matmul slice

two of the three ways rules combine are settled. the third is the one that tracks information flow — and the hardest one. this is what is currently in the lab.

composition algebra closed · all 4 proof shapes measured · 17 construction pair-compositions + 38 live-capture bookend rows
parallel ✓ sequential ? gated ?  ·  three composition kinds · one proved · two queued

the takeaway in one paragraph

Two rules can run side-by-side, one can feed the next, or one can decide whether the next runs at all. Those are the three shapes. The last post proved the side-by-side shape on every pair of pure-logic rules in our agent. What's next is the shape where one rule's output is the next rule's input — the shape that actually tracks information flowing through the system. It's the matrix-multiplication shape, and proving it correct is the next concrete horizon. This note is the public declaration of that horizon, written before the work lands so the bench has a target.
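The three shapes can be sketched as three small combinators. This is a minimal illustration, not the lab's composer: the names `parallel`, `sequential`, `gated` and the toy rules are hypothetical, and rules are modelled as plain Python functions rather than compiled machines.

```python
from typing import Callable, Optional, Tuple, TypeVar

X = TypeVar("X")
A = TypeVar("A")
B = TypeVar("B")


def parallel(f: Callable[[X], A], g: Callable[[X], B]) -> Callable[[X], Tuple[A, B]]:
    """Side-by-side: both rules see the same input; the output is the pair."""
    return lambda x: (f(x), g(x))


def sequential(f: Callable[[X], A], g: Callable[[A], B]) -> Callable[[X], B]:
    """One feeds the next: f's output is g's input."""
    return lambda x: g(f(x))


def gated(gate: Callable[[X], bool], g: Callable[[X], B]) -> Callable[[X], Optional[B]]:
    """One decides whether the next runs: g is skipped when the gate is closed."""
    return lambda x: g(x) if gate(x) else None


# Hypothetical toy rules, for illustration only.
def is_long(n: int) -> bool:
    return n > 100


def label(n: int) -> str:
    return "big" if n > 10 else "small"


def retry_if(flag: bool) -> str:
    return "retry" if flag else "passthrough"


side_by_side = parallel(is_long, label)    # int -> (bool, str)
one_feeds_next = sequential(is_long, retry_if)  # int -> str, via the bool in the middle
one_decides = gated(is_long, label)        # int -> str or None
```

Only `sequential` puts a constraint on the middle type: `retry_if` has to accept exactly what `is_long` returns.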

Two pairs of rules in our agent compose sequentially. The model-alias rule picks which model handles a request, then the retry-on-length rule decides whether to retry on that model. The same alias rule then sequences with the streaming-halt rule the same way. Both are constructible. Both are small. The plan is to compile them, run the differential test, and report whichever way it lands.

why this is the hard one

The side-by-side composition is generous in a specific way: nothing flows between the two rules. They both see the same input, both emit independent decisions, and the combined output is the pair of decisions. The math is a tensor product — the spaces multiply, but nothing has to line up.

Sequential composition is stricter. Rule A's output must MATCH the type of rule B's input. If they don't line up, the composition isn't even a function. The matrix-multiplication picture is exactly this: the inner dimensions have to agree (and cancel) for the product to be defined. In our agent, that means the model-alias rule's verdict (which model name to send the request to) has to be the same kind of thing the retry-on-length rule is reading (which model the request was sent to). If we get that wrong, the composition fails to type-check, and it fails before any decision is even made.
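A minimal sketch of that inner-dimension check, assuming each rule declares the finite alphabet it reads and the one it emits. The `Rule` class, `compose_sequential`, and the model names `model-a` / `model-b` are hypothetical and only stand in for whatever the real type system records.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class Rule:
    name: str
    inputs: FrozenSet[str]    # symbols the rule can read
    outputs: FrozenSet[str]   # symbols the rule can emit
    fn: Callable[[str], str]


def compose_sequential(a: Rule, b: Rule) -> Rule:
    """Return b-after-a, refusing if a can emit something b cannot read."""
    unreadable = a.outputs - b.inputs
    if unreadable:
        raise TypeError(
            f"{a.name} -> {b.name}: inner alphabets do not line up; "
            f"{b.name} cannot read {sorted(unreadable)}"
        )
    return Rule(f"{a.name};{b.name}", a.inputs, b.outputs, lambda s: b.fn(a.fn(s)))


# Hypothetical alphabets, for illustration only.
alias = Rule(
    "alias",
    frozenset({"fast", "smart"}),
    frozenset({"model-a", "model-b"}),
    lambda s: "model-a" if s == "fast" else "model-b",
)
retry = Rule(
    "retry",
    frozenset({"model-a", "model-b"}),
    frozenset({"retry", "passthrough"}),
    lambda m: "retry" if m == "model-a" else "passthrough",
)

joint = compose_sequential(alias, retry)   # type-checks: outputs of alias are inputs of retry
```

Composing in the wrong order, `compose_sequential(retry, alias)`, raises `TypeError` at construction time, which is the "surfaces immediately" failure mode.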

This makes the proof harder than the parallel case but also more diagnostic. A type mismatch surfaces immediately. A decision mismatch surfaces under differential test. Both failure modes are useful: they say specifically what's wrong about how we wrote the rule.

what we expect to find

The two sequential pairs in our agent are short — each rule has one or two states. Compiled together, the joint machine will be small. The differential test enumerates every (alias-decision, retry-condition) pair and compares the joint dispatch against running the two Python rules in series. If the proof checks out, the sequential row of the composition algebra is settled the same way the parallel row was settled in the last post. If it fails, the failure tells us where the type signatures of these rules drift from what we wrote down — and that's information we want.
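The differential test described above can be sketched as an exhaustive loop over the joint input space. Everything here is a stand-in: the symbol names, the `reference` function, and the dispatch table are hypothetical, and the real harness compares a compiled machine rather than a dict.

```python
from itertools import product

ALIAS_DECISIONS = ("model-a", "model-b")        # hypothetical symbols
RETRY_CONDITIONS = ("truncated", "complete")    # hypothetical symbols


def reference(alias_decision: str, retry_condition: str) -> tuple:
    """Stand-in for running the two Python rules in series."""
    action = "retry" if retry_condition == "truncated" else "passthrough"
    return (alias_decision, action)


def make_joint_table() -> dict:
    """Stand-in for the compiled joint machine: one entry per joint input."""
    return {
        ("model-a", "truncated"): ("model-a", "retry"),
        ("model-a", "complete"): ("model-a", "passthrough"),
        ("model-b", "truncated"): ("model-b", "retry"),
        ("model-b", "complete"): ("model-b", "passthrough"),
    }


def differential_test(table: dict) -> list:
    """Enumerate every joint input; collect the ones where the table disagrees."""
    mismatches = []
    for a, c in product(ALIAS_DECISIONS, RETRY_CONDITIONS):
        got, want = table.get((a, c)), reference(a, c)
        if got != want:
            mismatches.append(((a, c), got, want))
    return mismatches
```

An empty list means the joint dispatch agrees with the reference everywhere; a non-empty list names the exact inputs where the two drift apart.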

After the matmul slice, the gated slice remains. The gated shape is when one rule decides whether another runs at all. The streaming-halt rule is an instance: if halt fires, the response is short and the retry-on-length rule never sees a 'length-truncated' verdict to act on. The gate is a sum-of-paths through a small graph; the math is the same matrix algebra but with a branching connective added. We expect that one to also close cleanly; it's structurally the simplest of the three.

what's already shipped that makes this cheap

The composer that built the parallel-slice product machines from individual rules is generic. Plugging two rules in and asking for their joint machine takes one function call. The differential-test harness measures correctness against the running Python reference; that's already wired. The compatibility table that backs the type system already has a row for every parallel pair and a slot ready for the sequential pairs. The shape of the work is fill-in-the-blanks; the question is whether the blanks fill cleanly.

This note exists because honest research declares its next bet before it makes the bet. The horizon for this work is days, not weeks. The result will land as an update to this note and a measured-tag sibling once the proof checks out.

**Design pass landed (17 may, evening).** A design read of the two pairs surfaced a wrinkle the original framing understated. The pairs named above, alias × escalator (the retry-on-length rule) and alias × halt, do not have a direct output-to-input type match in the construction frame; an opaque LLM step sits between them, and the construction frame cannot describe that step. There are two paths forward: model the LLM step as an opaque oracle and prove the bookends (path A), or pick different pairs whose alphabets DO line up directly (path B, with the most promising pair being error-class-explainer × residual-prompt-on-failure). The implementation iter will take path B first because it closes the matmul slice purely in the construction frame; path A then handles the LLM-in-the-middle pairs separately as bookends-verified + middle-measured. See `docs/MATMUL-SLICE-DESIGN.md` in the codebase for the full analysis.

**Deeper finding (17 may, late evening).** A second read after the substrate cleared revealed that the wrinkle is more general than path A vs path B. The construction frame uses different vocabularies for δ-inputs (tape symbols: S/L, Y/N, F/G) and δ-outputs (decoder strings: passthrough/retry, hint_*, halt/continue). No pair in the current tech library has direct alphabet overlap, including the path-B pair we proposed. The matmul slice as a binary product alone is not shippable; every sequential composition needs a third small machine — an adapter that translates δ_A's output vocabulary to δ_B's input vocabulary. The adapter is itself a 1-state Turing machine and is constructible. The realistic next step is to write the adapter schema first, then ship a three-piece product. The cost estimate of ~50 lines was wrong; closer to ~150 plus a schema decision per pair.
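A minimal sketch of what an adapter schema check might look like. The two vocabularies are taken from the note; the pairing inside `ADAPTER`, the helper names, and the toy δ functions are hypothetical, since the real schema is a per-pair decision.

```python
OUT_VOCAB = {"passthrough", "retry"}   # decoder strings named in the note
IN_VOCAB = {"S", "L"}                  # tape symbols named in the note

# Hypothetical pairing, for illustration only.
ADAPTER = {"passthrough": "S", "retry": "L"}


def check_adapter(adapter: dict, out_vocab: set, in_vocab: set) -> list:
    """Return a list of schema problems; an empty list means the adapter is usable."""
    problems = []
    for word in sorted(out_vocab - set(adapter)):
        problems.append(f"no translation for output {word!r}")
    for word, symbol in sorted(adapter.items()):
        if word not in out_vocab:
            problems.append(f"{word!r} is not an output of the first rule")
        if symbol not in in_vocab:
            problems.append(f"{symbol!r} is not readable by the second rule")
    return problems


def three_piece(delta_a, adapter: dict, delta_b):
    """The three-piece product: delta_a, then the adapter, then delta_b."""
    return lambda x: delta_b(adapter[delta_a(x)])


# Hypothetical toy deltas to exercise the composition.
def toy_a(n: int) -> str:
    return "retry" if n > 100 else "passthrough"


def toy_b(symbol: str) -> str:
    return "long" if symbol == "L" else "short"


chain = three_piece(toy_a, ADAPTER, toy_b)
```

The two vocabularies share no symbols (`OUT_VOCAB & IN_VOCAB` is empty), which is the reason a binary product alone cannot type-check and the adapter has to exist.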

**Shipped (17 may, late evening, after the schema landed).** The first matmul-slice pair is MEASURED: length-escalator (LE) → error-class-explainer (ECE), with the two-transition adapter from docs/ADAPTER-SCHEMA.md (passthrough→X, retry→M). The joint Turing machine has 16 states and 27 transitions. Across 10,000 random fuzz samples plus the exhaustive enumeration, every single one matches the running Python reference. 30.8 seconds wall-clock on the strix-halo workstation. Receipt at bench/results/iter80-matmul-slice.jsonl. Tag flipped from projected to mixed: one pair measured, two more in the schema (ECE→RPF with the 6-transition adapter, LE→RPF with another 2-transition adapter, where RPF is residual-prompt-on-failure) queued for a follow-up iter. The cleanest implementation insight from iter80: the adapter is a TAPE-ENCODING choice, not a runtime piece. The differential test builds the joint tape with the adapter mapping pre-applied; the WCM simulator runs the parallel composer on that tape unchanged. The adapter compiler ships as a small lookup, not as a third Turing machine to weave into the product δ.
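One way to read the tape-encoding insight, as a sketch under stated assumptions: the adapter mapping (passthrough→X, retry→M) is from the note, but `delta_le`, `delta_ece`, the output name `hint_none`, and the two-cell tape layout are hypothetical stand-ins, not the WCM simulator's actual interface.

```python
# Adapter from the note: LE's output vocabulary -> ECE's input tape symbols.
ADAPTER = {"passthrough": "X", "retry": "M"}


def delta_le(symbol: str) -> str:
    """Hypothetical stand-in for the length-escalator: 'L' (long) means retry."""
    return "retry" if symbol == "L" else "passthrough"


def delta_ece(symbol: str) -> str:
    """Hypothetical stand-in for the error-class-explainer."""
    return "hint_M" if symbol == "M" else "hint_none"


def serial_reference(le_input: str) -> tuple:
    """Run the rules in series, translating between them at runtime."""
    le_out = delta_le(le_input)
    return (le_out, delta_ece(ADAPTER[le_out]))


def build_joint_tape(le_input: str) -> tuple:
    """Pre-apply the adapter: the second cell already holds ECE's input symbol."""
    return (le_input, ADAPTER[delta_le(le_input)])


def run_parallel(tape: tuple) -> tuple:
    """Parallel composer, unchanged: each rule reads only its own cell."""
    return (delta_le(tape[0]), delta_ece(tape[1]))


agrees = all(
    run_parallel(build_joint_tape(s)) == serial_reference(s) for s in ("S", "L")
)
```

Because the translation happens while the tape is built, the composer never needs to know a sequential dependency exists; the adapter stays a lookup table.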

**The matmul slice is now MEASURED across all three pairs in the schema (18 may).** Iter81 added LE → RPF (two-transition adapter: passthrough→NNN, retry→YYY) and ECE → RPF (six-transition adapter, each hint mapped to the prior/fail/shape boolean triple where hint_M and hint_P are structural and the others surgical). Joint state counts: 16 and 44 respectively. 10,000 random fuzz samples per pair plus exhaustive enumeration — every single one of 30,000 total samples matches the running Python reference. Total wall: 127 seconds across all three pairs. Receipt updated at bench/results/iter80-matmul-slice.jsonl. The matmul row of the composition algebra is now closed for every direct-alphabet pair in the construction frame; the LLM-in-the-middle pairs (alias × escalator, alias × halt) remain the open work, queued for the path-A bookend-proof approach described above.

**The gated slice, the third row of the algebra, is also MEASURED (18 may, immediately after).** Iter82 closed the branching-connective case in one focused iter: when phi-halt-on-converge (HOC) fires `halt` (its 4-bit conjunction over phase/phi/slope/history), the length-escalator's δ never runs because the streaming response is already closed. The joint Turing machine has 13 states and 24 transitions, smaller even than the matmul LE→ECE machine, confirming this note's earlier prediction that the gated slice would be structurally the simplest. Iter83 then extended the gated batch to three more semantically meaningful pairs: HOC × error-class-explainer, HOC × residual-prompt-on-failure, and explainer × residual (gated by `passthrough`: when the explainer says no-error, residual doesn't need to augment). Across the four gated pairs, 304 exhaustive joint inputs and 10,000 random fuzz samples per pair (40,000 total) all match the running Python reference. The construction-frame composition algebra is now closed across all three rows: parallel (Kronecker, iter59 · 10 pairs · 396 exhaustive), matmul (sequential with adapter, iter80–81 · 3 pairs · 10 exhaustive · 30,000 fuzz), gated (branching connective, iter82–83 · 4 pairs · 304 exhaustive · 40,000 fuzz). Total: 17 distinct pair-compositions and 70,000 fuzz samples (30,000 matmul + 40,000 gated), every single one matching.
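The gate can be sketched as a conjunction that short-circuits the second rule. The note only says the gate is a 4-bit conjunction over phase/phi/slope/history; treating each as a boolean flag, the `length_escalator` stand-in, and the two-value finish-reason domain are assumptions made for illustration. The toy input space here is not the one behind the counts above.

```python
from itertools import product


def hoc(phase: bool, phi: bool, slope: bool, history: bool) -> str:
    """Halt only when all four bits are set (the 4-bit conjunction)."""
    return "halt" if all((phase, phi, slope, history)) else "continue"


def length_escalator(finish_reason: str) -> str:
    """Hypothetical stand-in: retry only on a length-truncated response."""
    return "retry" if finish_reason == "length" else "passthrough"


def gated(bits: tuple, finish_reason: str) -> tuple:
    """If the gate halts, the second rule never runs and contributes None."""
    gate = hoc(*bits)
    if gate == "halt":
        return (gate, None)
    return (gate, length_escalator(finish_reason))


# Toy joint input space: 16 bit patterns x 2 finish reasons.
JOINT_INPUTS = list(product(product((False, True), repeat=4), ("stop", "length")))
HALTED = [x for x in JOINT_INPUTS if gated(*x)[0] == "halt"]
```

Only one of the sixteen bit patterns opens the halt branch, which is part of why the joint machine stays small: most of the graph is the `continue` path.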

**Path A is also MEASURED now (18 may, evening).** The fourth proof shape — bookend-only for pairs with an opaque LLM in the middle — closed end-to-end against live captures from this session's actual codebox-solve invocations. The iter90 extractor pulled 38 rows from solve_history.jsonl (the codebox-solve receipt JSONL), each with a real captured finish_reason from the production phi-proxy stack. Across those 38 captures: every single response satisfied the documented R-contract (finish_reason ∈ {stop, length, content_filter, eos, phi_converged, ...}), and every (FA-python ; recorded-R ; LE-python) chain produced the joint decision the chain's type signature predicts. Distribution: 37 stop, 1 length. The one length is the iter77 dijkstra ENGINE_ERROR incident that triggered the max-tokens=2048 fix earlier this session; it correctly maps to a retry decision under the FA-LE chain. All four proof shapes are now empirically backed: parallel + matmul + gated by construction, bookend by construction (for FA and LE individually) plus 38 live R-captures all satisfying contract.
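A bookend check of this kind might look like the following sketch. The row schema (a JSON object with a `finish_reason` field), the function names, and the rule that only `length` triggers a retry are assumptions; the contract set lists only the values the note names, and the note's own list is open-ended. The sample rows are synthetic, not taken from solve_history.jsonl.

```python
import json
from collections import Counter

# Values named in the note; the source list ends in "...", so this set is incomplete.
R_CONTRACT = frozenset({"stop", "length", "content_filter", "eos", "phi_converged"})


def le_python(finish_reason: str) -> str:
    """Assumption: only 'length' triggers a retry; everything else passes through."""
    return "retry" if finish_reason == "length" else "passthrough"


def check_captures(jsonl_lines) -> tuple:
    """Check each captured row against the contract and tally the LE decisions."""
    violations = []
    decisions = Counter()
    for lineno, line in enumerate(jsonl_lines, start=1):
        reason = json.loads(line).get("finish_reason")
        if reason not in R_CONTRACT:
            violations.append((lineno, reason))
        else:
            decisions[le_python(reason)] += 1
    return violations, decisions


# Synthetic rows, for illustration only.
SAMPLE = [
    '{"finish_reason": "stop"}',
    '{"finish_reason": "stop"}',
    '{"finish_reason": "length"}',
]
violations, decisions = check_captures(SAMPLE)
```

The opaque middle is never modelled: the check only asserts that whatever the LLM step recorded lands inside the contract, and that the downstream rule's decision follows from it.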