Case study · 2026 · Showcase prototype

À La Carte

A reading-group companion that frames training-data flywheels as a kitchen. Built as a Snorkel Flow / Foundry showcase — and, almost accidentally, a working argument that the typed-component pattern from frontier-lab game-playing research generalizes outside of games.

Role: Solo design & build · Concept · UI · prototyping
Built with: Claude · Cursor · React · HTML/CSS · iterative bake-off
Technical lineage: AutoHarness · CWM · Snorkel Flow · Foundry
Status: Showcase prototype · May 2026 · in progress

The problem

Reading groups break in predictable ways.

Speakers struggle to anticipate which questions will land and which will derail. Audiences want to engage but can't always tell what's worth asking. The room runs out of time before it runs out of curiosity.

À La Carte is a companion app for the speaker and the audience, themed as a kitchen. The agent — Remy, a young apprentice — generates short, named primitives called recipe cards. Speakers prep with them. Audiences bring them to the table. The same card system runs both sides of the room.

I built it as a showcase: end-to-end, deployable on Snorkel Flow and Foundry, with the kind of architectural rigor that lets a research engineer recognize the pattern underneath. The result is four scenes (the card, the rail, the menu, and the box, shown as a card catalog), each art-directed to demonstrate a different dimension of the system.

Six recipe card types. Five lifecycle states. One shared model, Remy, that gets sharper at every table he runs. The box fills, session by session, and that filling is the product.
The flywheel made physical · every reading group leaves a tip
v4 · the catalog
The system

Six cards, six functional roles.

The card typology is the centerpiece. Each type plays a structurally distinct role in the agent architecture — not a stylistic category. The constraint is enforced before content is written, not after.

Recipe cards are the unit of the system. They are small (a few sentences), named (each gets a deterministic ID), reusable (they survive past the session that produced them), and typed (each card type does one specific kind of work). The typology is principled rather than decorative — and that's what makes the recipe catalog a canonical component library rather than a pile of notes.

| Card type | Functional role | What it does |
| --- | --- | --- |
| Definition | Observation function | Surfaces what the audience perceives about the paper-state at this moment: the "what are we looking at" call. |
| Common Stumble | Legal-action enumerator | Names the reasoning moves that are not valid here; tells Remy and the speaker which inferences will mislead. |
| Speaker Phrasing | Transition function | Moves state forward: how to take an audience question and produce a grounded next state of understanding. |
| Spicy Question | Value function | Estimates which directions of inquiry are high-value; flags questions worth holding the room for. |
| Open Question | Inference function | Estimates the audience's hidden state: what they might actually be confused about beneath the question they asked. |
| Analogy | Regularizer | Anchors generation to the paper itself; prevents Remy from drifting into trivial or unfaithful encodings. |
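
The typology above can be written down as a small lookup. What follows is a minimal TypeScript sketch, not the prototype's actual data model: the names `CardType`, `FunctionalRole`, `ROLE_OF`, `RecipeCard`, and `cardsWithRole` are assumptions made for illustration. Only the six type/role pairs come from the table.

```typescript
// Hypothetical names throughout; only the six type/role pairs come from the table.
type CardType =
  | "Definition"
  | "Common Stumble"
  | "Speaker Phrasing"
  | "Spicy Question"
  | "Open Question"
  | "Analogy";

type FunctionalRole =
  | "Observation function"
  | "Legal-action enumerator"
  | "Transition function"
  | "Value function"
  | "Inference function"
  | "Regularizer";

// Record<CardType, ...> makes the compiler reject a lookup that forgets a card type.
const ROLE_OF: Record<CardType, FunctionalRole> = {
  "Definition": "Observation function",
  "Common Stumble": "Legal-action enumerator",
  "Speaker Phrasing": "Transition function",
  "Spicy Question": "Value function",
  "Open Question": "Inference function",
  "Analogy": "Regularizer",
};

// Assumed minimal card shape; the real cards carry more (title, tags, provenance).
interface RecipeCard {
  id: string;
  type: CardType;
  body: string;
}

// Select the cards that do one kind of work, e.g. everything acting as a value function.
function cardsWithRole(cards: RecipeCard[], role: FunctionalRole): RecipeCard[] {
  return cards.filter((card) => ROLE_OF[card.type] === role);
}
```

Typing the lookup as a `Record` over the union is one way to enforce the type before content is written: a card cannot be constructed with a type that has no role.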

Each card lives in one of five lifecycle states (new, in-service, edited, filed, and 86'd), which map onto the refinement loop: cards arrive, get used, get corrected by the speaker, get filed into the canonical library, or get rejected. Speaker edits are the environment-feedback signal. The wooden cabinet visibly fills, drawer by drawer, session by session, as filed cards accumulate. That filling is the architecture made visible.
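
The five lifecycle states read naturally as a small state machine. The sketch below is one possible encoding, assuming a particular set of edges: the state names come from this writeup, but the exact transitions (for example, whether an edited card can return to service) are my assumption, as are the names `NEXT_STATES`, `canMove`, and `advance`.

```typescript
type LifecycleState = "new" | "in-service" | "edited" | "filed" | "86'd";

// Assumed edges: the writeup names the five states but not the exact transitions.
const NEXT_STATES: Record<LifecycleState, LifecycleState[]> = {
  "new": ["in-service"],
  "in-service": ["edited", "filed", "86'd"],
  "edited": ["filed", "86'd"],
  "filed": [], // terminal: the card is in the canonical library
  "86'd": [],  // terminal: the card was rejected
};

function canMove(from: LifecycleState, to: LifecycleState): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1;
}

// Returns the new state, or throws so an illegal move never reaches the UI.
function advance(from: LifecycleState, to: LifecycleState): LifecycleState {
  if (!canMove(from, to)) {
    throw new Error(`illegal lifecycle move: ${from} -> ${to}`);
  }
  return to;
}

// A card that is served, corrected by the speaker, then filed into the library.
const servedThenFiled: LifecycleState[] = ["in-service", "edited", "filed"];
let cardState: LifecycleState = "new";
for (const next of servedThenFiled) {
  cardState = advance(cardState, next);
}
console.log(cardState); // "filed"
```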

The four scenes

A card, a speaker, an audience, a library.

Each scene answers a different design question. The card shows the unit. The rail shows how the speaker uses it. The menu shows how the audience encounters it. The catalog shows what the system becomes.

01 · the unit

The recipe card

The centerpiece component. Two-sided: front carries human warmth — italic serif title, divider phrase, body, paperclip "pairs with" tags, signed by Remy. Back carries the engineering — confidence bars, provenance, deep link to Snorkel Flow.

Click to flip. Try all six card types and all five states with the keyboard shortcuts.

Demonstrates: Component consistency · type hierarchy · craft on the centerpiece
Built with: HTML · CSS custom properties · vanilla JS · SVG noise textures
v1 · the card
02 · the speaker

The kitchen rail

The view of Maya, the speaker, mid-session. Three panes: a ticket queue of incoming audience questions on the left, a brass rail of plated cards in the center, and the canonical recipe box on the right.

Click a ticket to fire its matched card. Select a card, then press S to file it into the canon or X to 86 it. The 86 is the moment the architectural feedback loop becomes visible.
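
The keyboard handling on the rail can be sketched as a pure function from the current rail state and a key to the next rail state. This is an illustrative sketch, not the prototype's source: `RailState`, `actionForKey`, and `pressKey` are hypothetical names, and only the S and X bindings come from the scene itself.

```typescript
type RailAction = "file" | "eighty-six";

// Assumed shape of the rail; card IDs stand in for full cards.
interface RailState {
  plated: string[];        // cards currently on the brass rail
  canon: string[];         // cards filed into the recipe box
  rejected: string[];      // cards that were 86'd
  selected: string | null; // the card the speaker has selected, if any
}

// S files the selected card; X 86s it. Any other key maps to no action.
function actionForKey(key: string): RailAction | null {
  switch (key.toLowerCase()) {
    case "s":
      return "file";
    case "x":
      return "eighty-six";
    default:
      return null;
  }
}

// Pure update: returns the same object when there is nothing to do.
function pressKey(rail: RailState, key: string): RailState {
  const action = actionForKey(key);
  const cardId = rail.selected;
  if (action === null || cardId === null) return rail;
  const plated = rail.plated.filter((id) => id !== cardId);
  if (action === "file") {
    return { plated, canon: rail.canon.concat(cardId), rejected: rail.rejected, selected: null };
  }
  return { plated, canon: rail.canon, rejected: rail.rejected.concat(cardId), selected: null };
}
```

Keeping the update pure leaves the 86 animation free to run on the old state while the new state is already committed.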

Demonstrates: HITL system design · two-sided handoff · live state
Watch for: the 86 toss (~1.5s, weighted easing, kitchen-physical)
v2 · the rail
03 · the audience

The menu

The view of Devon, an audience member, the night before the session. A printed menu in three courses: Pour commencer (light primers), Le plat principal (the paper, served at three depths), and Pour finir (optional pairings). Prices are set in italic, but a course costs minutes, not dollars.

Below the courses, three guided question cards Devon can bring to the table. Each one becomes a ticket Maya sees on the rail. The marketplace, made visible.
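
The order flow reduces to two small steps: total the minutes on the order, and turn each chosen question card into a ticket for the rail. A hedged sketch follows. The shapes `CourseItem`, `QuestionCard`, and `RailTicket` and the ticket ID scheme are assumptions, not the prototype's format.

```typescript
// Hypothetical shapes; the menu prices courses in minutes, not dollars.
interface CourseItem {
  title: string;
  minutes: number;
}

interface QuestionCard {
  id: string;
  prompt: string;
}

interface RailTicket {
  ticketId: string;
  questionId: string;
  prompt: string;
  askedBy: string;
}

// The "bill" for an order is simply the time it asks of the reader.
function totalMinutes(order: CourseItem[]): number {
  return order.reduce((sum, item) => sum + item.minutes, 0);
}

// Each question card an audience member brings becomes one ticket on the speaker's rail.
// The ticket ID scheme (name plus position) is an illustrative choice.
function toTickets(questions: QuestionCard[], askedBy: string): RailTicket[] {
  return questions.map((question, index) => ({
    ticketId: `${askedBy}-${index + 1}`,
    questionId: question.id,
    prompt: question.prompt,
    askedBy,
  }));
}
```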

Demonstrates: Two-sided marketplace · editorial typography · order flow
Built with: Real menu typography (dotted leaders, French course names, sticky walnut order bar)
v3 · the menu
04 · the canon

The catalog

The hero. A full card-catalog cabinet — five drawers, four closed past sessions framing tonight's open one. Inside the open drawer: six tab-divided sections for the six card types, each a stack of typed components from tonight's V-JEPA 2 session.

Click any tab to jump to a section. Click any card to open its full content in a modal — including its "pairs with" graph, the cross-references that make the canon a working component library rather than a notebook. Search filters the whole stack live. The flywheel made architectural.
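
The two catalog behaviors described here, live search and the "pairs with" graph, can be sketched in a few lines. This is a simplified illustration under assumed names (`CatalogCard`, `searchCards`, `pairedCards`). The prototype's own matching rules are not documented in this writeup, so plain case-insensitive substring matching stands in for them.

```typescript
// Assumed card shape; pairsWith holds the IDs of cross-referenced cards.
interface CatalogCard {
  id: string;
  type: string;
  title: string;
  body: string;
  pairsWith: string[];
}

// Case-insensitive substring match over title and body; an empty query keeps everything.
function searchCards(cards: CatalogCard[], query: string): CatalogCard[] {
  const needle = query.trim().toLowerCase();
  if (needle === "") return cards;
  return cards.filter(
    (card) =>
      card.title.toLowerCase().indexOf(needle) !== -1 ||
      card.body.toLowerCase().indexOf(needle) !== -1
  );
}

// Resolve a card's "pairs with" tags to the cards they point at, skipping dangling IDs.
function pairedCards(cards: CatalogCard[], cardId: string): CatalogCard[] {
  const source = cards.filter((card) => card.id === cardId)[0];
  if (source === undefined) return [];
  return cards.filter((card) => source.pairsWith.indexOf(card.id) !== -1);
}
```

Storing cross-references as IDs rather than nested cards keeps each card small and lets the graph be resolved on demand when the modal opens.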

Demonstrates: The thesis · canonical library · long-term system state · cross-card references
Built with: Pure CSS (flat, no 3D) · trapezoid tabs via clip-path · stacks via layered box-shadow
v4 · the catalog
Technical lineage

A worked example of typed code components, in a kitchen.

À La Carte's architecture is informed by two papers from Google DeepMind: Code World Models for General Game Playing (Lehrach et al., ICLR 2026) and AutoHarness (Lou et al., 2026). Both address the same problem: LLMs acting as agents in structured environments take actions the environment forbids — not because the strategy is wrong but because the model's understanding of what is legal is fragile.

The DeepMind solution pushes the LLM out of the action loop and into a code-generation loop. The LLM synthesizes typed code components — transition functions, legal-action enumerators, observation functions, value functions, inference functions — that wrap the agent and turn rule-following into program execution.

À La Carte applies the same pattern to a different environment: a reading group. Remy is the agent. The paper, the speaker's intent, and the audience's level are the environment. Recipe cards are the typed components: each card type does structurally different work, the same way each CWM component does. Speaker edits are environment feedback. The wooden cabinet, visibly filling drawer by drawer across sessions, is the canonical component library growing through refinement.
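
To make the pattern concrete, here is a toy harness in TypeScript. It is a sketch of the idea as this writeup describes it, not code from either paper: `HarnessComponents` and `chooseAction` are hypothetical names, and the reading-group environment (minutes left, two answer depths, invented costs) is made up purely for illustration. The model's proposal is accepted only if the legal-action enumerator allows it. Otherwise the harness falls back to the legal action whose successor state scores highest.

```typescript
// Hypothetical interface: three of the typed components, wrapped around an agent.
interface HarnessComponents<S, A> {
  legalActions: (state: S) => A[];        // legal-action enumerator
  transition: (state: S, action: A) => S; // transition function
  value: (state: S) => number;            // value function
}

// Accept the proposed action only if it is legal; otherwise pick the best legal one.
function chooseAction<S, A>(c: HarnessComponents<S, A>, state: S, proposed: A): A {
  const legal = c.legalActions(state);
  if (legal.length === 0) throw new Error("no legal actions in this state");
  if (legal.indexOf(proposed) !== -1) return proposed;
  const score = (action: A): number => c.value(c.transition(state, action));
  return legal.reduce((best, action) => (score(action) > score(best) ? action : best));
}

// Toy environment with invented numbers: answering costs minutes and covers material.
type Depth = "skim" | "deep-dive";
interface RoomState {
  minutesLeft: number;
  covered: number;
}
const COST: Record<Depth, number> = { "skim": 2, "deep-dive": 5 };
const GAIN: Record<Depth, number> = { "skim": 1, "deep-dive": 3 };

const readingGroup: HarnessComponents<RoomState, Depth> = {
  legalActions: (s) => (["skim", "deep-dive"] as Depth[]).filter((a) => COST[a] <= s.minutesLeft),
  transition: (s, a) => ({ minutesLeft: s.minutesLeft - COST[a], covered: s.covered + GAIN[a] }),
  value: (s) => s.covered,
};

// With three minutes left a deep dive is illegal, so the harness overrides the proposal.
console.log(chooseAction(readingGroup, { minutesLeft: 3, covered: 0 }, "deep-dive")); // "skim"
```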

Why Snorkel Flow is the production substrate

The Snorkel platform isn't a deployment afterthought. It's the production-grade implementation of the same pattern. Programmatic labeling functions are a production analog of CWM-style typed components. Snorkel Evaluate plays the role of the verification layer that AutoHarness implements as unit tests against environment feedback. Foundry is the deployment story for the single-shared-Remy architecture, where the canonical component library grows from real production feedback.
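
As a rough picture of what "speaker feedback as labeling functions" could mean, the sketch below writes a few heuristics over a card's session history and combines them with a plain majority vote. It is a TypeScript analog written for this case study and is not Snorkel's API. The `CardFeedback` fields, the thresholds, and the four heuristics are all invented for illustration.

```typescript
type QualityVote = "keep" | "drop" | "abstain";

// Hypothetical per-card feedback record; every field here is an assumption.
interface CardFeedback {
  finalState: "in-service" | "edited" | "filed" | "86'd";
  editDistance: number; // characters the speaker changed
  timesFired: number;   // how often the card was plated on the rail
}

type LabelingFunction = (feedback: CardFeedback) => QualityVote;

// Each function encodes one noisy heuristic and abstains when it has no opinion.
const labelingFunctions: LabelingFunction[] = [
  (f) => (f.finalState === "filed" ? "keep" : "abstain"),
  (f) => (f.finalState === "86'd" ? "drop" : "abstain"),
  (f) => (f.finalState === "edited" && f.editDistance > 200 ? "drop" : "abstain"),
  (f) => (f.timesFired >= 3 ? "keep" : "abstain"),
];

// Unweighted majority vote, a simple baseline; ties and all-abstain cards stay unlabeled.
function majorityVote(feedback: CardFeedback, lfs: LabelingFunction[]): QualityVote {
  let keep = 0;
  let drop = 0;
  for (const lf of lfs) {
    const vote = lf(feedback);
    if (vote === "keep") keep += 1;
    if (vote === "drop") drop += 1;
  }
  if (keep > drop) return "keep";
  if (drop > keep) return "drop";
  return "abstain";
}
```

A production label model would weight the heuristics by estimated accuracy rather than counting them equally. The vote above is only meant to show the shape of the signal.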

The "open deck" / "closed deck" distinction in CWM gives us two real product modes for a deployed companion. In open deck, the speaker is in the loop, and their full understanding becomes training signal post hoc through edits. In closed deck, Remy learns from audience reactions alone.

À La Carte isn't a clever metaphor. It's a working argument that the AutoHarness/CWM pattern generalizes to consumer-facing AI products — and that Snorkel's existing platform is the production substrate for deploying that pattern.
Process

Built in public, with AI tools, in a bake-off.

Eleven days, four scenes, one ATS-grade case study. The build process is itself an artifact — particularly the agent bake-off package, which doubles as a portfolio demonstration of prompting, evaluation, and model-quality reasoning.

May 1 · concept lock

Locked the metaphor and the architecture simultaneously

The "À La Carte" name, the kitchen theme, the recipe card system, and the agent (Remy) all locked in one session. Six card types. Five lifecycle states. One shared model. Tagline "Let's cook." Closing line locked: "Every reading group leaves a tip — in labeled data."

May 2 · technical scaffold

Mapped the typology onto CWM components

The card typology was designed before noticing the AutoHarness/CWM connection. Once the framing clicked, the mapping was clean: Definition → observation, Common Stumble → legal-action enumerator, Speaker Phrasing → transition, Spicy Question → value, Open Question → inference, Analogy → regularizer. The architecture is implicit in the demo and explicit in the writeup.

May 3 · the four scenes

Shipped v1 → v4, art-directed to the same standard

The card (centerpiece component, all six types and all five states), the rail (speaker dashboard with the 86 beat), the menu (audience-side, full order flow), the box (hero, the flywheel made physical). Each scene answers a different design dimension. Each one is a self-contained HTML file — single-source, no build pipeline, no dependencies beyond Google Fonts.

May 3 · polish pass

Triaged 14 findings against the kill thresholds

Defensive DOM on state markers, mobile breakpoints across all four files, sticky-bar overlap on the menu, selection-vs-hover competition on the rail, brittle CSS hacks replaced with proper structure, heading hierarchy made semantic. Three kill-threshold items, five craft-grade items, six refinements. Polish before publish.

In progress

Case study, public writeup, and the conversation

The case study you're reading. A long-form post structured around the AutoHarness reframe, threading CWM as the underlying scaffold. Outreach to Snorkel research and design staff after the publication is live — not before. The artifact does the work; the outreach is the cash withdrawal after the credibility deposit.

The throughline

Special education taught me product design.

A note on background, because it's the differentiator that justifies the rest.

I spent thirteen years in special education. SPED teachers assess individual users with heterogeneous needs, write a personalized spec — the IEP — that other people will execute against, define measurable outcomes, instrument progress monitoring, iterate based on what's working, design accommodations as interaction patterns, and live in the edge cases because that's where SPED literally operates. That is the job description of a senior product designer working on complex systems, with the artifacts swapped out.

The IEP is a spec doc. The accommodation is an interaction pattern. The progress monitoring is the metrics layer. The IEP team meeting is cross-functional crit. The kid is the user — and the default system is hostile to them. The job is to design a workable path through it.

À La Carte is a kitchen-themed reading-group companion that demonstrates AutoHarness/CWM-style typed components running on Snorkel Flow. It's also the product of someone who has spent thirteen years designing human-in-the-loop systems for users the default system was failing. The two skills are the same skill, applied at different scales.

— closing line —

Every reading group leaves a tip — in labeled data. The box fills. The agent learns. Let's cook.

— what Remy tells every new speaker