Engineering snapshot · 2026 · Remy v1

The architecture, made concrete.

A compact, deployable typed-component interface for À La Carte: one shared RecipeCard model, six generators (one per card type), one worked evaluation rule, and a lightweight operator loop. The CWM/AutoHarness pattern at file scale — small enough to read in five minutes, complete enough to deploy.

01  ·  Shared data model

Every card is the same object.

Type defines functional role; state defines lifecycle. Provenance is required — nothing enters the catalog without a traceable source. Evaluation scores are optional but, when present, are part of the same envelope. The constraint is enforced before content is written, not after.

The shape

One schema, six functional roles, five lifecycle states

Each card carries the same fields. The card_type says what role it plays; the state says where in the loop it lives.

card_type · 6
state · 5
provenance · required
scores · optional

Lifecycle

From generated to filed (or 86'd)

State · Meaning
new · Generated and awaiting use.
in_service · Currently in the room or on deck.
edited · Modified by the speaker or operator.
filed · Accepted into the canonical library.
rejected · Explicitly discarded; "86'd" in the UI.
models/recipe_card.py Python · Pydantic
from enum import Enum
from typing import List, Optional, Literal
from pydantic import BaseModel, Field
from datetime import datetime

class CardType(str, Enum):
    definition       = "definition"
    common_stumble   = "common_stumble"
    speaker_phrasing = "speaker_phrasing"
    spicy_question   = "spicy_question"
    open_question    = "open_question"
    analogy          = "analogy"

class CardState(str, Enum):
    new        = "new"
    in_service = "in_service"
    edited     = "edited"
    filed      = "filed"
    rejected   = "rejected"

class Provenance(BaseModel):
    source_kind: Literal["paper", "speaker_edit", "audience_question", "prior_card", "session_note"]
    source_ref:  str
    excerpt:     Optional[str] = None
    confidence:  float = Field(ge=0.0, le=1.0)

class EvaluationScores(BaseModel):
    groundedness: float = Field(ge=0.0, le=1.0)
    clarity:      float = Field(ge=0.0, le=1.0)
    usefulness:   float = Field(ge=0.0, le=1.0)
    faithfulness: float = Field(ge=0.0, le=1.0)
    overall:      float = Field(ge=0.0, le=1.0)

class RecipeCard(BaseModel):
    card_id:        str
    session_id:     str
    paper_id:       str
    card_type:      CardType
    title:          str
    body:           str
    intent_line:    str
    state:          CardState = CardState.new
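    tags:           List[str] = Field(default_factory=list)
    pairs_with:     List[str] = Field(default_factory=list)
    audience_level: Literal["novice", "mixed", "expert"] = "mixed"
    # Required: omitting provenance fails validation. An empty list still
    # validates as written, so a minimum length would need its own check.
    provenance:     List[Provenance]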
    scores:         Optional[EvaluationScores] = None
    created_at:     datetime
    updated_at:     datetime
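
The model makes provenance a required field, but a required list can still be empty. A minimal, dependency-free sketch of a pre-write gate is shown here, assuming provenance entries arrive as plain dicts with the same keys as the Provenance model. The helper name check_provenance is hypothetical and not part of the snapshot.

```python
from typing import Any, Dict, List

# Mirrors the source_kind literals in the Provenance model.
ALLOWED_SOURCE_KINDS = {
    "paper", "speaker_edit", "audience_question", "prior_card", "session_note",
}


def check_provenance(provenance: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: list the reasons a card may not enter the catalog.

    An empty return value means the provenance is acceptable.
    """
    problems = []
    if not provenance:
        problems.append("provenance is empty")
    for i, entry in enumerate(provenance):
        if entry.get("source_kind") not in ALLOWED_SOURCE_KINDS:
            problems.append(f"entry {i}: unknown source_kind")
        if not str(entry.get("source_ref", "")).strip():
            problems.append(f"entry {i}: missing source_ref")
        confidence = entry.get("confidence")
        if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
            problems.append(f"entry {i}: confidence must be in [0, 1]")
    return problems
```

Running the gate before any card body is stored keeps the "before content is written" ordering: a card with an empty or malformed provenance list is turned away with a readable reason instead of being cleaned up later.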

02  ·  Six typed generators

The typology table, made callable.

One generator per card type. Each takes the session context, returns a schema-validated RecipeCard, and is colored to match the typology in the case study — same components, same functional roles, now as functions.

generate_definition_card(ctx)

States what the audience is looking at right now.

generate_common_stumble_card(ctx)

Names a likely invalid inference or recurring confusion.

generate_speaker_phrasing_card(ctx)

Produces speakable language that moves the room.

generate_spicy_question_card(ctx)

Finds a high-value discussion question worth holding.

generate_open_question_card(ctx)

Infers the latent confusion beneath the visible question.

generate_analogy_card(ctx)

Creates a bounded analogy that clarifies without drift.
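
One way to wire the six generators together is a registry keyed by card type. The sketch below is illustrative only: the generator bodies are stubs that return plain dicts instead of validated RecipeCard objects, the context is assumed to be a dict carrying a session_id, and GENERATORS and generate_card are hypothetical names.

```python
from typing import Callable, Dict

# Assumed shape of the session context; the snapshot does not define it.
Context = Dict[str, str]


def _stub(card_type: str) -> Callable[[Context], dict]:
    # Build a placeholder generator that tags its output with card_type.
    def generate(ctx: Context) -> dict:
        return {"card_type": card_type, "session_id": ctx["session_id"], "body": ""}
    return generate


# The six function names come from the list above; the bodies are stubs.
generate_definition_card = _stub("definition")
generate_common_stumble_card = _stub("common_stumble")
generate_speaker_phrasing_card = _stub("speaker_phrasing")
generate_spicy_question_card = _stub("spicy_question")
generate_open_question_card = _stub("open_question")
generate_analogy_card = _stub("analogy")

# Hypothetical registry: one entry per card type.
GENERATORS: Dict[str, Callable[[Context], dict]] = {
    "definition": generate_definition_card,
    "common_stumble": generate_common_stumble_card,
    "speaker_phrasing": generate_speaker_phrasing_card,
    "spicy_question": generate_spicy_question_card,
    "open_question": generate_open_question_card,
    "analogy": generate_analogy_card,
}


def generate_card(card_type: str, ctx: Context) -> dict:
    # Look up the generator for the requested type and run it.
    try:
        generator = GENERATORS[card_type]
    except KeyError:
        raise ValueError(f"unknown card_type: {card_type!r}") from None
    return generator(ctx)
```

A registry like this makes the routing property easy to test: loop over the keys and confirm each generator returns a card of the matching type.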

03  ·  Worked evaluation rule

A rule that fails the right things.

Analogy cards are the riskiest card type — they are the most likely to drift into vivid-but-wrong. The evaluation rule below is the AutoHarness-style unit test: anchor to the paper, refuse known drift phrases, stay short enough to be operational in the room.

eval/analogy_groundedness.py Python · pure logic
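# Import path follows the file label models/recipe_card.py.
from models.recipe_card import CardType, RecipeCard

# Phrases that mark an analogy as vivid-but-wrong.
BANNED_PATTERNS = [
    "human-like consciousness",
    "magic",
    "it just knows",
    "literally creates the video",
]

# Terms an analogy should share with the paper excerpt to count as anchored.
GROUNDED_KEYWORDS = ["feature", "representation", "masked", "predict", "context"]

MAX_WORDS = 60  # short enough to be operational in the room


def evaluate_analogy_groundedness(card: RecipeCard, paper_excerpt: str) -> dict:
    # Raise rather than assert: assert statements are skipped under `python -O`.
    if card.card_type != CardType.analogy:
        raise ValueError(f"expected an analogy card, got {card.card_type.value}")

    body_lower    = card.body.lower()
    excerpt_lower = paper_excerpt.lower()
    length_words  = len(card.body.split())

    # Substring matching: "magic" also flags "magical".
    banned_hit      = any(p in body_lower for p in BANNED_PATTERNS)
    # Count keywords that appear in both the card body and the paper excerpt.
    keyword_overlap = sum(1 for k in GROUNDED_KEYWORDS if k in body_lower and k in excerpt_lower)
    short_enough    = length_words <= MAX_WORDS

    passed = (not banned_hit) and keyword_overlap >= 2 and short_enough

    return {
        "passed":          passed,
        "banned_hit":      banned_hit,
        "keyword_overlap": keyword_overlap,
        "length_words":    length_words,
    }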
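
To see the rule discriminate, it helps to run it on one grounded and one drifting analogy. The block below is a self-contained sketch: check_analogy is a condensed copy of the rule that takes the card body directly and omits the card_type check, and the excerpt and both card bodies are invented for illustration, not taken from any paper.

```python
BANNED = ["human-like consciousness", "magic", "it just knows", "literally creates the video"]
KEYWORDS = ["feature", "representation", "masked", "predict", "context"]


def check_analogy(body: str, paper_excerpt: str) -> dict:
    # Condensed copy of the rule above, operating on the body text only.
    body_lower, excerpt_lower = body.lower(), paper_excerpt.lower()
    banned_hit = any(p in body_lower for p in BANNED)
    overlap = sum(1 for k in KEYWORDS if k in body_lower and k in excerpt_lower)
    words = len(body.split())
    return {
        "passed": (not banned_hit) and overlap >= 2 and words <= 60,
        "banned_hit": banned_hit,
        "keyword_overlap": overlap,
        "length_words": words,
    }


# Invented inputs, for illustration only.
excerpt = ("The model learns to predict masked regions in representation "
           "space from surrounding context.")
grounded = ("Like filling in a covered word from the rest of the sentence: "
            "the model uses context to predict what is masked.")
drifting = "The model works like magic: it just knows what comes next."

grounded_result = check_analogy(grounded, excerpt)   # passes: three shared keywords, no banned phrase
drifting_result = check_analogy(drifting, excerpt)   # fails: banned phrases, no shared keywords
```

Because the rule matches substrings, it is cheap and predictable but blunt: a card that mentions "magical thinking" in order to warn against it would also be flagged.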

04  ·  Tests & deploy

What runs before deploy.

The schema validates, the generators route, the evaluation rule discriminates between grounded and drifting analogies, and the human loop is reachable. Once those four checks pass, deploying the snapshot is just a matter of pushing.

Test plan

Four checks before ship

  • SCHEMA · Model validates required fields and enum constraints.
  • ROUTING · Each generator returns the correct card_type.
  • EVAL · Grounded analogy passes; drifting analogy fails.
  • HUMAN LOOP · Operator can edit, reject, or file a card.
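
The HUMAN LOOP check can be exercised against a small set of operator actions. The sketch below is one possible shape, not the snapshot's implementation: Card is a minimal stand-in for RecipeCard, the function names are hypothetical, and treating filed and rejected as terminal states is an assumption the source does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Card:
    # Minimal stand-in for RecipeCard: only the fields the operator loop touches.
    body: str
    state: str
    updated_at: datetime


# Assumption: once a card is filed or rejected, it cannot change state again.
TERMINAL_STATES = {"filed", "rejected"}


def _move(card: Card, new_state: str) -> None:
    # Refuse changes to terminal cards, then record the new state and timestamp.
    if card.state in TERMINAL_STATES:
        raise ValueError(f"card is already {card.state}")
    card.state = new_state
    card.updated_at = datetime.now(timezone.utc)


def edit_card(card: Card, new_body: str) -> None:
    _move(card, "edited")   # state check runs first, so a refused edit leaves the body intact
    card.body = new_body


def reject_card(card: Card) -> None:
    _move(card, "rejected")


def file_card(card: Card) -> None:
    _move(card, "filed")
```

A test for the human loop then reduces to three calls: edit a new card, file it, and confirm that a later reject is refused.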

Push flow

From local to public

terminal git
git init
git add .
git commit -m "Add Remy v1 interface snapshot"
git branch -M main
git remote add origin <repo-url>
git push -u origin main