ADR-001: Reproducibility as an architecture requirement (multi-step pipeline + depersonalisation before any external LLM)
Status
Proposed
Date: 2026-06-18
Deciders: Founder/PM (concept stage)
Context
Curator's differentiation rests on analysis a coach can trust enough to hand to a client, not on transcription, which is commoditized. A single call to a general-purpose LLM produces output that varies from run to run and offers no evidence trail, so it fails the trust gate. Separately, sessions contain sensitive personal data; sending raw transcripts to an external LLM is a trust and compliance hazard (Curator is US-first; the studio page cites CCPA/CPRA). These two forces, reproducibility and data safety, are architectural rather than feature-level, and must be decided before the MVP is built.
Decision
We will treat reproducibility and depersonalisation as architecture requirements, not later optimizations:
1. Session analysis is produced by a multi-step pipeline (decompose → extract with evidence quotes → calibrate → assemble) rather than a single free-form LLM prompt. The pipeline is designed to be as deterministic as practical and to attach a transcript quote to every conclusion.
2. Depersonalisation runs before any external LLM call: identifying entities are replaced with placeholders and only re-identified inside the protected perimeter.
3. The reproducibility target is an explicit, tested metric (see 10-metric-design-experimentation.md), gating the move from pilot to v1.
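The first two points above can be sketched together. This is a minimal illustration under stated assumptions, not the planned implementation. All names here (`depersonalise`, `reidentify`, `extract_stub`, `keep_evidenced`, the `[PERSON_n]` placeholder format) are hypothetical. The list of names to mask is passed in by hand, whereas a real system would need entity detection. The external LLM call is replaced by a stub that returns fixed output.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Conclusion:
    claim: str
    quote: str  # evidence quote, expected verbatim in the depersonalised transcript


def depersonalise(text: str, names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each listed name with a placeholder; return the safe text and the mapping."""
    mapping: dict[str, str] = {}
    # Longest names first, so a full name is replaced before any shorter name it contains.
    for i, name in enumerate(sorted(names, key=len, reverse=True), start=1):
        placeholder = f"[PERSON_{i}]"
        pattern = re.compile(rf"\b{re.escape(name)}\b")
        if pattern.search(text):
            text = pattern.sub(placeholder, text)
            mapping[placeholder] = name
    return text, mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Restore original names; intended to run only inside the protected perimeter."""
    for placeholder, name in mapping.items():
        text = text.replace(placeholder, name)
    return text


def extract_stub(safe_text: str) -> list[Conclusion]:
    """Stand-in for the external 'extract' step (hypothetical, returns fixed output)."""
    return [
        Conclusion("Client avoids delegating",
                   "[PERSON_1] said she still reviews every report herself"),
        # This quote does not occur in the transcript, so it should be dropped.
        Conclusion("Client is burned out", "I am completely exhausted"),
    ]


def keep_evidenced(conclusions: list[Conclusion], safe_text: str) -> list[Conclusion]:
    """Keep only conclusions whose quote appears verbatim in the transcript."""
    return [c for c in conclusions if c.quote in safe_text]


transcript = ("Coach: How is the team? "
              "Client: Maria Lopez said she still reviews every report herself.")
safe_text, mapping = depersonalise(transcript, ["Maria Lopez"])
# Only safe_text would cross the perimeter to the external model.
kept = keep_evidenced(extract_stub(safe_text), safe_text)
# Re-identification happens after the external step, inside the perimeter.
report = [(c.claim, reidentify(c.quote, mapping)) for c in kept]
```

The verbatim-substring check is the simplest possible evidence rule. Whether quotes should instead be matched by span offsets or with some tolerance is left open here.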
Consequences
Positive
- Output is trustworthy and auditable (evidence-linked), which is the moat.
- Data-safety posture is built-in, reducing legal/trust risk and easing the sensitive-data JTBD.
- Reproducibility becomes measurable, so "good enough" is a number, not an opinion.
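One hypothetical way to turn "good enough" into a number is sketched below. It assumes each pipeline run on the same session yields a set of normalised conclusion labels, and it scores agreement between runs as the mean pairwise F1. The actual metric is defined in 10-metric-design-experimentation.md and may differ. The 0.70 threshold comes from the References, but applying it to run-to-run agreement is an assumption, and the run data is invented for illustration.

```python
from itertools import combinations


def f1(a: set[str], b: set[str]) -> float:
    """F1 between two conclusion sets, treating `a` as reference and `b` as prediction."""
    if not a and not b:
        return 1.0  # two empty runs agree trivially
    overlap = len(a & b)
    if overlap == 0:
        return 0.0
    precision = overlap / len(b)
    recall = overlap / len(a)
    return 2 * precision * recall / (precision + recall)


def mean_pairwise_f1(runs: list[set[str]]) -> float:
    """Average F1 over every unordered pair of runs."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to measure reproducibility")
    return sum(f1(a, b) for a, b in pairs) / len(pairs)


TARGET = 0.70  # threshold taken from the References; its use here is an assumption

# Illustrative data only: three runs of the pipeline on the same session.
runs = [
    {"avoids delegation", "fears conflict", "wants promotion"},
    {"avoids delegation", "fears conflict"},
    {"avoids delegation", "fears conflict", "wants promotion"},
]
score = mean_pairwise_f1(runs)
passes_gate = score >= TARGET
```

With exact set matching, two runs that phrase the same conclusion differently count as disagreeing, so this sketch only makes sense once conclusions are mapped to a fixed label vocabulary.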
Negative
- Significantly more engineering than a thin LLM wrapper; slower to first demo.
- A multi-step pipeline costs more per session (multiple model calls) and adds latency.
- The reproducibility target may prove unattainable; this is the project's top risk and could force a pivot.
Neutral
- Transcription and diarization remain an integration (Whisper/AssemblyAI/Fireflies-class); building them is out of scope.
- Implies "build the analysis core in-house" rather than assembling on a third-party builder (slower start, but owns the moat and data schema).
Alternatives Considered
Single-prompt LLM analysis
Fastest to build; rejected — non-reproducible and no evidence trail, which fails the core trust gate.
Send raw transcripts to external LLM (skip depersonalisation)
Simpler; rejected — unacceptable sensitive-data exposure and compliance risk.
Buy/assemble on an existing no-code/agent builder
Faster MVP; rejected as the core — cedes the moat, the data schema, and reproducibility control (acceptable only for non-core glue).
References
- 09-product-strategy.md (Bet 1: reproducibility is the moat)
- 10-metric-design-experimentation.md (reproducibility experiment + F1 ≥ 0.70 target)
- curator.html / studio page (perimeters, depersonalisation, CCPA/CPRA)