
ADR-001: Reproducibility as an architecture requirement (multi-step pipeline + depersonalisation before any external LLM)

Status

Proposed

Date: 2026-06-18
Deciders: Founder/PM (concept stage)

Context

Curator's differentiation rests on analysis that a coach can trust enough to hand to a client, not on transcription, which is commoditized. A single call to a general-purpose LLM produces output that varies from run to run and offers no evidence trail, so it fails the trust gate. Separately, sessions contain sensitive personal data, and sending raw transcripts to an external LLM is a trust and compliance hazard (US-first: CCPA/CPRA, per the studio page). These two forces, reproducibility and data safety, are architectural rather than feature-level and must be decided before the MVP is built.

Decision

We will treat reproducibility and depersonalisation as architecture requirements, not later optimizations:

1. Session analysis is produced by a multi-step pipeline (decompose → extract with evidence quotes → calibrate → assemble) rather than by a single free-form LLM prompt. The pipeline is designed to be as deterministic as practical and to attach a transcript quote to every conclusion.
2. Depersonalisation runs before any external LLM call: identifying entities are replaced with placeholders and are re-identified only inside the protected perimeter.
3. The reproducibility target is an explicit, tested metric (see 10-metric-design-experimentation.md) that gates the move from pilot to v1.
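To make the ordering in this decision concrete, the following is a minimal sketch, not the planned implementation. It assumes the identifying entities are already known (entity detection is out of scope here) and that the external model is reached through a caller-supplied `extract` function. All names (`Conclusion`, `depersonalise`, `reidentify`, `analyse`) and the `[ENTITY_n]` placeholder format are hypothetical. The sketch shows that only placeholder text crosses the perimeter and that a conclusion is kept only if its quote appears verbatim in the text that was sent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Conclusion:
    claim: str
    quote: str  # expected to appear verbatim in the analysed text


def depersonalise(text: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each known entity with a placeholder; return the text and the mapping."""
    mapping: dict[str, str] = {}
    # Longest entities first, so "Anna Lee" is replaced before "Anna".
    ordered = sorted(set(entities), key=lambda e: (-len(e), e))
    for i, entity in enumerate(ordered, start=1):
        placeholder = f"[ENTITY_{i}]"  # hypothetical placeholder format
        mapping[placeholder] = entity
        text = text.replace(entity, placeholder)
    return text, mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Restore original entities; intended to run only inside the protected perimeter."""
    for placeholder, entity in mapping.items():
        text = text.replace(placeholder, entity)
    return text


def analyse(
    transcript: str,
    entities: list[str],
    extract: Callable[[str], list[Conclusion]],  # stand-in for the external LLM step
) -> list[Conclusion]:
    safe_text, mapping = depersonalise(transcript, entities)
    candidates = extract(safe_text)  # only depersonalised text leaves the perimeter
    # Evidence gate: drop any conclusion whose quote is not verbatim in the sent text.
    grounded = [c for c in candidates if c.quote in safe_text]
    return [
        Conclusion(reidentify(c.claim, mapping), reidentify(c.quote, mapping))
        for c in grounded
    ]
```

Keeping the evidence gate as plain string containment makes that step deterministic regardless of how the model behaves. A real pipeline would likely need fuzzier matching for transcription noise, which this sketch does not attempt.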

Consequences

Positive

Output is trustworthy and auditable (evidence-linked), which is the moat.
Data-safety posture is built-in, reducing legal/trust risk and easing the sensitive-data JTBD.
Reproducibility becomes measurable, so "good enough" is a number, not an opinion.
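One way "good enough is a number" could look in practice is sketched below. The actual metric definition lives in 10-metric-design-experimentation.md and is not reproduced here. This sketch assumes that each run of the pipeline on the same session yields a set of normalised conclusion labels, and that reproducibility is scored as the mean pairwise F1 between runs. The function names and the exact-match comparison are assumptions; the 0.70 threshold is the target cited in the References.

```python
from itertools import combinations

F1_TARGET = 0.70  # target cited in the References; everything else here is illustrative


def f1(run_a: set[str], run_b: set[str]) -> float:
    """F1 between two runs' conclusion sets, treating either run as the reference."""
    if not run_a and not run_b:
        return 1.0  # two empty runs agree trivially
    overlap = len(run_a & run_b)
    # With precision = overlap/|b| and recall = overlap/|a|, F1 reduces to this form.
    return 2 * overlap / (len(run_a) + len(run_b))


def mean_pairwise_f1(runs: list[set[str]]) -> float:
    """Average F1 over every pair of runs on the same session."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        raise ValueError("need at least two runs to measure reproducibility")
    return sum(f1(a, b) for a, b in pairs) / len(pairs)


def passes_gate(runs: list[set[str]], target: float = F1_TARGET) -> bool:
    """True when the measured reproducibility meets or exceeds the target."""
    return mean_pairwise_f1(runs) >= target
```

Exact label matching is the strictest possible comparison. If conclusions are free text, some normalisation or semantic matching step would be needed before a score like this is meaningful.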

Negative

Significantly more engineering than a thin LLM wrapper; slower to first demo.
A multi-step pipeline costs more per session (multiple model calls) and adds latency.
The reproducibility target may prove unattainable; this is the project's top risk and could force a pivot.

Neutral

Transcription/diarization remains an integration (Whisper/AssemblyAI/Fireflies-class); building it in-house is out of scope.
Implies "build the analysis core in-house" rather than assembling on a third-party builder (slower start, but owns the moat and data schema).

Alternatives Considered

Single-prompt LLM analysis

Fastest to build; rejected — non-reproducible and no evidence trail, which fails the core trust gate.

Send raw transcripts to external LLM (skip depersonalisation)

Simpler; rejected — unacceptable sensitive-data exposure and compliance risk.

Buy/assemble on an existing no-code/agent builder

Faster MVP; rejected for the analysis core because it cedes the moat, the data schema, and reproducibility control (acceptable only for non-core glue).

References

09-product-strategy.md (Bet 1: reproducibility is the moat)
10-metric-design-experimentation.md (reproducibility experiment + F1 ≥ 0.70 target)
curator.html / studio page (perimeters, depersonalisation, CCPA/CPRA)