Specification: Curator pilot MVP — "one trustworthy session loop"
Outcome Definition
A coach can take a consented session recording and, in minutes, receive a reproducible, evidence-linked analysis and an editable client-ready follow-up draft — then send it themselves. Success is binary-testable: given a consented session, the system produces (a) a speaker-separated transcript, (b) a depersonalised analysis with a transcript quote behind each conclusion, and (c) an editable draft summary; and the coach can edit and mark it sent.
Scope
In scope (pilot MVP): - Session intake: upload audio or connect a video-call recording; capture per-session consent. - Transcription + speaker separation (via integrated ASR vendor). - Depersonalisation before any external LLM call; re-identification only inside the protected perimeter. - Reproducible analysis pipeline: topic, insights, commitments, next steps — each with a transcript quote. - Editable draft client summary; coach edits → marks sent. - Minimal client dossier: per-client accumulation of insights/commitments across sessions.
Out of scope (names the adjacent capability excluded): - Live in-session AI suggestions (adjacent: real-time coaching assist) — deferred, sensitive. - Full mobile client app (adjacent: client-side mobile) — web + bot only. - Coach marketplace / public showcase (adjacent: growth surface). - Payments/billing automation (adjacent: practice-management ops) — tracked, not built in pilot. - Therapy/clinical features (boundary, not a deferral).
Acceptance Criteria (binary-testable, confidence-tagged)
- AC1 [Validated-by-design]: No session is transcribed/analysed without a recorded consent for that session.
- AC2 [Assumed]: Transcript is speaker-separated; client vs coach turns are labeled (target diarization accuracy set in metric 10).
- AC3 [Validated-by-design]: Identifying entities are replaced with placeholders before any external LLM call; a PII-leak audit on pilot sessions finds 0 raw identifiers sent externally.
- AC4 [Assumed]: Every analysis conclusion (insight/commitment/next step) is accompanied by ≥1 transcript quote.
- AC5 [Unknown→to test]: Re-running analysis on the same session yields extraction agreement within the defined noise band and F1 ≥ 0.70 vs a coach gold sample.
- AC6 [Validated-by-design]: The draft summary is editable and nothing is sent to a client automatically — the coach explicitly sends.
- AC7 [Assumed]: The client dossier shows, per client, accumulated insights/commitments and their status across sessions.
Dependencies
| Dependency | Owner | Status | Impact if delayed |
|---|---|---|---|
| ASR/diarization vendor (Whisper/AssemblyAI/Fireflies-class) | Eng | TBD | No transcript → blocks all |
| LLM provider(s) with switchability | Eng | TBD | Analysis pipeline |
| Pilot coaches (3–5) recruited | Founder | TBD | No validation |
| Consent/PII legal review (CCPA/CPRA) | Founder/legal | TBD | Compliance risk |
Failure Conditions & Mitigations
- ASR quality too low for target language → swap vendor; AC2 gate.
- Reproducibility below target after 3 iterations → pipeline redesign or pivot (stage 25), do not ship untrusted output.
- Out-of-scope/clinical signal in a session → alert the coach; the human decides referral; system does not act.
- Consent missing → session runs without transcription (degraded, safe).
Assumption Registry
| Assumption | Confidence | Note |
|---|---|---|
| Coaches will run real sessions through a pilot tool | Assumed | Validate in pilot |
| F1 ≥ 0.70 = "client-safe" | Unknown | Inherited from plan page; recalibrate |
| Depersonalisation is feasible without wrecking analysis quality | Assumed | Test in pipeline |
| Payer = coach | Unknown | Open business-model question |
Zero-Question Score (self-assessed)
~0.7 — the what is specified and testable; the open unknowns (reproducibility threshold, payer, vendor choice) are explicitly marked, not hidden. Not a true zero-question spec because two load-bearing items (AC5 threshold, payer) remain Unknown by nature at concept stage.
Self-Critique (≥3)
- AC5 is the spec's keystone and is the least certain — the spec is only as strong as the unproven reproducibility result.
- "Light edits / sent" as the success proxy may not equal genuine trust; a direct trust measure may be needed.
- The dossier (AC7) is in-scope but thin here; if continuity is actually core to retention, this under-specifies it (deliberate, to keep pilot small).