specificationdraft

Specification: Curator pilot MVP — "one trustworthy session loop"

Outcome Definition

A coach can take a consented session recording and, in minutes, receive a reproducible, evidence-linked analysis and an editable client-ready follow-up draft — then send it themselves. Success is binary-testable: given a consented session, the system produces (a) a speaker-separated transcript, (b) a depersonalised analysis with a transcript quote behind each conclusion, and (c) an editable draft summary; and the coach can edit and mark it sent.

Scope

In scope (pilot MVP): - Session intake: upload audio or connect a video-call recording; capture per-session consent. - Transcription + speaker separation (via integrated ASR vendor). - Depersonalisation before any external LLM call; re-identification only inside the protected perimeter. - Reproducible analysis pipeline: topic, insights, commitments, next steps — each with a transcript quote. - Editable draft client summary; coach edits → marks sent. - Minimal client dossier: per-client accumulation of insights/commitments across sessions.

Out of scope (names the adjacent capability excluded): - Live in-session AI suggestions (adjacent: real-time coaching assist) — deferred, sensitive. - Full mobile client app (adjacent: client-side mobile) — web + bot only. - Coach marketplace / public showcase (adjacent: growth surface). - Payments/billing automation (adjacent: practice-management ops) — tracked, not built in pilot. - Therapy/clinical features (boundary, not a deferral).

Acceptance Criteria (binary-testable, confidence-tagged)

  • AC1 [Validated-by-design]: No session is transcribed/analysed without a recorded consent for that session.
  • AC2 [Assumed]: Transcript is speaker-separated; client vs coach turns are labeled (target diarization accuracy set in metric 10).
  • AC3 [Validated-by-design]: Identifying entities are replaced with placeholders before any external LLM call; a PII-leak audit on pilot sessions finds 0 raw identifiers sent externally.
  • AC4 [Assumed]: Every analysis conclusion (insight/commitment/next step) is accompanied by ≥1 transcript quote.
  • AC5 [Unknown→to test]: Re-running analysis on the same session yields extraction agreement within the defined noise band and F1 ≥ 0.70 vs a coach gold sample.
  • AC6 [Validated-by-design]: The draft summary is editable and nothing is sent to a client automatically — the coach explicitly sends.
  • AC7 [Assumed]: The client dossier shows, per client, accumulated insights/commitments and their status across sessions.

Dependencies

Dependency Owner Status Impact if delayed
ASR/diarization vendor (Whisper/AssemblyAI/Fireflies-class) Eng TBD No transcript → blocks all
LLM provider(s) with switchability Eng TBD Analysis pipeline
Pilot coaches (3–5) recruited Founder TBD No validation
Consent/PII legal review (CCPA/CPRA) Founder/legal TBD Compliance risk

Failure Conditions & Mitigations

  • ASR quality too low for target language → swap vendor; AC2 gate.
  • Reproducibility below target after 3 iterations → pipeline redesign or pivot (stage 25), do not ship untrusted output.
  • Out-of-scope/clinical signal in a session → alert the coach; the human decides referral; system does not act.
  • Consent missing → session runs without transcription (degraded, safe).

Assumption Registry

Assumption Confidence Note
Coaches will run real sessions through a pilot tool Assumed Validate in pilot
F1 ≥ 0.70 = "client-safe" Unknown Inherited from plan page; recalibrate
Depersonalisation is feasible without wrecking analysis quality Assumed Test in pipeline
Payer = coach Unknown Open business-model question

Zero-Question Score (self-assessed)

~0.7 — the what is specified and testable; the open unknowns (reproducibility threshold, payer, vendor choice) are explicitly marked, not hidden. Not a true zero-question spec because two load-bearing items (AC5 threshold, payer) remain Unknown by nature at concept stage.

Self-Critique (≥3)

  1. AC5 is the spec's keystone and is the least certain — the spec is only as strong as the unproven reproducibility result.
  2. "Light edits / sent" as the success proxy may not equal genuine trust; a direct trust measure may be needed.
  3. The dossier (AC7) is in-scope but thin here; if continuity is actually core to retention, this under-specifies it (deliberate, to keep pilot small).