Speech Regime Classification: From Rules to Hybrid Architecture
How we went from 40% accuracy with a rule-based system to 97% with SetFit + LLM arbitration, and why embedding models turned out to be a reliable fallback
The Problem
A speech regime is neither the topic of an utterance nor its emotional coloring. It is the functional state of the speech apparatus: is the person building new meaning (BUILD), seeking contact (SEEK), or revealing suppressed content (UNSEAL)? Or are they sealing off vulnerability (SEAL), blocking entry (LOCK), or leaking energy (DRAIN)?
The Mindloom ontology defines 10 regimes; the six above are examples. The task is to identify the regime of each speech act accurately enough for clinical application.
Evolution of the Approach
Stage 1: Rule-based (40.8%)
The first version was a set of keyword detectors and scoring functions, with a separate marker set for each regime. It worked on "clean" examples, but boundary cases destroyed accuracy: SEAL and VOID, for instance, were constantly confused with each other.
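A minimal sketch of what a keyword-and-score classifier of this kind can look like. The marker phrases and the `MARKERS` table are invented for illustration; the article does not list the real marker sets.

```python
# Hypothetical marker sets: these phrases are placeholders, not the
# markers used in the actual system.
MARKERS = {
    "SEEK": {"can you", "help me", "are you there"},
    "SEAL": {"i'm fine", "never mind", "it's nothing"},
    "LOCK": {"don't ask", "none of your business"},
}


def score_regimes(utterance):
    """Count how many markers of each regime occur in the utterance."""
    text = utterance.lower()
    return {
        regime: sum(marker in text for marker in markers)
        for regime, markers in MARKERS.items()
    }


def classify_by_rules(utterance):
    """Return the highest-scoring regime, or None if no marker fired."""
    scores = score_regimes(utterance)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

An utterance that matches no marker, or that matches markers of two regimes equally, is exactly the boundary case where this approach breaks down.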
Stage 2: SetFit embeddings (73%)
SetFit is a few-shot fine-tuning framework for sentence transformers. We trained it on 2,000 examples across the 10 regimes. OOD (out-of-distribution) accuracy rose to 73%, but the model was confident even in its wrong answers. The R5 resolver, which is based on the confusion matrix, didn't help: the model didn't know when it was wrong.
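One way to make "confident in wrong answers" measurable is to compare the mean confidence of correct and incorrect predictions. The sketch below assumes predictions are available as `(predicted, gold, confidence)` tuples; that record layout and the function name are assumptions, not part of the original system.

```python
def confidence_report(records):
    """Summarise accuracy and mean confidence for right and wrong predictions.

    records: iterable of (predicted_label, gold_label, confidence) tuples.
    """
    records = list(records)
    correct = [conf for pred, gold, conf in records if pred == gold]
    wrong = [conf for pred, gold, conf in records if pred != gold]

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return {
        "accuracy": len(correct) / len(records),
        "mean_conf_correct": mean(correct),
        "mean_conf_wrong": mean(wrong),
    }
```

If `mean_conf_wrong` is close to `mean_conf_correct`, a confidence threshold cannot separate errors from hits, which is the situation described above.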
Stage 3: LLM arbitration (93.5%)
The key breakthrough was a structured LLM prompt with boundary-case examples. Claude Haiku receives the top-2 candidates from the embedding model plus context. Three rounds of prompt tuning: 79.5% → 88.5% → 93.5%.
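The arbitration step can be sketched as a prompt builder plus a strict reply parser. Everything here is an assumption made for illustration: the prompt wording, the function names, the shape of `boundary_examples`, and the choice to restrict the answer to the two candidates. The actual prompt is not shown in this article, and the call to the LLM itself is left out.

```python
def build_arbitration_prompt(utterance, context, candidates, boundary_examples):
    """Assemble a structured prompt asking the LLM to pick one of two regimes.

    candidates: (top1, top2) labels from the embedding model.
    boundary_examples: dict mapping frozenset({a, b}) to a list of
        (example_text, correct_label) pairs for that confusable pair.
    """
    first, second = candidates
    lines = [
        "You are choosing between two speech regimes for one utterance.",
        f"Candidates: {first} or {second}.",
        "Boundary examples:",
    ]
    for text, label in boundary_examples.get(frozenset(candidates), []):
        lines.append(f'- "{text}" -> {label}')
    lines += [
        f"Context: {context}",
        f'Utterance: "{utterance}"',
        f"Answer with exactly one label: {first} or {second}.",
    ]
    return "\n".join(lines)


def parse_label(reply, candidates):
    """Accept the reply only if it is one of the candidates; else keep top-1."""
    answer = reply.strip().upper()
    return answer if answer in candidates else candidates[0]
```

Falling back to the embedding model's top-1 label on an unparseable reply is one possible design choice; it keeps the pipeline from failing when the LLM answers off-format.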
Stage 4: Final calibration (97%)
A dedicated v4 prompt added boundary examples for each problematic pair. Per-regime results: SHIFT went from 28% to 100%, VOID from 64% to 95%. Hard OOD-100: 87% exact, 93% top-2.
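The two metrics reported for the hard set can be computed as follows. This sketch assumes each prediction is a ranked list of labels, best first; the function name is hypothetical.

```python
def exact_and_top2_accuracy(gold, ranked_predictions):
    """Return (exact, top2) accuracy.

    gold: list of true labels.
    ranked_predictions: list of label lists, ranked best-first.
    Exact counts a hit when the top label is correct; top-2 counts a hit
    when the true label is among the first two.
    """
    pairs = list(zip(gold, ranked_predictions))
    exact = sum(g == ranked[0] for g, ranked in pairs)
    top2 = sum(g in ranked[:2] for g, ranked in pairs)
    return exact / len(pairs), top2 / len(pairs)
```

Top-2 accuracy is always at least as high as exact accuracy, so the gap between the two (87% versus 93% above) shows how often the correct regime was the runner-up.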
The Architectural Lesson
An embedding model is not a "weaker version of an LLM." It's a different type of knowledge: it sees the entire semantic space, while the LLM sees specific context. The hybrid is stronger than either alone.
Pipeline: Keywords → SetFit → LLM (when confidence is low). Three layers, each with its own competence. Not a cascade of fallbacks, but a distributed system of expertise.
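One possible reading of this routing, as a sketch. The three layers are passed in as plain callables, the keyword layer contributes hints rather than a final verdict, and the `0.8` threshold is a placeholder: none of these details are specified in the article.

```python
def classify(utterance, keyword_layer, embedding_layer, llm_layer, threshold=0.8):
    """Route one utterance through the three layers.

    keyword_layer(utterance) -> list of matched marker hints
    embedding_layer(utterance) -> list of (label, probability), best first
    llm_layer(utterance, (top1, top2), hints) -> final label
    Returns (label, name_of_deciding_layer).
    """
    hints = keyword_layer(utterance)
    ranked = embedding_layer(utterance)
    (top_label, top_prob), (second_label, _) = ranked[0], ranked[1]

    # Confident embedding prediction: accept it without calling the LLM.
    if top_prob >= threshold:
        return top_label, "setfit"

    # Low confidence: the LLM arbitrates between the top-2 candidates,
    # with the keyword hints passed along as extra evidence.
    return llm_layer(utterance, (top_label, second_label), hints), "llm"
```

Usage with stub layers standing in for the real models:

```python
label, source = classify(
    "some utterance",
    keyword_layer=lambda u: [],
    embedding_layer=lambda u: [("SEAL", 0.55), ("VOID", 0.40)],
    llm_layer=lambda u, candidates, hints: candidates[1],
)
```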