This research addresses a critical gap in clinical AI. Large commercial language models (LLMs) show impressive general capabilities, yet they consistently underperform on specialized medical tasks such as emergency department (ED) triage.
The paper investigates why general-purpose LLMs fall short in ED triage, where accurate, real-time acuity classification is a matter of patient safety. It then demonstrates that purpose-built small language models (SLMs), trained on curated clinical data, can reach accuracy that meets clinical standards.
Key findings:
- Large commercial models fall short on ED triage because they lack domain grounding, are sensitive to input phrasing, and cannot reliably follow structured clinical reasoning
- Specialized training on clinical triage data enables small models to dramatically outperform their larger, general-purpose counterparts
- The approach shows that model size is not the limiting factor — domain-specific training data and fine-tuning strategy are what drive clinical accuracy
- Results suggest a viable path toward deployable, cost-effective AI triage support tools that meet the accuracy bar required for real clinical environments
