Solving Emergency Department Triage with Small Language Models: Why Large Commercial Models Fail and How Specialized Training Achieves Clinical Accuracy

This research addresses a critical gap in clinical AI: while large commercial language models (LLMs) show impressive general capabilities, they consistently underperform on specialized medical tasks such as emergency department (ED) triage.

The paper investigates why general-purpose LLMs fall short in the ED triage setting, where accurate, real-time acuity classification is a matter of patient safety. It then demonstrates that purpose-built small language models (SLMs), trained on curated clinical data, can reach accuracy that meets clinical standards.

Key findings:

Large commercial models fail at ED triage because they lack domain grounding, are sensitive to input phrasing, and cannot reliably follow structured clinical reasoning.
Specialized training on clinical triage data enables small models to dramatically outperform their larger, general-purpose counterparts.
Model size is not the limiting factor; domain-specific training data and fine-tuning strategy are what drive clinical accuracy.
The results suggest a viable path toward deployable, cost-effective AI triage support tools that meet the accuracy bar required in real clinical environments.
