Overview
The Emirati Arabic TTS project builds high-quality speech synthesis for the Emirati Arabic dialect, one of the most underrepresented Arabic varieties in speech AI. The project spans several model generations and architectures, from classical two-stage pipelines (acoustic model plus vocoder) to modern end-to-end and LLM-based TTS systems.
Models
emirati-fastpitch-bilingual-v1.0
Bilingual (Arabic and English) FastPitch acoustic model trained on Emirati dialect speech data. It uses the NVIDIA NeMo framework with custom text normalization for Arabic numerals, abbreviations, and mixed-language input, and is paired with the emirati-hifigan-bilingual-v1.0 vocoder for waveform generation.
- Framework: NVIDIA NeMo
- Architecture: FastPitch (non-autoregressive transformer)
- Language: Emirati Arabic / English bilingual
- Features: Extended TTS frontend, G2P, number verbalization
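What makes FastPitch non-autoregressive is its length regulator. The encoder emits one vector per input symbol, a predicted pitch value for each symbol is added to it, and each vector is then repeated according to its predicted duration, so the decoder can produce every mel frame in parallel. The following plain-Python sketch illustrates that step under simplifying assumptions. Pitch is a single scalar per symbol, the hypothetical `pitch_weight` vector stands in for FastPitch's learned pitch embedding, and durations are already rounded to integers. It is not NeMo's implementation.

```python
from typing import List, Sequence

Vector = List[float]


def add_pitch(encodings: Sequence[Sequence[float]],
              pitch: Sequence[float],
              pitch_weight: Sequence[float]) -> List[Vector]:
    """Add a per-symbol pitch value to each encoder vector.

    `pitch_weight` is a hypothetical fixed projection standing in for
    FastPitch's learned pitch embedding.
    """
    if len(encodings) != len(pitch):
        raise ValueError("need one pitch value per input symbol")
    return [[h + p * w for h, w in zip(vec, pitch_weight)]
            for vec, p in zip(encodings, pitch)]


def length_regulate(encodings: Sequence[Sequence[float]],
                    durations: Sequence[int]) -> List[Vector]:
    """Repeat each symbol-level vector durations[i] times to get frame-level inputs."""
    if len(encodings) != len(durations):
        raise ValueError("need one duration per input symbol")
    frames: List[Vector] = []
    for vec, d in zip(encodings, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so that repeated frames do not share one list object.
        frames.extend(list(vec) for _ in range(d))
    return frames


# Two input symbols with 2-dim encodings, lasting 2 and 3 mel frames.
enc = add_pitch([[1.0, 0.0], [0.0, 1.0]], pitch=[0.5, -0.5], pitch_weight=[1.0, 1.0])
mel_inputs = length_regulate(enc, durations=[2, 3])
print(len(mel_inputs))  # 5
```

Because the output length is fixed before decoding, all five frames can be generated at once. This is the main source of FastPitch's speed advantage over autoregressive acoustic models.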
emirati-vits-male-1.0
End-to-end VITS model for Emirati male voice synthesis. VITS combines acoustic modeling and vocoding in a single network trained end to end, which enables lower latency and more natural prosody than two-stage pipelines.
- Architecture: VITS (end-to-end)
- Voice: Male Emirati speaker
- Training: Custom Emirati dialect dataset
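Because VITS is trained end to end, it needs no external aligner. In the original VITS recipe, the model learns which spectrogram frames belong to which input symbol using monotonic alignment search (MAS), a dynamic program introduced with Glow-TTS. The per-token frame counts that MAS produces then serve as training targets for the duration predictor. The following plain-Python sketch of MAS assumes this project follows that recipe. The input `log_p[i][j]` is a hypothetical matrix giving the log-likelihood of frame `j` under text token `i`. The code illustrates the algorithm and is not taken from the project's training code.

```python
import math
from typing import List, Sequence


def monotonic_alignment_search(log_p: Sequence[Sequence[float]]) -> List[int]:
    """Return, for each mel frame j, the index of the text token aligned to it.

    The alignment never moves backwards and every token gets at least one frame.
    """
    n_text, n_frames = len(log_p), len(log_p[0])
    if n_text > n_frames:
        raise ValueError("need at least as many frames as text tokens")
    neg_inf = -math.inf
    # q[i][j]: best total log-likelihood of frames 0..j, with frame j on token i.
    q = [[neg_inf] * n_frames for _ in range(n_text)]
    q[0][0] = log_p[0][0]
    for j in range(1, n_frames):
        for i in range(min(j + 1, n_text)):
            stay = q[i][j - 1]                               # frame j-1 also on token i
            advance = q[i - 1][j - 1] if i > 0 else neg_inf  # frame j-1 on token i-1
            q[i][j] = log_p[i][j] + max(stay, advance)
    # Backtrack from the last token at the last frame.
    path = [0] * n_frames
    i = n_text - 1
    for j in range(n_frames - 1, -1, -1):
        path[j] = i
        # Move to the previous token if forced (i == j) or if that predecessor scores higher.
        if j > 0 and i > 0 and (i == j or q[i - 1][j - 1] > q[i][j - 1]):
            i -= 1
    return path


def durations_from_path(path: Sequence[int], n_text: int) -> List[int]:
    """Count the frames assigned to each text token."""
    return [path.count(i) for i in range(n_text)]


log_p = [[0.0, 0.0, -5.0, -5.0],
         [-5.0, -5.0, 0.0, 0.0]]
path = monotonic_alignment_search(log_p)
print(path)                          # [0, 0, 1, 1]
print(durations_from_path(path, 2))  # [2, 2]
```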
qwen3.5-TTS-Emirati
Latest-generation Emirati TTS model, built by fine-tuning the Qwen3.5 large language model for speech synthesis. Compared with the earlier models, it offers more natural intonation, better handling of dialectal features, and improved mixed Arabic/English code-switching.
- Base model: Qwen3.5 (fine-tuned for TTS)
- Language: Emirati Arabic with code-switching support
- 78+ downloads on Hugging Face
Text Normalization Pipeline
All Emirati TTS models rely on a production-grade TTS frontend pipeline that covers:
- Arabic numeral verbalization (cardinal, ordinal, currency, dates)
- Abbreviation and acronym expansion
- Grapheme-to-phoneme (G2P) conversion into the Arabic phoneme inventory
- Unicode normalization and script detection
- Mixed Arabic/English text handling
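Several of these stages can be sketched with the Python standard library. The following is a minimal, hypothetical illustration and not the project's production frontend. It applies Unicode NFKC normalization, strips tatweel (U+0640), maps Arabic-Indic digits to ASCII, verbalizes cardinals from 0 to 999, and tags each whitespace-separated token as Arabic or Latin script. The number words are Modern Standard Arabic masculine nominative forms, which is an assumption: Emirati dialect pronunciations differ. Abbreviation expansion and G2P are omitted, and all function names are hypothetical.

```python
import re
import unicodedata
from typing import List, Tuple

TATWEEL = "\u0640"  # Arabic kashida, used only for visual elongation

# Map Arabic-Indic (U+0660-U+0669) and Extended Arabic-Indic (U+06F0-U+06F9) digits to ASCII.
DIGIT_MAP = {0x0660 + d: str(d) for d in range(10)}
DIGIT_MAP.update({0x06F0 + d: str(d) for d in range(10)})

# Assumption: MSA masculine nominative forms, not Emirati dialect pronunciations.
ONES = ["صفر", "واحد", "اثنان", "ثلاثة", "أربعة", "خمسة", "ستة", "سبعة", "ثمانية", "تسعة"]
TENS = ["", "", "عشرون", "ثلاثون", "أربعون", "خمسون", "ستون", "سبعون", "ثمانون", "تسعون"]
HUNDREDS = ["", "مئة", "مئتان", "ثلاثمئة", "أربعمئة", "خمسمئة", "ستمئة", "سبعمئة", "ثمانمئة", "تسعمئة"]
SPECIAL = {10: "عشرة", 11: "أحد عشر", 12: "اثنا عشر"}


def verbalize_cardinal(n: int) -> str:
    """Spell out an integer in 0..999 as Arabic words."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 10:
        return ONES[n]
    if n in SPECIAL:
        return SPECIAL[n]
    if n < 20:
        return ONES[n - 10] + " عشر"  # 13..19
    if n < 100:
        tens, unit = divmod(n, 10)
        # Arabic reads the unit before the tens, joined by "و".
        return TENS[tens] if unit == 0 else ONES[unit] + " و" + TENS[tens]
    hundreds, rest = divmod(n, 100)
    if rest == 0:
        return HUNDREDS[hundreds]
    return HUNDREDS[hundreds] + " و" + verbalize_cardinal(rest)


def script_of(ch: str) -> str:
    """Return 'ar' for the Arabic block, 'en' for ASCII letters, '' otherwise."""
    if "\u0600" <= ch <= "\u06FF":
        return "ar"
    if ch.isascii() and ch.isalpha():
        return "en"
    return ""


def _verbalize_digits(match: "re.Match[str]") -> str:
    value = int(match.group())
    # Numbers above 999 are left untouched by this sketch.
    return verbalize_cardinal(value) if value <= 999 else match.group()


def normalize_and_tag(text: str) -> List[Tuple[str, str]]:
    """Normalize text, verbalize numbers, and tag each token's script."""
    text = unicodedata.normalize("NFKC", text).replace(TATWEEL, "")
    text = text.translate(DIGIT_MAP)
    text = re.sub(r"[0-9]+", _verbalize_digits, text)
    tagged = []
    for token in text.split():
        scripts = {script_of(ch) for ch in token} - {""}
        tag = scripts.pop() if len(scripts) == 1 else ("mixed" if scripts else "other")
        tagged.append((token, tag))
    return tagged


print(normalize_and_tag("مرحبا hello ٣"))
```

Because digits are verbalized before script tagging, numbers in this sketch always come out in Arabic. A bilingual frontend would more plausibly choose the verbalization language from the surrounding tokens.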