Emirati Arabic TTS

Overview

The Emirati Arabic TTS project builds high-quality speech synthesis for the Emirati Arabic dialect, one of the most underrepresented Arabic varieties in speech AI. It spans multiple model generations and architectures, from classical acoustic-model-plus-vocoder pipelines to modern end-to-end and LLM-based TTS systems.

Models

emirati-fastpitch-bilingual-v1.0

Bilingual (Arabic + English) FastPitch acoustic model trained on Emirati dialect speech data. It uses the NVIDIA NeMo framework with custom text normalization for Arabic numerals, abbreviations, and mixed-language input, and is paired with the emirati-hifigan-bilingual-v1.0 vocoder for waveform generation.

Framework: NVIDIA NeMo
Architecture: FastPitch (non-autoregressive transformer)
Language: Emirati Arabic / English bilingual
Features: Extended TTS frontend, G2P, number verbalization
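
As a rough illustration of the preprocessing that number verbalization depends on, the sketch below folds Arabic-Indic digits and separators to ASCII and then locates number tokens. It is not the project's frontend code: the names `fold_digits` and `find_numbers` are hypothetical, and turning the parsed value into spoken Emirati Arabic words is left out of scope.

```python
import re

# Map Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic (U+06F0..U+06F9)
# digits to ASCII. Built from code points to avoid bidirectional copy/paste errors.
_FOLD = {}
for i in range(10):
    _FOLD[0x0660 + i] = str(i)
    _FOLD[0x06F0 + i] = str(i)
_FOLD[0x066B] = "."  # ARABIC DECIMAL SEPARATOR
_FOLD[0x066C] = ","  # ARABIC THOUSANDS SEPARATOR

# Integer part with optional thousands groups, then an optional fraction.
_NUMBER = re.compile(r"[0-9]+(?:,[0-9]{3})*(?:\.[0-9]+)?")


def fold_digits(text: str) -> str:
    """Rewrite Arabic-script digits and separators as ASCII equivalents."""
    return text.translate(_FOLD)


def find_numbers(text: str) -> list[tuple[str, float]]:
    """Return (matched_text, value) for each number found after folding."""
    folded = fold_digits(text)
    return [
        (m.group(), float(m.group().replace(",", "")))
        for m in _NUMBER.finditer(folded)
    ]
```

Folding first means a downstream verbalizer only has to handle one digit set, regardless of whether the input was typed with Arabic-Indic or ASCII digits.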

emirati-vits-male-1.0

End-to-end VITS model for Emirati male voice synthesis. VITS combines acoustic modeling and vocoding in a single network, which can reduce latency and produce more natural prosody than two-stage pipelines.

Architecture: VITS (end-to-end)
Voice: Male Emirati speaker
Training: Custom Emirati dialect dataset

qwen3.5-TTS-Emirati

Latest-generation Emirati TTS model based on the Qwen3.5 large language model architecture, fine-tuned for speech synthesis. It targets more natural intonation, better handling of dialectal features, and improved mixed Arabic/English codeswitching.

Base model: Qwen3.5 (fine-tuned for TTS)
Language: Emirati Arabic with codeswitching support
Downloads: 78+ on Hugging Face

Text Normalization Pipeline

All Emirati TTS models are backed by a production-grade TTS frontend pipeline covering:

Arabic numeral verbalization (cardinal, ordinal, currency, dates)
Abbreviation and acronym expansion
Grapheme-to-phoneme (G2P) conversion for the Arabic phoneme inventory
Unicode normalization and script detection
Mixed Arabic/English text handling
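
The Unicode normalization, script detection, and mixed-text steps listed above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's pipeline: the function names are hypothetical, NFKC plus tatweel removal is only one plausible normalization choice, and scripts are detected by Unicode character name, with digits, spaces, and punctuation attached to whichever run they follow.

```python
import unicodedata


def normalize(text: str) -> str:
    """Apply NFKC (folds Arabic presentation forms) and drop tatweel (U+0640)."""
    return unicodedata.normalize("NFKC", text).replace("\u0640", "")


def script_of(ch: str):
    """Return 'ar' or 'en' for letters, or None for script-neutral characters."""
    if ch.isalpha():
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            return "ar"
        if name.startswith("LATIN"):
            return "en"
    return None


def segment_by_script(text: str) -> list[tuple[str, str]]:
    """Split text into (script, run) pairs so each run can be routed separately."""
    segments = []  # each entry is [script, accumulated_text]
    for ch in normalize(text):
        script = script_of(ch)
        if segments and (script is None or script == segments[-1][0]):
            segments[-1][1] += ch  # neutral or same-script: extend current run
        elif segments and segments[-1][0] is None:
            segments[-1][0] = script  # leading neutral run adopts first script
            segments[-1][1] += ch
        else:
            segments.append([script, ch])  # script changed: start a new run
    return [(s or "other", t.strip()) for s, t in segments if t.strip()]


if __name__ == "__main__":
    for script, run in segment_by_script("مرحبا, welcome to Dubai"):
        print(script, run)
```

Each run could then be passed to a language-specific normalizer and G2P module. A real frontend would likely need finer rules, for example for Latin-script brand names that should be read with Arabic phonology.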
