Fork of NVIDIA NeMo that introduces the EmiratiG2P module, a dialect-specific grapheme-to-phoneme (G2P) converter for Emirati Arabic that produces accurate IPA transcriptions for high-quality text-to-speech (TTS).
EmiratiG2P module
Roughly 1,127 lines of phonological transformation rules that extend NeMo's IpaG2p base class and cover features specific to Gulf Arabic phonology:
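- Qaf fronting: dialectal realization of ق (/q/ in Modern Standard Arabic) as /g/ or /dʒ/
- Diphthong monophthongization: the Gulf Arabic shift of /aj/ and /aw/ to /eː/ and /oː/ (e.g. بيت /beːt/)
- Sun letter assimilation: assimilation of the /l/ in the definite article ال to a following sun (coronal) consonant (e.g. الشمس is pronounced ash-shams, not al-shams)
- Code-switching: handling of mixed Arabic/English text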
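The rules above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation and does not use NeMo's IpaG2p API. It assumes each Arabic word has already been converted to a list of MSA-style IPA tokens, with the definite article written as a word-initial "a", "l" pair, and applies simplified versions of the Gulf rules in sequence. The token inventories, function names, and the affricate_before_front_vowel switch (a stand-in for the /g/ versus /dʒ/ choice) are hypothetical. For code-switching, the sketch groups words into Arabic-script and non-Arabic runs and treats every non-Arabic word as English.

```python
# Sketch of simplified Gulf Arabic rewrite rules over IPA token lists.
# All inventories and names here are hypothetical, not EmiratiG2P's internals.

VOWELS = {"a", "i", "u", "aː", "iː", "uː", "eː", "oː"}
# Sun (coronal) consonants that absorb the /l/ of the definite article.
SUN_CONSONANTS = {"t", "θ", "d", "ð", "r", "z", "s", "ʃ",
                  "sˤ", "dˤ", "tˤ", "ðˤ", "l", "n"}
MONOPHTHONGS = {"j": "eː", "w": "oː"}  # /aj/ -> /eː/, /aw/ -> /oː/


def assimilate_article(tokens: list[str]) -> list[str]:
    """Assume the article is a word-initial 'a', 'l'; /a l C/ -> /a C C/ for sun C."""
    if tokens[:2] == ["a", "l"] and len(tokens) > 2 and tokens[2] in SUN_CONSONANTS:
        return ["a", tokens[2]] + tokens[2:]
    return tokens


def front_qaf(tokens: list[str], affricate_before_front_vowel: bool = False) -> list[str]:
    """Realize /q/ as /g/, or as /dʒ/ before /i/ or /iː/ when the switch is on."""
    out = []
    for i, tok in enumerate(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok != "q":
            out.append(tok)
        elif affricate_before_front_vowel and nxt in {"i", "iː"}:
            out.append("dʒ")
        else:
            out.append("g")
    return out


def monophthongize(tokens: list[str]) -> list[str]:
    """Collapse /aj/ and /aw/ unless the glide is an onset or part of a geminate."""
    out, i = [], 0
    while i < len(tokens):
        glide = tokens[i + 1] if i + 1 < len(tokens) else None
        after = tokens[i + 2] if i + 2 < len(tokens) else None
        if (tokens[i] == "a" and glide in MONOPHTHONGS
                and after not in VOWELS and after != glide):
            out.append(MONOPHTHONGS[glide])
            i += 2  # consume both the vowel and the glide
        else:
            out.append(tokens[i])
            i += 1
    return out


def apply_gulf_rules(tokens: list[str], affricate_before_front_vowel: bool = False) -> list[str]:
    """Apply article assimilation, then qaf fronting, then monophthongization."""
    tokens = assimilate_article(tokens)
    tokens = front_qaf(tokens, affricate_before_front_vowel)
    return monophthongize(tokens)


def split_by_script(text: str) -> list[tuple[str, str]]:
    """Group whitespace-separated words into Arabic ('ar') and non-Arabic ('en') runs."""
    segments: list[tuple[str, str]] = []
    for word in text.split():
        # Only the basic Arabic block U+0600..U+06FF is checked in this sketch.
        lang = "ar" if any("\u0600" <= ch <= "\u06ff" for ch in word) else "en"
        if segments and segments[-1][0] == lang:
            segments[-1] = (lang, segments[-1][1] + " " + word)
        else:
            segments.append((lang, word))
    return segments
```

With these rules, قلب /qalb/ becomes /galb/, بيت /bajt/ becomes /beːt/, and الشمس /alʃams/ becomes /aʃʃams/. أيام /ʔajjaːm/ is left unchanged because its /jj/ is a geminate consonant rather than the offglide of a diphthong.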
Configuration
Pipeline behavior is configured in YAML, with more than 20 tunable parameters that control the phonological rules. The module also ships with 19 unit tests covering dialect edge cases.
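The following sketch shows one way to validate a subset of these parameters after the YAML has been parsed into a dict, for example with PyYAML's yaml.safe_load. The class name, parameter names, and defaults are hypothetical and are not the module's actual schema. Rejecting unknown keys is one way to catch misspelled rule names before synthesis.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EmiratiG2PRules:
    """Hypothetical subset of rule toggles; the real config has 20+ parameters."""
    qaf_realization: str = "g"             # "g" or "dʒ"
    monophthongize_diphthongs: bool = True
    assimilate_sun_letters: bool = True
    handle_code_switching: bool = True

    def __post_init__(self) -> None:
        # Reject values the rule set cannot realize.
        if self.qaf_realization not in {"g", "dʒ"}:
            raise ValueError(f"unsupported qaf_realization: {self.qaf_realization!r}")

    @classmethod
    def from_dict(cls, cfg: dict) -> "EmiratiG2PRules":
        """Build from a parsed YAML mapping, failing loudly on unknown keys."""
        known = {f.name for f in fields(cls)}
        unknown = set(cfg) - known
        if unknown:
            raise KeyError(f"unknown rule parameters: {sorted(unknown)}")
        return cls(**cfg)


if __name__ == "__main__":
    # A dict as it might come out of yaml.safe_load on the rules section.
    rules = EmiratiG2PRules.from_dict({"qaf_realization": "dʒ",
                                       "monophthongize_diphthongs": False})
    print(rules)
```

Because the dataclass is frozen, a loaded rule set cannot be mutated partway through a synthesis run.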
Part of the Emirati TTS pipeline
Used with emirati-fastpitch-bilingual-v1.0, emirati-hifigan-bilingual-v1.0, and emirati-vits-male-1.0 for end-to-end Emirati Arabic speech synthesis.
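The way the G2P output feeds these models can be sketched as two routes. In the first, a FastPitch-style model predicts a mel spectrogram that a HiFi-GAN-style vocoder turns into a waveform. In the second, a VITS-style model maps phonemes to audio in a single step. The type aliases and toy callables below are hypothetical placeholders for the trained checkpoints (in NeMo these roles roughly correspond to FastPitchModel, HifiGanModel, and VitsModel). They only illustrate that both routes share the same G2P front end.

```python
from typing import Callable, List, Sequence

# Hypothetical stage signatures used only for this sketch.
G2P = Callable[[str], List[str]]                              # text -> IPA tokens
AcousticModel = Callable[[Sequence[str]], List[List[float]]]  # tokens -> mel frames
Vocoder = Callable[[List[List[float]]], List[float]]          # mel frames -> waveform
EndToEndModel = Callable[[Sequence[str]], List[float]]        # tokens -> waveform


def synthesize_two_stage(text: str, g2p: G2P, acoustic: AcousticModel,
                         vocoder: Vocoder) -> List[float]:
    """FastPitch-style acoustic model followed by a HiFi-GAN-style vocoder."""
    phonemes = g2p(text)
    mel = acoustic(phonemes)
    return vocoder(mel)


def synthesize_single_stage(text: str, g2p: G2P, model: EndToEndModel) -> List[float]:
    """VITS-style model that maps phonemes directly to a waveform."""
    return model(g2p(text))


if __name__ == "__main__":
    # Toy stand-ins: one "mel frame" per phoneme, one sample per frame.
    toy_g2p: G2P = lambda text: list(text.replace(" ", ""))
    toy_acoustic: AcousticModel = lambda ph: [[1.0] for _ in ph]
    toy_vocoder: Vocoder = lambda mel: [frame[0] for frame in mel]
    toy_vits: EndToEndModel = lambda ph: [0.0] * len(ph)
    print(synthesize_two_stage("ab c", toy_g2p, toy_acoustic, toy_vocoder))
    print(synthesize_single_stage("ab c", toy_g2p, toy_vits))
```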
Python · PyTorch · IPA · Apache 2.0