Fork of Qwen3-TTS adding real-time streaming inference with full Arabic language support, achieving approximately 6× inference speedup over the baseline.
Key additions
- OpenAI-compatible /v1/audio/speech endpoint with Server-Sent Events (SSE) streaming
- Arabic language support via warm-start embedding initialization
- Language auto-detection through Unicode range scanning (Arabic vs. Latin script)
- Optimized for NVIDIA DGX Spark (ARM64 / Grace Blackwell)
- Docker support for ARM64 deployment
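For illustration, here is a minimal standard-library Python client sketch for the streaming endpoint. The request fields mirror OpenAI's speech API (`model`, `input`, `voice`, `stream_format`). The server address, model id, voice name, and event payload shape (JSON carrying a base64 `audio` field, with a `[DONE]` sentinel at the end) are hypothetical assumptions, not confirmed details of this fork, so check them against the server code before relying on them.

```python
import base64
import json
import urllib.request
from typing import Iterable, Iterator

# Hypothetical server address; point this at wherever the fork's server runs.
BASE_URL = "http://localhost:8000"


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event; a blank line ends an event."""
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if there is one.
            if data:
                yield "\n".join(data)
                data = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[5:]
            # Per the SSE spec, one leading space after the colon is dropped.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
    # An unterminated event at end of stream is discarded, as the spec requires.


def audio_chunks(payloads: Iterable[str]) -> Iterator[bytes]:
    """Decode audio bytes from event payloads (ASSUMED schema, see lead-in)."""
    for payload in payloads:
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        event = json.loads(payload)
        b64 = event.get("audio")  # assumed field name
        if b64:
            yield base64.b64decode(b64)


def stream_speech(text: str, out_path: str) -> None:
    """POST text to the speech endpoint and write streamed audio to a file."""
    body = json.dumps({
        "model": "qwen3-tts",    # hypothetical model id
        "input": text,
        "voice": "default",      # hypothetical voice name
        "stream_format": "sse",  # OpenAI's parameter name; support here is assumed
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        # Iterating the response yields raw lines as bytes.
        lines = (raw.decode("utf-8") for raw in resp)
        for chunk in audio_chunks(parse_sse(lines)):
            f.write(chunk)
```

Keeping the SSE parsing separate from the HTTP call lets it be checked offline against canned lines, and it keeps the same parser usable if the real payload schema turns out to differ.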
Arabic support approach
Arabic text is detected automatically by scanning Unicode code-point ranges. Rather than training Arabic embeddings from scratch, the model uses warm-start embedding initialization, copying weights from existing Arabic phoneme embeddings. This enables high-quality Gulf Arabic synthesis with significantly less training data.
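A minimal sketch of Unicode-range script detection, assuming a simple majority rule: text is labeled Arabic when at least half of its letters fall in the Arabic-script blocks. The threshold, the function names, and the "arabic"/"latin" labels are illustrative assumptions. The fork's actual rule may differ, for example by switching on the first Arabic character it finds.

```python
# Arabic-script blocks as defined by the Unicode standard.
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
)


def is_arabic(ch: str) -> bool:
    """True if the character's code point lies in an Arabic-script block."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ARABIC_RANGES)


def detect_script(text: str, threshold: float = 0.5) -> str:
    """Return "arabic" or "latin" (hypothetical labels) by majority of letters."""
    # Only letters count; digits, spaces, punctuation and diacritics are skipped.
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "latin"  # assumed default when the text has no letters
    arabic = sum(1 for ch in letters if is_arabic(ch))
    return "arabic" if arabic / len(letters) >= threshold else "latin"
```

Counting only letters keeps digits and punctuation, which are shared across scripts, from skewing the ratio in mixed-script input.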
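The warm-start idea can be sketched in PyTorch. The approach is to grow the token embedding table and then initialize each new row from existing rows instead of random values. Here a new row is set to the mean of one or more source rows, and a single source id reduces to a plain copy. The helper names and the `new_to_source` mapping are hypothetical. This sketch does not specify which source embeddings the fork copies from or how it maps them.

```python
import torch
import torch.nn as nn


def expand_embedding(old: nn.Embedding, num_new: int) -> nn.Embedding:
    """Return a larger embedding table whose first rows copy the old table."""
    new = nn.Embedding(
        old.num_embeddings + num_new,
        old.embedding_dim,
        device=old.weight.device,
        dtype=old.weight.dtype,
    )
    with torch.no_grad():
        new.weight[: old.num_embeddings] = old.weight
    return new


def warm_start_rows(embedding: nn.Embedding, new_to_source: dict[int, list[int]]) -> None:
    """Set each new token's row to the mean of its (hypothetical) source rows."""
    with torch.no_grad():
        for new_id, source_ids in new_to_source.items():
            src = torch.tensor(source_ids, dtype=torch.long, device=embedding.weight.device)
            embedding.weight[new_id] = embedding.weight[src].mean(dim=0)
```

Usage: `emb = expand_embedding(old, 2)` followed by `warm_start_rows(emb, {4: [1], 5: [0, 2]})` copies row 1 into new row 4 and averages rows 0 and 2 into new row 5. New rows that start near embeddings the model already uses give fine-tuning a sensible starting point, which is the usual rationale for needing less data than random initialization.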
Related models on Hugging Face
- qwen3.5-TTS-Emirati: Emirati Arabic fine-tune
- qwen3-TTS-KSA: Saudi Arabic fine-tune
Python · PyTorch · Docker · Apache 2.0