Featuring ultra-realistic 48 kHz zero-shot voice cloning with rich, lifelike emotional expressiveness
16GB RAM recommended. 18GB+ storage recommended.
macOS 15+: M-series chips required.
Windows 10/11 64-bit: NVIDIA GPU with 8GB+ VRAM required.
Note: For NVIDIA GPUs, install an up-to-date driver.
Only the dots.tts-mf model is downloaded by default, which meets the needs of most users. If you select the dots.tts-soar model, the program downloads it automatically, which takes up approximately 10 GB of additional disk space.
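The storage figures above can be checked before installing. The following is a minimal sketch, not part of dots.tts itself: the function names are hypothetical, and it assumes that the roughly 10 GB for dots.tts-soar comes on top of the 18 GB recommendation and that "GB" means decimal gigabytes, neither of which the page states explicitly.

```python
import shutil

GB = 10**9  # assumption: decimal gigabytes; the page does not say GB vs GiB

BASE_STORAGE_GB = 18  # "18GB+ storage recommended"
SOAR_EXTRA_GB = 10    # "approximately 10 GB" extra for dots.tts-soar


def required_storage_gb(include_soar: bool = False) -> int:
    """Recommended free space in GB (hypothetical helper, additive assumption)."""
    return BASE_STORAGE_GB + (SOAR_EXTRA_GB if include_soar else 0)


def has_enough_space(path: str = ".", include_soar: bool = False) -> bool:
    """Compare free space on the drive holding `path` with the recommendation."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_storage_gb(include_soar) * GB


if __name__ == "__main__":
    for include_soar in (False, True):
        label = "dots.tts-mf + dots.tts-soar" if include_soar else "dots.tts-mf only"
        verdict = "OK" if has_enough_space(".", include_soar) else "not enough space"
        print(f"{label}: need {required_storage_gb(include_soar)} GB -> {verdict}")
```

Pass the intended install directory as `path` so that the check runs against the correct drive.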
dots.tts is a cutting-edge, open-source text-to-speech (TTS) system developed by the RedNote (Xiaohongshu) AI Team (HI-Lab). This project represents a state-of-the-art advancement in the open-source community, designed to deliver ultra-high-fidelity, highly expressive, and multilingual voice cloning.
For general users and developers, the project's most notable capabilities can be summarized through the following pros and cons:
dots.tts natively outputs ultra-clear 48 kHz high-fidelity audio, preserving rich vocal details.
The architecture of dots.tts completely discards the traditional "discrete token (quantization)" approach used by many mainstream TTS systems (like VITS or early autoregressive models).
Instead, it uses a fully continuous, end-to-end autoregressive structure. The backbone combines a semantic encoder, a Large Language Model (LLM), and an autoregressive flow-matching acoustic head over a 48 kHz AudioVAE (Audio Variational Autoencoder). Because there are no discrete tokens anywhere in the pipeline, the system avoids quantization loss and produces remarkably smooth intonation.
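To make the point about discrete tokens concrete, here is a toy illustration of quantization error. It is not the dots.tts architecture, and it is not how real codec tokenizers work (those typically use learned vector quantization). It only shows, under a simple assumption of uniform scalar quantization, that snapping a continuous value to a finite set of levels introduces an error that a continuous representation does not have. All names in it are hypothetical.

```python
import math

SAMPLE_RATE = 48_000           # Hz, the output rate stated above
NYQUIST_HZ = SAMPLE_RATE / 2   # highest representable frequency: 24 kHz


def sine(freq_hz: float, n_samples: int) -> list[float]:
    """Generate n_samples of a unit-amplitude sine wave at SAMPLE_RATE."""
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n_samples)]


def quantize(x: float, levels: int) -> float:
    """Snap x in [-1, 1] to the nearest of `levels` evenly spaced values."""
    step = 2.0 / (levels - 1)
    return round((x + 1.0) / step) * step - 1.0


def rms_error(signal: list[float], levels: int) -> float:
    """Root-mean-square difference between the signal and its quantized copy."""
    squared = [(s - quantize(s, levels)) ** 2 for s in signal]
    return math.sqrt(sum(squared) / len(squared))


signal = sine(440.0, 480)  # 10 ms of a 440 Hz tone

if __name__ == "__main__":
    print(f"Nyquist frequency: {NYQUIST_HZ:.0f} Hz")
    for levels in (16, 256, 4096):
        print(f"{levels:>5} levels -> RMS error {rms_error(signal, levels):.6f}")
```

The error shrinks as the number of levels grows but never reaches zero for a finite codebook, which is the loss a continuous latent avoids. The continuous AudioVAE is still a learned compression of the waveform, so this sketch says nothing about its own reconstruction quality.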