There were several issues, mainly a misleading explanation and the way the files are downloaded: if the download crashed, it did not resume. You also did not mention installing CUDA/PyTorch at all.
Spark-TTS Installation (Windows Guide)
1. Install Conda (if you haven’t already)
Download Miniconda and install it.
Make sure to check "Add Conda to PATH" during installation.
Download Spark-TTS
You have two options to get the files:
Option 1 (Recommended for Windows): Download ZIP manually
Go to the Spark-TTS GitHub page (https://github.com/SparkAudio/Spark-TTS)
Click "Code" > "Download ZIP", then extract it.
Option 2: Use Git (Optional)
If you prefer using Git, install Git and run:
git clone https://github.com/SparkAudio/Spark-TTS.git
2. Create a Conda Environment
Open Command Prompt (cmd) and run:
conda create -n sparktts python=3.12 -y
conda activate sparktts
This creates and activates a Python 3.12 environment for Spark-TTS.
3. Install Dependencies
Inside the Spark-TTS folder (whether from ZIP or Git), run:
pip install -r requirements.txt
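If you want to confirm that the install actually finished, a small check like the one below can help. It is only a sketch: it assumes requirements.txt uses ordinary "name==version" style lines, and the helper names parse_requirement_names and missing_packages are made up for this example, not part of Spark-TTS.

```python
import re
from importlib import metadata
from pathlib import Path


def parse_requirement_names(text):
    """Return the package names listed in requirements-style text."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumes you run this from inside the Spark-TTS folder.
    req_file = Path("requirements.txt")
    if req_file.exists():
        missing = missing_packages(parse_requirement_names(req_file.read_text()))
        print("Missing:", ", ".join(missing) if missing else "none")
    else:
        print("requirements.txt not found in the current folder.")
```

Run it with the sparktts environment activated; anything listed as missing can be reinstalled with pip.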
4. Install PyTorch (CUDA or CPU)
pip does not detect your GPU, so pick the wheel index that matches your hardware and run one of these:
# NVIDIA GPU, CUDA 12.1
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# NVIDIA GPU, CUDA 11.8 (older GPUs)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# No NVIDIA GPU (CPU only)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
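After installing, it is worth checking whether PyTorch can actually see your GPU. The snippet below is a minimal sketch using standard PyTorch calls; the function name describe_torch is just a label chosen for this example.

```python
import importlib.util


def describe_torch():
    """Return a one-line summary of the PyTorch install, if any."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch  # imported lazily so the check also works without PyTorch

    if torch.cuda.is_available():
        return (f"PyTorch {torch.__version__}, CUDA {torch.version.cuda}, "
                f"GPU: {torch.cuda.get_device_name(0)}")
    return f"PyTorch {torch.__version__}, CPU only (no usable CUDA device found)."


if __name__ == "__main__":
    print(describe_torch())
```

If it reports "CPU only" on a machine with an NVIDIA card, the usual causes are a CPU-only wheel or an outdated GPU driver.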
5. Download the Model
There are two ways to get the model files. Pick one:
Option 1 (Recommended): Using Python
Create a new file in the Spark-TTS folder called download_model.py, paste this inside, and run it:
from huggingface_hub import snapshot_download

# Set download path
model_dir = "pretrained_models/Spark-TTS-0.5B"

# Always call snapshot_download instead of skipping when the folder is non-empty:
# a crashed download leaves a non-empty folder behind. Recent versions of
# huggingface_hub skip files that are already complete and resume partial ones.
print("Downloading model files (files already downloaded are skipped)...")
snapshot_download(
    repo_id="SparkAudio/Spark-TTS-0.5B",
    local_dir=model_dir,
)
print("Download complete!")
Run it with:
python download_model.py
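If your connection drops often, you can wrap the download in a simple retry loop so it restarts by itself. This is a sketch, not part of Spark-TTS: retry and download_model are hypothetical helper names, and it assumes a recent huggingface_hub, where snapshot_download continues from the files already on disk.

```python
import time


def retry(action, attempts=5, delay_seconds=10.0):
    """Call action() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:  # broad on purpose: network errors vary
            if attempt == attempts:
                raise  # out of attempts, surface the last error
            print(f"Attempt {attempt} failed ({error}); retrying in {delay_seconds}s...")
            time.sleep(delay_seconds)


def download_model(model_dir="pretrained_models/Spark-TTS-0.5B"):
    """Download (or finish downloading) the model into model_dir."""
    from huggingface_hub import snapshot_download  # lazy import

    return snapshot_download(repo_id="SparkAudio/Spark-TTS-0.5B", local_dir=model_dir)


# Usage (run from the Spark-TTS folder):
#     retry(download_model)
```

Each retry calls snapshot_download again, so only the files that are still missing or incomplete should be fetched.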
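Option 2: Using Git (If You Installed It)
The model weights are stored with Git LFS, so enable it before cloning; otherwise you only get small pointer files instead of the real weights:
mkdir pretrained_models
git lfs install
git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B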
Either method works; choose whichever is easier for you.
6. Run Spark-TTS
Web UI (Recommended)
For an interactive browser-based interface, run:
python webui.py
This launches a local web server where you can enter text and generate speech or clone a voice.
7. Troubleshooting & Common Questions
🔎 Before Asking for Help
Many common issues are already covered in existing discussions, documentation, or online resources. Please:
Search GitHub issues first 🕵️‍♂️
Check the documentation 📖
Google or use AI tools (ChatGPT, DeepSeek, etc.)
If you still need help, please explain what you’ve already tried so we can assist you better!
Now you’re good to go! 🚀🔥
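To run inference from the command line instead of the web UI, run this from the Spark-TTS folder. In Command Prompt, lines are continued with ^ rather than \:
python -m cli.inference ^
  --text "text to synthesize" ^
  --device 0 ^
  --save_dir "path/to/save/audio" ^
  --model_dir pretrained_models/Spark-TTS-0.5B ^
  --prompt_text "transcript of the prompt audio" ^
  --prompt_speech_path "path/to/prompt_audio"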
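If the line continuations give you trouble in your shell, the same cli.inference call can be launched from Python, which sidesteps shell quoting entirely. This is a rough sketch: build_inference_command and run_inference are names invented for this example, and only the flags shown in the command above are used.

```python
import subprocess
import sys


def build_inference_command(text, save_dir, prompt_text, prompt_speech_path,
                            model_dir="pretrained_models/Spark-TTS-0.5B", device=0):
    """Return the cli.inference call as an argument list (no shell quoting needed)."""
    return [
        sys.executable, "-m", "cli.inference",
        "--text", text,
        "--device", str(device),
        "--save_dir", save_dir,
        "--model_dir", model_dir,
        "--prompt_text", prompt_text,
        "--prompt_speech_path", prompt_speech_path,
    ]


def run_inference(**kwargs):
    """Run the command from the Spark-TTS folder; raises if it exits with an error."""
    subprocess.run(build_inference_command(**kwargs), check=True)
```

Call run_inference(text=..., save_dir=..., prompt_text=..., prompt_speech_path=...) from inside the Spark-TTS folder with the sparktts environment active, since sys.executable points at whichever Python is running the script.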