Installation of Spark-TTS for Windows #5

2025-03-13

There were several issues with the original instructions, mainly misleading explanations and the way the model files are downloaded: if the download crashed, it did not resume. The instructions also did not mention installing CUDA/PyTorch at all.

Spark-TTS Installation (Windows Guide)

1. Install Conda (if you haven’t already)

  • Download Miniconda and install it.

  • Make sure to check "Add Conda to PATH" during installation.

Download Spark-TTS

You have two options to get the files:

Option 1 (Recommended for Windows): Download ZIP manually

Option 2: Use Git (Optional)

  • If you prefer using Git, install Git and run:

    git clone https://github.com/SparkAudio/Spark-TTS.git

2. Create a Conda Environment

Open Command Prompt (cmd) and run:

conda create -n sparktts python=3.12 -y
conda activate sparktts

This creates and activates a Python 3.12 environment for Spark-TTS.
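
Before installing anything into the environment, it can help to confirm that the right interpreter is active. The following is a minimal sketch, not part of Spark-TTS: the function name check_environment is hypothetical, and it assumes conda exposes the active environment name through the CONDA_DEFAULT_ENV variable, which it normally sets on activation.

```python
import os
import sys


def check_environment(version_info=sys.version_info, environ=os.environ,
                      expected_env="sparktts"):
    """Return a list of problems; an empty list means the setup looks right.

    check_environment is a hypothetical helper for this guide, not a Spark-TTS API.
    """
    problems = []
    # The guide creates the environment with python=3.12.
    if tuple(version_info[:2]) != (3, 12):
        problems.append(
            f"Python {version_info[0]}.{version_info[1]} is active, expected 3.12"
        )
    # Assumption: conda sets CONDA_DEFAULT_ENV to the activated environment's name.
    active = environ.get("CONDA_DEFAULT_ENV")
    if active != expected_env:
        problems.append(
            f"Active conda environment is {active!r}, expected {expected_env!r}"
        )
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("Environment looks good." if not issues else "\n".join(issues))
```

If it reports a different environment, run conda activate sparktts again in the same Command Prompt window and retry.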


3. Install Dependencies

Inside the Spark-TTS folder (whether from ZIP or Git), run:

pip install -r requirements.txt

4. Install PyTorch (CUDA or CPU)

pip does not auto-detect your GPU, so choose the wheel index that matches your hardware:

# CUDA 12.1 build (recent NVIDIA GPUs and drivers)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# OR CUDA 11.8 build (older GPUs and drivers)
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# OR CPU-only build (no NVIDIA GPU)
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
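
After the install finishes, a quick check shows whether PyTorch can actually see the GPU. This is a small sketch using standard torch calls (torch.cuda.is_available and torch.cuda.get_device_name); the helper name describe_torch is made up for illustration.

```python
import importlib.util


def describe_torch():
    """Summarize the installed PyTorch build without crashing if it is missing.

    describe_torch is a hypothetical helper, not part of PyTorch or Spark-TTS.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False, "version": None,
                "cuda_available": False, "gpu": None}

    import torch  # imported lazily so the check also works before installation

    cuda_ok = torch.cuda.is_available()
    return {
        "installed": True,
        "version": str(torch.__version__),
        "cuda_available": cuda_ok,
        # Only query the device name when a CUDA device is usable.
        "gpu": torch.cuda.get_device_name(0) if cuda_ok else None,
    }


if __name__ == "__main__":
    print(describe_torch())
```

If cuda_available is False on a machine with an NVIDIA GPU, the CPU-only build was probably installed, or the driver is too old for the chosen CUDA build.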

5. Download the Model

There are two ways to get the model files. Pick one:

Option 1 (Recommended): Using Python
Create a new file in the Spark-TTS folder called download_model.py, paste this inside, and run it:

from huggingface_hub import snapshot_download

# Set download path
model_dir = "pretrained_models/Spark-TTS-0.5B"

# Always call snapshot_download: files that are already complete are skipped,
# so re-running this script after a crash only fetches what is missing.
# (Skipping whenever the folder is non-empty would also skip a partial download.)
# Recent huggingface_hub versions resume interrupted files automatically;
# the old resume_download=True argument is deprecated.
print("Downloading model files...")
snapshot_download(
    repo_id="SparkAudio/Spark-TTS-0.5B",
    local_dir=model_dir,
)
print("Download complete!")

Run it with:

python download_model.py

Option 2: Using Git (If You Installed It)

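This requires Git LFS (https://git-lfs.com); without it you only get small pointer files instead of the model weights.

git lfs install
mkdir pretrained_models
git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B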

Either method works—choose whichever is easier for you.
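
Whichever method you used, a quick sanity check on the download can save a confusing error later. The sketch below counts the files in the model folder and flags any that are still Git LFS pointer stubs (small text files that begin with the LFS spec line). The function name summarize_model_dir is hypothetical, and the script deliberately does not assume any particular file names inside the model repository.

```python
from pathlib import Path

# Git LFS pointer files start with this line instead of real binary data.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def summarize_model_dir(model_dir="pretrained_models/Spark-TTS-0.5B"):
    """Count files and bytes under model_dir and list leftover LFS pointer stubs.

    summarize_model_dir is a hypothetical helper for this guide.
    """
    root = Path(model_dir)
    summary = {"exists": root.is_dir(), "files": 0,
               "total_bytes": 0, "lfs_pointers": []}
    if not summary["exists"]:
        return summary

    for path in root.rglob("*"):
        rel = path.relative_to(root)
        # Skip Git and Hugging Face bookkeeping folders inside the model dir.
        if not path.is_file() or ".git" in rel.parts or ".cache" in rel.parts:
            continue
        summary["files"] += 1
        summary["total_bytes"] += path.stat().st_size
        with path.open("rb") as fh:
            if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                summary["lfs_pointers"].append(rel.as_posix())
    return summary


if __name__ == "__main__":
    info = summarize_model_dir()
    print(info)
    if info["lfs_pointers"]:
        print("These files are LFS pointers, not real weights:", info["lfs_pointers"])
```

An empty folder, or any entries under lfs_pointers, means the download is incomplete: re-run download_model.py, or run git lfs pull inside the cloned model folder.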


6. Run Spark-TTS

Web UI (Recommended)

For an interactive browser-based interface, run:

python webui.py

This launches a local web server where you can enter text and generate speech or clone a voice.


7. Troubleshooting & Common Questions

🔎 Before Asking for Help
Many common issues are already covered in existing discussions, documentation, or online resources. Please:

  • Search GitHub issues first 🕵️‍♂️

  • Check the documentation 📖

  • Google or use AI tools (ChatGPT, DeepSeek, etc.)

If you still need help, please explain what you’ve already tried so we can assist you better!


Now you’re good to go! 🚀🔥

To run inference from the command line instead (Command Prompt uses ^ for line continuation, not \):

python -m cli.inference ^
    --text "text to synthesize." ^
    --device 0 ^
    --save_dir "path/to/save/audio" ^
    --model_dir pretrained_models/Spark-TTS-0.5B ^
    --prompt_text "transcript of the prompt audio" ^
    --prompt_speech_path "path/to/prompt_audio"
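
If you would rather avoid shell quoting and line-continuation differences altogether, the same command can be launched from Python. This is only a sketch: build_inference_command and run_inference are hypothetical helper names, the flags are exactly the ones shown in the command above, and the paths are placeholders you need to replace.

```python
import subprocess
import sys


def build_inference_command(text, prompt_text, prompt_speech_path, save_dir,
                            model_dir="pretrained_models/Spark-TTS-0.5B",
                            device=0):
    """Build the argument list for `python -m cli.inference`.

    Hypothetical helper; it only mirrors the flags used in this guide.
    """
    return [
        sys.executable, "-m", "cli.inference",
        "--text", text,
        "--device", str(device),
        "--save_dir", save_dir,
        "--model_dir", model_dir,
        "--prompt_text", prompt_text,
        "--prompt_speech_path", prompt_speech_path,
    ]


def run_inference(**kwargs):
    """Run the command from inside the Spark-TTS folder; raises if it fails."""
    subprocess.run(build_inference_command(**kwargs), check=True)


if __name__ == "__main__":
    # Placeholder values: replace them before calling run_inference().
    cmd = build_inference_command(
        text="text to synthesize.",
        prompt_text="transcript of the prompt audio",
        prompt_speech_path="path/to/prompt_audio",
        save_dir="path/to/save/audio",
    )
    print(cmd)
```

Passing the arguments as a list means no quoting is needed, so the same script works in Command Prompt, PowerShell, and other shells.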