Spencer 9917d44f5d docs: add HuggingFace cache troubleshooting to README

- Document HF_HOME environment variable for writable cache
- Add systemd service permission guidance for /tmp paths
- Troubleshooting steps for read-only file system errors

2026-02-26 15:56:09 -06:00

4.9 KiB

Executable File

Vox - Discord Text-to-Speech Bot

Vox is a Python Discord bot that reads text-channel messages aloud in a voice channel, using neural text-to-speech with voices cloned from reference WAV files.

Project Structure

Vox/
├── bot.py                 # Main entry point, Discord bot implementation
├── config.py              # Configuration management using environment variables
├── voice_manager.py       # Voice discovery, loading, and user preferences
├── audio_effects.py       # Audio post-processing effects (7 effects)
├── audio_preprocessor.py  # Audio preprocessing for voice cloning
├── numba_config.py        # Numba JIT compiler cache configuration
├── requirements.txt       # Python dependencies
├── launch.sh              # Shell script to start the bot
├── pockettts.service      # Systemd service file for Linux deployment
├── README.md              # Comprehensive documentation
├── .env                   # Production environment configuration
├── .env.testing           # Testing environment configuration
├── .env.example           # Environment configuration template
└── voices/                # Directory for voice WAV files
    ├── preferences.json   # User voice/effect preferences (auto-generated)
    └── *.wav              # Voice reference files

Core Functionality

TTS Implementation

Engine: Pocket TTS (pocket-tts library) for neural text-to-speech synthesis
Voice Cloning: Uses reference WAV files to clone voices via model.get_state_for_audio_prompt()
On-demand Loading: Voices are loaded only when first needed, then cached
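
The on-demand loading described above can be sketched as a small cache that calls a loader the first time a voice is requested. This is only an illustration: `LazyVoiceCache` and its `loader` parameter are hypothetical names, and the sketch assumes the loader (for example `model.get_state_for_audio_prompt`) accepts a path to a WAV file.

```python
from pathlib import Path
from typing import Any, Callable, Dict


class LazyVoiceCache:
    """Load a voice state the first time it is requested, then reuse it."""

    def __init__(self, voices_dir: Path, loader: Callable[[str], Any]) -> None:
        self._voices_dir = Path(voices_dir)
        # Assumption: loader could be model.get_state_for_audio_prompt.
        self._loader = loader
        self._states: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        """Return the cached state for a voice, loading it on first use."""
        if name not in self._states:
            wav_path = self._voices_dir / f"{name}.wav"
            self._states[name] = self._loader(str(wav_path))
        return self._states[name]
```

Keeping the loader as a parameter means startup stays fast however many voices sit in voices/, since only voices that are actually used pay the loading cost.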

Discord Integration

Monitors a configured text channel for messages
Joins the author's voice channel when they post a message in the monitored text channel
Streams the generated audio by piping in-memory WAV data to discord.FFmpegPCMAudio
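
As a rough sketch of the piped-WAV approach, the function below encodes samples as a WAV file held in memory. The name `to_wav_buffer` is hypothetical, and the sketch assumes mono float samples in the range -1 to 1; the bot's actual encoding may differ.

```python
import io
import wave

import numpy as np


def to_wav_buffer(samples: np.ndarray, sample_rate: int) -> io.BytesIO:
    """Encode mono float samples in [-1, 1] as 16-bit PCM WAV in memory."""
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    buf.seek(0)  # rewind so the consumer reads from the start
    return buf


# In the bot, the buffer could then be handed to discord.py along these lines:
#   source = discord.FFmpegPCMAudio(to_wav_buffer(audio, rate), pipe=True)
#   voice_client.play(source)
```

With `pipe=True`, discord.py feeds the buffer to FFmpeg over stdin, so no temporary file is written to disk.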

Audio Processing Pipeline

Text Message → Pocket TTS → Audio Effects → Normalize → FFmpeg → Discord VC
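
The pipeline does not say which kind of normalization is applied. A minimal sketch, assuming simple peak normalization, might look like this; `peak_normalize` and `target_peak` are hypothetical names.

```python
import numpy as np


def peak_normalize(audio: np.ndarray, target_peak: float = 0.95) -> np.ndarray:
    """Scale audio so its largest absolute sample equals target_peak."""
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        return audio.copy()  # silence or empty input: nothing to scale
    return audio * (target_peak / peak)
```

Normalizing after the effects stage keeps effects such as echo or chorus, which can add energy, from clipping when the audio is converted to PCM.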

Dependencies

| Library | Purpose |
| --- | --- |
| `discord.py[voice]>=2.3.0` | Discord bot API with voice support |
| `pocket-tts>=0.1.0` | Neural TTS engine with voice cloning |
| `scipy>=1.10.0` | Scientific computing (audio I/O) |
| `numpy>=1.24.0` | Numerical computing |
| `librosa>=0.10.0` | Audio analysis and effects |
| `noisereduce>=3.0.0` | Noise reduction preprocessing |
| `soundfile>=0.12.0` | Audio file I/O |
| `python-dotenv>=1.0.0` | Environment variable loading |

System Requirements: Python 3.10+, FFmpeg

Key Modules

`TTSBot` (bot.py)

Main Discord bot class that extends commands.Bot. Handles:

Message processing and TTS queue
Voice channel connections
Slash command registration
Startup initialization (loads TTS model, discovers voices)
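
One common way to get sequential playback from a TTS queue is a single worker task reading from an `asyncio.Queue`. The sketch below is an assumption about the approach, not the bot's actual code; `playback_worker` and `speak` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def playback_worker(
    queue: "asyncio.Queue[str]",
    speak: Callable[[str], Awaitable[None]],
) -> None:
    """Take messages off the queue one at a time, in arrival order."""
    while True:
        text = await queue.get()
        try:
            # speak() is expected to synthesize and play the message,
            # returning only once playback has finished.
            await speak(text)
        finally:
            queue.task_done()
```

Because only one worker consumes the queue, a long message cannot be interrupted by the next one; new messages simply wait their turn.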

`VoiceManager` (voice_manager.py)

Manages voice files and user preferences:

Discovers voices from WAV files in voices/ directory
On-demand voice loading with caching
Per-user voice selection and effect preferences
Preferences persistence to JSON
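
Voice discovery and preference persistence can be sketched with the standard library alone. The function names and the shape of the preferences dictionary are assumptions made for illustration; the real preferences.json schema is not documented here.

```python
import json
from pathlib import Path
from typing import Dict


def discover_voices(voices_dir: Path) -> Dict[str, Path]:
    """Map each voice name (the file stem) to its WAV path."""
    return {p.stem: p for p in sorted(Path(voices_dir).glob("*.wav"))}


def save_preferences(path: Path, prefs: dict) -> None:
    """Write to a temporary file, then rename, to avoid a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(prefs, indent=2), encoding="utf-8")
    tmp.replace(path)


def load_preferences(path: Path) -> dict:
    """Return stored preferences, or an empty dict if none exist yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

Re-running `discover_voices` is all a command like /voice refresh would need in this sketch, since the directory is rescanned on every call.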

`AudioEffects` (audio_effects.py)

Provides 7 post-processing effects:

Pitch (-12 to +12 semitones)
Speed (0.5x to 2.0x)
Echo (0-100%)
Robot (0-100%) - Ring modulation
Chorus (0-100%) - Multiple voice layering
Tremolo Depth (0.0-1.0)
Tremolo Rate (0.0-10.0 Hz)
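
To make two of these effects concrete, here is a hedged NumPy sketch of tremolo and ring modulation. The function names, the 50 Hz carrier, and the mapping of the Robot percentage onto an `amount` between 0.0 and 1.0 are all assumptions; the module's real implementation may differ.

```python
import numpy as np


def tremolo(audio: np.ndarray, sample_rate: int, depth: float, rate_hz: float) -> np.ndarray:
    """Amplitude modulation: gain swings between 1 - depth and 1 at rate_hz."""
    t = np.arange(audio.shape[0]) / sample_rate
    gain = 1.0 - depth * 0.5 * (1.0 + np.sin(2.0 * np.pi * rate_hz * t))
    return audio * gain


def ring_modulate(
    audio: np.ndarray, sample_rate: int, amount: float, carrier_hz: float = 50.0
) -> np.ndarray:
    """Blend the dry signal with the signal multiplied by a sine carrier."""
    t = np.arange(audio.shape[0]) / sample_rate
    wet = audio * np.sin(2.0 * np.pi * carrier_hz * t)
    return (1.0 - amount) * audio + amount * wet
```

With a depth or amount of 0, both functions return the input unchanged, which matches the idea of an effect being switched off by default.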

`AudioPreprocessor` (audio_preprocessor.py)

Prepares voice reference files for cloning:

Load and resample to 22050 Hz
Normalize volume
Trim silence
Noise reduction
Limit length (default 15 seconds)
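
Three of these steps (normalize, trim, limit length) can be sketched in plain NumPy. Resampling and noise reduction are left out because they presumably rely on librosa and noisereduce. The function name and the `silence_threshold` parameter are hypothetical.

```python
import numpy as np


def preprocess_reference(
    audio: np.ndarray,
    sample_rate: int,
    max_seconds: float = 15.0,
    silence_threshold: float = 0.01,
) -> np.ndarray:
    """Normalize, trim leading/trailing silence, and cap the length of a mono clip."""
    audio = np.asarray(audio, dtype=np.float32)
    peak = np.max(np.abs(audio)) if audio.size else 0.0
    if peak > 0:
        audio = audio / peak  # normalize volume to full scale
    loud = np.flatnonzero(np.abs(audio) > silence_threshold)
    if loud.size:
        audio = audio[loud[0] : loud[-1] + 1]  # drop silence at both ends
    max_samples = int(max_seconds * sample_rate)
    return audio[:max_samples]  # keep at most max_seconds of audio
```

Trimming before limiting the length means the 15-second budget is spent on speech rather than on leading silence.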

`Config` (config.py)

Centralized configuration management with environment-aware loading and validation.
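
A minimal sketch of validated, environment-based configuration is shown below. It uses the variable names from the .env example in this README, but the dataclass layout, the `from_env` method, and the specific validation rules are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    discord_token: str
    text_channel_id: int
    voices_dir: str = "./voices"
    default_voice: Optional[str] = None

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "Config":
        """Build a Config from environment variables, failing fast on bad input."""
        token = env.get("DISCORD_TOKEN", "").strip()
        if not token:
            raise ValueError("DISCORD_TOKEN is required")
        raw_channel = env.get("TEXT_CHANNEL_ID", "").strip()
        if not raw_channel.isdigit():
            raise ValueError("TEXT_CHANNEL_ID must be a numeric channel ID")
        return cls(
            discord_token=token,
            text_channel_id=int(raw_channel),
            voices_dir=env.get("VOICES_DIR", "./voices"),
            default_voice=env.get("DEFAULT_VOICE") or None,
        )
```

Validating at startup surfaces a missing token or malformed channel ID immediately, rather than when the first message arrives.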

Slash Commands

| Command | Description |
| --- | --- |
| `/voice list` | Show available voices |
| `/voice set <name>` | Select your voice |
| `/voice current` | Show current voice |
| `/voice refresh` | Rescan for new voices |
| `/voice preview <name>` | Preview before committing |
| `/effects list` | Show your effect settings |
| `/effects set <effect> <value>` | Adjust effects |
| `/effects reset` | Reset to defaults |
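
A command like `/effects set` has to check the supplied value against the ranges listed under AudioEffects. The sketch below shows one way to do that; the effect key names and the `validate_effect` function are hypothetical, while the ranges are copied from this README.

```python
from typing import Dict, Tuple

# Ranges copied from the AudioEffects list; the key names are assumptions.
EFFECT_RANGES: Dict[str, Tuple[float, float]] = {
    "pitch": (-12.0, 12.0),       # semitones
    "speed": (0.5, 2.0),          # playback multiplier
    "echo": (0.0, 100.0),         # percent
    "robot": (0.0, 100.0),        # percent
    "chorus": (0.0, 100.0),       # percent
    "tremolo_depth": (0.0, 1.0),
    "tremolo_rate": (0.0, 10.0),  # Hz
}


def validate_effect(effect: str, value: float) -> float:
    """Return the value unchanged if it is in range, otherwise raise ValueError."""
    key = effect.lower()
    if key not in EFFECT_RANGES:
        raise ValueError(f"Unknown effect {effect!r}; choose from {sorted(EFFECT_RANGES)}")
    low, high = EFFECT_RANGES[key]
    if not low <= value <= high:
        raise ValueError(f"{key} must be between {low} and {high}, got {value}")
    return value
```

Rejecting out-of-range values with a clear message, rather than silently clamping them, lets the command handler tell the user exactly what the valid range is.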

Features

Voice Cloning: Add new voices by placing .wav files in voices/ directory
Per-User Customization: Each user can have their own voice and effect preferences
Hot-Reload: Rescan for new voices without restart (/voice refresh)
Message Queue: Queues messages for sequential playback
Inactivity Management: Disconnects from the voice channel after 10 minutes of inactivity
Testing Support: Separate .env.testing configuration for safe development
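
The inactivity disconnect can be sketched as a resettable asyncio timer. `InactivityTimer` is a hypothetical helper, not the bot's actual class; in the bot, `on_timeout` would presumably be something like the voice client's `disconnect` coroutine and `timeout` would be 600 seconds.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class InactivityTimer:
    """Call on_timeout once no activity has been seen for timeout seconds."""

    def __init__(self, timeout: float, on_timeout: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_timeout = on_timeout
        self._task: Optional[asyncio.Task] = None

    def touch(self) -> None:
        """Restart the countdown; call this whenever a message is played."""
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.create_task(self._countdown())

    async def _countdown(self) -> None:
        await asyncio.sleep(self._timeout)
        await self._on_timeout()
```

Each call to `touch()` cancels the pending countdown and starts a fresh one, so the callback only fires after a full uninterrupted quiet period.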

Configuration (.env)

DISCORD_TOKEN=your_bot_token
TEXT_CHANNEL_ID=channel_id_to_monitor
VOICES_DIR=./voices
DEFAULT_VOICE=optional_default_voice_name

Running the Bot

# Production
python bot.py

# Testing (uses .env.testing)
python bot.py testing

# Or use the launch script
./launch.sh
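
How the `testing` argument selects .env.testing is not shown in this README. A minimal sketch, assuming the first command-line argument is checked, could look like this; `select_env_file` is a hypothetical name.

```python
from pathlib import Path
from typing import Sequence


def select_env_file(argv: Sequence[str]) -> Path:
    """Pick .env.testing when the first argument is 'testing', else .env."""
    if len(argv) > 1 and argv[1].lower() == "testing":
        return Path(".env.testing")
    return Path(".env")


# In bot.py the result could then be passed to python-dotenv, for example:
#   import sys
#   from dotenv import load_dotenv
#   load_dotenv(select_env_file(sys.argv))
```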

For production deployment on Linux, a systemd service file (pockettts.service) is included.
