Vox - Discord Text-to-Speech Bot
A Python-based Discord bot that generates neural text-to-speech using voice cloning from reference WAV files.
Project Structure
Vox/
├── bot.py # Main entry point, Discord bot implementation
├── config.py # Configuration management using environment variables
├── voice_manager.py # Voice discovery, loading, and user preferences
├── audio_effects.py # Audio post-processing effects (7 effects)
├── audio_preprocessor.py # Audio preprocessing for voice cloning
├── numba_config.py # Numba JIT compiler cache configuration
├── requirements.txt # Python dependencies
├── launch.sh # Shell script to start the bot
├── pockettts.service # Systemd service file for Linux deployment
├── README.md # Comprehensive documentation
├── .env # Production environment configuration
├── .env.testing # Testing environment configuration
├── .env.example # Environment configuration template
└── voices/ # Directory for voice WAV files
    ├── preferences.json # User voice/effect preferences (auto-generated)
    └── *.wav # Voice reference files
Core Functionality
TTS Implementation
- Engine: Pocket TTS (`pocket-tts` library) for neural text-to-speech synthesis
- Voice Cloning: Uses reference WAV files to clone voices via `model.get_state_for_audio_prompt()`
- On-demand Loading: Voices are loaded only when first needed, then cached
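The on-demand loading described above can be pictured as a small cache in front of the expensive voice-state call. This is a minimal sketch, not the bot's actual code: `LazyVoiceCache` and the `loader` hook are hypothetical names, and the loader is assumed to wrap something like `model.get_state_for_audio_prompt()`.

```python
from pathlib import Path
from typing import Any, Callable, Dict


class LazyVoiceCache:
    """Load a voice state the first time it is requested, then reuse it."""

    def __init__(self, voices_dir: Path, loader: Callable[[Path], Any]) -> None:
        self._voices_dir = voices_dir
        # Hypothetical hook: in the bot this would presumably wrap
        # model.get_state_for_audio_prompt().
        self._loader = loader
        self._states: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        # Only call the (expensive) loader on a cache miss.
        if name not in self._states:
            self._states[name] = self._loader(self._voices_dir / f"{name}.wav")
        return self._states[name]


# Usage with a stand-in loader that records how often it is called.
calls = []


def fake_loader(path: Path) -> str:
    calls.append(path.name)
    return f"state-for-{path.stem}"


cache = LazyVoiceCache(Path("voices"), fake_loader)
first = cache.get("alice")
second = cache.get("alice")  # served from the cache, loader not called again
```

Passing the loader in as a callable keeps the cache testable without the TTS model loaded.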
Discord Integration
- Monitors a configured text channel for messages
- Joins the user's voice channel when they speak
- Uses `discord.FFmpegPCMAudio` with piped WAV data for streaming
Audio Processing Pipeline
Text Message → Pocket TTS → Audio Effects → Normalize → FFmpeg → Discord VC
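The "Normalize → FFmpeg" end of the pipeline can be sketched as peak normalization followed by in-memory WAV encoding. The sketch assumes float samples in the range -1 to 1 and uses only the standard library. The function names and the 24000 Hz rate are illustrative placeholders, not values taken from the bot.

```python
import io
import struct
import wave
from typing import List


def normalize_peak(samples: List[float], target_peak: float = 0.95) -> List[float]:
    """Scale samples so the loudest one sits at target_peak (silence is left alone)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def to_wav_buffer(samples: List[float], sample_rate: int) -> io.BytesIO:
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV held in memory."""
    pcm = struct.pack(
        f"<{len(samples)}h",
        *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples),
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    buffer.seek(0)  # rewind so a consumer reads from the start
    return buffer


normalized = normalize_peak([0.0, 0.25, -0.5])
wav_buffer = to_wav_buffer(normalized, sample_rate=24000)  # placeholder rate
```

A buffer like this could then be passed to `discord.FFmpegPCMAudio(wav_buffer, pipe=True)`, which feeds the bytes to FFmpeg over stdin instead of reading a file from disk.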
Dependencies
| Library | Purpose |
|---|---|
| `discord.py[voice]>=2.3.0` | Discord bot API with voice support |
| `pocket-tts>=0.1.0` | Neural TTS engine with voice cloning |
| `scipy>=1.10.0` | Scientific computing (audio I/O) |
| `numpy>=1.24.0` | Numerical computing |
| `librosa>=0.10.0` | Audio analysis and effects |
| `noisereduce>=3.0.0` | Noise reduction preprocessing |
| `soundfile>=0.12.0` | Audio file I/O |
| `python-dotenv>=1.0.0` | Environment variable loading |
System Requirements: Python 3.10+, FFmpeg
Key Modules
TTSBot (bot.py)
Main Discord bot class that extends commands.Bot. Handles:
- Message processing and TTS queue
- Voice channel connections
- Slash command registration
- Startup initialization (loads TTS model, discovers voices)
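One common way to get the sequential TTS queue listed above is a single worker task draining an `asyncio.Queue`. The sketch below assumes that approach; `playback_worker` and the `speak` callback are hypothetical names, and the real bot may organize this differently.

```python
import asyncio
from typing import Awaitable, Callable, List


async def playback_worker(
    queue: "asyncio.Queue[str]",
    speak: Callable[[str], Awaitable[None]],
) -> None:
    """Take one message at a time so clips never overlap."""
    while True:
        text = await queue.get()
        try:
            # Hypothetical callback: synthesize and play, returning when done.
            await speak(text)
        finally:
            queue.task_done()


async def demo() -> List[str]:
    spoken: List[str] = []

    async def fake_speak(text: str) -> None:
        await asyncio.sleep(0)  # stand-in for synthesis and playback time
        spoken.append(text)

    queue: "asyncio.Queue[str]" = asyncio.Queue()
    worker = asyncio.create_task(playback_worker(queue, fake_speak))
    for message in ("hello", "world"):
        queue.put_nowait(message)
    await queue.join()  # wait until every queued message has been played
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return spoken


played = asyncio.run(demo())
```

Because only one worker consumes the queue, messages play in arrival order even when several users post at once.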
VoiceManager (voice_manager.py)
Manages voice files and user preferences:
- Discovers voices from WAV files in `voices/` directory
- On-demand voice loading with caching
- Per-user voice selection and effect preferences
- Preferences persistence to JSON
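Preference persistence of this kind might look like the sketch below. The key layout (user ID mapped to a voice and an effects dictionary) is an assumption, since the actual schema of `preferences.json` is not documented here. Writing through a temporary file is a design choice of this example, not a confirmed behavior of the bot.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_preferences(path: Path) -> Dict[str, Any]:
    """Return saved preferences, or an empty mapping if the file is missing."""
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as handle:
        return json.load(handle)


def save_preferences(path: Path, preferences: Dict[str, Any]) -> None:
    """Write to a temporary file, then swap it in, so a crash cannot leave half a file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(preferences, handle, indent=2)
    os.replace(tmp_name, path)


# Usage with a throwaway directory; the key layout is hypothetical.
with tempfile.TemporaryDirectory() as tmp_dir:
    prefs_path = Path(tmp_dir) / "voices" / "preferences.json"
    save_preferences(prefs_path, {"1234": {"voice": "alice", "effects": {"pitch": 2}}})
    restored = load_preferences(prefs_path)
    missing = load_preferences(Path(tmp_dir) / "absent.json")
```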
AudioEffects (audio_effects.py)
Provides 7 post-processing effects:
- Pitch (-12 to +12 semitones)
- Speed (0.5x to 2.0x)
- Echo (0-100%)
- Robot (0-100%) - Ring modulation
- Chorus (0-100%) - Multiple voice layering
- Tremolo Depth (0.0-1.0)
- Tremolo Rate (0.0-10.0 Hz)
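As an illustration of how the two tremolo parameters interact, the sketch below modulates amplitude with a sine low-frequency oscillator and enforces the ranges listed above. It works on plain Python lists to stay dependency-free; the real module presumably works on NumPy arrays, and `apply_tremolo` is a hypothetical name.

```python
import math
from typing import List


def apply_tremolo(
    samples: List[float], sample_rate: int, depth: float, rate_hz: float
) -> List[float]:
    """Amplitude-modulate samples with a sine LFO.

    depth 0.0 leaves the signal untouched; depth 1.0 swings the gain between 0 and 1.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be within 0.0-1.0")
    if not 0.0 <= rate_hz <= 10.0:
        raise ValueError("rate_hz must be within 0.0-10.0")
    out = []
    for n, sample in enumerate(samples):
        # LFO value mapped from [-1, 1] to [0, 1].
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate_hz * n / sample_rate))
        gain = 1.0 - depth * (1.0 - lfo)
        out.append(sample * gain)
    return out


constant = [1.0] * 8
untouched = apply_tremolo(constant, sample_rate=8, depth=0.0, rate_hz=1.0)
modulated = apply_tremolo(constant, sample_rate=8, depth=1.0, rate_hz=1.0)
```

With a 1 Hz rate and 8 samples per second, the gain peaks a quarter of the way through the cycle and reaches zero three quarters of the way through.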
AudioPreprocessor (audio_preprocessor.py)
Prepares voice reference files for cloning:
- Load and resample to 22050 Hz
- Normalize volume
- Trim silence
- Noise reduction
- Limit length (default 15 seconds)
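Two of the steps above, silence trimming and the length cap, are simple enough to sketch without any audio library. The threshold value and function names are assumptions. The actual preprocessor likely relies on `librosa` and `noisereduce`, which this example does not attempt to reproduce.

```python
from typing import List


def trim_silence(samples: List[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0] : loud[-1] + 1]


def limit_length(
    samples: List[float], sample_rate: int, max_seconds: float = 15.0
) -> List[float]:
    """Keep at most max_seconds of audio from the start of the clip."""
    return samples[: int(max_seconds * sample_rate)]


clip = [0.0, 0.0, 0.4, -0.2, 0.0, 0.3, 0.0]
trimmed = trim_silence(clip)  # quiet samples inside the clip are kept
capped = limit_length([0.1] * 10, sample_rate=2, max_seconds=2.0)
```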
Config (config.py)
Centralized configuration management with environment-aware loading and validation.
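A rough sketch of environment-aware loading with validation follows. The variable names come from the `.env` keys in this README, while `BotConfig`, `select_env_file` and `load_config` are hypothetical. The real module presumably loads the selected file with `python-dotenv` before reading the environment.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    text_channel_id: int
    voices_dir: str
    default_voice: Optional[str]


def select_env_file(argv: List[str]) -> str:
    """Pick .env.testing when the first argument is 'testing', else .env."""
    return ".env.testing" if len(argv) > 1 and argv[1] == "testing" else ".env"


def load_config(env: Mapping[str, str] = os.environ) -> BotConfig:
    """Build a validated config from environment-style key/value pairs."""
    token = env.get("DISCORD_TOKEN", "").strip()
    if not token:
        raise ValueError("DISCORD_TOKEN is required")
    raw_channel = env.get("TEXT_CHANNEL_ID", "").strip()
    if not raw_channel.isdigit():
        raise ValueError("TEXT_CHANNEL_ID must be a numeric channel ID")
    return BotConfig(
        discord_token=token,
        text_channel_id=int(raw_channel),
        voices_dir=env.get("VOICES_DIR", "./voices"),
        default_voice=env.get("DEFAULT_VOICE") or None,  # optional setting
    )


env_file = select_env_file(["bot.py", "testing"])
config = load_config({"DISCORD_TOKEN": "abc", "TEXT_CHANNEL_ID": "42"})
```

Failing fast on a missing token or a non-numeric channel ID surfaces misconfiguration at startup rather than at the first message.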
Slash Commands
| Command | Description |
|---|---|
| `/voice list` | Show available voices |
| `/voice set <name>` | Select your voice |
| `/voice current` | Show current voice |
| `/voice refresh` | Rescan for new voices |
| `/voice preview <name>` | Preview before committing |
| `/effects list` | Show your effect settings |
| `/effects set <effect> <value>` | Adjust effects |
| `/effects reset` | Reset to defaults |
Features
- Voice Cloning: Add new voices by placing `.wav` files in `voices/` directory
- Per-User Customization: Each user can have their own voice and effect preferences
- Hot-Reload: Rescan for new voices without restart (`/voice refresh`)
- Message Queue: Queues messages for sequential playback
- Inactivity Management: Disconnects after 10 minutes of inactivity
- Testing Support: Separate `.env.testing` configuration for safe development
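The inactivity rule in the list above amounts to tracking the time of the last activity and comparing it against a 10-minute timeout. The sketch below shows that bookkeeping only. `InactivityTracker` is a hypothetical name, and how the bot actually schedules the check and disconnects is not shown here.

```python
import time
from typing import Callable


class InactivityTracker:
    """Report idleness once no activity has been seen for timeout_seconds."""

    def __init__(
        self,
        timeout_seconds: float = 600.0,  # 10 minutes
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._last_activity = clock()

    def touch(self) -> None:
        # Call whenever a message is spoken to restart the countdown.
        self._last_activity = self._clock()

    def is_idle(self) -> bool:
        return self._clock() - self._last_activity >= self._timeout


# Usage with a fake clock so the example does not need to wait.
now = [0.0]
tracker = InactivityTracker(600.0, clock=lambda: now[0])
now[0] = 599.0
idle_before = tracker.is_idle()
now[0] = 600.0
idle_at_timeout = tracker.is_idle()
tracker.touch()
idle_after_touch = tracker.is_idle()
```

A periodic task could poll `is_idle()` and disconnect from the voice channel when it returns true.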
Configuration (.env)
DISCORD_TOKEN=your_bot_token
TEXT_CHANNEL_ID=channel_id_to_monitor
VOICES_DIR=./voices
DEFAULT_VOICE=optional_default_voice_name
Running the Bot
# Production
python bot.py
# Testing (uses .env.testing)
python bot.py testing
# Or use the launch script
./launch.sh
For production deployment on Linux, a systemd service file (pockettts.service) is included.