# Vox - Discord Text-to-Speech Bot

A Python-based Discord bot that generates neural text-to-speech using voice cloning from reference WAV files.

## Project Structure

```
Vox/
├── bot.py                  # Main entry point, Discord bot implementation
├── config.py               # Configuration management using environment variables
├── voice_manager.py        # Voice discovery, loading, and user preferences
├── audio_effects.py        # Audio post-processing effects (7 effects)
├── audio_preprocessor.py   # Audio preprocessing for voice cloning
├── numba_config.py         # Numba JIT compiler cache configuration
├── requirements.txt        # Python dependencies
├── launch.sh               # Shell script to start the bot
├── pockettts.service       # Systemd service file for Linux deployment
├── README.md               # Comprehensive documentation
├── .env                    # Production environment configuration
├── .env.testing            # Testing environment configuration
├── .env.example            # Environment configuration template
└── voices/                 # Directory for voice WAV files
    ├── preferences.json    # User voice/effect preferences (auto-generated)
    └── *.wav               # Voice reference files
```

## Core Functionality

### TTS Implementation

- **Engine**: Pocket TTS (`pocket-tts` library) for neural text-to-speech synthesis
- **Voice Cloning**: Uses reference WAV files to clone voices via `model.get_state_for_audio_prompt()`
- **On-demand Loading**: Voices are loaded only when first needed, then cached

### Discord Integration

- Monitors a configured text channel for messages
- Joins the user's voice channel when they speak
- Uses `discord.FFmpegPCMAudio` with piped WAV data for streaming

### Audio Processing Pipeline

```
Text Message → Pocket TTS → Audio Effects → Normalize → FFmpeg → Discord VC
```

## Dependencies

| Library | Purpose |
|---------|---------|
| `discord.py[voice]>=2.3.0` | Discord bot API with voice support |
| `pocket-tts>=0.1.0` | Neural TTS engine with voice cloning |
| `scipy>=1.10.0` | Scientific computing (audio I/O) |
| `numpy>=1.24.0` | Numerical computing |
| `librosa>=0.10.0` | Audio analysis and effects |
| `noisereduce>=3.0.0` | Noise reduction preprocessing |
| `soundfile>=0.12.0` | Audio file I/O |
| `python-dotenv>=1.0.0` | Environment variable loading |

**System Requirements**: Python 3.10+, FFmpeg

## Key Modules

### `TTSBot` (bot.py)

Main Discord bot class that extends `commands.Bot`. Handles:

- Message processing and TTS queue
- Voice channel connections
- Slash command registration
- Startup initialization (loads TTS model, discovers voices)

### `VoiceManager` (voice_manager.py)

Manages voice files and user preferences:

- Discovers voices from WAV files in `voices/` directory
- On-demand voice loading with caching
- Per-user voice selection and effect preferences
- Preferences persistence to JSON

### `AudioEffects` (audio_effects.py)

Provides 7 post-processing effects:

1. **Pitch** (-12 to +12 semitones)
2. **Speed** (0.5x to 2.0x)
3. **Echo** (0-100%)
4. **Robot** (0-100%) - Ring modulation
5. **Chorus** (0-100%) - Multiple voice layering
6. **Tremolo Depth** (0.0-1.0)
7. **Tremolo Rate** (0.0-10.0 Hz)

### `AudioPreprocessor` (audio_preprocessor.py)

Prepares voice reference files for cloning:

1. Load and resample to 22050 Hz
2. Normalize volume
3. Trim silence
4. Noise reduction
5. Limit length (default 15 seconds)

### `Config` (config.py)

Centralized configuration management with environment-aware loading and validation.

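
As an illustration of what "loading and validation" could look like, here is a minimal sketch that builds a settings object from an environment-like mapping and fails fast on missing or malformed values. The names `BotConfig` and `load_config` are hypothetical, and the validation rules are assumptions; the actual `Config` class in `config.py` may be structured differently.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotConfig:
    """Hypothetical settings container; not the project's actual Config class."""

    discord_token: str
    text_channel_id: int
    voices_dir: Path
    default_voice: Optional[str] = None


def load_config(env: Mapping[str, str]) -> BotConfig:
    """Build a BotConfig from an environment-like mapping, raising on bad input."""
    token = env.get("DISCORD_TOKEN", "").strip()
    if not token:
        raise ValueError("DISCORD_TOKEN is required")

    # Assumption: the channel ID must be a plain decimal number.
    raw_channel = env.get("TEXT_CHANNEL_ID", "").strip()
    if not raw_channel.isdigit():
        raise ValueError("TEXT_CHANNEL_ID must be a numeric channel ID")

    # Assumption: VOICES_DIR falls back to ./voices when unset.
    voices_dir = Path(env.get("VOICES_DIR", "./voices"))

    # An empty or missing DEFAULT_VOICE is treated as "no default".
    default_voice = env.get("DEFAULT_VOICE", "").strip() or None

    return BotConfig(token, int(raw_channel), voices_dir, default_voice)
```

Passing a mapping instead of reading `os.environ` directly keeps the function easy to test; in the bot it could be called as `load_config(os.environ)` once the environment file has been loaded.
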
## Slash Commands

| Command | Description |
|---------|-------------|
| `/voice list` | Show available voices |
| `/voice set ` | Select your voice |
| `/voice current` | Show current voice |
| `/voice refresh` | Rescan for new voices |
| `/voice preview ` | Preview before committing |
| `/effects list` | Show your effect settings |
| `/effects set ` | Adjust effects |
| `/effects reset` | Reset to defaults |

## Features

- **Voice Cloning**: Add new voices by placing `.wav` files in `voices/` directory
- **Per-User Customization**: Each user can have their own voice and effect preferences
- **Hot-Reload**: Rescan for new voices without restart (`/voice refresh`)
- **Message Queue**: Queues messages for sequential playback
- **Inactivity Management**: Disconnects after 10 minutes of inactivity
- **Testing Support**: Separate `.env.testing` configuration for safe development

## Configuration (.env)

```env
DISCORD_TOKEN=your_bot_token
TEXT_CHANNEL_ID=channel_id_to_monitor
VOICES_DIR=./voices
DEFAULT_VOICE=optional_default_voice_name
```

## Running the Bot

```bash
# Production
python bot.py

# Testing (uses .env.testing)
python bot.py testing

# Or use the launch script
./launch.sh
```

For production deployment on Linux, a systemd service file (`pockettts.service`) is included.
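
The production and testing invocations above differ only in which environment file is read. A minimal sketch of how that selection might work, assuming the mode arrives as the first command-line argument; `select_env_file` is a hypothetical helper, and rejecting unknown modes is an assumption rather than documented behavior of `bot.py`:

```python
from typing import Sequence


def select_env_file(argv: Sequence[str]) -> str:
    """Map the command-line arguments to an env file name (hypothetical helper)."""
    # No extra argument: production mode, which reads .env.
    if len(argv) < 2:
        return ".env"

    # "testing" switches to the separate testing configuration.
    if argv[1].lower() == "testing":
        return ".env.testing"

    # Assumption: anything else is treated as a mistake instead of being ignored.
    raise ValueError(f"Unknown mode {argv[1]!r}; expected 'testing' or no argument")
```

The returned path could then be handed to `load_dotenv()` from `python-dotenv`, for example `load_dotenv(select_env_file(sys.argv))`, before the configuration values are read.
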