# Pocket TTS Discord Bot

A Discord bot that reads messages aloud using [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) with voice cloning from a reference WAV file.

## Features

- 🎤 **Voice Cloning**: Uses a reference WAV file to clone a voice
- 📝 **Auto-read Messages**: Automatically reads all messages from a configured text channel
- 🔊 **Voice Channel Streaming**: Streams generated audio to the voice channel where the message author is
- 📋 **Message Queue**: Messages are queued and spoken in order
- 🔄 **Per-User Voice Selection**: Each user can choose their own TTS voice via `/voice` commands
- 💾 **Voice Persistence**: User voice preferences are saved and restored on restart
- 🔄 **Hot-reload Voices**: Add new voices without restarting the bot using `/voice refresh`
- 🧪 **Test Mode**: Separate testing configuration for safe development
- 📦 **Auto-updates**: Automatically checks for and installs dependency updates on startup
- 👂 **Voice Preview**: Preview voices with `/voice preview` before committing to them
- 🎵 **Audio Effects**: 7 different effects to customize your voice (pitch, speed, echo, robot, chorus, tremolo depth, tremolo rate)
- ⚡ **Unlimited Effects**: Use as many effects as you want (a warning is shown when more than 2 are active)
- ⏱️ **Processing Indicator**: Shows when audio processing is taking longer than expected

## Prerequisites

- Python 3.10+
- FFmpeg installed and available in PATH
- A Discord bot token
- A reference voice WAV file (3-10 seconds of clear speech recommended)

## Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd PocketTTSBot
   ```

2. **Create a virtual environment**:
   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # Linux/macOS
   source venv/bin/activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

4.
   **Install FFmpeg**:
   - **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH
   - **Linux**: `sudo apt install ffmpeg`
   - **macOS**: `brew install ffmpeg`

## Configuration

1. **Create a Discord Bot**:
   - Go to [Discord Developer Portal](https://discord.com/developers/applications)
   - Create a new application
   - Go to the "Bot" section and create a bot
   - Copy the bot token
   - Enable these Privileged Gateway Intents:
     - Message Content Intent
     - Server Members Intent (optional)

2. **Invite the Bot to your server**:
   - Go to OAuth2 > URL Generator
   - Select scopes: `bot`
   - Select permissions: `Connect`, `Speak`, `Send Messages`, `Read Message History`
   - Use the generated URL to invite the bot

3. **Get Channel ID**:
   - Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
   - Right-click the text channel you want to monitor and click "Copy ID"

4. **Create `.env` file**:
   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your values:
   ```env
   DISCORD_TOKEN=your_bot_token_here
   TEXT_CHANNEL_ID=123456789012345678
   VOICES_DIR=./voices
   DEFAULT_VOICE=estinien
   ```

5. **Add voice reference files**:
   - Create a `voices/` directory: `mkdir voices`
   - Place `.wav` files in the `voices/` directory
   - Each file should contain 3-10 seconds of clear speech
   - File names become voice names (e.g., `MasterChief.wav` → `/voice set masterchief`)
   - Higher quality audio = better voice cloning results

## Usage

1. **Start the bot**:
   ```bash
   python bot.py
   ```

2. **Using the bot**:
   - Join a voice channel in your Discord server
   - Type a message in the configured text channel
   - The bot will join your voice channel and read your message aloud
   - Messages are queued if the bot is already speaking

3.
   **Voice Commands** (Slash Commands):
   - `/voice list` - Shows all available voices
   - `/voice set <name>` - Change your personal TTS voice
   - `/voice current` - Shows your current voice
   - `/voice refresh` - Re-scan for new voice files (no restart needed)
   - `/voice preview <name>` - Preview a voice before selecting it

### Test Mode

Run the bot in testing mode to use a separate configuration:

```bash
python bot.py testing
```

This loads `.env.testing` instead of `.env`, allowing you to:

- Use a different Discord bot token for testing
- Monitor a different text channel
- Test new features without affecting the production bot

Create `.env.testing` by copying `.env.example` and configuring it with your testing values.

### Audio Effects

Transform your TTS voice with 7 different audio effects:

#### Available Effects:

**🎵 Pitch** (`/effects set pitch <value>`)
- Range: -12 to +12 semitones
- Default: 0 (no change)
- Positive = higher/chipmunk voice
- Negative = lower/deeper voice

**⚡ Speed** (`/effects set speed <value>`)
- Range: 0.5 to 2.0
- Default: 1.0x (normal speed)
- Higher = faster speech
- Lower = slower speech

**🔊 Echo** (`/effects set echo <value>`)
- Range: 0-100%
- Default: 0% (off)
- Adds a spatial delay and reverb effect
- Higher values = more pronounced echo

**🤖 Robot** (`/effects set robot <value>`)
- Range: 0-100%
- Default: 0% (off)
- Applies ring modulation for a sci-fi robotic voice
- Higher values = more robotic distortion

**🎶 Chorus** (`/effects set chorus <value>`)
- Range: 0-100%
- Default: 0% (off)
- Creates a "multiple voices" effect with slight pitch variations
- Higher values = more voices and depth

**〰️ Tremolo Depth** (`/effects set tremolo_depth <value>`)
- Range: 0.0 to 1.0
- Default: 0.0 (off)
- Controls the amount of amplitude modulation
- Higher = more warble/vintage radio effect

**📳 Tremolo Rate** (`/effects set tremolo_rate <value>`)
- Range: 0.0 to 10.0 Hz
- Default: 0.0 Hz (off)
- Controls how fast the tremolo warbles
- Requires tremolo_depth > 0 to have any effect

#### Effect Commands:

-
  `/effects list` - Show all your current effect settings
- `/effects set <effect> <value>` - Change an effect value
- `/effects reset` - Reset all effects to defaults (with confirmation)

#### Effect Application Order:

Effects are applied in this sequence:

1. Pitch shift
2. Speed change
3. Echo/Reverb
4. Chorus
5. Tremolo
6. Robot voice

#### Performance Notes:

- **No limit** on the number of active effects
- ⚠️ Warning shown when you have more than 2 active effects
- More effects = longer processing time
- Some effects (like pitch shift and chorus) are more CPU-intensive
- Processing time is logged to the console for monitoring

### Preview with Effects

Test any combination of voice and effects before committing:

**Preview a voice:**
- `/voice preview <name>` - Preview with your current effects

**Preview with specific effects:**
- `/voice preview <name> pitch:5 speed:1.5` - Preview with pitch +5 and 1.5x speed
- All effect parameters are optional and default to your current settings

**Example combinations to try:**
- Robot voice: `/effects set robot 75`
- Deep scary voice: `/effects set pitch -8`
- Fast chipmunk: `/effects set pitch 8 speed:1.5`
- Radio announcer: `/effects set echo 40 tremolo_depth:0.3 tremolo_rate:4`

## How It Works

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │    Pocket TTS    │ --> │  Voice Channel  │
│  (configured)   │     │    (generate)    │     │   (user's VC)   │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 ▲
                                 │
                           ┌─────┴─────┐
                           │  voices/  │
                           │ per-user  │
                           └───────────┘
```

1. Bot monitors the configured text channel for new messages
2. When a message is received, it's added to the queue
3.
   The bot generates speech using Pocket TTS with the cloned voice
4. Audio is streamed to the voice channel where the message author is

## Troubleshooting

### Bot doesn't respond to messages

- Ensure Message Content Intent is enabled in the Discord Developer Portal
- Check that the TEXT_CHANNEL_ID is correct
- Verify the bot has permissions to read the channel

### No audio in voice channel

- Ensure FFmpeg is installed and in PATH
- Check that the bot has Connect and Speak permissions
- Verify that the `.wav` files in your `voices/` directory are valid

### Voice quality issues

- Use a higher quality reference WAV file
- Ensure the reference audio is clear with minimal background noise
- Try a longer reference clip (5-10 seconds)

### HuggingFace cache read-only error

If you see errors like `OSError: [Errno 30] Read-only file system` when the bot tries to download the TTS model:

1. **Set a writable cache directory**: Add to your `.env` file:
   ```env
   HF_HOME=/tmp/huggingface
   ```

2. **Create and set permissions** on the directory:
   ```bash
   sudo mkdir -p /tmp/huggingface
   sudo chown -R $USER:$USER /tmp/huggingface
   ```

3. **If using systemd service**: Ensure the service has write access to `/tmp` or the chosen cache directory. You may need to add `ReadWritePaths=/tmp/huggingface` to the service file or remove `ProtectHome=read-only`.

4. **Restart the bot**:
   ```bash
   sudo systemctl restart pockettts
   ```

## Linux Server Deployment

To run the bot as a service on a Linux server:

### Quick Setup (Recommended)

```bash
# Make the setup script executable
chmod +x setup_linux.sh

# Run the setup script
./setup_linux.sh
```

The script will:

- Check system dependencies (Python 3.10+, FFmpeg, pip)
- Create a virtual environment and install dependencies
- Create a `.env` template if needed
- Optionally install and configure the systemd service

### Manual Setup

1.
   **Install system dependencies**:
   ```bash
   # Ubuntu/Debian
   sudo apt update
   sudo apt install python3 python3-pip python3-venv ffmpeg

   # Fedora
   sudo dnf install python3 python3-pip ffmpeg

   # Arch
   sudo pacman -S python python-pip ffmpeg
   ```

2. **Set up the project**:
   ```bash
   cd /path/to/PocketTTSBot
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

3. **Configure the service**:

   Edit `pockettts.service` and replace:
   - `YOUR_USERNAME` with your Linux username
   - Update paths if your bot is not in `/home/YOUR_USERNAME/PocketTTSBot`

4. **Install the service**:
   ```bash
   sudo cp pockettts.service /etc/systemd/system/
   sudo systemctl daemon-reload
   sudo systemctl enable pockettts  # Start on boot
   sudo systemctl start pockettts   # Start now
   ```

### Service Management

```bash
# Check status
sudo systemctl status pockettts

# View logs (live)
journalctl -u pockettts -f

# View recent logs
journalctl -u pockettts --since "1 hour ago"

# Restart after changes
sudo systemctl restart pockettts

# Stop the bot
sudo systemctl stop pockettts

# Disable auto-start
sudo systemctl disable pockettts
```

### Updating the Bot

```bash
cd /path/to/PocketTTSBot
git pull  # If using git
source venv/bin/activate
pip install -r requirements.txt
sudo systemctl restart pockettts
```

## License

MIT License
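
## Appendix: Effect Range Sketch

The effect ranges, defaults, and the "more than 2 active effects" warning described under Audio Effects can be sketched in a few lines of Python. This is an illustration only, not the bot's actual code: `EffectSpec`, `EFFECT_SPECS`, `validate_effect`, `active_effects`, and `needs_warning` are hypothetical names, and treating "active" as "differs from the default" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffectSpec:
    """Allowed range and default for one effect (values copied from this README)."""
    minimum: float
    maximum: float
    default: float


# Hypothetical lookup table; the bot's real data structures may differ.
EFFECT_SPECS = {
    "pitch": EffectSpec(-12, 12, 0),            # semitones
    "speed": EffectSpec(0.5, 2.0, 1.0),         # playback multiplier
    "echo": EffectSpec(0, 100, 0),              # percent
    "robot": EffectSpec(0, 100, 0),             # percent
    "chorus": EffectSpec(0, 100, 0),            # percent
    "tremolo_depth": EffectSpec(0.0, 1.0, 0.0),
    "tremolo_rate": EffectSpec(0.0, 10.0, 0.0),  # Hz
}

# The README says a warning is shown when more than 2 effects are active.
WARN_THRESHOLD = 2


def validate_effect(name: str, value: float) -> float:
    """Return the value unchanged if it is in range, otherwise raise ValueError."""
    spec = EFFECT_SPECS.get(name)
    if spec is None:
        raise ValueError(f"unknown effect: {name}")
    if not spec.minimum <= value <= spec.maximum:
        raise ValueError(
            f"{name} must be between {spec.minimum} and {spec.maximum}"
        )
    return value


def active_effects(settings: dict[str, float]) -> list[str]:
    """List effects whose value differs from the default (assumed meaning of 'active')."""
    return [
        name
        for name, spec in EFFECT_SPECS.items()
        if settings.get(name, spec.default) != spec.default
    ]


def needs_warning(settings: dict[str, float]) -> bool:
    """True when more than WARN_THRESHOLD effects are active."""
    return len(active_effects(settings)) > WARN_THRESHOLD
```

With this sketch, `validate_effect("speed", 3.0)` raises `ValueError`, while a user with pitch, echo, and robot all changed from their defaults would trigger the warning.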