Spencer Grimes 7e76deed3d feat: wire up all effects to audio processing pipeline
- Updated queue system to pass effects as dict instead of individual params
- Updated process_queue to handle effects_dict for previews
- Updated speak_message to extract all 7 effects from user settings
- Updated _generate_wav_bytes to accept effects dict and pass all params
- Updated _handle_voice_preview to use new effects dict system
- Effects now actually process the audio:
  - pitch, speed, echo, robot, chorus, tremolo_depth, tremolo_rate
- Fixed preview effect description to use preview_effects dict
2026-01-31 17:25:52 -06:00

Pocket TTS Discord Bot

A Discord bot that reads messages aloud using Pocket TTS with voice cloning from a reference WAV file.

Features

  • 🎤 Voice Cloning: Uses a reference WAV file to clone a voice
  • 📝 Auto-read Messages: Automatically reads all messages from a configured text channel
  • 🔊 Voice Channel Streaming: Streams generated audio to the voice channel where the message author is
  • 📋 Message Queue: Messages are queued and spoken in order
  • 🔄 Per-User Voice Selection: Each user can choose their own TTS voice via /voice commands
  • 💾 Voice Persistence: User voice preferences are saved and restored on restart
  • 🔄 Hot-reload Voices: Add new voices without restarting the bot using /voice refresh
  • 🧪 Test Mode: Separate testing configuration for safe development
  • 📦 Auto-updates: Automatically checks for and installs dependency updates on startup
  • 👂 Voice Preview: Preview voices with /voice preview before committing to them
  • 🎵 Audio Effects: Apply pitch shift and speed changes to your TTS voice

Prerequisites

  • Python 3.10+
  • FFmpeg installed and available in PATH
  • A Discord bot token
  • A reference voice WAV file (3-10 seconds of clear speech recommended)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd PocketTTSBot
    
  2. Create a virtual environment:

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # Linux/macOS
    source venv/bin/activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Install FFmpeg:

    • Windows: Download from ffmpeg.org and add to PATH
    • Linux: sudo apt install ffmpeg
    • macOS: brew install ffmpeg

Configuration

  1. Create a Discord Bot:

    • Go to Discord Developer Portal
    • Create a new application
    • Go to the "Bot" section and create a bot
    • Copy the bot token
    • Enable these Privileged Gateway Intents:
      • Message Content Intent
      • Server Members Intent (optional)
  2. Invite the Bot to your server:

    • Go to OAuth2 > URL Generator
    • Select scopes: bot
    • Select permissions: Connect, Speak, Send Messages, Read Message History
    • Use the generated URL to invite the bot
  3. Get Channel ID:

    • Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
    • Right-click the text channel you want to monitor and click "Copy Channel ID"
  4. Create .env file:

    cp .env.example .env
    

    Edit .env with your values:

    DISCORD_TOKEN=your_bot_token_here
    TEXT_CHANNEL_ID=123456789012345678
    VOICES_DIR=./voices
    DEFAULT_VOICE=estinien
    
  5. Add voice reference files:

    • Create a voices/ directory: mkdir voices
    • Place .wav files in the voices/ directory
    • Each file should contain 3-10 seconds of clear speech
    • File names become voice names (e.g., MasterChief.wav is selected with /voice set masterchief)
    • Higher quality audio = better voice cloning results
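
The file-name-to-voice-name rule above can be sketched as follows. This is a minimal illustration, not the bot's actual code: the helper name discover_voices is hypothetical, and lower-casing the file stem is an assumption drawn from the MasterChief.wav example.

```python
from pathlib import Path


def discover_voices(voices_dir: str) -> dict[str, Path]:
    """Map lower-cased file stems to their .wav paths (hypothetical helper)."""
    voices: dict[str, Path] = {}
    for path in sorted(Path(voices_dir).iterdir()):
        # Only .wav files count as voices; anything else is ignored.
        if path.is_file() and path.suffix.lower() == ".wav":
            voices[path.stem.lower()] = path
    return voices
```

Calling discover_voices("./voices") again after adding a file would pick up the new voice, which is roughly what a refresh command needs.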

Usage

  1. Start the bot:

    python bot.py
    
  2. Using the bot:

    • Join a voice channel in your Discord server
    • Type a message in the configured text channel
    • The bot will join your voice channel and read your message aloud
    • Messages are queued if the bot is already speaking
  3. Voice Commands (Slash Commands):

    • /voice list - Shows all available voices
    • /voice set <name> - Change your personal TTS voice
    • /voice current - Shows your current voice
    • /voice refresh - Re-scan for new voice files (no restart needed)
    • /voice preview <name> - Preview a voice before selecting it
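
One way per-user voice selection with persistence could work is a small JSON-backed store. The sketch below is an assumption about the design, not the bot's implementation; the class name VoiceStore, the JSON file format, and the method names are all hypothetical.

```python
import json
from pathlib import Path


class VoiceStore:
    """Per-user voice preferences persisted as JSON (hypothetical sketch)."""

    def __init__(self, path: str, default_voice: str):
        self.path = Path(path)
        self.default_voice = default_voice
        self.prefs: dict[str, str] = {}
        # Restore earlier choices if a preferences file already exists.
        if self.path.exists():
            self.prefs = json.loads(self.path.read_text(encoding="utf-8"))

    def set_voice(self, user_id: int, voice: str, available: set[str]) -> None:
        voice = voice.lower()
        if voice not in available:
            raise ValueError(f"Unknown voice: {voice}")
        # JSON object keys must be strings, so the user ID is stored as text.
        self.prefs[str(user_id)] = voice
        self.path.write_text(json.dumps(self.prefs, indent=2), encoding="utf-8")

    def current(self, user_id: int) -> str:
        # Users who never picked a voice fall back to the default.
        return self.prefs.get(str(user_id), self.default_voice)
```

Writing the file on every change keeps the sketch simple; a busier bot might batch writes instead.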

Test Mode

Run the bot in testing mode to use a separate configuration:

python bot.py testing

This loads .env.testing instead of .env, allowing you to:

  • Use a different Discord bot token for testing
  • Monitor a different text channel
  • Test new features without affecting the production bot

Create .env.testing by copying .env.example and configuring it with your testing values.
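
The switch between the two files can be pictured with the sketch below. It assumes the mode is read from the first command-line argument and that the files hold plain KEY=VALUE lines; select_env_file and parse_env are hypothetical names, and the real bot may rely on a library such as python-dotenv instead.

```python
def select_env_file(argv: list[str]) -> str:
    """Return '.env.testing' when the first argument is 'testing', else '.env'."""
    return ".env.testing" if len(argv) > 1 and argv[1] == "testing" else ".env"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

For example, select_env_file(["bot.py", "testing"]) returns ".env.testing", while select_env_file(["bot.py"]) returns ".env".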

Audio Effects

Apply pitch shift and speed changes to your TTS voice:

  • /effects list - Show your current effect settings
  • /effects set pitch <semitones> - Change pitch (-12 to +12)
    • Positive = higher/chipmunk voice
    • Negative = lower/deeper voice
    • 0 = normal pitch (default)
  • /effects set speed <multiplier> - Change speed (0.5 to 2.0)
    • Higher = faster speech
    • Lower = slower speech
    • 1.0 = normal speed (default)
  • /effects reset - Reset all effects to defaults

Note: You can use up to 2 effects simultaneously. More effects require more processing time.
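
The ranges and defaults listed above can be captured in a short validation sketch. The names DEFAULT_EFFECTS, LIMITS, validate_effect, pitch_ratio, and active_effects are hypothetical, and the semitone-to-ratio formula is the standard equal-temperament relation, not necessarily how the bot shifts pitch internally.

```python
DEFAULT_EFFECTS = {"pitch": 0.0, "speed": 1.0}
LIMITS = {"pitch": (-12.0, 12.0), "speed": (0.5, 2.0)}


def validate_effect(name: str, value: float) -> float:
    """Return the value as a float, or raise if the name or range is invalid."""
    if name not in LIMITS:
        raise ValueError(f"Unknown effect: {name}")
    low, high = LIMITS[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return float(value)


def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def active_effects(effects: dict[str, float]) -> list[str]:
    """Names of effects whose value differs from the default."""
    return [name for name, value in effects.items()
            if value != DEFAULT_EFFECTS.get(name)]
```

A shift of +12 semitones doubles the frequency (one octave up) and -12 halves it. The length of active_effects(...) is one way the simultaneous-effects limit could be checked.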

Preview with Effects

Test voice and effect combinations before committing:

  • /voice preview <name> [pitch] [speed] - Preview a voice with optional effect overrides
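
A preview with optional overrides might combine the user's saved settings with whatever was passed to the command. The sketch below assumes effects are kept in a plain dict; build_preview_effects is a hypothetical helper name.

```python
from typing import Optional


def build_preview_effects(saved: dict[str, float],
                          pitch: Optional[float] = None,
                          speed: Optional[float] = None) -> dict[str, float]:
    """Copy the saved effects and apply only the overrides that were given."""
    effects = dict(saved)  # copy, so a preview never changes saved settings
    if pitch is not None:
        effects["pitch"] = pitch
    if speed is not None:
        effects["speed"] = speed
    return effects
```

Because the saved dict is copied, the preview leaves the user's stored settings untouched.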

How It Works

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │   Pocket TTS     │ --> │  Voice Channel  │
│  (configured)   │     │   (generate)     │     │  (user's VC)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                              ▲
                              │
                        ┌─────┴─────┐
                        │  voices/  │
                        │ per-user  │
                        └───────────┘
  1. The bot monitors the configured text channel for new messages
  2. When a message is received, it's added to the queue
  3. The bot generates speech using Pocket TTS with the cloned voice
  4. Audio is streamed to the voice channel where the message author is
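
The queue-then-speak flow in the steps above can be sketched with asyncio. This is a simplified illustration under stated assumptions: worker and demo are hypothetical names, and fake_speak stands in for the real speech generation and voice-channel playback.

```python
import asyncio


async def worker(queue: asyncio.Queue, speak) -> None:
    """Take messages off the queue one at a time, in arrival order."""
    while True:
        text = await queue.get()
        try:
            await speak(text)
        finally:
            queue.task_done()


async def demo() -> list[str]:
    spoken: list[str] = []

    async def fake_speak(text: str) -> None:
        await asyncio.sleep(0)  # stand-in for TTS generation and playback
        spoken.append(text)

    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(worker(queue, fake_speak))
    for text in ("first", "second", "third"):
        queue.put_nowait(text)
    await queue.join()  # wait until every queued message has been spoken
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return spoken
```

Running asyncio.run(demo()) returns the messages in the order they were queued, because a single worker finishes one message before taking the next.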

Troubleshooting

Bot doesn't respond to messages

  • Ensure Message Content Intent is enabled in Discord Developer Portal
  • Check that the TEXT_CHANNEL_ID is correct
  • Verify the bot has permissions to read the channel

No audio in voice channel

  • Ensure FFmpeg is installed and in PATH
  • Check that the bot has Connect and Speak permissions
  • Verify that the reference .wav file for your selected voice is valid
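
A quick way to confirm the FFmpeg item is to ask Python what it finds on PATH. The helper name find_executable is hypothetical; shutil.which is part of the standard library.

```python
import shutil
from typing import Optional


def find_executable(name: str = "ffmpeg") -> Optional[str]:
    """Return the full path of an executable on PATH, or None if missing."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_executable()
    print(f"FFmpeg found at {location}" if location else "FFmpeg is not on PATH")
```

If the function returns None inside the bot's virtual environment or systemd service, PATH in that environment probably differs from your interactive shell.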

Voice quality issues

  • Use a higher quality reference WAV file
  • Ensure the reference audio is clear with minimal background noise
  • Try a longer reference clip (5-10 seconds)
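
Clip length is easy to check with the standard wave module. The sketch below assumes an uncompressed PCM WAV file, since wave raises wave.Error for formats it cannot read; the function names are hypothetical and the 3-10 second window comes from the recommendation in Prerequisites.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Length of a PCM WAV file in seconds (frame count / frame rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference(path: str, min_s: float = 3.0, max_s: float = 10.0) -> bool:
    """True when the clip length falls inside the recommended window."""
    return min_s <= wav_duration_seconds(path) <= max_s
```

This checks duration only. Background noise and recording quality still have to be judged by ear.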

Linux Server Deployment

To run the bot as a service on a Linux server:

# Make the setup script executable
chmod +x setup_linux.sh

# Run the setup script
./setup_linux.sh

The script will:

  • Check system dependencies (Python 3.10+, FFmpeg, pip)
  • Create a virtual environment and install dependencies
  • Create .env template if needed
  • Optionally install and configure the systemd service

Manual Setup

  1. Install system dependencies:

    # Ubuntu/Debian
    sudo apt update
    sudo apt install python3 python3-pip python3-venv ffmpeg
    
    # Fedora
    sudo dnf install python3 python3-pip ffmpeg
    
    # Arch
    sudo pacman -S python python-pip ffmpeg
    
  2. Set up the project:

    cd /path/to/PocketTTSBot
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
    
  3. Configure the service:

    Edit pockettts.service:

    • Replace YOUR_USERNAME with your Linux username
    • Update the paths if your bot is not in /home/YOUR_USERNAME/PocketTTSBot
  4. Install the service:

    sudo cp pockettts.service /etc/systemd/system/
    sudo systemctl daemon-reload
    sudo systemctl enable pockettts  # Start on boot
    sudo systemctl start pockettts   # Start now
    

Service Management

# Check status
sudo systemctl status pockettts

# View logs (live)
journalctl -u pockettts -f

# View recent logs
journalctl -u pockettts --since "1 hour ago"

# Restart after changes
sudo systemctl restart pockettts

# Stop the bot
sudo systemctl stop pockettts

# Disable auto-start
sudo systemctl disable pockettts

Updating the Bot

cd /path/to/PocketTTSBot
git pull  # If using git
source venv/bin/activate
pip install -r requirements.txt
sudo systemctl restart pockettts

License

MIT License
