Vox/README.md
2026-01-18 17:08:37 -06:00

Pocket TTS Discord Bot

A Discord bot that reads messages aloud using Pocket TTS with voice cloning from a reference WAV file.

Features

  • 🎤 Voice Cloning: Uses a reference WAV file to clone a voice
  • 📝 Auto-read Messages: Automatically reads all messages from a configured text channel
  • 🔊 Voice Channel Streaming: Streams generated audio to the message author's current voice channel
  • 📋 Message Queue: Messages are queued and spoken in order

Prerequisites

  • Python 3.10+
  • FFmpeg installed and available in PATH
  • A Discord bot token
  • A reference voice WAV file (3-10 seconds of clear speech recommended)

Installation

  1. Clone the repository:

    git clone <repository-url>
    cd PocketTTSBot
    
  2. Create a virtual environment:

    python -m venv venv
    
    # Windows
    venv\Scripts\activate
    
    # Linux/macOS
    source venv/bin/activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Install FFmpeg:

    • Windows: Download from ffmpeg.org and add to PATH
    • Linux: sudo apt install ffmpeg
    • macOS: brew install ffmpeg

Configuration

  1. Create a Discord Bot:

    • Go to the Discord Developer Portal
    • Create a new application
    • Go to the "Bot" section and create a bot
    • Copy the bot token
    • Enable these Privileged Gateway Intents:
      • Message Content Intent
      • Server Members Intent (optional)
  2. Invite the Bot to your server:

    • Go to OAuth2 > URL Generator
    • Select scopes: bot
    • Select permissions: Connect, Speak, Send Messages, Read Message History
    • Use the generated URL to invite the bot
  3. Get Channel ID:

    • Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
    • Right-click the text channel you want to monitor and click "Copy Channel ID" (labeled "Copy ID" in older Discord clients)
  4. Create .env file:

    cp .env.example .env
    

    Edit .env with your values:

    DISCORD_TOKEN=your_bot_token_here
    TEXT_CHANNEL_ID=123456789012345678
    VOICE_WAV_PATH=./voice.wav
    
  5. Add a voice reference file:

    • Place a WAV file named voice.wav in the project directory
    • The file should contain 3-10 seconds of clear speech
    • Higher-quality audio yields better voice cloning results
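
The three settings above could be read and validated at startup roughly as follows. This is a minimal sketch, not the code in bot.py: `BotConfig` and `load_config` are hypothetical names, and it assumes the values from .env are already present in the process environment (for example, loaded by a helper such as python-dotenv).

```python
import os
from dataclasses import dataclass
from pathlib import Path

REQUIRED_KEYS = ("DISCORD_TOKEN", "TEXT_CHANNEL_ID", "VOICE_WAV_PATH")


@dataclass(frozen=True)
class BotConfig:
    """Hypothetical container for the three settings from .env."""
    token: str
    text_channel_id: int
    voice_wav_path: Path


def load_config(env=None) -> BotConfig:
    """Build a BotConfig from environment-style key/value pairs."""
    env = os.environ if env is None else env
    # Report every missing or empty setting at once.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing settings: {', '.join(missing)}")
    try:
        # Discord channel IDs are large integers, stored as text in .env.
        channel_id = int(env["TEXT_CHANNEL_ID"])
    except ValueError:
        raise ValueError("TEXT_CHANNEL_ID must be a numeric channel ID") from None
    return BotConfig(env["DISCORD_TOKEN"], channel_id, Path(env["VOICE_WAV_PATH"]))
```

Failing early with a clear message tends to be easier to debug than a bot that starts but silently ignores the channel.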

Usage

  1. Start the bot:

    python bot.py
    
  2. Using the bot:

    • Join a voice channel in your Discord server
    • Type a message in the configured text channel
    • The bot will join your voice channel and read your message aloud
    • Messages are queued if the bot is already speaking

How It Works

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │   Pocket TTS     │ --> │  Voice Channel  │
│  (configured)   │     │   (generate)     │     │  (user's VC)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                              ▲
                              │
                        ┌─────┴─────┐
                        │ voice.wav │
                        │ (speaker) │
                        └───────────┘
  1. The bot monitors the configured text channel for new messages
  2. When a message is received, it's added to the queue
  3. The bot generates speech using Pocket TTS with the cloned voice
  4. Audio is streamed to the message author's current voice channel
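
The queueing in steps 2 to 4 can be pictured as a single consumer draining an `asyncio.Queue`. The sketch below is an assumption about the general shape, not the bot's actual code: `speak` is a hypothetical stand-in for Pocket TTS generation plus voice playback, and `demo` exists only to show the ordering.

```python
import asyncio


async def speak(text: str, spoken: list) -> None:
    """Hypothetical stand-in for speech generation and playback."""
    await asyncio.sleep(0)  # placeholder for the slow synthesis/streaming step
    spoken.append(text)


async def worker(queue: asyncio.Queue, spoken: list) -> None:
    """Handle one message at a time so utterances never overlap."""
    while True:
        text = await queue.get()
        try:
            await speak(text, spoken)
        finally:
            queue.task_done()


async def demo(messages):
    """Enqueue messages, wait until all are spoken, then stop the worker."""
    queue = asyncio.Queue()
    spoken = []
    task = asyncio.create_task(worker(queue, spoken))
    for message in messages:
        queue.put_nowait(message)
    await queue.join()  # returns once every queued item is marked done
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo(["hello", "world"])))
```

Because only one worker reads from the queue, messages are spoken in arrival order even when new ones arrive mid-playback.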

Troubleshooting

Bot doesn't respond to messages

  • Ensure Message Content Intent is enabled in the Discord Developer Portal
  • Check that the TEXT_CHANNEL_ID is correct
  • Verify the bot has permissions to read the channel

No audio in voice channel

  • Ensure FFmpeg is installed and in PATH
  • Check that the bot has Connect and Speak permissions
  • Verify your voice.wav file is valid
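
Two of these checks can be scripted with the standard library. The helper names below are hypothetical, and the sketch has a caveat: Python's `wave` module only parses uncompressed PCM WAV files, so a `False` result may also mean the file uses an encoding that `wave` cannot read rather than that the file is corrupt.

```python
import shutil
import wave


def ffmpeg_available() -> bool:
    """Return True if an 'ffmpeg' executable is found on PATH."""
    return shutil.which("ffmpeg") is not None


def wav_is_readable(path: str) -> bool:
    """Return True if the file opens as a PCM WAV with at least one frame."""
    try:
        with wave.open(path, "rb") as wav:
            return wav.getnframes() > 0
    except (wave.Error, EOFError, OSError):
        # Covers malformed headers, truncated files, and missing files.
        return False


if __name__ == "__main__":
    print("ffmpeg on PATH:", ffmpeg_available())
    print("voice.wav readable:", wav_is_readable("voice.wav"))
```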

Voice quality issues

  • Use a higher-quality reference WAV file
  • Ensure the reference audio is clear with minimal background noise
  • Try a longer reference clip (5-10 seconds)
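
To see whether a reference clip falls in the recommended 3-10 second range, its length can be computed from the WAV header. This is a small sketch with hypothetical helper names; it assumes an uncompressed PCM WAV, which is what Python's `wave` module can read.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Clip length in seconds: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def in_recommended_range(seconds: float, low: float = 3.0, high: float = 10.0) -> bool:
    """True if the clip length is within the README's suggested range."""
    return low <= seconds <= high


if __name__ == "__main__":
    duration = wav_duration_seconds("voice.wav")
    print(f"{duration:.1f} s, in range: {in_recommended_range(duration)}")
```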

License

MIT License