Initial commit

2026-01-18 17:08:37 -06:00
commit ae1c2a65d3
28 changed files with 719 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,138 @@
+# Pocket TTS Discord Bot
+
+A Discord bot that reads messages aloud using [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) with voice cloning from a reference WAV file.
+
+## Features
+
+- 🎤 **Voice Cloning**: Uses a reference WAV file to clone a voice
+- 📝 **Auto-read Messages**: Automatically reads all messages from a configured text channel
+- 🔊 **Voice Channel Streaming**: Streams generated audio to the voice channel where the message author is
+- 📋 **Message Queue**: Messages are queued and spoken in order
+
+## Prerequisites
+
+- Python 3.10+
+- FFmpeg installed and available in PATH
+- A Discord bot token
+- A reference voice WAV file (3-10 seconds of clear speech recommended)
+
+## Installation
+
+1. **Clone the repository**:
+   ```bash
+   git clone <repository-url>
+   cd PocketTTSBot
+   ```
+
+2. **Create a virtual environment**:
+   ```bash
+   python -m venv venv
+   
+   # Windows
+   venv\Scripts\activate
+   
+   # Linux/macOS
+   source venv/bin/activate
+   ```
+
+3. **Install dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+4. **Install FFmpeg**:
+   - **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH
+   - **Linux (Debian/Ubuntu)**: `sudo apt install ffmpeg`
+   - **macOS**: `brew install ffmpeg`
+
+## Configuration
+
+1. **Create a Discord Bot**:
+   - Go to [Discord Developer Portal](https://discord.com/developers/applications)
+   - Create a new application
+   - Go to the "Bot" section and create a bot
+   - Copy the bot token
+   - Enable these Privileged Gateway Intents:
+     - Message Content Intent
+     - Server Members Intent (optional)
+
+2. **Invite the Bot to your server**:
+   - Go to OAuth2 > URL Generator
+   - Select scopes: `bot`
+   - Select permissions: `Connect`, `Speak`, `Send Messages`, `Read Message History`
+   - Use the generated URL to invite the bot
+
+3. **Get Channel ID**:
+   - Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
+   - Right-click the text channel you want to monitor and click "Copy Channel ID" (shown as "Copy ID" in older Discord clients)
+
+4. **Create `.env` file**:
+   ```bash
+   cp .env.example .env
+   ```
+   
+   Edit `.env` with your values:
+   ```env
+   DISCORD_TOKEN=your_bot_token_here
+   TEXT_CHANNEL_ID=123456789012345678
+   VOICE_WAV_PATH=./voice.wav
+   ```
+
+5. **Add a voice reference file**:
+   - Place a WAV file named `voice.wav` in the project directory
+   - The file should contain 3-10 seconds of clear speech
+   - Higher-quality audio produces better voice cloning results
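+
+The three settings above can be validated at startup before the bot connects. The following is a minimal sketch, not the code in `bot.py`: the `BotConfig` and `load_config` names are hypothetical, and it assumes the `.env` values have already been loaded into the process environment (for example with `load_dotenv()` from python-dotenv, if the project uses it).
+
+```python
+import os
+from dataclasses import dataclass
+from pathlib import Path
+
+REQUIRED_KEYS = ("DISCORD_TOKEN", "TEXT_CHANNEL_ID", "VOICE_WAV_PATH")
+
+
+@dataclass(frozen=True)
+class BotConfig:
+    """Hypothetical container for the three settings from .env."""
+    token: str
+    text_channel_id: int
+    voice_wav_path: Path
+
+
+def load_config(env=os.environ) -> BotConfig:
+    """Build a BotConfig from environment-style key/value pairs."""
+    # Report every missing or empty key at once.
+    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
+    if missing:
+        raise ValueError(f"Missing settings: {', '.join(missing)}")
+    # Discord channel IDs are integers, so reject anything non-numeric early.
+    try:
+        channel_id = int(env["TEXT_CHANNEL_ID"])
+    except ValueError:
+        raise ValueError("TEXT_CHANNEL_ID must be a numeric channel ID") from None
+    return BotConfig(env["DISCORD_TOKEN"], channel_id, Path(env["VOICE_WAV_PATH"]))
+```
+
+Passing a plain dictionary instead of `os.environ` makes the check easy to try without touching real credentials.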
+
+## Usage
+
+1. **Start the bot**:
+   ```bash
+   python bot.py
+   ```
+
+2. **Using the bot**:
+   - Join a voice channel in your Discord server
+   - Type a message in the configured text channel
+   - The bot will join your voice channel and read your message aloud
+   - Messages are queued if the bot is already speaking
+
+## How It Works
+
+```
+┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
+│  Text Channel   │ --> │   Pocket TTS     │ --> │  Voice Channel  │
+│  (configured)   │     │   (generate)     │     │  (user's VC)    │
+└─────────────────┘     └──────────────────┘     └─────────────────┘
+                              ▲
+                              │
+                        ┌─────┴─────┐
+                        │ voice.wav │
+                        │ (speaker) │
+                        └───────────┘
+```
+
+1. The bot monitors the configured text channel for new messages
+2. When a message arrives, it is added to the queue
+3. The bot generates speech for each queued message using Pocket TTS with the cloned voice
+4. The audio is streamed to the voice channel the message author is currently in
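+
+The queueing behaviour in steps 2 to 4 can be sketched with a single `asyncio.Queue` consumer. This is an illustration under stated assumptions rather than the bot's actual code: `synthesize`, `speaker_worker`, and `play` are hypothetical names, and `synthesize` is a placeholder standing in for the real Pocket TTS call.
+
+```python
+import asyncio
+
+
+async def synthesize(text: str) -> bytes:
+    """Placeholder for the Pocket TTS call; returns fake audio bytes."""
+    await asyncio.sleep(0)
+    return text.encode("utf-8")
+
+
+async def speaker_worker(queue: asyncio.Queue, play) -> None:
+    """Handle one message at a time so playback never overlaps."""
+    while True:
+        text = await queue.get()
+        try:
+            audio = await synthesize(text)
+            await play(audio)
+        finally:
+            # Mark the item finished even if synthesis or playback failed.
+            queue.task_done()
+
+
+async def demo() -> list[bytes]:
+    played: list[bytes] = []
+
+    async def play(audio: bytes) -> None:
+        # Stand-in for streaming to the voice channel.
+        played.append(audio)
+
+    queue: asyncio.Queue = asyncio.Queue()
+    worker = asyncio.create_task(speaker_worker(queue, play))
+    for text in ("first", "second", "third"):
+        queue.put_nowait(text)
+    await queue.join()  # wait until every queued message has been played
+    worker.cancel()
+    await asyncio.gather(worker, return_exceptions=True)
+    return played
+
+
+if __name__ == "__main__":
+    print(asyncio.run(demo()))
+```
+
+Because a single worker drains the queue, messages are spoken in arrival order even when several arrive while the bot is still speaking.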
+
+## Troubleshooting
+
+### Bot doesn't respond to messages
+- Ensure Message Content Intent is enabled in Discord Developer Portal
+- Check that `TEXT_CHANNEL_ID` is correct
+- Verify the bot has permissions to read the channel
+
+### No audio in voice channel
+- Ensure FFmpeg is installed and in PATH
+- Check that the bot has Connect and Speak permissions
+- Verify that your `voice.wav` file is valid
+
+### Voice quality issues
+- Use a higher-quality reference WAV file
+- Ensure the reference audio is clear with minimal background noise
+- Try a longer reference clip (5-10 seconds)
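+
+Several of the checks in this section can be scripted with the Python standard library. The sketch below is a hypothetical helper, not part of the repository: it tests whether `ffmpeg` is on PATH and whether the reference clip opens as a WAV file within the recommended 3-10 second range. Note that the `wave` module only reads uncompressed PCM files, so a clip it rejects may still be a valid WAV in another encoding.
+
+```python
+import shutil
+import wave
+
+
+def ffmpeg_available() -> bool:
+    """Return True if an ffmpeg executable can be found on PATH."""
+    return shutil.which("ffmpeg") is not None
+
+
+def wav_duration_seconds(path: str) -> float:
+    """Length of a PCM WAV file in seconds (frame count / sample rate)."""
+    with wave.open(path, "rb") as wav:
+        return wav.getnframes() / wav.getframerate()
+
+
+def check_reference(path: str, min_s: float = 3.0, max_s: float = 10.0) -> list[str]:
+    """Return a list of problems found with the reference clip (empty if none)."""
+    try:
+        duration = wav_duration_seconds(path)
+    except (wave.Error, OSError, EOFError) as exc:
+        return [f"cannot read {path}: {exc}"]
+    problems = []
+    if not min_s <= duration <= max_s:
+        problems.append(
+            f"{path} is {duration:.1f}s long; {min_s:g}-{max_s:g}s is recommended"
+        )
+    return problems
+
+
+if __name__ == "__main__":
+    print("ffmpeg on PATH:", ffmpeg_available())
+    for problem in check_reference("voice.wav"):
+        print("warning:", problem)
+```
+
+Running the script from the project directory prints a warning for each problem it finds with `voice.wav`.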
+
+## License
+
+MIT License