# Pocket TTS Discord Bot

A Discord bot that reads messages aloud using [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) with voice cloning from a reference WAV file.

## Features

- 🎤 **Voice Cloning**: Uses a reference WAV file to clone a voice
- 📝 **Auto-read Messages**: Automatically reads all messages from a configured text channel
- 🔊 **Voice Channel Streaming**: Streams generated audio to the voice channel where the message author is
- 📋 **Message Queue**: Messages are queued and spoken in order

## Prerequisites

- Python 3.10+
- FFmpeg installed and available in PATH
- A Discord bot token
- A reference voice WAV file (3-10 seconds of clear speech recommended)

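The first two prerequisites can be checked from Python before starting the bot. The following is a minimal sketch, not part of the bot itself; the function name `check_prerequisites` is hypothetical and only the standard library is used.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(map(str, sys.version_info[:3]))
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found {found}"
        )
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found in PATH")
    return problems


for problem in check_prerequisites():
    print(f"[missing] {problem}")
```

If nothing is printed, both the Python version and the FFmpeg lookup passed.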
## Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd PocketTTSBot
   ```

2. **Create a virtual environment**:
   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # Linux/macOS
   source venv/bin/activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

4. **Install FFmpeg**:
   - **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH
   - **Linux**: `sudo apt install ffmpeg`
   - **macOS**: `brew install ffmpeg`

## Configuration

1. **Create a Discord Bot**:
   - Go to the [Discord Developer Portal](https://discord.com/developers/applications)
   - Create a new application
   - Go to the "Bot" section and create a bot
   - Copy the bot token
   - Enable these Privileged Gateway Intents:
     - Message Content Intent
     - Server Members Intent (optional)

2. **Invite the Bot to your server**:
   - Go to OAuth2 > URL Generator
   - Select scopes: `bot`
   - Select permissions: `Connect`, `Speak`, `Send Messages`, `Read Message History`
   - Use the generated URL to invite the bot

3. **Get the Channel ID**:
   - Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
   - Right-click the text channel you want to monitor and click "Copy Channel ID" (shown as "Copy ID" in older clients)

4. **Create a `.env` file**:
   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your values:
   ```env
   DISCORD_TOKEN=your_bot_token_here
   TEXT_CHANNEL_ID=123456789012345678
   VOICE_WAV_PATH=./voice.wav
   ```

5. **Add a voice reference file**:
   - Place a WAV file named `voice.wav` in the project directory (or point `VOICE_WAV_PATH` at another file)
   - The file should contain 3-10 seconds of clear speech
   - Higher-quality audio gives better voice cloning results

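To illustrate how the three `.env` settings fit together, here is a dependency-free sketch that parses `KEY=VALUE` lines and validates them. It is an assumption about how such values could be read, not the bot's actual loader (which may well use a library such as `python-dotenv`); `BotConfig`, `parse_env`, and `load_config` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path

REQUIRED_KEYS = ("DISCORD_TOKEN", "TEXT_CHANNEL_ID", "VOICE_WAV_PATH")


@dataclass
class BotConfig:
    token: str
    text_channel_id: int
    voice_wav_path: Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_config(values):
    """Validate the three settings and convert them to useful types."""
    missing = [key for key in REQUIRED_KEYS if not values.get(key)]
    if missing:
        raise ValueError(f"missing settings: {', '.join(missing)}")
    if not values["TEXT_CHANNEL_ID"].isdigit():
        raise ValueError("TEXT_CHANNEL_ID must be a numeric channel ID")
    return BotConfig(
        token=values["DISCORD_TOKEN"],
        text_channel_id=int(values["TEXT_CHANNEL_ID"]),
        voice_wav_path=Path(values["VOICE_WAV_PATH"]),
    )


# Sample contents mirroring the example values shown for `.env`.
sample = """
DISCORD_TOKEN=your_bot_token_here
TEXT_CHANNEL_ID=123456789012345678
VOICE_WAV_PATH=./voice.wav
"""
config = load_config(parse_env(sample))
print(config.text_channel_id, config.voice_wav_path)
```

Converting the channel ID to an integer up front surfaces a mistyped ID at startup rather than when the first message arrives.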
## Usage

1. **Start the bot**:
   ```bash
   python bot.py
   ```

2. **Using the bot**:
   - Join a voice channel in your Discord server
   - Type a message in the configured text channel
   - The bot will join your voice channel and read your message aloud
   - Messages are queued if the bot is already speaking

## How It Works

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │    Pocket TTS    │ --> │  Voice Channel  │
│  (configured)   │     │    (generate)    │     │   (user's VC)   │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 ▲
                                 │
                           ┌─────┴─────┐
                           │ voice.wav │
                           │ (speaker) │
                           └───────────┘
```

1. Bot monitors the configured text channel for new messages
2. When a message is received, it's added to the queue
3. The bot generates speech using Pocket TTS with the cloned voice
4. Audio is streamed to the voice channel where the message author is

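The queueing behaviour in the steps above can be sketched with `asyncio.Queue`: a single worker consumes messages one at a time, so speech is produced in arrival order and never overlaps. This is an illustrative sketch under stated assumptions, not the bot's actual code; `synthesize` is a hypothetical stand-in for the Pocket TTS call, and appending to `played` stands in for streaming to the voice channel.

```python
import asyncio


async def synthesize(text):
    """Hypothetical placeholder for the TTS step; returns fake audio bytes."""
    await asyncio.sleep(0)  # stands in for model inference
    return text.encode("utf-8")


async def speaker(queue, played):
    """Consume queued messages one at a time so playback never overlaps."""
    while True:
        text = await queue.get()
        try:
            audio = await synthesize(text)
            played.append(audio)  # a real bot would stream this to the voice channel
        finally:
            queue.task_done()


async def main(messages):
    queue = asyncio.Queue()
    played = []
    worker = asyncio.create_task(speaker(queue, played))
    for text in messages:  # each incoming message is enqueued
        queue.put_nowait(text)
    await queue.join()  # wait until every queued message has been handled
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return played


result = asyncio.run(main(["hello", "world"]))
print(result)
```

Using one consumer task keeps ordering trivial; generating audio for the next message while the current one plays would need a second stage and is left out here.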
## Troubleshooting

### Bot doesn't respond to messages
- Ensure Message Content Intent is enabled in Discord Developer Portal
- Check that the TEXT_CHANNEL_ID is correct
- Verify the bot has permissions to read the channel

### No audio in voice channel
- Ensure FFmpeg is installed and in PATH
- Check that the bot has Connect and Speak permissions
- Verify your voice.wav file is valid

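One way to verify the reference file is to open it with Python's standard `wave` module and compare its duration with the recommended 3-10 seconds. This is a hedged sketch: the helper names are hypothetical, and `wave` only reads uncompressed PCM WAV files, so a file it rejects may still be playable elsewhere.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path, min_seconds=3.0, max_seconds=10.0):
    """Return a warning string, or None when the clip is in the recommended range."""
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError, OSError) as exc:
        # Not a readable PCM WAV file, truncated, or missing.
        return f"cannot read {path}: {exc}"
    if not min_seconds <= duration <= max_seconds:
        return (
            f"clip is {duration:.1f}s; "
            f"{min_seconds:g}-{max_seconds:g}s is recommended"
        )
    return None


warning = check_reference("voice.wav")
if warning:
    print(warning)
```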
### Voice quality issues
- Use a higher quality reference WAV file
- Ensure the reference audio is clear with minimal background noise
- Try a longer reference clip (5-10 seconds)

## License

MIT License