# Pocket TTS Discord Bot

A Discord bot that reads messages aloud using [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) with voice cloning from a reference WAV file.

## Features

- 🎤 **Voice Cloning**: Uses a reference WAV file to clone a voice
- 📝 **Auto-read Messages**: Automatically reads all messages from a configured text channel
- 🔊 **Voice Channel Streaming**: Streams generated audio to the voice channel where the message author is
- 📋 **Message Queue**: Messages are queued and spoken in order
- 🔄 **Per-User Voice Selection**: Each user can choose their own TTS voice via `/voice` commands
- 💾 **Voice Persistence**: User voice preferences are saved and restored on restart
- 🔄 **Hot-reload Voices**: Add new voices without restarting the bot using `/voice refresh`
- 🧪 **Test Mode**: Separate testing configuration for safe development
- 📦 **Auto-updates**: Automatically checks for and installs dependency updates on startup
- 👂 **Voice Preview**: Preview voices with `/voice preview` before committing to them
- 🎵 **Audio Effects**: 7 different effects to customize your voice (pitch, speed, echo, robot, chorus, tremolo depth, tremolo rate)
- ⚡ **Unlimited Effects**: Use as many effects as you want (a warning is shown when more than 2 are active)
- ⏱️ **Processing Indicator**: Shows when audio processing is taking longer than expected

## Prerequisites

- Python 3.10+
- FFmpeg installed and available in PATH
- A Discord bot token
- A reference voice WAV file (3-10 seconds of clear speech recommended)

## Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd PocketTTSBot
   ```

2. **Create a virtual environment**:
   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # Linux/macOS
   source venv/bin/activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

4. **Install FFmpeg**:
   - **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH
   - **Linux**: `sudo apt install ffmpeg`
   - **macOS**: `brew install ffmpeg`

## Configuration

1. **Create a Discord Bot**:
   - Go to [Discord Developer Portal](https://discord.com/developers/applications)
   - Create a new application
   - Go to the "Bot" section and create a bot
   - Copy the bot token
   - Enable these Privileged Gateway Intents:
     - Message Content Intent
     - Server Members Intent (optional)

2. **Invite the Bot to your server**:
   - Go to OAuth2 > URL Generator
   - Select scopes: `bot`
   - Select permissions: `Connect`, `Speak`, `Send Messages`, `Read Message History`
   - Use the generated URL to invite the bot

3. **Get Channel ID**:
   - Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
   - Right-click the text channel you want to monitor and click "Copy ID"

4. **Create `.env` file**:
   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your values:
   ```env
   DISCORD_TOKEN=your_bot_token_here
   TEXT_CHANNEL_ID=123456789012345678
   VOICES_DIR=./voices
   DEFAULT_VOICE=estinien
   ```

5. **Add voice reference files**:
   - Create a `voices/` directory: `mkdir voices`
   - Place `.wav` files in the `voices/` directory
   - Each file should contain 3-10 seconds of clear speech
   - File names become voice names (e.g., `MasterChief.wav` → `/voice set masterchief`)
   - Higher quality audio = better voice cloning results

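The file-name-to-voice-name rule from step 5 can be sketched in a few lines of Python. This is a minimal illustration, not the bot's actual code: the `scan_voices` helper is hypothetical, and it assumes a voice name is simply the lowercased file stem, as the `MasterChief.wav` example suggests.

```python
from pathlib import Path


def scan_voices(voices_dir: str) -> dict[str, Path]:
    """Map lowercase voice names to their .wav reference files.

    Hypothetical helper: assumes the voice name is the lowercased file stem.
    """
    voices: dict[str, Path] = {}
    for entry in sorted(Path(voices_dir).iterdir()):
        # Only regular files with a .wav extension count as voices.
        if entry.is_file() and entry.suffix.lower() == ".wav":
            voices[entry.stem.lower()] = entry
    return voices
```

Calling `scan_voices("./voices")` again after adding files would pick up the new voices, which is roughly what a command like `/voice refresh` needs to do.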
## Usage

1. **Start the bot**:
   ```bash
   python bot.py
   ```

2. **Using the bot**:
   - Join a voice channel in your Discord server
   - Type a message in the configured text channel
   - The bot will join your voice channel and read your message aloud
   - Messages are queued if the bot is already speaking

3. **Voice Commands** (Slash Commands):
   - `/voice list` - Shows all available voices
   - `/voice set <name>` - Change your personal TTS voice
   - `/voice current` - Shows your current voice
   - `/voice refresh` - Re-scan for new voice files (no restart needed)
   - `/voice preview <name>` - Preview a voice before selecting it

### Test Mode

Run the bot in testing mode to use a separate configuration:

```bash
python bot.py testing
```

This loads `.env.testing` instead of `.env`, allowing you to:
- Use a different Discord bot token for testing
- Monitor a different text channel
- Test new features without affecting the production bot

Create `.env.testing` by copying `.env.example` and configuring it with your testing values.

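The switch between `.env` and `.env.testing` could look roughly like the sketch below. It uses only the standard library; the `select_env_file` and `parse_env` helpers are hypothetical names, and the real bot may well use a library such as python-dotenv instead.

```python
import sys
from pathlib import Path


def select_env_file(argv: list[str]) -> str:
    """Return '.env.testing' when the first argument is 'testing', else '.env'."""
    return ".env.testing" if len(argv) > 1 and argv[1] == "testing" else ".env"


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    config: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


if __name__ == "__main__":
    env_path = Path(select_env_file(sys.argv))
    if env_path.exists():
        config = parse_env(env_path.read_text(encoding="utf-8"))
        print(f"Loaded {len(config)} settings from {env_path}")
```

Keeping the file choice in one small function makes it easy to confirm which configuration a given command line will load.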
### Audio Effects

Transform your TTS voice with 7 different audio effects:

#### Available Effects:

**🎵 Pitch** (`/effects set pitch <semitones>`)
- Range: -12 to +12 semitones
- Default: 0 (no change)
- Positive = higher/chipmunk voice
- Negative = lower/deeper voice

**⚡ Speed** (`/effects set speed <multiplier>`)
- Range: 0.5x to 2.0x
- Default: 1.0x (normal speed)
- Higher = faster speech
- Lower = slower speech

**🔊 Echo** (`/effects set echo <percentage>`)
- Range: 0-100%
- Default: 0% (off)
- Adds spatial delay and reverb effect
- Higher values = more pronounced echo

**🤖 Robot** (`/effects set robot <percentage>`)
- Range: 0-100%
- Default: 0% (off)
- Applies ring modulation for sci-fi robotic voice
- Higher values = more robotic distortion

**🎶 Chorus** (`/effects set chorus <percentage>`)
- Range: 0-100%
- Default: 0% (off)
- Creates "multiple voices" effect with slight pitch variations
- Higher values = more voices and depth

**〰️ Tremolo Depth** (`/effects set tremolo_depth <value>`)
- Range: 0.0 to 1.0
- Default: 0.0 (off)
- Controls amplitude modulation amount
- Higher = more warble/vintage radio effect

**📳 Tremolo Rate** (`/effects set tremolo_rate <hertz>`)
- Range: 0.0 to 10.0 Hz
- Default: 0.0 Hz (off)
- Controls how fast the tremolo warbles
- Requires tremolo_depth > 0 to have effect

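The ranges and defaults listed above fit naturally into a single lookup table. The sketch below is illustrative only: `EFFECT_RANGES`, `validate_effect`, and `default_effects` are hypothetical names, and rejecting out-of-range values (rather than clamping them) is an assumption about how the bot behaves.

```python
# (minimum, maximum, default) for each effect, taken from the list above.
EFFECT_RANGES: dict[str, tuple[float, float, float]] = {
    "pitch": (-12.0, 12.0, 0.0),        # semitones
    "speed": (0.5, 2.0, 1.0),           # multiplier
    "echo": (0.0, 100.0, 0.0),          # percent
    "robot": (0.0, 100.0, 0.0),         # percent
    "chorus": (0.0, 100.0, 0.0),        # percent
    "tremolo_depth": (0.0, 1.0, 0.0),   # modulation amount
    "tremolo_rate": (0.0, 10.0, 0.0),   # hertz
}


def default_effects() -> dict[str, float]:
    """Return a fresh settings dict with every effect at its default."""
    return {name: default for name, (_, _, default) in EFFECT_RANGES.items()}


def validate_effect(name: str, value: float) -> float:
    """Return the value as a float, or raise ValueError if it is not allowed."""
    if name not in EFFECT_RANGES:
        raise ValueError(f"Unknown effect: {name}")
    low, high, _ = EFFECT_RANGES[name]
    if not (low <= value <= high):
        raise ValueError(f"{name} must be between {low} and {high}")
    return float(value)
```

A table like this keeps the documented limits and the enforced limits in one place, so they are less likely to drift apart.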
#### Effect Commands:
- `/effects list` - Show all your current effect settings
- `/effects set <effect> <value>` - Change an effect value
- `/effects reset` - Reset all effects to defaults (with confirmation)

#### Effect Application Order:
Effects are applied in this sequence:
1. Pitch shift
2. Speed change
3. Echo/Reverb
4. Chorus
5. Tremolo
6. Robot voice

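A fixed application order like the one above can be expressed as an ordered list of stage names. The following is a sketch under stated assumptions, not the bot's pipeline: `apply_effects`, the `Stage` type, and the sample `tremolo` stage are hypothetical, audio is modelled as a plain list of floats, and the 24000 Hz sample rate is an assumed default.

```python
import math
from typing import Callable

Samples = list[float]
Stage = Callable[[Samples, dict[str, float]], Samples]

# Order taken from the list above.
EFFECT_ORDER = ["pitch", "speed", "echo", "chorus", "tremolo", "robot"]


def tremolo(samples: Samples, settings: dict[str, float],
            sample_rate: int = 24000) -> Samples:
    """Amplitude modulation: gain swings between 1.0 and (1.0 - depth)."""
    depth = settings.get("tremolo_depth", 0.0)
    rate = settings.get("tremolo_rate", 0.0)
    if depth <= 0.0 or rate <= 0.0:
        return samples  # tremolo needs both depth and rate to do anything
    step = 2.0 * math.pi * rate / sample_rate
    return [s * (1.0 - depth * 0.5 * (1.0 - math.cos(step * n)))
            for n, s in enumerate(samples)]


def apply_effects(samples: Samples, settings: dict[str, float],
                  stages: dict[str, Stage]) -> Samples:
    """Run every registered stage in the documented order."""
    for name in EFFECT_ORDER:
        stage = stages.get(name)
        if stage is not None:
            samples = stage(samples, settings)
    return samples
```

Because the order lives in one list, registering stages in any order (for example `{"tremolo": tremolo}`) still yields the documented sequence.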
#### Performance Notes:
- **No limit** on number of active effects
- ⚠️ A warning is shown when you have more than 2 active effects
- More effects = longer processing time
- Some effects (like pitch shift and chorus) are more CPU-intensive than others
- Processing time is logged to console for monitoring

### Preview with Effects

Test any combination of voice and effects before committing:

**Preview a voice:**
- `/voice preview <voice_name>` - Preview with your current effects

**Preview with specific effects:**
- `/voice preview <voice_name> pitch:5 speed:1.5` - Preview with pitch +5 and 1.5x speed
- All effect parameters are optional and default to your current settings

**Example combinations to try:**
- Robot voice: `/effects set robot 75`
- Deep scary voice: `/effects set pitch -8`
- Fast chipmunk: `/effects set pitch 8`, then `/effects set speed 1.5`
- Radio announcer: `/effects set echo 40`, then `/effects set tremolo_depth 0.3` and `/effects set tremolo_rate 4`

## How It Works

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │   Pocket TTS     │ --> │  Voice Channel  │
│  (configured)   │     │   (generate)     │     │  (user's VC)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 ▲
                                 │
                           ┌─────┴─────┐
                           │  voices/  │
                           │ per-user  │
                           └───────────┘
```

1. Bot monitors the configured text channel for new messages
2. When a message is received, it's added to the queue
3. The bot generates speech using Pocket TTS with the cloned voice
4. Audio is streamed to the voice channel where the message author is

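The queue-then-speak flow in the steps above maps well onto an `asyncio.Queue` with a single worker. The sketch below is an assumption about the general shape, not the bot's code: `speak` is a stand-in for speech generation and playback, and `worker` and `demo` are hypothetical names.

```python
import asyncio


async def speak(text: str, spoken: list[str]) -> None:
    """Stand-in for TTS generation and voice-channel playback."""
    await asyncio.sleep(0)  # pretend to do asynchronous work
    spoken.append(text)


async def worker(queue: asyncio.Queue, spoken: list[str]) -> None:
    """Take messages off the queue one at a time, in arrival order."""
    while True:
        text = await queue.get()
        try:
            await speak(text, spoken)
        finally:
            queue.task_done()


async def demo() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    spoken: list[str] = []
    task = asyncio.create_task(worker(queue, spoken))
    for message in ("hello", "how are you", "bye"):
        queue.put_nowait(message)
    await queue.join()  # wait until every queued message has been spoken
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A single worker guarantees that messages are spoken in order and never overlap, which matches the queueing behaviour described in the Usage section.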
## Troubleshooting

### Bot doesn't respond to messages
- Ensure Message Content Intent is enabled in Discord Developer Portal
- Check that the TEXT_CHANNEL_ID is correct
- Verify the bot has permissions to read the channel

### No audio in voice channel
- Ensure FFmpeg is installed and in PATH
- Check that the bot has Connect and Speak permissions
- Verify that the `.wav` file for your selected voice in `voices/` is valid

### Voice quality issues
- Use a higher quality reference WAV file
- Ensure the reference audio is clear with minimal background noise
- Try a longer reference clip (5-10 seconds)

## Linux Server Deployment

To run the bot as a service on a Linux server:

### Quick Setup (Recommended)

```bash
# Make the setup script executable
chmod +x setup_linux.sh

# Run the setup script
./setup_linux.sh
```

The script will:
- Check system dependencies (Python 3.10+, FFmpeg, pip)
- Create a virtual environment and install dependencies
- Create `.env` template if needed
- Optionally install and configure the systemd service

### Manual Setup

1. **Install system dependencies**:
   ```bash
   # Ubuntu/Debian
   sudo apt update
   sudo apt install python3 python3-pip python3-venv ffmpeg

   # Fedora
   sudo dnf install python3 python3-pip ffmpeg

   # Arch
   sudo pacman -S python python-pip ffmpeg
   ```

2. **Set up the project**:
   ```bash
   cd /path/to/PocketTTSBot
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

3. **Configure the service**:

   Edit `pockettts.service`:
   - Replace `YOUR_USERNAME` with your Linux username
   - Update the paths if your bot is not in `/home/YOUR_USERNAME/PocketTTSBot`

4. **Install the service**:
   ```bash
   sudo cp pockettts.service /etc/systemd/system/
   sudo systemctl daemon-reload
   sudo systemctl enable pockettts  # Start on boot
   sudo systemctl start pockettts   # Start now
   ```

### Service Management

```bash
# Check status
sudo systemctl status pockettts

# View logs (live)
journalctl -u pockettts -f

# View recent logs
journalctl -u pockettts --since "1 hour ago"

# Restart after changes
sudo systemctl restart pockettts

# Stop the bot
sudo systemctl stop pockettts

# Disable auto-start
sudo systemctl disable pockettts
```

### Updating the Bot

```bash
cd /path/to/PocketTTSBot
git pull  # If using git
source venv/bin/activate
pip install -r requirements.txt
sudo systemctl restart pockettts
```

## License

MIT License