- Added new audio_effects.py module with pitch shift and speed change
  - Pitch range: -12 to +12 semitones (higher = chipmunk, lower = deeper)
  - Speed range: 0.5 to 2.0x (higher = faster, lower = slower)
  - Maximum 2 active effects per user (performance optimization)
- Added /effects command group:
  - /effects list - Shows current effects with descriptions
  - /effects set pitch|speed <value> - Apply effects
  - /effects reset - Confirmation UI to clear all effects
- Effects persist across restarts in preferences.json
- Updated /voice preview to support optional pitch/speed parameters
- Effects applied in _generate_wav_bytes using librosa
- Added performance warnings when processing takes >1 second
- Updated README with effects documentation

# Pocket TTS Discord Bot

A Discord bot that reads messages aloud using [Pocket TTS](https://github.com/kyutai-labs/pocket-tts) with voice cloning from a reference WAV file.

## Features

- 🎤 **Voice Cloning**: Uses a reference WAV file to clone a voice
- 📝 **Auto-read Messages**: Automatically reads all messages from a configured text channel
- 🔊 **Voice Channel Streaming**: Streams generated audio to the voice channel where the message author is
- 📋 **Message Queue**: Messages are queued and spoken in order
- 🔄 **Per-User Voice Selection**: Each user can choose their own TTS voice via `/voice` commands
- 💾 **Voice Persistence**: User voice preferences are saved and restored on restart
- 🔄 **Hot-reload Voices**: Add new voices without restarting the bot using `/voice refresh`
- 🧪 **Test Mode**: Separate testing configuration for safe development
- 📦 **Auto-updates**: Automatically checks for and installs dependency updates on startup
- 👂 **Voice Preview**: Preview voices with `/voice preview` before committing to them
- 🎵 **Audio Effects**: Apply pitch shift and speed changes to your TTS voice

## Prerequisites

- Python 3.10+
- FFmpeg installed and available in PATH
- A Discord bot token
- A reference voice WAV file (3-10 seconds of clear speech recommended)

## Installation

1. **Clone the repository**:
   ```bash
   git clone <repository-url>
   cd PocketTTSBot
   ```

2. **Create a virtual environment**:
   ```bash
   python -m venv venv

   # Windows
   venv\Scripts\activate

   # Linux/macOS
   source venv/bin/activate
   ```

3. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```

4. **Install FFmpeg**:
   - **Windows**: Download from [ffmpeg.org](https://ffmpeg.org/download.html) and add to PATH
   - **Linux**: `sudo apt install ffmpeg`
   - **macOS**: `brew install ffmpeg`

## Configuration

1. **Create a Discord Bot**:
   - Go to [Discord Developer Portal](https://discord.com/developers/applications)
   - Create a new application
   - Go to the "Bot" section and create a bot
   - Copy the bot token
   - Enable these Privileged Gateway Intents:
     - Message Content Intent
     - Server Members Intent (optional)

2. **Invite the Bot to your server**:
   - Go to OAuth2 > URL Generator
   - Select scopes: `bot`
   - Select permissions: `Connect`, `Speak`, `Send Messages`, `Read Message History`
   - Use the generated URL to invite the bot

3. **Get Channel ID**:
   - Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
   - Right-click the text channel you want to monitor and click "Copy ID"

4. **Create `.env` file**:
   ```bash
   cp .env.example .env
   ```

   Edit `.env` with your values:
   ```env
   DISCORD_TOKEN=your_bot_token_here
   TEXT_CHANNEL_ID=123456789012345678
   VOICES_DIR=./voices
   DEFAULT_VOICE=estinien
   ```

5. **Add voice reference files**:
   - Create a `voices/` directory: `mkdir voices`
   - Place `.wav` files in the `voices/` directory
   - Each file should contain 3-10 seconds of clear speech
   - File names become voice names (e.g., `MasterChief.wav` → `/voice set masterchief`)
   - Higher quality audio = better voice cloning results

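The `.env` values from step 4 could be read and validated roughly as follows. This is a minimal sketch using only the standard library; `Settings`, `parse_env_text`, and `load_settings` are hypothetical names, the fallback for `VOICES_DIR` is an assumption, and the bot itself may load its configuration differently (for example with a dotenv library).

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the four keys shown in the .env example."""
    discord_token: str
    text_channel_id: int
    voices_dir: Path
    default_voice: str


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(values: dict[str, str]) -> Settings:
    """Check required keys and convert the raw strings to typed settings."""
    missing = [k for k in ("DISCORD_TOKEN", "TEXT_CHANNEL_ID") if not values.get(k)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    if not values["TEXT_CHANNEL_ID"].isdigit():
        raise ValueError("TEXT_CHANNEL_ID must be a numeric channel ID")
    return Settings(
        discord_token=values["DISCORD_TOKEN"],
        text_channel_id=int(values["TEXT_CHANNEL_ID"]),
        # Assumed fallback: the ./voices directory used in the example above.
        voices_dir=Path(values.get("VOICES_DIR", "./voices")),
        default_voice=values.get("DEFAULT_VOICE", ""),
    )
```

Failing early on a missing token or a non-numeric channel ID gives a clearer error than a failed Discord login or a channel lookup that silently returns nothing.
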
## Usage

1. **Start the bot**:
   ```bash
   python bot.py
   ```

2. **Using the bot**:
   - Join a voice channel in your Discord server
   - Type a message in the configured text channel
   - The bot will join your voice channel and read your message aloud
   - Messages are queued if the bot is already speaking

3. **Voice Commands** (Slash Commands):
   - `/voice list` - Shows all available voices
   - `/voice set <name>` - Change your personal TTS voice
   - `/voice current` - Shows your current voice
   - `/voice refresh` - Re-scan for new voice files (no restart needed)
   - `/voice preview <name>` - Preview a voice before selecting it

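The rule that file names become voice names (`MasterChief.wav` → `masterchief`), and the re-scan behind `/voice refresh`, can be sketched as a directory scan. `discover_voices` is a hypothetical helper written for illustration; the bot's actual discovery code may differ.

```python
from pathlib import Path


def discover_voices(voices_dir: Path) -> dict[str, Path]:
    """Map lowercase voice names to their .wav files (hypothetical helper)."""
    voices: dict[str, Path] = {}
    if not voices_dir.is_dir():
        return voices
    for path in sorted(voices_dir.iterdir()):
        # Only .wav files count as voices; the extension check ignores case.
        if path.is_file() and path.suffix.lower() == ".wav":
            voices[path.stem.lower()] = path
    return voices
```

Calling the function again after copying a new file into the directory picks the file up without a restart, which is the behavior `/voice refresh` describes.
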
### Test Mode

Run the bot in testing mode to use a separate configuration:

```bash
python bot.py testing
```

This loads `.env.testing` instead of `.env`, allowing you to:
- Use a different Discord bot token for testing
- Monitor a different text channel
- Test new features without affecting the production bot

Create `.env.testing` by copying `.env.example` and configuring it with your testing values.

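Choosing the configuration file from the command-line argument might look like the sketch below. `select_env_file` is a hypothetical name, and the real `bot.py` may parse its arguments differently.

```python
import sys


def select_env_file(argv: list[str]) -> str:
    """Return '.env.testing' when the first argument is 'testing', else '.env'."""
    if len(argv) > 1 and argv[1] == "testing":
        return ".env.testing"
    return ".env"


if __name__ == "__main__":
    print(f"Loading configuration from {select_env_file(sys.argv)}")
```
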
### Audio Effects

Apply pitch shift and speed changes to your TTS voice:

- `/effects list` - Show your current effect settings
- `/effects set pitch <semitones>` - Change pitch (-12 to +12)
  - Positive = higher/chipmunk voice
  - Negative = lower/deeper voice
  - 0 = normal pitch (default)
- `/effects set speed <multiplier>` - Change speed (0.5 to 2.0)
  - Higher = faster speech
  - Lower = slower speech
  - 1.0 = normal speed (default)
- `/effects reset` - Reset all effects to defaults

**Note**: You can use up to 2 effects simultaneously. More effects require more processing time.

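To make the ranges above concrete, here is a small sketch of range validation plus the arithmetic behind a semitone shift. All names (`EffectSettings`, `validate_effect`, `pitch_ratio`) are hypothetical. The sketch covers only the numbers and does not process any audio.

```python
from dataclasses import dataclass

PITCH_RANGE = (-12.0, 12.0)  # semitones, as documented for /effects set pitch
SPEED_RANGE = (0.5, 2.0)     # multiplier, as documented for /effects set speed


@dataclass(frozen=True)
class EffectSettings:
    """Hypothetical per-user settings; the defaults mean 'no effect'."""
    pitch: float = 0.0
    speed: float = 1.0

    def active_count(self) -> int:
        """Number of effects that differ from their defaults."""
        return int(self.pitch != 0.0) + int(self.speed != 1.0)


def validate_effect(name: str, value: float) -> float:
    """Return the value if it lies in the documented range, else raise."""
    ranges = {"pitch": PITCH_RANGE, "speed": SPEED_RANGE}
    if name not in ranges:
        raise ValueError(f"Unknown effect: {name}")
    low, high = ranges[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return value


def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a shift measured in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)
```

A shift of +12 semitones doubles every frequency (ratio 2.0) and -12 halves it (ratio 0.5), which is why the extremes of the range sound like a full octave up or down. At a speed of 2.0, a 4-second clip plays back in roughly 2 seconds.
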
### Preview with Effects

Test voice and effect combinations before committing:
- `/voice preview <name> [pitch] [speed]` - Preview a voice with optional effect overrides

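One way the optional overrides could be resolved is sketched below. It assumes that an omitted parameter falls back to the user's saved setting; that fallback and the function name `resolve_preview_effects` are illustrative assumptions, not confirmed behavior of the bot.

```python
from typing import Optional


def resolve_preview_effects(
    saved_pitch: float,
    saved_speed: float,
    pitch: Optional[float] = None,
    speed: Optional[float] = None,
) -> tuple[float, float]:
    """Use an override when one is given, otherwise keep the saved value."""
    return (
        saved_pitch if pitch is None else pitch,
        saved_speed if speed is None else speed,
    )
```

The explicit `is None` check matters because `0` is a valid pitch override; a truthiness test such as `pitch or saved_pitch` would discard it.
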
## How It Works

```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │   Pocket TTS     │ --> │  Voice Channel  │
│  (configured)   │     │   (generate)     │     │  (user's VC)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 ▲
                                 │
                           ┌─────┴─────┐
                           │  voices/  │
                           │ per-user  │
                           └───────────┘
```

1. Bot monitors the configured text channel for new messages
2. When a message is received, it's added to the queue
3. The bot generates speech using Pocket TTS with the cloned voice
4. Audio is streamed to the voice channel where the message author is

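The queue-then-speak flow in the steps above can be sketched with `asyncio.Queue`. This is an illustration under stated assumptions, not the bot's implementation: `fake_tts` stands in for speech generation and returns placeholder bytes, and `speaker` and `demo` are hypothetical names.

```python
import asyncio


async def fake_tts(text: str, voice: str) -> bytes:
    """Stand-in for speech generation; returns placeholder bytes, not audio."""
    await asyncio.sleep(0)  # yield to the event loop, as real work would
    return f"[{voice}] {text}".encode()


async def speaker(queue: asyncio.Queue, spoken: list[bytes]) -> None:
    """Consume queued messages one at a time, in arrival order."""
    while True:
        item = await queue.get()
        if item is None:  # sentinel value: stop the worker
            queue.task_done()
            break
        text, voice = item
        spoken.append(await fake_tts(text, voice))
        queue.task_done()


async def demo() -> list[bytes]:
    """Queue two messages and return what the worker 'spoke'."""
    queue: asyncio.Queue = asyncio.Queue()
    spoken: list[bytes] = []
    worker = asyncio.create_task(speaker(queue, spoken))
    for message in [("hello", "estinien"), ("second message", "masterchief")]:
        await queue.put(message)
    await queue.put(None)
    await worker
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A single consumer task is what keeps messages from overlapping: the next item is not taken from the queue until the current one has finished.
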
## Troubleshooting

### Bot doesn't respond to messages
- Ensure Message Content Intent is enabled in Discord Developer Portal
- Check that the TEXT_CHANNEL_ID is correct
- Verify the bot has permissions to read the channel

### No audio in voice channel
- Ensure FFmpeg is installed and in PATH
- Check that the bot has Connect and Speak permissions
- Verify that the reference `.wav` file for your selected voice in `voices/` is valid

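A quick way to confirm the first point from Python is `shutil.which`, which searches PATH the same way the shell does. The helper name `find_ffmpeg` is hypothetical.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path to the ffmpeg executable, or None if not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    print(f"FFmpeg found at {path}" if path else "FFmpeg not found on PATH")
```

Run it inside the same virtual environment and user account as the bot, since PATH can differ between an interactive shell and a service.
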
### Voice quality issues
- Use a higher quality reference WAV file
- Ensure the reference audio is clear with minimal background noise
- Try a longer reference clip (5-10 seconds)

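To check whether a reference clip is readable and within the recommended length, the standard-library `wave` module is enough for ordinary PCM files. The helper names below are hypothetical, and `wave` may reject compressed or unusual WAV encodings that other tools accept.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a PCM WAV file; raises wave.Error if unreadable."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(path: Path, low: float = 3.0, high: float = 10.0) -> bool:
    """True when the clip length falls in the 3-10 second range suggested earlier."""
    return low <= wav_duration_seconds(path) <= high
```

A `wave.Error` here suggests the file is not plain PCM WAV, which is worth ruling out before looking at the quality of the recording itself.
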
## Linux Server Deployment

To run the bot as a service on a Linux server:

### Quick Setup (Recommended)

```bash
# Make the setup script executable
chmod +x setup_linux.sh

# Run the setup script
./setup_linux.sh
```

The script will:
- Check system dependencies (Python 3.10+, FFmpeg, pip)
- Create a virtual environment and install dependencies
- Create `.env` template if needed
- Optionally install and configure the systemd service

### Manual Setup

1. **Install system dependencies**:
   ```bash
   # Ubuntu/Debian
   sudo apt update
   sudo apt install python3 python3-pip python3-venv ffmpeg

   # Fedora
   sudo dnf install python3 python3-pip ffmpeg

   # Arch
   sudo pacman -S python python-pip ffmpeg
   ```

2. **Set up the project**:
   ```bash
   cd /path/to/PocketTTSBot
   python3 -m venv venv
   source venv/bin/activate
   pip install -r requirements.txt
   ```

3. **Configure the service**:

   Edit `pockettts.service`:
   - Replace `YOUR_USERNAME` with your Linux username
   - Update the paths if your bot is not in `/home/YOUR_USERNAME/PocketTTSBot`

4. **Install the service**:
   ```bash
   sudo cp pockettts.service /etc/systemd/system/
   sudo systemctl daemon-reload
   sudo systemctl enable pockettts  # Start on boot
   sudo systemctl start pockettts   # Start now
   ```

### Service Management

```bash
# Check status
sudo systemctl status pockettts

# View logs (live)
journalctl -u pockettts -f

# View recent logs
journalctl -u pockettts --since "1 hour ago"

# Restart after changes
sudo systemctl restart pockettts

# Stop the bot
sudo systemctl stop pockettts

# Disable auto-start
sudo systemctl disable pockettts
```

### Updating the Bot

```bash
cd /path/to/PocketTTSBot
git pull  # If using git
source venv/bin/activate
pip install -r requirements.txt
sudo systemctl restart pockettts
```

## License

MIT License