Pocket TTS Discord Bot
A Discord bot that reads messages aloud using Pocket TTS with voice cloning from a reference WAV file.
Features
- 🎤 Voice Cloning: Uses a reference WAV file to clone a voice
- 📝 Auto-read Messages: Automatically reads all messages from a configured text channel
- 🔊 Voice Channel Streaming: Streams generated audio to the voice channel where the message author is
- 📋 Message Queue: Messages are queued and spoken in order
- 🔄 Per-User Voice Selection: Each user can choose their own TTS voice via /voice commands
- 💾 Voice Persistence: User voice preferences are saved and restored on restart
- 🔄 Hot-reload Voices: Add new voices without restarting the bot using /voice refresh
- 🧪 Test Mode: Separate testing configuration for safe development
- 📦 Auto-updates: Automatically checks for and installs dependency updates on startup
- 👂 Voice Preview: Preview voices with /voice preview before committing to them
- 🎵 Audio Effects: 7 effect settings to customize your voice (pitch, speed, echo, robot, chorus, tremolo depth, tremolo rate)
- ⚡ Unlimited Effects: Use as many effects as you want (a warning is shown when more than 2 are active)
- ⏱️ Processing Indicator: Shows when audio processing is taking longer than expected
Prerequisites
- Python 3.10+
- FFmpeg installed and available in PATH
- A Discord bot token
- A reference voice WAV file (3-10 seconds of clear speech recommended)
Installation
- Clone the repository:
    git clone <repository-url>
    cd PocketTTSBot
- Create a virtual environment:
    python -m venv venv
    # Windows
    venv\Scripts\activate
    # Linux/macOS
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Install FFmpeg:
  - Windows: Download from ffmpeg.org and add to PATH
  - Linux: sudo apt install ffmpeg
  - macOS: brew install ffmpeg
Configuration
- Create a Discord Bot:
  - Go to Discord Developer Portal
  - Create a new application
  - Go to the "Bot" section and create a bot
  - Copy the bot token
  - Enable these Privileged Gateway Intents:
    - Message Content Intent
    - Server Members Intent (optional)
- Invite the Bot to your server:
  - Go to OAuth2 > URL Generator
  - Select scopes: bot
  - Select permissions: Connect, Speak, Send Messages, Read Message History
  - Use the generated URL to invite the bot
- Get Channel ID:
  - Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
  - Right-click the text channel you want to monitor and click "Copy ID"
- Create .env file:
    cp .env.example .env
  Edit .env with your values:
    DISCORD_TOKEN=your_bot_token_here
    TEXT_CHANNEL_ID=123456789012345678
    VOICES_DIR=./voices
    DEFAULT_VOICE=estinien
- Add voice reference files:
  - Create a voices/ directory: mkdir voices
  - Place .wav files in the voices/ directory
  - Each file should contain 3-10 seconds of clear speech
  - File names become voice names (e.g., MasterChief.wav → /voice set masterchief)
  - Higher quality audio = better voice cloning results
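The file-name-to-voice-name rule can be sketched roughly as below. This is an illustration only, not the bot's actual code: `discover_voices` is a hypothetical helper, and lowercasing the file stem is inferred from the MasterChief.wav example rather than confirmed.

```python
from pathlib import Path


def discover_voices(voices_dir: str) -> dict[str, Path]:
    """Map lowercase voice names to the .wav files found in voices_dir.

    Hypothetical helper: the lowercase rule is inferred from the
    MasterChief.wav -> masterchief example, not taken from the bot's code.
    """
    voices = {}
    for path in sorted(Path(voices_dir).iterdir()):
        # Only .wav files count as voices; other files are ignored.
        if path.is_file() and path.suffix.lower() == ".wav":
            voices[path.stem.lower()] = path
    return voices
```

Calling a function like this again after new files are added would be one plausible way to re-scan voices without a restart.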
Usage
- Start the bot:
    python bot.py
- Using the bot:
  - Join a voice channel in your Discord server
  - Type a message in the configured text channel
  - The bot will join your voice channel and read your message aloud
  - Messages are queued if the bot is already speaking
- Voice Commands (Slash Commands):
  - /voice list - Shows all available voices
  - /voice set <name> - Change your personal TTS voice
  - /voice current - Shows your current voice
  - /voice refresh - Re-scan for new voice files (no restart needed)
  - /voice preview <name> - Preview a voice before selecting it
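To make the per-user voice selection and its persistence concrete, here is a minimal sketch of how such preferences might be stored. The class name `VoicePreferences`, the JSON file format, and the fallback to a default voice are all assumptions for illustration; the bot's real storage may differ.

```python
import json
from pathlib import Path


class VoicePreferences:
    """Per-user voice choices saved to a JSON file (hypothetical sketch)."""

    def __init__(self, path: str, default_voice: str):
        self.path = Path(path)
        self.default_voice = default_voice
        # Restore saved choices if the file exists, otherwise start empty.
        self.choices = json.loads(self.path.read_text()) if self.path.exists() else {}

    def set_voice(self, user_id: int, voice: str) -> None:
        # JSON object keys must be strings, so store the user ID as text.
        self.choices[str(user_id)] = voice
        self.path.write_text(json.dumps(self.choices, indent=2))

    def current_voice(self, user_id: int) -> str:
        # Users who never picked a voice fall back to the default.
        return self.choices.get(str(user_id), self.default_voice)
```

Writing the file on every change keeps the sketch simple; a busier bot might batch writes instead.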
Test Mode
Run the bot in testing mode to use a separate configuration:
python bot.py testing
This loads .env.testing instead of .env, allowing you to:
- Use a different Discord bot token for testing
- Monitor a different text channel
- Test new features without affecting the production bot
Create .env.testing by copying .env.example and configuring it with your testing values.
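The switch between .env and .env.testing could look something like the sketch below. `select_env_file` is a hypothetical name, and treating the mode as the first command-line argument is an assumption based on the python bot.py testing invocation.

```python
import sys


def select_env_file(argv: list[str]) -> str:
    """Return the env file to load for the given command-line arguments."""
    # Assumption: the mode is the first argument after the script name.
    if len(argv) > 1 and argv[1] == "testing":
        return ".env.testing"
    return ".env"


if __name__ == "__main__":
    print(f"Loading configuration from {select_env_file(sys.argv)}")
```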
Audio Effects
Transform your TTS voice with 7 different audio effects:
Available Effects:
🎵 Pitch (/effects set pitch <semitones>)
- Range: -12 to +12 semitones
- Default: 0 (no change)
- Positive = higher/chipmunk voice
- Negative = lower/deeper voice
⚡ Speed (/effects set speed <multiplier>)
- Range: 0.5 to 2.0
- Default: 1.0x (normal speed)
- Higher = faster speech
- Lower = slower speech
🔊 Echo (/effects set echo <percentage>)
- Range: 0-100%
- Default: 0% (off)
- Adds spatial delay and reverb effect
- Higher values = more pronounced echo
🤖 Robot (/effects set robot <percentage>)
- Range: 0-100%
- Default: 0% (off)
- Applies ring modulation for sci-fi robotic voice
- Higher values = more robotic distortion
🎶 Chorus (/effects set chorus <percentage>)
- Range: 0-100%
- Default: 0% (off)
- Creates "multiple voices" effect with slight pitch variations
- Higher values = more voices and depth
〰️ Tremolo Depth (/effects set tremolo_depth <value>)
- Range: 0.0 to 1.0
- Default: 0.0 (off)
- Controls amplitude modulation amount
- Higher = more warble/vintage radio effect
📳 Tremolo Rate (/effects set tremolo_rate <hertz>)
- Range: 0.0 to 10.0 Hz
- Default: 0.0 Hz (off)
- Controls how fast the tremolo warbles
- Requires tremolo_depth > 0 to have effect
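As a rough illustration of how the two tremolo settings interact, the sketch below applies amplitude modulation with a sine-wave oscillator. The exact formula, the function name `apply_tremolo`, and the 24000 Hz sample rate are assumptions for illustration, not the bot's implementation.

```python
import math


def apply_tremolo(samples: list[float], depth: float, rate_hz: float,
                  sample_rate: int = 24000) -> list[float]:
    """Amplitude-modulate samples with a sine LFO (illustrative formula)."""
    out = []
    for n, x in enumerate(samples):
        lfo = math.sin(2 * math.pi * rate_hz * n / sample_rate)
        # Gain swings between 1.0 and 1.0 - depth as the LFO moves.
        gain = 1.0 - depth * 0.5 * (1.0 + lfo)
        out.append(x * gain)
    return out
```

With depth set to 0 the gain is always 1.0, so the rate has nothing to modulate, which is consistent with the note that tremolo_rate requires tremolo_depth > 0.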
Effect Commands:
- /effects list - Show all your current effect settings
- /effects set <effect> <value> - Change an effect value
- /effects reset - Reset all effects to defaults (with confirmation)
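A value passed to /effects set has to fall inside the documented range for that effect. The sketch below shows one way to check this; the table is copied from the ranges listed above, while `validate_effect` and the choice to reject out-of-range values (rather than clamp them) are assumptions.

```python
# (minimum, maximum, default) per effect, copied from the ranges listed above.
EFFECT_RANGES = {
    "pitch": (-12.0, 12.0, 0.0),
    "speed": (0.5, 2.0, 1.0),
    "echo": (0.0, 100.0, 0.0),
    "robot": (0.0, 100.0, 0.0),
    "chorus": (0.0, 100.0, 0.0),
    "tremolo_depth": (0.0, 1.0, 0.0),
    "tremolo_rate": (0.0, 10.0, 0.0),
}


def validate_effect(name: str, value: float) -> float:
    """Return value as a float, or raise ValueError if it is not allowed."""
    if name not in EFFECT_RANGES:
        raise ValueError(f"Unknown effect: {name}")
    low, high, _default = EFFECT_RANGES[name]
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}")
    return float(value)
```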
Effect Application Order:
Effects are applied in this sequence:
- Pitch shift
- Speed change
- Echo/Reverb
- Chorus
- Tremolo
- Robot voice
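The ordering can be expressed as a fixed list of stages, with inactive stages skipped. The sketch below only decides which stages would run and in what order; it does no audio processing. The names `EFFECT_ORDER` and `stages_to_run`, and the rule that a setting at its default counts as inactive, are assumptions.

```python
# Stage order as documented above; "tremolo" covers both tremolo settings.
EFFECT_ORDER = ["pitch", "speed", "echo", "chorus", "tremolo", "robot"]

# Values at which each setting leaves the audio unchanged.
DEFAULTS = {"pitch": 0.0, "speed": 1.0, "echo": 0.0, "chorus": 0.0,
            "tremolo_depth": 0.0, "tremolo_rate": 0.0, "robot": 0.0}


def stages_to_run(settings: dict[str, float]) -> list[str]:
    """Return the effect stages that are active, in application order."""
    merged = {**DEFAULTS, **settings}
    active = []
    for stage in EFFECT_ORDER:
        if stage == "tremolo":
            # Tremolo only does something when depth is above zero.
            is_active = merged["tremolo_depth"] > 0
        else:
            is_active = merged[stage] != DEFAULTS[stage]
        if is_active:
            active.append(stage)
    return active
```

For example, a user with robot, pitch, and tremolo configured would get pitch first and robot last, regardless of the order in which the settings were made.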
Performance Notes:
- No limit on number of active effects
- ⚠️ Warning shown when you have more than 2 active effects
- More effects = longer processing time
- Some effects (like pitch shift and chorus) are more CPU-intensive
- Processing time is logged to console for monitoring
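The warning threshold and the timing log might be combined along the lines of the sketch below. Everything here is hypothetical: the function names, the warning text, the log format, and counting each non-default setting as one active effect are assumptions made for illustration.

```python
import time

WARN_THRESHOLD = 2  # the warning appears with more than 2 active effects


def count_active(settings: dict[str, float], defaults: dict[str, float]) -> int:
    """Count settings that differ from their default value."""
    return sum(1 for name, value in settings.items() if value != defaults.get(name))


def process_with_timing(process, audio, settings, defaults):
    """Run process(audio, settings); return (result, warning_or_None, seconds)."""
    warning = None
    if count_active(settings, defaults) > WARN_THRESHOLD:
        warning = "More than 2 effects are active; processing may take longer."
    start = time.perf_counter()
    result = process(audio, settings)
    elapsed = time.perf_counter() - start
    # Log the processing time to the console for monitoring.
    print(f"Effects processing took {elapsed:.3f}s")
    return result, warning, elapsed
```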
Preview with Effects
Test any combination of voice and effects before committing:
Preview a voice:
- /voice preview <voice_name> - Preview with your current effects
Preview with specific effects:
- /voice preview <voice_name> pitch:5 speed:1.5 - Preview with pitch +5 and 1.5x speed
- All effect parameters are optional and default to your current settings
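The defaulting behaviour for preview parameters amounts to overlaying the given values on a copy of the user's saved settings. The sketch below illustrates that idea with plain "name:value" strings; in practice Discord delivers slash-command options already parsed, so `parse_overrides` and `preview_settings` are hypothetical helpers, not part of the bot.

```python
def parse_overrides(tokens: list[str]) -> dict[str, float]:
    """Turn tokens such as "pitch:5" into {"pitch": 5.0}."""
    overrides = {}
    for token in tokens:
        name, _, raw = token.partition(":")
        overrides[name] = float(raw)
    return overrides


def preview_settings(current: dict[str, float], tokens: list[str]) -> dict[str, float]:
    """Overlay preview-only overrides on a copy of the user's saved settings."""
    return {**current, **parse_overrides(tokens)}
```

Because a new dictionary is built, the saved settings stay untouched, which matches the idea of previewing before committing.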
Example combinations to try:
- Robot voice: /effects set robot 75
- Deep scary voice: /effects set pitch -8
- Fast chipmunk: /effects set pitch 8 speed:1.5
- Radio announcer: /effects set echo 40 tremolo_depth:0.3 tremolo_rate:4
How It Works
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │   Pocket TTS     │ --> │  Voice Channel  │
│  (configured)   │     │   (generate)     │     │  (user's VC)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                                 ▲
                                 │
                           ┌─────┴─────┐
                           │  voices/  │
                           │ per-user  │
                           └───────────┘
- Bot monitors the configured text channel for new messages
- When a message is received, it's added to the queue
- The bot generates speech using Pocket TTS with the cloned voice
- Audio is streamed to the voice channel where the message author is
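The queue-then-speak flow above can be sketched with asyncio. This is an assumption-laden illustration, not the bot's code: `speech_worker`, `synthesize`, and `play` are hypothetical names, and the two callbacks are stand-ins for Pocket TTS generation and Discord voice playback.

```python
import asyncio


async def speech_worker(queue: asyncio.Queue, synthesize, play) -> None:
    """Take messages off the queue one at a time, in arrival order."""
    while True:
        author, text = await queue.get()
        try:
            audio = await synthesize(author, text)  # placeholder for Pocket TTS
            await play(author, audio)               # placeholder for voice output
        finally:
            queue.task_done()


async def demo() -> list[str]:
    spoken = []

    async def synthesize(author, text):
        return f"{author}: {text}"

    async def play(author, audio):
        spoken.append(audio)

    queue = asyncio.Queue()
    worker = asyncio.create_task(speech_worker(queue, synthesize, play))
    for item in [("alice", "hello"), ("bob", "hi there")]:
        queue.put_nowait(item)
    await queue.join()   # wait until both messages have been played
    worker.cancel()
    return spoken
```

A single worker is what keeps messages in order: the second message is not synthesized until the first has finished playing.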
Troubleshooting
Bot doesn't respond to messages
- Ensure Message Content Intent is enabled in Discord Developer Portal
- Check that the TEXT_CHANNEL_ID is correct
- Verify the bot has permissions to read the channel
No audio in voice channel
- Ensure FFmpeg is installed and in PATH
- Check that the bot has Connect and Speak permissions
- Verify that the .wav file for your selected voice in the voices/ directory is valid
Voice quality issues
- Use a higher quality reference WAV file
- Ensure the reference audio is clear with minimal background noise
- Try a longer reference clip (5-10 seconds)
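Some of the checks above can be run from a short script. The sketch below uses only the Python standard library; the function names are hypothetical. Note that the wave module reads uncompressed PCM WAV files only, so a file it rejects may still be a WAV in another encoding.

```python
import shutil
import wave


def ffmpeg_available() -> bool:
    """Return True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def wav_duration_seconds(path: str) -> float:
    """Return the clip length; wave.Error is raised for invalid WAV files."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path: str) -> str:
    """Compare the clip length with the recommended 3-10 second window."""
    duration = wav_duration_seconds(path)
    if 3.0 <= duration <= 10.0:
        return "ok"
    return f"clip is {duration:.1f}s; 3-10 seconds is recommended"
```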
Linux Server Deployment
To run the bot as a service on a Linux server:
Quick Setup (Recommended)
# Make the setup script executable
chmod +x setup_linux.sh
# Run the setup script
./setup_linux.sh
The script will:
- Check system dependencies (Python 3.10+, FFmpeg, pip)
- Create a virtual environment and install dependencies
- Create .env template if needed
- Optionally install and configure the systemd service
Manual Setup
- Install system dependencies:
    # Ubuntu/Debian
    sudo apt update
    sudo apt install python3 python3-pip python3-venv ffmpeg
    # Fedora
    sudo dnf install python3 python3-pip ffmpeg
    # Arch
    sudo pacman -S python python-pip ffmpeg
- Set up the project:
    cd /path/to/PocketTTSBot
    python3 -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt
- Configure the service:
  Edit pockettts.service:
  - Replace YOUR_USERNAME with your Linux username
  - Update the paths if your bot is not in /home/YOUR_USERNAME/PocketTTSBot
- Install the service:
    sudo cp pockettts.service /etc/systemd/system/
    sudo systemctl daemon-reload
    sudo systemctl enable pockettts  # Start on boot
    sudo systemctl start pockettts   # Start now
Service Management
# Check status
sudo systemctl status pockettts
# View logs (live)
journalctl -u pockettts -f
# View recent logs
journalctl -u pockettts --since "1 hour ago"
# Restart after changes
sudo systemctl restart pockettts
# Stop the bot
sudo systemctl stop pockettts
# Disable auto-start
sudo systemctl disable pockettts
Updating the Bot
cd /path/to/PocketTTSBot
git pull # If using git
source venv/bin/activate
pip install -r requirements.txt
sudo systemctl restart pockettts
License
MIT License