Spencer Grimes 85a334a57b docs: update README with comprehensive effects documentation and bump version to 1.2.0

README Updates:
- Updated features list with all new capabilities
- Comprehensive Audio Effects section covering all 7 effects:
  - Pitch, Speed, Echo, Robot, Chorus, Tremolo Depth, Tremolo Rate
- Detailed effect ranges, defaults, and descriptions
- Effect application order documentation
- Performance notes and warnings
- Enhanced Preview with Effects section with examples
- Example effect combinations for users to try

Version Bump:
- Bumped __version__ from 1.1.0 to 1.2.0

Major features in 1.2.0:
- 4 new voice effects (echo, robot, chorus, tremolo)
- Unlimited effects with performance warnings
- Complete effects pipeline implementation
- Enhanced preview system

2026-01-31 17:33:28 -06:00

10 KiB


Pocket TTS Discord Bot

A Discord bot that reads messages aloud using Pocket TTS with voice cloning from a reference WAV file.

Features

🎤 Voice Cloning: Uses a reference WAV file to clone a voice
📝 Auto-read Messages: Automatically reads all messages from a configured text channel
🔊 Voice Channel Streaming: Streams generated audio to the voice channel where the message author is
📋 Message Queue: Messages are queued and spoken in order
🔄 Per-User Voice Selection: Each user can choose their own TTS voice via /voice commands
💾 Voice Persistence: User voice preferences are saved and restored on restart
🔄 Hot-reload Voices: Add new voices without restarting the bot using /voice refresh
🧪 Test Mode: Separate testing configuration for safe development
📦 Auto-updates: Automatically checks for and installs dependency updates on startup
👂 Voice Preview: Preview voices with /voice preview before committing to them
🎵 Audio Effects: 7 different effects to customize your voice (pitch, speed, echo, robot, chorus, tremolo depth, tremolo rate)
⚡ Unlimited Effects: Use as many effects as you want (warning shown when >2 active)
⏱️ Processing Indicator: Shows when audio processing is taking longer than expected

Prerequisites

Python 3.10+
FFmpeg installed and available in PATH
A Discord bot token
A reference voice WAV file (3-10 seconds of clear speech recommended)
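
The first two prerequisites can be checked from Python before starting the bot. This is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper, not part of the bot:

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # from the Prerequisites list above


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # shutil.which returns None when the executable is not on PATH
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found in PATH")
    return problems


for problem in check_prerequisites():
    print(f"WARNING: {problem}")
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.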

Installation

Clone the repository:

git clone <repository-url>
cd PocketTTSBot

Create a virtual environment:

python -m venv venv

# Windows
venv\Scripts\activate

# Linux/macOS
source venv/bin/activate

Install dependencies:
```
pip install -r requirements.txt
```
Install FFmpeg:
- Windows: Download from ffmpeg.org and add to PATH
- Linux: sudo apt install ffmpeg
- macOS: brew install ffmpeg

Configuration

Create a Discord Bot:
- Go to Discord Developer Portal
- Create a new application
- Go to the "Bot" section and create a bot
- Copy the bot token
- Enable these Privileged Gateway Intents:
  - Message Content Intent
  - Server Members Intent (optional)
Invite the Bot to your server:
- Go to OAuth2 > URL Generator
- Select scopes: bot
- Select permissions: Connect, Speak, Send Messages, Read Message History
- Use the generated URL to invite the bot
Get Channel ID:
- Enable Developer Mode in Discord (Settings > Advanced > Developer Mode)
- Right-click the text channel you want to monitor and click "Copy Channel ID" (shown as "Copy ID" in older Discord clients)

Create .env file:

cp .env.example .env

Edit .env with your values:

DISCORD_TOKEN=your_bot_token_here
TEXT_CHANNEL_ID=123456789012345678
VOICES_DIR=./voices
DEFAULT_VOICE=estinien
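
For illustration, here is one way the values above could be read into typed settings. This is a sketch with a small hand-written parser, not the bot's actual loader (which may use a library instead). The key names come from the example above; `parse_env`, `load_settings`, and the fallback values are assumptions:

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text):
    """Turn parsed values into typed settings (fallbacks here are assumptions)."""
    env = parse_env(text)
    return {
        "token": env["DISCORD_TOKEN"],
        # Discord IDs are large integers, so convert once here
        "text_channel_id": int(env["TEXT_CHANNEL_ID"]),
        "voices_dir": Path(env.get("VOICES_DIR", "./voices")),
        "default_voice": env.get("DEFAULT_VOICE", ""),
    }
```

A typical call would be `load_settings(Path(".env").read_text())`. A missing `DISCORD_TOKEN` or a non-numeric `TEXT_CHANNEL_ID` fails immediately, which is usually easier to debug than a failure at connect time.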

Add voice reference files:
- Create a voices/ directory: mkdir voices
- Place .wav files in the voices/ directory
- Each file should contain 3-10 seconds of clear speech
- File names become voice names (e.g., MasterChief.wav → /voice set masterchief)
- Higher quality audio = better voice cloning results
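
The file-name-to-voice-name mapping described above can be sketched as follows. `discover_voices` is a hypothetical helper; the lowercase rule is inferred from the `MasterChief.wav` → `masterchief` example, and the bot's real scanning code may differ:

```python
from pathlib import Path


def discover_voices(voices_dir):
    """Map lowercase voice names to the .wav files found in voices_dir."""
    voices = {}
    for path in sorted(Path(voices_dir).iterdir()):
        # Only regular files with a .wav extension count as voices
        if path.is_file() and path.suffix.lower() == ".wav":
            voices[path.stem.lower()] = path
    return voices
```

Calling this again after adding files is roughly what a refresh amounts to: the directory is re-read and the mapping rebuilt.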

Usage

Start the bot:
```
python bot.py
```
Using the bot:
- Join a voice channel in your Discord server
- Type a message in the configured text channel
- The bot will join your voice channel and read your message aloud
- Messages are queued if the bot is already speaking
Voice Commands (Slash Commands):
- /voice list - Shows all available voices
- /voice set <name> - Change your personal TTS voice
- /voice current - Shows your current voice
- /voice refresh - Re-scan for new voice files (no restart needed)
- /voice preview <name> - Preview a voice before selecting it

Test Mode

Run the bot in testing mode to use a separate configuration:

python bot.py testing

This loads .env.testing instead of .env, allowing you to:

Use a different Discord bot token for testing
Monitor a different text channel
Test new features without affecting the production bot

Create .env.testing by copying .env.example and configuring it with your testing values.
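
A sketch of how the mode switch might be implemented, assuming the mode is read from the first command-line argument (`env_file_for` is a hypothetical name, not necessarily what bot.py uses):

```python
import sys


def env_file_for(argv):
    """Return the env file name for the given command-line arguments."""
    # `python bot.py testing` gives argv == ["bot.py", "testing"]
    if len(argv) > 1 and argv[1] == "testing":
        return ".env.testing"
    return ".env"


print(f"Loading configuration from {env_file_for(sys.argv)}")
```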

Audio Effects

Transform your TTS voice with 7 different audio effects:

Available Effects:

🎵 Pitch (/effects set pitch <semitones>)

Range: -12 to +12 semitones
Default: 0 (no change)
Positive = higher/chipmunk voice
Negative = lower/deeper voice

⚡ Speed (/effects set speed <multiplier>)

Range: 0.5x to 2.0x
Default: 1.0x (normal speed)
Higher = faster speech
Lower = slower speech

🔊 Echo (/effects set echo <percentage>)

Range: 0-100%
Default: 0% (off)
Adds a spatial delay and reverb effect
Higher values = more pronounced echo

🤖 Robot (/effects set robot <percentage>)

Range: 0-100%
Default: 0% (off)
Applies ring modulation for a sci-fi robotic voice
Higher values = more robotic distortion

🎶 Chorus (/effects set chorus <percentage>)

Range: 0-100%
Default: 0% (off)
Creates a "multiple voices" effect with slight pitch variations
Higher values = more voices and depth

〰️ Tremolo Depth (/effects set tremolo_depth <value>)

Range: 0.0 to 1.0
Default: 0.0 (off)
Controls amplitude modulation amount
Higher = more warble/vintage radio effect

📳 Tremolo Rate (/effects set tremolo_rate <hertz>)

Range: 0.0 to 10.0 Hz
Default: 0.0 Hz (off)
Controls how fast the tremolo warbles
Requires tremolo_depth > 0 to have any effect
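
To see why the rate needs a non-zero depth, here is one common textbook formulation of tremolo: the signal is multiplied by a gain that dips periodically. This is an illustrative sketch in plain Python, not the bot's actual processing code:

```python
import math


def tremolo(samples, sample_rate, depth, rate_hz):
    """Apply a sine-wave gain envelope to a list of float samples."""
    out = []
    for n, x in enumerate(samples):
        t = n / sample_rate
        # The LFO swings between 0 and 1; depth scales how far the gain dips
        lfo = (1.0 + math.sin(2.0 * math.pi * rate_hz * t)) / 2.0
        out.append(x * (1.0 - depth * lfo))
    return out
```

With `depth` at 0 the gain is always 1, so the output equals the input no matter what `rate_hz` is. With `depth` at 1 the gain sweeps all the way down to silence once per cycle.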

Effect Commands:

/effects list - Show all your current effect settings
/effects set <effect> <value> - Change an effect value
/effects reset - Reset all effects to defaults (with confirmation)
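
The ranges and defaults listed above lend themselves to a small lookup table. The sketch below validates a value before it is stored; whether the bot rejects out-of-range values or clamps them is not stated here, so rejecting is an assumption, as are the names `EFFECT_RANGES`, `validate_effect`, and `default_effects`:

```python
# (minimum, maximum, default) per effect, copied from the ranges listed above
EFFECT_RANGES = {
    "pitch": (-12.0, 12.0, 0.0),
    "speed": (0.5, 2.0, 1.0),
    "echo": (0.0, 100.0, 0.0),
    "robot": (0.0, 100.0, 0.0),
    "chorus": (0.0, 100.0, 0.0),
    "tremolo_depth": (0.0, 1.0, 0.0),
    "tremolo_rate": (0.0, 10.0, 0.0),
}


def validate_effect(name, value):
    """Return value as a float, or raise ValueError for a bad name or range."""
    if name not in EFFECT_RANGES:
        raise ValueError(f"unknown effect: {name}")
    low, high, _default = EFFECT_RANGES[name]
    value = float(value)
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    return value


def default_effects():
    """Settings as they would look after a reset."""
    return {name: default for name, (_low, _high, default) in EFFECT_RANGES.items()}
```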

Effect Application Order:

Effects are applied in this sequence:

1. Pitch shift
2. Speed change
3. Echo/Reverb
4. Chorus
5. Tremolo
6. Robot voice
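
The order matters because each stage receives the output of the previous one. A sketch of how the active stages could be selected in that order, assuming a setting counts as inactive while it sits at its default (`PIPELINE_ORDER`, `NEUTRAL`, and `active_stages` are hypothetical names):

```python
# Stage order from the list above; tremolo covers both depth and rate
PIPELINE_ORDER = ["pitch", "speed", "echo", "chorus", "tremolo", "robot"]

# Value at which each setting leaves the audio unchanged
NEUTRAL = {"pitch": 0.0, "speed": 1.0, "echo": 0.0, "chorus": 0.0,
           "tremolo_depth": 0.0, "tremolo_rate": 0.0, "robot": 0.0}


def active_stages(settings):
    """Return the stages that would run, in application order."""
    merged = {**NEUTRAL, **settings}
    stages = []
    for stage in PIPELINE_ORDER:
        if stage == "tremolo":
            # Rate alone does nothing; depth must be above zero
            is_active = merged["tremolo_depth"] > 0
        else:
            is_active = merged[stage] != NEUTRAL[stage]
        if is_active:
            stages.append(stage)
    return stages
```

For example, `active_stages({"robot": 75, "pitch": -8})` yields pitch first and robot last, regardless of the order in which the user set them.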

Performance Notes:

No limit on number of active effects
⚠️ Warning shown when you have more than 2 active effects
More effects = longer processing time
Some effects (like pitch shift and chorus) are more CPU-intensive
Processing time is logged to console for monitoring
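
A sketch of the warning threshold and timing described above. The threshold of 2 comes from this README; the message text, the helper names, and how the bot counts "active" effects are assumptions:

```python
import time

WARN_THRESHOLD = 2  # a warning is shown when more than 2 effects are active


def effects_warning(active_count):
    """Return a warning string, or None when at or under the threshold."""
    if active_count > WARN_THRESHOLD:
        return f"⚠️ {active_count} effects active; processing may take longer."
    return None


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Stand-in workload; real code would time the effects pipeline instead
result, elapsed = timed(sum, range(1000))
print(f"Processed in {elapsed * 1000:.2f} ms")
```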

Preview with Effects

Test any combination of voice and effects before committing:

Preview a voice:

/voice preview <voice_name> - Preview with your current effects

Preview with specific effects:

/voice preview <voice_name> pitch:5 speed:1.5 - Preview with pitch +5 and 1.5x speed
All effect parameters are optional and default to your current settings
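
The "optional parameters default to your current settings" behaviour can be modelled as a merge in which omitted options arrive as `None`. This is a sketch only; `preview_settings` is a hypothetical helper:

```python
def preview_settings(current, **overrides):
    """Return a copy of current with every non-None override applied."""
    merged = dict(current)  # copy so the saved settings are not modified
    for name, value in overrides.items():
        if value is not None:
            merged[name] = value
    return merged


saved = {"pitch": 0.0, "speed": 1.0, "echo": 40.0}
# pitch and speed are overridden for the preview; echo keeps its saved value
print(preview_settings(saved, pitch=5, speed=1.5, echo=None))
```

Because the merge works on a copy, previewing never changes what is stored for the user.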

Example combinations to try:

Robot voice: /effects set robot 75
Deep scary voice: /effects set pitch -8
Fast chipmunk: /effects set pitch 8, then /effects set speed 1.5
Radio announcer: /effects set echo 40, then /effects set tremolo_depth 0.3 and /effects set tremolo_rate 4

How It Works

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Text Channel   │ --> │   Pocket TTS     │ --> │  Voice Channel  │
│  (configured)   │     │   (generate)     │     │  (user's VC)    │
└─────────────────┘     └──────────────────┘     └─────────────────┘
                              ▲
                              │
                        ┌─────┴─────┐
                        │  voices/  │
                        │ per-user  │
                        └───────────┘

1. Bot monitors the configured text channel for new messages
2. When a message is received, it's added to the queue
3. The bot generates speech using Pocket TTS with the cloned voice
4. Audio is streamed to the voice channel where the message author is
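
The queueing step can be sketched with `asyncio.Queue` and a single worker, which guarantees that messages are spoken in arrival order. Here `speak` is a hypothetical stand-in for speech generation and playback, and the bot's real implementation may be structured differently:

```python
import asyncio


async def speak(text, spoken):
    """Hypothetical stand-in for TTS generation plus playback."""
    await asyncio.sleep(0)  # real code would wait for the audio to finish
    spoken.append(text)


async def worker(queue, spoken):
    """Consume messages one at a time so they are spoken in arrival order."""
    while True:
        text = await queue.get()
        if text is None:  # sentinel: stop the worker
            queue.task_done()
            break
        await speak(text, spoken)
        queue.task_done()


async def demo(messages):
    queue = asyncio.Queue()
    spoken = []
    task = asyncio.create_task(worker(queue, spoken))
    for message in messages:
        queue.put_nowait(message)
    queue.put_nowait(None)
    await queue.join()  # wait until every queued item has been handled
    await task
    return spoken


print(asyncio.run(demo(["hello", "world"])))
```

Using one worker instead of spawning a task per message is what keeps a second message from talking over the first.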

Troubleshooting

Bot doesn't respond to messages

Ensure Message Content Intent is enabled in Discord Developer Portal
Check that the TEXT_CHANNEL_ID is correct
Verify the bot has permissions to read the channel

No audio in voice channel

Ensure FFmpeg is installed and in PATH
Check that the bot has Connect and Speak permissions
Verify the .wav files in your voices/ directory are valid

Voice quality issues

Use a higher quality reference WAV file
Ensure the reference audio is clear with minimal background noise
Try a longer reference clip (5-10 seconds)
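
A quick way to inspect a reference clip is the standard library's `wave` module. The sketch below reports the duration and flags unreadable files; the helper names are hypothetical, and `wave` only reads uncompressed PCM WAV, so a file it rejects may still play fine in other tools:

```python
import wave


def wav_duration_seconds(path):
    """Return the clip length in seconds; raises wave.Error for invalid files."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_reference(path, min_seconds=3.0, max_seconds=10.0):
    """Return a short verdict for a reference clip (3-10 s is recommended above)."""
    try:
        duration = wav_duration_seconds(path)
    except (wave.Error, EOFError, OSError) as exc:
        return f"invalid WAV: {exc}"
    if duration < min_seconds:
        return f"too short ({duration:.1f}s)"
    if duration > max_seconds:
        return f"too long ({duration:.1f}s)"
    return f"ok ({duration:.1f}s)"
```

For example, `check_reference("voices/estinien.wav")` returns a one-line verdict that can be printed for each file in the voices/ directory.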

Linux Server Deployment

To run the bot as a service on a Linux server:

Quick Setup (Recommended)

# Make the setup script executable
chmod +x setup_linux.sh

# Run the setup script
./setup_linux.sh

The script will:

Check system dependencies (Python 3.10+, FFmpeg, pip)
Create a virtual environment and install dependencies
Create .env template if needed
Optionally install and configure the systemd service

Manual Setup

Install system dependencies:

# Ubuntu/Debian
sudo apt update
sudo apt install python3 python3-pip python3-venv ffmpeg

# Fedora
sudo dnf install python3 python3-pip ffmpeg

# Arch
sudo pacman -S python python-pip ffmpeg

Set up the project:

cd /path/to/PocketTTSBot
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Configure the service:

Edit pockettts.service:
- Replace YOUR_USERNAME with your Linux username
- Update the paths if your bot is not in /home/YOUR_USERNAME/PocketTTSBot

Install the service:

sudo cp pockettts.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable pockettts  # Start on boot
sudo systemctl start pockettts   # Start now

Service Management

# Check status
sudo systemctl status pockettts

# View logs (live)
journalctl -u pockettts -f

# View recent logs
journalctl -u pockettts --since "1 hour ago"

# Restart after changes
sudo systemctl restart pockettts

# Stop the bot
sudo systemctl stop pockettts

# Disable auto-start
sudo systemctl disable pockettts

Updating the Bot

cd /path/to/PocketTTSBot
git pull  # If using git
source venv/bin/activate
pip install -r requirements.txt
sudo systemctl restart pockettts

License

MIT License
