docs: add HuggingFace cache troubleshooting to README

- Document HF_HOME environment variable for writable cache
- Add systemd service permission guidance for /tmp paths
- Troubleshooting steps for read-only file system errors
0
.env.example
Normal file → Executable file
3
.env.testing
Normal file → Executable file
@@ -16,3 +16,6 @@ VOICES_DIR=./voices
# Default voice name (optional - uses first found voice if not set)
# This should match the filename without .wav extension (case-insensitive)
# DEFAULT_VOICE=masterchief

# HuggingFace cache directory (must be writable)
HF_HOME=/tmp/huggingface
0
.gitignore
vendored
Normal file → Executable file
21
README.md
Normal file → Executable file
@@ -253,6 +253,27 @@ Test any combination of voice and effects before committing:
- Ensure the reference audio is clear with minimal background noise
- Try a longer reference clip (5-10 seconds)

### HuggingFace cache read-only error
If you see errors like `OSError: [Errno 30] Read-only file system` when the bot tries to download the TTS model:

1. **Set a writable cache directory**: Add to your `.env` file:
```env
HF_HOME=/tmp/huggingface
```

2. **Create and set permissions** on the directory:
```bash
sudo mkdir /tmp/huggingface
sudo chown -R $USER:$USER /tmp/huggingface
```

3. **If using systemd service**: Ensure the service has write access to `/tmp` or the chosen cache directory. You may need to add `ReadWritePaths=/tmp/huggingface` to the service file or remove `ProtectHome=read-only`.

4. **Restart the bot**:
```bash
sudo systemctl restart vox.service
```

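The troubleshooting steps above can also be checked from Python before the model is downloaded. The following is a minimal sketch, not code from this commit: `is_writable` and `resolve_hf_home` are hypothetical helper names, and the fallback location under the system temp directory is an assumption. Only the `HF_HOME` variable itself comes from the README text.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def is_writable(directory: Path) -> bool:
    """Return True if a file can actually be created inside `directory`."""
    try:
        directory.mkdir(parents=True, exist_ok=True)
        # Creating a real temp file also catches read-only mounts.
        with tempfile.NamedTemporaryFile(dir=directory):
            pass
        return True
    except OSError:
        return False


def resolve_hf_home(fallback: Optional[Path] = None) -> Path:
    """Pick the first writable cache directory and export it as HF_HOME.

    Hypothetical helper: tries the configured HF_HOME first, then a
    fallback under the system temp directory (an assumption).
    """
    candidates = []
    if os.environ.get("HF_HOME"):
        candidates.append(Path(os.environ["HF_HOME"]))
    candidates.append(fallback or Path(tempfile.gettempdir()) / "huggingface")
    for candidate in candidates:
        if is_writable(candidate):
            os.environ["HF_HOME"] = str(candidate)
            return candidate
    raise RuntimeError("No writable HuggingFace cache directory found")
```

A check like this would need to run before any HuggingFace library is imported, since the cache location is typically read at import time. Note also that many Linux systems clear `/tmp` on reboot, in which case the model would be downloaded again after a restart.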
## Linux Server Deployment

To run the bot as a service on a Linux server:
0
audio_effects.py
Normal file → Executable file
0
audio_preprocessor.py
Normal file → Executable file
4
launch.sh
Executable file
@@ -0,0 +1,4 @@
#!/bin/bash
cd /home/artanis/Documents/Vox/
source venv/bin/activate
python bot.py
0
media/Subnautica/CyclopsEngineOff.oga
Normal file → Executable file
0
media/Subnautica/CyclopsEngineOn.oga
Normal file → Executable file
0
media/Subnautica/CyclopsOverheat.oga
Normal file → Executable file
0
media/Subnautica/Cyclops_Welcome.oga
Normal file → Executable file
0
media/Subnautica/Cyclops_Welcome2.oga
Normal file → Executable file
0
media/TF2/Ronin/diag_gs_titanRonin_embark_03.wav
Normal file → Executable file
0
media/TF2/Ronin/diag_gs_titanRonin_embark_05.wav
Normal file → Executable file
0
media/TF2/Ronin/diag_gs_titanRonin_embark_06.wav
Normal file → Executable file
0
media/TF2/Ronin/diag_gs_titanRonin_embark_08.wav
Normal file → Executable file
0
media/TF2/Ronin/diag_gs_titanRonin_embark_09.wav
Normal file → Executable file
0
media/TF2/Ronin/diag_gs_titanRonin_embark_10.wav
Normal file → Executable file
0
media/TF2/Ronin/diag_gs_titanRonin_embark_11.wav
Normal file → Executable file
0
numba_config.py
Normal file → Executable file
0
pockettts.service
Normal file → Executable file
0
requirements.txt
Normal file → Executable file
140
research/overview.md
Executable file
@@ -0,0 +1,140 @@
# Vox - Discord Text-to-Speech Bot

A Python-based Discord bot that generates neural text-to-speech using voice cloning from reference WAV files.

## Project Structure

```
Vox/
├── bot.py                  # Main entry point, Discord bot implementation
├── config.py               # Configuration management using environment variables
├── voice_manager.py        # Voice discovery, loading, and user preferences
├── audio_effects.py        # Audio post-processing effects (7 effects)
├── audio_preprocessor.py   # Audio preprocessing for voice cloning
├── numba_config.py         # Numba JIT compiler cache configuration
├── requirements.txt        # Python dependencies
├── launch.sh               # Shell script to start the bot
├── pockettts.service       # Systemd service file for Linux deployment
├── README.md               # Comprehensive documentation
├── .env                    # Production environment configuration
├── .env.testing            # Testing environment configuration
├── .env.example            # Environment configuration template
└── voices/                 # Directory for voice WAV files
    ├── preferences.json    # User voice/effect preferences (auto-generated)
    └── *.wav               # Voice reference files
```

## Core Functionality

### TTS Implementation
- **Engine**: Pocket TTS (`pocket-tts` library) for neural text-to-speech synthesis
- **Voice Cloning**: Uses reference WAV files to clone voices via `model.get_state_for_audio_prompt()`
- **On-demand Loading**: Voices are loaded only when first needed, then cached

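The on-demand loading described above could look roughly like the sketch below. `LazyVoiceCache` and its `loader` argument are hypothetical names, not taken from `voice_manager.py`. In the real bot the loader would presumably wrap `model.get_state_for_audio_prompt()`, whose exact arguments are not shown in this overview, so the sketch takes any callable instead.

```python
from pathlib import Path
from typing import Any, Callable, Dict


class LazyVoiceCache:
    """Discover voices up front, but build each voice state only on first use."""

    def __init__(self, voices_dir: Path, loader: Callable[[Path], Any]) -> None:
        # Map lower-cased file stem -> path so lookups are case-insensitive.
        # Nothing is loaded at this point.
        self._paths: Dict[str, Path] = {
            p.stem.lower(): p for p in Path(voices_dir).glob("*.wav")
        }
        self._loader = loader              # hypothetical: wraps the TTS model call
        self._states: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        """Return the cached state, loading it the first time it is requested."""
        key = name.lower()
        if key not in self._states:
            # Raises KeyError for a voice that was never discovered.
            self._states[key] = self._loader(self._paths[key])
        return self._states[key]
```

With this shape, startup cost stays proportional to the number of files scanned rather than the number of voices cloned, and repeated requests for the same voice reuse one state object.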
### Discord Integration
- Monitors a configured text channel for messages
- Joins the user's voice channel when they speak
- Uses `discord.FFmpegPCMAudio` with piped WAV data for streaming

### Audio Processing Pipeline
```
Text Message → Pocket TTS → Audio Effects → Normalize → FFmpeg → Discord VC
```

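The hand-off from processed audio to FFmpeg in the pipeline above relies on WAV data held in memory. A minimal sketch of that encoding step follows, assuming mono float samples in the range -1 to 1. `to_wav_bytes` is a hypothetical helper name, and the sample rate is left to the caller because the overview does not state it.

```python
import io
import wave

import numpy as np


def to_wav_bytes(audio: np.ndarray, sample_rate: int) -> io.BytesIO:
    """Encode mono float audio in [-1, 1] as 16-bit PCM WAV in memory."""
    # Clip first so out-of-range samples cannot wrap around as integers.
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype("<i2")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
    buffer.seek(0)                 # rewind so a reader starts at the header
    return buffer
```

The returned buffer is the kind of object that could be passed to `discord.FFmpegPCMAudio(buffer, pipe=True)`. Whether `bot.py` does exactly this is not shown in the overview.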
## Dependencies

| Library | Purpose |
|---------|---------|
| `discord.py[voice]>=2.3.0` | Discord bot API with voice support |
| `pocket-tts>=0.1.0` | Neural TTS engine with voice cloning |
| `scipy>=1.10.0` | Scientific computing (audio I/O) |
| `numpy>=1.24.0` | Numerical computing |
| `librosa>=0.10.0` | Audio analysis and effects |
| `noisereduce>=3.0.0` | Noise reduction preprocessing |
| `soundfile>=0.12.0` | Audio file I/O |
| `python-dotenv>=1.0.0` | Environment variable loading |

**System Requirements**: Python 3.10+, FFmpeg

## Key Modules

### `TTSBot` (bot.py)
Main Discord bot class that extends `commands.Bot`. Handles:
- Message processing and TTS queue
- Voice channel connections
- Slash command registration
- Startup initialization (loads TTS model, discovers voices)

### `VoiceManager` (voice_manager.py)
Manages voice files and user preferences:
- Discovers voices from WAV files in `voices/` directory
- On-demand voice loading with caching
- Per-user voice selection and effect preferences
- Preferences persistence to JSON

### `AudioEffects` (audio_effects.py)
Provides 7 post-processing effects:
1. **Pitch** (-12 to +12 semitones)
2. **Speed** (0.5x to 2.0x)
3. **Echo** (0-100%)
4. **Robot** (0-100%) - Ring modulation
5. **Chorus** (0-100%) - Multiple voice layering
6. **Tremolo Depth** (0.0-1.0)
7. **Tremolo Rate** (0.0-10.0 Hz)

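Two of the effects listed above, ring modulation and tremolo, are simple enough to sketch with NumPy alone. This is an illustration of the general techniques, not the code in `audio_effects.py`: the function names, the 50 Hz carrier default, and the dry/wet blending are assumptions.

```python
import numpy as np


def robot(audio: np.ndarray, sample_rate: int, amount: float,
          carrier_hz: float = 50.0) -> np.ndarray:
    """Ring modulation: multiply by a sine carrier, then blend with the dry signal."""
    t = np.arange(audio.shape[0]) / sample_rate
    wet = audio * np.sin(2.0 * np.pi * carrier_hz * t)
    mix = np.clip(amount / 100.0, 0.0, 1.0)    # amount is given as 0-100 %
    return (1.0 - mix) * audio + mix * wet


def tremolo(audio: np.ndarray, sample_rate: int, depth: float,
            rate_hz: float) -> np.ndarray:
    """Amplitude modulation: gain swings between (1 - depth) and 1 at rate_hz."""
    t = np.arange(audio.shape[0]) / sample_rate
    lfo = 0.5 * (1.0 + np.sin(2.0 * np.pi * rate_hz * t))   # oscillates in 0..1
    gain = 1.0 - depth * (1.0 - lfo)
    return audio * gain
```

Treating tremolo depth and rate as two parameters of one function mirrors how the list counts them as separate user-facing settings while they act on a single modulation.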
### `AudioPreprocessor` (audio_preprocessor.py)
Prepares voice reference files for cloning:
1. Load and resample to 22050 Hz
2. Normalize volume
3. Trim silence
4. Noise reduction
5. Limit length (default 15 seconds)

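Three of the steps above (normalize, trim silence, limit length) can be sketched with NumPy alone. Loading, resampling, and noise reduction are left out because they depend on `librosa` and `noisereduce` calls that this overview does not show. The function names, the silence threshold, and the 0.95 peak target are assumptions, not values from `audio_preprocessor.py`.

```python
import numpy as np


def trim_silence(audio: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Drop leading and trailing samples whose magnitude stays below `threshold`."""
    loud = np.flatnonzero(np.abs(audio) >= threshold)
    if loud.size == 0:
        return audio[:0]                       # the whole clip is silence
    return audio[loud[0]: loud[-1] + 1]


def prepare_reference(audio: np.ndarray, sample_rate: int,
                      max_seconds: float = 15.0, peak: float = 0.95) -> np.ndarray:
    """Normalize, trim silence, and cap the length of a mono reference clip."""
    loudest = float(np.max(np.abs(audio))) if audio.size else 0.0
    if loudest > 0.0:
        audio = audio * (peak / loudest)       # peak normalization
    audio = trim_silence(audio)
    return audio[: int(max_seconds * sample_rate)]
```

Normalizing before trimming means the silence threshold is applied relative to the clip's own peak, so a quiet recording is not trimmed away entirely.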
### `Config` (config.py)
Centralized configuration management with environment-aware loading and validation.

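As a rough illustration of environment-aware loading and validation, the sketch below picks an env file from the command line and validates a mapping of settings. The key names come from the `.env` sample in this overview. `env_file_for`, `Settings`, and `load_settings` are hypothetical names, and treating `DISCORD_TOKEN` and `TEXT_CHANNEL_ID` as the required keys is an assumption.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence


def env_file_for(argv: Sequence[str]) -> str:
    """`python bot.py testing` selects .env.testing; anything else uses .env."""
    return ".env.testing" if len(argv) > 1 and argv[1] == "testing" else ".env"


@dataclass(frozen=True)
class Settings:
    discord_token: str
    text_channel_id: int
    voices_dir: str = "./voices"
    default_voice: Optional[str] = None


def load_settings(env: Mapping[str, str]) -> Settings:
    """Validate required keys and convert types, failing early with a clear message."""
    missing = [key for key in ("DISCORD_TOKEN", "TEXT_CHANNEL_ID") if not env.get(key)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    try:
        channel_id = int(env["TEXT_CHANNEL_ID"])
    except ValueError as exc:
        raise ValueError("TEXT_CHANNEL_ID must be an integer") from exc
    return Settings(
        discord_token=env["DISCORD_TOKEN"],
        text_channel_id=channel_id,
        voices_dir=env.get("VOICES_DIR", "./voices"),
        default_voice=env.get("DEFAULT_VOICE") or None,   # empty string means unset
    )
```

In the bot this could be combined with `python-dotenv`, for example by calling `load_dotenv(env_file_for(sys.argv))` and then `load_settings(os.environ)`. Taking a plain mapping keeps the validation testable without touching the real environment.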
## Slash Commands

| Command | Description |
|---------|-------------|
| `/voice list` | Show available voices |
| `/voice set <name>` | Select your voice |
| `/voice current` | Show current voice |
| `/voice refresh` | Rescan for new voices |
| `/voice preview <name>` | Preview before committing |
| `/effects list` | Show your effect settings |
| `/effects set <effect> <value>` | Adjust effects |
| `/effects reset` | Reset to defaults |

## Features

- **Voice Cloning**: Add new voices by placing `.wav` files in `voices/` directory
- **Per-User Customization**: Each user can have their own voice and effect preferences
- **Hot-Reload**: Rescan for new voices without restart (`/voice refresh`)
- **Message Queue**: Queues messages for sequential playback
- **Inactivity Management**: Disconnects after 10 minutes of inactivity
- **Testing Support**: Separate `.env.testing` configuration for safe development

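The message queue and inactivity features listed above fit naturally into a single `asyncio` worker. The sketch below is one possible shape, not the implementation in `bot.py`: `playback_worker` and the `play` callback are hypothetical, and the 600-second default simply restates the 10-minute figure.

```python
import asyncio
from typing import Awaitable, Callable


async def playback_worker(
    queue: "asyncio.Queue[str]",
    play: Callable[[str], Awaitable[None]],
    idle_timeout: float = 600.0,
) -> None:
    """Play queued messages one at a time; return after `idle_timeout` idle seconds."""
    while True:
        try:
            text = await asyncio.wait_for(queue.get(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            return                      # the caller would disconnect from voice here
        try:
            await play(text)            # hypothetical: synthesize and stream one message
        finally:
            queue.task_done()
```

Because each message is awaited to completion before the next `queue.get()`, playback stays strictly sequential, and the same timeout that waits for new messages doubles as the inactivity timer.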
## Configuration (.env)

```env
DISCORD_TOKEN=your_bot_token
TEXT_CHANNEL_ID=channel_id_to_monitor
VOICES_DIR=./voices
DEFAULT_VOICE=optional_default_voice_name
```

## Running the Bot

```bash
# Production
python bot.py

# Testing (uses .env.testing)
python bot.py testing

# Or use the launch script
./launch.sh
```

For production deployment on Linux, a systemd service file (`pockettts.service`) is included.
0
voice_manager.py
Normal file → Executable file
0
voices/ChoGath.wav
Normal file → Executable file
0
voices/Estinien.wav
Normal file → Executable file
0
voices/Gaius.wav
Normal file → Executable file
0
voices/Gibralter_funny.wav
Normal file → Executable file
0
voices/Gibralter_good.wav
Normal file → Executable file
0
voices/HankHill.wav
Normal file → Executable file
0
voices/Johnny.wav
Normal file → Executable file
0
voices/MasterChief.wav
Normal file → Executable file
0
voices/SelfHelpSingh.wav
Normal file → Executable file
0
voices/Trump.wav
Normal file → Executable file