docs: add HuggingFace cache troubleshooting to README

- Document HF_HOME environment variable for writable cache
- Add systemd service permission guidance for /tmp paths
- Troubleshooting steps for read-only file system errors
2026-02-26 15:56:09 -06:00
parent 85a334a57b
commit 9917d44f5d
36 changed files with 168 additions and 0 deletions
--- a/.env.example
+++ b/.env.example
--- a/.env.testing
+++ b/.env.testing
@@ -16,3 +16,6 @@ VOICES_DIR=./voices
 # Default voice name (optional - uses first found voice if not set)
 # This should match the filename without .wav extension (case-insensitive)
 # DEFAULT_VOICE=masterchief
 # HuggingFace cache directory (must be writable)
 HF_HOME=/tmp/huggingface
--- a/.gitignore
+++ b/.gitignore
--- a/README.md
+++ b/README.md
@@ -253,6 +253,27 @@ Test any combination of voice and effects before committing:
 - Ensure the reference audio is clear with minimal background noise
 - Try a longer reference clip (5-10 seconds)
 ### HuggingFace cache read-only error
 If you see errors like `OSError: [Errno 30] Read-only file system` when the bot tries to download the TTS model:
 1. **Set a writable cache directory**: Add to your `.env` file:
   ```env
   HF_HOME=/tmp/huggingface
   ```
 2. **Create and set permissions** on the directory:
   ```bash
   sudo mkdir -p /tmp/huggingface
   sudo chown -R "$USER":"$USER" /tmp/huggingface
   ```
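 3. **If using a systemd service**: Ensure the service can write to the chosen cache directory. If the unit sets `ProtectSystem=strict`, add `ReadWritePaths=/tmp/huggingface` to the service file. `ProtectHome=read-only` makes the default cache under `~/.cache/huggingface` read-only, so either remove it or keep `HF_HOME` outside the home directory. If the unit sets `PrivateTmp=true`, the service gets its own private `/tmp`, so the directory created in step 2 is not the one it sees; pick a path outside `/tmp` in that case.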
 4. **Restart the bot**:
   ```bash
   sudo systemctl restart vox.service
   ```
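To confirm that the cache directory is actually writable by the account running the bot, a small check like the following can help. This is a minimal sketch and not part of the bot; `cache_dir_is_writable` and `resolve_hf_home` are hypothetical helper names, and the fallback path assumes the usual default of `~/.cache/huggingface` when neither `HF_HOME` nor `XDG_CACHE_HOME` is set.

```python
import os
import tempfile
from pathlib import Path


def resolve_hf_home() -> str:
    """Return the directory HF_HOME points at, or the assumed default location."""
    return os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))


def cache_dir_is_writable(path: str) -> bool:
    """Return True if `path` exists (or can be created) and accepts a new file."""
    try:
        Path(path).mkdir(parents=True, exist_ok=True)
        # Creating a real file catches read-only mounts that a plain
        # permission-bit check could miss. The file is removed on close.
        with tempfile.NamedTemporaryFile(dir=path):
            pass
        return True
    except OSError:
        # Covers Errno 30 (read-only file system) and permission errors.
        return False
```

Run `cache_dir_is_writable(resolve_hf_home())` as the same user (and, for systemd, under the same unit restrictions) that runs the bot; a `False` result points back to steps 1 to 3.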
 ## Linux Server Deployment
 To run the bot as a service on a Linux server:
--- a/audio_effects.py
+++ b/audio_effects.py
--- a/audio_preprocessor.py
+++ b/audio_preprocessor.py
--- a/bot.py
+++ b/bot.py
--- a/config.py
+++ b/config.py
--- a/launch.sh
+++ b/launch.sh
@@ -0,0 +1,4 @@
 #!/bin/bash
 cd /home/artanis/Documents/Vox/ || exit 1
 source venv/bin/activate
 python bot.py
--- a/media/Subnautica/CyclopsEngineOff.oga
+++ b/media/Subnautica/CyclopsEngineOff.oga
--- a/media/Subnautica/CyclopsEngineOn.oga
+++ b/media/Subnautica/CyclopsEngineOn.oga
--- a/media/Subnautica/CyclopsOverheat.oga
+++ b/media/Subnautica/CyclopsOverheat.oga
--- a/media/Subnautica/Cyclops_Welcome.oga
+++ b/media/Subnautica/Cyclops_Welcome.oga
--- a/media/Subnautica/Cyclops_Welcome2.oga
+++ b/media/Subnautica/Cyclops_Welcome2.oga
--- a/media/TF2/Ronin/diag_gs_titanRonin_embark_03.wav
+++ b/media/TF2/Ronin/diag_gs_titanRonin_embark_03.wav
--- a/media/TF2/Ronin/diag_gs_titanRonin_embark_05.wav
+++ b/media/TF2/Ronin/diag_gs_titanRonin_embark_05.wav
--- a/media/TF2/Ronin/diag_gs_titanRonin_embark_06.wav
+++ b/media/TF2/Ronin/diag_gs_titanRonin_embark_06.wav
--- a/media/TF2/Ronin/diag_gs_titanRonin_embark_08.wav
+++ b/media/TF2/Ronin/diag_gs_titanRonin_embark_08.wav
--- a/media/TF2/Ronin/diag_gs_titanRonin_embark_09.wav
+++ b/media/TF2/Ronin/diag_gs_titanRonin_embark_09.wav
--- a/media/TF2/Ronin/diag_gs_titanRonin_embark_10.wav
+++ b/media/TF2/Ronin/diag_gs_titanRonin_embark_10.wav
--- a/media/TF2/Ronin/diag_gs_titanRonin_embark_11.wav
+++ b/media/TF2/Ronin/diag_gs_titanRonin_embark_11.wav
--- a/numba_config.py
+++ b/numba_config.py
--- a/pockettts.service
+++ b/pockettts.service
--- a/requirements.txt
+++ b/requirements.txt
--- a/research/overview.md
+++ b/research/overview.md
@@ -0,0 +1,140 @@
 # Vox - Discord Text-to-Speech Bot
 A Python-based Discord bot that generates neural text-to-speech using voice cloning from reference WAV files.
 ## Project Structure
 ```
 Vox/
 ├── bot.py                 # Main entry point, Discord bot implementation
 ├── config.py              # Configuration management using environment variables
 ├── voice_manager.py       # Voice discovery, loading, and user preferences
 ├── audio_effects.py       # Audio post-processing effects (7 effects)
 ├── audio_preprocessor.py  # Audio preprocessing for voice cloning
 ├── numba_config.py        # Numba JIT compiler cache configuration
 ├── requirements.txt       # Python dependencies
 ├── launch.sh              # Shell script to start the bot
 ├── pockettts.service      # Systemd service file for Linux deployment
 ├── README.md              # Comprehensive documentation
 ├── .env                   # Production environment configuration
 ├── .env.testing           # Testing environment configuration
 ├── .env.example           # Environment configuration template
 └── voices/                # Directory for voice WAV files
    ├── preferences.json    # User voice/effect preferences (auto-generated)
    └── *.wav               # Voice reference files
 ```
 ## Core Functionality
 ### TTS Implementation
 - **Engine**: Pocket TTS (`pocket-tts` library) for neural text-to-speech synthesis
 - **Voice Cloning**: Uses reference WAV files to clone voices via `model.get_state_for_audio_prompt()`
 - **On-demand Loading**: Voices are loaded only when first needed, then cached
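The on-demand loading described above can be sketched as a small cache keyed by voice name. This is an illustration under assumptions, not the code in `voice_manager.py`: `LazyVoiceCache` is a hypothetical class, and `loader` stands in for whatever wraps `model.get_state_for_audio_prompt()`, whose exact signature is not shown here.

```python
from pathlib import Path
from typing import Any, Callable, Dict


class LazyVoiceCache:
    """Load a voice state the first time it is requested, then reuse it."""

    def __init__(self, voices_dir: str, loader: Callable[[Path], Any]) -> None:
        self._dir = Path(voices_dir)
        self._loader = loader  # hypothetical: turns a WAV path into a voice state
        self._states: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        # Only the first request for a voice pays the loading cost.
        if name not in self._states:
            self._states[name] = self._loader(self._dir / f"{name}.wav")
        return self._states[name]

    def clear(self) -> None:
        """Forget cached states, for example after rescanning the directory."""
        self._states.clear()
```

Passing the loader in as a callable keeps the cache independent of the TTS library, which also makes it easy to exercise with a fake loader.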
 ### Discord Integration
 - Monitors a configured text channel for messages
 - Joins the user's voice channel when they speak
 - Uses `discord.FFmpegPCMAudio` with piped WAV data for streaming
 ### Audio Processing Pipeline
 ```
 Text Message → Pocket TTS → Audio Effects → Normalize → FFmpeg → Discord VC
 ```
 ## Dependencies
 | Library | Purpose |
 |---------|---------|
 | `discord.py[voice]>=2.3.0` | Discord bot API with voice support |
 | `pocket-tts>=0.1.0` | Neural TTS engine with voice cloning |
 | `scipy>=1.10.0` | Scientific computing (audio I/O) |
 | `numpy>=1.24.0` | Numerical computing |
 | `librosa>=0.10.0` | Audio analysis and effects |
 | `noisereduce>=3.0.0` | Noise reduction preprocessing |
 | `soundfile>=0.12.0` | Audio file I/O |
 | `python-dotenv>=1.0.0` | Environment variable loading |
 **System Requirements**: Python 3.10+, FFmpeg
 ## Key Modules
 ### `TTSBot` (bot.py)
 Main Discord bot class that extends `commands.Bot`. Handles:
 - Message processing and TTS queue
 - Voice channel connections
 - Slash command registration
 - Startup initialization (loads TTS model, discovers voices)
 ### `VoiceManager` (voice_manager.py)
 Manages voice files and user preferences:
 - Discovers voices from WAV files in `voices/` directory
 - On-demand voice loading with caching
 - Per-user voice selection and effect preferences
 - Preferences persistence to JSON
 ### `AudioEffects` (audio_effects.py)
 Provides 7 post-processing effects:
 1. **Pitch** (-12 to +12 semitones)
 2. **Speed** (0.5x to 2.0x)
 3. **Echo** (0-100%)
 4. **Robot** (0-100%) - Ring modulation
 5. **Chorus** (0-100%) - Multiple voice layering
 6. **Tremolo Depth** (0.0-1.0)
 7. **Tremolo Rate** (0.0-10.0 Hz)
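As a rough illustration of two of the effects listed above, ring modulation multiplies the signal by a sine carrier, and tremolo varies the gain with a slow oscillator. The sketch below uses plain Python lists so it stays self-contained; it is not the code in `audio_effects.py`, and the function names, the 50 Hz carrier, and the way `mix` blends dry and modulated signal are assumptions.

```python
import math
from typing import List, Sequence


def ring_modulate(samples: Sequence[float], sample_rate: int,
                  carrier_hz: float = 50.0, mix: float = 1.0) -> List[float]:
    """Blend the dry signal with the signal multiplied by a sine carrier."""
    out = []
    for n, x in enumerate(samples):
        carrier = math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        out.append((1.0 - mix) * x + mix * x * carrier)
    return out


def tremolo(samples: Sequence[float], sample_rate: int,
            depth: float = 0.5, rate_hz: float = 5.0) -> List[float]:
    """Modulate amplitude with a low-frequency oscillator."""
    out = []
    for n, x in enumerate(samples):
        # The oscillator swings between 0 and 1, so gain stays in [1 - depth, 1].
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate_hz * n / sample_rate))
        out.append(x * (1.0 - depth * lfo))
    return out
```

With `mix=0.0` or `depth=0.0` both functions return the input unchanged, which matches the idea of an effect set to its neutral value.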
 ### `AudioPreprocessor` (audio_preprocessor.py)
 Prepares voice reference files for cloning:
 1. Load and resample to 22050 Hz
 2. Normalize volume
 3. Trim silence
 4. Noise reduction
 5. Limit length (default 15 seconds)
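Steps 2, 3, and 5 above can be sketched in a few lines. This is a simplified illustration on plain Python lists, not the code in `audio_preprocessor.py`; the function names, the 0.95 target peak, and the 0.01 silence threshold are assumptions, and resampling and noise reduction are left out because they depend on the audio libraries.

```python
from typing import List, Sequence


def normalize_peak(samples: Sequence[float], target_peak: float = 0.95) -> List[float]:
    """Scale the clip so its loudest sample reaches target_peak."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent or empty clip: nothing to scale
    scale = target_peak / peak
    return [x * scale for x in samples]


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose magnitude is at or below threshold."""
    loud = [i for i, x in enumerate(samples) if abs(x) > threshold]
    if not loud:
        return []
    return list(samples[loud[0]:loud[-1] + 1])


def limit_length(samples: Sequence[float], sample_rate: int,
                 max_seconds: float = 15.0) -> List[float]:
    """Keep at most max_seconds of audio from the start of the clip."""
    return list(samples[:int(sample_rate * max_seconds)])
```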
 ### `Config` (config.py)
 Centralized configuration management with environment-aware loading and validation.
 ## Slash Commands
 | Command | Description |
 |---------|-------------|
 | `/voice list` | Show available voices |
 | `/voice set <name>` | Select your voice |
 | `/voice current` | Show current voice |
 | `/voice refresh` | Rescan for new voices |
 | `/voice preview <name>` | Preview before committing |
 | `/effects list` | Show your effect settings |
 | `/effects set <effect> <value>` | Adjust effects |
 | `/effects reset` | Reset to defaults |
 ## Features
 - **Voice Cloning**: Add new voices by placing `.wav` files in `voices/` directory
 - **Per-User Customization**: Each user can have their own voice and effect preferences
 - **Hot-Reload**: Rescan for new voices without restart (`/voice refresh`)
 - **Message Queue**: Queues messages for sequential playback
 - **Inactivity Management**: Disconnects after 10 minutes of inactivity
 - **Testing Support**: Separate `.env.testing` configuration for safe development
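The message queue and inactivity behaviour listed above can be sketched with `asyncio`. This is a minimal illustration, not the logic in `bot.py`: `playback_worker` and the `play` callback are hypothetical, and the 600-second default simply mirrors the 10 minutes mentioned above.

```python
import asyncio
from typing import Awaitable, Callable


async def playback_worker(queue: "asyncio.Queue[str]",
                          play: Callable[[str], Awaitable[None]],
                          idle_timeout: float = 600.0) -> None:
    """Play queued messages one at a time; return once the queue stays empty too long."""
    while True:
        try:
            text = await asyncio.wait_for(queue.get(), timeout=idle_timeout)
        except asyncio.TimeoutError:
            # No message arrived in time; a bot would disconnect from voice here.
            return
        # Awaiting playback before the next get() keeps messages sequential.
        await play(text)
        queue.task_done()
```

Because the worker awaits each `play` call before fetching the next item, messages never overlap, and the timeout restarts every time a new message arrives.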
 ## Configuration (.env)
 ```env
 DISCORD_TOKEN=your_bot_token
 TEXT_CHANNEL_ID=channel_id_to_monitor
 VOICES_DIR=./voices
 DEFAULT_VOICE=optional_default_voice_name
 ```
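A rough sketch of how environment-aware loading and validation could work is shown below. It is not the code in `config.py`: the helper names are hypothetical, the hand-written parser stands in for `python-dotenv` to keep the example self-contained, and treating `DISCORD_TOKEN` and `TEXT_CHANNEL_ID` as the required keys is an assumption.

```python
from typing import Dict, List, Sequence


def select_env_file(argv: Sequence[str]) -> str:
    """Pick .env.testing when the first argument is 'testing', otherwise .env."""
    return ".env.testing" if len(argv) > 1 and argv[1] == "testing" else ".env"


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: Dict[str, str],
                 required: Sequence[str] = ("DISCORD_TOKEN", "TEXT_CHANNEL_ID")) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

A startup routine could call `select_env_file(sys.argv)`, read that file, and stop with a clear message if `missing_keys` returns anything.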
 ## Running the Bot
 ```bash
 # Production
 python bot.py
 # Testing (uses .env.testing)
 python bot.py testing
 # Or use the launch script
 ./launch.sh
 ```
 For production deployment on Linux, a systemd service file (`pockettts.service`) is included.
--- a/voice_manager.py
+++ b/voice_manager.py
--- a/voices/ChoGath.wav
+++ b/voices/ChoGath.wav
--- a/voices/Estinien.wav
+++ b/voices/Estinien.wav
--- a/voices/Gaius.wav
+++ b/voices/Gaius.wav
--- a/voices/Gibralter_funny.wav
+++ b/voices/Gibralter_funny.wav
--- a/voices/Gibralter_good.wav
+++ b/voices/Gibralter_good.wav
--- a/voices/HankHill.wav
+++ b/voices/HankHill.wav
--- a/voices/Johnny.wav
+++ b/voices/Johnny.wav
--- a/voices/MasterChief.wav
+++ b/voices/MasterChief.wav
--- a/voices/SelfHelpSingh.wav
+++ b/voices/SelfHelpSingh.wav
--- a/voices/Trump.wav
+++ b/voices/Trump.wav