docs: add HuggingFace cache troubleshooting to README

- Document HF_HOME environment variable for writable cache
- Add systemd service permission guidance for /tmp paths
- Troubleshooting steps for read-only file system errors
2026-02-26 15:56:09 -06:00
parent 85a334a57b
commit 9917d44f5d
36 changed files with 168 additions and 0 deletions
--- a/research/overview.md
+++ b/research/overview.md
@@ -0,0 +1,140 @@
+# Vox - Discord Text-to-Speech Bot
+
+A Python Discord bot that reads chat messages aloud with neural text-to-speech, cloning voices from reference WAV files.
+
+## Project Structure
+
+```
+Vox/
+├── bot.py                 # Main entry point, Discord bot implementation
+├── config.py              # Configuration management using environment variables
+├── voice_manager.py       # Voice discovery, loading, and user preferences
+├── audio_effects.py       # Audio post-processing effects (7 effects)
+├── audio_preprocessor.py  # Audio preprocessing for voice cloning
+├── numba_config.py        # Numba JIT compiler cache configuration
+├── requirements.txt       # Python dependencies
+├── launch.sh              # Shell script to start the bot
+├── pockettts.service      # Systemd service file for Linux deployment
+├── README.md              # Comprehensive documentation
+├── .env                   # Production environment configuration
+├── .env.testing           # Testing environment configuration
+├── .env.example           # Environment configuration template
+└── voices/                # Directory for voice WAV files
+    ├── preferences.json   # User voice/effect preferences (auto-generated)
+    └── *.wav              # Voice reference files
+```
+
+## Core Functionality
+
+### TTS Implementation
+- **Engine**: Pocket TTS (`pocket-tts` library) for neural text-to-speech synthesis
+- **Voice Cloning**: Uses reference WAV files to clone voices via `model.get_state_for_audio_prompt()`
+- **On-demand Loading**: Voices are loaded only when first needed, then cached
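The following is a minimal sketch of the on-demand pattern. It assumes a voice's cloned state can be kept in a plain dict keyed by voice name. `LazyVoiceCache` and its `loader` argument are hypothetical names, not the project's `VoiceManager` API. The loader stands in for whatever call wraps `get_state_for_audio_prompt()` in the real bot.

```python
from typing import Callable, Dict, Generic, TypeVar

State = TypeVar("State")


class LazyVoiceCache(Generic[State]):
    """Load each voice state on first use, then serve it from memory."""

    def __init__(self, loader: Callable[[str], State]) -> None:
        # Hypothetical hook: turns a voice name into a model state
        # (in the bot, something built around get_state_for_audio_prompt()).
        self._loader = loader
        self._cache: Dict[str, State] = {}

    def get(self, voice: str) -> State:
        # Call the expensive loader only the first time a voice is requested.
        if voice not in self._cache:
            self._cache[voice] = self._loader(voice)
        return self._cache[voice]

    def invalidate(self, voice: str) -> None:
        # Forget a cached state, e.g. after its WAV file changes on disk.
        self._cache.pop(voice, None)
```

The `invalidate` hook shows one way a rescan could drop stale states. Whether the bot does this is not documented.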
+
+### Discord Integration
+- Monitors a configured text channel for messages
+- Joins the message author's voice channel when they post in the monitored channel
+- Uses `discord.FFmpegPCMAudio` with piped WAV data for streaming
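This sketch covers the "piped WAV data" step using only the standard library. It assumes mono float samples in [-1.0, 1.0]. The helper name `pcm16_wav_bytes` is hypothetical. It builds a complete WAV file in memory, so nothing is written to disk.

```python
import io
import struct
import wave
from typing import Sequence


def pcm16_wav_bytes(samples: Sequence[float], sample_rate: int) -> io.BytesIO:
    """Encode mono float samples in [-1.0, 1.0] as an in-memory 16-bit WAV file."""
    # Clamp to the valid range, then scale to signed 16-bit integers.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    frames = struct.pack(f"<{len(ints)}h", *ints)

    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    buf.seek(0)  # rewind so the reader starts at the RIFF header
    return buf
```

In discord.py, a file-like object such as this buffer can be passed to `discord.FFmpegPCMAudio(buf, pipe=True)`, which feeds it to FFmpeg's stdin. The resulting source then goes to `VoiceClient.play()`.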
+
+### Audio Processing Pipeline
+```
+Text Message → Pocket TTS → Audio Effects → Normalize → FFmpeg → Discord VC
+```
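The stages after synthesis can be modelled as functions applied in order. This sketch assumes mono float samples and uses peak normalization to 0.95 of full scale. The target level, the `Stage` type, and both function names are illustrative assumptions, not the bot's actual code.

```python
from typing import Callable, List, Sequence

Samples = List[float]
Stage = Callable[[Samples], Samples]


def peak_normalize(samples: Sequence[float], target_peak: float = 0.95) -> Samples:
    """Scale the signal so its loudest sample reaches `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def run_pipeline(samples: Samples, effects: Sequence[Stage]) -> Samples:
    """Apply each effect in order, then normalize before handing off to FFmpeg."""
    for effect in effects:
        samples = effect(samples)
    return peak_normalize(samples)
```

This sketch normalizes last on purpose. Echo and chorus add delayed copies of the signal, which can push peaks past full scale and clip once converted to 16-bit PCM.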
+
+## Dependencies
+
+| Library | Purpose |
+|---------|---------|
+| `discord.py[voice]>=2.3.0` | Discord bot API with voice support |
+| `pocket-tts>=0.1.0` | Neural TTS engine with voice cloning |
+| `scipy>=1.10.0` | Scientific computing (audio I/O) |
+| `numpy>=1.24.0` | Numerical computing |
+| `librosa>=0.10.0` | Audio analysis and effects |
+| `noisereduce>=3.0.0` | Noise reduction preprocessing |
+| `soundfile>=0.12.0` | Audio file I/O |
+| `python-dotenv>=1.0.0` | Environment variable loading |
+
+**System Requirements**: Python 3.10+, FFmpeg
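Below is a hedged sketch of a startup check for these two requirements. The function name is an assumption, as is the choice to return a list of problems rather than exit. `shutil.which` is the standard-library way to look up an executable on `PATH`.

```python
import shutil
import sys
from typing import Callable, List, Optional, Tuple


def missing_requirements(
    version: Tuple[int, ...] = tuple(sys.version_info[:2]),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return human-readable problems; an empty list means the basics are present."""
    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    # discord.py's FFmpeg audio sources spawn an `ffmpeg` process, so it must be on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    return problems
```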
+
+## Key Modules
+
+### `TTSBot` (bot.py)
+Main Discord bot class that extends `commands.Bot`. Handles:
+- Message processing and TTS queue
+- Voice channel connections
+- Slash command registration
+- Startup initialization (loads TTS model, discovers voices)
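One way to get sequential playback is a single consumer draining an `asyncio.Queue`. `PlaybackQueue` and its `speak` coroutine are hypothetical, and the bot's real queue may be structured differently. Here `speak` stands in for "synthesize, play in the voice channel, and wait until playback finishes".

```python
import asyncio
from typing import Awaitable, Callable


class PlaybackQueue:
    """Speak queued messages one at a time, in arrival order."""

    def __init__(self, speak: Callable[[str], Awaitable[None]]) -> None:
        self._speak = speak
        self._queue: "asyncio.Queue[str]" = asyncio.Queue()

    def enqueue(self, text: str) -> None:
        self._queue.put_nowait(text)

    async def worker(self) -> None:
        # A single consumer guarantees messages never overlap in the voice channel.
        while True:
            text = await self._queue.get()
            try:
                await self._speak(text)
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        # Wait until every queued message has been spoken.
        await self._queue.join()
```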
+
+### `VoiceManager` (voice_manager.py)
+Manages voice files and user preferences:
+- Discovers voices from WAV files in `voices/` directory
+- On-demand voice loading with caching
+- Per-user voice selection and effect preferences
+- Preferences persistence to JSON
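This sketch covers the discovery and persistence duties listed above. It assumes voice names are WAV file stems and that preferences are stored as JSON keyed by Discord user ID. The function names and the preference schema are hypothetical. User IDs are stored as strings because JSON object keys are always strings.

```python
import json
from pathlib import Path
from typing import Dict, List


def discover_voices(voices_dir: Path) -> List[str]:
    """Voice names are WAV file stems, e.g. voices/alice.wav -> "alice"."""
    return sorted(p.stem for p in voices_dir.glob("*.wav") if p.is_file())


def load_preferences(path: Path) -> Dict[str, dict]:
    """Per-user preferences keyed by Discord user ID (stored as a string)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_preferences(path: Path, prefs: Dict[str, dict]) -> None:
    # Write a sibling temp file, then swap it in, so an interrupted write
    # leaves the previous preferences.json intact.
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(prefs, indent=2), encoding="utf-8")
    tmp.replace(path)
```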
+
+### `AudioEffects` (audio_effects.py)
+Provides 7 adjustable effect settings (tremolo has separate depth and rate controls):
+1. **Pitch** (-12 to +12 semitones)
+2. **Speed** (0.5x to 2.0x)
+3. **Echo** (0-100%)
+4. **Robot** (0-100%) - Ring modulation
+5. **Chorus** (0-100%) - Multiple voice layering
+6. **Tremolo Depth** (0.0-1.0)
+7. **Tremolo Rate** (0.0-10.0 Hz)
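Several of these effects reduce to short formulas. This sketch shows three of them:

- the semitone-to-ratio conversion a pitch shifter would use
- tremolo, which is amplitude modulation by a low-frequency oscillator
- ring modulation for the "Robot" effect

These are common textbook formulations, not necessarily what `audio_effects.py` does. The 30 Hz robot carrier and the 0.0 to 1.0 `amount` scale are assumptions.

```python
import math
from typing import List, Sequence


def semitones_to_ratio(semitones: float) -> float:
    """Equal-temperament frequency ratio: +12 semitones doubles the pitch."""
    return 2.0 ** (semitones / 12.0)


def tremolo(samples: Sequence[float], sample_rate: int,
            depth: float, rate_hz: float) -> List[float]:
    """Amplitude modulation: gain swings between 1.0 and (1.0 - depth) at rate_hz."""
    out = []
    for n, s in enumerate(samples):
        # LFO value in [0, 1], starting at 0 so the first sample is unchanged.
        lfo = 0.5 * (1.0 - math.cos(2.0 * math.pi * rate_hz * n / sample_rate))
        out.append(s * (1.0 - depth * lfo))
    return out


def ring_modulate(samples: Sequence[float], sample_rate: int, amount: float,
                  carrier_hz: float = 30.0) -> List[float]:
    """'Robot' voice: blend the dry signal with itself multiplied by a sine carrier.

    `amount` is 0.0-1.0 here (the slash command's 0-100% divided by 100).
    """
    out = []
    for n, s in enumerate(samples):
        wet = s * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
        out.append((1.0 - amount) * s + amount * wet)
    return out
```

For pitch and speed, librosa provides `librosa.effects.pitch_shift` and `librosa.effects.time_stretch`. Given the dependency list, the bot may rely on them, but that is not confirmed here.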
+
+### `AudioPreprocessor` (audio_preprocessor.py)
+Prepares voice reference files for cloning:
+1. Load and resample to 22050 Hz
+2. Normalize volume
+3. Trim silence
+4. Noise reduction
+5. Limit length (default 15 seconds)
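Below is a simplified sketch of steps 2, 3 and 5. It assumes mono float samples already loaded at 22050 Hz. Resampling (step 1) and noise reduction (step 4) are presumably handled by the librosa and noisereduce dependencies and are not reproduced here. The per-sample silence threshold is an assumption. It is coarser than frame-energy trimming such as `librosa.effects.trim(y, top_db=...)`.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading/trailing samples whose magnitude is below `threshold`."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return list(samples[loud[0]:loud[-1] + 1])


def limit_length(samples: Sequence[float], sample_rate: int,
                 max_seconds: float = 15.0) -> List[float]:
    """Keep at most `max_seconds` of audio (15 s mirrors the documented default)."""
    return list(samples[: int(sample_rate * max_seconds)])


def prepare_reference(samples: Sequence[float], sample_rate: int = 22050) -> List[float]:
    # Step 2: peak-normalize to full scale (silence is left untouched).
    peak = max((abs(s) for s in samples), default=0.0)
    normalized = [s / peak for s in samples] if peak > 0 else list(samples)
    # Steps 3 and 5: trim silence, then cap the length.
    return limit_length(trim_silence(normalized), sample_rate)
```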
+
+### `Config` (config.py)
+Centralized configuration: loads settings from `.env`, or from `.env.testing` when the bot runs in testing mode, and validates them.
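Here is a minimal sketch of environment-aware file selection. It assumes the mode comes from the first command-line argument, as in `python bot.py testing`. `select_env_file` is hypothetical, and so is rejecting unknown modes. The returned path would then be passed to python-dotenv's `load_dotenv()`.

```python
from pathlib import Path
from typing import Sequence


def select_env_file(argv: Sequence[str], base_dir: Path = Path(".")) -> Path:
    """Pick `.env.testing` when the first CLI argument is "testing", else `.env`."""
    environment = argv[1] if len(argv) > 1 else "production"
    if environment == "testing":
        return base_dir / ".env.testing"
    if environment == "production":
        return base_dir / ".env"
    # Assumption: fail loudly instead of silently falling back to production.
    raise ValueError(f"unknown environment {environment!r}")
```

A typical call would be `select_env_file(sys.argv)`.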
+
+## Slash Commands
+
+| Command | Description |
+|---------|-------------|
+| `/voice list` | Show available voices |
+| `/voice set <name>` | Select your voice |
+| `/voice current` | Show current voice |
+| `/voice refresh` | Rescan for new voices |
+| `/voice preview <name>` | Preview a voice before selecting it |
+| `/effects list` | Show your effect settings |
+| `/effects set <effect> <value>` | Adjust effects |
+| `/effects reset` | Reset to defaults |
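This sketch shows server-side validation for `/effects set`, using the ranges from the `AudioEffects` list. The snake_case effect keys and `parse_effect_value` are hypothetical names.

```python
from typing import Dict, Tuple

# Ranges copied from the AudioEffects list; the snake_case keys are hypothetical.
EFFECT_RANGES: Dict[str, Tuple[float, float]] = {
    "pitch": (-12.0, 12.0),        # semitones
    "speed": (0.5, 2.0),           # playback multiplier
    "echo": (0.0, 100.0),          # percent
    "robot": (0.0, 100.0),         # percent
    "chorus": (0.0, 100.0),        # percent
    "tremolo_depth": (0.0, 1.0),
    "tremolo_rate": (0.0, 10.0),   # Hz
}


def parse_effect_value(effect: str, raw: str) -> float:
    """Validate `/effects set <effect> <value>` arguments before saving them."""
    if effect not in EFFECT_RANGES:
        raise ValueError(f"unknown effect {effect!r}; choose one of {sorted(EFFECT_RANGES)}")
    value = float(raw)  # raises ValueError for non-numeric input
    low, high = EFFECT_RANGES[effect]
    if not low <= value <= high:  # also rejects NaN
        raise ValueError(f"{effect} must be between {low} and {high}, got {value}")
    return value
```

discord.py can enforce fixed bounds on a single option with `app_commands.Range`. Here, however, one `<value>` option serves effects with different ranges, so a per-effect check like this is still needed.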
+
+## Features
+
+- **Voice Cloning**: Add new voices by placing `.wav` files in the `voices/` directory
+- **Per-User Customization**: Each user can have their own voice and effect preferences
+- **Hot-Reload**: Rescan for new voices without restarting the bot (`/voice refresh`)
+- **Message Queue**: Queues messages for sequential playback
+- **Inactivity Management**: Disconnects after 10 minutes of inactivity
+- **Testing Support**: Separate `.env.testing` configuration for safe development
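This sketch implements the inactivity rule with asyncio. Every message restarts a countdown, and the idle callback runs only if the countdown completes. `InactivityTimer` and `on_idle` are hypothetical names. In the bot, the timeout would be 600 seconds and the callback would disconnect from voice.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class InactivityTimer:
    """Call `on_idle` once no activity has been recorded for `timeout` seconds."""

    def __init__(self, timeout: float, on_idle: Callable[[], Awaitable[None]]) -> None:
        self._timeout = timeout
        self._on_idle = on_idle
        self._task: Optional[asyncio.Task] = None

    def touch(self) -> None:
        # Each new message cancels the pending countdown and starts a fresh one.
        # Must be called from inside a running event loop.
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.create_task(self._countdown())

    async def _countdown(self) -> None:
        await asyncio.sleep(self._timeout)
        await self._on_idle()
```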
+
+## Configuration (.env)
+
+```env
+DISCORD_TOKEN=your_bot_token
+TEXT_CHANNEL_ID=channel_id_to_monitor
+VOICES_DIR=./voices
+DEFAULT_VOICE=optional_default_voice_name
+```
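This sketch validates these settings at startup. It assumes they are already in the process environment, for example after python-dotenv has loaded the file. `BotConfig` is hypothetical. Using `./voices` as the fallback for `VOICES_DIR` is an assumption based on the value shown above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class BotConfig:
    discord_token: str
    text_channel_id: int
    voices_dir: Path
    default_voice: Optional[str]

    @classmethod
    def from_env(cls, env: Mapping[str, str]) -> "BotConfig":
        """Build and validate settings from a mapping such as os.environ."""
        token = env.get("DISCORD_TOKEN", "").strip()
        if not token:
            raise ValueError("DISCORD_TOKEN is required")
        raw_channel = env.get("TEXT_CHANNEL_ID", "")
        if not raw_channel.isdigit():
            raise ValueError(f"TEXT_CHANNEL_ID must be a numeric Discord ID, got {raw_channel!r}")
        return cls(
            discord_token=token,
            text_channel_id=int(raw_channel),
            voices_dir=Path(env.get("VOICES_DIR", "./voices")),
            default_voice=env.get("DEFAULT_VOICE") or None,  # empty string -> no default
        )
```

Call it as `BotConfig.from_env(os.environ)`. It fails fast with a clear message, so a bad channel ID does not surface later as a confusing lookup failure.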
+
+## Running the Bot
+
+```bash
+# Production
+python bot.py
+
+# Testing (uses .env.testing)
+python bot.py testing
+
+# Or use the launch script
+./launch.sh
+```
+
+For production deployment on Linux, a systemd service file (`pockettts.service`) is included.