"vault backup: 2025-11-12 10:59:08 from Flow"

This commit is contained in:
2025-11-12 10:59:09 +09:00
parent 88e126ee56
commit 333caaccbb
22 changed files with 1101 additions and 3 deletions

View File

@@ -0,0 +1,622 @@
## Project Overview
Building a centralized AI-powered hub that connects personal devices, processes information, and makes intelligent decisions using the Google Gemini API. The system manages markdown notes (an Obsidian vault), sends reminders, and executes automations across 20-30 personal devices.
## Core Requirements
- **Scale**: Personal use, 20-30 devices maximum
- **Deployment**: Docker Compose on Linux server
- **AI**: Google Gemini API with caching and rate limiting
- **Notes**: Obsidian vault synced via Git (read-write access)
- **Network**: All devices connected via VPN (bidirectional communication)
- **Notifications**: Primary channel is Discord webhooks
- **Integrations**: Home Assistant (optional middleware)
- **Storage**: 7-day data retention for events/logs
## Technology Stack
### Hub (Docker Compose Services)
- **API**: FastAPI (Python)
- **Database**: PostgreSQL 16
- **Vector DB**: ChromaDB (note embeddings)
- **Cache/Queue**: Redis 7
- **Worker**: Background task processor
### Device Agents
- **Language**: Python (cross-platform)
- **Deployment**:
- Linux: systemd service
- Windows: Task Scheduler / Windows Service
- Mobile: Termux (Android) / Shortcuts (iOS)
### External Services
- **AI**: Google Gemini API
- **Notifications**: Discord webhooks
- **Home Automation**: Home Assistant REST API
- **Code Repository**: Gitea (local, for agent updates)
- **Version Control**: Git (Obsidian vault sync)
## Architecture Decisions
### Note Management
- Hub has **read-write** access to Obsidian vault
- Git workflow: pull → process → commit → push
- Reminder lines **deleted** after processing (clean removal)
- Git conflicts trigger Discord alerts for manual resolution
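A minimal sketch of this workflow (pull → process → commit → push, with conflicts surfaced to Discord), assuming GitPython and a caller-supplied alert helper; the real logic belongs in `hub/services/notes.py`:
```python
# Hypothetical sketch of the vault sync loop. Assumes GitPython; the
# process_notes and send_discord_alert callables are illustrative, not final.
from git import Repo, GitCommandError

VAULT_PATH = "/app/notes"  # NOTES_PATH from the environment

def sync_vault(process_notes, send_discord_alert):
    """Pull the vault, let `process_notes` edit files, then commit and push."""
    repo = Repo(VAULT_PATH)
    try:
        repo.remotes.origin.pull()
    except GitCommandError as exc:
        # Conflicts are not auto-resolved; alert and back off.
        send_discord_alert(f"Vault pull failed, manual resolution needed: {exc}")
        return

    changed = process_notes(VAULT_PATH)  # e.g. delete processed @remind lines
    if not changed:
        return

    repo.git.add(all=True)
    repo.index.commit("hub: process reminders")
    try:
        repo.remotes.origin.push()
    except GitCommandError as exc:
        send_discord_alert(f"Vault push failed: {exc}")

def changed_notes_since(repo: Repo, last_indexed_sha: str) -> list[str]:
    """Incremental indexing: markdown files touched since the last indexed commit."""
    out = repo.git.diff("--name-only", f"{last_indexed_sha}..HEAD", "--", "*.md")
    return [path for path in out.splitlines() if path]
```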
### Device Communication
- Devices authenticate with API keys
- Agents report events and queue locally if hub offline
- Hub can send commands to devices (predefined command set)
- Agent state is authoritative over hub's cached state
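On the hub side, API-key authentication can be enforced with a small FastAPI dependency. This is a sketch only: the `X-API-Key` header name and the in-memory key table stand in for the real models and are assumptions.
```python
# Sketch of API-key authentication for device requests (hub/utils/auth.py).
from fastapi import Depends, FastAPI, HTTPException, Security
from fastapi.security import APIKeyHeader

api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

_KNOWN_KEYS = {"dev-key-123": "laptop"}  # stand-in for a database lookup

def lookup_device_by_key(api_key: str):
    return _KNOWN_KEYS.get(api_key)

async def get_current_device(api_key: str | None = Security(api_key_header)):
    if not api_key:
        raise HTTPException(status_code=401, detail="Missing X-API-Key header")
    device = lookup_device_by_key(api_key)
    if device is None:
        raise HTTPException(status_code=403, detail="Unknown or revoked API key")
    return device

app = FastAPI()

@app.post("/events")
async def submit_event(payload: dict, device=Depends(get_current_device)):
    # Agent state is authoritative; the hub only records what the device reports.
    return {"accepted": True, "device": device}
```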
### AI Integration (Gemini)
- 24-hour cache for similar queries (Redis)
- Rate limit: 100 requests/hour
- Circuit breaker: 5 failures = 15min pause
- Fallback: Simple regex parsing if Gemini unavailable
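A rough sketch of how the cache and rate limit could sit in front of Gemini calls using Redis; the key names, the fixed-window limiter, and the `call_gemini` callable are assumptions.
```python
# Sketch of the Redis caching / rate-limiting wrapper around Gemini calls.
import hashlib
import json
import time

import redis

r = redis.Redis.from_url("redis://redis:6379", decode_responses=True)

CACHE_TTL = 24 * 3600  # 24-hour cache for similar queries
RATE_LIMIT = 100       # max requests per hour

def cached_gemini_query(prompt: str, call_gemini):
    key = "gemini:cache:" + hashlib.sha256(prompt.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)

    # Simple fixed-window rate limit: one counter per clock hour.
    window = f"gemini:rate:{int(time.time() // 3600)}"
    count = r.incr(window)
    r.expire(window, 3600)
    if count > RATE_LIMIT:
        raise RuntimeError("Gemini hourly rate limit reached, use regex fallback")

    result = call_gemini(prompt)
    r.setex(key, CACHE_TTL, json.dumps(result))
    return result
```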
### Data Management
- Events/logs: 7-day retention, auto-cleanup
- Completed reminders: Delete after processing
- Agent timezones: Translate to hub timezone
- Incremental note indexing using `git diff`
### Automation Rules
- YAML configuration files (version controlled)
- Home Assistant handles repeatable triggers
- Hub handles one-off events and AI decisions
## Project Structure
```
personal-ai-hub/
├── AGENTS.md                  # This file - update as you progress
├── README.md                  # User-facing documentation
├── docker-compose.yml         # Service orchestration
├── .env.example               # Environment variables template
├── .gitignore
├── hub/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── main.py                # FastAPI application entry point
│   ├── worker.py              # Background task processor
│   ├── models.py              # Database models (SQLAlchemy)
│   ├── schemas.py             # Pydantic schemas for API
│   ├── config.yaml.example    # Automation rules template
│   ├── alembic/               # Database migrations
│   │   ├── alembic.ini
│   │   └── versions/
│   ├── api/
│   │   ├── __init__.py
│   │   ├── devices.py         # Device registration endpoints
│   │   ├── events.py          # Event submission/retrieval
│   │   ├── reminders.py       # Reminder management
│   │   ├── webhooks.py        # Webhook endpoints
│   │   ├── health.py          # Health check endpoint
│   │   └── admin.py           # Admin/management endpoints
│   ├── services/
│   │   ├── __init__.py
│   │   ├── gemini.py          # Gemini API client
│   │   ├── notes.py           # Markdown parser & git operations
│   │   ├── reminders.py       # Reminder scheduling logic
│   │   ├── webhooks.py        # Discord/webhook sender
│   │   ├── home_assistant.py  # HA integration
│   │   ├── automation.py      # YAML rule engine
│   │   ├── vector_store.py    # ChromaDB interface
│   │   └── cache.py           # Redis caching layer
│   ├── utils/
│   │   ├── __init__.py
│   │   ├── auth.py            # API key authentication
│   │   ├── logging.py         # Structured logging
│   │   └── timezone.py        # Timezone conversion
│   └── tests/
│       ├── test_api.py
│       ├── test_notes.py
│       ├── test_reminders.py
│       └── test_automation.py
├── agent/
│   ├── agent.py               # Main agent script
│   ├── requirements.txt
│   ├── config.example.json    # Agent configuration template
│   ├── version.txt            # Current agent version
│   ├── collectors/
│   │   ├── __init__.py
│   │   ├── system_metrics.py  # CPU, memory, disk
│   │   ├── application.py     # Running apps, active window
│   │   └── custom.py          # User-defined collectors
│   ├── executors/
│   │   ├── __init__.py
│   │   └── commands.py        # Command execution handlers
│   ├── install/
│   │   ├── install.sh         # Linux installation script
│   │   ├── install.ps1        # Windows installation script
│   │   ├── systemd/
│   │   │   └── hub-agent.service
│   │   └── windows/
│   │       └── task-scheduler.xml
│   └── tests/
│       └── test_agent.py
├── docs/
│   ├── setup.md               # Initial setup guide
│   ├── api.md                 # API documentation
│   ├── automation-guide.md    # Writing automation rules
│   ├── agent-installation.md  # Device agent setup
│   └── troubleshooting.md     # Common issues
├── scripts/
│   ├── backup.sh              # Database backup script
│   ├── cleanup.sh             # Manual data cleanup
│   └── init-vault.sh          # Initialize test vault
└── config/
    ├── automation-rules.yaml  # Default automation rules
    └── device-whitelist.yaml  # Optional device restrictions
```
---
## Development Phases
### Phase 1: Foundation ⬜ NOT STARTED
**Goal**: Basic infrastructure running with device registration and health monitoring
**Tasks**:
- [ ] Create `docker-compose.yml` with all services (postgres, redis, chromadb, hub-api, worker)
- [ ] Create `.env.example` with all required environment variables
- [ ] Set up FastAPI application structure in `hub/main.py`
- [ ] Define database models in `hub/models.py` (Device, Event, Reminder tables)
- [ ] Create Alembic migration for initial schema
- [ ] Implement `/health` endpoint showing system status
- [ ] Implement device registration endpoint (`POST /devices/register`); see the sketch after this task list
- [ ] Implement API key authentication middleware
- [ ] Set up structured JSON logging
- [ ] Create basic README.md with quickstart instructions
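For the device registration task above, a minimal sketch of what `hub/api/devices.py` could look like; the in-memory store and field names are placeholders for the real SQLAlchemy models and Pydantic schemas.
```python
# Sketch of device registration (hub/api/devices.py).
import secrets

from fastapi import APIRouter
from pydantic import BaseModel

router = APIRouter(prefix="/devices", tags=["devices"])

class RegisterRequest(BaseModel):
    name: str
    platform: str  # "linux", "windows", "android", "ios"

class RegisterResponse(BaseModel):
    device_id: str
    api_key: str

_DEVICES: dict[str, dict] = {}  # stand-in for the Device table

@router.post("/register", response_model=RegisterResponse)
async def register_device(req: RegisterRequest) -> RegisterResponse:
    device_id = secrets.token_hex(8)
    api_key = secrets.token_urlsafe(32)  # returned once; store hashed in practice
    _DEVICES[device_id] = {"name": req.name, "platform": req.platform, "api_key": api_key}
    return RegisterResponse(device_id=device_id, api_key=api_key)
```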
**Acceptance Criteria**:
- [ ] `docker-compose up` successfully starts all services
- [ ] Can register a device and receive API key
- [ ] `/health` endpoint returns status of all components
- [ ] Logs are structured and readable
- [ ] Database migrations apply cleanly
**Files to Create**:
- `docker-compose.yml`
- `.env.example`
- `hub/main.py`
- `hub/models.py`
- `hub/api/devices.py`
- `hub/api/health.py`
- `hub/utils/auth.py`
- `hub/utils/logging.py`
- `hub/alembic/versions/001_initial_schema.py`
- `README.md`
**Progress**: 0/10 tasks complete
---
### Phase 2: Device Agent ⬜ NOT STARTED
**Goal**: Cross-platform agent that reports to hub and executes commands
**Tasks**:
- [ ] Create `agent/agent.py` main script
- [ ] Implement heartbeat mechanism (report every 5 minutes); a sketch follows this task list
- [ ] Implement system metrics collection (CPU, memory, disk)
- [ ] Implement event queue for offline operation
- [ ] Create device command execution framework
- [ ] Add auto-update check on startup (version endpoint)
- [ ] Create Linux systemd service file
- [ ] Create Windows Task Scheduler XML
- [ ] Write `install.sh` for Linux
- [ ] Write `install.ps1` for Windows (PowerShell)
- [ ] Document agent configuration format
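A sketch of the heartbeat and offline-queue tasks above; the hub URL, endpoint path, payload shape, and queue file location are assumptions.
```python
# Sketch of the agent heartbeat with a local queue for offline operation.
import json
import time
from pathlib import Path

import requests

HUB_URL = "http://hub.local:8000"    # reachable over the VPN (assumed)
API_KEY = "dev-key-123"              # issued at registration
QUEUE_FILE = Path("~/.hub-agent/queue.jsonl").expanduser()
HEARTBEAT_INTERVAL = 300             # seconds (5 minutes)

def send(event: dict) -> bool:
    try:
        resp = requests.post(
            f"{HUB_URL}/events",
            json=event,
            headers={"X-API-Key": API_KEY},
            timeout=10,
        )
        return resp.ok
    except requests.RequestException:
        return False

def queue_locally(event: dict) -> None:
    QUEUE_FILE.parent.mkdir(parents=True, exist_ok=True)
    with QUEUE_FILE.open("a") as fh:
        fh.write(json.dumps(event) + "\n")

def flush_queue() -> None:
    if not QUEUE_FILE.exists():
        return
    pending = [json.loads(line) for line in QUEUE_FILE.read_text().splitlines() if line]
    remaining = [event for event in pending if not send(event)]
    QUEUE_FILE.write_text("".join(json.dumps(event) + "\n" for event in remaining))

def main() -> None:
    while True:
        event = {"type": "heartbeat", "ts": time.time()}
        if send(event):
            flush_queue()          # hub reachable: drain anything queued offline
        else:
            queue_locally(event)   # hub unreachable: keep the event for later
        time.sleep(HEARTBEAT_INTERVAL)

if __name__ == "__main__":
    main()
```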
**Acceptance Criteria**:
- [ ] Agent successfully registers with hub on first run
- [ ] Agent sends heartbeat every 5 minutes
- [ ] Agent queues events when hub is unreachable
- [ ] Agent can execute basic commands from hub
- [ ] Agent installs as service on Linux
- [ ] Agent installs as scheduled task on Windows
- [ ] Agent checks for updates on startup
**Files to Create**:
- `agent/agent.py`
- `agent/config.example.json`
- `agent/collectors/system_metrics.py`
- `agent/executors/commands.py`
- `agent/install/install.sh`
- `agent/install/install.ps1`
- `agent/install/systemd/hub-agent.service`
- `docs/agent-installation.md`
- `hub/api/devices.py` (add version endpoint)
**Progress**: 0/11 tasks complete
---
### Phase 3: Notes & Reminders ⬜ NOT STARTED
**Goal**: Parse markdown notes, extract reminders, send Discord notifications
**Tasks**:
- [ ] Implement git operations in `services/notes.py` (pull, commit, push)
- [ ] Create markdown parser for reminder syntax
- [ ] Implement reminder extraction with date parsing
- [ ] Set up ChromaDB for note embeddings
- [ ] Create note indexing worker (incremental via git diff)
- [ ] Implement reminder scheduler (checks every minute)
- [ ] Create Discord webhook sender
- [ ] Implement reminder deletion from markdown files
- [ ] Add git conflict detection and Discord alerts
- [ ] Create reminder management API endpoints
- [ ] Add error notification for malformed reminder syntax
- [ ] Write tests for reminder parsing
**Reminder Syntax to Support**:
```markdown
@remind 2024-11-15 Review proposal
@remind in 3 days Check on project
@remind daily at 09:00 Stand-up meeting
```
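A rough sketch of how the regex fallback parser (used when Gemini is unavailable, see Phase 4) could cover these three forms; the patterns and the 09:00 default time for date-only reminders are assumptions.
```python
# Rough sketch of a regex-based parser for the reminder syntax above.
import re
from dataclasses import dataclass
from datetime import date, datetime, time, timedelta

@dataclass
class Reminder:
    when: datetime | None          # next due time (None if it could not be parsed)
    text: str
    recurring: str | None = None   # e.g. "daily"

ABSOLUTE = re.compile(r"^@remind\s+(\d{4}-\d{2}-\d{2})\s+(.+)$")
RELATIVE = re.compile(r"^@remind\s+in\s+(\d+)\s+days?\s+(.+)$")
DAILY    = re.compile(r"^@remind\s+daily\s+at\s+(\d{2}):(\d{2})\s+(.+)$")

def parse_reminder(line: str, today: date | None = None) -> Reminder | None:
    today = today or date.today()
    line = line.strip()
    if m := ABSOLUTE.match(line):
        # 09:00 default for date-only reminders is an assumption.
        return Reminder(datetime.fromisoformat(m.group(1) + "T09:00"), m.group(2))
    if m := RELATIVE.match(line):
        due = datetime.combine(today + timedelta(days=int(m.group(1))), time(9, 0))
        return Reminder(due, m.group(2))
    if m := DAILY.match(line):
        due = datetime.combine(today, time(int(m.group(1)), int(m.group(2))))
        return Reminder(due, m.group(3), recurring="daily")
    return None  # malformed syntax: triggers the error-notification task
```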
**Acceptance Criteria**:
- [ ] Hub can read notes from mounted Obsidian vault
- [ ] Hub detects new/modified notes via git diff
- [ ] Reminders are correctly parsed from markdown
- [ ] Scheduled reminders trigger at correct time
- [ ] Discord webhook delivers notification
- [ ] Processed reminder line is deleted from note
- [ ] Git commits and pushes changes successfully
- [ ] Git conflicts are detected and alerted
**Files to Create**:
- `hub/services/notes.py`
- `hub/services/reminders.py`
- `hub/services/webhooks.py`
- `hub/services/vector_store.py`
- `hub/api/reminders.py`
- `hub/worker.py` (reminder scheduler)
- `hub/models.py` (add Reminder table)
- `hub/tests/test_notes.py`
- `hub/tests/test_reminders.py`
- `docs/automation-guide.md` (reminder syntax section)
**Progress**: 0/12 tasks complete
---
### Phase 4: AI Integration ⬜ NOT STARTED
**Goal**: Gemini API integration with caching, rate limiting, and semantic search
**Tasks**:
- [ ] Implement Gemini API client in `services/gemini.py`
- [ ] Set up Redis caching layer (24hr TTL)
- [ ] Implement rate limiting (100 req/hour)
- [ ] Implement circuit breaker (5 failures = 15min pause); see the sketch after this task list
- [ ] Create fallback regex-based reminder parser
- [ ] Implement note embedding generation
- [ ] Create semantic search over notes using ChromaDB
- [ ] Add cost tracking (log token usage)
- [ ] Enhance reminder parsing with natural language support
- [ ] Create Gemini health check for monitoring
- [ ] Add Gemini context builder (device states + notes)
- [ ] Write tests for Gemini integration
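For the circuit breaker task above, a minimal sketch matching the planned thresholds (5 failures, 15-minute pause); class and function names are illustrative, the real module is `hub/utils/circuit_breaker.py`.
```python
# Sketch of the circuit breaker for Gemini calls.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 15 * 60):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: float | None = None

    def allow(self) -> bool:
        """Return True if a call may be attempted right now."""
        if self.opened_at is None:
            return True
        # Half-open: allow one trial call after the pause has elapsed.
        return time.monotonic() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker()

def call_with_breaker(call_gemini, prompt, regex_fallback):
    if not breaker.allow():
        return regex_fallback(prompt)   # Gemini paused: use the fallback parser
    try:
        result = call_gemini(prompt)
    except Exception:
        breaker.record_failure()
        return regex_fallback(prompt)
    breaker.record_success()
    return result
```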
**Acceptance Criteria**:
- [ ] Gemini API successfully processes queries
- [ ] Responses are cached and reused appropriately
- [ ] Rate limiting prevents quota exhaustion
- [ ] Circuit breaker triggers after repeated failures
- [ ] System falls back to regex parsing when Gemini down
- [ ] Natural language dates parsed correctly ("next Tuesday")
- [ ] Semantic search returns relevant notes
- [ ] Token usage is logged for cost monitoring
**Files to Create**:
- `hub/services/gemini.py`
- `hub/services/cache.py`
- `hub/utils/circuit_breaker.py`
- `hub/tests/test_gemini.py`
- Update `hub/services/notes.py` (add embeddings)
- Update `hub/services/reminders.py` (add NLP parsing)
**Progress**: 0/12 tasks complete
---
### Phase 5: Automation Engine ⬜ NOT STARTED
**Goal**: YAML-based rules that trigger actions based on device events
**Tasks**:
- [ ] Create YAML rule schema definition
- [ ] Implement rule parser in `services/automation.py`
- [ ] Create rule evaluation engine
- [ ] Implement trigger matching (device, event, time)
- [ ] Implement condition evaluation
- [ ] Implement action execution (webhook, command, search)
- [ ] Add Home Assistant REST API client
- [ ] Create bidirectional HA webhook integration
- [ ] Add dry-run mode for testing rules
- [ ] Create rule management API endpoints
- [ ] Write automation guide documentation
- [ ] Write tests for automation engine
**Example Rule Format**:
```yaml
rules:
  - name: "Evening work reminder"
    trigger:
      device: "laptop"
      event: "work_apps_closed"
      time_after: "17:00"
    action:
      type: "search_notes"
      query: "today's todos"
      notify: discord
    enabled: true
```
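A sketch of how the engine might load these rules and match an incoming event against a trigger, including the dry-run mode; field names follow the example above, the action dispatch is left as a stub, and the real engine lives in `hub/services/automation.py`.
```python
# Sketch of rule loading and trigger matching for the YAML rules above.
from datetime import datetime

import yaml  # PyYAML

def load_rules(path: str = "config/automation-rules.yaml") -> list[dict]:
    with open(path) as fh:
        return yaml.safe_load(fh).get("rules", [])

def trigger_matches(trigger: dict, event: dict, now: datetime) -> bool:
    if trigger.get("device") and trigger["device"] != event.get("device"):
        return False
    if trigger.get("event") and trigger["event"] != event.get("type"):
        return False
    if after := trigger.get("time_after"):
        hour, minute = map(int, after.split(":"))
        if (now.hour, now.minute) < (hour, minute):
            return False
    return True

def evaluate(rules: list[dict], event: dict, execute_action, dry_run: bool = False):
    now = datetime.now()
    for rule in rules:
        if not rule.get("enabled", True):
            continue
        if trigger_matches(rule.get("trigger", {}), event, now):
            if dry_run:
                print(f"[dry-run] would run action for rule {rule['name']!r}")
            else:
                execute_action(rule["action"])  # webhook, HA call, device command
```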
**Acceptance Criteria**:
- [ ] YAML rules load correctly from config file
- [ ] Rules trigger based on device events
- [ ] Time-based conditions work correctly
- [ ] Actions execute successfully (Discord, HA, device commands)
- [ ] Home Assistant can trigger hub via webhook
- [ ] Hub can trigger HA automations via REST
- [ ] Dry-run mode shows what would happen without executing
- [ ] Invalid rules are caught with helpful errors
**Files to Create**:
- `hub/services/automation.py`
- `hub/services/home_assistant.py`
- `hub/api/automations.py`
- `config/automation-rules.yaml`
- `docs/automation-guide.md`
- `hub/tests/test_automation.py`
**Progress**: 0/12 tasks complete
---
### Phase 6: Agent Auto-Update ⬜ NOT STARTED
**Goal**: Agents automatically update from Gitea repository
**Tasks**:
- [ ] Create agent version endpoint in hub API
- [ ] Implement version checking in agent
- [ ] Create agent download endpoint (proxy to Gitea)
- [ ] Implement agent self-update logic (download, backup, replace, restart); see the sketch after this task list
- [ ] Add rollback mechanism (keep last working version)
- [ ] Create agent release workflow documentation
- [ ] Test update on Linux
- [ ] Test update on Windows
- [ ] Add update notification to Discord
- [ ] Create version tracking in hub database
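For the self-update task above, a sketch of the check/download/backup/replace flow; the version and download endpoints and the restart-via-service-manager approach are assumptions.
```python
# Sketch of the agent self-update flow (check, download, back up, replace, restart).
import shutil
import sys
from pathlib import Path

import requests

HUB_URL = "http://hub.local:8000"
AGENT_PATH = Path(__file__).resolve()
BACKUP_PATH = AGENT_PATH.with_suffix(".py.bak")
LOCAL_VERSION = "0.1.0"  # read from version.txt in practice

def check_for_update() -> str | None:
    resp = requests.get(f"{HUB_URL}/agent/version", timeout=10)
    resp.raise_for_status()
    latest = resp.json()["version"]
    return latest if latest != LOCAL_VERSION else None

def self_update(latest: str) -> None:
    new_code = requests.get(f"{HUB_URL}/agent/download/{latest}", timeout=30).content
    shutil.copy2(AGENT_PATH, BACKUP_PATH)  # keep last working version for rollback
    AGENT_PATH.write_bytes(new_code)
    # Exit and rely on the systemd/Task Scheduler restart policy to relaunch the new code.
    sys.exit(0)
```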
**Acceptance Criteria**:
- [ ] Agent checks version on startup
- [ ] Agent downloads new version when available
- [ ] Agent backs up current version before updating
- [ ] Agent restarts after successful update
- [ ] Agent can rollback to previous version on failure
- [ ] Update process works on both Linux and Windows
- [ ] Hub tracks which agents are on which versions
**Files to Create**:
- `hub/api/admin.py` (version endpoints)
- Update `agent/agent.py` (add update logic)
- `scripts/release-agent.sh` (helper for releases)
- `docs/agent-installation.md` (update section)
**Progress**: 0/10 tasks complete
---
### Phase 7: Monitoring & Polish ⬜ NOT STARTED
**Goal**: Observability, documentation, and production readiness
**Tasks**:
- [ ] Create simple web dashboard (device grid, event stream)
- [ ] Enhance `/health` endpoint with detailed metrics
- [ ] Implement data retention cleanup job (7 days); see the sketch after this task list
- [ ] Add database backup script
- [ ] Create troubleshooting documentation
- [ ] Write comprehensive API documentation
- [ ] Add Prometheus metrics endpoint (optional)
- [ ] Set up critical alerts (hub down, device offline >1hr)
- [ ] Create testing guide
- [ ] Final end-to-end testing across all components
- [ ] Performance testing with 30 devices
- [ ] Security audit (API keys, git credentials, etc.)
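For the retention cleanup task above, a sketch of a 7-day cleanup against the events table; the table and column names are assumptions standing in for the real models in `hub/models.py`.
```python
# Sketch of the 7-day retention cleanup job.
import datetime as dt

import sqlalchemy as sa

RETENTION = dt.timedelta(days=7)

def cleanup(database_url: str = "postgresql://user:pass@db:5432/hub") -> int:
    engine = sa.create_engine(database_url)
    cutoff = dt.datetime.now(dt.timezone.utc) - RETENTION
    events = sa.table("events", sa.column("created_at"))
    with engine.begin() as conn:
        result = conn.execute(sa.delete(events).where(events.c.created_at < cutoff))
    return result.rowcount  # number of rows removed, useful for the cleanup log
```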
**Acceptance Criteria**:
- [ ] Dashboard shows real-time system status
- [ ] Old data automatically cleaned up after 7 days
- [ ] Database backups run automatically
- [ ] All documentation is complete and accurate
- [ ] Critical alerts deliver to Discord
- [ ] System handles 30 concurrent device connections
- [ ] No security vulnerabilities in authentication/authorization
**Files to Create**:
- `hub/api/dashboard.py` (simple UI endpoints)
- `scripts/backup.sh`
- `scripts/cleanup.sh`
- `docs/troubleshooting.md`
- `docs/api.md`
- `docs/testing.md`
- Update `README.md` (comprehensive)
**Progress**: 0/12 tasks complete
---
## Current Status
**Overall Progress**: 0% (0/7 phases complete)
**Current Phase**: Phase 1 - Foundation
**Blockers**: None
**Notes**:
- Project planning complete, ready to begin implementation
- All architecture decisions finalized
- Development environment ready (Linux server, Docker, Gitea)
---
## Instructions for Agents
### How to Use This File
1. **Start with Phase 1** and work sequentially through phases
2. **Check off tasks** as you complete them using `[x]`
3. **Update progress** counters (e.g., "3/10 tasks complete")
4. **Mark phases** as complete when all tasks are done: ⬜ → 🟡 → ✅
- ⬜ NOT STARTED
- 🟡 IN PROGRESS
- ✅ COMPLETE
5. **Update "Current Status"** section with your progress
6. **Add notes** in "Blockers" or "Notes" if you encounter issues
7. **Commit changes** to this file after each work session
### Phase Status Icons
Use these when updating phase headers:
- ⬜ NOT STARTED - No work begun on this phase
- 🟡 IN PROGRESS - At least one task started
- ✅ COMPLETE - All tasks finished and acceptance criteria met
### Before Starting a Phase
1. Read through all tasks and acceptance criteria
2. Review the files to create
3. Check dependencies on previous phases
4. Update phase status to 🟡 IN PROGRESS
### When Completing a Task
1. Mark the task checkbox: `- [x]`
2. Update the progress counter
3. Commit the code changes
4. Update this file
### When Completing a Phase
1. Verify all acceptance criteria are met
2. Mark phase as ✅ COMPLETE
3. Update overall progress percentage
4. Move to next phase
5. Add any lessons learned in Notes section
### Git Commit Messages
Use conventional commits format:
```
feat(phase1): implement device registration endpoint
fix(phase3): correct reminder date parsing
docs: update AGENTS.md progress
test(phase2): add agent heartbeat tests
```
### Testing Requirements
- Write unit tests for core logic
- Write integration tests for API endpoints
- Test happy path and error cases
- Run tests before marking phase complete
### Documentation Requirements
- Update relevant docs when adding features
- Include code examples in documentation
- Keep API docs in sync with implementation
- Document configuration options
---
## Environment Setup
### Required Environment Variables
```bash
# Gemini API
GEMINI_API_KEY=your_key_here

# Database
DATABASE_URL=postgresql://user:pass@db:5432/hub
REDIS_URL=redis://redis:6379

# Git Configuration (for notes)
GIT_USER_NAME="Personal AI Hub"
GIT_USER_EMAIL="hub@yourdomain.local"
GIT_REMOTE_URL=https://gitea.local/user/obsidian-vault.git

# Home Assistant
HOME_ASSISTANT_URL=http://homeassistant.local:8123
HOME_ASSISTANT_TOKEN=your_ha_token

# Discord
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/...

# Agent Updates
GITEA_URL=http://gitea.local
GITEA_TOKEN=your_gitea_token

# Security
API_SECRET_KEY=generate_random_key_here

# Paths
NOTES_PATH=/app/notes
NOTES_ALLOWED_FOLDERS=reminders,tasks,projects
```
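These variables could be loaded into typed settings at startup; a sketch assuming the `pydantic-settings` package, with field names mirroring the keys above.
```python
# Sketch of typed settings for the hub, loaded from the environment.
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    gemini_api_key: str
    database_url: str
    redis_url: str = "redis://redis:6379"
    git_user_name: str = "Personal AI Hub"
    git_user_email: str = "hub@yourdomain.local"
    git_remote_url: str
    home_assistant_url: str | None = None    # HA is optional middleware
    home_assistant_token: str | None = None
    discord_webhook_url: str
    gitea_url: str | None = None
    gitea_token: str | None = None
    api_secret_key: str
    notes_path: str = "/app/notes"
    notes_allowed_folders: str = "reminders,tasks,projects"

settings = Settings()  # reads from the environment (and a .env file if configured)
```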
### Development Tools Needed
- Docker & Docker Compose
- Python 3.11+
- Git
- Text editor / IDE
- curl or Postman (API testing)
- Access to: Gemini API, Discord webhook, Gitea instance
---
## Helpful Resources
### API Clients
- **FastAPI Docs**: https://fastapi.tiangolo.com/
- **Gemini API**: https://ai.google.dev/docs
- **Discord Webhooks**: https://discord.com/developers/docs/resources/webhook
### Libraries
- **SQLAlchemy**: https://docs.sqlalchemy.org/
- **ChromaDB**: https://docs.trychroma.com/
- **Redis-py**: https://redis-py.readthedocs.io/
- **GitPython**: https://gitpython.readthedocs.io/
### Patterns
- **Structured Logging**: Use JSON format with correlation IDs (see the sketch below)
- **Error Handling**: Always log, alert on critical errors
- **API Design**: RESTful, versioned endpoints
- **Testing**: Pytest with fixtures for database/API
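A small sketch of the structured-logging pattern using only the standard library; the field set and logger name are illustrative.
```python
# Sketch of structured JSON logging with a correlation ID.
import json
import logging
import uuid

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "correlation_id": getattr(record, "correlation_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("hub")
log.addHandler(handler)
log.setLevel(logging.INFO)

# One correlation ID per request/task so related log lines can be grepped together.
log.info("device registered", extra={"correlation_id": str(uuid.uuid4())})
```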
---
## Contact & Support
If you encounter issues or need clarification:
1. Check the troubleshooting doc (`docs/troubleshooting.md`)
2. Ask for help