Initial commit: Sentry-Emote system monitor
- Aggregator: Flask-based event broker with priority queue
- Frontend: OLED-optimized UI with animations
- Detectors: disk, cpu, memory, service, network
- Unified entry point (sentry.py) with process management
- Heartbeat TTL system for auto-clearing stale events

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
.gitignore (vendored, new file, 21 lines)
@@ -0,0 +1,21 @@
# Virtual environment
venv/

# Python
__pycache__/
*.py[cod]
*.egg-info/

# Runtime files
status.json

# IDE
.vscode/
.idea/

# OS
.DS_Store
Thumbs.db

# Claude Code
.claude/
CLAUDE.md (new file, 121 lines)
@@ -0,0 +1,121 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Sentry-Emote is a minimalist system status monitor designed for an old Pixel phone used as an ambient display. It uses ASCII "emotes" to represent system health instead of complex graphs.

## Architecture

**Publisher/Subscriber model:**

```
┌─────────────┐    POST /event     ┌─────────────┐    GET /status     ┌─────────────┐
│  Detectors  │ ─────────────────▶ │ Aggregator  │ ◀───────────────── │  Emote-UI   │
│  (sensors)  │                    │  (broker)   │                    │  (display)  │
└─────────────┘                    └─────────────┘                    └─────────────┘
```

- **Aggregator** (`aggregator.py`) — Flask service managing the event queue and priority logic
- **Detectors** (`detectors/*.py`) — Independent scripts monitoring system metrics
- **Emote-UI** (`index.html`) — OLED-optimized web frontend
- **Sentry** (`sentry.py`) — Unified entry point managing all processes

## Quick Start

```bash
# Setup
python -m venv venv
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt

# Run everything
python sentry.py
```

UI available at http://localhost:5000

## Configuration

Edit `config.json` to configure the aggregator URL, enable/disable detectors, and set thresholds.

```json
{
  "aggregator_url": "http://localhost:5000",
  "aggregator": { "script": "aggregator.py" },
  "detectors": [
    {
      "name": "cpu",
      "enabled": true,
      "script": "detectors/cpu.py",
      "env": {
        "CHECK_INTERVAL": "30",
        "THRESHOLD_WARNING": "85",
        "THRESHOLD_CRITICAL": "95"
      }
    }
  ]
}
```

## Detectors

| Detector | Script | Required Env Vars |
|----------|--------|-------------------|
| Disk Space | `detectors/disk_space.py` | — |
| CPU | `detectors/cpu.py` | — |
| Memory | `detectors/memory.py` | — |
| Service | `detectors/service.py` | `SERVICES` (comma-separated process names) |
| Network | `detectors/network.py` | `HOSTS` (comma-separated hostnames/IPs) |

All detectors support: `AGGREGATOR_URL`, `CHECK_INTERVAL`, `THRESHOLD_WARNING`, `THRESHOLD_CRITICAL`

## API Endpoints

- `POST /event` — Register event: `{"id": "name", "priority": 1-4, "message": "optional", "ttl": optional_seconds}`
- `POST /clear` — Clear event: `{"id": "name"}`
- `GET /status` — Current state JSON
- `GET /events` — List active events

## Priority System

Lower number = higher priority. Events with a `ttl` auto-expire (heartbeat pattern).

| Priority | State | Emote | Color | Behavior |
|----------|----------|----------|--------|----------|
| 1 | Critical | `( x_x)` | Red | Shaking animation |
| 2 | Warning | `( o_o)` | Yellow | Breathing animation |
| 3 | Notify | `( 'o')` | Blue | Popping animation, 10s default TTL |
| 4 | Optimal | `( ^_^)` | Green | Default when no events |

## Testing Events

```bash
# Warning with 30s TTL
curl -X POST -H "Content-Type: application/json" \
  -d '{"id":"test","priority":2,"message":"Test warning","ttl":30}' \
  http://localhost:5000/event

# Clear manually
curl -X POST -H "Content-Type: application/json" \
  -d '{"id":"test"}' \
  http://localhost:5000/clear
```

## File Structure

```
├── sentry.py          # Unified entry point
├── aggregator.py      # Event broker/API server
├── index.html         # OLED-optimized frontend
├── config.json        # Runtime configuration
├── detectors/
│   ├── disk_space.py
│   ├── cpu.py
│   ├── memory.py
│   ├── service.py
│   └── network.py
├── requirements.txt
└── SPEC.md            # Original project specification
```
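The hook system documented above means a custom detector is just a loop that keeps re-posting a heartbeat event. Below is a minimal sketch against the documented `/event` contract, using only the standard library; the `load_average` detector id, the threshold, and the helper names are illustrative, not part of the repository:

```python
import json
import os
import time
import urllib.request

AGGREGATOR_URL = os.environ.get("AGGREGATOR_URL", "http://localhost:5000")
CHECK_INTERVAL = int(os.environ.get("CHECK_INTERVAL", 30))


def build_event(load1, ncpu, interval):
    """Heartbeat payload: ttl is twice the check interval, so the
    event self-clears if this detector stops refreshing it."""
    priority = 2 if load1 > ncpu else 4
    return {
        "id": "load_average",
        "priority": priority,
        "message": f"Load {load1:.2f}",
        "ttl": interval * 2,
    }


def post_event(event):
    """POST the payload to the aggregator's /event endpoint."""
    req = urllib.request.Request(
        f"{AGGREGATOR_URL}/event",
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


def main():
    while True:
        load1 = os.getloadavg()[0]  # POSIX only
        event = build_event(load1, os.cpu_count(), CHECK_INTERVAL)
        if event["priority"] <= 2:  # only report problem states
            post_event(event)
        time.sleep(CHECK_INTERVAL)
```

Run `main()` as the script's entry point. Because every payload carries a `ttl` of twice the interval, a crashed detector's warning disappears on its own instead of sticking on screen.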
SPEC.md (new file, 79 lines)
@@ -0,0 +1,79 @@
# SPEC.md: Project "Sentry-Emote"

## 1. Overview

**Purpose:** Repurpose an old Pixel phone (OLED screen) as an ambient, glanceable system status monitor for a home server.
**Design Philosophy:** Minimalist, binary-state, and high-signal. Use an "Emote" (ASCII/Emoji) to represent system health instead of complex graphs.
**Target Device:** Android Pixel (accessed via Fully Kiosk Browser).

## 2. System Architecture

The system follows a decoupled **Publisher/Subscriber** model to ensure extensibility.

- **Aggregator (The Broker):** A central Python service running on the server. It manages the event queue and generates the state.
- **Detectors (The Publishers):** Independent scripts (Python, Bash, etc.) that monitor specific system metrics and "hook" into the Aggregator.
- **Emote-UI (The Subscriber):** A mobile-optimized web frontend that displays the current highest-priority emote.

## 3. Data Specification

### 3.1 `status.json` (State Registry)

The Aggregator outputs this file every time the state changes.

```json
{
  "current_state": "optimal",
  "active_emote": "( ^_^)",
  "color": "#00FF00",
  "animation": "breathing",
  "message": "All systems nominal",
  "active_events": [
    {
      "id": "disk_check",
      "priority": 4,
      "message": "Disk 40% full"
    }
  ],
  "last_updated": "2026-02-02T17:30:00"
}
```

### 3.2 Priority Hierarchy

| Level | Name | Priority | Emote | Color | Logic |
| ----- | ------------ | --------- | -------- | ------ | ---------------------------------------- |
| **1** | **Critical** | Emergency | `( x_x)` | Red | Overrules all. Manual clear required. |
| **2** | **Warning** | Caution | `( o_o)` | Yellow | Overrules Optimal. Auto-clears if fixed. |
| **3** | **Notify** | Event | `( 'o')` | Blue | Transient. TTL (Time To Live) of 10s. |
| **4** | **Optimal** | Default | `( ^_^)` | Green | Active when no other events exist. |

## 4. Component Requirements

### 4.1 Aggregator (`aggregator.py`)

- **Event Bus:** Accept HTTP POST requests or watch a specific file/directory for new event signals.
- **State Management:** Maintain a list of "Active Events."
- **TTL Logic:** Automatically remove Priority 3 events after 10 seconds.
- **Prioritization:** If multiple events exist, always select the one with the lowest priority number for the `active_emote` field.

### 4.2 Emote-UI (`index.html`)

- **OLED Optimization:** Pure black background (`#000000`).
- **Glanceability:** Massive centered text for the emote.
- **Animations:**
  - `breathing`: Slow opacity/scale pulse.
  - `shaking`: Rapid X-axis jitter for Critical.
  - `popping`: Scale-up effect for Notifications.
- **Refresh:** Long-polling or `setInterval` every 2 seconds.

### 4.3 Extensibility (The Hook System)

- New detectors must be able to send an event to the Aggregator without modifying the core code.
- Example Detector Hook: `curl -X POST -d '{"id":"ssh","priority":1}' http://localhost:5000/event`

## 5. Implementation Roadmap

1. **Phase 1:** Build the `aggregator.py` with basic JSON output.
2. **Phase 2:** Build the OLED-friendly `index.html` frontend.
3. **Phase 3:** Create the first "Detector" (e.g., a simple disk space checker).
4. **Phase 4:** Implement TTL for transient notifications.
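The prioritization rule from Sections 3.2 and 4.1 (lowest priority number wins; fall back to Optimal when no events exist) reduces to a few lines. A sketch mirroring the hierarchy table above, not the shipped implementation:

```python
# Emotes keyed by priority level, per the hierarchy table in SPEC.md 3.2
PRIORITY_EMOTES = {1: "( x_x)", 2: "( o_o)", 3: "( 'o')", 4: "( ^_^)"}


def select_emote(active_events):
    """Return (priority, emote) for the highest-priority active event.

    active_events maps event id -> {"priority": int, ...}; an empty
    registry falls back to Priority 4 (Optimal).
    """
    if not active_events:
        return 4, PRIORITY_EMOTES[4]
    priority = min(e["priority"] for e in active_events.values())
    return priority, PRIORITY_EMOTES[priority]
```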
aggregator.py (new file, 177 lines)
@@ -0,0 +1,177 @@
"""
Sentry-Emote Aggregator

A lightweight event broker that manages priority-based system status.
"""

import json
import threading
import time
from datetime import datetime
from pathlib import Path

from flask import Flask, request, jsonify, send_from_directory

app = Flask(__name__, static_folder=".")
ROOT_DIR = Path(__file__).parent

# Configuration
STATUS_FILE = ROOT_DIR / "status.json"
DEFAULT_NOTIFY_TTL = 10  # Default TTL for Priority 3 (Notify) events

# Priority definitions
PRIORITY_CONFIG = {
    1: {"name": "Critical", "emote": "( x_x)", "color": "#FF0000", "animation": "shaking"},
    2: {"name": "Warning", "emote": "( o_o)", "color": "#FFFF00", "animation": "breathing"},
    3: {"name": "Notify", "emote": "( 'o')", "color": "#0088FF", "animation": "popping"},
    4: {"name": "Optimal", "emote": "( ^_^)", "color": "#00FF00", "animation": "breathing"},
}

# Thread-safe event storage
events_lock = threading.Lock()
active_events = {}  # id -> {priority, message, timestamp, ttl}


def get_current_state():
    """Determine current state based on active events."""
    with events_lock:
        if not active_events:
            priority = 4
            events_list = []
        else:
            # Find highest priority (lowest number)
            priority = min(e["priority"] for e in active_events.values())
            events_list = [
                {"id": eid, "priority": e["priority"], "message": e.get("message", "")}
                for eid, e in active_events.items()
            ]

    config = PRIORITY_CONFIG[priority]
    return {
        "current_state": config["name"].lower(),
        "active_emote": config["emote"],
        "color": config["color"],
        "animation": config["animation"],
        "message": config["name"] if priority == 4 else f"{config['name']} state active",
        "active_events": sorted(events_list, key=lambda x: x["priority"]),
        "last_updated": datetime.now().isoformat(timespec="seconds"),
    }


def write_status():
    """Write current state to status.json."""
    state = get_current_state()
    with open(STATUS_FILE, "w") as f:
        json.dump(state, f, indent="\t")
    return state


def cleanup_expired_events():
    """Background thread to remove expired TTL events."""
    while True:
        time.sleep(1)
        now = time.time()
        expired = []

        with events_lock:
            for eid, event in active_events.items():
                if event.get("ttl") and now > event["ttl"]:
                    expired.append(eid)
            for eid in expired:
                del active_events[eid]

        # write_status() re-acquires the non-reentrant events_lock,
        # so it must run after the lock is released
        if expired:
            write_status()


@app.route("/event", methods=["POST"])
def post_event():
    """
    Accept a new event.
    Expected JSON: {"id": "event_id", "priority": 1-4, "message": "optional", "ttl": optional_seconds}
    """
    data = request.get_json(force=True)

    if not data or "id" not in data or "priority" not in data:
        return jsonify({"error": "Missing required fields: id, priority"}), 400

    event_id = str(data["id"])
    try:
        priority = int(data["priority"])
    except (TypeError, ValueError):
        return jsonify({"error": f"Invalid priority: {data['priority']}. Must be 1-4."}), 400

    if priority not in PRIORITY_CONFIG:
        return jsonify({"error": f"Invalid priority: {priority}. Must be 1-4."}), 400

    event = {
        "priority": priority,
        "message": data.get("message", ""),
        "timestamp": time.time(),
    }

    # Apply TTL if provided, or use default for Priority 3 (Notify)
    if "ttl" in data:
        event["ttl"] = time.time() + int(data["ttl"])
    elif priority == 3:
        event["ttl"] = time.time() + DEFAULT_NOTIFY_TTL

    with events_lock:
        active_events[event_id] = event

    state = write_status()
    return jsonify({"status": "ok", "current_state": state}), 200


@app.route("/clear", methods=["POST"])
def clear_event():
    """
    Clear an event by ID.
    Expected JSON: {"id": "event_id"}
    """
    data = request.get_json(force=True)

    if not data or "id" not in data:
        return jsonify({"error": "Missing required field: id"}), 400

    event_id = str(data["id"])

    # Delete under the lock, but write status after releasing it:
    # write_status() -> get_current_state() re-acquires events_lock,
    # which would deadlock on a non-reentrant Lock
    with events_lock:
        found = event_id in active_events
        if found:
            del active_events[event_id]

    if not found:
        return jsonify({"error": "Event not found"}), 404

    state = write_status()
    return jsonify({"status": "cleared", "current_state": state}), 200


@app.route("/")
def index():
    """Serve the frontend."""
    return send_from_directory(ROOT_DIR, "index.html")


@app.route("/status", methods=["GET"])
def get_status():
    """Return current status as JSON."""
    return jsonify(get_current_state()), 200


@app.route("/events", methods=["GET"])
def list_events():
    """List all active events."""
    with events_lock:
        return jsonify({"events": dict(active_events)}), 200


def main():
    # Write initial optimal state
    write_status()
    print(f"Status file: {STATUS_FILE}")

    # Start TTL cleanup thread
    cleanup_thread = threading.Thread(target=cleanup_expired_events, daemon=True)
    cleanup_thread.start()

    # Run Flask
    app.run(host="0.0.0.0", port=5000, threaded=True)


if __name__ == "__main__":
    main()
config.json (new file, 57 lines)
@@ -0,0 +1,57 @@
{
  "aggregator_url": "http://localhost:5000",
  "aggregator": {
    "script": "aggregator.py"
  },
  "detectors": [
    {
      "name": "disk_space",
      "enabled": true,
      "script": "detectors/disk_space.py",
      "env": {
        "CHECK_INTERVAL": "300",
        "THRESHOLD_WARNING": "85",
        "THRESHOLD_CRITICAL": "95"
      }
    },
    {
      "name": "cpu",
      "enabled": true,
      "script": "detectors/cpu.py",
      "env": {
        "CHECK_INTERVAL": "30",
        "THRESHOLD_WARNING": "85",
        "THRESHOLD_CRITICAL": "95"
      }
    },
    {
      "name": "memory",
      "enabled": true,
      "script": "detectors/memory.py",
      "env": {
        "CHECK_INTERVAL": "30",
        "THRESHOLD_WARNING": "85",
        "THRESHOLD_CRITICAL": "95"
      }
    },
    {
      "name": "service",
      "enabled": false,
      "script": "detectors/service.py",
      "env": {
        "CHECK_INTERVAL": "30",
        "SERVICES": "nginx,postgres"
      }
    },
    {
      "name": "network",
      "enabled": false,
      "script": "detectors/network.py",
      "env": {
        "CHECK_INTERVAL": "60",
        "HOSTS": "8.8.8.8,google.com",
        "TIMEOUT": "5"
      }
    }
  ]
}
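`sentry.py` is referenced by this config but not included in the diff above. One plausible way a supervisor like it could turn each enabled detector entry into a child-process environment; this is a sketch, and the helper names are hypothetical:

```python
import json
import os
import subprocess
import sys


def detector_env(base_env, aggregator_url, det):
    """Merge a detector's `env` block and the shared aggregator URL
    into a copy of the inherited environment."""
    env = dict(base_env)
    env.update(det.get("env", {}))
    env["AGGREGATOR_URL"] = aggregator_url
    return env


def launch_detectors(config_path="config.json"):
    """Spawn one child process per enabled detector entry."""
    with open(config_path) as f:
        config = json.load(f)
    return [
        subprocess.Popen(
            [sys.executable, det["script"]],
            env=detector_env(os.environ, config["aggregator_url"], det),
        )
        for det in config.get("detectors", [])
        if det.get("enabled")
    ]
```

Passing the merged mapping as `env=` means each detector sees only its own thresholds plus the shared `AGGREGATOR_URL`, matching how the detectors read configuration from environment variables.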
detectors/cpu.py (new file, 83 lines)
@@ -0,0 +1,83 @@
"""
CPU Usage Detector
Monitors CPU usage and reports to the aggregator when thresholds are exceeded.

Environment variables:
    AGGREGATOR_URL     - URL of the aggregator (default: http://localhost:5000)
    CHECK_INTERVAL     - Seconds between checks (default: 30)
    THRESHOLD_CRITICAL - Percent usage for critical alert (default: 95)
    THRESHOLD_WARNING  - Percent usage for warning alert (default: 85)
"""

import os
import time

import psutil
import requests

# Configuration from environment
AGGREGATOR_URL = os.environ.get("AGGREGATOR_URL", "http://localhost:5000")
CHECK_INTERVAL = int(os.environ.get("CHECK_INTERVAL", 30))
THRESHOLD_CRITICAL = int(os.environ.get("THRESHOLD_CRITICAL", 95))
THRESHOLD_WARNING = int(os.environ.get("THRESHOLD_WARNING", 85))

EVENT_ID = "cpu_usage"


def send_event(priority, message):
    """Send an event to the aggregator with heartbeat TTL."""
    ttl = CHECK_INTERVAL * 2
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/event",
            json={"id": EVENT_ID, "priority": priority, "message": message, "ttl": ttl},
            timeout=5,
        )
        print(f"[EVENT] {message} (priority {priority}, ttl {ttl}s) -> {response.status_code}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to send event: {e}")


def clear_event():
    """Clear the event from the aggregator."""
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/clear",
            json={"id": EVENT_ID},
            timeout=5,
        )
        if response.status_code == 200:
            print(f"[CLEAR] {EVENT_ID}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to clear event: {e}")


def main():
    print("CPU Usage Detector started")
    print(f"  Aggregator: {AGGREGATOR_URL}")
    print(f"  Interval: {CHECK_INTERVAL}s")
    print(f"  Thresholds: Warning={THRESHOLD_WARNING}%, Critical={THRESHOLD_CRITICAL}%")
    print()

    alert_active = False

    while True:
        # Get CPU usage over a 1-second sample
        cpu_percent = psutil.cpu_percent(interval=1)

        if cpu_percent >= THRESHOLD_CRITICAL:
            send_event(1, f"CPU at {cpu_percent:.0f}%")
            alert_active = True
        elif cpu_percent >= THRESHOLD_WARNING:
            send_event(2, f"CPU at {cpu_percent:.0f}%")
            alert_active = True
        else:
            print(f"[OK] CPU: {cpu_percent:.0f}%")
            if alert_active:
                clear_event()
                alert_active = False

        time.sleep(CHECK_INTERVAL - 1)  # Account for 1s sample time


if __name__ == "__main__":
    main()
detectors/disk_space.py (new file, 159 lines)
@@ -0,0 +1,159 @@
"""
Disk Space Detector
Monitors all drives and reports to the aggregator when thresholds are exceeded.

Environment variables:
    AGGREGATOR_URL     - URL of the aggregator (default: http://localhost:5000)
    CHECK_INTERVAL     - Seconds between checks (default: 300)
    THRESHOLD_CRITICAL - Percent usage for critical alert (default: 95)
    THRESHOLD_WARNING  - Percent usage for warning alert (default: 85)
"""

import os
import time
import shutil

import requests

# Configuration from environment
AGGREGATOR_URL = os.environ.get("AGGREGATOR_URL", "http://localhost:5000")
CHECK_INTERVAL = int(os.environ.get("CHECK_INTERVAL", 300))
THRESHOLD_CRITICAL = int(os.environ.get("THRESHOLD_CRITICAL", 95))
THRESHOLD_WARNING = int(os.environ.get("THRESHOLD_WARNING", 85))


def get_all_drives():
    """Get list of mounted drives/partitions."""
    import platform
    drives = []

    if platform.system() == "Windows":
        import string
        for letter in string.ascii_uppercase:
            drive = f"{letter}:\\"
            try:
                shutil.disk_usage(drive)
                drives.append(drive)
            except (FileNotFoundError, PermissionError, OSError):
                pass
    else:
        # Linux/macOS - parse /proc/mounts or /etc/mtab for real filesystems
        seen_devices = set()
        try:
            with open("/proc/mounts", "r") as f:
                for line in f:
                    parts = line.split()
                    if len(parts) < 2:
                        continue
                    device, mount = parts[0], parts[1]
                    # Skip virtual filesystems
                    if not device.startswith("/dev/"):
                        continue
                    # Skip duplicate devices (e.g., bind mounts)
                    if device in seen_devices:
                        continue
                    seen_devices.add(device)
                    try:
                        shutil.disk_usage(mount)
                        drives.append(mount)
                    except (FileNotFoundError, PermissionError, OSError):
                        pass
        except FileNotFoundError:
            # Fallback for macOS or systems without /proc/mounts
            for mount in ["/", "/home", "/var"]:
                if os.path.exists(mount):
                    try:
                        shutil.disk_usage(mount)
                        drives.append(mount)
                    except (FileNotFoundError, PermissionError, OSError):
                        pass

    return drives


def check_disk(drive):
    """Check disk usage for a drive. Returns (percent_used, total_gb, used_gb)."""
    try:
        usage = shutil.disk_usage(drive)
        total_gb = usage.total / (1024 ** 3)
        used_gb = usage.used / (1024 ** 3)
        percent = (usage.used / usage.total) * 100
        return percent, total_gb, used_gb
    except Exception:
        return None, None, None


def send_event(event_id, priority, message):
    """Send an event to the aggregator with heartbeat TTL."""
    ttl = CHECK_INTERVAL * 2  # Event expires if not refreshed
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/event",
            json={"id": event_id, "priority": priority, "message": message, "ttl": ttl},
            timeout=5,
        )
        print(f"[EVENT] {event_id}: {message} (priority {priority}, ttl {ttl}s) -> {response.status_code}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to send event: {e}")


def clear_event(event_id):
    """Clear an event from the aggregator."""
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/clear",
            json={"id": event_id},
            timeout=5,
        )
        if response.status_code == 200:
            print(f"[CLEAR] {event_id}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to clear event: {e}")


def main():
    print("Disk Space Detector started")
    print(f"  Aggregator: {AGGREGATOR_URL}")
    print(f"  Interval: {CHECK_INTERVAL}s")
    print(f"  Thresholds: Warning={THRESHOLD_WARNING}%, Critical={THRESHOLD_CRITICAL}%")
    print()

    # Track active alerts to know when to clear
    active_alerts = set()

    while True:
        drives = get_all_drives()
        print(f"[CHECK] Scanning {len(drives)} drive(s)...")

        current_alerts = set()

        for drive in drives:
            percent, total_gb, used_gb = check_disk(drive)
            if percent is None:
                continue

            # Create a clean event ID from the drive path.
            # (Backslashes are not allowed inside f-string expressions
            # before Python 3.12, so build the slug separately.)
            slug = drive.replace(":", "").replace("/", "_").replace("\\", "").strip("_")
            event_id = f"disk_{slug or 'root'}"

            if percent >= THRESHOLD_CRITICAL:
                message = f"{drive} at {percent:.0f}% ({used_gb:.1f}/{total_gb:.1f} GB)"
                send_event(event_id, 1, message)
                current_alerts.add(event_id)
            elif percent >= THRESHOLD_WARNING:
                message = f"{drive} at {percent:.0f}% ({used_gb:.1f}/{total_gb:.1f} GB)"
                send_event(event_id, 2, message)
                current_alerts.add(event_id)
            else:
                print(f"[OK] {drive}: {percent:.0f}%")

        # Clear alerts that are no longer active
        for event_id in active_alerts - current_alerts:
            clear_event(event_id)

        active_alerts = current_alerts

        print(f"[SLEEP] Next check in {CHECK_INTERVAL}s\n")
        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    main()
detectors/memory.py (new file, 85 lines)
@@ -0,0 +1,85 @@
"""
Memory Usage Detector
Monitors RAM usage and reports to the aggregator when thresholds are exceeded.

Environment variables:
    AGGREGATOR_URL     - URL of the aggregator (default: http://localhost:5000)
    CHECK_INTERVAL     - Seconds between checks (default: 30)
    THRESHOLD_CRITICAL - Percent usage for critical alert (default: 95)
    THRESHOLD_WARNING  - Percent usage for warning alert (default: 85)
"""

import os
import time

import psutil
import requests

# Configuration from environment
AGGREGATOR_URL = os.environ.get("AGGREGATOR_URL", "http://localhost:5000")
CHECK_INTERVAL = int(os.environ.get("CHECK_INTERVAL", 30))
THRESHOLD_CRITICAL = int(os.environ.get("THRESHOLD_CRITICAL", 95))
THRESHOLD_WARNING = int(os.environ.get("THRESHOLD_WARNING", 85))

EVENT_ID = "memory_usage"


def send_event(priority, message):
    """Send an event to the aggregator with heartbeat TTL."""
    ttl = CHECK_INTERVAL * 2
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/event",
            json={"id": EVENT_ID, "priority": priority, "message": message, "ttl": ttl},
            timeout=5,
        )
        print(f"[EVENT] {message} (priority {priority}, ttl {ttl}s) -> {response.status_code}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to send event: {e}")


def clear_event():
    """Clear the event from the aggregator."""
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/clear",
            json={"id": EVENT_ID},
            timeout=5,
        )
        if response.status_code == 200:
            print(f"[CLEAR] {EVENT_ID}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to clear event: {e}")


def main():
    print("Memory Usage Detector started")
    print(f"  Aggregator: {AGGREGATOR_URL}")
    print(f"  Interval: {CHECK_INTERVAL}s")
    print(f"  Thresholds: Warning={THRESHOLD_WARNING}%, Critical={THRESHOLD_CRITICAL}%")
    print()

    alert_active = False

    while True:
        mem = psutil.virtual_memory()
        mem_percent = mem.percent
        used_gb = mem.used / (1024 ** 3)
        total_gb = mem.total / (1024 ** 3)

        if mem_percent >= THRESHOLD_CRITICAL:
            send_event(1, f"Memory at {mem_percent:.0f}% ({used_gb:.1f}/{total_gb:.1f} GB)")
            alert_active = True
        elif mem_percent >= THRESHOLD_WARNING:
            send_event(2, f"Memory at {mem_percent:.0f}% ({used_gb:.1f}/{total_gb:.1f} GB)")
            alert_active = True
        else:
            print(f"[OK] Memory: {mem_percent:.0f}% ({used_gb:.1f}/{total_gb:.1f} GB)")
            if alert_active:
                clear_event()
                alert_active = False

        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    main()
115
detectors/network.py
Normal file
115
detectors/network.py
Normal file
@@ -0,0 +1,115 @@
"""
Network/Ping Detector
Monitors if hosts are reachable via ping.

Environment variables:
    AGGREGATOR_URL - URL of the aggregator (default: http://localhost:5000)
    CHECK_INTERVAL - Seconds between checks (default: 60)
    HOSTS          - Comma-separated list of hosts to ping (required)
                     Example: "8.8.8.8,google.com,192.168.1.1"
    TIMEOUT        - Ping timeout in seconds (default: 5)
"""

import os
import sys
import time
import platform
import subprocess
import requests

# Configuration from environment
AGGREGATOR_URL = os.environ.get("AGGREGATOR_URL", "http://localhost:5000")
CHECK_INTERVAL = int(os.environ.get("CHECK_INTERVAL", 60))
HOSTS = os.environ.get("HOSTS", "")
TIMEOUT = int(os.environ.get("TIMEOUT", 5))


def ping(host):
    """Ping a host. Returns True if reachable."""
    param = "-n" if platform.system().lower() == "windows" else "-c"
    timeout_param = "-w" if platform.system().lower() == "windows" else "-W"
    timeout_val = str(TIMEOUT * 1000) if platform.system().lower() == "windows" else str(TIMEOUT)

    try:
        result = subprocess.run(
            ["ping", param, "1", timeout_param, timeout_val, host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=TIMEOUT + 2
        )
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    except Exception:
        return False


def send_event(event_id, priority, message):
    """Send an event to the aggregator with heartbeat TTL."""
    ttl = CHECK_INTERVAL * 2
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/event",
            json={"id": event_id, "priority": priority, "message": message, "ttl": ttl},
            timeout=5
        )
        print(f"[EVENT] {event_id}: {message} (priority {priority}, ttl {ttl}s) -> {response.status_code}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to send event: {e}")


def clear_event(event_id):
    """Clear the event from the aggregator."""
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/clear",
            json={"id": event_id},
            timeout=5
        )
        if response.status_code == 200:
            print(f"[CLEAR] {event_id}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to clear event: {e}")


def main():
    if not HOSTS:
        print("ERROR: HOSTS environment variable is required")
        print("Example: HOSTS=8.8.8.8,google.com python detectors/network.py")
        sys.exit(1)

    hosts = [h.strip() for h in HOSTS.split(",") if h.strip()]

    print("Network/Ping Detector started")
    print(f"  Aggregator: {AGGREGATOR_URL}")
    print(f"  Interval: {CHECK_INTERVAL}s")
    print(f"  Timeout: {TIMEOUT}s")
    print(f"  Monitoring: {', '.join(hosts)}")
    print()

    # Track which hosts have active alerts
    active_alerts = set()

    while True:
        current_alerts = set()

        for host in hosts:
            event_id = f"ping_{host.replace('.', '_').replace(':', '_')}"

            if ping(host):
                print(f"[OK] Host '{host}' is reachable")
            else:
                send_event(event_id, 1, f"Host '{host}' is unreachable")
                current_alerts.add(event_id)

        # Clear alerts for hosts that are now reachable
        for event_id in active_alerts - current_alerts:
            clear_event(event_id)

        active_alerts = current_alerts

        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    main()
108
detectors/service.py
Normal file
@@ -0,0 +1,108 @@
"""
Service Health Detector
Monitors if specific processes/services are running.

Environment variables:
    AGGREGATOR_URL - URL of the aggregator (default: http://localhost:5000)
    CHECK_INTERVAL - Seconds between checks (default: 30)
    SERVICES       - Comma-separated list of process names to monitor (required)
                     Example: "nginx,postgres,redis"
"""

import os
import sys
import time
import psutil
import requests

# Configuration from environment
AGGREGATOR_URL = os.environ.get("AGGREGATOR_URL", "http://localhost:5000")
CHECK_INTERVAL = int(os.environ.get("CHECK_INTERVAL", 30))
SERVICES = os.environ.get("SERVICES", "")


def get_running_processes():
    """Get set of running process names."""
    running = set()
    for proc in psutil.process_iter(['name']):
        try:
            name = proc.info['name']
            if name:
                # Store both with and without common extensions
                running.add(name.lower())
                if name.lower().endswith('.exe'):
                    running.add(name.lower()[:-4])
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            pass
    return running


def send_event(event_id, priority, message):
    """Send an event to the aggregator with heartbeat TTL."""
    ttl = CHECK_INTERVAL * 2
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/event",
            json={"id": event_id, "priority": priority, "message": message, "ttl": ttl},
            timeout=5
        )
        print(f"[EVENT] {event_id}: {message} (priority {priority}, ttl {ttl}s) -> {response.status_code}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to send event: {e}")


def clear_event(event_id):
    """Clear the event from the aggregator."""
    try:
        response = requests.post(
            f"{AGGREGATOR_URL}/clear",
            json={"id": event_id},
            timeout=5
        )
        if response.status_code == 200:
            print(f"[CLEAR] {event_id}")
    except requests.RequestException as e:
        print(f"[ERROR] Failed to clear event: {e}")


def main():
    if not SERVICES:
        print("ERROR: SERVICES environment variable is required")
        print("Example: SERVICES=nginx,postgres,redis python detectors/service.py")
        sys.exit(1)

    services = [s.strip().lower() for s in SERVICES.split(",") if s.strip()]

    print("Service Health Detector started")
    print(f"  Aggregator: {AGGREGATOR_URL}")
    print(f"  Interval: {CHECK_INTERVAL}s")
    print(f"  Monitoring: {', '.join(services)}")
    print()

    # Track which services have active alerts
    active_alerts = set()

    while True:
        running = get_running_processes()
        current_alerts = set()

        for service in services:
            event_id = f"service_{service}"

            if service not in running:
                send_event(event_id, 1, f"Service '{service}' is not running")
                current_alerts.add(event_id)
            else:
                print(f"[OK] Service '{service}' is running")

        # Clear alerts for services that are now running
        for event_id in active_alerts - current_alerts:
            clear_event(event_id)

        active_alerts = current_alerts

        time.sleep(CHECK_INTERVAL)


if __name__ == "__main__":
    main()
122
index.html
Normal file
@@ -0,0 +1,122 @@
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no">
  <title>Sentry-Emote</title>
  <style>
    * {
      margin: 0;
      padding: 0;
      box-sizing: border-box;
    }

    body {
      background: #000000;
      min-height: 100vh;
      display: flex;
      flex-direction: column;
      justify-content: center;
      align-items: center;
      font-family: monospace;
      overflow: hidden;
    }

    #emote {
      font-size: 18vw;
      font-weight: bold;
      text-align: center;
      transition: color 0.3s ease;
    }

    #message {
      font-size: 4vw;
      margin-top: 2vh;
      opacity: 0.7;
      text-align: center;
    }

    /* Breathing animation - slow pulse */
    .breathing {
      animation: breathe 3s ease-in-out infinite;
    }

    @keyframes breathe {
      0%, 100% {
        opacity: 1;
        transform: scale(1);
      }
      50% {
        opacity: 0.7;
        transform: scale(0.98);
      }
    }

    /* Shaking animation - rapid jitter for Critical */
    .shaking {
      animation: shake 0.15s linear infinite;
    }

    @keyframes shake {
      0%, 100% { transform: translateX(0); }
      25% { transform: translateX(-5px); }
      75% { transform: translateX(5px); }
    }

    /* Popping animation - scale up for Notifications */
    .popping {
      animation: pop 1s ease-in-out infinite;
    }

    @keyframes pop {
      0%, 100% {
        transform: scale(1);
      }
      50% {
        transform: scale(1.08);
      }
    }
  </style>
</head>
<body>
  <div id="emote" class="breathing">( ^_^)</div>
  <div id="message">Loading...</div>

  <script>
    const emoteEl = document.getElementById('emote');
    const messageEl = document.getElementById('message');
    const POLL_INTERVAL = 2000;

    async function fetchStatus() {
      try {
        const response = await fetch('/status');
        if (!response.ok) throw new Error('Failed to fetch');
        const data = await response.json();
        updateDisplay(data);
      } catch (err) {
        messageEl.textContent = 'Connection lost...';
      }
    }

    function updateDisplay(data) {
      emoteEl.textContent = data.active_emote;
      emoteEl.style.color = data.color;
      messageEl.style.color = data.color;

      // Show event message if available, otherwise show state
      const topEvent = data.active_events && data.active_events[0];
      messageEl.textContent = (topEvent && topEvent.message) || data.message;

      // Update animation class
      emoteEl.className = '';
      if (data.animation) {
        emoteEl.classList.add(data.animation);
      }
    }

    // Initial fetch and start polling
    fetchStatus();
    setInterval(fetchStatus, POLL_INTERVAL);
  </script>
</body>
</html>
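The frontend above polls `/status` every two seconds and reads `active_emote`, `color`, `message`, `animation`, and `active_events` from the response. A minimal sketch of a payload with that shape — the concrete values here are hypothetical; the aggregator's real emotes, colors, and animation names live in `aggregator.py`:

```python
import json

# Hypothetical /status payload; only the field names are taken from updateDisplay().
status = {
    "active_emote": "(>_<)",     # ASCII face shown in #emote
    "color": "#ff5555",          # applied to both #emote and #message
    "message": "Critical",       # fallback text when no event message exists
    "animation": "shaking",      # CSS class: breathing, shaking, or popping
    "active_events": [           # entry [0] supplies the displayed message
        {"id": "ping_8_8_8_8", "priority": 1,
         "message": "Host '8.8.8.8' is unreachable"},
    ],
}

# Mirror the frontend's message selection: top event message, else state message
top_event = status["active_events"][0] if status["active_events"] else None
shown = (top_event and top_event["message"]) or status["message"]
print(shown)  # → Host '8.8.8.8' is unreachable
```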
14
requirements.txt
Normal file
@@ -0,0 +1,14 @@
blinker==1.9.0
certifi==2026.1.4
charset-normalizer==3.4.4
click==8.3.1
colorama==0.4.6
Flask==3.1.2
idna==3.11
itsdangerous==2.2.0
Jinja2==3.1.6
MarkupSafe==3.0.3
psutil==7.2.2
requests==2.32.5
urllib3==2.6.3
Werkzeug==3.1.5
264
sentry.py
Normal file
@@ -0,0 +1,264 @@
"""
Sentry-Emote
Single entry point for the entire system - aggregator + all detectors.

Usage:
    python sentry.py [--config config.json]
"""

import json
import os
import signal
import subprocess
import sys
import threading
import time
from pathlib import Path

import requests

# Configuration
DEFAULT_CONFIG = "config.json"
RESTART_DELAY = 5
AGGREGATOR_STARTUP_TIMEOUT = 10


class SentryEmote:
    def __init__(self, config_path):
        self.config_path = Path(config_path)
        self.base_dir = self.config_path.parent
        self.processes = {}  # name -> {process, config}
        self.running = True
        self.config = None

    def load_config(self):
        """Load configuration from JSON file."""
        with open(self.config_path) as f:
            self.config = json.load(f)
        return self.config

    def start_process(self, name, script, env=None):
        """Start a Python script as a subprocess."""
        script_path = self.base_dir / script

        if not script_path.exists():
            print(f"[{name}] Script not found: {script_path}")
            return None

        # Build environment
        proc_env = os.environ.copy()
        if env:
            proc_env.update(env)

        try:
            process = subprocess.Popen(
                [sys.executable, "-u", str(script_path)],
                env=proc_env,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                bufsize=1,
                universal_newlines=True,
            )
            # Relay output on a daemon thread; reading the pipe inline would
            # block the supervisor loop whenever a child goes quiet.
            threading.Thread(
                target=self.stream_output, args=(name, process), daemon=True
            ).start()
            print(f"[{name}] Started (PID {process.pid})")
            return process
        except Exception as e:
            print(f"[{name}] Failed to start: {e}")
            return None

    def wait_for_aggregator(self, url, timeout=AGGREGATOR_STARTUP_TIMEOUT):
        """Wait for the aggregator to become available."""
        print(f"[aggregator] Waiting for service at {url}...")
        start = time.time()
        while time.time() - start < timeout:
            try:
                response = requests.get(f"{url}/status", timeout=2)
                if response.status_code == 200:
                    print("[aggregator] Service ready")
                    return True
            except requests.RequestException:
                pass
            time.sleep(0.5)
        print("[aggregator] Timeout waiting for service")
        return False

    def stream_output(self, name, process):
        """Relay a process's output line by line (runs on a daemon thread)."""
        if process.stdout:
            for line in process.stdout:
                print(f"[{name}] {line.rstrip()}")

    def start_aggregator(self):
        """Start the aggregator service."""
        agg_config = self.config.get("aggregator", {})
        script = agg_config.get("script", "aggregator.py")
        env = agg_config.get("env", {})

        process = self.start_process("aggregator", script, env)
        if process:
            self.processes["aggregator"] = {
                "process": process,
                "config": {"name": "aggregator", "script": script, "env": env},
            }
            # Wait for aggregator to be ready
            url = self.config.get("aggregator_url", "http://localhost:5000")
            return self.wait_for_aggregator(url)
        return False

    def start_detectors(self):
        """Start all enabled detectors."""
        url = self.config.get("aggregator_url", "http://localhost:5000")

        for detector in self.config.get("detectors", []):
            if not detector.get("enabled", True):
                continue

            name = detector["name"]
            env = {"AGGREGATOR_URL": url}
            env.update(detector.get("env", {}))

            process = self.start_process(name, detector["script"], env)
            if process:
                self.processes[name] = {
                    "process": process,
                    "config": detector,
                }

    def check_processes(self):
        """Check for crashed processes and restart them."""
        for name, info in list(self.processes.items()):
            process = info["process"]

            # Check if process has exited
            if process.poll() is not None:
                print(f"[{name}] Exited with code {process.returncode}, restarting in {RESTART_DELAY}s...")
                time.sleep(RESTART_DELAY)

                if self.running:
                    config = info["config"]
                    env = {"AGGREGATOR_URL": self.config.get("aggregator_url", "http://localhost:5000")}
                    env.update(config.get("env", {}))

                    new_process = self.start_process(name, config["script"], env)
                    if new_process:
                        self.processes[name]["process"] = new_process

    def stop_all(self):
        """Stop all processes (detectors first, then aggregator)."""
        self.running = False
        print("\nShutting down Sentry-Emote...")

        # Stop detectors first
        for name, info in list(self.processes.items()):
            if name == "aggregator":
                continue
            process = info["process"]
            if process.poll() is None:
                print(f"[{name}] Stopping...")
                process.terminate()
                try:
                    process.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    process.kill()

        # Stop aggregator last
        if "aggregator" in self.processes:
            process = self.processes["aggregator"]["process"]
            if process.poll() is None:
                print("[aggregator] Stopping...")
                process.terminate()
                try:
                    process.wait(timeout=5)
                except subprocess.TimeoutExpired:
                    process.kill()

        print("Sentry-Emote stopped.")

    def run(self):
        """Main run loop."""
        self.load_config()

        print("=" * 50)
        print("  Sentry-Emote")
        print("=" * 50)
        print(f"Config: {self.config_path}")
        print(f"Aggregator URL: {self.config.get('aggregator_url')}")
        print()

        enabled = [d["name"] for d in self.config.get("detectors", []) if d.get("enabled", True)]
        disabled = [d["name"] for d in self.config.get("detectors", []) if not d.get("enabled", True)]

        print(f"Detectors enabled: {', '.join(enabled) or 'none'}")
        if disabled:
            print(f"Detectors disabled: {', '.join(disabled)}")
        print()

        # Start aggregator first
        if not self.start_aggregator():
            print("Failed to start aggregator, exiting.")
            self.stop_all()
            return

        # Give it a moment to initialize
        time.sleep(1)

        # Start detectors
        self.start_detectors()

        print()
        print("=" * 50)
        print(f"  UI available at: {self.config.get('aggregator_url')}")
        print("=" * 50)
        print()

        try:
            while self.running:
                self.check_processes()
                time.sleep(1)
        except KeyboardInterrupt:
            pass
        finally:
            self.stop_all()


def main():
    # Parse arguments
    config_path = DEFAULT_CONFIG
    if len(sys.argv) > 1:
        if sys.argv[1] in ("-h", "--help"):
            print(__doc__)
            sys.exit(0)
        elif sys.argv[1] == "--config" and len(sys.argv) > 2:
            config_path = sys.argv[2]
        else:
            config_path = sys.argv[1]

    # Resolve config path
    config_path = Path(config_path)
    if not config_path.is_absolute():
        config_path = Path(__file__).parent / config_path

    if not config_path.exists():
        print(f"Config file not found: {config_path}")
        sys.exit(1)

    # Setup signal handlers
    sentry = SentryEmote(config_path)

    def signal_handler(sig, frame):
        sentry.stop_all()
        sys.exit(0)

    signal.signal(signal.SIGINT, signal_handler)
    signal.signal(signal.SIGTERM, signal_handler)

    # Run
    sentry.run()


if __name__ == "__main__":
    main()
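sentry.py reads `aggregator_url`, an optional `aggregator` section, and a `detectors` list (each entry with `name`, `script`, and optional `env` and `enabled`) from config.json. A sketch of a matching config, built and emitted as JSON — the detector entries and env values here are illustrative, not the repository's shipped config:

```python
import json

# Keys mirror what sentry.py reads; concrete detectors/env values are examples.
config = {
    "aggregator_url": "http://localhost:5000",
    "aggregator": {"script": "aggregator.py", "env": {}},
    "detectors": [
        {"name": "memory", "script": "detectors/memory.py", "enabled": True},
        {"name": "network", "script": "detectors/network.py",
         "env": {"HOSTS": "8.8.8.8,192.168.1.1"}},
        {"name": "service", "script": "detectors/service.py",
         "env": {"SERVICES": "nginx"}, "enabled": False},
    ],
}
print(json.dumps(config, indent=2))  # contents of a candidate config.json

# "enabled" defaults to True when omitted, matching detector.get("enabled", True)
enabled = [d["name"] for d in config["detectors"] if d.get("enabled", True)]
print(enabled)  # → ['memory', 'network']
```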