Project Specification: LLM-Powered Monitoring Agent

1. Project Goal

The primary goal of this project is to develop a self-contained Python script, monitor_agent.py, that functions as a monitoring agent. This agent will collect system and network data, use a locally hosted Large Language Model (LLM) to analyze the data for anomalies, and send alerts through Discord and Home Assistant if an anomaly is detected.

2. Core Components

The project will be composed of the following files:

  • monitor_agent.py: The main Python script containing the core logic for data collection, analysis, and alerting.
  • config.py: A configuration file to store sensitive information and settings, such as API keys and URLs.
  • requirements.txt: A file listing all the necessary Python libraries for the project.
  • README.md: A documentation file providing an overview of the project, setup instructions, and usage examples.
  • .gitignore: A file to specify which files and directories should be ignored by Git.
  • PROGRESS.md: A file to track the development progress of the project.
  • data_storage.py: Handles loading, storing, and calculating baselines from historical data.
  • CONSTRAINTS.md: Defines constraints and guidelines for the LLM's analysis.
  • known_issues.json: A JSON file containing a list of known issues to be considered by the LLM.
  • AGENTS.md: Documents the human and autonomous agents involved in the project.

3. Functional Requirements

3.1. Configuration

  • The agent must load configuration from config.py.
  • The configuration shall include placeholders for:
    • DISCORD_WEBHOOK_URL
    • HOME_ASSISTANT_URL
    • HOME_ASSISTANT_TOKEN
    • GOOGLE_HOME_SPEAKER_ID
    • DAILY_RECAP_TIME
    • NMAP_TARGETS
    • NMAP_SCAN_OPTIONS
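
A minimal `config.py` layout matching the placeholders above might look like the following; all values here are illustrative and must be replaced with your own:

```python
# config.py — example layout with placeholder values (illustrative only)
DISCORD_WEBHOOK_URL = "https://discord.com/api/webhooks/YOUR_WEBHOOK_ID/YOUR_TOKEN"
HOME_ASSISTANT_URL = "http://homeassistant.local:8123"
HOME_ASSISTANT_TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"
GOOGLE_HOME_SPEAKER_ID = "media_player.living_room_speaker"
DAILY_RECAP_TIME = "08:00"            # 24-hour HH:MM
NMAP_TARGETS = "192.168.1.0/24"       # hosts or CIDR range to scan
NMAP_SCAN_OPTIONS = "-sT --top-ports 100"
TEST_MODE = False                     # run once instead of looping (see section 7)
```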

3.2. Data Ingestion and Parsing

  • The agent must be able to collect and parse system logs (syslog and auth.log).
  • The agent must be able to collect and parse network metrics.
  • The parsing of this data should result in a structured format (JSON or Python dictionary).
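
As a sketch of the structured-format requirement, a hypothetical helper could turn one `auth.log` line into a dictionary (the regex here is an assumption based on the common syslog line layout, not the project's actual parser):

```python
import re
from typing import Optional

# Assumed auth.log layout: "Sep 14 22:01:00 host process[pid]: message"
AUTH_PATTERN = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s[\d:]+)\s(?P<host>\S+)\s"
    r"(?P<process>[\w\-/]+)(?:\[(?P<pid>\d+)\])?:\s(?P<message>.*)$"
)

def parse_auth_line(line: str) -> Optional[dict]:
    """Parse one auth.log line into a dict, or None if it doesn't match."""
    match = AUTH_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```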

3.3. Monitored Metrics

  • CPU Temperature: The agent will monitor the CPU temperature.
  • GPU Temperature: The agent will monitor the GPU temperature.
  • System Login Attempts: The agent will monitor system login attempts.
  • Network Scan Results (Nmap): The agent will periodically perform Nmap scans to discover hosts and open ports, logging detailed information including IP addresses, host status, and open ports with service details.
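
The Nmap results could be reduced to the structured form the LLM consumes with a helper like this. The input shape mirrors python-nmap's per-host dictionaries (`status`/`tcp` keys), which is an assumption about that library's output; the actual scan would be run with `nmap.PortScanner().scan(hosts=NMAP_TARGETS, arguments=NMAP_SCAN_OPTIONS)`:

```python
def summarize_scan(scan: dict) -> dict:
    """Collapse python-nmap style per-host results into {host: {state, open_ports}}.

    `scan` maps each IP to a dict with a "status" entry and an optional "tcp"
    entry of {port: {"state": ..., "name": ...}} (sketch; shape assumed).
    """
    summary = {}
    for host, info in scan.items():
        summary[host] = {
            "state": info.get("status", {}).get("state", "unknown"),
            "open_ports": [
                {"port": port, "service": details.get("name", "")}
                for port, details in info.get("tcp", {}).items()
                if details.get("state") == "open"
            ],
        }
    return summary
```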

3.4. LLM Analysis

  • The agent must use a local LLM (via Ollama) to analyze the collected data.
  • The agent must construct a specific prompt to guide the LLM in identifying anomalies, incorporating historical baselines and known issues.
  • The LLM's response will be a structured JSON object with severity (high, medium, low, none) and reason fields.
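
The prompt construction and response validation could be sketched as follows. The prompt wording and field names are illustrative; the commented-out `ollama.chat` call shows how the local model would be invoked:

```python
import json

def build_prompt(metrics: dict, baseline: dict, known_issues: list) -> str:
    """Assemble the analysis prompt (sketch; wording is illustrative)."""
    return (
        "You are a monitoring assistant. Compare the current metrics against "
        "the baseline, ignore the known issues, and respond ONLY with a JSON "
        'object of the form {"severity": "high|medium|low|none", "reason": "..."}.\n'
        f"Current metrics: {json.dumps(metrics)}\n"
        f"Baseline: {json.dumps(baseline)}\n"
        f"Known issues: {json.dumps(known_issues)}"
    )

def parse_llm_response(raw: str) -> dict:
    """Validate the LLM's JSON reply, falling back to 'none' on garbage."""
    try:
        data = json.loads(raw)
        if data.get("severity") in {"high", "medium", "low", "none"}:
            return {"severity": data["severity"], "reason": data.get("reason", "")}
    except json.JSONDecodeError:
        pass
    return {"severity": "none", "reason": "unparseable LLM response"}

# The actual call (requires a local Ollama instance) would look like:
#   import ollama
#   reply = ollama.chat(model="llama3.1:8b",
#                       messages=[{"role": "user", "content": build_prompt(m, b, k)}])
#   result = parse_llm_response(reply["message"]["content"])
```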

3.5. Alerting

  • The agent must be able to send alerts to a Discord webhook.
  • The agent must be able to trigger a text-to-speech (TTS) alert on a Google Home speaker via Home Assistant.
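
Both channels are plain HTTP calls. A sketch is below: the Discord webhook takes a JSON body with a `content` field (the `discord-webhook` library wraps the same request), and the Home Assistant call posts to a TTS service endpoint. The `tts/google_translate_say` service path is an assumption; substitute whichever TTS service your Home Assistant instance exposes:

```python
def build_tts_payload(speaker_id: str, message: str) -> dict:
    """Service-call body for a Home Assistant TTS service (field names assumed)."""
    return {"entity_id": speaker_id, "message": message}

def send_discord_alert(webhook_url: str, message: str) -> None:
    """Post an alert to a Discord webhook (sketch)."""
    import requests  # third-party; listed in requirements.txt
    requests.post(webhook_url, json={"content": message}, timeout=10)

def send_tts_alert(ha_url: str, token: str, speaker_id: str, message: str) -> None:
    """Trigger a TTS announcement through Home Assistant's REST API (sketch)."""
    import requests
    requests.post(
        f"{ha_url}/api/services/tts/google_translate_say",  # service path assumed
        headers={"Authorization": f"Bearer {token}"},
        json=build_tts_payload(speaker_id, message),
        timeout=10,
    )
```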

3.6. Alerting Logic

  • Immediate alerts (Discord and Home Assistant) will only be sent for "high" severity anomalies.
  • A daily recap of all anomalies (high, medium, and low) will be sent at a configurable time.
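
This severity gating can be expressed directly; a minimal sketch:

```python
def should_alert_immediately(severity: str) -> bool:
    """Only 'high' severity anomalies trigger immediate Discord/HA alerts."""
    return severity == "high"

def build_daily_recap(anomalies: list) -> str:
    """Summarize the day's high, medium, and low anomalies for the recap."""
    lines = [
        f"- [{a['severity'].upper()}] {a['reason']}"
        for a in anomalies
        if a["severity"] in {"high", "medium", "low"}
    ]
    if not lines:
        return "Daily recap: no anomalies detected."
    return "Daily recap:\n" + "\n".join(lines)
```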

3.7. Main Loop

  • The agent will run in a continuous loop.
  • The loop will execute the data collection, analysis, and alerting steps periodically.
  • The frequency of the monitoring loop will be configurable.

4. Data Storage and Baselining

  • 4.1. Data Storage: The agent will store historical monitoring data in a JSON file (monitoring_data.json).
  • 4.2. Baselining: The agent will calculate baseline averages for key metrics (e.g., RTT, packet loss, temperatures, open ports) from the stored historical data. This baseline will be used by the LLM to improve anomaly detection accuracy.
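
The baseline computation is a per-metric average over the stored samples. A sketch, with illustrative metric key names (the real keys depend on what `data_storage.py` records in `monitoring_data.json`):

```python
from statistics import mean

def calculate_baseline(history: list,
                       keys: tuple = ("rtt_ms", "packet_loss_pct", "cpu_temp_c")) -> dict:
    """Average each numeric metric across historical samples (sketch).

    `history` is the list of sample dicts loaded from monitoring_data.json;
    samples missing a key, or with non-numeric values, are skipped.
    """
    baseline = {}
    for key in keys:
        values = [s[key] for s in history if isinstance(s.get(key), (int, float))]
        if values:
            baseline[key] = round(mean(values), 2)
    return baseline
```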

5. Technical Requirements

  • Language: Python 3.8+
  • LLM: llama3.1:8b running on a local Ollama instance.
  • Prerequisites: nmap, lm-sensors
  • Libraries:
    • ollama
    • discord-webhook
    • requests
    • syslog-rfc5424-parser
    • pingparsing
    • python-nmap

6. Project Structure

/
├── .gitignore
├── AGENTS.md
├── config.py
├── CONSTRAINTS.md
├── data_storage.py
├── known_issues.json
├── log_position.txt
├── auth_log_position.txt
├── monitor_agent.py
├── PROMPT.md
├── README.md
├── requirements.txt
├── PROGRESS.md
└── SPEC.md

7. Testing and Debugging

The script includes a test mode that runs the monitoring cycle once instead of continuously. To enable it, set the TEST_MODE variable in config.py to True; once finished testing, set it back to False.


8. Future Enhancements

8.1. Process Monitoring

Description: The agent will be able to monitor a list of critical processes to ensure they are running. If a process is not running, an anomaly will be generated.

Implementation Plan:

  1. Configuration: Add a new list variable to config.py named PROCESSES_TO_MONITOR which will contain the names of the processes to be monitored.
  2. Data Ingestion: Create a new function in monitor_agent.py called get_running_processes() that uses the psutil library to get a list of all running processes.
  3. Data Analysis: In analyze_data_locally(), compare the list of running processes with the PROCESSES_TO_MONITOR list from the configuration. If a process from the configured list is not found in the running processes, generate a "high" severity anomaly.
  4. LLM Integration: The existing generate_llm_report() function will be used to generate a report for the new anomaly type.
  5. Alerting: The existing alerting system will be used to send alerts for the new anomaly type.
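
The first three steps of this plan could be sketched as follows. The `psutil` import is deferred into the collection function so the comparison logic stands alone (note that `psutil` would also need to be added to requirements.txt):

```python
def get_running_processes() -> set:
    """Names of all currently running processes (requires psutil)."""
    import psutil  # third-party; would need adding to requirements.txt
    return {p.info["name"] for p in psutil.process_iter(["name"])}

def find_missing_processes(required: list, running: set) -> list:
    """Generate a 'high' severity anomaly for each required process not running."""
    return [
        {"severity": "high", "reason": f"Monitored process '{name}' is not running"}
        for name in required
        if name not in running
    ]
```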

8.2. Docker Container Monitoring

Description: The agent will be able to monitor a list of critical Docker containers to ensure they are running and healthy. If a container is not running or is in an unhealthy state, an anomaly will be generated.

Implementation Plan:

  1. Configuration: Add a new list variable to config.py named DOCKER_CONTAINERS_TO_MONITOR which will contain the names of the Docker containers to be monitored.
  2. Data Ingestion: Create a new function in monitor_agent.py called get_docker_container_status() that uses the docker Python library to get the status of all running containers.
  3. Data Analysis: In analyze_data_locally(), iterate through the DOCKER_CONTAINERS_TO_MONITOR list and check each container's status. If a container is missing or its status is not "running", generate a "high" severity anomaly.
  4. LLM Integration: The existing generate_llm_report() function will be used to generate a report for the new anomaly type.
  5. Alerting: The existing alerting system will be used to send alerts for the new anomaly type.
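
A sketch of the collection and analysis steps, using the Docker SDK for Python. The import is deferred so the status-comparison logic stands alone; the collection function requires the `docker` package and access to the Docker socket:

```python
def get_docker_container_status() -> dict:
    """Map container name -> status via the Docker SDK (requires 'docker' package)."""
    import docker  # third-party; would need adding to requirements.txt
    client = docker.from_env()
    return {c.name: c.status for c in client.containers.list(all=True)}

def find_unhealthy_containers(required: list, statuses: dict) -> list:
    """Generate a 'high' severity anomaly for each required container that is
    missing or not in the 'running' state."""
    anomalies = []
    for name in required:
        status = statuses.get(name)
        if status != "running":
            anomalies.append({
                "severity": "high",
                "reason": f"Container '{name}' is {status or 'not found'}",
            })
    return anomalies
```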