Completed NMAP & Refactoring
This commit is contained in:
147
README.md
@@ -1,104 +1,93 @@
# LLM-Powered Monitoring Agent
This project implements an LLM-powered monitoring agent designed to continuously collect system and network data, analyze it against historical baselines, and alert on anomalies. The agent leverages a local Large Language Model (LLM) for intelligent anomaly detection and integrates with Discord and Google Home for notifications.

## Features

- **System Log Monitoring**: Tracks new entries in `/var/log/syslog` and `/var/log/auth.log` (for login attempts).
- **Network Metrics**: Gathers network performance data by pinging a public IP (e.g., 8.8.8.8).
- **Hardware Monitoring**: Collects CPU and GPU temperature data.
- **Nmap Scanning**: Periodically performs network scans to discover hosts and open ports.
- **Historical Baseline Analysis**: Compares current data against a 24-hour rolling baseline to identify deviations.
- **LLM-Powered Anomaly Detection**: Utilizes a local LLM (Ollama with Llama3.1) to analyze combined system data, baselines, and Nmap changes for anomalies.
- **Alerting**: Sends high-severity anomaly alerts to Discord and Google Home speakers (via Home Assistant).
- **Daily Recap**: Provides a daily summary of detected events.

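The 24-hour rolling baseline can be thought of as a windowed average over stored history. The sketch below is illustrative only; the real logic lives in `data_storage.py`, and the record field names here are assumptions:

```python
import time

def rolling_baseline(history, field, window_seconds=24 * 3600, now=None):
    """Average a numeric field over records whose timestamp falls inside the window."""
    now = time.time() if now is None else now
    values = [r[field] for r in history
              if field in r and now - r["timestamp"] <= window_seconds]
    return sum(values) / len(values) if values else None
```

Records older than the window simply drop out of the average, so the baseline tracks the last day of behavior.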

## Recent Improvements

- **Enhanced Nmap Data Logging**: The Nmap scan results are now processed and stored in a more structured format, including:
  - Discovered IP addresses.
  - Status of each host.
  - Detailed list of open ports for each host, including service, product, and version information.

  This significantly improves the clarity and utility of Nmap data for anomaly detection.
- **Code Refactoring (`monitor_agent.py`)**:
  - **Optimized Sensor Data Collection**: CPU and GPU temperature data are now collected with a single call to the `sensors` command, improving efficiency.
  - **Efficient Login Attempt Logging**: The agent now tracks its position in `/var/log/auth.log`, preventing redundant reads of the entire file and improving performance on large log files.
  - **Modular Main Loop**: The core monitoring logic has been broken down into smaller, more focused functions, enhancing readability and maintainability.
  - **Separated LLM Prompt Building**: The LLM prompt construction logic has been moved into a dedicated function, keeping `analyze_data_with_llm` focused on analysis.
- **Code Refactoring (`data_storage.py`)**:
  - **Streamlined Baseline Calculations**: Helper functions reduce code duplication and improve clarity in the calculation of average baseline metrics.

## Setup and Installation

### Prerequisites

- Python 3.8 or newer
- `ollama` installed and running with the `llama3.1:8b` model pulled (`ollama pull llama3.1:8b`)
- `nmap` installed
- `lm-sensors` installed (for CPU/GPU temperature monitoring)
- A Discord webhook URL
- (Optional) Home Assistant instance with a long-lived access token and a Google Home speaker configured.
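A quick sanity check that the external tools above are on the `PATH` before the first run (a minimal sketch; this helper is not part of the agent itself):

```python
import shutil

def check_prerequisites(tools=("ollama", "nmap", "sensors")):
    """Return the required external commands that are missing from PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

missing = check_prerequisites()
if missing:
    print("Missing required tools:", ", ".join(missing))
```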

### Installation

1. Clone the repository:
```bash
git clone <repository_url>
cd LLM-Powered-Monitoring-Agent
```
2. Create and activate a virtual environment (optional, but recommended) and install the Python dependencies:
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
```
3. Configure the agent:
- Open `config.py` and update the following variables:
  - `DISCORD_WEBHOOK_URL`: your Discord channel's webhook URL, used to send alerts.
  - `HOME_ASSISTANT_URL` (if using Google Home alerts): the URL of your Home Assistant instance (e.g., `http://192.168.1.50:8123`).
  - `HOME_ASSISTANT_TOKEN` (if using Google Home alerts): a Long-Lived Access Token, generated in your Home Assistant profile settings.
  - `GOOGLE_HOME_SPEAKER_ID` (if using Google Home alerts): the `media_player` entity ID of your speaker (e.g., `media_player.kitchen_speaker`).
  - `NMAP_TARGETS`: scan targets (e.g., "192.168.1.0/24" or "192.168.1.100").
  - `NMAP_SCAN_OPTIONS`: scan flags (default is "-sS -T4").
  - `DAILY_RECAP_TIME`: when to send the daily recap (e.g., "20:00" for 8 PM).
  - `TEST_MODE`: set to `True` for a single run, `False` for continuous operation.

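A filled-in `config.py` might look like the following (all values are placeholders; only the variable names come from the list above):

```python
# config.py -- placeholder values, replace with your own
DISCORD_WEBHOOK_URL = "https://discord.com/api/webhooks/<id>/<token>"
HOME_ASSISTANT_URL = "http://192.168.1.50:8123"          # only needed for Google Home alerts
HOME_ASSISTANT_TOKEN = "<long-lived-access-token>"       # only needed for Google Home alerts
GOOGLE_HOME_SPEAKER_ID = "media_player.kitchen_speaker"  # only needed for Google Home alerts
NMAP_TARGETS = "192.168.1.0/24"
NMAP_SCAN_OPTIONS = "-sS -T4"
DAILY_RECAP_TIME = "20:00"  # 8 PM
TEST_MODE = True            # single run; set to False for continuous operation
```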

## Usage

To run the monitoring agent:

```bash
python monitor_agent.py
```

The script starts a continuous monitoring loop. Every 5 minutes, it will:

1. Collect system and network data.
2. Send the data to the local LLM for analysis.
3. If the LLM detects a **high-severity** anomaly, send an alert to your configured Discord channel and broadcast a message to your Google Home speaker via Home Assistant.
4. At the time specified in `DAILY_RECAP_TIME`, send a summary of the day's anomalies to the Discord channel.

The script prints its status and any detected anomalies to the console.

### Test Mode

Set `TEST_MODE = True` in `config.py` to run the agent once and exit. This is useful for testing configuration and initial setup.

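The high-severity gate described above can be sketched as follows (the notification helpers are stand-ins for the real Discord/Home Assistant calls in `monitor_agent.py`):

```python
sent = []

def send_discord_alert(msg):
    """Placeholder: the real agent posts to DISCORD_WEBHOOK_URL."""
    sent.append(("discord", msg))

def broadcast_google_home(msg):
    """Placeholder: the real agent calls Home Assistant's media player service."""
    sent.append(("google_home", msg))

def dispatch_alert(anomaly):
    """Only high-severity anomalies trigger notifications."""
    if anomaly.get("severity") == "high":
        send_discord_alert(anomaly["description"])
        broadcast_google_home(anomaly["description"])
        return True
    return False
```

Medium- and low-severity findings are recorded for the daily recap but do not page anyone, which keeps alert noise down.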
## Extending and Customizing

- **Adding New Metrics**: Add new data collection functions in `monitor_agent.py` and include their results in the `combined_data` dictionary.
- **Customizing LLM Analysis**: Modify the `CONSTRAINTS.md` file to give the LLM specific instructions or constraints for anomaly detection.
- **Known Issues**: Update `known_issues.json` with known or expected system behaviors to prevent the LLM from flagging them as anomalies.
- **Alerting Mechanisms**: Implement additional alerting functions (e.g., email, SMS) in `monitor_agent.py` and integrate them into the anomaly detection logic.

### Nmap Scans

The agent uses `nmap` to scan the network for open ports. By default it performs a TCP SYN scan (`-sS`), which requires root privileges. If the script is not run as root, it falls back to a TCP connect scan (`-sT`), which needs no special privileges but is slower and more likely to be detected.

To run the agent with root privileges, use `sudo`:

```bash
sudo python monitor_agent.py
```

## Priority System

The agent classifies anomalies by severity. The LLM is instructed to return one of the following levels for each anomaly it detects:

- **high**: A critical issue that requires immediate attention. An alert is sent to Discord and Google Home.
- **medium**: A non-critical issue that should be investigated. No alert is sent.
- **low**: A minor issue or a potential false positive. No alert is sent.
- **none**: No anomaly was detected.

## Known Issues Feed

The `known_issues.json` file gives the LLM a list of known issues and their resolutions, so resolved or expected behavior is not flagged as anomalous.

Add new entries by following the existing format; each entry has an "issue" and a "resolution" key. For example:

```json
[
  {
    "issue": "CPU temperature spikes to 80C under heavy load",
    "resolution": "This is normal behavior for this CPU model and is not a cause for concern."
  }
]
```

## Project Structure

- `monitor_agent.py`: Main script for data collection, LLM interaction, and alerting.
- `data_storage.py`: Handles loading, storing, and calculating baselines from historical data.
- `config.py`: Stores configurable parameters for the agent.
- `requirements.txt`: Lists Python dependencies.
- `CONSTRAINTS.md`: Defines constraints and guidelines for the LLM's analysis.
- `known_issues.json`: A JSON file containing a list of known issues to be considered by the LLM.
- `monitoring_data.json`: (Generated) Stores historical monitoring data.
- `log_position.txt`: (Generated) Stores the last read position for `/var/log/syslog`.
- `auth_log_position.txt`: (Generated) Stores the last read position for `/var/log/auth.log`.