Compare commits

...

2 Commits

Author SHA1 Message Date
6f7e99639c Attempting to remove the LLM out of processing 2025-08-23 19:03:40 -05:00
bebedb1e15 Trying to help the LLM 2025-08-23 16:04:49 -05:00
8 changed files with 188 additions and 67 deletions

2
.gitignore vendored
View File

@@ -3,4 +3,4 @@ __pycache__/
monitoring_data.json
log_position.txt
auth_log_position.txt
monitoring_agent.log
monitoring_agent.log*

View File

@@ -61,13 +61,10 @@
37. [x] Update `README.md` with current project status and improvements.
38. [x] Create `AGENTS.md` to document human and autonomous agents.
## Keeping track of Current Objectives
[x] Improve "high" priority detection by explicitly instructing LLM to output severity in structured JSON format.
[x] Implement dynamic contextual information (Known/Resolved Issues Feed) for LLM to improve severity detection.
## TODO
## Previous TODO
- [x] Improve "high" priority detection by explicitly instructing LLM to output severity in structured JSON format.
- [x] Implement dynamic contextual information (Known/Resolved Issues Feed) for LLM to improve severity detection.
- [x] Change baseline calculations to only use integers instead of floats.
- [x] Add a log file that only keeps records for the past 24 hours.
- [x] Log all LLM responses to the console.
@@ -75,4 +72,25 @@
- [x] Get hostnames of devices in Nmap scan.
- [x] Filter out RTT fluctuations below 10 seconds.
- [x] Filter out temperature fluctuations with differences less than 5 degrees.
- [ ] Create a list of known port numbers and their applications for the LLM to check against to see if an open port is a threat
- [x] Create a list of known port numbers and their applications for the LLM to check against to see if an open port is a threat
- [x] When calculating averages, please round up to the nearest integer. We only want to deliver whole integers to the LLM to process, and nothing with decimal points. It gets confused with decimal points.
- [x] In the discord message, please include the exact specific details and the log of the problem that prompted the alert
## TODO
## Phase 7: Offloading Analysis from LLM
39. [x] Create a new function `analyze_data_locally` in `monitor_agent.py`.
39.1. [x] This function will take `data`, `baselines`, `known_issues`, and `port_applications` as input.
39.2. [x] It will contain the logic to compare current data with baselines and predefined thresholds.
39.3. [x] It will be responsible for identifying anomalies for various metrics (CPU/GPU temp, network RTT, failed logins, Nmap changes).
39.4. [x] It will return a list of dictionaries, where each dictionary represents an anomaly and contains 'severity' and 'reason' keys.
40. [x] Refactor `analyze_data_with_llm` into a new function called `generate_llm_report`.
40.1. [x] This function will take the list of anomalies from `analyze_data_locally` as input.
40.2. [x] It will construct a simple prompt to ask the LLM to generate a human-readable summary of the anomalies.
40.3. [x] The LLM will no longer be making analytical decisions.
41. [x] Update `run_monitoring_cycle` to orchestrate the new workflow.
41.1. [x] Call `analyze_data_locally` to get the list of anomalies.
41.2. [x] If anomalies are found, call `generate_llm_report` to create the report.
41.3. [x] Use the output of `generate_llm_report` for alerting.
42. [x] Remove the detailed analytical instructions from `build_llm_prompt` as they will be handled by `analyze_data_locally`.

View File

@@ -9,7 +9,7 @@ HOME_ASSISTANT_TOKEN = "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJjOGRmZjI
GOOGLE_HOME_SPEAKER_ID = "media_player.spencer_room_speaker"
# Daily Recap Time (in 24-hour format, e.g., "20:00")
DAILY_RECAP_TIME = "20:00"
DAILY_RECAP_TIME = "18:28"
# Nmap Configuration
NMAP_TARGETS = "192.168.2.0/24"

View File

@@ -1,6 +1,7 @@
import json
import os
from datetime import datetime, timedelta, timezone
import math
DATA_FILE = 'monitoring_data.json'
@@ -19,7 +20,7 @@ def store_data(new_data):
def _calculate_average(data, key1, key2):
"""Helper function to calculate the average of a nested key in a list of dicts."""
values = [d[key1][key2] for d in data if key1 in d and key2 in d[key1] and d[key1][key2] != "N/A"]
return int(sum(values) / len(values)) if values else 0
return math.ceil(sum(values) / len(values)) if values else 0
def calculate_baselines():
data = load_data()

View File

@@ -15,6 +15,8 @@ import nmap
import logging
from logging.handlers import TimedRotatingFileHandler
import schedule
# Load configuration
import config
@@ -190,67 +192,102 @@ def get_nmap_scan_results():
logger.error(f"Error performing Nmap scan: {e}")
return {"error": "Nmap scan failed"}
# --- LLM Interaction Function ---
# --- Data Analysis ---
def build_llm_prompt(data, baselines, nmap_changes, constraints, known_issues):
"""Builds the prompt for the LLM analysis."""
return f"""
**Role:** You are a dedicated and expert system administrator. Your primary role is to identify anomalies and provide concise, actionable reports.
def analyze_data_locally(data, baselines, known_issues, port_applications):
    """Analyzes the collected data to find anomalies without using an LLM.

    Compares the current monitoring snapshot against stored baselines and
    fixed thresholds, returning a list of anomaly dicts, each with a
    'severity' ("high"/"medium") and a human-readable 'reason'. An empty
    list means nothing noteworthy was found.

    NOTE(review): `known_issues` is accepted but never consulted in this
    implementation — presumably intended for future filtering of
    known/resolved issues; confirm before removing it.
    """
    anomalies = []

    # Temperature checks — only flag deviations greater than 5 degrees
    # from baseline (smaller fluctuations are considered noise).
    cpu_temp = data.get("cpu_temperature", {}).get("cpu_temperature")
    gpu_temp = data.get("gpu_temperature", {}).get("gpu_temperature")
    baseline_cpu_temp = baselines.get("average_cpu_temperature")
    baseline_gpu_temp = baselines.get("average_gpu_temperature")

    # isinstance guards skip "N/A"/missing readings without raising.
    if isinstance(cpu_temp, (int, float)) and isinstance(baseline_cpu_temp, (int, float)):
        if abs(cpu_temp - baseline_cpu_temp) > 5:
            anomalies.append({
                "severity": "medium",
                "reason": f"CPU temperature deviation detected. Current: {cpu_temp}°C, Baseline: {baseline_cpu_temp}°C"
            })

    if isinstance(gpu_temp, (int, float)) and isinstance(baseline_gpu_temp, (int, float)):
        if abs(gpu_temp - baseline_gpu_temp) > 5:
            anomalies.append({
                "severity": "medium",
                "reason": f"GPU temperature deviation detected. Current: {gpu_temp}°C, Baseline: {baseline_gpu_temp}°C"
            })

    # Network RTT check — threshold of 10000 corresponds to the project's
    # "ignore fluctuations below 10 seconds" rule (RTT tracked in ms).
    current_rtt = data.get("network_metrics", {}).get("rtt_avg")
    baseline_rtt = baselines.get("average_rtt_avg")

    if isinstance(current_rtt, (int, float)) and isinstance(baseline_rtt, (int, float)):
        if abs(current_rtt - baseline_rtt) > 10000:
            anomalies.append({
                "severity": "high",
                "reason": f"High network RTT fluctuation detected. Current: {current_rtt}ms, Baseline: {baseline_rtt}ms"
            })

    # Failed login attempts check — any non-empty list is high severity.
    failed_logins = data.get("login_attempts", {}).get("failed_login_attempts")
    if failed_logins:
        anomalies.append({
            "severity": "high",
            "reason": f"{len(failed_logins)} failed login attempts detected."
        })

    # Nmap scan changes check — new hosts and newly-opened ports relative
    # to the baseline host/port inventory.
    if "nmap_results" in data and "host_ports" in baselines:
        current_hosts_info = {host['ip']: host for host in data["nmap_results"].get("hosts", [])}
        current_hosts = set(current_hosts_info.keys())
        baseline_hosts = set(baselines["host_ports"].keys())

        new_hosts = sorted(list(current_hosts - baseline_hosts))
        for host in new_hosts:
            anomalies.append({
                "severity": "high",
                "reason": f"New host detected on the network: {host}"
            })

        for host_ip in current_hosts.intersection(baseline_hosts):
            current_ports = set(p['port'] for p in current_hosts_info[host_ip].get("open_ports", []))
            baseline_ports = set(baselines["host_ports"].get(host_ip, []))

            newly_opened = sorted(list(current_ports - baseline_ports))
            # NOTE(review): newly-closed ports are computed nowhere and not
            # reported in this version — only newly-opened ports raise anomalies.
            for port in newly_opened:
                # Annotate the port with its well-known application name, if
                # listed in port_applications.json (keys are strings).
                port_info = port_applications.get(str(port), "Unknown")
                anomalies.append({
                    "severity": "medium",
                    "reason": f"New port opened on {host_ip}: {port} ({port_info})"
                })

    return anomalies
prompt = build_llm_prompt(data, baselines, nmap_changes, constraints, known_issues)
# --- LLM Interaction Function ---
def build_llm_prompt(anomalies):
"""Builds the prompt for the LLM to generate a report from anomalies."""
return f"""
**Role:** You are a dedicated and expert system administrator. Your primary role is to provide a concise, actionable report based on a list of pre-identified anomalies.
**Instruction:** Please synthesize the following list of anomalies into a single, human-readable report. The report should be a single JSON object with two keys: "severity" and "reason". The "severity" should be the highest severity from the list of anomalies. The "reason" should be a summary of all the anomalies.
**Anomalies:**
{json.dumps(anomalies, indent=2)}
**Output Request:** Provide a report as a single JSON object with two keys: "severity" and "reason". The "severity" must be one of "high", "medium", "low", or "none". The "reason" must be a natural language explanation of the anomaly. If no anomaly is found, return a single JSON object with "severity" set to "none" and "reason" as an empty string. Do not wrap the JSON in markdown or any other formatting. Only return the JSON, and nothing else.
"""
def generate_llm_report(anomalies):
"""Generates a report from a list of anomalies using the local LLM."""
if not anomalies:
return {"severity": "none", "reason": ""}
prompt = build_llm_prompt(anomalies)
try:
response = ollama.generate(model="llama3.1:8b", prompt=prompt)
@@ -282,8 +319,10 @@ def analyze_data_with_llm(data, baselines):
# --- Alerting Functions ---
def send_discord_alert(message):
def send_discord_alert(llm_response, combined_data):
"""Sends an alert to Discord."""
reason = llm_response.get('reason', 'No reason provided.')
message = f"**High Severity Alert:**\n> {reason}\n\n**Relevant Data:**\n```json\n{json.dumps(combined_data, indent=2)}\n```"
webhook = DiscordWebhook(url=config.DISCORD_WEBHOOK_URL, content=message)
try:
response = webhook.execute()
@@ -332,7 +371,25 @@ def is_alerting_time():
daily_events = []
def send_daily_recap():
    """Sends a daily recap of events to Discord."""
    global daily_events
    # Nothing accumulated today — skip the webhook call entirely.
    if not daily_events:
        return
    recap_message = "\n".join(daily_events)
    webhook = DiscordWebhook(url=config.DISCORD_WEBHOOK_URL, content=f"**Daily Recap:**\n{recap_message}")
    try:
        response = webhook.execute()
        if response.status_code == 200:
            logger.info("Daily recap sent successfully.")
        else:
            logger.error(f"Error sending daily recap: {response.status_code} - {response.content}")
    except Exception as e:
        logger.error(f"Error sending daily recap: {e}")
    daily_events = []  # Reset for the next day
def run_monitoring_cycle(nmap_scan_counter):
"""Runs a single monitoring cycle."""
logger.info("Running monitoring cycle...")
system_logs = get_system_logs()
@@ -363,13 +420,22 @@ def run_monitoring_cycle(nmap_scan_counter):
data_storage.store_data(combined_data)
llm_response = analyze_data_with_llm(combined_data, data_storage.calculate_baselines())
with open("known_issues.json", "r") as f:
known_issues = json.load(f)
if llm_response and llm_response.get('severity') != "none":
daily_events.append(llm_response.get('reason'))
if llm_response.get('severity') == "high" and is_alerting_time():
send_discord_alert(llm_response.get('reason'))
send_google_home_alert(llm_response.get('reason'))
with open("port_applications.json", "r") as f:
port_applications = json.load(f)
baselines = data_storage.calculate_baselines()
anomalies = analyze_data_locally(combined_data, baselines, known_issues, port_applications)
if anomalies:
llm_response = generate_llm_report(anomalies)
if llm_response and llm_response.get('severity') != "none":
daily_events.append(llm_response.get('reason'))
if llm_response.get('severity') == "high" and is_alerting_time():
send_discord_alert(llm_response, combined_data)
send_google_home_alert(llm_response.get('reason'))
return nmap_scan_counter
def main():
@@ -378,18 +444,12 @@ def main():
logger.info("Running in test mode...")
run_monitoring_cycle(0)
else:
schedule.every().day.at(config.DAILY_RECAP_TIME).do(send_daily_recap)
nmap_scan_counter = 0
while True:
nmap_scan_counter = run_monitoring_cycle(nmap_scan_counter)
# Daily Recap Logic
current_time = time.strftime("%H:%M")
if current_time == config.DAILY_RECAP_TIME and daily_events: # type: ignore
recap_message = "\n".join(daily_events)
send_discord_alert(f"**Daily Recap:**\n{recap_message}")
daily_events = [] # Reset for the next day
schedule.run_pending()
time.sleep(300) # Run every 5 minutes
if __name__ == "__main__":
main()
main()

19
port_applications.json Normal file
View File

@@ -0,0 +1,19 @@
{
"20": "FTP",
"21": "FTP",
"22": "SSH",
"23": "Telnet",
"25": "SMTP",
"53": "DNS",
"80": "HTTP",
"110": "POP3",
"143": "IMAP",
"443": "HTTPS",
"445": "SMB",
"587": "SMTP",
"993": "IMAPS",
"995": "POP3S",
"3306": "MySQL",
"3389": "RDP"
}

View File

@@ -1,6 +1,7 @@
discord-webhook
pingparsing
requests
discord-webhook
ollama
syslog-rfc5424-parser
pingparsing
python-nmap
python-nmap
schedule

22
test_output.log Normal file
View File

@@ -0,0 +1,22 @@
Traceback (most recent call last):
File "/home/artanis/Documents/LLM-Powered-Monitoring-Agent/monitor_agent.py", line 31, in <module>
file_handler = TimedRotatingFileHandler(LOG_FILE, when="midnight", interval=1, backupCount=1)
File "/home/artanis/.pyenv/versions/3.13.1/lib/python3.13/logging/handlers.py", line 223, in __init__
BaseRotatingHandler.__init__(self, filename, 'a', encoding=encoding,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
delay=delay, errors=errors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/artanis/.pyenv/versions/3.13.1/lib/python3.13/logging/handlers.py", line 64, in __init__
logging.FileHandler.__init__(self, filename, mode=mode,
~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
encoding=encoding, delay=delay,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
errors=errors)
^^^^^^^^^^^^^^
File "/home/artanis/.pyenv/versions/3.13.1/lib/python3.13/logging/__init__.py", line 1218, in __init__
StreamHandler.__init__(self, self._open())
~~~~~~~~~~^^
File "/home/artanis/.pyenv/versions/3.13.1/lib/python3.13/logging/__init__.py", line 1247, in _open
return open_func(self.baseFilename, self.mode,
encoding=self.encoding, errors=self.errors)
PermissionError: [Errno 13] Permission denied: '/home/artanis/Documents/LLM-Powered-Monitoring-Agent/monitoring_agent.log'