Compare commits

...

9 Commits

10 changed files with 282 additions and 55 deletions

.gitignore vendored
View File

@@ -1,3 +1,4 @@
__pycache__/*
__pycache__/
monitoring_data.json
__pycache__/config.cpython-313.pyc
log_position.txt

CONSTRAINTS.md
View File

@@ -1,8 +1,13 @@
## LLM Constraints and Guidelines
- Please do not report on anything that is older than 24 hours.
- The server uses a custom DNS server at 192.168.2.112.
### Important Things to Focus On:
- Security-related events such as failed login attempts, unauthorized access, or unusual network connections.
- Events indicating loss of connectivity or unreachable hosts.
- Unexpected network additions or unusual traffic patterns.
### Less Important Things:
- Do not flag minor fluctuations in network Round Trip Time (RTT) as anomalies. These are considered normal network variance.
- Prioritize security-related events such as failed login attempts, unauthorized access, or unusual network connections.
- Focus on events indicating loss of connectivity or unreachable hosts.
- Highlight any unexpected network additions or unusual traffic patterns.
- The DNS server 8.8.8.8 is Google's public DNS server and is a legitimate destination. Do not flag requests to 8.8.8.8 as anomalous.
- Action has been taken against IP addresses 45.88.8.215, 45.88.8.186, 120.48.49.12, and 23.137.255.140. These are completely banned and cannot access the system at all.

View File

@@ -31,4 +31,34 @@
19. [x] Investigated and resolved issue with `jc` library
20. [x] Removed `jc` library as a dependency
21. [x] Implemented manual parsing of `sensors` command output
## Tasks Already Done
[x] Ensure we aren't using mock data for get_system_logs() and get_network_metrics()
[x] Improve `get_system_logs()` to read new lines since last check
[x] Improve `get_network_metrics()` by using a library like `pingparsing`
[x] Ensure we are including CONSTRAINTS.md in our analyze_data_with_llm() function
[x] Summarize the entire report into a single sentence to be sent to Home Assistant
[x] Figure out why Home Assistant isn't using the speaker
## Current Objectives
[x] Improve "high" priority detection by explicitly instructing LLM to output severity in structured JSON format.
[x] Implement dynamic contextual information (Known/Resolved Issues Feed) for LLM to improve severity detection.
## Network Scanning (Nmap Integration)
1. [x] Add `python-nmap` to `requirements.txt` and install.
2. [x] Define `NMAP_TARGETS` and `NMAP_SCAN_OPTIONS` in `config.py`.
3. [x] Create a new function `get_nmap_scan_results()` in `monitor_agent.py`:
* [x] Use `python-nmap` to perform a scan on the defined targets with the specified options.
* [x] Return the parsed results.
4. [x] Integrate `get_nmap_scan_results()` into the main monitoring loop:
* [x] Call this function periodically (e.g., less frequently than other metrics).
* [x] Add the `nmap` results to the `combined_data` dictionary.
5. [x] Update `data_storage.py` to store `nmap` results.
6. [x] Extend `calculate_baselines()` in `data_storage.py` to include `nmap` baselines:
* [x] Compare current `nmap` results with historical data to identify changes.
7. [x] Modify `analyze_data_with_llm()` prompt to include `nmap` scan results for analysis.
8. [x] Consider how to handle `nmap` permissions.

README.md
View File

@@ -65,4 +65,40 @@ The script will start a continuous monitoring loop. Every 5 minutes, it will:
The script will print its status and any detected anomalies to the console.
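A typical cycle's console output looks something like this (illustrative values, based on the script's status messages):
```
Running monitoring cycle...
Anomaly detected: New host 192.168.1.77 appeared on the network with port 22 open.
```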
### Nmap Scans
The agent uses `nmap` to scan the network for open ports. By default, it uses a TCP SYN scan (`-sS`), which requires root privileges. If the script is not run as root, it will fall back to a TCP connect scan (`-sT`), which does not require root privileges but is slower and more likely to be detected.
To run the agent with root privileges, use the `sudo` command:
```bash
sudo python monitor_agent.py
```
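If the script is not run as root, the fallback happens automatically. Here is a condensed sketch of the check that `get_nmap_scan_results()` performs before launching the scan:
```python
import os
import nmap  # from the python-nmap package

scan_options = "-sS -T4"  # NMAP_SCAN_OPTIONS in config.py
if os.geteuid() != 0 and "-sS" in scan_options:
    # SYN scans need raw sockets, so swap in a TCP connect scan
    scan_options = scan_options.replace("-sS", "-sT")

nm = nmap.PortScanner()
results = nm.scan(hosts="192.168.1.0/24", arguments=scan_options)  # NMAP_TARGETS
```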
## 4. Features
### Priority System
The monitoring agent uses a priority system to classify anomalies. The LLM is instructed to return a severity level for each anomaly it detects. The possible severity levels are:
- **high**: Indicates a critical issue that requires immediate attention. An alert is sent to Discord and Google Home.
- **medium**: Indicates a non-critical issue that should be investigated. No alert is sent.
- **low**: Indicates a minor issue or a potential false positive. No alert is sent.
- **none**: Indicates that no anomaly was detected.
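In practice, the LLM returns its report as a single JSON object with `severity` and `reason` keys. A high-severity report might look like this (illustrative values):
```json
{
  "severity": "high",
  "reason": "Multiple failed login attempts for user 'root' from an unknown IP address, which deviates from the baseline of zero failed logins."
}
```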
### Known Issues Feed
The agent uses a `known_issues.json` file to provide the LLM with a list of known issues and their resolutions. This helps the LLM to avoid flagging resolved or expected issues as anomalies.
You can add new issues to the `known_issues.json` file by following the existing format. Each issue should have an "issue" and a "resolution" key. For example:
```json
[
{
"issue": "CPU temperature spikes to 80C under heavy load",
"resolution": "This is normal behavior for this CPU model and is not a cause for concern."
}
]
```
**Note on Mock Data:** Earlier versions of the script used mock data for system logs and network metrics. The current version reads live data from `/var/log/syslog` and real `ping` and `nmap` results.

Binary file not shown.

config.py
View File

@@ -11,5 +11,9 @@ GOOGLE_HOME_SPEAKER_ID = "media_player.spencer_room_speaker"
# Daily Recap Time (in 24-hour format, e.g., "20:00")
DAILY_RECAP_TIME = "20:00"
# Nmap Configuration
NMAP_TARGETS = "192.168.1.0/24"
NMAP_SCAN_OPTIONS = "-sS -T4"
# Test Mode (True to run once and exit, False to run continuously)
TEST_MODE = False

data_storage.py
View File

@@ -1,6 +1,6 @@
import json
import os
from datetime import datetime, timedelta
from datetime import datetime, timedelta, timezone
DATA_FILE = 'monitoring_data.json'
@@ -23,16 +23,34 @@ def calculate_baselines():
# For simplicity, we'll average the last 24 hours of data
# More complex logic can be added here
recent_data = [d for d in data if datetime.fromisoformat(d['system_logs']['timestamp'].replace('Z', '')) > datetime.now() - timedelta(hours=24)]
recent_data = [d for d in data if 'timestamp' in d and datetime.fromisoformat(d['timestamp'].replace('Z', '')).replace(tzinfo=timezone.utc) > datetime.now(timezone.utc) - timedelta(hours=24)]
if not recent_data:
return {}
baseline_metrics = {
'avg_rtt': sum(d['network_metrics']['round_trip_ms_avg'] for d in recent_data) / len(recent_data),
'packet_loss': sum(d['network_metrics']['packet_loss_percent'] for d in recent_data) / len(recent_data),
'avg_cpu_temp': sum(d['cpu_temperature']['cpu_temperature'] for d in recent_data) / len(recent_data),
'avg_gpu_temp': sum(d['gpu_temperature']['gpu_temperature'] for d in recent_data) / len(recent_data),
# Average each metric only over the samples that actually contain it
'avg_rtt': (sum(v) / len(v) if (v := [d['network_metrics']['rtt_avg'] for d in recent_data if 'rtt_avg' in d['network_metrics']]) else 0.0),
'packet_loss': (sum(v) / len(v) if (v := [d['network_metrics']['packet_loss_rate'] for d in recent_data if 'packet_loss_rate' in d['network_metrics']]) else 0.0),
'avg_cpu_temp': (sum(v) / len(v) if (v := [d['cpu_temperature']['cpu_temperature'] for d in recent_data if d['cpu_temperature']['cpu_temperature'] != "N/A"]) else 0.0),
'avg_gpu_temp': (sum(v) / len(v) if (v := [d['gpu_temperature']['gpu_temperature'] for d in recent_data if d['gpu_temperature']['gpu_temperature'] != "N/A"]) else 0.0),
}
# Baseline for open ports from nmap scans
host_ports = {}
for d in recent_data:
if 'nmap_results' in d and 'scan' in d['nmap_results']:
for host, scan_data in d['nmap_results']['scan'].items():
if host not in host_ports:
host_ports[host] = set()
if 'tcp' in scan_data:
for port, port_data in scan_data['tcp'].items():
if port_data['state'] == 'open':
host_ports[host].add(port)
# Convert sets to sorted lists for JSON serialization
for host, ports in host_ports.items():
host_ports[host] = sorted(list(ports))
baseline_metrics['host_ports'] = host_ports
return baseline_metrics

known_issues.json Normal file
View File

@@ -0,0 +1,10 @@
[
{
"issue": "CPU temperature spikes to 90C under heavy load",
"resolution": "This is normal behavior for this CPU model and is not a cause for concern."
},
{
"issue": "Access attempts from unknown IP Addresses",
"resolution": "ufw has been enabled, and blocks all default connections by default. The only IP Addresses allowed are 192.168.2.0/24 and 100.64.0.0/10"
}
]

monitor_agent.py
View File

@@ -7,42 +7,67 @@ import ollama
from discord_webhook import DiscordWebhook
import requests
import data_storage
import re
import os
from datetime import datetime, timezone
import pingparsing
import nmap
# Load configuration
import config
from syslog_rfc5424_parser import SyslogMessage
LOG_POSITION_FILE = 'log_position.txt'
# --- Data Ingestion & Parsing Functions ---
def get_system_logs():
"""Simulates collecting and parsing system logs."""
# Mock log entry for demonstration
mock_log_entry = '{"timestamp": "2025-08-15T12:00:00Z", "log": "Failed login attempt for user \'root\' from 10.0.0.1"}'
"""Gets new lines from /var/log/syslog since the last check."""
try:
parsed_log = json.loads(mock_log_entry)
return parsed_log
except json.JSONDecodeError as e:
print(f"Error parsing system log: {e}")
return None
last_position = 0
if os.path.exists(LOG_POSITION_FILE):
with open(LOG_POSITION_FILE, 'r') as f:
last_position = int(f.read())
with open("/var/log/syslog", "r") as f:
f.seek(last_position)
log_lines = f.readlines()
current_position = f.tell()
with open(LOG_POSITION_FILE, 'w') as f:
f.write(str(current_position))
parsed_logs = []
for line in log_lines:
try:
parsed_logs.append(SyslogMessage.parse(line).as_dict())
except Exception:
# If parsing fails, just append the raw line
parsed_logs.append({"raw_log": line.strip()})
return {"syslog": parsed_logs}
except FileNotFoundError:
print("Error: /var/log/syslog not found.")
return {"syslog": []}
except Exception as e:
print(f"Error reading syslog: {e}")
return {"syslog": []}
import pingparsing
def get_network_metrics():
"""Simulates collecting and parsing network data."""
# Mock ping output for demonstration
mock_ping_output = '''{"destination_ip":"8.8.8.8","data_bytes":56,"pattern":null,"destination":"8.8.8.8","duplicates":0,"packets_transmitted":3,"packets_received":3,"packet_loss_percent":0.0,"time_ms":2003.0,"round_trip_ms_min":18.79,"round_trip_ms_avg":21.212,"round_trip_ms_max":22.787,"round_trip_ms_stddev":1.738,"responses":[{"type":"reply","timestamp":null,"bytes":64,"response_ip":"8.8.8.8","icmp_seq":1,"ttl":111,"time_ms":18.8,"duplicate":false},{"type":"reply","timestamp":null,"bytes":64,"response_ip":"8.8.8.8","icmp_seq":2,"ttl":111,"time_ms":22.8,"duplicate":false},{"type":"reply","timestamp":null,"bytes":64,"response_ip":"8.8.8.8","icmp_seq":3,"ttl":111,"time_ms":22.1,"duplicate":false}]}'''
"""Gets network metrics by pinging 8.8.8.8."""
try:
parsed_ping = json.loads(mock_ping_output)
if parsed_ping:
return {
"packets_transmitted": parsed_ping.get("packets_transmitted"),
"packets_received": parsed_ping.get("packets_received"),
"packet_loss_percent": parsed_ping.get("packet_loss_percent"),
"round_trip_ms_avg": parsed_ping.get("round_trip_ms_avg"),
}
return None
except json.JSONDecodeError as e:
print(f"Error parsing network metrics: {e}")
return None
import re
ping_parser = pingparsing.PingParsing()
transmitter = pingparsing.PingTransmitter()
transmitter.destination = "8.8.8.8"
transmitter.count = 3
result = transmitter.ping()
return ping_parser.parse(result).as_dict()
except Exception as e:
print(f"Error getting network metrics: {e}")
return {"error": "ping command failed"}
def get_cpu_temperature():
"""Gets the CPU temperature using the sensors command."""
@@ -96,14 +121,60 @@ def get_login_attempts():
print(f"Error reading login attempts: {e}")
return {"failed_logins": []}
def get_nmap_scan_results():
"""Performs an Nmap scan and returns the results."""
try:
nm = nmap.PortScanner()
scan_options = config.NMAP_SCAN_OPTIONS
if os.geteuid() != 0 and "-sS" in scan_options:
print("Warning: Nmap -sS scan requires root privileges. Falling back to -sT.")
scan_options = scan_options.replace("-sS", "-sT")
scan_results = nm.scan(hosts=config.NMAP_TARGETS, arguments=scan_options)
return scan_results
except Exception as e:
print(f"Error performing Nmap scan: {e}")
return {"error": "Nmap scan failed"}
# --- LLM Interaction Function ---
def analyze_data_with_llm(data, baselines):
"""Analyzes data with the local LLM."""
with open("CONSTRAINTS.md", "r") as f:
constraints = f.read()
with open("known_issues.json", "r") as f:
known_issues = json.load(f)
# Compare current nmap results with baseline
nmap_changes = {"new_hosts": [], "changed_ports": {}}
if "nmap_results" in data and "host_ports" in baselines:
current_hosts = set(data["nmap_results"].get("scan", {}).keys())
baseline_hosts = set(baselines["host_ports"].keys())
# New hosts
nmap_changes["new_hosts"] = sorted(list(current_hosts - baseline_hosts))
# Changed ports on existing hosts
for host in current_hosts.intersection(baseline_hosts):
current_ports = set()
if "tcp" in data["nmap_results"]["scan"][host]:
for port, port_data in data["nmap_results"]["scan"][host]["tcp"].items():
if port_data["state"] == "open":
current_ports.add(port)
baseline_ports = set(baselines["host_ports"].get(host, []))
newly_opened = sorted(list(current_ports - baseline_ports))
newly_closed = sorted(list(baseline_ports - current_ports))
if newly_opened or newly_closed:
nmap_changes["changed_ports"][host] = {"opened": newly_opened, "closed": newly_closed}
prompt = f"""
**Role:** You are a dedicated and expert system administrator. Your primary role is to identify anomalies and provide concise, actionable reports.
**Instruction:** Analyze the following system and network data for any activity that appears out of place or different. Consider unusual values, errors, or unexpected patterns as anomalies. Compare the current data with the historical baseline data to identify significant deviations.
**Instruction:** Analyze the following system and network data for any activity that appears out of place or different. Consider unusual values, errors, or unexpected patterns as anomalies. Compare the current data with the historical baseline data to identify significant deviations. Consult the known issues feed to avoid flagging resolved or expected issues. Pay special attention to the Nmap scan results for any new or unexpected open ports.
**Context:**
Here is the system data in JSON format for your analysis: {json.dumps(data, indent=2)}
@@ -111,13 +182,44 @@ def analyze_data_with_llm(data, baselines):
**Historical Baseline Data:**
{json.dumps(baselines, indent=2)}
**Output Request:** If you find an anomaly, provide a report as a single, coherent, natural language paragraph. The report must clearly state the anomaly, its potential cause, and its severity (e.g., high, medium, low). If no anomaly is found, respond with "OK".
**Nmap Scan Changes:**
{json.dumps(nmap_changes, indent=2)}
**Known Issues Feed:**
{json.dumps(known_issues, indent=2)}
**Constraints and Guidelines:**
{constraints}
**Output Request:** If you find an anomaly, provide a report as a single JSON object with two keys: "severity" and "reason". The "severity" must be one of "high", "medium", "low", or "none". The "reason" must be a natural language explanation of the anomaly. If no anomaly is found, return a single JSON object with "severity" set to "none" and "reason" as an empty string. Do not wrap the JSON in markdown or any other formatting.
**Reasoning Hint:** Think step by step to come to your conclusion. This is very important.
"""
try:
response = ollama.generate(model="llama3.1:8b", prompt=prompt)
return response['response'].strip()
# Sanitize the response to ensure it's valid JSON
sanitized_response = response['response'].strip()
# Find the first '{' and the last '}' to extract the JSON object
start_index = sanitized_response.find('{')
end_index = sanitized_response.rfind('}')
if start_index != -1 and end_index != -1:
json_string = sanitized_response[start_index:end_index+1]
try:
return json.loads(json_string)
except json.JSONDecodeError:
# The model may have emitted several objects ("{...}, {...}");
# retry by wrapping them in a JSON array and return the first one
try:
json_list = json.loads(f"[{json_string}]")
if isinstance(json_list, list) and json_list:
return json_list[0]
except json.JSONDecodeError as e:
print(f"Error decoding LLM response: {e}")
# Fallback for invalid JSON
return {"severity": "low", "reason": sanitized_response}
else:
# Handle cases where the response is not valid JSON
print(f"LLM returned a non-JSON response: {sanitized_response}")
return {"severity": "low", "reason": sanitized_response}
except Exception as e:
print(f"Error interacting with LLM: {e}")
return None
@@ -140,7 +242,12 @@ def send_discord_alert(message):
def send_google_home_alert(message):
"""Sends an alert to a Google Home speaker via Home Assistant."""
# Simplify the message for better TTS delivery
simplified_message = message.split('.')[0] # Take the first sentence
try:
response = ollama.generate(model="llama3.1:8b", prompt=f"Summarize the following message in a single sentence: {message}")
simplified_message = response['response'].strip()
except Exception as e:
print(f"Error summarizing message: {e}")
simplified_message = message.split('.')[0] # Take the first sentence as a fallback
url = f"{config.HOME_ASSISTANT_URL}/api/services/tts/speak"
headers = {
@@ -148,7 +255,7 @@ def send_google_home_alert(message):
"Content-Type": "application/json",
}
data = {
"entity_id": "tts.google_en_com",
"entity_id": "all",
"media_player_entity_id": config.GOOGLE_HOME_SPEAKER_ID,
"message": simplified_message,
}
@@ -173,27 +280,31 @@ if __name__ == "__main__":
cpu_temp = get_cpu_temperature()
gpu_temp = get_gpu_temperature()
login_attempts = get_login_attempts()
nmap_results = get_nmap_scan_results()
if system_logs and network_metrics:
combined_data = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"system_logs": system_logs,
"network_metrics": network_metrics,
"cpu_temperature": cpu_temp,
"gpu_temperature": gpu_temp,
"login_attempts": login_attempts
"login_attempts": login_attempts,
"nmap_results": nmap_results
}
data_storage.store_data(combined_data)
llm_response = analyze_data_with_llm(combined_data, data_storage.calculate_baselines())
if llm_response and llm_response != "OK":
print(f"Anomaly detected: {llm_response}")
if "high" in llm_response.lower():
send_discord_alert(llm_response)
send_google_home_alert(llm_response)
if llm_response and llm_response.get('severity') != "none":
print(f"Anomaly detected: {llm_response.get('reason')}")
if llm_response.get('severity') == "high":
send_discord_alert(llm_response.get('reason'))
send_google_home_alert(llm_response.get('reason'))
else:
print("No anomaly detected.")
else:
nmap_scan_counter = 0
while True:
print("Running monitoring cycle...")
system_logs = get_system_logs()
@@ -202,8 +313,15 @@ if __name__ == "__main__":
gpu_temp = get_gpu_temperature()
login_attempts = get_login_attempts()
nmap_results = None
if nmap_scan_counter == 0:
nmap_results = get_nmap_scan_results()
nmap_scan_counter = (nmap_scan_counter + 1) % 4 # Run nmap scan every 4th cycle (20 minutes)
if system_logs and network_metrics:
combined_data = {
"timestamp": datetime.now(timezone.utc).isoformat(),
"system_logs": system_logs,
"network_metrics": network_metrics,
"cpu_temperature": cpu_temp,
@@ -211,15 +329,18 @@ if __name__ == "__main__":
"login_attempts": login_attempts
}
if nmap_results:
combined_data["nmap_results"] = nmap_results
data_storage.store_data(combined_data)
llm_response = analyze_data_with_llm(combined_data, data_storage.calculate_baselines())
if llm_response and llm_response != "OK":
daily_events.append(llm_response)
if "high" in llm_response.lower():
send_discord_alert(llm_response)
send_google_home_alert(llm_response)
if llm_response and llm_response.get('severity') != "none":
daily_events.append(llm_response.get('reason'))
if llm_response.get('severity') == "high":
send_discord_alert(llm_response.get('reason'))
send_google_home_alert(llm_response.get('reason'))
# Daily Recap Logic
current_time = time.strftime("%H:%M")
@@ -231,3 +352,4 @@ if __name__ == "__main__":
time.sleep(300) # Run every 5 minutes

requirements.txt
View File

@@ -1,5 +1,6 @@
ollama
discord-webhook
requests
ollama
syslog-rfc5424-parser
apachelogs
pingparsing
python-nmap