Refactor: Integrate scripts into a single application (v1.2.0)

2025-12-29 16:45:40 -06:00
parent 671741772f
commit 5bd154fb4e
6 changed files with 213 additions and 251 deletions


# Territory Analysis Tool v1.2.0
## Overview
This tool provides a complete pipeline for processing and analyzing territory data.
The workflow is managed by a command-line script that gives the user fine-grained control over the execution process.
## Installation
This tool requires Python 3 and has a few dependencies.
1. **Install dependencies:**
Navigate to this directory in your terminal and run the following command to install the required Python libraries:
```bash
pip install -r requirements.txt
```
## File Structure
All necessary files are located in this directory.
### Core Scripts
- `run_all.py`: The main command-line script to run the workflow. **This is the recommended entry point.**
- `process_territories.py`: (Step 1) Combines address and boundary data.
- `analysis.py`: (Step 2) Performs general territory analysis and generates `map.html`.
- `category_analysis.py`: (Step 2) Performs category-specific analysis and generates `category_map.html`.
### Input Data Files
- The tool is designed to work with any address and boundary CSV files.
- The example files `Okinawa Territory Jan 2026 - Addresses.csv` and `Okinawa Territory Jan 2026 - Boundaries.csv` are provided. Both can be exported from NW Scheduler: go to Export -> Territories and download them from there.
### Other Files
- `requirements.txt`: A list of Python dependencies.
## Usage
The entire workflow is managed through `run_all.py` using a command-line interface. You can see all available commands by running:
```bash
python run_all.py --help
```
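Internally, the CLI is built with `argparse` subcommands (introduced in v1.1.0). A minimal sketch of how such a parser could be laid out — the structure below is illustrative, not the actual contents of `run_all.py`:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Top-level parser with one subcommand per workflow stage.
    parser = argparse.ArgumentParser(
        prog="run_all.py", description="Territory analysis workflow"
    )
    sub = parser.add_subparsers(dest="command", required=True)

    proc = sub.add_parser("process", help="Combine address and boundary CSVs")
    proc.add_argument("--addresses", required=True)
    proc.add_argument("--boundaries", required=True)

    ana = sub.add_parser("analyze", help="Analyze a processed 'Final' CSV")
    ana.add_argument("--input", required=True)

    full = sub.add_parser("full-run", help="Process and analyze in one pass")
    full.add_argument("--addresses", required=True)
    full.add_argument("--boundaries", required=True)

    return parser

# Example: parse a full-run invocation.
args = build_parser().parse_args(
    ["full-run", "--addresses", "a.csv", "--boundaries", "b.csv"]
)
```

Each subcommand carries only the arguments it needs, so `--help` on any subcommand shows exactly the flags documented below.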
### Full Pipeline Run
To run the entire process from start to finish (process the raw files, then analyze them in memory), use the `full-run` command. This is the most common use case.
**Command:**
```bash
python run_all.py full-run --addresses <path_to_addresses.csv> --boundaries <path_to_boundaries.csv>
```
**Example:**
```bash
python run_all.py full-run --addresses "Okinawa Territory Jan 2026 - Addresses.csv" --boundaries "Okinawa Territory Jan 2026 - Boundaries.csv"
```
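Per the v1.2.0 changelog, `full-run` chains the two stages by passing a pandas DataFrame directly between imported functions instead of shelling out via `subprocess`. A rough sketch of that orchestration — the function bodies here are placeholders, not the real modules:

```python
import pandas as pd

def process(addresses_path: str, boundaries_path: str) -> pd.DataFrame:
    # Placeholder for process_territories: the real module builds the
    # combined "Final" table from the two input CSVs.
    return pd.DataFrame({"TerritoryID": [1, 2], "AddressCount": [10, 5]})

def analyze(final: pd.DataFrame) -> dict:
    # Placeholder for the analysis modules, which write reports and maps.
    return {
        "territories": len(final),
        "addresses": int(final["AddressCount"].sum()),
    }

def full_run(addresses_path: str, boundaries_path: str) -> dict:
    final = process(addresses_path, boundaries_path)  # DataFrame stays in memory
    return analyze(final)                             # no "Final" CSV is written

summary = full_run("Addresses.csv", "Boundaries.csv")
```

Keeping the intermediate result in memory is what lets `full-run` skip the intermediate "Final" CSV that a standalone `process` step would write.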
### Running Steps Individually
You can also run each step of the pipeline separately.
#### Step 1: Process Raw Files
To combine the address and boundary files and save the result to a single "Final" CSV, use the `process` command.
**Command:**
```bash
python run_all.py process --addresses <path_to_addresses.csv> --boundaries <path_to_boundaries.csv>
```
This will generate a new file named `Okinawa Territory <Mon Year> - Final.csv`.
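As described under Workflow Details, this step counts addresses per `TerritoryID` and merges the count into the boundary table. A minimal pandas sketch of that merge, using tiny inline stand-ins for the two CSVs (column names other than `TerritoryID` are assumptions):

```python
import io
import pandas as pd

# Inline stand-ins for the exported Addresses and Boundaries CSVs.
addresses_csv = io.StringIO(
    "TerritoryID,Address\n1,10 Main St\n1,12 Main St\n2,5 Hill Rd\n"
)
boundaries_csv = io.StringIO(
    "TerritoryID,Boundary\n1,poly-a\n2,poly-b\n3,poly-c\n"
)

addresses = pd.read_csv(addresses_csv)
boundaries = pd.read_csv(boundaries_csv)

# Count addresses per territory, then left-merge onto the boundaries so
# territories with no addresses are kept (their count becomes 0).
counts = addresses.groupby("TerritoryID").size().rename("AddressCount")
final = boundaries.merge(counts, on="TerritoryID", how="left")
final["AddressCount"] = final["AddressCount"].fillna(0).astype(int)
# The process step would then save `final` as the "Final" CSV.
```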
#### Step 2: Analyze a Processed File
To run the analysis and generate maps from a "Final" CSV file, use the `analyze` command.
**Command:**
```bash
python run_all.py analyze --input <path_to_final_file.csv>
```
**Example:**
```bash
python run_all.py analyze --input "Okinawa Territory Dec 2025 - Final.csv"
```
## Workflow Details
1. **Data Processing:** The `process_territories.py` module reads `Addresses.csv` to count addresses per `TerritoryID` and merges this count into the `Boundaries.csv` data. When run via the `process` command, it outputs a new CSV file named in the format `Okinawa Territory Mon Year - Final.csv`.
2. **Data Analysis:** The `analysis.py` and `category_analysis.py` modules take the "Final" data as input to generate reports and interactive maps.
## Output Files
- `Okinawa Territory <Mon Year> - Final.csv`: The consolidated data file.
- `analysis.md`: A markdown summary of the general territory analysis.
- `map.html`: An interactive map visualizing territories colored by address count.
- `category_map.html`: An interactive map visualizing territories colored by their category's total address count.
## Changelog
### v1.2.0 (Current)
- Refactored the tool from a collection of separate scripts into a single, integrated Python application.
- Replaced `subprocess` calls with direct function imports for improved performance and reliability.
- Integrated the `pandas` library for more efficient in-memory data processing.
- The `full-run` command now processes data in memory without writing an intermediate CSV file.
- Added a `requirements.txt` file for easier dependency management.
### v1.1.0
- Introduced a command-line interface with `argparse` to replace the interactive menu.
- Added `process`, `analyze`, and `full-run` commands.
- Allowed for dynamic input file paths via command-line arguments.
### v1.0.0
- Initial release with separate scripts for processing and analysis.
- Workflow managed by an interactive `run_all.py` script.
- Project structure consolidated into a single directory.
- Git repository initialized.