v1.0
1  .gitignore  vendored  Normal file
@@ -0,0 +1 @@
Addresses_with_Territory.csv
23320  Addrsses.csv  Normal file
File diff suppressed because it is too large
76  README.md  Normal file
@@ -0,0 +1,76 @@
# Territory Address Combiner

This script assigns a Territory ID to a list of addresses by determining which territory's geographical boundary each address's coordinates fall within.

## Overview

The script processes two input files:

1. `TerritoryExport.csv`: Contains territory information, including a `TerritoryID` and a `Boundary` polygon defined by a series of latitude and longitude points.
2. `Addrsses.csv`: Contains address information, including latitude and longitude coordinates, but with an empty `TerritoryID` column.

The script reads both files, and for each address it performs a point-in-polygon test to find the containing territory. It then populates the `TerritoryID` in the address data and saves the result to a new CSV file.

## Technical Breakdown

The script operates in the following sequence:

1. **Logging Setup**: Configures a logger to output informational messages to both the console and a file named `run.log`.
2. **Load Territory Data**: Reads the `TerritoryExport.csv` file into a pandas DataFrame.
3. **Parse Boundaries**: The `parse_boundary_to_polygon` function is applied to the `Boundary` column. It uses `ast.literal_eval` to safely parse the string representation of a list of coordinates into a Python list, then builds a `shapely.geometry.Polygon` from those coordinates.
4. **Load Address Data**: Reads the `Addrsses.csv` file into a pandas DataFrame.
5. **Process Addresses**: The script iterates through each row (address) in the addresses DataFrame:
   - A `shapely.geometry.Point` is created from the address's `Longitude` and `Latitude`.
   - It then iterates through the territories, using `polygon.contains(point)` to check whether the address point lies within each territory's boundary.
   - If a containing territory is found, its `TerritoryID` is stored and the inner loop is broken.
   - If no containing territory is found after checking all territories, the `TerritoryID` is set to the string `OUTSIDE_TERRITORY`.
6. **Update Address Data**: The script replaces the value in the first column of the original address row with the found `TerritoryID`.
7. **Save Results**: The updated address data is collected into a new DataFrame and saved to `Addresses_with_Territory.csv`.

### Input File Specifications

#### `TerritoryExport.csv`

This file can be generated by exporting existing territories from NW Scheduler.

This file must contain at least the following two columns:

- `TerritoryID`: A unique identifier for the territory.
- `Boundary`: A string representation of a list of coordinate tuples that form the polygon for the territory. Example: `"[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"` (see the parsing sketch below).

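For reference, a minimal, illustrative sketch of how a `Boundary` string in this format can be parsed into a usable polygon (this mirrors the `parse_boundary_to_polygon` function in `main.py`; the example string above is reused):

```python
# Illustrative sketch: parse a Boundary string into a shapely Polygon.
import ast
from shapely.geometry import Polygon, Point

boundary_str = "[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"
coords = ast.literal_eval(boundary_str)        # -> list of (longitude, latitude) tuples
polygon = Polygon(coords)                      # polygon built from those vertices
print(polygon.contains(Point(-85.55, 30.15)))  # True for a point inside this square
```
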
#### `Addrsses.csv`

We found ours at https://openaddresses.io/. Some file processing may be needed to bring it into the format described below.

This file must contain at least the following two columns:

- `Latitude`: The latitude of the address.
- `Longitude`: The longitude of the address.

The first column of this file will be overwritten with the `TerritoryID` in the output file.

### Output File: `Addresses_with_Territory.csv`

The output file has the same structure as `Addrsses.csv`, but with the first column populated with the `TerritoryID` of the containing territory, or `OUTSIDE_TERRITORY` if the address is not within any territory.

## Recommended Tech Stack

- **Language**: **Python**
- **Libraries**:
  - **pandas**: For efficient reading, manipulation, and writing of CSV data.
  - **shapely**: For robust and accurate geometric operations, specifically parsing the boundary polygons and performing the point-in-polygon tests.

This stack is recommended because Python's data analysis and scientific computing ecosystem is well suited to this kind of data-centric, geospatial task; it leads to a simpler, more reliable, and more performant solution than implementing the geometry by hand.

## Usage

1. **Install Dependencies:**

   ```sh
   pip install pandas shapely
   ```

2. **Run the script:**

   ```sh
   python main.py
   ```
42  SPEC.md  Normal file
@@ -0,0 +1,42 @@
# Project Specification: Territory Address Combiner

This document outlines the plan for developing a script to assign Territory IDs to addresses based on their geographic coordinates.

## 1. Project Setup & File I/O [x]

1.1. Read the `TerritoryExport.csv` file. [x]
1.2. Parse the CSV data into a list of objects, where each object represents a territory and contains its ID and boundary points. [x]
1.3. Read the `Addresses.csv` file. [x]
1.4. Parse the CSV data into a list of objects, where each object represents an address and its properties, including latitude and longitude. [x]

## 2. Data Processing and Structuring [x]

2.1. For each territory, parse the `Boundary` string into a numerical list of coordinate pairs; each pair represents a vertex of the polygon (see the sketch after this section). [x]
2.2. For each address, ensure its `Latitude` and `Longitude` are stored as numerical data types. [x]

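A minimal sketch of what steps 2.1 and 2.2 amount to (illustrative only; the shipped `main.py` handles 2.1 with `ast.literal_eval` plus shapely and relies on pandas' type inference for 2.2):

```python
# Illustrative sketch for steps 2.1-2.2: turn raw CSV strings into numbers.
import ast

def parse_boundary(boundary_str):
    # "[(-85.6, 30.2), ...]" -> list of (float, float) polygon vertices
    return [(float(x), float(y)) for x, y in ast.literal_eval(boundary_str)]

def parse_coordinate(value):
    # Ensure a Latitude/Longitude field is numeric, not a string
    return float(value)

print(parse_boundary("[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1)]"))
print(parse_coordinate("30.15"))
```
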
## 3. Core Logic: Point-in-Polygon (PIP) Implementation [x]

3.1. Create a function that implements the Ray Casting algorithm to determine if a point is inside a polygon (a sketch follows this section). [x]
3.2. This function will accept two arguments: the coordinates of the address (the point) and the list of vertices for a territory boundary (the polygon). [x]
3.3. The function will return `true` if the point is inside the polygon and `false` otherwise. [x]

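A minimal sketch of the Ray Casting check described above, assuming points and vertices are plain `(x, y)` tuples (the shipped `main.py` ultimately delegates this test to shapely's `Polygon.contains` instead):

```python
# Illustrative Ray Casting sketch: cast a horizontal ray to the right of the point
# and count edge crossings; an odd count means the point is inside the polygon.
def point_in_polygon(point, vertices):
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside

# Hypothetical usage with (longitude, latitude) tuples:
square = [(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1)]
print(point_in_polygon((-85.55, 30.15), square))  # True: inside
print(point_in_polygon((-85.7, 30.15), square))   # False: outside
```
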
## 4. Territory Assignment [x]

4.1. Iterate through each address in the parsed list from `Addresses.csv` (see the sketch after this section). [x]
4.2. For each address, iterate through each territory from the parsed list from `TerritoryExport.csv`. [x]
4.3. Use the PIP function (from step 3) to check if the address's coordinate is inside the current territory's boundary. [x]
4.4. If the PIP function returns `true`:
    4.4.1. Assign the territory's `TerritoryID` to the `TerritoryID` field of the address object. [x]
    4.4.2. Break the inner loop (territory iteration) and proceed to the next address. [x]

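Continuing the sketch from section 3, the assignment loop could look like this (illustrative only; `addresses`, `territories`, and `point_in_polygon` are the hypothetical structures and function from the earlier sketches, not the pandas/shapely objects used in `main.py`):

```python
# Illustrative sketch for steps 4.1-4.4: assign a TerritoryID to each address.
def assign_territories(addresses, territories, point_in_polygon):
    # addresses: list of dicts with "Latitude"/"Longitude"; territories: list of
    # dicts with "TerritoryID" and "Boundary" as a list of (lon, lat) vertices.
    for address in addresses:                                      # 4.1
        point = (address["Longitude"], address["Latitude"])
        for territory in territories:                              # 4.2
            if point_in_polygon(point, territory["Boundary"]):     # 4.3
                address["TerritoryID"] = territory["TerritoryID"]  # 4.4.1
                break                                              # 4.4.2
    return addresses
```
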
## 5. Output Generation [x]

5.1. Create a new file named `Addresses_Updated.csv` (see the sketch after this section). [x]
5.2. Write the header row from the original `Addresses.csv` to the new file. [x]
5.3. Iterate through the updated list of address objects. [x]
5.4. For each address object, write a new row to `Addresses_Updated.csv` with all the original data plus the newly assigned `TerritoryID`. [x]

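An illustrative sketch of the output step as specified here, using Python's `csv` module; `header` and `addresses` are assumed to come from the earlier steps, and the final `main.py` instead builds a pandas DataFrame and writes `Addresses_with_Territory.csv`:

```python
# Illustrative sketch for steps 5.1-5.4: write the updated addresses to a new CSV.
import csv

def write_output(header, addresses, path="Addresses_Updated.csv"):
    # header: original column names; addresses: list of dicts keyed by those names,
    # each already carrying its assigned TerritoryID.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=header)
        writer.writeheader()              # 5.2: header row from the original file
        for address in addresses:         # 5.3: iterate the updated address objects
            writer.writerow(address)      # 5.4: original data plus TerritoryID

# Hypothetical example call:
write_output(["TerritoryID", "Latitude", "Longitude"],
             [{"TerritoryID": 12, "Latitude": 30.15, "Longitude": -85.55}])
```
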
## 6. Finalization [x]

6.1. Close any open file streams. [x]
6.2. Report successful completion to the user. [x]
167  TerritoryExport.csv  Normal file
File diff suppressed because one or more lines are too long
77  main.py  Normal file
@@ -0,0 +1,77 @@
import pandas as pd
from shapely.geometry import Polygon, Point
import ast
import logging
import sys

# --- Configure Logging ---
log_formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')

# File Handler
file_handler = logging.FileHandler('run.log')
file_handler.setFormatter(log_formatter)

# Console Handler
console_handler = logging.StreamHandler(sys.stdout)
console_handler.setFormatter(log_formatter)

# Root Logger
root_logger = logging.getLogger()
root_logger.setLevel(logging.INFO)
root_logger.addHandler(file_handler)
root_logger.addHandler(console_handler)


# --- Load and Prepare Territory Data ---
logging.info("Loading and preparing territory data...")
territory_file = 'TerritoryExport.csv'
territories_df = pd.read_csv(territory_file)

def parse_boundary_to_polygon(boundary_str):
    # Safely evaluate the "[(lon, lat), ...]" Boundary string and build a shapely
    # Polygon; rows with malformed boundaries become None and are dropped below.
    try:
        coords = ast.literal_eval(boundary_str)
        return Polygon(coords)
    except (ValueError, SyntaxError, TypeError):
        return None

territories_df['Polygon'] = territories_df['Boundary'].apply(parse_boundary_to_polygon)
territories_df.dropna(subset=['Polygon'], inplace=True)
logging.info(f"Loaded {len(territories_df)} territories.")

# --- Load and Prepare Address Data ---
logging.info("Loading address data...")
addresses_file = 'Addrsses.csv'
addresses_df = pd.read_csv(addresses_file)
logging.info(f"Found {len(addresses_df)} addresses to process.")

# --- Process Each Address ---
results = []
logging.info("Processing addresses...")
for index, address_row in addresses_df.iterrows():
    user_lat = address_row['Latitude']
    user_lon = address_row['Longitude']
    user_point = Point(user_lon, user_lat)  # shapely points are (x, y) = (lon, lat)

    found_territory_id = None
    for _, territory_row in territories_df.iterrows():
        if territory_row['Polygon'].contains(user_point):
            found_territory_id = territory_row['TerritoryID']
            break

    # If no territory was found, assign 'OUTSIDE_TERRITORY'
    if found_territory_id is None:
        found_territory_id = 'OUTSIDE_TERRITORY'

    # Replace the first column with the found TerritoryID
    address_row.iloc[0] = found_territory_id
    results.append(address_row)
    logging.info(f"  Processed address {index + 1}/{len(addresses_df)}")

logging.info("Processing complete.")

# --- Save Results to a New CSV ---
results_df = pd.DataFrame(results)
output_file = 'Addresses_with_Territory.csv'
results_df.to_csv(output_file, index=False)

logging.info(f"Results saved to {output_file}")