# Territory Address Combiner

This script assigns a Territory ID to a list of addresses by determining which territory's geographical boundary each address's coordinates fall within.

## Overview

The script processes two input files:

1. `TerritoryExport.csv`: Contains territory information, including a `TerritoryID` and a `Boundary` polygon defined by a series of latitude and longitude points.
2. `Addresses.csv`: Contains address information, including latitude and longitude coordinates, but with an empty `TerritoryID` column.

The script reads both files and, for each address, performs a point-in-polygon test to find the containing territory. It then populates the `TerritoryID` in the address data and saves the result to a new CSV file.
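
To make the point-in-polygon test concrete, here is a minimal sketch using shapely. The square territory and the three points are made-up values, and the (longitude, latitude) ordering is an assumption based on the `Boundary` example in the input file specification; it is not taken from the script itself.

```python
from shapely.geometry import Point, Polygon

# Hypothetical square territory; vertices are (x, y) = (longitude, latitude).
territory = Polygon([(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1)])

inside = Point(-85.55, 30.15)   # strictly inside the square
outside = Point(-85.40, 30.15)  # east of the square
on_edge = Point(-85.6, 30.15)   # exactly on the western edge

print(territory.contains(inside))   # True
print(territory.contains(outside))  # False
print(territory.contains(on_edge))  # False: contains() excludes the boundary
print(territory.covers(on_edge))    # True: covers() includes the boundary
```

Because `contains` is false for a point lying exactly on a boundary, an address sitting on an edge shared by two territories would match neither of them. `covers` is shown only for comparison.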

## Technical Breakdown

The script operates in the following sequence:

1. **Logging Setup**: Configures a logger to output informational messages to both the console and a file named `run.log`.
2. **Load Territory Data**: Reads the `TerritoryExport.csv` file into a pandas DataFrame.
3. **Parse Boundaries**: The `parse_boundary_to_polygon` function is applied to the `Boundary` column. This function uses `ast.literal_eval` to safely parse the string representation of a list of coordinates into a Python list, and then `shapely.geometry.Polygon` to create a Polygon object from those coordinates.
4. **Load Address Data**: Reads the `Addresses.csv` file into a pandas DataFrame.
5. **Process Addresses**: The script iterates through each row (address) in the addresses DataFrame:
   - A `shapely.geometry.Point` object is created from the address's `Latitude` and `Longitude`.
   - It then iterates through the territories. For each territory, it uses the `polygon.contains(point)` method to check if the address point is within the territory's boundary.
   - If a containing territory is found, its `TerritoryID` is stored, and the inner loop is broken.
   - If no containing territory is found after checking all territories, the `TerritoryID` is set to the string "OUTSIDE_TERRITORY".
6. **Update Address Data**: The script replaces the value in the first column of the original address row with the found `TerritoryID`.
7. **Save Results**: The updated address data is collected into a new DataFrame and saved to `Addresses_with_Territory.csv`.
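
The sequence above can be sketched roughly as follows. This is not the script itself: logging and file I/O are left out, the two CSV inputs are small made-up strings read from memory, and the `Street` column is purely illustrative. Passing `Longitude` as x and `Latitude` as y is an assumption based on the `Boundary` example, whose tuples look like (longitude, latitude).

```python
import ast
import io

import pandas as pd
from shapely.geometry import Point, Polygon

# Tiny in-memory stand-ins for the two input files (made-up data).
TERRITORIES_CSV = '''TerritoryID,Boundary
T-1,"[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"
T-2,"[(-85.5, 30.2), (-85.4, 30.2), (-85.4, 30.1), (-85.5, 30.1), (-85.5, 30.2)]"
'''
ADDRESSES_CSV = '''TerritoryID,Latitude,Longitude,Street
,30.15,-85.55,1 Example St
,30.15,-85.45,2 Example St
,31.00,-80.00,3 Example St
'''


def parse_boundary_to_polygon(boundary: str) -> Polygon:
    """Turn the Boundary string into a shapely Polygon."""
    return Polygon(ast.literal_eval(boundary))


territories = pd.read_csv(io.StringIO(TERRITORIES_CSV))
territories["polygon"] = territories["Boundary"].apply(parse_boundary_to_polygon)

addresses = pd.read_csv(io.StringIO(ADDRESSES_CSV))
first_column = addresses.columns[0]

found_ids = []
for _, address in addresses.iterrows():
    # Assumption: boundary tuples are (longitude, latitude), so x = Longitude.
    point = Point(address["Longitude"], address["Latitude"])
    territory_id = "OUTSIDE_TERRITORY"
    for _, territory in territories.iterrows():
        if territory["polygon"].contains(point):
            territory_id = territory["TerritoryID"]
            break  # stop at the first containing territory
    found_ids.append(territory_id)

# Overwrite the first column with the territory found for each address.
result = addresses.copy()
result[first_column] = found_ids
print(result.to_csv(index=False))
```

The nested loop checks every territory for every address until one matches, which mirrors the description above and is simple to follow; for very large inputs a spatial index might be worth considering, but that is outside what this README describes.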

### Input File Specifications

#### `TerritoryExport.csv`

This can be generated by exporting existing territories from NW Scheduler.

This file must contain at least the following two columns:

- `TerritoryID`: A unique identifier for the territory.
- `Boundary`: A string representation of a list of coordinate tuples that form the polygon for the territory. Example: `"[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"`
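
As a quick way to check that a `Boundary` value has the expected shape, the example string above can be parsed the same way the Technical Breakdown describes. This is only a sketch; the validity and bounds checks are additions for illustration and are not claimed to be part of the script.

```python
import ast

from shapely.geometry import Polygon

boundary = "[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"

# literal_eval only accepts Python literals, so no arbitrary code is executed.
coords = ast.literal_eval(boundary)
polygon = Polygon(coords)

print(len(coords))       # 5: the last point repeats the first to close the ring
print(polygon.is_valid)  # True for this simple square
print(polygon.bounds)    # (minx, miny, maxx, maxy) = (-85.6, 30.1, -85.5, 30.2)
```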

#### `Addresses.csv`

We found ours at https://openaddresses.io/. Some preprocessing may be needed to get the file into the format described below.

This file must contain at least the following two columns:

- `Latitude`: The latitude of the address.
- `Longitude`: The longitude of the address.

In the output file, the first column of this file is overwritten with the `TerritoryID`.
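
One possible way to prepare a downloaded file is sketched below. The source column names (`LON`, `LAT`, `NUMBER`, `STREET`) are an assumption about what an OpenAddresses-style CSV might contain, and the rows are made up; adjust the names to match the actual download. An empty placeholder column is inserted first so that overwriting the first column does not destroy real data.

```python
import io

import pandas as pd

# Made-up rows using column names ASSUMED for an OpenAddresses-style export.
RAW_CSV = '''LON,LAT,NUMBER,STREET
-85.55,30.15,1,EXAMPLE ST
-85.45,30.15,2,EXAMPLE ST
'''

raw = pd.read_csv(io.StringIO(RAW_CSV))

# Rename the coordinate columns to the names this README requires.
prepared = raw.rename(columns={"LAT": "Latitude", "LON": "Longitude"})

# Add an empty first column to receive the TerritoryID.
prepared.insert(0, "TerritoryID", "")

print(prepared.to_csv(index=False))
```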

### Output File: `Addresses_with_Territory.csv`

The output file will have the same structure as `Addresses.csv`, but with the first column populated with the `TerritoryID` of the containing territory, or "OUTSIDE_TERRITORY" if the address is not within any territory.
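
After a run, a short check like the following can summarize how many addresses landed in each territory and how many fell outside all of them. The rows here are made-up and read from memory; in practice the data would come from `Addresses_with_Territory.csv`.

```python
import io

import pandas as pd

# Made-up rows shaped like the output file described above.
OUTPUT_CSV = '''TerritoryID,Latitude,Longitude
T-1,30.15,-85.55
T-1,30.16,-85.56
OUTSIDE_TERRITORY,31.00,-80.00
'''

output = pd.read_csv(io.StringIO(OUTPUT_CSV))

# The territory assignment always lives in the first column.
assigned = output.iloc[:, 0]
counts = assigned.value_counts()
unassigned = output[assigned == "OUTSIDE_TERRITORY"]

print(counts.to_dict())
print(len(unassigned))
```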

## Recommended Tech Stack

- **Language**: **Python**
- **Libraries**:
  - **pandas**: For efficient reading, manipulation, and writing of CSV data.
  - **shapely**: For robust and accurate geometric operations, specifically for parsing the boundary polygons and performing the point-in-polygon tests.

This stack is recommended because Python's data analysis and scientific computing ecosystem is well suited to this kind of data-centric, geospatial task: pandas handles the CSV input and output and shapely handles the geometry, so the script needs no hand-written CSV handling or point-in-polygon code.

## Usage

1. **Install Dependencies:**

   ```sh
   pip install pandas shapely
   ```

2. **Run the script:**

   ```sh
   python main.py
   ```