Territory Address Combiner
This script assigns a Territory ID to a list of addresses by determining which territory's geographical boundary each address's coordinates fall within.
Overview
The script processes two input files:
- TerritoryExport.csv: Contains territory information, including a TerritoryID and a Boundary polygon defined by a series of latitude and longitude points.
- Addresses.csv: Contains address information, including latitude and longitude coordinates, but with an empty TerritoryID column.
The script reads both files and, for each address, performs a point-in-polygon test to find the containing territory. It then writes that TerritoryID into the address data and saves the result to a new CSV file.
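A minimal sketch of the point-in-polygon test with shapely, using the example boundary from the TerritoryExport.csv specification. It assumes the boundary tuples are (longitude, latitude), which shapely reads as (x, y), so the address point has to be built as Point(longitude, latitude). The address coordinates below are made up for illustration.

```python
from shapely.geometry import Point, Polygon

# Example boundary from TerritoryExport.csv; tuples assumed (longitude, latitude).
territory = Polygon(
    [(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]
)

latitude, longitude = 30.15, -85.55  # made-up address coordinates

inside = territory.contains(Point(longitude, latitude))   # x=lon, y=lat
swapped = territory.contains(Point(latitude, longitude))  # axis order mixed up
outside = territory.contains(Point(-85.4, 30.15))         # east of the boundary

print(inside, swapped, outside)  # True False False
```

Note that contains() returns False for a point lying exactly on a territory's edge; shapely's covers() counts such points as inside, if that behavior is preferred.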
Technical Breakdown
The script operates in the following sequence:
- Logging Setup: Configures a logger to write informational messages to both the console and a file named run.log.
- Load Territory Data: Reads the TerritoryExport.csv file into a pandas DataFrame.
- Parse Boundaries: Applies the parse_boundary_to_polygon function to the 'Boundary' column. This function uses ast.literal_eval to safely parse the string representation of a list of coordinates into a Python list, then uses shapely.geometry.Polygon to create a Polygon object from those coordinates.
- Load Address Data: Reads the Addresses.csv file into a pandas DataFrame.
- Process Addresses: Iterates through each row (address) in the addresses DataFrame:
  - Creates a shapely.geometry.Point object from the address's 'Latitude' and 'Longitude'. Shapely treats coordinates as (x, y), so the point's axis order must match the boundary tuples, which in the TerritoryExport.csv example appear to be (longitude, latitude).
  - Iterates through the territories, calling polygon.contains(point) on each to check whether the address point lies within that territory's boundary.
  - If a containing territory is found, stores its TerritoryID and breaks out of the inner loop.
  - If no containing territory is found after checking all territories, sets the TerritoryID to the string "OUTSIDE_TERRITORY".
- Update Address Data: Replaces the value in the first column of the original address row with the found TerritoryID.
- Save Results: Collects the updated address data into a new DataFrame and saves it to Addresses_with_Territory.csv.
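The sequence above can be sketched as follows. This is a minimal sketch, not the contents of main.py: the names configure_logging, assign_territories, and run are hypothetical, and it assumes the boundary tuples are (longitude, latitude).

```python
import ast
import logging

import pandas as pd
from shapely.geometry import Point, Polygon

logger = logging.getLogger(__name__)

OUTSIDE_TERRITORY = "OUTSIDE_TERRITORY"


def configure_logging(log_file: str = "run.log") -> None:
    # Send INFO messages to both the console and a log file.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[logging.StreamHandler(), logging.FileHandler(log_file)],
    )


def assign_territories(territories: pd.DataFrame, addresses: pd.DataFrame) -> pd.DataFrame:
    # Parse each Boundary string into a shapely Polygon.
    polygons = territories["Boundary"].apply(lambda b: Polygon(ast.literal_eval(b)))
    candidates = list(zip(territories["TerritoryID"], polygons))

    territory_ids = []
    for _, row in addresses.iterrows():
        # Assumption: boundaries are (longitude, latitude), i.e. shapely's (x, y).
        point = Point(row["Longitude"], row["Latitude"])
        found = OUTSIDE_TERRITORY
        for territory_id, polygon in candidates:
            if polygon.contains(point):
                found = territory_id
                break  # stop at the first containing territory
        territory_ids.append(found)

    # Replace the first column (by position, whatever its header) with the IDs.
    result = addresses.copy()
    result[result.columns[0]] = territory_ids
    return result


def run(territory_csv: str, address_csv: str, output_csv: str) -> None:
    territories = pd.read_csv(territory_csv)
    addresses = pd.read_csv(address_csv)
    logger.info("Loaded %d territories and %d addresses", len(territories), len(addresses))
    result = assign_territories(territories, addresses)
    result.to_csv(output_csv, index=False)
    logger.info("Wrote %s", output_csv)
```

Calling configure_logging() and then run("TerritoryExport.csv", "Addresses.csv", "Addresses_with_Territory.csv") mirrors the steps listed above. The nested loop checks every territory for every address, which is simple and adequate for a congregation-sized territory list.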
Input File Specifications
TerritoryExport.csv
This file can be generated by exporting existing territories from NW Scheduler.
This file must contain at least the following two columns:
- TerritoryID: A unique identifier for the territory.
- Boundary: A string representation of a list of coordinate tuples that form the polygon for the territory. Example: "[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"
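A sketch of the boundary parsing described for parse_boundary_to_polygon, applied to the Boundary format above. It is not the script's actual code, and the validity check is an assumption added for illustration: a self-intersecting boundary makes shapely report is_valid as False, and containment results on such a shape may be incorrect. Shapely closes the ring automatically, so repeating the first point, as the example does, is harmless but not required.

```python
import ast

from shapely.geometry import Polygon


def parse_boundary_to_polygon(boundary: str) -> Polygon:
    """Parse a Boundary string such as "[(-85.6, 30.2), ...]" into a Polygon."""
    # ast.literal_eval evaluates only Python literals (lists, tuples, numbers),
    # so it cannot run arbitrary code the way eval() could.
    coords = ast.literal_eval(boundary)
    polygon = Polygon(coords)
    # Illustrative extra check (assumption): reject self-intersecting boundaries.
    if not polygon.is_valid:
        raise ValueError(f"Invalid territory boundary: {boundary[:60]}")
    return polygon
```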
Addresses.csv
We obtained ours from https://openaddresses.io/. Some preprocessing may be needed to get the data into the format described below.
This file must contain at least the following two columns:
- Latitude: The latitude of the address.
- Longitude: The longitude of the address.
The first column of this file will be overwritten with the TerritoryID in the output file.
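Because the first column is overwritten, and downloaded address data may need cleanup first, a preprocessing step along these lines can help. This is a hypothetical sketch: prepare_addresses is not part of the script, and it assumes the coordinate columns are already named exactly Latitude and Longitude. It drops rows whose coordinates are missing or non-numeric and makes sure an empty TerritoryID column comes first, so no real address data is lost when the script overwrites that column.

```python
import pandas as pd

REQUIRED = ["Latitude", "Longitude"]


def prepare_addresses(addresses: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleanup step before running the territory script."""
    missing = [col for col in REQUIRED if col not in addresses.columns]
    if missing:
        raise ValueError(f"Addresses.csv is missing columns: {missing}")

    prepared = addresses.copy()
    # Coerce coordinates to numbers; unparseable values become NaN and are dropped.
    for col in REQUIRED:
        prepared[col] = pd.to_numeric(prepared[col], errors="coerce")
    prepared = prepared.dropna(subset=REQUIRED).reset_index(drop=True)

    # The script overwrites the first column, so make an empty TerritoryID lead.
    if prepared.columns[0] != "TerritoryID":
        if "TerritoryID" in prepared.columns:
            prepared = prepared.drop(columns="TerritoryID")
        prepared.insert(0, "TerritoryID", "")
    return prepared
```

Saving the result with prepared.to_csv("Addresses.csv", index=False) produces a file in the shape the script expects.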
Output File: Addresses_with_Territory.csv
The output file has the same structure as Addresses.csv, but with the first column populated with the TerritoryID of the containing territory, or "OUTSIDE_TERRITORY" if the address does not fall within any territory.
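To spot-check a run, a short sketch like the following counts how many addresses landed in each territory, including OUTSIDE_TERRITORY. The function name summarize_output is hypothetical; it reads the TerritoryID by position because the first column's header is whatever Addresses.csv originally used.

```python
import pandas as pd


def summarize_output(path: str = "Addresses_with_Territory.csv") -> pd.Series:
    """Count addresses per TerritoryID in the script's output file."""
    result = pd.read_csv(path)
    # The TerritoryID lives in the first column, whatever its header says.
    return result.iloc[:, 0].value_counts()
```

A large OUTSIDE_TERRITORY count usually points to addresses beyond the exported territories or to a latitude/longitude order mismatch.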
Recommended Tech Stack
- Language: Python
- Libraries:
- pandas: For efficient reading, manipulation, and writing of CSV data.
- shapely: For robust and accurate geometric operations, specifically for parsing the boundary polygons and performing the point-in-polygon tests.
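This stack is recommended because Python's data analysis and scientific computing ecosystem is well suited to this kind of data-centric, geospatial task: pandas handles the CSV input and output, and shapely provides tested geometry construction and point-in-polygon predicates. Together they lead to a simpler and more reliable solution than hand-written CSV parsing and geometry code.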
Usage
- Install dependencies:
  pip install pandas shapely
- Run the script:
  python main.py