2025-08-17 19:11:22 -05:00

Territory Address Combiner

This script assigns a Territory ID to a list of addresses by determining which territory's geographical boundary each address's coordinates fall within.

Overview

The script processes two input files:

  1. TerritoryExport.csv: Contains territory information, including a TerritoryID and a Boundary polygon defined by a series of longitude/latitude coordinate pairs.
  2. Addresses.csv: Contains address information, including latitude and longitude coordinates, but with an empty TerritoryID column.

The script reads both files and, for each address, performs a point-in-polygon test to find the territory that contains it. It then fills in the TerritoryID for each address and saves the result to a new CSV file.
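
To make the point-in-polygon test concrete, here is a minimal, standard-library-only ray-casting sketch. It is a swapped-in illustration of the idea, not what the script runs: the script relies on shapely's polygon.contains(point). The function name point_in_polygon is hypothetical, and the boundary used is the example square from the TerritoryExport.csv specification.

```python
def point_in_polygon(x, y, ring):
    """Hypothetical ray-casting test: True if (x, y) falls inside `ring`.

    `ring` is a list of (x, y) tuples; a repeated closing vertex is allowed.
    Points lying exactly on an edge may be classified either way here.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray cast to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


# Example square from the Boundary specification, as (longitude, latitude).
boundary = [(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]

print(point_in_polygon(-85.55, 30.15, boundary))  # True
print(point_in_polygon(-85.40, 30.15, boundary))  # False
```

A ray from the point crosses the boundary an odd number of times only when the point is inside. This simple version does not treat points on an edge consistently, which is one reason to rely on a tested library such as shapely in practice.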

Technical Breakdown

The script operates in the following sequence:

  1. Logging Setup: Configures a logger to output informational messages to both the console and a file named run.log.
  2. Load Territory Data: Reads the TerritoryExport.csv file into a pandas DataFrame.
  3. Parse Boundaries: The parse_boundary_to_polygon function is applied to the 'Boundary' column. This function uses ast.literal_eval to safely parse the string representation of a list of coordinates into a Python list, and then shapely.geometry.Polygon to create a Polygon object from those coordinates.
  4. Load Address Data: Reads the Addresses.csv file into a pandas DataFrame.
  5. Process Addresses: The script iterates through each row (address) in the addresses DataFrame:
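    • A shapely.geometry.Point object is created from the address's 'Latitude' and 'Longitude'. Shapely treats coordinates as (x, y), so the point must use the same order as the Boundary tuples, which in the example below are (longitude, latitude).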
    • It then iterates through the territories. For each territory, it uses the polygon.contains(point) method to check if the address point is within the territory's boundary.
    • If a containing territory is found, its TerritoryID is stored and the loop over territories exits early.
    • If no containing territory is found after checking all territories, the TerritoryID is set to the string "OUTSIDE_TERRITORY".
  6. Update Address Data: The script replaces the value in the first column of the original address row with the found TerritoryID.
  7. Save Results: The updated address data is collected into a new DataFrame and saved to Addresses_with_Territory.csv.
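
main.py itself is not reproduced in this README, so the following is only a sketch of steps 1 to 7 under stated assumptions. parse_boundary_to_polygon is the name used above; setup_logging, find_territory, assign_territories and main are hypothetical names, and building each point as (longitude, latitude) is an assumption taken from the Boundary example. For brevity it assigns the whole first column at once instead of editing each row.

```python
import ast
import logging

import pandas as pd
from shapely.geometry import Point, Polygon

OUTSIDE = "OUTSIDE_TERRITORY"


def setup_logging(log_path="run.log"):
    """Step 1: send INFO-level messages to the console and to run.log."""
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
        handlers=[logging.StreamHandler(), logging.FileHandler(log_path)],
    )


def parse_boundary_to_polygon(boundary_str):
    """Step 3: turn a Boundary string into a shapely Polygon."""
    coords = ast.literal_eval(boundary_str)  # parses literals only, never runs code
    return Polygon(coords)


def find_territory(point, polygons):
    """Step 5: return the ID of the first polygon that contains the point."""
    for territory_id, polygon in polygons:
        if polygon.contains(point):
            return territory_id  # same effect as breaking out of the inner loop
    return OUTSIDE


def assign_territories(territories, addresses):
    """Steps 3, 5 and 6: return a copy of `addresses` with column 0 filled in."""
    polygons = list(zip(
        territories["TerritoryID"],
        [parse_boundary_to_polygon(b) for b in territories["Boundary"]],
    ))
    ids = [
        # Assumption: Boundary tuples are (longitude, latitude), i.e. shapely's (x, y).
        find_territory(Point(lon, lat), polygons)
        for lat, lon in zip(addresses["Latitude"], addresses["Longitude"])
    ]
    result = addresses.copy()
    result[result.columns[0]] = ids  # Step 6: overwrite the first column
    logging.info("Assigned territories to %d addresses", len(result))
    return result


def main():
    """Steps 2 and 7: read both CSVs and write Addresses_with_Territory.csv."""
    setup_logging()
    territories = pd.read_csv("TerritoryExport.csv")
    addresses = pd.read_csv("Addresses.csv")
    result = assign_territories(territories, addresses)
    result.to_csv("Addresses_with_Territory.csv", index=False)
    logging.info("Wrote Addresses_with_Territory.csv")
```

Calling main() from the directory that holds both CSV files writes Addresses_with_Territory.csv and run.log. Note that shapely's contains() returns False for a point lying exactly on a polygon's boundary, so an address sitting on a territory border would be reported as OUTSIDE_TERRITORY by this sketch.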

Input File Specifications

TerritoryExport.csv

This file can be generated by exporting the existing territories from NW Scheduler.

This file must contain at least the following two columns:

  • TerritoryID: A unique identifier for the territory.
  • Boundary: A string representation of a list of coordinate tuples that form the polygon for the territory. Example: "[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"
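
A quick way to check that a Boundary value has the expected format is to parse it the way step 3 describes. This sketch uses the example string above and also shows why coordinate order matters: swapping the two values of a test point moves it far outside the polygon.

```python
import ast

from shapely.geometry import Point, Polygon

boundary_str = "[(-85.6, 30.2), (-85.5, 30.2), (-85.5, 30.1), (-85.6, 30.1), (-85.6, 30.2)]"

# literal_eval accepts only Python literals, so it cannot execute code.
coords = ast.literal_eval(boundary_str)  # list of (x, y) tuples
polygon = Polygon(coords)

print(polygon.is_valid)  # True
print(polygon.bounds)    # (-85.6, 30.1, -85.5, 30.2)  -> (minx, miny, maxx, maxy)

# (longitude, latitude) matches the boundary; (latitude, longitude) does not.
print(polygon.contains(Point(-85.55, 30.15)))  # True
print(polygon.contains(Point(30.15, -85.55)))  # False
```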

Addresses.csv

We found ours at https://openaddresses.io/. Some preprocessing may be needed to get the data into the format described below.

This file must contain at least the following two columns:

  • Latitude: The latitude of the address.
  • Longitude: The longitude of the address.

The first column of this file will be overwritten with the TerritoryID in the output file.
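
Because the script relies on the Latitude and Longitude columns and overwrites whatever column comes first, a pre-flight check can catch problems early. load_addresses is a hypothetical helper, not part of main.py, and the sample rows (including the Number and Street columns) are made up for illustration.

```python
import io

import pandas as pd

REQUIRED = ["Latitude", "Longitude"]


def load_addresses(source):
    """Hypothetical check: return the DataFrame and the count of rows with bad coordinates."""
    df = pd.read_csv(source)
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"Addresses.csv is missing columns: {missing}")
    # Coerce coordinates to numbers; unparseable or empty values become NaN.
    for col in REQUIRED:
        df[col] = pd.to_numeric(df[col], errors="coerce")
    bad_rows = df[REQUIRED].isna().any(axis=1)
    return df, int(bad_rows.sum())


# Made-up sample: the second row has no latitude.
sample = io.StringIO(
    "TerritoryID,Number,Street,Latitude,Longitude\n"
    ",101,Main St,30.15,-85.55\n"
    ",102,Main St,,-85.54\n"
)
df, n_bad = load_addresses(sample)
print(df.columns[0], n_bad)  # TerritoryID 1  (column 0 is the one that gets overwritten)
```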

Output File: Addresses_with_Territory.csv

The output file will have the same structure as Addresses.csv, but with the first column populated with the TerritoryID of the containing territory, or "OUTSIDE_TERRITORY" if the address is not within any territory.
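
To review the result, the output can be summarized per territory with pandas. territory_counts is a hypothetical helper, and the sample rows and the ID T1 are invented for illustration; in practice, pass the path to Addresses_with_Territory.csv instead.

```python
import io

import pandas as pd


def territory_counts(source):
    """Hypothetical helper: number of addresses per value in the first column."""
    df = pd.read_csv(source)
    first_col = df.columns[0]  # the script writes the TerritoryID into column 0
    return df[first_col].value_counts()


# Invented sample output with two assigned rows and one outside any territory.
sample = io.StringIO(
    "TerritoryID,Latitude,Longitude\n"
    "T1,30.15,-85.55\n"
    "T1,30.16,-85.56\n"
    "OUTSIDE_TERRITORY,30.15,-85.40\n"
)
counts = territory_counts(sample)
print(counts["T1"], counts["OUTSIDE_TERRITORY"])  # 2 1
```

A large OUTSIDE_TERRITORY count is a hint to check coordinate order or territory coverage.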

Technology Stack

  • Language: Python
  • Libraries:
    • pandas: For efficient reading, manipulation, and writing of CSV data.
    • shapely: For robust and accurate geometric operations, specifically for building the boundary polygons and performing the point-in-polygon tests.

This stack is recommended because pandas handles the CSV input and output, and shapely provides well-tested geometry predicates, so the script needs neither hand-written CSV parsing nor its own point-in-polygon code.

Usage

  1. Install dependencies:

    pip install pandas shapely
    
  2. Run the script:

    python main.py