Start Here

This page is the recommended entry point for anyone picking up the project. It covers the runtime environment, the project layout, the order in which notebooks should run, and the convention used for output filenames.

Environment

The notebooks are written for the ESRI ArcGIS Pro Python environment. ArcPy is used for spatial operations, so an ArcGIS Pro license is required.

The download notebook also uses the GDAL command-line tool gdal_translate (called by nc_to_geotiff_and_delete in src/download_input_layers.py) to convert PacIOOS NetCDF files to GeoTIFF. GDAL must be on the system PATH.
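A quick way to confirm the PATH requirement before running the download notebook is sketched below. The helper names (find_gdal_translate, build_translate_command, convert_nc_to_geotiff) are hypothetical and are not the project's own functions; the sketch does not reproduce nc_to_geotiff_and_delete. It assumes a plain `gdal_translate -of GTiff <input> <output>` call is sufficient, which may not hold for NetCDF files that contain several subdatasets.

```python
import shutil
import subprocess
from pathlib import Path


def find_gdal_translate():
    """Return the full path to gdal_translate, or None if it is not on PATH."""
    return shutil.which("gdal_translate")


def build_translate_command(nc_path, tif_path):
    """Build the argument list for a NetCDF-to-GeoTIFF conversion."""
    return ["gdal_translate", "-of", "GTiff", str(nc_path), str(tif_path)]


def convert_nc_to_geotiff(nc_path):
    """Hypothetical helper: write a GeoTIFF next to the given NetCDF file."""
    if find_gdal_translate() is None:
        raise RuntimeError("gdal_translate was not found on the system PATH")
    tif_path = Path(nc_path).with_suffix(".tif")
    # check=True raises CalledProcessError if GDAL exits with an error code
    subprocess.run(build_translate_command(nc_path, tif_path), check=True)
    return tif_path
```

Calling find_gdal_translate() in the first notebook cell gives an early, readable failure instead of an error partway through the download.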

Python packages used across the notebooks: geopandas, pandas, numpy, requests, pyyaml, arcpy, osgeo (GDAL bindings, optional in some environments). Development used Python 3.11 inside the ESRI conda environment matrix_env.
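The package list above can be checked from inside the environment with a short sketch like the following. The function name missing_modules and the two list constants are illustrative assumptions, not part of the project. Note that pyyaml is imported under the module name yaml.

```python
from importlib.util import find_spec

# Import names for the packages listed above (pyyaml installs as "yaml")
REQUIRED_MODULES = ["geopandas", "pandas", "numpy", "requests", "yaml", "arcpy"]
# The GDAL bindings are described as optional in some environments
OPTIONAL_MODULES = ["osgeo"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required:", missing_modules(REQUIRED_MODULES))
    print("Missing optional:", missing_modules(OPTIONAL_MODULES))
```

find_spec only locates each module without importing it, so the check stays fast and does not trigger ArcPy licence initialisation.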

Project layout

HiOSDS-TechSuitabilityAnalysis/
├── notebooks/                 
│   ├── 00_download_input_layers.ipynb
│   ├── 01_prepare_input_layers.ipynb
│   ├── 02_built_mpat.ipynb
│   ├── 03_screen_techs.ipynb
│   └── 04_results_viz.ipynb
├── src/                                          # Helper functions for notebooks
│   ├── download_input_layers.py
│   ├── prepare_input_layers.py
│   ├── build_mpat.py
│   ├── screen_techs.py
│   ├── export_config.py
│   └── validate_mpat.py
├── config/                                       # Screening thresholds, criteria and endpoint rules
│   └── baseline/
│       ├── criteria.yaml
│       ├── endpoint_rules.yaml
│       └── thresholds.yaml
├── data/                     
│   ├── 01_inputs/
│   │   ├── source/                                # output of notebook 00
│   │   └── prepared/                              # output of notebook 01 (also holds the HCPT_Matrix_v6.1.xlsx)
│   ├── 02_interim/                                # tempspace for ArcPy intermediate files
│   └── 03_processed/
│       ├── mpat/                                  # output of notebook 02
│       ├── tech_screening/                        # output of notebook 03
│       └── 20260428_parcel_analysis_points.gpkg   # output of notebook 02
├── assets/                   
│   └── img/
│       └── parcel_analysis_point_placement_logic.svg
└── documentation/
    └── project_wrapup/                            # Quarto site

The data/ tree is excluded from version control because the source layers and intermediate rasters are too large for GitHub. Source data lives in the project’s shared Google Drive. Notebooks, helper functions, and configuration files are tracked in the GitHub repository.

Run order

The five notebooks run in numerical order. Each of notebooks 00 to 03 writes its outputs to a stage in data/ that the next notebook reads.

00_download_input_layers.ipynb. Run once to populate data/01_inputs/source/. Re-run only when source providers update their files. Building footprints and the cesspool inventory are not downloaded automatically; collaborators must add them to the source folder manually.
01_prepare_input_layers.ipynb. Run after 00 to reproject, mosaic, and convert source layers into data/01_inputs/prepared/. Re-run only when a source layer changes.
02_built_mpat.ipynb. Run after 01 to assemble the Master Parcel Attribute Table in data/03_processed/mpat/ and write the parcel analysis points GeoPackage. Re-run when prepared inputs change or the pilot island list changes.
03_screen_techs.ipynb. Run after 02 to apply the screening configuration to the MPAT and write the technology screening result to data/03_processed/tech_screening/. Re-run when the MPAT or the YAML configuration changes.
04_results_viz.ipynb. Run after 03 to generate the summary PNGs used by the Results pages and slide deck.

“All” versus “Individual” sections

Every key notebook has the same structure: a “Setup” block followed by an “All” section that processes every layer in one call, and an “Individual” section with one commented-out cell per layer for selective re-runs.

First run on a fresh machine: run the “All” section.
Source layer changed: uncomment and run only the matching cell in the “Individual” section.
Re-running everything from scratch: delete the relevant output folder and run the “All” section. Layers whose output already exists are skipped automatically, so deleting the output is what forces a rebuild.
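The skip-when-output-exists behaviour described above can be pictured with a minimal sketch. The name prepare_if_missing and its build_func argument are hypothetical; the project's helpers in src/ may implement the check differently.

```python
from pathlib import Path


def prepare_if_missing(output_path, build_func):
    """Run build_func(output_path) only when the output does not exist yet.

    Returns True if the layer was built, False if it was skipped.
    """
    output_path = Path(output_path)
    if output_path.exists():
        # Existing output is trusted; delete it to force a rebuild
        return False
    output_path.parent.mkdir(parents=True, exist_ok=True)
    build_func(output_path)
    return True
```

Under this pattern a stale output is never overwritten silently, which is why deleting the folder is the explicit way to request a full rebuild.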

Where outputs land

Notebook	Output directory	Key files
00 download	`data/01_inputs/source/`	One subdirectory per source dataset (e.g., `coastline_hi_dbedt/`, `parcels_hi_higp/`)
01 prepare	`data/01_inputs/prepared/`	One GeoPackage or GeoTIFF per analysis-ready layer at EPSG:32604
02 build MPAT	`data/03_processed/mpat/` plus `data/03_processed/{TODAY}_parcel_analysis_points.gpkg`	MPAT GeoPackage, CSV, README, parcel analysis points GeoPackage
03 screen techs	`data/03_processed/tech_screening/`	Screening GeoPackage, CSV, README
04 results viz	`outputs/plots/` plus `documentation/project_wrapup/assets/plots/`	Result PNGs used by the handoff site and slides

Refresh dates and the YYYYMMDD prefix

Every output file from notebooks 02 and 03 starts with a date prefix in YYYYMMDD form (for example 20260428_mpat_32604.gpkg). The date is set once per notebook run from the Pacific/Honolulu wall clock at the top of each notebook:

TODAY = datetime.now(ZoneInfo("Pacific/Honolulu")).strftime("%Y%m%d")

A re-run on a different day produces a new set of dated files alongside the old ones. The previous run can be moved to _archive/ inside the same output folder once the new run is confirmed good.
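A self-contained sketch of the date prefix and the archive step follows. The functions today_prefix and archive_run are hypothetical names introduced for illustration; only the strftime pattern and the _archive/ folder come from the text above. On Windows, zoneinfo relies on the tzdata package for time zone data.

```python
import shutil
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo


def today_prefix(now=None):
    """Return the YYYYMMDD prefix, using Honolulu time when no time is given."""
    now = now or datetime.now(ZoneInfo("Pacific/Honolulu"))
    return now.strftime("%Y%m%d")


def archive_run(output_dir, date_prefix):
    """Move files named <date_prefix>_* into output_dir/_archive/.

    Returns the sorted list of file names that were moved.
    """
    output_dir = Path(output_dir)
    archive_dir = output_dir / "_archive"
    archive_dir.mkdir(exist_ok=True)
    moved = []
    for path in sorted(output_dir.glob(f"{date_prefix}_*")):
        if path.is_file():
            shutil.move(str(path), str(archive_dir / path.name))
            moved.append(path.name)
    return moved
```

For example, archive_run("data/03_processed/mpat", "20260428") would move only the files from the 28 April 2026 run and leave newer dated files in place.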

The screening notebook reads the MPAT version explicitly through the mpat_v_date variable, so you can re-screen against an older MPAT without re-running notebook 02. Update this variable when pinning to a specific MPAT version.
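One way to picture the pinning is a small path resolver, sketched here under two assumptions: that mpat_v_date holds a YYYYMMDD string, and that the MPAT GeoPackage follows the 20260428_mpat_32604.gpkg naming shown earlier. The functions mpat_path and available_mpat_dates are hypothetical and not taken from the notebook.

```python
from pathlib import Path

# Output folder of notebook 02, as given in the project layout
MPAT_DIR = Path("data/03_processed/mpat")


def mpat_path(mpat_v_date, mpat_dir=MPAT_DIR):
    """Return the assumed GeoPackage path for a pinned MPAT date."""
    if len(mpat_v_date) != 8 or not mpat_v_date.isdigit():
        raise ValueError(f"expected a YYYYMMDD string, got {mpat_v_date!r}")
    return Path(mpat_dir) / f"{mpat_v_date}_mpat_32604.gpkg"


def available_mpat_dates(mpat_dir=MPAT_DIR):
    """List the date prefixes of MPAT GeoPackages found in mpat_dir."""
    return sorted(p.name[:8] for p in Path(mpat_dir).glob("*_mpat_32604.gpkg"))
```

Listing the available dates first makes it easier to spot a typo in the pinned value before the screening run starts.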

Status

Pilot scope

The MPAT and screening currently run for Maui, Oahu, and Kauai. The pilot island list lives in notebooks/02_built_mpat.ipynb as PILOT_ISLANDS = ["Maui", "Oahu", "Kauai"].

To add another island:

Confirm that all source layers cover the new island.
Add the island to PILOT_ISLANDS and re-run notebook 02.
Re-run notebook 03 against the new MPAT.
Re-run notebook 04 to refresh the Results plots.

Big Island, Lanai, Molokai, Niihau, and Kahoolawe are not yet covered.

Key Terms

Term	Meaning
MPAT	Master Parcel Attribute Table. One row per cesspool-bearing parcel with environmental, structural, and regulatory attributes.
TMK	Tax Map Key. The parcel identifier used as the main join key.
OSDS	On-Site Sewage Disposal System. Includes cesspools, septic tanks, and ATUs.
HAR	Hawaii Administrative Rules. HAR 11-62 Subchapter 3 is the regulatory basis for screening.
ATU	Aerobic Treatment Unit. NSF 40 and NSF 245 are the ATU standards used here.
NSF 40	Aerobic treatment standard for residential wastewater.
NSF 245	Higher-tier ATU standard with certified nitrogen removal.
SMA	Special Management Area. Hawaii coastal zone management boundary.
SFHA	Special Flood Hazard Area. FEMA flood zones designated A or V.
ksat	Saturated hydraulic conductivity from SSURGO soils data.
perc rate	Percolation rate in minutes per inch, estimated as `423.33 / ksat_r`.
GPKG	GeoPackage. The spatial file format used for processed vector outputs.
CRS	Coordinate Reference System. The project uses EPSG:32604.
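The constant in the perc rate formula comes from a unit conversion, sketched below. The sketch assumes ksat_r is in micrometers per second, the unit SSURGO uses for saturated hydraulic conductivity. One inch is 25,400 micrometers and one minute is 60 seconds, so 25,400 / 60 ≈ 423.33. The function name and the handling of missing or non-positive values are illustrative assumptions, not the project's implementation.

```python
MICROMETERS_PER_INCH = 25_400
SECONDS_PER_MINUTE = 60


def perc_rate_min_per_inch(ksat_um_per_s):
    """Convert ksat (micrometers per second) to minutes per inch.

    Returns None for missing or non-positive ksat, where the rate is undefined.
    """
    if ksat_um_per_s is None or ksat_um_per_s <= 0:
        return None
    # 25,400 / 60 = 423.33..., the rounded constant quoted in the table
    return MICROMETERS_PER_INCH / (SECONDS_PER_MINUTE * ksat_um_per_s)
```

Because the rate is inversely proportional to ksat, fast-draining soils give small values and slow-draining soils give large ones.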