Start Here
This page is the recommended entry point for anyone picking up the project. It covers the runtime environment, the project layout, the order in which notebooks should run, and the convention used for output filenames.
Environment
The notebooks are written for the ESRI ArcGIS Pro Python environment. ArcPy is used for spatial operations, so an ArcGIS Pro license is required.
The download notebook also uses the GDAL command-line tool gdal_translate (called by nc_to_geotiff_and_delete in src/download_input_layers.py) to convert PacIOOS NetCDF files to GeoTIFF. GDAL must be on the system PATH.
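The conversion step can be sketched as below. This is a minimal illustration and not the actual body of nc_to_geotiff_and_delete: the function names build_translate_command and convert_nc_to_geotiff are hypothetical, and the sketch assumes each NetCDF file can be passed to gdal_translate directly. Files with several variables may need GDAL's NETCDF:"file.nc":variable subdataset syntax instead.

```python
import shutil
import subprocess
from pathlib import Path


def build_translate_command(nc_path: Path, tif_path: Path) -> list[str]:
    """Return the gdal_translate argument list for one NetCDF -> GeoTIFF conversion."""
    # "-of GTiff" selects the GeoTIFF output driver.
    return ["gdal_translate", "-of", "GTiff", str(nc_path), str(tif_path)]


def convert_nc_to_geotiff(nc_path: Path, delete_source: bool = False) -> Path:
    """Convert one NetCDF file to a GeoTIFF written next to the source file."""
    # Fail early with a clear message when GDAL is not on the system PATH.
    if shutil.which("gdal_translate") is None:
        raise RuntimeError("gdal_translate not found on PATH")
    tif_path = nc_path.with_suffix(".tif")
    # check=True raises CalledProcessError if gdal_translate exits non-zero.
    subprocess.run(build_translate_command(nc_path, tif_path), check=True)
    if delete_source:
        # Assumption: mirrors the "and_delete" part of the helper's name.
        nc_path.unlink()
    return tif_path
```

Checking shutil.which before the call turns a missing GDAL install into an immediate, readable error rather than a failure partway through a download run.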
Python packages used across the notebooks: geopandas, pandas, numpy, requests, pyyaml, arcpy, osgeo (GDAL bindings, optional in some environments). The runtime in development was Python 3.11 inside the ESRI conda environment matrix_env.
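A quick way to confirm that an environment has these packages is to look up each import name without importing it. The sketch below is an assumption about how such a check could look; the helper missing_modules is hypothetical and not part of src/. Note that pyyaml is imported as yaml.

```python
from importlib.util import find_spec

# Import names, which can differ from install names (pyyaml imports as "yaml").
REQUIRED = ["geopandas", "pandas", "numpy", "requests", "yaml", "arcpy"]
OPTIONAL = ["osgeo"]  # GDAL bindings, optional in some environments


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec locates a module without importing it, so arcpy is not loaded here.
    return [name for name in names if find_spec(name) is None]


print("Missing required:", missing_modules(REQUIRED))
print("Missing optional:", missing_modules(OPTIONAL))
```

Running this inside matrix_env should print two empty lists when the environment is complete.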
Project layout
HiOSDS-TechSuitabilityAnalysis/
├── notebooks/
│   ├── 00_download_input_layers.ipynb
│   ├── 01_prepare_input_layers.ipynb
│   ├── 02_built_mpat.ipynb
│   ├── 03_screen_techs.ipynb
│   └── 04_results_viz.ipynb
├── src/                          # Helper functions for notebooks
│   ├── download_input_layers.py
│   ├── prepare_input_layers.py
│   ├── build_mpat.py
│   ├── screen_techs.py
│   ├── export_config.py
│   └── validate_mpat.py
├── config/                       # Screening thresholds, criteria and endpoint rules
│   └── baseline/
│       ├── criteria.yaml
│       ├── endpoint_rules.yaml
│       └── thresholds.yaml
├── data/
│   ├── 01_inputs/
│   │   ├── source/               # output of notebook 00
│   │   └── prepared/             # output of notebook 01 (also holds the HCPT_Matrix_v6.1.xlsx)
│   ├── 02_interim/               # tempspace for ArcPy intermediate files
│   └── 03_processed/
│       ├── mpat/                 # output of notebook 02
│       ├── tech_screening/       # output of notebook 03
│       └── 20260428_parcel_analysis_points.gpkg   # output of notebook 02
├── assets/
│   └── img/
│       └── parcel_analysis_point_placement_logic.svg
└── documentation/
    └── project_wrapup/           # Quarto site
The data/ tree is excluded from version control because the source layers and intermediate rasters are too large for GitHub. Source data lives in the project’s shared Google Drive. Notebooks, helper functions, and configuration files are tracked in the GitHub repository.
Run order
The notebooks run in numerical order, 00 through 04. Each writes its outputs to a stage in data/ that the next notebook reads.
- 00_download_input_layers.ipynb. Run once to populate data/01_inputs/source/. Re-run only when source providers update their files. Building footprints and the cesspool inventory must be added to the source folder manually by collaborators (these are not automated).
- 01_prepare_input_layers.ipynb. Run after 00 to reproject, mosaic, and convert source layers into data/01_inputs/prepared/. Re-run only when a source layer changes.
- 02_built_mpat.ipynb. Run after 01 to assemble the Master Parcel Attribute Table in data/03_processed/mpat/ and write the parcel analysis points GeoPackage. Re-run when prepared inputs change or the pilot island list changes.
- 03_screen_techs.ipynb. Run after 02 to apply the screening configuration to the MPAT and write the technology screening result to data/03_processed/tech_screening/. Re-run when the MPAT or the YAML configuration changes.
- 04_results_viz.ipynb. Run after 03 to generate the summary PNGs used by the Results pages and slide deck.
“All” versus “Individual” sections
Every key notebook has the same structure: a “Setup” block followed by an “All” section that processes every layer in one call, and an “Individual” section with one commented-out cell per layer for selective re-runs.
- First run on a fresh machine: run the “All” section.
- Source layer changed: uncomment and run only the matching cell in the “Individual” section.
- Re-running everything from scratch: delete the relevant output folder and run the “All” section. Already-prepared layers are skipped automatically when the output already exists; deleting the output forces a rebuild.
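The skip-when-present behaviour described in the last bullet can be sketched as follows. This is an illustration of the pattern, not the code in src/prepare_input_layers.py; the name prepare_if_missing, the build callback, and the force flag are all hypothetical.

```python
from pathlib import Path


def prepare_if_missing(output_path: Path, build, force: bool = False) -> bool:
    """Call build(output_path) unless the output already exists.

    Returns True when the layer was (re)built and False when it was skipped.
    """
    if output_path.exists() and not force:
        # Output is already there: skip, as in a repeat run of the "All" section.
        return False
    # Make sure the target folder exists, e.g. after it was deleted to force a rebuild.
    output_path.parent.mkdir(parents=True, exist_ok=True)
    build(output_path)
    return True
```

With this pattern, deleting an output file or folder is enough to trigger a rebuild on the next run, which matches the behaviour the bullet describes.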
Where outputs land
| Notebook | Output directory | Key files |
|---|---|---|
| 00 download | data/01_inputs/source/ | One subdirectory per source dataset (e.g., coastline_hi_dbedt/, parcels_hi_higp/) |
| 01 prepare | data/01_inputs/prepared/ | One GeoPackage or GeoTIFF per analysis-ready layer at EPSG:32604 |
| 02 build MPAT | data/03_processed/mpat/ plus data/03_processed/{TODAY}_parcel_analysis_points.gpkg | MPAT GeoPackage, CSV, README, parcel analysis points GeoPackage |
| 03 screen techs | data/03_processed/tech_screening/ | Screening GeoPackage, CSV, README |
| 04 results viz | outputs/plots/ plus documentation/project_wrapup/assets/plots/ | Result PNGs used by the handoff site and slides |
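Before running a notebook, it can help to see which stages already have outputs. The sketch below uses the output directories from the table above; the helper stage_status is hypothetical and only checks that a folder exists and is non-empty. A non-empty data/01_inputs/prepared/ does not prove notebook 01 has run, since that folder also holds the HCPT matrix workbook.

```python
from pathlib import Path

# Primary output directory per notebook, relative to the project root.
STAGE_OUTPUTS = {
    "00 download": "data/01_inputs/source",
    "01 prepare": "data/01_inputs/prepared",
    "02 build MPAT": "data/03_processed/mpat",
    "03 screen techs": "data/03_processed/tech_screening",
    "04 results viz": "outputs/plots",
}


def stage_status(project_root: Path) -> dict[str, bool]:
    """Map each stage to True when its output directory exists and is non-empty."""
    status = {}
    for stage, rel_dir in STAGE_OUTPUTS.items():
        out_dir = project_root / rel_dir
        # any() stops at the first entry, so large folders are not fully listed.
        status[stage] = out_dir.is_dir() and any(out_dir.iterdir())
    return status
```

Calling stage_status(Path(".")) from the repository root returns one boolean per stage, in run order.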
Refresh dates and the YYYYMMDD prefix
Every output file from notebooks 02 and 03 starts with a date prefix in YYYYMMDD form (for example 20260428_mpat_32604.gpkg). The date is set once per notebook run from the Pacific/Honolulu wall clock at the top of each notebook:
TODAY = datetime.now(ZoneInfo("Pacific/Honolulu")).strftime("%Y%m%d")
A re-run on a different day produces a new set of dated files alongside the old ones. The previous run can be moved to _archive/ inside the same output folder once the new run is confirmed good.
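A self-contained version of the date prefix, together with one possible way to archive a previous run, might look like the sketch below. The imports are the ones the TODAY line needs; the helpers today_prefix and archive_run are hypothetical, and archiving is assumed to mean moving every file that starts with the old date prefix. On Windows, ZoneInfo may need the tzdata package installed.

```python
import shutil
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo


def today_prefix() -> str:
    """Return today's date in Honolulu as YYYYMMDD, regardless of machine time zone."""
    return datetime.now(ZoneInfo("Pacific/Honolulu")).strftime("%Y%m%d")


def archive_run(output_dir: Path, date_prefix: str) -> list[Path]:
    """Move files named '<date_prefix>_*' from output_dir into output_dir/_archive/."""
    archive_dir = output_dir / "_archive"
    archive_dir.mkdir(exist_ok=True)
    moved = []
    for path in sorted(output_dir.glob(f"{date_prefix}_*")):
        if path.is_file():
            # shutil.move returns the destination path as a string.
            moved.append(Path(shutil.move(str(path), str(archive_dir / path.name))))
    return moved
```

For example, archive_run(Path("data/03_processed/mpat"), "20260428") would move that day's files and leave any newer run in place.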
The screening notebook reads the MPAT version explicitly through an mpat_v_date variable, so you can re-screen against an older MPAT without re-running notebook 02. Update this variable when pinning to a specific MPAT version.
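Pinning to a dated MPAT can be sketched as below. The filename pattern follows the 20260428_mpat_32604.gpkg example above; the helpers available_mpat_dates and resolve_mpat are hypothetical and are not taken from the screening notebook.

```python
import re
from pathlib import Path

DATE_PREFIX = re.compile(r"^(\d{8})_")


def available_mpat_dates(mpat_dir: Path) -> list[str]:
    """Return the sorted YYYYMMDD prefixes of the GeoPackages found in mpat_dir."""
    dates = set()
    for path in mpat_dir.glob("*.gpkg"):
        match = DATE_PREFIX.match(path.name)
        if match:
            dates.add(match.group(1))
    # YYYYMMDD strings sort chronologically, so the last entry is the newest run.
    return sorted(dates)


def resolve_mpat(mpat_dir: Path, mpat_v_date: str) -> Path:
    """Return the MPAT GeoPackage for a pinned date, failing loudly if it is absent."""
    path = mpat_dir / f"{mpat_v_date}_mpat_32604.gpkg"
    if not path.exists():
        raise FileNotFoundError(
            f"No MPAT for {mpat_v_date}; available: {available_mpat_dates(mpat_dir)}"
        )
    return path
```

Listing the available dates in the error message makes a stale mpat_v_date easy to spot and correct.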
Status
Pilot scope
The MPAT and screening currently run for Maui, Oahu, and Kauai. The pilot island list lives in notebooks/02_built_mpat.ipynb as PILOT_ISLANDS = ["Maui", "Oahu", "Kauai"].
To add another island:
- Add the island to PILOT_ISLANDS and re-run notebook 02.
- Confirm that all source layers cover the new island.
- Re-run notebook 03 against the new MPAT.
- Re-run notebook 04 to refresh the Results plots.
Big Island, Lanai, Molokai, Niihau, and Kahoolawe are not yet covered.
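One rough way to check coverage after extending the list is to count rows per island and look for zeros. The sketch below works on plain dictionaries rather than the real MPAT; the "island" field name and the helper rows_per_island are assumptions for illustration only.

```python
from collections import Counter

PILOT_ISLANDS = ["Maui", "Oahu", "Kauai"]


def rows_per_island(rows, islands, island_field="island"):
    """Count rows per requested island; islands with no rows are reported as 0."""
    counts = Counter(
        row[island_field] for row in rows if row[island_field] in islands
    )
    return {island: counts.get(island, 0) for island in islands}


# Hypothetical records standing in for MPAT rows.
sample_rows = [
    {"island": "Maui"},
    {"island": "Oahu"},
    {"island": "Oahu"},
]
print(rows_per_island(sample_rows, PILOT_ISLANDS + ["Molokai"]))
```

An island that comes back with a count of 0 suggests that at least one source layer does not yet cover it.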
Key Terms
| Term | Meaning |
|---|---|
| MPAT | Master Parcel Attribute Table. One row per cesspool-bearing parcel with environmental, structural, and regulatory attributes. |
| TMK | Tax Map Key. The parcel identifier used as the main join key. |
| OSDS | On-Site Sewage Disposal System. Includes cesspools, septic tanks, and ATUs. |
| HAR | Hawaii Administrative Rules. HAR 11-62 Subchapter 3 is the regulatory basis for screening. |
| ATU | Aerobic Treatment Unit. NSF 40 and NSF 245 are the ATU standards used here. |
| NSF 40 | Aerobic treatment standard for residential wastewater. |
| NSF 245 | Higher-tier ATU standard with certified nitrogen removal. |
| SMA | Special Management Area. Hawaii coastal zone management boundary. |
| SFHA | Special Flood Hazard Area. FEMA flood zones designated A or V. |
| ksat | Saturated hydraulic conductivity from SSURGO soils data. |
| perc rate | Percolation rate in minutes per inch, estimated as 423.33 / ksat_r. |
| GPKG | GeoPackage. The spatial file format used for processed vector outputs. |
| CRS | Coordinate Reference System. The project uses EPSG:32604. |
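The 423.33 constant in the perc rate row follows from a unit conversion, assuming ksat_r is in micrometers per second (the unit SSURGO reports): one inch is 25,400 micrometers, so minutes per inch = 25,400 / (60 × ksat_r) ≈ 423.33 / ksat_r. The function below is a sketch of that arithmetic; its name and the error handling for non-positive values are assumptions, not the project's implementation.

```python
UM_PER_INCH = 25_400        # micrometers in one inch
SECONDS_PER_MINUTE = 60


def perc_rate_min_per_inch(ksat_um_per_s: float) -> float:
    """Estimate percolation rate (minutes per inch) from ksat in micrometers/second."""
    if ksat_um_per_s <= 0:
        # Assumption: a zero or negative ksat has no meaningful percolation rate.
        raise ValueError("ksat must be positive")
    # 25,400 / 60 = 423.33..., the constant quoted in the Key Terms table.
    return (UM_PER_INCH / SECONDS_PER_MINUTE) / ksat_um_per_s
```

Under this conversion, a higher ksat gives a lower perc rate, meaning water moves through the soil faster.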
What to read next
- New to the project: skim Methods > Download Datasets to see how a typical methods page is structured, then read Status.
- Looking for a specific output: see Resources for the canonical links to GitHub, Google Drive, and the InfoWRRC ArcGIS Online assets.
- Acronym confusion: see Key Terms.