
Document data sources and script generating the required input data #39

Open · 6 tasks
sgreenbury opened this issue Aug 7, 2024 · 6 comments


sgreenbury commented Aug 7, 2024

This issue aims to document and automate where possible the set-up of required data for the pipeline. This will enable the pipeline to be run for other regions (specifically Greater London as a next case).


Hussein-Mahfouz commented Aug 23, 2024

This is the travel time workflow I used in another project:

  1. https://github.com/Hussein-Mahfouz/drt-potential/blob/main/code/routing_prep.R
  2. https://github.com/Hussein-Mahfouz/drt-potential/blob/main/code/routing_r5r.R

Expanded workflow described here: #20 (comment)

@sgreenbury

From OSMOX: "west-yorkshire_epsg_4326.parquet"

sgreenbury changed the title from "Apply pipeline to run for other regions" to "Document data sources and script generating the required input data" on Aug 28, 2024
@sgreenbury

Documentation can be added in the scripts README: https://github.com/Urban-Analytics-Technology-Platform/acbm/blob/main/scripts/README.md

@sgreenbury

Discussion with @Hussein-Mahfouz around how to generalize the boundary and travel time inputs. Currently the geometry ID (e.g. OA or MSOA) is specified independently in each script with different logic paths.

Considerations going forward:

  • Support OA, MSOA, or other zone layers
  • Standardize column names for the boundary and travel time inputs (e.g. relabel the input data on load, then validate during data processing). Aim to use just zone_id, from_id and to_id.
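The relabel-then-validate idea could be sketched as below. This is a minimal pandas sketch: the rename mappings and the `standardize_columns` helper are hypothetical placeholders, and only the target names `zone_id`/`from_id`/`to_id` come from the proposal above.

```python
import pandas as pd

# Hypothetical mappings from source-specific column names to the
# standard names; real input columns depend on the data source
# (e.g. ONS boundary files use OA21CD).
BOUNDARY_RENAME = {"OA21CD": "zone_id"}
TRAVEL_TIME_RENAME = {"from_oa": "from_id", "to_oa": "to_id"}

def standardize_columns(df: pd.DataFrame, rename: dict, required: list) -> pd.DataFrame:
    """Relabel input columns to the standard names, then check they exist."""
    out = df.rename(columns=rename)
    missing = [c for c in required if c not in out.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    return out

boundaries = pd.DataFrame({"OA21CD": ["E00000001", "E00000002"]})
boundaries = standardize_columns(boundaries, BOUNDARY_RENAME, ["zone_id"])
```

In the real pipeline the existence check could be replaced by a pandera schema, so every script validates the same contract rather than re-implementing it.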


Hussein-Mahfouz commented Sep 20, 2024

@sgreenbury I've tried to capture all necessary updates in this comment. We will need to make edits to both the library functions and the scripts. We can move this to a separate issue if necessary

General

Preprocessing script

  • Decide on the name of the zone ID column. This should be reflected in the activity_chains, boundaries, and travel_times data. The names should work regardless of layer (OA, MSOA, etc.). It would then be OK to hardcode these columns in the functions/scripts and check their existence using pandera. Initial idea:
    • activity_chains: zone_id
    • boundaries: zone_id
    • travel_times: from_id and to_id
  • Preprocess boundary layer (boundary preprocessing #52): Currently we load the boundary layer for the UK in each script and filter it (see here).
    We should do this once in a preprocessing script. Steps:
    • load in our boundary layer (currently OA or MSOA)
    • if boundary layer has a city column, we can use that to filter, and grab the city name from the config.
    • Otherwise we need to use another layer that can be subsetted to our desired region, and then do a spatial intersection with our boundary layer to crop out the region we want
  • Spatial join: Use the zone_id variable from the config instead of the hardcoded OA21CD here. This needs to be done for these scripts. Alternatively, do this join once at the beginning in a preprocessing script.
  • Preparing travel demand data in 3.2.2_assign_primary_zone_work: (see here)
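The boundary-filtering step above could look roughly like this. A minimal sketch only: the `filter_boundaries` name, the `city` column, and the plain-pandas types are assumptions; the real layer would be a geopandas GeoDataFrame, and the spatial-intersection fallback (e.g. `geopandas.sjoin`) is deliberately not implemented here.

```python
import pandas as pd

def filter_boundaries(boundaries: pd.DataFrame, region: str,
                      city_col: str = "city") -> pd.DataFrame:
    """Subset a national boundary layer to one study region.

    If the layer carries a city/region column, filter on it directly
    (the region name comes from the config). Otherwise the caller must
    fall back to a spatial intersection with a separate region layer
    (e.g. geopandas.sjoin), which is not sketched here.
    """
    if city_col in boundaries.columns:
        return boundaries[boundaries[city_col] == region].reset_index(drop=True)
    raise NotImplementedError("spatial intersection fallback required")

# Toy national layer standing in for the UK OA/MSOA boundaries
uk = pd.DataFrame({"zone_id": ["A", "B", "C"],
                   "city": ["Leeds", "London", "Leeds"]})
study_area = filter_boundaries(uk, "Leeds")
```

Running this once in a preprocessing script and writing `study_area` to disk would let every downstream script load the already-cropped layer.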

Config

  • Add the ["TravDay"] == 3 filter to the config - see here. We then need to remove this filter step from all scripts
  • Replace commute_level here. It could point to boundary_geography in the config
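Moving both values into the config might look like the following sketch. The key names (`nts_day_of_week`, `boundary_geography`) are hypothetical suggestions, not the actual acbm config schema.

```python
import pandas as pd

# Hypothetical config entries -- the real acbm config keys may differ.
config = {
    "nts_day_of_week": 3,         # replaces the hardcoded ["TravDay"] == 3 filter
    "boundary_geography": "MSOA",  # replaces the hardcoded commute_level
}

# Toy NTS trip table; the filter step now reads from config instead of
# being repeated with a literal 3 in every script.
nts_trips = pd.DataFrame({"TravDay": [1, 3, 3, 5],
                          "trip_id": [10, 11, 12, 13]})
nts_trips = nts_trips[nts_trips["TravDay"] == config["nts_day_of_week"]]
```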

Per script

  • scripts/2_match_households_and_individuals.py

  • scripts/3.1_assign_primary_feasible_zones.py

    • add_locations_to_activity_chains(). This currently uses an OA-level centroid layer. It also assumes that the activity_chains layer has an id column matching the centroid layer.
      • Get location from boundary layer instead: Get centroid of boundary layer -> Join using id column. Avoids use of extra layers
      • Decide whether to do this in a preprocessing script. It is done many times
    • Reading osm POI data: We are currently reading a static file here
      • Option 1: Point the user to osmox and tell them to get their POI data from there, and describe where to add it in the acbm directory
      • Option 2: Implement the osmox CLI step in the workflow (see Should we add osmox to the repo? #19). Save the POI layer with a generic name so it works for all scripts (i.e. not west_yorkshire_epsg_4326)
    • get_possible_zones()
      • Add zone id columns to schemas. This should be done after deciding what the zone_id column will be called; it will then be reflected in all schemas
      • _get_possible_zones() internal functions: Remove the hardcoded zone column names and replace them with either a parameter from the config or the generic zone_id name we decide on
  • scripts/3.2.1_assign_primary_zone_edu.py

  • scripts/3.2.2_assign_primary_zone_work.py

    • See section on preprocessing script for preparing commuting flow data
  • scripts/3.2.3_assign_secondary_zone.py

    • Replace hardcoded OA21CD here and here
    • edit modes to use: Should be done here
    • Remove hardcoded OA21CD here and here and here. DONE except for the middle one (tick when completed)
    • Same hardcoding issue with add_location()
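The de-hardcoding pattern for these scripts could be sketched as below. This is illustrative only: the signature of `add_location` here is a stand-in, not the real acbm implementation, and the plain `centroid` column stands in for `boundaries.geometry.centroid` in the actual geopandas layer.

```python
import pandas as pd

ZONE_ID = "zone_id"  # generic name decided once, shared by all scripts

def add_location(df: pd.DataFrame, boundaries: pd.DataFrame,
                 zone_id: str = ZONE_ID) -> pd.DataFrame:
    """Join zone centroids onto a table via a configurable zone id
    column instead of a hardcoded OA21CD."""
    centroids = boundaries[[zone_id, "centroid"]]
    return df.merge(centroids, on=zone_id, how="left")

boundaries = pd.DataFrame({"zone_id": ["A", "B"],
                           "centroid": [(0.0, 0.0), (1.0, 1.0)]})
chains = pd.DataFrame({"person_id": [1, 2, 3],
                       "zone_id": ["A", "B", "A"]})
chains = add_location(chains, boundaries)
```

With the column name defaulted from one shared constant (or read from the config), the same join works unchanged for OA, MSOA, or any other zone layer.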
