OSEP #009: Session Metadata

Author(s) Tim Showers [email protected]
Implementer(s) Tim Showers, James Turk
Status Draft
Issue https://github.com/openstates/enhancement-proposals/issues/TBD
Draft PR(s) https://github.com/openstates/enhancement-proposals/pull/TBD
Approval PR(s) https://github.com/openstates/enhancement-proposals/pull/TBD
Created 2021-11-22
Updated 2021-11-22

Abstract

Scraping bill data in many states requires some form of session-specific metadata. Generally this is a string or integer session ID. To make creating new sessions easier and to clean up the code, this metadata should be stored in the same top-level session object as the other session metadata.

Specification

The LegislativeSession data model should be updated to allow an extras dict, matching the behavior of the other objects in the data model that already have an extras field.

Example: The Alabama 2021 Regular Session LegislativeSession would change from:

{
    "_scraped_name": "Regular Session 2021",
    "classification": "primary",
    "identifier": "2021rs",
    "name": "2021 Regular Session",
    "start_date": "2021-02-02",
    "end_date": "2021-05-18",
},

to

{
    "_scraped_name": "Regular Session 2021",
    "classification": "primary",
    "identifier": "2021rs",
    "name": "2021 Regular Session",
    "start_date": "2021-02-02",
    "end_date": "2021-05-18",
    "extras": {
        # found in select#current_session at
        # http://alisondb.legislature.state.al.us/alison/SelectSession.aspx
        "session_id": "1076"
    }
},
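
As a rough sketch of how a scraper might read the new field, the lookup below pulls a value out of a session's extras dict. The helper name get_session_extra is hypothetical and not part of openstates-core; the session entry mirrors the Alabama example above.

```python
# Session list shaped like the Alabama example above.
legislative_sessions = [
    {
        "_scraped_name": "Regular Session 2021",
        "classification": "primary",
        "identifier": "2021rs",
        "name": "2021 Regular Session",
        "start_date": "2021-02-02",
        "end_date": "2021-05-18",
        "extras": {"session_id": "1076"},
    },
]


def get_session_extra(sessions, identifier, key):
    """Hypothetical helper: return extras[key] for the matching session.

    Raises KeyError instead of returning None, so a missing ID stops the
    scrape rather than silently producing broken links.
    """
    for session in sessions:
        if session["identifier"] == identifier:
            extras = session.get("extras", {})
            if key not in extras:
                raise KeyError(f"session {identifier!r} has no extras key {key!r}")
            return extras[key]
    raise KeyError(f"unknown session {identifier!r}")


session_id = get_session_extra(legislative_sessions, "2021rs", "session_id")
```

Failing loudly on a missing key is one possible design choice here; a scraper could equally fall back to a default where that is safe.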

Rationale

Currently we store extra session metadata inconsistently. Sometimes it's a constant dict at the top of various scrapers, sometimes it's included in a jurisdiction-specific common library file, sometimes it's inline in the code as a variable.

This makes creating new sessions more error-prone, because users can't simply copy and paste a previous legislative_sessions entry and update its values. It leads to hunting down variables in code rather than looking in one standardized place. It can also cause errors when a scrape completes without that ID but links to broken versions or sources. Often the source of these IDs is not clear to the reader, whereas a standard location would give an obvious place to leave a comment.

This proposal is for an extras dict rather than a simple top-level "session_id" variable because there could be cases where we need multiple extra variables. This is most likely to happen when a session encodes both a regular and a special session ID in its URLs.
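
To illustrate the multiple-variable case, here is a minimal sketch of a special session whose URLs need two IDs. Everything in it is an assumption made for illustration: the key name regular_session_id, the ID values, the example domain, and the build_bill_url helper do not come from any real scraper.

```python
# Hypothetical session entry; key names and ID values are made up.
special_session = {
    "identifier": "2021s1",
    "name": "2021 First Special Session",
    "classification": "special",
    "extras": {
        "session_id": "1080",          # made-up ID of the special session
        "regular_session_id": "1076",  # made-up ID of the parent regular session
    },
}


def build_bill_url(session, bill_number):
    """Hypothetical URL builder that needs both IDs from extras."""
    extras = session["extras"]
    return (
        "https://legislature.example/bills"
        f"?regular={extras['regular_session_id']}"
        f"&session={extras['session_id']}"
        f"&bill={bill_number}"
    )


url = build_bill_url(special_session, "HB1")
```

A single top-level "session_id" field could not carry the second value, which is the situation the extras dict is meant to cover.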

We occasionally also need things like special URL slugs (for example, "/2021ss2/" vs. "/2021/"), which we currently handle with if statements that check for special sessions and substitute the slug.
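
A sketch of how a slug stored in extras could replace that branching. The key name url_slug, the example domain, and the session_base_url helper are hypothetical.

```python
# Hypothetical sessions; "url_slug" is an illustrative extras key.
sessions = [
    {"identifier": "2021", "classification": "primary",
     "extras": {"url_slug": "2021"}},
    {"identifier": "2021ss2", "classification": "special",
     "extras": {"url_slug": "2021ss2"}},
]


def session_base_url(session):
    """Build the base URL from the stored slug, with no special-session check."""
    slug = session["extras"]["url_slug"]
    return f"https://legislature.example/{slug}/"


urls = [session_base_url(s) for s in sessions]
```

The same code path then serves regular and special sessions, and a new session only needs a new metadata entry.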

The field is named extras rather than something more descriptive to maintain consistency with existing key-value stores in the data model.

Drawbacks

We could still end up with inconsistent key names inside extras, and this proposal doesn't include a formal place for tracking down new IDs.

Implementation Plan

GovHawk would update the metadata in January or February 2022. Someone from the openstates-core group (TBD) would create the necessary Django migrations and update the relevant packages. GovHawk would then modify scrapers to move to the new system as time allows.

Copyright

This document has been placed in the public domain per the Creative Commons CC0 1.0 Universal license.