2025 Bill Blockchain
Open Civic Data Blockchain Proposal
This proposal outlines a decentralized, peer-to-peer system for managing and publishing civic data using a blockchain-like append-only log. Built on the Open Civic Data schema and powered by Git, this architecture enables transparency, tamper-resistance, and flexibility in how public information is stored, shared, and consumed. By treating government data as a series of verifiable, timestamped events, we create an ecosystem where organizations and individuals can build custom civic feeds, automate updates, and uncover hidden dynamics in governance, all without relying on centralized servers.
Why Use a Hashed Append-Only Log?
- Truly Peer-to-Peer
  Everyone keeps their own copy of the data: no central server needed, no extra cost.
- The Constitution Is Basically a Blockchain
  Government changes through amendments. Our log reflects this: permanent, append-only, and transparent.
- Highly Tailored Custom Feeds Built With Code + AI
  Composable event logs are easy to filter, tag, and summarize. Organizations can also combine feeds to publish highly tailored ones.
- Publish Everywhere with Bots
  Organizations can easily automate updates to any number of platforms (think Reddit replies or Bluesky alert posts) and layer bots on top of each other. We can also build tooling for public RSS feeds that news organizations can import.
- Blockchain without the Cringe or Cost
  Blockchain-style hashes plus public key signatures let users verify data themselves without expensive proof algorithms. For IDs, Decentralized Identifiers (DIDs) are the emerging standard and interoperate with Bluesky.
- Network Agnostic
  Supports everything: peer-to-peer, pub-sub, polling, WebRTC, email, RSS, push notifications, etc. They will all work naturally.
- Our App Becomes A Glorified P2P Feed Reader With Civic Tendencies
  By being a P2P feed reader with special features around civic data, we simplify the app itself and allow others to make their own client apps.
- RSS Feeds Just Work
  Feed-based design lets us easily pull in existing sources like Executive Orders or court decisions via RSS, and allows organizations to pull in news website feeds.
- Bonus: Reveal Power Dynamics
  Replay legislative logs to uncover hidden patterns: who votes when, with whom, and under whose influence.
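As a rough illustration of the "hashes let users verify data themselves" idea, here is a minimal sketch of a hash-chained append-only log. The entry layout (prev, payload, hash), the use of SHA-256, and the all-zero genesis value are assumptions made for this example, not decisions from the proposal. Public key signatures are left out because they would need a third-party library.

```python
import hashlib
import json

# Assumption: the first entry points at an all-zero "previous hash".
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous entry's hash together with a canonical JSON payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> dict:
    """Append a payload, linking it to the hash of the last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev_hash, "payload": payload, "hash": entry_hash(prev_hash, payload)}
    log.append(entry)
    return entry


def verify(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"type": "action_added", "bill": "sb1234"})
append(log, {"type": "vote_finalized", "bill": "sb1234"})
print(verify(log))  # True

log[0]["payload"]["bill"] = "hb0789"  # tamper with history
print(verify(log))  # False
```

Anyone holding a copy of the log can run the same check locally, which is the property the peer-to-peer design relies on. Git already provides a similar guarantee through commit hashes, so a real implementation might lean on that instead.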
Why Open Civic Data as the Base Schema?
- Plug Into the Civic Tech Ecosystem
  Uses familiar Open Civic Data formats, making it easy to integrate with existing tools and scrapers.
- Reuse Existing Data
  Works with platforms like OpenStates and Councilmatic, giving us access to many data sources.
Why Git for Data Storage?
- Folders + Files = Maximum Portability
  The most universal data structure: easy to read, edit, and share across tools and platforms.
- Git Is Already Peer-to-Peer
  Git is built on a distributed log. git pull works seamlessly in our app and AI workflows.
- GitHub = Easy Browsing
  Markdown rendering and file previews make GitHub a friendly UI for exploring without needing to clone. We can also expose RSS feeds via GitHub Pages.
- Submodules Keep Repos Lean
  Git submodules let us split large datasets across repos, so no single repo gets bloated.
Folder Structure + Filename Convention
/open-civic-data-blockchain/
├── country:us/  # United States
│   ├── state:il/  # Illinois state
│   │   ├── sessions/  # Legislative sessions
│   │   │   ├── ocd-session/country:us/state:il/2023-2024/  # Full OCD session ID
│   │   │   │   ├── bills/  # Bills in this session
│   │   │   │   │   ├── sb1234/  # Senate Bill 1234
│   │   │   │   │   │   ├── logs/  # Event logs folder
│   │   │   │   │   │   │   ├── 20240115T123045Z_session_bill_created.json  # Initial bill creation in session
│   │   │   │   │   │   │   ├── 20240115T123045Z_metadata_created.json  # Initial metadata creation
│   │   │   │   │   │   │   ├── 20240117T143022Z_metadata_updated.json  # Metadata update with field mask
│   │   │   │   │   │   │   ├── 20240117T143156Z_sponsor_added.json  # Sponsors added
│   │   │   │   │   │   │   ├── 20240120T092133Z_version_added.json  # Version document added
│   │   │   │   │   │   │   ├── 20240130T152247Z_action_added.json  # Action recorded
│   │   │   │   │   │   │   ├── 20240215T103045Z_doc_added.json  # Supporting document added
│   │   │   │   │   │   │   ├── 20240315T140011Z_vote_initiated.json  # Vote started
│   │   │   │   │   │   │   ├── 20240315T143022Z_vote_updated.json  # Vote partial results
│   │   │   │   │   │   │   └── 20240315T150537Z_vote_finalized.json  # Vote complete
│   │   │   │   │   │   └── files/  # Raw file storage
│   │   │   │   │   │       ├── bill_introduced.pdf  # Original version document
│   │   │   │   │   │       ├── bill_amended.pdf  # Amended version document
│   │   │   │   │   │       └── fiscal_note.pdf  # Supporting document
│   │   │   │   │   ├── hb0789/  # House Bill 789
│   │   │   │   │   │   ├── logs/  # Event logs folder
│   │   │   │   │   │   │   ├── 20240118T090023Z_session_bill_created.json  # Initial bill creation in session
│   │   │   │   │   │   │   ├── 20240118T090023Z_metadata_created.json  # Initial metadata creation
│   │   │   │   │   │   │   └── ...
│   │   │   │   │   │   └── files/  # Raw file storage
│   │   │   │   │   │       └── ...
│   │   │   │   │   └── ...
│   │   │   │   └── events/  # Events for this session
│   │   │   │       ├── 2024-04-15-senate-appropriations-hearing.json  # Senate committee hearing
│   │   │   │       ├── 2024-02-22-house-floor-session.json  # House floor session
│   │   │   │       └── ...
│   │   │   ├── ocd-session/country:us/state:il/2021-2022/  # Previous session
│   │   │   │   └── ...
│   │   │   └── ...
│   │   └── events/  # Events not tied to a specific session
│   │       ├── 2024-07-15-joint-commission-meeting.json  # Joint commission meeting
│   │       ├── 2024-08-20-special-task-force.json  # Special task force meeting
│   │       └── ...
│   ├── state:ca/  # California state
│   │   └── ...
│   └── state:ny/  # New York state
│       └── ...
└── country:ca/  # Canada
    └── ...
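The log filenames above follow a timestamp-then-event-type pattern. A minimal sketch of building and parsing such names could look like the following. The helper names log_filename and parse_log_filename are hypothetical, and the sketch assumes timestamps are UTC with one-second resolution, as the examples in the tree suggest.

```python
import re
from datetime import datetime, timezone

# Pattern inferred from the example filenames, e.g.
# 20240115T123045Z_session_bill_created.json
LOG_NAME = re.compile(r"^(?P<ts>\d{8}T\d{6}Z)_(?P<event>[a-z_]+)\.json$")
STAMP_FORMAT = "%Y%m%dT%H%M%SZ"


def log_filename(when: datetime, event_type: str) -> str:
    """Build a log filename from a timezone-aware datetime and an event type."""
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{stamp}_{event_type}.json"


def parse_log_filename(name: str):
    """Split a log filename back into (UTC datetime, event type)."""
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"not a log filename: {name!r}")
    when = datetime.strptime(match["ts"], STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return when, match["event"]


created = datetime(2024, 1, 15, 12, 30, 45, tzinfo=timezone.utc)
print(log_filename(created, "session_bill_created"))
# 20240115T123045Z_session_bill_created.json
```

Because the timestamp leads the name, a plain alphabetical sort of the folder gives chronological order. One caveat: two events in the same second (such as the session_bill_created and metadata_created examples) would then be ordered by event name, which may not match the intended order.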
Git Architecture
We plan to auto-generate many git repos.
Session Git Repo
This repo should be a blockchain-like append-only log, making syncing data as easy as git pull.
Question: what about files like PDFs? Keeping a copy in here feels right, but it would also balloon the size of these repos. Maybe yet another submodule for session files?
/
├── README.md  # Session-specific information
├── bills/  # Bills in this session
│   ├── sb1234/  # Senate Bill 1234
│   │   ├── logs/  # Event logs folder
│   │   │   ├── 20240115T123045Z_session_bill_created.json
│   │   │   ├── 20240115T123045Z_metadata_created.json
│   │   │   ├── 20240117T143022Z_metadata_updated.json
│   │   │   └── ...
│   │   └── files/  # Raw file storage
│   │       ├── bill_introduced.pdf
│   │       ├── bill_amended.pdf
│   │       └── fiscal_note.pdf
│   ├── hb0789/  # House Bill 789
│   │   ├── logs/
│   │   │   └── ...
│   │   └── files/
│   │       └── ...
│   └── ...
└── events/  # Events for this session
    ├── 2024-04-15-senate-appropriations-hearing.json
    ├── 2024-02-22-house-floor-session.json
    └── ...
Locale Git Repo
Overall locale repo (also generated). Contains links to git submodules that hold the event logs for different sessions/events. Will also contain scripts to rebuild the data into Open Civic Data formats.
ocd-blockchain-illinois/
├── .gitmodules
├── README.md
├── scripts/
│   ├── scrape.py  # Shortcut to directly scrape for this locale
│   └── rebuild.py  # To rebuild OCD data from blockchain logs
├── sessions/
│   ├── ocd-blockchain-illinois/ocd-session/country:us/state:il/2023-2024/
│   ├── ocd-blockchain-illinois/ocd-session/country:us/state:il/2021-2022/
│   └── ocd-blockchain-illinois/ocd-session/country:us/state:il/2019-2020/
└── events/
    ├── 2022-2026/
    ├── 2018-2022/
    └── 2014-2018/
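A rebuild script like the one listed above could work by replaying a bill's log files in filename order and folding them into a current state. The sketch below is only one way to do that. The replay function, the shape of the returned state, and the rule that metadata_* events merge into a single metadata dict are all assumptions for illustration; the real event schema is still open.

```python
import json
import tempfile
from pathlib import Path


def replay(logs_dir: Path) -> dict:
    """Fold a bill's log files, in filename (chronological) order, into one state."""
    state = {"metadata": {}, "events": {}, "history": []}
    for path in sorted(logs_dir.glob("*.json")):
        # Filenames look like <timestamp>_<event_type>.json
        stamp, _, event_type = path.stem.partition("_")
        payload = json.loads(path.read_text(encoding="utf-8"))
        if event_type.startswith("metadata_"):
            # Assumption: metadata events carry top-level fields to merge.
            state["metadata"].update(payload)
        else:
            # Everything else is collected per event type, in order.
            state["events"].setdefault(event_type, []).append(payload)
        state["history"].append((stamp, event_type))
    return state


# Demo with throwaway files written out of order on purpose.
with tempfile.TemporaryDirectory() as tmp:
    logs = Path(tmp)
    (logs / "20240117T143022Z_metadata_updated.json").write_text(
        json.dumps({"title": "Amended title"}), encoding="utf-8")
    (logs / "20240115T123045Z_metadata_created.json").write_text(
        json.dumps({"title": "Original title"}), encoding="utf-8")
    (logs / "20240130T152247Z_action_added.json").write_text(
        json.dumps({"description": "Referred to committee"}), encoding="utf-8")
    print(replay(logs)["metadata"])  # {'title': 'Amended title'}
```

The same loop, stopped at an earlier timestamp, would give the state of a bill at any point in its history.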
Main Repo
The primary repo (also generated) that people can clone to get all civic data easily via the submodules.
open-civic-data-blockchain/
├── .gitmodules
├── README.md
├── scripts/
│   ├── update_all.sh
│   ├── integrity_check.py
│   └── generate_cross_jurisdictional_report.py
└── jurisdictions/
    ├── country:us/
    │   ├── state:il/  # Illinois submodule
    │   ├── state:ca/  # California submodule
    │   ├── state:ny/  # New York submodule
    │   ├── district:dc/  # Washington DC submodule
    │   ├── county:us/state:va/fairfax/  # Fairfax County submodule
    │   └── place:us/state:tx/austin/  # City of Austin submodule
    ├── country:ca/
    │   ├── province:on/  # Ontario province submodule
    │   └── province:bc/  # British Columbia submodule
    └── country:uk/
        ├── england/  # England submodule
        └── scotland/  # Scotland submodule
TODO List
- Timestamps: Scrape-Oriented vs. Gov-Oriented
  Are log timestamps the time we scraped the data, or the time of the actual government update?
  What if a specific event doesn't have a timestamp?
  Note: Open Civic Data also discussed this.
- Unique IDs
  OpenStates uses a lot of generated UUIDs. Ideally, our folder/file structure and naming conventions should follow official legislative data.
  - Jurisdiction ID: Follows OCD naming convention, e.g. country:us/state:fl/government
  - Session ID: TODO
  - Bill ID: jurisdiction_id/sessions/:session_id/bill.identifier (use the official ID, like HB250)
  - Vote Event ID: TODO
  - Person ID: TODO
  - Event ID: TODO
- Bill Folder + Filename Convention
  - bill.metadata: bill_id/log/metadata_update_{TODO}.json
  - bill.actions: bill_id/log/action_{TODO}.json
  - bill.votes: bill_id/log/vote_{TODO}.json
  - bill.sponsors: bill_id/log/sponsor_update_{TODO}.json
  - bill.versions:
    - File: bill_id/files/version_{TODO}.pdf
    - Log: bill_id/log/version_add_{TODO}.json (we can extract PDF content to JSON)
  - bill.documents:
    - File: bill_id/files/documents_{TODO}.pdf
    - Log: bill_id/log/document_add_{TODO}.json (we can extract PDF content to JSON)
- Event Folder Convention
  Events tied to sessions should live inside the session folder.
  Out-of-session events: can we define a reliable alternate time span for organization?
- How to Handle Metadata Changes
  Metadata (like bill) may change from scrape to scrape.
  Use fieldMask for lightweight updates, or consider JSON Patch.
  See https://jsonpatch.com/
  // bill.metadata_events
  {
    "fieldMask": ["from_organization"],
    "bill": { "from_organization": "" }
  }
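To make the fieldMask idea concrete, here is a minimal sketch of applying such an event to the last known bill metadata. The apply_field_mask helper is hypothetical. It handles top-level fields only, and it assumes that a field listed in the mask but missing from the event's bill object means "clear this field", which is one common field-mask convention but not something this proposal has settled.

```python
import copy


def apply_field_mask(current: dict, event: dict) -> dict:
    """Return a new metadata dict with only the masked fields changed."""
    updated = copy.deepcopy(current)
    for field in event["fieldMask"]:
        if field in event["bill"]:
            updated[field] = copy.deepcopy(event["bill"][field])
        else:
            # Assumption: masked but absent means the field was removed.
            updated.pop(field, None)
    return updated


current = {"title": "An Act", "from_organization": "Senate"}
event = {
    "fieldMask": ["from_organization"],
    # "title" is present here but not masked, so it is ignored.
    "bill": {"from_organization": "House", "title": "Ignored"},
}
print(apply_field_mask(current, event))
# {'title': 'An Act', 'from_organization': 'House'}
```

JSON Patch would express the same change as an explicit list of operations, which is more verbose but also covers nested paths and array edits.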
Environment Setup
For now, we aren't doing any coding that touches the previous code. All code/decisions should be in this scraper_next folder as an isolated experiment. If you don't have git access, message @sartaj.
Easy: Download Data and Explore With SQL Explorers
- OpenStates Illinois Scraper Output Files
- State/Federal OpenStates Data Explorer
- The password is the ChiHackNight closing group phrase, all lowercase.
- Chicago OCD Data Explorer: explore the Councilmatic PG dump of Chicago OCD data.
Advanced: Running Scrapers / Importing PG Dumps
- Open States
  - Via scraper. We are using this for v1. Running the scrapers directly gives much more up-to-date data, since it comes straight from the source. It also allows us to run certain scrapers, like USA, multiple times a day.
  - Via SQL dump, which updates every few days and includes full bill text, in addition to a lot of other content like maps data.
- Chicago SQL dump. This updates every night and is managed by DataMade, with whom we have already been collaborating on Chicago data. They also produce things like AI summaries that we can pre-pull.
Prior Art
- Washington DC made GitHub its official source of truth for law. It looks immutable.
- How append-only logs are used in p2p/blockchain applications.
- Beginners guide to event sourced databases and their benefits.
- Bluesky LGBTQ+ Legislation Alerts: This incredible team has manually created a system that I think we can build tooling for, and that they might want to use.
Communications
- Discussion via Slack
- Task Board via Slack
- (this file) Collaborative Brainstorming via Git: Feel free to edit.