2025 Bill Blockchain

Open Civic Data Blockchain Proposal

This proposal outlines a decentralized, peer-to-peer system for managing and publishing civic data using a blockchain-like append-only log. Built on the Open Civic Data schema and powered by Git, this architecture enables transparency, tamper-resistance, and flexibility in how public information is stored, shared, and consumed. By treating government data as a series of verifiable, timestamped events, we create an ecosystem where organizations and individuals can build custom civic feeds, automate updates, and uncover hidden dynamics in governance, all without relying on centralized servers.

Why Use a Hashed Append-Only Log?

  • 🔁 Truly Peer-to-Peer
    Everyone keeps their own copy of the data: no central server needed, no hosting cost.

  • 📜 The Constitution Is Basically a Blockchain
    The Constitution changes through amendments appended to it, not by rewriting the original text. Our log works the same way: permanent, append-only, and transparent.

  • 💻 Highly Tailored Custom Feeds Built With Code + AI
    Composable event logs are easy to filter, tag, and summarize. Organizations can also combine feeds to build highly tailored feeds for publishing.

  • 🤖 Publish Everywhere with Bots
    Organizations can easily automate updates to any number of platforms, stacking multiple bots (for example, Bluesky alert posters or Reddit reply bots) on the same feeds. We can also build tooling that publishes public RSS feeds for news organizations to import.

  • ⛓️ Blockchain without the Cringe or Cost
    Hashes and public-key signatures let users verify the data themselves, without expensive proof-of-work style consensus. For identity, Decentralized Identifiers (DIDs) are the emerging W3C standard, and they interoperate with Bluesky, which already uses DIDs for accounts.

  • ☎️ Network Agnostic
    Works over any transport: peer-to-peer, pub-sub, polling, WebRTC, email, RSS, push notifications, and more.

  • 📱 Our App Becomes A Glorified P2P Feed Reader With Civic Tendencies
    As a P2P feed reader with special features for civic data, the app itself stays simple, and others can build their own client apps.

  • 🛜 RSS Feeds Just Work
    The feed-based design lets us easily pull in existing sources such as Executive Orders or court decisions via RSS, and lets organizations pull in news website feeds.

  • ⏪ Bonus: Reveal Power Dynamics
    Replay legislative logs to uncover hidden patterns: who votes when, with whom, and under whose influence.
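A minimal Python sketch of the hashing idea, under stated assumptions: each log entry carries a hypothetical prev_hash field holding the SHA-256 of the previous entry's canonical JSON, so editing any earlier entry breaks every later link. The field name, the all-zero genesis value, and the canonicalization (sorted keys, compact separators) are illustrative choices rather than a settled format, and public-key signatures are left out.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(entry: dict) -> str:
    """SHA-256 of an entry's canonical JSON (sorted keys, compact separators)."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(log: list, event: dict) -> list:
    """Return a new log with `event` linked to the hash of the previous entry."""
    prev = entry_hash(log[-1]) if log else GENESIS
    return log + [{**event, "prev_hash": prev}]  # "prev_hash" is a hypothetical field


def verify_chain(log: list) -> bool:
    """True if every entry's prev_hash matches the hash of the entry before it."""
    expected = GENESIS
    for entry in log:
        if entry.get("prev_hash") != expected:
            return False
        expected = entry_hash(entry)
    return True


log = append_entry([], {"type": "session_bill_created", "bill": "sb1234"})
log = append_entry(log, {"type": "sponsor_added", "bill": "sb1234"})
print(verify_chain(log))  # True

log[0]["bill"] = "sb9999"  # rewrite history in the first entry
print(verify_chain(log))  # False: entry 2 still points at the old hash
```

The links alone cannot catch an edit to the newest entry, since nothing points at it yet; a publisher's signature over the latest hash is what would close that gap.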

Why Open Civic Data as the Base Schema?

  • 🀝 Plug Into the Civic Tech Ecosystem
    Uses familiar Open Civic Data formats, making it easy to integrate with existing tools and scrapers.

  • 🔄 Reuse Existing Data
    Works with platforms like Open States and Councilmatic, giving us access to many data sources.

Why Git for Data Storage?

  • 📁 Folders + Files = Maximum Portability
    The most universal data structure: easy to read, edit, and share across tools and platforms.

  • 🔄 Git Is Already Peer-to-Peer
    Every Git clone carries the full, hash-linked commit history, so git pull works seamlessly in our app and in AI workflows.

  • 🌐 GitHub = Easy Browsing
    Markdown rendering and file previews make GitHub a friendly UI for exploring without needing to clone. We can also expose RSS feeds via GHPages.

  • 🧩 Submodules Keep Repos Lean
    Git submodules let us split large datasets across repos, so no single repo gets bloated.
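For the GitHub Pages idea, a feed can be generated with nothing beyond Python's standard library. This sketch renders a list of events as RSS 2.0; the event fields (title, link, timestamp) and the example.org URLs are hypothetical stand-ins for whatever the JSON logs end up containing, and publishing the output to a Pages branch is left out.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def events_to_rss(title: str, link: str, events: list) -> str:
    """Render event dicts as an RSS 2.0 document, newest first."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required RSS 2.0 channel elements.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = f"Civic updates: {title}"
    for event in sorted(events, key=lambda e: e["timestamp"], reverse=True):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = event["title"]
        ET.SubElement(item, "link").text = event["link"]
        ET.SubElement(item, "guid").text = event["link"]
        # RSS expects RFC 822-style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(event["timestamp"])
    return ET.tostring(rss, encoding="unicode")


feed = events_to_rss(
    "IL SB1234",
    "https://example.org/il/sb1234",  # hypothetical URLs throughout
    [
        {"title": "Bill created", "link": "https://example.org/il/sb1234#created",
         "timestamp": datetime(2024, 1, 15, 12, 30, 45, tzinfo=timezone.utc)},
        {"title": "Vote finalized", "link": "https://example.org/il/sb1234#vote",
         "timestamp": datetime(2024, 3, 15, 15, 5, 37, tzinfo=timezone.utc)},
    ],
)
print(feed)
```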

Folder Structure + Filename Convention

/open-civic-data-blockchain/
├── country:us/                                 # United States
│   ├── state:il/                               # Illinois state
│   │   ├── sessions/                           # Legislative sessions
│   │   │   ├── ocd-session/country:us/state:il/2023-2024/  # Full OCD session ID
│   │   │   │   ├── bills/                      # Bills in this session
│   │   │   │   │   ├── sb1234/                 # Senate Bill 1234
│   │   │   │   │   │   ├── logs/               # Event logs folder
│   │   │   │   │   │   │   ├── 20240115T123045Z_session_bill_created.json  # Initial bill creation in session
│   │   │   │   │   │   │   ├── 20240115T123045Z_metadata_created.json      # Initial metadata creation
│   │   │   │   │   │   │   ├── 20240117T143022Z_metadata_updated.json      # Metadata update with field mask
│   │   │   │   │   │   │   ├── 20240117T143156Z_sponsor_added.json         # Sponsors added
│   │   │   │   │   │   │   ├── 20240120T092133Z_version_added.json         # Version document added
│   │   │   │   │   │   │   ├── 20240130T152247Z_action_added.json          # Action recorded
│   │   │   │   │   │   │   ├── 20240215T103045Z_doc_added.json             # Supporting document added
│   │   │   │   │   │   │   ├── 20240315T140011Z_vote_initiated.json        # Vote started
│   │   │   │   │   │   │   ├── 20240315T143022Z_vote_updated.json          # Vote partial results
│   │   │   │   │   │   │   └── 20240315T150537Z_vote_finalized.json        # Vote complete
│   │   │   │   │   │   └── files/              # Raw file storage
│   │   │   │   │   │       ├── bill_introduced.pdf      # Original version document
│   │   │   │   │   │       ├── bill_amended.pdf         # Amended version document
│   │   │   │   │   │       └── fiscal_note.pdf          # Supporting document
│   │   │   │   │   ├── hb0789/                 # House Bill 789
│   │   │   │   │   │   ├── logs/               # Event logs folder
│   │   │   │   │   │   │   ├── 20240118T090023Z_session_bill_created.json  # Initial bill creation in session
│   │   │   │   │   │   │   ├── 20240118T090023Z_metadata_created.json      # Initial metadata creation
│   │   │   │   │   │   │   └── ...
│   │   │   │   │   │   └── files/              # Raw file storage
│   │   │   │   │   │       └── ...
│   │   │   │   │   └── ...
│   │   │   │   └── events/                     # Events for this session
│   │   │   │       ├── 2024-04-15-senate-appropriations-hearing.json  # Senate committee hearing
│   │   │   │       ├── 2024-02-22-house-floor-session.json            # House floor session
│   │   │   │       └── ...
│   │   │   ├── ocd-session/country:us/state:il/2021-2022/  # Previous session
│   │   │   │   └── ...
│   │   │   └── ...
│   │   └── events/                            # Events not tied to a specific session
│   │       ├── 2024-07-15-joint-commission-meeting.json  # Joint commission meeting
│   │       ├── 2024-08-20-special-task-force.json        # Special task force meeting
│   │       └── ...
│   ├── state:ca/                               # California state
│   │   └── ...
│   └── state:ny/                               # New York state
│       └── ...
└── country:ca/                                 # Canada
    └── ...

Git Architecture

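We plan to auto-generate many Git repos, at three levels: a repo per legislative session, a repo per locale, and a main repo that ties them together through submodules.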

Session Git Repo

This repo should be a blockchain-like append-only log, making syncing data as easy as git pull.

Open question: what about files such as PDFs? Keeping a copy here feels right, but it would balloon the repo size. Perhaps session files belong in yet another submodule?

/
├── README.md                  # Session-specific information
├── bills/                     # Bills in this session
│   ├── sb1234/                # Senate Bill 1234
│   │   ├── logs/              # Event logs folder
│   │   │   ├── 20240115T123045Z_session_bill_created.json
│   │   │   ├── 20240115T123045Z_metadata_created.json
│   │   │   ├── 20240117T143022Z_metadata_updated.json
│   │   │   └── ...
│   │   └── files/             # Raw file storage
│   │       ├── bill_introduced.pdf
│   │       ├── bill_amended.pdf
│   │       └── fiscal_note.pdf
│   ├── hb0789/                # House Bill 789
│   │   ├── logs/
│   │   │   └── ...
│   │   └── files/
│   │       └── ...
│   └── ...
└── events/                    # Events for this session
    ├── 2024-04-15-senate-appropriations-hearing.json
    ├── 2024-02-22-house-floor-session.json
    └── ...

Locale Git Repo

The overall locale repo (also generated). It links to the Git submodules that hold the event logs for each session and event period, and it will also contain scripts to rebuild the data into Open Civic Data formats.

ocd-blockchain-illinois/
├── .gitmodules
├── README.md
├── scripts/
│   ├── scrape.py     # Shortcut to directly scrape for this locale
│   └── rebuild.py    # To rebuild OCD data from blockchain logs
├── sessions/
│   ├── ocd-blockchain-illinois/ocd-session/country:us/state:il/2023-2024/
│   ├── ocd-blockchain-illinois/ocd-session/country:us/state:il/2021-2022/
│   └── ocd-blockchain-illinois/ocd-session/country:us/state:il/2019-2020/
└── events/
    ├── 2022-2026/
    ├── 2018-2022/
    └── 2014-2018/

Main Repo

The primary repo (also generated) that people can clone to get all civic data easily via the submodules.

open-civic-data-blockchain/
├── .gitmodules
├── README.md
├── scripts/
│   ├── update_all.sh
│   ├── integrity_check.py
│   └── generate_cross_jurisdictional_report.py
└── jurisdictions/
    ├── country:us/
    │   ├── state:il/                           # Illinois submodule
    │   ├── state:ca/                           # California submodule
    │   ├── state:ny/                           # New York submodule
    │   ├── district:dc/                        # Washington DC submodule
    │   ├── county:us/state:va/fairfax/         # Fairfax County submodule
    │   └── place:us/state:tx/austin/           # City of Austin submodule
    ├── country:ca/
    │   ├── province:on/                        # Ontario province submodule
    │   └── province:bc/                        # British Columbia submodule
    └── country:uk/
        ├── england/                            # England submodule
        └── scotland/                           # Scotland submodule

TODO List

  • Timestamps: Scrape-Oriented vs. Gov-Oriented
    Are log timestamps the time we scraped the data, or the time of the actual government update?
    What if a specific event doesn't have a timestamp?
    ➤ Open Civic Data has also discussed this.
  • Unique IDs
    Open States relies heavily on generated UUIDs. Ideally, our folder/file structure and naming conventions should follow official legislative identifiers instead.
    • Jurisdiction ID: follows the OCD naming convention, e.g. country:us/state:fl/government
    • Session ID: TODO
    • Bill ID: jurisdiction_id/sessions/:session_id/bill.identifier, using the official ID such as HB250
    • Vote Event ID: TODO
    • Person ID: TODO
    • Event ID: TODO
  • Bill Folder + Filename Convention
    • bill.metadata: bill_id/log/metadata_update_{TODO}.json
    • bill.actions: bill_id/log/action_{TODO}.json
    • bill.votes: bill_id/log/vote_{TODO}.json
    • bill.sponsors: bill_id/log/sponsor_update_{TODO}.json
    • bill.versions:
      • File: bill_id/files/version_{TODO}.pdf
      • Log: bill_id/log/version_add_{TODO}.json (we can extract PDF content to JSON)
    • bill.documents:
      • File: bill_id/files/documents_{TODO}.pdf
      • Log: bill_id/log/document_add_{TODO}.json (we can extract PDF content to JSON)
  • Event Folder Convention
    Events tied to sessions should live inside the session folder.
    Out-of-session events: can we define a reliable alternative time span to group them by?
  • How to Handle Metadata Changes
    Metadata (for example, a bill's) may change from one scrape to the next.
    Use a fieldMask for lightweight updates, or consider JSON Patch.
    ➤ https://jsonpatch.com
    Example bill.metadata_events entry:
    {
      "fieldMask": ["from_organization"],
      "bill": {
        "from_organization": ""
      }
    }
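To compare the two options, the sketch below applies a fieldMask update like the example above and also translates it into JSON Patch (RFC 6902) operations. Two rules here are assumptions rather than decisions: nested fields use dotted paths such as extras.topic (the Protocol Buffers FieldMask convention), and a masked field missing from the payload means "clear this field". The helper names are hypothetical.

```python
import copy

_MISSING = object()  # sentinel: path not present in the update payload


def _get(obj, parts):
    """Follow `parts` through nested dicts; return _MISSING if any key is absent."""
    for part in parts:
        if not isinstance(obj, dict) or part not in obj:
            return _MISSING
        obj = obj[part]
    return obj


def apply_field_mask(target: dict, update: dict, root: str = "bill") -> dict:
    """Return a copy of target with only the fields listed in fieldMask changed."""
    result = copy.deepcopy(target)
    for path in update["fieldMask"]:
        parts = path.split(".")
        value = _get(update[root], parts)
        if value is _MISSING:
            # Masked but absent from the payload: treat as "clear this field".
            parent = _get(result, parts[:-1])
            if isinstance(parent, dict):
                parent.pop(parts[-1], None)
            continue
        parent = result
        for part in parts[:-1]:
            parent = parent.setdefault(part, {})
        parent[parts[-1]] = copy.deepcopy(value)
    return result


def field_mask_to_json_patch(update: dict, root: str = "bill") -> list:
    """Express the same update as RFC 6902 JSON Patch operations."""
    ops = []
    for path in update["fieldMask"]:
        parts = path.split(".")
        # JSON Pointer (RFC 6901) escapes '~' as '~0' and '/' as '~1'.
        pointer = "/" + "/".join(p.replace("~", "~0").replace("/", "~1") for p in parts)
        value = _get(update[root], parts)
        if value is _MISSING:
            ops.append({"op": "remove", "path": pointer})
        else:
            # "add" on an existing object member replaces its value.
            ops.append({"op": "add", "path": pointer, "value": value})
    return ops


bill = {"title": "Example", "from_organization": "Senate"}
update = {"fieldMask": ["from_organization"], "bill": {"from_organization": ""}}
print(apply_field_mask(bill, update))
# {'title': 'Example', 'from_organization': ''}
print(field_mask_to_json_patch(update))
# [{'op': 'add', 'path': '/from_organization', 'value': ''}]
```

One difference to keep in mind: JSON Patch's add fails when a parent such as /extras does not exist yet, and remove fails on a missing member, whereas this field-mask helper creates parents and ignores missing fields. A field mask is lighter to write; JSON Patch is a published standard with existing libraries.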
    

Environment Setup

For now, we aren't writing any code that touches the existing codebase. All code and decisions should live in the scraper_next folder as an isolated experiment. If you don't have Git access, message @sartaj.

Easy: Download Data and Explore With SQL Explorers

Advanced: Running Scrapers / Importing PG Dumps

  • Open States
    • Via scrapers. We are using this approach for v1: running the scrapers directly keeps the data much more up to date, and lets us run certain scrapers, such as the USA one, multiple times a day.
    • Via SQL dump, which updates every few days and includes full bill text, plus much other content such as map data.
  • Chicago SQL Dump. Updated nightly and maintained by DataMade, whom we already collaborate with on Chicago data. They also produce AI summaries that we can pre-pull.

Prior Art

Communications