
Edited by Christopher Scranton, Keller Scholl 946 days ago
Data Liberation Project
 
Lyre C
  • Who's spending dark money in NH elections?
  • 1000 lobbyists. Who pays them?
  • Who pays how much in independent expenditures?
  • Need help pulling files off a website, scraping PDFs and handwritten data, and representing the data
 
Project Goals:
  1. Secure downloads of source-file PDFs from the state website (approx. 1,000 files) and aggregate them into a centralized online repository
    1. Identify files and file structure of the website
    2. Build a tool to automatically scrape the website, download the files, and deposit them individually to an online location
    3. Ensure that the scrape tool can be run periodically, potentially by a novice computer user ("1-click trigger"?)
  2. Process PDF data into machine-readable data
    1. Identify types and numbers of 1) PDFs with printed answers (OCR potential) and 2) PDFs with handwritten answers (OCR challenged)
    2. Identify options for OCR tools that can process Type 1 "Printed Answers" PDFs
    3. Run OCR on all documents, check for errors, & identify initial counts of Type 1 vs Type 2 PDFs
    4. For Type 2 "Handwritten Answers" PDFs, either
      1. identify robust handwriting-recognition tools & test them on samples, or
      2. collect Type 2 PDFs for crowd-sourced (min. double-redundant) human data entry
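
A rough sketch of the scrape-and-download step, to make the first goal concrete. Nothing here is confirmed about the state website: it assumes a single index page that links to the PDFs with ordinary <a href> tags, and the names find_pdf_links, download_new_pdfs, and the example.org URL are made up for illustration. It uses only the Python standard library, and skips files already saved so it could be re-run periodically.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href> links whose path ends in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page they were found on.
        absolute = urljoin(self.base_url, href)
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in self.pdf_urls:
            self.pdf_urls.append(absolute)


def find_pdf_links(html, base_url):
    """Return de-duplicated PDF URLs found in an HTML string, in page order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls


def download_new_pdfs(index_url, dest_dir):
    """Download PDFs linked from index_url that are not yet in dest_dir.

    Assumption: file names are unique across the site. Two PDFs with the
    same name in different folders would collide, and the second is skipped.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with urlopen(index_url) as response:
        html = response.read().decode("utf-8", errors="replace")
    saved = []
    for url in find_pdf_links(html, index_url):
        target = dest / Path(urlparse(url).path).name
        if target.exists():
            continue  # already fetched on an earlier run
        with urlopen(url) as pdf:
            target.write_bytes(pdf.read())
        saved.append(target)
    return saved
```

Hypothetical usage: download_new_pdfs("https://example.org/filings/index.html", "pdfs") returns the list of newly saved files. If the real site spreads filings across many pages or builds links with JavaScript, the link-finding step would need to change.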
 
Project Participants
  • Xanni Brown - xanni@opendemocracy.me
  • Scott
  • Keller Scholl - Keller.scholl@gmail.com
 
