
Scott Linford

943 days ago
Unfiled. Edited by Scott Linford 943 days ago
Scott L Changes - Sept 23, 2014
Added ZIP functionality to both scrape.py (Independent) and scrape2.py (Lobbyists). 
  • Files are downloaded into a folder
  • The entire folder is then zipped into a single archive
  • Code has been committed to GitHub
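The zip step above could look roughly like the following. This is a minimal sketch and not the committed code: it assumes Python 3, and the function name `zip_folder` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path


def zip_folder(folder, archive_base=None):
    """Zip everything under `folder` and return the path of the new archive.

    `archive_base` is the archive path without the ".zip" suffix.
    By default the archive is written next to the folder, not inside it.
    """
    folder = Path(folder)
    base = Path(archive_base) if archive_base is not None else folder
    # make_archive appends ".zip" to the base name on its own.
    return shutil.make_archive(str(base), "zip", root_dir=str(folder))
```

Calling `zip_folder("reports")` on a download folder named `reports` (an assumed name) would write `reports.zip` alongside it.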
Original Prototype Presented - Sunday, Sept 21, 2014
Working Python prototype. Downloads all PDFs listed at:
 
Scrape Tool Discussion - issues, suggestions, future direction
Scott L Handle IO Errors
The NH website is flaky and IO errors are common. A single IO glitch crashes the Scrape Tool and leaves behind an incomplete set of reports. The tool must then be restarted by hand from the beginning, since there is no easy way to continue from where it crashed. Retry logic is a better fix than adding resume support: it is easier to implement and easier to test, and it would keep the script from bombing out.
Add error handling logic:
  • Retry logic for loading HTML
  • Retry logic for downloading PDF
  • Skip item after n retries
  • Log the failed URL
  • Move on to the next report
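The error handling listed above might be sketched as follows. This is only an illustration under assumptions, not the tool's actual code: it assumes Python 3 and `urllib`, and the names `fetch_with_retry`, `download_all`, `retries`, `delay` and `opener` are hypothetical. The same helper would serve for both the HTML pages and the PDFs.

```python
import http.client
import logging
import time
import urllib.request

log = logging.getLogger("scrape")


def fetch_with_retry(url, retries=3, delay=2.0, opener=urllib.request.urlopen):
    """Return the body at `url` as bytes, or None once `retries` attempts fail."""
    for attempt in range(1, retries + 1):
        try:
            with opener(url, timeout=30) as response:
                return response.read()
        except (OSError, http.client.HTTPException) as err:
            # OSError covers URLError, HTTPError, timeouts and connection resets.
            log.warning("attempt %d/%d failed for %s: %s", attempt, retries, url, err)
            if attempt < retries:
                time.sleep(delay)
    log.error("giving up on %s", url)
    return None


def download_all(urls, **kwargs):
    """Fetch every URL, skipping failures instead of crashing the whole run."""
    results, failed = {}, []
    for url in urls:
        body = fetch_with_retry(url, **kwargs)
        if body is None:
            failed.append(url)  # keep the failed URL, then move on
            continue
        results[url] = body
    return results, failed
```

The `failed` list gives a record of what to re-run later, so one bad report no longer costs the whole scrape.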
 
  1. Ensure that the scrape tool can be run periodically, potentially by a novice computer user (1-click trigger?)
 
945 days ago
Unfiled. Edited by Scott Linford 945 days ago
Scraping 2013 Lobbyist Reports
 
Scott L Working Python prototype. Downloads all PDFs listed at:
  • <etc>
Code posted on Github
  • (I know, silly name. Needs some reorg)
Run from the command line
  • python scrape2.py
The logic is nearly identical to Scraping 2012 Independent Expenditures. This new script just repeats it for each section. The sections are read dynamically.
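As a rough illustration of the "find every PDF listed on a page" step, here is a minimal sketch using only the Python 3 standard library. It is not the code in scrape2.py: the class `PdfLinkParser`, the function `pdf_links`, and the rule that a report link is any `href` ending in `.pdf` are all assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of all <a href="..."> links that end in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))


def pdf_links(html, base_url):
    """Return the PDF links found in `html`, in document order."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Running the same function over each section page in turn would give the "over and over" behaviour described above.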
 
I have not run it to completion yet because it takes a while, so further testing is still ahead.
 
