David Herzog spoke to our information graphics class on data collection. His bio page is at the Missouri School of Journalism.
Here are some notes and resources from his presentation:
The data flow
- Locating the data
- Obtaining the data
- Evaluating the data
- Working with the data
- Visualizing the data
Locating the data
- “Database state of mind”
- Data has to exist. Where?
U.S. agency FOIA pages
- For example: Drug Enforcement Administration
Academic data catalogs
- State Auditor
- U.S. Government Accountability Office
- U.S. Inspectors General
Google Advanced Search
- Look for data files
- Look for key words
- Look only on government sites
- In the field
- At the office
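The Google Advanced Search tips above (data files, key words, government sites only) can also be typed straight into the search box using Google's `filetype:` and `site:` operators. Here is a minimal sketch of putting such a query together. The function name `build_search_query` and the example key words are made up for illustration; they are not from the presentation.

```python
def build_search_query(keywords, filetype="xls", site=".gov"):
    """Combine key words with Google's filetype: and site: operators."""
    parts = []
    for word in keywords:
        # Quote multi-word phrases so they are searched as exact phrases.
        parts.append(f'"{word}"' if " " in word else word)
    parts.append(f"filetype:{filetype}")  # look for data files
    parts.append(f"site:{site}")          # look only on government sites
    return " ".join(parts)


# Hypothetical example: the topic and state are placeholders.
query = build_search_query(["bridge inspections", "Missouri"], filetype="csv")
print(query)
```

Swapping `filetype` between `xls`, `csv` and `pdf` is a quick way to see which formats an agency actually publishes.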
Obtaining the data
- Download it
- Write or request a scraper with ScraperWiki
- Convert PDFs to Excel files with CometDocs
- Just ask for it
- Make an open records request
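For the scraper option above, this is roughly what a small one looks like. It is a sketch using only Python's standard library, not ScraperWiki's own interface, and it assumes the data sits in a plain HTML table. The class name `TableScraper` and the sample table are invented for illustration.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows
        self._row = None   # row currently being read, if any
        self._cell = None  # text pieces of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return the table cells in `html` as a list of rows."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page, standing in for a real agency web page.
SAMPLE_HTML = """
<table>
  <tr><th>County</th><th>Inspections</th></tr>
  <tr><td>Boone</td><td>12</td></tr>
  <tr><td>Cole</td><td>7</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

For a live page, the HTML string would come from something like `urllib.request.urlopen(url).read().decode()` instead of the sample text. Real pages are usually messier than this, so expect to adjust the parser.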
Evaluate your data
Look at it immediately when you get it
- Is it what you asked for/expected?
- How many rows/records of data?
- Is the file format OK?
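The first-look questions above can be partly answered with a few lines of code. This sketch assumes the data arrived as comma-separated text. The function name `first_look` and the sample records are hypothetical.

```python
import csv
import io
from collections import Counter


def first_look(csv_text):
    """Summarize a CSV: column names, record count, fields per record."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty list if the file has no lines
    field_counts = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        field_counts[len(row)] += 1  # how many fields this record has
    return {
        "columns": header,
        "rows": n_rows,
        "field_counts": dict(field_counts),
    }


# Made-up sample; the second record is missing its last field.
SAMPLE = "name,city,amount\nAcme,Columbia,100\nBolt,St. Louis\nCore,Joplin,250\n"

summary = first_look(SAMPLE)
print(summary)
```

If `field_counts` shows more than one record length, some rows are probably broken or missing values, which is worth raising with the agency before any analysis.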
Become a critical consumer of data
- Does it look too good to be true?
- Is any information missing?
- Who collected the information?
- How? What are their methods?
- What is their agenda?
- Who supports them financially or otherwise?
Examine with a text editor
- Notepad++ for PCs
- TextMate for Mac
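The point of opening a file in a text editor is to see the raw characters: which delimiter is used, whether values are quoted, whether there are stray spaces. The same peek can be done in code. A minimal sketch follows; the function name `peek` and the sample text are invented for illustration.

```python
def peek(raw_text, n=3):
    """Return repr() of the first n lines so tabs, quotes and
    trailing spaces are visible instead of hidden."""
    return [repr(line) for line in raw_text.splitlines()[:n]]


# Made-up sample: tab-delimited, with a trailing space and a quoted value.
SAMPLE = 'id\tname\n1\tAcme \n2\t"Bolt"\n'

for line in peek(SAMPLE):
    print(line)
```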
Spreadsheet data integrity checks
Google Refine data integrity checks
Scrub dirty data as needed
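One common integrity check is to look for blank values and for the same name spelled several ways, then scrub the values into one standard form. Here is a small sketch of that idea, assuming the records are already loaded as dictionaries. The function names and sample records are hypothetical, and the scrubbing rule (trim, collapse spaces, uppercase) is just one reasonable choice.

```python
from collections import defaultdict


def scrub(value):
    """Standardize a value: trim, collapse inner spaces, uppercase."""
    return " ".join(value.split()).upper()


def count_blank(rows, field):
    """Count records where `field` is empty or only whitespace."""
    return sum(1 for row in rows if not row[field].strip())


def spelling_variants(rows, field):
    """Map each scrubbed value to its raw spellings, keeping only
    values that appear under more than one spelling."""
    variants = defaultdict(set)
    for row in rows:
        raw = row[field]
        if raw.strip():
            variants[scrub(raw)].add(raw)
    return {k: sorted(v) for k, v in variants.items() if len(v) > 1}


# Made-up records with typical dirt: extra spaces, mixed case, a blank.
ROWS = [
    {"agency": "State Auditor"},
    {"agency": " state  auditor "},
    {"agency": ""},
    {"agency": "GAO"},
]

print(count_blank(ROWS, "agency"))
print(spelling_variants(ROWS, "agency"))
```

Checking the variants before overwriting anything keeps a record of what was changed, which matters if the cleaned numbers are ever questioned.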
Analyze and visualize your data
- Tableau Public
- Many Eyes
- Google Public Data Explorer
- GeoCommons Maker