Saving Data: Preservation during Political Turmoil

The first week of the Trump administration has been a disastrous assault on many fundamental human and academic rights. So far, a media blackout has been ordered for employees of the EPA, and moving forward, the administration says that “political staffers” will be required to review all published work and data produced by EPA scientists before release to the public or in academic venues.

Attempts to control access to data are spreading beyond the sciences as well. Last week, two members of Congress (Senator Mike Lee of Utah and Representative Paul Gosar of Arizona) introduced a bill that would undermine the Fair Housing Act, which prohibits racial discrimination in access to housing. In the text of the bill:

No Federal funds may be used to build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.

Of course, the larger fear is that the massive amounts of data available from governmental sources, such as the EPA, the U.S. Census Bureau, the Bureau of Labor Statistics, and many others, will be taken down. And as the text of the aforementioned bill suggests, the motivation for hiding data is clearly ideological; our leaders intend to enable discrimination against people and erase human rights.

To guard against this, there have been many coordinated efforts to rescue and preserve our data. Here in New York, NYU’s ITP and Tisch School of the Arts are hosting a guerrilla data rescue event. Developers, coders, librarians, archivists, and activists will gather to work on scraping data, archiving web sources, and coming up with ways to preserve important data.

Other events have been springing up elsewhere. The New York Academy of Medicine organized a drive to save data related to climate change. In NYU’s sphere, members of the OpenGeoPortal geospatial metadata consortium have launched efforts to complete a data crawl. Thus far, they have archived 20 terabytes of data from these sources:

  • EPA Data Download Site
  • EPA Data Commons FTP Site
  • EPA eGrid
  • EPA FTP Portal
  • EPA Toxics Release Inventory (TRI)
  • EIA Open Data Portal
  • EIA Layer Information for Interactive State Maps
  • EIA Natural Gas Annual Respondent Query System
  • USGS National Land Cover Database (2011, 2006)
  • USGS National Hydrography Dataset
  • NREL GIS Data Portal
  • US Fish & Wildlife National Wetlands Inventory
  • US Census Bureau Entire 1980, 1990, 2000, 2010 Population and Housing Census; ACS 2002-2013; EEO Disability 2002-2008; Econ 1997-2015
  • HUD Data Portal
  • BTS National Transportation Atlas Database 2011-2015 including all tabular statistical data
  • BJS Bureau of Justice Statistics Raw Data Sources
  • HRSA Data Warehouse
  • NOAA Northern Hemisphere Snow & Ice Archive 1997- Monthly
  • NOAA GSOM Global Temperature (Stations)
  • NOAA Nighttime Lights Time Series 1992 – 2013
  • NOAA Global Self-consistent, Hierarchical, High-resolution Shoreline Database (GSHHG)
  • NOAA Continually Updated Shoreline Product
  • NOAA Historical Shoreline Survey
  • NOAA USGS National Assessment of Shoreline Change Vector Shorelines
  • NASA GISTEMP Global Temperature (Global Mean)
  • National Atlas – Entire Atlas

Also, kudos to Stanford University’s Jack Reed, who developed GovScooper, a tool for scraping data from the portal so it can be preserved.

The preservation of this data is only one part of the equation. Creating metadata for it so it can be discovered in new contexts is an important next step. At NYU, we’ve already been engaged in the process of preserving federal and local data. Our Spatial Data Repository contains a range of U.S. Census data, files from NYC’s Bytes of the Big Apple, and more. Our goal is to join in with these coordinated efforts and continue to make data accessible to as many people as possible.

Site content licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.