A few weeks ago my colleague Vicky Steeves and I taught a class on using Python for typical data management tasks that researchers encounter when working with data on the web, including extracting and merging publicly available tabular datasets and culling structured data from text files.
One of the web datasets we used as a real-world example is the Storm Events Database provided by NOAA. This continually updated dataset records storm events from 1950 to the present, including their dates, locations, and types. NOAA publishes the data as a series of CSV files, one per year, and our task was to use Python to automate downloading and merging these files into a single dataset of 1.35 million records.
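The download-and-merge step can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the code from the class: it assumes you already have the list of yearly file URLs (NOAA's actual file names are not reproduced here and would need to be looked up on its download page), that every yearly file shares the same header row, and that the files are UTF-8 text, either plain or gzip-compressed. The names `download` and `merge_csvs` are hypothetical.

```python
import csv
import gzip
import urllib.request
from pathlib import Path


def download(urls, dest_dir):
    """Fetch each URL into dest_dir and return the local file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in urls:
        local = dest / url.rsplit("/", 1)[-1]  # keep the server's file name
        urllib.request.urlretrieve(url, local)
        paths.append(local)
    return paths


def _open_text(path):
    # Assumption: files may be plain CSV or gzip-compressed CSV.
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt", newline="", encoding="utf-8")
    return open(path, newline="", encoding="utf-8")


def merge_csvs(paths, out_path):
    """Concatenate yearly CSVs that share a header; return the data row count."""
    header = None
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with _open_text(path) as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)  # write the header only once
                elif file_header != header:
                    raise ValueError(f"{path}: header does not match")
                for row in reader:
                    writer.writerow(row)
                    count += 1
    return count
```

Streaming rows straight to the output file keeps memory use flat at 1.35 million records; loading every year into pandas and calling `concat` would be shorter but holds all the data in memory at once.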
I thought it would be worthwhile to look at the data on a map. As NOAA notes, its data for the years before 1992 were keyed in from printed Storm Data publications, and the types of events recorded from that source were limited. From the mid-1990s on, a much larger set of storm types was recorded, and born-digital files replaced keyed data as the source of the dataset.
Below are the results for the decades from 1950 through 1989, focusing on tornado events. The dataset provides start and end coordinates for tornadoes, which makes them a tempting candidate for visualizing storm paths. I limited the time span to the years before 1990 to account for the change in how events were recorded in the 1990s (and, with it, the change in the nature of the collected data). This yielded a much smaller dataset of just over 10,000 events. On the map below, select a decade and click a line depicting a storm path to see more information.
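The filtering step might look something like the sketch below, which turns pre-1990 tornado rows into GeoJSON line features grouped by decade, a format Leaflet can load with `L.geoJSON`. The column names (`EVENT_TYPE`, `YEAR`, `STATE`, `BEGIN_LAT`, `BEGIN_LON`, `END_LAT`, `END_LON`) are assumptions about the merged file's layout, and `tornado_paths`, `by_decade`, and `write_decade_files` are hypothetical helpers, not the code behind this post's map.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path


def tornado_paths(rows, last_year=1989):
    """Yield a GeoJSON LineString feature for each tornado up to last_year.

    Assumes NOAA-style column names; rows missing a start or end
    coordinate are skipped because no path can be drawn for them.
    """
    for row in rows:
        if row.get("EVENT_TYPE") != "Tornado":
            continue
        year = int(row["YEAR"])
        if year > last_year:
            continue
        try:
            # GeoJSON orders coordinates as [longitude, latitude].
            start = [float(row["BEGIN_LON"]), float(row["BEGIN_LAT"])]
            end = [float(row["END_LON"]), float(row["END_LAT"])]
        except (KeyError, ValueError):
            continue
        yield {
            "type": "Feature",
            "geometry": {"type": "LineString", "coordinates": [start, end]},
            "properties": {"year": year, "state": row.get("STATE")},
        }


def by_decade(features):
    """Group features into one FeatureCollection per decade (1950, 1960, ...)."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature["properties"]["year"] // 10 * 10].append(feature)
    return {
        decade: {"type": "FeatureCollection", "features": feats}
        for decade, feats in sorted(groups.items())
    }


def write_decade_files(csv_path, out_dir="."):
    """Read the merged CSV and write one tornadoes_<decade>s.json per decade."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        collections = by_decade(tornado_paths(csv.DictReader(f)))
    written = []
    for decade, collection in collections.items():
        out_path = Path(out_dir) / f"tornadoes_{decade}s.json"
        with open(out_path, "w", encoding="utf-8") as out:
            json.dump(collection, out)
        written.append(out_path)
    return written
```

Writing one file per decade matches the map's decade selector: each layer can be fetched and added only when its decade is chosen.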
Storm Events: The 1950s
On a side note, the data prompted me to look for a personally memorable event: the Thornton, Colorado, tornado of 3 June 1981, which I experienced as a child growing up in the Denver area and which meant a stressful afternoon sheltering in the basement. Sure enough, the tornado is on the list.
As a reminder, this course on Python for data management, along with many others in research data management, is offered by the Data Services team at Bobst Library. If this type of visualization interests you, note that Data Services is now also running workshops on best practices in data visualization, led by my colleagues Denis Rubin and Him Mistry. For a complete list of classes, see here. You can also consult our new visualization library guide.
Specs: map built using Leaflet.js, a Stamen basemap, and Bobst Digital Scholarship Services’ NYU Reclaim Hosting. Special thanks to our GIS Specialist Michelle Thompson for mapping theme ideas. Photo source: The Denver City Pages