About the Visualization
I've long been a huge fan of national parks; one of my goals is to visit all 60+ of them (so far I'm only halfway there). When I saw that Tableau was sponsoring a visualization competition at the 2016 DoGoodData conference, doing something with national parks came to mind. The competition rules were broad: pick a social sector data set and tell a story with it. I did a bit of digging on the National Park Service site and saw that they publish a significant amount of data in a statistics subsection.
Unfortunately, the data was split up by park and stored in report-style Excel spreadsheets. I used Python to scrape, clean, and aggregate the data from the different pages. Once I had built the full dataset, I started exploring it. I was curious to see how attendance varied by park, how it grew over time, and how cyclical it was. My own experience visiting parks had given me some intuition about these trends (e.g., Grand Canyon is really popular compared to Bryce Canyon, and nobody visits Acadia in the Maine winter). Each of these guiding questions drove what went into one page of the visualization.
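For anyone curious what that clean-and-aggregate step can look like, here is a minimal sketch using only the Python standard library. It assumes each park's report has already been exported to CSV text with one row per year and one column per month. The column names, park names, and numbers below are made up for illustration; they are not the actual NPS report layout or the script used for this project.

```python
import csv
import io

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def tidy_park_report(park, report_text):
    """Turn one wide report (a row per year, a column per month) into long records."""
    records = []
    for row in csv.DictReader(io.StringIO(report_text)):
        year = int(row["Year"])
        for month_number, month in enumerate(MONTHS, start=1):
            raw = (row.get(month) or "").strip()
            if not raw:
                continue  # month column missing or left blank in the report
            visitors = int(raw.replace(",", ""))  # drop thousands separators
            records.append({"park": park, "year": year,
                            "month": month_number, "visitors": visitors})
    return records


def combine_reports(reports):
    """Stack the tidy records from every park into one dataset."""
    combined = []
    for park, report_text in reports.items():
        combined.extend(tidy_park_report(park, report_text))
    return combined


# Hypothetical sample reports with made-up numbers (only three months shown).
SAMPLE_REPORTS = {
    "Park A": 'Year,JAN,FEB,MAR\n2014,"1,200",900,\n2015,"1,500","1,100",800\n',
    "Park B": 'Year,JAN,FEB,MAR\n2015,50,75,"2,000"\n',
}

dataset = combine_reports(SAMPLE_REPORTS)
```

The long format (one record per park, year, and month) is convenient because a tool like Tableau can then slice the same table by park, by year, or by month without any further reshaping.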
Page One - Attendance by Park
Building the map of NPS sites with their dot size determined by attendance was straightforward, but the visualization looked bland initially. I really wanted to include some kind of image of the parks - after all, people love and recognize parks for their imagery, not their data points. I wasn't sure where I could find a data source of iconic photos for each park. I considered grabbing each park's Wikipedia page, but found that the image quality varied and in some cases the page didn't have a photo yet. I ended up with a neat solution - use each park's official NPS site. Each site had a high-quality iconic image, and I could embed the image directly into Tableau using Tableau's website embed feature. A bit of tinkering with the dimensions, plus having Tableau load the page at the image's HTML class div, and voila: an on-demand library of curated images.
Page Two - Visitation Growth Over 40 Years
Parks have gotten much more popular since data collection started in 1979; attendance has almost doubled. It turns out that the growth is concentrated in a few parks in particular: Zion, Yosemite, and Grand Canyon.
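The arithmetic behind a claim like "growth is concentrated in a few parks" is simple enough to sketch. The function below is a hypothetical helper, and the park names and visit counts are made up rather than taken from the NPS data. It computes the overall growth ratio between two years and each park's share of the total change.

```python
def growth_breakdown(visits_by_park, start_year, end_year):
    """Return (overall growth ratio, parks ranked by share of the total change)."""
    start_total = sum(series[start_year] for series in visits_by_park.values())
    end_total = sum(series[end_year] for series in visits_by_park.values())
    total_change = end_total - start_total
    if start_total == 0 or total_change == 0:
        raise ValueError("need a non-zero starting total and a non-zero change")
    shares = {
        park: (series[end_year] - series[start_year]) / total_change
        for park, series in visits_by_park.items()
    }
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    return end_total / start_total, ranked


# Made-up annual visit counts, for illustration only.
TOY_VISITS = {
    "Park A": {1979: 100, 2015: 300},
    "Park B": {1979: 200, 2015: 250},
    "Park C": {1979: 100, 2015: 150},
}

ratio, ranked_shares = growth_breakdown(TOY_VISITS, 1979, 2015)
```

In this toy example the total grows by a factor of 1.75, and Park A alone accounts for two thirds of the increase, which is the kind of concentration the chart on this page is meant to show.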
Page Three - Seasonality of Park Attendance
Each park has some sort of seasonality; here you can see what each particular park's pattern looks like. Most follow a trend of high attendance in summer (nice weather plus summer break), but several parks diverge from this. Some are obvious: Death Valley has almost no visitation in the summer. Others I'm not sure about: Great Smoky Mountains peaks again in the fall (maybe people checking out the autumn leaves?).
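One simple way to compare seasonality across parks of very different sizes is to convert each park's monthly visits into a share of its annual total. The sketch below does that and picks out the peak month. The two profiles are made-up numbers meant to mimic a summer-peaking park and a desert park; they are not real NPS figures.

```python
def monthly_shares(monthly_visits):
    """Convert 12 monthly visit counts into fractions of the annual total."""
    if len(monthly_visits) != 12:
        raise ValueError("expected one value per calendar month")
    total = sum(monthly_visits)
    if total == 0:
        raise ValueError("park has no recorded visits")
    return [visits / total for visits in monthly_visits]


def peak_month(monthly_visits):
    """Return the 1-based calendar month with the largest share of visits."""
    shares = monthly_shares(monthly_visits)
    return shares.index(max(shares)) + 1


# Made-up monthly profiles (January through December), for illustration only.
SUMMER_PARK = [10, 10, 20, 40, 80, 150, 200, 190, 100, 60, 20, 20]
DESERT_PARK = [90, 100, 120, 100, 60, 20, 10, 10, 40, 90, 100, 60]

summer_peak = peak_month(SUMMER_PARK)
desert_peak = peak_month(DESERT_PARK)
```

Because the shares are normalized, a small park and a huge park with the same seasonal shape produce the same curve, which makes the divergent parks easier to spot.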
I was pleased with what came out, but it still could use a bit of polish. Overall, the data collection and visualization building took a few days as a side project. Maybe I'll write an in-depth analysis of the data in a future blog post with a cleaner visualization.