Alan Zhao

Jun 25, 2016

National Parks Historical Attendance

About the Visualization

I've been a huge fan of national parks; one of my goals is to visit all 60+ of them (so far only halfway there). When I saw that Tableau was sponsoring a visualization competition at the 2016 DoGoodData conference, doing something with National Parks came to mind. The competition rules were broad: pick a social sector data set and tell a story with it. I did a bit of digging on the National Park Service site and saw they have a significant amount of data collected in a statistics subsection.

Unfortunately, the data was split up by park and stored in report-style Excel spreadsheets. I used Python to scrape, clean, and aggregate the data from the different pages. Once I built the full dataset, I started exploring the data. I was curious to see how attendance varied by park, how it grew over time, and how cyclical it was. My own experience visiting parks had given me intuition about these trends (e.g., the Grand Canyon is really popular compared to Bryce Canyon, and nobody visits Acadia in the Maine winter). Each of these guiding questions drove what went into one page of the visualization.
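To give a flavor of the clean-and-aggregate step, here is a minimal sketch in plain Python. It is not the script I actually used: it assumes each park's report has already been parsed into (year, month, visitors) tuples, the function names are hypothetical, and the numbers are made up for illustration.

```python
from collections import defaultdict


def combine_park_reports(reports):
    """Flatten {park: [(year, month, visitors), ...]} into one sorted list of rows."""
    rows = []
    for park, records in reports.items():
        for year, month, visitors in records:
            # Report-style spreadsheets often store counts as text like "1,200".
            if isinstance(visitors, str):
                visitors = int(visitors.replace(",", "").strip())
            rows.append({
                "park": park.strip(),
                "year": int(year),
                "month": int(month),
                "visitors": visitors,
            })
    rows.sort(key=lambda r: (r["park"], r["year"], r["month"]))
    return rows


def annual_totals(rows):
    """Sum visitors per (park, year) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["park"], row["year"])] += row["visitors"]
    return dict(totals)


# Made-up sample data, NOT real NPS figures.
sample = {
    "Acadia ": [(2015, 7, "650,000"), (2015, 1, "1,200")],
    "Zion": [(2015, 7, 500000), (2015, 1, 80000)],
}
rows = combine_park_reports(sample)
totals = annual_totals(rows)
```

The single long table (one row per park, year, and month) is the shape that makes it easy to slice the data by park, by year, or by month later on.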

Page One - Attendance by Park

Building the map of NPS sites with their dot size determined by attendance was straightforward, but the visualization looked bland initially. I really wanted to get some kind of image of the parks - after all, people love and recognize parks for their imagery, not their data points. I wasn't sure where I could find a data source of iconic photos for each park. I considered grabbing each park's Wikipedia page, but found that the image quality varied and in some cases the page didn't have a photo yet. I ended up with a neat solution - use each park's official NPS site. Each site had a high-quality iconic image, and I could embed the image directly into Tableau using Tableau's website embed feature. A bit of tinkering with the dimensions and having Tableau load the page at the image's HTML class div and voila, an on-demand library of curated images.

Page Two - Visitation Growth Over 40 Years

Parks have gotten much more popular since data collection started in 1979, with attendance almost doubling. It turned out that the growth is concentrated in a few parks in particular: Zion, Yosemite, and Grand Canyon.
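One way to check how concentrated growth is: compute each park's share of the total change in visits between the first and last year. The sketch below is only an illustration of that arithmetic; the function name is hypothetical and the park names and numbers are placeholders, not real attendance figures.

```python
def growth_shares(start, end):
    """Return each park's share of the total change in visits between two years."""
    # Parks missing from the start year are treated as starting from zero.
    changes = {park: end[park] - start.get(park, 0) for park in end}
    total_change = sum(changes.values())
    if total_change == 0:
        raise ValueError("no net change in visits, shares are undefined")
    return {park: change / total_change for park, change in changes.items()}


# Placeholder numbers for illustration only.
visits_first_year = {"Park A": 100, "Park B": 200, "Park C": 300}
visits_last_year = {"Park A": 400, "Park B": 250, "Park C": 350}

shares = growth_shares(visits_first_year, visits_last_year)
top_park = max(shares, key=shares.get)
```

In this toy example, Park A accounts for three quarters of the total increase even though it started out as the smallest park.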

Page Three - Seasonality of Park Attendance

Each park has some sort of seasonality, and here you can see what each particular park's looks like. Most follow a trend of high attendance in summer (nice weather plus summer break), but several parks diverge from this. Some are obvious: Death Valley has almost no visitation in the summer. Others I'm not sure about: Great Smoky Mountains peaks again in the fall (maybe people checking out the autumn leaves?).
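A simple way to compare seasonality across parks of very different sizes is to convert each month's visits into a share of that park's annual total and then look at which month peaks. Here is a rough sketch; the function names are hypothetical and the two monthly profiles are invented to mimic a summer-peaking park and a desert park, not taken from the NPS data.

```python
def monthly_shares(monthly_visits):
    """Convert 12 monthly counts (Jan..Dec) into fractions of the annual total."""
    if len(monthly_visits) != 12:
        raise ValueError("expected 12 monthly values, January through December")
    total = sum(monthly_visits)
    if total == 0:
        raise ValueError("annual total is zero, shares are undefined")
    return [visits / total for visits in monthly_visits]


def peak_month(monthly_visits):
    """Return the 1-based month number (1 = January) with the most visits."""
    return max(range(12), key=lambda i: monthly_visits[i]) + 1


# Invented profiles for illustration only.
summer_park = [1, 1, 2, 4, 8, 12, 15, 14, 9, 5, 2, 1]
desert_park = [10, 12, 14, 10, 6, 2, 1, 1, 4, 9, 11, 10]

summer_shares = monthly_shares(summer_park)
summer_peak = peak_month(summer_park)
desert_peak = peak_month(desert_park)
```

Normalizing to shares puts every park on the same scale, so a small park with a sharp summer spike can be compared directly against a huge park with a flatter curve.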

I was pleased with what came out, but it still could use a bit of polish. Overall, the data collection and visualization building took a few days as a side project. Maybe I'll write an in-depth analysis of the data in a future blog post with a cleaner visualization.

May 01, 2016

Do Good Data Talk

I had the opportunity to speak at the 2016 DoGoodData conference on how nonprofits should think about their capacity to utilize data, with a special focus on the small nonprofits that use Excel for everything. The talk was titled "Building an Analytical Toolkit Beyond Excel" and was well received. The final slides are below.


This talk was done in conjunction with my good friend (and now former co-worker) Jeremy James. The talk was the result of many years of working in the data trenches at our nonprofit and the many lessons learned as a result. In a nutshell, we talked about how many nonprofits get excited about the end results of using data effectively: the analysis, insights, and visuals. This excitement usually overshadows the foundational work of collecting and storing data so that the aforementioned exciting stuff can happen. This isn't helped by sales-driven companies that promise that their software can unlock your data no matter what. We recommended carefully evaluating the state of your needs and data infrastructure, and then picking the appropriate tool to address that.

The room was packed, showing that moving beyond Excel is a task many nonprofits face and feel. My favorite moments from the talk: getting a good laugh when we shared how long our database development took (much longer than expected) and hearing one attendee tell us that this talk was the reason he came to the conference.