Alan Zhao

Aug 28, 2017

Resources and Inspiration Online

As much of my learning has come from online resources as from Swarthmore or Yale classrooms. This is my attempt to share some of the absolute best blogs I've encountered. They run the gamut from very technical to purely practical. I'll periodically add to this as I find more.

Practical Business Python

http://pbpython.com/

The blog that started it all. Written by Chris Moffitt, an engineer by training but finance guy by trade, this blog is focused on showcasing how Python can take on business analysis usually left to Excel. Example topics include using pandas for pivot tables, and streamlining bulk generation of Excel reports. This blog inspired me to learn Python more deeply and provided numerous insights to automate work; at my old nonprofit we even used some posts as training material. Most importantly of all, a conversation with Chris himself led me to the idea of creating this blog.

Math ∩ Programming

https://jeremykun.com/

Written by Jeremy Kun, a math PhD turned software engineer, this blog covers a variety of mathematical topics (e.g., optimization and statistics). What's interesting is that it is all done from a coder's perspective, focusing on intuition and programmatic examples rather than endless equations. His article on support vector machines is illustrative of this. Many of his posts have helped clarify concepts from graduate-level statistics courses I took at Yale.

Own your bits

https://ownyourbits.com/

Written by a hacker "Narcho," this blog focuses on leveraging the Raspberry Pi as a home cloud storage solution. In other words, a DIY Dropbox. A great DIY project that I did over the past winter break with a leftover portable hard drive.

Narcho himself is an ardent advocate of understanding all technology used in daily life, particularly the hardware side. Obviously impractical to do for everything, but an admirable mindset to drive your curiosity.

John Myles White's Blog

http://www.johnmyleswhite.com/

Another blog attempting to explain statistical content in a colloquially understandable, but mathematically rigorous way. His optimization perspective on mean, median, and mode was eye-opening for me.
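To give a flavor of that optimization view, here is a minimal sketch of the standard result: the mean minimizes total squared error, the median minimizes total absolute error, and the mode minimizes the count of mismatches. The data, the candidate grid, and the `best_candidate` helper are illustrative choices, not taken from the post.

```python
import statistics

data = [1, 2, 2, 3, 10]
# Candidate summaries to search over: 0.0, 0.1, ..., 10.0
candidates = [i / 10 for i in range(0, 101)]


def best_candidate(loss):
    """Return the candidate c with the smallest total loss over the data."""
    return min(candidates, key=lambda c: sum(loss(x, c) for x in data))


# Squared error is minimized by the mean
squared = best_candidate(lambda x, c: (x - c) ** 2)
# Absolute error is minimized by the median
absolute = best_candidate(lambda x, c: abs(x - c))
# Zero-one loss (0 on an exact match, 1 otherwise) is minimized by the mode
zero_one = best_candidate(lambda x, c: 0 if x == c else 1)

print(squared, statistics.mean(data))
print(absolute, statistics.median(data))
print(zero_one, statistics.mode(data))
```

The brute-force search is only there to make the point visible; in practice you would call the `statistics` functions directly.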

Jul 23, 2016

Thoughts on Coursera's Algorithms and Data Structures

Motivation

After spending the past three years largely learning programming on my own with a "just make it work" mentality, I decided in January 2016 to formalize my knowledge with an actual course. Massive open online course companies like Coursera, edX, and Codecademy offer lots of programming courses, but the majority of these are introductory-level or application-specific (like data analysis with Python or web development with Ruby). I wanted something that would be a general "next level" course but also one that I could do with Python. I talked to software engineer friends, looked at some universities' syllabi, and found that Data Structures and Algorithms was the standard 2nd or 3rd course.

Coursera had an entire 6-month Data Structures and Algorithms specialization, and it (mostly) fit before my graduate school began, so I signed up. I liked that the class would go in depth, with 5 separate courses and a "capstone" applied project. It also allowed numerous languages (Ruby, Python 2 and 3, C, C++, Java) but only officially supported Java, C, and Python 3 with starter files. To further motivate myself to actually complete the course, I paid the ~400 dollars to get the course verified (and for the nifty certificates on my LinkedIn).

Course Structure

Unsurprisingly, the course philosophy is learning through doing. You watch 1-2 hours of lectures, with some embedded quizzes, and then code solutions to the problem set. The solutions are submitted to a cold, inhuman grader that compiles your code and checks it against 15+ test cases. There's also a discussion forum for posting questions and answers. The homeworks are well designed and closely follow the lectures. I did find that the coursework typically took double the estimated time (6-8 vs 3-4 hours). The courses are pre-recorded and each one follows a set session. Missing one is fine as you can restart it, but you only have one full year from payment to finish the 6-month specialization.

Learnings

Testing

Learning how to design test cases and automate them was the most valuable takeaway from the course. The grader's test cases are hidden beyond the first 3, so you need to become adept at writing your own test cases in order to pass. Running tests manually becomes incredibly annoying, so I got much more comfortable with the assert statement. The course introduced me to the idea of testing corner cases with manually created cases, and then automated testing with random inputs (and brute-force calculated correct outputs).
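To make the random-input idea concrete, here is a minimal sketch of such a stress test. The problem (largest product of any two numbers in a list) and the function names are illustrative assumptions, not the course's starter code: a slow but obviously correct brute-force version serves as the reference for a faster one.

```python
import random


def max_pairwise_product_brute(numbers):
    """Slow but obviously correct: try every pair."""
    best = None
    for i in range(len(numbers)):
        for j in range(i + 1, len(numbers)):
            product = numbers[i] * numbers[j]
            if best is None or product > best:
                best = product
    return best


def max_pairwise_product_fast(numbers):
    """Faster: multiply the two largest values (assumes non-negative inputs)."""
    second, largest = sorted(numbers)[-2:]
    return second * largest


def stress_test(trials=1000, max_len=10, max_value=100, seed=0):
    """Compare the fast version against brute force on random inputs."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        n = rng.randint(2, max_len)
        numbers = [rng.randint(0, max_value) for _ in range(n)]
        expected = max_pairwise_product_brute(numbers)
        actual = max_pairwise_product_fast(numbers)
        assert actual == expected, f"mismatch on {numbers}: {actual} != {expected}"
    return trials


# Hand-written corner cases first, then the random stress test
assert max_pairwise_product_fast([0, 0]) == 0
assert max_pairwise_product_fast([7, 7]) == 49
assert max_pairwise_product_fast([1, 100, 2]) == 200
stress_test()
```

When an assert fails, the message prints the exact input that broke the fast version, which is usually small enough to debug by hand.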

Once I saw the time and headaches saved by rigorous testing, I started implementing testing at work. Prior to the course, my code base relied on integration testing, not unit testing. Afterwards, I made it a team project to go back and write unit tests, and the number of blocker issues we uncovered was incredible.

Pseudocode Literacy

I had never read formal pseudocode before - shocking, I know. Pseudocode was intimidating, and I just avoided it. You can't avoid it in this specialization though: the course is language-agnostic, so the lingua franca is pseudocode. Every lecture has pseudocode, so every week involves translating what's conceptually laid out into code. This skill greatly increased my ability to pick up technical documentation from language-agnostic sources like Wikipedia.

Immediate Applicability

Algorithms and Data Structures sounded more like conceptual learnings than something helpful in my day to day at work. I was wrong about that. Learning "memoization" (giving your program memory of past results) as part of dynamic programming immediately gave me insight on how to speed up a database call that made redundant calculations. Implementing it took less than two days and cut a query that runs 10x/day from 5-10 minutes down to 1-2 minutes. Sounds simple, but I had never heard of the concept before. However, because the class is general, not applications-focused, you're going to need to figure out the applications yourself. I still haven't figured out applications for all those graph algorithms or self-balancing trees.
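For readers who haven't met memoization either, here is a minimal sketch using the standard library's `functools.lru_cache`. The `expensive_lookup` function and its arithmetic are hypothetical stand-ins for a slow, repeated calculation; this is not the actual work code.

```python
from functools import lru_cache

call_count = 0  # counts how often the slow path actually runs


@lru_cache(maxsize=None)
def expensive_lookup(customer_id):
    """Hypothetical stand-in for a slow calculation or query."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(customer_id * 1000))


# The same ids show up repeatedly, as they would in a redundant query
ids = [3, 5, 3, 3, 5, 8]
results = [expensive_lookup(i) for i in ids]

print(call_count)  # the slow path ran once per distinct id
print(expensive_lookup.cache_info())
```

The decorator stores each result keyed by its arguments, so repeated ids are answered from memory instead of being recomputed. This only works when the function returns the same answer for the same input.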

So What's Missing?

Declining Enrollment

The course started off with high participation that gradually declined as we advanced through courses. In the first course, Algorithmic Toolkit, we had several thousand students across the world according to a live world map with student populations. The forums were active; every question I had while doing the homeworks was already asked and answered on the forums.

It was a different story by the third course, Algorithms on Strings. From forum activity, I estimate only several hundred people are taking this course. One question that I posted got fewer than 10 views and no response after several days. Coincidentally, the world map showing classmate numbers is also no longer on the course site. I wish I had taken a screenshot of the original world map with student populations to prove this! The enrollment decline is expected given the increasing difficulty of the course and the extended commitment needed to stay on track. To be honest, I'm one of those students who's fallen behind. After completing the first two courses, I wasn't able to complete Algorithms on Strings on time and am now doing it with the second session. I hope more students regroup with me; the active discussion is key to learning.

Python's Limitations

I love Python because it's an abstracted high-level language. Unfortunately, this makes it difficult to implement many of the data structures because they don't exist natively in the language. Take the Python list object: it's easy to understand and use to build applications because of its flexibility. The downside is that you need to restrict it, or build your own class around it, to make it behave like a linked list or queue structure.
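As a rough illustration of what that adaptation can look like, here is a minimal sketch. The `Node` and `LinkedList` classes are illustrative names, not course code; `collections.deque` is the standard library's double-ended queue.

```python
from collections import deque


class Node:
    """One element of a singly linked list."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class LinkedList:
    """Bare-bones singly linked list with O(1) insertion at the head."""

    def __init__(self):
        self.head = None

    def push_front(self, value):
        self.head = Node(value, self.head)

    def __iter__(self):
        node = self.head
        while node is not None:
            yield node.value
            node = node.next


linked = LinkedList()
for value in (3, 2, 1):
    linked.push_front(value)
print(list(linked))

# A deque gives O(1) appends and pops at both ends,
# whereas list.pop(0) has to shift every remaining element.
queue = deque()
queue.append("a")
queue.append("b")
first = queue.popleft()
print(first, list(queue))
```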

This lack of native support for "low-level" structures in Python also meant fewer resources than I expected. Most supplemental resources I found outside the class were exclusively in C or Java, so I relied heavily on reading pseudocode and Stack Overflow Python questions. A great find was this free Data Structures and Algorithms textbook written in Python. Roughly 70% of the first two courses was covered in this book.

After all that's said and done, I do thank Coursera for including Python as a supported language. I would never have taken on this course if it required learning a whole new language.

Hope this was helpful. Now on to finishing Algorithms on Strings!

May 01, 2016

Do Good Data Talk

I had the opportunity to speak at the 2016 DoGoodData conference on how nonprofits should think about their capacity to utilize data, with a special focus on the small nonprofits that use Excel for everything. It was titled "Building an Analytical Toolkit Beyond Excel" and was well received. The final slides are below.


This talk was done in conjunction with my good friend (and now former co-worker) Jeremy James. The talk was the result of many years of working in the data trenches at our nonprofit and the many lessons learned along the way. In a nutshell, we talked about how many nonprofits get excited about the end results of using data effectively: the analysis, insights, and visuals. This excitement usually overshadows the foundational work of collecting and storing data so that the aforementioned exciting stuff can happen. This isn't helped by sales-driven companies that promise that their software can unlock your data no matter what. We recommended carefully evaluating the state of your needs and data infrastructure, and then picking the appropriate tool to address them.

The room was packed, showing that moving beyond Excel is a challenge many nonprofits face and feel. My favorite moments from the talk: getting a good laugh when we shared how long our database development took (much longer than expected) and hearing one attendee tell us that this talk was the reason he came to the conference.
