After a great run, the Winning the Internet blog has been retired. However, you can still keep in touch with New Media Mentors here.
As we get ready for Netroots Nation 2014, we’re taking a closer look at some of the convention’s most exciting training sessions. We’re interviewing the trainers and taking you inside some of online activism’s most popular and elusive topics.
Today we’re interviewing Yahel Carmon of Blue State Digital, who’ll be leading Beyond Excel: How to analyze your data in SQL.
NN14 Training Session
Beyond Excel: How to analyze your data in SQL
Many analysts are able to force Excel into doing the things they need. But once the data gets big enough, their VLOOKUPs and pivot tables break Excel, or force them to wait minutes to get data that should only take seconds. Learning how to utilize SQL can be daunting. Attendees will get hands-on experience in learning how to import their data, analyze it in ways that Excel makes painful and get back useable results.
Q: Tell us about yourself and your experience analyzing data with SQL.
A: I’ve been using SQL to work on data in the progressive movement for the last four years at Blue State Digital. SQL has been a powerful and helpful tool to answer the kinds of questions it generally would take Excel much longer to answer.
In 2012, I left BSD to join the Obama campaign and work on the Digital Analytics team, focusing on fundraising optimization. I spent 12 hours a day, six days a week, working with a team of people to figure out what was working, what wasn’t, and where the gaps in our assumptions were. It’s not hard to draw a straight line between the analysis we did in SQL and additional millions of dollars the Obama campaign had in its pocket to win the election.
One of my nerdier accomplishments on the Obama campaign was optimizing the speed of our email reporting SQL script. For every donation email test we ran, an hour after sending the test emails, we’d analyze the data with that SQL script, but, since the campaign’s database was huge (filled with data on millions of supporters), it took 11 minutes to run. By the time the report was done, we’d be making decisions using data was already stale. I dove into the SQL, and found a few minor places where it was doing things in a way that really slowed it down. Just a few hours of work, and I was able to make the entire thing run in just 7 seconds. After that, we were able to make our judgments that much more quickly; reporting was never a bottleneck for us again.
Q: In your experience, is it pretty easy to learn to use SQL or does it require extensive training and expensive software programs?
A: SQL is a pretty expressive language; at the most basic level, the way you write it is fairly similar to how you might verbally describe the data. If someone says “Get the email addresses from constituents who joined between January 2012 and January 2013”, you could literally write something like SELECT email FROM constituents WHERE join_date BETWEEN “2012-01-01” AND “2013-01-01”.
At a basic level, there’s a lot you can do with minimal knowledge, especially when you need to join data together, the way you might with a VLOOKUP, or group it together, like with pivot tables.
There’s no need for expensive software programs. There are free SQL applications for every platform, and lots of operating systems come bundled with some form of SQL natively. Of course, if you want to be working with gigabytes or terabytes of data, you’ll probably need a dedicated SQL server of some kind, at which point there are definitely costs involved, but often organizations will have something like this of their own.
Beyond the basics, there’s definitely a lot to learn. But the internet is a big and helpful place; it’s how I learned SQL. With practice, I think anyone can learn enough SQL to answer the basic questions they have about their data.
Q: When is it better to use SQL to analyze data than Excel?
A: There’s a joke that “Big data is data that crashes Excel”. While I wouldn’t go that far, it’s generally better to use SQL to analyze your data than Excel if the work you’re doing in Excel, with complex VLOOKUPs, formulas, and pivot tables is taking more time for Excel to run than it did for you to actually setup your data. Before I learned SQL, I would regularly do things in Excel that would take 20 minutes to load. With a few minutes of SQL setup, the same type of work took just seconds.
The other time that SQL is demonstrably better than something like Excel is when your data is very relational. What that means is you have trouble expressing it in a meaningful way in a single spreadsheet. For example, your data for your list of supporters is different than the data of your donations. But you can draw a relationship between the two, since the donations are things the supporters did. Excel can be clunky in representing that, especially when you can have multiple matches for a single relationship. SQL is great at dealing with data where there are complex relationships between different data groupings.
Q: In your opinion, what are the two biggest advantages to using SQL over Excel?
For people who want to wear the hat of analyst, SQL’s biggest strengths are speed and flexibility.
Excel is great, but people who are sitting around watching spinning wheels for 20 minutes at a time just to run a VLOOKUP on two datasets of 20,000 rows each are wasting their time. Using SQL for these kinds of problems means you spend more time on analysis and less time waiting for results.
Q: Why should folks attend your session at Netroots Nation, and how can they connect with you?
A: There are two types of people that this session is specifically targeted to: First, people who have an unhealthy relationship with Excel and waste their time waiting for things to run. Second, people who might already have SQL access at their place of work, but don’t know how to take advantage of it. I’ll be covering both the basics of the language, as well as some advanced tips to help avoid danger with bigger data sets.
One thing I’m pretty excited about is that I’m building an easy SQL querying website for training attendees. That means if you come to the training, you won’t waste any time downloading software, or learning to use the command line. I’ve been to too many technical trainings where either the first half of the training is spent helping people get setup, or the entire training is just a lecture with no hands-on experience. I’m not interested in lecturing. At this session, you’ll have the best of both worlds: All you’ll need is a modern browser and internet access, and you’ll be able to participate and learn something.
Easiest way to reach out to me is on Twitter: @Yahel.
To attend this training, or one of the 39 others at Netroots Nation 2014 in Detroit, register now.