Authors: Jeff Bladt and Bob Filbin

  • When Visualizing Data, You Have to Fail to Succeed

    At 4:26 a.m. one morning last December, an email from our COO arrived in our inboxes. “Literally can’t sleep — thinking about [your data presentation]…” As a data scientist, if your data causes co-workers to lose sleep, you’ve done your job, perhaps too well. The rest of the email outlined new strategies for member acquisition and engagement that would save our organization over $200,000. The inspiration? One map. It was a simple depiction of the U.S., which showed member engagement by city, but also hinted at the potential of segmenting our users in order to give them the right content at the right time.

    Cities By Engagement

    Our data hasn’t always prompted action. There have been plenty of times when it has hit the cutting room floor, even if the insights were technically valuable (they would have saved us money or helped us acquire members). But data is only valuable if people are willing to act on it. Which means, as a data analyst, you’ve got to sell it.

    Data visualizations are not art, they’re advertisements.

    Data presentations often feel like window-shopping — the data looks pretty, but co-workers aren’t buying. This is especially true when people feel like they’re at an art gallery looking at beautiful pictures rather than at a meeting discussing business problems. How do you make your data visually appealing, but also compelling enough that your colleagues are willing to spend their resources to act on it?

    The world of product design points to two different approaches. The first is the “predictive” black-box approach: Build a big, fancy visualization in private, release it to the world in your presentation, and assume it will succeed. This rarely works. Data scientists are not often trained artists or designers who know exactly what their audience wants. The second, more successful tactic is an “agile” approach, similar to how restaurants use test kitchens: Create variations on a theme and see what sticks. This means crafting and testing ten different scallop dishes and recognizing that at best, one will make it to the menu.

    Going with your first best idea is a high-risk, low-probability recipe for success. The sauce could be wrong, the portion size off, or the coloration unappetizing. Diners might love scallops, just not what you’ve created.

    At most companies, failing is risky, because failure can be read as weakness. But here’s a way to minimize risk: Make it part of your culture. We do that three ways at DoSomething.org:

    1. We hold an annual event called “Fail Fest,” during which staff members speak about what they learned from a failure while wearing a pink boa.
    2. We A/B test new product features.
    3. We test our content, including data visualizations. To create the one map that impressed our COO, we failed at least 10 times.
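    The A/B testing in step 2 is easy to make concrete. As a minimal sketch (the function name and the sample counts below are ours, invented for illustration, not DoSomething.org’s actual tooling or numbers), a two-proportion z-test is one standard way to decide whether a new feature’s conversion rate really beats the old one:

```python
import math

def ab_test_p_value(conversions_a, n_a, conversions_b, n_b):
    """Two-sided two-proportion z-test comparing conversion rates."""
    p_a = conversions_a / n_a
    p_b = conversions_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally
    p_pool = (conversions_a + conversions_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Convert |z| to a two-sided p-value via the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical test: 120 of 1,000 members clicked variant A; 150 of 1,000 clicked B
p = ab_test_p_value(120, 1000, 150, 1000)
```

    If p falls below your significance threshold (0.05 is conventional), the difference between variants is unlikely to be noise, and the winner goes to production.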

    Ten failures for one success. Quantity over quality is a common strategy in the natural world. When a fern unfurls in spring, it releases hundreds of thousands of spores, the vast majority of which never take root — some are caught by the wind, others stray onto tree branches. Of those that reach soil and germinate, many are trampled, eaten, or starved. Only a few reach adulthood and start the process again.

    The lesson from nature is simple: The more we try, the more we succeed, even if the quality of each subsequent attempt does not improve.

    It’s foolhardy to believe that your first best-effort attempt is going to be the right one. But producing many quality versions is resource-intensive. So you’ve got to move from many possible versions to the right one in an efficient way.

    Sequence development to save time.

    We do this by sequencing development phases: First, we make prototypes. Second, we test them. Third, we go to production. Many of our prototypes are nothing more than sketches on a whiteboard. Often we test five versions of an idea on co-workers, see what resonates, and then create five robust versions of the best one. We minimize the risks inherent in making just one visualization, while avoiding wasted time on ultimately doomed iterations.

    For our engagement visual, we knew our goal was to convey that user segmentation is valuable. But which variable should we show: age, gender, mobile carrier, city, or first name? To answer this, we created simple chart visualizations of all five like this one:

    Sample Variable Mock-Up

    Instead of perfecting the form, we showed these basic mock-ups to co-workers. Their responses were clear: the city data resonated most. So the next step was to create five more visualizations with that data, fleshing out the form.

    Here we struggled with questions like: Should we show data in a table, a bar chart, or on a map? Should engagement be represented by a two-color scale, or one? So again we created five rough versions, collected informal feedback from co-workers, and iterated quickly. In the end, we created 10 visualizations, but covered a multitude of possibilities.

    Below is the rough visual that resonated most with colleagues — they responded to seeing our membership engagement by city in a real-world context, but had trouble seeing patterns in the data.

    Cities By Engagement 2

    In our final visualization, we moved to a two-color scale to highlight relative differences in engagement between cities. We also added lists of the five most and least-engaged cities. The lists alone provide the insight; the map adds depth and credibility.
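    Once engagement rates are tabulated per city, pulling out the five most- and least-engaged cities is a single sort. A minimal sketch (the city names and rates below are invented for illustration, not our actual member data):

```python
# Hypothetical engagement rate (fraction of members taking action) by city
engagement = {
    "El Paso": 0.41, "Laredo": 0.38, "Brownsville": 0.37,
    "McAllen": 0.35, "San Antonio": 0.31, "Chicago": 0.22,
    "New York": 0.19, "Portland": 0.12, "Seattle": 0.11, "Eugene": 0.09,
}

# Sort cities from most to least engaged
ranked = sorted(engagement.items(), key=lambda item: item[1], reverse=True)
most_engaged = ranked[:5]    # the list shown beside the map
least_engaged = ranked[-5:]  # the bottom of the list
```

    The point of the lists is exactly this kind of summary: a reader gets the insight without decoding the map, and the map then supplies the geographic context.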

    There are lots of ways to sequence development. But the underlying principle is simple: create more data visualizations than you need to show, because your first idea is unlikely to be your best. Even for this article, we pitched five headlines and synopses to HBR editors. We then wrote five versions of their favorite. So, this article is one of the best of 25 permutations.

    But is it good enough to change your behavior?

  • A Data Scientist’s Real Job: Storytelling

    Every morning at DoSomething.org, our computers greet us with a report containing over 350 million data points tracking our organization’s performance. Our challenge as data scientists is to translate this haystack of information into guidance for staff so they can make smart decisions — whether it’s choosing the right headline for today’s email blast (should we ask our members to “take action now” or “learn more”?) or determining the purpose of our summer volunteer campaign (food donation drive or recycling campaign?).

    In short, we’re tasked with transforming data into directives. Good analysis parses numerical outputs into an understanding of the organization. We “humanize” the data by turning raw numbers into a story about our performance.

    When many people hear “Big Data,” they think “Big Brother” (Type “big data is…” into Google and one of the top recommendations is, “…watching you.”). Central to this anxiety is a feeling that what it means to be human can’t be tracked or quantified by computers. This fear is well-founded. As the cost of collecting and storing data continues to decrease, the volume of raw data an organization has available can be overwhelming. Of all the data in existence, 90% was created in the last 2 years. Inundated organizations can lose sight of the difference between what’s statistically significant and what’s important for decision-making.

    Using Big Data successfully requires human translation and context, whether it’s for your staff or the people your organization is trying to reach. Without a human frame, like photos or words that make emotion salient, data will only confuse, and certainly won’t lead to smart organizational behavior.

    Data gives you the what, but humans know the why.

    The best business decisions come from intuitions and insights informed by data. Using data in this way allows your organization to build institutional knowledge and creativity on top of a solid foundation of data-driven insights.

    For DoSomething.org, mapping our communications data gives us an amazing window into our audience. We have over 1.5 million users, and for each one we have hundreds of data points tracking what and how they respond to new volunteer opportunities via email and texting. Here’s how we go from 350 million data points to organizational change, and how organizations grappling with similarly huge amounts of information can do the same:

    1. Look only for data that affect your organization’s key metrics. At DoSomething.org, our goal is increasing teens’ engagement in volunteering. So when we did a deep dive on our data last fall to determine how to increase that metric, we started with simple questions: Who currently volunteers the most, and how can we find more people like them? We were able to ignore larger volumes of data that didn’t answer our questions and home in on what really mattered.
    2. Present data so that everyone can grasp the insights. Hint: never show a regression analysis or a plot from R. In fact, our final presentation had very few numbers. We focused on telling a clear story with simple slides and visuals. While we used regression analysis to find a list of significant variables, we visualized data to find trends: even data analysts are much better at discovering geographic (and underlying demographic) trends on maps than in regression tables, especially when there are multiple underlying patterns with ambiguous relationships.

      By presenting the data visually, the entire staff was able to quickly grasp and contribute to the conversation. Everyone was able to see areas of high and low engagement. That led to a big insight: Someone outside the analytics team noticed that members in Texas border towns were much more engaged than members in Northwest coastal cities.

    3. Return to the data with new questions. Once we learned who our most engaged members were, we returned to the data to see what campaigns those members liked best; in other words, what led those members to get involved. The answer turned out to be campaigns around improving community health, an issue that disproportionately impacts minorities. This information allowed us to better tailor our volunteer campaigns going forward to engage new members, reach out to the right partnerships for those campaigns, and also highlight another potential area for growth — white, male college students in the Northwest.

    Data scientists want to believe that data has all the answers. But the most important part of our job is qualitative: asking questions, creating directives from our data, and telling its story.

    Please join the conversation and check back for regular updates. Follow the Scaling Social Impact insight center on Twitter @ScalingSocial and give us feedback.

  • Know the Difference Between Your Data and Your Metrics

    How many views make a YouTube video a success? How about 1.5 million? That’s how many views a video our organization, DoSomething.org, posted in 2011 got. It featured some well-known YouTube celebrities, who asked young people to donate their used sports equipment to youth in need. It was twice as popular as any video DoSomething.org had posted to date. Success! Then came the data report: only eight viewers had signed up to donate equipment, and zero actually donated.

    Zero donations. From 1.5 million views. Suddenly, it was clear that for DoSomething.org, views did not equal success. In terms of donations, the video was a complete failure.
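    The funnel above is stark when written out as arithmetic. Using the figures from the article (the variable names are ours; the code is purely illustrative):

```python
views = 1_500_000   # the vanity number
signups = 8         # viewers who signed up to donate equipment
donations = 0       # viewers who actually donated

view_to_signup = signups / views                  # about 0.0005%, or 1 in 187,500
signup_to_donation = donations / max(signups, 1)  # 0.0 — no one followed through
```

    Judged by views, the video was a hit; judged by the funnel that actually mattered, it converted essentially no one.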

    What happened? We were concerned with the wrong metric. A metric contains a single type of data, e.g., video views or equipment donations. A successful organization can only measure so many things well, and what it measures ties to its definition of success. For DoSomething.org, that’s social change. In the case above, success meant donations, not video views. As we learned, there is a difference between numbers and numbers that matter. This is what separates data from metrics.

    You can’t pick your data, but you must pick your metrics.

    Take baseball. Every team has the same definition of success — winning the World Series. This requires one main asset: good players. But what makes a player good? In baseball, teams used to answer this question with a handful of simple metrics like batting average and runs batted in (RBIs). Then came the statisticians (remember Moneyball?). New metrics provided teams with the ability to slice their data in new ways, find better ways of defining good players, and thus win more games.

    Keep in mind that all metrics are proxies for what ultimately matters (in the case of baseball, a combination of championships and profitability), but some are better than others. The data of the game has never changed — there are still RBIs and batting averages; what has changed is how we look at the data. And those teams that slice the data in smarter ways are able to find good players that have been traditionally undervalued.

    Organizations become their metrics.

    Metrics are what you measure. And what you measure is what you manage to. In baseball, a critical question is: How effective is a player when he steps up to the plate? One measure is hits. A better measure turns out to be the sabermetric “OPS” — a combination of on-base percentage (which includes hits and walks) and total bases (slugging). Teams that look only at hitting suffer. Players on these teams walk less, with no offsetting gains in hits. In short, players play to the metrics their management values, even at the cost of the team.
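    For the curious, OPS is simple arithmetic: on-base percentage plus slugging percentage. A sketch using the standard formulas (the season stat line below is fabricated for illustration):

```python
def ops(hits, walks, hbp, at_bats, sac_flies, doubles, triples, homers):
    """On-base plus slugging, per the standard sabermetric definitions."""
    singles = hits - doubles - triples - homers
    # On-base percentage: times reaching base per plate appearance
    obp = (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)
    # Slugging percentage: total bases per at-bat
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    slg = total_bases / at_bats
    return obp + slg

# Fabricated line: 150 hits (30 2B, 5 3B, 20 HR), 60 BB, 5 HBP, 500 AB, 5 SF
season_ops = ops(150, 60, 5, 500, 5, 30, 5, 20)
```

    Notice that walks raise OPS through on-base percentage even though they never show up in a batting average, which is exactly why a hits-only metric rewards the wrong behavior.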

    The same happens in workplaces. Measure YouTube views? Your employees will strive for more and more views. Measure downloads of a product? You’ll get more of that. But if your actual goal is to boost sales or acquire members, better measures might be return-on-investment (ROI), on-site conversion, or retention. Do people who download the product keep using it, or share it with others? If not, all the downloads in the world won’t help your business.
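    The downloads-versus-retention distinction is easy to make concrete. A minimal sketch with invented records (the field names and values are ours, for illustration only):

```python
# Hypothetical log: one record per user who downloaded the product,
# with a flag for whether they were still active in week two
users = [
    {"id": 1, "active_week_2": True},
    {"id": 2, "active_week_2": False},
    {"id": 3, "active_week_2": True},
    {"id": 4, "active_week_2": False},
]

downloads = len(users)  # the vanity number: everyone here downloaded
week_2_retention = sum(u["active_week_2"] for u in users) / downloads
```

    Two products can report identical download counts while one retains half its users and the other retains none; only the retention number tells you which business is working.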

    In the business world, we talk about the difference between vanity metrics and meaningful metrics. Vanity metrics are like dandelions – they might look pretty, but to most of us, they’re weeds, using up resources, and doing nothing for your property value. Vanity metrics for your organization might include website visitors per month, Twitter followers, Facebook fans, and media impressions. Here’s the thing: if these numbers go up, it might drive up sales of your product. But can you prove it? If yes, great. Measure away. But if you can’t, they aren’t valuable.

    Metrics are only valuable if you can manage to them.

    Good metrics have three key attributes: their data are consistent, cheap, and quick to collect. A simple rule of thumb: if you can’t measure results within a week for free (and if you can’t replicate the process), then you’re prioritizing the wrong metrics. There are exceptions, but they are rare. In baseball, the metrics an organization uses to measure a successful plate appearance will impact player strategy in the short term (do they draw more walks, prioritize home runs, etc.?) and personnel strategy in the mid and long terms. The data to make these decisions is readily available and continuously updated.

    Organizations can’t control their data, but they do control what they care about. If our metric on the YouTube video had been views, we would have called it a huge success. Instead, we wrote it off as a massive failure. Does that mean no more videos? Not necessarily, but for now, we’ll be spending our resources elsewhere, collecting data on metrics that matter. Good data scientists know that analyzing the data is the easy part. The hard part is deciding what data matters.
