Author: Thomas C. Redman

  • In a Big Data World, Don’t Forget Experimentation

    In the data world today, “big” dominates. But sometimes you don’t need big. You need a small dose of exactly the right data. Data that bear precisely on the question at hand, that you understand deeply, and that you can trust. If such data are already at hand, great. But frequently they are not. And then, nothing beats a well-conceived, -designed, -controlled, -executed, and -analyzed experiment. Companies need to make sure experimentation is included in their “data toolkits,” learn when to use it, and develop the skills to conduct effective experiments.

    Let’s consider a recent example: Boeing’s lithium-ion batteries for the 787 Dreamliner. As you probably know, the issue has been all over the news for a couple of months. In two instances, the batteries have nearly caught fire, grounding the aircraft for over three months. The planes are now back in the air, but they still won’t be carrying passengers for another month or so.

    What Boeing needs right now is not big data but a sequence of experiments that do what previous tests did not: isolate the root causes of the problems that have occurred so far; verify that fixes being made really work; identify other problems that are yet to rear their ugly heads; predict how the batteries will perform under “worst-case scenarios”; and convince regulators and the flying public that the Dreamliner is safe for passengers. Over time, Boeing and its suppliers will almost certainly require still more experiments to prevent future problems, to better predict battery life, and to test new designs, new manufacturing techniques, and new maintenance strategies. Some of these tests have been and will be conducted under controlled conditions in laboratories, and some must be conducted under increasingly less controlled circumstances in the air.

    In the unfolding data revolution, companies must develop the capabilities to experiment. But too many eschew it. This was the case in a couple of recent client engagements. In both cases, senior managers had posed a seemingly simple question. But the effort to assemble all the relevant information, across their disparate data warehouses, was daunting. Months dragged on, and the question remained unanswered. In both cases, a simple experiment, taking just a few weeks, would have filled the bill quickly, cheaply, and better than any alternative. In the vast majority of similar cases, we are not talking about a series of complex experiments under the extreme conditions facing Boeing — just small-scale, narrowly focused real-world trials.
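    To make the idea concrete, here is a minimal sketch of how the results of one such small-scale trial might be analyzed. Everything in it is an assumption of mine, not a detail from either engagement: the scenario (piloting a new onboarding step on a random subset of customers), the group sizes, and the numbers are invented purely for illustration.

    ```python
    # Hypothetical sketch: analyzing a small two-group trial, e.g., a new
    # customer-onboarding step piloted on a random subset of new accounts.
    # All names and numbers are illustrative assumptions.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Simulated outcome metric (say, a 30-day retention score) per customer.
    control = rng.normal(loc=62.0, scale=8.0, size=40)    # current process
    treatment = rng.normal(loc=66.0, scale=8.0, size=40)  # piloted change

    # Welch two-sample t-test: is the observed lift larger than chance alone
    # would plausibly produce?
    t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)

    lift = treatment.mean() - control.mean()
    print(f"observed lift: {lift:.1f} points, p-value: {p_value:.3f}")
    ```

    A few dozen observations per group, gathered over a few weeks, can answer a sharply posed question that months of trawling through data warehouses cannot.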

    Some may view experimentation as “old school,” not up to the rigors of the unfolding data revolution. Quite the opposite — its fabled past is the best reason to employ it today! Experimentation has a rich and storied history in product development and market research. It has contributed to hundreds of thousands of improved products in nearly all sectors, from agriculture to electronics to medicine, and beyond. And not just design — industrial experimentation has contributed to improvements in the technologies and processes needed to grow corn, assemble cars, find oil, and so forth. Industrial experimentation has a rich history in the service sector as well. Many Information Age companies, such as Google, already get this message. And over the years, I’ve helped many others conduct simple and effective experiments in areas ranging from customer onboarding to policy deployment.

    It is critical that companies understand why experimentation works, so they will know where to apply it. In short, when used properly, experimentation brings the power of the scientific method to the problems companies face today. This means the attendant focus, sharp definition of the question, careful design, data you can trust, and in-depth analyses — just what is called for in many situations.

    Companies also must learn how to conduct experiments. They are hard work. It’s all too easy to define the problem poorly, choose a bad sample, skimp on design, fail to calibrate instruments, or misinterpret the data. Boeing and its suppliers had, of course, conducted extensive battery tests. Still, as already noted, they missed the mark.

    Even when the experiment itself is flawless, things can go wrong in the end. Most experiments involve sampling — a seemingly incomprehensible topic to many managers — so they don’t trust it. I’ll never understand why so many otherwise smart managers will trust a slightly off-target population of data that’s known to be loaded with errors over a small, spot-on, high-quality sample, but they do! The only way I’ve found to combat this issue is to clearly explain the many benefits of experimentation and present them in a powerful, but balanced, manner.
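    The point about sampling can be made with a toy simulation. This is my own illustration, not the author’s: both “datasets” below are synthetic, and the bias and error magnitudes are assumptions chosen only to make the effect visible.

    ```python
    # Illustrative simulation: a small, well-drawn sample beats a much larger
    # but biased, error-laden dataset when estimating a population mean.
    # All quantities are made up for illustration.
    import numpy as np

    rng = np.random.default_rng(1)
    population = rng.normal(loc=100.0, scale=15.0, size=1_000_000)
    true_mean = population.mean()

    # "Big" dataset: 100,000 records, but drawn from a slightly off-target
    # segment and loaded with recording errors.
    off_target = population[population > 95]
    big = rng.choice(off_target, size=100_000)
    big = big + rng.normal(loc=3.0, scale=10.0, size=big.size)  # errors

    # Small, spot-on sample: 200 records drawn at random from the population.
    small = rng.choice(population, size=200)

    print(f"true mean:         {true_mean:6.1f}")
    print(f"big, biased data:  {big.mean():6.1f}")
    print(f"small sample:      {small.mean():6.1f}")
    ```

    No matter how many records it contains, the big dataset never recovers from its bias; the 200-record sample lands close to the truth.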

    To be clear, I am not advocating experimentation over big data. If you have data you can trust, by all means use them. And there are many instances where conducting an experiment is simply infeasible. You can’t run an experiment to predict the advance of the flu or isolate potentially exploding manhole covers. One hopes that big data and experimentation will work hand in hand. It’s not hard to imagine the day when chips are built into Boeing’s battery cells to continually monitor the health of each cell and take it out of service when needed, obviating the need to experiment. Conversely, one expects there will be times when big data suggests a direction that demands further experimentation.

    Companies that aim to score with data must not adopt a one-size-fits-all approach and blindly follow the crowd into big data. They need many approaches and tools in their data toolkits. For almost all, experimentation deserves a prominent spot in that toolkit. For many problems, it is the best approach. Companies must develop a deep appreciation for why and how it works. And give it a fair chance.

  • Invest in Proprietary Data for Competitive Advantage

    Nearly 30 years ago, Stewart Brand made one of the more prescient observations about the unfolding data revolution: “On the one hand, information wants to be expensive, because it’s so valuable. The right information in the right place just changes your life. On the other hand, information wants to be free, because the cost of getting it out is getting lower and lower all the time. So you have these two fighting against each other.”

    This assessment is spot on. Data promise a lot of new value, from insights that lead to better-targeted advertising, to ideas for new products, to “this changes everything” discoveries — the “expensive” half of Brand’s observation. But realists fully appreciate the “free” half. Translating those insights into profitable new and improved services and sustained competitive advantage is another matter altogether.

    So how can a company spend more time on the valuable, expensive side and less on the free side? The key lies in developing and exploiting “proprietary data” — data that you and you alone possess. Without such data it is simply too easy for competitors to let you do the hard work of innovation, then copy your insights and erode your competitive advantage.

    I use the term “proprietary data” after the 2003 HBR classic, “IT Doesn’t Matter,” by Nicholas Carr. There, he introduced the contrasting notions of “infrastructure” and “proprietary” technologies. An infrastructure technology diffuses throughout the economy in support of numerous industries and, in time, becomes available to all. It is difficult to sustain a competitive advantage via infrastructure technologies.

    Proprietary technologies, on the other hand, can be protected, at least for a significant time. Because they are protected, their owners can gain and sustain a strategic advantage. This logic applies directly to data. Unless you can create and protect a measure of “proprietary data,” your path to riches will be fraught. You probably don’t need a lot of proprietary data. Just enough to distance yourself from the other guy.

    Data earn proprietary status in two distinct ways — through structure and content. Think of a blank meeting calendar as a structure. It divides time into days and hours, and calls for details such as meeting place, duration, and attendees. A useful structure, but with no actual meetings. Day in and day out, transactions populate the calendar. A phone call may yield, “Tuesday, March 20, at noon: lunch with Pete at Olive Garden” — actual content.
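    In code, the split between structure and content might look like the following sketch. The class and field names are mine, invented to mirror the calendar example; the year in the date is likewise an assumption, since the example above does not give one.

    ```python
    # A minimal sketch of the structure/content distinction.
    # The dataclass is the "blank calendar" -- structure only; the instance
    # below is one populated entry -- content.
    from dataclasses import dataclass
    from datetime import datetime
    from typing import List

    @dataclass
    class CalendarEntry:          # structure: fields, but no actual meetings
        start: datetime
        duration_minutes: int
        place: str
        attendees: List[str]

    # Content: the phone call that fills one slot of the structure.
    lunch = CalendarEntry(
        start=datetime(2012, 3, 20, 12, 0),  # "Tuesday, March 20, at noon" (year assumed)
        duration_minutes=60,
        place="Olive Garden",
        attendees=["Pete"],
    )
    print(lunch)
    ```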

    All data possess both components, and they provide companies tremendous opportunity to capture (and exploit) the subtleties of their environments. To appreciate the significance of this point, consider the many ways that you are known to those who do business with you. Your doctor thinks of you as a patient, your banker as an account, your lawyer as a client, and the department store as a shopper. You are the same person, but each employs a different structure because they are interested in different information.

    Companies should seek to create advantage via proprietary data structures. For example, Facebook and LinkedIn have found ways to gather interesting data about people through their “friends” and “connections,” respectively, and secured an advantage. Others don’t have access to these data. And network lock-in may help them maintain that advantage for some time. Another example is the CUSIP, a means of identifying securities and processing trades efficiently. It is owned by the American Bankers Association and administered by Standard & Poor’s, and it has provided a long-term advantage to S&P.

    Even without a proprietary data structure, companies should still seek to create advantage through their content. Only you have the specific transaction, “John Smith bought peas, bread, and grape Nehi at 9:27 AM on March 11, 2013,” and the tens of thousands like it each day! Retailers, such as Amazon, Kroger, and Target, use these data to tailor their advertising for John Smith. Further, they can combine John Smith’s transactions with others to better understand buying patterns, improve their supply chains, lay stores out more effectively, and so forth.
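    A minimal sketch of that kind of aggregation follows. The transactions are entirely made up (only the John Smith line echoes the example above), and any real retailer’s pipeline would of course be far richer.

    ```python
    # Hypothetical sketch: turning raw transaction content into simple
    # buying-pattern summaries. Customers, items, and times are invented.
    import pandas as pd

    transactions = pd.DataFrame({
        "customer": ["John Smith", "John Smith", "John Smith", "Jane Doe"],
        "item":     ["peas", "bread", "grape Nehi", "bread"],
        "timestamp": pd.to_datetime([
            "2013-03-11 09:27", "2013-03-11 09:27",
            "2013-03-11 09:27", "2013-03-12 17:02",
        ]),
    })

    # How often is each item bought?
    item_counts = transactions.groupby("item")["customer"].count()

    # What does each customer's basket look like?
    baskets = transactions.groupby("customer")["item"].apply(list)

    print(item_counts)
    print(baskets)
    ```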

    It is easier for competitors to copy your successes with transactional data. One retailer develops an insight into customer behavior and others follow suit. But don’t avoid this avenue on that score alone. After all, you conduct transactions every day, enriching the base of “things the competitor doesn’t know” each time. And there can be solid advantage therein. For example, pharmacies build a patient’s prescription history to better identify possible drug interactions, suggest cheaper generics, and get customers into their stores. It is no coincidence that patients must walk all the way to the back to get to the pharmacy!

    The key to exploiting proprietary data lies in identifying which data offer the most potential for profit and sustained advantage. All data are not created equal; some are far, far more important, especially looking into the future. It is vital to develop a deep understanding of what is essential for the future. Thus, innovators may seek competitive advantage in their product and service data; low-cost providers, in operations and process data; and those aiming for customer intimacy, in customer data.

    Finally, you must give special attention to data at the intersection of proprietary and important. Ensure they are of the highest quality, enrich the data and associated models, and keep them safe from the prying eyes of competitors and pirates. These data must capture the lion’s share of the dollars you devote to data. And, more than anything else, you must constantly look for new ways they can help distinguish you from your competitors.

    Those who appreciate history will find something age-old in the recommendations above. After all, since time immemorial, the general who knew something the enemy didn’t had a better chance in battle; just as today every salesperson knows she gains an edge when she learns that a special pinot noir over lunch helps seal the deal. Proprietary data don’t guarantee success. But those who find such proprietary nuggets on a large scale have a huge leg up.

  • What Separates a Good Data Scientist from a Great One

    Companies that wish to take full advantage of their data must build strong, new, and different organizational capabilities. There is a lot to do, and data scientists are front and center. Good ones are rare. And critically, the difference between a great one and a good one is like the difference between lightning and a lightning bug.

    A good one can help you find relationships in vast quantities of disparate data — often important insights that you would not have gotten in any other way. Great data scientists, on the other hand, develop new insights about the larger world. They certainly use data to develop those insights, but that is not the point.

    Over the years I’ve had the privilege of working with dozens, maybe hundreds, of good statisticians, analysts, and data scientists. And a few great ones. Great data scientists bring to bear four mutually reinforcing traits that even good ones do not.

    1. A sense of wonder. Recently, many have noted that curiosity is the number one trait of a data scientist. That should go without saying. Good data scientists must be curious, just like a scientist in any discipline must be.

    But the great ones take this trait to an extreme. They have a sense of wonder about the world and are happiest when they discover how something works or why it works that way! They look for those explanations in data — and anything else that will help. For example, great data scientists are interested in many things and develop networks of people with different perspectives than their own. So much the better to explore the world, and a mass of disparate data, from many angles!

    2. A certain quantitative knack. Great data scientists simply see things that others don’t. For example, I happened to chat with a summer intern (who now uses his analytical prowess as head of a media company) on his second day at an investment bank. His boss had given him a stack of things to read, and in scanning through, he spotted an error in a returns calculation. It took him about an hour to verify the error and determine the correction.

    What’s important here is that thousands of others did not see the error. It was obvious to him, but not to anyone else. And this was a top-tier investment bank. Presumably, at least a few good analysts read the same material and did not spot it.

    Mathematics has turned out to provide a convenient, amazingly effective language (Eugene Wigner famously called it “unreasonably effective”) for describing the real world. The great data scientist taps into that language intuitively and easily in ways that even good data scientists cannot.

    3. Persistence. The great data scientists are persistent, and in many ways. The intern in the vignette above made his discovery at a glance and confirmed it in an hour. It rarely works out that way. I believe it was Jeff Hooper, then at the great Bell Labs, who noted that “Data do not give up their secrets easily. They must be tortured to confess.”

    This is a really big deal. Even under the best of circumstances, too many of the data are poorly defined or simply wrong, and most turn out to be irrelevant to the problem at hand. Sifting through these noisy data is arduous, frustrating work. Even good data scientists may move on to the next problem. Great data scientists stick with it.

    Great data scientists also persist in making themselves heard. Dealing with a recalcitrant bureaucracy can be even more frustrating than dealing with noisy data. Continuing the vignette from above, I told the intern that he was in for a long summer. He would almost certainly spend it defending his discovery. Whichever group made the error would take great offense and may even attack him personally. Others would react with glee as they celebrated the ignorance of their peers. And he’d be caught in the middle.

    4. Finally, technical skills. The abilities to access and analyze data using the newest methods are obviously important. But I’m less concerned about these than the ability to bring statistical rigor to bear. At the risk of oversimplifying, there are two kinds of analyses — descriptive and predictive. Descriptive analyses are tough enough. But the really profitable analyses involve prediction, which is inherently uncertain.

    Great data scientists embrace uncertainty. They recognize when a prediction rests on solid foundations and when it is merely wishful thinking. They are simply outstanding at spelling out what has to go right for the prediction to hold, what could really foul it up, and which unknowns will keep them awake at night. They can often quantify the uncertainty (a minimal sketch follows below), and they are good at suggesting simple experiments to confirm or refute assumptions, reduce uncertainty, explore the next set of questions, and so on.

    To be clear, this ability is not “that certain quantitative knack.” It is trained, sophisticated, disciplined inferential horsepower, practiced and honed by both success and failure.
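    To make “quantify the uncertainty” concrete, here is a minimal bootstrap sketch. The metric and the handful of observations are invented for illustration; the technique, resampling the history to put an interval around an estimate, is a standard one rather than anything specific to the stories above.

    ```python
    # Illustrative sketch: a bootstrap interval around a simple prediction
    # (next period's mean of some business metric). The data are made up.
    import numpy as np

    rng = np.random.default_rng(2)
    history = np.array([4.1, 3.8, 5.0, 4.4, 4.9, 3.7, 4.6, 4.2])

    # Resample the history many times and record the mean of each resample.
    boot_means = np.array([
        rng.choice(history, size=history.size, replace=True).mean()
        for _ in range(10_000)
    ])

    low, high = np.percentile(boot_means, [2.5, 97.5])
    print(f"point estimate: {history.mean():.2f}")
    print(f"95% interval:   [{low:.2f}, {high:.2f}]")
    ```

    An interval like this tells a decision maker not only the prediction, but how far it could plausibly move.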

    Great data scientists are truly special. They’re the Derek Jeters, the Michael Jordans, the Mikhail Baryshnikovs, and the Julia Robertses of the data space. If you’re serious about big data and advanced analytics, you need to find one or two, build around them, and craft an environment that helps them do their thing.