Author: Andrew McAfee

  • Pundits: Stop Sounding Ignorant About Data

    The current surge of enthusiasm around big data has produced a predictable backlash. Some of it, like Gary Marcus’s New Yorker post “Steamrolled by Big Data,” is insightful and well-reasoned (even though I have my quibbles with some of his points). This is not surprising, since he’s a neuroscientist as well as a writer, and so quite comfortable with data.

    Unfortunately, some other prominent commentators clearly aren’t. David Brooks has taken up big data in his New York Times column recently, and literary lion Leon Wieseltier posted last month in The New Republic about “What Big Data Will Never Explain.” Now, these guys are entitled to write about whatever they like, but if they want to be taken seriously when discussing data they really should stop making the kinds of elementary mistakes they’ve made so far. Their errors of understanding and fact weaken their credibility and turn off quantitatively adept readers.

    So, as a public service, here’s a short list, written for non-quant-jock pundits, of things to keep in mind when writing about data and its uses.

    Absolute Certainty is Not the Goal (Because It’s Impossible). Wieseltier writes that “The purpose of this accumulated information is to detect patterns that will enable prediction: a world with uncertainty steadily decreasing to zero, as if that is a dream and not a nightmare.” Everything in that sentence up to the colon is accurate; after it comes nonsense. When teaching introductory probability, I tell my students that a random variable (the mathematical workhorse of the data disciplines) is one where even after you know everything there is to know about it, you still don’t know everything. For example, you know a fair coin toss will come up heads 50% of the time and tails 50%; that’s it, and that’s a long, long way from zero uncertainty.
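
    To make the coin example concrete, here’s a quick illustrative sketch in Python (a toy simulation of my own, not anything drawn from Wieseltier or Brooks): a predictor that knows everything there is to know about a fair coin, namely that heads comes up with probability 0.5, still can’t call individual tosses better than chance, no matter how many past tosses it has seen.

    ```python
    import random

    # Simulate a fair coin whose behavior we understand completely:
    # P(heads) = 0.5, and every toss is independent of the ones before it.
    random.seed(0)
    N = 100_000
    tosses = [random.random() < 0.5 for _ in range(N)]  # True = heads

    # The best possible strategy, given full knowledge of the coin, is to
    # pick one side and stick with it; in expectation it can never beat 50%.
    always_heads_accuracy = sum(tosses) / N

    # Even 'learning' from the entire history doesn't help: guess whichever
    # side has led so far, then check the guess against the next toss.
    hits, heads_so_far = 0, 0
    for i, t in enumerate(tosses):
        guess = heads_so_far * 2 >= i   # guess heads if heads has led so far
        hits += (guess == t)
        heads_so_far += t

    print(f"always guess heads: {always_heads_accuracy:.3f}")  # ~0.500
    print(f"learn from history: {hits / N:.3f}")               # ~0.500
    ```

    Both numbers hover around 0.5. More data sharpens your estimate of the coin’s bias (here there is none to find), but it can never push the uncertainty about the next toss toward zero.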

    Data geeks desperately want to make better predictions using the seas of digital information available today. They want to know how many games the Red Sox will win this season, what course of treatment will zap that particular cancer, and whether they’ll beat the dealer on the next hand. They know they’ll never know any of these things for sure, and that zero uncertainty isn’t even a meaningful goal to discuss.

    People are Not Inherently Better than Data and Algorithms at Making Decisions, Predictions, Judgments, and Diagnoses. Brooks thinks that they are. He writes that “Data struggles with the social,” “Data struggles with context,” and “Data creates bigger haystacks” (apparently, when it comes to data, knowing more about a topic is bad), while on the other hand “The human brain has evolved to account for this reality. People are really good at telling stories that weave together multiple causes and multiple contexts.”

    And this is exactly the problem. The stories we tell ourselves are very often wrong, and we have a host of biases and other glitches in our mental wiring that keep us from sizing up a situation correctly.

    How many of these glitches are there? I don’t think anyone knows for sure. The best catalog I’ve come across so far is Rolf Dobelli’s The Art of Thinking Clearly, which devotes a separate short chapter to each mental misfire he’s identified. The book has 99 chapters.

    The late Paul Meehl and William Grove analyzed 136 research studies directly comparing the predictions of humans, many of them ‘experts,’ against those coming exclusively from data and algorithms. Humans were clearly better in only 8 of the cases, giving them a batting average of .058. And Meehl and Grove hypothesized that those 8 human victories might have been because the people were “provided with more data than the actuarial formula.”

    Quantification is Useful in Every Field of Inquiry. Viktor Mayer-Schönberger and Kenneth Cukier say in their new book Big Data: A Revolution That Will Transform How We Live, Work, and Think that “Datafication represents an essential enrichment in human comprehension.” Wieseltier reacts: “It is this inflated claim that gives offense… The religion of information is another superstition, another distorting totalism, another counterfeit deliverance.” But I don’t hear the two authors attempting to found a new religion around information; I hear them making the entirely reasonable claim that better, more precise measurement is a really valuable advance. The field of biology was transformed by Anton van Leeuwenhoek’s microscope, which for the first time gave us the ability to see, count, and otherwise measure the tiny entities that exist at a different scale than we do. This led to a reduction in superstition, not an increase.

    The great Victorian scientist Lord Kelvin laid down a general rule: “[W]hen you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.”

    Wieseltier might respond that some fields of inquiry aren’t ‘science’ and should never be, but that response would be ridiculous. Science here is simply the process of testing claims against evidence. The ones resisting this are just about guaranteed to be the ones with the flimsiest claims.

    Big Data’s Advocates Don’t Think Everything Can (and Should) be Turned Over to Computers. Brooks says that “If you asked me to describe the rising philosophy of the day, I’d say it is data-ism… that data will help us do remarkable things — like foretell the future.” Wieseltier takes the same idea a lot further: “in the comprehensively quantified existence in which we presume to believe that eventually we will know everything, in the expanding universe of prediction in which hope and longing will come to seem obsolete and merely ignorant, we are renouncing some of the primary human experiences.”

    I’ve been talking and hanging out with a lot of data geeks over the past months and even though they’re highly ambitious people, I’ve never heard any of them express anything like those sentiments and goals. In fact, they’re very circumspect when they talk about their work. They know that the universe is a ridiculously messy and complex place and that all we can do is chip away at its mysteries with whatever tools are available, our brains always first and foremost among them.

    The geeks are excited these days because in the current era of Big Data the tools just got a whole lot better. If someone told them that their goal was to make hope and longing obsolete and merely ignorant, they’d probably find a way to turn such an ignorant statement into a brilliantly nasty visual meme, post it on Reddit, and get back to work.

  • When a Successful Company Shrinks its Workforce

    I completely support the right of companies to stop paying people they don’t need any more, but the recent and planned layoffs at United Technologies are troubling me. Because UTC is doing great; its stock is at an all-time high, and sales have grown by more than 35% since 2005, to $57.7 billion.

    This growth was accomplished, however, without expanding its workforce much at all, and now UTC believes it can continue to grow as it wants to while actually shrinking its employee base. It’s planning to lay off 3000 workers this year, after shedding 4000 last year.

    Now, is this really anything new? After all, output has been going up and employment simultaneously going down in manufacturing around the world for several years now, and UTC is a big manufacturer. But two things strike me as potentially novel here. First, the company does a lot more than just make things in factories. As its website says, “United Technologies… is a diversified company that provides a broad range of high-technology products and services to the global aerospace and building systems industries.” Servicing elevators, security systems, and so on, in other words, is a big part of what UTC does, and services have historically been very labor-intensive. That could be changing.

    Second, it feels new to me that a successful company would shrink its workforce as it grows its sales, profits, and stock price. To make that concrete, imagine that in the ‘old world’ of manufacturing there are three companies, each with 1000 employees. Two of them are bloated and poorly run, and one is lean, mean, and highly technologically sophisticated. Over the course of five years, it puts the other two out of business while adding 500 employees to its workforce in order to cope with all the new demand. At the end of that time, total employment in the sector has shrunk from 3000 to 1500. This is not great news for the laid-off workers, but at least we could have some hope that they’d eventually be hired by the successful company as it continued to grow.

    Now imagine the same scenario, except that the winner in this case is so lean and mean that it actually lays off 500 people as it’s growing and putting the others out of business. Total employment in this scenario drops from 3000 to 500.
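
    A quick back-of-the-envelope sketch of the two toy scenarios (the figures are the illustrative ones above, not real UTC numbers) makes the contrast stark:

    ```python
    # Back-of-the-envelope version of the two toy scenarios above; all
    # figures are the illustrative ones from the text, not real UTC data.

    FIRMS, EMPLOYEES_EACH = 3, 1000
    starting_total = FIRMS * EMPLOYEES_EACH        # 3000 jobs in the sector

    # 'Old world': two bloated firms fail; the lean winner adds 500 jobs
    # to cope with the demand it has absorbed.
    old_world_total = EMPLOYEES_EACH + 500         # 1500

    # New scenario: two firms fail, and the winner sheds 500 jobs even as
    # its sales, profits, and stock price grow.
    new_world_total = EMPLOYEES_EACH - 500         # 500

    print(starting_total, "->", old_world_total)   # 3000 -> 1500
    print(starting_total, "->", new_world_total)   # 3000 -> 500
    ```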

    Both scenarios fit the observed pattern of increased output and decreased employment. But in the second one, there’s no new place for any laid-off worker to go, because even the successful company is never hiring. That feels like a different world to me; is it the one we’re heading into?

    I don’t want to use this post to discuss the morality of UTC laying off people while the company is thriving and the nation’s workers are hurting. I simply want to point out that if this example is part of any larger trend, then we cannot rely on economic growth to fix our current problems of unemployment or underemployment. Because even for individual companies, economic growth has become so decoupled from employment growth that the former goes up while the latter goes down.

    If that’s the world we’re heading into, then we had better start rethinking a lot of our assumptions, policies, and prescriptions. And fast.

  • Stop Requiring College Degrees

    If you’re an employer, there are lots of signals about a young person’s suitability for the job you’re offering. If you’re looking for someone who can write, do they have a blog, or are they a prolific Wikipedia editor? For programmers, what are their TopCoder or GitHub scores? For salespeople, what have they sold before? If you want general hustle, do they have a track record of entrepreneurship, or at least holding a series of jobs?

    These days, there is also a range of tests you can administer to prospective employees to see if they’re right for the job. Some of them are pretty straightforward. Others, like Knack, seek to test for attributes that might seem unrelated, but have been shown by prior experience to be associated with good on-the-job performance.

    And there’s been a recent explosion in MOOCs — massive open online courses, many of them free — on a wide range of subjects. Many of these evaluate their students via a final exam or other means, and so provide a signal about how well someone mastered the material. MOOCs are still quite young, so it’s not clear how accurate their evaluations are, but I’m encouraged by what I’ve seen so far. I’d give serious consideration to a job seeker who had taken a bunch of MOOCs and done well in all of them.

    You’ve noticed by now that ‘a college degree’ is not in this list of signals. That’s because I think it’s a pretty lousy one, and getting worse all the time. In fact, I think one of the most productive things an employer could do, both for themselves and for society at large, is to stop placing so much emphasis on standard undergraduate and graduate degrees.

    Unfortunately, employers are doing exactly the opposite — they’re putting more emphasis over time on old-school degrees, not less. As a recent New York Times story put it, “The college degree is becoming the new high school diploma: the new minimum requirement, albeit an expensive one, for getting even the lowest-level job.” Dental lab techs, chemical equipment tenders, and medical equipment preparers are all jobs that require a degree at least 50% more often than they did as recently as 2007.

    There are two huge problems with this approach. One is that college is really expensive, and getting more so all the time. According to figures compiled by Jared Bernstein, while median income for two-parent, two-child families went up by 20% between 1990 and 2008, the cost of a four-year public college education went up by three times that amount. Total student loan debt is now larger than credit card debt in the US, and it can’t be discharged even in bankruptcy. As a 2011 graduate working as a receptionist put it in the Times article, “I am over $100,000 in student loan debt right now… I will probably never see the end of that bill.”

    The even bigger problem is that, as I mentioned above, I believe college degrees are getting less valuable over time even as they’re getting more expensive. There’s a lot of evidence piling up about what’s happening with actual learning on campuses these days, and most of it is not pretty. Fewer students are entering the tougher STEM majors and completing degrees in them, even though graduates in these fields are much in demand. It’s taking students longer to complete their degrees, and dropout rates are rising. The most alarming and depressing stats I’ve come across are that 45% of college students didn’t seem to learn much of anything during their first two years, and as many as 36% showed no improvement after four years. Whatever’s going on with these kids at these schools, it’s not education.

    I think what’s going on in my home industry of higher education at present is something between a bubble and a scandal. And I don’t think it’ll change unless and until employers shift, and start valuing signals other than college degrees. I can’t think of a single good reason not to start that shift now. Can you?

  • Manufacturing Jobs and the Rise of the Machines

    The story of how technological progress is affecting employment — whether, in other words, the robots are eating our jobs — is clearly an important one. But who’s telling it correctly? I believe that technological unemployment (and underemployment) is a real and growing phenomenon.

    But since Erik Brynjolfsson and I appeared on 60 Minutes in January for “March of the Machines,” a story that examined the labor force implications of advanced digital technologies like robots and other forms of automation, we’ve been accused of being unclear on the concept.

    For example, the Association for Advancing Automation said in response that we “are missing the bigger picture” by not recognizing that American companies are “successfully implement[ing] automation technologies instead of going out of business or sending manufacturing overseas.” They add: “American manufacturing’s embrace of robotics will ensure a new manufacturing renaissance in this country.”

    If the A3, or anyone else, thinks that lots more manufacturing jobs will accompany this renaissance, they’re just dead wrong. The facts are too clear, and they all point in the other direction. For example:

    • Manufacturing employment has been on a steady downward trend in the U.S. since 1980 (it increased some after the end of the Great Recession, but this boost appears to be leveling out).
    • Manufacturing jobs have also been trending downward in Japan and Germany since at least 1990 and, as I wrote earlier, in China since 1996.
    • Manufacturing employment decline is a global phenomenon. As a Bloomberg story summarized: “Some 22 million manufacturing jobs were lost globally between 1995 and 2002 as industrial output soared 30 percent. … It seems that devilish productivity is wreaking havoc with jobs both at home and abroad.”

    Rob Atkinson, president of the Information Technology and Innovation Foundation, is another of our detractors. He takes the argument up a level across industries. Even if total manufacturing employment goes down because of automation, he writes, other industries will pick up the slack by employing more people. This is because:

    “…most of the savings [from automation] would flow back to consumers in the form of lower prices. Consumers would then use the savings to buy things (e.g., go out to dinner, buy books, go on travel). This economic activity stimulates demand that other companies (e.g., restaurants, book stores, and hotels) respond to by hiring more workers.”

    Fair enough, but what if those other companies are also automating? One of the most striking phenomena of recent years is the encroachment of automation into tasks, skills, and abilities that used to belong to people alone. As we document in Race Against the Machine, this includes driving cars, responding accurately to natural language questions, understanding and producing human speech, writing prose, reviewing documents, and many others. Some combination of these will be valuable in every industry.

    Previous waves of automation, like the mechanization of agriculture and the advent of electric power to factories, have not resulted in large-scale unemployment or impoverishment of the average worker. But the historical pattern isn’t giving me a lot of comfort these days, simply because we’ve never before seen automation encroach so broadly and deeply while improving so quickly at the same time.

    I don’t know what all the consequences of the current wave of digital automation will be — no one does. But I’m not blithe about its consequences for the labor force, because that would be ignoring the data and missing the big picture.