{"id":646096,"date":"2013-03-10T13:30:13","date_gmt":"2013-03-10T17:30:13","guid":{"rendered":"http:\/\/gigaom.com\/?p=618443"},"modified":"2013-03-10T13:30:13","modified_gmt":"2013-03-10T17:30:13","slug":"the-big-data-world-is-operating-at-1-percent","status":"publish","type":"post","link":"https:\/\/mereja.media\/index\/646096","title":{"rendered":"The big data world is operating at 1 percent"},"content":{"rendered":"<p>Many would be shocked to know that researchers analyze and gather insights from only 1 percent of the world\u2019s data. That 1 percent of analyzed data has been the only driver of innovation and insights in what we now know as \u201cbig data.\u201d\u00a0The other 99 percent of the 1 quintillion bytes of data that is collected every day (according to a recent study from IDC) remains untouched.<\/p>\n<p>We all know that big data holds enormous promise. For a very large number of problems today, the bottleneck is using data effectively. The drug discovery problem is more about data than chemistry. The discovery of new energy sources is more about data than geology. It\u2019s the same for tracking terrorists, detecting fraud, and more.<\/p>\n<p>Today, we recognize that these, and many other critical global issues, are all data problems. This fact alone has driven huge investment in big data, created the hottest job title around (data scientist) and propelled the valuations of private data analytics providers into the billions. Now imagine the possibilities when the world operates on insights gathered from 100 percent of its data.<\/p>\n<h2 id=\"realizations\">Realizations<\/h2>\n<p>Where do you start when you have a data set as large as the human genome, for example,\u00a0or as large as the one that President Obama\u2019s recent call to map the human brain would produce? To achieve the breakthroughs we need to address the world\u2019s most perplexing problems, we need to fundamentally change the way we gain knowledge from data. 
Here\u2019s what we need to start thinking about:<\/p>\n<ul>\n<li><strong>Starting with queries is a dead end:<\/strong> Queries are not inherently bad. In fact, they are essential once you know what question to ask. That\u2019s the key: the flaw is starting with queries in the hope that they will uncover a needle in the massive digital haystack. (Spoiler alert: they won\u2019t.)<\/li>\n<li><strong>Data has a cost:<\/strong> In most cases, storing data is no longer expensive. Even querying large amounts of data is becoming more cost-effective with tools like Hadoop and Amazon\u2019s Redshift. This is just the hard-cost side of the equation, though.<\/li>\n<li><strong>Insights are value:<\/strong> The only reason we bear the cost is that we believe data holds insights that unlock value. Ultimately, the undiscovered insights that organizations miss carry a much higher cost, because they limit the ability to solve big problems quickly, accelerate innovation and drive growth. The cost of data collection can be high, but the cost of ineffectual analysis is even higher. The tools for getting at those insights don\u2019t exist today. Instead, we rely on very smart human beings to come up with hypotheses and use our tools to validate (or invalidate) those hypotheses. This is a flawed strategy, since it relies on (arguably smart) guesswork.<\/li>\n<li><strong>You have the right data today:<\/strong> There\u2019s often a belief that, \u201cIf we only had more data, we could get the answer we\u2019re looking for.\u201d Far too much time and money is wasted collecting new data when more can be done with the data already at hand. 
For example, Ayasdi recently published a study in Nature Scientific Reports that draws important new insights from the data of a 12-year-old breast cancer study that had been thoroughly analyzed for over a decade.<\/li>\n<\/ul>\n<h2 id=\"big-bata%e2%80%9d-is-the-begin\">\u201cBig data\u201d is the beginning, not the end<\/h2>\n<p>I\u2019m very concerned that the growing hype around the term \u201cbig data\u201d has set us all up for disappointment. Query-based analysis is fine for a certain class of problems, but it will never deliver on the market\u2019s expectations for big data.<\/p>\n<p>We are on the cusp of critical breakthroughs in cancer research, energy exploration, drug discovery, financial fraud detection and more. It would be a crime if the passion, interest and dollars invested to solve critical global problems like these were sidetracked by a \u201cbig data bubble.\u201d<\/p>\n<p>We can and should expect more from data analysis, and we need to recognize the capabilities that the next generation of solutions must deliver:<\/p>\n<ul>\n<li><strong>Empower domain experts:<\/strong> The world cannot produce data scientists fast enough to keep pace with the size of the problem set. Let\u2019s stop developing tools just for them. Instead, we need to develop tools for business users: biologists, geologists, security analysts and the like. They understand the context of the business problem better than anyone, but might not be up to date with the latest in technology or mathematics.<\/li>\n<li><strong>Accelerate discovery:<\/strong> We need to get to critical insights faster. The promise of big data is to \u201coperate at the speed of thought.\u201d It turns out that the speed of thought is not that fast. 
If we depend on people thinking up one question at a time, we will never reach the critical insights quickly enough, because we will never be able to ask all of the questions of all of the data.<\/li>\n<li><strong>Marriage of man and machine:<\/strong> To get to those insights faster, we need to invest in machine intelligence. We need machines to do more of the heavy lifting of finding the clusters, connections and relationships between data points, giving business users a much better starting point for discovering insights. In fact, algorithmic discovery approaches can solve these problems by looking for rare but statistically significant signals in large data sets that humans would never be able to find. For example, in a recent study, previously unreported drug side effects were found by algorithmically searching through web search engine logs.<\/li>\n<li><strong>Analyze data in all its forms:<\/strong> It\u2019s understood that researchers need to analyze both structured and unstructured data. We also need to recognize the diversity and depth of unstructured data: text in all languages, voice recordings, video and images (for tasks such as facial recognition).<\/li>\n<\/ul>\n<p>When it comes to the evolution of big data, we\u2019ve only begun to scratch the surface. It stands to reason that if we continue to analyze 1 percent of data, we\u2019ll only tap into 1 percent of its potential. If we\u2019re able to analyze the other 99 percent, think about all of the ways we can change the world. We can accelerate economic growth, cure cancer and other diseases, reduce the risk of terrorist attacks, and tackle many other big-ticket challenges that we face.<\/p>\n<p>That\u2019s something that we can all rally around.<\/p>\n<p><em>Gurjeet Singh is the co-founder and CEO of <a href=\"http:\/\/www.ayasdi.com\/\">Ayasdi<\/a><\/em><em>, an insight discovery platform built on topological data analysis technology. 
He will be speaking at <a href=\"http:\/\/event.gigaom.com\/structuredata\/?utm_source=data&#38;utm_medium=editorial&#038;%2338;utm_campaign=intext&#038;%2338;utm_term=618443+the-big-data-world-is-operating-at-1-percent&#038;%2338;utm_content=gigaguest\">Structure: Data<\/a>, March 20-21 in New York.<\/em><\/p>\n<p><em>Feature image courtesy of <a href=\"http:\/\/www.shutterstock.com\/gallery-56831p1.html\">Shutterstock user Sergey Lavrentev<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Many would be shocked to know that researchers analyze and gather insights from only 1 percent of the world\u2019s data. 
That 1 percent of analyzed data has been the only driver of innovation and insights in what we now know as \u201cbig data.\u201d\u00a0The other 99 percent of the 1 quintillion bytes of data that is [&hellip;]<\/p>\n","protected":false},"author":7765,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-646096","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/646096","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/users\/7765"}],"replies":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/comments?post=646096"}],"version-history":[{"count":0,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/646096\/revisions"}],"wp:attachment":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/media?parent=646096"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/categories?post=646096"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/tags?post=646096"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}