{"id":658024,"date":"2013-05-14T21:10:22","date_gmt":"2013-05-15T01:10:22","guid":{"rendered":"http:\/\/gigaom.com\/?p=645189"},"modified":"2013-05-14T21:10:22","modified_gmt":"2013-05-15T01:10:22","slug":"this-is-why-big-data-is-the-sweet-spot-for-saas","status":"publish","type":"post","link":"https:\/\/mereja.media\/index\/658024","title":{"rendered":"This is why big data is the sweet spot for SaaS"},"content":{"rendered":"<p>People often ask me where the smart money is in big data. I often tell them that\u2019s a foolish question, because I\u2019m not an investor \u2014 but if I were, I\u2019d look to software as a service.<\/p>\n<p>There are two primary reasons why, the first of which is obvious: Companies are tired of managing applications and infrastructure, so something that optimizes a common task using techniques they don\u2019t know on servers they don\u2019t have to manage is probably compelling. It\u2019s called cloud computing.<\/p>\n<p>The other reason is that <a href=\"http:\/\/gigaom.com\/2013\/04\/29\/google-research-director-and-ai-expert-peter-norvig-elected-into-aaas\/\">the <em>big <\/em>part of big data really is important<\/a> if you want to get a really clear picture of what\u2019s happening in any given space. While no single end-user company can (or likely would) address search-engine optimization, for example, by building a massive store comprised of data from hundreds or thousands of companies as well as the entire web, a cloud service dedicated to that specific task can.<\/p>\n<p>From <a href=\"http:\/\/gigaom.com\/2012\/11\/28\/log-data-startup-sumo-logic-raises-30m\/\">web security<\/a> to <a href=\"http:\/\/gigaom.com\/2012\/06\/21\/how-collective-intelligence-is-reshaping-systems-management\/\">systems management<\/a>, we\u2019re already seeing how centralized data stores provide SaaS companies a broad view into what\u2019s happening that can then be filtered down to serve each individual customer\u2019s specific situation. 
<a href=\"http:\/\/www.bloomreach.com\/\">BloomReach<\/a>, a SaaS startup that helps companies optimize web-page content, is another good example of this principle in action.<\/p>\n<h2 id=\"how-do-you-say-cotton-maxi-dre\">How do <em>you<\/em> say, \u201ccotton maxi dress\u201d?<\/h2>\n<p>Ideally, BloomReach Head of Marketing Joelle Kaufman told me, the company wants to help customers ensure they get found in web searches by making sure they\u2019re not invisible (buried deep down), irrelevant (not saying anything meaningful on their sites) or incompatible (not speaking their consumers\u2019 language). On Tuesday, the company <a href=\"http:\/\/www.bloomreach.com\/buzz\/media-center-pr\/continuous-quality-management\/\">announced a new feature called Continuous Quality Management<\/a>, which lets customers continuously monitor their pages to ensure they\u2019re still featuring the right products and the right terminology. It\u2019s the latest addition to a seemingly useful service that\u2019s built atop a big data foundation few \u2014 if any \u2014 of its customers would ever attempt to build themselves.<\/p>\n<p>BloomReach can help companies optimize their sites because it\u2019s constantly crawling the web to figure out how everyone else is describing their content, laying out their pages and structuring their links. Hosted on the Amazon Web Services cloud, BloomReach runs more than 1,000 Hadoop jobs a day that process about 5 terabytes of data and a billion data points about users\u2019 site behavior. With the latter, co-founder and CTO Ashutosh Garg explained, the company is trying to figure out who\u2019s visiting sites, what they\u2019re doing, how long they\u2019re spending there and how they\u2019re related in terms of behavior.<\/p>\n<p>\u201cYou need to have the right amount of data and from the right places before we can do anything with it,\u201d he said. 
\u201c\u2026 It\u2019s a massive machine learning problem.\u201d<\/p>\n<p><a href=\"http:\/\/gigaom2.files.wordpress.com\/2013\/05\/br-stack.png\"><img loading=\"lazy\" decoding=\"async\" alt=\"BR stack\" src=\"http:\/\/gigaom2.files.wordpress.com\/2013\/05\/br-stack.png?w=708&#038;h=531\" width=\"708\" height=\"531\" class=\"aligncenter size-large wp-image-645359\"><\/a><\/p>\n<p>When you consider all the possible ways something could be described or formatted, the scale of the problem becomes more evident. Simple semantic analysis like associating \u201cdesk\u201d and \u201ctable\u201d is easy, Garg explained, but what if someone wants a lightweight camera and you only have its exact weight listed without any indication of how it compares to other options? What if people searching for \u201csmartphones\u201d really mean \u201cAndroid phones,\u201d but you\u2019re top-loading your results with BlackBerry phones and Windows phones?<\/p>\n<p>Another of Garg\u2019s hypotheticals has to do with consumers\u2019 presentation biases. If, for example, they\u2019re looking at a lot of websites that look the same or focus on the same things (e.g., megapixels for digital cameras), they\u2019ll expect to see the same things from every site.<\/p>\n<h2 id=\"10-nonillion-possibilities-cho\">10 nonillion possibilities: Choose 1.<\/h2>\n<p>From a sheer numbers perspective, things get even hairier when you\u2019re trying to determine the relationship between any two pages in order to figure out the best path for links to take. Garg said this is what computer scientists call an <a href=\"http:\/\/en.wikipedia.org\/wiki\/NP-complete\">NP-complete problem<\/a>, which in practice means the time it takes to process the results grows far faster than the amount of content you\u2019re analyzing. 
So, for example, analyzing 40 pages doesn\u2019t take 10 times as long as analyzing 4 pages, but more like 100 times longer.<\/p>\n<p>Actually, BloomReach CEO Raj De Datta gave me another example of this problem <a href=\"http:\/\/gigaom.com\/2012\/02\/22\/bloomreach-wants-to-save-your-site-with-big-data\/\">when we spoke in early 2012<\/a>. Here\u2019s how I described it then:<\/p>\n<blockquote id=\"quote-if-a-company-wants-t\">\n<p>[I]f a company wants to display just 1,000 products across 100 pages, De Datta\u00a0explained, there are 10-to-the-28th-power (10 octillion)\u00a0possibilities for how to do that. When it comes time to describe those products, there are 10-to-the-30th-power (10 nonillion) possibilities.<\/p>\n<\/blockquote>\n<p>If a website has a million pages, Garg said, \u201cit will take you longer than the life of the universe to solve that problem.\u201d<\/p>\n<p>Where this type of problem arises, BloomReach turns to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Monte_Carlo_method\">Monte Carlo simulations<\/a>, a favorite technique of physicists and Wall Street quants. The method involves running lots of simulations over large data sets in order to determine approximate results in a reasonable time frame. 
(And if all this isn\u2019t enough computer science and cloud infrastructure for you, I suggest attending our <a href=\"http:\/\/event.gigaom.com\/structure\/?utm_source=data&#38;utm_medium=editorial&#038;%2338;utm_campaign=intext&#038;%2338;utm_term=645189+this-is-why-big-data-is-the-sweet-spot-for-saas&#038;%2338;utm_content=dharrisstructure\">Structure conference<\/a> in June, which features\u00a0a who\u2019s who list of speakers, including Google\u2019s Jeff Dean, Facebook\u2019s Jay Parikh and Netflix\u2019s Adrian Cockcroft.)<\/p>\n<h2 id=\"different-queries-different-pa\">Different queries, different pages<\/h2>\n<p>Things get even trickier when you\u2019re trying to change the content of web pages in real time as people are searching for things. This isn\u2019t the best method for organic search, where pages need to stay pretty consistent with the indexed versions, but it can be ideal in situations such as paid search and mobile. There are millions of ways to segment buyers, Garg explained, and how accurately you assess their intent and display your content can make all the difference. Whether someone is a new or repeat visitor often matters, as does whether someone is price-conscious (e.g., the query included \u201ccheap\u201d) or perhaps searching for a particular brand.<\/p>\n<div id=\"attachment_645358\" class=\"wp-caption aligncenter\" style=\"width: 718px\"><a href=\"http:\/\/gigaom2.files.wordpress.com\/2013\/05\/llbean.png\"><img loading=\"lazy\" decoding=\"async\" alt=\"Source: BloomReach\" src=\"http:\/\/gigaom2.files.wordpress.com\/2013\/05\/llbean.png?w=708&#038;h=531\" width=\"708\" height=\"531\" class=\"size-large wp-image-645358\"><\/a><\/p>\n<p class=\"wp-caption-text\">Source: BloomReach<\/p>\n<\/div>\n<p>Around the holidays, the company actually realized something interesting: The bounce rate on queries for things like \u201cgifts for dad\u201d or \u201cgifts for co-workers\u201d was pretty high, but so was the conversion rate. 
The time to conversion was relatively fast, as well. It turns out, Garg explained, that people don\u2019t like to overthink certain gifts too much, so if something is presented in a visually appealing manner and is within their price range, they\u2019ll buy.<\/p>\n<p>But creating these types of models involves more than meets the eye. For all the talk about machine learning \u2014 and machines do a majority of the work for BloomReach \u2014 people also play a critical role. A person might know better than a machine whether something was likely purchased as a gift, Garg explained, or they might spot the offensive content on the T-shirt the machine decided was ideal.<\/p>\n<p>\u201cHumans are really good at creativity, thinking through stuff,\u201d he said.<\/p>\n<p>Smart humans are also good at knowing when they\u2019re overmatched, which is why SaaS is so valuable in the big data era. CMOs could try doing what BloomReach or <a href=\"http:\/\/gigaom.com\/2012\/04\/24\/datapop-scores-7m-for-custom-built-ads\/\">similar companies such as DataPop<\/a> are doing, or they could pay someone to do it much better. Guess which route the smart ones will take.<\/p>\n<p><em>Feature image courtesy of <a href=\"http:\/\/www.shutterstock.com\/gallery-54269p1.html\">Shutterstock user Andrea Danti<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>People often ask me where the smart money is in big data. I often tell them that\u2019s a foolish question, because I\u2019m not an investor \u2014 but if I were, I\u2019d look to software as a service. There are two primary reasons why, the first of which is obvious: Companies are tired of managing applications [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-658024","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/658024","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/comments?post=658024"}],"version-history":[{"count":0,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/658024\/revisions"}],"wp:attachment":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/media?parent=658024"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/categories?post=658024"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/tags?post=658024"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}