{"id":284033,"date":"2010-02-05T20:00:31","date_gmt":"2010-02-06T01:00:31","guid":{"rendered":"http:\/\/gigaom.com\/?p=96600"},"modified":"2010-02-05T20:00:31","modified_gmt":"2010-02-06T01:00:31","slug":"needed-infrastructure-to-make-the-web-personal","status":"publish","type":"post","link":"https:\/\/mereja.media\/index\/284033","title":{"rendered":"Needed: Infrastructure to Make the Web Personal"},"content":{"rendered":"<p><\/p>\n<p><a href=\"http:\/\/gigaom.files.wordpress.com\/2010\/02\/istock_000002727864xsmall.jpg\"><img loading=\"lazy\" decoding=\"async\"  title=\"stand out\" src=\"http:\/\/gigaom.files.wordpress.com\/2010\/02\/istock_000002727864xsmall.jpg?w=210&#038;h=139\" alt=\"\" width=\"210\" height=\"139\" class=\"alignleft size-thumbnail wp-image-96977\" \/><\/a>The web is becoming more dynamic, context-aware and personalized by the day, and the amount of information consumed by each person is increasing exponentially. But while hardware performance is improving, except when it comes to the simplest of parallel programming tasks, software infrastructure is not keeping pace. We need to develop new data processing architectures \u2014 ones that go beyond technologies like <a href=\"http:\/\/gigaom.com\/2009\/05\/17\/memcached-and-an-ailing-mysql\/\">memcached<\/a>, <a href=\"http:\/\/gigaom.com\/2010\/01\/19\/why-hadoop-users-shouldnt-fear-googles-new-mapreduce-patent\/\">MapReduce<\/a>, <a href=\"http:\/\/blogs.neotechnology.com\/emil\/2009\/11\/nosql-scaling-to-size-and-scaling-to-complexity.html\">NoSQL<\/a>, etc.<\/p>\n<p>Think of this as a <a href=\"http:\/\/gigaom.com\/2009\/12\/07\/google-amps-up-real-time-and-mobile-search\/\">search<\/a> problem. Traditionally, a search engine kept an index recording, for every word, the documents in which it occurred. When a query was received, the search engine could just look up the precomputed answer to which documents contained which word. 
For a personalized search, an exponentially larger index is needed that includes not only factual data (words in a document, brand of cameras, etc.) but also taste and preference data (people who like this camera tend to live in cities, be under 40, love <a href=\"http:\/\/www.nytimes.com\/2008\/11\/23\/magazine\/23Netflix-t.html?pagewanted=all\">\u201cNapoleon Dynamite,\u201d<\/a> etc.).<\/p>\n<p>Unfortunately, personalizing along 100 taste dimensions leads to nearly as many permutations of recommendation rankings as there are <a href=\"http:\/\/en.wikipedia.org\/wiki\/Observable_universe#Matter_content\">atoms in the universe<\/a>! Obviously there isn&#8217;t enough space to precompute what recommendations to show every possible type of person who queries a site. Additionally, precomputing the answer to queries is too slow. People expect real-time results, not hours- or days-old precomputed answers. If I tell Amazon I don\u2019t like a book, I want to immediately see that reflected in my recommendations.<\/p>\n<p>We\u2019re at a turning point in how we need to build web sites to handle these sorts of personalization problems. While first-generation distributed systems split the application into three tiers \u2014 web servers, application servers and databases \u2014 second-generation systems build large non-real-time back-end clusters to analyze huge amounts of sales data, index billions of web documents, etc.<\/p>\n<p>A third generation of systems is now emerging, with the computation shifting from those back-end clusters into front-end real-time clusters. After all, you just can\u2019t build a back end that precomputes personalized results for millions of Internet users. You have to compute them in real time.<\/p>\n<p>Adding complexity, many personalization problems are more difficult to parallelize than a lot of traditional back-end applications. 
Indexing the words in web pages is actually a lot easier to parallelize than is the long sequence of matrix calculations required to optimize a user\u2019s recommendations.<\/p>\n<p><a href=\"http:\/\/ocw.mit.edu\/OcwWeb\/Electrical-Engineering-and-Computer-Science\/6-046JFall-2005\/VideoLectures\/detail\/embed23.htm\">Matrix calculations<\/a> tend to involve complicated data access patterns that make it hard to partition calculations and their data across a cluster of computers. Instead there tends to be a lot of sharing among many different computers, each of which holds a piece of the problem and updates the others as data changes. This back-and-forth data sharing is incredibly hard for the programmer to keep track of and can significantly degrade application performance.<\/p>\n<p>The systems we\u2019ve built at <a href=\"http:\/\/hunch.com\/\">Hunch<\/a> to solve this started off using distributed caching with memcached but very quickly veered into something more akin to <a href=\"http:\/\/en.wikipedia.org\/wiki\/Non-Uniform_Memory_Access\">distributed shared memory (DSM)<\/a> systems, complete with multiple levels of caching, coherency protocols with application-specific consistency guarantees and data replication for performance. With an abundance of processing cores at our disposal, the real challenges tended to revolve around getting the right data to the right core.<\/p>\n<p><a href=\"http:\/\/gigaom.files.wordpress.com\/2010\/02\/1.jpg\"><img loading=\"lazy\" decoding=\"async\"  title=\"-1\" src=\"http:\/\/gigaom.files.wordpress.com\/2010\/02\/1.jpg?w=80&#038;h=80\" alt=\"\" width=\"80\" height=\"80\" class=\"alignleft size-full wp-image-96929\" \/><\/a> I think that in a few years we\u2019ll look back at this time as an era in which a slew of new large-scale programming challenges and their solutions were born. 
Hopefully we\u2019ll also see more open-source solutions along the lines of memcached and Hadoop, so that building personalized and real-time web applications is easy for everyone.<\/p>\n<p><em>Tom Pinckney is the co-founder &amp; VP of engineering of <a href=\"http:\/\/hunch.com\/\">Hunch.com<\/a>.<\/em><\/p>\n<p><strong>Related GigaOM Pro content:<\/strong><\/p>\n<ul>\n<li><a href=\"http:\/\/pro.gigaom.com\/2010\/01\/whats-next-for-the-cloud-distributed-architectures\/\">What&#8217;s Next for the Cloud? Distributed Architectures<\/a><\/li>\n<li><a href=\"http:\/\/pro.gigaom.com\/2009\/12\/infrastructure-winners-and-losers-of-2009\/\">Infrastructure Winners and Losers of 2009<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>The web is becoming more dynamic, context-aware and personalized by the day, and the amount of information consumed by each person is increasing exponentially. But while hardware performance is improving, except when it comes to the simplest of parallel programming tasks, software infrastructure is not keeping pace. 
We need to develop new data processing architectures [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-284033","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/284033","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/comments?post=284033"}],"version-history":[{"count":0,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/284033\/revisions"}],"wp:attachment":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/media?parent=284033"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/categories?post=284033"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/tags?post=284033"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}