{"id":659648,"date":"2013-05-22T15:21:16","date_gmt":"2013-05-22T19:21:16","guid":{"rendered":"http:\/\/gigaom.com\/?p=648186"},"modified":"2013-05-22T15:21:16","modified_gmt":"2013-05-22T19:21:16","slug":"concurrent-is-building-a-hadoop-assembly-line-in-open-source","status":"publish","type":"post","link":"https:\/\/mereja.media\/index\/659648","title":{"rendered":"Concurrent is building a Hadoop assembly line in open source"},"content":{"rendered":"<p>If you know Java, R or SAS, doing machine learning on Hadoop data just got a lot easier. <a href=\"http:\/\/www.concurrentinc.com\/\">Concurrent<\/a> <em>(see disclosure)<\/em>, the company behind the popular <a href=\"http:\/\/www.cascading.org\/\">Cascading<\/a> framework for writing big data jobs, has developed a new open source tool called <a href=\"http:\/\/www.cascading.org\/pattern\/\">Pattern<\/a> that lets users export their models from statistical analysis applications and run them at scale on Hadoop data with little to no code change.<\/p>\n<p>The reason for creating Pattern is pretty simple, according to Concurrent Founder and CEO Chris Wensel: &#8220;Hadoop is never used alone.&#8221; It&#8217;s always part of a data environment that also includes databases, visualization tools, analytics software and\/or statistical analysis tools that arguably do the really valuable work. Hadoop&#8217;s real value is as an integration platform that can feed data into these other systems and, ideally, put their outputs to work across much larger datasets.<\/p>\n<p>Developers <em>can<\/em> use the Pattern Java API to create machine learning jobs, but they can also simply export a Predictive Model Markup Language (PMML) file from software like R, SAS and MicroStrategy that Pattern will read and run as a Cascading workflow. 
Models are useless unless you can run them in production, Wensel said, and Pattern lets them run across more data, stored in Hadoop, than you can use to build them with those other tools.<\/p>\n<p>However, Wensel noted, &#8220;The real takeaway isn&#8217;t Pattern itself.&#8221;<\/p>\n<p>From his perspective, the real story is Pattern plus Cascading plus <a href=\"http:\/\/www.cascading.org\/lingual\/\">Lingual<\/a>, the open source SQL-to-Hadoop tool that Concurrent recently developed and released. Lingual is the tie that binds everything together, creating a sort of assembly line for data as it works its way from generation to delivering some value. For example, someone might create a Cascading job that adds structure to incoming data, and then pull some of the data into R using Lingual. Once a model is created in R and exported to the Hadoop cluster using Pattern, Lingual can feed the MapReduce output file back to R so a data scientist can test the model&#8217;s accuracy.<\/p>\n<p><a href=\"http:\/\/gigaom2.files.wordpress.com\/2013\/05\/arch-diagram.png\"><img decoding=\"async\" alt=\"arch-diagram\" src=\"http:\/\/gigaom2.files.wordpress.com\/2013\/05\/arch-diagram.png?w=708\" class=\"aligncenter size-full wp-image-648347\" \/><\/a><\/p>\n<p>And actually, Wensel said, Lingual could have a positive effect on companies&#8217; bottom lines. Airbnb recently replaced a departed engineer with Lingual for monthly migrations of data from Hadoop into SQL environments. Climate Corporation, <a href=\"http:\/\/gigaom.com\/2012\/05\/02\/how-climate-corp-is-pitting-big-data-against-mother-nature\/\">a massive Hadoop and Cascading user<\/a>, could use Lingual to let its crop-and-weather insurance customers access their data from the company&#8217;s Hadoop store.<\/p>\n<p>Lingual and Pattern should help Concurrent finally make some money, too. 
Both of them, as well as the Cascading framework that underpins them, will always be open source, Wensel said, but it plans to create &#8220;a suite of products that will make your life much better if &#8230; you standardize on Cascading.&#8221;<\/p>\n<p>For example, the company has the ability to monitor jobs at the application level rather than the cluster level, meaning it can tell you the details of that job that&#8217;s locking up all the resources and whether you really want to kill it (it might be an important report for the CFO &#8230;). &#8220;We can do some really interesting things,&#8221; Wensel said.<\/p>\n<p><em><strong>Disclosure<\/strong>: Concurrent is backed by True Ventures, a venture capital firm that is an investor in the parent company of this blog, Giga Omni Media. Om Malik, the founder of Giga Omni Media, is also a venture partner at True.<\/em><\/p>\n<p><em>Feature image courtesy of <a href=\"http:\/\/www.shutterstock.com\/gallery-908242p1.html\">Shutterstock user PENGYOU91<\/a>.<\/em><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you know Java, R or SAS, doing machine learning on Hadoop data just got a lot easier. Concurrent (see disclosure), the company behind the popular Cascading framework for writing big data jobs, has developed a new open source tool called Pattern that lets users export their models from statistical analysis applications and run them [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-659648","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/659648","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/comments?post=659648"}],"version-history":[{"count":0,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/659648\/revisions"}],"wp:attachment":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/media?parent=659648"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/categories?post=659648"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\
/tags?post=659648"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}