{"id":71146,"date":"2009-12-08T10:48:00","date_gmt":"2009-12-08T15:48:00","guid":{"rendered":"http:\/\/www.cmswatch.com\/Trends\/1756-Taming-Apache-Menagerie?source=RSS"},"modified":"2009-12-08T10:48:00","modified_gmt":"2009-12-08T15:48:00","slug":"time-to-tame-the-apache-menagerie","status":"publish","type":"post","link":"https:\/\/mereja.media\/index\/71146","title":{"rendered":"Time to Tame the Apache Menagerie"},"content":{"rendered":"<p>Subscribers to our <a href=\"http:\/\/www.cmswatch.com\/Search\/Report\/\">Search and Information Access Research<\/a> are well aware that we&#8217;ve been increasing our coverage of <a href=\"http:\/\/www.cmswatch.com\/Search\/Vendors\/Apache\">Apache<\/a> <a href=\"http:\/\/lucene.apache.org\/\">Lucene<\/a> lately, in keeping with the phenomenal &#8212; and still growing &#8212; popularity of Apache&#8217;s well-known open-source search engine.<\/p>\n<p>This has led to a coverage conundrum (of sorts) for us, inasmuch as it is no longer possible to cover Lucene properly without also devoting a good deal of discussion to closely related projects like Nutch and (especially) Solr. 
This becomes problematic at times, not just because we&#8217;re in essence covering multiple projects under one conceptual umbrella, but because the functional and architectural boundaries between things like Lucene, Nutch, and Solr &#8212; though well understood by developers &#8212; are easily blurred in a semi-technical writeup unless special care is taken to distinguish between concepts like <em>search server<\/em>, search <em>engine<\/em>, crawlers versus parsers, etc.<\/p>\n<p>Some of these bits are unique to Lucene (the &quot;engine&quot; part, for example, consisting of the indexer and query framework), whereas others are unique to Solr (e.g., the &quot;query server&quot; bits that handle data-fetching and -passing over HTTP), whereas other bits (like UI widgets for faceted search) aren&#8217;t there at all &#8212; you have to build them yourself.<\/p>\n<p>In short, as we expand our coverage of Lucene, we find ourselves investing ever-greater amounts of time and care in tiptoeing the conceptual boundaries around Solr, Nutch, Lucene, <a title=\"Distributed computing platform\" href=\"http:\/\/hadoop.apache.org\/\">Hadoop<\/a>, and so on. We think we do a pretty good job. 
But it&#8217;s surprising how many people (including us, at times) <em>still<\/em> have trouble keeping the various pieces of the Apache search world straight.<\/p>\n<p>Our job isn&#8217;t made easier by the Apache Software Foundation&#8217;s <em>laissez-faire<\/em> attitude toward project naming, which has led to an out-of-control zoo of projects with sometimes sensible but often nonsensical names like Hadoop, <a href=\"http:\/\/lucene.apache.org\/mahout\/\">Mahout<\/a>, <a href=\"http:\/\/lucene.apache.org\/tika\/\">Tika<\/a>, <a title=\"Content Management System\" href=\"http:\/\/lenya.apache.org\/\">Lenya<\/a>, <a title=\"Java Apache Mail Enterprise Server\" href=\"http:\/\/james.apache.org\/\">James<\/a>, <a title=\"Multipurpose Infrastructure for Network Applications\" href=\"http:\/\/mina.apache.org\/\">Mina<\/a>&#8230; and the list goes on.<\/p>\n<p>There&#8217;s a longstanding tradition in R&amp;D (and elsewhere, of course) of using whimsical, short, purposely obscure code names for projects early in their lifetimes. And that&#8217;s fine for prototypes and pre-release versions of software. But a mature product needs a mature name, preferably something descriptive and apropos. For example, <a href=\"http:\/\/incubator.apache.org\/droids\/\">Droids<\/a> is not an entirely inappropriate name for Apache&#8217;s autonomous-robots project. It&#8217;s at least semantically aligned with the domain. But even if you know enough Hindi to figure out that Mahout is a term for the driver of an elephant, you&#8217;re not likely to divine that it is also an open-source project for distributed machine learning algorithms on the Hadoop platform (and you shouldn&#8217;t then be forced to look up what Hadoop means, and so on).<\/p>\n<p>So, Suggestion No. 1 for Apache: <em>When a project graduates from incubation, give it a real name. 
<\/em><\/p>\n<p>It would also help if Apache namespaced subprojects and\/or related projects in a logical fashion &#8212; a fashion that shows the relationship. For example, would it hurt to call Solr &quot;Lucene Search Server&quot; &#8212; or at least &quot;Lucene Solr&quot;? Solr is, after all, strictly dependent on Lucene, much the way <a title=\"Web Framework for JCR Content Repositories\" href=\"http:\/\/sling.apache.org\/\">Sling<\/a> is dependent on <a title=\"Content Repository for Java\" href=\"http:\/\/jackrabbit.apache.org\/\">Jackrabbit<\/a>.<\/p>\n<p>Suggestion No. 2: <em>Make dependencies evident in project names. It helps people understand what the projects are about.<\/em><\/p>\n<p>If the world is headed toward a Lucene-* stack (as it surely is), wouldn&#8217;t it be nice to be able to refer to it that way? If people are having a hard time understanding that Solr is a search server, wouldn&#8217;t it make sense to put &quot;server&quot; in the name? Bottom line, a rational namespace for Apache projects would be a big win for all concerned.<\/p>\n<p>Those of us who regularly tiptoe the boundaries around Apache&#8217;s zoo of related projects would like to occasionally <a href=\"http:\/\/en.wikipedia.org\/wiki\/Representational_State_Transfer\">REST<\/a> our feet.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Subscribers to our Search and Information Access Research are well aware that we&#8217;ve been increasing our coverage of Apache Lucene lately, in keeping with the phenomenal &#8212; and still growing &#8212; popularity of Apache&#8217;s well-known open-source search engine. 
This has led to a coverage conundrum (of sorts) for us, inasmuch as it is no longer [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-71146","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/71146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/comments?post=71146"}],"version-history":[{"count":0,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/71146\/revisions"}],"wp:attachment":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/media?parent=71146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/categories?post=71146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/tags?post=71146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}