{"id":419942,"date":"2010-03-12T09:06:34","date_gmt":"2010-03-12T14:06:34","guid":{"rendered":"http:\/\/knowledgeproblem.com\/?p=6568"},"modified":"2010-03-12T09:06:34","modified_gmt":"2010-03-12T14:06:34","slug":"fish-leg-counts-what-the-web-knows-and-doesn%e2%80%99t-know","status":"publish","type":"post","link":"https:\/\/mereja.media\/index\/419942","title":{"rendered":"Fish leg counts: What the web knows and doesn\u2019t know"},"content":{"rendered":"<p><em>Michael Giberson<\/em><\/p>\n<p>David Pennock hears another tick of the clock in the <a href=\"http:\/\/blog.oddhead.com\/2010\/03\/07\/countdown-to-web-sentience\/\">countdown to web sentience<\/a>.<\/p>\n<p style=\"padding-left:30px;\">[In 2003] we trained a computer to answer questions from the then-hit game show by querying Google. We combined words from the questions with words from each answer in mildly clever ways, picking the question-answer pair with the most search results. For the most part (see below), it worked.<\/p>\n<p style=\"padding-left:30px;\">It was a classic example of \u201cbig data, shallow reasoning\u201d and a sign of the times. Call it Google\u2019s Law.\u00a0<strong>With enough data nothing fancy can be done, but more importantly nothing fancy need be done: even simple algorithms can look brilliant.<\/strong> When it comes to, say, identifying synonyms, simple pattern matching across an enormous corpus of sentences beats the most sophisticated language models developed meticulously over decades of research.<\/p>\n<p style=\"padding-left:30px;\">Our\u00a0<em>Millionaire<\/em> player was great at answering obscure and specific questions &#8230; It failed mostly on the warm-up questions that people find easy \u2014 the truly trivial trivia. The reason is simple. Factual answers like the year that Mozart was born appear all over the web. Statements capturing common sense for the most part do not. 
Big data can only go so far.<\/p>\n<p>In 2003 their best example of a question that they could not answer via web search was &#8220;<em>How many legs does a fish have?<\/em>&#8221;<\/p>\n<p><em>Now<\/em>, on the other hand, Pennock said:<\/p>\n<p style=\"padding-left:30px;\">I was recently explaining all this to\u00a0<a href=\"http:\/\/research.yahoo.com\/Michael_Schwarz\">a colleague<\/a>. To make my point, we\u00a0<a href=\"http:\/\/www.google.com\/#q=how+many+legs+does+a+fish+have%3F&amp;fp=1\">Googled<\/a> that question.\u00a0<strong>Lo and behold, there it was: asked and answered \u2014 verbatim \u2014 on Yahoo! Answers.<\/strong> <a href=\"http:\/\/answers.yahoo.com\/question\/index?qid=20080702180134AAtBbNq\">How many legs does a fish have? Zero.<\/a> Apparently Yahoo! Answers\u00a0<a href=\"http:\/\/www.google.com\/#q=+site:answers.yahoo.com+how+many+legs+does+a+fish+have%3F\">also knows<\/a> the number of legs of a crayfish, rabbit, dog, starfish, mosquito, caterpillar, crab, mealworm, and \u201cabout 133,000\u201d more.<\/p>\n<p>Pennock links to <a href=\"http:\/\/blog.computationalcomplexity.org\/2009\/04\/ai-in-jeopardy.html\">Lance Fortnow&#8217;s related comments<\/a> on <a href=\"http:\/\/www.nytimes.com\/2009\/04\/27\/technology\/27jeopardy.html\">IBM&#8217;s effort to write a <em>Jeopardy<\/em>-playing computer<\/a>, and Fortnow suggests something that is going to remain hard for computers for a while: making sense of natural language in context. Fortnow, part of the group that wrote the <em>Millionaire <\/em>paper, said:<\/p>\n<p style=\"padding-left:30px;\">Humans have little trouble interpreting the meaning of the &#8220;answers&#8221; in\u00a0<em>Jeopardy<\/em>; they are being tested on their knowledge of that material. 
The computer has access to all that knowledge but doesn&#8217;t know how to match it up to simple English sentences.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Michael Giberson David Pennock hears another tick of the clock in the countdown to web sentience. [In 2003] we trained a computer to answer questions from the then-hit game show by querying Google. 
We combined words from the questions with words from each answer in mildly clever ways, picking the question-answer pair with the [&hellip;]<\/p>\n","protected":false},"author":4109,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[],"class_list":["post-419942","post","type-post","status-publish","format-standard","hentry","category-news"],"_links":{"self":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/419942","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/users\/4109"}],"replies":[{"embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/comments?post=419942"}],"version-history":[{"count":0,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/posts\/419942\/revisions"}],"wp:attachment":[{"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/media?parent=419942"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/categories?post=419942"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mereja.media\/index\/wp-json\/wp\/v2\/tags?post=419942"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}