Posting on behalf of Phil Resnik:
This post brings together a bunch of news about language-related efforts to help out in Haiti:
First, Jeff Allen, an expert in language technology for Haitian Creole and in language resources more generally, has arranged with CMU’s Language Technologies Institute to release, under unusually friendly terms of use, relevant language resources they collected over a number of years. These include a set of speech data and, probably more immediately relevant, parallel data: English–Haitian Creole translations of phrases and sentences in the medical domain.
Second, a group called Crisis Commons is organizing "CrisisCamp" events in a variety of cities, where volunteers get involved in "activities such as crisis mapping, data and RSS feed aggregation. In addition, people with specialized skills such as translation, computer programming and literacy advocates are encouraged to participate." Of particular interest to Language Log readers, they’ve got a Language and Translation team involved in a variety of efforts, one of which is trying to get machine translation capabilities up and running. Results so far are pretty rudimentary, e.g. this demo:
Source: Tanpri nou bezwen manje dlo ak tant pou nou ka kouche tanpri voye je nou gade kafou soutou nan zòn bòten ak titus tanpri nou bezwen manje paske kay nou
Target: please our need meal river and aunt in our quart lay please broadcast eye our regard intersection unknown in unknown unknown and unknown please our need meal because dwelling our
Human translation: Please we need food and water and tents so that we can sleep. Please go look at Carrefour particularly in the area of Titus and Betem (or Boten). We need food because – Incomplete
Information collected by Crisis Commons on language and translation can be found here. Christopher Taylor has been working on collecting parallel text, aiming to get a statistical MT system up and running quickly, which might, one hopes, do better than a purely dictionary-based approach. In either case, of course, nonstandard orthography, spelling errors, etc., are going to be a challenge. Still, maybe automatic translation can help in some ways.
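To see why purely dictionary-based output looks the way the demo above does, here is a minimal word-for-word sketch in Python. The tiny glossary is invented for illustration; a real system would need a full bilingual lexicon plus some handling of Creole spelling variation.

```python
# Minimal word-for-word "dictionary MT" sketch.
# The glossary is a toy example, not a real Creole lexicon.
GLOSSARY = {
    "tanpri": "please",
    "nou": "our",      # also "we"/"us" -- a dictionary alone can't choose
    "bezwen": "need",
    "manje": "meal",   # also "food"/"to eat"
    "dlo": "water",
    "ak": "and",
}

def translate(sentence: str) -> str:
    """Translate word by word, marking out-of-vocabulary words."""
    return " ".join(
        GLOSSARY.get(word.lower(), "unknown")
        for word in sentence.split()
    )

print(translate("Tanpri nou bezwen manje dlo"))
# -> please our need meal water
```

Lexical ambiguity ("nou" as we/our, "manje" as food/meal) and out-of-vocabulary words are exactly what makes the demo output above so rough; a statistical system trained on parallel text can at least use context to pick among alternatives.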
Where did the human translation above come from? The specific answer is here. The general answer is that Rob Munro of Stanford has been coordinating volunteer translators as part of an impressive broader effort involving the use of text messaging for crisis-related communication. In this Wired article, he is quoted as saying, "The total number of texts is in the thousands, and they arrive every five seconds in busy times, to every 10 minutes overnight." Munro says the average turnaround time for translation is around 10 minutes, which is really striking. In fact, it has me wondering about use cases that would make automatic MT’s gain in scalability and speed worthwhile given the presumed decrease in quality. (I believe we need to spend more time on ways to combine automatic methods with human effort in order to achieve both higher quality and better scalability at lower cost, but that’s a topic for another day.) More details on Munro’s efforts are available here.
Third, the "Tweak the Tweet" effort at the University of Colorado, conceived by graduate student Kate Starbird, involves getting people to annotate the natural language in tweets using a small vocabulary of emergency-related hashtags, in order to make them more amenable to automatic processing. One of their examples:
TWEET-BEFORE: Altagrace Pierre needs help at Delmas 14 House no. 14.
TWEET-AFTER: #haiti #name Altagrace Pierre #need help #loc Delmas 14 House no. 14.
Adding structured information to unstructured tweets, particularly in tandem with geolocation, could enable a whole variety of useful applications. (Typing "Delmas 14 House no. 14, port au prince, haiti" into this function gets you the latitude and longitude for the address. Call me old-fashioned, but I find this pretty astonishing, assuming of course that the info it returned is correct.) Colorado’s Prof. Leysia Palen, quoted in Discovery News, concedes that it is not realistic to expect volunteers and organizations originating tweets to include these annotations. The Tweak the Tweet site has an alternative suggestion, though, which is that volunteers translate tweets into the hashtag syntax (see "How to Help"). Perhaps someone ought to contribute some money to get Mechanical Turkers doing this?
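Part of the appeal of the hashtag convention is that a few lines of code suffice to pull out the structured fields. Here is a minimal sketch, assuming (as a simplification) that each #tag labels the text running up to the next #tag:

```python
import re

def parse_tweet(tweet: str) -> dict:
    """Split an annotated tweet into {hashtag: following text} fields.

    Simplifying assumption: each #tag labels the text up to the
    next #tag (or the end of the tweet).
    """
    fields = {}
    # Each match is a hashtag plus the text running up to the next '#'.
    for tag, value in re.findall(r"#(\w+)\s*([^#]*)", tweet):
        fields[tag] = value.strip()
    return fields

tweet = "#haiti #name Altagrace Pierre #need help #loc Delmas 14 House no. 14."
print(parse_tweet(tweet))
# -> {'haiti': '', 'name': 'Altagrace Pierre', 'need': 'help',
#     'loc': 'Delmas 14 House no. 14.'}
```

The resulting name/need/location fields are exactly the kind of record a mapping or dispatch application could consume, e.g. by feeding the #loc value to a geocoder.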
These are just a few things I’ve become aware of during the last day or so that might interest language folks. The level of energy, innovation, and willingness to help is a great thing to see, and I’m sure there’s a lot more out there…