Improving Filipino Language Identification
Dear Filipino translators,
We are working on improving our language identification algorithms. In particular, we want Twitter to get better at identifying tweets posted in Filipino, and this is where we need your help.
In the following spreadsheet, we have compiled a variety of tweets that our system currently identifies as Filipino. We would be immensely grateful if you could help us go through this (admittedly long) list and check which tweets are in fact written entirely in Filipino.
https://docs.google.com/spreadsheet/ccc?key=0AsEzD64omUo3dEZCdm1BMWRnQ1p...
As a contributing editor, please mark with an "X" in column B any tweet that is (A) entirely in English, (B) in any other language, or (C) clearly bilingual (e.g., "you put the good in my morning. kahit hapon na.", where the Filipino part means "even though it's already afternoon"). A few English terms in an otherwise Tagalog sentence are acceptable, and such tweets should not be marked (e.g., "Labing-isang improvised explosive device,narekober sa Cotabato ngayong taon", meaning "Eleven improvised explosive devices recovered in Cotabato this year"). Please also keep an eye out for tweets in other Philippine languages, especially Cebuano, Hiligaynon, and Kapampangan. These languages are closely related to Tagalog and difficult for the computer to distinguish from it, so a few of them may be mixed in with the rest of the tweets. Those should also be marked with an "X".
If you have any follow-up questions or require any further clarification, please do not hesitate to contact me.
Thank you so much for your amazing, ongoing work to make the Twitter experience in Filipino better day by day.
All my best,
Gaby