Internationalizing Dates, Numbers, Plurals, and More
Wed, 05/02/2012 - 12:24
by Cameron Dutro

"You have kept London time, which is two hours behind that of Suez. You ought to regulate your watch at noon in each country."
- Jules Verne, Around the World in 80 Days

Here’s a test. Say this date out loud: 2/1/2012. If you said, “February first, 2012”, you’re probably an American. If you said, “January second, 2012”, you’re probably of European or possibly Asian descent. If you said, “January 12, 1902”, you’re probably a computer. The point is that we almost never think about dates - or plurals, lists, quotes, numbers, abbreviations, and capitalization - in our everyday lives. Most of the time, they just are the way they are. Of course if you’re creating a platform available around the world, these kinds of minutiae make a big difference. In fact, they can make months or years of difference.

A long time ago, in a galaxy far, far away, programmers didn’t have the tools to display anything but Latin characters and numbers. Forget Japanese, forget Chinese - all you could write were the letters A-Z. We needed a better standard, so the Unicode Consortium was born. Whereas A-Z characters (also known as ASCII) only fit into one byte, Unicode characters could fit into multiple bytes. More room meant more characters, including characters from non-Latin alphabets. To fit the characters into the right number of bytes, programmers developed several “encoding” systems, the most popular of which became UTF-8.

The Unicode Consortium published a bunch of data regarding formatting dates, numbers, lists, and more, called the CLDR or the Common Locale Data Repository. Pretty soon IBM developed something called ICU, or the International Components for Unicode, a library that used the Consortium’s data to make it easier for programmers to use. Sun Microsystems (now Oracle) added ICU support into their popular Java programming language.
For the first time, full internationalization support was available to thousands of programmers in the form of the Java Software Development Kit.

Java’s internationalization tools are the envy of quite a few other programming languages. Ruby, one of the predominant languages used at Twitter, is no exception - we just don’t have the awesomeness that Java has! It’s time to change that. TwitterCLDR provides a way to use the same CLDR data that Java uses, but in a Ruby environment. Formatting dates, times, numbers, currencies, and plurals should now be much easier for the Rubyist. Let’s go over some examples.

Dates

    # 21:44:57 UTC -0800 lunes 12 de diciembre de 2011

Numbers

    # 1.472

Currencies

    # € 1.337,00

Plurals

    replacements = { :horse_count => 3,
                     :horses => { :one => "is 1 horse",
                                  :other => "are %{horse_count} horses" } }
    # "there are 3 horses in the barn"

Inline Plurals

    str = 'there %<{ "horse_count": { "one": "is one horse", "other": "are %{horse_count} horses" } }> in the barn'
    # "there are 3 horses in the barn"

In the future, we hope to add even more internationalization capabilities to TwitterCLDR, including a JavaScript version, quoting, abbreviations, collation (sorting), and normalization. Stay tuned!

TwitterCLDR is available now on GitHub at http://github.com/twitter/twitter-cldr-rb. Check out the link for a full explanation of its capabilities. If you use Ruby and work for an international company, or if your company is considering supporting multiple languages, we’d like to invite you to try TwitterCLDR to take some of the pain out of formatting dates, times, currencies, and more.

We’ve decided to share TwitterCLDR to give back to the Ruby and internationalization communities - together we can reach every person on the planet.
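
For readers curious about what the Plurals example above is doing, here is a minimal sketch of the idea: map a count to a plural category (such as "one" or "other"), pick the matching form, and fill in the placeholder. This is not TwitterCLDR's implementation. The helper names (plural_category, interpolate, pluralize) are hypothetical, and the two hand-written rules stand in for the much richer rules that come from CLDR data.

```ruby
# Hypothetical helper names; this is not TwitterCLDR's implementation.

# Returns the plural category for an integer count. Only two hand-written
# rules are included here; real rules come from CLDR data.
def plural_category(locale, count)
  case locale
  when :en then count == 1 ? :one : :other
  when :ja then :other # Japanese has a single plural category
  else raise ArgumentError, "no plural rule for #{locale.inspect}"
  end
end

# Replaces each %{name} placeholder in the template with the matching value.
def interpolate(template, values)
  template.gsub(/%\{(\w+)\}/) { values.fetch(Regexp.last_match(1).to_sym).to_s }
end

# Picks the form for the count's plural category, falling back to :other.
def pluralize(forms, values, count_key, locale: :en)
  category = plural_category(locale, values.fetch(count_key))
  template = forms.fetch(category) { forms.fetch(:other) }
  interpolate(template, values)
end

horses = { one: "is 1 horse", other: "are %{horse_count} horses" }

[1, 3].each do |n|
  phrase = pluralize(horses, { horse_count: n }, :horse_count)
  puts "there #{phrase} in the barn"
end
# prints "there is 1 horse in the barn"
# prints "there are 3 horses in the barn"
```

Keeping the category lookup separate from the string templates is what lets the same calling code serve languages with more (or fewer) plural forms than English: only the rule and the set of forms change.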