Wednesday, January 23, 2008

Carabao Language Kit 1.0.0.3 released

Version 1.0.0.3 is now available for download.

Fixed:

  • Various tagging problems
  • A bug in the priority setting of mid-sentence sequences
  • Generation of lemmas from the canonical form for tagging-only affixes

Added:

  • A button to tag new entries morphologically
  • A handful of commonly used business entities (e.g., address, phone, fax, business hours)

Improved:

  • Accuracy of some sequences
  • Domains

Monday, January 14, 2008

A discussion on sentiment / opinion extraction

An interesting discussion about sentiment / opinion extraction in the Yahoo! TextAnalytics group, initiated by Seth Grimes:

http://tech.groups.yahoo.com/group/TextAnalytics/message/204

I also get mentioned somewhere in the middle of this article (look for Digital Sonata), credited with exactly the opposite of what I said :-)
http://www.b-eye-network.com/view/6744

Monday, January 7, 2008

On the importance of localization

Do you have a Facebook account? If your native tongue is English, chances are that you do. Otherwise, it is far, far from certain.

The geographic distribution of social network users varies greatly from network to network. Orkut is an Indian / Brazilian / Pakistani domain; Friendster is Filipino / Chinese; Facebook is mostly used in Anglophone countries. Besides the more elaborate reasons, such as the prevailing mindset in a particular country (e.g., some are more after pictures, others like to argue and write essays, etc.), there is one very simple reason: users either have to strain themselves to use a non-native GUI, or are simply unable to use it. Believe it or not, the average person on planet Earth is fluent in only one language, especially if his/her native tongue is one of the widespread ones.

Recently I witnessed something really cool. A small local social network took a firm grip on a huge market of 300 million people or so. Facebook, along with the other giants, seems to be oblivious to this.

Russians, or rather people associated with the ex-USSR, usually do not possess good English skills. Russian is still the lingua franca throughout the entire ex-USSR space, and is the preferred medium of communication among millions of migrants from there.

So when Odnoklassniki.ru, a small website offering people a way to get in touch with their ex-classmates, was launched (the literal translation of the word "odnoklassniki" is "classmates"), it landed in the right spot. Nobody in the ex-USSR used Facebook; expecting them to would be as absurd as expecting Russians to use Baidu, the main Chinese search engine. Odnoklassniki.ru grew at a rate that is hard to believe.

I got an invitation from a former classmate from my primary school in Moscow. I usually discard invitations to social networks; there are already too many to keep track of. Just out of curiosity I went there, and discovered 4 or 5 of the classmates I hadn't seen in 15 years or so, among about 300 ex-students of the same school. That was two months ago. Now over 2,000 users are associated with this school. (Obviously, it doesn't even exist on Facebook.) As of now, the majority of people I wanted to reconnect with are there. The website is technically average (it makes my Opera hang sometimes) and has too many ads, but it is unlikely I'll ever leave it for anything else. It simply does the job.

The only problem I have is that when I give out the URL of my photo album on Flickr, very few are able to use it. Why? You guessed it: Flickr does not have a localized version in Russian (despite the funky hello messages in 150 languages), so even navigating through the registration page is too difficult for most.

So I wonder: will they get it? I say no. Which means there will be a Chinese Facebook, a Japanese Facebook, and a French Facebook, and there is absolutely no chance these guys will recapture the non-Anglophone markets.