CDAC has released the Inscript 5.1 proposal (downloading it requires submitting a user details form at the site), arrived at after discussions with Microsoft, IBM and Red Hat. The new proposal updates Inscript to the Unicode 5.1 standard while keeping backward compatibility, adding mappings for ZWJ and ZWNJ and an extended layer to encode the new character codes. The toggle key for switching between the basic and extended layers is not defined. Another proposal is to make the mapping specific to a language rather than to a script. For the Telugu language, my feedback is that Indo-Arabic numerals need to be retained as the default rather than Telugu numerals.
However, as smart mobiles will become more commonplace than PCs in the 21st century, this standard is inadequate: most mobiles have keypads with only about 35 keys for typing. As the computers are powerful, what is needed is a clever way to accommodate the most commonly used letters of the language on this keyboard. As most users are bilingual, phonetic approaches make it convenient for new users to switch easily between the languages. Phonetic schemes with dictionary support for word hinting, from Microsoft and Google, are now available for the PC. Alternative statistical input methods are becoming available on mobile phones, targeting basic 12-key models.
Tirumalakrishna Desikachary defined one phonetic keyboard (released along with a font called Pothana) for the Telugu language 10 years ago. In it, the short and long forms of a vowel are assigned to the same key. Vowel keys are treated as independent vowels if they appear first in the word and as dependent vowels (or matras of the preceding consonant) if they appear after the first letter of the word. This avoids wasting keys on the matra forms of vowels. The layout still uses 2 keys beyond the basic alphabet to accommodate 4 Telugu letters. Of those, only one (the letter au) is in popular use; if it is interchanged with the chandrabindu, all the letters in popular use are accommodated on the English alphabet keys. A similar arrangement can be worked out for Hindi and other languages based on the initial proposals of Desikachary.
Compared to other phonetic methods like Itrans and RTS, Pothana keeps the mapping a simple one-to-one, except for the overloading of vowel keys based on the context. It is desirable to have a standard for phonetic input on mobiles to enable the IT revolution to benefit the masses.
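The context rule for vowel keys described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the key table below is hypothetical and is not the actual Pothana layout, and I assume that the long form of a vowel is typed with Shift on the same key (shown here as the upper-case letter). Only the Unicode code points for the Telugu letters are standard.

```python
# Hypothetical key table for illustration only; NOT the actual Pothana layout.
CONSONANTS = {
    "k": "\u0C15",  # TELUGU LETTER KA
    "m": "\u0C2E",  # TELUGU LETTER MA
    "r": "\u0C30",  # TELUGU LETTER RA
    "l": "\u0C32",  # TELUGU LETTER LA
}

# Each vowel key maps to (independent vowel, dependent vowel sign).
# Assumption: upper case stands for the long form typed with Shift.
VOWELS = {
    "a": ("\u0C05", ""),        # short a is inherent in a consonant, so no sign
    "A": ("\u0C06", "\u0C3E"),  # long aa
    "i": ("\u0C07", "\u0C3F"),  # short i
    "I": ("\u0C08", "\u0C40"),  # long ii
    "u": ("\u0C09", "\u0C41"),  # short u
    "U": ("\u0C0A", "\u0C42"),  # long uu
}


def transliterate_word(keys: str) -> str:
    """Convert one word of key presses using the first-letter rule."""
    out = []
    for position, key in enumerate(keys):
        if key in CONSONANTS:
            out.append(CONSONANTS[key])
        elif key in VOWELS:
            independent, dependent = VOWELS[key]
            # First key of the word: independent vowel; otherwise a matra.
            out.append(independent if position == 0 else dependent)
        else:
            out.append(key)  # pass unknown keys through unchanged
    return "".join(out)


def transliterate(text: str) -> str:
    """Apply the rule word by word, splitting on single spaces."""
    return " ".join(transliterate_word(word) for word in text.split(" "))


if __name__ == "__main__":
    print(transliterate("Uru kAki"))
```

The sketch ignores conjuncts and a vowel that follows another vowel, which a real input method would have to handle.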
When will we have Inscript Mobile?
Labels: cdac , indic , inscript , phonetic , smart mobile
Firefox Indic language usage is not growing rapidly

Firefox has been providing insight into the download and use of its language versions since Dec-2009. It uses a feature called blocklist, wherein the Firefox browser contacts Mozilla servers once a day for updates on malware websites. Through this transaction, Mozilla is able to track the usage of each browser locale.
I got access to a few Indic language reports (Hindi, Kannada, Malayalam, Marathi, Tamil, Telugu) for the past 3 months and tried to analyze them. The chart below shows the downloads for the week. Hindi tops the list among the languages I have analyzed, an expected statistic in view of its larger population.
However, the actual usage for the week ending 2010-03-14 is shown below. These numbers have not changed much in the last few months. Hindi again tops the list, but its proportion of users from outside India is much lower than that of the other languages. Malayalam has a large percentage of users outside India relative to the other languages. Hindi users are about 0.1% of the total Firefox 3.6 users from India. The percentages for the other languages are much lower. Though the download numbers every week are roughly the same, the usage has not improved much in the last three months. This means that users are curious to find out what the language version looks like, but many are not really using it.
It would be interesting to speculate about the reasons. Many users could be using computers provided by their employer. They may not be comfortable with the localized version. Or they might have experienced font display issues, if their system is not configured for rendering the language. Any other guesses and suggestions on how we can convert the downloaders into actual users?
A more detailed analysis for Telugu is available in the Telugu blog post.
Indian Wikipedias-Grading based on Traffic Analysis
The Indian language Wikipedia communities have so far mostly looked at input statistics, such as the number of pages and page depth, to measure their growth. Absolute numbers based on content contribution are not the right metric for comparing languages. An outcome-based measure, such as the number of unique visitors per million speakers, is better. Instead of ranking by value, comparing groups of languages by grade can give better insights. I present here such a comparison for Indian languages with more than 1 million primary and secondary speakers. I have excluded Sanskrit and Bishnupriya Manipuri (scores of >10000). I have dropped English (>10000) as it is a worldwide language and is not useful for comparison. I have included Chinese (zh), as that language has problems similar to ours with regard to use on computers.
It shows Malayalam tops the list and is on par with Chinese (approximately 150). Tamil, Telugu and Marathi are in the second group (>50); Kannada, Urdu, Hindi, Gujarati and Bengali (>25) are in the third group. Assuming that these language communities have similar issues with regard to Wikipedia, this trend is expected to continue for the next decade. The only surprise may be if Hindi beats the prediction and emerges at the top, because it is treated as a national language and is included in the curriculum. I hope these statistics will make the Indian Wikipedian communities think about what needs to be done to move to the top grade. Another interesting analysis, on how soon these will grow, would be to look at the number of active Wikipedians and at what is being done to promote Wikipedia in these communities. I plan that for a later post. Meanwhile, I look forward to your feedback.
For those who are interested in the inputs for this analysis, I have taken the daily statistics on 2009-09-01 and the language population numbers at wiki for generating this chart.
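The measure and the grading can be expressed as a short Python sketch. The cut-offs of 50 and 25 come from the groups described above; the cut-off of 100 for the top group and the grade names A to D are my assumptions for illustration, and the numbers in the usage example are made-up placeholders, not the actual statistics.

```python
def visitors_per_million(unique_visitors: float, speakers: float) -> float:
    """Unique visitors per one million speakers of the language."""
    if speakers <= 0:
        raise ValueError("speakers must be positive")
    return unique_visitors / (speakers / 1_000_000)


def grade(score: float) -> str:
    """Map a score to a group. Grade names and the 100 cut-off are assumed."""
    if score > 100:
        return "A"  # top group (Malayalam and Chinese are at about 150)
    if score > 50:
        return "B"  # second group
    if score > 25:
        return "C"  # third group
    return "D"      # everything below the third group


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not real data.
    score = visitors_per_million(unique_visitors=3_000, speakers=20_000_000)
    print(score, grade(score))
```

Comparing grades rather than raw ranks keeps small day-to-day changes in traffic from reshuffling the order of languages within a group.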
Labels: indian , languages , statistics , telugu , wikipedia
Promoting Indian Language Wikipedias

Wikipedias in Indian languages are of great help in bridging the information gap. A concerted effort to promote Wikipedia awareness is needed. I have created a presentation showing snapshots of Indian language Wikipedia pages, which can be used at any relevant event to promote Wikipedia.
This was done on Ubuntu 8.10 with OpenOffice 2.4, Firefox and Inkscape. There were certain difficulties in attempting this, as I am not a polyglot and know only four languages. This will also be useful for Wikipedians to compare the visual appearance of Wikipedia in different languages and plan improvements in their own language version. Improvement ideas and help with additional text/content for the languages are welcome.
INDIKEYS: Keymap stickers for Indian Languages
Using the computer in my mother tongue is my hobby. I started doing it some time in 2005, when Linux started sporting this feature. Indic localization has improved a lot since then. I also contributed to the Telugu localization of Firefox. Recently, I became interested in Indian language Wikipedias. I realized that though there are a lot of members, active contributors are very few.
The big problems, I think, are awareness and the hurdle of starting to type in the native language on an English keyboard.
While people from Asia-Pacific countries such as Japan and China can be proud of their native language support on computers, we in India have to bear with this problem for a few more decades. The colonial rule that we endured is a blessing because we acquired the English language, which benefited us in IT. It is also a bane because it made our native languages second-class citizens on computers.
While the Indian government has been working on this for quite some time, the use of native languages is dismally low. Even CDAC's release of a Linux-based distribution in Indian languages is hindered, as it does not provide basic keyboard stickers for Indian languages using Inscript, a key mapping scheme that is complicated for newbies. There are costly foreign solutions. After trying for a few months to find Indian language stickers, I came up with a simple sticker generation scheme that prints on self-adhesive laser printer paper. A sample file for Telugu and the PDF are available. It can be easily adapted for other Indian languages.
Business Analytics for Railways reservation system

Business analytics is a term which I started hearing from 2005 or so. It is a much jazzier form of the plain Management Information System of the early years of the computer. In simple terms, it means getting insight from customers' past dealings and coming up with better alternatives and ideas to meet customer needs. While this is being pursued by private commercial organizations, there is a lot that government or public organizations need to do to provide a better experience for the customer. I will illustrate this with a real-life example of reserving a berth on Indian Railways.
Recently I had to travel from Bangalore to Hyderabad and used the online reservation system of the Railways to book a seat. While I got a confirmed reservation for my onward journey, the status for my return journey was WL 119/WL92. So I had to decide whether to continue with my reservation for the return journey or find alternate modes of travel. I had no means of knowing the chance of getting a confirmed seat or berth with the status that I was presented. As the Railways has a multitude of quotas, including Tatkal, and categories of reservation, I doubt whether it would be possible even for Railway officers to give any probability of confirmation. The only useful information is the final reservation charts for the train on the same day of the week/month in the recent past. But these are not made available on the internet. If an analytics application were developed using the past history of final train reservation charts for the same day of the previous week/month/year and the total availability of seats, it would help guide travellers when they have to book a ticket.
Referring back to my case, I just went ahead with the reservation with a prayer on my lips. I tracked the status regularly; a few days before the journey the status changed to RAC, and I was finally allotted a berth after the TTE found a few vacant berths once the train had commenced its journey.
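As a rough illustration of what such an application could compute, here is a minimal Python sketch. It assumes a simplified history: for each past run of the same train on a comparable day, the highest waitlist position that was eventually confirmed. The function name and the sample numbers are hypothetical; the Railways does not publish this data, and a real system would also have to account for quotas and classes.

```python
def confirmation_chance(waitlist_position: int, cleared_history: list[int]) -> float:
    """Fraction of past runs in which this waitlist position got confirmed.

    cleared_history holds, for each comparable past run, the highest
    waitlist position that was confirmed by the time the final chart was made.
    """
    if not cleared_history:
        raise ValueError("need at least one past run to estimate from")
    confirmed_runs = sum(
        1 for cleared in cleared_history if cleared >= waitlist_position
    )
    return confirmed_runs / len(cleared_history)


if __name__ == "__main__":
    # Made-up history for five earlier runs; not real railway data.
    history = [120, 80, 95, 150, 60]
    print(confirmation_chance(92, history))
```

Even such a crude historical fraction, shown next to the waitlist status at booking time, would give the traveller something better than a guess.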
Text: CC by SA 3.0
Image: From Wiki Commons :Indian_metre_gauge_Train.jpg
Labels: analytics , business , railway , reservation
Computer Literacy for School Children

Sarva Shiksha Abhiyan (SSA) seeks to achieve universalization of elementary education by 2010. SSA also seeks to provide computer education to bridge the digital divide.
I am contributing to a session on Computer Literacy for School Children during Barcamp Bangalore 8 on Mar 7-8 2009. In this session we will examine the status of school education and of computer education in schools, look at approaches to bring down the cost of hardware, and discuss the key issues in ushering in 100% computer literacy for school children. The presentation is given below.
Labels: BCB8-S90 , Digital divide , education , internet , k-12