Technology For Society

Focus on technology trends that create an impact on a larger part of society.

Need for Indian language content in Unicode

Indian readership on the Internet is growing rapidly, at 42% year on year according to the IRS Q3 2011 numbers (pdf), while press growth is a measly 3.1% year on year. Many newspapers, particularly in Telugu (Andhrajyothy, Sakshi, Vaartha etc.), that started with non-Unicode encodings for their online versions have switched over to Unicode in the recent past. Even the district editions are available on the Internet. Unfortunately, the archives are available for only a few weeks, or a few months at most. Even this content is mostly PDF, which can't be searched by search engines, as PDF is mainly an output mechanism for Indian languages, which have complex text scripts. These are major setbacks to Indian languages, as search engines and encyclopaedic sites like Wikipedia rely on digital content that is maintained forever to provide links to the information.

For Indian languages, rendering is a complex process, as illustrated by this Telugu example (Ya + E matra -> Yi). As the PDF retains only the final image, it is not amenable to search. (Illustration via Wikimedia Commons, Author: Hatukanezumi)
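To make the contrast concrete, here is a minimal Python sketch of how the same syllable looks when it is stored as Unicode text. It assumes the syllable in the illustration is encoded as TELUGU LETTER YA (U+0C2F) followed by TELUGU VOWEL SIGN I (U+0C3F). The sample string and the names `describe`, `syllable` and `sample_text` are made up for illustration.

```python
import unicodedata

# Assumption: the "Yi" syllable is stored as two code points,
# the letter YA (U+0C2F) followed by the vowel sign I (U+0C3F).
syllable = "\u0C2F\u0C3F"


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


code_points = describe(syllable)
for code_point, name in code_points:
    print(code_point, name)

# Hypothetical sample text: the syllable followed by a space and the letter KA.
sample_text = "\u0C2F\u0C3F \u0C15"

# Because the text keeps the underlying characters rather than the drawn
# shape, a plain substring search is enough to find the syllable.
found = syllable in sample_text
print("found:", found)
```

The font and the rendering engine decide how the two code points are drawn as one shape. The stored text stays the same, which is what lets a search engine index it.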
A few Indian language newspapers, the Government's planning magazine (Yojana, in editions other than Hindi), state government magazines (sample page of the Andhra Pradesh magazine in PDF) and websites (notably the information and public relations website) still use legacy fonts and PDF files, as they have not yet migrated their content to Unicode. As information plays an increasingly dominant role in our day-to-day life, the traditional digital content producers who tweak their systems to archive their valuable content in Unicode without any time limit will stand to benefit themselves and their readers much more than others.


About this blog:

Thoughts on making technology work for everyone


Creative Commons License
Technology For Society - Blog by Arjuna Rao Chavala is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.