Indian readership on the Internet is growing rapidly, at 42% year on year according to IRS Q3 2011 (pdf) figures, while print growth is a measly 3.1% year on year. Many newspapers, particularly in Telugu (Andhrajyothy, Sakshi, Vaartha etc.), that started with non-Unicode encodings for their online versions have switched over to Unicode in the recent past. Even the district editions are available on the Internet. Unfortunately, archives are kept for only a few weeks, or a few months at most. Even this content is mostly PDF, which search engines cannot search, because for Indian languages with complex scripts PDF works mainly as an output format for the rendered page rather than a store of the underlying text. These are major setbacks for Indian languages, as search engines and encyclopaedic sites like Wikipedia rely on digital content that is maintained permanently in order to link to the information.
A few Indian-language newspapers, the Government Planning magazine (Yojana, in versions other than Hindi), state government magazines (Sample Page of Andhra Pradesh Magazine in pdf) and websites (notably the information and public relations website) still use legacy fonts and PDF files, as they have not yet migrated their content to Unicode. As information plays an increasingly dominant role in our day-to-day lives, traditional digital content producers who adapt their systems to archive their valuable content in Unicode, without any time limit, will benefit themselves and their readers far more than those who do not.
For Indian languages, rendering is a complex process, as illustrated by this Telugu example (Ya + E matra -> Yi). As a PDF retains only the final rendered image, it is not amenable to search. (Illustration via Wikimedia Commons, Author: Hatukanezumi)
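To make the point about rendering concrete, here is a minimal Python sketch. It assumes the pair in the illustration is the Telugu consonant YA (U+0C2F) followed by the dependent vowel sign E (U+0C46). In Unicode text the two stay separate code points and are merged into one shape only at display time, so a plain substring search over the stored text still finds the syllable. The helper names `describe` and `contains` are hypothetical, chosen only for illustration.

```python
import unicodedata

# Assumption: the illustrated pair is Telugu YA + vowel sign E (the "E matra").
# Unicode text keeps them as two code points; the font's shaping engine
# combines them into a single visual glyph only when the text is rendered.
YA = "\u0C2F"
VOWEL_SIGN_E = "\u0C46"
syllable = YA + VOWEL_SIGN_E


def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text (hypothetical helper)."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


def contains(document: str, query: str) -> bool:
    """Plain substring search over code points (hypothetical helper)."""
    return query in document


if __name__ == "__main__":
    for line in describe(syllable):
        print(line)
    # A Unicode page holding the syllable is matched by a code-point search.
    page = "... " + syllable + " ..."
    print(contains(page, syllable))
```

Running it prints the two code point names (TELUGU LETTER YA, TELUGU VOWEL SIGN E) followed by `True`. A PDF that keeps only the shaped glyph, with no reliable mapping back to these two code points, gives a search engine nothing comparable to match against.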