OCR: Optical Character Recognition for Indian Languages
The objective of the OCR system is to develop robust OCRs for printed Indian
scripts that deliver the performance needed to convert legacy printed
documents into an electronically accessible format. The system has been
developed for Bangla, Devanagari, Gurumukhi, Kannada, Malayalam, and Telugu,
and it will soon be available for the Gujarati, Tamil, Oriya, Tibetan,
Assamese, Manipuri, and Urdu scripts. Indian Language OCR, being a
consortium-based project, follows a hybrid approach designed to work with
platform- and technology-independent modules. This system has been developed to
facilitate the digitization of multilingual text images. Its area of
coverage is printed-text OCR. The implementing agency is a consortium with
IIT Delhi as the consortium leader. This system is the outcome of the efforts
of consortium members, sponsored by the Ministry of Communication and
Information Technology. The preprocessing modules such as noise cleaning, skew
detection and binarization have been developed by different consortium
institutes. The language vertical tasks and integration have been carried out by
various consortium members.
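
The text above does not say which algorithms the consortium institutes used
for the preprocessing modules. Purely as an illustration of what a
binarization step does, here is a minimal sketch of one common technique,
Otsu's global thresholding, in plain Python. The function names
otsu_threshold and binarize are hypothetical and are not part of the Indian
Language OCR system; the input is assumed to be a grayscale page given as rows
of integers from 0 (black) to 255 (white).

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximizes between-class variance.

    Illustrative only: this is Otsu's method, not necessarily the
    algorithm used by the consortium's binarization module.
    """
    # Histogram of grey levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_dark = 0      # sum of grey levels in the dark class (<= t)
    weight_dark = 0   # number of pixels in the dark class
    best_t, best_var = 0, -1.0

    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, nothing to split
        weight_light = total - weight_dark
        if weight_light == 0:
            break     # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor).
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map a grayscale image (rows of ints) to 1 for ink and 0 for paper."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[1 if p <= t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny hypothetical 3x3 "page": dark strokes on a light background.
    page = [
        [250, 245, 30],
        [240, 20, 25],
        [10, 248, 252],
    ]
    for row in binarize(page):
        print(row)
```

A global threshold like this works best on evenly lit scans; degraded legacy
documents often need noise cleaning first or a locally adaptive threshold
instead.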