Research Article

Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents

by Dr.S.Basavaraj Patil
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 13 - Number 8
Year of Publication: 2011
Authors: Dr.S.Basavaraj Patil
10.5120/1803-2279

Dr.S.Basavaraj Patil. Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents. International Journal of Computer Applications. 13, 8 (January 2011), 6-14. DOI=10.5120/1803-2279

@article{ 10.5120/1803-2279,
author = { Dr.S.Basavaraj Patil },
title = { Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents },
journal = { International Journal of Computer Applications },
issue_date = { January 2011 },
volume = { 13 },
number = { 8 },
month = { January },
year = { 2011 },
issn = { 0975-8887 },
pages = { 6-14 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume13/number8/1803-2279/ },
doi = { 10.5120/1803-2279 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:02:09.466317+05:30
%A Dr.S.Basavaraj Patil
%T Neural Network based Bilingual OCR System: Experiment with English and Kannada Bilingual Documents
%J International Journal of Computer Applications
%@ 0975-8887
%V 13
%N 8
%P 6-14
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a neural network based bilingual OCR system that reads printed document images containing text in two scripts, Kannada and Roman (English). Such systems are useful for automating multi-script, multilingual document processing. The system comprises a document image pre-processor, a dynamic feature extractor, a neural network based script classifier, a Kannada character recognition system, and an English character recognition system. The pre-processor accepts the bilingual document image, converts it from grey scale to two-tone, and segments it into lines and words. The dynamic feature extractor derives the same number of distinctive features from each separated word, irrespective of the word's size. A probabilistic neural classifier takes these features and sorts the words by script, Kannada or Roman. The Kannada character recognition system then segments each Kannada word into characters and maps the recognized characters to the corresponding ASCII values of the chosen Kannada font. Similarly, the English character recognition system segments each English word into characters and maps them to the corresponding ASCII values of the specific English font. The recognized English and Kannada characters are written to separate ASCII files, one per language. The experimental results support the effectiveness of the approach.
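
The script classification front end described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the abstract does not say which features or segmentation method were used, so the projection-profile segmentation, the zoning (grid density) features, the fixed threshold, and every function name and parameter here (`binarize`, `runs`, `segment_lines`, `segment_words`, `zone_features`, `pnn_classify`, `sigma`, `min_gap`) are assumptions chosen for clarity. The classifier is a basic probabilistic neural network, that is, a Parzen-window estimate with Gaussian kernels.

```python
import math

def binarize(gray, threshold=128):
    """Grey to two-tone conversion: 1 marks ink (dark), 0 marks background.
    The fixed threshold is an assumption, not taken from the paper."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

def runs(profile, min_gap=1):
    """Return (start, end) pairs (end exclusive) of non-zero runs in a
    projection profile; gaps shorter than min_gap do not split a run."""
    spans, start, gap = [], None, 0
    for i, value in enumerate(profile):
        if value:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, len(profile) - gap))
    return spans

def segment_lines(binary):
    """Split a page into text lines using the horizontal projection profile."""
    return runs([sum(row) for row in binary])

def segment_words(binary, line, min_gap=2):
    """Split one line, given as (top, bottom), into words using the
    vertical projection profile of that line."""
    top, bottom = line
    columns = zip(*binary[top:bottom])
    return runs([sum(col) for col in columns], min_gap)

def zone_features(word, grid=(4, 4)):
    """Ink density in each cell of a fixed grid. The vector length depends
    only on the grid, so every word yields the same number of features."""
    h, w = len(word), len(word[0])
    rows, cols = grid
    features = []
    for r in range(rows):
        r0 = r * h // rows
        r1 = max(r0 + 1, (r + 1) * h // rows)  # at least one pixel row
        for c in range(cols):
            c0 = c * w // cols
            c1 = max(c0 + 1, (c + 1) * w // cols)  # at least one pixel column
            ink = sum(sum(word[y][c0:c1]) for y in range(r0, r1))
            features.append(ink / ((r1 - r0) * (c1 - c0)))
    return features

def pnn_classify(x, training, sigma=0.3):
    """Probabilistic neural network: average a Gaussian kernel over the
    stored patterns of each class and return the best-scoring label."""
    scores = {}
    for label, patterns in training.items():
        total = 0.0
        for t in patterns:
            dist2 = sum((a - b) ** 2 for a, b in zip(x, t))
            total += math.exp(-dist2 / (2 * sigma ** 2))
        scores[label] = total / len(patterns)
    return max(scores, key=scores.get)
```

In use, each word image cut out by `segment_lines` and `segment_words` would be passed through `zone_features`, and the resulting vector through `pnn_classify` with a dictionary of labelled training vectors (for example under the hypothetical keys `"kannada"` and `"roman"`). The words would then be routed to the matching character recognizer.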

Index Terms

Computer Science
Information Sciences

Keywords

Script Classification, Kannada Character Recognition, Bilingual OCR