Research Article

An Affix Removal Stemmer for Natural Language Text in Nepali

by Abhijit Paul, Arindam Dey, Bipul Syam Purkayastha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 91 - Number 6
Year of Publication: 2014
10.5120/15882-3439

Abhijit Paul, Arindam Dey, Bipul Syam Purkayastha. An Affix Removal Stemmer for Natural Language Text in Nepali. International Journal of Computer Applications. 91, 6 (April 2014), 1-4. DOI=10.5120/15882-3439

@article{ 10.5120/15882-3439,
author = { Abhijit Paul and Arindam Dey and Bipul Syam Purkayastha },
title = { An Affix Removal Stemmer for Natural Language Text in Nepali },
journal = { International Journal of Computer Applications },
issue_date = { April 2014 },
volume = { 91 },
number = { 6 },
month = { April },
year = { 2014 },
issn = { 0975-8887 },
pages = { 1-4 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume91/number6/15882-3439/ },
doi = { 10.5120/15882-3439 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:12:01.477589+05:30
%A Abhijit Paul
%A Arindam Dey
%A Bipul Syam Purkayastha
%T An Affix Removal Stemmer for Natural Language Text in Nepali
%J International Journal of Computer Applications
%@ 0975-8887
%V 91
%N 6
%P 1-4
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Stemming is a prerequisite step in text mining and spelling-checker applications, a basic requirement for Natural Language Processing (NLP) tasks, and an important component of most Information Retrieval (IR) systems. This paper describes an affix-stripping technique for finding stems in context-free text in the Nepali language, using a combination of lexical-lookup and rule-based approaches. It starts by introducing the different types of lexicon, the basic unit of the Nepali stemmer, and a few rules to identify a word in the lexicon. These rules and lexicons are then applied in the design and implementation of an extensible stemmer architecture for Nepali text. Finally, the stemmer's performance is evaluated on 1,800 words drawn from different domains: news on economics, health, and politics in the Nepali language, written in Devanagari script. The overall accuracy of the designed system is 90.48%. In the absence of extensive linguistic resources, this technique improves on the performance of a simple rule-based system.
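
The combination of lexical lookup and rule-based affix stripping described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the root lexicon, the suffix list, the longest-suffix-first ordering, and the function name `stem` are all assumptions chosen for illustration. The suffixes are common Nepali plural and case markers, and the paper's own lexicons and rules may differ.

```python
# Hypothetical root lexicon and suffix list, for illustration only;
# the paper's actual lexicons and rules are not reproduced here.
ROOT_LEXICON = {"घर", "किताब", "नेपाल"}
SUFFIXES = ["हरू", "लाई", "बाट", "को", "मा", "ले"]


def stem(word, lexicon=ROOT_LEXICON, suffixes=SUFFIXES):
    """Strip suffixes one at a time until the remainder is in the lexicon."""
    ordered = sorted(suffixes, key=len, reverse=True)  # longest suffix first
    candidate = word
    while candidate not in lexicon:
        # Find a suffix that matches and leaves a non-empty remainder.
        match = next(
            (s for s in ordered
             if candidate.endswith(s) and len(candidate) > len(s)),
            None,
        )
        if match is None:
            return word  # no rule applies: leave the word unchanged
        candidate = candidate[: -len(match)]
    return candidate


print(stem("घरहरूमा"))  # → घर
```

In this sketch the lexicon lookup acts as the stopping condition, so stripping ends as soon as a known root is reached. A word whose remainder never matches the lexicon is returned unchanged rather than over-stemmed, which is one plausible way a lexicon can improve on stripping rules alone.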

Index Terms

Computer Science
Information Sciences

Keywords

Stemmer, Lexicon, NLP, Text Mining, Spelling Checker, IR