An Affix Removal Stemmer for Natural Language Text in Nepali

Abhijit Paul; Arindam Dey; Bipul Syam Purkayastha

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 91 - Number 6

Year of Publication: 2014

DOI: 10.5120/15882-3439

Abhijit Paul, Arindam Dey, Bipul Syam Purkayastha. An Affix Removal Stemmer for Natural Language Text in Nepali. International Journal of Computer Applications. 91, 6 (April 2014), 1-4. DOI=10.5120/15882-3439

@article{10.5120/15882-3439,
  author = {Abhijit Paul and Arindam Dey and Bipul Syam Purkayastha},
  title = {An Affix Removal Stemmer for Natural Language Text in Nepali},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2014},
  volume = {91},
  number = {6},
  month = {April},
  year = {2014},
  issn = {0975-8887},
  pages = {1-4},
  numpages = {4},
  url = {https://ijcaonline.org/archives/volume91/number6/15882-3439/},
  doi = {10.5120/15882-3439},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

Stemming is a prerequisite step in text mining and spell-checking applications, as well as a basic requirement for Natural Language Processing (NLP) tasks. It is also important in most Information Retrieval (IR) systems. This paper describes an affix-stripping technique for finding the stems of words in context-free Nepali text, combining lexical lookup with a rule-based approach. It begins by introducing the different types of lexicon that form the basic units of the Nepali stemmer, together with a few rules for identifying words in the lexicon. These rules and lexicons are applied in the design and implementation of an extensible stemmer architecture for Nepali text. Finally, the stemmer's performance is evaluated on 1,800 words drawn from different domains: Nepali-language news on economics, health, and politics, written in the Devanagari script. The overall accuracy of the system is 90.48%. In the absence of extensive linguistic resources, this technique improves on the performance of a simple rule-based system.
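The abstract describes stemming by lexical lookup backed by affix-stripping rules. The Python sketch below illustrates that general idea for suffixes only; it is not the authors' implementation. The root lexicon, the suffix lexicon (a handful of common Nepali plural and case markers such as हरू, लाई, को and मा), the minimum-stem-length guard, and the functions `strip_suffixes` and `stem` are all hypothetical, and prefix handling is omitted.

```python
import unicodedata

# Hypothetical root lexicon: a tiny illustrative sample, not the paper's data.
ROOT_LEXICON = {"केटा", "घर", "नेपाल", "किताब"}

# Hypothetical suffix lexicon: common Nepali plural and case markers,
# ordered longest first so longer matches are tried before shorter ones.
SUFFIX_LEXICON = sorted(
    ["हरू", "लाई", "बाट", "ले", "को", "का", "की", "मा"], key=len, reverse=True
)

# Assumed guard: never leave fewer than this many code points. Code points
# include dependent vowel signs (matras), so this is only a rough length measure.
MIN_STEM_LEN = 2


def strip_suffixes(word: str, lexicon: set[str]) -> tuple[str, bool]:
    """Repeatedly strip the longest matching suffix.

    Returns (form, True) as soon as an intermediate form is found in the
    lexicon, otherwise (fully_stripped_form, False).
    """
    current = word
    while True:
        if current in lexicon:
            return current, True
        for suffix in SUFFIX_LEXICON:
            if current.endswith(suffix) and len(current) - len(suffix) >= MIN_STEM_LEN:
                current = current[: -len(suffix)]
                break
        else:
            # No suffix rule applies any more.
            return current, False


def stem(word: str, lexicon: set[str] = ROOT_LEXICON) -> str:
    """Return a stem, preferring lexicon-verified forms over raw rule output."""
    word = unicodedata.normalize("NFC", word.strip())
    form, _verified = strip_suffixes(word, lexicon)
    # If no lexicon entry was reached, the rule-stripped form is returned
    # unverified; this is where a purely rule-based stemmer may over-stem.
    return form


if __name__ == "__main__":
    for w in ["केटाहरूलाई", "घरमा", "नेपालको", "किताबहरूबाट", "कलमहरू"]:
        print(w, "->", stem(w))
```

With this sample data, `stem("केटाहरूलाई")` first removes लाई, then हरू, and stops at केटा because that form is in the lexicon. A word whose root is missing from the lexicon, such as कलमहरू, is still reduced to कलम by the rules alone, but that result is unverified; limiting this kind of over-stemming is the role the lexical lookup plays.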

Index Terms

Computer Science

Information Sciences

Keywords

Stemmer, Lexicon, NLP, Text Mining, Spelling Checker, IR