Research Article

Unsupervised Tagging of Chinese Articles

by Shailendra Singh Kathait, Shubhrita Tiwari
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 165 - Number 3
Year of Publication: 2017
Authors: Shailendra Singh Kathait, Shubhrita Tiwari
10.5120/ijca2017913825

Shailendra Singh Kathait, Shubhrita Tiwari. Unsupervised Tagging of Chinese Articles. International Journal of Computer Applications. 165, 3 (May 2017), 29-32. DOI=10.5120/ijca2017913825

@article{ 10.5120/ijca2017913825,
author = { Shailendra Singh Kathait, Shubhrita Tiwari },
title = { Unsupervised Tagging of Chinese Articles },
journal = { International Journal of Computer Applications },
issue_date = { May 2017 },
volume = { 165 },
number = { 3 },
month = { May },
year = { 2017 },
issn = { 0975-8887 },
pages = { 29-32 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume165/number3/27554-2017913825/ },
doi = { 10.5120/ijca2017913825 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:11:25.326662+05:30
%A Shailendra Singh Kathait
%A Shubhrita Tiwari
%T Unsupervised Tagging of Chinese Articles
%J International Journal of Computer Applications
%@ 0975-8887
%V 165
%N 3
%P 29-32
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many insights can be drawn from articles published online. Rather than manually reading every article and assigning tags that reflect its content, it is far more efficient to automate the task. This paper implements an unsupervised approach to the automated tagging of Chinese-language articles: the input is an article and the output is a set of tags for that article. The major challenge is segmenting Chinese text, which, unlike English, does not use separators between words. To overcome this, several approaches are combined to obtain accurate results. Efficient tagging supports many analysis applications, one of which is a Recommendation Engine. The tagging process should consider all aspects of an article and assign the most relevant tags accordingly. The proposed algorithm was implemented for a Chinese publication house, and relevant tags were assigned to its articles across different categories. At the end of the project, the results were checked manually on a corpus of 10,000 Chinese articles, showing an overall accuracy of around 85%, higher than that obtained with several traditional methods.
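
The abstract does not state which segmentation or scoring methods were combined, so the following is only an illustrative sketch of the general pipeline (segment, then rank candidate tags), not the authors' algorithm. It assumes dictionary-based forward maximum matching for segmentation and tf-idf for ranking; the lexicon, the stopword set, and the names `segment` and `tfidf_tags` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy lexicon (assumption); a real system needs a large dictionary.
LEXICON = {"机器", "学习", "机器学习", "推荐", "引擎", "推荐引擎",
           "文章", "标签", "中文", "分词", "的", "是"}
MAX_WORD_LEN = max(len(word) for word in LEXICON)


def segment(text, lexicon=LEXICON, max_len=MAX_WORD_LEN):
    """Forward maximum matching: at each position, take the longest lexicon word.

    Characters that start no lexicon word are emitted as single-character tokens.
    """
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


def tfidf_tags(docs, doc_index, top_k=3, stopwords=frozenset({"的", "是"})):
    """Return the top_k words of docs[doc_index], ranked by tf-idf over docs."""
    segmented = [segment(doc) for doc in docs]
    n_docs = len(segmented)
    # Document frequency: number of documents containing each word.
    doc_freq = Counter(word for words in segmented for word in set(words))
    counts = Counter(w for w in segmented[doc_index] if w not in stopwords)
    total = sum(counts.values())
    scores = {w: (c / total) * math.log(n_docs / doc_freq[w])
              for w, c in counts.items()}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

For example, `segment("机器学习的文章")` splits the string into `机器学习`, `的`, and `文章`, because the four-character lexicon entry is preferred over the shorter `机器` and `学习`. Greedy matching of this kind is simple but can mis-segment ambiguous strings, which is one reason a practical system might combine several approaches, as the abstract suggests.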

Index Terms

Computer Science
Information Sciences

Keywords

Text articles, automated tagging, tags, unsupervised approach, Recommendation Engine, Chinese segmentation, corpus