Hierarchical tag-set for rule-based processing of Tamil language

Authors:

Kengatharaiyer Sarveswaran

Department of Computer Science, University of Jaffna, LK

Sinnathamby Mahesan

Department of Computer Science, University of Jaffna, LK

Abstract

Corpora are fundamental tools for Natural Language Processing. Part-of-speech tagging enriches a corpus by annotating each word with its grammatical category. A tag-set used to annotate a corpus should be selected so that it represents the grammatical structure of the respective language. Such tag-sets can be flat or hierarchical in structure. Several efforts have been made to identify a tag-set for the Tamil language. However, existing tag-sets have many shortcomings, including the inability to tag all words, the inability to capture required syntactic information such as divisibility, too many tags in a set, a flat tag structure, and a lack of extensibility. The scholarly works Tolkāppiyam and Naṉṉūl clearly show the grammatical classification of words. This paper proposes a new hierarchical tag-set with 10 labels for the Tamil language, designed with a view to developing a morphological analyser, taking into account the limitations of existing tag-sets and drawing on Tamil grammar. The morphological analyser can then be used to extend the proposed tag-set easily with more grammatical information.
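To illustrate the difference between a flat and a hierarchical tag-set, the following is a minimal sketch in Python. It is not the tag-set proposed in the paper: the labels (`N.COMMON`, `V.FINITE`, and so on), the dotted-path notation, and the helper names `build_tagset`, `is_valid`, and `coarsen` are all assumptions made for illustration only. The sketch shows how a hierarchy lets a tag be extended with finer information, or collapsed to a coarser level, without redefining the whole set.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """One node in a hierarchical tag-set (hypothetical structure)."""
    label: str
    children: dict = field(default_factory=dict)

    def add(self, label):
        # Return the existing child with this label, or create it.
        return self.children.setdefault(label, Tag(label))


def build_tagset(paths):
    """Build a tag tree from dotted paths such as 'N.PROPER'."""
    root = Tag("ROOT")
    for path in paths:
        node = root
        for label in path.split("."):
            node = node.add(label)
    return root


def is_valid(root, tag):
    """Check that every level of a dotted tag exists in the tree."""
    node = root
    for label in tag.split("."):
        if label not in node.children:
            return False
        node = node.children[label]
    return True


def coarsen(tag, depth=1):
    """Truncate a dotted tag to its first `depth` levels."""
    return ".".join(tag.split(".")[:depth])


# Illustrative labels only; these are NOT the 10 labels from the paper.
HYPOTHETICAL_TAGS = ["N.COMMON", "N.PROPER", "V.FINITE", "V.NONFINITE"]
TAGSET = build_tagset(HYPOTHETICAL_TAGS)
```

In this sketch, extending the tag-set amounts to adding a deeper path (for example, a hypothetical `N.PROPER.PLACE`) under an existing node, while `coarsen` recovers the top-level category when only coarse information is needed. A flat tag-set would instead require a new, unrelated label for every such refinement.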

How to Cite: Sarveswaran, K. and Mahesan, S., 2014. Hierarchical tag-set for rule-based processing of Tamil language. International Journal of Multidisciplinary Studies, 1(2), pp.67–74. DOI: http://doi.org/10.4038/ijms.v1i2.53
Published on 31 Dec 2014.
Peer Reviewed
