Building a dictionary of Malay language part-of-speech tagged words using Bahasa WordNet and Bahasa Indonesia resources / Mohamed Lubani and Rohana Mahmud

Lubani, Mohamed and Mahmud, Rohana (2015) Building a dictionary of Malay language part-of-speech tagged words using Bahasa WordNet and Bahasa Indonesia resources / Mohamed Lubani and Rohana Mahmud. In: ICOMHAC2015 eproceedings, 16-17 December 2015, Century Helang Hotel, Pulau Langkawi.

Abstract

Assigning grammatical categories to words in natural text is a vital step in natural language processing. Language resources and text-processing tools such as part-of-speech (POS) taggers can be used to assign each word its grammatical category based on its context. Such resources are available for major languages such as English, Spanish and Japanese. However, the lack of such resources for the Malay language makes it very hard to develop new processing tools and to automate processing of the language. In this paper, a Malay POS dictionary is built using Bahasa WordNet, a POS-tagged Indonesian corpus and a monolingual Malay dictionary. The output is a list of 25,778 POS-tagged Malay words, each assigned all of its possible grammatical categories. The proposed process can also serve as a guideline for future improvements.
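
The merging step described in the abstract, where each word ends up with all of its possible grammatical categories, might look roughly like the Python sketch below. This is only an illustration under stated assumptions, not the authors' procedure: the function name `merge_pos_sources`, the three input lists, the tag names and the toy entries are all hypothetical, and the abstract does not say how the sources were actually parsed or normalised.

```python
from collections import defaultdict


def merge_pos_sources(*sources):
    """Merge (word, tag) pairs from several sources into word -> sorted tag list.

    Hypothetical helper: each source is assumed to be an iterable of
    (word, tag) pairs that already share one tag set.
    """
    merged = defaultdict(set)
    for source in sources:
        for word, tag in source:
            # Lower-casing is an assumed normalisation step.
            merged[word.lower()].add(tag)
    # Sort words and tags so the output is deterministic.
    return {word: sorted(tags) for word, tags in sorted(merged.items())}


# Toy, made-up entries standing in for the three resources.
wordnet_entries = [("baru", "ADJ"), ("buku", "NOUN")]
corpus_entries = [("baru", "ADV"), ("buku", "NOUN")]
dictionary_entries = [("Makan", "VERB")]

pos_dictionary = merge_pos_sources(
    wordnet_entries, corpus_entries, dictionary_entries
)

for word, tags in pos_dictionary.items():
    print(word, tags)
```

A set per word keeps duplicate tags from different sources from piling up, while a word tagged differently across sources keeps every category it was given.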

Metadata

Item Type: Conference or Workshop Item (Paper)
Creators:
Lubani, Mohamed (mohamed.lubani@siswa.um.edu.my)
Mahmud, Rohana (rohanamahmud@um.edu.my)
Subjects: L Education > LB Theory and practice of education > Educational technology
L Education > LB Theory and practice of education > Learning. Learning strategies
Divisions: Universiti Teknologi MARA, Kedah > Sg Petani Campus
Event Title: ICOMHAC2015 eproceedings
Event Dates: 16-17 December 2015
Page Range: pp. 443-452
Keywords: Malay language, natural language, part-of-speech tagging
Date: December 2015
URI: https://ir.uitm.edu.my/id/eprint/35721