Enhancement of compound word extraction in Malay sentences using modified linguistics approaches / Zamri Abu Bakar

Abu Bakar, Zamri (2023) Enhancement of compound word extraction in Malay sentences using modified linguistics approaches / Zamri Abu Bakar. PhD thesis, Universiti Teknologi MARA (UiTM).

Abstract

A Malay compound word is a word form created when two or more words are combined into a single syntactic unit that carries a specific meaning. Extracting compound words is significant for downstream research such as text summarization, grammar checking, sentiment analysis and machine translation. The aim of this study is to propose a new extraction technique using linguistic approaches that combines many features and rules. Many research efforts have proposed extracting compound words using linguistic approaches. However, the results of these approaches still leave room for improvement. This study has three objectives: to identify new rules for detecting Malay compound words; to construct an improved compound word extraction technique (algorithm) for Malay sentences that combines many rules using linguistic approaches; and to evaluate the accuracy of the proposed technique using the standard evaluation measures of Recall, Precision and F-Measure. To achieve these objectives, this research explores a linguistic method for extracting compound words from a standard Malay corpus. A Malay news dataset was used for the extraction. Because the results of existing extraction can be compromised, an improvement in the effectiveness of compound word extraction is needed. Thus, this study proposed a modification of the linguistic approach to enhance compound word extraction. Several preprocessing steps were involved, including normalization, tokenization, stemming and tagging. Finally, this study described several rule-based approaches and modified the rules to capture the most relevant relation between the first word and the second word of a candidate compound, which helps address the problems identified above.
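The evaluation named in the abstract (Recall, Precision and F-Measure) can be illustrated with a minimal sketch. It assumes that extracted compound words are compared against a gold-standard list by exact string match. The function name `evaluate_extraction` and the sample word lists are hypothetical and are not taken from the thesis.

```python
def evaluate_extraction(extracted, gold):
    """Return (precision, recall, f_measure) for extracted compound words.

    Assumption: an extracted item counts as correct only if it exactly
    matches an entry in the gold-standard set.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)

    # Precision: share of extracted items that are real compound words.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: share of real compound words that were extracted.
    recall = true_positives / len(gold) if gold else 0.0

    # F-Measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative data only (not from the thesis dataset).
gold = {"kereta api", "air mata", "jalan raya", "kapal terbang"}
extracted = {"kereta api", "air mata", "jalan raya", "makan nasi", "pergi ke"}

precision, recall, f_measure = evaluate_extraction(extracted, gold)
print(f"Precision={precision:.2f} Recall={recall:.2f} F-Measure={f_measure:.2f}")
```

In this toy example, three of the five extracted pairs are in the gold list, so precision is 3/5 and recall is 3/4.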

Metadata

Item Type: Thesis (PhD)
Creators: Abu Bakar, Zamri (ID Num.: 2014858228)
Contributors: Thesis advisor: Ismail, Normaly Kamal (Email / ID Num.: UNSPECIFIED)
Subjects: P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania
Divisions: Universiti Teknologi MARA, Shah Alam > College of Computing, Informatics and Mathematics
Programme: Doctor of Philosophy (Computer Science)
Keywords: Malay language, linguistic, compound word
Date: 2023
URI: https://ir.uitm.edu.my/id/eprint/88705

Download

88705.pdf (Text, 195kB)

ID Number

88705