Abstract
A Malay compound word is formed when two or more words are combined into a single syntactic unit that carries a specific meaning. The extraction of compound words is therefore significant for downstream research such as text summarization, grammar checking, sentiment analysis and machine translation. The aim of this study is to propose a new extraction technique using linguistic approaches that combines many features and rules. Many techniques have been proposed for extracting compound words using linguistic approaches; however, their results still leave room for improvement. This study has three objectives: to identify new rules for detecting Malay compound words; to construct an improved compound word extraction technique (algorithm) that combines many rules for Malay sentences using linguistic approaches; and to evaluate the accuracy of the proposed technique using the standard evaluation measures of Recall, Precision and F-Measure. To achieve these objectives, this research explores a linguistic method for extracting compound words from a standard Malay corpus, using a Malay news dataset. Because the results of compound word extraction can be compromised, its effectiveness needs to be improved. This study therefore proposes a modification of the linguistic approach to enhance compound word extraction. Several preprocessing steps are involved, including normalization, tokenization, stemming and tagging. Finally, this study describes several rule-based approaches and modifies the rules to capture the most relevant relation between the first word and the second word, in order to address these problems.
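The evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the thesis's own evaluation code: the function name `evaluate_extraction` and the two word lists are hypothetical, and the sketch assumes the standard set-based definitions of Precision, Recall and F-Measure (the balanced F1 variant).

```python
def evaluate_extraction(extracted, gold):
    """Return (precision, recall, f_measure) for a list of extracted
    compound words compared against a gold-standard list.

    Assumes exact string matching and the balanced F-measure (F1).
    """
    extracted_set = set(extracted)
    gold_set = set(gold)

    # Compound words that were extracted and are also in the gold standard.
    true_positives = len(extracted_set & gold_set)

    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)

    return precision, recall, f_measure


# Hypothetical illustration data, not taken from the thesis dataset.
gold = ["kereta api", "air mata", "kapal terbang", "guru besar"]
extracted = ["kereta api", "air mata", "makan nasi"]

precision, recall, f_measure = evaluate_extraction(extracted, gold)
print(f"Precision={precision:.3f} Recall={recall:.3f} F-Measure={f_measure:.3f}")
```

In this toy case two of the three extracted candidates appear in the gold list of four, so Precision is 2/3 and Recall is 2/4; the F-Measure is their harmonic mean.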
Metadata
Item Type: | Thesis (PhD) |
---|---|
Creators: | Abu Bakar, Zamri (ID Num. 2014858228) |
Contributors: | Thesis advisor: Ismail, Normaly Kamal (Email / ID Num. UNSPECIFIED) |
Subjects: | P Language and Literature > PL Languages and literatures of Eastern Asia, Africa, Oceania |
Divisions: | Universiti Teknologi MARA, Shah Alam > College of Computing, Informatics and Mathematics |
Programme: | Doctor of Philosophy (Computer Science) |
Keywords: | Malay language, linguistic, compound word |
Date: | 2023 |
URI: | https://ir.uitm.edu.my/id/eprint/88705 |