The study of stemming algorithm on Malay words that begin with alphabets P, Q, Y, and Z from the translated Al-Quran / Suriani Mat

Mat, Suriani (2001) The study of stemming algorithm on Malay words that begin with alphabets P, Q, Y, and Z from the translated Al-Quran / Suriani Mat. Degree thesis, Universiti Teknologi MARA (UiTM).

Abstract

This thesis concerns a retrieval system for Malay-language documents. The study uses a stemming algorithm, Malay translations of the Quran, and root-word dictionaries. The performance of a Malay stemming algorithm is tested on words beginning with the letters 'p', 'q', 'y' and 'z' in five experiments. The first experiment uses the original data collection. In the second experiment, new words are added to the dictionary and the total values for 'i', 'm', 'p', 'q', 'y' and 'z' are modified in the header file "dcvarnew.h"; in addition, affix rule formats are added to the file "rule.txt" and misspelled words are corrected. In the third experiment, the positions of the rules in "rule.txt" are changed. In the fourth experiment, words that have more than one root, words in old spelling and spoken words are deleted from the dictionary. After this modification, the total values for 'k', 'm', 'n' and 'p' in "dcvarnew.h" are corrected again, and new code is added to the module 'ubahejaan'. In the fifth experiment, the spoken word is deleted from the dictionary and the total value for 'p' in "dcvarnew.h" is corrected; an alternative rule is then applied to resolve the words pengawal, pengawalan and perangan. The objective of the project is achieved when the best order of rules for stemming words that begin with 'p', 'q', 'y' and 'z' is found. This involves using two combinations together: 1234 as the primary combination and 3124 as the secondary. All words are first stemmed with the 1234 combination; if the program finds that a word cannot be stemmed correctly, it switches to the secondary combination, 3124. These experiments can serve as a benchmark for future research on the Malay language.
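
The primary/secondary fallback described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say what the rule groups 1 to 4 contain, so the four affix rules, the tiny root dictionary and the function names below are all hypothetical stand-ins, not the contents of "rule.txt" or the thesis's actual program. The sketch only shows the control flow: try the rules in the order 1234, and if no dictionary root is reached, restart from the original word with the order 3124.

```python
from typing import Callable, Dict, Optional, Set, Tuple

Rule = Callable[[str], Optional[str]]


def strip_prefix(prefix: str) -> Rule:
    """Build a rule that removes `prefix`, leaving at least two letters."""
    def rule(word: str) -> Optional[str]:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            return word[len(prefix):]
        return None
    return rule


def strip_suffix(suffix: str) -> Rule:
    """Build a rule that removes `suffix`, leaving at least two letters."""
    def rule(word: str) -> Optional[str]:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[:-len(suffix)]
        return None
    return rule


# ASSUMPTION: hypothetical rule groups; the thesis's real rules are not given.
RULES: Dict[str, Rule] = {
    "1": strip_prefix("peng"),
    "2": strip_prefix("pe"),
    "3": strip_suffix("i"),
    "4": strip_suffix("an"),
}

# ASSUMPTION: a toy root dictionary for illustration only.
ROOTS: Set[str] = {"lari", "perang"}


def stem_with_order(word: str, order: str, roots: Set[str]) -> Optional[str]:
    """Apply the rules in the given order; return the first form found in roots."""
    current = word
    if current in roots:
        return current
    for rule_id in order:
        stripped = RULES[rule_id](current)
        if stripped is None:
            continue  # rule does not match the current form
        current = stripped
        if current in roots:
            return current
    return None


def stem(word: str, roots: Set[str], primary: str = "1234",
         secondary: str = "3124") -> Optional[Tuple[str, str]]:
    """Try the primary order first, then fall back to the secondary order."""
    for order in (primary, secondary):
        root = stem_with_order(word, order, roots)
        if root is not None:
            return root, order
    return None


print(stem("pelari", ROOTS))   # ('lari', '1234')
print(stem("perangi", ROOTS))  # ('perang', '3124')
```

With these toy rules, "pelari" is resolved by the primary order, while "perangi" loses part of its root when the prefix rule runs first and is only resolved after the switch to 3124, where the suffix rule runs before the prefix rules.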

Metadata

Item Type: Thesis (Degree)
Creators: Mat, Suriani (Email / ID Num.: UNSPECIFIED)
Contributors: Thesis advisor: Abu Bakar, Zainab (Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Analysis
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Bachelor of Science
Keywords: Malay language documents retrieval system, affixes rule format, secondary combination
Date: 2001
URI: https://ir.uitm.edu.my/id/eprint/98195

