To enhance existing Malay stemming algorithm starting with the letter 'D' / Mohd Nazril Hafez Mohd Supandi

Mohd Supandi, Mohd Nazril Hafez (2000) To enhance existing Malay stemming algorithm starting with the letter 'D' / Mohd Nazril Hafez Mohd Supandi. Degree thesis, Universiti Teknologi MARA (UiTM).

Abstract

This thesis concerns a document retrieval system for the Malay language. The study uses a stemming algorithm, a database of translated Quran documents, and electronic root dictionaries. The performance of a Malay stemming algorithm is tested on words beginning with 'd' in four experiments. The first uses the original data collection. The second adds new words to the dictionary and modifies the total values for the 'a', 'k' and 'm' dictionaries in the header file "dcvarnew.h". The third modifies the program by adding affix rule formats to "rule.txt". The fourth adds new code to differentiate between the affix rules "di+an" and "di+kan". The main objective is to minimize unstemmed, understemmed and overstemmed words, spelling exceptions and other problems that occur when words beginning with 'd' are stemmed. The objective is achieved once the best rule order for stemming words beginning with 'd' is found. This involves using two combinations together: 1234 as the primary combination and 2341 as the secondary. Every word is first stemmed with the 1234 combination; if the program finds that a word cannot be stemmed correctly, it switches to the secondary combination, 2341. These experiments can serve as a benchmark for future research on the Malay language. They can also help those interested in particular subject matter from the Al-Quran, since the document retrieval system automatically retrieves all relevant documents in response to users' queries.
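
The two-order fallback described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's program. The abstract does not define what the digits 1 to 4 denote or how the program detects a failed stem, so the sketch assumes that they index four rule groups (prefix, suffix, prefix+suffix pair, and an empty placeholder), that groups strip affixes cumulatively, and that a stem counts as solved only if it appears in a root dictionary. The names ROOTS, RULE_GROUPS, strip_affixes, stem_with_order and stem, and the sample rules and words, are all hypothetical.

```python
# Hypothetical root dictionary; the thesis uses electronic root dictionaries.
ROOTS = {"baca", "makan", "didik", "jalan"}

# Hypothetical rule groups as (prefix, suffix) pairs. What the digits 1-4
# stand for is an assumption; the abstract does not define them.
RULE_GROUPS = {
    1: [("di", "")],                   # prefix rules
    2: [("", "kan"), ("", "an")],      # suffix rules
    3: [("di", "kan"), ("di", "an")],  # pair rules, di+kan tried before di+an
    4: [],                             # placeholder group, left empty here
}

MIN_STEM_LEN = 2  # assumed lower bound on the length of a stem


def strip_affixes(word, prefix, suffix):
    """Return word without prefix and suffix, or None if the rule does not apply."""
    if not (word.startswith(prefix) and word.endswith(suffix)):
        return None
    stem = word[len(prefix):len(word) - len(suffix)]
    return stem if len(stem) >= MIN_STEM_LEN else None


def stem_with_order(word, order):
    """One pass over the rule groups in the given order, stripping cumulatively.

    Returns a dictionary root, or None if the pass does not reach one.
    """
    if word in ROOTS:
        return word
    current = word
    for group in order:
        candidates = []
        for prefix, suffix in RULE_GROUPS[group]:
            stem = strip_affixes(current, prefix, suffix)
            if stem is not None:
                candidates.append(stem)
        # Accept the first candidate that is a known root.
        for stem in candidates:
            if stem in ROOTS:
                return stem
        # Otherwise carry the first stripped form into the next group.
        if candidates:
            current = candidates[0]
    return None


def stem(word, primary=(1, 2, 3, 4), secondary=(2, 3, 4, 1)):
    """Try the primary order first and fall back to the secondary order."""
    for order in (primary, secondary):
        root = stem_with_order(word, order)
        if root is not None:
            return root
    return word  # neither order reached a root: leave the word unstemmed
```

Under these assumptions, "didikan" fails under 1234 because the leading "di" of the root "didik" is stripped as if it were a prefix, while 2341 removes the suffix first and reaches "didik".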

Metadata

Item Type: Thesis (Degree)
Creators: Mohd Supandi, Mohd Nazril Hafez (ID Num.: 98422021)
Contributors: Abu Bakar, Zainab (Thesis advisor; Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > QA Mathematics
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Bachelor of Science
Keywords: Stemming algorithm, data collection, procedure employed
Date: 2000
URI: https://ir.uitm.edu.my/id/eprint/98014

Download

98014.pdf (Text, 125kB)

Physical Copy

Physical status and holdings:
Item Status: Processing

ID Number

98014
