To improve stemming algorithm on Malay words begin with alphabet B / Norasiah Ismail

Ismail, Norasiah (2001) To improve stemming algorithm on Malay words begin with alphabet B / Norasiah Ismail. Degree thesis, Universiti Teknologi MARA (UiTM).

Abstract

This thesis concerns a Malay-language document retrieval system. The study uses a stemming algorithm, Malay translations of the Quran and root dictionaries. The performance of the Malay stemming algorithm on words beginning with the letter 'b' is tested in five experiments. The first experiment uses the original data collection. In the second experiment, affix rules are added, in rule format, to the file "rule.txt". The third experiment modifies the total value for the V dictionary in the header file "dcvarnew.h". In the fourth experiment, a new word is added to the dictionary and the Malay Quran translation is modified. In the fifth experiment, the total value for the 'a' dictionary in the header file "dcvarnew.h" is modified. The main objective of these experiments is to minimize unstemming, understemming, overstemming, spelling exceptions and other problems that occur when 'b' words are stemmed. The objective is achieved when the best order in which to apply the rules to words beginning with 'b' is found. This involves using two combinations together: prefix-suffix-prefix suffix-infix as the primary combination and prefix suffix-suffix-prefix-infix as the secondary. All words are first stemmed with the prefix-suffix-prefix suffix-infix combination; if the program finds that a word cannot be solved correctly, it switches to the secondary combination, prefix suffix-suffix-prefix-infix. These experiments can serve as a benchmark for future research on the Malay language in finding the best approach to stemming words that begin with the other letters of the alphabet.
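The two-combination idea in the abstract can be illustrated with a minimal sketch. It is not the thesis program: the affix lists, the toy root dictionary and all function names are hypothetical. It also assumes that each combination names an order of rule types (prefix-suffix pair, prefix, suffix, infix, with prefix and suffix swapped in the secondary), and that "cannot be solved correctly" means the result is not found in the root dictionary.

```python
# Toy data, invented for illustration only (not taken from the thesis).
ROOTS = {"baca", "bekal", "lari", "salam", "tunjuk"}
PAIRS = [("ber", "an")]          # prefix-suffix pairs stripped together
PREFIXES = ["ber", "be"]         # longest first
SUFFIXES = ["kan", "an", "i"]
INFIXES = ["el", "em", "er"]
MIN_STEM = 3                     # never strip below this length


def strip_pair(word):
    for pre, suf in PAIRS:
        if word.startswith(pre) and word.endswith(suf):
            if len(word) - len(pre) - len(suf) >= MIN_STEM:
                return word[len(pre):-len(suf)]
    return None


def strip_prefix(word):
    for pre in PREFIXES:
        if word.startswith(pre) and len(word) - len(pre) >= MIN_STEM:
            return word[len(pre):]
    return None


def strip_suffix(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= MIN_STEM:
            return word[:-len(suf)]
    return None


def strip_infix(word):
    # Assumes the infix sits right after the first letter.
    if word[1:3] in INFIXES and len(word) - 2 >= MIN_STEM:
        return word[0] + word[3:]
    return None


RULES = {"prefix-suffix": strip_pair, "prefix": strip_prefix,
         "suffix": strip_suffix, "infix": strip_infix}

# Assumed reading of the two combinations named in the abstract.
PRIMARY = ["prefix-suffix", "prefix", "suffix", "infix"]
SECONDARY = ["prefix-suffix", "suffix", "prefix", "infix"]


def stem_with_order(word, order):
    """Apply rule types in the given order; stop once a root is reached."""
    current = word
    for name in order:
        if current in ROOTS:
            return current
        stripped = RULES[name](current)
        if stripped is not None:
            current = stripped
    return current if current in ROOTS else None


def stem(word):
    """Try the primary order, then fall back to the secondary order."""
    for order in (PRIMARY, SECONDARY):
        root = stem_with_order(word, order)
        if root is not None:
            return root
    return word  # left unstemmed


for w in ["berlari", "bekalan", "bersalaman"]:
    print(w, "->", stem(w))
```

In this toy setup "berlari" is resolved by the primary order, while "bekalan" is overstemmed by it (the prefix rule removes "be" from the root "bekal") and is only resolved after the switch to the secondary order, where the suffix rule runs first.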

Metadata

Item Type: Thesis (Degree)
Creators: Ismail, Norasiah (Email / ID Num.: UNSPECIFIED)
Contributors: Abu Bakar, Zainab (Contribution: Thesis advisor; Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Analysis
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Bachelor of Science
Keywords: Malay Quran translated documents, root dictionaries, stemming algorithm
Date: 2001
URI: https://ir.uitm.edu.my/id/eprint/98076

Download

Text: 98076.pdf (154kB)

Physical Copy

Physical status and holdings:
Item Status: Processing

ID Number

98076
