To study the performance of stemming algorithm on Malay words beginning with the letter "S" / Rohana Jantan

Jantan, Rohana (2000) To study the performance of stemming algorithm on Malay words beginning with the letter "S" / Rohana Jantan. Degree thesis, Universiti Teknologi MARA (UiTM).

Abstract

This thesis studies the performance of a Malay stemming algorithm on words beginning with the letter "S". The test collection is a Malay translation of the Quran. A Malay stemming algorithm known as Rules Application Order (RAO) is applied in the experiments, together with dictionaries of Malay root words and combinations of morphological rules. The algorithm is evaluated on the "S" words by removing different combinations of prefixes. The "S" words, or the stemmed words that result, are checked against the dictionaries; as soon as a word is found, stemming stops and the word is analyzed. In the analysis, the percentages obtained for each prefix combination are compared to find the best combination. The results show that overstemming, understemming and unstemmed words remain a problem: of 411 unique "S" words, 0.73% are overstemmed, 0.73% are understemmed and 2.68% are unstemmed. The algorithm therefore needs to be modified to improve its performance on Malay words.
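The dictionary-checked prefix removal described in the abstract can be illustrated with a minimal sketch. It assumes a toy root dictionary and a hypothetical list of prefix combinations (`se-`, `se-` + `per-`, `se-` + `ber-`). These are illustrative assumptions, not the RAO rule set or the dictionaries used in the thesis. Each combination is tried in order, and stemming stops at the first candidate found in the dictionary.

```python
from __future__ import annotations

# Hypothetical sketch of dictionary-checked prefix removal for Malay
# words beginning with "s". The prefix combinations and the toy root
# dictionary are illustrative assumptions, not the thesis's RAO rules.

ROOT_DICTIONARY = {"sakit", "buah", "empat", "hari", "sama"}

# Each combination lists prefixes removed left to right, e.g. se- + per-.
PREFIX_COMBINATIONS = [
    ("se",),
    ("se", "per"),
    ("se", "ber"),
]


def strip_prefixes(word: str, prefixes: tuple[str, ...]) -> str | None:
    """Remove the prefixes in order; return None if any prefix is absent."""
    for prefix in prefixes:
        if not word.startswith(prefix):
            return None
        word = word[len(prefix):]
    return word


def stem(word: str, dictionary: set[str] = ROOT_DICTIONARY) -> str:
    """Return the first dictionary root reached, else the word unchanged."""
    word = word.lower()
    # A word that is already a root word is not stemmed further.
    if word in dictionary:
        return word
    for combination in PREFIX_COMBINATIONS:
        candidate = strip_prefixes(word, combination)
        # Stemming stops at the first candidate found in the dictionary.
        if candidate and candidate in dictionary:
            return candidate
    # No combination produced a dictionary word: the word stays unstemmed.
    return word
```

With this toy setup, `stem("sebuah")` gives `"buah"` and `stem("seperempat")` gives `"empat"`. `stem("sesuatu")` is returned unchanged because `"suatu"` is not in the dictionary, which is the kind of unstemmed case the abstract counts. For scale, the reported rates correspond to about 3 overstemmed, 3 understemmed and 11 unstemmed words out of 411 (3/411 ≈ 0.73%, 11/411 ≈ 2.68%).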

Metadata

Item Type: Thesis (Degree)
Creators: Jantan, Rohana (Email / ID Num.: UNSPECIFIED)
Subjects: L Education > LA History of education > Malaysia
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Information Management
Programme: Bachelor of Science
Keywords: Malay, algorithm, analysis
Date: 2000
URI: https://ir.uitm.edu.my/id/eprint/98222

Download

98222.PDF (Text, 107kB)

Digital Copy

Digital (fulltext) is available at:

Physical Copy

Physical status and holdings: Item Status: Processing

ID Number: 98222
