Study of stemming algorithm for Malay words which begin with alphabets 'M' / Mohd Zawawi Mohd Yunus

Mohd Yunus, Mohd Zawawi (2000) Study of stemming algorithm for Malay words which begin with alphabets 'M' / Mohd Zawawi Mohd Yunus. Degree thesis, Universiti Teknologi MARA (UiTM).

Abstract

This research studies a stemming algorithm for Malay words that begin with the letter 'M'. It uses a Malay stemming approach called Rules-Application-Order (RAO). The performance of the algorithm is tested on a collection of 1066 words starting with the letter 'M', extracted from 6236 Malay Quran documents. The study also uses 24 different combinations of Malay affixes, consisting of prefixes, prefix-suffix pairs, suffixes and infixes. The results are obtained from experiments that use the four rules and their combinations. The types of errors found in the stemming algorithm are overstemming, understemming, spelling exceptions and unstemmed words. These problems are addressed through five experiments: analysing the existing algorithm, making corrections in the file, adding rules, correcting the stemming algorithm, and using two combination rules. The results of the experiments show that the algorithm successfully stemmed all Malay words beginning with the letter 'M' that were extracted from the Quran documents.
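To illustrate what applying affix rules in a fixed order can look like, here is a minimal sketch. It is not the thesis's RAO implementation. The rule list `RULES`, the small root lexicon `ROOTS`, and the function name `stem` are all assumptions made for this example. The lexicon check is also an assumption, since the abstract does not say how candidate stems are validated.

```python
# Hypothetical root lexicon; a real system would use a full Malay dictionary.
ROOTS = {"makan", "baca", "beri", "minum", "masuk"}

# Hypothetical (prefix, suffix) rules, tried in this order:
# prefix-suffix pairs first, then prefixes alone, then suffixes alone.
RULES = [
    ("mem", "kan"),
    ("mem", ""),
    ("me", ""),
    ("", "kan"),
    ("", "an"),
]


def stem(word: str) -> str:
    """Return the first lexicon root produced by the ordered rules.

    If the word is already a root, or no rule yields a known root,
    the word is returned unchanged (left unstemmed).
    """
    if word in ROOTS:
        return word
    for prefix, suffix in RULES:
        if word.startswith(prefix) and word.endswith(suffix):
            candidate = word[len(prefix):len(word) - len(suffix)]
            # Accept a candidate only if it is a known root; this guards
            # against overstemming, e.g. stripping "-an" from a root.
            if candidate in ROOTS:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["memberikan", "membaca", "makanan", "makan", "masjid"]:
        print(w, "->", stem(w))
```

In this sketch the order of `RULES` matters: trying the prefix-suffix pair before the lone prefix lets "memberikan" reduce to "beri" in one step, while a word that matches no rule, such as "masjid", is left unstemmed.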

Metadata

Item Type: Thesis (Degree)
Creators: Mohd Yunus, Mohd Zawawi (Email / ID Num.: 98401760)
Contributors: Thesis advisor: Abu Bakar, Zainab (Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Analysis
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Bachelor of Science
Keywords: Stemming algorithm, Rules-Application-Order (RAO), Malay Quran documents
Date: 2000
URI: https://ir.uitm.edu.my/id/eprint/98081

Download

98081.pdf (Text, 123kB)

Physical Copy

Physical status and holdings:
Item Status:
Processing

ID Number

98081
