Incorporating stemming algorithm in the Malay information retrieval that employs Thesaurus aproach / Mohd Rosmadi Mokhtar

Mokhtar, Mohd Rosmadi (2001) Incorporating stemming algorithm in the Malay information retrieval that employs Thesaurus aproach / Mohd Rosmadi Mokhtar. Degree thesis, Universiti Teknologi MARA (UiTM).

Abstract

This project incorporates the ROA stemming algorithm into the thesaurus approach by Rapizal, to find out whether combining stemming with a thesaurus improves retrieval effectiveness and efficiency. Advances in information technology have made it possible for a wide range of text-based information to be searched and retrieved online, locally or from remote hosts anywhere in the world, and this popularity is due to technology that is advancing rapidly from day to day. Malay has many word variants that share the same meaning. To overcome this word-variant problem, computational techniques have been introduced that transform both the user's search words and the database words into a single canonical form. These are known as conflation methods. One well-known conflation method is the stemming algorithm, which is used to identify morphological variants. Stemming algorithms are language dependent. They have proven successful at reducing words with the same stem to a common form, as evidenced by the work of many researchers. Unfortunately, this conflation method is unable to conflate different words that possess the same meaning. Such words can only be conflated by a thesaurus, which can handle hierarchic, synonymic, and morphological relationships. Creating a thesaurus for a given subject requires extensive manual work by highly skilled people. Nevertheless, to solve this problem, another language-dependent conflation method, the thesaurus, is used, since it can represent all types of relationship that exist between words. An information retrieval thesaurus typically contains a list of terms, where a term is either a single word or a phrase, together with the relationships between them, to assist in coordinating indexing and retrieval.
The study found that incorporating the stemming algorithm with the thesaurus successfully increases the number of retrieved and relevant documents for Malay query words, but reduces efficiency.
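To make the idea of combining the two conflation methods concrete, the following is a minimal Python sketch. It is not the ROA algorithm or Rapizal's thesaurus: the affix lists, the root-word list, the thesaurus entries, and the sample documents are all small illustrative assumptions, and the stemmer is a naive affix stripper that ignores most Malay morphological rules.

```python
# Toy data: every list below is an illustrative assumption, not the thesis's data.
ROOTS = {"ajar", "baca", "guru", "cikgu", "makan"}
PREFIXES = ("meng", "peng", "mem", "pel", "ber", "di", "me", "pe")
SUFFIXES = ("kan", "an", "i")
THESAURUS = {"guru": {"cikgu"}, "cikgu": {"guru"}}  # hypothetical synonym links
DOCS = {
    1: "guru membaca buku",
    2: "cikgu mengajar pelajar",
    3: "makanan di kantin",
}


def stem(word):
    """Naive affix stripping: return the first candidate found in ROOTS."""
    word = word.lower()
    if word in ROOTS:
        return word
    # Candidates with one prefix removed.
    prefix_cuts = [word[len(p):] for p in PREFIXES if word.startswith(p)]
    candidates = list(prefix_cuts)
    # Candidates with a suffix removed, with or without a prefix removed.
    for base in [word] + prefix_cuts:
        for s in SUFFIXES:
            if base.endswith(s):
                candidates.append(base[:-len(s)])
    for candidate in candidates:
        if candidate in ROOTS:
            return candidate
    return word  # unknown word: leave it unchanged


def expand_query(terms):
    """Stem each query term, then add thesaurus entries for the stem."""
    expanded = set()
    for term in terms:
        root = stem(term)
        expanded.add(root)
        expanded |= THESAURUS.get(root, set())
    return expanded


def search(query_terms, documents):
    """Return ids of documents sharing at least one conflated term with the query."""
    wanted = expand_query(query_terms)
    return [
        doc_id
        for doc_id, text in documents.items()
        if wanted & {stem(w) for w in text.split()}
    ]


print(search(["guru"], DOCS))    # [1, 2]: document 2 matches only via the thesaurus
print(search(["bacaan"], DOCS))  # [1]: "bacaan" and "membaca" both stem to "baca"
```

In this sketch, stemming alone matches morphological variants, while the thesaurus lookup adds documents that use a different word with the same meaning. Every query and document term also passes through extra stemming and lookup steps, which is one plausible reason a combined approach can retrieve more while costing more processing.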

Metadata

Item Type: Thesis (Degree)
Creators: Mokhtar, Mohd Rosmadi (Email / ID Num.: 99329039)
Contributors: Thesis advisor: Abu Bakar, Zainab (Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Analysis
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Bachelor of Science
Keywords: Stemming algorithm, conflation methods, coordinating indexing and retrieval
Date: 2001
URI: https://ir.uitm.edu.my/id/eprint/98015

