Abstract
The growing volume of digitized collections makes it difficult to discover relevant documents, so improving this task has become a pressing concern in Information Retrieval. This thesis addresses two problems: the limitations of keyword-based search in document indexing and querying, and the shortage of resources on topic discovery for Malay-language research. It therefore applies a topic discovery algorithm, Latent Dirichlet Allocation, at the indexing stage to build a concept-based search, and uses Malay Hansard documents as a data set representing Malay-language text. The objectives are to identify the highest-frequency words in the Malay Hansard documents using the Word Frequency method, to index the data set based on the words suggested by Latent Dirichlet Allocation, and to develop a retrieval prototype for these documents using concept-based search. In this research, the highest-frequency word found by the Word Frequency method is indexed as the keyword and serves as a baseline representing keyword-based search, whereas the words suggested by Latent Dirichlet Allocation are indexed as a group of related keywords representing concept-based search. As a result, with the concept-based index, the retrieval prototype is able to identify keywords that are related to the search query word as well as the query word itself.
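The contrast between the two indexing schemes described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the toy corpus `DOCS`, the word groups in `TOPICS`, and all function names are hypothetical. In particular, `TOPICS` is hand-written and only stands in for the word groups that Latent Dirichlet Allocation would suggest; no topic model is trained here.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus; the thesis uses Malay Hansard documents instead.
DOCS = {
    "d1": "bajet kewangan cukai bajet negara",
    "d2": "cukai pendapatan cukai rakyat",
    "d3": "sekolah pendidikan guru sekolah",
}

# Hypothetical word groups standing in for topics suggested by LDA.
TOPICS = [
    {"bajet", "cukai", "kewangan"},
    {"sekolah", "pendidikan", "guru"},
]


def top_word(text):
    """Return the highest-frequency word of a document (Word Frequency baseline)."""
    return Counter(text.split()).most_common(1)[0][0]


def build_keyword_index(docs):
    """Baseline: index each document under its single most frequent word."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        index[top_word(text)].add(doc_id)
    return index


def build_concept_index(docs, topics):
    """Index each document under its keyword and every word grouped with it."""
    index = defaultdict(set)
    for word, doc_ids in build_keyword_index(docs).items():
        related = {word}
        for topic in topics:
            if word in topic:
                related |= topic
        for term in related:
            index[term] |= doc_ids
    return index


def search(index, query):
    """Return the sorted ids of documents indexed under the query word."""
    return sorted(index.get(query, set()))


keyword_index = build_keyword_index(DOCS)
concept_index = build_concept_index(DOCS, TOPICS)

print(search(keyword_index, "kewangan"))  # no document has this as its top word
print(search(concept_index, "kewangan"))  # matched through the related word group
```

Under these assumptions, a query for a word that is not any document's top keyword returns nothing from the baseline index, while the grouped index still reaches the documents whose keywords share a group with the query word.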
Metadata
Item Type: | Thesis (Degree) |
---|---|
Creators: | Mohd Fadzil Thani, Nurul Ain |
Subjects: | B Philosophy. Psychology. Religion > B Philosophy (General) |
Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
Programme: | Bachelor Of Computer Science (HONS) |
Keywords: | Digitized, prototype, indexing |
Date: | 2012 |
URI: | https://ir.uitm.edu.my/id/eprint/98294 |
Download
98294.PDF (157kB)