Keyword indexing using inverted file on hansard documents / Rosnawati Abdul Kudus

Abdul Kudus, Rosnawati (2008) Keyword indexing using inverted file on hansard documents / Rosnawati Abdul Kudus. UNSPECIFIED. UNSPECIFIED. (Unpublished)

Abstract

Information retrieval is the first step in developing retrieval systems for collections of text documents. The inverted file is the most popular and effective index structure for searching and retrieval (Zobel and Moffat, 2006). This project explores the potential and limitations of a prototype text search engine that uses inverted files on Malaysia Hansard documents. The Malaysia Hansard is an official verbatim report of proceedings and debates in parliament, documented in the Malay language and maintained by the House of Parliament. These documents are categorized into House of Commons and House of Lords. Currently, searching and retrieving information from Hansard documents is done manually. This process is tedious, very time consuming and inefficient. A text search engine prototype using an inverted file can speed up searching and retrieving information from Hansard documents. The objectives of this study are to develop a text search engine prototype for Malaysia Hansard documents and to evaluate the prototype on the speech text of seven speakers. The scope of the research is limited to searching and retrieving documents with queries of up to two words in the Malay language. The methodology includes: a preliminary study of text search engine models and identification of similar studies; analysis of indexing techniques; definition of the data structures for the inverted file, which include hash table, linked lists, vector, array and quick sort; collection and preprocessing of Hansard documents; design and development of the prototype on the Java platform; testing to evaluate the accuracy of the prototype; and analysis of the findings. In the experiment conducted, the accuracy of keyword search through the prototype, verified against a manual check, was 100 percent.
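
As a rough illustration of the inverted-file idea described in the abstract, the following Java sketch maps each term to the set of documents that contain it and answers queries of one or two words. It is not the prototype's actual code: the class name InvertedIndexSketch, its methods, and the sample sentences in the usage note are all hypothetical, and it uses the standard HashMap and TreeSet classes in place of the hand-built hash table, linked lists and quick sort that the abstract lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of an inverted file; names and structure are assumptions.
class InvertedIndexSketch {
    // Hash table from each term to the sorted set of document IDs containing it.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Lower-case the text, split it on non-letter characters,
    // and record every resulting term against docId.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token if text starts with a non-letter
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up a query of one or two words. For two words, return only
    // the documents that contain both (an AND of the two postings sets).
    List<Integer> search(String query) {
        String[] terms = query.trim().toLowerCase().split("\\s+");
        if (terms[0].isEmpty() || terms.length > 2) {
            throw new IllegalArgumentException("query must contain one or two words");
        }
        TreeSet<Integer> empty = new TreeSet<>();
        // Copy the first postings set so the index itself is never modified.
        TreeSet<Integer> result = new TreeSet<>(postings.getOrDefault(terms[0], empty));
        if (terms.length == 2) {
            result.retainAll(postings.getOrDefault(terms[1], empty));
        }
        return new ArrayList<>(result);
    }
}
```

For example, after indexing the invented sentences "Dewan Rakyat bersidang pagi ini" as document 1 and "Dewan Negara bersidang petang ini" as document 2, a search for "dewan" would return both document IDs, while "dewan rakyat" would return only document 1. Treating a two-word query as an AND is one possible reading of the stated scope; the abstract does not say how the prototype combines the two words.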

Metadata

Item Type: Monograph (UNSPECIFIED)
Creators: Abdul Kudus, Rosnawati (Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > Q Science (General)
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Bachelor Of Computer Science (HONS)
Keywords: Developing, retrieval system, prototype
Date: 2008
URI: https://ir.uitm.edu.my/id/eprint/98228

ID Number

98228
