The evaluation of content-oriented XML document retrieval: a case study of FTMSK official letter / Hayati Abdul Rahman

Abdul Rahman, Hayati (2006) The evaluation of content-oriented XML document retrieval: a case study of FTMSK official letter / Hayati Abdul Rahman. [Research Reports] (Unpublished)

Abstract

This research applies document segmentation, the process of separating a document into parts. The term segmentation is usually used in contexts where document retrieval is significant. Segmentation is important because the content of a document otherwise appears as a single block. Later in the development of the retrieval system, the segments are used for indexing. Letter documents have their own format, which consists of many parts. A prototype has been developed to support segmentation of, and content-based access to, letter documents. The documents are divided into smaller, labelled parts that are flexible to manage, edit, and extract. The aim of this work is to apply the official letter standard in the system and to develop an algorithm that segments letter documents and converts them to XML documents. The prototype was developed in Visual Basic 6.0. Moreover, information retrieval makes retrieving documents or collections of data from storage media more efficient, effective, relevant, faster, and more reliable. The indexing technique used may influence the effectiveness of retrieval. Extension components within the indexing structure may also influence the performance of the retrieval process. This research develops a prototype indexing algorithm that takes tag weighting of XML documents into account, and tests the indexer on existing documents. Efficient retrieval requires an appropriate index structure or algorithm that includes structural information. The inverted file method was used to develop the indexing algorithm for FTMSK official letters. Using the algorithm, relevant documents were successfully retrieved, which indicates that the prototype can increase the relevance of document retrieval.
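
The idea of an inverted file with tag weighting can be illustrated with a minimal sketch. It is written in Python for brevity rather than the Visual Basic 6.0 of the prototype, and it is not the algorithm from the report: the tag names (`subject`, `recipient`, `body`), the weight values, and the helper functions `tokenize`, `build_index`, and `search` are all hypothetical. It assumes each letter has already been segmented into a flat XML document.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical tag weights; the report's actual tags and weights are not given here.
TAG_WEIGHTS = {"subject": 3.0, "recipient": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted file: term -> {doc_id: accumulated tag weight}.

    docs maps a document id to an XML string whose root has one child
    element per letter segment.
    """
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root:
            # Unknown tags fall back to a neutral weight of 1.0.
            weight = TAG_WEIGHTS.get(elem.tag, 1.0)
            for term in tokenize(elem.text or ""):
                index[term][doc_id] += weight
    return index


def search(index, query):
    """Return document ids ranked by summed weight of the matching terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))


letters = {
    "a": "<letter><subject>exam schedule</subject>"
         "<body>please attend the meeting</body></letter>",
    "b": "<letter><subject>meeting room</subject>"
         "<body>exam schedule attached</body></letter>",
}
index = build_index(letters)
print(search(index, "exam"))     # letter "a" ranks first: the term is in its subject
print(search(index, "meeting"))  # letter "b" ranks first for the same reason
```

In this sketch a term found in a heavily weighted segment contributes more to the document's score than the same term in the body, which is one simple way structural information can affect ranking.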

Download

Text: 2843.pdf (703kB)

ID Number

2843
