Abstract
This research applies document segmentation, in which a document is separated into several parts. Segmentation is typically used where document retrieval is important, because the content of a document otherwise appears as a single undivided block. Later in the development of the retrieval system, the segments are used for indexing. Letter documents have their own format, which consists of many parts. A prototype has been developed to support segmentation and content-based processing of letter documents. The documents are divided into smaller, recognizable labelled parts that are flexible for managing, editing, and extracting. The aim of this thesis is to apply the official letter standard in the system and to develop an algorithm that segments letter documents and converts them to XML documents. The prototype was built with Visual Basic 6.0. Moreover, information retrieval makes retrieving documents or collections of data from storage media more efficient, effective, relevant, faster, and more reliable than before. The indexing technique used may influence the effectiveness of retrieval itself. The extension components within the indexing structure may also influence the performance of the retrieval process. This research develops a prototype indexing algorithm that takes the tag weighting of XML documents into account, and tests the indexer on existing documents. To retrieve documents efficiently, an appropriate index structure or algorithm that includes structural information must be used. The inverted file method was used to develop the indexing algorithm for FTMSK official letters. Retrieval of relevant documents using the algorithm was successfully achieved, which shows that the prototype can increase the relevancy of document retrieval.
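The idea of an inverted file with tag weighting can be illustrated with a minimal sketch. It is written in Python rather than the Visual Basic 6.0 of the prototype, and it is not the thesis algorithm itself: the tag names (`sender`, `recipient`, `subject`, `body`), the weight values, and the function names `build_index` and `search` are all assumptions chosen for illustration. Each term maps to the documents that contain it, and a term found inside a more heavily weighted tag contributes more to that document's score.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical tag weights: illustrative assumptions only,
# not the weights used in the original research.
TAG_WEIGHTS = {"subject": 3.0, "recipient": 2.0, "sender": 2.0, "body": 1.0}
DEFAULT_WEIGHT = 1.0


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted file: term -> {doc_id: accumulated tag weight}."""
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        for element in root.iter():
            # Only the text directly inside each element is indexed;
            # tail text after child elements is ignored in this sketch.
            weight = TAG_WEIGHTS.get(element.tag, DEFAULT_WEIGHT)
            for term in tokenize(element.text or ""):
                index[term][doc_id] += weight
    return index


def search(index, query):
    """Rank documents by the summed weights of the matching query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        for doc_id, weight in index.get(term, {}).items():
            scores[doc_id] += weight
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Two made-up segmented letters, already converted to XML.
letters = {
    "letter1": (
        "<letter><sender>Faculty Office</sender>"
        "<recipient>Library</recipient>"
        "<subject>Meeting schedule</subject>"
        "<body>The meeting is on Monday.</body></letter>"
    ),
    "letter2": (
        "<letter><sender>Library</sender>"
        "<recipient>Faculty Office</recipient>"
        "<subject>Book order</subject>"
        "<body>Please confirm the meeting room and the book order.</body></letter>"
    ),
}

index = build_index(letters)
print(search(index, "meeting"))
# [('letter1', 4.0), ('letter2', 1.0)]
```

In this example, `letter1` ranks first because "meeting" occurs in its subject (weight 3.0) as well as in its body (weight 1.0), while `letter2` mentions it only in the body. This is the sense in which structural information in the index can affect the relevancy of retrieval.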
Metadata
Item Type: | Research Reports |
---|---|
Creators: | Abd Rahman, Hayati (Email / ID Num.: UNSPECIFIED) |
Contributors: | Advisor: Ahmad, Adnan (PM. Dr.) (Email / ID Num.: UNSPECIFIED) |
Subjects: | Z Bibliography. Library Science. Information Resources > ZA Information resources (General) > Research. Information retrieval. Information behavior. Information literacy |
Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
Programme: | Master of Science |
Keywords: | XML, Document, Letter |
Date: | February 2006 |
URI: | https://ir.uitm.edu.my/id/eprint/37522 |
Download
37522.PDF (715kB)