Evaluation of query reformulation strategies for domain-specific information searches: a case study of the durian fruit domain / Azilawati Azizan

Azizan, Azilawati (2022) Evaluation of query reformulation strategies for domain-specific information searches: a case study of the durian fruit domain / Azilawati Azizan. PhD thesis, Universiti Teknologi MARA (UiTM).

Abstract

Although search engine technologies have made great strides in helping users find information on the Web, search results are only as good as the keywords and phrases that users use in the search query. Hence, search queries need to be precisely formulated. However, users often fail to accurately translate their information needs into correct query words or phrases for a search engine to utilize. This becomes harder when users search for domain-specific information as, in most cases, users are unable to identify the keywords that are appropriate for the domain. As such, the search engine is unable to locate the relevant documents. This causes users to reformulate the query multiple times in the hope of retrieving a more relevant set of search results. To address this issue, many researchers propose the use of query reformulation, query refinement, query expansion, or query disambiguation to intentionally build better queries and retrieve more relevant results. However, most of the strategies employed to tackle this issue, such as the query log, rhetorical structure, thesaurus, WordNet, ontology, and user profiles, require extensive sources, are risky, and are time-consuming. Therefore, more effective and simpler techniques are needed to obtain better search results as well as reduce the need for query reformulation (QR). To that end, this study applied a search engine framework that employs standard methodology in Information Retrieval (IR) to evaluate several reformulation strategies and proposes an operative and effective QR strategy to locate domain-specific information. The fruit domain, specifically durian, was chosen as the case study. An investigation was first conducted to confirm that the issues and the selected domain were still pertinent at the time of the study. Several popular commercial search engines were examined to determine their current search performance in locating domain-specific information on the Web.
A group of users was then selected to conduct a task-based search to examine how users structured their queries to satisfy their search intent. The results indicated that the most popular search engine (Google) achieved an average P@10 score of only 0.463 and a mean average precision (MAP) score of only 0.649 when searching for durian-related information. The results of the task-based search showed that 84.82% of users reformulated their queries, clearly indicating that users do not obtain relevant search results on the first few tries. As such, several QR strategies that may produce better search results were investigated. Nine strategies were examined using features such as query keywords, ontology, the characteristic category of the domain, and the domain name. These features were manipulated using techniques such as ‘generalization’, ‘specification’, and ‘new’. Of the nine strategies examined, three outperformed the baseline. Combining query keywords with ontology significantly surpassed the baseline MAP score by 2.65%. More interestingly, the characteristic category of the domain, which is considerably simpler and easier to use, also outperformed the baseline MAP score by 2.63%. The findings of this study contribute to the field of IR in terms of search engine performance, user behaviour, test collection, and reformulation strategies for searching domain-specific information.
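The abstract reports P@10 and MAP scores without defining them. As a reading aid, the sketch below computes both measures from binary relevance judgments using their standard textbook definitions. It is not the thesis's evaluation code: the function names, the binary (0/1) judgment scale, and the sample rankings are assumptions made for illustration, and the abstract does not state how the thesis normalised average precision.

```python
from typing import Optional, Sequence


def precision_at_k(relevance: Sequence[int], k: int = 10) -> float:
    """Fraction of the top-k ranked results judged relevant (1 = relevant, 0 = not)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Results missing below rank k count as non-relevant, so divide by k.
    return sum(relevance[:k]) / k


def average_precision(relevance: Sequence[int],
                      total_relevant: Optional[int] = None) -> float:
    """Mean of the precision values taken at each rank holding a relevant result.

    total_relevant is the number of relevant documents known for the query.
    If omitted, the count of relevant results in the ranking is used
    (an assumption; the thesis may normalise differently).
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """MAP: the average of per-query average precision over all queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


# Hypothetical judgments for two queries (illustrative only, not thesis data).
query_a = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
query_b = [0, 1, 0, 0, 1, 0, 0, 0, 0, 0]

p10_a = precision_at_k(query_a, k=10)
map_score = mean_average_precision([query_a, query_b])
print(f"P@10 (query A): {p10_a:.3f}")
print(f"MAP over both queries: {map_score:.3f}")
```

Under these definitions, a P@10 of 0.463 means that, on average, fewer than five of the first ten results were judged relevant.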

Metadata

Item Type: Thesis (PhD)
Creators: Azizan, Azilawati (Email / ID Num.: 2012808144)
Contributors: Thesis advisor: Abu Bakar, Zainab (Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Instruments and machines > Electronic Computers. Computer Science > Programming. Rule-based programming. Backtrack programming
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Doctor of Philosophy (Information Technology and Quantitative Sciences)-CS990
Keywords: durian, information, domain
Date: 2022
URI: https://ir.uitm.edu.my/id/eprint/78545

Physical Copy

Physical status and holdings:
Item Status:
Processing

ID Number

78545
