Context enrichment framework for sentiment analysis in handling word ambiguity resolution

Yusof, Nor Nadiah (2024) Context enrichment framework for sentiment analysis in handling word ambiguity resolution. PhD thesis, Universiti Teknologi MARA (UiTM).

Abstract

The rise of social networking platforms has created a surge in opinionated online data. Although the volume of this data poses challenges, analysing it through sentiment analysis presents an opportunity to gain valuable insights. However, ambiguous words pose a challenge for sentiment analysis algorithms, which must identify the correct meaning and polarity of a word within a specific context. Word sense disambiguation is therefore a critical problem in sentiment analysis, owing to the context-dependent nature of sentiment texts and the word ambiguity and negation they contain. An enhancement of a generic sentiment analysis framework is introduced to address these issues, ensuring accurate text classification in predicting the sentiment orientation. The proposed framework, called the context enrichment framework (CEF), integrates enriched semantic information and a word ambiguity resolution strategy into sentiment analysis, and aims to classify sentiment texts by incorporating both context and semantic information. It comprises two novel components: a word ambiguity resolution (WAR) model and a negation analysis model (NAM). The movie review dataset, considered a baseline in sentiment analysis, is used for the experiments, and the SemCor 3.0 dataset is used for validation. Word prior-polarity extraction relies on dictionary lexicons with predefined sentiment scores. This process is enhanced with a hybrid lexicon that combines SentiWordNet and So-Cal, exploiting the strengths of both resources. The WAR model identifies ambiguous words and generates context terms with a window size of three to resolve their ambiguity. The similarity between ambiguous words and their context words is evaluated using cosine similarity. A rule-based method is introduced to select context words based on their similarity.
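The context-window and cosine-similarity steps described above can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not say whether the window of three counts tokens on each side or in total (each side is assumed here), nor which word representations or selection rule are used. The toy vectors in `TOY_VECTORS`, the `threshold` of 0.5, and all function names are hypothetical.

```python
import math

# Hypothetical 2-d word vectors, for illustration only.
TOY_VECTORS = {"cold": [1.0, 0.0], "plot": [0.8, 0.6], "acting": [0.0, 1.0]}

def context_window(tokens, index, size=3):
    """Return up to `size` tokens on each side of tokens[index] (assumed reading of 'window size')."""
    left = tokens[max(0, index - size):index]
    right = tokens[index + 1:index + 1 + size]
    return left + right

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def select_context_words(target_vec, context, vectors, threshold=0.5):
    """Keep context words whose similarity to the target meets an assumed threshold."""
    selected = []
    for word in context:
        vec = vectors.get(word)
        if vec is None:  # skip words without a vector
            continue
        sim = cosine_similarity(target_vec, vec)
        if sim >= threshold:
            selected.append((word, sim))
    return selected

tokens = ["the", "plot", "was", "cold", "and", "the", "acting", "flat"]
target_index = tokens.index("cold")
context = context_window(tokens, target_index, size=3)
selected = select_context_words(TOY_VECTORS["cold"], context, TOY_VECTORS)
print(context)
print([word for word, _ in selected])
```

In this toy example only "plot" passes the threshold; a real system would draw its vectors from a trained embedding or a lexical resource rather than from hand-written values.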
The resolution of word ambiguity is addressed by introducing a formula that adjusts the polarity of ambiguous words. WAR's performance is evaluated by comparing it with the baseline model and by measuring accuracy through the summation approach, which is based on aligning word polarity values with the document's assigned class label. The NAM is applied within the negation scope to handle negation accurately. The model incorporates a proposed rule-based negation method that considers a threshold condition on the negative words in the documents. NAM is evaluated by comparing its performance with the baseline rule-based negation method, with accuracy assessed using the summation approach. Finally, the assessed WAR and NAM models, along with the evaluated word polarity extraction from dictionary lexicons, are integrated into the proposed CEF. Machine learning algorithms are deployed to perform sentiment classification. CEF is evaluated through a series of experiments, which show that it is more effective than the baseline approach: it achieves an accuracy of 75%, compared with the baseline's 67.75%. In validation, CEF achieved an accuracy of 78.95%, compared with the baseline's 63.16%. The results show that CEF, through the implementation of WAR and NAM, effectively addresses word ambiguity and enhances the performance of sentiment analysis. Future work could incorporate more diverse datasets, investigate additional techniques for resolving word ambiguity, and explore alternative dictionary lexicons. Classification performance could be further improved by optimising machine learning parameters and exploring deep learning approaches.
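One plausible reading of the summation approach is sketched below: word polarities are summed per document, the sign of the sum is taken as the predicted class, and accuracy is the share of documents whose prediction matches the assigned label. This is an interpretation, not the thesis's procedure. The lexicon values, the cue list in `NEGATORS`, the tie rule at zero, and the one-token flip are all hypothetical; the flip is a generic baseline-style negation rule and does not reproduce NAM or its threshold condition.

```python
# Hypothetical cue list and lexicon scores, for illustration only.
NEGATORS = {"not", "no", "never"}
TOY_LEXICON = {"good": 0.6, "great": 0.8, "bad": -0.7, "boring": -0.5}

def document_score(tokens, lexicon, scope=1):
    """Sum word polarities, flipping the sign of the `scope` tokens after a negation cue."""
    total = 0.0
    flip_remaining = 0
    for token in tokens:
        if token in NEGATORS:
            flip_remaining = scope
            continue
        polarity = lexicon.get(token, 0.0)  # unknown words count as neutral
        if flip_remaining > 0:
            polarity = -polarity
            flip_remaining -= 1
        total += polarity
    return total

def predict_label(score):
    """Map a summed score to a class; ties at zero go to 'positive' (an assumption)."""
    return "positive" if score >= 0 else "negative"

def summation_accuracy(documents, lexicon):
    """Fraction of documents whose predicted class matches the assigned label."""
    correct = sum(
        1 for tokens, label in documents
        if predict_label(document_score(tokens, lexicon)) == label
    )
    return correct / len(documents)

documents = [
    (["a", "great", "movie"], "positive"),
    (["not", "good"], "negative"),
    (["not", "a", "good", "film"], "negative"),  # one-token scope misses "good"
]
accuracy = summation_accuracy(documents, TOY_LEXICON)
print(accuracy)
```

The third document is misclassified because the flip covers only the token directly after "not", which illustrates why the treatment of negation scope matters for a lexicon-based score.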

Metadata

Item Type:	Thesis (PhD)
Creators:	Yusof, Nor Nadiah (Email / ID Num.: UNSPECIFIED)
Contributors:	Thesis advisor: Abdul Rahman, Shuzlina (Email / ID Num.: UNSPECIFIED)
Subjects:	Q Science > Q Science (General) > General. Including nature conservation, geographical distribution; Q Science > QA Mathematics > Computer literacy
Divisions:	Universiti Teknologi MARA, Shah Alam > College of Computing, Informatics and Mathematics
Programme:	Doctor of Philosophy (Computer Science)
Keywords:	Research methodology, Natural language processing, Machine learning approach
Date:	2024
URI:	https://ir.uitm.edu.my/id/eprint/122897