Hybridising descriptive-semantic techniques for sentiment analysis of Malaysia’s islands via youtube review.

Bidin, Nurain Batrisyia (2026) Hybridising descriptive-semantic techniques for sentiment analysis of Malaysia’s islands via youtube review. Masters thesis, Universiti Teknologi MARA (UiTM).

Abstract

Malaysia’s island tourism continues to grow, as evidenced by the extensive reviews of destinations such as Langkawi, Perhentian, and Pangkor on platforms like YouTube. YouTube was selected as the primary data source for this study because of its extensive repository of user-generated video reviews and comments, which offer valuable insights into tourist perspectives and satisfaction for sentiment analysis (SA). However, most existing SA research in the tourism sector relies on lexicon-based or machine learning (ML) techniques, which frequently overlook the variety of contextual meanings encountered in text reviews, such as sarcasm, local idioms, and mixed-language expressions. This research addressed three main challenges in tourism-related SA: the linguistic complexity of online reviews, the limited capacity to detect cultural and contextual subtleties, and the lack of labelled data for supervised learning. To bridge this gap, this research developed a hybrid SA approach that integrates Descriptive-Semantic Analysis (DSA) techniques with both lexicon-based and ML approaches, namely the Valence Aware Dictionary and sEntiment Reasoner (VADER) and the Support Vector Machine (SVM). For the DSA implementation, significant words were identified using Term Frequency-Inverse Document Frequency (TF-IDF), while dominant topics and thematic insights were extracted using Latent Dirichlet Allocation (LDA). The island reviews were extracted using the YouTube Data API, followed by pre-processing steps such as translation, cleaning, and normalisation. The findings indicate that the hybrid SA approach achieved accuracy rates of 98.15% for Langkawi, 98.5% for Perhentian, and 97.19% for Pangkor. Moreover, model validation using the Area Under the Curve (AUC) metric showed strong performance. Perhentian had a validation accuracy of 100% and a testing accuracy of 98.84%. Langkawi followed with 99.36% validation and 98.72% testing accuracy, while Pangkor recorded 95.77% validation and 97.22% testing accuracy.
Overall, this approach successfully bridged the gap between descriptive and semantic insights, overcoming the limitations of standalone SA techniques, which struggle to interpret user content accurately and rely on basic techniques that overlook deeper meanings. Future research could broaden the data sources by incorporating platforms such as TripAdvisor, Google Reviews, Facebook, and X (formerly Twitter) to enhance accessibility and diversify sentiment representation. Additionally, subsequent studies could explore deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) and Long Short-Term Memory (LSTM) to further improve sentiment classification by capturing deeper contextual relationships.
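
As a rough illustration of the Term Frequency-Inverse Document Frequency weighting mentioned in the abstract, the sketch below scores words in a few short comments using only the Python standard library. It is not the thesis implementation: the function name `tf_idf`, the whitespace tokenisation, the unsmoothed formula (tf = count / length, idf = ln(N / df)), and the sample comments are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed textbook variant: tf = count / document length,
    idf = ln(N / df). Libraries often apply smoothing instead.
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-cleaned comments (not data from the thesis).
comments = [
    "beach clean water clear",
    "beach crowded ferry late",
    "beach snorkelling amazing water clear",
]
scores = tf_idf(comments)
# Highest-weighted term in each comment.
top_terms = [max(s, key=s.get) for s in scores]
```

In this toy corpus the word "beach" occurs in every comment, so its weight is zero, while a word unique to one comment, such as "clean", receives the highest weight there. This is the sense in which the weighting surfaces the significant words of a review set.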

Metadata

Item Type: Thesis (Masters)
Creators: Bidin, Nurain Batrisyia (Email / ID Num.: 2023432318)
Contributors: Advisor: Abu Samah, Khyrina Airin Fariza (Email / ID Num.: UNSPECIFIED)
Subjects: Q Science > QA Mathematics > Instruments and machines
Q Science > QA Mathematics > Instruments and machines > Electronic Computers. Computer Science > Data mining
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Master of Science (Computer Science)
Keywords: Tourism, YouTube, Platform
Date: February 2026
URI: https://ir.uitm.edu.my/id/eprint/135666

Download

135666_fulltext.pdf (Text, 1MB)
Available under License Dasar Harta Intelek UiTM (Para 6).

declarationform.pdf (Text, 321kB)
Restricted to Repository staff only

ID Number

135666
