Abstract
Malaysia’s island tourism continues to grow, as evidenced by the extensive reviews of destinations such as Langkawi, Perhentian, and Pangkor on platforms like YouTube. YouTube was selected as the primary data source for this study because its large repository of user-generated video reviews and comments offers valuable insight into tourist perspectives and satisfaction for sentiment analysis (SA). However, most existing SA research in the tourism sector applies lexicon-based or machine learning (ML) techniques, which frequently overlook the range of contextual meanings found in text reviews, such as sarcasm, local idioms, or mixed-language expressions. This research addressed three main challenges in tourism-related SA: the linguistic complexity of online reviews, the limited capacity to detect cultural and contextual subtleties, and the lack of labelled data for supervised learning. To bridge this gap, this research developed a hybrid SA approach that integrates Descriptive-Semantic Analysis (DSA) techniques with both lexicon-based and ML approaches, namely the Valence Aware Dictionary and Sentiment Reasoner (VADER) and the Support Vector Machine (SVM). For the DSA implementation, significant words were identified using Term Frequency-Inverse Document Frequency (TF-IDF), while dominant topics and thematic insights were extracted using Latent Dirichlet Allocation (LDA). The island reviews were collected through the YouTube Data API and then pre-processed through steps such as translation, cleaning, and normalisation. Findings indicate that the hybrid SA approach achieved accuracy rates of 98.15% for Langkawi, 98.5% for Perhentian, and 97.19% for Pangkor. Moreover, model validation using the Area Under the Curve (AUC) metric showed strong performance. Perhentian had a validation accuracy of 100% and a testing accuracy of 98.84%. Langkawi followed with 99.36% validation and 98.72% testing accuracy, while Pangkor recorded 95.77% validation and 97.22% testing accuracy.
Overall, this approach successfully bridged the gap between descriptive and semantic insights, overcoming limitations of standalone SA techniques such as difficulty in accurately interpreting user content and reliance on basic techniques that overlook deeper meanings. Future research could broaden the data sources by incorporating platforms such as TripAdvisor, Google Reviews, Facebook, and X (formerly Twitter) to enhance accessibility and diversify sentiment representation. Additionally, subsequent studies could explore deep learning models such as Bidirectional Encoder Representations from Transformers (BERT) and Long Short-Term Memory (LSTM) to further improve sentiment classification by capturing deeper contextual relationships.
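As a rough illustration of two building blocks named in the abstract, the sketch below shows lexicon-based pseudo-labelling of unlabelled comments and TF-IDF term weighting. It is not the thesis implementation: `TOY_LEXICON`, `tokenize`, `lexicon_label`, `tf_idf`, and the sample comments are hypothetical, the toy lexicon merely stands in for a real sentiment lexicon such as VADER, and the weighting uses the textbook form tf × log(N / df), which may differ from the variant used in the study.

```python
import math
from collections import Counter

# Hypothetical toy lexicon standing in for a real one such as VADER.
# The scores are illustrative only and are not taken from VADER.
TOY_LEXICON = {
    "beautiful": 1.0, "clean": 0.5, "love": 1.0,
    "dirty": -1.0, "crowded": -0.5, "expensive": -0.5,
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (tok.strip(".,!?").lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def lexicon_label(text):
    """Assign a pseudo-label from the summed lexicon scores of the tokens."""
    score = sum(TOY_LEXICON.get(tok, 0.0) for tok in tokenize(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append({
            term: (count / len(toks)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical sample comments, not drawn from the study's data.
comments = [
    "The beach is beautiful and clean!",
    "Too crowded and the ferry is expensive.",
    "We took the ferry to the island.",
]
labels = [lexicon_label(c) for c in comments]
weights = tf_idf(comments)
```

In a hybrid pipeline of the kind the abstract describes, pseudo-labels like these could serve as training targets and the TF-IDF weights as features for a supervised classifier such as an SVM, which is one way to work around the lack of labelled data.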
Metadata
| Item Type: | Thesis (Masters) |
|---|---|
| Creators: | Bidin, Nurain Batrisyia (Email / ID Num.: 2023432318) |
| Contributors: | Advisor: Abu Samah, Khyrina Airin Fariza (Email / ID Num.: UNSPECIFIED) |
| Subjects: | Q Science > QA Mathematics > Instruments and machines; Q Science > QA Mathematics > Instruments and machines > Electronic Computers. Computer Science > Data mining |
| Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
| Programme: | Master of Science (Computer Science) |
| Keywords: | Tourism, YouTube, Platform |
| Date: | February 2026 |
| URI: | https://ir.uitm.edu.my/id/eprint/135666 |
Download
135666_fulltext.pdf (1MB): available under License Dasar Harta Intelek UiTM (Para 6).
declarationform.pdf (321kB): restricted to Repository staff only.
