Abstract
A text classifier optimized for short snippets such as tweets is developed to enable bilingual sentiment analysis. The two languages explored are Bahasa Malaysia and English, the two most commonly spoken languages in Malaysia. The classifier is trained and tested on a large multi-domain dataset pre-labelled with “0” and “1”, which represent “positive” and “negative” respectively. The Naïve Bayes machine learning (ML) technique forms the core of the classifier. All data are pre-processed, and once the classifier has been developed, it is run on real-time data: public tweets from 2018 that directly or indirectly mention the three biggest communication service providers (CSPs) in Malaysia, namely Celcom, Maxis and Digi. The results of the analysis are incorporated into a web application built with Bootstrap on top of Python’s Flask, allowing interactive data visualization. An agile methodology is used throughout development to ensure that the project follows the guidelines prepared in the design phase. Functionality testing is also carried out to ensure that no significant error renders the application unusable. In conclusion, the findings show that Naïve Bayes is fairly suitable for natural language processing (NLP) problems. Future work includes extending the corpus to cover different Bahasa Malaysia slang and commonly used short forms, and adding an extra class for texts that are neither “positive” nor “negative”.
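The approach described in the abstract can be illustrated with a minimal sketch of a multinomial Naïve Bayes text classifier. This is not the thesis implementation: the `tokenize` helper, the `NaiveBayes` class, the Laplace smoothing parameter `alpha`, the pre-processing steps (lowercasing, stripping URLs and @mentions) and the four toy training sentences are all assumptions made for illustration. Only the label convention (0 for positive, 1 for negative) comes from the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Label convention taken from the abstract: 0 = positive, 1 = negative.
POSITIVE, NEGATIVE = 0, 1


def tokenize(text):
    """Lowercase, drop URLs and @mentions, and split into word tokens.

    This is an assumed pre-processing step, not the one used in the thesis.
    """
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)  # log likelihood
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical bilingual toy data (English and Bahasa Malaysia).
train_texts = [
    "great coverage and fast internet",
    "servis bagus dan laju",
    "line slow and keeps dropping",
    "internet teruk sangat lambat",
]
train_labels = [POSITIVE, POSITIVE, NEGATIVE, NEGATIVE]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("@provider internet laju dan bagus"))
print(model.predict("line teruk and slow"))
```

Because both languages share one vocabulary in this sketch, a mixed-language tweet is scored the same way as a monolingual one. A real system would need a far larger labelled corpus and more careful handling of slang and short forms, as the abstract notes.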
Metadata
Item Type: | Thesis (Degree) |
---|---|
Creators: | Abdullah Sani, Aidil Amirul Safwan (ID Num. 2016782377) |
Contributors: | Thesis advisor: Abu Samah, Khyrina Airin Fariza (Email / ID Num. unspecified) |
Subjects: | H Social Sciences > HM Sociology > Groups and organizations > Social groups. Group dynamics; H Social Sciences > HM Sociology > Groups and organizations > Social groups. Group dynamics > Social networks > Online social networks > Particular networks, A-Z > Twitter; P Language and Literature > P Philology. Linguistics > Communication. Mass media |
Divisions: | Universiti Teknologi MARA, Melaka > Jasin Campus > Faculty of Computer and Mathematical Sciences |
Keywords: | Bilingual sentiment analysis; Twitter sentiment analysis; Communication service providers |
Date: | 2020 |
URI: | https://ir.uitm.edu.my/id/eprint/31488 |