Does Google translate affect lexicon-based sentiment analysis of Malay social media text? / Vanessa Enjop, Rosanita Adnan, Nursuriati Jamil, Sanizah Ahmad, Zarina Zainol and Siti Arpah Ahmad

Enjop, Vanessa and Adnan, Rosanita and Jamil, Nursuriati and Zainol, Zarina and Ahmad, Siti Arpah (2022) Does Google translate affect lexicon-based sentiment analysis of Malay social media text? Malaysian Journal of Computing (MJoC), 7 (2): 13. pp. 1236-1249. ISSN 2600-8238

Official URL: https://mjoc.uitm.edu.my

Abstract

Many sentiment resources exist for English, but resources are limited for a resource-poor language such as Malay. One approach to improving sentiment analysis is to translate the focus-language text into a resource-rich language such as English using Machine Translation (MT). However, when text is translated from one language into another, sentiment is preserved to varying degrees. The objective of this paper is to assess the effect of MT by Google Translate on sentiment analysis of Malay social media text from the Facebook pages of a caregiver of a person with autism. A total of 3,525 Facebook comments in the Malay language were gathered from May to October 2020. The comments were manually translated into English to create dataset_manual. Google Translate was used to automatically translate the Malay comments into English, creating dataset_auto. The sentiment polarity of each comment was labeled to form a ground truth dataset. A lexicon-based approach was used to extract sentiment from both dataset_manual and dataset_auto to determine the sentiment polarity. Results show that 65.9% of sentiment analysis using dataset_auto significantly reduces sentiment analysis. Sentiment expressions are often mistranslated into neutral expressions. Meanwhile, sentiment analysis using dataset_manual was still able to capture the sentiment of Facebook comments without taking the comments out of context, with 92.5% showing positive sentiment towards comments related to autism spectrum disorder.
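The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the lexicon or the scoring rule, so the tiny lexicon `TOY_LEXICON`, the sum-of-word-scores rule, the helper names `polarity` and `accuracy`, and the two example comments are all hypothetical. Only the names dataset_manual and dataset_auto come from the abstract.

```python
import re

# Hypothetical toy lexicon (word -> polarity score). The lexicon actually
# used in the paper is not stated in the abstract.
TOY_LEXICON = {
    "good": 1, "great": 1, "love": 1, "strong": 1,
    "bad": -1, "sad": -1, "hate": -1,
}


def polarity(text, lexicon=TOY_LEXICON):
    """Label a comment by summing the lexicon scores of its words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def accuracy(texts, labels, lexicon=TOY_LEXICON):
    """Fraction of comments whose lexicon label matches the ground truth."""
    predictions = [polarity(text, lexicon) for text in texts]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Invented example comments, not taken from the paper's data.
ground_truth = ["positive", "negative"]
dataset_manual = ["You are a great and strong mother", "So sad to read this"]
# Assumed machine-translation output in which the sentiment words were lost.
dataset_auto = ["You are a mother", "So sad to read this"]

print("manual:", accuracy(dataset_manual, ground_truth))
print("auto:  ", accuracy(dataset_auto, ground_truth))
```

In this invented example the first machine-translated comment loses its sentiment-bearing words and is scored as neutral, which mirrors the kind of mistranslation into neutral expressions that the abstract reports.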

Metadata

Item Type: Article
Creators:
Creators | Email / ID Num.
Enjop, Vanessa | UNSPECIFIED
Adnan, Rosanita | UNSPECIFIED
Jamil, Nursuriati | UNSPECIFIED
Zainol, Zarina | UNSPECIFIED
Ahmad, Siti Arpah | UNSPECIFIED
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Journal or Publication Title: Malaysian Journal of Computing (MJoC)
UiTM Journal Collections: UiTM Journal > Malaysian Journal of Computing (MJoC)
ISSN: 2600-8238
Volume: 7
Number: 2
Page Range: pp. 1236-1249
Date: October 2022
URI: https://ir.uitm.edu.my/id/eprint/69251

ID Number: 69251
