Performance of TF-IDF for text classification reviews on Google Play Store: Shopee / Najwa Umaira Che Mohd Safawi and Nur Amalina Shafie

Che Mohd Safawi, Najwa Umaira and Shafie, Nur Amalina (2024) Performance of TF-IDF for text classification reviews on Google Play Store: Shopee / Najwa Umaira Che Mohd Safawi and Nur Amalina Shafie. Journal of Computing Research and Innovation (JCRINN), 9 (2): 2. pp. 13-22. ISSN 2600-8793

Abstract

TF-IDF is a feature extraction technique used in text classification. It extracts features by considering term frequencies and their inverse document frequencies. Feature extraction methods differ in performance, so it is necessary to determine the most appropriate approach for accurately classifying user reviews of the Shopee application in order to enhance the user experience in Malaysia. This study aims to assess the efficacy of TF-IDF in text classification tasks, analyze its advantages and disadvantages, and identify the specific conditions in TF-IDF. The study employs a dataset of Shopee customer reviews acquired from the Google Play Store as the main data source. The methodology entails pre-processing the text data with a text normalization procedure that includes removing stop words, removing Unicode characters, and lemmatizing. The Logistic Regression, Support Vector Machine, and Decision Tree classifiers are trained and evaluated using both feature extraction approaches. The research notes that the efficacy of feature extraction approaches may differ depending on the specific dataset and task being considered. Subsequent studies might examine alternative feature extraction methods and assess their efficacy across various domains and datasets.
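
As an illustration of the weighting the abstract describes, the following is a minimal sketch of one common TF-IDF variant: relative term frequency multiplied by the natural log of (number of documents / document frequency). The abstract does not state which variant, smoothing, or library the authors used, so the formula, the function name `tf_idf`, and the sample reviews are all assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-normalized reviews (not taken from the study's dataset).
reviews = [
    ["fast", "delivery", "good", "app"],
    ["app", "crash", "often"],
    ["good", "price", "fast", "delivery"],
]
vectors = tf_idf(reviews)
```

Under this variant, a term that appears in every document receives a weight of zero, while a term confined to a single review (such as "crash" above) is weighted more heavily than one shared across reviews (such as "app"). In a pipeline like the one the abstract outlines, vectors of this kind would be the input to classifiers such as Logistic Regression, Support Vector Machine, or Decision Tree.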

Metadata

Item Type: Article
Creators:
Che Mohd Safawi, Najwa Umaira (Email / ID Num.: UNSPECIFIED)
Shafie, Nur Amalina (Email / ID Num.: amalina@uitm.edu.my)
Subjects: Q Science > QA Mathematics > Mathematical statistics. Probabilities
Divisions: Universiti Teknologi MARA, Perlis > Arau Campus
Journal or Publication Title: Journal of Computing Research and Innovation (JCRINN)
UiTM Journal Collections: UiTM Journal > Journal of Computing Research and Innovation (JCRINN)
ISSN: 2600-8793
Volume: 9
Number: 2
Page Range: pp. 13-22
Keywords: TF-IDF, Text Classification, Shopee, Feature Extraction
Date: September 2024
URI: https://ir.uitm.edu.my/id/eprint/102603


ID Number

102603
