Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]

Abdul Rahim, Amirah Hazwani and Abdul Rashid, Nurazlina and Ahmad, Abd-Razak and Shamsuddin, Norin Rahayu (2021) Investigating the effect of different sampling methods on imbalanced datasets using bankruptcy prediction model / Amirah Hazwani Abdul Rahim ... [et al.]. In: e-Proceedings of the 5th International Conference on Computing, Mathematics and Statistics (iCMS 2021), 4-5 August 2021. (Submitted)

Abstract

Most classifiers used in bankruptcy studies encounter less difficulty when the data set is balanced between non-bankrupt and bankrupt firms, and these studies typically evaluate model performance through the accuracy rate. However, the accuracy rate is not an appropriate measure when the class distribution of the data set is imbalanced, so sensitivity and precision were used instead to measure the performance of the classifier. This study employed three sampling strategies to deal with imbalanced datasets: oversampling, undersampling, and SMOTE (Synthetic Minority Oversampling Technique). The aim of this research is to examine how different sampling methods affect the performance of a bankruptcy prediction model on highly imbalanced real data. The subjects of the research were SMEs in the storage and transportation business. The sample comprises 9,190 firms, of which 0.84% are bankrupt and 99.16% are non-bankrupt. Partial Least Square-Discriminant Analysis (PLS-DA) was selected as the classifier. The findings suggest that, with PLS-DA, SMOTE increases the classification probability for an imbalanced dataset, whereas neither oversampling nor undersampling improved the PLS-DA results.
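The abstract's two technical points, that accuracy misleads on imbalanced data and that oversampling, undersampling and SMOTE rebalance the classes in different ways, can be illustrated with a minimal sketch in Python (standard library only). This is not the authors' code: the function names, the toy 8-bankrupt / 992-non-bankrupt split, and the simplified SMOTE (random seed rows, Euclidean k-nearest neighbours among minority rows) are illustrative assumptions, and the PLS-DA classifier itself is not reproduced.

```python
import random

def confusion_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, precision), treating `positive` (bankrupt) as the class of interest."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # share of bankrupt firms that are detected
    precision = tp / (tp + fp) if tp + fp else 0.0    # undefined when nothing is flagged; reported as 0 here
    return accuracy, sensitivity, precision

def random_oversample(minority, n_target, rng):
    """Duplicate randomly chosen minority rows until there are n_target rows."""
    extra = [rng.choice(minority) for _ in range(n_target - len(minority))]
    return list(minority) + extra

def random_undersample(majority, n_target, rng):
    """Keep a random subset of n_target majority rows, drawn without replacement."""
    return rng.sample(majority, n_target)

def _sq_dist(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def smote(minority, n_new, k, rng):
    """Simplified SMOTE (assumption: seeds are drawn at random): each synthetic row
    lies on the segment between a minority row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority rows
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: _sq_dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): how far to move from x towards nb
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

if __name__ == "__main__":
    # Accuracy paradox (hypothetical data): a model that never predicts "bankrupt"
    y_true = [1] * 8 + [0] * 992
    y_pred = [0] * 1000
    print(confusion_metrics(y_true, y_pred))  # (0.992, 0.0, 0.0)
```

The trivial predictor scores 99.2% accuracy while detecting no bankrupt firm, which is why sensitivity and precision are the more informative measures here. Random oversampling only repeats existing bankrupt rows, random undersampling discards non-bankrupt rows, and SMOTE creates new points between neighbouring bankrupt rows; in common practice any of these is applied to the training data only, so that evaluation uses the real class ratio.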

Metadata

Item Type: Conference or Workshop Item (Paper)
Creators:
Abdul Rahim, Amirah Hazwani (amirah017@uitm.edu.my)
Abdul Rashid, Nurazlina (azlina150@uitm.edu.my)
Ahmad, Abd-Razak (ara@uitm.edu.my)
Shamsuddin, Norin Rahayu (norinrahayu@uitm.edu.my)
Subjects: H Social Sciences > HG Finance
H Social Sciences > HG Finance > Financial engineering
Divisions: Universiti Teknologi MARA, Kedah > Sg Petani Campus
Event Title: e-Proceedings of the 5th International Conference on Computing, Mathematics and Statistics (iCMS 2021)
Event Dates: 4-5 August 2021
Page Range: pp. 271-277
Keywords: Partial Least Square-Discriminant Analysis, SMOTE, oversampling, undersampling, imbalanced data
Date: 2021
URI: https://ir.uitm.edu.my/id/eprint/56212


ID Number

56212
