Abstract
Credit risk assessment is the procedure by which investors or lenders estimate the likelihood of loan default in order to measure risk. A wrong decision places the institution at risk. Corporation Credit Risk Assessment (CCRA) depends on financial indicators that represent a company's status at a given time. Machine learning is now an important field with applications in many domains, including finance. Globally, CCRA research shows a rapidly expanding use of machine learning and deep learning techniques, which have outperformed traditional approaches in many CCRA studies. Machine learning model selection is an iterative process of exploring, evaluating, and improving algorithms, and selecting an optimal model for a particular domain is challenging and complicated. The no free lunch theorem implies that no single algorithm or combination of features will always produce considerably better outcomes than the others. Selecting the optimal model therefore raises two questions: which characteristic data to use for CCRA, and which best-practice machine learning pipelines to apply. The characteristic data for CCRA comprise the features used and the data dimension. This study used thirteen features: ten financial ratios (FR), two macroeconomic variables, and the company's age, selected from the extensive CCRA literature worldwide. This study also investigates the significance of data dimension in CCRA (single- or multi-dimensional) and of feature correlation. For the best-practice pipelines, various machine learning models were evaluated to identify the best model for CCRA. To this end, the study proposes an automated model selection based on exhaustive search; however, the exhaustive search caused timeout and memory leak issues.
The proposed automated model selection resolves these issues by automatically writing all results to CSV files, which reduces memory consumption. The study samples are companies with PN17 status (financially distressed companies listed on Bursa Malaysia under Practice Note 17). The automated model selection produced 176 models across the experimental settings. The models are based on four machine learning algorithms: logistic regression, support vector machine, decision tree, and neural network; two ensemble techniques: adaptive boosting and bootstrap aggregation; and three deep learning algorithms: recurrent neural network, long short-term memory (LSTM), and gated recurrent unit (GRU). In addition, this study proposed two hybrid LSTM-GRU models: LSTM-GRU Double Stack (LGDS) and LSTM-GRU Alternate Double Stack (LGADS). The automated model selection found that the LGADS model, applied to multi-dimensional data with FR-only features and without the feature correlation setup, outperformed the other models with the highest accuracy and F1 score, achieving 84.2% on both measures. This study contributes to the body of knowledge by proposing automated machine learning model selection for CCRA. Future work could extend the scope by adding more financial ratios, since the FR features are significant for CCRA, and by adding more samples to produce better results.
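The remedy described above, an exhaustive search over experimental settings that streams each result to a CSV file instead of holding all results in memory, can be illustrated with a minimal Python sketch. It is not the thesis's code: it assumes a hypothetical `run(setting)` callable that trains and tests one model for a given setting and returns true and predicted labels, and the grid keys, labels, and function names are illustrative only. Accuracy and binary F1 are computed from scratch, with class 1 assumed to be the positive (distressed/default) class.

```python
import csv
import itertools
import tempfile
from pathlib import Path
from typing import Callable, Dict, Optional, Sequence, Tuple

Labels = Sequence[int]


def accuracy_f1(y_true: Labels, y_pred: Labels) -> Tuple[float, float]:
    """Accuracy and binary F1, assuming class 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return acc, (2 * tp / denom if denom else 0.0)


def exhaustive_search(
    grid: Dict[str, Sequence],
    run: Callable[[dict], Tuple[Labels, Labels]],  # hypothetical train/test hook
    out_csv: Path,
) -> Optional[dict]:
    """Evaluate every combination in `grid`, writing each result to CSV at once.

    Only the current best row is kept in memory; all other results live on disk.
    """
    keys = list(grid)
    best = None
    with out_csv.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=keys + ["accuracy", "f1"])
        writer.writeheader()
        for values in itertools.product(*(grid[k] for k in keys)):
            setting = dict(zip(keys, values))
            acc, f1 = accuracy_f1(*run(setting))
            row = {**setting, "accuracy": round(acc, 4), "f1": round(f1, 4)}
            writer.writerow(row)
            fh.flush()  # hand the row to the OS so partial results survive a timeout
            if best is None or (row["accuracy"], row["f1"]) > (best["accuracy"], best["f1"]):
                best = row
    return best


if __name__ == "__main__":
    # Hypothetical grid echoing the setting dimensions named in the abstract.
    demo_grid = {
        "dimension": ["single", "multi"],
        "features": ["FR-only", "all"],
        "correlation_filter": [False, True],
        "model": ["logistic_regression", "lgads"],
    }

    def demo_run(setting: dict) -> Tuple[Labels, Labels]:
        # Stand-in for training one model; returns fixed toy labels.
        return [1, 0, 1, 0], [1, 0, 0, 0]

    with tempfile.TemporaryDirectory() as tmp:
        print(exhaustive_search(demo_grid, demo_run, Path(tmp) / "results.csv"))
```

Ranking by accuracy and then F1 is one plausible tie-breaking choice; the abstract reports both metrics but does not state how ties were resolved.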
Metadata
Item Type: | Thesis (PhD) |
---|---|
Creators: | Halim, Zulkifli |
Contributors: | Thesis advisor: Mohamed Shuhidan, Shuhaida (Dr.) |
Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
Programme: | Philosophy of Doctorate (Computer Science) |
Keywords: | Logistic, machine, network |
Date: | 2023 |
URI: | https://ir.uitm.edu.my/id/eprint/88795 |