Imbalanced multi-class power transformer fault data classification through Edited Nearest Neighbour-Manhattan-Random Forest

R Azmira, Putri Azmira (2025) Imbalanced multi-class power transformer fault data classification through Edited Nearest Neighbour-Manhattan-Random Forest. Masters thesis, UiTM.

Abstract

This study highlights the global significance of the oil and gas (O&G) industry, emphasizing the need for efficient operational management to ensure energy reliability, economic optimization, and environmental protection. Transformers are critical in power systems and require constant monitoring to maintain stability. Dissolved gas analysis uses gas chromatography to detect combustible gases generated during abnormal operations, playing a key role in transformer fault diagnosis. Artificial intelligence techniques, such as Support Vector Machines and Artificial Neural Networks, have been widely applied, enhancing diagnostic accuracy through predictive analytics. However, imbalanced datasets, particularly in dissolved gas analysis, severely affect classification performance by causing machine learning models to favour majority fault types while overlooking minority classes. The resulting misclassification and data loss can cause rare but critical transformer faults to go undetected, jeopardizing system reliability and early fault mitigation. To address this challenge, the study focuses on improving Edited Nearest Neighbour techniques using alternative distance measures to enhance classification accuracy in imbalanced dissolved gas analysis datasets. The Edited Nearest Neighbour technique, shown to be effective in other O&G subdomains, is evaluated using the Random Forest algorithm, which is widely used for its precision and ability to handle non-linear datasets. To validate the effectiveness of Edited Nearest Neighbour-Random Forest, it is compared to four data-level techniques: Random Under-Sampling, NearMiss, Random Oversampling, and Adaptive Synthetic Sampling. Furthermore, Random Forest is compared to four machine learning algorithms: Support Vector Machine, XGBoost, Convolutional Neural Networks, and Decision Trees. Edited Nearest Neighbour with the Manhattan distance measure, which demonstrated over 85.00% accuracy in previous studies, is assessed alongside the Minkowski and Mahalanobis distances to identify the best model. After parameter tuning of Random Forest, the Edited Nearest Neighbour-Manhattan-Random Forest model outperformed the alternatives, achieving 90.77% accuracy and reducing data loss from 70.00% to 17.50%. These findings show that Edited Nearest Neighbour-Manhattan-Random Forest effectively balances the dissolved gas analysis dataset while enhancing classification accuracy. Further research is required to explore the broader applicability of this technique in other domains with imbalanced multi-class datasets, as real-world datasets are often scattered and imbalanced.
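To make the data-cleaning step concrete, the following is a minimal sketch of Edited Nearest Neighbour with the Manhattan distance, written in plain Python. It is an illustration only and not the thesis implementation: it assumes Wilson's original editing rule (a sample is dropped when its label disagrees with the majority label of its k nearest neighbours), a hypothetical two-feature toy dataset with classes "A", "B" and "C", and a definition of data loss as the share of samples discarded. The function names `manhattan` and `edited_nearest_neighbours` are likewise hypothetical. The classifier stage is omitted; the cleaned samples would then be passed to a Random Forest or any other classifier.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def edited_nearest_neighbours(X, y, k=3, distance=manhattan):
    """Return the indices of samples kept by Wilson-style editing.

    A sample is kept only if its label equals the majority label among
    its k nearest neighbours (the sample itself is excluded).
    """
    kept = []
    for i, xi in enumerate(X):
        # Rank all other samples by distance; ties fall back to index order.
        ranked = sorted((distance(xi, xj), j) for j, xj in enumerate(X) if j != i)
        votes = Counter(y[j] for _, j in ranked[:k])
        if votes.most_common(1)[0][0] == y[i]:
            kept.append(i)
    return kept


# Hypothetical toy data: three well-separated clusters plus one sample
# labelled "B" (index 5) that sits inside the "A" cluster.
X = [
    (0, 0), (1, 0), (0, 1), (1, 1), (2, 1),   # class A
    (1, 2),                                    # labelled B, surrounded by A
    (10, 10), (11, 10), (10, 11), (11, 11),    # class B
    (20, 0), (21, 0), (20, 1), (21, 1),        # class C
]
y = ["A"] * 5 + ["B"] + ["B"] * 4 + ["C"] * 4

kept = edited_nearest_neighbours(X, y, k=3)
X_clean = [X[i] for i in kept]
y_clean = [y[i] for i in kept]

# Assumed definition: data loss is the fraction of samples removed.
data_loss = 1 - len(kept) / len(X)
print(f"kept {len(kept)} of {len(X)} samples, data loss = {data_loss:.2%}")
```

In this toy case only the mislabelled-looking sample at index 5 is removed, so the minority clusters survive intact. Swapping `distance` for another callable is how a different measure, such as Minkowski with another exponent, could be tried under the same editing rule.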

Metadata

Item Type:	Thesis (Masters)
Creators:	R Azmira, Putri Azmira (Email / ID Num.: UNSPECIFIED)
Contributors:	Thesis advisor: Yusoff, Marina (Email / ID Num.: UNSPECIFIED); Thesis advisor: Mahmud, Yuzi (Email / ID Num.: UNSPECIFIED)
Subjects:	H Social Sciences > HD Industries. Land use. Labor > Manufacturing industries; H Social Sciences > HD Industries. Land use. Labor > Manufacturing industries > Forest products. Lumber. Logging
Divisions:	Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme:	Master of Science (Computer Science)
Keywords:	XGBoost Classifier, Support Vector Machine Classifier, Convolutional Neural Network Classifier
Date:	2025
URI:	https://ir.uitm.edu.my/id/eprint/129199