Imbalanced multi-class power transformer fault data classification through Edited Nearest Neighbour-Manhattan-Random Forest

R Azmira, Putri Azmira (2025) Imbalanced multi-class power transformer fault data classification through Edited Nearest Neighbour-Manhattan-Random Forest. Masters thesis, UiTM.

Abstract

This study highlights the global significance of the oil and gas (O&G) industry, emphasizing the need for efficient operational management to ensure energy reliability, economic optimization, and environmental protection. Transformers are critical in power systems and require constant monitoring to maintain stability. Dissolved gas analysis uses gas chromatography to detect combustible gases generated during abnormal operation and plays a key role in transformer fault diagnosis. Artificial intelligence techniques, such as Support Vector Machines and Artificial Neural Networks, have been widely applied to improve diagnostic accuracy through predictive analytics. However, imbalanced datasets, particularly in dissolved gas analysis, severely degrade classification performance because machine learning models favour majority fault types and overlook minority classes. Such misclassification and data loss can leave rare but critical transformer faults undetected, jeopardizing system reliability and early fault mitigation. To address this challenge, the study focuses on improving the Edited Nearest Neighbour technique with alternative distance measures to enhance classification accuracy on imbalanced dissolved gas analysis datasets. The Edited Nearest Neighbour technique, shown to be effective in other O&G subdomains, is evaluated using the Random Forest algorithm, which is widely used for its precision and ability to handle non-linear datasets. To validate the effectiveness of Edited Nearest Neighbour-Random Forest, it is compared with four other data-level techniques: Random Under-Sampling, NearMiss, Random Oversampling, and Adaptive Synthetic Sampling. Furthermore, Random Forest is compared with four other machine learning algorithms: Support Vector Machine, XGBoost, Convolutional Neural Networks, and Decision Trees. Edited Nearest Neighbour with the Manhattan distance measure, which demonstrated over 85.00% accuracy in previous studies, is assessed alongside the Minkowski and Mahalanobis distances to identify the best model. After parameter tuning of Random Forest, the Edited Nearest Neighbour-Manhattan-Random Forest model outperformed the alternatives, achieving 90.77% accuracy and reducing data loss from 70.00% to 17.50%. These findings show that Edited Nearest Neighbour-Manhattan-Random Forest effectively balances the dissolved gas analysis dataset while enhancing classification accuracy. Further research is required to explore the broader applicability of this technique in other domains with imbalanced multi-class datasets, as real-world datasets are often scattered and imbalanced.
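The abstract does not include an implementation, so the following is only a minimal sketch of the general idea behind Edited Nearest Neighbour with a Manhattan distance, not the thesis's actual pipeline. It assumes the classic Wilson-style rule (drop any sample whose label disagrees with the majority of its k nearest neighbours). The function names, the two-feature toy data, and the class labels are hypothetical and chosen for illustration. The Random Forest stage and the parameter tuning are omitted. "Data loss" is taken here to mean the percentage of samples removed by the cleaning step, which is an assumption about how the thesis defines it.

```python
from collections import Counter


def manhattan(a, b):
    """L1 (city-block) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def edited_nearest_neighbours(X, y, k=3, distance=manhattan):
    """Wilson-style ENN (assumed variant): keep a sample only if its label
    matches the majority label of its k nearest neighbours, itself excluded."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Rank every other sample by its distance to sample i.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: distance(xi, X[j]),
        )
        neighbour_labels = [y[j] for j in others[:k]]
        majority_label = Counter(neighbour_labels).most_common(1)[0][0]
        if majority_label == yi:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def data_loss_percent(n_before, n_after):
    """Share of samples removed by resampling, as a percentage (assumed definition)."""
    return 100.0 * (n_before - n_after) / n_before


# Hypothetical toy data: two made-up features per sample, not real DGA readings.
X = [
    (1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),  # majority class cluster
    (8.5, 8.5),                                      # majority-labelled point inside the minority cluster
    (8.0, 8.0), (8.0, 9.0), (9.0, 8.0), (9.0, 9.0),  # minority class cluster
]
y = ["normal"] * 5 + ["arcing"] * 4

X_kept, y_kept = edited_nearest_neighbours(X, y, k=3)
print(Counter(y_kept))
print(round(data_loss_percent(len(X), len(X_kept)), 2), "% of samples removed")
```

In this toy case the only sample removed is the "normal" point that sits among the "arcing" samples, which leaves the two classes with equal counts. Swapping `distance` for another metric is the kind of comparison the abstract describes for the Minkowski and Mahalanobis distances.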

Metadata

Item Type: Thesis (Masters)
Creators:
R Azmira, Putri Azmira (Email / ID Num.: UNSPECIFIED)
Contributors:
Thesis advisor: Yusoff, Marina (Email / ID Num.: UNSPECIFIED)
Thesis advisor: Mahmud, Yuzi (Email / ID Num.: UNSPECIFIED)
Subjects: H Social Sciences > HD Industries. Land use. Labor > Manufacturing industries
H Social Sciences > HD Industries. Land use. Labor > Manufacturing industries > Forest products. Lumber. Logging
Divisions: Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences
Programme: Master of Science (Computer Science)
Keywords: XGBoost Classifier, Support Vector Machine Classifier, Convolutional Neural Network Classifier
Date: 2025
URI: https://ir.uitm.edu.my/id/eprint/129199

Download

129199.pdf (Text, 180kB)

ID Number: 129199
