Early-stage lung cancer diagnosis using new regression features and machine learning

Jafery, Nurul Najiha (2025) Early-stage lung cancer diagnosis using new regression features and machine learning. PhD thesis, Universiti Teknologi MARA (UiTM).

Abstract

Lung cancer is the most common cancer worldwide and, largely because it is often detected late, one of the leading causes of cancer-related death. Radiologists typically diagnose lung cancer by visually analysing computed tomography (CT) scan images, a process that is tedious, time-consuming, and prone to error. In addition, variations in CT image intensity and the potential for misinterpreting anatomical structures make it challenging to identify cancerous cells accurately. Doctors and radiologists commonly use the TNM (Tumour, Node, Metastases) staging system to classify lung cancer progression. Early detection, particularly at the T1 and T2 stages, significantly improves survival rates, which highlights the importance of timely and accurate diagnosis. This study aims to develop an automated early-stage lung cancer diagnosis system using a new regression feature extraction method and machine learning techniques. The system is designed to assist radiologists and medical experts in diagnosing lung cancer and making treatment decisions. The methodology is divided into five stages: image acquisition, pre-processing, lung lesion detection, early-stage lung cancer diagnosis, and performance evaluation. The lung CT scan images used in this study were obtained from the Advanced Medical and Dental Institute (AMDI), Universiti Sains Malaysia (USM). In the pre-processing stage, a new segmentation method based on geometrical features was proposed to separate lung lesion regions from non-lesion regions. For lung lesion detection, a new Regression Features (RFE) extraction method was introduced, generating four feature sets: RFE_1, RFE_2, RFE_3, and RFE_4. The best-performing set, RFE_2, was then fed into two proposed hybrid deep neural networks, Hybrid 1D-CNN-LSTM and VGG16-1D-LSTM, to classify lung lesion and non-lesion regions. Both models achieved an accuracy of 96%, with the Hybrid 1D-CNN-LSTM outperforming VGG16-1D-LSTM in AUC (0.91 vs. 0.81).
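The abstract does not say which geometrical features the proposed segmentation uses. As a minimal sketch, assuming area and circularity (4πA/P², with P counted as exposed pixel edges) computed on a binary mask of candidate regions, the hypothetical helpers `label_regions`, `geometric_features`, and `split_candidates` below show how such features can separate compact, roughly round regions from elongated ones. The thresholds `min_area` and `min_circularity` are illustrative assumptions, not values from the thesis.

```python
import math
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity


def label_regions(mask):
    """Return connected foreground regions of a binary mask, each as a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                # Breadth-first flood fill from this unvisited foreground pixel.
                queue, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(pixels)
    return regions


def geometric_features(pixels):
    """Area (pixel count), perimeter (exposed pixel edges) and circularity 4*pi*A/P^2."""
    pixel_set = set(pixels)
    area = len(pixels)
    perimeter = sum(
        1
        for y, x in pixels
        for dy, dx in NEIGHBOURS
        if (y + dy, x + dx) not in pixel_set
    )
    circularity = 4 * math.pi * area / perimeter ** 2
    return {"area": area, "perimeter": perimeter, "circularity": circularity}


def split_candidates(mask, min_area=4, min_circularity=0.6):
    """Hypothetical rule: large, compact regions -> lesion candidates; the rest -> non-lesion."""
    lesion, non_lesion = [], []
    for pixels in label_regions(mask):
        features = geometric_features(pixels)
        if features["area"] >= min_area and features["circularity"] >= min_circularity:
            lesion.append(features)
        else:
            non_lesion.append(features)
    return lesion, non_lesion


# Toy mask: a compact 3 x 3 blob and an elongated 1 x 6 line.
mask = [
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 0],
]
lesion, non_lesion = split_candidates(mask)
print(len(lesion), len(non_lesion))  # prints: 1 1
```

With this pixel-edge perimeter, the 3 × 3 square has circularity π/4 ≈ 0.785 while the 1 × 6 line has about 0.385, so only the square passes the assumed circularity threshold. A real CT pipeline would work on the segmented lung field and would likely combine more shape descriptors than these two.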
Identified lung lesions were further analysed in the early-stage lung cancer diagnosis stage using machine learning classifiers: Support Vector Machine (SVM), Gradient Boosting, AdaBoost, and Random Forest. Among these, Random Forest performed best at automatically diagnosing early-stage lung cancer, achieving a cross-validated accuracy of 97.14% and an AUC of 0.9884. In the performance evaluation stage, the results were correlated with patient radiology reports to assess their clinical relevance. The findings suggest that the proposed system has the potential to serve as an effective decision-support tool for radiologists diagnosing early-stage lung cancer, ultimately improving early detection, patient outcomes, and clinical workflow efficiency.
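The abstract reports cross-validated accuracy and AUC for four classifiers but does not describe the validation protocol. A minimal sketch, assuming stratified 5-fold cross-validation in scikit-learn and a synthetic stand-in dataset (not the AMDI/USM data), shows how such a comparison is commonly set up. The fold count, hyperparameters, feature count, and binary label are assumptions, and the printed numbers will not reproduce the thesis results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for per-lesion feature vectors with a binary label.
# Assumption: sizes and class definitions are arbitrary, not taken from the thesis.
X, y = make_classification(n_samples=200, n_features=10, n_informative=5, random_state=0)

classifiers = {
    # SVMs are scale-sensitive, so features are standardised inside each fold.
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Stratified folds keep the class ratio similar in every train/test split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
    # Mean over folds of test-fold accuracy and ROC AUC.
    results[name] = (scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())

# Rank classifiers by mean AUC.
for name, (acc, auc) in sorted(results.items(), key=lambda kv: kv[1][1], reverse=True):
    print(f"{name:18s} accuracy={acc:.4f} AUC={auc:.4f}")
```

Reporting the mean over held-out folds, rather than a single train/test split, is what makes an accuracy figure "cross-validated"; with a small clinical dataset, the per-fold spread is worth reporting alongside the mean.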

Metadata

Item Type:	Thesis (PhD)
Creators:	Jafery, Nurul Najiha (Email / ID Num.: UNSPECIFIED)
Contributors:	Thesis advisor: Sulaiman, Siti Noraini (Email / ID Num.: UNSPECIFIED)
Subjects:	T Technology > T Technology (General); T Technology > TK Electrical engineering. Electronics. Nuclear engineering
Divisions:	Universiti Teknologi MARA, Shah Alam > Faculty of Electrical Engineering
Programme:	Doctor of Philosophy (Electrical Engineering)
Keywords:	Lung cancer, Computed tomography (CT) scan images, TNM (Tumour, Node, Metastases).
Date:	2025
URI:	https://ir.uitm.edu.my/id/eprint/125083