Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit

Ezadin, Ayunie and Chumin, Nur Izzaty and Salit, Siti Nur Izzatulnisa (2021) Comparison between imputation method for handling missing data / Ayunie Ezadin, Nur Izzaty Chumin and Siti Nur Izzatulnisa Salit. [Student Project] (Unpublished)


This paper applies imputation methods to the National Institute of Diabetes and Digestive and Kidney Diseases data from Arizona, United States. Five variables in this data set contain missing values: plasma glucose concentration, diastolic blood pressure, triceps skin fold thickness, serum insulin and body mass index (BMI). Missing data can introduce bias and lead to invalid conclusions. The objectives of this research are to improve the data by filling in the missing values and to determine which imputation method handles missing values in the data set best. Both the imputation and the performance evaluation are carried out in RStudio. The five methods compared are mean imputation, K-Nearest Neighbour (KNN) imputation, multiple imputation, hot-deck imputation and regression imputation. Their performance is evaluated using statistical analysis and six measures: coefficient of determination (R²), mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), index of agreement (d) and bias (B). KNN imputation shows the best performance and the lowest error values of the five methods, so it is concluded to be the best method for handling the missing values in this data set.
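The abstract names the methods and measures but gives neither their formulas nor the project's R code. The following Python sketch is not the authors' implementation; it illustrates two of the five methods (mean and KNN imputation) and the six measures under stated assumptions. The function names `mean_impute`, `knn_impute` and `evaluate` are hypothetical. KNN here uses unscaled Euclidean distance over the columns both rows have observed and averages the k nearest donors. R² is taken as the squared Pearson correlation between observed and imputed values, d is Willmott's index of agreement, and B is assumed to be the mean bias error. The measures assume the true values are known, for example by hiding some observed values, imputing them and comparing.

```python
import math
from statistics import mean


def mean_impute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def knn_impute(rows, k=3):
    """Fill None entries in a list of numeric rows from the k nearest donors.

    Assumptions (illustrative only): Euclidean distance over columns observed
    in both rows, no feature scaling, and the imputed value is the plain mean
    of the donors' values for the missing column.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for t, other in enumerate(rows):
                # A donor must be a different row with column j observed.
                if t == i or other[j] is None:
                    continue
                squared = [(a - b) ** 2 for a, b in zip(row, other)
                           if a is not None and b is not None]
                if squared:
                    candidates.append((math.sqrt(sum(squared)), other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = candidates[:k]
            if nearest:  # leave None if no donor shares any column
                filled[i][j] = mean(val for _, val in nearest)
    return filled


def evaluate(observed, imputed):
    """Compare true (observed) values with imputed ones."""
    n = len(observed)
    o_bar, p_bar = mean(observed), mean(imputed)
    errors = [p - o for o, p in zip(observed, imputed)]
    sse = sum(e * e for e in errors)
    mse = sse / n
    mae = sum(abs(e) for e in errors) / n
    # R^2 as the squared Pearson correlation (assumed definition).
    cov = sum((o - o_bar) * (p - p_bar) for o, p in zip(observed, imputed))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_p = sum((p - p_bar) ** 2 for p in imputed)
    r2 = cov ** 2 / (var_o * var_p) if var_o and var_p else float("nan")
    # Willmott's index of agreement: 1 is perfect agreement.
    denom = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2
                for o, p in zip(observed, imputed))
    d = 1 - sse / denom if denom else float("nan")
    bias = sum(errors) / n  # mean bias error (assumed definition of B)
    return {"R2": r2, "MSE": mse, "RMSE": math.sqrt(mse),
            "MAE": mae, "d": d, "B": bias}
```

For instance, `mean_impute([1, None, 3])` fills the gap with 2, and `evaluate([1, 2, 3], [2, 3, 4])` reports an MSE, MAE and bias of 1 with R² equal to 1, which shows why R² alone cannot reveal a systematic offset and why the error and bias measures are reported alongside it.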


Item Type: Student Project
Creators: Ezadin, Ayunie; Chumin, Nur Izzaty; Salit, Siti Nur Izzatulnisa
Contributors: Thesis advisor: Md Yasin, Zaitul Anna Melisa
Subjects: H Social Sciences > HA Statistics > Statistical data
Q Science > QA Mathematics > Study and teaching
Q Science > QA Mathematics > Mathematical statistics. Probabilities > Data processing
Q Science > QA Mathematics > Analysis
Divisions: Universiti Teknologi MARA, Negeri Sembilan > Seremban Campus > Faculty of Computer and Mathematical Sciences
Programme: Bachelor of Science (Hons.) Statistics
Keywords: imputation method, missing data, body mass index, BMI
Date: 2021