Diamond price prediction using random forest algorithm / Nur Amirah Mohd Azmi

Mohd Azmi, Nur Amirah (2025) Diamond price prediction using random forest algorithm / Nur Amirah Mohd Azmi. Degree thesis, Universiti Teknologi MARA, Terengganu.

Abstract

This project addresses the challenge of accurately pricing diamonds using machine learning. Diamond prices depend on attributes such as carat, cut, color, clarity, depth, and table, which interact in complex ways. Traditional methods struggle to model these interactions effectively, which motivates the adoption of more advanced algorithms. The aim of this project is to develop a Diamond Price Prediction System that uses Random Forest to predict diamond prices accurately from these attributes, and to compare the performance of Random Forest with other regression models using key performance metrics. The methodology begins with data preprocessing, which handles missing values, outliers, and inconsistencies to ensure data quality. Two Random Forest models were then developed: a custom implementation and a library-based one. For both, feature selection and hyperparameter tuning were performed to improve performance. The custom and library-based models, along with the other regression models, were compared using MAE, RMSE, and R². A 70:30 train-test split gave the best balance. Of the six regressors tested, Random Forest achieved the highest predictive accuracy, outperforming Gradient Boosting and Decision Tree, whereas SVR performed worst. Overall, the library-based Random Forest model consistently achieved better accuracy than the custom one, with a lower MAE of 101.24 and RMSE of 203.53, and an R² score of 99%. The research also indicates that a custom Random Forest model can surpass standard implementations when properly optimized, because its hyperparameters can be tuned to better suit the dataset.
The model, however, is resource-intensive and requires some expertise in hyperparameter tuning, which makes it less accessible to laypeople. Future work should focus on improving interpretability and extending the models to capture localized diamond pricing trends in real-world transactions.
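The evaluation described in the abstract (a 70:30 train-test split scored with MAE, RMSE, and R²) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the thesis's code: the function names, the fixed seed, and the toy prices in the usage note are hypothetical, and libraries such as scikit-learn provide equivalent, well-tested routines.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_ratio: float = 0.3,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows, then split; test_ratio=0.3 gives 70:30.

    Hypothetical helper: the seed value is an arbitrary assumption.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: (1/n) * sum |y - y_hat|, in price units."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt((1/n) * sum (y - y_hat)^2).

    Squaring weights large errors more heavily, so RMSE >= MAE always.
    """
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

For example, with hypothetical true prices [500, 1000, 1500, 2000] and predictions [550, 950, 1600, 1900], `mae` returns 75.0, `rmse` about 79.06, and `r2_score` 0.98. Because RMSE is never smaller than MAE, the gap between the reported RMSE (203.53) and MAE (101.24) suggests that a minority of predictions carry noticeably larger errors.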
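The abstract contrasts a custom Random Forest with a library-based one. To give a rough idea of what a from-scratch version involves, the sketch below trains regression trees on bootstrap samples, considers a random subset of features at each split, and averages the trees' predictions. It is a hypothetical illustration (`SimpleRandomForestRegressor` and its defaults are assumptions), not the thesis's implementation, and it omits the feature selection and hyperparameter tuning the abstract describes.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence


class Node:
    """Regression-tree node: a leaf when feature is None, otherwise a split."""

    def __init__(self, value: float):
        self.value = value  # mean target of the samples reaching this node
        self.feature: Optional[int] = None
        self.threshold = 0.0
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean (the impurity being minimised)."""
    m = mean(ys)
    return sum((v - m) ** 2 for v in ys)


def build_tree(X, y, depth, max_depth, min_split, max_features, rng) -> Node:
    node = Node(mean(y))
    if depth >= max_depth or len(y) < min_split:
        return node
    # A random feature subset per split is what decorrelates the trees.
    features = rng.sample(range(len(X[0])), min(max_features, len(X[0])))
    best = None  # (split SSE, feature index, threshold)
    for f in features:
        # Skip the largest value so both children are non-empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None or best[0] >= sse(y):
        return node  # no split reduces the error: keep this node as a leaf
    _, f, t = best
    node.feature, node.threshold = f, t
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    args = (max_depth, min_split, max_features, rng)
    node.left = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, *args)
    node.right = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, *args)
    return node


def predict_tree(node: Node, row: Sequence[float]) -> float:
    while node.feature is not None:
        node = node.left if row[node.feature] <= node.threshold else node.right
    return node.value


class SimpleRandomForestRegressor:
    """Hypothetical minimal random forest; features must already be numeric."""

    def __init__(self, n_estimators=20, max_depth=6, min_samples_split=2,
                 max_features=None, seed=0):
        self.n_estimators = n_estimators
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.max_features = max_features
        self.seed = seed
        self.trees: List[Node] = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n = len(X)
        # Assumed default: about a third of the features, a common regression heuristic.
        k = self.max_features or max(1, len(X[0]) // 3)
        self.trees = []
        for _ in range(self.n_estimators):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows drawn with replacement
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         0, self.max_depth, self.min_samples_split, k, rng))
        return self

    def predict(self, X):
        # Bagging: average the individual trees' predictions.
        return [mean(predict_tree(t, row) for t in self.trees) for row in X]
```

Categorical attributes such as cut, color, and clarity would first need a numeric encoding (for example, ordinal codes that follow their grade order) before being passed to `fit`. A library implementation such as scikit-learn's `RandomForestRegressor` performs the same bagging-and-averaging idea far more efficiently.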

Metadata

Item Type: Thesis (Degree)
Creators: Mohd Azmi, Nur Amirah (ID Num.: 2023104431)
Contributors: Tan, Gloria Jennis (Thesis advisor)
Subjects: Q Science > QA Mathematics > Instruments and machines > Electronic Computers. Computer Science > Algorithms
Divisions: Universiti Teknologi MARA, Terengganu > Kuala Terengganu Campus > Faculty of Computer and Mathematical Sciences
Programme: Bachelor of Computer Science (Hons)
Keywords: Diamond Price Prediction System, Forest Algorithm
Date: 2025
URI: https://ir.uitm.edu.my/id/eprint/115275

Full text: 115275.pdf (102 kB)

ID Number: 115275
