Kamarulzalis, Ahmad Haadzal and Abdullah, Mohd Asrul Affendi
(2019)
*An improvement algorithm for Iris classification by using Linear Support Vector Machine (LSVM), k-Nearest Neighbours (k-NN) and Random Nearest Neighbours (RNN) / Ahmad Haadzal Kamarulzalis and Mohd Asrul Affendi Abdullah.*
Journal of Mathematics & Computing Science, 5 (1).
pp. 32-38.
ISSN 0128-0767

## Abstract

In machine learning, three branches of learning can be used in classification procedures for data mining: supervised learning, unsupervised learning and reinforcement learning. This study focuses on supervised learning, seeking to classify the Iris dataset into its three species (setosa, versicolor and virginica) so that the predictions mimic the actual dataset, using Linear Support Vector Machine (LSVM), k-Nearest Neighbours (k-NN) and Random Nearest Neighbours (RNN) as methods. The first aim of this study is to improve an existing classification algorithm; the idea comes from combining the k-NN algorithm with the ensemble concept. The second aim is to identify the best model for classification procedures. Existing performance measurement tools such as overall accuracy and misclassification error rate (MER) are used for each classifier. Random Nearest Neighbours (RNN) has the highest accuracy, at 98% with a 2% misclassification error rate (MER), compared to the other classifiers. Therefore, Random Nearest Neighbours (RNN) is preferable for supervised learning classification procedures.
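The abstract describes RNN only as a combination of the k-NN algorithm and the ensemble concept, so the paper's exact procedure cannot be reproduced from this record. The following is a minimal sketch under stated assumptions: each ensemble member is a plain k-NN classifier restricted to a random subset of features, and the members vote. The function names, the parameters (`k`, `n_models`, `n_features`, `seed`) and the random-subset design are all illustrative choices, not the authors' implementation. The sketch also shows the two reported metrics, where MER is taken to be one minus overall accuracy, which is consistent with the 98% and 2% figures quoted above.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training rows, using only `features`."""
    # Squared Euclidean distance restricted to the chosen feature indices.
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def random_knn_predict(train_X, train_y, x, k=3, n_models=15, n_features=2, seed=0):
    """Hypothetical ensemble: k-NN members on random feature subsets, then a vote."""
    rng = random.Random(seed)
    all_features = list(range(len(train_X[0])))
    votes = Counter()
    for _ in range(n_models):
        features = rng.sample(all_features, n_features)  # assumption: random subspace
        votes[knn_predict(train_X, train_y, x, k, features)] += 1
    return votes.most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def misclassification_error_rate(y_true, y_pred):
    """MER, assumed here to be the complement of overall accuracy."""
    return 1.0 - accuracy(y_true, y_pred)
```

To apply this to the Iris data, `train_X` would hold the four numeric measurements per flower and `train_y` the species labels; predictions on a held-out set can then be passed to `accuracy` and `misclassification_error_rate`.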

## Metadata

Item Type: | Article |
---|---|

Creators: | Kamarulzalis, Ahmad Haadzal (Email / ID Num.: haadzal9301@gmail.com); Abdullah, Mohd Asrul Affendi (Email / ID Num.: UNSPECIFIED) |

Subjects: | Q Science > QA Mathematics > Mathematical statistics. Probabilities > Data processing; Q Science > QA Mathematics > Instruments and machines > Electronic Computers. Computer Science > Data mining |

Divisions: | Universiti Teknologi MARA, Kelantan > Machang Campus |

Journal or Publication Title: | Journal of Mathematics & Computing Science |

UiTM Journal Collections: | UiTM Journal > Journal of Mathematics and Computing Science (JMCS) |

ISSN: | 0128-0767 |

Volume: | 5 |

Number: | 1 |

Page Range: | pp. 32-38 |

Keywords: | Ensemble learning, Iris classification, k-Nearest Neighbours (k-NN), Machine learning, Supervised learning. 1 Introduction. According to Hand et al. [1], data mining is the science of extracting useful information from large data sets, big data or databases. It is a new discipline, lying at the intersection of statistics, mathematics, machine learning, data management, artificial intelligence and other areas of computer science. All of these are concerned with certain aspects of data analysis, so they have much in common, but each also has its own distinct character, emphasizing particular problems and types of solution. Data mining encompasses a wide variety of topics in computer science, statistics and mathematics. Machine learning, on the other hand, is a sub-field of data science that can be used for data mining. Broadly, learning methods used for classification are divided into three branches: supervised, unsupervised and reinforcement learning. This study focuses on improving supervised learning techniques. It learns from numerical data (independent variables) to classify categorical data (the dependent variable) so that the predictions mimic the actual data. The variable to be predicted is typically called the class variable (for obvious reasons), while the independent variables are usually referred to as features, attributes, explanatory variables, inputs, and so on. In this study, classification was carried out on the Iris dataset, which contains fifty flowers from each of three species (Iris setosa, versicolor and virginica), both to verify the improved supervised technique and to find the best classifier, that is, the one with the highest accuracy, which indicates how well each classifier classifies the Iris data.
2 Literature Review. Mountrakis et al. [2] conducted research in the remote sensing field, using SVM to analyse its ability to generalize even with limited samples. According to them, SVMs can yield comparable accuracy using a much smaller sample size. This is in line with the support vector concept, which relies on only a few data points to define the classifier's hyperplane. This property has been exploited, and SVM has been shown to be very useful in applications. Furthermore, SVM is suitable for image recognition and classification, which sometimes involve complex environments. |

Date: | 2019 |

URI: | https://ir.uitm.edu.my/id/eprint/29220 |