Abstract
Much work has been done on text classification, but little of it addresses Arabic text, because of the difficulty of Arabic morphology and the scarcity of public datasets. The dataset constructed for this study was validated by an expert, a lecturer at Universiti Sains Islam Malaysia, to maintain the authenticity of the Hadith content. Convolutional Neural Networks (CNN) and Support Vector Machines (SVM) are two different algorithms applied to text classification: CNN appears to be good at extracting features from the input, while SVM is good at the classification task. This study introduces Hadith text classification using a combined Convolutional Neural Network and Support Vector Machine (CNN-SVM). Six experiments were designed to evaluate the study: testing the model with different stemming techniques; comparing three different algorithms; analysing the confusion matrices of the three algorithms; testing the model with different SVM kernels; testing the model on unseen data; and reporting the precision, recall, F1-measure and accuracy of the model and its parameters. First, the performance of the different models is analysed to find which gives the highest accuracy. CNN-SVM shows a promising result with 92% accuracy, while CNN alone and SVM alone give lower accuracy, at 82% and 74% respectively. Second, parameter tuning is conducted to find the best parameters for CNN-SVM. Third, the models (CNN-SVM, CNN and SVM) are evaluated on unseen data to see how well they predict it; in this study, the CNN-SVM model classifies all of the unseen data correctly. Fourth, the model is tested with different stemming techniques, and it is found that the model without stemming gives higher accuracy, at 92%.
Lastly, different SVM kernels are tested to investigate their effect on the model's performance. Details of the other experiments can be found in chapter five, Result and Discussion. The CNN-SVM model shows potential in this study, as it performs better than the other models. However, the study has some limitations. The dataset does not cover all Hadith categories: it involves only three classes, namely prayer, fasting and zakat. As a result, the model cannot predict correctly when given a Hadith from outside the selected classes. The model might perform better if it learned from more data and from more specific topics in Arabic Hadith. For future work, it is recommended to extend the dataset so that the model can predict the classes in more detail, and to combine the model with an optimization algorithm to improve its performance.
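The evaluation measures named in the abstract (confusion matrix, precision, recall, F1-measure and accuracy) can be illustrated with a minimal sketch. It is not the thesis's code: the function names are hypothetical, the class labels are taken from the three categories mentioned above, and the predictions are made-up toy values rather than results from the study.

```python
# Class labels assumed from the abstract's three categories.
LABELS = ["prayer", "fasting", "zakat"]


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true classes and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f1)} computed from the matrix."""
    results = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        # False positives: other classes predicted as this class (column sum).
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp
        # False negatives: this class predicted as something else (row sum).
        fn = sum(matrix[i]) - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[label] = (precision, recall, f1)
    return results


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy data for illustration only; these are NOT results from the thesis.
y_true = ["prayer", "prayer", "fasting", "fasting", "zakat", "zakat"]
y_pred = ["prayer", "fasting", "fasting", "fasting", "zakat", "prayer"]

cm = confusion_matrix(y_true, y_pred, LABELS)
metrics = per_class_metrics(cm, LABELS)
acc = accuracy(cm)
```

Per-class values are reported here rather than a single averaged score because, with only three classes, it is easy to see from them which category a model confuses with which.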
Metadata
Item Type: | Thesis (Masters) |
---|---|
Creators: | Mazlin, Mohd Irwan (ID Num. 2019507823) |
Contributors: | Thesis advisor: Mohamed Rawi, Mohd Izani (Email / ID Num.: UNSPECIFIED) |
Subjects: | B Philosophy. Psychology. Religion > BP Islam. Bahaism. Theosophy, etc > Islam > Sacred books > Hadith literature. Traditions. Sunna; Z Bibliography. Library Science. Information Resources > Information organization |
Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
Programme: | Master of Science (Computer Science) |
Keywords: | Arabic hadith, convolutional neural network, vector machine |
Date: | 2022 |
URI: | https://ir.uitm.edu.my/id/eprint/75386 |