Abstract
As vital components of human-computer interaction, CAPTCHAs are widely used across industries including video, education, finance, e-commerce, aviation, and public services. Research into CAPTCHA recognition is crucial for identifying network security vulnerabilities and advancing cybersecurity measures. Text and slider CAPTCHAs are among the most commonly used types: text CAPTCHAs require users to recognize characters, while slider CAPTCHAs ask users to locate a gap. In text CAPTCHAs, heavy noise and correlations between characters make recognition difficult. To address these anti-attack mechanisms, a new text CAPTCHA recognition framework is proposed, consisting of three parts: a data augmentation module, a font enhancement network, and a recognition network. The recognition network, named Adaptive-CAPTCHA, improves on Deep-CAPTCHA and consists of a Convolutional Recurrent Neural Network (CRNN), Adaptive Fusion Filtering Networks (AFFN), and residual connections. It achieves an Average Attack Success Rate (AASR) of 99% on the complex M-CAPTCHA dataset and near-perfect performance (99.9%) on the simpler P-CAPTCHA dataset. Additionally, a Font Enhancement (FE) network based on Generative Adversarial Networks (GAN) is introduced, which significantly reduces the interference in text CAPTCHAs. Because traditional color enhancement algorithms lack adaptive learning capabilities, three types of Variation Color Shift (VCS) algorithms are also proposed for data augmentation. Experimental results show that VCS notably improves recognition accuracy; for instance, it raises the AASR of Adaptive-CAPTCHA on the challenging M-CAPTCHA dataset by approximately 10 percentage points compared with no color shift.
For slider CAPTCHA detection, mean Relative Offset (mRO) is proposed as a dedicated evaluation metric, and an Offset-based Intersection over Union (OIoU) loss is developed to improve the loss function, reducing the mRO to below 1% on the Geetest dataset. The Fixed Quantity Prediction Non-Maximum Suppression (FQP-NMS) method, together with lightweight backbones and attention mechanisms, is proposed to improve the YOLOv5 architecture and its recognition performance, achieving a mean Average Precision (mAP) of 0.994 on the SliderCAPTCHA dataset. Comparison with benchmark models in terms of recognition accuracy, precision, computational complexity, storage space, and other metrics demonstrates the superiority of the proposed algorithms. Directions for subsequent study and enhancement are also outlined.
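The abstract does not define mRO or the OIoU loss, so the following is only an illustrative sketch of the kind of quantities involved, not the thesis's formulation. It computes the standard Intersection over Union of two boxes, which an IoU-style loss builds on, and a hypothetical relative-offset score in which the horizontal error of the predicted gap is divided by the image width. The function name `mean_relative_offset`, the `(x1, y1, x2, y2)` box layout, and the normalization by image width are all assumptions.

```python
from typing import Sequence, Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Standard Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_relative_offset(
    predicted: Sequence[Box], truth: Sequence[Box], image_width: float
) -> float:
    """ASSUMED definition, not taken from the thesis: the mean of
    |predicted left edge - true left edge| / image_width."""
    if not predicted or len(predicted) != len(truth):
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(p[0] - t[0]) for p, t in zip(predicted, truth))
    return total / (len(predicted) * image_width)


if __name__ == "__main__":
    pred = [(10.0, 0.0, 50.0, 40.0)]
    true = [(12.0, 0.0, 52.0, 40.0)]
    print(iou(pred[0], true[0]))
    print(mean_relative_offset(pred, true, image_width=200.0))
```

Under this assumed definition, a 2-pixel horizontal error on a 200-pixel-wide image is a relative offset of 1%, the scale at which the abstract reports its Geetest result.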
Metadata
| Item Type: | Thesis (PhD) |
|---|---|
| Creators: | Xing, Wan |
| Contributors: | Thesis advisor: Ahmat Ruslan, Fazlina; Thesis advisor: Johari, Juliana |
| Subjects: | T Technology > TK Electrical engineering. Electronics. Nuclear engineering; T Technology > TK Electrical engineering. Electronics. Nuclear engineering > Telecommunication > Computer networks. General works. Traffic monitoring |
| Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Electrical Engineering |
| Programme: | Doctor of Philosophy (Electrical Engineering) |
| Keywords: | Artificial Neural Network (ANN), Average Attack Success Rate (AASR), Adaptive Color Shift (ACS) |
| Date: | September 2025 |
| URI: | https://ir.uitm.edu.my/id/eprint/132611 |
