Abstract
This study presents a descriptive comparative investigation of the mathematical accuracy of two widely used large language model (LLM) tools, ChatGPT and Gemini, across three core domains: Algebra, Calculus, and Statistics. The increasing adoption of generative AI in higher education has raised concerns about the reliability of AI-generated mathematical solutions, particularly when outputs appear coherent but contain hidden reasoning gaps. To examine domain-specific performance, both tools were tested using an identical prompt protocol, and only first responses were recorded to reflect typical student usage. Accuracy was evaluated using final-answer correctness and summarized using descriptive statistics, reported as percentage of correct solutions by domain. Results indicate that both tools achieved consistently high accuracy across all domains, exceeding 88%. ChatGPT demonstrated higher accuracy in Algebra (97.22%) compared to Gemini (91.67%), suggesting stronger performance on symbolic manipulation and structured equation-based tasks. In contrast, Gemini achieved perfect accuracy in both Calculus and Statistics (100% each), outperforming ChatGPT in those domains (88.89% and 94.44%, respectively). These findings indicate that LLM effectiveness in mathematics is domain-dependent rather than uniform, with each system exhibiting distinct strengths. Overall, the study suggests that AI tools can serve as useful computational assistants in mathematics learning and practice, but domain sensitivity implies that outputs should be interpreted cautiously and verified, especially in formal assessment contexts. Future work should expand the problem set, incorporate step-validity scoring, and evaluate performance under reworded and out-of-distribution problem conditions to better assess reasoning robustness.
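The accuracy measure described above (percentage of correct final answers per domain) can be illustrated with a minimal sketch. The abstract does not state how many problems were used; the figure of 36 problems per domain and the per-domain correct counts below are assumptions chosen only because they reproduce the reported percentages (for example, 35/36 rounds to 97.22%). The names `TOTAL_PER_DOMAIN`, `correct_counts`, `accuracy_percent`, and `summarize` are hypothetical and do not come from the study.

```python
# Assumption: 36 problems per domain. The abstract does not give the problem
# count; 36 is simply one value consistent with the reported percentages.
TOTAL_PER_DOMAIN = 36

# Hypothetical counts of correct final answers, back-derived from the
# reported percentages under the 36-problem assumption.
correct_counts = {
    "ChatGPT": {"Algebra": 35, "Calculus": 32, "Statistics": 34},
    "Gemini": {"Algebra": 33, "Calculus": 36, "Statistics": 36},
}


def accuracy_percent(correct: int, total: int) -> float:
    """Return the percentage of correct final answers, rounded to 2 decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 2)


def summarize(counts: dict, total: int) -> dict:
    """Map each tool and domain to its accuracy percentage."""
    return {
        tool: {domain: accuracy_percent(c, total) for domain, c in by_domain.items()}
        for tool, by_domain in counts.items()
    }


summary = summarize(correct_counts, TOTAL_PER_DOMAIN)
for tool, by_domain in summary.items():
    for domain, pct in by_domain.items():
        print(f"{tool:8s} {domain:11s} {pct:6.2f}%")
```

Under this assumption, a single additional wrong answer shifts a domain score by about 2.78 percentage points, which is worth keeping in mind when reading the between-tool differences.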
Metadata
| Item Type: | Article |
|---|---|
| Creators: | Umar, Norazah (norazah191@uitm.edu.my); Ahmad, Nurhafizah (nurha9129@uitm.edu.my); Othman, Jamal (jamalothman@uitm.edu.my); Hamat, Muniroh (muniroh@uitm.edu.my) |
| Contributors: | Advisor: Abd Rahman, Nor Hanim (email / ID unspecified); Chief Editor: Othman, Jamal (email / ID unspecified) |
| Subjects: | Q Science > QA Mathematics > Evolutionary programming (Computer science). Genetic algorithms |
| Divisions: | Universiti Teknologi MARA, Pulau Pinang > Permatang Pauh Campus |
| Journal or Publication Title: | Merging Lanes: Where E-Learning Diversity Meets Future Trends |
| ISSN: | 978-629-98755-9-8 |
| Volume: | 11 |
| Page Range: | pp. 108-115 |
| Keywords: | ChatGPT, Gemini, AI |
| Date: | April 2026 |
| URI: | https://ir.uitm.edu.my/id/eprint/137356 |
