Abstract
Accurate interpretation of Electronic Medical Records (EMRs), especially clinical notes, is crucial for effective healthcare communication and good patient outcomes. The main challenges in hybrid Natural Language Processing (NLP) methods include integrating various techniques while maintaining contextual understanding, resolving ambiguous abbreviations, and reducing misinterpretations of clinical narratives. The dataset in this study consists of cardiovascular-related clinical notes containing medical abbreviations, diagnoses, and discharge summaries. Before analysis, the data underwent preprocessing steps such as text normalization, abbreviation extraction, and punctuation cleaning to ensure consistency and readiness for the model. This study addresses abbreviation ambiguity, diagnosis prediction, and International Classification of Diseases (ICD) classification using a hybrid NLP approach. The objectives are to extract and expand abbreviations, develop a hybrid framework for diagnosis prediction and ICD mapping, and evaluate its performance. The methodology integrates three components: the Text-to-Text Transfer Transformer (T5) model with enhanced inference combining cosine similarity and beam search for abbreviation expansion; MedBioClinicalBERT, an integration of BioClinicalBERT and MedBERT, for diagnosis prediction; and Semantic Role Labeling (SRL) for explainability. The enhanced elicitive inference achieved 95.38% BLEU and 97.96% ROUGE-L scores on abbreviation expansion. For diagnosis prediction, the hybrid input framework with MedBioClinicalBERT attained 90.00% accuracy with precision, recall, and F1 scores of 0.9530, 0.9470, and 0.9000, respectively, outperforming BioClinicalBERT and MedBERT individually. Standardization to ICD-10 codes was refined using fuzzy matching to improve mapping accuracy. Overall, the hybrid NLP method achieved 94.89% precision, 94% recall, and a 95% F1 score.
Although limitations persist due to the multimodal nature of clinical notes and the cardiovascular-specific dataset, the proposed method demonstrates substantial improvements. Overall, this study highlights the effectiveness of combining hybrid NLP methods with advanced abbreviation expansion to enhance EMR interpretation and ICD-10 classification, paving the way for broader applications in medical text analysis.
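As an illustration of the ICD-10 standardization step mentioned above, the following is a minimal sketch of fuzzy matching a predicted diagnosis string against ICD-10 descriptions. The abstract does not state which fuzzy-matching algorithm or library the thesis uses, so Python's standard `difflib` is assumed here as a stand-in. The lookup table, the function name `map_to_icd10`, and the `cutoff` value are all hypothetical choices made for this example.

```python
from difflib import get_close_matches

# Tiny illustrative lookup (assumption); a real system would load the
# full ICD-10 code table rather than three hand-picked entries.
ICD10 = {
    "essential (primary) hypertension": "I10",
    "acute myocardial infarction, unspecified": "I21.9",
    "heart failure, unspecified": "I50.9",
}


def map_to_icd10(diagnosis, cutoff=0.6):
    """Return (code, matched description), or None if nothing is close enough.

    `cutoff` is a hypothetical similarity threshold in [0, 1]; it is not a
    value taken from the thesis.
    """
    query = diagnosis.strip().lower()
    matches = get_close_matches(query, list(ICD10), n=1, cutoff=cutoff)
    if not matches:
        return None
    best = matches[0]
    return ICD10[best], best


# A misspelled model output still resolves to the intended code.
print(map_to_icd10("Heart failure, unspecifed"))
```

Returning `None` below the threshold, rather than the nearest code regardless of similarity, keeps weak matches from being silently assigned a code.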
Metadata
| Item Type: | Thesis (Masters) |
|---|---|
| Creators: | Iqbal Basheer, Nurul Anis Balqis (Email / ID Num.: UNSPECIFIED) |
| Contributors: | Thesis advisor: Nordin, Sharifalillah (Email / ID Num.: UNSPECIFIED); Thesis advisor: Abdul Hamid, Nurzeatul Hamimah (Email / ID Num.: UNSPECIFIED) |
| Subjects: | H Social Sciences > HD Industries. Land use. Labor; H Social Sciences > HD Industries. Land use. Labor > Service industries |
| Divisions: | Universiti Teknologi MARA, Shah Alam > Faculty of Computer and Mathematical Sciences |
| Programme: | Master of Science (Computer Science) |
| Keywords: | Electronic Medical Records (EMR), Semantic Role Labeling (SRL), International Classification of Diseases (ICD) |
| Date: | December 2025 |
| URI: | https://ir.uitm.edu.my/id/eprint/132629 |
ID Number
132629
