A Modular AI-Driven Framework for Automating HR Case Processing: OCR, NLP, Sentiment Analysis and Imbalanced Learning
Abstract
This study investigates the feasibility and effectiveness of integrating Optical Character Recognition (OCR), Regular Expressions (RegEx), Aspect-Based Sentiment Analysis (ABSA), and supervised learning to automate structured and semi-structured Human Resource (HR) case processing in African public-sector contexts. The proposed modular pipeline comprises five stages: OpenCV-based pre-processing for noise reduction, skew correction, and region-of-interest (ROI) detection; tuned Tesseract 4.1 OCR; RegEx-driven attribute extraction; ABSA for narrative sentiment; and Balanced Bagging Random Forest classification, with SMOTE-ENN resampling applied to strengthen minority-class representation and improve sensitivity to rare but decision-critical HR outcomes. Evaluated on anonymised promotion and transfer cases from Zambia’s Teaching Service Commission, the system achieved 95% OCR accuracy, an F2-score of 0.97, a weighted F1 of 0.95, a PR AUC of 0.98, and a minority-class F1 of 0.36, demonstrating improved detection of low-frequency, high-priority HR cases. End-to-end processing reduced manual timelines from approximately 3 days to under 9.4 s per two-page case batch, and the pipeline scaled to 1,312 records in 3.3 s on commodity hardware, enabling timely decision support in resource-constrained environments. The framework’s domain-specific integration and ethical alignment provide a scalable, adaptable solution for HR digitisation in policy-bound, resource-limited sectors, supporting compliance with governance and transparency requirements.
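To make two of the stages named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It shows RegEx-driven attribute extraction from OCR output and the F-beta score (with beta = 2 for the F2-score reported above). The field names, the patterns, and the sample form text are hypothetical, since the abstract does not describe the actual form layout; only the Python standard library is used.

```python
import re

# Hypothetical field patterns: the real TSC form layout is not given in the
# abstract, so these labels and formats are assumptions for illustration only.
FIELD_PATTERNS = {
    "employee_no": re.compile(r"Employee\s*No\.?\s*[:\-]\s*(\d+)", re.IGNORECASE),
    "case_type": re.compile(r"Case\s*Type\s*[:\-]\s*(Promotion|Transfer)", re.IGNORECASE),
    "date": re.compile(r"Date\s*[:\-]\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
}


def extract_attributes(ocr_text):
    """Return one value per field, or None when the pattern does not match."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        result[name] = match.group(1).strip() if match else None
    return result


def f_beta(tp, fp, fn, beta=2.0):
    """F-beta from confusion counts; beta > 1 weights recall above precision."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


# Hypothetical OCR output for a single case form.
sample = "Employee No: 12345\nCase Type: transfer\nDate: 01/02/2023"
attributes = extract_attributes(sample)

# With more false positives than false negatives (recall > precision),
# F2 comes out higher than F1 for the same counts.
f2 = f_beta(tp=8, fp=4, fn=1, beta=2.0)
f1 = f_beta(tp=8, fp=4, fn=1, beta=1.0)
```

Because F2 weights recall more heavily than precision, it suits settings like the one described, where missing a rare but decision-critical case is costlier than raising a false alarm.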