Automated Document Classification for research HEI grant awards using Machine Learning

Rebecca Lupyani
Jackson Phiri

Abstract

Recent advances in information technology have resulted in a continual increase in electronic textual documents. Classifying these documents according to their subject or related content has become a practical necessity for decision makers and policy makers. This paper explores the use of the Support Vector Machine (SVM) model, which is considered one of the most popular text classification models. The model was trained with two different datasets: S2ORC and a dataset obtained from the University of Zambia Institutional Repository (UNZA-IR). The model performed generally well when trained on S2ORC but performed poorly when trained on the UNZA-IR dataset because of that dataset's small size. The research therefore recommends merging the two datasets in the hope of improving the model's performance, and/or building a larger corpus of Zambian electronic theses, dissertations and articles so that the dataset is large enough for training.

Article Details

How to Cite
Lupyani, R., & Phiri, J. (2023). Automated Document Classification for research HEI grant awards using Machine Learning. Zambia Association of Public Universities and Colleges (ZAPUC) Conference, 3(1), 90–95. Retrieved from https://ictjournal.icict.org.zm/index.php/zapuc/article/view/217
Section
Articles