An Analysis into factors affecting accuracy levels in deep learning models: A case of local language dataset in Zambia

Clement Mulenga Sinyangwe
Douglas Kunda
William Phiri Abwino
Emmanuel Lwele

Abstract

Deep learning models are being trained on labeled examples to detect hate speech and abusive language. However, challenges remain, particularly with language dictionaries. Language dictionaries are collections of phrases and embeddings, where the embeddings represent words as numerical vectors in a high-dimensional space. Collecting a high-quality dataset of words and their translations can be challenging, especially for low-resource languages. Additionally, ambiguity and variation in language can make it difficult to match words accurately between languages. Out-of-vocabulary (OOV) words, which do not appear in the training dataset and are therefore unrecognized by the model, also pose challenges when developing a local language dictionary, especially for low-resource languages with limited vocabulary. The main objective of this study was to analyse how the language dictionary affects the accuracy levels of deep learning models. CRISP-DM was used as the preferred methodology. The study noted that, to address these challenges, local datasets must be properly curated and preprocessed so that they are representative, diverse, and unbiased. It also noted that cloud-based machine learning services can be used to overcome resource constraints and make model maintenance easier.
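
To make the out-of-vocabulary problem concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes simple whitespace tokenization, and the function names (`build_vocabulary`, `oov_rate`) and the example sentences are hypothetical placeholders rather than data from the study. It measures the fraction of tokens in a new sentence that were never seen in the training text.

```python
def build_vocabulary(training_sentences):
    """Collect the set of lowercased tokens seen in the training text."""
    vocabulary = set()
    for sentence in training_sentences:
        vocabulary.update(sentence.lower().split())
    return vocabulary


def oov_rate(sentence, vocabulary):
    """Return the fraction of tokens in `sentence` missing from `vocabulary`."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0  # an empty sentence has no unknown tokens
    unknown = [token for token in tokens if token not in vocabulary]
    return len(unknown) / len(tokens)


# Hypothetical placeholder sentences, not drawn from the study's dataset.
training_sentences = ["the market is open", "the road is closed"]
vocabulary = build_vocabulary(training_sentences)

# "bridge" never appears in the training sentences, so it counts as OOV.
rate = oov_rate("the bridge is closed", vocabulary)
```

A high OOV rate on held-out local-language text would suggest that the dictionary covers too little of the language for the model to represent its input well.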

Article Details

How to Cite
Sinyangwe, C. M., Kunda, D., Abwino, W. P., & Lwele, E. (2023). An Analysis into factors affecting accuracy levels in deep learning models: A case of local language dataset in Zambia. Proceedings of International Conference for ICT (ICICT) - Zambia, 5(1), 82–87. Retrieved from https://ictjournal.icict.org.zm/index.php/icict/article/view/283