An Analysis into factors affecting accuracy levels in deep learning models: A case of local language dataset in Zambia

Clement Mulenga Sinyangwe; Douglas Kunda; William Phiri Abwino; Emmanuel Lwele

Published: Dec 3, 2023

Keywords: Accuracy, Deep Learning, dataset, models, language

Clement Mulenga Sinyangwe

Chalimbana University

Douglas Kunda

ZCAS University

William Phiri Abwino

Chalimbana University

Emmanuel Lwele

Sheffield Hallam University

Abstract

Deep learning models are being trained to detect hate speech and abusive language using labeled examples. However, there are challenges, particularly with language dictionaries. Language dictionaries are collections of phrases and embeddings used to represent words as numerical vectors in a high-dimensional space. Collecting a high-quality dataset of words and their translations can be challenging, especially for low-resource languages. Additionally, ambiguity and variation in language can make it difficult to match words accurately between languages. Out-of-vocabulary (OOV) words, which do not appear in the training dataset and are therefore unrecognized by the model, also pose challenges when developing a local language dictionary, especially for low-resource languages with limited vocabulary. The main objective of this study was to analyse how the language dictionary affects the accuracy levels of deep learning models. The Cross-Industry Standard Process for Data Mining (CRISP-DM) was used as the preferred methodology. The study noted that, to address these challenges, local datasets must be properly curated and preprocessed so that they are representative, diverse, and unbiased. It also noted that cloud-based machine learning services can be used to overcome resource constraints and make model maintenance easier.
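
As an illustration of the out-of-vocabulary problem described in the abstract, the following minimal Python sketch measures what fraction of the tokens in a new sentence were never seen in a training set. It is not the authors' method: the function names `build_vocabulary` and `oov_rate`, the whitespace tokenization, and the placeholder sentences are all assumptions made for this example.

```python
def build_vocabulary(training_sentences):
    """Collect the set of lowercase tokens seen in the training sentences."""
    vocabulary = set()
    for sentence in training_sentences:
        # Whitespace tokenization is a simplifying assumption.
        vocabulary.update(sentence.lower().split())
    return vocabulary


def oov_rate(sentence, vocabulary):
    """Return the fraction of tokens in `sentence` missing from `vocabulary`."""
    tokens = sentence.lower().split()
    if not tokens:
        return 0.0
    missing = [token for token in tokens if token not in vocabulary]
    return len(missing) / len(tokens)


# Placeholder sentences for illustration only, not from the study's dataset.
training_sentences = ["alpha beta gamma", "beta delta"]
vocabulary = build_vocabulary(training_sentences)

# Two of the four tokens ("epsilon", "zeta") are absent from the vocabulary.
rate = oov_rate("alpha epsilon beta zeta", vocabulary)
print(f"OOV rate: {rate:.2f}")
```

A high OOV rate on held-out text would suggest that the dictionary covers too little of the language for the model to generalise well, which is one way the dictionary can limit accuracy.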

How to Cite

Sinyangwe, C. M., Kunda, D., Abwino, W. P., & Lwele, E. (2023). An Analysis into factors affecting accuracy levels in deep learning models: A case of local language dataset in Zambia. Proceedings of International Conference for ICT (ICICT) - Zambia, 5(1), 82–87. Retrieved from https://ictjournal.icict.org.zm/index.php/icict/article/view/283

Issue

Vol. 5 No. 1 (2023): PACT 2023 - Pan African Conference on Science, Computing and Telecommunications (PACT) 2023, Lusaka Zambia

Section

Articles
