Detecting Hate Speech and Offensive Language using Machine Learning in Published Online Content


Clement Sinyangwe
Douglas Kunda
William Phiri Abwino

Abstract

Businesses are more concerned than ever about hate speech content as most brand communication and advertising move online. Organisations may be in charge of their products and services, but they do not have complete control over content posted online via their websites and social media channels; they cannot control what online users post or comment about their brand. It therefore became imperative in this study to develop a model that identifies hate speech and offensive language and detects cyber offences in online published content using machine learning. The study employed an experimental design to develop the detection model, with agile methodologies assessed and the preferred one adopted as the development methodology. Deep learning and HateSonar were used to detect hate speech and offensive language in posted content. The study used data from Twitter and Facebook to detect hate speech, and each text was classified as hate speech, offensive language, or both. During the reconnaissance phase, the combined data (structured and unstructured) was obtained from kaggle.com and stored in the database as raw data. The analysis revealed that hate speech and offensive language exist everywhere in the world, and that the trend of these vices is on the rise. Using machine learning, the researchers successfully developed a model for detecting offensive language and hate speech on online social media platforms. The labelling in the model makes it simple to categorise data in a meaningful and readable manner. The study establishes that, in order for the model to detect hate speech and offensive language on online social media platforms, the data set must be categorised and presented in statistical form after running the model: the count indicates the total number of data sets imported, and the mean for each category, the standard deviation, and the minimum and maximum number of tweets in each category are also displayed.
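The per-category statistical summary described above (count, mean, standard deviation, minimum, and maximum per class) can be sketched as follows. This is an illustrative sketch only, not the authors' code: the labels follow the study's three categories, but the sample tweets' word counts and the function name `summarise_by_category` are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical labelled sample: (class label, tweet length in words).
# Labels follow the study's categories: hate speech, offensive language, neither.
labelled_tweets = [
    ("hate_speech", 12), ("hate_speech", 18),
    ("offensive_language", 7), ("offensive_language", 9),
    ("offensive_language", 11),
    ("neither", 15), ("neither", 20),
]

def summarise_by_category(samples):
    """Group tweet lengths by label and report count, mean, std, min, max."""
    groups = defaultdict(list)
    for label, words in samples:
        groups[label].append(words)
    summary = {}
    for label, counts in groups.items():
        summary[label] = {
            "count": len(counts),                 # total tweets in the category
            "mean": statistics.mean(counts),
            "std": statistics.stdev(counts) if len(counts) > 1 else 0.0,
            "min": min(counts),
            "max": max(counts),
        }
    return summary

summary = summarise_by_category(labelled_tweets)
```

In practice the same table is produced in one call by `pandas.DataFrame.describe()` after grouping on the label column; the stdlib version above simply keeps the sketch dependency-free.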
The study established that preventing online platform abuse in Zambia requires a comprehensive approach involving government legislation, responsible platform policies and practices, and individual responsibility and accountability. In line with this goal, the research was effective in developing the detection model. To guarantee that the model was fully functional, it was trained on an English dataset before being applied to the local language dataset. This was because training deep learning models on local datasets can present a number of challenges, such as limited or biased data, data privacy, resource requirements, and model maintenance. However, the efficacy of such systems varies, and concerns have been raised about the inherent biases and limitations of automatic moderation techniques. The study recommends that future work consider other sources of information such as Facebook, WhatsApp, Instagram, and other social media platforms, and harvest local data sets for training models rather than relying on foreign data sets; the local data set can then be used to detect offences targeting Zambian citizens on local platforms.

Article Details

How to Cite
Sinyangwe, C., Kunda, D., & Abwino, W. P. (2023). Detecting Hate Speech and Offensive Language using Machine Learning in Published Online Content. Zambia ICT Journal, 7(1), 79–84. https://doi.org/10.33260/zictjournal.v7i1.143
Section
Articles