Challenges of using Data Mining Techniques to Analyze and Forecast COVID-19 Pandemic in Zambia
Abstract
COVID-19 is a highly infectious respiratory disease caused by SARS-CoV-2, a coronavirus closely related to the virus that causes SARS, and it has presented a global challenge. During the early stages of the pandemic in Zambia, a major challenge was the limited availability of COVID-19 data and datasets, which restricted research, especially in data mining. Data availability has since been improving. This paper presents the challenges of using data mining techniques and models to analyze and forecast the COVID-19 pandemic in Zambia. It first presents the methodology used to create a province-level dataset from the daily situation reports of the Zambia National Public Health Institute (ZNPHI) and the Ministry of Health Zambia. The country-level analysis used COVID-19 datasets from the Humanitarian Data Exchange (HDX) and the European Centre for Disease Prevention and Control (ECDC). Finally, the study discusses the development and evaluation of the forecasting model, which is based on the COVID_SEIRD Python package. To evaluate the forecasting model, the research used a combination of correlation and the max function from basic statistics. The analysis focuses on identifying the province with the most COVID-19 cases in Zambia, while the forecasting model forecasts the trend of the pandemic for recoveries and fatalities.
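The workflow summarized in the abstract can be illustrated with a minimal sketch. It does not use or reproduce the COVID_SEIRD package, whose interface is not described here. Instead it integrates a generic textbook SEIRD model (susceptible, exposed, infectious, recovered, deceased) with daily Euler steps, then compares a forecast series with an observed series using Pearson correlation and the max function. All function names, parameter values, and the way the two measures are combined are assumptions made for illustration, not the authors' method.

```python
from math import sqrt


def simulate_seird(population, e0, i0, beta, sigma, gamma, mu, days):
    """Integrate a basic SEIRD model with one Euler step per day.

    beta: transmission rate, sigma: incubation rate (E -> I),
    gamma: recovery rate, mu: fatality rate. All rates are per day
    and are illustrative assumptions, not fitted values.
    """
    s, e, i, r, d = population - e0 - i0, e0, i0, 0.0, 0.0
    history = [(s, e, i, r, d)]
    for _ in range(days):
        new_exposed = beta * s * i / population
        new_infectious = sigma * e
        new_recovered = gamma * i
        new_deaths = mu * i
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered - new_deaths
        r += new_recovered
        d += new_deaths
        history.append((s, e, i, r, d))
    return history


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


def evaluate(forecast, observed):
    """Hypothetical evaluation: trend agreement plus gap between maxima."""
    return {
        "correlation": pearson(forecast, observed),
        "peak_gap": abs(max(forecast) - max(observed)),
    }


def busiest_province(cases_by_province):
    """Return the province name with the largest case count."""
    return max(cases_by_province, key=cases_by_province.get)
```

As a usage example, `[row[3] for row in simulate_seird(...)]` extracts the forecast recoveries and `row[4]` the forecast fatalities, and either series can be passed to `evaluate` alongside the reported counts. In this sketch a correlation close to 1 indicates that the forecast follows the observed trend, while the gap between maxima indicates how far the forecast peak is from the reported one.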