Exploratory Analysis and Preprocessing of Dataset for the Classification of Osteosarcoma Types

Amoakoh Gyasi-Agyei
Tahsien Al-Quraishi
Bhagwan Das
Johnson I. Agbinya

Abstract

Osteosarcoma is a bone-forming tumor that is more common in children and young adults than in older adults. Classifying its type is crucial to proper treatment and possible survival. Machine learning models trained on datasets of the disease are a more effective classification tool than hand-crafted features, which are highly dependent on a pathologist’s expertise. However, machine learning models are only useful if the datasets used to train them are representative, of good quality and well prepared. Thus, data preprocessing and statistical analysis of the datasets used to train models are necessary precursors to model learning. Data preprocessing is the most demanding task in the model learning pipeline, so the availability of a quality pre-processed dataset for a given task is desirable. Two things are needed to obtain good results in a machine learning project: good data preprocessing and good algorithms. This paper provides a thorough preprocessing and statistical analysis of a 1144-sample dataset of osteosarcoma patients, to render the dataset ready for model learning. The efficacy of the preprocessing methods is verified by training multiclass logistic regression in Python on three versions of the dataset: with 63 of the 69 variables, with PCA, and with feature selection. These achieve respective predictive accuracies of 19.27%, 65.14% and 80.28%.
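The three-way comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses a synthetic stand-in dataset with the same shape (1144 samples, 63 variables) rather than the osteosarcoma data, and the number of classes, the train/test split, the number of PCA components and the number of selected features are all placeholder assumptions. Its accuracies will therefore not match the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the dataset (assumption: 3 classes, 10 informative variables).
X, y = make_classification(
    n_samples=1144, n_features=63, n_informative=10, n_redundant=5,
    n_classes=3, random_state=0,
)

# Hold out a stratified test set (the 80/20 split is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def make_model():
    # With the default lbfgs solver this fits a multiclass logistic regression.
    return LogisticRegression(max_iter=2000)


# One pipeline per variant: all variables, PCA-reduced, and feature-selected.
pipelines = {
    "all_features": make_pipeline(StandardScaler(), make_model()),
    "pca": make_pipeline(StandardScaler(), PCA(n_components=10), make_model()),
    "feature_selection": make_pipeline(
        StandardScaler(), SelectKBest(f_classif, k=10), make_model()
    ),
}

# Fit each variant on the training split and record test-set accuracy.
results = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    results[name] = pipe.score(X_test, y_test)

for name, acc in results.items():
    print(f"{name}: {acc:.4f}")
```

Wrapping each variant in a pipeline keeps scaling, PCA and feature selection fitted on the training split only, so the test accuracy is not inflated by information leaking from the held-out samples.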

How to Cite
Gyasi-Agyei, A., Al-Quraishi, T., Das, B., & Agbinya, J. I. (2023). Exploratory Analysis and Preprocessing of Dataset for the Classification of Osteosarcoma Types. Proceedings of International Conference for ICT (ICICT) - Zambia, 5(1), 36–43. Retrieved from https://ictjournal.icict.org.zm/index.php/icict/article/view/276