Exploratory Analysis and Preprocessing of Dataset for the Classification of Osteosarcoma Types

Amoakoh Gyasi-Agyei; Tahsien Al-Quraishi; Bhagwan Das; Johnson I. Agbinya


Published: Dec 3, 2023

Amoakoh Gyasi-Agyei

Melbourne Institute of Technology

Tahsien Al-Quraishi

Melbourne Institute of Technology

Bhagwan Das

Melbourne Institute of Technology

Johnson I. Agbinya

Melbourne Institute of Technology

Abstract

Osteosarcoma is a bone-forming tumor that is more common in children and young adults than in older adults. Classifying its type is crucial to proper treatment and to the patient's chances of survival. Machine learning models trained on datasets of the disease are more effective classification tools than hand-crafted features, which depend heavily on a pathologist’s expertise. However, machine learning models are only useful if the datasets used to train them are representative, of good quality and well prepared. Data preprocessing and statistical analysis of the training datasets are therefore necessary precursors to model learning. Data preprocessing is the most demanding task in the model learning pipeline, so the availability of a quality pre-processed dataset for a given task is desirable. Two things are needed to obtain good results in a machine learning project: good data preprocessing and good algorithms. This paper provides a thorough preprocessing and statistical analysis of a 1144-sample dataset of osteosarcoma patients, to render the dataset ready for model learning. The efficacy of the preprocessing methods is verified by training multiclass logistic regression in Python on three versions of the dataset: one with 63 of the 69 variables, one reduced with PCA, and one reduced with feature selection. These achieve predictive accuracies of 19.27%, 65.14% and 80.28%, respectively.
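The three-way comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data are synthetic (generated with `make_classification`), and the number of classes (3), the number of PCA components (10), the number of selected features (10), the ANOVA F-score selector and the 80/20 split are all assumptions chosen for the example, since the abstract does not state them. Only the sample count (1144) and feature count (63) are taken from the abstract.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the osteosarcoma dataset): 1144 samples and
# 63 features as in the abstract; 3 classes and the informative/redundant
# split are assumptions made only for this illustration.
X, y = make_classification(
    n_samples=1144, n_features=63, n_informative=10, n_redundant=5,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def build(*steps):
    """Standardize, apply optional reduction steps, then fit logistic regression."""
    return make_pipeline(StandardScaler(), *steps, LogisticRegression(max_iter=1000))


# One pipeline per dataset variant compared in the abstract.
pipelines = {
    "all_features": build(),
    "pca": build(PCA(n_components=10)),                        # assumed component count
    "feature_selection": build(SelectKBest(f_classif, k=10)),  # assumed selector and k
}

accuracies = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)                      # fit on training data only
    accuracies[name] = pipe.score(X_test, y_test)   # held-out accuracy
    print(f"{name}: {accuracies[name]:.3f}")
```

Wrapping the scaler and the reduction step in a pipeline means both are fitted on the training split only, which avoids leaking test-set information. The accuracies printed for this synthetic data are not comparable with the figures reported in the abstract.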

How to Cite

Gyasi-Agyei, A., Al-Quraishi, T., Das, B., & Agbinya, J. I. (2023). Exploratory Analysis and Preprocessing of Dataset for the Classification of Osteosarcoma Types. Proceedings of International Conference for ICT (ICICT) - Zambia, 5(1), 36–43. Retrieved from https://ictjournal.icict.org.zm/index.php/icict/article/view/276

Issue

Vol. 5 No. 1 (2023): PACT 2023 - Pan African Conference on Science, Computing and Telecommunications (PACT) 2023, Lusaka Zambia

Section

Articles
