Machine Learning Approaches for Predicting Breast Cancer Recurrence: A Comparative Analysis
Abstract
This paper reports a comparative analysis of four supervised machine learning algorithms for predicting breast cancer recurrence on a curated clinical dataset: Random Forest (RF), Support Vector Machine (SVM) with radial and linear kernels, Logistic Regression, and Multi-Layer Perceptron (MLP). The dataset, originally collected by Royston and Altman and later released on Kaggle, contains patient age, menopausal status, tumor size, histological grade, lymph node status, estrogen and progesterone receptor levels, hormone therapy status, recurrence-free survival time, and a binary recurrence outcome. After removing identifiers and applying z-score normalization, the dataset was split 80:20 into training and test sets using stratified sampling. Models were compared on accuracy, precision, recall, F1-score, and area under the ROC curve (AUC); RF and Logistic Regression achieved the highest test-set accuracy, 0.703. Feature-importance analyses (Gini impurity decrease for RF, absolute coefficients for the linear models, and permutation importance for the neural network) consistently identified lymph node count, recurrence-free survival time, and hormone receptor levels as significant predictors. Confusion matrices, ROC curves, and correlation heatmaps were used to support interpretation of the models. The results illustrate the potential of explainable machine learning to enhance individualized surveillance and treatment planning in breast cancer care.
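The preprocessing and evaluation steps summarized in the abstract can be sketched in plain Python. This is a minimal, dependency-free illustration under stated assumptions, not the authors' code: all function names (`stratified_split`, `fit_zscore`, `apply_zscore`, `binary_metrics`, `roc_auc`, `permutation_importance`) are hypothetical, recurrence is assumed to be coded as 1, the random seeds are arbitrary, and the z-score statistics are fitted on the training rows only, whereas the abstract describes normalizing before the split. The classifiers themselves (RF, SVM, Logistic Regression, MLP) are not reimplemented here; any function that maps feature rows to 0/1 labels can be passed to `permutation_importance`.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.2, seed=42):
    """Return (train_idx, test_idx), keeping each class's share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class test count (80:20 by default)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_zscore(rows):
    """Per-column mean and population standard deviation (1.0 for constant columns)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def apply_zscore(rows, means, stds):
    """Standardize each value as (x - mean) / std using previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (1 = recurrence)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as P(random positive outscores random negative), counting ties as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, col, n_repeats=10, seed=0):
    """Mean drop in accuracy when column `col` is shuffled across rows."""
    rng = random.Random(seed)
    base = binary_metrics(y, predict(X))["accuracy"]
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in X]
        rng.shuffle(shuffled)
        X_perm = []
        for row, v in zip(X, shuffled):
            r = list(row)
            r[col] = v  # replace only the permuted feature
            X_perm.append(r)
        drops.append(base - binary_metrics(y, predict(X_perm))["accuracy"])
    return sum(drops) / n_repeats
```

The AUC here uses the pairwise (Mann-Whitney) formulation, which is quadratic in the number of samples but easy to check by hand. Gini importance is a quantity computed inside a fitted random forest and is therefore not reproduced in this sketch.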

This work is licensed under a Creative Commons Attribution 4.0 International License.