Data Mining Driven Segmentation of Health Insurance Policyholders Using K-Means Clustering

Farah Ali Khairi
Laith Farhan
Oluwaseun A. Adelaja

Abstract

This study illustrates a data-driven approach to segmenting health insurance policyholders by applying K-Means clustering to an open insurance dataset. Key demographic and financial features, including age, body mass index (BMI), number of dependents, annual medical spending, and premium payment, were first normalized to ensure comparability. Silhouette analysis indicated an optimal number of clusters of k = 3, yielding three segments: (1) young, low-cost individuals, (2) middle-aged, medium-cost individuals, and (3) older, high-cost individuals. Cluster centroids provide actionable profiles that insurers can use for targeted marketing, risk profiling, and the development of customized plans. A set of visualizations (scatter plots, boxplots, histograms, and bar charts) illustrates the separation between these segments and the distribution within each. The preprocessing workflow (missing-value treatment, encoding of categorical features, and feature scaling) was documented in a flowchart for reproducibility. The results demonstrate that easily implemented unsupervised learning techniques can yield interpretable customer segmentations, offering a foundation for more advanced predictive modeling and individualized insurance policies.
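The pipeline summarized in the abstract (normalize the features, choose k by silhouette analysis, then read off the K-Means clusters) can be sketched in a few lines of standard-library Python. This is only an illustrative sketch, not the authors' implementation: the rows are synthetic toy data generated on the spot rather than the study's dataset, only four of the named features are mimicked, and the farthest-first seeding and every function name (standardize, farthest_first, kmeans, silhouette, toy_group) are assumptions introduced here for the example.

```python
import math
import random

def standardize(rows):
    """Z-score every column so features on different scales are comparable."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def farthest_first(points, k):
    """Deterministic seeding (an assumption of this sketch): start at row 0,
    then repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids

def silhouette(points, labels):
    """Mean silhouette coefficient; points alone in a cluster contribute 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == c and j != i]
            mean_dist[c] = sum(ds) / len(ds) if ds else None
        a = mean_dist[labels[i]]          # mean distance to own cluster
        if a is None:
            continue
        b = min(d for c, d in mean_dist.items() if c != labels[i])  # nearest other cluster
        total += (b - a) / max(a, b)
    return total / len(points)

# Synthetic toy rows: [age, BMI, annual charges, premium]. NOT the study's data.
rng = random.Random(42)  # fixed seed so the toy data is reproducible

def toy_group(n, age, bmi, charges, premium):
    return [[rng.gauss(age, 3), rng.gauss(bmi, 1),
             rng.gauss(charges, 500), rng.gauss(premium, 100)] for _ in range(n)]

rows = (toy_group(40, 25, 23, 3000, 1000)
        + toy_group(40, 45, 28, 10000, 2500)
        + toy_group(40, 65, 33, 17000, 4000))

scaled = standardize(rows)
scores = {k: silhouette(scaled, kmeans(scaled, k)[0]) for k in range(2, 7)}
best_k = max(scores, key=scores.get)      # k with the highest mean silhouette
labels, centroids = kmeans(scaled, best_k)
print("silhouette by k:", {k: round(s, 3) for k, s in scores.items()})
print("best k:", best_k)
```

The centroids returned here are in standardized units; to read them as segment profiles in the sense of the abstract, they would need to be mapped back to the original feature scales. In practice a library implementation with repeated random initializations would likely replace the hand-written seeding used in this sketch.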

Article Details

Section

Articles

How to Cite

Khairi, F. A., Farhan, L., & Adelaja, O. A. (2025). Data Mining Driven Segmentation of Health Insurance Policyholders Using K-Means Clustering. Mesopotamian Journal of Artificial Intelligence in Healthcare, 2025, 187-196. https://doi.org/10.58496/MJAIH/2025/018
