Data Mining Driven Segmentation of Health Insurance Policyholders Using K-Means Clustering
Abstract
This study illustrates a data-driven approach to segmenting health insurance policyholders by applying K-Means clustering to an open insurance dataset. Key demographic and financial features, such as age, body mass index (BMI), number of dependents, annual medical spending, and premium payment, were first normalized to ensure comparability. Silhouette analysis indicated an optimal number of clusters of k = 3, yielding three segments: (1) young, low-cost individuals, (2) middle-aged, medium-cost individuals, and (3) older, high-cost individuals. The cluster centroids provide actionable profiles that insurers can use for targeted marketing, risk profiling, and the development of customized plans. A set of visualizations (scatter plots, boxplots, histograms, and bar charts) illustrates the separation between these segments and the distribution of features within each. The preprocessing workflow (missing-value treatment, encoding of categorical features, and feature scaling) is documented in a flowchart for reproducibility. The results demonstrate that easily implemented unsupervised learning techniques can yield interpretable customer segmentations, offering a foundation for more advanced predictive modeling and individualized insurance policies.
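The workflow summarized in the abstract (feature scaling, choosing k by silhouette analysis, then reading cluster centroids as profiles) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the data are synthetic, the three `profiles` and their `spread` values are invented for the example, only four of the features are used, and K-Means is written with the standard library using a deterministic farthest-point initialization so the result is reproducible. In practice one would more likely rely on a library such as scikit-learn (`StandardScaler`, `KMeans`, `silhouette_score`); the abstract does not state which tooling was used.

```python
import math
import random
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [mean(col) for col in zip(*members)]
    return labels, centers


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (O(n^2) distances)."""
    clusters = {}
    for i, l in enumerate(labels):
        clusters.setdefault(l, []).append(i)
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            mean(math.dist(p, points[j]) for j in idx)
            for l, idx in clusters.items()
            if l != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return mean(scores)


# Synthetic policyholders (ASSUMPTION, not the paper's data):
# columns are age, BMI, annual medical spending, premium.
rng = random.Random(0)  # fixed seed so the synthetic data is reproducible
profiles = [(25, 22, 2000, 3000), (45, 27, 6000, 7000), (65, 31, 14000, 15000)]
spread = (3, 1, 500, 500)
raw = [
    [m + rng.uniform(-s, s) for m, s in zip(profile, spread)]
    for profile in profiles
    for _ in range(30)
]

X = zscore_columns(raw)
scores = {k: silhouette(X, kmeans(X, k)[0]) for k in range(2, 7)}
best_k = max(scores, key=scores.get)  # k with the highest mean silhouette
labels, _ = kmeans(X, best_k)

# Report each cluster's centroid in the original (unscaled) units.
for j in sorted(set(labels)):
    members = [r for r, l in zip(raw, labels) if l == j]
    print(j, [round(mean(col), 1) for col in zip(*members)])
```

Clustering is done on the standardized features so that spending and premium, which are measured in much larger units than age or BMI, do not dominate the Euclidean distances; the centroids are then reported in the original units so they can be read as customer profiles.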

This work is licensed under a Creative Commons Attribution 4.0 International License.