Factor-Based Unsupervised Network Traffic Anomaly Analysis Using Density, Spectral, and Hierarchical Clustering
Abstract
Anomaly detection in network traffic is crucial for maintaining the soundness and robustness of today’s communication systems, particularly where labeled data are sparse or unavailable. In this paper, we propose an unsupervised approach that combines factor analysis with multiple clustering methods to locate anomalous patterns in network traffic. First, the numerical traffic features are standardized, and factor analysis is then applied to obtain a reduced set of latent factors that together account for 85% of the cumulative variance. The resulting factor scores are clustered with Agglomerative Hierarchical Clustering, Gaussian Mixture Models, DBSCAN, and Spectral Clustering to reveal intrinsic traffic patterns. The clustering results are evaluated with internal validation measures, namely the Silhouette Coefficient (SC), Calinski–Harabasz Index (CHI), and Davies–Bouldin Score (DBS), along with a post-hoc comparison against ground-truth labels to judge interpretability. The proposed method has been tested on simulated and real datasets of gene expression profiles. Spectral Clustering combined with t-distributed Stochastic Neighbor Embedding (t-SNE) for dimensionality reduction produces the best separation in the reduced factor space, whereas Agglomerative Clustering fails to yield stable and interpretable groups. In addition, DBSCAN is effective at isolating rare and irregular traffic cases, with anomalous samples concentrating heavily in certain clusters. These findings confirm that combining factor analysis with several unsupervised clustering methods improves robustness, interpretability, and scalability in network traffic anomaly detection, supporting the use of the proposed approach for practical network reliability and cybersecurity.
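Two steps named in the abstract can be illustrated with a minimal sketch: choosing how many factors to keep at an 85% cumulative-variance threshold, and scoring a clustering with the Silhouette Coefficient. The abstract does not give implementation details, so the function names, the Euclidean distance, the singleton-cluster convention, and the toy inputs below are all assumptions made for illustration, not the authors' code.

```python
import math


def n_factors_for_variance(eigenvalues, threshold=0.85):
    """Smallest number of leading factors whose cumulative share of
    total variance reaches `threshold` (hypothetical helper)."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        # Small tolerance guards against floating-point round-off.
        if running / total >= threshold - 1e-12:
            return k
    return len(ordered)


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points, using Euclidean distance
    (assumed metric; hypothetical helper)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


# Toy inputs (illustrative only, not data from the paper).
toy_eigenvalues = [4.0, 2.5, 2.0, 1.0, 0.5]
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]

kept = n_factors_for_variance(toy_eigenvalues, threshold=0.85)
score = silhouette_coefficient(toy_points, toy_labels)
print(f"factors kept: {kept}, silhouette: {score:.3f}")
```

In practice a library implementation would normally replace both helpers. The sketch only makes the threshold rule and the silhouette definition explicit.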
This work is licensed under a Creative Commons Attribution 4.0 International License.