TY - GEN
T1 - Towards Unsupervised Sudden Data Drift Detection in Federated Learning with Fuzzy Clustering
AU - Stallmann, Morris
AU - Wilbik, Anna
AU - Weiss, Gerhard
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Federated learning (FL) is a machine learning (ML) discipline that allows ML models to be trained on distributed data without revealing raw data instances. It promises to enable ML in environments with data-sharing constraints, e.g., due to data privacy concerns or other considerations. Data and concept drift commonly refer to unpredictable changes in data distributions over time, and such drift is known to impact ML model performance in many real-world scenarios. While drift detection and adaptation have been studied extensively in the non-federated setting, they remain less explored in the FL setting. The private and distributed nature of data in FL makes drift detection considerably harder, since no entity can oversee all data instances to estimate changes in the global data distribution. In this paper, we propose a novel unsupervised federated data drift detection method based on federated fuzzy $c$-means clustering and the federated fuzzy Davies-Bouldin index, a global cluster validation metric. First, an initial global data model is learned using the federated fuzzy $c$-means clustering algorithm. Second, the federated fuzzy Davies-Bouldin index $\Delta$ is calculated to estimate how well the data fit the learned model. Third, whenever a new batch of data becomes available at time $t$, the fit between the initial data model and the new data is evaluated through the federated fuzzy Davies-Bouldin index $\Delta_{t}$. Finally, $\Delta$ and $\Delta_{t}$ are compared to detect drift. The method is unsupervised, as it requires no labels, and it detects global data drift while keeping all data private. We evaluate our method carefully in a controlled environment by simulating multiple federated drift scenarios. We observe promising results: the method rarely raises false-positive alarms and detects drift in multiple scenarios.
We also observe shortcomings, such as sensitivity to parameter choices and a low detection rate when only a few data points in a new batch are affected by drift.
AB - Federated learning (FL) is a machine learning (ML) discipline that allows ML models to be trained on distributed data without revealing raw data instances. It promises to enable ML in environments with data-sharing constraints, e.g., due to data privacy concerns or other considerations. Data and concept drift commonly refer to unpredictable changes in data distributions over time, and such drift is known to impact ML model performance in many real-world scenarios. While drift detection and adaptation have been studied extensively in the non-federated setting, they remain less explored in the FL setting. The private and distributed nature of data in FL makes drift detection considerably harder, since no entity can oversee all data instances to estimate changes in the global data distribution. In this paper, we propose a novel unsupervised federated data drift detection method based on federated fuzzy $c$-means clustering and the federated fuzzy Davies-Bouldin index, a global cluster validation metric. First, an initial global data model is learned using the federated fuzzy $c$-means clustering algorithm. Second, the federated fuzzy Davies-Bouldin index $\Delta$ is calculated to estimate how well the data fit the learned model. Third, whenever a new batch of data becomes available at time $t$, the fit between the initial data model and the new data is evaluated through the federated fuzzy Davies-Bouldin index $\Delta_{t}$. Finally, $\Delta$ and $\Delta_{t}$ are compared to detect drift. The method is unsupervised, as it requires no labels, and it detects global data drift while keeping all data private. We evaluate our method carefully in a controlled environment by simulating multiple federated drift scenarios. We observe promising results: the method rarely raises false-positive alarms and detects drift in multiple scenarios.
We also observe shortcomings, such as sensitivity to parameter choices and a low detection rate when only a few data points in a new batch are affected by drift.
KW - drift
KW - drift detection
KW - federated data drift detection
KW - federated drift detection
KW - federated learning
KW - fuzzy clustering
KW - unsupervised
U2 - 10.1109/FUZZ-IEEE60900.2024.10611883
DO - 10.1109/FUZZ-IEEE60900.2024.10611883
M3 - Conference article in proceeding
T3 - IEEE International Conference on Fuzzy Systems
BT - 2024 IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2024 - Proceedings
PB - IEEE
T2 - 2024 IEEE International Conference on Fuzzy Systems
Y2 - 30 June 2024 through 5 July 2024
ER -