From Centralized to Federated Learning



A summary of dataset distribution techniques for Federated Learning on the CIFAR benchmark dataset

Federated Learning (FL) is a method for training Machine Learning (ML) models in a distributed setting [1]. The idea is that clients (for example hospitals) want to cooperate without sharing their private and sensitive data. In FL, each client keeps its private data locally and trains an ML model on it. A central server then collects and aggregates the model parameters, building a global model that incorporates information from the entire data distribution. Ideally, this serves as privacy protection by design.
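To make the aggregation step concrete, here is a minimal sketch of a FedAvg-style weighted parameter average. The function name, argument layout, and use of plain NumPy arrays are illustrative assumptions on my part, not the API of any particular FL framework.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg-style sketch).

    client_weights: list (one entry per client) of lists of numpy arrays,
                    each array holding one layer's parameters.
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    global_weights = []
    for layer in range(n_layers):
        # Average layer by layer, weighting each client by its data size.
        layer_avg = sum(
            w[layer] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
        global_weights.append(layer_avg)
    return global_weights
```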

A long line of research investigates FL's efficiency, privacy, and fairness. Here we focus on the benchmark datasets used to evaluate horizontal FL methods, where the clients share the same task and data type but each holds its own data samples.

If you want to know more about Federated Learning and what I work on, visit our research lab website!

Photo by JJ Ying on Unsplash

There are three types of datasets in the literature:

  1. Real FL scenario: an application where FL is genuinely needed, with natural distributions and sensitive data. However, the very reason for using FL, keeping the data local, means such datasets are rarely published online for benchmarking, so they are hard to find. OpenMined, the community behind PySyft, tries to organize an FL network of universities and research labs to host data in a more realistic scenario. Additionally, there are applications where privacy awareness has risen only recently, so publicly available data may exist even though the demand for FL is real. One such application is smart electricity meters [2].
  2. FL benchmark datasets: these datasets are designed to serve as FL benchmarks. The distribution is realistic, but the sensitivity of the data is questionable, as they are built from publicly available sources. One example is an FL dataset created from Reddit posts, using the users as clients and treating each user's posts as one partition. The LEAF project proposes several datasets of this kind [3].
  3. Distributing standard datasets: well-known datasets such as CIFAR and ImageNet are used as benchmarks in many Machine Learning works. Here, FL researchers define a partition according to their research questions. This approach makes sense if the topic is well studied in the standard (centralized) ML setting and one wants to compare an FL algorithm to the centralized state of the art. However, such an artificial partition does not capture every source of distribution skew, for example clients that collect images with very different cameras or in different lighting conditions.

As the datasets in the last category are not partitioned by design, past research works have split them in several ways. In the rest of this story, I summarise the partitioning techniques used for the CIFAR dataset in a federated scenario.

CIFAR dataset

The CIFAR-10 and CIFAR-100 datasets contain 32x32 color images labeled with mutually exclusive classes [4]. CIFAR-10 has 10 classes with 6,000 images each, and CIFAR-100 has 100 classes with 600 images each. They are used in many image classification tasks, and one can access dozens of models evaluated on them, even browsing them on a PapersWithCode leaderboard.
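For readers who want to follow along, the snippet below shows one minimal way to obtain the CIFAR-10 training labels used in the partitioning sketches later in this story. It assumes TensorFlow/Keras is installed; torchvision's CIFAR10 loader would work just as well.

```python
import numpy as np
from tensorflow.keras.datasets import cifar10

# Download (on first use) and load CIFAR-10 as NumPy arrays.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

labels = y_train.flatten()          # shape (50000,), class ids 0..9
print(x_train.shape, labels.shape)  # (50000, 32, 32, 3) (50000,)
```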

Data partitioning in Federated Learning

Uniform distribution

This is considered independently and identically distributed (IID) data: data points are randomly allocated to clients.
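A minimal sketch of such an IID split, assuming `labels` is a 1-D NumPy array of class labels (for example the CIFAR-10 training labels loaded above):

```python
import numpy as np

def iid_partition(labels, n_clients, seed=0):
    """Shuffle all sample indices and split them evenly across clients."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(labels))
    return np.array_split(idx, n_clients)

# Example: 100 clients with ~500 CIFAR-10 training images each.
client_indices = iid_partition(labels, n_clients=100)
```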

Single (n-) class clients

Data points allocated to a specific client come from the same class or classes. This can be seen as an extreme non-IID setting. Examples of this distribution appear in [1, 5–8]. The work that first named the setting Federated Learning [1] sorts the data into 200 single-class shards and gives two shards to each client, making them 2-class clients. [5–7] also use 2-class clients.
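The shard-based scheme of [1] can be sketched roughly as follows; the default values (100 clients, 2 shards each, hence 200 shards for CIFAR-10) mirror the description above, while the function name and random shard assignment are illustrative choices of mine.

```python
import numpy as np

def shard_partition(labels, n_clients=100, shards_per_client=2, seed=0):
    """Sort samples by label, cut them into (mostly) single-class shards and
    hand `shards_per_client` shards to each client, in the spirit of [1]."""
    rng = np.random.default_rng(seed)
    n_shards = n_clients * shards_per_client
    sorted_idx = np.argsort(labels)               # group samples class by class
    shards = np.array_split(sorted_idx, n_shards) # 200 shards of 250 samples for CIFAR-10
    shard_order = rng.permutation(n_shards)
    return [
        np.concatenate([shards[s] for s in
                        shard_order[c * shards_per_client:(c + 1) * shards_per_client]])
        for c in range(n_clients)
    ]
```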

[9] builds on the hierarchical class structure of CIFAR-100: each client holds data points from one subclass of every superclass. In the superclass classification task, every client therefore has samples from each (super)class, yet a distribution skew is simulated because the data points come from different subclasses. For example, one client has images of lions while another has images of tigers, and the superclass task is to categorize both as large carnivores.
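A rough sketch of this subclass-per-superclass split is shown below. It assumes aligned `fine_labels` and `coarse_labels` arrays for CIFAR-100 (Keras can load them with `cifar100.load_data(label_mode='fine')` and `label_mode='coarse'`); the exact procedure in [9] may differ in its details.

```python
import numpy as np

def subclass_per_superclass_partition(fine_labels, coarse_labels, n_clients=5, seed=0):
    """Give each client one (randomly chosen) subclass from every superclass.
    Assumes CIFAR-100 style labels: 20 superclasses, 5 fine classes each,
    so n_clients must be at most 5 for disjoint subclasses."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for superclass in np.unique(coarse_labels):
        fine_in_super = np.unique(fine_labels[coarse_labels == superclass])
        assigned = rng.permutation(fine_in_super)[:n_clients]
        for client, fine in enumerate(assigned):
            client_idx[client].append(np.where(fine_labels == fine)[0])
    return [np.concatenate(parts) for parts in client_idx]
```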

Dominant class clients

[5] also uses a mixture of uniform and 2-class clients: half of each client's data points come from its 2 dominant classes, and the rest are selected uniformly from the other classes. [10] uses an 80%-20% partition: 80% is selected from a single dominant class and the rest is selected uniformly from the other classes.
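A sketch of the dominant-class idea, here with the 80%-20% split of [10]; sampling with replacement and the round-robin assignment of dominant classes are simplifications of mine.

```python
import numpy as np

def dominant_class_partition(labels, n_clients, samples_per_client=500,
                             dominant_frac=0.8, seed=0):
    """Each client gets `dominant_frac` of its samples from one dominant class
    and the rest drawn uniformly from the remaining classes (cf. [10])."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    by_class = {c: np.where(labels == c)[0] for c in classes}
    clients = []
    for c in range(n_clients):
        dominant = classes[c % len(classes)]          # assign dominant classes round-robin
        n_dom = int(dominant_frac * samples_per_client)
        rest_pool = np.concatenate([by_class[k] for k in classes if k != dominant])
        idx = np.concatenate([
            rng.choice(by_class[dominant], n_dom, replace=True),
            rng.choice(rest_pool, samples_per_client - n_dom, replace=True),
        ])
        clients.append(idx)
    return clients
```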

Dirichlet distribution

To understand the Dirichlet distribution, I follow the example of this blog post. Let's say one wants to manufacture dice, each with probabilities θ=(1/6, 1/6, 1/6, 1/6, 1/6, 1/6) for the numbers 1–6. In reality, nothing is perfect, so each die will be slightly skewed: 4 a bit more likely and 3 a bit less likely, for example. The Dirichlet distribution describes this variety with a parameter vector α=(α₁, α₂, ..., α₆). A larger αᵢ increases the weight of that number, and a larger overall sum of the αᵢ values makes the sampled probability vectors (dice) more similar to each other. Returning to the dice example: for a fair die, each αᵢ should be equal, and the larger the α values, the better manufactured the dice are. As the Dirichlet distribution is a multivariate generalization of the beta distribution, let's display some examples of the beta distribution (a Dirichlet distribution over two outcomes):

Different beta distributions (Dirichlet distribution for 2 variables) — Figure by the author

I reproduced the visualization in [11], using the same value for every αᵢ. This is called a symmetric Dirichlet distribution. We can see that as the α value decreases, unbalanced dice become more likely. The figures below show the Dirichlet distribution for different α values. Each row represents a class, each column a client, and the area of each circle is proportional to the probability.

Distribution over classes: Sampling 20 clients for 10 classes using different Dirichlet distribution α values — Figure by the author

Distribution over classes: each client's class distribution is drawn independently from the Dirichlet distribution, and its samples are then selected accordingly. [11, 16] use this version of the Dirichlet partitioning.

Distribution over classes: normalized sum of samples by class (10) and by client (20) — Figure by the author

Each client has a predetermined number of samples, but the classes are chosen randomly, so the final total class representation is unbalanced. As α→∞, each client approaches the prior (uniform) class distribution, while α→0 yields single-class clients.
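A sketch of the distribution-over-classes variant, again assuming a 1-D `labels` array; sampling with replacement keeps the code short and is my simplification.

```python
import numpy as np

def dirichlet_over_classes(labels, n_clients, samples_per_client=500,
                           alpha=0.5, seed=0):
    """Each client draws its own class proportions from Dir(alpha) and then
    samples that many points per class (the variant used by e.g. [11, 16])."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    by_class = {c: np.where(labels == c)[0] for c in classes}
    clients = []
    for _ in range(n_clients):
        proportions = rng.dirichlet(alpha * np.ones(len(classes)))
        counts = rng.multinomial(samples_per_client, proportions)
        idx = np.concatenate([
            rng.choice(by_class[c], n, replace=True)
            for c, n in zip(classes, counts) if n > 0
        ])
        clients.append(idx)
    return clients
```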

Distribution over clients: Sampling 20 clients for 10 classes using different Dirichlet distribution α values — Figure by the author

Distribution over clients: if we know the total number of samples in a class and the number of clients, we can distribute the samples to the clients class by class. This results in clients with different numbers of samples (which is very typical in FL), while the global class distribution stays balanced. [12] uses this variation of the Dirichlet partitioning.

Distribution over clients: normalized sum of samples by class (10) and by client (20) — Figure by the author
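And a sketch of the distribution-over-clients variant, where every class is split across the clients according to its own Dirichlet draw; the function name and cut-point construction are illustrative.

```python
import numpy as np

def dirichlet_over_clients(labels, n_clients, alpha=0.5, seed=0):
    """For every class, split that class's samples across clients with
    proportions drawn from Dir(alpha) (the variant used by e.g. [12]).
    Clients end up with different numbers of samples, while the global
    class distribution stays balanced."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        # Turn the proportions into cumulative cut points inside this class.
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_idx[client].append(part)
    return [np.concatenate(parts) for parts in client_idx]
```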

While works like [11–16] follow and cite each other on the use of the Dirichlet distribution, they actually use two different methods. Furthermore, the experiments use different α values, which can lead to very different performance. [11, 12] use α=0.1, [13–15] use α=0.5, and [16] gives an overview of different α values. These design choices undermine the original purpose of using the same benchmark dataset to evaluate algorithms.

Asymmetric Dirichlet distribution: one can use different αᵢ values to simulate more resourceful clients. For example, the figure below is produced using αᵢ=1/i for the i-th client. To my knowledge this variant is not used in the literature; instead, the Zipf distribution is used in [17].

Asymmetric Dirichlet distribution with αᵢ=1/i — Figure by the author
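The figure above can be reproduced in spirit with a single asymmetric Dirichlet draw; the concrete numbers below are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients = 20

# Asymmetric concentration parameters: client i gets alpha_i = 1 / i,
# so lower-indexed clients tend to receive a larger share of each class.
alpha = 1.0 / np.arange(1, n_clients + 1)
proportions = rng.dirichlet(alpha)   # one draw: the share of one class per client
print(proportions.round(3))
```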

Zipf distribution

[17] uses a combination of the Zipf and Dirichlet distributions: the Zipf distribution determines the number of samples at each client, and the class distribution is then selected using the Dirichlet distribution.

Probability of rank k in the Zipf distribution: p(k) = k^(−a) / ζ(a), where ζ is the Riemann zeta function

In the Zipf (zeta) distribution the frequency of an item is inversely proportional to its rank in a frequency table. Zipf’s law can be observed in many real-world datasets, for example regarding the word frequency in language corpora [18].

Sampling items using the Zipf distribution — Figure by the author following the numpy documentation on Zipf
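One way to realize the combination described in [17] is sketched below: client sizes follow Zipf-law weights over the client ranks, and each client's class mix comes from a Dirichlet draw. The parameter values are illustrative, and sampling with replacement is my simplification.

```python
import numpy as np

def zipf_dirichlet_partition(labels, n_clients, total_samples=50_000,
                             zipf_a=1.2, alpha=0.5, seed=0):
    """Sketch of the Zipf + Dirichlet combination described in [17]:
    Zipf-like sample counts per client, Dir(alpha) class proportions."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    by_class = {c: np.where(labels == c)[0] for c in classes}

    # Zipf weights over client ranks 1..n determine how many samples each client gets.
    ranks = np.arange(1, n_clients + 1)
    sizes = (ranks ** (-zipf_a)) / np.sum(ranks ** (-zipf_a)) * total_samples

    clients = []
    for size in sizes.astype(int):
        proportions = rng.dirichlet(alpha * np.ones(len(classes)))
        counts = rng.multinomial(size, proportions)
        idx = np.concatenate([
            rng.choice(by_class[c], n, replace=True)
            for c, n in zip(classes, counts) if n > 0
        ])
        clients.append(idx)
    return clients
```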

Conclusion

Benchmarking federated learning methods is a challenging task. Ideally, one uses predefined, real federated datasets. However, if a certain scenario has to be simulated and no good existing dataset covers it, one can use the data partitioning techniques above. Proper documentation for reproducibility and a clear motivation of the design choice are important. Here I summarized the most common methods already in use for FL algorithm evaluation. Visit this Colab notebook for the code used in this story!

References

[1] McMahan, B., Moore, E., Ramage, D., Hampson, S., & y Arcas, B. A. (2017, April). Communication-efficient learning of deep networks from decentralized data. In Artificial intelligence and statistics (pp. 1273–1282). PMLR.

[2] Savi, M., & Olivadese, F. (2021). Short-term energy consumption forecasting at the edge: A federated learning approach. IEEE Access, 9, 95949–95969.

[3] Caldas, S., Duddu, S. M. K., Wu, P., Li, T., Konečný, J., McMahan, H. B., … & Talwalkar, A. (2019). Leaf: A benchmark for federated settings. Workshop on Federated Learning for Data Privacy and Confidentiality

[4] Krizhevsky, A. (2009). Learning Multiple Layers of Features from Tiny Images. Master’s thesis, University of Toronto.

[5] Liu, W., Chen, L., Chen, Y., & Zhang, W. (2020). Accelerating federated learning via momentum gradient descent. IEEE Transactions on Parallel and Distributed Systems, 31(8), 1754–1766.

[6] Zhang, L., Luo, Y., Bai, Y., Du, B., & Duan, L. Y. (2021). Federated learning for non-iid data via unified feature learning and optimization objective alignment. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 4420–4428).

[7] Zhang, J., Guo, S., Ma, X., Wang, H., Xu, W., & Wu, F. (2021). Parameterized knowledge transfer for personalized federated learning. Advances in Neural Information Processing Systems, 34, 10092–10104.

[8] Zhao, Y., Li, M., Lai, L., Suda, N., Civin, D., & Chandra, V. (2018). Federated learning with non-iid data. arXiv preprint arXiv:1806.00582.

[9] Li, D., & Wang, J. (2019). Fedmd: Heterogenous federated learning via model distillation. arXiv preprint arXiv:1910.03581.

[10] Wang, H., Kaplan, Z., Niu, D., & Li, B. (2020, July). Optimizing federated learning on non-iid data with reinforcement learning. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications (pp. 1698–1707). IEEE.

[11] Lin, T., Kong, L., Stich, S. U., & Jaggi, M. (2020). Ensemble distillation for robust model fusion in federated learning. Advances in Neural Information Processing Systems, 33, 2351–2363.

[12] Luo, M., Chen, F., Hu, D., Zhang, Y., Liang, J., & Feng, J. (2021). No fear of heterogeneity: Classifier calibration for federated learning with non-iid data. Advances in Neural Information Processing Systems, 34, 5972–5984.

[13] Yurochkin, M., Agarwal, M., Ghosh, S., Greenewald, K., Hoang, N., & Khazaeni, Y. (2019, May). Bayesian nonparametric federated learning of neural networks. In International conference on machine learning (pp. 7252–7261). PMLR.

[14] Wang, H., Yurochkin, M., Sun, Y., Papailiopoulos, D., & Khazaeni, Y. (2020) Federated Learning with Matched Averaging. In International Conference on Learning Representations.

[15] Li, Q., He, B., & Song, D. (2021). Model-contrastive federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10713–10722).

[16] Hsu, T. M. H., Qi, H., & Brown, M. (2019). Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335.

[17] Wadu, M. M., Samarakoon, S., & Bennis, M. (2021). Joint client scheduling and resource allocation under channel uncertainty in federated learning. IEEE Transactions on Communications, 69(9), 5962–5974.

[18] Fagan, S., & Gençay, R. (2010). An introduction to textual econometrics. In Ullah, A., & Giles, D. E. A. (Eds.), Handbook of Empirical Economics and Finance (pp. 133–153). CRC Press.


From Centralized to Federated Learning was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
