
Integrative Scalable Computing Laboratory

A research group at the Department of Information Technology, Uppsala University.


Federated Machine Learning

Artificial intelligence is rapidly transforming our society. Machine learning models will be components in nearly every digital system we use. For this reason, there is an urgent need for methods and software that allow for the development of state-of-the-art ML models while protecting the integrity of data owners.

A short introduction to Federated Machine Learning.

In this project, which started in 2019, we work on algorithms, highly scalable implementations, and applications of Federated Learning (FedML) – an approach to training ML models while preserving the privacy of the data owners' input data.

FEDn – a software framework for scalable federated learning

FEDn is an open-source framework for highly scalable and robust FedML. This software project is a collaboration with the engineering team in our spin-out company Scaleout Systems.

We approach the project from a distributed computing perspective, and the overarching goal with FEDn is to provide a highly scalable, robust, resilient and secure framework that can, depending on deployment-level tuning, effectively handle both cross-silo and cross-device use-cases. We propose a highly scalable architecture drawing on the Actor model and utilizing hierarchical aggregation capabilities for horizontal scalability.
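As a rough illustration of why hierarchical aggregation scales horizontally, the sketch below averages client updates in two tiers and checks that the result matches a single flat average. It assumes a sample-weighted mean (FedAvg-style) as the aggregation rule, which the text above does not specify. The function names and the "combiner" and "reducer" roles in the comments are hypothetical and are not taken from the FEDn code base.

```python
from typing import List, Tuple

# An update is (model weights, number of local training samples).
Update = Tuple[List[float], int]


def weighted_average(updates: List[Update]) -> Update:
    """Sample-weighted mean of weight vectors, plus the total sample count."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    mean = [sum(w[i] * n for w, n in updates) / total for i in range(dim)]
    return mean, total


def hierarchical_average(groups: List[List[Update]]) -> Update:
    """Aggregate each group separately (hypothetical 'combiner' tier),
    then merge the partial results (hypothetical 'reducer' tier)."""
    partials = [weighted_average(group) for group in groups]
    return weighted_average(partials)


# Four toy clients with two-parameter models and different data sizes.
clients = [([1.0, 2.0], 10), ([3.0, 4.0], 30), ([5.0, 6.0], 20), ([7.0, 8.0], 40)]

flat, n_flat = weighted_average(clients)
tiered, n_tiered = hierarchical_average([clients[:2], clients[2:]])
```

Because each partial result carries its own sample count, the two-tier average agrees with the flat one up to floating-point rounding. Under this assumption, extra aggregation nodes can be added without changing the resulting model.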

Another central goal of FEDn is an ML-framework-agnostic backend. For this reason, we treat local model training in a black-box manner. The performance-utility tradeoff from this design objective is one of the questions we address in this project.
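One way to picture black-box local training is a server-side loop that sees only a callable mapping weights to weights. The minimal sketch below makes that assumption. The `TrainFn` type, `run_round`, and the two toy clients are hypothetical names introduced for illustration and do not reflect the actual FEDn interface.

```python
from typing import Callable, List

Weights = List[float]
# Hypothetical contract: local training is an opaque function on weights.
TrainFn = Callable[[Weights], Weights]


def run_round(global_weights: Weights, clients: List[TrainFn]) -> Weights:
    """Send the global weights to every client and average what comes back.
    The server never inspects how a client computes its update."""
    updates = [train(list(global_weights)) for train in clients]
    return [sum(column) / len(updates) for column in zip(*updates)]


# Toy stand-ins for training scripts that could be written in any ML framework.
def client_a(weights: Weights) -> Weights:
    return [x + 1.0 for x in weights]


def client_b(weights: Weights) -> Weights:
    return [x * 3.0 for x in weights]


new_weights = run_round([1.0, 2.0], [client_a, client_b])
```

In this sketch the aggregator needs only a common representation of the weights, not knowledge of the training code. This is also where the performance-utility tradeoff arises: optimizations that depend on framework internals are out of reach.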

A scalable, resilient and model agnostic federated learning framework.
https://github.com/scaleoutsystems/fedn

Security and trust in FedML

When several different actors work together on a federated machine learning model, it is challenging to trust that the generated model is secure, maintains full data privacy, and is not misused by anyone in the group. In this project we are working on integrating blockchain technologies into the FEDn platform to enable fully decentralized, trust-less construction of FedML models.

Algorithms

We are pursuing research on improved algorithms for FedML. For example, based on our work on FEDn and our proposed architecture, we are working on highly scalable implementations of Secure Gradient Boosting. We are also pursuing improved performance in cross-device use-cases based on transfer learning. Another line of research is meta-models / ensembles in a federated setting. Together with our collaborator Ola Spjuth, we look into federated conformal prediction. Another area of interest is scalable measurement of client contributions to the federated model.
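To make the client-contribution question concrete, the sketch below implements a naive leave-one-out baseline. It scores the aggregated model with and without each client and reports the difference. This is a textbook baseline, not the group's method, and the `score` function, client names, and data are hypothetical.

```python
from typing import Callable, Dict, List

Weights = List[float]


def average(updates: List[Weights]) -> Weights:
    """Unweighted mean of the client weight vectors."""
    return [sum(column) / len(updates) for column in zip(*updates)]


def leave_one_out(
    updates: Dict[str, Weights], score: Callable[[Weights], float]
) -> Dict[str, float]:
    """Contribution of a client = score with all clients minus score without it.
    Requires at least two clients."""
    full_score = score(average(list(updates.values())))
    contributions = {}
    for name in updates:
        rest = [w for other, w in updates.items() if other != name]
        contributions[name] = full_score - score(average(rest))
    return contributions


# Hypothetical utility: negative squared distance to a known good model.
target = [1.0, 1.0]


def score(weights: Weights) -> float:
    return -sum((a - b) ** 2 for a, b in zip(weights, target))


updates = {"a": [1.0, 1.0], "b": [1.0, 1.0], "c": [4.0, 4.0]}
contrib = leave_one_out(updates, score)
```

Clients "a" and "b" receive a positive contribution and the outlier "c" a negative one. The baseline needs one extra aggregation and evaluation per client, which illustrates why scalability is a concern for contribution measurement.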

Applications

Biomedical image processing

Biomedical image processing is an important area where privacy concerns prohibit pooling data to train machine learning models. Federated learning can be used to overcome this problem, but because models and datasets are large, it is important to seek training strategies that minimize the number of training rounds.

Example of training a FedML model for classifying types and stages of cancer cells. The dataset and classification task are taken from “A Single-Cell Morphological Dataset of Leukocytes from AML Patients and Non-Malignant Controls (AML-Cytomorphology_LMU) – The Cancer Imaging Archive (TCIA) Public Access – Cancer Imaging Archive Wiki.” n.d. Accessed February 14, 2020. https://wiki.cancerimagingarchive.net/pages/viewpage.action?pageId=61080958

In this project we are collaborating with Fredrik Löfman’s group at RaySearch Laboratories on applications of FedML to 3D segmentation problems. This collaboration is funded via the eSSENCE collaboration on eScience.

Community-driven federated machine learning framework for cloud operators

In this project, the aim is accurate predictive modeling of resource usage parameters in typical data center environments. Given such models, it is possible to throttle resources intelligently based on predicted demand. This can lead to substantial savings in operational cost and to greener computing. We have proposed a FedML-based solution that lets data center operators easily pool resource consumption patterns in a privacy-preserving setting and benefit from shared knowledge. We are now investigating the integration of finer-level information, including application-level usage patterns. The next step is to leverage the models to solve optimization problems in which an operator can specify scenarios such as: how can we optimally allocate resources so that all service level agreements are met while our electricity bill does not exceed X units?
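The kind of scenario query described above can be sketched as a small feasibility check. Given a predicted demand per time slot, pick the fewest servers that cover the demand, then test the resulting energy cost against a budget. This is a deliberately simplified toy under stated assumptions, not the project's formulation. The function, its parameters, and all numbers are hypothetical, and "meeting the SLA" is reduced to covering the predicted demand.

```python
import math
from typing import List, Tuple


def plan_servers(
    predicted_demand: List[float],   # predicted load per hour (arbitrary units)
    capacity_per_server: float,      # load one server can handle per hour
    kwh_per_server_hour: float,      # energy drawn by one active server per hour
    price_per_kwh: float,            # electricity price
    budget: float,                   # the "X units" cap on the electricity bill
) -> Tuple[List[int], float, bool]:
    """Return the minimal server count per hour, the total cost,
    and whether that cost stays within the budget."""
    servers = [math.ceil(d / capacity_per_server) for d in predicted_demand]
    cost = sum(servers) * kwh_per_server_hour * price_per_kwh
    return servers, cost, cost <= budget


plan, cost, within_budget = plan_servers(
    predicted_demand=[120.0, 80.0, 200.0],
    capacity_per_server=50.0,
    kwh_per_server_hour=0.5,
    price_per_kwh=2.0,
    budget=10.0,
)
```

In practice the demand vector would come from the federated prediction model, and a real formulation would likely need a proper optimization solver rather than this per-slot rounding.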

Power and temperature data from data center operators (left) and power predictions using a FedML model (right).

In this project we are using data from two academic cloud providers, the SNIC Science Cloud in Sweden, and CSC in Finland.

Recent publications

Morgan Ekmefjord, Desislava Stoyanova, Ola Spjuth, Salman Toor, and Andreas Hellander, FEDn – A scalable framework for federated machine learning (manuscript in preparation)

Prashant Singh, Mona Mohamad Elamin and Salman Toor, Towards Smart e-Infrastructures, A Community Driven Approach Based on Real Datasets, in Proceedings of the IEEE GreenTech Conference, 2020 (accepted)

Felix Morsbach, Hardened Model Aggregation for Federated Learning backed by Distributed Trust: Towards decentralizing Federated Learning using a Blockchain, MSc thesis, 2020.

Meenal Pathak, Mohamed Hussein, Studying Data Distribution Dependencies In Federated Learning, 2020.

Jiaong Liang, Federated Learning for Bioimage Classification, MSc thesis, 2020.

Mona Babikir Abdelhamid Mohamed Elamin, Machine Learning for Cloud: Modeling Cluster Health using Usage Parameters, MSc thesis, 2019.

Copyright © 2023 Integrative Scalable Computing Laboratory.
