Several approaches allow machine learning models to be trained across multiple data sources without disclosing the underlying data. We have identified secure multi-party computation and federated machine learning as the most promising candidates for privacy-preserving training. We also consider protecting the trained model itself through differential privacy.
To build expertise, we are running hands-on analyses and experiments that focus on a real-life scenario (unbalanced, non-IID data):
- Secure multi-party computation for linear models
- Federated training of tree-based models (Gradient Boosted and CART decision trees)
- Federated training of neural networks (parameter server approach)
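
To make the parameter server idea concrete, here is a minimal sketch of the aggregation step only, assuming a FedAvg-style scheme in which each client sends its locally trained parameter vector and the server averages them weighted by local sample count. The function name `federated_average`, the client updates, and the sample counts are all hypothetical and chosen for illustration; they do not describe our actual experimental setup. Local training, secure transport, and differential privacy noise are deliberately left out.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Assumption: a FedAvg-style rule. Weighting by sample count keeps a
    small client from having the same influence as a large one, which
    matters when the data is unbalanced.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client update")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, n_samples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client updates must have the same length")
        for i, w in enumerate(weights):
            averaged[i] += w * n_samples / total
    return averaged


# Hypothetical unbalanced setting: one large client and two small ones.
updates = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sizes = [100, 10, 10]
global_weights = federated_average(updates, sizes)
```

With these made-up numbers the aggregated parameters land close to those of the large client, which illustrates why unbalanced, non-IID data needs attention: the global model can drift toward the distribution of whichever party holds the most samples.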