Data Science Research Group: EXTREMUM

EXTREMUM

Ethical Machine Learning for Knowledge Discovery from Medical Data Sources (EXTREMUM)

This project is a continuation of the EXTREME-Pilot project.

Project leader - main PI:

Panagiotis Papapetrou, Professor, DSV, SU

Co-PIs

Lars Asker, Associate Professor, DSV, SU
Cristian Rojas, Associate Professor, KTH
Rami Mochaourab, Researcher, RISE
Stanley Greenstein, Senior Lecturer, Dept. of Law, SU

Researchers

Ioanna Miliou, Senior Lecturer, DSV, SU
Isak Samsten, Senior Lecturer, DSV, SU
Ioannis Pavlopoulos, Affiliated Researcher, DSV, SU (2019-2021)
Sugandh Sinha, RISE
Zhendong Wang, PhD student, DSV, SU
Vanessa Lislevand, Research Assistant, DSV, SU (2022)
Vasiliki Kougia, Research Assistant, DSV, SU (2021)

Project period: 2020-01-01 to 2024-05-31

Funding source: Digital Futures

Digital Futures Website: EXTREMUM: Explainable and Ethical Machine Learning for Knowledge Discovery from Medical Data Sources

Budget: 8.4M SEK

Description

This is a continuation of the EXTREME pilot project, which ran in 2019 and 2020 for 3.85M SEK.

This project intends to build a novel data management and analytics framework resting on three pillars: (1) data integration and federated learning, (2) explainable machine learning, and (3) legal and ethical integrity of predictive models. The final product will comprise a set of methods and tools for integrating massive and heterogeneous medical data sources in a federated manner, and a set of predictive models for learning from these data sources. The models will emphasize interpretability and explainability of the rationale behind their predictions, while maintaining ethical integrity and fairness in the underlying decision-making mechanisms. The project will focus on two critical application areas: adverse drug event detection and heart failure treatment. The project is a collaborative effort between four research institutions: the Department of Computer and Systems Sciences at Stockholm University, the Department of Law at Stockholm University, RISE Research Institutes of Sweden, and KTH.

Objective 1: Unified data representation and integration. We will define novel unifying space representations, similarity measures, and methods for searching and indexing large and complex data spaces. The basic challenge is the temporal nature of the data spaces and the inherent temporal dependencies that may exist within the same and across different data sources in these spaces. Particular emphasis will be placed on providing theoretical guarantees on the performance of the proposed indexing techniques in terms of retrieval accuracy and efficiency.
Objective 2: Explainable predictive models. We will develop novel predictive modeling mechanisms for combining and enhancing the aggregate knowledge from heterogeneous data sources, with particular emphasis on the temporal properties of the data. The main challenge will be how to extract and fuse meaningful static and temporal features from multiple data sources, with a focus on sequential and temporal data. The constructed models will be interpretable to domain experts by employing explainable features and rules.
Objective 3: Legal and ethical implications of machine learning models. We will focus on the legal and ethical risks, implications, and potential harms resulting from the development and use of predictive modelling in relation to the analysis of healthcare data. To this end, we will embed legal and ethical considerations into existing predictive modelling schemes, thereby making them better aligned with regulatory and policy demands.

The project is organized into five implementation work packages (WPs): one for each of the three objectives (WP1, WP2, WP3), one for validation on real data sources (WP4), and one for dissemination and exploitation (WP5). A sixth work package (WP6) covers project coordination, which is handled by SU-DSV.

WP1: unifying representation of complex data spaces
WP2: explainable machine learning models
WP3: adherence to legal and ethical frameworks
WP4: validation in real data domains
WP5: dissemination and exploitation
WP6: project management

Implementation

wildboar - explainable machine learning library for time series in Python.
