A paper recently accepted for publication by PLOS Digital Health explores machine-learning-based prognostic scores and their validity when applied to demographically different patient cohorts, that is, their ‘external validity’. The publication is the latest accomplishment in CCAIM’s ongoing efforts to push forward in cystic fibrosis research, with the Floto lab and van der Schaar lab on the frontlines. Follow the links for their respective research focuses on cystic fibrosis.
The paper is co-authored by Yuchao Qin, PhD student at the van der Schaar lab, Prof Ahmed Alaa, a van der Schaar lab alumnus and now professor at UC Berkeley, and the CCAIM directors Prof Andres Floto and Prof Mihaela van der Schaar. Its creation was supported by the US Cystic Fibrosis Foundation and the UK Cystic Fibrosis Trust, with help from Dr Janet Allen (Floto lab).
Cystic fibrosis is a genetic disease that affects multiple organs and can cause advanced lung damage that necessitates lung transplantation. Due to the scarcity of donor lungs, precise and timely selection of high-risk patients for lung transplant referral is of paramount importance. Machine learning models have been shown to be a viable tool for improving prognostic accuracy. However, developing trustworthy machine learning models requires a large volume of data, which is difficult to obtain because cystic fibrosis affects only a small sub-population around the world.
It would therefore be desirable to apply machine learning models developed on a large population to a demographically different cohort. In this paper, our researchers evaluate the external validity, or applicability, of a machine learning model using registry data of cystic fibrosis patients from the UK and Canada.
To achieve this, the authors used a unique approach employing the new state-of-the-art AutoML framework AutoPrognosis, developed by the van der Schaar lab and made available as open source earlier this year.

They identified several risk factors and patient subgroups affected by variation across the two countries. FEV1 was found to be the most significant risk factor for adverse outcomes in cystic fibrosis patients, and its impact on lung transplant (LTx) referral was shown to be strongly affected by cross-population variations in the external validation set from Canada. Appropriate consideration of these variation-associated subgroups helped with the adaptation of machine learning models to a different population.

These insights highlight the importance of external validation of machine learning models for cystic fibrosis outcome prognostication. For the first time, the authors provide usable guidance for adapting high-precision machine learning models to different populations, and they inspire new research on applying modern transfer learning methods for fine-tuning models in highly variable environments.
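To make the idea of external validation concrete, the following is a minimal sketch and not the authors' pipeline: it does not use AutoPrognosis or any registry data. It generates two synthetic cohorts in which low FEV1 raises the probability of a poor outcome, tunes a referral cutoff on a "derivation" cohort, and then scores the same cutoff on an "external" cohort whose FEV1 distribution and FEV1-outcome relationship differ. All cohort parameters (means, slopes, the candidate cutoff range) and the function names are illustrative assumptions.

```python
import math
import random


def auroc(labels, scores):
    """AUROC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1(labels, preds):
    """F1 score for binary labels and binary predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def make_cohort(rng, n, mean_fev1, slope):
    """Synthetic cohort (assumed parameters): lower FEV1 raises outcome risk."""
    fev1, outcome = [], []
    for _ in range(n):
        x = min(max(rng.gauss(mean_fev1, 20.0), 10.0), 130.0)
        p = 1.0 / (1.0 + math.exp(slope * (x - 40.0)))
        fev1.append(x)
        outcome.append(1 if rng.random() < p else 0)
    return fev1, outcome


def best_cutoff(fev1, outcome):
    """Pick the FEV1 cutoff (refer if FEV1 < cutoff) that maximises F1."""
    return max(range(20, 81),
               key=lambda c: f1(outcome, [int(x < c) for x in fev1]))


rng = random.Random(0)
dev_x, dev_y = make_cohort(rng, 1000, 70.0, 0.15)  # derivation cohort
ext_x, ext_y = make_cohort(rng, 1000, 75.0, 0.08)  # shifted external cohort

cutoff = best_cutoff(dev_x, dev_y)  # tuned on the derivation cohort only
for name, x, y in [("derivation", dev_x, dev_y), ("external", ext_x, ext_y)]:
    auc = auroc(y, [-v for v in x])  # risk score: lower FEV1 = higher risk
    score = f1(y, [int(v < cutoff) for v in x])
    print(f"{name}: AUROC={auc:.2f}, F1 at cutoff {cutoff}={score:.2f}")
```

Note that the derivation-cohort figures here are apparent performance on the data used for tuning, whereas the paper reports a proper internal validation. Calling `best_cutoff(ext_x, ext_y)` instead would re-tune the cutoff on the external cohort, a crude stand-in for the kind of population-specific adaptation the article describes.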
External Validity of Machine Learning-based Prognostic Scores for Cystic Fibrosis: A Retrospective Study using the UK and Canadian Registries
Yuchao Qin, Ahmed Alaa, Andres Floto, Mihaela van der Schaar
Abstract
Precise and timely referral for lung transplantation is critical for the survival of cystic fibrosis patients with terminal illness. While machine learning (ML) models have been shown to achieve significant improvement in prognostic accuracy over current referral guidelines, the external validity of these models and their resulting referral policies has not been fully investigated.
Here, we studied the external validity of machine learning-based prognostic models using annual follow-up data from the UK and Canadian Cystic Fibrosis Registries. Using a state-of-the-art automated ML framework, we derived a model for predicting poor clinical outcomes in patients enrolled in the UK registry, and conducted external validation of the derived model using the Canadian Cystic Fibrosis Registry. In particular, we studied the effect of (1) natural variations in patient characteristics across populations and (2) differences in clinical practice on the external validity of ML-based prognostic scores.
Overall, decrease in prognostic accuracy on the external validation set (AUCROC: 0.88, 95% CI 0.88-0.88) was observed compared to the internal validation accuracy (AUCROC: 0.91, 95% CI 0.90-0.92). Based on our ML model, analysis on feature contributions and risk strata revealed that, while external validation of ML models exhibited high precision on average, both factors (1) and (2) can undermine the external validity of ML models in patient subgroups with moderate risk for poor outcomes. A significant boost in prognostic power (F1 score) from 0.33 (95% CI 0.31-0.35) to 0.45 (95% CI 0.45-0.45) was observed in external validation when variations in these subgroups were accounted for in our model. Our study highlighted the significance of external validation of ML models for cystic fibrosis prognostication.
The uncovered insights on key risk factors and patient subgroups can be used to guide the cross-population adaptation of ML-based models and inspire new research on applying transfer learning methods for fine-tuning ML models to cope with regional variations in clinical care.
In September 2022, the van der Schaar lab dedicated a Revolutionizing Healthcare engagement session to cystic fibrosis. You can find the recorded session in full here.
Learn more about AutoPrognosis and its potential for clinical research here.