A paper recently accepted for publication by PLOS Digital Health explores machine-learning-based prognostic scores and their ‘external validity’: how well they hold up when applied to demographically different patient cohorts. The publication is the latest accomplishment in CCAIM’s ongoing efforts to advance cystic fibrosis research, with the Floto lab and van der Schaar lab on the front lines. Follow the links for their respective research focuses on cystic fibrosis.
The paper is co-authored by Yuchao Qin, PhD student at the van der Schaar lab, Prof Ahmed Alaa, a van der Schaar lab alumnus and now professor at UC Berkeley, and the CCAIM directors Prof Andres Floto and Prof Mihaela van der Schaar. Its creation was supported by the US Cystic Fibrosis Foundation and the UK Cystic Fibrosis Trust, with help from Dr Janet Allen (Floto lab).
Cystic fibrosis is a genetic disease that affects multiple organs and can cause advanced lung damage that necessitates lung transplantation. Due to the scarcity of donor lungs, precise and timely selection of high-risk patients for lung transplant referral is of paramount importance. Machine learning models have proven to be a viable tool for improving prognostic accuracy. However, developing trustworthy machine learning models relies on a large volume of data, which is difficult to obtain given that cystic fibrosis affects only a small sub-population around the world.
Therefore, it would be desirable to apply machine learning models developed on a large population to a demographically different cohort. In this paper, our researchers evaluate the external validity, or applicability, of a machine learning model using registry data of cystic fibrosis patients from the UK and Canada.
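To make the idea of external validation concrete, here is a minimal sketch in Python. It is not the paper's method and does not use AutoPrognosis or any registry data: the cohorts are synthetic, the model is a plain logistic regression fitted by gradient descent, and every name (`make_cohort`, `fev1_weight`, the cohort parameters) is a hypothetical chosen for illustration. It only shows the general pattern of training on one cohort and then comparing performance on held-out data from the same cohort (internal validation) against data from a differently distributed cohort (external validation).

```python
import numpy as np

def make_cohort(n, fev1_mean, fev1_weight, rng):
    """Synthetic cohort (assumption): an FEV1-like and an age-like feature."""
    fev1 = rng.normal(fev1_mean, 1.0, n)
    age = rng.normal(0.0, 1.0, n)
    # Lower FEV1-like values raise the risk; the strength differs per cohort.
    logit = -fev1_weight * fev1 + 0.5 * age
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    return np.column_stack([fev1, age]), y

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Fit logistic regression weights (with intercept) by gradient descent."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    """Predicted risk for each row of X."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def auroc(y, scores):
    """AUROC via the rank-sum statistic (assumes no tied scores)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = int(y.sum())
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)

# "Source" cohort used for development; "target" cohort differs in both the
# feature distribution and the feature-outcome relationship.
X_src, y_src = make_cohort(4000, fev1_mean=0.0, fev1_weight=1.5, rng=rng)
X_tgt, y_tgt = make_cohort(2000, fev1_mean=0.5, fev1_weight=0.5, rng=rng)

# Train on part of the source cohort, hold out the rest for internal validation.
X_train, y_train = X_src[:3000], y_src[:3000]
X_int, y_int = X_src[3000:], y_src[3000:]

w = fit_logistic(X_train, y_train)

internal_auc = auroc(y_int, predict(w, X_int))
external_auc = auroc(y_tgt, predict(w, X_tgt))

print(f"internal AUROC: {internal_auc:.3f}")
print(f"external AUROC: {external_auc:.3f}")
```

A gap between the two scores is the kind of signal that external validation is meant to surface. In practice one would also inspect which features and patient subgroups drive that gap.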
To achieve this, the authors used a unique approach employing the new state-of-the-art AutoML framework AutoPrognosis, developed by the van der Schaar lab and made available as open source earlier this year.
They identified several risk factors and patient subgroups affected by variation across the two countries. FEV1 was found to be the most significant risk factor for adverse outcomes in cystic fibrosis patients, and its impact on lung transplant (LTx) referral was shown to be strongly affected by the cross-population variations in the external validation set from Canada. Appropriate consideration of these variation-associated subgroups helped with adapting machine learning models to a different population.
These valuable insights highlight the importance of external validation of machine learning models for cystic fibrosis outcome prognosis. For the first time, the authors provide usable guidance for adapting high-precision machine learning models to different populations, and inspire new research on applying modern transfer learning methods to fine-tune models in highly variable environments.
For a full list of the Centre’s publications, click here.
In September 2022, the van der Schaar lab dedicated a Revolutionizing Healthcare engagement session to cystic fibrosis. You can find the recorded session in full here.
Learn more about AutoPrognosis and its potential for clinical research here.