CCAIM faculty members will present seven papers at the 10th annual International Conference on Learning Representations (ICLR 2022), the premier conference dedicated to the advancement of deep learning.
Taking place virtually this year, ICLR 2022 will run from 25 to 29 April. The conference is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as in important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.[1]
The Cambridge Centre for AI in Medicine (CCAIM) is at the forefront of research in deep learning. Its seven papers at this year’s conference, from Prof. Mihaela van der Schaar and Prof. José Miguel Hernández-Lobato, cover several of the Centre’s key research areas, including interpretability, causality, and time-series data. The Centre’s novel deep learning solutions for clinical decision-making perform competitively with, and often outperform, state-of-the-art approaches.
One of the Centre’s papers – “Neural graphical modelling in continuous-time: consistency guarantees and algorithms” – was co-authored with Dr. Kim Branson of GSK, one of the Centre’s core industry partners. The paper considers score-based graph learning for the study of dynamical systems. It proposes a score-based learning algorithm built on penalized Neural Ordinary Differential Equations (modelling the mean process) and shows that the algorithm applies to the general setting of irregularly-sampled multivariate time series and outperforms the state of the art across a range of dynamical systems.
[1] Source: https://iclr.cc/
Further details (abstracts and authors) about the papers accepted at ICLR 2022:
1. D-CODE: Discovering Closed-form ODEs from Observed Trajectories
Zhaozhi Qian, Krzysztof Kacprzyk, Mihaela van der Schaar
Abstract: For centuries, scientists have manually designed closed-form ordinary differential equations (ODEs) to model dynamical systems. An automated tool to distill closed-form ODEs from observed trajectories would accelerate the modeling process.
Traditionally, symbolic regression is used to uncover a closed-form prediction function with label-feature pairs as training examples. However, an ODE models the time derivative of a dynamical system and the “label” is usually not observed. The existing ways to bridge this gap only perform well for a narrow range of settings with low measurement noise and frequent sampling.
In this work, we propose the Discovery of Closed-form ODE framework (D-CODE), which advances symbolic regression beyond the paradigm of supervised learning. D-CODE uses a novel objective function based on the variational formulation of ODEs to bypass the unobserved time derivative. For formal justification, we prove that this objective is a valid proxy for the estimation error of the true (but unknown) ODE.
In the experiments, D-CODE successfully discovered the governing equations of a diverse range of dynamical systems under challenging measurement settings with high noise and infrequent sampling.
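The variational idea can be illustrated with a short sketch. Assuming a scalar ODE dx/dt = f(x) and hypothetical sine test functions that vanish at both ends of the observation window, integration by parts turns the integral of phi(t) * dx/dt into minus the integral of phi'(t) * x(t), so a candidate f can be scored from the observed trajectory alone, without estimating the derivative. This is a minimal illustration of that principle, not the D-CODE implementation, which embeds an objective of this kind in a symbolic-regression search; the function names and the choice of test functions are our own assumptions.

```python
import math


def trapezoid(ys, ts):
    """Trapezoidal-rule integral of samples ys observed at times ts."""
    return sum((ts[i + 1] - ts[i]) * (ys[i] + ys[i + 1]) / 2 for i in range(len(ts) - 1))


def variational_loss(f, xs, ts, n_test=5):
    """Sum of squared weak-form residuals of dx/dt = f(x) over sine test functions.

    Each (hypothetical) test function phi_j(t) = sin(j*pi*(t - t0)/T) is zero at
    both ends of the window, so integrating phi_j * dx/dt by parts gives
    -integral(x * phi_j'); the residual therefore uses only x(t), never dx/dt.
    """
    t0, T = ts[0], ts[-1] - ts[0]
    loss = 0.0
    for j in range(1, n_test + 1):
        w = j * math.pi / T
        phi = [math.sin(w * (t - t0)) for t in ts]
        dphi = [w * math.cos(w * (t - t0)) for t in ts]
        # For the true f: integral(x * phi') + integral(f(x) * phi) = 0
        residual = trapezoid([x * d for x, d in zip(xs, dphi)], ts) + trapezoid(
            [f(x) * p for x, p in zip(xs, phi)], ts
        )
        loss += residual ** 2
    return loss


# Noise-free trajectory of dx/dt = -x on [0, 5], sampled at 501 points.
ts = [5.0 * i / 500 for i in range(501)]
xs = [math.exp(-t) for t in ts]

loss_true = variational_loss(lambda x: -x, xs, ts)          # near zero
loss_wrong = variational_loss(lambda x: -2.0 * x, xs, ts)   # clearly larger
print(loss_true, loss_wrong)
```

The correct right-hand side, f(x) = -x, leaves only discretisation error, while the wrong candidate f(x) = -2x produces a residual of order one; a symbolic search would keep expressions that drive this loss down.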
2. Self-Supervision Enhanced Feature Selection with Correlated Gates
Changhee Lee, Fergus Imrie, Mihaela van der Schaar
Abstract: Discovering relevant input features for predicting a target variable is a key scientific question. However, in many domains, such as medicine and biology, feature selection is confounded by a scarcity of labeled samples coupled with significant correlations among features.
In this paper, we propose a novel deep learning approach to feature selection that addresses both challenges simultaneously. First, we pre-train the network using unlabeled samples within a self-supervised learning framework via solving pretext tasks that require the network to learn informative representations from partial feature sets. Then, we fine-tune the pre-trained network to discover relevant features using labeled samples. During both training phases, we explicitly account for the correlation structure of the input features by generating correlated gate vectors from a multivariate Bernoulli distribution.
Experiments on multiple real-world datasets including clinical and omics demonstrate that our model discovers relevant features that provide superior prediction performance compared to the state-of-the-art benchmarks, especially in practical scenarios where there is often limited labeled data and high correlations among features.
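Correlated gates can be generated in several ways. The sketch below uses a Gaussian copula, assuming it is an acceptable stand-in for the paper's multivariate Bernoulli construction: draw correlated Gaussians, map each through the standard normal CDF, and switch gate i on when the result falls below its selection probability. Each gate keeps its marginal probability, while strongly correlated features tend to be kept or dropped together. All names and numbers here are illustrative.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L @ L^T = cov (cov must be positive definite)."""
    n = len(cov)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(cov[i][i] - s)
            else:
                L[i][j] = (cov[i][j] - s) / L[j][j]
    return L


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_correlated_gates(probs, chol, rng):
    """One binary gate vector via a Gaussian copula (an assumed construction)."""
    eps = [rng.gauss(0.0, 1.0) for _ in probs]
    # Correlated standard normals z = L @ eps
    z = [sum(chol[i][k] * eps[k] for k in range(i + 1)) for i in range(len(probs))]
    # Gate i is on with marginal probability probs[i]
    return [1 if std_normal_cdf(zi) < p else 0 for zi, p in zip(z, probs)]


rng = random.Random(0)
probs = [0.5, 0.5, 0.5]
# Hypothetical correlation: features 0 and 1 strongly correlated, feature 2 independent.
corr = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
chol = cholesky(corr)
samples = [sample_correlated_gates(probs, chol, rng) for _ in range(5000)]
agree_01 = sum(s[0] == s[1] for s in samples) / len(samples)
agree_02 = sum(s[0] == s[2] for s in samples) / len(samples)
print(agree_01, agree_02)  # about 0.86 and 0.5 in expectation
```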
3. Inverse Online Learning: Understanding Non-Stationary and Reactionary Policies
Alex Chan, Alicia Curth, Mihaela van der Schaar
Abstract: Human decision-making is well known to be imperfect, and the ability to analyse such processes individually is crucial when attempting to aid or improve a decision-maker’s ability to perform a task, e.g. to alert them to potential biases or oversights on their part.
To do so, it is necessary to develop interpretable representations of how agents make decisions and how this process changes over time as the agent learns online in reaction to the accrued experience. To then understand the decision-making processes underlying a set of observed trajectories, we cast the policy inference problem as the inverse to this online learning problem. By interpreting actions within a potential outcomes framework, we introduce a meaningful mapping based on agents choosing an action they believe to have the greatest treatment effect.
We introduce a practical algorithm for retrospectively estimating such perceived effects, alongside the process through which agents update them, using a novel architecture built upon an expressive family of deep state-space models.
Through application to the analysis of UNOS organ donation acceptance decisions, we demonstrate that our approach can bring valuable insights into the factors that govern decision processes and how they change over time.
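To make the forward half of this problem concrete, the toy model below simulates one hypothetical agent of the kind the abstract describes: it picks the action with the greatest perceived treatment effect and updates that perception after each observed outcome. The exponential-moving-average update, noise level and effect sizes are assumptions chosen for illustration. The paper's contribution is the inverse step, recovering perceived effects and their update process from observed trajectories with deep state-space models, which this sketch does not attempt.

```python
import random


def simulate_agent(true_effects, steps, lr=0.2, seed=0):
    """Toy forward model of a learning decision-maker (all details hypothetical).

    The agent greedily picks the action with the largest *perceived* effect,
    observes a noisy outcome, and nudges its perception of that action toward it.
    """
    rng = random.Random(seed)
    perceived = [0.0] * len(true_effects)  # hypothetical initial beliefs
    actions = []
    for _ in range(steps):
        a = max(range(len(perceived)), key=lambda k: perceived[k])
        outcome = true_effects[a] + rng.gauss(0.0, 0.1)
        perceived[a] += lr * (outcome - perceived[a])  # online belief update
        actions.append(a)
    return actions, perceived


actions, perceived = simulate_agent(true_effects=[-0.5, 1.0], steps=50)
print(actions[:5], perceived)
```

Starting from neutral beliefs, the agent tries action 0 first, is disappointed, and switches to action 1 for good: a non-stationary, reactionary policy of the kind the inverse problem must explain from behaviour alone.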
4. Neural graphical modelling in continuous-time: consistency guarantees and algorithms
Alexis Bellot, Kim Branson, Mihaela van der Schaar
Abstract: The discovery of structure from time series data is a key problem in fields of study working with complex systems. Most identifiability results and learning algorithms assume the underlying dynamics to be discrete in time. Comparatively few, in contrast, explicitly define dependencies in infinitesimal intervals of time, independently of the scale of observation and of the regularity of sampling.
In this paper, we consider score-based structure learning for the study of dynamical systems. We prove that for vector fields parameterized in a large class of neural networks, least squares optimization with adaptive regularization schemes consistently recovers directed graphs of local independencies in systems of stochastic differential equations.
Using this insight, we propose a score-based learning algorithm based on penalized Neural Ordinary Differential Equations (modelling the mean process) that we show to be applicable to the general setting of irregularly-sampled multivariate time series and to outperform the state of the art across a range of dynamical systems.
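In neural structure learning of this kind, each variable's vector field is typically its own network, and an edge i → j is read off from whether the first-layer weights connecting input i to the network for variable j are non-zero; a group lasso penalty pushes whole weight groups to zero. The sketch below assumes that setup, with adaptive weights taken as inverse norms of an initial unpenalised fit (a common adaptive-lasso choice). It covers only the penalty and the graph read-out, not Neural ODE training, and all names and thresholds are hypothetical.

```python
import math


def column_norms(W):
    """L2 norm of each input column of a first-layer weight matrix (hidden x d)."""
    d = len(W[0])
    return [math.sqrt(sum(row[i] ** 2 for row in W)) for i in range(d)]


def adaptive_group_lasso(weights, init_weights, gamma=1.0, eps=1e-8):
    """Sum over targets j and inputs i of ||W_j[:, i]|| / ||W0_j[:, i]||^gamma."""
    penalty = 0.0
    for W, W0 in zip(weights, init_weights):
        for n, n0 in zip(column_norms(W), column_norms(W0)):
            penalty += n / (n0 + eps) ** gamma
    return penalty


def read_off_graph(weights, threshold=1e-3):
    """adj[i][j] = 1 when input i has non-negligible weight in the field of variable j."""
    d = len(weights)
    adj = [[0] * d for _ in range(d)]
    for j, W in enumerate(weights):
        for i, n in enumerate(column_norms(W)):
            adj[i][j] = 1 if n > threshold else 0
    return adj


# Hypothetical fitted first-layer weights (2 hidden units, 2 variables):
# f_0 depends only on x_0; f_1 depends on x_0 and x_1.
W_f0 = [[0.5, 0.0], [-0.3, 0.0]]
W_f1 = [[0.2, 0.4], [0.1, -0.6]]
weights = [W_f0, W_f1]

adj = read_off_graph(weights)
penalty = adaptive_group_lasso(weights, weights)
print(adj)  # [[1, 1], [0, 1]]
```

Dividing each group norm by its initial estimate penalises weak groups more heavily than strong ones, which is the usual motivation for adaptive regularisation in consistency results.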
5. POETREE: Interpretable Policy Learning with Adaptive Decision Trees
Alizée Pace, Alex Chan, Mihaela van der Schaar
Abstract: Building models of human decision-making from observed behaviour is critical to better understand, diagnose and support real-world policies such as clinical care. As established policy learning approaches remain focused on imitation performance, they fall short of explaining the demonstrated decision-making process.
Policy Extraction through decision Trees (POETREE) is a novel framework for interpretable policy learning. It is compatible with fully-offline and partially-observable clinical decision environments, and it builds probabilistic tree policies that determine physician actions based on patients’ observations and medical history.
Fully-differentiable tree architectures are grown incrementally during optimization to adapt their complexity to the modelling task, and learn a representation of patient history through recurrence, resulting in decision tree policies that adapt over time with patient information.
This policy learning method outperforms the state-of-the-art on real and synthetic medical datasets, both in understanding, quantifying and evaluating observed behaviour and in accurately replicating it, and it has the potential to improve future decision support systems.
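A probabilistic (soft) decision tree is what makes such trees fully differentiable: each internal node routes a patient to its right child with a probability given by a sigmoid of a linear function of the features, and each leaf holds a distribution over actions. The minimal sketch below assumes a single split and two actions; it omits POETREE's incremental growth and recurrent patient-history representation, and the weights are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class Leaf:
    """Leaf node holding a fixed distribution over actions."""

    def __init__(self, action_probs):
        self.action_probs = action_probs

    def predict(self, x):
        return list(self.action_probs)


class SoftSplit:
    """Internal node that routes right with probability sigmoid(w . x + b)."""

    def __init__(self, w, b, left, right):
        self.w, self.b, self.left, self.right = w, b, left, right

    def predict(self, x):
        p_right = sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)
        left_probs = self.left.predict(x)
        right_probs = self.right.predict(x)
        # Mixture of the children's action distributions, weighted by routing probability
        return [(1 - p_right) * l + p_right * r for l, r in zip(left_probs, right_probs)]


# Hypothetical one-feature tree; actions are [no treatment, treat].
tree = SoftSplit(w=[4.0], b=-2.0, left=Leaf([0.9, 0.1]), right=Leaf([0.2, 0.8]))
undecided = tree.predict([0.5])  # sigmoid(0) = 0.5, so an even mixture of the leaves
confident = tree.predict([2.0])  # routed almost entirely to the right leaf
print(undecided, confident)
```

Because every routing decision is a smooth function of the weights, the whole tree can be trained by gradient descent while still reading as a sequence of feature thresholds.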
6. Invariant Causal Representation Learning for Out-of-Distribution Generalization
Abstract: Due to spurious correlations, machine learning systems often fail to generalize to environments whose distributions differ from the ones used at training time. Prior work addressing this, either explicitly or implicitly, attempted to find a data representation that has an invariant relationship with the target. This is done by leveraging a diverse set of training environments to reduce the effect of spurious features and build an invariant predictor. However, these methods have generalization guarantees only when both data representation and classifiers come from a linear model class.
We propose invariant Causal Representation Learning (iCaRL), an approach that enables out-of-distribution (OOD) generalization in the nonlinear setting (i.e., nonlinear representations and nonlinear classifiers). It builds upon a practical and general assumption: the prior over the data representation (i.e., a set of latent variables encoding the data) given the target and the environment belongs to general exponential family distributions, i.e., a more flexible conditionally non-factorized prior that can actually capture complicated dependences between the latent variables. Based on this, we show that it is possible to identify the data representation up to simple transformations. We also prove that all direct causes of the target can be fully discovered, which further enables us to obtain generalization guarantees in the nonlinear setting. Extensive experiments on both synthetic and real-world datasets show that our approach outperforms a variety of baseline methods.
7. Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation
Abstract: Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts.
We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.
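The flavour of one-pass hyperparameter optimisation can be seen in the simplest case: adapting a single learning rate during one training run. The sketch below uses the classic greedy one-step hypergradient rule for plain gradient descent (after one step, the derivative of the loss with respect to the learning rate is -g_t · g_{t-1}) on a hypothetical one-dimensional quadratic. It is a deliberately simpler technique than the paper's implicit-differentiation method, which handles arbitrary continuous weight-update hyperparameters and per-parameter learning rates.

```python
def grad(theta, a=1.0):
    """Gradient of the hypothetical loss L(theta) = 0.5 * a * theta**2."""
    return a * theta


def train(theta0=5.0, alpha0=0.01, beta=1e-3, steps=200):
    """Gradient descent whose learning rate alpha is tuned on the fly.

    Since theta_t = theta_{t-1} - alpha * g_{t-1}, dL(theta_t)/d alpha = -g_t * g_{t-1};
    alpha takes a gradient step of size beta on that hypergradient each iteration.
    """
    theta, alpha = theta0, alpha0
    g_prev = None
    for _ in range(steps):
        g = grad(theta)
        if g_prev is not None:
            alpha += beta * g * g_prev  # hypergradient step on the learning rate
        theta -= alpha * g
        g_prev = g
    return theta, alpha


theta_fixed, _ = train(beta=0.0)         # beta = 0 recovers fixed-rate training
theta_adapt, alpha_final = train()       # learning rate adapted in the same single run
print(theta_fixed, theta_adapt, alpha_final)
```

The learning rate grows while successive gradients agree and stops growing as they shrink or change sign, all within a single training episode and with no restarts.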