
The Cambridge Centre for AI in Medicine will be represented at the Thirty-ninth International Conference on Machine Learning (ICML 2022) with twelve accepted papers. Of these, eleven have been selected by the organisers as Spotlight Presentations. In addition, the van der Schaar Lab will give an oral presentation of its Neural Laplace paper.
ICML, running from 17–23 July, is globally renowned for presenting and publishing cutting-edge research on all aspects of machine learning, a field closely tied to artificial intelligence, statistics and data science, with important application areas such as machine vision, computational biology, speech recognition, and robotics.
Along with NeurIPS and ICLR, ICML is one of the three highest-impact conferences in machine learning and artificial intelligence research.
The Cambridge Centre for AI in Medicine and its partners at AstraZeneca and GSK are at the forefront of research in machine learning. From better understanding human decision-making to individualised treatment effects and machine learning interpretability, our team – made up of expert minds from the Cambridge Machine Learning Group (José Miguel Hernández-Lobato), University of Cambridge Department of Computer Science and Technology (Pietro Liò), and the van der Schaar Lab (Mihaela van der Schaar) – is working on revolutionary technologies that will shape the future of healthcare.
Collectively, these papers touch on some of the most important areas within the Centre’s extensive research agenda, including machine learning for discovery, self- and semi-supervised learning, machine learning interpretability, synthetic data, data-centric AI, AutoML, data imputation, individualised treatment effects, and, last but not least, augmenting human skills using machine learning.
Here are the highlights of the CCAIM contributions to ICML 2022:
Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
Javier Antorán, David Janz, James Allingham, Erik Daxberger, Riccardo Barbano, Eric Nalisnick, José Miguel Hernández-Lobato
Spotlight Presentation – Thursday 21 July – 16:50–16:55 BST
Abstract
The linearised Laplace method for estimating uncertainty has received renewed attention in the Bayesian deep learning community. The method provides reliable error-bars and admits a closed form expression for the model evidence, allowing for scalable selection of model hyperparameters.
In this work, we examine the assumptions behind this method, particularly in conjunction with model selection. We show that these interact poorly with some now-standard features of deep learning – stochastic approximation methods and normalisation layers – and make recommendations for how to better adapt this classic method to the modern setting.
We provide theoretical support of our recommendations and validate them empirically on MLPs, classic CNNs, residual networks with and without normalisation layers, generative autoencoders and transformers.
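To make the model-selection step concrete: for the linear-Gaussian model that the linearisation reduces a network to, the log evidence has a closed form. Below is a minimal NumPy sketch of that quantity (our own illustration, not the paper's code); in the full method, the feature matrix would be the network's Jacobian at the fitted weights.

```python
# Minimal sketch (not the authors' code): the closed-form log evidence used for
# hyperparameter selection, shown for the Bayesian linear regression model that
# the linearisation reduces a network to.
import numpy as np

def log_evidence(X, y, alpha, beta):
    """Log marginal likelihood of Bayesian linear regression.

    X: (n, d) feature/Jacobian matrix, y: (n,) targets,
    alpha: prior precision, beta: observation noise precision.
    """
    n, d = X.shape
    A = alpha * np.eye(d) + beta * X.T @ X          # posterior precision
    m = beta * np.linalg.solve(A, X.T @ y)          # posterior mean
    fit = 0.5 * (beta * np.sum((y - X @ m) ** 2) + alpha * m @ m)
    _, logdet = np.linalg.slogdet(A)
    return (0.5 * d * np.log(alpha) + 0.5 * n * np.log(beta)
            - fit - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi))

# Hyperparameters (alpha, beta) are then selected by maximising this value,
# e.g. over a small grid:
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
best = max(((a, b) for a in [0.1, 1.0, 10.0] for b in [0.1, 1.0, 10.0]),
           key=lambda ab: log_evidence(X, y, *ab))
```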
Why is this important?
This research shows how to adapt the linearised Laplace approximation so that it works well with modern deep neural networks, yielding more reliable uncertainty estimates.
These uncertainty estimates have important applications in fields such as healthcare, medicine, and biology.
3D Infomax improves GNNs for Molecular Property Prediction
Hannes Stärk, Dominique Beaini, Gabriele Corso, Prudencio Tossou, Christian Dallago, Stephan Günnemann, Pietro Liò
Spotlight Presentation – Wednesday 20 July – 18:55–19:00 BST
Abstract
Molecular property prediction is one of the fastest-growing applications of deep learning with critical real-world impacts. Although the 3D molecular graph structure is necessary for models to achieve strong performance on many tasks, it is infeasible to obtain 3D structures at the scale required by many real-world applications.
To tackle this issue, we propose to use existing 3D molecular datasets to pre-train a model to reason about the geometry of molecules given only their 2D molecular graphs. Our method, called 3D Infomax, maximizes the mutual information between learned 3D summary vectors and the representations of a graph neural network (GNN). During fine-tuning on molecules with unknown geometry, the GNN is still able to produce implicit 3D information and uses it for downstream tasks.
We show that 3D Infomax provides significant improvements for a wide range of properties, including a 22% average MAE reduction on QM9 quantum mechanical properties. Moreover, the learned representations can be effectively transferred between datasets in different molecular spaces.
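A common way to maximise such a mutual-information bound is a contrastive objective. The sketch below is our own simplified illustration (the exact loss form and names are assumptions, not the paper's estimator): each molecule's 2D GNN embedding must identify the 3D summary vector of the same molecule within the batch.

```python
# Minimal sketch (assumed details, not the authors' code): an InfoNCE-style
# contrastive objective that pulls together the 2D-GNN embedding and the 3D
# summary vector of the same molecule, and pushes apart mismatched pairs.
import torch
import torch.nn.functional as F

def infomax_loss(h2d: torch.Tensor, h3d: torch.Tensor, tau: float = 0.1):
    """h2d, h3d: (batch, dim) embeddings of the same molecules, same order."""
    h2d = F.normalize(h2d, dim=-1)
    h3d = F.normalize(h3d, dim=-1)
    logits = h2d @ h3d.T / tau                 # (batch, batch) similarities
    targets = torch.arange(h2d.size(0))        # diagonal = positive pairs
    # Symmetric cross-entropy: each 2D embedding must pick out its own 3D
    # summary vector among the batch, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.T, targets))
```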
Why is this important?
3D Infomax makes it possible to benefit from 3D molecular geometry without the prohibitive cost of obtaining 3D structures at application scale: pre-training on existing 3D datasets teaches a GNN to infer implicit geometric information from 2D molecular graphs alone, yielding significant improvements in molecular property prediction, one of the fastest-growing applications of deep learning.
Neural Laplace: Learning diverse classes of differential equations in the Laplace domain
Samuel Holt, Zhaozhi Qian, Mihaela van der Schaar
Oral Presentation – Tuesday 19 July – 09:25–09:45 BST
Abstract
Neural Ordinary Differential Equations model dynamical systems with ODEs learned by neural networks. However, ODEs are fundamentally inadequate to model systems with long-range dependencies or discontinuities, which are common in engineering and biological systems.
Broader classes of differential equations (DE) have been proposed as remedies, including delay differential equations and integro-differential equations. Furthermore, Neural ODE suffers from numerical instability when modelling stiff ODEs and ODEs with piecewise forcing functions. In this work, we propose Neural Laplace, a unified framework for learning diverse classes of DEs including all the aforementioned ones. Instead of modelling the dynamics in the time domain, we model it in the Laplace domain, where the history-dependencies and discontinuities in time can be represented as summations of complex exponentials. To make learning more efficient, we use the geometrical stereographic map of a Riemann sphere to induce more smoothness in the Laplace domain.
In the experiments, Neural Laplace shows superior performance in modelling and extrapolating the trajectories of diverse classes of DEs, including the ones with complex history dependency and abrupt changes.
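The stereographic trick mentioned above is ordinary complex geometry. As an illustration (our own code, not the paper's implementation), the map between the complex Laplace variable s and points on the Riemann sphere looks like this:

```python
# Minimal sketch: the stereographic map between the complex Laplace variable s
# and coordinates on the unit Riemann sphere, the device Neural Laplace uses to
# induce smoothness in the Laplace domain.
import numpy as np

def to_sphere(s: np.ndarray):
    """Map complex s to (x, y, z) on the unit Riemann sphere."""
    u, v, r2 = s.real, s.imag, np.abs(s) ** 2
    return np.stack([2 * u, 2 * v, r2 - 1], axis=-1) / (r2 + 1)[..., None]

def from_sphere(p: np.ndarray):
    """Inverse map: (x, y, z) on the sphere back to complex s."""
    x, y, z = p[..., 0], p[..., 1], p[..., 2]
    return (x + 1j * y) / (1 - z)

s = np.array([1 + 2j, -0.5j])
assert np.allclose(from_sphere(to_sphere(s)), s)   # round-trip check
```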
Why is this important?
Neural Laplace goes beyond Neural ODE and provides a unified framework for learning diverse classes of differential equations, including ODEs, delay DEs, integro-DEs and more. Instead of modelling the dynamics in the time domain, it models the system in the Laplace domain, where the history-dependencies and discontinuities in time can be represented as summations of complex exponentials.
Learning the differential equations that govern dynamical systems is of great practical interest in the natural and social sciences. Experimentally, Neural Laplace shows superior performance in modelling and extrapolating the trajectories of diverse classes of DEs, including ones with complex history dependency and abrupt changes.
The other papers accepted at ICML 2022 include:
Label-Free Explainability for Unsupervised Models
Jonathan Crabbé, Mihaela van der Schaar
Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box’s output to interpret. In the absence of labels, black-box outputs are often representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem.
To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time.
We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing feature and example importance methods.
We illustrate the utility of our label-free explainability paradigm through a qualitative and quantitative comparison of representation spaces learned by various autoencoders trained on distinct unsupervised tasks.
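Concretely, one such wrapper replaces the missing class logit with a scalar proxy built from the representation itself. The sketch below is a minimal illustration under our own assumptions (the paper covers many more attribution methods): it applies gradient-times-input to the proxy <f(x~), f(x)>.

```python
# Minimal sketch (assumptions marked): wrapping a gradient attribution around
# an unlabelled encoder by explaining the scalar proxy <f(x~), f(x)> instead of
# a class logit.
import torch

def label_free_saliency(encoder, x: torch.Tensor) -> torch.Tensor:
    """Gradient-times-input attribution for an encoder with no labels.

    encoder: maps (batch, features) -> (batch, dim) representations.
    Returns per-feature importance scores with the same shape as x.
    """
    x = x.clone().requires_grad_(True)
    z = encoder(x)                           # representation of x
    proxy = (z * z.detach()).sum()           # scalar proxy <f(x), f(x)>
    proxy.backward()
    return (x.grad * x).detach()             # gradient-times-input scores

# Usage with any torch encoder, e.g. the encoder half of an autoencoder:
# scores = label_free_saliency(autoencoder.encoder, batch)
```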
Inverse Contextual Bandits: Learning How Behavior Evolves over Time
Alihan Hüyük, Daniel Jarrett, Mihaela van der Schaar
Understanding a decision-maker’s priorities by observing their behavior is critical for transparency and accountability in decision processes, such as in healthcare. Though conventional approaches to policy learning almost invariably assume stationarity in behavior, this is hardly true in practice: Medical practice is constantly evolving as clinical professionals fine-tune their knowledge over time.
For instance, as the medical community’s understanding of organ transplantations has progressed over the years, a pertinent question is: How have actual organ allocation policies been evolving? To give an answer, we desire a policy learning method that provides interpretable representations of decision-making, in particular capturing an agent’s non-stationary knowledge of the world, as well as operating in an offline manner.
First, we model the evolving behavior of decision-makers in terms of contextual bandits, and formalize the problem of Inverse Contextual Bandits (ICB).
Second, we propose two concrete algorithms as solutions, learning parametric and nonparametric representations of an agent’s behavior.
Finally, using both real and simulated data for liver transplantations, we illustrate the applicability and explainability of our method, as well as benchmarking and validating its accuracy.
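To make the setting tangible, here is a toy forward model of the kind of behaviour ICB inverts (illustration only, not one of the paper's two algorithms): a contextual-bandit agent whose action choices drift as its reward estimates are refined by experience.

```python
# Minimal sketch: a contextual-bandit agent whose knowledge, and hence policy,
# evolves as observations accumulate; ICB aims to recover exactly this kind of
# non-stationary behaviour offline.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([[1.0, -1.0], [-1.0, 1.0]])   # per-action reward weights
belief = np.zeros_like(true_w)                   # agent's evolving estimate

for t in range(1, 501):
    context = rng.normal(size=2)
    action = int(np.argmax(belief @ context))    # greedy w.r.t. current belief
    reward = true_w[action] @ context + rng.normal(scale=0.1)
    # Online update: the policy at time t depends on everything seen so far,
    # which is the non-stationarity that ICB sets out to capture.
    belief[action] += (reward - belief[action] @ context) * context / t
```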
HyperImpute: Generalized Iterative Imputation with Automatic Model Selection
Bogdan Cebere, Daniel Jarrett, Tennison Liu, Alicia Curth, Mihaela van der Schaar
Consider the problem of imputing missing values in a dataset. On the one hand, conventional approaches using iterative imputation benefit from the simplicity and customizability of learning conditional distributions directly, but suffer from the practical requirement for appropriate model specification of each and every variable. On the other hand, recent methods using deep generative modelling benefit from the capacity and efficiency of learning with neural network function approximators, but are often difficult to optimize and rely on stronger data assumptions.
In this work, we study an approach that marries the advantages of both: We propose HyperImpute, a generalized iterative imputation framework for adaptively and automatically configuring column-wise models and their hyperparameters. Practically, we provide a concrete implementation with out-of-the-box learners, optimizers, simulators, and extensible interfaces.
Empirically, we investigate this framework via comprehensive experiments and sensitivities on a variety of public datasets, and demonstrate its ability to generate accurate imputations relative to a strong suite of benchmarks. Contrary to recent work, we believe our findings constitute a strong defense of the iterative imputation paradigm.
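A minimal sketch of the core loop, under simplifying assumptions of ours (two candidate learners, mean initialisation; the released framework is far more general), shows how column-wise model selection slots into classical iterative imputation:

```python
# Minimal sketch (not the released HyperImpute package): iterative imputation
# where, on each sweep, every incomplete column gets whichever candidate model
# scores best in cross-validation on that column's observed rows.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def iterative_impute(X: np.ndarray, sweeps: int = 3) -> np.ndarray:
    X = X.copy()
    miss = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X[miss] = np.take(col_means, np.where(miss)[1])   # mean initialisation
    candidates = [Ridge(), RandomForestRegressor(n_estimators=50)]
    for _ in range(sweeps):
        for j in np.where(miss.any(axis=0))[0]:
            obs = ~miss[:, j]
            features = np.delete(X, j, axis=1)
            # Automatic model selection: pick the candidate with the best
            # cross-validated score for this particular column.
            model = max(candidates, key=lambda m: cross_val_score(
                m, features[obs], X[obs, j], cv=3).mean())
            model.fit(features[obs], X[obs, j])
            X[miss[:, j], j] = model.predict(features[miss[:, j]])
    return X
```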
How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models
Ahmed Alaa, Boris van Breugel, Evgeny Saveliev, Mihaela van der Schaar
Devising domain- and model-agnostic evaluation metrics for generative models is an important and as yet unresolved problem. Most existing metrics, which were tailored solely to the image synthesis application, exhibit a limited capacity for diagnosing modes of failure of generative models across broader application domains. In this paper, we introduce a 3-dimensional metric, (α-Precision, β-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a wide variety of application domains.
Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity. We introduce generalization as an additional dimension for model performance that quantifies the extent to which a model copies training data—a crucial performance indicator when modeling sensitive and private data.
The three metric components are interpretable probabilistic quantities, and can be estimated via sample-level binary classification. The sample-level nature of our metric inspires a novel use case which we call model auditing, wherein we judge the quality of individual samples generated by a (black-box) model, discarding low-quality samples and hence improving the overall model performance in a post-hoc manner.
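As a rough illustration of sample-level evaluation (a simplification using k-nearest-neighbour support estimates, not the paper's estimator), one can flag each synthetic sample by whether it falls inside an α-support region of the real data, and audit a model by discarding the flagged samples:

```python
# Minimal sketch in the metric's spirit: a synthetic point counts as "precise"
# if it lands inside the alpha-support of the real data, here approximated by
# the alpha-quantile of real-to-real k-NN distances.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def sample_level_precision(real, synth, alpha=0.9, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(real)
    real_d = nn.kneighbors(real)[0][:, -1]     # k-NN radius (skipping self)
    radius = np.quantile(real_d, alpha)        # alpha-support radius
    synth_d = nn.kneighbors(synth, n_neighbors=k)[0][:, -1]
    return synth_d <= radius                   # boolean flag per sample

# Model auditing in miniature: keep only samples flagged as high quality.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 2))
synth = rng.normal(scale=1.5, size=(200, 2))
kept = synth[sample_level_precision(real, synth)]
```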
Data-SUITE: Data-centric identification of in-distribution incongruous examples
Nabeel Seedat, Jonathan Crabbé, Mihaela van der Schaar
Systematic quantification of data quality is critical for consistent model performance. Prior works have focused on out-of-distribution data. Instead, we tackle an understudied yet equally important problem of characterizing incongruous regions of in-distribution (ID) data, which may arise from feature space heterogeneity. To this end, we propose a paradigm shift with Data-SUITE: a data-centric framework to identify these regions, independent of a task-specific model.
Data-SUITE leverages copula modeling, representation learning, and conformal prediction to build feature-wise confidence interval estimators based on a set of training instances. These estimators can be used to evaluate the congruence of test instances with respect to the training set, to answer two practically useful questions: (1) which test instances will be reliably predicted by a model trained with the training instances? and (2) can we identify incongruous regions of the feature space so that data owners understand the data’s limitations or guide future data collection?
We empirically validate Data-SUITE’s performance and coverage guarantees and demonstrate on cross-site medical data, biased data, and data with concept drift, that Data-SUITE best identifies ID regions where a downstream model may be reliable (independent of said model). We also illustrate how these identified regions can provide insights into datasets and highlight their limitations.
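A minimal sketch of one ingredient, under assumptions of ours (a linear regressor and split-conformal calibration; the full framework also uses copulas and representation learning), shows how feature-wise conformal intervals can flag incongruous test instances:

```python
# Minimal sketch: a feature-wise split-conformal interval. A regressor predicts
# one feature from the others; test rows whose feature value falls outside the
# calibrated interval are flagged as incongruous for that feature.
import numpy as np
from sklearn.linear_model import LinearRegression

def featurewise_conformal(train, test, j, alpha=0.1):
    """Flag test rows whose j-th feature is incongruous with the training set."""
    X_tr, y_tr = np.delete(train, j, 1), train[:, j]
    half = len(train) // 2
    model = LinearRegression().fit(X_tr[:half], y_tr[:half])
    # Calibration residuals on held-out rows give the conformal half-width.
    resid = np.abs(y_tr[half:] - model.predict(X_tr[half:]))
    q = np.quantile(resid, 1 - alpha)
    pred = model.predict(np.delete(test, j, 1))
    return np.abs(test[:, j] - pred) > q       # True = incongruous
```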
Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations
Nabeel Seedat, Fergus Imrie, Alexis Bellot, Zhaozhi Qian, Mihaela van der Schaar
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare by assisting decision-makers to answer “what-if” questions. Existing causal inference approaches typically consider regular, discrete-time intervals between observations and treatment decisions and hence are unable to naturally model irregularly sampled data, which is the common setting in practice.
To handle arbitrary observation patterns, we interpret the data as samples from an underlying continuous-time process and propose to model its latent trajectory explicitly using the mathematics of controlled differential equations. This leads to a new approach, the Treatment Effect Neural Controlled Differential Equation (TE-CDE), that allows the potential outcomes to be evaluated at any time point. In addition, adversarial training is used to adjust for time-dependent confounding which is critical in longitudinal settings and is an added challenge not encountered in conventional time-series.
To assess solutions to this problem, we propose a controllable simulation environment based on a model of tumor growth for a range of scenarios with irregular sampling reflective of a variety of clinical scenarios. TE-CDE consistently outperforms existing approaches in all simulated scenarios with irregular sampling.
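To see why controlled differential equations handle irregular sampling naturally, consider the Euler discretisation of dz = f(z) dX below (our own sketch, not the TE-CDE implementation): the latent state is driven by increments of the observation path, whatever the gaps between timestamps.

```python
# Minimal sketch: an Euler solve of a controlled differential equation
# dz = f(z) dX, where the control path X interpolates irregularly sampled
# observations, so the latent state can be rolled forward to any time point.
import torch
import torch.nn as nn

class CDEFunc(nn.Module):
    def __init__(self, latent_dim: int, input_dim: int):
        super().__init__()
        self.net = nn.Linear(latent_dim, latent_dim * input_dim)
        self.latent_dim, self.input_dim = latent_dim, input_dim

    def forward(self, z):                    # (latent,) -> (latent, input)
        return self.net(z).view(self.latent_dim, self.input_dim)

def euler_cde(func, z0, values):
    """values: (T, input) observations at irregular times. With a piecewise-
    linear control path, the Euler increment is simply the value difference,
    so unequal gaps between observations need no special handling."""
    z = z0
    for i in range(len(values) - 1):
        z = z + func(z) @ (values[i + 1] - values[i])   # dz = f(z) dX
    return z

func = CDEFunc(latent_dim=8, input_dim=3)
z_T = euler_cde(func, torch.zeros(8), torch.randn(3, 3))
```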
Attentional meta-learners for few-shot polythetic classification
Ben Day, Ramon Viñas Torné, Nikola Simidjievski, Pietro Liò
Polythetic classifications, based on shared patterns of features that need neither be universal nor constant among members of a class, are common in the natural world and greatly outnumber monothetic classifications over a set of features.
We show that threshold meta-learners, such as Prototypical Networks, require an embedding dimension that is exponential in the number of features to emulate these functions. In contrast, attentional classifiers, such as Matching Networks, are polythetic by default and able to solve these problems with a linear embedding dimension. However, we find that in the presence of task-irrelevant features, inherent to meta-learning problems, attentional models are susceptible to misclassification. To address this challenge, we propose a self-attention feature-selection mechanism that adaptively dilutes non-discriminative features.
We demonstrate the effectiveness of our approach in meta-learning Boolean functions, and synthetic and real-world few-shot learning tasks.
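For intuition, a Matching-Networks-style attentional classifier of the kind the paper analyses can be written in a few lines (our sketch, without the proposed feature-selection mechanism): each query is labelled by an attention-weighted vote over the support set.

```python
# Minimal sketch: an attentional few-shot classifier in the Matching Networks
# style, which the paper shows is polythetic by default.
import torch
import torch.nn.functional as F

def attentional_classify(support, support_labels, query, n_classes, tau=0.1):
    """support: (n, dim), support_labels: (n,) long tensor, query: (m, dim)."""
    s = F.normalize(support, dim=-1)
    q = F.normalize(query, dim=-1)
    attn = F.softmax(q @ s.T / tau, dim=-1)          # (m, n) attention weights
    onehot = F.one_hot(support_labels, n_classes).float()
    return attn @ onehot                             # (m, n_classes) scores
```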
Fast Relative Entropy Coding with A* coding
Gergely Flamich, Stratis Markou, José Miguel Hernández-Lobato
Relative entropy coding (REC) algorithms encode a sample from a target distribution Q using a proposal distribution P, such that the expected codelength is O(KL[Q || P]). REC can be seamlessly integrated with existing learned compression models since, unlike entropy coding, it does not assume discrete Q or P, and does not require quantisation. However, general REC algorithms require an intractable Ω(exp(KL[Q || P])) runtime.
We introduce AS* and AD* coding, two REC algorithms based on A* sampling. We prove that, for continuous distributions over the reals, if the density ratio is unimodal, AS* has O(D∞[Q || P]) expected runtime, where D∞[Q || P] is the Rényi ∞-divergence. We provide experimental evidence that AD* also has O(D∞[Q || P]) expected runtime. We prove that AS* and AD* achieve an expected codelength of O(KL[Q || P]).
Further, we introduce DAD*, an approximate algorithm based on AD* which retains its favourable runtime and has bias similar to that of alternative methods. Focusing on VAEs, we propose the IsoKL VAE (IKVAE), which can be used with DAD* to further improve compression efficiency. We evaluate A* coding with (IK)VAEs on MNIST, showing that it can losslessly compress images near the theoretically optimal limit.
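For readers new to REC, the sketch below illustrates the basic setup with a much simpler importance-sampling scheme (not A* coding itself, and without its runtime or codelength guarantees): encoder and decoder share seeded candidates from P, so only an index needs to be transmitted.

```python
# Minimal sketch of the REC setup: the encoder and decoder share a seeded
# stream of proposal samples from P; the encoder transmits only the index of a
# candidate chosen by importance weights under Q, so the decoder can reproduce
# the exact sample.
import numpy as np
from scipy import stats

def rec_encode(q, p, seed: int, n: int) -> int:
    rng = np.random.default_rng(seed)
    xs = p.rvs(size=n, random_state=rng)   # candidates shared via the seed
    w = q.pdf(xs) / p.pdf(xs)              # importance weights under Q
    return int(rng.choice(n, p=w / w.sum()))   # transmit ~log2(n) bits

def rec_decode(p, seed: int, n: int, index: int) -> float:
    rng = np.random.default_rng(seed)
    return float(p.rvs(size=n, random_state=rng)[index])   # same candidates

q, p = stats.norm(1.0, 0.5), stats.norm(0.0, 1.0)
index = rec_encode(q, p, seed=42, n=1024)
sample = rec_decode(p, seed=42, n=1024, index=index)
```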
Action-Sufficient State Representation Learning for Control with Structural Constraints
Biwei Huang, Chaochao Lu, Liu Leqi, José Miguel Hernández-Lobato, Clark Glymour, Bernhard Schölkopf, Kun Zhang
Perceived signals in real-world scenarios are usually high-dimensional and noisy. Finding and using a representation that contains the essential and sufficient information required by downstream decision-making tasks helps improve computational efficiency and generalization ability in those tasks.
In this paper, we focus on partially observable environments and propose to learn a minimal set of state representations that capture sufficient information for decision-making, termed Action-Sufficient state Representations (ASRs). We build a generative environment model for the structural relationships among variables in the system and present a principled way to characterize ASRs based on structural constraints and the goal of maximizing cumulative reward in policy learning. We then develop a structured sequential Variational Auto-Encoder to estimate the environment model and extract ASRs.
Our empirical results on CarRacing and VizDoom demonstrate a clear advantage of learning and using ASRs for policy learning.
Moreover, the estimated environment model and ASRs allow learning behaviors from imagined outcomes in the compact latent space to improve sample efficiency.
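As a schematic of the training signal involved (our own toy sketch, not the paper's structured sequential VAE), one step of a latent state-space model can combine reconstruction, a KL term on the latent, and reward prediction, the kind of objective that keeps the representation sufficient for decision-making:

```python
# Minimal sketch: one training step of a latent state-space model whose loss
# combines reconstruction, a KL regulariser, and reward prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, act_dim, z_dim = 16, 4, 8
enc = nn.Linear(obs_dim + z_dim + act_dim, 2 * z_dim)  # q(z_t | o_t, z_prev, a_prev)
dec = nn.Linear(z_dim, obs_dim)                        # p(o_t | z_t)
rew = nn.Linear(z_dim + act_dim, 1)                    # p(r_t | z_t, a_t)

def step_loss(o_t, z_prev, a_prev, a_t, r_t):
    mu, logvar = enc(torch.cat([o_t, z_prev, a_prev], -1)).chunk(2, -1)
    z_t = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterised
    recon = F.mse_loss(dec(z_t), o_t)
    kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(-1).mean()
    reward = F.mse_loss(rew(torch.cat([z_t, a_t], -1)).squeeze(-1), r_t)
    return recon + kl + reward, z_t

B = 5
loss, z = step_loss(torch.randn(B, obs_dim), torch.zeros(B, z_dim),
                    torch.zeros(B, act_dim), torch.randn(B, act_dim),
                    torch.randn(B))
```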