Research Themes
The Hub has five initial research themes.
-
A major obstacle to more widespread use of probabilistic AI is the additional challenge of sampling from, or approximating, posterior distributions. In recent years there have been substantial advances in efficient sampling. Such methods are important for obtaining probabilistic measures of uncertainty within Bayesian AI methods, for averaging over uncertainty within models that have high-dimensional latent structure, and for allowing propagation of uncertainty as we fuse AI models.
It is still unclear to what extent these methods will scale to the massive models used in some applications of AI, such as large language models. However, recent breakthroughs in generative models show that it is possible to sample from complex high-dimensional distributions, and transferring the ideas that underpin these methods to other sampling problems holds substantial promise. Recent advances in Bayesian statistics could simplify Bayesian AI methods by replacing traditional sampling methods with recursive simulation from a predictive model; and it is possible to relate sampling, posterior approximation and optimisation algorithms. Work under this theme will look to build on these recent advances with the aim of developing general-purpose and scalable MCMC or other sampling methods.
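As a concrete, deliberately simple illustration of the link between sampling and optimisation mentioned above, the sketch below runs the unadjusted Langevin algorithm on a toy two-dimensional Gaussian target: each update is a gradient step on the log-density plus injected noise. The target, step size and iteration count are illustrative choices only, not methods proposed by the Hub.

```python
import numpy as np

# Unadjusted Langevin algorithm (ULA) on a toy 2D Gaussian target.
# ULA is gradient ascent on the log-density plus injected noise, one concrete
# way in which sampling and optimisation algorithms can be related.

rng = np.random.default_rng(0)

# Illustrative target: zero-mean Gaussian with correlation 0.9.
cov = np.array([[1.0, 0.9], [0.9, 1.0]])
prec = np.linalg.inv(cov)

def grad_log_target(x):
    """Gradient of log N(0, cov) at x."""
    return -prec @ x

step = 0.05          # step size: too large biases ULA, too small mixes slowly
n_iter = 20_000
x = np.zeros(2)
samples = np.empty((n_iter, 2))

for i in range(n_iter):
    # Langevin update: gradient step plus sqrt(2 * step) Gaussian noise.
    x = x + step * grad_log_target(x) + np.sqrt(2 * step) * rng.standard_normal(2)
    samples[i] = x

# Close to cov, up to the discretisation bias of the unadjusted scheme.
print("empirical covariance:\n", np.cov(samples[5_000:].T))
```

Exactness can be recovered by adding a Metropolis accept/reject step at extra cost; scaling such schemes to modern AI models is precisely the open challenge this theme targets.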
-
Important insights about AI models have been obtained by considering their limiting behaviour as we let their features, such as layer width or number of layers, go to infinity. Perhaps the earliest work in this area is that of Neal, who showed that an infinitely wide one-layer neural network (NN) converges to a Gaussian process. More recently, viewing deep NNs as dynamic processes has yielded limiting behaviour that can be either deterministic or a diffusion. These limiting behaviours are of practical importance. They show how to initialise a NN, based on the intuition that the initialisation should correspond to a regime with non-degenerate limiting behaviour; and scaling results allow hyperparameters tuned on small models to transfer to large models, leading to savings of millions of dollars in training large language models. There is considerable scope for further improving our understanding of AI models through a thorough study of such limiting behaviour in terms of probabilistic and dynamical models.
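The sketch below gives a rough empirical check of Neal's result, under illustrative choices of inputs, weight variances and widths: the covariance (over random initialisations) of the outputs of one-hidden-layer ReLU networks, with output weights scaled by one over the square root of the width, is compared against a Monte Carlo estimate of the limiting kernel.

```python
import numpy as np

# Empirical check of Neal's limit: outputs of a random one-hidden-layer ReLU
# network, with output weights scaled by 1/sqrt(width), have a covariance over
# random initialisations that approaches a fixed kernel as the width grows.

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(z, 0.0)

X = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])   # three illustrative inputs in R^2
sigma_w, sigma_v = 1.0, 1.0

def random_network_outputs(width, n_nets):
    """Outputs f(X) for n_nets independent random networks of the given width."""
    W = sigma_w * rng.standard_normal((n_nets, width, 2))                 # first-layer weights
    V = sigma_v / np.sqrt(width) * rng.standard_normal((n_nets, width))   # output weights
    H = relu(np.einsum('nhd,kd->nkh', W, X))                              # hidden activations
    return np.einsum('nkh,nh->nk', H, V)                                  # outputs, shape (n_nets, 3)

# Monte Carlo estimate of the limiting kernel sigma_v^2 * E[relu(w.x) relu(w.x')].
w = sigma_w * rng.standard_normal((200_000, 2))
Phi = relu(w @ X.T)
K_limit = sigma_v**2 * (Phi.T @ Phi) / len(w)

for width in (10, 100, 1_000):
    F = random_network_outputs(width, n_nets=4_000)
    K_emp = F.T @ F / len(F)
    # Differences shrink towards Monte Carlo noise as the width grows.
    print(f"width {width:>5}: max |K_emp - K_limit| = {np.abs(K_emp - K_limit).max():.3f}")
```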
Our research in this theme will share the common approach of developing understanding in the infinite-dimensional, limiting setting, and then moving to practical algorithms through finite-dimensional discretisations. Such an approach is important for developing NN models of stochastic processes on continuous domains, such as models of weather or fluid flow. NNs offer the promise of flexibly capturing these underlying stochastic processes, but existing approaches, such as neural processes, have a fundamental flaw: whilst they can approximate finite-dimensional marginals, they generally produce distributions that are not consistent with each other.
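For contrast with the consistency issue just described, the following minimal sketch (using an illustrative squared-exponential kernel) checks the property that a Gaussian process does satisfy: the marginal of the fine-grid distribution on a subset of points coincides exactly with the model built directly on that subset.

```python
import numpy as np

# Consistency of finite-dimensional marginals for a Gaussian process:
# marginalising the fine-grid distribution onto a subset of points gives
# exactly the distribution obtained by evaluating the kernel on that subset.
# Neural-process-style models generally do not satisfy this property.

def rbf_kernel(s, t, lengthscale=0.2):
    """Squared-exponential kernel on 1D inputs (illustrative choice)."""
    return np.exp(-0.5 * (s[:, None] - t[None, :])**2 / lengthscale**2)

fine = np.linspace(0.0, 1.0, 101)      # fine grid on [0, 1]
idx = np.arange(0, 101, 10)            # coarse subset: every 10th point
coarse = fine[idx]

K_fine = rbf_kernel(fine, fine)
K_marginalised = K_fine[np.ix_(idx, idx)]   # marginal of the fine-grid Gaussian
K_direct = rbf_kernel(coarse, coarse)       # model built directly on the coarse points

print("max difference:", np.abs(K_marginalised - K_direct).max())   # exactly 0.0
```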
-
Diffusion generative models have demonstrated remarkable performance for approximating the distribution of complex structured data like images, text and molecules. However, we currently lack a deep mathematical understanding of why these approaches are so effective, how they are best implemented, and how they can be extended to new application domains or inferential tasks. For example, these models incorporate two approximations: first, they approximate the population data distribution by a sample of data points; and second, they approximately simulate the time-reversed version of a diffusion that adds noise to these data points. Interestingly, if we did the second stage exactly, the methods would be useless, as each simulation of the time-reversed diffusion would simply reproduce one of the data points. Thus the success of these methods is due to the interaction between these two approximations.
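A minimal one-dimensional sketch of this interaction is given below, under illustrative choices of data, noising process and discretisation: a small data set is noised with an Ornstein-Uhlenbeck process, and the reverse-time SDE is simulated using the exact analytic score of the noised empirical distribution. Because the score is exact, the reverse simulation collapses back onto the training points, which is precisely why the score approximation, and its interaction with the finite sample, matters in practice.

```python
import numpy as np

# 1D illustration of the interaction described above: noise a small empirical
# data set with an Ornstein-Uhlenbeck (OU) process, then simulate the exact
# time-reversal using the analytic score of the noised *empirical* distribution.
# Because the score is exact, the reverse simulation reproduces the data points.

rng = np.random.default_rng(2)
data = np.array([-2.0, -0.5, 1.0, 2.5])   # illustrative "training set"
T, n_steps = 5.0, 500
dt = T / n_steps

def score_t(x, t):
    """Score of p_t: a mixture of N(x_i * exp(-t), 1 - exp(-2t)) components."""
    means = data * np.exp(-t)
    var = 1.0 - np.exp(-2.0 * t)
    logw = -0.5 * (x[:, None] - means[None, :])**2 / var
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return (w @ means - x) / var        # sum_i w_i (mean_i - x) / var

# Reverse-time SDE for the forward process dX = -X dt + sqrt(2) dW,
# run forwards in s = T - t:  dY = [Y + 2 * score_{T-s}(Y)] ds + sqrt(2) dB.
y = rng.standard_normal(2_000)          # start from N(0, 1), approximately p_T
for k in range(n_steps):
    t = T - k * dt
    drift = y + 2.0 * score_t(y, t)
    y = y + drift * dt + np.sqrt(2.0 * dt) * rng.standard_normal(y.shape)

# Nearly all reverse samples end up (essentially) on one of the four data points.
nearest = np.abs(y[:, None] - data[None, :]).argmin(axis=1)
print("fraction near each data point:", np.bincount(nearest, minlength=4) / len(y))
print("mean distance to nearest data point:", np.abs(y - data[nearest]).mean())
```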
Methods based on diffusion generative models have natural links to annealing ideas for score-based models, with the noising procedure acting as a mechanism for flattening the distribution we wish to sample from, and can be shown to relate to algorithms for finding the manifold structure of data. Furthermore, there is scope for ideas from MCMC, Bayesian non-parametrics and Numerical Analysis to impact the design of algorithms: from considering stochastic processes other than diffusions, and understanding theoretically what impact this choice has, to extending the scope of the models to new data structures (e.g. functional or spatial data such as weather); and from the impact of different assumptions when approximating the score functions used in backward simulation, to how best to control the numerical error of simulating the resulting time-reversed diffusion. There are also unsolved challenges in how to perform conditional and constrained sampling with these models.
-
Current trends in AI have been to improve performance through a `brute-force' approach of fitting models to increasingly large data sets. For example, in natural language processing state-of-the-art performance has improved over the past five years due, in part, to using up to 300 times as much data to train models. Whilst this approach may be appropriate for applications where data is plentiful and structured models don't exist, this is not the case for many applications in Engineering and the Natural Sciences. Here, models are built by exploiting domain-specific knowledge to produce equations whose solutions have properties such as positivity, symmetry, smoothness, invariances and bifurcations. In other applications, there may be domain-specific structure that forces data to lie on a low-dimensional manifold. Indeed, the Manifold Hypothesis postulates that one reason many ML algorithms work well is that they are able to find and encode the true low-dimensional manifold structure within high-dimensional data.
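The following crude sketch uses a linear embedding and singular values merely to convey the flavour of the Manifold Hypothesis: data generated from a one-dimensional curve, embedded in fifty dimensions with small noise, has only a handful of directions carrying signal. Real manifold structure is generally non-linear and needs more than an SVD to uncover; this is purely illustrative, with all choices (curve, dimensions, noise level) made up for the example.

```python
import numpy as np

# Crude linear illustration of low-dimensional structure hidden in
# high-dimensional data: points on a 1D curve, linearly embedded in 50
# dimensions with small noise, have rapidly decaying singular values.

rng = np.random.default_rng(3)
t = rng.uniform(0, 2 * np.pi, size=1_000)
curve = np.stack([np.sin(t), np.cos(t), 0.5 * t], axis=1)      # 1D manifold in R^3
A = rng.standard_normal((3, 50))                               # linear embedding into R^50
X = curve @ A + 0.01 * rng.standard_normal((1_000, 50))        # observed data

s = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(np.round(s[:6] / s[0], 3))   # only the first few directions carry signal
```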
AI methods are becoming increasingly popular as data-driven approaches to building surrogates of physical models, or as models for complex data. While they offer more flexibility in how data can be incorporated into the approximation process, standard approaches ignore structural information. For example, in fluid flow modelling we often require velocity approximations to be mass-conserving. In structural engineering problems, approximations may need to have specific spatial structures due to geometric constraints. While there has been some research that attempts to incorporate this information into an AI model, such as physics-informed neural networks, these approaches have had limited success in non-linear problems compared to standard methods. Finding novel ways to fuse structural and other domain-specific information, where it exists, with data promises not only more parsimonious AI models that are easier to fit, but also more robust approximations with smaller generalisation error. Surrogate models based on Gaussian processes offer alternatives to NNs, with the benefits of interpretability, uncertainty quantification, and controlled extrapolation outside the training data. Across these topics, tools classically used in Numerical Analysis, such as enforcing a PDE constraint weakly through a variational formulation, and the linearisation of non-linear operators, have the potential to be transformative.
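One way to fuse structural information with flexible models, sketched below for the mass-conservation example, is to build the constraint in by construction rather than penalise its violation: a two-dimensional velocity field defined as the discrete curl of a scalar stream function has exactly zero discrete divergence, whatever that stream function is. Here the stream function is an arbitrary stand-in for the output of a learned model; the grid and discretisation are illustrative choices, not a method proposed in the text.

```python
import numpy as np

# Building structure in by construction: on a periodic grid, a 2D velocity
# field defined as the discrete curl of any scalar stream function psi,
#   u = D_y psi,  v = -D_x psi,
# has exactly zero discrete divergence, because the central-difference
# operators D_x and D_y commute. Here psi stands in for the output of a
# learned model (e.g. a neural network).

n, h = 64, 1.0 / 64
x = np.arange(n) * h
X, Y = np.meshgrid(x, x, indexing="ij")
psi = np.sin(2 * np.pi * X) * np.cos(4 * np.pi * Y)   # illustrative stream function

def d_dx(f):   # central difference in x on a periodic grid
    return (np.roll(f, -1, axis=0) - np.roll(f, 1, axis=0)) / (2 * h)

def d_dy(f):   # central difference in y on a periodic grid
    return (np.roll(f, -1, axis=1) - np.roll(f, 1, axis=1)) / (2 * h)

u, v = d_dy(psi), -d_dx(psi)          # velocity = curl of the stream function
divergence = d_dx(u) + d_dy(v)        # = D_x D_y psi - D_y D_x psi = 0

print("max |divergence|:", np.abs(divergence).max())   # zero up to floating point
```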
-
There is an increasing awareness of the carbon footprint of AI. Whilst much of this is the cost of fitting foundation models, the cost of running these models can also be substantial. Similar issues arise in domains where there is interest in solving larger-scale physical models -- where the computational cost of a single solution can be substantial. Here, AI emulators of the physical model can lead to substantial efficiency gains, allowing for reduced computational cost, the ability to run simulations at higher resolution, or ensembles of solutions with more members. However, such emulators need to come with uncertainty estimates to be used in a trustworthy manner.
For example, the Met Office is looking at using AI methods to develop emulators for one-step prediction of large-scale weather models. These are fitted to expensive solutions of PDE models for the atmosphere, and can then be applied recursively, in an auto-regressive manner, to give cheaper multi-step predictions of the weather. Similarly, the evaluation of designs for new buildings traditionally involves solving complex PDE models, but these are being replaced by cheaper emulators that enable more designs to be considered, resulting in more environmentally friendly developments. However, for such methods to be used reliably, the emulators must report a measure of uncertainty -- so the user knows when the output can be trusted. This is particularly relevant in multi-fidelity settings, where there is a choice between running different models with different levels of complexity and hence accuracy.
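The toy sketch below shows the auto-regressive pattern and one crude route to uncertainty, with the one-step emulator replaced by a hypothetical stand-in (a damped linear map plus noise) rather than any real weather or building model: an ensemble of rollouts exposes how predictive spread grows with lead time.

```python
import numpy as np

# Toy sketch of auto-regressive emulation: a one-step emulator is applied
# recursively, and an ensemble of rollouts gives a crude measure of how
# predictive uncertainty grows with lead time. The one-step map below is a
# hypothetical stand-in for a learned emulator, not a real weather model.

rng = np.random.default_rng(4)
state_dim, n_members, n_steps = 8, 50, 40

# Stable linear dynamics: 0.95 times a random orthogonal matrix.
A = 0.95 * np.linalg.qr(rng.standard_normal((state_dim, state_dim)))[0]

def emulator_step(state, rng):
    """One-step prediction plus a stochastic perturbation representing emulator error."""
    return state @ A.T + 0.05 * rng.standard_normal(state.shape)

ensemble = np.tile(rng.standard_normal(state_dim), (n_members, 1))  # identical initial conditions
spread = []
for _ in range(n_steps):
    ensemble = emulator_step(ensemble, rng)
    spread.append(ensemble.std(axis=0).mean())    # ensemble spread as an uncertainty proxy

print("spread at lead times 1, 10, 40:",
      np.round([spread[0], spread[9], spread[-1]], 3))
```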
Some traditional approaches to emulation based on Gaussian processes naturally give a measure of uncertainty. However, novel approaches are required to scale this methodology to high-dimensional inputs. Tools that have recently transformed high-dimensional approximation theory, including tensor methods, anisotropic sparse grids and lattice rules, can be incorporated into the Gaussian process prior to beat the curse of dimensionality.
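A minimal sketch of such a Gaussian-process emulator is given below, with a stand-in one-dimensional "simulator" and an illustrative squared-exponential kernel: alongside each prediction it reports a predictive standard deviation, which widens away from the training runs and so flags where the emulator should not be trusted.

```python
import numpy as np

# Minimal Gaussian-process emulator sketch: fit to a handful of expensive
# simulator runs (here a stand-in 1D function) and report a predictive mean
# and variance, so the user can see where the emulator should not be trusted.

def simulator(x):
    """Stand-in for an expensive physical model."""
    return np.sin(3 * x) + 0.3 * x**2

def rbf(a, b, lengthscale=0.4, variance=1.0):
    return variance * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lengthscale**2)

X_train = np.array([-2.0, -1.2, -0.3, 0.4, 1.5])   # a few "runs" of the simulator
y_train = simulator(X_train)
X_test = np.linspace(-3, 3, 7)

noise = 1e-6                                        # jitter for numerical stability
K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))

K_star = rbf(X_test, X_train)
mean = K_star @ alpha                               # predictive mean
V = np.linalg.solve(L, K_star.T)
var = np.diag(rbf(X_test, X_test)) - np.sum(V**2, axis=0)   # predictive variance

for x, m, s in zip(X_test, mean, np.sqrt(np.maximum(var, 0.0))):
    flag = "extrapolating" if abs(x) > 2 else ""
    print(f"x = {x:+.1f}: prediction {m:+.2f} +/- {2 * s:.2f} {flag}")
```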