COCA - N-body simulations in an emulated frame of reference

Overview of the problem

N-body simulations, widely used for studying complex systems such as gravitational structure formation, are computationally intensive. To address this, machine learning (ML)-based emulators have been developed to speed up the process. While they offer significant gains in efficiency, these emulators face issues with accuracy and trustworthiness, as they can introduce substantial errors that current methods are unable to correct.

One of the main challenges in using ML for N-body simulations is the inability to reliably detect or correct emulation errors in real-world scenarios, where no ground-truth data is available for comparison. Although perfect accuracy may not always be necessary – given that traditional simulations also make simplifying assumptions – the lack of interpretability in many ML models, especially deep neural networks, makes it difficult to assess the reliability of the results.

To address these concerns, we propose a framework that enhances the trustworthiness of ML emulators by incorporating physical corrections for emulation errors. Rather than aiming for perfect ML predictions, our approach focuses on refining the approximate solutions provided by ML. This allows users to control the trade-off between speed and accuracy by adjusting the number and timing of force evaluations in the simulation. With more evaluations, the system asymptotically converges to the true physical solution, ensuring a balance between efficiency and reliability.

Figure: Slice of the final ($a=1$) matter density field for a reference simulation (first column) compared to the corresponding COCA simulation using 10 force evaluations.

Solve in an emulated reference frame

In N-body simulations, the Lagrangian displacement field, $\boldsymbol{\Psi}$, tracks the motion of particles as they evolve from their initial positions to their final states under gravitational forces. N-body simulations determine this displacement as a function of time for a large number of particles, allowing us to simulate the large-scale structure of the Universe and the non-linear dynamics that govern it. It is precisely this field that ML emulators try to predict, yet they often struggle in highly non-linear, collapsed objects. Given that these are some of the most interesting regions of the Universe, we want to introduce a way of correcting these mistakes.
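To make the mapping concrete, here is a minimal toy sketch (our illustration, not code from COCA; the grid size, box size, and displacement values are all placeholder assumptions) of how a displacement field moves particles from their Lagrangian grid positions $\boldsymbol{q}$ to Eulerian positions $\boldsymbol{x} = \boldsymbol{q} + \boldsymbol{\Psi}$:

```python
import numpy as np

# Illustrative sketch: particles start on a regular Lagrangian grid q and
# are mapped to Eulerian positions x(q, a) = q + Psi(q, a).
box_size = 100.0              # hypothetical box side length (Mpc/h)
n_grid = 4                    # tiny grid, purely for illustration
spacing = box_size / n_grid

# Lagrangian (initial) positions: a regular grid of 3-vectors.
q = np.stack(
    np.meshgrid(*[np.arange(n_grid) * spacing] * 3, indexing="ij"),
    axis=-1,
)

# A toy displacement field (in a real run this comes from the simulation).
rng = np.random.default_rng(0)
psi = rng.normal(scale=1.0, size=q.shape)

# Final comoving positions, wrapped periodically into the box.
x = (q + psi) % box_size
print(x.shape)  # (4, 4, 4, 3): one 3-vector per particle
```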

In the COCA framework, we decompose the Lagrangian displacement field into three components:

\(\boldsymbol{\Psi}(\boldsymbol{q},a) \equiv \boldsymbol{\Psi}_\mathrm{LPT}(\boldsymbol{q},a) + \boldsymbol{\Psi}_\mathrm{ML}(\boldsymbol{q},a) + \boldsymbol{\Psi}_\mathrm{res}(\boldsymbol{q},a).\)

These are:

  1. The first contribution comes from the analytic predictions of Lagrangian Perturbation Theory (LPT), denoted as $\boldsymbol{\Psi}_\mathrm{LPT}$. This component provides an accurate description of the displacement on large scales and at early times, where the system is still relatively linear.

  2. The second contribution is $\boldsymbol{\Psi}_\mathrm{ML}$, which represents the machine-learning correction. This improves the model’s accuracy on smaller scales, where non-linearities become significant.

  3. Finally, $\boldsymbol{\Psi}_\mathrm{res}$ captures the residual, or emulation error, which arises because the emulator does not perfectly reproduce the true displacement field.

To correct for this residual error, we solve the true equation of motion in an emulated frame. This process allows us to account for the emulation errors by calculating the residual displacement as:

\(\partial_a^2 \boldsymbol{\Psi}_\mathrm{res}(\boldsymbol{q},a) = -\boldsymbol{\nabla} \Phi(\boldsymbol{x},a) - \partial_a^2 \boldsymbol{\Psi}_\mathrm{LPT}(\boldsymbol{q},a) - \partial_a^2 \boldsymbol{\Psi}_\mathrm{ML}(\boldsymbol{q},a).\)

Here, the first term represents the gravitational force, while the second and third terms correspond to fictitious forces arising from the LPT and machine-learning predictions, respectively. This is equivalent to solving for the (not necessarily small) perturbation around the ML prediction.

This approach is similar to the COmoving Lagrangian Acceleration (COLA) method, where the equation of motion is solved in the LPT frame. However, by using the more accurate machine-learning frame, fewer force evaluations are required compared to the COLA scheme, allowing for faster simulations without sacrificing accuracy.
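As a rough illustration of the idea (not the actual COCA integrator: the force law, frame acceleration, and time stepping below are placeholder assumptions), one can integrate the residual with a leapfrog-style scheme in which the true gravitational force enters only at the chosen force-evaluation steps, while the emulated frame contributes a fictitious force:

```python
import numpy as np

# Minimal sketch of integrating the residual equation of motion in the
# emulated frame (kick-drift-kick form; every name here is hypothetical).
def grav_force(x):
    # Placeholder for a real Poisson / particle-mesh force solver.
    return -0.1 * x

def frame_accel(a):
    # Placeholder for d^2(Psi_LPT + Psi_ML)/da^2, the fictitious force.
    return np.zeros(3)

x = np.array([1.0, 0.0, 0.0])        # residual displacement of one particle
p = np.zeros(3)                      # its conjugate momentum
a_steps = np.linspace(0.1, 1.0, 11)  # 10 steps, i.e. 10 force evaluations

for a0, a1 in zip(a_steps[:-1], a_steps[1:]):
    da = a1 - a0
    # Kick: true force minus the fictitious frame acceleration.
    p += 0.5 * da * (grav_force(x) - frame_accel(a0))
    x += da * p                      # Drift
    p += 0.5 * da * (grav_force(x) - frame_accel(a1))

print(np.round(x, 4))
```

The number of entries in `a_steps` is the knob the user turns: more force evaluations push the residual solution closer to the true dynamics, fewer make the run cheaper.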

Figure: The COCA formalism for cosmological simulations. One solves for the residual between the true trajectory and the emulated one.

Styled V-net to predict frame from initial conditions

The V-net architecture, originally developed for 3D image segmentation tasks, has proven effective for handling complex spatial data. In our case, we adapt this architecture to model the evolution of particle dynamics in N-body simulations. Specifically, we use a V-net to take the initial conditions of the system as input, which is represented by a single channel. The network then predicts the momentum of the frame of reference as output, using three channels to account for the three spatial dimensions.

A key feature of our implementation is the inclusion of a “style” parameter, which allows the network to capture the time dependence of the system’s evolution. This parameter adjusts the model’s predictions based on the particular stage of the simulation, making the V-net more flexible in representing the complex, time-varying dynamics inherent in cosmological simulations.
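One common way to implement such conditioning is FiLM-style per-channel modulation; the numpy sketch below (our illustration with hypothetical shapes, not the actual network) shows a small "style" head mapping the scale factor $a$ to a per-channel scale and shift applied to a feature map:

```python
import numpy as np

# Toy sketch of style conditioning: a scalar style parameter (here the
# scale factor a) is mapped to per-channel scale and shift that modulate a
# feature map, letting one network represent many output times.
rng = np.random.default_rng(2)

features = rng.normal(size=(16, 8, 8, 8))   # (channels, x, y, z) feature map
w = 0.1 * rng.normal(size=(2 * 16, 1))      # tiny linear "style" head

def style_modulate(feat, a):
    gamma_beta = (w @ np.array([[a]])).ravel()   # style -> 2*channels values
    gamma, beta = gamma_beta[:16], gamma_beta[16:]
    return (1.0 + gamma)[:, None, None, None] * feat + beta[:, None, None, None]

out_early = style_modulate(features, a=0.1)  # early-time prediction
out_late = style_modulate(features, a=1.0)   # late-time prediction
print(out_early.shape)  # (16, 8, 8, 8)
```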

Figure: Slices of the input, target, output, and error of the frame-of-reference emulator at the final time step. This error cannot be corrected when using the emulator alone, but it can be identified and removed with COCA.

Force evaluations correct emulation errors

We find that by predicting the optimal frame of reference – where all particles are at rest – using machine learning, COCA effectively corrects for potential emulation errors in particle trajectories during N-body simulations. Our results show that COCA requires significantly fewer force evaluations to achieve a given accuracy compared to the traditional COLA method.

On its own, the frame-of-reference emulator achieves an accuracy of between 1% and 10%. Remarkably, only eight force evaluations are needed to reduce emulation errors to the percent level, whereas a COLA simulation would require significantly more. This efficiency makes COCA a cost-effective alternative for N-body simulations.

Moreover, with just eight force evaluations, COCA delivers four to five times greater accuracy than a Lagrangian displacement field emulator when both are trained with the same computational resources. This improvement in accuracy is a key advantage of COCA, as it effectively corrects for emulation errors, surpassing the performance of direct emulation methods discussed in previous literature.

Our frame-of-reference emulator has shown robustness to variations in cosmological parameters, despite being trained at a fixed cosmology. The COCA framework can handle extrapolation errors when applied outside the range of the training simulations: even when the ML training and the N-body evolution assume different cosmological parameters, COCA achieves percent-level accuracy for the final density and velocity fields, demonstrating its effectiveness and flexibility.

Figure: Relative performance of COCA versus an emulator of the displacement field $\boldsymbol{\Psi}$. We compute summary statistics for the final ($a=1$) matter density field and compare results at both the training cosmology and a misspecified one. Although directly emulating $\boldsymbol{\Psi}$ produces a more accurate density field than simply emulating the momentum field $\textbf{p}$ (with $n_{\rm f} = 0$), using the COCA framework (emulating the frame of reference and employing additional force evaluations) yields the best performance.

Summary

Machine learning offers significant potential for accelerating forward modelling in the physical sciences, including gravitational N-body simulations. Although ML models typically produce approximations with inherent emulation errors, our study demonstrates that these errors can be effectively corrected. By using machine learning solutions as approximations and solving the correct physical equations, we can leverage the speed of ML while preserving the reliability of traditional methods. This approach allows us to combine the efficiency of ML with the robustness of established techniques, enhancing computational performance without compromising the accuracy of physical simulations.

Acknowledgements

This work was supported by the Simons Collaboration on “Learning the Universe”; the Agence Nationale de la Recherche (ANR) through grant INFOCW, under reference ANR-23-CE46-0006-01; the Swedish Research Council (VR) under the project 2020-05143 – “Deciphering the Dynamics of Cosmic Structure”; and the Centre National d’Etudes Spatiales (CNES). This work was done within the Aquila Consortium (https://www.aquila-consortium.org/).
