Final Report: GSoC '24
- Student Name: Junyi (@junyixu)
- Organization: Trixi Framework community
- Mentors: Michael (@sloede) and Hendrik (@ranocha)
- Project: Integrating the Modern CFD Package Trixi.jl with Compiler-Based Auto-Diff via Enzyme.jl
- Project Link: https://github.com/junyixu/TrixiEnzyme.jl
Project Overview
Trixi.jl is a numerical simulation framework for conservation laws written in Julia. Integrating Trixi.jl with compiler-based (LLVM-level) automatic differentiation via Enzyme.jl offers several benefits: it facilitates fast forward-mode AD, enables reverse-mode AD, supports cross-language AD, and, critically, supports the mutating operations and caching on which Trixi.jl relies for the performance of both simulation runs and AD. The final deliverable covers as many of Trixi's advanced features as possible, such as adaptive mesh refinement and shock capturing, showcasing the benefits of differentiable programming in Julia's ecosystem. The work carried out during the project covered:
- Forward Mode Automatic Differentiation (AD) for the Discontinuous Galerkin Collocation Spectral Element Method (DGSEM): Implement forward-mode AD to enhance derivative calculations in DG methods, improving computational efficiency and accuracy for various applications.
- Reverse Mode Automatic Differentiation for DG.
- Improve Performance:
  - Extract Parameters Passed to Enzyme: Implement a systematic approach to extract and manage the parameters passed to Enzyme, ensuring optimal configuration and efficiency in the execution of AD tasks.
  - Automatically Pick `batchsize` for Jacobians (see the sketch below):
    - Optimize for Memory Bandwidth: Fine-tune the batch size in Jacobian computations to optimize the use of memory bandwidth, thus improving the overall performance and speed of the computations.
- Interfaces to AD through `rhs_gpu!` (ongoing)
Please note that the last step was planned but remains incomplete due to time constraints; it will be completed in the future if possible.
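For context on the `batchsize` item above: Enzyme.jl's forward mode can propagate several derivative directions per call via `BatchDuplicated`, so a larger batch amortizes the cost of each primal evaluation at the price of more shadow memory. A minimal sketch with a toy mutating function (`square!` is an illustrative stand-in, not Trixi.jl's actual `rhs!`):

```julia
using Enzyme

# Toy in-place function standing in for a rhs!-style kernel.
function square!(y, x)
    y .= x .^ 2
    return nothing
end

x = [1.0, 2.0, 3.0]
y = zeros(3)

# batchsize = 2: seed two one-hot directions at once, so a single call to
# `autodiff` propagates two Jacobian columns instead of one.
dxs = (Float64[1, 0, 0], Float64[0, 1, 0])
dys = (zeros(3), zeros(3))
Enzyme.autodiff(Forward, square!, Const,
                BatchDuplicated(y, dys), BatchDuplicated(x, dxs))
# dys[1] ≈ [2, 0, 0] and dys[2] ≈ [0, 4, 0]: columns 1 and 2 of the Jacobian.
```

Picking `batchsize` automatically then amounts to trading off the number of `autodiff` calls against the memory bandwidth consumed by the shadow tuples.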
Key Highlights
Function Prototyping
- Functions intended for automatic differentiation with `Enzyme.autodiff` should adhere to specific naming conventions:
  - Functions must start with `enzyme_`.
  - The primary role of these functions is to unpack `semi.cache` and accurately recreate `cache` for effective use with Enzyme's APIs (see the sketch below).
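As a hypothetical illustration of this convention (the function body and field names below are assumptions for exposition, not TrixiEnzyme.jl's actual internals):

```julia
# Hypothetical sketch: an `enzyme_`-prefixed wrapper that unpacks the pieces
# it needs from `semi.cache` and recreates a plain `cache` NamedTuple, so
# Enzyme.autodiff sees concrete arrays it can shadow.
function enzyme_rhs!(du, u, semi)
    (; elements, interfaces) = semi.cache   # unpack only what is needed
    cache = (; elements, interfaces)        # recreate a minimal cache
    # ... call the actual rhs kernels with du, u, and the recreated cache ...
    return nothing  # mutating functions must return nothing for reverse mode
end
```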
Configuration
- The functions `jacobian_enzyme_forward` and `jacobian_enzyme_reverse` are configured to behave similarly to `jacobian_ad_forward`, with the primary distinction being how `batchsize` is chosen (see the usage sketch below).
- An alternative usage pattern involves defining new functions prefixed with `enzyme_` and passing them to `jacobian_enzyme_forward` or `jacobian_enzyme_reverse` for differentiation.
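A usage sketch, assuming a standard Trixi.jl semidiscretization (the setup follows Trixi.jl's documented 1D advection example; the exact keyword arguments accepted by the TrixiEnzyme.jl entry points are not shown here):

```julia
using Trixi
using TrixiEnzyme

# Standard Trixi.jl setup: 1D linear advection discretized with DGSEM.
equations = LinearScalarAdvectionEquation1D(1.0)
mesh = TreeMesh(-1.0, 1.0, initial_refinement_level=4, n_cells_max=10^4)
solver = DGSEM(polydeg=3, surface_flux=flux_lax_friedrichs)
semi = SemidiscretizationHyperbolic(mesh, equations,
                                    initial_condition_convergence_test, solver)

# Enzyme-based Jacobians, mirroring Trixi.jl's `jacobian_ad_forward(semi)`.
J_fwd = jacobian_enzyme_forward(semi)
J_rev = jacobian_enzyme_reverse(semi)
```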
The sole distinction between using reverse-mode AD and forward-mode AD with Enzyme.jl is that you seed `dy` as a one-hot vector instead of seeding `dx` as one. However, there are some important considerations and potential issues to be aware of (illustrated in the sketch after this list):
- In reverse mode, `dx` needs to be reset to prevent it from impacting subsequent calculations.
- In reverse mode, mutating functions should `return nothing`; failing to do so can lead to incorrect results from Enzyme.
- In reverse mode, you must initialize intermediate values to zero; otherwise, Enzyme will yield incorrect outcomes.
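A minimal sketch of this distinction, again with the toy `square!` function rather than Trixi.jl's actual right-hand side:

```julia
using Enzyme

function square!(y, x)       # toy stand-in for an enzyme_-prefixed function
    y .= x .^ 2
    return nothing           # required for correct reverse-mode results
end

x = [1.0, 2.0, 3.0]
y = zeros(3)

# Forward mode: seed dx one-hot; dy receives one Jacobian *column*.
dx = Float64[1, 0, 0]
dy = zeros(3)
Enzyme.autodiff(Forward, square!, Const, Duplicated(y, dy), Duplicated(x, dx))
# dy ≈ [2, 0, 0]

# Reverse mode: seed dy one-hot instead; dx accumulates one Jacobian *row*.
dx = zeros(3)                # must be zero-initialized, see above
dy = Float64[1, 0, 0]
Enzyme.autodiff(Reverse, square!, Const, Duplicated(y, dy), Duplicated(x, dx))
# dx ≈ [2, 0, 0]; reset dx .= 0 before seeding the next row.
```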
Optimization Strategies
- To enhance performance, several optimization strategies are recommended:
  - Reuse containers for shadow variables during batching and use the `@batch` macro for multithreading acceleration to improve computational efficiency (see the sketch after this list).
  - Minimize the number of arguments extracted from `semi.cache` to reduce overhead and streamline computations.
  - Current benchmarks for Enzyme indicate mixed results. In scenarios involving smaller caches, as in toy models, `jacobian_enzyme(semi)` performs better than `ForwardDiff`. However, for DGSEM simulations, performance may lag behind `jacobian_ad_forward(semi)` due to the challenges associated with large cache sizes (`elements._surface_flux_values` and `cache.interfaces._u`) and the complexity of unpacking and recreating the cache.
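A sketch of the first strategy, simplified to a single-threaded loop (the actual implementation additionally batches the seeds and threads the loop, e.g. with Polyester.jl's `@batch`):

```julia
using Enzyme

# Build a dense Jacobian column by column while reusing the same two shadow
# containers, instead of allocating fresh dx/dy buffers for every column.
function jacobian_columns!(J, f!, y, x, dy, dx)
    for j in eachindex(x)
        dx .= 0
        dx[j] = 1.0          # one-hot seed for column j
        dy .= 0              # clear the shadow output before propagating
        Enzyme.autodiff(Forward, f!, Const,
                        Duplicated(y, dy), Duplicated(x, dx))
        J[:, j] .= dy        # dy now holds the j-th Jacobian column
    end
    return J
end
```

Reusing `dx`/`dy` across iterations avoids per-column allocations, which matters once the cache (and hence the shadow storage) grows large.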
This package aims to provide a robust framework for integrating advanced differentiation techniques into Trixi, addressing both performance and usability to facilitate high-quality computational research and development.
Future Work
- Automatic Differentiation of GPU Kernels: Complete the prototype of Enzyme-based Jacobian computation (`src/gpu.jl`) for `rhs_gpu!` functions to match all of TrixiCUDA.jl's functionality.
- Resolve Issue #1 and Issue #2260:
  - Define a custom Enzyme rule for matrix `inv`?
- Add examples for ML paradigms: Maybe extend the "Differentiating through a complete simulation" section?
Acknowledgments
The entire project, along with this website, is developed and maintained by Junyi(@junyixu). The whole project is under the guidance of two outstanding professors, Michael(@sloede) and Hendrik(@ranocha), from Trixi Framework community.
The project also received support from other Julia contributors, including Benedict from Trixi Framework community.