Skip to content

GPU Ensemble Tracing

This example demonstrates the usage of GPU for ensemble tracing. Since GitHub Actions do not have GPU runners for now, we do not show the results on page.

julia
using TestParticle
using DiffEqGPU, OrdinaryDiffEq, CUDA, StaticArrays
using CairoMakie

"Set initial state for EnsembleProblem."
function prob_func(prob, i, repeat)
   prob = @views remake(prob, u0=[prob.u0[1:3]..., i/3, 0.0, 0.0])
end

## Initialization

B(x) = SA[0, 0, 1e-11]
E(x) = SA[0, 0, 1e-13]

x0 = [0.0, 0.0, 0.0] # initial position, [m]
u0 = [1.0, 0.0, 0.0] # initial velocity, [m/s]
stateinit = [x0..., u0...]

param = prepare(E, B, species=Electron)
tspan = (0.0, 10.0)

trajectories = 3

## Solve for the trajectories

prob = ODEProblem(trace!, stateinit, tspan, param)
ensemble_prob = EnsembleProblem(prob; prob_func, safetycopy=false)
sols = solve(ensemble_prob, Tsit5(), EnsembleGPUArray(CUDA.CUDABackend()); trajectories)

## Visualization

f = Figure(fontsize = 18)
ax = Axis3(f[1, 1],
   title = "Electron trajectories",
   xlabel = "X",
   ylabel = "Y",
   zlabel = "Z",
   aspect = :data,
)

for i in eachindex(sols)
   lines!(ax, sols[i], idxs=(1,2,3), label="$i", color=Makie.wong_colors()[i])
end

f

While EnsembleGPUArray has a bit of overhead due to its form of GPU code construction, EnsembleGPUKernel is a more restrictive GPU-itizing algorithm that achieves a much lower overhead in kernel launching costs. However, it requires this problem to be written in out-of-place form and use special solvers. Additionally, a timestep dt or saveat keyword is required for dense outputs.

julia
using TestParticle
using DiffEqGPU, OrdinaryDiffEq, CUDA, StaticArrays
using CairoMakie

"Set initial state for EnsembleProblem."
function prob_func(prob, i, repeat)
   prob = @views remake(prob, u0=SA[prob.u0[1:3]..., i/3, 0.0, 0.0])
end

## Initialization

B(x) = SA[0, 0, 1e-11]
E(x) = SA[0, 0, 1e-13]

x0 = SA[0.0, 0.0, 0.0] # initial position, [m]
u0 = SA[1.0, 0.0, 0.0] # initial velocity, [m/s]
stateinit = SA[x0..., u0...]

param = prepare(E, B, species=Electron)
tspan = (0.0, 10.0)

trajectories = 3

## Solve for the trajectories

prob = ODEProblem(trace, stateinit, tspan, param)
ensemble_prob = EnsembleProblem(prob; prob_func, safetycopy=false)
## saving time interval is required for dense output! 
sols = solve(ensemble_prob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend());
   trajectories, saveat=0.4)

## Visualization

f = Figure(fontsize = 18)
ax = Axis3(f[1, 1],
   title = "Electron trajectories",
   xlabel = "X",
   ylabel = "Y",
   zlabel = "Z",
   aspect = :data,
)

for i in eachindex(sols)
   lines!(ax, sols[i], idxs=(1,2,3), label="$i", color=Makie.wong_colors()[i])
end

f

Native GPU Boris Solver

TestParticle provides a native GPU Boris solver implemented with KernelAbstractions.jl, which enables backend-agnostic GPU execution. The solver uses method dispatch on KA.Backend type.

julia
using TestParticle, KernelAbstractions
using StaticArrays

# Define fields
B(x) = SA[0, 0, 1e-8]  # Uniform B field
E(x) = SA[0, 0, 0]     # No E field

# Initial conditions
x0 = [0.0, 0.0, 0.0]
v0 = [1.0e5, 0.0, 0.0]
stateinit = [x0..., v0...]
tspan = (0.0, 1.0e-6)

# Prepare problem
param = prepare(E, B; species=Proton)
prob = TraceProblem(stateinit, tspan, param)

# Solve on CPU backend (GPU backend requires CUDA.jl, AMDGPU.jl, etc.)
using KernelAbstractions
const KA = KernelAbstractions
backend = CPU()
sols = solve(prob, backend; dt=1e-9, trajectories=1000, savestepinterval=10)

The native GPU Boris solver:

  • Uses @kernel macro from KernelAbstractions.jl for backend-agnostic execution

  • Supports multiple GPU backends (CUDA, ROCm, Metal, oneAPI) and CPU fallback

  • Dispatches on KA.Backend type for GPU execution

  • Processes particles in parallel on the GPU

  • Returns solutions in the same format as the CPU solver

To use actual GPU acceleration, install the appropriate backend package and create the corresponding backend:

julia
# For NVIDIA GPUs
using CUDA
backend = CUDABackend()
sols = solve(prob, backend; dt=1e-9, trajectories=1000)

# For AMD GPUs
using AMDGPU
backend = ROCBackend()
sols = solve(prob, backend; dt=1e-9, trajectories=1000)

Note: The native GPU solver supports both analytic and numerical (interpolated) fields. Numerical fields see a particularly large performance benefit from GPU acceleration.