# GPU Ensemble Tracing

This example demonstrates ensemble particle tracing on the GPU. Since GitHub Actions currently provides no GPU runners, the results are not shown on this page.
```julia
using TestParticle
using DiffEqGPU, OrdinaryDiffEq, CUDA, StaticArrays
using CairoMakie

"Set initial state for EnsembleProblem."
function prob_func(prob, i, repeat)
    prob = @views remake(prob, u0=[prob.u0[1:3]..., i/3, 0.0, 0.0])
end

## Initialization
B(x) = SA[0, 0, 1e-11]
E(x) = SA[0, 0, 1e-13]

x0 = [0.0, 0.0, 0.0] # initial position, [m]
u0 = [1.0, 0.0, 0.0] # initial velocity, [m/s]
stateinit = [x0..., u0...]

param = prepare(E, B, species=Electron)
tspan = (0.0, 10.0)
trajectories = 3

## Solve for the trajectories
prob = ODEProblem(trace!, stateinit, tspan, param)
ensemble_prob = EnsembleProblem(prob; prob_func, safetycopy=false)
sols = solve(ensemble_prob, Tsit5(), EnsembleGPUArray(CUDA.CUDABackend()); trajectories)

## Visualization
f = Figure(fontsize = 18)
ax = Axis3(f[1, 1],
    title = "Electron trajectories",
    xlabel = "X",
    ylabel = "Y",
    zlabel = "Z",
    aspect = :data,
)

for i in eachindex(sols)
    lines!(ax, sols[i], idxs=(1, 2, 3), label="$i", color=Makie.wong_colors()[i])
end

f
```

While `EnsembleGPUArray` incurs some overhead from the way it constructs GPU code, `EnsembleGPUKernel` is a more restrictive GPU ensemble algorithm that achieves much lower kernel-launch overhead. However, it requires the problem to be written in out-of-place form and to use special solvers. Additionally, a time step `dt` or a `saveat` keyword is required for dense output.
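The out-of-place requirement can be illustrated with a minimal sketch in plain Julia. The toy right-hand side below (`rhs!`/`rhs`, a simple harmonic oscillator) is a hypothetical stand-in for TestParticle's `trace!`/`trace`: the in-place form mutates a preallocated `du`, while the out-of-place form returns a fresh immutable state (an `SVector` in practice), which is what kernel-based GPU solvers need.

```julia
# In-place form (as used with EnsembleGPUArray): mutates du in place.
function rhs!(du, u, p, t)
    du[1] = u[2]
    du[2] = -p * u[1]
    return nothing
end

# Out-of-place form (required by EnsembleGPUKernel): allocates nothing on
# the heap and returns a new immutable state instead of mutating one.
function rhs(u, p, t)
    return (u[2], -p * u[1])
end

du = zeros(2)
rhs!(du, [1.0, 0.0], 4.0, 0.0)
du                          # now [0.0, -4.0], mutated in place
rhs((1.0, 0.0), 4.0, 0.0)   # (0.0, -4.0), returned as a fresh tuple
```

The same initial condition logic applies to `prob_func`: the out-of-place version below rebuilds `u0` as an `SA` static array rather than a plain `Vector`.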
```julia
using TestParticle
using DiffEqGPU, OrdinaryDiffEq, CUDA, StaticArrays
using CairoMakie

"Set initial state for EnsembleProblem."
function prob_func(prob, i, repeat)
    prob = @views remake(prob, u0=SA[prob.u0[1:3]..., i/3, 0.0, 0.0])
end

## Initialization
B(x) = SA[0, 0, 1e-11]
E(x) = SA[0, 0, 1e-13]

x0 = SA[0.0, 0.0, 0.0] # initial position, [m]
u0 = SA[1.0, 0.0, 0.0] # initial velocity, [m/s]
stateinit = SA[x0..., u0...]

param = prepare(E, B, species=Electron)
tspan = (0.0, 10.0)
trajectories = 3

## Solve for the trajectories
prob = ODEProblem(trace, stateinit, tspan, param)
ensemble_prob = EnsembleProblem(prob; prob_func, safetycopy=false)
## A saving time interval is required for dense output!
sols = solve(ensemble_prob, GPUTsit5(), EnsembleGPUKernel(CUDA.CUDABackend());
    trajectories, saveat=0.4)

## Visualization
f = Figure(fontsize = 18)
ax = Axis3(f[1, 1],
    title = "Electron trajectories",
    xlabel = "X",
    ylabel = "Y",
    zlabel = "Z",
    aspect = :data,
)

for i in eachindex(sols)
    lines!(ax, sols[i], idxs=(1, 2, 3), label="$i", color=Makie.wong_colors()[i])
end

f
```

## Native GPU Boris Solver
TestParticle provides a native GPU Boris solver implemented with KernelAbstractions.jl, which enables backend-agnostic GPU execution. The solver selects its execution path by dispatching on the `KA.Backend` type.
```julia
using TestParticle
using KernelAbstractions
using StaticArrays

const KA = KernelAbstractions

# Define fields
B(x) = SA[0, 0, 1e-8] # uniform B field
E(x) = SA[0, 0, 0]    # no E field

# Initial conditions
x0 = [0.0, 0.0, 0.0]
v0 = [1.0e5, 0.0, 0.0]
stateinit = [x0..., v0...]
tspan = (0.0, 1.0e-6)

# Prepare problem
param = prepare(E, B; species=Proton)
prob = TraceProblem(stateinit, tspan, param)

# Solve on the CPU backend (GPU backends require CUDA.jl, AMDGPU.jl, etc.)
backend = CPU()
sols = solve(prob, backend; dt=1e-9, trajectories=1000, savestepinterval=10)
```

The native GPU Boris solver:

- Uses the `@kernel` macro from KernelAbstractions.jl for backend-agnostic execution
- Supports multiple GPU backends (CUDA, ROCm, Metal, oneAPI) as well as a CPU fallback
- Dispatches on the `KA.Backend` type for GPU execution
- Processes particles in parallel on the GPU
- Returns solutions in the same format as the CPU solver
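The dispatch idea behind this design can be sketched in plain Julia. The types and the `step_particles` function below are illustrative stand-ins, not TestParticle's actual API: one generic entry point, with Julia's multiple dispatch choosing the method that matches the backend's concrete type, just as `solve(prob, backend; ...)` does with `KA.Backend` subtypes.

```julia
# Hypothetical backend hierarchy, standing in for KA.CPU / KA.GPU backends.
abstract type AbstractBackend end
struct SerialBackend <: AbstractBackend end   # plays the role of CPU()
struct KernelBackend <: AbstractBackend end   # plays the role of CUDABackend() etc.

# The same call site dispatches to different implementations per backend.
step_particles(xs, ::SerialBackend) = [x + 1 for x in xs]  # plain serial loop
step_particles(xs, ::KernelBackend) = map(x -> x + 1, xs)  # stands in for a kernel launch

step_particles([1, 2, 3], SerialBackend())  # → [2, 3, 4]
step_particles([1, 2, 3], KernelBackend())  # → [2, 3, 4], same result, different path
```

Because both methods share one signature shape, adding a new backend only requires defining another method; callers never branch on the backend themselves.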
To use actual GPU acceleration, install the appropriate backend package and create the corresponding backend:
```julia
# For NVIDIA GPUs
using CUDA
backend = CUDABackend()
sols = solve(prob, backend; dt=1e-9, trajectories=1000)
```

```julia
# For AMD GPUs
using AMDGPU
backend = ROCBackend()
sols = solve(prob, backend; dt=1e-9, trajectories=1000)
```

Note: the native GPU solver supports both analytic and numerical (interpolated) fields. Numerical fields see a particularly large performance benefit from GPU acceleration.