Learning AMReX
First Impression
AMReX examples are organized in separate folders. This looks nice to me, similar to the building blocks of OpenFOAM.
The Fortran interface looks really nice.
A key thing in a good parallel mesh library is to hide MPI communications.
A notion of IO processor and non-IO processor is established.
I can tell they are experts. Quotes:
AMReX has a Fortran module, amrex_mempool_module, that can be used to allocate memory for Fortran pointers. The reason that such a module exists in AMReX is that memory allocation is often very slow in multi-threaded OpenMP parallel regions. AMReX's amrex_mempool_module provides a much faster alternative approach, in which each thread has its own memory pool.
AMReX has a built-in multigrid solver. FLEKS uses the multi-block GMRES solver in SWMF, but I also ported that part into a pure C++ implementation. This is actually a good chance to see if the MG solver works here. However, note that multigrid is usually for solving elliptic problems (e.g. Poisson's equation), which is often the most time-consuming part that we try to avoid.
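To remind myself what invoking it looks like, here is a minimal sketch (mine, not from the tutorials) of a single-level Poisson solve with the MLPoisson/MLMG classes; the boundary-condition setup is the part to double-check against the linear solver docs:

#include <AMReX_MLPoisson.H>
#include <AMReX_MLMG.H>

// Minimal sketch (my addition): single-level Poisson solve, del^2 phi = rhs,
// with homogeneous Dirichlet boundaries on all domain faces.
void solve_poisson (const amrex::Geometry& geom, const amrex::BoxArray& ba,
                    const amrex::DistributionMapping& dm,
                    amrex::MultiFab& phi, const amrex::MultiFab& rhs)
{
    amrex::MLPoisson linop({geom}, {ba}, {dm});
    linop.setDomainBC({AMREX_D_DECL(amrex::LinOpBCType::Dirichlet,
                                    amrex::LinOpBCType::Dirichlet,
                                    amrex::LinOpBCType::Dirichlet)},
                      {AMREX_D_DECL(amrex::LinOpBCType::Dirichlet,
                                    amrex::LinOpBCType::Dirichlet,
                                    amrex::LinOpBCType::Dirichlet)});
    linop.setLevelBC(0, &phi); // ghost cells of phi supply the boundary values
    amrex::MLMG mlmg(linop);
    mlmg.setVerbose(1);
    mlmg.solve({&phi}, {&rhs}, 1.0e-10, 0.0); // relative, absolute tolerances
}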
AMReX, because it is built upon C++, distinguishes between copies and references. For example, BoxArray is a key type in AMR for storing all boxes on the same level. Doing things like

ba[3].coarsen(2); // DO NOT DO THIS! Doesn't do what one might expect.

will only modify a copy instead of the original array.
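A minimal illustration of the pitfall and the fix (my example; assumes ba is a BoxArray holding at least four boxes):

amrex::Box b = ba[3]; // BoxArray::operator[] returns a Box by value, i.e. a copy
b.coarsen(2);         // only the local copy is coarsened; ba is untouched
ba.coarsen(2);        // to actually coarsen the stored boxes, operate on ba itself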
The distribution of boxes in the domain involves the math of space filling curves. This is perhaps the most interesting question in load balancing.
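For reference, the mapper appears to be runtime-selectable through ParmParse; if I read the docs correctly, an inputs line like this switches strategies:

DistributionMapping.strategy = SFC   # space-filling curve (default); also KNAPSACK, ROUNDROBIN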
Functions written in the C++ header files are mostly either inline functions or template functions, for example, the diffusion update kernel in the HeatConduction example.
AMReX uses a subset of cores to do parallel I/O. If all processors attempt to access the disk directly, they will all end up waiting.
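The knob for this, if I read the I/O layer correctly, is the number of binary files written per plotfile; a one-line sketch (VisMF::SetNOutFiles is the hook, and 64 is, I believe, the default):

#include <AMReX_VisMF.H>
// cap the number of files, and hence the number of simultaneous writers, per plotfile
amrex::VisMF::SetNOutFiles(64);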
Notes
Ann Almgren from Lawrence Berkeley gave a presentation on AMR, with an introduction to some AMReX applications.
Hands-on
Hands-on training materials include:
- Spinning fluid
- Spinning particles
- Pachinko
https://xsdk-project.github.io/MathPackagesTraining/lessons/amrex/
Compilation
AMReX supports both GNU Make and CMake.
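A sketch of both routes from memory (the flags are worth double-checking against the docs): with GNU Make one sets build options as make variables inside an example directory; with CMake one configures the AMReX source tree.

# GNU Make, inside an example directory with a GNUmakefile
make -j4 DIM=3 USE_MPI=TRUE COMP=gnu

# CMake, from the AMReX source root
cmake -S . -B build -DAMReX_SPACEDIM=3 -DAMReX_MPI=YES
cmake --build build -j4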
Main Components
- AMReX.H for the top level definitions.
- AMReX_Print.H for printing.
- AMReX_ParmParse.H for parsing input parameters.
- AMReX_PlotFileUtil.H for plotting.
- AMReX_MultiFab.H for MultiFab support.
- AMReX_MFParallelFor.H for newer loop syntax over MultiFabs.
- AMReX_MultiFabUtil.H for MultiFab utilities.
- AMReX_Particles.H for particle containers.
- AMReX_ParticleMesh.H for particle-mesh interpolation (ParticleToMesh/MeshToParticle).
It is a common practice to import the amrex namespace:

using namespace amrex;
There are many predefined macros and constants defined in amrex, e.g. BL_SPACEDIM, AMREX_SPACEDIM and AMREX_D_DECL.
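For instance, AMREX_D_DECL lets dimension-dependent constructor calls compile for any AMREX_SPACEDIM:

// AMREX_D_DECL(a,b,c) expands to "a" in 1D, "a,b" in 2D, and "a,b,c" in 3D,
// so the same line compiles regardless of the dimension the library was built with
amrex::IntVect lo(AMREX_D_DECL(0, 0, 0));
amrex::IntVect hi(AMREX_D_DECL(63, 63, 63));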
Hello World
Key point: wrap everything between Initialize and Finalize for deterministic behavior.
#include <AMReX.H>
#include <AMReX_Print.H>
int main(int argc, char* argv[])
{
    amrex::Initialize(argc,argv);
    {
        amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";
    }
    amrex::Finalize();
}
Refs:
- HelloWorld with GNU Make
- https://github.com/atmyers/ecp-tutorials/tree/main/01_HelloWorld
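For the GNU Make build, the executable name encodes the build options, so running the MPI version looks roughly like this (the exact name depends on your compiler settings):

mpiexec -n 4 ./main3d.gnu.MPI.ex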
Parsing Parameters
Given an input file:
an_int_scalar = 2
a_bool_scalar = true
a_real_array = 1. 2. 3. 4.
a_prefix.a_real_scalar = 99.0
a_prefix.a_string = "option"
a_prefix.an_int_array = 4 5 6
#include <AMReX.H>
#include <AMReX_ParmParse.H>
#include <AMReX_Print.H>
void test_parameters ();
int main(int argc, char* argv[])
{
    amrex::Initialize(argc, argv);
    test_parameters();
    amrex::Finalize();
}

void test_parameters ()
{
    {
        amrex::ParmParse pp;
        int i;
        bool b;
        std::vector<amrex::Real> ra;
        pp.get("an_int_scalar", i);
        pp.get("a_bool_scalar", b);
        pp.getarr("a_real_array", ra);
        amrex::Print() << "an_int_scalar = " << i << "\n"
                       << "a_bool_scalar = " << b << "\n";
        amrex::Print() << "a_real_array = ";
        for (auto x : ra) {
            amrex::Print() << x << " ";
        }
        amrex::Print() << "\n";
    }
    {
        amrex::ParmParse pp("a_prefix");
        std::vector<int> ia;
        amrex::Real r;
        std::string s;
        pp.getarr("an_int_array", ia);
        pp.get("a_real_scalar", r);
        pp.get("a_string", s);
        amrex::Print() << "an_int_array = ";
        for (auto x : ia) {
            amrex::Print() << x << " ";
        }
        amrex::Print() << "\n";
        amrex::Print() << "a_prefix.a_real_scalar = " << r << "\n"
                       << "a_prefix.a_string = " << s << "\n";
    }
}
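One convenience worth noting: ParmParse, if I remember correctly, also accepts key=value pairs on the command line after the inputs file, and these override the values in the file (assuming the executable is named main):

./main inputs an_int_scalar=5   # command-line definitions override the inputs file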
MultiFab
A MultiFab is a C++ class in AMReX (from AMReX_MultiFab.H) that stores and operates on multidimensional array data in parallel. It contains:
- Grid information in the form of a BoxArray, holding one or more components (scalar values) per cell for a single level of the mesh.
- A distribution map that assigns each box to an MPI process, enabling parallel processing of the data in the MultiFab.
- Ghost cells that facilitate a variety of mesh refinement, boundary condition, and particle algorithms.
#include <cmath> // std::exp
#include <AMReX.H>
#include <AMReX_Print.H>
#include <AMReX_MultiFab.H> // For the first method, most common at time of writing
#include <AMReX_MFParallelFor.H> // For the second, newer method
#include <AMReX_PlotFileUtil.H> // For plotting the MultiFab
int main(int argc, char* argv[])
{
    amrex::Initialize(argc,argv);
    {
        amrex::Print() << "Hello world from AMReX version " << amrex::Version() << "\n";

        // Goals:
        // Define a MultiFab
        // Fill a MultiFab with data
        // Plot it

        // Parameters

        // Number of data components at each grid point in the MultiFab
        int ncomp = 1;
        // how many grid cells in each direction over the problem domain
        int n_cell = 32;
        // how many grid cells are allowed in each direction over each box
        int max_grid_size = 16;

        // BoxArray -- Abstract Domain Setup

        // integer vector indicating the lower coordinate bounds
        amrex::IntVect dom_lo(0,0,0);
        // integer vector indicating the upper coordinate bounds
        amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);
        // box containing the coordinates of this domain
        amrex::Box domain(dom_lo, dom_hi);

        // will contain a list of boxes describing the problem domain
        amrex::BoxArray ba(domain);

        // chop the single grid into many small boxes
        ba.maxSize(max_grid_size);

        // Distribution Mapping
        amrex::DistributionMapping dm(ba);

        // Define MultiFab
        amrex::MultiFab mf(ba, dm, ncomp, 0);

        // Geometry -- Physical Properties for data on our domain
        amrex::RealBox real_box({0., 0., 0.}, {1., 1., 1.});

        amrex::Geometry geom(domain, &real_box);

        // Calculate Cell Sizes
        amrex::GpuArray<amrex::Real,3> dx = geom.CellSizeArray(); // dx[0] = dx, dx[1] = dy, dx[2] = dz

        // Fill a MultiFab with Data
        // At the time of writing this is still the most commonly seen method.
        for (amrex::MFIter mfi(mf); mfi.isValid(); ++mfi) {
            const amrex::Box& bx = mfi.validbox();
            const amrex::Array4<amrex::Real>& mf_array = mf.array(mfi);
            amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k){

                amrex::Real x = (i+0.5) * dx[0];
                amrex::Real y = (j+0.5) * dx[1];
                amrex::Real z = (k+0.5) * dx[2];

                amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01;

                mf_array(i,j,k) = 1.0 + std::exp(-r_squared);
            });
        }

        // A second newer method
        // In this approach the same functionality is contained in a
        // single ParallelFor function.
        /*
        int ngrow = 0; // number of ghost cells to include in the loop
        const amrex::MultiArray4<amrex::Real>& mf_arrs = mf.arrays();
        const amrex::IntVect ngs(ngrow);
        amrex::ParallelFor(mf, ngs, [=] AMREX_GPU_DEVICE(int nbx, int i, int j, int k) noexcept {
            amrex::Real x = (i+0.5) * dx[0];
            amrex::Real y = (j+0.5) * dx[1];
            amrex::Real z = (k+0.5) * dx[2];
            amrex::Real r_squared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01;
            mf_arrs[nbx](i,j,k) = 1.0 + std::exp(-r_squared);
        });
        */

        // Plot MultiFab Data
        WriteSingleLevelPlotfile("plt001", mf, {"comp0"}, geom, 0., 0);
    }
    amrex::Finalize();
}
I think the newer looping syntax is better for GPU.
Heat Equation
In this Heat Equation example we use two MultiFabs to hold the current and previous values of Phi. Since each MultiFab
is distributed separately among parallel processes, this approach can be easily extended for AMR.
When loading the output from 03HeatEquation with n_cell = 128, the output variable lives on a 3D uniform mesh of size 128^3. When I loaded the data into ParaView 5.13.0, it showed the correct number of cells (128^3 = 2,097,152), while the Extents ran from 0 to 64 with 65 points in each dimension.
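One thing the tutorial leaves implicit: the forward-Euler update below is only conditionally stable. For the 3D Laplacian with unit diffusivity the usual constraint is dt <= dx^2/6, so with n_cell = 128 on a unit domain (dx = 1/128) a safe inputs choice is roughly (my rule of thumb, not from the tutorial):

dt = 1.e-5   # < dx^2/6 ~ 1.02e-5 for dx = 1/128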
#include <cmath> // std::exp
#include <AMReX.H>
#include <AMReX_PlotFileUtil.H>
#include <AMReX_ParmParse.H>
int main (int argc, char* argv[])
{
    amrex::Initialize(argc,argv);
    {
        // **********************************
        // DECLARE SIMULATION PARAMETERS
        // **********************************

        // number of cells on each side of the domain
        int n_cell;

        // size of each box (or grid)
        int max_grid_size;

        // total steps in simulation
        int nsteps;

        // how often to write a plotfile
        int plot_int;

        // time step
        amrex::Real dt;

        // **********************************
        // READ PARAMETER VALUES FROM INPUT DATA
        // **********************************
        // inputs parameters
        {
            // ParmParse is way of reading inputs from the inputs file
            // pp.get means we require the inputs file to have it
            // pp.query means we optionally need the inputs file to have it - but we must supply a default here
            amrex::ParmParse pp;

            // We need to get n_cell from the inputs file - this is the number of cells on each side of
            // a square (or cubic) domain.
            pp.get("n_cell", n_cell);

            // The domain is broken into boxes of size max_grid_size
            pp.get("max_grid_size", max_grid_size);

            // Default nsteps to 10, allow us to set it to something else in the inputs file
            nsteps = 10;
            pp.query("nsteps", nsteps);

            // Default plot_int to -1, allow us to set it to something else in the inputs file
            // If plot_int < 0 then no plot files will be written
            plot_int = -1;
            pp.query("plot_int", plot_int);

            // time step
            pp.get("dt", dt);
        }

        // **********************************
        // DEFINE SIMULATION SETUP AND GEOMETRY
        // **********************************
        // make BoxArray and Geometry
        // ba will contain a list of boxes that cover the domain
        // geom contains information such as the physical domain size,
        // number of points in the domain, and periodicity
        amrex::BoxArray ba;
        amrex::Geometry geom;

        // define lower and upper indices
        amrex::IntVect dom_lo(0,0,0);
        amrex::IntVect dom_hi(n_cell-1, n_cell-1, n_cell-1);

        // Make a single box that is the entire domain
        amrex::Box domain(dom_lo, dom_hi);

        // Initialize the boxarray "ba" from the single box "domain"
        // Note that one can either initialize ba at the time of declaration,
        // or define the domain as here after the declaration.
        ba.define(domain);

        // Break up boxarray "ba" into chunks no larger than "max_grid_size" along a direction
        ba.maxSize(max_grid_size);

        // Define the physical box, [0,1] in each direction.
        amrex::RealBox real_box({0., 0., 0.}, {1., 1., 1.});

        // periodic in all directions
        amrex::Array<int,3> is_periodic{1,1,1};

        // Define a Geometry object
        // Same as the boxarray, a geometry object can be defined afterwards.
        geom.define(domain, real_box, amrex::CoordSys::cartesian, is_periodic);

        // extract dx from the geometry object
        amrex::GpuArray<amrex::Real,3> dx = geom.CellSizeArray();

        // Nghost = number of ghost cells for each array
        int Nghost = 1;

        // Ncomp = number of components for each array
        int Ncomp = 1;

        // How Boxes are distributed among MPI processes
        amrex::DistributionMapping dm(ba);

        // we allocate two phi multifabs; one will store the old state, the other the new.
        amrex::MultiFab phi_old(ba, dm, Ncomp, Nghost);
        amrex::MultiFab phi_new(ba, dm, Ncomp, Nghost);

        // time = starting time in the simulation
        amrex::Real time = 0.0;

        // **********************************
        // INITIALIZE DATA LOOP
        // **********************************
        // loop over boxes
        for (amrex::MFIter mfi(phi_old); mfi.isValid(); ++mfi)
        {
            const amrex::Box& bx = mfi.validbox();
            const amrex::Array4<amrex::Real>& phiOld = phi_old.array(mfi);

            // set phi = 1 + e^(-(r-0.5)^2)
            amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE(int i, int j, int k)
            {
                // **********************************
                // SET VALUES FOR EACH CELL
                // **********************************
                amrex::Real x = (i+0.5) * dx[0];
                amrex::Real y = (j+0.5) * dx[1];
                amrex::Real z = (k+0.5) * dx[2];
                amrex::Real rsquared = ((x-0.5)*(x-0.5)+(y-0.5)*(y-0.5)+(z-0.5)*(z-0.5))/0.01;
                phiOld(i,j,k) = 1. + std::exp(-rsquared);
            });
        }

        // **********************************
        // WRITE INITIAL PLOT FILE
        // **********************************
        // Write a plotfile of the initial data if plot_int > 0
        if (plot_int > 0)
        {
            int step = 0;
            const std::string& pltfile = amrex::Concatenate("plt", step, 5);
            WriteSingleLevelPlotfile(pltfile, phi_old, {"phi"}, geom, time, 0);
        }

        // **********************************
        // MAIN TIME EVOLUTION LOOP
        // **********************************
        for (int step = 1; step <= nsteps; ++step)
        {
            // fill periodic ghost cells
            phi_old.FillBoundary(geom.periodicity());

            // new_phi = old_phi + dt * Laplacian(old_phi)
            // loop over boxes
            for (amrex::MFIter mfi(phi_old); mfi.isValid(); ++mfi)
            {
                const amrex::Box& bx = mfi.validbox();
                const amrex::Array4<amrex::Real>& phiOld = phi_old.array(mfi);
                const amrex::Array4<amrex::Real>& phiNew = phi_new.array(mfi);

                // advance the data by dt
                amrex::ParallelFor(bx, [=] AMREX_GPU_DEVICE (int i, int j, int k)
                {
                    // **********************************
                    // EVOLVE VALUES FOR EACH CELL
                    // **********************************
                    phiNew(i,j,k) = phiOld(i,j,k) + dt *
                        ( (phiOld(i+1,j,k) - 2.*phiOld(i,j,k) + phiOld(i-1,j,k)) / (dx[0]*dx[0])
                         +(phiOld(i,j+1,k) - 2.*phiOld(i,j,k) + phiOld(i,j-1,k)) / (dx[1]*dx[1])
                         +(phiOld(i,j,k+1) - 2.*phiOld(i,j,k) + phiOld(i,j,k-1)) / (dx[2]*dx[2]) );
                });
            }

            // **********************************
            // INCREMENT
            // **********************************

            // update time
            time = time + dt;

            // copy new solution into old solution
            amrex::MultiFab::Copy(phi_old, phi_new, 0, 0, 1, 0);

            // Tell the I/O Processor to write out which step we're doing
            amrex::Print() << "Advanced step " << step << "\n";

            // **********************************
            // WRITE PLOTFILE AT GIVEN INTERVAL
            // **********************************
            // Write a plotfile of the current data (plot_int was defined in the inputs file)
            if (plot_int > 0 && step%plot_int == 0)
            {
                const std::string& pltfile = amrex::Concatenate("plt", step, 5);
                WriteSingleLevelPlotfile(pltfile, phi_new, {"phi"}, geom, time, step);
            }
        }
    }
    amrex::Finalize();
    return 0;
}
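A small design note on the copy at the end of each step: since MultiFab has move semantics, I believe a std::swap of the two states would achieve the same role reversal without touching the field data (untested sketch; needs <utility>):

// instead of amrex::MultiFab::Copy(phi_old, phi_new, 0, 0, 1, 0);
std::swap(phi_old, phi_new); // swaps internal metadata/pointers; no field data copied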
ParticleMesh
Provided the input file:
# Domain size
nx = 128 # number of grid points along the x axis
ny = 128 # number of grid points along the y axis
nz = 128 # number of grid points along the z axis
# Maximum allowable size of each subdomain in the problem domain;
# this is used to decompose the domain for parallel calculations.
max_grid_size = 32
# Number of particles per cell
nppc = 10
# Verbosity
verbose = true # set to true to get more verbosity
#include <iostream>
#include <AMReX.H>
#include <AMReX_MultiFab.H>
#include <AMReX_MultiFabUtil.H>
#include <AMReX_Particles.H>
#include <AMReX_PlotFileUtil.H>
#include <AMReX_ParticleMesh.H>
using namespace amrex;
struct TestParams {
int nx;
int ny;
int nz;
int max_grid_size;
int nppc;
bool verbose;
};
void testParticleMesh(TestParams& parms)
{
    RealBox real_box;
    for (int n = 0; n < BL_SPACEDIM; n++) {
        real_box.setLo(n, 0.0);
        real_box.setHi(n, 1.0);
    }

    IntVect domain_lo(AMREX_D_DECL(0, 0, 0));
    IntVect domain_hi(AMREX_D_DECL(parms.nx - 1, parms.ny - 1, parms.nz - 1));
    const Box domain(domain_lo, domain_hi);

    // This sets the boundary conditions to be doubly or triply periodic
    int is_per[BL_SPACEDIM];
    for (int i = 0; i < BL_SPACEDIM; i++)
        is_per[i] = 1;
    Geometry geom(domain, &real_box, CoordSys::cartesian, is_per);

    BoxArray ba(domain);
    ba.maxSize(parms.max_grid_size);
    if (parms.verbose && ParallelDescriptor::IOProcessor()) {
        std::cout << "Number of boxes : " << ba.size() << '\n';
        std::cout << "Box sizes : " << ba[0].size() << '\n';
    }

    DistributionMapping dmap(ba);

    MultiFab partMF(ba, dmap, 1 + BL_SPACEDIM, 1);
    partMF.setVal(0.0);

    typedef ParticleContainer<1 + 2*BL_SPACEDIM> MyParticleContainer;
    MyParticleContainer myPC(geom, dmap, ba);
    myPC.SetVerbose(false);

    int num_particles = parms.nppc * parms.nx * parms.ny * parms.nz;
    if (ParallelDescriptor::IOProcessor())
        std::cout << "Total number of particles : " << num_particles << '\n' << '\n';

    bool serialize = true;
    int iseed = 451;
    Real mass = 10.0;

    MyParticleContainer::ParticleInitData pdata = {mass, AMREX_D_DECL(1.0, 2.0, 3.0), AMREX_D_DECL(0.0, 0.0, 0.0)};
    myPC.InitRandom(num_particles, iseed, pdata, serialize);

    int nc = 1 + BL_SPACEDIM;
    const auto plo = geom.ProbLoArray();
    const auto dxi = geom.InvCellSizeArray();
    amrex::ParticleToMesh(myPC, partMF, 0,
        [=] AMREX_GPU_DEVICE (const MyParticleContainer::ParticleType& p,
                              amrex::Array4<amrex::Real> const& rho)
        {
            amrex::Real lx = (p.pos(0) - plo[0]) * dxi[0] + 0.5;
            amrex::Real ly = (p.pos(1) - plo[1]) * dxi[1] + 0.5;
            amrex::Real lz = (p.pos(2) - plo[2]) * dxi[2] + 0.5;

            int i = amrex::Math::floor(lx);
            int j = amrex::Math::floor(ly);
            int k = amrex::Math::floor(lz);

            amrex::Real xint = lx - i;
            amrex::Real yint = ly - j;
            amrex::Real zint = lz - k;

            amrex::Real sx[] = {1.-xint, xint};
            amrex::Real sy[] = {1.-yint, yint};
            amrex::Real sz[] = {1.-zint, zint};

            for (int kk = 0; kk <= 1; ++kk) {
                for (int jj = 0; jj <= 1; ++jj) {
                    for (int ii = 0; ii <= 1; ++ii) {
                        amrex::Gpu::Atomic::AddNoRet(&rho(i+ii-1, j+jj-1, k+kk-1, 0),
                                                     sx[ii]*sy[jj]*sz[kk]*p.rdata(0));
                    }
                }
            }

            for (int comp=1; comp < nc; ++comp) {
                for (int kk = 0; kk <= 1; ++kk) {
                    for (int jj = 0; jj <= 1; ++jj) {
                        for (int ii = 0; ii <= 1; ++ii) {
                            amrex::Gpu::Atomic::AddNoRet(&rho(i+ii-1, j+jj-1, k+kk-1, comp),
                                                         sx[ii]*sy[jj]*sz[kk]*p.rdata(0)*p.rdata(comp));
                        }
                    }
                }
            }
        });

    MultiFab acceleration(ba, dmap, BL_SPACEDIM, 1);
    acceleration.setVal(5.0);

    nc = BL_SPACEDIM;
    amrex::MeshToParticle(myPC, acceleration, 0,
        [=] AMREX_GPU_DEVICE (MyParticleContainer::ParticleType& p,
                              amrex::Array4<const amrex::Real> const& acc)
        {
            amrex::Real lx = (p.pos(0) - plo[0]) * dxi[0] + 0.5;
            amrex::Real ly = (p.pos(1) - plo[1]) * dxi[1] + 0.5;
            amrex::Real lz = (p.pos(2) - plo[2]) * dxi[2] + 0.5;

            int i = amrex::Math::floor(lx);
            int j = amrex::Math::floor(ly);
            int k = amrex::Math::floor(lz);

            amrex::Real xint = lx - i;
            amrex::Real yint = ly - j;
            amrex::Real zint = lz - k;

            amrex::Real sx[] = {1.-xint, xint};
            amrex::Real sy[] = {1.-yint, yint};
            amrex::Real sz[] = {1.-zint, zint};

            for (int comp=0; comp < nc; ++comp) {
                for (int kk = 0; kk <= 1; ++kk) {
                    for (int jj = 0; jj <= 1; ++jj) {
                        for (int ii = 0; ii <= 1; ++ii) {
                            p.rdata(4+comp) += sx[ii]*sy[jj]*sz[kk]*acc(i+ii-1,j+jj-1,k+kk-1,comp);
                        }
                    }
                }
            }
        });

    WriteSingleLevelPlotfile("plot", partMF,
                             {"density", "vx", "vy", "vz"},
                             geom, 0.0, 0);

    myPC.Checkpoint("plot", "particle0");
}

int main(int argc, char* argv[])
{
    amrex::Initialize(argc, argv);

    ParmParse pp;

    TestParams parms;

    pp.get("nx", parms.nx);
    pp.get("ny", parms.ny);
    pp.get("nz", parms.nz);
    pp.get("max_grid_size", parms.max_grid_size);
    pp.get("nppc", parms.nppc);
    if (parms.nppc < 1 && ParallelDescriptor::IOProcessor())
        amrex::Abort("Must specify at least one particle per cell");

    parms.verbose = false;
    pp.query("verbose", parms.verbose);

    if (parms.verbose && ParallelDescriptor::IOProcessor()) {
        std::cout << std::endl;
        std::cout << "Number of particles per cell : ";
        std::cout << parms.nppc << std::endl;
        std::cout << "Size of domain : ";
        std::cout << parms.nx << " " << parms.ny << " " << parms.nz << std::endl;
    }

    testParticleMesh(parms);

    amrex::Finalize();
}
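What the two kernels implement is, as far as I can tell, standard cloud-in-cell (CIC) interpolation: lx, ly, lz locate the particle in cell units, xint/yint/zint are the fractional offsets, and sx, sy, sz are the 1D linear weights whose products give the trilinear 2x2x2 stencil. A tiny self-contained check (my addition) that the eight weights always sum to one, so deposition conserves the total:

// sanity sketch: the eight trilinear CIC weights sum to one for any offsets
double cic_weight_sum (double xint, double yint, double zint)
{
    double sx[] = {1.-xint, xint}, sy[] = {1.-yint, yint}, sz[] = {1.-zint, zint};
    double sum = 0.;
    for (int kk = 0; kk <= 1; ++kk)
        for (int jj = 0; jj <= 1; ++jj)
            for (int ii = 0; ii <= 1; ++ii)
                sum += sx[ii]*sy[jj]*sz[kk];
    return sum; // algebraically (1-x+x)(1-y+y)(1-z+z) = 1
}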
WarpX
Thoughts
AMR is, from my experience, easier said than done.
It would be a pain to build your work upon something that is not easily understandable. I think AMReX beats BATL in every aspect: it has nice documentation, clear syntax, and neat interfaces. I can tell that while BATL was designed by smart scientists, AMReX is motivated by scientific projects and carefully designed by expert programmers.

Let us go back a few years and consider the history. BATL was written in 2011-2012, before the MHD-PIC coupling; in my view it was clearly motivated by that coupling project. That was actually a really nice moment when the whole group could have switched to OOP and designed a general mesh library. Unfortunately, being biased towards pure functional programming and having only BATSRUS in mind, we missed that great opportunity. The result is today's BATL: fully integrated only with BATSRUS, and not suitable even for multithreaded control, let alone GPUs.

AMReX was publicly released in 2017, as a descendant of GUMBO. It was born at a time when all the mainstream massively parallel techniques had become mature and ready for use. It covers almost all the possibilities in physics simulations on structured grids, which will eventually make it shine over its competitors. Its developers also deserve real appreciation for making it open source; this is how work done in one group can benefit all other groups and earn fame and honor in the community. Science is not a stand-alone project. With really good foundations we can easily find collaborators and do amazing work.
There are so many things covered in the AMReX documentation. I will go over the docs first before digging into FLEKS, and work on FLEKS later.
Julia wrapper
I really want a Julia wrapper over AMReX. There is an experimental Python wrapper project, and I also see a prebuilt shared binary in the Julia registry, but neither of them is functional.