Intel's VTune profiler and Trace Analyzer are now bundled in the oneAPI toolkit. They work best for C/C++ codes and decently for Fortran. I have had good experiences using the GUI to submit parallel jobs, but sometimes it is faster to use the command line interface.

Installation

For large systems like Frontera, it is as easy as

module load oneapi
module load vtune

Collecting VTune Data

ibrun -n 4 vtune -c hotspots -r hotspots_base -- ./heart_demo.base -m ../mesh_mid -s ../setup_mid.txt -t 10 -i
ibrun -n 4 vtune -c hpc-performance -r hpcperf_base -- ./heart_demo.base -m ../mesh_mid -s ../setup_mid.txt -t 10 -i
ibrun -n 4 vtune -c threading -r threading_base -- ./heart_demo.base -m ../mesh_mid -s ../setup_mid.txt -t 10 -i
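
The three collections above differ only in the analysis type and the result directory, so they can be driven by a small loop. The following is a minimal sketch, assuming the same executable and arguments as above. The names `run`, `run_vtune_collections` and `DRY_RUN` are hypothetical helpers introduced here, not part of VTune or ibrun. By default the script only prints the commands; set `DRY_RUN=0` to actually launch them.

```shell
#!/bin/sh
# Sketch: launch the same MPI job under several VTune analysis types.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

# Print or execute a command, depending on DRY_RUN (illustrative helper).
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: run_vtune_collections <nranks> <analysis> [<analysis> ...]
run_vtune_collections() {
    nranks="$1"
    shift
    for analysis in "$@"; do
        # Result directory names follow the commands shown above.
        case "$analysis" in
            hpc-performance) result="hpcperf_base" ;;
            *)               result="${analysis}_base" ;;
        esac
        run ibrun -n "$nranks" vtune -c "$analysis" -r "$result" -- \
            ./heart_demo.base -m ../mesh_mid -s ../setup_mid.txt -t 10 -i
    done
}

run_vtune_collections 4 hotspots hpc-performance threading
```

Keeping the dry run as the default makes it easy to check the generated command lines before spending allocation time on the actual runs.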

Note that in order to trace results back to the source code, the program typically needs to be compiled with -g or in debugging mode. Stack tracing should also be enabled.

Analyzing VTune Data

VTune has a nice GUI, typically launched as vtune-gui. We can open the generated profiling results inside the GUI and inspect the collected metrics.
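
When no display is available, for example on a compute node, the results can also be summarized in the terminal with `vtune -report`. The following is a sketch, assuming result directories named as in the collection commands above. For MPI runs VTune usually appends a suffix such as the host name to the result directory, which is why the helper matches on a prefix. `report_summaries` is a hypothetical name introduced here.

```shell
#!/bin/sh
# Sketch: print a text summary for every result directory with a given prefix.
report_summaries() {
    prefix="$1"
    for dir in "$prefix"*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        echo "== $dir =="
        vtune -report summary -r "$dir"
    done
}

report_summaries hotspots_base
```

Replacing `summary` with `hotspots` gives a per-function breakdown instead of the overall summary.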

Trace Analyzer

This is Intel’s tool for improving MPI efficiency.

Collecting Data

If we use Intel’s MPI library, tracing is enabled by adding the -trace flag:

mpirun -n 16 -trace program.exe

For other MPI libraries, the corresponding dynamic library must be loaded at run time. Note that Open MPI is not compatible with Trace Analyzer.

Analyzing Trace Data

Trace Analyzer also has a nice GUI.

traceanalyzer
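
A traced run normally writes a trace file named after the executable with an `.stf` extension, and that file can be passed directly to `traceanalyzer`. The following is a small sketch under that assumption. `open_trace` is a hypothetical helper, and the file name should be adjusted if the trace was written elsewhere.

```shell
#!/bin/sh
# Sketch: open the trace written by a run of the given executable, if present.
# Assumes the default naming <executable>.stf in the current directory.
open_trace() {
    stf="$1.stf"
    if [ -f "$stf" ]; then
        traceanalyzer "$stf"
    else
        echo "no trace file found: $stf" >&2
        return 1
    fi
}
```

For the run above, this would be called as `open_trace program.exe`.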