FAQ
Test Data
If you don't have VLSV data at hand, Vlasiator.jl provides some test data for you to begin with.
using LazyArtifacts
rootpath = artifact"testdata"
files = joinpath.(rootpath, ("bulk.1d.vlsv", "bulk.2d.vlsv", "bulk.amr.vlsv"))
These are also used in the standard test. These will be automatically downloaded from vlsv_data if you run the package test locally.
Performance
The VLSV loader inherits the basic structure from Analysator and is redesigned for performance.
- For general data reading, a dictionary is constructed for cell IDs and orderings for O(1) timings.
- It is faster to read a bunch of cell IDs together, if possible, than to read each cell one-by-one.
- Specific methods are provided for targeted tasks that are much faster than the generic approaches. For instance,
extractsat
for multi-frame static satellite extraction can be more than 10x faster than first reading the metadata and then extracting variables for each frame.
For development, it is recommended to use PkgBenchmark.jl to run the test suite:
using PkgBenchmark, Vlasiator
results = benchmarkpkg(Vlasiator)
or if you want to compare the current status of the package against a different git version
judge(Vlasiator, "97e3dca6b2474d7bdc5b62b5bf98ecf070516e5e")
To export results to Markdown,
export_markdown("testresult", results)
See more in the PkgBenchmark manual.
Precision
For post-processing and data analysis purposes, it makes less sense to stick to double precisions, so we mostly use Float32
in Vlasiator.jl for numerical arrays. Several exceptions are:
- physical constants are defined in
Float64
, since single precision only resolves up to ±3.4E+38, and it may go out of bound in the middle of calculation (e.g. plasma frequency).
Int vs. UInt
Integers but not unsigned integers shall be used for indexing, even though unsigned integers are tempting.
Memory
Vlasiator output files can be large. If we have limited memory relative to the file size, Vlasiator.jl provide direct hard disk mapping through mmap
in Julia. With this mechanism you never need to worry about unable to process data with small free memory. Besides, we found that proper usage of mmap
can also speed up reading and reduce memory comsumption. However, without reinterpret
we may encounter the alignment issue. Therefore, we check the page alignment while using mmap
.
The memory layout of the field solver grid is a bit tricky when converting from 1D arrays to multidimensional arrays. In pyvlasiator/analysator, the FSgrid output fg_b
has size of (nx,ny,nz,3); In Vlasiator.jl, the final output would be size of (3, nx,ny,nz).
Parallelism
The current design choice is to achieve optimal serial performance per file, and apply parallel processing across individual files. In most common cases, the time it takes for post-processing one snapshot is reasonably short, but the number of snapshots are large. Julia's built-in support for all kinds of parallelism paradigm (multithreading, multiprocessing, channel) and external support from packages (MPI.jl, Polyester.jl) can be relatively easily incorported to make the whole workflow parallel.
- multi-threading with
@threads
(recommended when working within one node) - multi-processing with
pmap
- multi-processing with
RemoteChannel
ClusterManagers
for multi-node jobs
See more in the examples.
VTK
VLSV is just an uncompressed binary format. If we convert VLSV to VTK through write_vtk
, the generated VTK files, even the highest resolution one with every coarse cell mapping to the finest level, can be several times smaller than the original VLSV file.
One drawback of this conversion is that it cannot deal with phase space outputs, i.e. VDFs.