Metrics¶
The following metrics are under development (or being planned).
Each metric can be ascribed to a high level family, shown in the table below as the “Family” column. We likely will tweak and improve upon these categories.
Implemented Metrics¶
sys-hwloc¶
Hwloc or “portable hardware locality” can be used to look at the hardware of your system. There is a nice tutorial here for the default command that is run, “lstopo”, which does exactly that: it lists your hardware topology! Specifically, we output a png image and machine spec for the default command, and this can be changed. This man page is recommended to see the different commands and options.
Name | Description | Type | Default |
---|---|---|---|
command | Change the default command to something else. | string | lstopo architecture.png && hwloc-ls machine.xml |
The above saves a png image, and the machine data to xml. Note that if you need to copy the data post-run, you likely want to set interactive: true to keep it running.
perf-sysstat¶
This metric provides the “pidstat” executable of the sysstat library. The following options are available:
Name | Description | Type | Default |
---|---|---|---|
color | Set to turn on color parsing | Anything set | unset |
pids | For debugging, show consistent output of ps aux | Anything set | unset |
threads | Add -t to each pidstat command to request thread-level output | unset | |
completions | Number of times to run metric | int32 | unset (runs for lifetime of application or indefinitely) |
rate | Seconds to pause between measurements | int32 | 10 |
By default, color and pids are unset (off) in anticipation of log parsing. We also provide the option to give the metric “commands”, meaning specific commands matched to a job index. As an example, here is how we would ask to monitor two different commands, one for a launcher node (index 0) and one for the rest (workers).
- name: perf-sysstat
  # Custom options (kept in a single block, since YAML does not allow duplicate keys)
  options:
    pids: "true"
    rate: 2
  # Look for pids based on commands matched to index
  mapOptions:
    commands:
      # First set all to use the worker command, but give the lead broker a special command
      "all": /usr/libexec/flux/cmd/flux-broker --config /etc/flux/config -Scron.directory=/etc/flux/system/cron.d -Stbon.fanout
      "0": /usr/bin/python3.8 /usr/libexec/flux/cmd/flux-submit.py -n 2 --quiet --watch lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
In the map above, order matters: the command for all indices is first set to be the flux-broker one, and then index 0 gets a custom command. See pidstat for more information on this command, and this file for how we use them. If there is an option or command that is not exposed that you would like, please open an issue.
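The override ordering described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not the operator's own code; `resolve_commands` is a hypothetical helper, and the commands are shortened for readability.

```python
def resolve_commands(commands, indices):
    """Assign a command to each pod index.

    Entries are applied in the order they appear, so an "all" entry
    listed first is overridden by index-specific entries that follow.
    """
    resolved = {}
    for key, command in commands.items():
        if key == "all":
            # Apply this command to every index
            for index in indices:
                resolved[index] = command
        else:
            # Apply this command to one specific index
            resolved[int(key)] = command
    return resolved


commands = {
    "all": "flux-broker",   # shortened worker command
    "0": "flux-submit.py",  # shortened launcher command
}
resolved = resolve_commands(commands, range(3))
print(resolved)
```

If the two entries were listed in the opposite order, the "all" entry would overwrite the custom command for index 0, which is why "all" comes first.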
io-fio¶
This is a nice tool that you can simply point at a path, and it measures IO stats by way of writing a file there! Options you can set include:
Name | Description | Type | Default |
---|---|---|---|
testname | Name for the test | string | test |
blocksize | Size of block to write. It defaults to 4k, but can be set from 256 to 8k. | string | 4k |
iodepth | Number of I/O units to keep in flight against the file. | int | 64 |
size | Total size of file to write | string | 4G |
directory | Directory (usually mounted) to test. | string | /tmp |
pre | Custom logic / command to run before Fio | string | unset |
post | Custom logic / command to run after Fio (e.g., cleanup) | string | unset |
prefix | Prefix to add to running fio commands (like a wrapper) | string | unset |
For the “directory” we use this location to write a temporary file, which will be cleaned up. This allows for testing storage mounted from multiple metric pods without worrying about a name conflict.
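As a rough sketch of how options like these could map onto a fio command line, consider the Python below. The flags used (`--name`, `--bs`, `--iodepth`, `--size`, `--directory`) are standard fio options, but `build_fio_command` is a hypothetical helper, and the command the metric actually runs may include additional flags that are not shown here.

```python
import shlex

# Defaults copied from the options table above
DEFAULTS = {
    "testname": "test",
    "blocksize": "4k",
    "iodepth": 64,
    "size": "4G",
    "directory": "/tmp",
}


def build_fio_command(options=None, prefix=""):
    """Build an illustrative fio command line from metric-style options."""
    opts = {**DEFAULTS, **(options or {})}
    args = [
        "fio",
        f"--name={opts['testname']}",
        f"--bs={opts['blocksize']}",
        f"--iodepth={opts['iodepth']}",
        f"--size={opts['size']}",
        f"--directory={opts['directory']}",
    ]
    command = shlex.join(args)
    # An optional wrapper (the "prefix" option) goes in front of fio
    return f"{prefix} {command}".strip()


print(build_fio_command({"directory": "/mnt/data", "size": "1G"}))
```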
io-ior¶
Ior is a really nice IO tool that is now a combination of its previous self and the mdtest tool. We expose a simple set of the working directory and command that you want to run, and the rest is up to you!
Name | Description | Type | Default |
---|---|---|---|
command | The default ior command | string | ior -w -r -o testfile |
workdir | The working directory for the command | string | /opt/ior |
The getting started tutorial is great for seeing how basic commands are done. Note that the container does have mpirun if you want to use it. We don’t have support for this across nodes, but this could be added. Let us know if this would be interesting to you.
io-sysstat¶
This is the “iostat” executable of the sysstat library.
Name | Description | Type | Default |
---|---|---|---|
human | Show tabular, human-readable output inside of json | string "true" or "false" | "false" |
completions | Number of times to run metric | int32 | unset (runs for lifetime of application or indefinitely) |
rate | Seconds to pause between measurements | int32 | 10 |
pre | One or more commands to run before iostat | string | unset |
post | One or more commands to run after iostat | string | unset |
This is good for mounted storage that can be seen by the operating system, but may not work for something like NFS.
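The interplay of the rate and completions options in the table above can be pictured as a simple sampling loop. The Python below is a minimal sketch under that reading, not the operator's implementation; `sample` and its `sleep` parameter are hypothetical names, and the "lifetime of application" case is not modeled.

```python
import time


def sample(measure, rate=10, completions=None, sleep=time.sleep):
    """Yield measurements, pausing `rate` seconds between them.

    If `completions` is None (unset), keep measuring indefinitely.
    """
    count = 0
    while completions is None or count < completions:
        yield measure()
        count += 1
        sleep(rate)


# Take three measurements; the pause is stubbed out so the demo is instant
results = list(sample(lambda: "iostat output", rate=10, completions=3,
                      sleep=lambda seconds: None))
print(len(results))
```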
dlio¶
While this is a simple performance tool not coded into the Metrics Operator (it is installed on the fly to your container with pip and you minimally require hwloc) it generates pretty cool data that can be visualized with perfetto!
You can see the full example above. It is just installing a library with pip, and then ensuring the tool LD_PRELOAD is set as the prefix. I added sleep infinity to the end to copy over output data at the end.
network-netmark¶
network-netmark (code still private)
This is currently a private container/software, but we have support for it when it’s ready to be made public (networking). Variables to customize include:
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
tasks | Total number of tasks across pods | options->tasks | string | nproc * pods |
warmups | Number of warmups | options->warmups | int32 | 10 |
trials | Number of trials | options->trials | int32 | 20 |
sendReceiveCycles | Number of send-receive cycles | options->sendReceiveCycles | int32 | 20 |
messageSize | Message size in bytes | options->messageSize | int32 | 0 |
storeEachTrial | Flag to indicate storing each trial data | options->storeEachTrial | string (true/false) | "true" |
soleTenancy | Turn off sole tenancy (one pod/node) | options->soleTenancy | string ("false" or "no") | "true" |
network-osu-benchmark¶
Point to point benchmarks for MPI (networking). If listOptions->commands is not set, a default subset of commands is run (listed after the table). Variables to customize include:
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
commands | Custom list of osu-benchmark one-sided commands to run | listOptions->commands | array | unset uses default set |
soleTenancy | Turn off sole tenancy (one pod/node) | | string ("false" or "no") | "true" |
all | Run ALL benchmarks with defaults | | string ("true" or "yes") | "false" |
flags | Overwrite default flags (experts only!) | | string | Defaults to an ideal set per metric (see osu-benchmark.go) |
timed | String "true" or "yes" to add time prefix to mpirun (for debugging, etc) | | string | "false" |
sleep | Number of seconds to sleep to wait for network to be ready | | int32 | 60 |
By default, we run a subset of commands:
osu_get_acc_latency
osu_acc_latency
osu_fop_latency
osu_get_latency
osu_put_latency
osu_allreduce
osu_latency
osu_bibw
osu_bw
However, all of the following are available for MPI:
Commands available for OSU Benchmarks
.
|-- collective
| |-- osu_allgather
| |-- osu_allgatherv
| |-- osu_allreduce
| |-- osu_alltoall
| |-- osu_alltoallv
| |-- osu_barrier
| |-- osu_bcast
| |-- osu_gather
| |-- osu_gatherv
| |-- osu_iallgather
| |-- osu_iallgatherv
| |-- osu_iallreduce
| |-- osu_ialltoall
| |-- osu_ialltoallv
| |-- osu_ialltoallw
| |-- osu_ibarrier
| |-- osu_ibcast
| |-- osu_igather
| |-- osu_igatherv
| |-- osu_ireduce
| |-- osu_iscatter
| |-- osu_iscatterv
| |-- osu_reduce
| |-- osu_reduce_scatter
| |-- osu_scatter
| `-- osu_scatterv
|-- one-sided
| |-- osu_acc_latency
| |-- osu_cas_latency
| |-- osu_fop_latency
| |-- osu_get_acc_latency
| |-- osu_get_bw
| |-- osu_get_latency
| |-- osu_put_bibw
| |-- osu_put_bw
| `-- osu_put_latency
|-- pt2pt
| |-- osu_bibw
| |-- osu_bw
| |-- osu_latency
| |-- osu_latency_mp
| |-- osu_latency_mt
| |-- osu_mbw_mr
| `-- osu_multi_lat
`-- startup
  |-- osu_hello
  `-- osu_init
Note that not all of these have been tested on our setups, so if you have any questions please let us know. Here are some useful resources for the benchmarks:
app-lammps¶
Since we were using LAMMPS so often as a benchmark (and testing timing of a network) it made sense to add it here as a standalone metric! Although we are doing MPI with communication via SSH, this can still serve as a means to assess performance.
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The full mpirun and lammps command | options->command | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /opt/lammps/examples/reaxff/HNS |
soleTenancy | Require each pod to have sole tenancy | options->soleTenancy | string | "false" |
For inspection, you can see all the examples provided in the LAMMPS GitHub repository. The default command (if you don’t change it) intended as an example is:
mpirun --hostfile ./hostlist.txt -np 2 --map-by socket lmp -v x 2 -v y 2 -v z 2 -in in.reaxc.hns -nocite
This runs in the working directory /opt/lammps/examples/reaxff/HNS. You should be calling mpirun and expecting a ./hostlist.txt in the present working directory (the “workdir” you chose above). You should also provide the correct number of processes (np) and problem size for LAMMPS (lmp). We left this open and flexible, anticipating that you as a user would want total control.
app-amg¶
AMG means “algebraic multi-grid” and it’s easy to confuse with the company AMD (“Advanced Micro Devices”)! From the guide:
AMG is a parallel algebraic multigrid solver for linear systems arising from problems on unstructured grids. The driver provided for this benchmark builds linear systems for a 3D problem with a 27-point stencil and generates two different problems that are described in section D of the AMG.readme file in the docs directory.
Here are examples of small and medium problem sizes provided in that same guide. Each of these would be given to MPI (mpirun), but srun is provided as an example instead.
# Small size problems
srun -N 32 -n 512 amg -problem 1 -n 96 96 96 -P 8 8 8
srun -N 32 -n 512 amg -problem 2 -n 40 40 40 -P 8 8 8
srun -N 64 -n 1024 amg -problem 1 -n 96 96 96 -P 16 8 8
srun -N 64 -n 1024 amg -problem 2 -n 40 40 40 -P 16 8 8
# Medium size problems
srun -N 512 -n 8192 amg -problem 1 -n 96 96 96 -P 32 16 16
srun -N 512 -n 8192 amg -problem 2 -n 40 40 40 -P 32 16 16
srun -N 1024 -n 16384 amg -problem 1 -n 96 96 96 -P 32 32 16
srun -N 1024 -n 16384 amg -problem 2 -n 40 40 40 -P 32 32 16
By default, akin to LAMMPS we expose the entire mpirun command along with the working directory for you to adjust.
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The amg command (without mpirun) | options->command | string | (see below) |
prefix | The prefix (mpirun command and arguments) | options->mpirun | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /opt/AMG |
By default, when not set, you will just run the amg binary to get a test case run:
# mpirun
mpirun --hostfile ./hostlist.txt
# command
amg
# Assembled into
mpirun --hostfile ./hostlist.txt ./problem.sh
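The assembly step above can be sketched as follows. This is a hedged illustration in Python rather than the operator's actual code; `assemble_launch` is a hypothetical helper, and the `#!/bin/bash` line in the script body is an assumption.

```python
def assemble_launch(prefix, command, script="./problem.sh"):
    """Return the script body and the final launch line.

    The command is written into a script, and the prefix
    (for example an mpirun invocation) is placed in front of it.
    """
    # Assumed script layout: a shebang followed by the command
    script_body = f"#!/bin/bash\n{command}\n"
    # A blank prefix means the script is simply run directly
    launch = f"{prefix} {script}".strip()
    return script_body, launch


body, launch = assemble_launch("mpirun --hostfile ./hostlist.txt", "amg")
print(launch)
```

Keeping the prefix separate from the command means the MPI launcher arguments can be changed (or left blank) without touching the application command itself.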
More likely you want an actual problem size on a specific number of nodes and tasks, and you’ll want to test this. The two problems are:
problem 1 (default) will use conjugate gradient preconditioned with AMG to solve a linear system with a 3D 27-point stencil of size nx*ny*nz*Px*Py*Pz.
problem 2 simulates a time-dependent problem of size nx*ny*nz*Px*Py*Pz with AMG-GMRES. The linear system is also a 3D 27-point stencil. The system is sized to be 5-10% of the large problem.
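To make the sizing concrete, the total problem size is nx*ny*nz*Px*Py*Pz, and in the srun examples above the task count given to -n matches Px*Py*Pz. The short Python sketch below checks that arithmetic for the first small problem; `amg_problem_size` is a hypothetical helper written for this illustration.

```python
from math import prod


def amg_problem_size(n, P):
    """Return (tasks, total_size) for local size n and process grid P.

    n is (nx, ny, nz) and P is (Px, Py, Pz), as passed to -n and -P.
    """
    tasks = prod(P)                 # Px * Py * Pz
    total_size = prod(n) * tasks    # nx * ny * nz * Px * Py * Pz
    return tasks, total_size


# First small problem: amg -problem 1 -n 96 96 96 -P 8 8 8
tasks, total_size = amg_problem_size((96, 96, 96), (8, 8, 8))
print(tasks, total_size)  # 512 452984832
```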
NOTE that the Python parser was written for the test case, and likely we need to extend it to problem 2 or larger sized problems. If you run a larger problem and the parser does not work as expected, please send us the output and we will provide an updated parser. See this guide for more detail.
app-cabanaPIC¶
This is a particle in cell simulation that is experimental because it does not seem to support multiple nodes yet (but should).
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The full command to run | options->command | string | cbnpic |
workdir | The working directory for the command | options->workdir | string | /opt/cabanaPIC/build |
app-quicksilver¶
Quicksilver is a proxy app for Monte Carlo simulation code. You can learn more about it on the GitHub repository. By default, akin to other apps we expose the entire mpirun command along with the working directory for you to adjust.
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The qs command (without mpirun) | options->command | string | (see below) |
prefix | The prefix (mpirun command and arguments) | options->mpirun | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /opt/AMG |
By default, when not set, you will just run the qs (quicksilver) binary on a sample problem, represented by an input text file:
# mpirun
mpirun --hostfile ./hostlist.txt
# command
qs /opt/quicksilver/Examples/CORAL2_Benchmark/Problem1/Coral2_P1.inp
# Assembled into problem.sh as follows:
mpirun --hostfile ./hostlist.txt ./problem.sh
There are many problems that come in the container, and here are the full paths:
# Example command
qs /opt/quicksilver/Examples/CORAL2_Benchmark/Problem1/Coral2_P1.inp
# All examples:
/opt/quicksilver/Examples/AllScattering/scatteringOnly.inp
/opt/quicksilver/Examples/NoCollisions/no.collisions.inp
/opt/quicksilver/Examples/NonFlatXC/NonFlatXC.inp
/opt/quicksilver/Examples/CORAL2_Benchmark/Problem2/Coral2_P2_4096.inp
/opt/quicksilver/Examples/CORAL2_Benchmark/Problem2/Coral2_P2.inp
/opt/quicksilver/Examples/CORAL2_Benchmark/Problem2/Coral2_P2_1.inp
/opt/quicksilver/Examples/CORAL2_Benchmark/Problem1/Coral2_P1.inp
/opt/quicksilver/Examples/CORAL2_Benchmark/Problem1/Coral2_P1_1.inp
/opt/quicksilver/Examples/CORAL2_Benchmark/Problem1/Coral2_P1_4096.inp
/opt/quicksilver/Examples/CTS2_Benchmark/CTS2.inp
/opt/quicksilver/Examples/CTS2_Benchmark/CTS2_36.inp
/opt/quicksilver/Examples/CTS2_Benchmark/CTS2_1.inp
/opt/quicksilver/Examples/AllAbsorb/allAbsorb.inp
/opt/quicksilver/Examples/Homogeneous/homogeneousProblem_v4_ts.inp
/opt/quicksilver/Examples/Homogeneous/homogeneousProblem_v5_ts.inp
/opt/quicksilver/Examples/Homogeneous/homogeneousProblem.inp
/opt/quicksilver/Examples/Homogeneous/homogeneousProblem_v3_wq.inp
/opt/quicksilver/Examples/Homogeneous/homogeneousProblem_v7_ts.inp
/opt/quicksilver/Examples/Homogeneous/homogeneousProblem_v4_tm.inp
/opt/quicksilver/Examples/Homogeneous/homogeneousProblem_v3.inp
/opt/quicksilver/Examples/AllEscape/allEscape.inp
/opt/quicksilver/Examples/NoFission/noFission.inp
You can also look more closely in the GitHub repository.
app-pennant¶
Pennant is an unstructured mesh hydrodynamics mini-app for advanced architectures. The documentation is sparse, but you can find the source code on GitHub. By default, akin to other apps we expose the entire mpirun prefix and command along with the working directory for you to adjust.
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The pennant command (without mpirun) | options->command | string | (see below) |
prefix | The prefix (mpirun command and arguments) | options->mpirun | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /opt/AMG |
By default, when not set, you will just run pennant on a test problem, represented by an input text file:
# mpirun
mpirun --hostfile ./hostlist.txt
# command
pennant /opt/pennant/test/sedovsmall/sedovsmall.pnt
# Assembled into problem.sh as follows:
mpirun --hostfile ./hostlist.txt ./problem.sh
There are many input files that come in the container, and here are the full paths in /opt/pennant/test:
Input files available to pennant
|-- leblanc
| |-- leblanc.pnt
| |-- leblanc.xy.std
| `-- leblanc.xy.std4
|-- leblancbig
| `-- leblancbig.pnt
|-- leblancx16
| `-- leblancx16.pnt
|-- leblancx4
| `-- leblancx4.pnt
|-- leblancx48
| `-- leblancx48.pnt
|-- leblancx64
| `-- leblancx64.pnt
|-- noh
| |-- noh.pnt
| |-- noh.xy.std
| `-- noh.xy.std4
|-- nohpoly
| `-- nohpoly.pnt
|-- nohsmall
| |-- nohsmall.pnt
| |-- nohsmall.xy.std
| `-- nohsmall.xy.std4
|-- nohsquare
| `-- nohsquare.pnt
|-- sample_outputs
| |-- edison
| | |-- leblancbig.thr1.out
| | |-- leblancx16.thr1024.out
| | |-- leblancx4.thr16.out
| | |-- leblancx64.mpi2048.out
| | `-- nohpoly.thr1.out
| `-- vulcan
| |-- leblancx16.out
| |-- leblancx48.out
| |-- sedovflat.out
| |-- sedovflatx16.out
| |-- sedovflatx4.out
| `-- sedovflatx40.out
|-- sedov
| |-- sedov.pnt
| |-- sedov.xy.std
| `-- sedov.xy.std4
|-- sedovbig
| `-- sedovbig.pnt
|-- sedovflat
| `-- sedovflat.pnt
|-- sedovflatx120
| `-- sedovflatx120.pnt
|-- sedovflatx16
| `-- sedovflatx16.pnt
|-- sedovflatx4
| `-- sedovflatx4.pnt
|-- sedovflatx40
| `-- sedovflatx40.pnt
`-- sedovsmall
  |-- sedovsmall.pnt
  |-- sedovsmall.xy
  |-- sedovsmall.xy.std
  `-- sedovsmall.xy.std4
You will likely need to adjust the mpirun parameters, etc.
app-kripke¶
Kripke is (from the README):
Kripke is a simple, scalable, 3D Sn deterministic particle transport code. Its primary purpose is to research how data layout, programming paradigms and architectures effect the implementation and performance of Sn transport. A main goal of Kripke is investigating how different data-layouts affect instruction, thread and task level parallelism, and what the implications are on overall solver performance.
Akin to AMG, we allow you to modify each of the mpirun and kripke commands via:
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The kripke command (without mpirun) | options->command | string | (see below) |
prefix | The prefix (mpirun command and arguments) | options->mpirun | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /opt/AMG |
By default, when not set, you will just run the kripke binary to get a test case run, so mpirun is set to be blank.
# mpirun is blank
""
# But could be an actual mpirun command
mpirun --hostfile ./hostlist.txt
# command written to problem.sh
kripke
# Assembled into
mpirun --hostfile ./hostlist.txt ./problem.sh
There is a nice guide here that can help you to decide on your specific command or problem size. Also note that we expose the following executables built with it:
ex1_vector-addition ex4_atomic-histogram ex7_nested-loop-reorder
ex1_vector-addition_solution ex4_atomic-histogram_solution ex7_nested-loop-reorder_solution
ex2_approx-pi ex5_line-of-sight ex8_tiled-matrix-transpose
ex2_approx-pi_solution ex5_line-of-sight_solution ex8_tiled-matrix-transpose_solution
ex3_colored-indexset ex6_stencil-offset-layout ex9_matrix-transpose-local-array
ex3_colored-indexset_solution ex6_stencil-offset-layout_solution ex9_matrix-transpose-local-array_solution
(meaning on the PATH in /opt/Kripke/build/bin in the container).
For apps / metrics to be added, please see this issue.
app-ldms¶
LDMS is “a low-overhead, low-latency framework for collecting, transferring, and storing metric data on a large distributed computer system” and is packaged alongside ovis-hpc. While there are complex aggregator setups we could run, for this simple metric we simply run ldms_ls (on each separate pod/node). The following variables are supported:
Name | Description | Type | Default |
---|---|---|---|
command | The command to issue to ldms_ls (or that) | string | (see below) |
workdir | The working directory for the command | string | /opt |
completions | Number of times to run metric | int32 | unset (runs for lifetime of application or indefinitely) |
rate | Seconds to pause between measurements | int32 | 10 |
The following is the default command:
ldms_ls -h localhost -x sock -p 10444 -l -v
app-nekbone¶
Nekbone comes with a set of examples that primarily depend on you choosing the correct working directory and command to run from. You can do this via these two primary options:
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The full mpirun and nekbone command | options->command | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /root/nekbone-3.0/test |
And the following combinations are supported. Note that example1 did not build, and example2 is the default (if you don’t set these variables).
Command | Workdir |
---|---|
mpiexec --hostfile ./hostlist.txt -np 2 ./nekbone | /root/nekbone-3.0/test/example2 |
mpiexec --hostfile ./hostlist.txt -np 2 ./nekbone | /root/nekbone-3.0/test/example3 |
mpiexec --hostfile ./hostlist.txt -np 2 ./nekbone | /root/nekbone-3.0/test/nek_comm |
mpiexec --hostfile ./hostlist.txt -np 2 ./nekbone | /root/nekbone-3.0/test/nek_mgrid |
mpiexec --hostfile ./hostlist.txt -np 2 ./nekbone | /root/nekbone-3.0/test/nek_delay |
You can see the archived repository here. If there are interesting metrics in this project it would be worth bringing it back to life I think.
app-laghos¶
From the Laghos README:
Laghos (LAGrangian High-Order Solver) is a miniapp that solves the time-dependent Euler equations of compressible gas dynamics in a moving Lagrangian frame using unstructured high-order finite element spatial discretization and explicit high-order time-stepping.
Akin to other apps, you can customize the command and workdir. Note that the laghos executable is at /workflow/laghos and not on the path, so the default references it as ./laghos.
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The full mpirun and laghos command | options->command | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /workdir/laghos |
app-bdas¶
BDAS stands for “Big Data Analysis Suite” and you can read more about it here.
The container has machine learning analyses (provided in R) that work with MPI (openmpi). The benchmarks are in /opt/bdas/benchmarks/r in the container, and we provide an example for princomp (see default command below):
Name | Description | Option Key | Type | Default |
---|---|---|---|---|
command | The full mpirun and Rscript command | options->command | string | (see below) |
workdir | The working directory for the command | options->workdir | string | /opt/bdas/benchmarks/r |
# This is the default command. You must target the --hostfile and use the allow as root flag!
mpirun --allow-run-as-root -np 4 --hostfile ./hostlist.txt Rscript /opt/bdas/benchmarks/r/princomp.r 250 50
Try setting the logging->interactive: true option in the spec to keep the container running and explore other benchmarks. These are the ones I’ve tried:
# Other benchmarks. As with the default, you must target the --hostfile and use the allow as root flag!
mpirun --allow-run-as-root -np 4 --hostfile ./hostlist.txt Rscript /opt/bdas/benchmarks/r/kmeans.r 250 50
mpirun --allow-run-as-root -np 4 --hostfile ./hostlist.txt Rscript /opt/bdas/benchmarks/r/svm.r 250 50
Containers¶
To see all associated app containers, look at the converged-computing/metrics-container repository (with Dockerfiles and automation) and associated packages.