In Progress Metrics

These metrics are considered under development and likely need more eyes to get fully working.

Network

network-chatterbug

network-chatterbug

Chatterbug provides a suite of communication proxy applications for HPC. We use a launcher/worker design.

| Name | Description | Type | Default |
|------|-------------|------|---------|
| mpirun | The options to give to mpirun (includes tasks) | string | `-N 8` |
| command | The chatterbug command (subdirectory) to run, see options below | string | stencil3d |
| args | Arguments for the command | string | `1 2 2 10 10 10 4 1` |
| sole-tenancy | Require sole tenancy | string ("true" or "false") | "true" |

By default, we require sole tenancy, but you can disable this. The best place to look for “documentation” on the commands seems to be the source code. The following values are available for command:

pairs
ping-ping
spread
stencil3d
stencil4d
subcom2d-coll
subcom2d-a2a
unstr-mesh

We have mostly tested stencil3d. The mpirun command is assembled as follows:

$ mpirun --hostfile ./hostfile.txt --allow-run-as-root -N 4 /root/chatterbug/${command}/${executable} ${args}

Thus for the defaults, you’d get this command (on one pod):

$ mpirun --hostfile ./hostfile.txt --allow-run-as-root -N 4 /root/chatterbug/stencil3d/stencil3d.x 1 2 2 10 10 10 4 1
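
The assembly above can be sketched in Python. This is an illustration only, not the operator's implementation: `build_chatterbug_command` is a hypothetical helper, and the assumption that the executable is named `<command>.x` comes from the stencil3d example and may not hold for the other commands. The `mpirun` value is passed explicitly as `-N 4` to reproduce the line shown above.

```python
def build_chatterbug_command(mpirun, command="stencil3d", args="1 2 2 10 10 10 4 1"):
    """Assemble the mpirun line from the metric options (hypothetical helper)."""
    # Assumption: the executable is <command>.x, as in the stencil3d example.
    executable = f"{command}.x"
    parts = [
        "mpirun",
        "--hostfile ./hostfile.txt",
        "--allow-run-as-root",
        mpirun,  # the "mpirun" option, e.g. "-N 4"
        f"/root/chatterbug/{command}/{executable}",
        args,  # the "args" option, passed through unchanged
    ]
    return " ".join(parts)


if __name__ == "__main__":
    print(build_chatterbug_command("-N 4"))
```

Changing `command` or `args` only swaps the subdirectory, executable, and trailing arguments; the hostfile and `--allow-run-as-root` flags stay fixed.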

See the example linked in the header for a metrics.yaml example.

Standalone

app-hpl

app-hpl

The Linpack benchmark is used for the Top500 and solves a dense system of linear equations. Arguments to customize include the following:

| Name | Description | Type | Default |
|------|-------------|------|---------|
| mpiargs | Arguments to give to MPI | string | empty string |
| tasks | Number of tasks per node | int32 | detected using nproc |
| ratio | Target memory occupation | string (but as a float, e.g., "0.3") | "0.3" |
| memory | Memory in GiB | int32 | detected from proc |
| blocksize | The NBs (block size) value | int32 | |
| pfact | | int32 | |
| nbmin | | int32 | |
| ndiv | | int32 | |
| row_or_colmajor_pmapping | PMAP process mapping (0=Row-,1=Column-major) | int32 | 0 |
| rfact | (0=left, 1=Crout, 2=Right) | int32 | 0 |
| bcast | (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM) | int32 | 0 |
| depth | Lookahead depth | int32 | 0 |
| swap | (0=bin-exch,1=long,2=mix) | int32 | 0 |
| swappingThreshold | | int32 | 64 |
| l1transposed | (0=transposed,1=no-transposed) | int32 | 0 |
| utransposed | (0=transposed,1=no-transposed) | int32 | 0 |
| memAlignment | Memory alignment in double (> 0) (4,8,16) | int32 | |

For the meaning of each of these, see this documentation and how they are used in hpl.go. I made an effort to define them above, but you should consult that documentation, because I don’t fully understand these yet.
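
The two detected defaults (tasks and memory) can be approximated with the sketch below. It assumes that "detected from proc" means reading `MemTotal` from `/proc/meminfo`, and it uses `os.cpu_count()` as a stand-in for `nproc` (the two can differ when CPU affinity or container limits apply). The function names are hypothetical and this is not the code in hpl.go.

```python
import os


def memory_gib_from_meminfo(text):
    """Return MemTotal from /proc/meminfo-style text, in whole GiB."""
    for line in text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in KiB
            return kib // (1024 * 1024)  # KiB -> GiB, rounded down
    raise ValueError("MemTotal not found")


def detect_defaults(meminfo_path="/proc/meminfo"):
    """Detect tasks (CPU count) and memory (GiB) on a Linux node."""
    with open(meminfo_path) as handle:
        memory = memory_gib_from_meminfo(handle.read())
    # Assumption: os.cpu_count() approximates what nproc reports.
    return {"tasks": os.cpu_count(), "memory": memory}
```

Setting `tasks` or `memory` explicitly in the options would bypass this kind of detection.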

We provide a simple build here, as vendors typically spend a lot of time custom-compiling the code for their architectures (and we are compiling for general use). We use the compute_N script from the ULHPC Tutorials to generate input data for a particular problem size, and you can vary the input to this script via the computeArgs parameters. We use a default, and you can inspect the script help below:

compute_N --help

# compute_N -h
Compute N for HPL runs.

SYNOPSIS
  compute_N [-v] [--mem <SIZE_IN_GB>] [-N <NODES>] [-r <RATIO>] [-NB <NB>]
  compute_N [-v] [--mem <SIZE_IN_GB>] [-N <NODES>] [-p <PERCENTAGE_MEM>] [-NB <NB>]

  The following formulae is used (when using '-r <ratio>'):
     N = <ratio>*SQRT( Total Memory Size in bytes / sizeof(double) )
       = <ratio>*SQRT( <nnodes> * <ram_size> / 8)

  Alternatively you may wish to specify a memory usage ratio (with -p <percentage_mem>),
  in which case the following formulae is used:
      N = SQRT( <percentage_mem>/100 * Total Memory Size in bytes / sizeof(doubl)

OPTIONS
  -m --mem --ramsize <SIZE>
     Specify the total memory size per node, in GiB.
     Default RAM size consider (yet in KiB): 16051112 KiB
  -N --nodes <N>
     Number of compute nodes
  -NB <NB>
     NB parameters to use. Default: 192 (384 for skylake)
  -p --memshare <PERCENTAGE_MEM>
     Percentage of the total memory size to use.
     Derived from the below global ratio (i.e. 0% since RATIO=0.8)
  -r --ratio <RATIO>
     Global ratio to apply. Default: 0.8

EXAMPLE
  For 2 broadwell nodes on iris cluster, using 30% of the total memory per node:
     compute_N -N 2 -p 30 -m 128 -NB 192
  For 4 skylake nodes on iris cluster, using 85% of the total memory per node:
     compute_N -N 4 -p 85 -m 128 -NB 384

AUTHORS
  Sebastien Varrette <Sebastien.Varrette@uni.lu> and UL HPC Team

COPYRIGHT
  This is free software; see the source for copying conditions.  There is
  NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
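
The two formulas in the help text can be worked through in Python. This sketch is not the compute_N script itself: the function names are hypothetical, and rounding N down to a multiple of NB is a common HPL tuning practice that is assumed here, since the help text does not say whether the script does it.

```python
import math

BYTES_PER_GIB = 1024 ** 3
SIZEOF_DOUBLE = 8  # bytes per matrix element


def n_from_ratio(ratio, nodes, mem_gib):
    """N = ratio * sqrt(total memory in bytes / sizeof(double))."""
    total_bytes = nodes * mem_gib * BYTES_PER_GIB
    return ratio * math.sqrt(total_bytes / SIZEOF_DOUBLE)


def n_from_percentage(percentage, nodes, mem_gib):
    """N = sqrt(percentage/100 * total memory in bytes / sizeof(double))."""
    total_bytes = nodes * mem_gib * BYTES_PER_GIB
    return math.sqrt(percentage / 100 * total_bytes / SIZEOF_DOUBLE)


def round_down_to_multiple(n, nb):
    """Assumed post-processing step: largest multiple of nb not above n."""
    return int(n) // nb * nb


if __name__ == "__main__":
    # Mirrors the flags "-m 128 -NB 192 -r 0.3 -N 2".
    n = n_from_ratio(0.3, nodes=2, mem_gib=128)
    print(round_down_to_multiple(n, 192))
```

Note that the ratio scales N directly, while the percentage sits inside the square root, so `-r 0.3` and `-p 30` do not give the same problem size.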

The following examples are provided to generate the HPL.dat for the analysis:

/opt/tutorials/benchmarks/HPL/scripts/compute_N -h
# 1 Broadwell node, alpha = 0.3
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 192 -r 0.3 -N 1
# 2 Skylake (regular) nodes, alpha = 0.3
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 384 -r 0.3 -N 2
# 4 bigmem (skylake) nodes, beta = 0.85
/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 3072 -NB 384 -p 85 -N 4

Here is a tiny setup I created for a testing case:

/opt/tutorials/benchmarks/HPL/scripts/compute_N -m 128 -NB 192 -r 0.3 -N 2

Next, you might care about the input data, a file called hpl.dat. By default we use a template that is populated by the above variables, and here is another example that I found in the repository:

Default hpl.dat

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
          device out (6=stdout,7=stderr,file)
          # of problems sizes (N)
       Ns
          # of NBs
         NBs
          PMAP process mapping (0=Row-,1=Column-major)
          # of process grids (P x Q)
4             Ps
7            Qs
0         threshold
          # of panel fact
          PFACTs (0=left, 1=Crout, 2=Right)
          # of recursive stopping criterium
          NBMINs (>= 1)
          # of panels in recursion
          NDIVs
          # of recursive panel fact.
          RFACTs (0=left, 1=Crout, 2=Right)
          # of broadcast
          BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
          # of lookahead depth
          DEPTHs (>=0)
          SWAP (0=bin-exch,1=long,2=mix)
         swapping threshold
          L1 in (0=transposed,1=no-transposed) form
          U  in (0=transposed,1=no-transposed) form
          Equilibration (0=no,1=yes)
          memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
                             Number of additional problem sizes for PTRANS
10000 30000                values of N
                             number of additional blocking sizes for PTRANS
9 8 13 13 20 16 32 64        values of NB
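
How a template like this gets populated can be sketched as follows. This is a hypothetical illustration using Python's `string.Template`, not the template in hpl.go: the fragment covers only three lines, the placeholder names are borrowed from the options table, and the block size of 192 is taken from the compute_N help text.

```python
from string import Template

# Hypothetical three-line fragment in the "value  label" layout of hpl.dat.
FRAGMENT = Template(
    "$blocksize          NBs\n"
    "$row_or_colmajor_pmapping            PMAP process mapping (0=Row-,1=Column-major)\n"
    "$swappingThreshold           swapping threshold\n"
)


def render_fragment(options):
    """Fill the fragment; raises KeyError if an option is missing."""
    return FRAGMENT.substitute(options)


if __name__ == "__main__":
    print(render_fragment({
        "blocksize": 192,
        "row_or_colmajor_pmapping": 0,
        "swappingThreshold": 64,
    }))
```

HPL reads the leading value on each line and treats the rest as a comment, which is why the labels can stay in the file untouched.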

If something above is not properly exposed, please let us know.

Last update: Nov 27, 2023