This study was run with a fixed local problem size per core/process (weak scaling). Instance types vary in their total core/process count (`N`), which also defines the X-axis of the scatter plot. Because the global problem size grows with `N`, comparing raw values across instance types is not an "apples-to-apples" comparison of the kind you would get with a fixed global problem size; instead it shows how performance scales. The following table explains the available metrics and how they might be interpreted in this context.
| Original Metric (from HPCG Output) | Normalized Metric Name for Plot | Normalization Formula (Conceptual) | Expected Trend in Ideal Weak Scaling |
|---|---|---|---|
| `GFLOP/s Summary::Raw SpMV=` (and other raw GFLOP/s) | Per-Core SpMV GFLOP/s (or MG, DDOT, WAXPBY, Total) | V(Raw_Metric_GFLOPs) / N | Constant |
| `Final Summary::HPCG result is VALID with a GFLOP/s rating of=` | Per-Core Official FOM (GFLOP/s) | V(Official_FOM) / N | Constant |
| `Benchmark Time Summary::Total=` | Avg. Time per CG Iteration (approx.) | V(Total_Time) / Total_Optimized_CG_Iterations | Constant |
| `GB/s Summary::Raw Total B/W=` (and read/write) | Per-Core Total Memory BW (GB/s) (or Read/Write) | V(Total_BW) / N | Constant |
| `Memory Use Information::Total memory used for data (Gbytes)=` | Per-Core Memory Used (GB) | V(Memory_Used) / N | Constant |
| `Setup Information::Setup Time=` | Per-Core Setup Time (sec) | V(Setup_Time) / N | Constant / Slow Growth |
| `DDOT Timing Variations::Max DDOT MPI_Allreduce time=` | Max MPI_Allreduce Time (sec) | V(Max_Allreduce_Time) (compare raw values) | Slow Growth |
| `DDOT Timing Variations::Avg DDOT MPI_Allreduce time=` | Avg MPI_Allreduce Time (sec) | V(Avg_Allreduce_Time) (compare raw values) | Slow Growth |
| `Benchmark Time Summary::DDOT=` | Avg. DDOT Time per CG Iteration (approx.) | V(Total_DDOT_Time) / Total_Optimized_CG_Iterations | Constant / Slow Growth |
| `Spectral Convergence Tests::Unpreconditioned::Maximum iteration count=` | Unpreconditioned CG Iterations | V(Iteration_Count) (compare raw values) | Constant |
Note: `V(Metric)` refers to the value of the metric, and `N` is the total number of cores/MPI processes for the instance (taken from the JSON `coreMap`). "Total Optimized CG Iterations" is also extracted from the HPCG output. For metrics labeled "Per-Core" in the table, perform the division by `N` *before* creating the JSON file if you want the Y-axis to show the normalized values; otherwise the Y-axis shows raw values and the table guides interpretation. Metrics not listed as "Per-Core" are generally plotted as their raw values.
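
As a concrete illustration of the table's formulas, the sketch below shows one way the per-core and per-iteration values could be computed before writing the plot JSON. This is a minimal Python sketch, not the study's actual extraction or plotting tooling: the dictionary keys, the inline `coreMap` entry, the instance-type name, and all numeric inputs are hypothetical placeholders. Only the divide-by-`N` and divide-by-iteration-count logic follows the table above.

```python
import json

# Minimal sketch of the table's normalization step, assuming the raw values
# have already been extracted from the HPCG output. All keys, the instance
# type, and the numbers below are hypothetical placeholders, not real results.

def normalize_for_plot(raw: dict, n: int) -> dict:
    """Apply the conceptual formulas from the table:
    V(metric) / N for per-core metrics,
    V(time) / Total_Optimized_CG_Iterations for per-iteration metrics,
    and raw values for everything else."""
    iters = raw["optimized_cg_iterations"]
    return {
        "per_core_spmv_gflops": raw["raw_spmv_gflops"] / n,
        "per_core_official_fom_gflops": raw["official_fom_gflops"] / n,
        "per_core_total_bw_gbs": raw["raw_total_bw_gbs"] / n,
        "per_core_memory_used_gb": raw["total_memory_used_gb"] / n,
        "per_core_setup_time_s": raw["setup_time_s"] / n,
        "avg_time_per_cg_iteration_s": raw["total_time_s"] / iters,
        "avg_ddot_time_per_cg_iteration_s": raw["total_ddot_time_s"] / iters,
        # Raw-value metrics: compared directly, no division.
        "max_ddot_allreduce_time_s": raw["max_ddot_allreduce_time_s"],
        "avg_ddot_allreduce_time_s": raw["avg_ddot_allreduce_time_s"],
        "unpreconditioned_cg_iterations": raw["unpreconditioned_max_iters"],
    }

if __name__ == "__main__":
    core_map = {"example.instance": 96}           # stand-in for the JSON coreMap
    raw_example = {                               # placeholder numbers only
        "raw_spmv_gflops": 410.0,
        "official_fom_gflops": 385.0,
        "raw_total_bw_gbs": 2500.0,
        "total_memory_used_gb": 90.0,
        "setup_time_s": 12.0,
        "total_time_s": 1800.0,
        "total_ddot_time_s": 140.0,
        "max_ddot_allreduce_time_s": 0.8,
        "avg_ddot_allreduce_time_s": 0.3,
        "unpreconditioned_max_iters": 11,
        "optimized_cg_iterations": 2400,
    }
    n = core_map["example.instance"]
    record = {"x": n, **normalize_for_plot(raw_example, n)}  # N is the X-axis value
    print(json.dumps(record, indent=2))
```

Whether the division happens in the extraction step (as sketched here) or later in the plotting tool is a matter of preference; the "Expected Trend" column applies either way, as long as the Y-axis label and the normalization are kept consistent.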