Getting started with Intel Advisor 2018 roofline model

Instructions for preparing a roofline model with Intel Advisor 2018 on a Cray XC40.

For this test case, I use the LU benchmark from the NAS Parallel Benchmarks on Shaheen II, a Cray XC40 at the KAUST Supercomputing Laboratory. Adjust the paths and the executable name for your own system and application.

  1. Connect to the system with X11 forwarding:
ssh -X ...
  1. Load the appropriate modules (the exact module names and versions depend on the system):
module swap PrgEnv-cray/5.2.82 PrgEnv-intel
module load advisor/2018.1.1.535164 
module swap intel/15.0.2.164 intel/17.4.4.196
  1. Compile the application with debug information and dynamic linking

For example: ftn -g -dynamic …

All the submission and config files are included in the roofline
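Once the modules above are loaded, a quick sanity check can confirm that the commands this guide relies on are on the PATH before any job is submitted. This is a minimal bash sketch; the helper name `check_tools` is hypothetical, and the command list in the usage comment is simply the set of tools used in this guide.

```shell
#!/bin/bash
# check_tools (hypothetical helper): reports every command in its argument
# list that cannot be found on the PATH, and returns non-zero if at least
# one of them is missing.
check_tools() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" > /dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the login node, after the module commands:
#   check_tools advixe-cl ftn sbatch && echo "environment ready"
```

If `advixe-cl` is reported missing, the Advisor module most likely did not load, and every later step would fail.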

  1. Using Intel Advisor

MPI application

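With Cray MPI, it is better to run Intel Advisor on a single process. To do this, we use the Slurm multi-prog feature (srun --multi-prog), whose configuration file assigns a command to each range of ranks: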

0 advixe-cl -v -collect survey -project-dir=/path_to_project/ -- ./executable
1-15 ./executable

This runs Intel Advisor on the first rank (rank 0) only, while ranks 1-15 run the executable normally. Adjust the project path and the executable name accordingly.
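The two-line configuration above generalizes to any rank count. The following is a minimal bash sketch that writes such a file; the helper name `make_multiprog` and the file name `config_survey.txt` are hypothetical, and the line layout simply mirrors the configuration lines shown in this guide.

```shell
#!/bin/bash
# make_multiprog (hypothetical helper) writes a Slurm --multi-prog file:
# rank 0 runs the executable under advixe-cl, all other ranks run it directly.
#   $1  total number of MPI ranks
#   $2  Advisor project directory
#   $3  executable to run on every rank
#   $4  configuration file to write
#   $5  advixe-cl options placed before -project-dir (e.g. "-v -collect survey")
make_multiprog() {
  local nranks=$1 project=$2 exe=$3 out=$4 opts=$5

  if [ "$nranks" -lt 1 ]; then
    echo "make_multiprog: need at least one rank" >&2
    return 1
  fi

  # Rank 0 is the only rank profiled by Intel Advisor.
  echo "0 advixe-cl ${opts} -project-dir=${project} -- ${exe}" > "$out"

  # Remaining ranks run the plain executable.
  if [ "$nranks" -eq 2 ]; then
    echo "1 ${exe}" >> "$out"
  elif [ "$nranks" -gt 2 ]; then
    echo "1-$((nranks - 1)) ${exe}" >> "$out"
  fi
}

# Example: the 16-rank survey configuration.
#   make_multiprog 16 /path_to_project/ ./executable config_survey.txt "-v -collect survey"
```

Passing `"-collect tripcounts -flop"` and `./lu.C.16` instead produces the FLOP configuration. The resulting file is what `srun --multi-prog` reads inside the batch script, so the number of tasks requested from Slurm must match `nranks`.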

Change the default call-stack processing mode (especially for Fortran code):

0 advixe-cl -v -collect survey -stackwalk-mode=online -no-stack-stitching -project-dir=/path_to_project/ -- ./executable
1-15 ./executable

Restrict the analysis to the modules of interest, so that system and other uninteresting modules are excluded. For example, to analyze only a module called demo.so:

0 advixe-cl -v -collect survey -module-filter-mode=include -module-filter=demo.so -project-dir=/path_to_project/ -- ./executable
1-15 ./executable

For more ways to reduce the collection overhead, see the Intel Advisor documentation on analysis overhead.

Submit the Survey job:

sbatch submit_initial.sh

On our system, some errors are reported at the end of the run. As long as the application itself finished without issues, these errors come from system libraries unrelated to the studied application.
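For the NAS benchmarks, one way to confirm that the application itself finished correctly is the benchmark's own verification summary. This minimal bash sketch assumes that the job's standard output ends up in the Slurm output file and that it contains the NAS Parallel Benchmarks line `Verification = SUCCESSFUL`; the helper name `check_npb_run` is hypothetical.

```shell
#!/bin/bash
# check_npb_run (hypothetical helper): returns 0 only if the job output file
# contains the NAS Parallel Benchmarks verification line reporting SUCCESSFUL.
# "UNSUCCESSFUL" does not match, because SUCCESSFUL must follow the '=' and
# whitespace directly.
#   returns 0  verification SUCCESSFUL
#   returns 1  verification line missing or UNSUCCESSFUL
#   returns 2  output file not readable
check_npb_run() {
  local log=$1
  if [ ! -r "$log" ]; then
    echo "check_npb_run: cannot read $log" >&2
    return 2
  fi
  if grep -Eq 'Verification[[:space:]]*=[[:space:]]*SUCCESSFUL' "$log"; then
    echo "NPB verification SUCCESSFUL in $log"
    return 0
  fi
  echo "NPB verification not successful; inspect $log" >&2
  return 1
}

# Usage, assuming Slurm's default output file name:
#   check_npb_run slurm-<jobid>.out
```

For an application other than NPB, the same idea applies with whatever success message that application prints.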

Then collect the trip counts and FLOP data:

sbatch submit_flops.sh

where config_flops.txt contains:

0 advixe-cl -collect tripcounts -flop -project-dir=/path_to_project/ -- ./lu.C.16
1-15 ./lu.C.16

If everything worked as expected, the project directory contains a folder called e000.
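The Survey run supplies the time spent in each loop, and the FLOP run supplies the operation and byte counts; together they place each loop on the roofline chart. As a reminder of what the chart encodes (this is the standard roofline bound, not something specific to this setup), a loop with arithmetic intensity $I$ can at best reach the attainable performance below, where $P_{\text{peak}}$ is the peak compute rate and $B_{\text{peak}}$ the bandwidth of a given memory level. Advisor draws one such bandwidth roof per memory level (for example L1, L2, L3 and DRAM).

$$
I = \frac{\text{FLOP}}{\text{bytes transferred}}, \qquad
P_{\text{attainable}} = \min\left(P_{\text{peak}},\; I \times B_{\text{peak}}\right)
$$

With purely hypothetical numbers, $P_{\text{peak}} = 1000$ GFLOP/s, $B_{\text{peak}} = 100$ GB/s and $I = 0.5$ FLOP/byte:

$$
P_{\text{attainable}} = \min\left(1000,\; 0.5 \times 100\right)\ \text{GFLOP/s} = 50\ \text{GFLOP/s}
$$

so that loop is memory-bound; it would only become compute-bound above the ridge point $I = P_{\text{peak}} / B_{\text{peak}} = 10$ FLOP/byte.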

If the analysis is too slow, you can disable the trip-count collection or restrict the analysis to selected loops:

Disable trip counts (collect FLOP data only):

0 advixe-cl -collect tripcounts -flop -no-trip-counts -project-dir=/path_to_project/ -- ./lu.C.16
1-15 ./lu.C.16

Select loops to profile:

0 advixe-cl -collect tripcounts -flop -mark-up-list=<id1> -project-dir=/path_to_project/ -- ./lu.C.16
1-15 ./lu.C.16

or

0 advixe-cl -collect tripcounts -flop -loops=scalar,loop-height=0 -project-dir=/path_to_project/ -- ./lu.C.16
1-15 ./lu.C.16

A Dependencies analysis is launched in the same way:

0 advixe-cl -collect dependencies -track-stack-variables -no-filter-reductions -no-filter-by-scope -stop-after=0 -project-dir=/path_to_project/ -- ./lu.C.16
1-15 ./lu.C.16

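GUI

To inspect the results, open the project in the Intel Advisor GUI (this requires the X11 forwarding set up at the beginning) and display the roofline chart:

advixe-gui /path_to_project/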
