VTune™ Profiler in a Docker* Container

Run and profile a sample application in a Docker* container with the VTune™ Profiler.

Run the Sample Application

  1. Check if your installation has the eiforamr-full-flavour-sdk Docker* image.

    docker images | grep eiforamr-full-flavour-sdk
    # If the image is installed, the output includes:
    eiforamr-full-flavour-sdk
    

    Note

    If the image is not installed, continuing with these steps triggers a build that takes more than an hour, and possibly much longer depending on system resources and internet connection speed.

  2. If the image is not installed, Intel® recommends installing the Robot Complete Kit with the Get Started Guide for Robots.

  3. Check that the EI for AMR environment is set:

    echo $AMR_TUTORIALS
    # should output the path to EI for AMR tutorials
    /home/user/edge_insights_for_amr/Edge_Insights_for_Autonomous_Mobile_Robots_2023.1/AMR_containers/01_docker_sdk_env/docker_compose/05_tutorials
    

    If the command prints nothing, see Step 5 of the Get Started Guide for Robots for information on how to configure the environment.

  4. Run the VTune™ profiler:

    CHOOSE_USER=root docker compose -f $AMR_TUTORIALS/vtune.tutorial.yml up oneapi
    

    Expected output:

    vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
    vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/matrix_multiply_vtune/r001gh -command stop.
    Address of buf1 = 0x7f4578e4b010
    Offset of buf1 = 0x7f4578e4b180
    Address of buf2 = 0x7f457864a010
    Offset of buf2 = 0x7f457864a1c0
    Address of buf3 = 0x7f45746e2010
    Offset of buf3 = 0x7f45746e2100
    Address of buf4 = 0x7f4573ee1010
    Offset of buf4 = 0x7f4573ee1140
    Using multiply kernel: multiply1
    Running on Intel(R) Iris(R) Xe Graphics [0x9a49]
    Elapsed Time: 0.91916s
    vtune: Collection stopped.
    vtune: Using result path `/tmp/matrix_multiply_vtune/r001gh'
    vtune: Executing actions 19 % Resolving information for `libpi_opencl.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libze_intel_gpu.so.1'.
    vtune: Executing actions 20 % Resolving information for `libc-dynamic.so'
    vtune: Warning: Cannot locate debugging information for file `/lib/modules/5.10.65/kernel/fs/overlayfs/overlay.ko'.
    vtune: Executing actions 20 % Resolving information for `libm-2.31.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libm-2.31.so'.
    vtune: Executing actions 20 % Resolving information for `libc-2.31.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libc-2.31.so'.
    vtune: Executing actions 20 % Resolving information for `ld-2.31.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/ld-2.31.so'.
    vtune: Warning: Cannot locate file `vmlinux'.
    vtune: Executing actions 20 % Resolving information for `libpin3dwarf.so'
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libigc.so.1.0.8517'.
    vtune: Executing actions 20 % Resolving information for `libxed.so'
    vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis is not possible. Function-level analysis is limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
    vtune: Executing actions 21 % Resolving information for `libgcc_s.so.1'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libgcc_s.so.1'.
    vtune: Executing actions 21 % Resolving information for `libstdc++.so.6.0.28'
    vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28'.
    vtune: Executing actions 21 % Resolving information for `libtpsstool.so'
    vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/libtpsstool.so'.
    vtune: Executing actions 21 % Resolving information for `i915.ko'
    vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/runtime/libittnotify_collector.so'.
    vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/runtime/libittnotify_collector.so'.
    vtune: Executing actions 22 % Resolving information for `libOpenCL.so.1'
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libze_intel_gpu.so.1.2.20939'.
    vtune: Executing actions 22 % Resolving information for `libigdrcl.so'
    vtune: Warning: Cannot locate debugging information for file `/lib/modules/5.10.65/kernel/drivers/gpu/drm/i915/i915.ko'.
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/intel-opencl/libigdrcl.so'.
    vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/intel-opencl/libigdrcl.so'.
    vtune: Executing actions 75 % Generating a report                              Elapsed Time: 1.163s
       GPU Time: 0.041s
    EU Array Stalled/Idle: 55.0% of Elapsed time with GPU busy
    | The percentage of time when the EUs were stalled or idle is high, which has a
    | negative impact on compute-bound applications.
    |
       GPU L3 Bandwidth Bound: 82.0% of peak value
       | L3 bandwidth was high when EUs were stalled or idle. Consider improving
       | cache reuse.
       |
    
          Hottest GPU Computing Tasks Bound by GPU L3 Bandwidth
          Computing Task  Total Time
          --------------  ----------
          Matrix1<float>      0.035s
       Occupancy: 91.1% of peak value
    
          Hottest GPU Computing Tasks with Low Occupancy
          Computing Task  Total Time  SIMD Width  Peak Occupancy(%)  Occupancy(%)  SIMD Utilization(%)
          --------------  ----------  ----------  -----------------  ------------  -------------------
       Sampler Busy: 0.0% of peak value
    
          Hottest GPU Computing Tasks with High Sampler Usage
          Computing Task  Total Time
          --------------  ----------
    Collection and Platform Info
       Application Command Line: ./matrix.dpcpp
       Operating System: 5.10.65 DISTRIB_ID=Ubuntu DISTRIB_RELEASE=20.04 DISTRIB_CODENAME=focal DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
       Computer Name: glaic3aeon2
       Result Size: 28.3 MB
       Collection start time: 15:39:14 04/01/2022 UTC
       Collection stop time: 15:39:15 04/01/2022 UTC
       Collector Type: Event-based sampling driver,Driverless Perf system-wide sampling,User-mode sampling and tracing
       CPU
          Name: Intel(R) microarchitecture code named Tigerlake
          Frequency: 2.803 GHz
          Logical CPU Count: 8
       GPU
          Name: TigerLake GT2 [Iris Xe Graphics]
          Vendor: Intel Corporation
          EU Count: 96
          Max EU Thread Count: 7
          Max Core Frequency: 1.350 GHz
          GPU OpenCL Info
                Version
                Max Compute Units: 96
                Max Work Group Size: 512
                Local Memory: 65.5 KB
                SVM Capabilities
    
    If you want to skip descriptions of detected performance issues in the report,
    enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>.
    Alternatively, you may view the report in the csv format: vtune -report
    <report_name> -format=csv.
    vtune: Executing actions 100 % done
    
  5. For a list of the steps that were executed, see 01_docker_sdk_env/docker_compose/05_tutorials/vtune.tutorial.yml.
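
The checks in steps 1 and 3 can be combined into a small preflight script that runs before the profiler is started. The following is a minimal sketch, not part of the EI for AMR kit: the function names `have_image`, `have_tutorial_env`, and `preflight` are hypothetical, and the script assumes that `vtune.tutorial.yml` sits directly under `$AMR_TUTORIALS`, as the command in step 4 implies.

```shell
#!/bin/sh
# Preflight checks for the VTune tutorial (illustrative sketch; the
# function names below are hypothetical and not shipped with the kit).

# Succeeds if a local Docker image whose repository name matches $1 exists.
have_image() {
    docker images --format '{{.Repository}}' 2>/dev/null | grep -q "$1"
}

# Succeeds if AMR_TUTORIALS is set and contains vtune.tutorial.yml.
have_tutorial_env() {
    [ -n "${AMR_TUTORIALS:-}" ] && [ -f "$AMR_TUTORIALS/vtune.tutorial.yml" ]
}

# Returns 0 if both checks pass, 1 if the environment is not configured,
# and 2 if the SDK image is missing (which would trigger a long build).
preflight() {
    if ! have_tutorial_env; then
        echo "AMR_TUTORIALS is not set or vtune.tutorial.yml is missing" >&2
        return 1
    fi
    if ! have_image eiforamr-full-flavour-sdk; then
        echo "eiforamr-full-flavour-sdk image not found" >&2
        return 2
    fi
    return 0
}
```

With these functions sourced into the current shell, one way to use them is `preflight && CHOOSE_USER=root docker compose -f "$AMR_TUTORIALS/vtune.tutorial.yml" up oneapi`, so that the profiler only starts when both checks pass.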

Troubleshooting

For general robot issues, see Troubleshooting for Robot Tutorials.