VTune™ Profiler in a Docker* Container#
Run the profiling application in a Docker* container with the VTune™ profiler.
Run the Sample Application#
Check if your installation has the eiforamr-full-flavour-sdk Docker* image.
docker images | grep eiforamr-full-flavour-sdk # if the image is installed, the output includes: eiforamr-full-flavour-sdk
Note
If the image is not installed, continuing with these steps triggers a build that takes more than an hour, and possibly much longer depending on system resources and internet connection speed.
If the image is not installed, Intel® recommends installing the Robot Complete Kit with the Get Started Guide for Robots.
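The image check above can be wrapped in a small helper, so that a script stops before it accidentally starts the long build. This is only a sketch: the function names `image_listed` and `check_sdk_image` are hypothetical, and it assumes the `docker` CLI is on the `PATH`.

```shell
# Sketch only: function names are illustrative, not part of EI for AMR.
IMAGE_NAME="eiforamr-full-flavour-sdk"

# Succeeds if any repository name read from stdin contains the image name.
image_listed() {
    grep -q "$IMAGE_NAME"
}

# Prints a status line; returns non-zero when the image is missing.
check_sdk_image() {
    if docker images --format '{{.Repository}}' | image_listed; then
        echo "found: $IMAGE_NAME"
    else
        echo "missing: $IMAGE_NAME (continuing triggers a long build)" >&2
        return 1
    fi
}
```

Calling `check_sdk_image || exit 1` at the top of a script would abort before any build is triggered.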
Check that the EI for AMR environment is set:
echo $AMR_TUTORIALS # should output the path to the EI for AMR tutorials: /home/user/edge_insights_for_amr/Edge_Insights_for_Autonomous_Mobile_Robots_2023.1/AMR_containers/01_docker_sdk_env/docker_compose/05_tutorials
If nothing is printed, refer to Step 5 of the Get Started Guide for Robots for information on how to configure the environment.
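The environment check can also be scripted. The sketch below additionally verifies that the compose file used later in this tutorial exists under `$AMR_TUTORIALS`. The function name `check_amr_env` and its return codes are hypothetical choices for illustration.

```shell
# Sketch only: check_amr_env and its return codes are illustrative.
check_amr_env() {
    # Return 1 when the variable is unset or empty.
    if [ -z "${AMR_TUTORIALS:-}" ]; then
        echo "AMR_TUTORIALS is not set" >&2
        return 1
    fi
    # Return 2 when the tutorial compose file is not where it is expected.
    if [ ! -f "$AMR_TUTORIALS/vtune.tutorial.yml" ]; then
        echo "vtune.tutorial.yml not found in $AMR_TUTORIALS" >&2
        return 2
    fi
    echo "environment ok: $AMR_TUTORIALS"
}
```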
Run the VTune™ profiler:
CHOOSE_USER=root docker compose -f $AMR_TUTORIALS/vtune.tutorial.yml up oneapi
Expected output:
vtune: Warning: To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.
vtune: Collection started. To stop the collection, either press CTRL-C or enter from another console window: vtune -r /tmp/matrix_multiply_vtune/r001gh -command stop.
Address of buf1 = 0x7f4578e4b010
Offset of buf1 = 0x7f4578e4b180
Address of buf2 = 0x7f457864a010
Offset of buf2 = 0x7f457864a1c0
Address of buf3 = 0x7f45746e2010
Offset of buf3 = 0x7f45746e2100
Address of buf4 = 0x7f4573ee1010
Offset of buf4 = 0x7f4573ee1140
Using multiply kernel: multiply1
Running on Intel(R) Iris(R) Xe Graphics [0x9a49]
Elapsed Time: 0.91916s
vtune: Collection stopped.
vtune: Using result path `/tmp/matrix_multiply_vtune/r001gh'
vtune: Executing actions 19 % Resolving information for `libpi_opencl.so'
vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libze_intel_gpu.so.1'.
vtune: Executing actions 20 % Resolving information for `libc-dynamic.so'
vtune: Warning: Cannot locate debugging information for file `/lib/modules/5.10.65/kernel/fs/overlayfs/overlay.ko'.
vtune: Executing actions 20 % Resolving information for `libm-2.31.so'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libm-2.31.so'.
vtune: Executing actions 20 % Resolving information for `libc-2.31.so'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libc-2.31.so'.
vtune: Executing actions 20 % Resolving information for `ld-2.31.so'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/ld-2.31.so'.
vtune: Warning: Cannot locate file `vmlinux'.
vtune: Executing actions 20 % Resolving information for `libpin3dwarf.so'
vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libigc.so.1.0.8517'.
vtune: Executing actions 20 % Resolving information for `libxed.so'
vtune: Warning: Cannot locate debugging information for the Linux kernel. Source-level analysis is not possible. Function-level analysis is limited to kernel symbol tables. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.
vtune: Executing actions 21 % Resolving information for `libgcc_s.so.1'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libgcc_s.so.1'.
vtune: Executing actions 21 % Resolving information for `libstdc++.so.6.0.28'
vtune: Warning: Cannot locate debugging information for file `/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.28'.
vtune: Executing actions 21 % Resolving information for `libtpsstool.so'
vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/libtpsstool.so'.
vtune: Executing actions 21 % Resolving information for `i915.ko'
vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/runtime/libittnotify_collector.so'.
vtune: Warning: Cannot locate debugging information for file `/opt/intel/oneapi/vtune/2022.0.0/lib64/runtime/libittnotify_collector.so'.
vtune: Executing actions 22 % Resolving information for `libOpenCL.so.1'
vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/libze_intel_gpu.so.1.2.20939'.
vtune: Executing actions 22 % Resolving information for `libigdrcl.so'
vtune: Warning: Cannot locate debugging information for file `/lib/modules/5.10.65/kernel/drivers/gpu/drm/i915/i915.ko'.
vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/intel-opencl/libigdrcl.so'.
vtune: Warning: Cannot locate debugging information for file `/usr/local/lib/intel-opencl/libigdrcl.so'.
vtune: Executing actions 75 % Generating a report
Elapsed Time: 1.163s
    GPU Time: 0.041s
EU Array Stalled/Idle: 55.0% of Elapsed time with GPU busy
 | The percentage of time when the EUs were stalled or idle is high, which has a
 | negative impact on compute-bound applications.
 |
    GPU L3 Bandwidth Bound: 82.0% of peak value
     | L3 bandwidth was high when EUs were stalled or idle. Consider improving
     | cache reuse.
     |
        Hottest GPU Computing Tasks Bound by GPU L3 Bandwidth
        Computing Task  Total Time
        --------------  ----------
        Matrix1<float>      0.035s
    Occupancy: 91.1% of peak value
        Hottest GPU Computing Tasks with Low Occupancy
        Computing Task  Total Time  SIMD Width  Peak Occupancy(%)  Occupancy(%)  SIMD Utilization(%)
        --------------  ----------  ----------  -----------------  ------------  -------------------
    Sampler Busy: 0.0% of peak value
        Hottest GPU Computing Tasks with High Sampler Usage
        Computing Task  Total Time
        --------------  ----------
Collection and Platform Info
    Application Command Line: ./matrix.dpcpp
    Operating System: 5.10.65 DISTRIB_ID=Ubuntu DISTRIB_RELEASE=20.04 DISTRIB_CODENAME=focal DISTRIB_DESCRIPTION="Ubuntu 20.04.3 LTS"
    Computer Name: glaic3aeon2
    Result Size: 28.3 MB
    Collection start time: 15:39:14 04/01/2022 UTC
    Collection stop time: 15:39:15 04/01/2022 UTC
    Collector Type: Event-based sampling driver,Driverless Perf system-wide sampling,User-mode sampling and tracing
    CPU
        Name: Intel(R) microarchitecture code named Tigerlake
        Frequency: 2.803 GHz
        Logical CPU Count: 8
    GPU
        Name: TigerLake GT2 [Iris Xe Graphics]
        Vendor: Intel Corporation
        EU Count: 96
        Max EU Thread Count: 7
        Max Core Frequency: 1.350 GHz
    GPU OpenCL Info
        Version
        Max Compute Units: 96
        Max Work Group Size: 512
        Local Memory: 65.5 KB
        SVM Capabilities
If you want to skip descriptions of detected performance issues in the report, enter: vtune -report summary -report-knob show-issues=false -r <my_result_dir>. Alternatively, you may view the report in the csv format: vtune -report <report_name> -format=csv.
vtune: Executing actions 100 % done
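Most of this output consists of missing-debug-symbol warnings, so the headline metrics are easy to overlook. One possible way to surface them is to save the console output to a file and filter it afterwards. The sketch below assumes that approach. The function names and the default log file name `vtune_run.log` are hypothetical, and the metric labels in the pattern are taken from the sample output shown here, so they may differ in other VTune™ Profiler versions.

```shell
# Sketch only: helper names and the log file name are illustrative.

# Run the tutorial and keep a copy of everything printed to the console.
run_vtune_tutorial() {
    log_file="${1:-vtune_run.log}"
    CHOOSE_USER=root docker compose -f "$AMR_TUTORIALS/vtune.tutorial.yml" up oneapi 2>&1 | tee "$log_file"
}

# Read a saved log on stdin and print one "label: value" pair per line.
# -o prints only the matched text, so this works even on very long lines.
summarize_vtune_log() {
    grep -oE '(Elapsed Time|GPU Time|EU Array Stalled/Idle|GPU L3 Bandwidth Bound|Occupancy|Sampler Busy): [0-9.]+[s%]'
}
```

For example, `run_vtune_tutorial` followed by `summarize_vtune_log < vtune_run.log` would list the elapsed time, GPU time, and the utilization percentages without the surrounding warnings.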
For a list of the steps that were executed, see 01_docker_sdk_env/docker_compose/05_tutorials/vtune.tutorial.yml.
Troubleshooting#
For general robot issues, go to: Troubleshooting for Robot Tutorials.