OpenVINO™ Benchmarking Tool#
This tutorial tells you how to run the benchmark application on an 11th Generation Intel® Core™ processor with Intel® Iris® Xe Integrated Graphics or Intel® UHD Graphics. It uses the asynchronous mode to estimate deep learning inference engine performance and latency.
Start Docker* Container#
Check if your installation has the eiforamr-full-flavour-sdk Docker* image.
docker images | grep eiforamr-full-flavour-sdk  # if you have it installed, the result is: eiforamr-full-flavour-sdk
Note
If the image is not installed, continuing with these steps triggers a build that takes longer than an hour (sometimes much longer, depending on system resources and internet connection).
If the image is not installed, Intel® recommends installing the Robot Complete Kit with the Get Started Guide for Robots.
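A quick shell check like the following (a minimal sketch based on the docker images command above) reports whether the image is already present before you continue:
# Check whether the eiforamr-full-flavour-sdk image is available locally
if docker images | grep -q eiforamr-full-flavour-sdk; then
  echo "eiforamr-full-flavour-sdk image found; continue with the tutorial."
else
  echo "Image not found; install the Robot Complete Kit before proceeding." >&2
fi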
Check that the EI for AMR environment is set:
echo $AMR_TUTORIALS # should output the path to EI for AMR tutorials /home/user/edge_insights_for_amr/Edge_Insights_for_Autonomous_Mobile_Robots_2023.1/AMR_containers/01_docker_sdk_env/docker_compose/05_tutorials
If nothing is output, refer to Get Started Guide for Robots Step 5 for information on how to configure the environment.
Go to the AMR_containers folder:
cd $CONTAINER_BASE_PATH
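If cd fails because the variable is empty, the environment is not configured; a small guard (a sketch using the same variable) makes this explicit:
# Fail with a clear message if CONTAINER_BASE_PATH is not set in this shell
: "${CONTAINER_BASE_PATH:?not set - see the Get Started Guide for Robots, Step 5}"
cd "$CONTAINER_BASE_PATH"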
Start the Docker* container as root:
./run_interactive_docker.sh eiforamr-full-flavour-sdk:<TAG> root
Set Environment Variables#
The environment variables must be set before you can compile and run OpenVINO™ applications.
Run the following script:
source /opt/intel/openvino/setupvars.sh
# or
source <OPENVINO_INSTALL_DIR>/setupvars.sh
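To confirm that the variables were applied in the current shell, a check such as the following can be used (a sketch; it assumes setupvars.sh exports INTEL_OPENVINO_DIR, which recent OpenVINO™ releases do):
# Verify that the OpenVINO environment is set in the current shell
if [ -n "$INTEL_OPENVINO_DIR" ]; then
  echo "OpenVINO environment set: $INTEL_OPENVINO_DIR"
else
  echo "OpenVINO environment not set; source setupvars.sh first." >&2
fi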
Build Benchmark Application#
Change directory and build the benchmark application using the cmake script file with the following commands:
cd /opt/intel/openvino/samples/cpp
./build_samples.sh
Once the build is successful, access the benchmark application in the following directory:
cd /root/openvino_cpp_samples_build/intel64/Release
# or
cd <INSTALL_DIR>/openvino_cpp_samples_build/intel64/Release
The benchmark_app application is available inside the Release folder.
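To confirm that the build produced the binary, you can check for it explicitly (a minimal sketch using the default build path from the previous step):
# Verify that benchmark_app was built and is executable
BENCH_DIR=/root/openvino_cpp_samples_build/intel64/Release
if [ -x "$BENCH_DIR/benchmark_app" ]; then
  echo "benchmark_app is available in $BENCH_DIR"
else
  echo "benchmark_app not found; re-run build_samples.sh and check its output." >&2
fi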
Input File#
Select an image file or a sample video file as input to the benchmark application, for example from the following directory:
cd /root/openvino_cpp_samples_build/intel64/Release
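For example, to list candidate media files in the sample data directory used by the run commands later in this tutorial (a sketch; the directory is taken from those commands and the extensions from the formats benchmark_app reports as supported):
# List candidate image and video inputs in the sample data directory
find /home/eiforamr/data_samples -type f \
  \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.png' -o -name '*.bmp' -o -name '*.mp4' \) \
  2>/dev/null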
Application Syntax and Options#
The benchmark application syntax is as follows:
./benchmark_app [OPTION]
In this tutorial, we recommend you select the following options:
./benchmark_app -m <model> -i <input> -d <device> -nireq <num_reqs> -nthreads <num_threads> -b <batch>
where:
<model>         The full path to the model .xml file
<input>         The path to a folder of images or to a sample video file
<device>        The target device type, for example CPU or GPU
<num_reqs>      The number of parallel inference requests (optional)
<num_threads>   The number of CPU threads to use for inference in throughput mode (optional)
<batch>         The batch size (optional)
For complete details on the available options, run the following command:
./benchmark_app -h
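To compare several configurations in one go, a small wrapper script can loop over devices and collect the reported metrics. The sketch below reuses the model and video paths from the run commands in the next step and filters the summary lines that benchmark_app prints in its final statistics step:
# Sketch: run benchmark_app on CPU and GPU and collect the summary metrics
MODEL=/home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml
INPUT=/home/eiforamr/data_samples/media_samples/plates_720.mp4
for DEVICE in CPU GPU; do
  echo "=== Device: $DEVICE ==="
  ./benchmark_app -m "$MODEL" -i "$INPUT" -d "$DEVICE" -nireq 8 -nthreads 8 -hint none \
    | grep -E 'Median|Average|Min|Max|Throughput'
done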
Run the Application#
Run the benchmark application as shown below. This tutorial uses the following settings:
The benchmark application is executed on the frozen_inference_graph model.
The number of parallel inference requests is set to 8.
The number of CPU threads to use for inference is set to 8.
The device type is GPU.
With a folder of input images:
./benchmark_app -d GPU -i ~/<dir>/input/ -m /home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml -nireq 8 -nthreads 8
With a sample video file (-hint none disables the built-in performance hint so that the explicit -nireq and -nthreads values take effect):
./benchmark_app -d GPU -i /home/eiforamr/data_samples/media_samples/plates_720.mp4 -m /home/eiforamr/workspace/object_detection/src/object_detection/models/ssd_mobilenet_v2_coco/frozen_inference_graph.xml -nireq 8 -nthreads 8 -hint none
Expected output:
[Step 1/11] Parsing and validating input arguments
[ INFO ] Parsing input parameters
[ INFO ] For input 1 files were added
[Step 2/11] Loading OpenVINO Runtime
[ INFO ] OpenVINO:
Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[ INFO ] Device info:
GPU
Build ................................. 2022.3.0-9052-9752fafe8eb-releases/2022/3
[Step 3/11] Setting device configuration
[ WARNING ] -nstreams default value is determined automatically for GPU device. Although the automatic selection usually provides a reasonable performance,but it still may be non-optimal for some cases, for more information look at README.
[Step 4/11] Reading model files
[ INFO ] Loading model files
[ INFO ] Read model took 26.60 ms
[ INFO ] Original model I/O parameters:
[ INFO ] Network inputs:
[ INFO ] image_tensor (node: image_tensor) : f32 / [...] / [1,3,300,300]
[ INFO ] Network outputs:
[ INFO ] DetectionOutput (node: DetectionOutput) : f32 / [...] / [1,1,100,7]
[Step 5/11] Resizing model to match image sizes and given batch
[ INFO ] Number of test configurations is calculated basing on number of input images
[ WARNING ] image_tensor: layout is not set explicitly, so it is defaulted to NCHW. It is STRONGLY recommended to set layout manually to avoid further issues.
[Step 6/11] Configuring input of the model
[ INFO ] Model batch size: 1
[ INFO ] Network inputs:
[ INFO ] image_tensor (node: image_tensor) : u8 / [N,C,H,W] / [1,3,300,300]
[ INFO ] Network outputs:
[ INFO ] DetectionOutput (node: DetectionOutput) : f32 / [...] / [1,1,100,7]
[Step 7/11] Loading the model to the device
[ INFO ] Compile model took 4922.37 ms
[Step 8/11] Querying optimal runtime parameters
[ INFO ] Model:
[ INFO ] NETWORK_NAME: frozen_inference_graph
[ INFO ] OPTIMAL_NUMBER_OF_INFER_REQUESTS: 4
[ INFO ] PERF_COUNT: NO
[ INFO ] MODEL_PRIORITY: MEDIUM
[ INFO ] GPU_HOST_TASK_PRIORITY: MEDIUM
[ INFO ] GPU_QUEUE_PRIORITY: MEDIUM
[ INFO ] GPU_QUEUE_THROTTLE: MEDIUM
[ INFO ] GPU_ENABLE_LOOP_UNROLLING: YES
[ INFO ] CACHE_DIR:
[ INFO ] PERFORMANCE_HINT:
[ INFO ] COMPILATION_NUM_THREADS: 4
[ INFO ] NUM_STREAMS: 2
[ INFO ] PERFORMANCE_HINT_NUM_REQUESTS: 8
[ INFO ] INFERENCE_PRECISION_HINT: undefined
[ INFO ] DEVICE_ID: 0
[Step 9/11] Creating infer requests and preparing input tensors
[ WARNING ] No supported image inputs found! Please check your file extensions: bmp, dib, jpeg, jpg, jpe, jp2, png, pbm, pgm, ppm, sr, ras, tiff, tif
[ INFO ] Test Config 0
[ INFO ] image_tensor ([N,C,H,W], u8, [1,3,300,300], static): random (image is expected)
[Step 10/11] Measuring performance (Start inference asynchronously, 8 inference requests using 2 streams for GPU, limits: 60000 ms duration)
[ INFO ] Benchmarking in inference only mode (inputs filling are not included in measurement loop).
[ INFO ] First inference took 11.73 ms
[Step 11/11] Dumping statistics report
[ INFO ] Count: 7112 iterations
[ INFO ] Duration: 60134.04 ms
[ INFO ] Latency:
[ INFO ] Median: 66.79 ms
[ INFO ] Average: 67.58 ms
[ INFO ] Min: 19.33 ms
[ INFO ] Max: 76.27 ms
[ INFO ] Throughput: 118.27 FPS
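If you save the console output to a file, for example by appending | tee benchmark_gpu.log to the command above (benchmark_gpu.log is just an illustrative name), the summary metrics can be pulled out later for comparison:
# Extract the summary statistics from a saved benchmark log
grep -E 'Count:|Duration:|Median:|Average:|Min:|Max:|Throughput:' benchmark_gpu.log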
Benchmark Report#
Sample execution results using an 11th Gen Intel® Core™ i7-1185GRE @ 2.80 GHz.
Metric                    | Value
--------------------------|----------
Read network time (ms)    | 89
Load network time (ms)    | 44714.68
First inference time (ms) | 10.01
Total execution time (ms) | 60066.11
Total num of iterations   | 9456
Latency (ms)              | 51.33
Throughput (FPS)          | 157.43
Note
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. No product or component can be absolutely secure. Performance varies by use, configuration and other factors. Learn more at Intel® Performance Index.
Troubleshooting#
For general robot issues, go to: Troubleshooting for Robot Tutorials.