&&&& RUNNING TensorRT.sample_onnx_mnist [TensorRT v8200] # D:\Lib\TensorRT-8.2.0.6\bin\sample_onnx_mnist.exe
[11/20/2021-16:47:05] [I] Building and running a GPU inference engine for Onnx MNIST
[11/20/2021-16:47:06] [I] [TRT] [MemUsageChange] Init CUDA: CPU +739, GPU +0, now: CPU 9213, GPU 1284 (MiB)
[11/20/2021-16:47:06] [I] [TRT] ----------------------------------------------------------------
[11/20/2021-16:47:06] [I] [TRT] Input filename: D:\Python\yolov4-pytorch\\yolov4_author_batch1_180.onnx
[11/20/2021-16:47:06] [I] [TRT] ONNX IR version: 0.0.6
[11/20/2021-16:47:06] [I] [TRT] Opset version: 11
[11/20/2021-16:47:06] [I] [TRT] Producer name: pytorch
[11/20/2021-16:47:06] [I] [TRT] Producer version: 1.8
[11/20/2021-16:47:06] [I] [TRT] Domain:
[11/20/2021-16:47:06] [I] [TRT] Model version: 0
[11/20/2021-16:47:06] [I] [TRT] Doc string:
[11/20/2021-16:47:06] [I] [TRT] ----------------------------------------------------------------
Run in FP16 mode.
Start buildEngineWithConfig
[11/20/2021-16:47:09] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 9607 MiB, GPU 1284 MiB
[11/20/2021-16:47:10] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.1 but loaded cuBLAS/cuBLAS LT 11.3.0
[11/20/2021-16:47:10] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +742, GPU +266, now: CPU 10356, GPU 1550 (MiB)
[11/20/2021-16:47:11] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +497, GPU +258, now: CPU 10853, GPU 1808 (MiB)
[11/20/2021-16:47:11] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[11/20/2021-16:47:11] [I] [TRT] Local timing cache in use. Profiling results in this builder pass will not be stored.
[11/20/2021-16:49:26] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[11/20/2021-17:29:04] [I] [TRT] [BlockAssignment] Algorithm Linear took 0.0088ms to assign 161 blocks to 161 nodes requiring 255066136 bytes.
[11/20/2021-17:29:04] [I] [TRT] Total Activation Memory: 255066136
[11/20/2021-17:29:04] [I] [TRT] Detected 1 inputs and 3 output network tensors.
[11/20/2021-17:29:05] [I] [TRT] Total Host Persistent Memory: 255296
[11/20/2021-17:29:05] [I] [TRT] Total Device Persistent Memory: 131863552
[11/20/2021-17:29:05] [I] [TRT] Total Scratch Memory: 0
[11/20/2021-17:29:05] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 137 MiB, GPU 16 MiB
[11/20/2021-17:29:05] [I] [TRT] [BlockAssignment] Algorithm ShiftNTopDown took 34.3711ms to assign 6 blocks to 160 nodes requiring 51118080 bytes.
[11/20/2021-17:29:05] [W] [TRT] TensorRT was linked against cuBLAS/cuBLAS LT 11.6.1 but loaded cuBLAS/cuBLAS LT 11.3.0
[11/20/2021-17:29:05] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +1, GPU +8, now: CPU 12310, GPU 2164 (MiB)
[11/20/2021-17:29:05] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +5, GPU +8, now: CPU 12315, GPU 2172 (MiB)
[11/20/2021-17:29:05] [W] [TRT] TensorRT was linked against cuDNN 8.2.1 but loaded cuDNN 8.0.5
[11/20/2021-17:29:05] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 12313 MiB, GPU 2138 MiB
Engine file generation time: 2.51581e+06 ms
Finish buildEngineWithConfig