To run with multiple GPUs, please change --gpus to match the number of GPUs available on your machine.

2022-08-29 20:07:59,395 [INFO] root: Registry: ['nvcr.io']
2022-08-29 20:07:59,451 [INFO] tlt.components.instance_handler.local_instance: Running command in container: nvcr.io/nvidia/tao/tao-toolkit-tf:v3.22.05-tf1.15.5-py3
2022-08-29 20:07:59,463 [WARNING] tlt.components.docker_handler.docker_handler: Docker will run the commands as root. If you would like to retain your local host permissions, please add the "user":"UID:GID" in the DockerOptions portion of the "/home/alexknish/.tao_mounts.json" file. You can obtain your user's UID and GID by using the "id -u" and "id -g" commands on the terminal.
Using TensorFlow backend.
WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.
/usr/local/lib/python3.6/dist-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.26.5) or chardet (3.0.4) doesn't match a supported version! RequestsDependencyWarning)
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py:42: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/scripts/train.py:45: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:153: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.
INFO: Log file already exists at /workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned/status.json
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/data_loader/generate_shape_tensors.py:8: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
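The docker_handler warning above is straightforward to act on: a DockerOptions entry in the ~/.tao_mounts.json file it names maps the container user to your host account. A minimal sketch, assuming UID/GID 1000:1000 (substitute the output of "id -u" and "id -g"; the Mounts paths are placeholders, not taken from this run):

    {
        "Mounts": [
            {
                "source": "/home/alexknish/tao-experiments",
                "destination": "/workspace/tao-experiments"
            }
        ],
        "DockerOptions": {
            "user": "1000:1000"
        }
    }

Similarly, the multi-GPU note at the top only changes the --gpus flag of the train command. A hedged example for a two-GPU machine; the spec path and $KEY are placeholders, while the -r directory matches the status.json path seen in this log:

    tao yolo_v4_tiny train --gpus 2 \
        -e /workspace/tao-experiments/yolo_v4_tiny/specs/yolo_v4_tiny_train.txt \
        -r /workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned \
        -k $KEY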
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/data_loader/generate_shape_tensors.py:8: The name tf.AUTO_REUSE is deprecated. Please use tf.compat.v1.AUTO_REUSE instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/data_loader/generate_shape_tensors.py:9: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/data_loader/generate_shape_tensors.py:55: The name tf.assign is deprecated. Please use tf.compat.v1.assign instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:1834: The name tf.nn.fused_batch_norm is deprecated. Please use tf.compat.v1.nn.fused_batch_norm instead.
WARNING:tensorflow:From /opt/nvidia/third_party/keras/tensorflow_backend.py:183: The name tf.nn.max_pool is deprecated. Please use tf.nn.max_pool2d instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:2018: The name tf.image.resize_nearest_neighbor is deprecated. Please use tf.compat.v1.image.resize_nearest_neighbor instead.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: -1
INFO: total dataset size 662, number of sources: 1, batch size per gpu: 20, steps: 34
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/converters/directives.py:119: The name tf.set_random_seed is deprecated. Please use tf.compat.v1.set_random_seed instead.
WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
INFO: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
INFO: shuffle: True - shard 0 of 1
INFO: sampling 1 datasets with weights:
INFO: source: 0 weight: 1.000000
WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
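(For reference, the steps counts in the INFO lines above are just the dataset size divided by the per-GPU batch size, rounded up: ceil(662 / 20) = 34 here, and ceil(260 / 8) = 33 for the second, unshuffled data source logged further below.)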
/opt/nvidia/third_party/keras/tensorflow_backend.py:356: UserWarning: Seed 42 from outer graph might be getting used by function Dataset_map__map_func_set_random_wrapper, if the random op has not been provided any seed. Explicitly set the seed in the function if this is not the intended behavior. self, _map_func_set_random_wrapper, num_parallel_calls=num_parallel_calls
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py:302: UserWarning: tf.data static optimizations are not compatible with tf.Variable. The following optimizations will be disabled: map_and_batch_fusion, noop_elimination, shuffle_and_repeat_fusion. To enable optimizations, use resource variables instead by calling `tf.enable_resource_variables()` at the start of the program. ", ".join(static_optimizations))
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v4/dataio/tf_data_pipe.py:131: The name tf.image.resize_images is deprecated. Please use tf.image.resize instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:174: The name tf.get_default_session is deprecated. Please use tf.compat.v1.get_default_session instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:190: The name tf.global_variables is deprecated. Please use tf.compat.v1.global_variables instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:199: The name tf.is_variable_initialized is deprecated. Please use tf.compat.v1.is_variable_initialized instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:206: The name tf.variables_initializer is deprecated. Please use tf.compat.v1.variables_initializer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3295: The name tf.log is deprecated. Please use tf.math.log instead.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:986: The name tf.assign_add is deprecated. Please use tf.compat.v1.assign_add instead.
INFO: Serial augmentation enabled = False
INFO: Pseudo sharding enabled = False
INFO: Max Image Dimensions (all sources): (0, 0)
INFO: number of cpus: 16, io threads: 32, compute threads: 16, buffered batches: -1
INFO: total dataset size 260, number of sources: 1, batch size per gpu: 8, steps: 33
WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
INFO: Bounding box coordinates were detected in the input specification! Bboxes will be automatically converted to polygon coordinates.
INFO: shuffle: False - shard 0 of 1
INFO: sampling 1 datasets with weights:
INFO: source: 0 weight: 1.000000
WARNING:tensorflow:Entity > could not be transformed and will be executed as-is. Please report this to the AutoGraph team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output. Cause: Unable to locate the source code of >. Note that functions defined in certain environments, like the interactive Python shell, do not expose their source code. If that is the case, you should define them in a .py source file. If you are certain the code is graph-compatible, wrap the call using @tf.autograph.do_not_convert. Original error: could not get source code
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py:292: UserWarning: No training configuration found in save file: the model was *not* compiled. Compile it manually. warnings.warn('No training configuration found in save file: '
INFO: Log file already exists at /workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned/status.json
__________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
==================================================================================================
Input (InputLayer)               (None, 3, None, None  0
conv_0 (Conv2D)                  (None, 32, None, Non  864         Input[0][0]
conv_0_bn (BatchNormalization)   (None, 32, None, Non  128         conv_0[0][0]
conv_0_mish (LeakyReLU)          (None, 32, None, Non  0           conv_0_bn[0][0]
conv_1 (Conv2D)                  (None, 64, None, Non  18432       conv_0_mish[0][0]
conv_1_bn (BatchNormalization)   (None, 64, None, Non  256         conv_1[0][0]
conv_1_mish (LeakyReLU)          (None, 64, None, Non  0           conv_1_bn[0][0]
conv_2_conv_0 (Conv2D)           (None, 64, None, Non  36864       conv_1_mish[0][0]
conv_2_conv_0_bn (BatchNormaliz  (None, 64, None, Non  256         conv_2_conv_0[0][0]
conv_2_conv_0_mish (LeakyReLU)   (None, 64, None, Non  0           conv_2_conv_0_bn[0][0]
conv_2_split_0 (Split)           (None, 32, None, Non  0           conv_2_conv_0_mish[0][0]
conv_2_conv_1 (Conv2D)           (None, 32, None, Non  9216        conv_2_split_0[0][0]
conv_2_conv_1_bn (BatchNormaliz  (None, 32, None, Non  128         conv_2_conv_1[0][0]
conv_2_conv_1_mish (LeakyReLU)   (None, 32, None, Non  0           conv_2_conv_1_bn[0][0]
conv_2_conv_2 (Conv2D)           (None, 32, None, Non  9216        conv_2_conv_1_mish[0][0]
conv_2_conv_2_bn (BatchNormaliz  (None, 32, None, Non  128         conv_2_conv_2[0][0]
conv_2_conv_2_mish (LeakyReLU)   (None, 32, None, Non  0           conv_2_conv_2_bn[0][0]
conv_2_concat_0 (Concatenate)    (None, 64, None, Non  0           conv_2_conv_2_mish[0][0]
                                                                   conv_2_conv_1_mish[0][0]
conv_2_conv_3 (Conv2D)           (None, 64, None, Non  4096        conv_2_concat_0[0][0]
conv_2_conv_3_bn (BatchNormaliz  (None, 64, None, Non  256         conv_2_conv_3[0][0]
conv_2_conv_3_mish (LeakyReLU)   (None, 64, None, Non  0           conv_2_conv_3_bn[0][0]
conv_2_concat_1 (Concatenate)    (None, 128, None, No  0           conv_2_conv_0_mish[0][0]
                                                                   conv_2_conv_3_mish[0][0]
conv_2_pool_0 (MaxPooling2D)     (None, 128, None, No  0           conv_2_concat_1[0][0]
conv_3_conv_0 (Conv2D)           (None, 128, None, No  147456      conv_2_pool_0[0][0]
conv_3_conv_0_bn (BatchNormaliz  (None, 128, None, No  512         conv_3_conv_0[0][0]
conv_3_conv_0_mish (LeakyReLU)   (None, 128, None, No  0           conv_3_conv_0_bn[0][0]
conv_3_split_0 (Split)           (None, 64, None, Non  0           conv_3_conv_0_mish[0][0]
conv_3_conv_1 (Conv2D)           (None, 64, None, Non  36864       conv_3_split_0[0][0]
conv_3_conv_1_bn (BatchNormaliz  (None, 64, None, Non  256         conv_3_conv_1[0][0]
conv_3_conv_1_mish (LeakyReLU)   (None, 64, None, Non  0           conv_3_conv_1_bn[0][0]
conv_3_conv_2 (Conv2D)           (None, 64, None, Non  36864       conv_3_conv_1_mish[0][0]
conv_3_conv_2_bn (BatchNormaliz  (None, 64, None, Non  256         conv_3_conv_2[0][0]
conv_3_conv_2_mish (LeakyReLU)   (None, 64, None, Non  0           conv_3_conv_2_bn[0][0]
conv_3_concat_0 (Concatenate)    (None, 128, None, No  0           conv_3_conv_2_mish[0][0]
                                                                   conv_3_conv_1_mish[0][0]
conv_3_conv_3 (Conv2D)           (None, 128, None, No  16384       conv_3_concat_0[0][0]
conv_3_conv_3_bn (BatchNormaliz  (None, 128, None, No  512         conv_3_conv_3[0][0]
conv_3_conv_3_mish (LeakyReLU)   (None, 128, None, No  0           conv_3_conv_3_bn[0][0]
conv_3_concat_1 (Concatenate)    (None, 256, None, No  0           conv_3_conv_0_mish[0][0]
                                                                   conv_3_conv_3_mish[0][0]
conv_3_pool_0 (MaxPooling2D)     (None, 256, None, No  0           conv_3_concat_1[0][0]
conv_4_conv_0 (Conv2D)           (None, 256, None, No  589824      conv_3_pool_0[0][0]
conv_4_conv_0_bn (BatchNormaliz  (None, 256, None, No  1024        conv_4_conv_0[0][0]
conv_4_conv_0_mish (LeakyReLU)   (None, 256, None, No  0           conv_4_conv_0_bn[0][0]
conv_4_split_0 (Split)           (None, 128, None, No  0           conv_4_conv_0_mish[0][0]
conv_4_conv_1 (Conv2D)           (None, 128, None, No  147456      conv_4_split_0[0][0]
conv_4_conv_1_bn (BatchNormaliz  (None, 128, None, No  512         conv_4_conv_1[0][0]
conv_4_conv_1_mish (LeakyReLU)   (None, 128, None, No  0           conv_4_conv_1_bn[0][0]
conv_4_conv_2 (Conv2D)           (None, 128, None, No  147456      conv_4_conv_1_mish[0][0]
conv_4_conv_2_bn (BatchNormaliz  (None, 128, None, No  512         conv_4_conv_2[0][0]
conv_4_conv_2_mish (LeakyReLU)   (None, 128, None, No  0           conv_4_conv_2_bn[0][0]
conv_4_concat_0 (Concatenate)    (None, 256, None, No  0           conv_4_conv_2_mish[0][0]
                                                                   conv_4_conv_1_mish[0][0]
conv_4_conv_3 (Conv2D)           (None, 256, None, No  65536       conv_4_concat_0[0][0]
conv_4_conv_3_bn (BatchNormaliz  (None, 256, None, No  1024        conv_4_conv_3[0][0]
conv_4_conv_3_mish (LeakyReLU)   (None, 256, None, No  0           conv_4_conv_3_bn[0][0]
conv_4_concat_1 (Concatenate)    (None, 512, None, No  0           conv_4_conv_0_mish[0][0]
                                                                   conv_4_conv_3_mish[0][0]
conv_4_pool_0 (MaxPooling2D)     (None, 512, None, No  0           conv_4_concat_1[0][0]
conv_5 (Conv2D)                  (None, 512, None, No  2359296     conv_4_pool_0[0][0]
conv_5_bn (BatchNormalization)   (None, 512, None, No  2048        conv_5[0][0]
conv_5_mish (LeakyReLU)          (None, 512, None, No  0           conv_5_bn[0][0]
yolo_conv1_1 (Conv2D)            (None, 256, None, No  131072      conv_5_mish[0][0]
yolo_conv1_1_bn (BatchNormaliza  (None, 256, None, No  1024        yolo_conv1_1[0][0]
yolo_conv1_1_lrelu (LeakyReLU)   (None, 256, None, No  0           yolo_conv1_1_bn[0][0]
yolo_conv2 (Conv2D)              (None, 128, None, No  32768       yolo_conv1_1_lrelu[0][0]
yolo_conv2_bn (BatchNormalizati  (None, 128, None, No  512         yolo_conv2[0][0]
yolo_conv2_lrelu (LeakyReLU)     (None, 128, None, No  0           yolo_conv2_bn[0][0]
upsample0 (UpSampling2D)         (None, 128, None, No  0           yolo_conv2_lrelu[0][0]
concatenate_2 (Concatenate)      (None, 384, None, No  0           upsample0[0][0]
                                                                   conv_4_conv_3_mish[0][0]
yolo_conv1_6 (Conv2D)            (None, 512, None, No  1179648     yolo_conv1_1_lrelu[0][0]
yolo_conv3_6 (Conv2D)            (None, 256, None, No  884736      concatenate_2[0][0]
yolo_conv1_6_bn (BatchNormaliza  (None, 512, None, No  2048        yolo_conv1_6[0][0]
yolo_conv3_6_bn (BatchNormaliza  (None, 256, None, No  1024        yolo_conv3_6[0][0]
yolo_conv1_6_lrelu (LeakyReLU)   (None, 512, None, No  0           yolo_conv1_6_bn[0][0]
yolo_conv3_6_lrelu (LeakyReLU)   (None, 256, None, No  0           yolo_conv3_6_bn[0][0]
conv_big_object (Conv2D)         (None, 33, None, Non  16929       yolo_conv1_6_lrelu[0][0]
conv_mid_object (Conv2D)         (None, 33, None, Non  8481        yolo_conv3_6_lrelu[0][0]
bg_permute (Permute)             (None, None, None, 3  0           conv_big_object[0][0]
md_permute (Permute)             (None, None, None, 3  0           conv_mid_object[0][0]
bg_reshape (Reshape)             (None, None, 11)      0           bg_permute[0][0]
md_reshape (Reshape)             (None, None, 11)      0           md_permute[0][0]
bg_anchor (YOLOAnchorBox)        (None, None, 6)       0           conv_big_object[0][0]
bg_bbox_processor (BBoxPostProc  (None, None, 11)      0           bg_reshape[0][0]
md_anchor (YOLOAnchorBox)        (None, None, 6)       0           conv_mid_object[0][0]
md_bbox_processor (BBoxPostProc  (None, None, 11)      0           md_reshape[0][0]
encoded_bg (Concatenate)         (None, None, 17)      0           bg_anchor[0][0]
                                                                   bg_bbox_processor[0][0]
encoded_md (Concatenate)         (None, None, 17)      0           md_anchor[0][0]
                                                                   md_bbox_processor[0][0]
encoded_detections (Concatenate  (None, None, 17)      0           encoded_bg[0][0]
                                                                   encoded_md[0][0]
==================================================================================================
Total params: 5,891,874
Trainable params: 5,885,666
Non-trainable params: 6,208
__________________________________________________________________________________________________
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/utils/tensor_utils.py:7: The name tf.local_variables_initializer is deprecated. Please use tf.compat.v1.local_variables_initializer instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/utils/tensor_utils.py:8: The name tf.tables_initializer is deprecated. Please use tf.compat.v1.tables_initializer instead.
WARNING:tensorflow:From /root/.cache/bazel/_bazel_root/ed34e6d125608f91724fda23656f1726/execroot/ai_infra/bazel-out/k8-fastbuild/bin/magnet/packages/iva/build_wheel.runfiles/ai_infra/iva/yolo_v3/utils/tensor_utils.py:9: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.
INFO: Starting Training Loop.
Epoch 1/5000: 2/83 - ETA: 14:41 - loss: 12710.5020
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (1.365228). Check your callbacks. % delta_t_median)
Epoch 1/5000: 83/83 - 137s 2s/step - loss: 12398.1065
4c7ff9edf27a:77:132 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.5<0>
4c7ff9edf27a:77:132 [0] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
4c7ff9edf27a:77:132 [0] NCCL INFO P2P plugin IBext
4c7ff9edf27a:77:132 [0] NCCL INFO NET/IB : No device found.
4c7ff9edf27a:77:132 [0] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.5<0>
4c7ff9edf27a:77:132 [0] NCCL INFO Using network Socket
NCCL version 2.11.4+cuda11.6
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 00/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 01/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 02/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 03/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 04/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 05/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 06/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 07/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 08/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 09/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 10/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 11/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 12/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 13/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 14/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 15/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 16/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 17/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 18/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 19/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 20/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 21/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 22/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 23/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 24/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 25/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 26/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 27/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 28/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 29/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 30/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Channel 31/32 : 0
4c7ff9edf27a:77:132 [0] NCCL INFO Trees [0] -1/-1/-1->0->-1 [1] -1/-1/-1->0->-1 [2] -1/-1/-1->0->-1 [3] -1/-1/-1->0->-1 [4] -1/-1/-1->0->-1 [5] -1/-1/-1->0->-1 [6] -1/-1/-1->0->-1 [7] -1/-1/-1->0->-1 [8] -1/-1/-1->0->-1 [9] -1/-1/-1->0->-1 [10] -1/-1/-1->0->-1 [11] -1/-1/-1->0->-1 [12] -1/-1/-1->0->-1 [13] -1/-1/-1->0->-1 [14] -1/-1/-1->0->-1 [15] -1/-1/-1->0->-1 [16] -1/-1/-1->0->-1 [17] -1/-1/-1->0->-1 [18] -1/-1/-1->0->-1 [19] -1/-1/-1->0->-1 [20] -1/-1/-1->0->-1 [21] -1/-1/-1->0->-1 [22] -1/-1/-1->0->-1 [23] -1/-1/-1->0->-1 [24] -1/-1/-1->0->-1 [25] -1/-1/-1->0->-1 [26] -1/-1/-1->0->-1 [27] -1/-1/-1->0->-1 [28] -1/-1/-1->0->-1 [29] -1/-1/-1->0->-1 [30] -1/-1/-1->0->-1 [31] -1/-1/-1->0->-1
4c7ff9edf27a:77:132 [0] NCCL INFO Connected all rings
4c7ff9edf27a:77:132 [0] NCCL INFO Connected all trees
4c7ff9edf27a:77:132 [0] NCCL INFO 32 coll channels, 32 p2p channels, 32 p2p channels per peer
4c7ff9edf27a:77:132 [0] NCCL INFO comm 0x7fcd847cc270 rank 0 nranks 1 cudaDev 0 busId 1000 - Init COMPLETE
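The NCCL block above also confirms the topology: "rank 0 nranks 1" means the communicator contains a single GPU, i.e. this run trained on one GPU. The INFO-level chatter itself is harmless; NCCL's verbosity is controlled by the standard NCCL_DEBUG environment variable, so, assuming the variable is visible inside the container, it can be quieted with:

    # Documented NCCL setting; WARN prints only actual problems
    export NCCL_DEBUG=WARN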
Epoch 2/5000: 83/83 - 121s 1s/step - loss: 15265.6744
Epoch 3/5000: 83/83 - 126s 2s/step - loss: 14793.9405
Epoch 4/5000: 83/83 - 146s 2s/step - loss: 17846.2810
Epoch 5/5000: 83/83 - 132s 2s/step - loss: 15980.1396
Epoch 6/5000: 83/83 - 112s 1s/step - loss: 11544.7244
Epoch 7/5000: 83/83 - 133s 2s/step - loss: 16378.1407
Epoch 8/5000: 83/83 - 117s 1s/step - loss: 11959.1695
Epoch 9/5000: 83/83 - 128s 2s/step - loss: 12267.0871
Epoch 10/5000: 83/83 - 139s 2s/step - loss: 13654.4403
Epoch 11/5000: 83/83 - 128s 2s/step - loss: 14408.9538
Epoch 12/5000: 83/83 - 131s 2s/step - loss: 11331.6713
Epoch 13/5000: 83/83 - 108s 1s/step - loss: 8661.9793
Epoch 14/5000: 83/83 - 111s 1s/step - loss: 8735.2296
Epoch 15/5000: 83/83 - 121s 1s/step - loss: 9798.7786
Epoch 16/5000: 83/83 - 126s 2s/step - loss: 8772.4136
Epoch 17/5000: 83/83 - 127s 2s/step - loss: 8087.9690
Epoch 18/5000: 83/83 - 132s 2s/step - loss: 8138.8316
Epoch 19/5000: 83/83 - 102s 1s/step - loss: 7857.7488
Epoch 20/5000: 83/83 - 117s 1s/step - loss: 5288.7733
Epoch 21/5000: 83/83 - 117s 1s/step - loss: 7267.4274
Epoch 22/5000: 83/83 - 122s 1s/step - loss: 6803.8384
Epoch 23/5000: 83/83 - 106s 1s/step - loss: 4753.9337
Epoch 24/5000: 83/83 - 116s 1s/step - loss: 5136.7879
Epoch 25/5000: 83/83 - 111s 1s/step - loss: 5493.7028
Epoch 26/5000: 83/83 - 93s 1s/step - loss: 5249.6733
Epoch 27/5000: 83/83 - 129s 2s/step - loss: 5695.3268
Epoch 28/5000: 83/83 - 108s 1s/step - loss: 4542.8570
Epoch 29/5000: 83/83 - 129s 2s/step - loss: 5126.4654
Epoch 30/5000: 83/83 - 117s 1s/step - loss: 3799.6377
Epoch 31/5000: 83/83 - 143s 2s/step - loss: 4677.1346
Epoch 32/5000: 83/83 - 126s 2s/step - loss: 4114.9535
Epoch 33/5000: 83/83 - 105s 1s/step - loss: 3051.0272
Epoch 34/5000: 83/83 - 129s 2s/step - loss: 3503.5695
Epoch 35/5000: 83/83 - 95s 1s/step - loss: 3034.4759
Epoch 36/5000: 83/83 - 136s 2s/step - loss: 4217.8427
Epoch 37/5000: 1/83 - ETA: 38s - loss: 4737.6479
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.473203). Check your callbacks. % delta_t_median)
Epoch 37/5000: 83/83 - 129s 2s/step - loss: 3271.2251
Epoch 38/5000: 83/83 - 128s 2s/step - loss: 3098.7845
Epoch 39/5000: 83/83 - 132s 2s/step - loss: 3596.0278
Epoch 40/5000: 83/83 - 108s 1s/step - loss: 3233.4570
Epoch 41/5000: 83/83 - 119s 1s/step - loss: 2927.8716
Epoch 42/5000: 83/83 - 121s 1s/step - loss: 2764.3816
Epoch 43/5000: 83/83 - 122s 1s/step - loss: 3414.7361
Epoch 44/5000: 83/83 - 133s 2s/step - loss: 2879.3942
Epoch 45/5000: 83/83 - 110s 1s/step - loss: 2315.7608
Epoch 46/5000: 83/83 - 103s 1s/step - loss: 2432.8093
Epoch 47/5000: 83/83 - 117s 1s/step - loss: 2374.0634
Epoch 48/5000: 83/83 - 125s 2s/step - loss: 2874.9952
Epoch 49/5000: 83/83 - 100s 1s/step - loss: 2190.8406
Epoch 50/5000: 83/83 - 107s 1s/step - loss: 2232.5123
Epoch 51/5000: 83/83 - 130s 2s/step - loss: 2351.0396
Epoch 52/5000: 83/83 - 105s 1s/step - loss: 1734.0688
Epoch 53/5000: 83/83 - 85s 1s/step - loss: 1800.6466
Epoch 54/5000: 83/83 - 108s 1s/step - loss: 1827.3820
Epoch 55/5000: 83/83 - 121s 1s/step - loss: 2275.5785
Epoch 56/5000: 83/83 - 101s 1s/step - loss: 1463.1735
Epoch 57/5000: 83/83 - 119s 1s/step - loss: 1880.0658
Epoch 58/5000: 83/83 - 117s 1s/step - loss: 1544.9079
Epoch 59/5000: 83/83 - 115s 1s/step - loss: 2004.9750
Epoch 60/5000: 83/83 - 119s 1s/step - loss: 1809.0794
Epoch 61/5000: 83/83 - 98s 1s/step - loss: 1521.3681
Epoch 62/5000: 83/83 - 108s 1s/step - loss: 1525.1027
Epoch 63/5000: 83/83 - 95s 1s/step - loss: 1392.5238
Epoch 64/5000: 83/83 - 98s 1s/step - loss: 1582.1694
Epoch 65/5000: 83/83 - 102s 1s/step - loss: 1632.9866
Epoch 66/5000: 83/83 - 124s 1s/step - loss: 1484.8646
Epoch 67/5000: 83/83 - 115s 1s/step - loss: 1714.1005
Epoch 68/5000: 83/83 - 114s 1s/step - loss: 1508.3418
Epoch 69/5000: 83/83 - 107s 1s/step - loss: 1180.3545
Epoch 70/5000: 83/83 - 98s 1s/step - loss: 1280.4054
Epoch 71/5000: 83/83 - 91s 1s/step - loss: 1222.9809
Epoch 72/5000: 83/83 - 92s 1s/step - loss: 1370.2211
Epoch 73/5000: 1/83 - ETA: 41s - loss: 1494.2764
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.532754). Check your callbacks. % delta_t_median)
Epoch 73/5000: 83/83 - 94s 1s/step - loss: 1090.1139
Epoch 74/5000: 1/83 - ETA: 36s - loss: 1467.2612
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.522248). Check your callbacks. % delta_t_median)
Epoch 74/5000: 83/83 - 96s 1s/step - loss: 1085.7539
Epoch 75/5000: 83/83 - 103s 1s/step - loss: 1077.9666
Epoch 76/5000: 83/83 - 98s 1s/step - loss: 1246.7316
Epoch 77/5000: 83/83 - 84s 1s/step - loss: 1154.8009
Epoch 78/5000: 83/83 - 112s 1s/step - loss: 1192.7518
Epoch 79/5000: 83/83 - 91s 1s/step - loss: 980.9180
Epoch 80/5000: 83/83 - 107s 1s/step - loss: 928.7656
Epoch 81/5000: 83/83 - 92s 1s/step - loss: 870.8427
Epoch 82/5000: 83/83 - 90s 1s/step - loss: 935.1182
Epoch 83/5000: 83/83 - 117s 1s/step - loss: 1010.3532
Epoch 84/5000: 83/83 - 89s 1s/step - loss: 866.7936
Epoch 85/5000: 83/83 - 113s 1s/step - loss: 1136.3068
Epoch 86/5000: 83/83 - 104s 1s/step - loss: 955.8329
Epoch 87/5000: 83/83 - 95s 1s/step - loss: 725.9081
Epoch 88/5000: 83/83 - 95s 1s/step - loss: 959.9510
Epoch 89/5000: 83/83 - 114s 1s/step - loss: 866.8557
Epoch 90/5000: 83/83 - 120s 1s/step - loss: 719.4310
Epoch 91/5000: 83/83 - 97s 1s/step - loss: 735.1600
Epoch 92/5000: 83/83 - 114s 1s/step - loss: 869.5980
Epoch 93/5000: 83/83 - 98s 1s/step - loss: 714.0628
Epoch 94/5000: 83/83 - 102s 1s/step - loss: 915.7860
Epoch 95/5000: 83/83 - 105s 1s/step - loss: 769.7096
Epoch 96/5000: 83/83 - 103s 1s/step - loss: 677.0479
Epoch 97/5000: 83/83 - 121s 1s/step - loss: 804.5204
Epoch 98/5000: 83/83 - 89s 1s/step - loss: 538.6006
Epoch 99/5000: 83/83 - 91s 1s/step - loss: 550.1122
Epoch 100/5000: 83/83 - 115s 1s/step - loss: 657.6963
Epoch 101/5000: 83/83 - 93s 1s/step - loss: 611.8604
Epoch 102/5000: 83/83 - 98s 1s/step - loss: 638.0953
Epoch 103/5000: 83/83 - 95s 1s/step - loss: 548.6527
Epoch 104/5000: 83/83 - 98s 1s/step - loss: 550.3563
Epoch 105/5000: 83/83 - 89s 1s/step - loss: 513.0715
Epoch 106/5000: 83/83 - 93s 1s/step - loss: 536.9277
Epoch 107/5000: 83/83 - 120s 1s/step - loss: 545.2270
Epoch 108/5000: 83/83 - 99s 1s/step - loss: 473.9657
Epoch 109/5000: 83/83 - 97s 1s/step - loss: 507.0017
Epoch 110/5000: 83/83 - 106s 1s/step - loss: 498.9058
Epoch 111/5000: 83/83 - 90s 1s/step - loss: 394.7878
Epoch 112/5000: 83/83 - 130s 2s/step - loss: 527.5291
Epoch 113/5000: 83/83 - 95s 1s/step - loss: 539.9469
Epoch 114/5000: 83/83 - 90s 1s/step - loss: 416.6743
Epoch 115/5000: 83/83 - 90s 1s/step - loss: 367.1502
Epoch 116/5000: 83/83 - 90s 1s/step - loss: 429.5203
Epoch 117/5000: 83/83 - 116s 1s/step - loss: 449.4038
Epoch 118/5000: 83/83 - 96s 1s/step - loss: 369.8847
Epoch 119/5000: 83/83 - 104s 1s/step - loss: 341.8989
Epoch 120/5000: 83/83 - 112s 1s/step - loss: 367.0283
Epoch 121/5000: 83/83 - 95s 1s/step - loss: 320.6785
Epoch 122/5000: 83/83 - 95s 1s/step - loss: 311.5415
Epoch 123/5000: 83/83 - 92s 1s/step - loss: 295.0144
Epoch 124/5000: 83/83 - 90s 1s/step - loss: 296.5388
Epoch 125/5000: 83/83 - 112s 1s/step - loss: 347.2858
Epoch 126/5000: 83/83 - 98s 1s/step - loss: 326.3743
Epoch 127/5000: 83/83 - 114s 1s/step - loss: 306.2084
Epoch 128/5000: 83/83 - 100s 1s/step - loss: 297.9684
Epoch 129/5000: 83/83 - 90s 1s/step - loss: 311.1759
Epoch 130/5000: 83/83 - 98s 1s/step - loss: 263.7128
Epoch 131/5000: 83/83 - 82s 985ms/step - loss: 264.6633
Epoch 132/5000: 83/83 - 97s 1s/step - loss: 268.1432
Epoch 133/5000: 83/83 - 77s 927ms/step - loss: 261.3628
Epoch 134/5000: 83/83 - 105s 1s/step - loss: 229.4112
Epoch 135/5000: 83/83 - 88s 1s/step - loss: 207.7338
Epoch 136/5000: 83/83 - 106s 1s/step - loss: 234.7611
Epoch 137/5000: 83/83 - 92s 1s/step - loss: 216.0250
Epoch 138/5000: 83/83 - 105s 1s/step - loss: 196.7560
Epoch 139/5000: 83/83 - 95s 1s/step - loss: 210.3635
Epoch 140/5000: 83/83 - 87s 1s/step - loss: 177.5004
Epoch 141/5000: 83/83 - 94s 1s/step - loss: 170.8008
Epoch 142/5000: 83/83 - 98s 1s/step - loss: 189.9218
Epoch 143/5000: 83/83 - 99s 1s/step - loss: 168.5321
Epoch 144/5000: 83/83 - 85s 1s/step - loss: 185.8155
Epoch 145/5000: 83/83 - 79s 953ms/step - loss: 181.1816
Epoch 146/5000: 83/83 - 89s 1s/step - loss: 165.5687
Epoch 147/5000: 83/83 - 95s 1s/step - loss: 169.8106
Epoch 148/5000: 83/83 - 78s 942ms/step - loss: 134.2400
Epoch 149/5000: 83/83 - 116s 1s/step - loss: 163.8508
Epoch 150/5000: 83/83 - 104s 1s/step - loss: 147.2635
Epoch 151/5000: 83/83 - 106s 1s/step - loss: 154.3167
Epoch 152/5000: 83/83 - 113s 1s/step - loss: 150.1823
Epoch 153/5000: 1/83 - ETA: 40s - loss: 173.7841
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.512343). Check your callbacks. % delta_t_median)
  % delta_t_median)
83/83 [==============================] - 92s 1s/step - loss: 116.6175
INFO: Training loop in progress
Epoch 154/5000
83/83 [==============================] - 97s 1s/step - loss: 124.8001
INFO: Training loop in progress
Epoch 155/5000
83/83 [==============================] - 91s 1s/step - loss: 147.4269
INFO: Training loop in progress
Epoch 156/5000
83/83 [==============================] - 90s 1s/step - loss: 115.9906
INFO: Training loop in progress
Epoch 157/5000
83/83 [==============================] - 91s 1s/step - loss: 114.9483
INFO: Training loop in progress
Epoch 158/5000
83/83 [==============================] - 89s 1s/step - loss: 121.0786
INFO: Training loop in progress
Epoch 159/5000
83/83 [==============================] - 106s 1s/step - loss: 125.2021
INFO: Training loop in progress
Epoch 160/5000
83/83 [==============================] - 74s 891ms/step - loss: 113.6659
INFO: Training loop in progress
Epoch 161/5000
83/83 [==============================] - 113s 1s/step - loss: 122.0314
INFO: Training loop in progress
Epoch 162/5000
83/83 [==============================] - 99s 1s/step - loss: 112.2857
INFO: Training loop in progress
Epoch 163/5000
83/83 [==============================] - 85s 1s/step - loss: 99.1985
INFO: Training loop in progress
Epoch 164/5000
83/83 [==============================] - 80s 965ms/step - loss: 94.6075
INFO: Training loop in progress
Epoch 165/5000
83/83 [==============================] - 108s 1s/step - loss: 96.5499
INFO: Training loop in progress
Epoch 166/5000
83/83 [==============================] - 89s 1s/step - loss: 95.1305
INFO: Training loop in progress
Epoch 167/5000
83/83 [==============================] - 94s 1s/step - loss: 100.1285
INFO: Training loop in progress
Epoch 168/5000
83/83 [==============================] - 76s 921ms/step - loss: 94.3105
INFO: Training loop in progress
Epoch 169/5000
83/83 [==============================] - 97s 1s/step - loss: 93.1753
INFO: Training loop in progress
Epoch 170/5000
83/83 [==============================] - 101s 1s/step - loss: 86.1731
INFO: Training loop in progress
Epoch 171/5000
83/83 [==============================] - 82s 984ms/step - loss: 80.9342
INFO: Training loop in progress
Epoch 172/5000
83/83 [==============================] - 79s 948ms/step - loss: 83.6847
INFO: Training loop in progress
Epoch 173/5000
83/83 [==============================] - 98s 1s/step - loss: 87.9784
INFO: Training loop in progress
Epoch 174/5000
83/83 [==============================] - 87s 1s/step - loss: 85.8837
INFO: Training loop in progress
Epoch 175/5000
83/83 [==============================] - 85s 1s/step - loss: 83.5836
INFO: Training loop in progress
Epoch 176/5000
83/83 [==============================] - 82s 992ms/step - loss: 74.3695
INFO: Training loop in progress
Epoch 177/5000
83/83 [==============================] - 99s 1s/step - loss: 74.4122
INFO: Training loop in progress
Epoch 178/5000
83/83 [==============================] - 81s 974ms/step - loss: 70.5649
INFO: Training loop in progress
Epoch 179/5000
83/83 [==============================] - 85s 1s/step - loss: 73.7532
INFO: Training loop in progress
Epoch 180/5000
83/83 [==============================] - 85s 1s/step - loss: 68.5900
INFO: Training loop in progress
Epoch 181/5000
83/83 [==============================] - 115s 1s/step - loss: 73.0512
INFO: Training loop in progress
Epoch 182/5000
83/83 [==============================] - 82s 989ms/step - loss: 63.1558
INFO: Training loop in progress
Epoch 183/5000
83/83 [==============================] - 93s 1s/step - loss: 68.0476
INFO: Training loop in progress
Epoch 184/5000
83/83 [==============================] - 90s 1s/step - loss: 63.4033
INFO: Training loop in progress
Epoch 185/5000
83/83 [==============================] - 100s 1s/step - loss: 65.0072
INFO: Training loop in progress
Epoch 186/5000
83/83 [==============================] - 88s 1s/step - loss: 61.3231
INFO: Training loop in progress
Epoch 187/5000
83/83 [==============================] - 87s 1s/step - loss: 60.4128
INFO: Training loop in progress
Epoch 188/5000
83/83 [==============================] - 98s 1s/step - loss: 66.4245
INFO: Training loop in progress
Epoch 189/5000
83/83 [==============================] - 80s 962ms/step - loss: 55.7736
INFO: Training loop in progress
Epoch 190/5000
83/83 [==============================] - 99s 1s/step - loss: 56.5062
INFO: Training loop in progress
Epoch 191/5000
83/83 [==============================] - 75s 901ms/step - loss: 53.4407
INFO: Training loop in progress
Epoch 192/5000
83/83 [==============================] - 79s 947ms/step - loss: 57.1764
INFO: Training loop in progress
Epoch 193/5000
83/83 [==============================] - 101s 1s/step - loss: 51.1663
INFO: Training loop in progress
Epoch 194/5000
83/83 [==============================] - 81s 975ms/step - loss: 50.3608
INFO: Training loop in progress
Epoch 195/5000
83/83 [==============================] - 107s 1s/step - loss: 52.9982
INFO: Training loop in progress
Epoch 196/5000
83/83 [==============================] - 103s 1s/step - loss: 49.6794
INFO: Training loop in progress
Epoch 197/5000
83/83 [==============================] - 94s 1s/step - loss: 49.8056
INFO: Training loop in progress
Epoch 198/5000
83/83 [==============================] - 78s 941ms/step - loss: 47.3493
INFO: Training loop in progress
Epoch 199/5000
83/83 [==============================] - 115s 1s/step - loss: 51.6163
INFO: Training loop in progress
Epoch 200/5000
83/83 [==============================] - 98s 1s/step - loss: 48.1429
INFO: Training loop in progress
Epoch 201/5000
83/83 [==============================] - 82s 988ms/step - loss: 48.9617
INFO: Training loop in progress
Epoch 202/5000
83/83 [==============================] - 101s 1s/step - loss: 44.8695
INFO: Training loop in progress
Epoch 203/5000
83/83 [==============================] - 92s 1s/step - loss: 43.8990
INFO: Training loop in progress
Epoch 204/5000
83/83 [==============================] - 77s 929ms/step - loss: 44.8837
INFO: Training loop in progress
Epoch 205/5000
83/83 [==============================] - 80s 962ms/step - loss: 44.0750
INFO: Training loop in progress
Epoch 206/5000
83/83 [==============================] - 107s 1s/step - loss: 45.5007
INFO: Training loop in progress
Epoch 207/5000
83/83 [==============================] - 77s 925ms/step - loss: 43.0572
INFO: Training loop in progress
Epoch 208/5000
83/83 [==============================] - 79s 947ms/step - loss: 42.0789
INFO: Training loop in progress
Epoch 209/5000
83/83 [==============================] - 76s 911ms/step - loss: 43.5815
INFO: Training loop in progress
Epoch 210/5000
83/83 [==============================] - 109s 1s/step - loss: 41.0844
INFO: Training loop in progress
Epoch 211/5000
83/83 [==============================] - 96s 1s/step - loss: 39.5998
INFO: Training loop in progress
Epoch 212/5000
1/83 [..............................] - ETA: 35s - loss: 31.2878
/usr/local/lib/python3.6/dist-packages/keras/callbacks.py:122: UserWarning: Method on_batch_end() is slow compared to the batch update (0.521688). Check your callbacks.
  % delta_t_median)
83/83 [==============================] - 108s 1s/step - loss: 40.0560
INFO: Training loop in progress
Epoch 213/5000
83/83 [==============================] - 90s 1s/step - loss: 42.4123
INFO: Training loop in progress
Epoch 214/5000
83/83 [==============================] - 84s 1s/step - loss: 40.0125
INFO: Training loop in progress
Epoch 215/5000
68/83 [=======================>......] - ETA: 13s - loss: 38.2437
2022-08-29 23:16:48.877772: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:273] Unexpected Event status: 1
[4c7ff9edf27a:00077] *** Process received signal ***
[4c7ff9edf27a:00077] Signal: Aborted (6)
[4c7ff9edf27a:00077] Signal code: (-6)
[4c7ff9edf27a:00077] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x46210)[0x7fce6cc4c210]
[4c7ff9edf27a:00077] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb)[0x7fce6cc4c18b]
[4c7ff9edf27a:00077] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b)[0x7fce6cc2b859]
[4c7ff9edf27a:00077] [ 3] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0xc1b1788)[0x7fcdf10c8788]
[4c7ff9edf27a:00077] [ 4] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0x235cb2a)[0x7fcde7273b2a]
[4c7ff9edf27a:00077] [ 5] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow8EventMgr8PollLoopEv+0xbb)[0x7fcdeec6f2db]
[4c7ff9edf27a:00077] [ 6] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x28d)[0x7fcde432ae6d]
[4c7ff9edf27a:00077] [ 7] /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/../libtensorflow_framework.so.1(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x4c)[0x7fcde432797c]
[4c7ff9edf27a:00077] [ 8] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xd6de4)[0x7fce677f6de4]
[4c7ff9edf27a:00077] [ 9] /usr/lib/x86_64-linux-gnu/libpthread.so.0(+0x9609)[0x7fce6cbec609]
[4c7ff9edf27a:00077] [10] /usr/lib/x86_64-linux-gnu/libc.so.6(clone+0x43)[0x7fce6cd28293]
[4c7ff9edf27a:00077] *** End of error message ***
2022-08-30 02:16:52,501 [INFO] tlt.components.docker_handler.docker_handler: Stopping container.

To resume from checkpoint, please change pretrain_model_path to resume_model_path in config file.
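As a minimal sketch of that resume change: the training_config section of the spec file stops pointing at the pretrained backbone and instead points at the last saved checkpoint. The checkpoint file name and epoch number below are illustrative, not taken from this log; use whatever the newest .tlt file under experiment_dir_unpruned/weights actually is (checkpoints are only written every checkpoint_interval epochs, so the last one may predate the crash).

    training_config {
      ...
      # was: pretrain_model_path: "<path to pretrained cspdarknet_tiny weights>"
      resume_model_path: "/workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned/weights/yolov4_cspdarknet_tiny_epoch_210.tlt"
      ...
    }

Re-running the original train command against the same results directory should then continue from the checkpointed epoch rather than restarting at epoch 1, along the lines of (spec path and key are placeholders):

    tao yolo_v4 train --gpus 1 \
      -e <path to training spec> \
      -r /workspace/tao-experiments/yolo_v4_tiny/experiment_dir_unpruned \
      -k <encryption key>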