===================================
== NVIDIA NIM for Text Reranking ==
===================================

NVIDIA Release 1.3.0
Model: nvidia/llama-3.2-nv-rerankqa-1b-v2

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement:
https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement:
https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/

Third Party Software Attributions and Licenses can be found under /opt/nim/NOTICE.

NOTE: CUDA Forward Compatibility mode ENABLED.
      Using CUDA 12.6 driver version 560.35.03 with kernel driver version 535.54.03.
      See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Overriding NIM_LOG_LEVEL: replacing NIM_LOG_LEVEL=unset with NIM_LOG_LEVEL=INFO
Running automatic profile selection: NIM_MANIFEST_PROFILE is not set
Selected profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df

OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation.
(This OTEL warning is emitted once per worker process; the repetitions are omitted below.)

2025-02-04T09:33:03Z INFO: nimlib.profiles - Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-04T09:33:03Z INFO: nimlib.nim_sdk - Using the profile specified by the user: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-04T09:33:03Z INFO: nimlib.nim_sdk - Downloading manifest profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
(Each timestamped entry is also mirrored as a structured JSON record; the JSON duplicates are omitted throughout.)
2025-02-04T09:33:03.805796Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
(The "ngc configured" entry repeats before each download below; repetitions are omitted.)
2025-02-04T09:33:05.531136Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer_config.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/621272f7a30c1e5dceee2b9a6106be03"
2025-02-04T09:34:03.451601Z INFO nim_hub_ngc::api::tokio: Downloaded filename: model.plan to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/43082401df8829c5a24b26a7730c2310-5"
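For reference, a startup like the one above is typically produced by launching the container with an NGC API key and, optionally, the profile pinned so that automatic selection is skipped. The sketch below is illustrative only: the image tag, cache mount, and key handling are assumptions, not taken from this log; only the profile hash, NIM_MODEL_PROFILE, and the OTEL variable named in the warning appear above.

    # Illustrative launch sketch (assumptions: docker CLI available, NGC_API_KEY
    # set in the host environment, image tag inferred from the release banner).
    import os
    import subprocess

    profile = "ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
    subprocess.run([
        "docker", "run", "--rm", "--gpus", "all",
        "-e", f"NGC_API_KEY={os.environ['NGC_API_KEY']}",   # authenticates the NGC downloads
        "-e", f"NIM_MODEL_PROFILE={profile}",               # pin the profile selected above
        "-e", "OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true",  # addresses the OTEL warning
        "-v", os.path.expanduser("~/.cache/nim") + ":/opt/nim/.cache",  # persist downloaded blobs
        "-p", "8000:8000",
        "nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0",  # assumed tag
    ], check=True)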
2025-02-04T09:34:04.675884Z INFO nim_hub_ngc::api::tokio: Downloaded filename: special_tokens_map.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/16d301f27f8ec48865d30f0a749187e5"
2025-02-04T09:34:05.641083Z INFO nim_hub_ngc::api::tokio: Downloaded filename: metadata.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/ea40d6eaaa8cb9af698b83691e2622b9"
2025-02-04T09:34:06.489948Z INFO nim_hub_ngc::api::tokio: Downloaded filename: checksums.blake3 to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/47cfb34576b948532057dcdb94df6add"
2025-02-04T09:34:07.632569Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/4e69b4b7c83649358684b311b7eddf93"
2025-02-04T09:34:07Z INFO: nimlib.nim_sdk - Using the workspace specified during init: /opt/nim/workspace
2025-02-04T09:34:07Z INFO: nimlib.nim_sdk - Materializing workspace to: /opt/nim/workspace

There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
(This warning recurs for each tokenizer worker; the repetitions are omitted below.)

[02/04/2025-09:34:10] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
Adding extra output token_count to ensemble. Not consumed by next model.

[2025-02-04 09:34:11 +0000] [492] [INFO] Starting gunicorn 23.0.0
[2025-02-04 09:34:11 +0000] [492] [INFO] Listening at: http://0.0.0.0:8000 (492)
[2025-02-04 09:34:11 +0000] [492] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-02-04 09:34:11 +0000] [495] [INFO] Booting worker with pid: 495
[2025-02-04 09:34:11 +0000] [496] [INFO] Booting worker with pid: 496
[2025-02-04 09:34:11 +0000] [497] [INFO] Booting worker with pid: 497
[2025-02-04 09:34:11 +0000] [498] [INFO] Booting worker with pid: 498
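The TRANSFORMERS_CACHE warning above is emitted by the Hugging Face transformers library when its default cache directory (/.cache/huggingface/hub) is not writable. A minimal sketch of the fix the message itself suggests, assuming a process whose environment you control (for the container, the same variable can be passed with docker's -e flag); the path below is an assumption, any writable directory works:

    # Point the Hugging Face cache at a writable directory *before* the
    # transformers library is imported, as the warning above suggests.
    import os

    os.environ["TRANSFORMERS_CACHE"] = "/opt/nim/.cache/huggingface"  # assumed path

    from transformers import AutoTokenizer  # now caches under the directory above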
[2025-02-04 09:34:11 +0000] [499] [INFO] Booting worker with pid: 499
[2025-02-04 09:34:11 +0000] [500] [INFO] Booting worker with pid: 500
[2025-02-04 09:34:11 +0000] [501] [INFO] Booting worker with pid: 501
I0204 09:34:11.331979 491 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f5c7c000000' with size 268435456"
I0204 09:34:11.338409 491 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0204 09:34:11.349122 491 model_lifecycle.cc:472] "loading: _nvidia_llama_3_2_nv_rerankqa_1b_v2_model:1"
I0204 09:34:11.349183 491 model_lifecycle.cc:472] "loading: _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer:1"
[2025-02-04 09:34:11 +0000] [509] [INFO] Booting worker with pid: 509
I0204 09:34:11.371516 491 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0204 09:34:11.371551 491 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0204 09:34:11.371556 491 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0204 09:34:11.371562 491 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0204 09:34:11.375433 491 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: _nvidia_llama_3_2_nv_rerankqa_1b_v2_model (version 1)"
[2025-02-04 09:34:11 +0000] [585] [INFO] Booting worker with pid: 585
[2025-02-04 09:34:11 +0000] [588] [INFO] Booting worker with pid: 588
[2025-02-04 09:34:11 +0000] [779] [INFO] Booting worker with pid: 779
[2025-02-04 09:34:11 +0000] [781] [INFO] Booting worker with pid: 781
[2025-02-04 09:34:11 +0000] [782] [INFO] Booting worker with pid: 782
[2025-02-04 09:34:11 +0000] [785] [INFO] Booting worker with pid: 785
[2025-02-04 09:34:11 +0000] [912] [INFO] Booting worker with pid: 912
[2025-02-04 09:34:11 +0000] [977] [INFO] Booting worker with pid: 977

(Each of the 16 workers then logs the same startup sequence; one representative copy follows, with the per-worker repetitions omitted.)
2025-02-04T09:34:11Z INFO: nimlib.nim_inference_api_builder.otel - No OTel configuration file found at expected path: /etc/nim/config/otel.yaml
2025-02-04T09:34:12Z INFO: nimlib.profiles - Registered custom profile selectors: []
2025-02-04T09:34:12Z INFO: nimlib.profiles - selector used: EnvProfileSelector for manifest
2025-02-04T09:34:12Z INFO: nimlib.profiles - Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-04T09:34:12Z INFO: root - Loading service config from /opt/nim/tmp/run/triton-model-repository/service_config.yaml
[2025-02-04 09:34:12 +0000] [495] [INFO] Started server process [495]
[2025-02-04 09:34:12 +0000] [495] [INFO] Waiting for application startup.
[2025-02-04 09:34:12 +0000] [495] [INFO] Application startup complete.
(All 16 workers, PIDs 495, 496, 497, 498, 499, 500, 501, 509, 585, 588, 779, 781, 782, 785, 912, and 977, report "Application startup complete" within 09:34:12.)
I0204 09:34:13.427197 491 logging.cc:46] "Loaded engine size: 2364 MiB"
W0204 09:34:13.439301 491 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I0204 09:34:13.787891 491 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_0 (CPU device 0)"
(The tokenizer model is initialized with 16 CPU instances; the matching entries for _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_1 through _0_15 are omitted.)
"TRITONBACKEND_ModelInstanceInitialize: _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_15 (CPU device 0)" I0204 09:34:13.800916 491 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: _nvidia_llama_3_2_nv_rerankqa_1b_v2_model_0_0 (GPU device 0)" There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. 
(Each of the 16 tokenizer instances logs the same initialization; one representative copy follows, with the model_config JSON reformatted for readability.)

2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args:
  model_instance_kind:      CPU
  model_instance_name:      _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_5
  model_instance_device_id: 0
  model_repository:         /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer
  model_version:            1
  model_name:               _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer
  model_config:
  {
    "name": "_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer",
    "platform": "",
    "backend": "python",
    "runtime": "",
    "version_policy": {"latest": {"num_versions": 1}},
    "max_batch_size": 0,
    "input": [
      {"name": "QUERY",    "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [-1], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false, "is_non_linear_format_io": false},
      {"name": "PASSAGE",  "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [-1], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false, "is_non_linear_format_io": false},
      {"name": "TRUNCATE", "data_type": "TYPE_STRING", "format": "FORMAT_NONE", "dims": [-1], "is_shape_tensor": false, "allow_ragged_batch": false, "optional": false, "is_non_linear_format_io": false}
    ],
    "output": [
      {"name": "input_ids",      "data_type": "TYPE_INT64", "dims": [-1], "label_filename": "", "is_shape_tensor": false, "is_non_linear_format_io": false},
      {"name": "attention_mask", "data_type": "TYPE_INT64", "dims": [-1], "label_filename": "", "is_shape_tensor": false, "is_non_linear_format_io": false},
      {"name": "token_count",    "data_type": "TYPE_INT32", "dims": [1],  "label_filename": "", "is_shape_tensor": false, "is_non_linear_format_io": false}
    ],
    "batch_input": [],
    "batch_output": [],
    "optimization": {
      "priority": "PRIORITY_DEFAULT",
      "input_pinned_memory": {"enable": true},
      "output_pinned_memory": {"enable": true},
      "gather_kernel_buffer_threshold": 0,
      "eager_batching": false
    },
    "dynamic_batching": {
      "preferred_batch_size": [],
      "max_queue_delay_microseconds": 100,
      "preserve_ordering": false,
      "priority_levels": 0,
      "default_priority_level": 0,
      "priority_queue_policy": {}
    },
    "instance_group": [
      {"name": "_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0", "kind": "KIND_CPU", "count": 16, "gpus": [], "secondary_devices": [], "profile": [], "passive": false, "host_policy": ""}
    ],
    "default_model_filename": "model.py",
    "cc_model_filenames": {},
    "metric_tags": {},
    "parameters": {
      "template_two_param": {"string_value": "question:{query} \\n \\n passage:{passage}"}
    },
    "model_warmup": []
  }
2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None
2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer
2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_15', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_1', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. 
2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_4', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_8', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. 
2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_3', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_11', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. 
2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_13', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_2', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_9', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_7', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. 
2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_12', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_10', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_6', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': 
'{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_14', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-04T09:34:14Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Tokenizer is fast: True 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - tokenizer.model_max_length: 8192 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Tokenizer is fast: True 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - tokenizer.model_max_length: 8192 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Tokenizer is fast: True 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - tokenizer.model_max_length: 8192 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Tokenizer is fast: True 2025-02-04T09:34:15Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - tokenizer.model_max_length: 8192 
I0204 09:34:15.456875 491 model_lifecycle.cc:839] "successfully loaded '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'"
I0204 09:34:16.111012 491 logging.cc:46] "Loaded engine size: 2364 MiB"
W0204 09:34:16.111260 491 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
E0204 09:34:16.452904 491 logging.cc:40] "[defaultAllocator.cpp::allocate::31] Error Code 1: Cuda Runtime (out of memory)"
W0204 09:34:16.452953 491 logging.cc:43] "Requested amount of GPU memory (19629343232 bytes) could not be allocated. There may not be enough free memory for allocation to succeed."
E0204 09:34:16.469799 491 logging.cc:40] "[executionContext.cpp::ExecutionContext::579] Error Code 2: OutOfMemory (Requested size was 19629343232 bytes.)"
I0204 09:34:16.469864 491 tensorrt.cc:353] "TRITONBACKEND_ModelInstanceFinalize: delete instance state"
E0204 09:34:16.469890 491 backend_model.cc:692] "ERROR: Failed to create instance: unable to create TensorRT context: [executionContext.cpp::ExecutionContext::579] Error Code 2: OutOfMemory (Requested size was 19629343232 bytes.)"
I0204 09:34:16.469906 491 tensorrt.cc:274] "TRITONBACKEND_ModelFinalize: delete model state"
E0204 09:34:16.478312 491 logging.cc:40] "IRuntime::~IRuntime: Error Code 3: API Usage Error (Parameter check failed, condition: mEngineCounter.use_count() == 1. Destroying a runtime before destroying deserialized engines created by the runtime leads to undefined behavior.)"
E0204 09:34:16.478404 491 model_lifecycle.cc:642] "failed to load '_nvidia_llama_3_2_nv_rerankqa_1b_v2_model' version 1: Internal: unable to create TensorRT context: [executionContext.cpp::ExecutionContext::579] Error Code 2: OutOfMemory (Requested size was 19629343232 bytes.)"
I0204 09:34:16.478423 491 model_lifecycle.cc:777] "failed to load '_nvidia_llama_3_2_nv_rerankqa_1b_v2_model'"
E0204 09:34:16.478510 491 model_repository_manager.cc:703] "Invalid argument: ensemble 'nvidia_llama_3_2_nv_rerankqa_1b_v2' depends on '_nvidia_llama_3_2_nv_rerankqa_1b_v2_model' which has no loaded version. Model '_nvidia_llama_3_2_nv_rerankqa_1b_v2_model' loading failed with error: version 1 is at UNAVAILABLE state: Internal: unable to create TensorRT context: [executionContext.cpp::ExecutionContext::579] Error Code 2: OutOfMemory (Requested size was 19629343232 bytes.);"
I0204 09:34:16.478580 491 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0204 09:34:16.478611 491 server.cc:631]
+----------+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend  | Path                                                       | Config                                                                                                                                                       |
+----------+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python   | /opt/tritonserver/backends/python/libtriton_python.so     | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0204 09:34:16.478654 491 server.cc:674]
+-----------------------------------------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model                                         | Version | Status                                                                                                                                                   |
+-----------------------------------------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| _nvidia_llama_3_2_nv_rerankqa_1b_v2_model     | 1       | UNAVAILABLE: Internal: unable to create TensorRT context: [executionContext.cpp::ExecutionContext::579] Error Code 2: OutOfMemory (Requested size was 19629343232 bytes.) |
| _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer | 1       | READY                                                                                                                                                    |
+-----------------------------------------------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
I0204 09:34:16.513723 491 metrics.cc:877] "Collecting metrics for GPU 0: NVIDIA A100-SXM4-80GB"
I0204 09:34:16.521196 491 metrics.cc:770] "Collecting CPU metrics"
I0204 09:34:16.521395 491 tritonserver.cc:2598]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.51.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /opt/nim/tmp/run/triton-model-repository                                                                                                                                                                        |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0204 09:34:16.521444 491 server.cc:305] "Waiting for in-flight requests to complete."
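The load fails because TensorRT cannot allocate the memory the engine requests (19629343232 bytes, roughly 18.3 GiB) even though GPU 0 is an A100-SXM4-80GB, which usually means other processes already hold most of the GPU or the container was given only a slice of it. A quick pre-flight check along these lines can confirm that before restarting; this is a sketch assuming the pynvml bindings (nvidia-ml-py) are installed, and the threshold is taken from the log, not from any NIM documentation.

import pynvml  # NVIDIA Management Library bindings (pip install nvidia-ml-py)

REQUIRED_BYTES = 19_629_343_232  # allocation size reported in the log above

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0, as in the log
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"total={mem.total / 2**30:.1f} GiB  "
      f"used={mem.used / 2**30:.1f} GiB  "
      f"free={mem.free / 2**30:.1f} GiB")
if mem.free < REQUIRED_BYTES:
    # Not enough headroom for the reranker engine on this GPU right now.
    print("Insufficient free GPU memory; free the device or choose "
          "a smaller profile before restarting the container.")
pynvml.nvmlShutdown()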
I0204 09:34:16.521452 491 server.cc:321] "Timeout 30: Found 0 model versions that have in-flight inferences"
I0204 09:34:16.521683 491 server.cc:336] "All models are stopped, unloading models"
I0204 09:34:16.521692 491 server.cc:345] "Timeout 30: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:17.521771 491 server.cc:345] "Timeout 29: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:18.521857 491 server.cc:345] "Timeout 28: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:19.521980 491 server.cc:345] "Timeout 27: Found 1 live models and 0 in-flight non-inference requests"
10.169.20.130:60940 - "GET /v1/health/ready HTTP/1.1" 503
2025-02-04T09:34:20Z INFO: uvicorn.access - 10.169.20.130:60940 - "GET /v1/health/ready HTTP/1.1" 503
I0204 09:34:20.522127 491 server.cc:345] "Timeout 26: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:21.522260 491 server.cc:345] "Timeout 25: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:22.522364 491 server.cc:345] "Timeout 24: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:23.522474 491 server.cc:345] "Timeout 23: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:24.522589 491 server.cc:345] "Timeout 22: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:25.522693 491 server.cc:345] "Timeout 21: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:26.522821 491 server.cc:345] "Timeout 20: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:27.522933 491 server.cc:345] "Timeout 19: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:28.523084 491 server.cc:345] "Timeout 18: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:29.523214 491 server.cc:345] "Timeout 17: Found 1 live models and 0 in-flight non-inference requests"
10.169.20.130:36130 - "GET /v1/health/ready HTTP/1.1" 503
2025-02-04T09:34:30Z INFO: uvicorn.access - 10.169.20.130:36130 - "GET /v1/health/ready HTTP/1.1" 503
I0204 09:34:30.523345 491 server.cc:345] "Timeout 16: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:31.523484 491 server.cc:345] "Timeout 15: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:32.523651 491 server.cc:345] "Timeout 14: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:33.523792 491 server.cc:345] "Timeout 13: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:34.523927 491 server.cc:345] "Timeout 12: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:35.524071 491 server.cc:345] "Timeout 11: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:36.524210 491 server.cc:345] "Timeout 10: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:37.524339 491 server.cc:345] "Timeout 9: Found 1 live models and 0 in-flight non-inference requests"
I0204 09:34:37.944377 491 model_lifecycle.cc:624] "successfully unloaded '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer' version 1"
I0204 09:34:38.524449 491 server.cc:345] "Timeout 8: Found 0 live models and 0 in-flight non-inference requests"
error: creating server: Internal - failed to load all models
[2025-02-04 09:34:39 +0000] [492] [INFO] Handling signal: term
[2025-02-04 09:34:39 +0000] [977] [INFO] Shutting down
[2025-02-04 09:34:39 +0000] [977] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2025-02-04 09:34:39 +0000] [785] [INFO] Shutting down
[2025-02-04 09:34:39 +0000] [785] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2025-02-04 09:34:39 +0000] [496] [INFO] Shutting down
[2025-02-04 09:34:39 +0000] [496] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2025-02-04 09:34:39 +0000] [500] [INFO] Shutting down
[2025-02-04 09:34:39 +0000] [500] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2025-02-04 09:34:39 +0000] [499] [INFO] Shutting down
[2025-02-04 09:34:39 +0000] [499] [INFO] Error while closing socket [Errno 9] Bad file descriptor
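Throughout the shutdown the readiness probe keeps returning 503, which is the expected behavior while model loading has failed and models are being unloaded. When orchestrating this container outside Kubernetes, a simple poller against the same endpoint can gate traffic until the service is actually serving; this is a sketch assuming the NIM is reachable on localhost port 8000, and the timeout and interval values are arbitrary choices, not defaults from any NIM documentation.

import time
import requests  # common HTTP client; an assumption, not part of the NIM tooling

READY_URL = "http://localhost:8000/v1/health/ready"  # endpoint path from the log; host/port assumed

def wait_until_ready(url: str = READY_URL,
                     timeout_s: float = 600.0,
                     interval_s: float = 10.0) -> bool:
    """Poll the readiness endpoint until it returns 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(url, timeout=5).status_code == 200:
                return True
        except requests.RequestException:
            pass  # server not accepting connections yet; keep polling
        time.sleep(interval_s)
    return False

if __name__ == "__main__":
    print("ready" if wait_until_ready() else "gave up: service never became ready")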