===================================
== NVIDIA NIM for Text Reranking ==
===================================

NVIDIA Release 1.3.0
Model: nvidia/llama-3.2-nv-rerankqa-1b-v2

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:
https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement
(found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/).

Third Party Software Attributions and Licenses can be found under /opt/nim/NOTICE

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.6 driver version 560.35.03 with kernel driver version 535.54.03.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Overriding NIM_LOG_LEVEL: replacing NIM_LOG_LEVEL=unset with NIM_LOG_LEVEL=INFO
Running automatic profile selection: NIM_MANIFEST_PROFILE is not set
Selected profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. (printed twice)

(Each nimlib message below is emitted twice, once as a JSON record and once as plain text; only the plain-text form is kept.)

2025-02-13T06:02:30Z INFO: nimlib.profiles - Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-13T06:02:30Z INFO: nimlib.nim_sdk - Using the profile specified by the user: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-13T06:02:30Z INFO: nimlib.nim_sdk - Downloading manifest profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-13T06:02:30.275777Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https (repeated 6x, once per file to fetch)
2025-02-13T06:02:30.277684Z ERROR nim_sdk::hub::repo: One or more errors fetching files:
2025-02-13T06:02:30.277689Z ERROR nim_sdk::hub::repo: The requested operation requires an API key, but none was found (repeated 6x, once per file)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/nimlib/nim_sdk.py", line 304, in download_models
    cache = repo.get_all()  # download model artifacts to cache
Exception: The requested operation requires an API key, but none was found

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/bin/get-model-from-ngc", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/nemo_retriever_reranking/cli/get_model_from_ngc.py", line 15, in main
    get_model()
  File "/usr/local/lib/python3.10/dist-packages/nemo_retriever_reranking_triton_gen/tools/ngc_models.py", line 213, in get_model
    model_manifest.download_models(profile_id=profile_id, materialize_workspace=True)
  File "/usr/local/lib/python3.10/dist-packages/nimlib/nim_sdk.py", line 344, in download_models
    raise ManifestDownloadError(f"Error downloading manifest: {err}") from err
nimlib.exceptions.ManifestDownloadError: Error downloading manifest: The requested operation requires an API key, but none was found

taeyoung.yoo@dgx-a100:~$ export NGC_API_KEY=<redacted>   # actual key value removed; never publish a real NGC API key
taeyoung.yoo@dgx-a100:~$ docker run -it --rm --gpus "device=6" --shm-size=16GB -e NGC_API_KEY -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" -u $(id -u) -p 8000:8000 nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0^C
taeyoung.yoo@dgx-a100:~$ docker run -it --rm \
    --gpus "device=6" \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    -name rerank_nemo \
    nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0
unknown shorthand flag: 'n' in -name
See 'docker run --help'.
taeyoung.yoo@dgx-a100:~$ docker run -it --rm \
    --gpus "device=6" \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u $(id -u) \
    -p 8000:8000 \
    --name rerank_nemo \
    nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0

(The startup banner and license text print again, exactly as above.)
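Two separate problems surfaced in this stretch of the session: the in-container model download needs NGC_API_KEY in the container's environment (hence the ManifestDownloadError on the first run), and Docker's container-name flag takes two dashes, so -name parses as the unknown shorthand -n. For reference, a consolidated, copy-pasteable version of the working launch is sketched below; the LOCAL_NIM_CACHE setup lines are an assumption (the transcript never shows how that variable was defined), and the NIM_MODEL_PROFILE pin is optional, with the variable name taken from the nimlib.profiles log lines above.

# Sketch of the full working launch, under the assumptions noted above.
export NGC_API_KEY="<your NGC API key>"    # required by the in-container downloader
export LOCAL_NIM_CACHE=~/.cache/nim        # host-side model cache (assumed path)
mkdir -p "$LOCAL_NIM_CACHE"
# Optional: add -e NIM_MODEL_PROFILE=ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
# to pin the profile that automatic selection picked above.
docker run -it --rm \
    --gpus "device=6" \
    --shm-size=16GB \
    -e NGC_API_KEY \
    -v "$LOCAL_NIM_CACHE:/opt/nim/.cache" \
    -u "$(id -u)" \
    -p 8000:8000 \
    --name rerank_nemo \
    nvcr.io/nim/nvidia/llama-3.2-nv-rerankqa-1b-v2:1.3.0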
Third Party Software Attributions and Licenses can be found under /opt/nim/NOTICE

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.6 driver version 560.35.03 with kernel driver version 535.54.03.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Overriding NIM_LOG_LEVEL: replacing NIM_LOG_LEVEL=unset with NIM_LOG_LEVEL=INFO
Running automatic profile selection: NIM_MANIFEST_PROFILE is not set
Selected profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. (printed twice)
2025-02-13T06:03:26Z INFO: nimlib.profiles - Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-13T06:03:26Z INFO: nimlib.nim_sdk - Using the profile specified by the user: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-13T06:03:26Z INFO: nimlib.nim_sdk - Downloading manifest profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df

(Model artifacts download next; an "ngc configured ..." builder line precedes each file, and the terminal progress bars are condensed to single lines here.)

checksums.blake3        [00:00:00] 157 B/157 B (15.09 KiB/s)
2025-02-13T06:03:29.235772Z INFO nim_hub_ngc::api::tokio: Downloaded filename: checksums.blake3 to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/47cfb34576b948532057dcdb94df6add"
model.plan              [00:00:23] 2.31 GiB/2.31 GiB (101.38 MiB/s)
2025-02-13T06:03:54.218917Z INFO nim_hub_ngc::api::tokio: Downloaded filename: model.plan to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/43082401df8829c5a24b26a7730c2310-5"
special_tokens_map.json [00:00:00] 449 B/449 B (36.32 KiB/s)
2025-02-13T06:03:55.330827Z INFO nim_hub_ngc::api::tokio: Downloaded filename: special_tokens_map.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/16d301f27f8ec48865d30f0a749187e5"
metadata.json           [00:00:00] 219 B/219 B (19.82 KiB/s)
2025-02-13T06:03:56.945933Z INFO nim_hub_ngc::api::tokio: Downloaded filename: metadata.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/ea40d6eaaa8cb9af698b83691e2622b9"
tokenizer.json          [00:00:00] 8.66 MiB/8.66 MiB (78.75 MiB/s)
2025-02-13T06:03:58.438338Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/4e69b4b7c83649358684b311b7eddf93"
tokenizer_config.json   [00:00:00] 49.35 KiB/49.35 KiB (2.54 MiB/s)
2025-02-13T06:03:59.506843Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer_config.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2/blobs/621272f7a30c1e5dceee2b9a6106be03"
2025-02-13T06:03:59Z INFO: nimlib.nim_sdk - Using the workspace specified during init: /opt/nim/workspace
2025-02-13T06:03:59Z INFO: nimlib.nim_sdk - Materializing workspace to: /opt/nim/workspace
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
[02/13/2025-06:04:02] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
Adding extra output token_count to ensemble. Not consumed by next model
OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation.
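The downloaded blobs land under /opt/nim/.cache inside the container, which is bind-mounted from $LOCAL_NIM_CACHE on the host, so the 2.31 GiB model.plan should only be fetched on this first run; later starts reuse the cache. The Hugging Face warning, by contrast, is about /.cache/huggingface/hub, which is unwritable because the container runs under -u $(id -u) with no home directory. A small sketch for both, with the caveat that the huggingface target directory is an assumption (the TRANSFORMERS_CACHE variable name comes from the warning text itself):

# Verify the persisted model cache on the host (path mirrors the blob paths in the log):
du -sh "$LOCAL_NIM_CACHE/ngc/hub/models--nim--nvidia--llama-3-2-nv-rerankqa-1b-v2"
# Optional extra docker run flag to silence the Hugging Face cache warning by
# pointing it at the writable cache mount (assumed location):
#   -e TRANSFORMERS_CACHE=/opt/nim/.cache/huggingface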
[2025-02-13 06:04:03 +0000] [492] [INFO] Starting gunicorn 23.0.0
[2025-02-13 06:04:03 +0000] [492] [INFO] Listening at: http://0.0.0.0:8000 (492)
[2025-02-13 06:04:03 +0000] [492] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-02-13 06:04:03 +0000] [494] [INFO] Booting worker with pid: 494
[2025-02-13 06:04:03 +0000] [496] [INFO] Booting worker with pid: 496
[2025-02-13 06:04:03 +0000] [497] [INFO] Booting worker with pid: 497
[2025-02-13 06:04:03 +0000] [498] [INFO] Booting worker with pid: 498
[2025-02-13 06:04:04 +0000] [499] [INFO] Booting worker with pid: 499
[2025-02-13 06:04:04 +0000] [500] [INFO] Booting worker with pid: 500
[2025-02-13 06:04:04 +0000] [501] [INFO] Booting worker with pid: 501
[2025-02-13 06:04:04 +0000] [502] [INFO] Booting worker with pid: 502

(Every worker boot is followed by the same OTEL auto-instrumentation warning; the repeats are omitted from here on.)

I0213 06:04:04.135983 491 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f162a000000' with size 268435456"
I0213 06:04:04.139274 491 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0213 06:04:04.146154 491 model_lifecycle.cc:472] "loading: _nvidia_llama_3_2_nv_rerankqa_1b_v2_model:1"
I0213 06:04:04.146195 491 model_lifecycle.cc:472] "loading: _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer:1"
I0213 06:04:04.174766 491 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0213 06:04:04.174809 491 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0213 06:04:04.174815 491 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0213 06:04:04.174820 491 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0213 06:04:04.177669 491 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: _nvidia_llama_3_2_nv_rerankqa_1b_v2_model (version 1)"
[2025-02-13 06:04:04 +0000] [582] [INFO] Booting worker with pid: 582
[2025-02-13 06:04:04 +0000] [583] [INFO] Booting worker with pid: 583
[2025-02-13 06:04:04 +0000] [653] [INFO] Booting worker with pid: 653
[2025-02-13 06:04:04 +0000] [655] [INFO] Booting worker with pid: 655
[2025-02-13 06:04:04 +0000] [713] [INFO] Booting worker with pid: 713
[2025-02-13 06:04:04 +0000] [847] [INFO] Booting worker with pid: 847
[2025-02-13 06:04:04 +0000] [849] [INFO] Booting worker with pid: 849
[2025-02-13 06:04:04 +0000] [1040] [INFO] Booting worker with pid: 1040

(Each of the 16 workers then runs an identical startup sequence between 06:04:04 and 06:04:05, interleaved with the per-worker OTEL warnings and one more TRANSFORMERS_CACHE warning; it is shown once here, for worker 494.)

2025-02-13T06:04:04Z INFO: nimlib.nim_inference_api_builder.otel - No OTel configuration file found at expected path: /etc/nim/config/otel.yaml
[2025-02-13 06:04:04 +0000] [494] [INFO] Started server process [494]
[2025-02-13 06:04:04 +0000] [494] [INFO] Waiting for application startup.
2025-02-13T06:04:04Z INFO: nimlib.profiles - Registered custom profile selectors: []
2025-02-13T06:04:04Z INFO: nimlib.profiles - selector used: EnvProfileSelector for manifest
2025-02-13T06:04:04Z INFO: nimlib.profiles - Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
2025-02-13T06:04:04Z INFO: root - Loading service config from /opt/nim/tmp/run/triton-model-repository/service_config.yaml
[2025-02-13 06:04:04 +0000] [494] [INFO] Application startup complete.

(... the same sequence completes for workers 496, 497, 498, 499, 500, 501, 502, 582, 583, 653, 655, 713, 847, 849, and 1040 ...)

I0213 06:04:06.455687 491 logging.cc:46] "Loaded engine size: 2364 MiB"
W0213 06:04:06.472976 491 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I0213 06:04:06.527968 491 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_0 (CPU device 0)"

(... tokenizer instances _0_1 through _0_15 are initialized the same way, all on CPU device 0 ...)

There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. (printed twice)
OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation.
2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_8', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'}
2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None
2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer
OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation.
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
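The parameters block in the model config above is where the reranker's input formatting lives: template_two_param fuses each query/passage pair into one string before tokenization. A trivial illustration of what that template renders for one pair, assuming the logged \n sequences denote literal newlines (the example strings are made up):

# Render the logged template_two_param for one query/passage pair:
printf 'question:%s \n \n passage:%s\n' "which way should I go?" "two roads diverged in a yellow wood"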
2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_5', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'} 2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None 2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation. 
2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Initializing with args: {'model_config': '{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer","platform":"","backend":"python","runtime":"","version_policy":{"latest":{"num_versions":1}},"max_batch_size":0,"input":[{"name":"QUERY","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"PASSAGE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false},{"name":"TRUNCATE","data_type":"TYPE_STRING","format":"FORMAT_NONE","dims":[-1],"is_shape_tensor":false,"allow_ragged_batch":false,"optional":false,"is_non_linear_format_io":false}],"output":[{"name":"input_ids","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"attention_mask","data_type":"TYPE_INT64","dims":[-1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false},{"name":"token_count","data_type":"TYPE_INT32","dims":[1],"label_filename":"","is_shape_tensor":false,"is_non_linear_format_io":false}],"batch_input":[],"batch_output":[],"optimization":{"priority":"PRIORITY_DEFAULT","input_pinned_memory":{"enable":true},"output_pinned_memory":{"enable":true},"gather_kernel_buffer_threshold":0,"eager_batching":false},"dynamic_batching":{"preferred_batch_size":[],"max_queue_delay_microseconds":100,"preserve_ordering":false,"priority_levels":0,"default_priority_level":0,"priority_queue_policy":{}},"instance_group":[{"name":"_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0","kind":"KIND_CPU","count":16,"gpus":[],"secondary_devices":[],"profile":[],"passive":false,"host_policy":""}],"default_model_filename":"model.py","cc_model_filenames":{},"metric_tags":{},"parameters":{"template_two_param":{"string_value":"question:{query} \\n \\n passage:{passage}"}},"model_warmup":[]}', 'model_instance_kind': 'CPU', 'model_instance_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_1', 'model_instance_device_id': '0', 'model_repository': '/opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer', 'model_version': '1', 'model_name': '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'}
2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Runtime config: hf_model=None tokenizer_path=None template_two_param='question:{query} \n \n passage:{passage}' max_seq_length=None
2025-02-13T06:04:07Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Loading tokenizer from /opt/nim/tmp/run/triton-model-repository/_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer/1/tokenizer
2025-02-13T06:04:08Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - Tokenizer is fast: True
2025-02-13T06:04:08Z INFO: nemo_retriever_reranking_triton_gen.tools.tokenizers - tokenizer.model_max_length: 8192
OTEL Logging handler requested, but Python logging auto-instrumentation not set up. Set OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true to enable logging auto-instrumentation.
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
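The Hugging Face cache warning above is non-fatal in this run, since the tokenizer is loaded from the local Triton model repository rather than downloaded, but it can be silenced by pointing TRANSFORMERS_CACHE at a writable location. A minimal sketch, assuming the NIM cache mounted into the container at /opt/nim/.cache is the intended writable directory:

# Assumed fix for the TRANSFORMERS_CACHE warning; the path is an assumption
# based on the cache volume mounted into the container at /opt/nim/.cache.
# Either add a flag to the docker run invocation:
#   -e TRANSFORMERS_CACHE=/opt/nim/.cache
# or export the variable on the host and forward it with -e TRANSFORMERS_CACHE:
export TRANSFORMERS_CACHE=/opt/nim/.cache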
[The tokenizer initialization sequence above repeats once for each of the 16 CPU instances (_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer_0_0 through _0_15), interleaved with further copies of the OTEL auto-instrumentation notice and the Hugging Face cache warning; the duplicate log lines are omitted here.]
I0213 06:04:08.391865 491 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: _nvidia_llama_3_2_nv_rerankqa_1b_v2_model_0_0 (GPU device 0)"
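Every runtime config line above reports the same template_two_param, which is how each query/passage pair is flattened into the single string the tokenizer encodes. A hedged illustration of that rendering (the sample query and passage below are made up; printf expands each \n escape to the literal newlines in the template):

QUERY="What is the boiling point of water?"
PASSAGE="Water boils at 100 degrees Celsius at sea level."
# Mirrors template_two_param='question:{query} \n \n passage:{passage}'
# from the runtime config logged above.
printf 'question:%s \n \n passage:%s\n' "$QUERY" "$PASSAGE"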
I0213 06:04:09.559800 491 model_lifecycle.cc:839] "successfully loaded '_nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer'"
I0213 06:04:10.544462 491 logging.cc:46] "Loaded engine size: 2364 MiB"
W0213 06:04:10.544648 491 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I0213 06:04:11.244134 491 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18720, now: CPU 0, GPU 21077 (MiB)"
I0213 06:04:11.246834 491 instance_state.cc:186] "Created instance _nvidia_llama_3_2_nv_rerankqa_1b_v2_model_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];"
I0213 06:04:11.247303 491 model_lifecycle.cc:839] "successfully loaded '_nvidia_llama_3_2_nv_rerankqa_1b_v2_model'"
I0213 06:04:11.247735 491 model_lifecycle.cc:472] "loading: nvidia_llama_3_2_nv_rerankqa_1b_v2:1"
I0213 06:04:11.248023 491 model_lifecycle.cc:839] "successfully loaded 'nvidia_llama_3_2_nv_rerankqa_1b_v2'"
I0213 06:04:11.248092 491 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0213 06:04:11.248138 491 server.cc:631]
+----------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend  | Path                                                       | Config                                                                                                                                                        |
+----------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
| python   | /opt/tritonserver/backends/python/libtriton_python.so     | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------+------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0213 06:04:11.248189 491 server.cc:674]
+-----------------------------------------------+---------+--------+
| Model                                         | Version | Status |
+-----------------------------------------------+---------+--------+
| _nvidia_llama_3_2_nv_rerankqa_1b_v2_model     | 1       | READY  |
| _nvidia_llama_3_2_nv_rerankqa_1b_v2_tokenizer | 1       | READY  |
| nvidia_llama_3_2_nv_rerankqa_1b_v2            | 1       | READY  |
+-----------------------------------------------+---------+--------+
I0213 06:04:11.339103 491 metrics.cc:877] "Collecting metrics for GPU 0: NVIDIA A100-SXM4-80GB"
I0213 06:04:11.346544 491 metrics.cc:770] "Collecting CPU metrics"
I0213 06:04:11.346820 491 tritonserver.cc:2598]
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                           |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                          |
| server_version                   | 2.51.0                                                                                                                                                                                                          |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /opt/nim/tmp/run/triton-model-repository                                                                                                                                                                        |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                       |
| strict_model_config              | 0                                                                                                                                                                                                               |
| model_config_name                |                                                                                                                                                                                                                 |
| rate_limit                       | OFF                                                                                                                                                                                                             |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                       |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                        |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                             |
| strict_readiness                 | 1                                                                                                                                                                                                               |
| exit_timeout                     | 30                                                                                                                                                                                                              |
| cache_enabled                    | 0                                                                                                                                                                                                               |
+----------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0213 06:04:11.352303 491 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:8001"
I0213 06:04:11.352342 491 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
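With all three models READY and the services up, the container can be smoke-tested over the published HTTP port. A minimal sketch, assuming the standard NIM reranking endpoints (/v1/health/ready and /v1/ranking) on host port 8000; the query and passage text below is illustrative only:

# Readiness probe (endpoint path per the NIM HTTP API).
curl -s http://localhost:8000/v1/health/ready

# Hypothetical reranking request; the model name matches this container,
# the query and passages are made-up examples.
curl -s http://localhost:8000/v1/ranking \
  -H "Content-Type: application/json" \
  -d '{
        "model": "nvidia/llama-3.2-nv-rerankqa-1b-v2",
        "query": {"text": "What is the boiling point of water?"},
        "passages": [
          {"text": "Water boils at 100 degrees Celsius at sea level."},
          {"text": "The capital of France is Paris."}
        ]
      }'

Note that the gRPC (8001) and metrics (8002) services in the log above are bound inside the container; they are reachable from the host only if those ports are also published.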