===================================
== NVIDIA NIM for Text Embedding ==
===================================

NVIDIA Release 1.3.0
Model: nvidia/llama-3.2-nv-embedqa-1b-v2

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:
https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/).

Third Party Software Attributions and Licenses can be found under /opt/nim/NOTICE

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.6 driver version 560.35.03 with kernel driver version 535.54.03.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Overriding NIM_LOG_LEVEL: replacing NIM_LOG_LEVEL=unset with NIM_LOG_LEVEL=INFO
Running automatic profile selection: NIM_MANIFEST_PROFILE is not set
Selected profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
"timestamp": "2025-02-11 01:46:43,106", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
"timestamp": "2025-02-11 01:46:43,106", "level": "INFO", "message": "Using the profile specified by the user: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
"timestamp": "2025-02-11 01:46:43,106", "level": "INFO", "message": "Downloading manifest profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
2025-02-11T01:46:43.107565Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
2025-02-11T01:46:47.679595Z INFO nim_hub_ngc::api::tokio: Downloaded filename: special_tokens_map.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/ca1622ba94d36a31aa7ebaa7973ed497"
2025-02-11T01:46:47.680353Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
2025-02-11T01:46:48.856570Z INFO nim_hub_ngc::api::tokio: Downloaded filename: checksums.blake3 to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/65310a3e313fc53877edc6225aa48e5d"
2025-02-11T01:46:48.857467Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
2025-02-11T01:46:50.064422Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer_config.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/048e9b9b79e1c5b6be971f79026d18f4"
2025-02-11T01:46:50.065955Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
2025-02-11T01:47:57.890831Z INFO nim_hub_ngc::api::tokio: Downloaded filename: model.plan to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/ffe3a7f946dc59b05bead36ec3943549-5"
2025-02-11T01:47:57.892414Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
2025-02-11T01:47:58.975796Z INFO nim_hub_ngc::api::tokio: Downloaded filename: metadata.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/b320276ae291b616de45b8aff08c586c"
2025-02-11T01:47:58.977624Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
2025-02-11T01:48:00.196510Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/7ccbfc239ab3e285da1a311d893c165c"
"timestamp": "2025-02-11 01:48:00,197", "level": "INFO", "message": "Using the workspace specified during init: /opt/nim/workspace"
"timestamp": "2025-02-11 01:48:00,198", "level": "INFO", "message": "Materializing workspace to: /opt/nim/workspace"
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
[02/11/2025-01:48:03] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
Adding extra output token_count to ensemble. Not consumed by next model
[2025-02-11 01:48:05 +0000] [754] [INFO] Starting gunicorn 23.0.0
[2025-02-11 01:48:05 +0000] [754] [INFO] Listening at: http://0.0.0.0:8000 (754)
[2025-02-11 01:48:05 +0000] [754] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-02-11 01:48:05 +0000] [757] [INFO] Booting worker with pid: 757
[2025-02-11 01:48:05 +0000] [758] [INFO] Booting worker with pid: 758
[2025-02-11 01:48:05 +0000] [759] [INFO] Booting worker with pid: 759
[2025-02-11 01:48:05 +0000] [760] [INFO] Booting worker with pid: 760
[2025-02-11 01:48:05 +0000] [761] [INFO] Booting worker with pid: 761
[2025-02-11 01:48:05 +0000] [762] [INFO] Booting worker with pid: 762
I0211 01:48:05.391623 753 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7f13c4000000' with size 268435456"
I0211 01:48:05.396878 753 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
I0211 01:48:05.405474 753 model_lifecycle.cc:472] "loading: nvidia_llama_3_2_nv_embedqa_1b_v2_model:1"
I0211 01:48:05.405512 753 model_lifecycle.cc:472] "loading: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer:1"
I0211 01:48:05.426739 753 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0211 01:48:05.426773 753 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0211 01:48:05.426779 753 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0211 01:48:05.426786 753 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0211 01:48:05.429435 753 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_model (version 1)"
[2025-02-11 01:48:05 +0000] [830] [INFO] Booting worker with pid: 830
[2025-02-11 01:48:05 +0000] [835] [INFO] Booting worker with pid: 835
[2025-02-11 01:48:05 +0000] [976] [INFO] Booting worker with pid: 976
[2025-02-11 01:48:05 +0000] [978] [INFO] Booting worker with pid: 978
[2025-02-11 01:48:05 +0000] [1042] [INFO] Booting worker with pid: 1042
[2025-02-11 01:48:05 +0000] [1108] [INFO] Booting worker with pid: 1108
[2025-02-11 01:48:05 +0000] [1235] [INFO] Booting worker with pid: 1235
[2025-02-11 01:48:05 +0000] [1237] [INFO] Booting worker with pid: 1237
[2025-02-11 01:48:05 +0000] [1238] [INFO] Booting worker with pid: 1238
[2025-02-11 01:48:05 +0000] [1304] [INFO] Booting worker with pid: 1304
"timestamp": "2025-02-11 01:48:05,958", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml"
[2025-02-11 01:48:05 +0000] [757] [INFO] Started server process [757]
[2025-02-11 01:48:05 +0000] [757] [INFO] Waiting for application startup.
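The automatic selection above can be made deterministic by exporting the chosen profile ID at launch. A minimal launch sketch, assuming the standard NIM environment variables (`NGC_API_KEY`, `NIM_MODEL_PROFILE`) and the public image tag; adjust the tag and ports to your deployment:

```shell
# Pin the TensorRT profile chosen above instead of re-running auto-selection.
# NGC_API_KEY must already be exported in the calling shell.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -e NIM_MODEL_PROFILE=ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.3.0
```

With the profile pinned, the "Running automatic profile selection" step is skipped and the log reports "Using the profile specified by the user", as seen above.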
"timestamp": "2025-02-11 01:48:06,034", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,034", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,034", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [757] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,046", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" "timestamp": "2025-02-11 01:48:06,065", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [758] [INFO] Started server process [758] [2025-02-11 01:48:06 +0000] [758] [INFO] Waiting for application startup. [2025-02-11 01:48:06 +0000] [759] [INFO] Started server process [759] [2025-02-11 01:48:06 +0000] [759] [INFO] Waiting for application startup. "timestamp": "2025-02-11 01:48:06,121", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,121", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,121", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [758] [INFO] Application startup complete. 
"timestamp": "2025-02-11 01:48:06,141", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,141", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,141", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [759] [INFO] Application startup complete. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. "timestamp": "2025-02-11 01:48:06,208", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [760] [INFO] Started server process [760] [2025-02-11 01:48:06 +0000] [760] [INFO] Waiting for application startup. "timestamp": "2025-02-11 01:48:06,263", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,263", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,263", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [760] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,301", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" "timestamp": "2025-02-11 01:48:06,304", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [762] [INFO] Started server process [762] [2025-02-11 01:48:06 +0000] [762] [INFO] Waiting for application startup. 
[2025-02-11 01:48:06 +0000] [761] [INFO] Started server process [761] [2025-02-11 01:48:06 +0000] [761] [INFO] Waiting for application startup. "timestamp": "2025-02-11 01:48:06,379", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,379", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,379", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [762] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,385", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,385", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,385", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [761] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,449", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" "timestamp": "2025-02-11 01:48:06,465", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [830] [INFO] Started server process [830] [2025-02-11 01:48:06 +0000] [830] [INFO] Waiting for application startup. [2025-02-11 01:48:06 +0000] [835] [INFO] Started server process [835] [2025-02-11 01:48:06 +0000] [835] [INFO] Waiting for application startup. 
"timestamp": "2025-02-11 01:48:06,485", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" "timestamp": "2025-02-11 01:48:06,500", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,500", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,500", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [976] [INFO] Started server process [976] [2025-02-11 01:48:06 +0000] [976] [INFO] Waiting for application startup. [2025-02-11 01:48:06 +0000] [830] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,516", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,516", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,516", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" "timestamp": "2025-02-11 01:48:06,519", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [835] [INFO] Application startup complete. [2025-02-11 01:48:06 +0000] [1042] [INFO] Started server process [1042] [2025-02-11 01:48:06 +0000] [1042] [INFO] Waiting for application startup. 
"timestamp": "2025-02-11 01:48:06,538", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,538", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,538", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [976] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,570", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,570", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,570", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [1042] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,578", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" "timestamp": "2025-02-11 01:48:06,588", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [978] [INFO] Started server process [978] [2025-02-11 01:48:06 +0000] [978] [INFO] Waiting for application startup. [2025-02-11 01:48:06 +0000] [1108] [INFO] Started server process [1108] [2025-02-11 01:48:06 +0000] [1108] [INFO] Waiting for application startup. 
"timestamp": "2025-02-11 01:48:06,627", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,627", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,627", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [978] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,637", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,637", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,637", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [1108] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,690", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [1235] [INFO] Started server process [1235] [2025-02-11 01:48:06 +0000] [1235] [INFO] Waiting for application startup. "timestamp": "2025-02-11 01:48:06,716", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" "timestamp": "2025-02-11 01:48:06,728", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [1237] [INFO] Started server process [1237] [2025-02-11 01:48:06 +0000] [1237] [INFO] Waiting for application startup. 
"timestamp": "2025-02-11 01:48:06,740", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,740", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,740", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [1235] [INFO] Application startup complete. [2025-02-11 01:48:06 +0000] [1238] [INFO] Started server process [1238] [2025-02-11 01:48:06 +0000] [1238] [INFO] Waiting for application startup. "timestamp": "2025-02-11 01:48:06,765", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,765", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,766", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" "timestamp": "2025-02-11 01:48:06,768", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml" [2025-02-11 01:48:06 +0000] [1237] [INFO] Application startup complete. "timestamp": "2025-02-11 01:48:06,779", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,779", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,779", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [1238] [INFO] Application startup complete. [2025-02-11 01:48:06 +0000] [1304] [INFO] Started server process [1304] [2025-02-11 01:48:06 +0000] [1304] [INFO] Waiting for application startup. 
"timestamp": "2025-02-11 01:48:06,818", "level": "INFO", "message": "Registered custom profile selectors: []" "timestamp": "2025-02-11 01:48:06,819", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest " "timestamp": "2025-02-11 01:48:06,819", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df" [2025-02-11 01:48:06 +0000] [1304] [INFO] Application startup complete. I0211 01:48:07.753913 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_0 (CPU device 0)" I0211 01:48:07.754126 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_1 (CPU device 0)" I0211 01:48:07.754217 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_2 (CPU device 0)" I0211 01:48:07.754385 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_3 (CPU device 0)" I0211 01:48:07.754479 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_4 (CPU device 0)" I0211 01:48:07.754688 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_5 (CPU device 0)" I0211 01:48:07.754763 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_6 (CPU device 0)" I0211 01:48:07.754936 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_7 (CPU device 0)" I0211 01:48:07.755098 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_8 (CPU device 0)" I0211 01:48:07.755237 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_9 (CPU device 0)" I0211 
01:48:07.755403 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_10 (CPU device 0)" I0211 01:48:07.755535 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_11 (CPU device 0)" I0211 01:48:07.755642 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_12 (CPU device 0)" I0211 01:48:07.755727 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_13 (CPU device 0)" I0211 01:48:07.755891 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_14 (CPU device 0)" I0211 01:48:07.756052 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_15 (CPU device 0)" There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory. 
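At this point the HTTP workers have completed startup, but Triton is still initializing the tokenizer and engine instances, so requests can arrive before the model is ready. A small polling sketch, assuming the readiness route `/v1/health/ready` documented for NIM microservices on the HTTP port shown above; the helper names are illustrative:

```python
import urllib.error
import urllib.request


def ready_url(host="localhost", port=8000):
    # Build the readiness URL for the HTTP service shown listening on :8000.
    return f"http://{host}:{port}/v1/health/ready"


def is_ready(url, timeout=2.0):
    # Return True only if the readiness endpoint answers with HTTP 200.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

A deployment script would loop on `is_ready(ready_url())` with a short sleep until it returns True before routing traffic to the container.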
I0211 01:48:09.154846 753 logging.cc:46] "Loaded engine size: 2364 MiB"
W0211 01:48:09.171771 753 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
I0211 01:48:09.672859 753 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_model_0_0 (GPU device 0)"
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
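The repeated cache warnings come from worker processes that cannot write to `/.cache/huggingface/hub`. One way to silence them is to point the cache at a writable location, as the warning itself suggests; a launch sketch, where the env var name comes from the warning and the in-container path and host mount are example choices, not NIM defaults:

```shell
# Give the Hugging Face tokenizer code a writable cache directory.
docker run --rm --gpus all \
  -e NGC_API_KEY \
  -e TRANSFORMERS_CACHE=/opt/nim/.cache/huggingface \
  -v "$(pwd)/nim-cache:/opt/nim/.cache" \
  -p 8000:8000 \
  nvcr.io/nim/nvidia/llama-3.2-nv-embedqa-1b-v2:1.3.0
```

The warnings are otherwise harmless here: the model artifacts were already fetched into the NGC cache under /opt/nim/.cache, and startup completes successfully.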
I0211 01:48:10.398762 753 model_lifecycle.cc:839] "successfully loaded 'nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer'"
I0211 01:48:11.637546 753 logging.cc:46] "Loaded engine size: 2364 MiB"
W0211 01:48:11.637672 753 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I0211 01:48:12.287264 753 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18720, now: CPU 0, GPU 21077 (MiB)"
I0211 01:48:12.287789 753 instance_state.cc:186] "Created instance nvidia_llama_3_2_nv_embedqa_1b_v2_model_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];"
I0211 01:48:12.288173 753 model_lifecycle.cc:839] "successfully loaded 'nvidia_llama_3_2_nv_embedqa_1b_v2_model'"
I0211 01:48:12.288589 753 model_lifecycle.cc:472] "loading: nvidia_llama_3_2_nv_embedqa_1b_v2:1"
I0211 01:48:12.289060 753 model_lifecycle.cc:839] "successfully loaded 'nvidia_llama_3_2_nv_embedqa_1b_v2'"
I0211 01:48:12.289211 753 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0211 01:48:12.289278 753 server.cc:631]
+----------+-----------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend  | Path                                                      | Config                                                                                                                                         |
+----------+-----------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| python   | /opt/tritonserver/backends/python/libtriton_python.so     | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------+-----------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
I0211 01:48:12.289355 753 server.cc:674]
+---------------------------------------------+---------+--------+
| Model                                       | Version | Status |
+---------------------------------------------+---------+--------+
| nvidia_llama_3_2_nv_embedqa_1b_v2           | 1       | READY  |
| nvidia_llama_3_2_nv_embedqa_1b_v2_model     | 1       | READY  |
| nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer | 1       | READY  |
+---------------------------------------------+---------+--------+
I0211 01:48:12.390816 753 metrics.cc:877] "Collecting metrics for GPU 0: NVIDIA A100-SXM4-80GB"
I0211 01:48:12.397971 753 metrics.cc:770] "Collecting CPU metrics"
I0211 01:48:12.398145 753 tritonserver.cc:2598]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                              |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                             |
| server_version                   | 2.51.0                                                                                                                                                             |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /opt/nim/tmp/run/triton-model-repository                                                                                                                           |
| model_control_mode               | MODE_NONE                                                                                                                                                          |
| strict_model_config              | 0                                                                                                                                                                  |
| model_config_name                |                                                                                                                                                                    |
| rate_limit                       | OFF                                                                                                                                                                |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                          |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                           |
| min_supported_compute_capability | 6.0                                                                                                                                                                |
| strict_readiness                 | 1                                                                                                                                                                  |
| exit_timeout                     | 30                                                                                                                                                                 |
| cache_enabled                    | 0                                                                                                                                                                  |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0211 01:48:12.407689 753 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:8001"
I0211 01:48:12.407726 753 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
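With all three models READY and the gRPC (8001) and metrics (8002) services started, the OpenAI-compatible HTTP API on port 8000 accepts embedding requests. A request-building sketch; `input_type` is the NVIDIA extension used by the embedqa models ("query" for questions, "passage" for documents to be indexed), so verify the field names against the NIM API reference for your release:

```python
import json


def build_embedding_request(texts, input_type="query",
                            model="nvidia/llama-3.2-nv-embedqa-1b-v2"):
    # OpenAI-style /v1/embeddings body plus the NVIDIA "input_type" extension.
    return {
        "model": model,
        "input": list(texts),
        "input_type": input_type,
    }


payload = build_embedding_request(["What is CUDA forward compatibility?"])
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/embeddings with the header
# Content-Type: application/json (e.g. via curl or requests).
```

Queries and passages should be embedded with their respective `input_type` values, since the model applies different prompting for each side of the retrieval pair.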