===================================
== NVIDIA NIM for Text Embedding ==
===================================

NVIDIA Release 1.3.0
Model: nvidia/llama-3.2-nv-embedqa-1b-v2

Container image Copyright (c) 2016-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This NIM container is governed by the NVIDIA AI Product Agreement here:
https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/
A copy of this license can be found under /opt/nim/LICENSE.

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement (found at https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-ai-foundation-models-community-license-agreement/).

Third Party Software Attributions and Licenses can be found under /opt/nim/NOTICE

NOTE: CUDA Forward Compatibility mode ENABLED.
  Using CUDA 12.6 driver version 560.35.03 with kernel driver version 535.54.03.
  See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.

Overriding NIM_LOG_LEVEL: replacing NIM_LOG_LEVEL=unset with NIM_LOG_LEVEL=INFO
Running automatic profile selection: NIM_MANIFEST_PROFILE is not set
Selected profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df
"timestamp": "2025-02-13 06:05:22,802", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
"timestamp": "2025-02-13 06:05:22,802", "level": "INFO", "message": "Using the profile specified by the user: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
"timestamp": "2025-02-13 06:05:22,802", "level": "INFO", "message": "Downloading manifest profile: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
2025-02-13T06:05:22.803229Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
tokenizer_config.json [00:00:00] 49.35 KiB/49.35 KiB 2.33 MiB/s (0s)
2025-02-13T06:05:24.274700Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer_config.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/048e9b9b79e1c5b6be971f79026d18f4"
2025-02-13T06:05:24.275991Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
checksums.blake3 [00:00:00] 157 B/157 B 18.07 KiB/s (0s)
2025-02-13T06:05:25.411490Z INFO nim_hub_ngc::api::tokio: Downloaded filename: checksums.blake3 to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/65310a3e313fc53877edc6225aa48e5d"
2025-02-13T06:05:25.412836Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
tokenizer.json [00:00:00] 8.66 MiB/8.66 MiB 73.57 MiB/s (0s)
2025-02-13T06:05:26.637254Z INFO nim_hub_ngc::api::tokio: Downloaded filename: tokenizer.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/7ccbfc239ab3e285da1a311d893c165c"
2025-02-13T06:05:26.638627Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
metadata.json [00:00:00] 219 B/219 B 25.05 KiB/s (0s)
2025-02-13T06:05:27.838041Z INFO nim_hub_ngc::api::tokio: Downloaded filename: metadata.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/b320276ae291b616de45b8aff08c586c"
2025-02-13T06:05:27.839563Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
model.plan [00:00:21] 2.31 GiB/2.31 GiB 108.14 MiB/s (0s)
2025-02-13T06:05:51.415511Z INFO nim_hub_ngc::api::tokio: Downloaded filename: model.plan to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/ffe3a7f946dc59b05bead36ec3943549-5"
2025-02-13T06:05:51.417278Z INFO nim_hub_ngc::api::tokio::builder: ngc configured with api_loc: api.ngc.nvidia.com auth_loc: authn.nvidia.com scheme: https
special_tokens_map.json [00:00:00] 449 B/449 B 61.64 KiB/s (0s)
2025-02-13T06:05:52.650293Z INFO nim_hub_ngc::api::tokio: Downloaded filename: special_tokens_map.json to blob: "/opt/nim/.cache/ngc/hub/models--nim--nvidia--llama-3-2-nv-embedqa-1b-v2/blobs/ca1622ba94d36a31aa7ebaa7973ed497"
"timestamp": "2025-02-13 06:05:52,651", "level": "INFO", "message": "Using the workspace specified during init: /opt/nim/workspace"
"timestamp": "2025-02-13 06:05:52,651", "level": "INFO", "message": "Materializing workspace to: /opt/nim/workspace"
There was a problem when trying to write in your cache folder (/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
[02/13/2025-06:05:56] [TRT] [W] Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors.
Adding extra output token_count to ensemble. Not consumed by next model
[2025-02-13 06:05:57 +0000] [754] [INFO] Starting gunicorn 23.0.0
[2025-02-13 06:05:57 +0000] [754] [INFO] Listening at: http://0.0.0.0:8000 (754)
[2025-02-13 06:05:57 +0000] [754] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2025-02-13 06:05:57 +0000] [756] [INFO] Booting worker with pid: 756
[2025-02-13 06:05:57 +0000] [758] [INFO] Booting worker with pid: 758
[2025-02-13 06:05:57 +0000] [759] [INFO] Booting worker with pid: 759
[2025-02-13 06:05:57 +0000] [760] [INFO] Booting worker with pid: 760
[2025-02-13 06:05:57 +0000] [761] [INFO] Booting worker with pid: 761
[2025-02-13 06:05:57 +0000] [762] [INFO] Booting worker with pid: 762
[2025-02-13 06:05:57 +0000] [763] [INFO] Booting worker with pid: 763
[2025-02-13 06:05:57 +0000] [764] [INFO] Booting worker with pid: 764
[2025-02-13 06:05:57 +0000] [765] [INFO] Booting worker with pid: 765
[2025-02-13 06:05:57 +0000] [766] [INFO] Booting worker with pid: 766
[2025-02-13 06:05:57 +0000] [767] [INFO] Booting worker with pid: 767
[2025-02-13 06:05:57 +0000] [831] [INFO] Booting worker with pid: 831
[2025-02-13 06:05:57 +0000] [933] [INFO] Booting worker with pid: 933
I0213 06:05:58.031002 753 pinned_memory_manager.cc:277] "Pinned memory pool is created at '0x7fe370000000' with size 268435456"
I0213 06:05:58.033838 753 cuda_memory_manager.cc:107] "CUDA memory pool is created on device 0 with size 67108864"
[2025-02-13 06:05:58 +0000] [1085] [INFO] Booting worker with pid: 1085
I0213 06:05:58.041149 753 model_lifecycle.cc:472] "loading: nvidia_llama_3_2_nv_embedqa_1b_v2_model:1"
I0213 06:05:58.041214 753 model_lifecycle.cc:472] "loading: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer:1"
I0213 06:05:58.079796 753 tensorrt.cc:65] "TRITONBACKEND_Initialize: tensorrt"
I0213 06:05:58.079848 753 tensorrt.cc:75] "Triton TRITONBACKEND API version: 1.19"
I0213 06:05:58.079864 753 tensorrt.cc:81] "'tensorrt' TRITONBACKEND API version: 1.19"
I0213 06:05:58.079878 753 tensorrt.cc:105] "backend configuration:\n{\"cmdline\":{\"auto-complete-config\":\"true\",\"backend-directory\":\"/opt/tritonserver/backends\",\"min-compute-capability\":\"6.000000\",\"default-max-batch-size\":\"4\"}}"
I0213 06:05:58.083484 753 tensorrt.cc:231] "TRITONBACKEND_ModelInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_model (version 1)"
[2025-02-13 06:05:58 +0000] [1103] [INFO] Booting worker with pid: 1103
[2025-02-13 06:05:58 +0000] [1238] [INFO] Booting worker with pid: 1238
"timestamp": "2025-02-13 06:05:58,571", "level": "INFO", "message": "No OTel configuration file found at expected path: /etc/nim/config/otel.yaml"
[2025-02-13 06:05:58 +0000] [756] [INFO] Started server process [756]
[2025-02-13 06:05:58 +0000] [756] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [758] [INFO] Started server process [758]
[2025-02-13 06:05:58 +0000] [758] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [759] [INFO] Started server process [759]
[2025-02-13 06:05:58 +0000] [759] [INFO] Waiting for application startup.
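The repeated "cache folder" warning during startup indicates that /.cache/huggingface/hub is not writable for the service user, and the log itself names the remedy: point TRANSFORMERS_CACHE at a writable directory. A minimal sketch of that fix follows; the /tmp path is illustrative only (in a container deployment the variable would more typically be passed at launch, e.g. via `--env`), and is not taken from this log:

```shell
# The log suggests setting TRANSFORMERS_CACHE to a writable directory.
# The path below is illustrative; choose one that persists in your deployment.
export TRANSFORMERS_CACHE=/tmp/nim-hf-cache
mkdir -p "$TRANSFORMERS_CACHE"
# Confirm the directory is writable before (re)starting the service:
test -w "$TRANSFORMERS_CACHE" && echo "TRANSFORMERS_CACHE is writable"
```

Newer versions of the transformers library prefer HF_HOME for the same purpose, but this container explicitly asks for TRANSFORMERS_CACHE.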
"timestamp": "2025-02-13 06:05:58,659", "level": "INFO", "message": "Registered custom profile selectors: []"
"timestamp": "2025-02-13 06:05:58,659", "level": "INFO", "message": "selector used: EnvProfileSelector for manifest "
"timestamp": "2025-02-13 06:05:58,659", "level": "INFO", "message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"
[2025-02-13 06:05:58 +0000] [756] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [758] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [759] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [761] [INFO] Started server process [761]
[2025-02-13 06:05:58 +0000] [761] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [760] [INFO] Started server process [760]
[2025-02-13 06:05:58 +0000] [760] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [761] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [760] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [762] [INFO] Started server process [762]
[2025-02-13 06:05:58 +0000] [762] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [763] [INFO] Started server process [763]
[2025-02-13 06:05:58 +0000] [763] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [762] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [763] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [764] [INFO] Started server process [764]
[2025-02-13 06:05:58 +0000] [764] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [765] [INFO] Started server process [765]
[2025-02-13 06:05:58 +0000] [765] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [764] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [765] [INFO] Application startup complete.
[2025-02-13 06:05:58 +0000] [766] [INFO] Started server process [766]
[2025-02-13 06:05:58 +0000] [766] [INFO] Waiting for application startup.
[2025-02-13 06:05:58 +0000] [767] [INFO] Started server process [767]
[2025-02-13 06:05:58 +0000] [767] [INFO] Waiting for application startup.
[2025-02-13 06:05:59 +0000] [766] [INFO] Application startup complete.
[2025-02-13 06:05:59 +0000] [767] [INFO] Application startup complete.
[2025-02-13 06:05:59 +0000] [1085] [INFO] Started server process [1085]
[2025-02-13 06:05:59 +0000] [1085] [INFO] Waiting for application startup.
[2025-02-13 06:05:59 +0000] [831] [INFO] Started server process [831]
[2025-02-13 06:05:59 +0000] [831] [INFO] Waiting for application startup.
[2025-02-13 06:05:59 +0000] [1085] [INFO] Application startup complete.
[2025-02-13 06:05:59 +0000] [933] [INFO] Started server process [933]
[2025-02-13 06:05:59 +0000] [933] [INFO] Waiting for application startup.
[2025-02-13 06:05:59 +0000] [831] [INFO] Application startup complete.
[2025-02-13 06:05:59 +0000] [933] [INFO] Application startup complete.
[2025-02-13 06:05:59 +0000] [1103] [INFO] Started server process [1103]
[2025-02-13 06:05:59 +0000] [1103] [INFO] Waiting for application startup.
[2025-02-13 06:05:59 +0000] [1238] [INFO] Started server process [1238]
[2025-02-13 06:05:59 +0000] [1238] [INFO] Waiting for application startup.
[2025-02-13 06:05:59 +0000] [1103] [INFO] Application startup complete.
[2025-02-13 06:05:59 +0000] [1238] [INFO] Application startup complete.
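The structured entries interleaved with the gunicorn lines above are JSON objects whose surrounding braces have been flattened away by the log pipeline. A small sketch for recovering one as a dict, under the assumption that each entry is valid JSON once re-wrapped in braces (the helper name is ours, not part of NIM):

```python
import json

def parse_nim_log_entry(line: str) -> dict:
    """Re-wrap a flattened NIM structured log entry in braces and parse it as JSON."""
    return json.loads("{" + line + "}")

entry = parse_nim_log_entry(
    '"timestamp": "2025-02-13 06:05:59,190", "level": "INFO", '
    '"message": "Matched profile_id in manifest from env NIM_MODEL_PROFILE to: '
    'ab412b0239ed85250b13b6907a1e5efcee8c64d9a8bfd8f978482fbaa92660df"'
)
# entry["level"] is "INFO"; entry["message"] carries the selected profile hash.
```

This makes it easy to filter the startup output by level or to group messages by worker when auditing a deployment.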
I0213 06:06:00.121699 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_0 (CPU device 0)"
I0213 06:06:00.121900 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_1 (CPU device 0)"
I0213 06:06:00.121980 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_2 (CPU device 0)"
I0213 06:06:00.122109 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_3 (CPU device 0)"
I0213 06:06:00.122149 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_4 (CPU device 0)"
I0213 06:06:00.122299 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_5 (CPU device 0)"
I0213 06:06:00.122492 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_6 (CPU device 0)"
I0213 06:06:00.123045 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_8 (CPU device 0)"
I0213 06:06:00.123090 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_7 (CPU device 0)"
I0213 06:06:00.123258 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_9 (CPU device 0)"
I0213 06:06:00.123755 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_10 (CPU device 0)"
I0213 06:06:00.124136 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_11 (CPU device 0)"
I0213 06:06:00.124386 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_12 (CPU device 0)"
I0213 06:06:00.124528 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_13 (CPU device 0)"
I0213 06:06:00.124871 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_14 (CPU device 0)"
I0213 06:06:00.124968 753 python_be.cc:2249] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer_0_15 (CPU device 0)"
I0213 06:06:01.324523 753 logging.cc:46] "Loaded engine size: 2364 MiB"
W0213 06:06:01.353850 753 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I0213 06:06:01.789631 753 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: nvidia_llama_3_2_nv_embedqa_1b_v2_model_0_0 (GPU device 0)"
I0213 06:06:02.624584 753 model_lifecycle.cc:839] "successfully loaded 'nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer'"
I0213 06:06:03.695375 753 logging.cc:46] "Loaded engine size: 2364 MiB"
W0213 06:06:03.695516 753 logging.cc:43] "Using an engine plan file across different models of devices is not recommended and is likely to affect performance or even cause errors."
I0213 06:06:04.375708 753 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +18720, now: CPU 0, GPU 21077 (MiB)"
I0213 06:06:04.380404 753 instance_state.cc:186] "Created instance nvidia_llama_3_2_nv_embedqa_1b_v2_model_0_0 on GPU 0 with stream priority 0 and optimization profile default[0];"
I0213 06:06:04.381874 753 model_lifecycle.cc:839] "successfully loaded 'nvidia_llama_3_2_nv_embedqa_1b_v2_model'"
I0213 06:06:04.382348 753 model_lifecycle.cc:472] "loading: nvidia_llama_3_2_nv_embedqa_1b_v2:1"
I0213 06:06:04.382738 753 model_lifecycle.cc:839] "successfully loaded 'nvidia_llama_3_2_nv_embedqa_1b_v2'"
I0213 06:06:04.382876 753 server.cc:604]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0213 06:06:04.382953 753 server.cc:631]
+----------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend  | Path                                                      | Config                                                                                                                                                      |
+----------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorrt | /opt/tritonserver/backends/tensorrt/libtriton_tensorrt.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
| python   | /opt/tritonserver/backends/python/libtriton_python.so     | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","default-max-batch-size":"4"}} |
+----------+-----------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0213 06:06:04.383088 753 server.cc:674]
+---------------------------------------------+---------+--------+
| Model                                       | Version | Status |
+---------------------------------------------+---------+--------+
| nvidia_llama_3_2_nv_embedqa_1b_v2           | 1       | READY  |
| nvidia_llama_3_2_nv_embedqa_1b_v2_model     | 1       | READY  |
| nvidia_llama_3_2_nv_embedqa_1b_v2_tokenizer | 1       | READY  |
+---------------------------------------------+---------+--------+
I0213 06:06:04.538504 753 metrics.cc:877] "Collecting metrics for GPU 0: NVIDIA A100-SXM4-80GB"
I0213 06:06:04.550217 753 metrics.cc:770] "Collecting CPU metrics"
I0213 06:06:04.550452 753 tritonserver.cc:2598]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                                      |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                                     |
| server_version                   | 2.51.0                                                                                                                                                                                                     |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0]         | /opt/nim/tmp/run/triton-model-repository                                                                                                                                                                   |
| model_control_mode               | MODE_NONE                                                                                                                                                                                                  |
| strict_model_config              | 0                                                                                                                                                                                                          |
| model_config_name                |                                                                                                                                                                                                            |
| rate_limit                       | OFF                                                                                                                                                                                                        |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                                  |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                                                                                                                   |
| min_supported_compute_capability | 6.0                                                                                                                                                                                                        |
| strict_readiness                 | 1                                                                                                                                                                                                          |
| exit_timeout                     | 30                                                                                                                                                                                                         |
| cache_enabled                    | 0                                                                                                                                                                                                          |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I0213 06:06:04.556202 753 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:8001"
I0213 06:06:04.556247 753 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
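Once all three Triton models report READY and gunicorn is listening on port 8000 (with Triton gRPC on 8001 and metrics on 8002), the service can accept embedding requests. A client sketch follows, assuming the OpenAI-compatible /v1/embeddings route that embedding NIMs document; the host, port, input text, and `input_type` value ("query" for questions, "passage" for documents to index) are illustrative and not taken from this log:

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # adjust to your deployment

def build_embedding_request(texts, input_type="query"):
    """Build a POST request for the NIM embeddings endpoint.

    Follows the OpenAI-compatible /v1/embeddings request shape;
    "input_type" is an extension used by retrieval embedding models.
    """
    payload = {
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
        "input": texts,
        "input_type": input_type,
    }
    return urllib.request.Request(
        BASE_URL + "/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_embedding_request(["What is CUDA forward compatibility?"])
# Sending the request requires the running server from the log above:
# with urllib.request.urlopen(req) as resp:
#     embeddings = json.load(resp)["data"]
```

The response, per the OpenAI-compatible convention, carries one embedding vector per input string under the "data" key.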