[ 0.844189] NVRM: loading NVIDIA UNIX Open Kernel Module for x86_64 575.51.02 Release Build (dvs-builder@U22-I3-G01-3-2) Thu Apr 10 15:55:07 UTC 2025 [ 0.942679] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 1.469393] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 1.469399] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 1.469400] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 1.469402] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:49 [ 1.469403] NVRM: RISC-V CSR State: [ 1.469403] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 1.469404] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 1.469405] NVRM: RISC-V GPR State: [ 1.469405] NVRM: ra:0x000000000140d0f6 sp:0x00000047f240f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 1.469406] NVRM: a0:0x0000000000000000 a1:0x00000047eb220530 a2:0x0000000000000004 a3:0x00000047f2a41000 [ 1.469406] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 1.469407] NVRM: s0:0x00000047f240f740 s1:0x00000047eb444270 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 1.469407] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047eb380590 s7:0x0000000000001500 [ 1.469408] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047eb37e590 [ 1.469409] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 1.469409] NVRM: t4:0x0000000000000000 t5:0x00000047f240f3c1 t6:0x0000000000000020 [ 1.469409] NVRM: Stack Trace: [ 1.469410] NVRM: 0x000000000140ca4a [ 1.469410] NVRM: 0x00000000017d7c26 [ 1.469410] NVRM: 0x00000000017de386 [ 1.469411] NVRM: 0x00000000017dfca8 [ 1.469411] NVRM: 
0x00000000017d66b2 [ 1.469411] NVRM: 0x00000000014164f2 [ 1.469411] NVRM: 0x0000000001a259ee [ 1.469412] NVRM: 0x0000000001a483f8 [ 1.469412] NVRM: 0x0000000001b8486c [ 1.469412] NVRM: 0x0000000001a2a74e [ 1.469412] NVRM: Local I/O Register State: [ 1.469413] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 1.469414] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 1.469415] NVRM: ------------[ end crash report ]------------ [ 1.469425] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 1.469426] NVRM: GPU0 RPC history (CPU -> GSP): [ 1.469426] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 1.469426] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357f9d929f95 0x0000000000000000 y [ 1.469428] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357f9d929f93 0x0000000000000000 [ 1.469429] NVRM: GPU0 RPC event history (CPU <- GSP): [ 1.469430] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 1.469430] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357f9d9a6388 0x0006357f9d9a638a 2us y [ 1.469432] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 1.469437] NVRM: kgspHealthCheck_TU102: **********************************************************************************
[ 1.469439] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877
[ 1.469440] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952
[ 1.469447] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 1.469448] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 1.469461] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 1.471223] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100
[ 1.471233] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941)
[ 1.472321] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 19.154576] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 19.171768] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 19.171770] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 19.171775] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0
[ 19.171783] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 19.172829] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941)
[ 19.173740] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 19.211267] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 19.227498] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 19.227499] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 19.227505] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0 [ 19.227513] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 19.228972] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 19.229769] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 26.713610] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 28.453745] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 28.453753] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 28.453754] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 28.453757] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 28.453758] NVRM: RISC-V CSR State: [ 28.453759] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 28.453760] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 28.453760] NVRM: RISC-V GPR State: [ 28.453760] NVRM: ra:0x000000000140d0f6 sp:0x00000047e3a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 28.453761] NVRM: a0:0x0000000000000000 a1:0x00000047dc820530 a2:0x0000000000000004 a3:0x00000047e4041000 [ 28.453761] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 28.453762] NVRM: s0:0x00000047e3a0f740 s1:0x00000047dca442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 28.453763] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047dc9805f0 s7:0x0000000000001500 [ 28.453763] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 
s11:0x00000047dc97e5f0 [ 28.453764] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 28.453764] NVRM: t4:0x0000000000000000 t5:0x00000047e3a0f3c1 t6:0x0000000000000020 [ 28.453765] NVRM: Stack Trace: [ 28.453765] NVRM: 0x000000000140ca4a [ 28.453765] NVRM: 0x00000000017d7c26 [ 28.453765] NVRM: 0x00000000017de386 [ 28.453766] NVRM: 0x00000000017dfca8 [ 28.453766] NVRM: 0x00000000017d66b2 [ 28.453766] NVRM: 0x00000000014164f2 [ 28.453766] NVRM: 0x0000000001a259ee [ 28.453767] NVRM: 0x0000000001a483f8 [ 28.453767] NVRM: 0x0000000001b8486c [ 28.453767] NVRM: 0x0000000001a2a74e [ 28.453767] NVRM: Local I/O Register State: [ 28.453768] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 28.453769] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 28.453770] NVRM: ------------[ end crash report ]------------ [ 28.453783] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 28.453783] NVRM: GPU0 RPC history (CPU -> GSP): [ 28.453784] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 28.453785] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357f9f2e60d1 0x0000000000000000 y [ 28.453786] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357f9f2e60cb 0x0000000000000000 [ 28.453787] NVRM: GPU0 RPC event history (CPU <- GSP): [ 28.453787] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 28.453788] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357f9f3621b8 0x0006357f9f3621b9 1us y [ 28.453791] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 28.453799] NVRM: kgspHealthCheck_TU102: **********************************************************************************
[ 28.453800] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877
[ 28.453802] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952
[ 28.453814] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 28.453814] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 28.453867] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 28.455610] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100
[ 28.455627] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941)
[ 28.457553] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 29.124718] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 30.962410] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 30.962423] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 30.962426] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 30.962434] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 30.962435] NVRM: RISC-V CSR State: [ 30.962435] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 30.962436] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 30.962436] NVRM: RISC-V GPR State: [ 30.962437] NVRM: ra:0x000000000140d0f6 sp:0x00000047c6a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 30.962437] NVRM: a0:0x0000000000000000 a1:0x00000047bf820530 a2:0x0000000000000004 a3:0x00000047c7041000 [ 30.962438] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 30.962438] NVRM: s0:0x00000047c6a0f740 s1:0x00000047bfa442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 30.962439] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047bf9805f0 s7:0x0000000000001500 [ 30.962439] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047bf97e5f0 [ 30.962440] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 30.962440] NVRM: t4:0x0000000000000000 t5:0x00000047c6a0f3c1 t6:0x0000000000000020 [ 30.962441] NVRM: Stack Trace: [ 30.962441] NVRM: 0x000000000140ca4a [ 30.962442] NVRM: 0x00000000017d7c26 [ 30.962442] NVRM: 0x00000000017de386 [ 30.962442] NVRM: 0x00000000017dfca8 [ 30.962442] NVRM: 0x00000000017d66b2 [ 30.962443] NVRM: 0x00000000014164f2 [ 30.962443] NVRM: 0x0000000001a259ee [ 30.962443] NVRM: 0x0000000001a483f8 [ 30.962444] NVRM: 0x0000000001b8486c [ 30.962444] NVRM: 0x0000000001a2a74e [ 30.962444] NVRM: Local I/O Register 
State: [ 30.962445] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 30.962446] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 30.962448] NVRM: ------------[ end crash report ]------------ [ 30.962462] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 30.962463] NVRM: GPU0 RPC history (CPU -> GSP): [ 30.962463] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 30.962464] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357f9f549c82 0x0000000000000000 y [ 30.962465] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357f9f549c7f 0x0000000000000000 [ 30.962466] NVRM: GPU0 RPC event history (CPU <- GSP): [ 30.962466] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 30.962467] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357f9f5c6983 0x0006357f9f5c6983 y [ 30.962469] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 30.962478] NVRM: kgspHealthCheck_TU102: **********************************************************************************
[ 30.962480] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877
[ 30.962481] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952
[ 30.962493] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 30.962493] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 30.962533] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 30.964468] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100
[ 30.964489] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941)
[ 30.966276] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 30.966623] NVRM: nvAssertFailedNoLog: Assertion failed: rmapiLockIsOwner() && rmGpuLockIsOwner() @ conf_compute_api.c:77
[ 31.637935] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 33.479361] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 33.479375] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 33.479378] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 33.479385] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 33.479387] NVRM: RISC-V CSR State: [ 33.479387] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 33.479390] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 33.479391] NVRM: RISC-V GPR State: [ 33.479392] NVRM: ra:0x000000000140d0f6 sp:0x00000047a9a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 33.479393] NVRM: a0:0x0000000000000000 a1:0x00000047a2820530 a2:0x0000000000000004 a3:0x00000047aa041000 [ 33.479395] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 33.479396] NVRM: s0:0x00000047a9a0f740 s1:0x00000047a2a442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 33.479397] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047a29805f0 s7:0x0000000000001500 [ 33.479399] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047a297e5f0 [ 33.479400] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 33.479401] NVRM: t4:0x0000000000000000 t5:0x00000047a9a0f3c1 t6:0x0000000000000020 [ 33.479402] NVRM: Stack Trace: [ 33.479403] NVRM: 0x000000000140ca4a [ 33.479404] NVRM: 0x00000000017d7c26 [ 33.479405] NVRM: 0x00000000017de386 [ 33.479405] NVRM: 0x00000000017dfca8 [ 33.479406] NVRM: 0x00000000017d66b2 [ 33.479407] NVRM: 0x00000000014164f2 [ 33.479407] NVRM: 0x0000000001a259ee [ 33.479408] NVRM: 0x0000000001a483f8 [ 33.479409] NVRM: 0x0000000001b8486c [ 33.479409] NVRM: 0x0000000001a2a74e [ 33.479410] NVRM: Local I/O Register 
State: [ 33.479411] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 33.479413] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 33.479415] NVRM: ------------[ end crash report ]------------ [ 33.479446] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 33.479449] NVRM: GPU0 RPC history (CPU -> GSP): [ 33.479450] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 33.479451] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357f9f7b0e62 0x0000000000000000 y [ 33.479455] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357f9f7b0e5d 0x0000000000000000 [ 33.479457] NVRM: GPU0 RPC event history (CPU <- GSP): [ 33.479458] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 33.479459] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357f9f82d0e2 0x0006357f9f82d0e5 3us y [ 33.479468] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. [ 33.479485] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 33.479490] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 33.479496] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 33.479516] NVRM: kgspInitRm_IMPL: Max GSP-RM boot attempts exceeded: 4/4 [ 33.479590] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 33.481856] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 33.481876] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! 
(0x62:0x62:1941) [ 33.483629] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 33.517913] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 33.531090] NVRM: kgspInitRm_IMPL: Initial shift, 4, is larger than max allowed [0, 3]. Modulo applied [ 33.531095] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 33.531095] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 33.531099] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0 [ 33.531106] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 33.532247] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 33.532890] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 33.533082] NVRM: nvAssertFailedNoLog: Assertion failed: rmapiLockIsOwner() && rmGpuLockIsOwner() @ conf_compute_api.c:77 [ 34.206958] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 35.488005] NVRM: kgspInitRm_IMPL: Initial shift, 4, is larger than max allowed [0, 3]. 
Modulo applied [ 35.997337] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 35.997347] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 35.997349] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 35.997353] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 35.997353] NVRM: RISC-V CSR State: [ 35.997354] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 35.997355] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 35.997355] NVRM: RISC-V GPR State: [ 35.997356] NVRM: ra:0x000000000140d0f6 sp:0x00000047f240f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 35.997356] NVRM: a0:0x0000000000000000 a1:0x00000047eb220530 a2:0x0000000000000004 a3:0x00000047f2a41000 [ 35.997357] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 35.997357] NVRM: s0:0x00000047f240f740 s1:0x00000047eb4442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 35.997358] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047eb3805f0 s7:0x0000000000001500 [ 35.997359] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047eb37e5f0 [ 35.997359] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 35.997360] NVRM: t4:0x0000000000000000 t5:0x00000047f240f3c1 t6:0x0000000000000020 [ 35.997360] NVRM: Stack Trace: [ 35.997361] NVRM: 0x000000000140ca4a [ 35.997361] NVRM: 0x00000000017d7c26 [ 35.997361] NVRM: 0x00000000017de386 [ 35.997361] NVRM: 0x00000000017dfca8 [ 35.997362] NVRM: 0x00000000017d66b2 [ 35.997362] NVRM: 0x00000000014164f2 [ 35.997362] NVRM: 0x0000000001a259ee [ 35.997362] NVRM: 0x0000000001a483f8 [ 35.997363] NVRM: 0x0000000001b8486c [ 35.997363] NVRM: 0x0000000001a2a74e [ 35.997363] NVRM: Local 
I/O Register State: [ 35.997364] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 35.997364] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 35.997366] NVRM: ------------[ end crash report ]------------ [ 35.997381] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 35.997382] NVRM: GPU0 RPC history (CPU -> GSP): [ 35.997382] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 35.997383] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357f9fa17d3b 0x0000000000000000 y [ 35.997384] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357f9fa17d34 0x0000000000000000 [ 35.997385] NVRM: GPU0 RPC event history (CPU <- GSP): [ 35.997386] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 35.997386] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357f9fa93d66 0x0006357f9fa93d68 2us y [ 35.997390] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 35.997399] NVRM: kgspHealthCheck_TU102: **********************************************************************************
[ 35.997401] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877
[ 35.997403] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952
[ 35.997769] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 35.997772] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 35.997822] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 35.999760] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100
[ 35.999780] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941)
[ 36.000977] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 36.036434] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 36.049796] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 36.049798] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 36.049804] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0
[ 36.049814] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 36.051584] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941)
[ 36.052546] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 36.726683] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 38.571959] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 38.571975] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 38.571979] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 38.571987] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 38.571989] NVRM: RISC-V CSR State: [ 38.571990] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 38.571992] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 38.571994] NVRM: RISC-V GPR State: [ 38.571995] NVRM: ra:0x000000000140d0f6 sp:0x00000047e3a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 38.571997] NVRM: a0:0x0000000000000000 a1:0x00000047dc820530 a2:0x0000000000000004 a3:0x00000047e4041000 [ 38.571998] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 38.572000] NVRM: s0:0x00000047e3a0f740 s1:0x00000047dca442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 38.572001] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047dc9805f0 s7:0x0000000000001500 [ 38.572003] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047dc97e5f0 [ 38.572004] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 38.572005] NVRM: t4:0x0000000000000000 t5:0x00000047e3a0f3c1 t6:0x0000000000000020 [ 38.572007] NVRM: Stack Trace: [ 38.572008] NVRM: 0x000000000140ca4a [ 38.572009] NVRM: 0x00000000017d7c26 [ 38.572009] NVRM: 0x00000000017de386 [ 38.572010] NVRM: 0x00000000017dfca8 [ 38.572011] NVRM: 0x00000000017d66b2 [ 38.572012] NVRM: 0x00000000014164f2 [ 38.572012] NVRM: 0x0000000001a259ee [ 38.572013] NVRM: 0x0000000001a483f8 [ 38.572014] NVRM: 0x0000000001b8486c [ 38.572015] NVRM: 0x0000000001a2a74e [ 38.572015] NVRM: Local I/O Register 
State: [ 38.572016] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 38.572019] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 38.572021] NVRM: ------------[ end crash report ]------------ [ 38.572196] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 38.572199] NVRM: GPU0 RPC history (CPU -> GSP): [ 38.572199] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 38.572200] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357f9fc8bca4 0x0000000000000000 y [ 38.572202] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357f9fc8bca1 0x0000000000000000 [ 38.572203] NVRM: GPU0 RPC event history (CPU <- GSP): [ 38.572203] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 38.572204] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357f9fd085af 0x0006357f9fd085b0 1us y [ 38.572208] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 38.572218] NVRM: kgspHealthCheck_TU102: **********************************************************************************
[ 38.572220] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877
[ 38.572221] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952
[ 38.572235] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP
[ 38.572235] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset)
[ 38.572282] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM
[ 38.574204] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100
[ 38.574219] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941)
[ 38.575857] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
[ 38.576102] NVRM: nvAssertFailedNoLog: Assertion failed: rmapiLockIsOwner() && rmGpuLockIsOwner() @ conf_compute_api.c:77
[ 39.255166] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11.
[ 41.078776] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 41.078791] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 41.078794] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 41.078799] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 41.078800] NVRM: RISC-V CSR State: [ 41.078800] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 41.078801] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 41.078802] NVRM: RISC-V GPR State: [ 41.078802] NVRM: ra:0x000000000140d0f6 sp:0x00000047c6a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 41.078803] NVRM: a0:0x0000000000000000 a1:0x00000047bf820530 a2:0x0000000000000004 a3:0x00000047c7041000 [ 41.078803] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 41.078804] NVRM: s0:0x00000047c6a0f740 s1:0x00000047bfa442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 41.078805] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047bf9805f0 s7:0x0000000000001500 [ 41.078805] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047bf97e5f0 [ 41.078806] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 41.078806] NVRM: t4:0x0000000000000000 t5:0x00000047c6a0f3c1 t6:0x0000000000000020 [ 41.078807] NVRM: Stack Trace: [ 41.078807] NVRM: 0x000000000140ca4a [ 41.078807] NVRM: 0x00000000017d7c26 [ 41.078807] NVRM: 0x00000000017de386 [ 41.078808] NVRM: 0x00000000017dfca8 [ 41.078808] NVRM: 0x00000000017d66b2 [ 41.078808] NVRM: 0x00000000014164f2 [ 41.078809] NVRM: 0x0000000001a259ee [ 41.078809] NVRM: 0x0000000001a483f8 [ 41.078809] NVRM: 0x0000000001b8486c [ 41.078809] NVRM: 0x0000000001a2a74e [ 41.078810] NVRM: Local I/O Register 
State: [ 41.078810] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 41.078811] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 41.078812] NVRM: ------------[ end crash report ]------------ [ 41.078827] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 41.078828] NVRM: GPU0 RPC history (CPU -> GSP): [ 41.078828] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 41.078829] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357f9feef9ce 0x0000000000000000 y [ 41.078830] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357f9feef9c5 0x0000000000000000 [ 41.078831] NVRM: GPU0 RPC event history (CPU <- GSP): [ 41.078832] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 41.078833] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357f9ff6c6ab 0x0006357f9ff6c6ac 1us y [ 41.078836] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 41.078844] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 41.078846] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 41.078847] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 41.078860] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 41.078860] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 41.078900] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 41.080610] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 41.080631] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 41.083004] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 41.116709] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 41.128886] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 41.128888] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 41.128892] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0 [ 41.128900] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 41.130266] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 41.131109] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 41.131303] NVRM: nvAssertFailedNoLog: Assertion failed: rmapiLockIsOwner() && rmGpuLockIsOwner() @ conf_compute_api.c:77 [ 41.805353] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 43.655734] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 43.655748] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 43.655750] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 43.655755] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 43.655756] NVRM: RISC-V CSR State: [ 43.655756] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 43.655758] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 43.655758] NVRM: RISC-V GPR State: [ 43.655758] NVRM: ra:0x000000000140d0f6 sp:0x00000047a9a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 43.655759] NVRM: a0:0x0000000000000000 a1:0x00000047a2820530 a2:0x0000000000000004 a3:0x00000047aa041000 [ 43.655760] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 43.655760] NVRM: s0:0x00000047a9a0f740 s1:0x00000047a2a442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 43.655761] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047a29805f0 s7:0x0000000000001500 [ 43.655761] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047a297e5f0 [ 43.655762] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 43.655762] NVRM: t4:0x0000000000000000 t5:0x00000047a9a0f3c1 t6:0x0000000000000020 [ 43.655763] NVRM: Stack Trace: [ 43.655763] NVRM: 0x000000000140ca4a [ 43.655764] NVRM: 0x00000000017d7c26 [ 43.655764] NVRM: 0x00000000017de386 [ 43.655764] NVRM: 0x00000000017dfca8 [ 43.655764] NVRM: 0x00000000017d66b2 [ 43.655765] NVRM: 0x00000000014164f2 [ 43.655765] NVRM: 0x0000000001a259ee [ 43.655765] NVRM: 0x0000000001a483f8 [ 43.655765] NVRM: 0x0000000001b8486c [ 43.655766] NVRM: 0x0000000001a2a74e [ 43.655766] NVRM: Local I/O Register 
State: [ 43.655766] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 43.655767] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 43.655768] NVRM: ------------[ end crash report ]------------ [ 43.655783] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 43.655784] NVRM: GPU0 RPC history (CPU -> GSP): [ 43.655785] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 43.655786] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa01651e2 0x0000000000000000 y [ 43.655787] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa01651df 0x0000000000000000 [ 43.655788] NVRM: GPU0 RPC event history (CPU <- GSP): [ 43.655788] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 43.655789] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa01e1858 0x0006357fa01e185a 2us y [ 43.655793] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. [ 43.655803] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 43.655805] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 43.655807] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 43.655819] NVRM: kgspInitRm_IMPL: Max GSP-RM boot attempts exceeded: 4/4 [ 43.655869] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 43.657711] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 43.657728] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! 
(0x62:0x62:1941) [ 43.661018] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 43.697047] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 43.709714] NVRM: kgspInitRm_IMPL: Initial shift, 4, is larger than max allowed [0, 3]. Modulo applied [ 43.709719] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 43.709719] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 43.709723] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0 [ 43.709733] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 43.711254] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 43.713064] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 43.713691] NVRM: nvAssertFailedNoLog: Assertion failed: rmapiLockIsOwner() && rmGpuLockIsOwner() @ conf_compute_api.c:77 [ 61.163178] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 62.494104] NVRM: kgspInitRm_IMPL: Initial shift, 4, is larger than max allowed [0, 3]. 
Modulo applied [ 63.006851] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 63.006864] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 63.006867] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 63.006874] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 63.006877] NVRM: RISC-V CSR State: [ 63.006878] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 63.006880] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 63.006882] NVRM: RISC-V GPR State: [ 63.006882] NVRM: ra:0x000000000140d0f6 sp:0x00000047f240f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 63.006884] NVRM: a0:0x0000000000000000 a1:0x00000047eb220530 a2:0x0000000000000004 a3:0x00000047f2a41000 [ 63.006886] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 63.006887] NVRM: s0:0x00000047f240f740 s1:0x00000047eb4442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 63.006889] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047eb3805f0 s7:0x0000000000001500 [ 63.006890] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047eb37e5f0 [ 63.006892] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 63.006893] NVRM: t4:0x0000000000000000 t5:0x00000047f240f3c1 t6:0x0000000000000020 [ 63.006895] NVRM: Stack Trace: [ 63.006895] NVRM: 0x000000000140ca4a [ 63.006896] NVRM: 0x00000000017d7c26 [ 63.006897] NVRM: 0x00000000017de386 [ 63.006898] NVRM: 0x00000000017dfca8 [ 63.006899] NVRM: 0x00000000017d66b2 [ 63.006899] NVRM: 0x00000000014164f2 [ 63.006900] NVRM: 0x0000000001a259ee [ 63.006901] NVRM: 0x0000000001a483f8 [ 63.006902] NVRM: 0x0000000001b8486c [ 63.006902] NVRM: 0x0000000001a2a74e [ 63.006903] NVRM: Local 
I/O Register State: [ 63.006904] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 63.006907] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 63.006909] NVRM: ------------[ end crash report ]------------ [ 63.007094] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 63.007101] NVRM: GPU0 RPC history (CPU -> GSP): [ 63.007102] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 63.007104] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa140ed75 0x0000000000000000 y [ 63.007108] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa140ed72 0x0000000000000000 [ 63.007110] NVRM: GPU0 RPC event history (CPU <- GSP): [ 63.007111] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 63.007112] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa148c84c 0x0006357fa148c84d 1us y [ 63.007119] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 63.007131] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 63.007136] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 63.007138] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 63.007154] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 63.007155] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 63.007202] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 63.008951] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 63.008970] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 63.011046] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 63.045716] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 63.058849] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 63.058851] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 63.058855] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0 [ 63.058866] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 63.060230] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 63.061320] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 63.728477] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 65.583844] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 65.583854] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 65.583855] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 65.583860] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 65.583861] NVRM: RISC-V CSR State: [ 65.583861] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 65.583862] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 65.583862] NVRM: RISC-V GPR State: [ 65.583863] NVRM: ra:0x000000000140d0f6 sp:0x00000047e3a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 65.583863] NVRM: a0:0x0000000000000000 a1:0x00000047dc820530 a2:0x0000000000000004 a3:0x00000047e4041000 [ 65.583864] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 65.583864] NVRM: s0:0x00000047e3a0f740 s1:0x00000047dca442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 65.583865] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047dc9805f0 s7:0x0000000000001500 [ 65.583865] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047dc97e5f0 [ 65.583866] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 65.583866] NVRM: t4:0x0000000000000000 t5:0x00000047e3a0f3c1 t6:0x0000000000000020 [ 65.583867] NVRM: Stack Trace: [ 65.583867] NVRM: 0x000000000140ca4a [ 65.583867] NVRM: 0x00000000017d7c26 [ 65.583868] NVRM: 0x00000000017de386 [ 65.583868] NVRM: 0x00000000017dfca8 [ 65.583868] NVRM: 0x00000000017d66b2 [ 65.583868] NVRM: 0x00000000014164f2 [ 65.583869] NVRM: 0x0000000001a259ee [ 65.583869] NVRM: 0x0000000001a483f8 [ 65.583869] NVRM: 0x0000000001b8486c [ 65.583869] NVRM: 0x0000000001a2a74e [ 65.583870] NVRM: Local I/O Register 
State: [ 65.583870] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 65.583871] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 65.583872] NVRM: ------------[ end crash report ]------------ [ 65.583886] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 65.583887] NVRM: GPU0 RPC history (CPU -> GSP): [ 65.583887] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 65.583888] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa1687712 0x0000000000000000 y [ 65.583890] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa168770f 0x0000000000000000 [ 65.583891] NVRM: GPU0 RPC event history (CPU <- GSP): [ 65.583891] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 65.583891] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa1704e7c 0x0006357fa1704e7d 1us y [ 65.583895] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 65.583904] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 65.583906] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 65.583907] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 65.583918] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 65.583919] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 65.583945] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 65.585663] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 65.585683] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 65.588461] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 66.261073] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 68.074421] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 68.074437] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 68.074439] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 68.074443] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 68.074444] NVRM: RISC-V CSR State: [ 68.074444] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 68.074445] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 68.074446] NVRM: RISC-V GPR State: [ 68.074446] NVRM: ra:0x000000000140d0f6 sp:0x00000047c6a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 68.074447] NVRM: a0:0x0000000000000000 a1:0x00000047bf820530 a2:0x0000000000000004 a3:0x00000047c7041000 [ 68.074448] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 68.074448] NVRM: s0:0x00000047c6a0f740 s1:0x00000047bfa442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 68.074449] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047bf9805f0 s7:0x0000000000001500 [ 68.074449] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047bf97e5f0 [ 68.074450] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 68.074450] NVRM: t4:0x0000000000000000 t5:0x00000047c6a0f3c1 t6:0x0000000000000020 [ 68.074451] NVRM: Stack Trace: [ 68.074451] NVRM: 0x000000000140ca4a [ 68.074452] NVRM: 0x00000000017d7c26 [ 68.074452] NVRM: 0x00000000017de386 [ 68.074452] NVRM: 0x00000000017dfca8 [ 68.074453] NVRM: 0x00000000017d66b2 [ 68.074453] NVRM: 0x00000000014164f2 [ 68.074453] NVRM: 0x0000000001a259ee [ 68.074453] NVRM: 0x0000000001a483f8 [ 68.074454] NVRM: 0x0000000001b8486c [ 68.074454] NVRM: 0x0000000001a2a74e [ 68.074454] NVRM: Local I/O Register 
State: [ 68.074455] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 68.074455] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 68.074456] NVRM: ------------[ end crash report ]------------ [ 68.074471] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 68.074472] NVRM: GPU0 RPC history (CPU -> GSP): [ 68.074472] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 68.074473] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa18ea1f5 0x0000000000000000 y [ 68.074474] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa18ea1f2 0x0000000000000000 [ 68.074475] NVRM: GPU0 RPC event history (CPU <- GSP): [ 68.074476] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 68.074476] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa196742d 0x0006357fa196742e 1us y [ 68.074480] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 68.074488] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 68.074490] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 68.074491] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 68.074504] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 68.074504] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 68.074542] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 68.076069] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 68.076087] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 68.078131] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 68.832147] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 70.590444] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 70.590457] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 70.590458] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 70.590462] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 70.590463] NVRM: RISC-V CSR State: [ 70.590463] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 70.590464] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 70.590465] NVRM: RISC-V GPR State: [ 70.590465] NVRM: ra:0x000000000140d0f6 sp:0x00000047a9a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 70.590466] NVRM: a0:0x0000000000000000 a1:0x00000047a2820530 a2:0x0000000000000004 a3:0x00000047aa041000 [ 70.590466] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 70.590467] NVRM: s0:0x00000047a9a0f740 s1:0x00000047a2a442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 70.590468] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047a29805f0 s7:0x0000000000001500 [ 70.590468] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047a297e5f0 [ 70.590469] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 70.590469] NVRM: t4:0x0000000000000000 t5:0x00000047a9a0f3c1 t6:0x0000000000000020 [ 70.590470] NVRM: Stack Trace: [ 70.590470] NVRM: 0x000000000140ca4a [ 70.590470] NVRM: 0x00000000017d7c26 [ 70.590471] NVRM: 0x00000000017de386 [ 70.590471] NVRM: 0x00000000017dfca8 [ 70.590471] NVRM: 0x00000000017d66b2 [ 70.590471] NVRM: 0x00000000014164f2 [ 70.590472] NVRM: 0x0000000001a259ee [ 70.590472] NVRM: 0x0000000001a483f8 [ 70.590472] NVRM: 0x0000000001b8486c [ 70.590473] NVRM: 0x0000000001a2a74e [ 70.590473] NVRM: Local I/O Register 
State: [ 70.590473] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 70.590474] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 70.590475] NVRM: ------------[ end crash report ]------------ [ 70.590489] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 70.590490] NVRM: GPU0 RPC history (CPU -> GSP): [ 70.590490] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 70.590491] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa1b5234f 0x0000000000000000 y [ 70.590493] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa1b5234c 0x0000000000000000 [ 70.590494] NVRM: GPU0 RPC event history (CPU <- GSP): [ 70.590494] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 70.590495] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa1bcf295 0x0006357fa1bcf296 1us y [ 70.590498] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. [ 70.590507] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 70.590509] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 70.590510] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 70.590519] NVRM: kgspInitRm_IMPL: Max GSP-RM boot attempts exceeded: 4/4 [ 70.590552] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 70.592357] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 70.592378] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! 
(0x62:0x62:1941) [ 70.594380] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 71.261151] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 72.572353] NVRM: kgspInitRm_IMPL: Initial shift, 4, is larger than max allowed [0, 3]. Modulo applied [ 73.084807] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 73.084820] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 73.084823] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 73.084831] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 73.084833] NVRM: RISC-V CSR State: [ 73.084834] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 73.084837] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 73.084838] NVRM: RISC-V GPR State: [ 73.084839] NVRM: ra:0x000000000140d0f6 sp:0x00000047f240f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 73.084841] NVRM: a0:0x0000000000000000 a1:0x00000047eb220530 a2:0x0000000000000004 a3:0x00000047f2a41000 [ 73.084842] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 73.084844] NVRM: s0:0x00000047f240f740 s1:0x00000047eb4442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 73.084845] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047eb3805f0 s7:0x0000000000001500 [ 73.084847] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047eb37e5f0 [ 73.084848] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 73.084850] NVRM: t4:0x0000000000000000 t5:0x00000047f240f3c1 t6:0x0000000000000020 [ 73.084851] NVRM: Stack Trace: [ 73.084852] NVRM: 0x000000000140ca4a [ 73.084853] NVRM: 0x00000000017d7c26 [ 73.084854] NVRM: 
0x00000000017de386 [ 73.084855] NVRM: 0x00000000017dfca8 [ 73.084855] NVRM: 0x00000000017d66b2 [ 73.084856] NVRM: 0x00000000014164f2 [ 73.084857] NVRM: 0x0000000001a259ee [ 73.084858] NVRM: 0x0000000001a483f8 [ 73.084858] NVRM: 0x0000000001b8486c [ 73.084859] NVRM: 0x0000000001a2a74e [ 73.084860] NVRM: Local I/O Register State: [ 73.084861] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 73.084863] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 73.084866] NVRM: ------------[ end crash report ]------------ [ 73.084893] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 73.084896] NVRM: GPU0 RPC history (CPU -> GSP): [ 73.084897] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 73.084898] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa1db45e1 0x0000000000000000 y [ 73.084902] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa1db45de 0x0000000000000000 [ 73.084904] NVRM: GPU0 RPC event history (CPU <- GSP): [ 73.084905] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 73.084906] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa1e315c5 0x0006357fa1e315c6 1us y [ 73.084913] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 73.084942] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 73.084946] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 73.084948] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 73.084966] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 73.084967] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 73.085021] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 73.087197] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 73.087219] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 73.088890] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 73.762671] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 75.576229] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 75.576245] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 75.576248] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 75.576256] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 75.576258] NVRM: RISC-V CSR State: [ 75.576259] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 75.576261] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 75.576263] NVRM: RISC-V GPR State: [ 75.576264] NVRM: ra:0x000000000140d0f6 sp:0x00000047e3a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 75.576266] NVRM: a0:0x0000000000000000 a1:0x00000047dc820530 a2:0x0000000000000004 a3:0x00000047e4041000 [ 75.576267] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 75.576269] NVRM: s0:0x00000047e3a0f740 s1:0x00000047dca442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 75.576270] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047dc9805f0 s7:0x0000000000001500 [ 75.576272] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047dc97e5f0 [ 75.576273] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 75.576275] NVRM: t4:0x0000000000000000 t5:0x00000047e3a0f3c1 t6:0x0000000000000020 [ 75.576276] NVRM: Stack Trace: [ 75.576277] NVRM: 0x000000000140ca4a [ 75.576278] NVRM: 0x00000000017d7c26 [ 75.576279] NVRM: 0x00000000017de386 [ 75.576280] NVRM: 0x00000000017dfca8 [ 75.576280] NVRM: 0x00000000017d66b2 [ 75.576281] NVRM: 0x00000000014164f2 [ 75.576282] NVRM: 0x0000000001a259ee [ 75.576283] NVRM: 0x0000000001a483f8 [ 75.576283] NVRM: 0x0000000001b8486c [ 75.576284] NVRM: 0x0000000001a2a74e [ 75.576285] NVRM: Local I/O Register 
State: [ 75.576286] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 75.576288] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 75.576291] NVRM: ------------[ end crash report ]------------ [ 75.576318] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 75.576320] NVRM: GPU0 RPC history (CPU -> GSP): [ 75.576321] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 75.576323] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa201579d 0x0000000000000000 y [ 75.576326] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa201579a 0x0000000000000000 [ 75.576329] NVRM: GPU0 RPC event history (CPU <- GSP): [ 75.576330] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 75.576331] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa2092637 0x0006357fa2092638 1us y [ 75.576338] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 75.576366] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 75.576370] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 75.576373] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 75.576396] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 75.576398] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 75.576450] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 75.579435] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 75.579454] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 75.582007] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 76.253599] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 78.069969] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 78.069979] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 78.069980] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 78.069984] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 78.069985] NVRM: RISC-V CSR State: [ 78.069986] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 78.069986] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 78.069987] NVRM: RISC-V GPR State: [ 78.069987] NVRM: ra:0x000000000140d0f6 sp:0x00000047c6a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 78.069988] NVRM: a0:0x0000000000000000 a1:0x00000047bf820530 a2:0x0000000000000004 a3:0x00000047c7041000 [ 78.069988] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 78.069989] NVRM: s0:0x00000047c6a0f740 s1:0x00000047bfa442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 78.069989] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047bf9805f0 s7:0x0000000000001500 [ 78.069990] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047bf97e5f0 [ 78.069991] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 78.069991] NVRM: t4:0x0000000000000000 t5:0x00000047c6a0f3c1 t6:0x0000000000000020 [ 78.069992] NVRM: Stack Trace: [ 78.069992] NVRM: 0x000000000140ca4a [ 78.069992] NVRM: 0x00000000017d7c26 [ 78.069993] NVRM: 0x00000000017de386 [ 78.069993] NVRM: 0x00000000017dfca8 [ 78.069993] NVRM: 0x00000000017d66b2 [ 78.069993] NVRM: 0x00000000014164f2 [ 78.069994] NVRM: 0x0000000001a259ee [ 78.069994] NVRM: 0x0000000001a483f8 [ 78.069994] NVRM: 0x0000000001b8486c [ 78.069994] NVRM: 0x0000000001a2a74e [ 78.069995] NVRM: Local I/O Register 
State: [ 78.069995] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 78.069996] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 78.069997] NVRM: ------------[ end crash report ]------------ [ 78.070012] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 78.070013] NVRM: GPU0 RPC history (CPU -> GSP): [ 78.070013] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 78.070014] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa22770f2 0x0000000000000000 y [ 78.070015] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa22770ef 0x0000000000000000 [ 78.070016] NVRM: GPU0 RPC event history (CPU <- GSP): [ 78.070017] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 78.070017] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa22f3da0 0x0006357fa22f3da2 2us y [ 78.070020] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 78.070029] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 78.070031] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 78.070032] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 78.070044] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 78.070045] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 78.070086] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 78.071781] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 78.071800] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 78.073479] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 78.750476] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 80.547532] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 80.547549] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 80.547554] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 80.547564] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 80.547567] NVRM: RISC-V CSR State: [ 80.547568] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 80.547571] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 80.547572] NVRM: RISC-V GPR State: [ 80.547573] NVRM: ra:0x000000000140d0f6 sp:0x00000047a9a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 80.547575] NVRM: a0:0x0000000000000000 a1:0x00000047a2820530 a2:0x0000000000000004 a3:0x00000047aa041000 [ 80.547577] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 80.547578] NVRM: s0:0x00000047a9a0f740 s1:0x00000047a2a442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 80.547580] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047a29805f0 s7:0x0000000000001500 [ 80.547581] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047a297e5f0 [ 80.547582] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 80.547583] NVRM: t4:0x0000000000000000 t5:0x00000047a9a0f3c1 t6:0x0000000000000020 [ 80.547584] NVRM: Stack Trace: [ 80.547584] NVRM: 0x000000000140ca4a [ 80.547585] NVRM: 0x00000000017d7c26 [ 80.547585] NVRM: 0x00000000017de386 [ 80.547585] NVRM: 0x00000000017dfca8 [ 80.547586] NVRM: 0x00000000017d66b2 [ 80.547586] NVRM: 0x00000000014164f2 [ 80.547586] NVRM: 0x0000000001a259ee [ 80.547586] NVRM: 0x0000000001a483f8 [ 80.547587] NVRM: 0x0000000001b8486c [ 80.547587] NVRM: 0x0000000001a2a74e [ 80.547587] NVRM: Local I/O Register 
State: [ 80.547588] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 80.547589] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 80.547590] NVRM: ------------[ end crash report ]------------ [ 80.547605] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 80.547606] NVRM: GPU0 RPC history (CPU -> GSP): [ 80.547606] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 80.547607] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fa24d497b 0x0000000000000000 y [ 80.547609] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fa24d4978 0x0000000000000000 [ 80.547609] NVRM: GPU0 RPC event history (CPU <- GSP): [ 80.547610] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 80.547610] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fa25511c8 0x0006357fa25511c9 1us y [ 80.547614] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. [ 80.547623] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 80.547625] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 80.547626] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 80.547635] NVRM: kgspInitRm_IMPL: Max GSP-RM boot attempts exceeded: 4/4 [ 80.547701] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 80.549541] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 80.549562] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! 
(0x62:0x62:1941) [ 80.551337] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 80.587388] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 80.600492] NVRM: kgspInitRm_IMPL: Initial shift, 4, is larger than max allowed [0, 3]. Modulo applied [ 80.600497] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 80.600498] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 80.600502] NVRM: crashcatWayfinderGetReportQueue_V1: insufficiently-sized L1 wayfinder scratch location 0 [ 80.600510] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 80.602061] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 80.603146] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 361.251120] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. [ 362.557007] NVRM: kgspInitRm_IMPL: Initial shift, 4, is larger than max allowed [0, 3]. 
Modulo applied [ 363.067663] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 363.067678] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 363.067681] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 363.067685] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 363.067686] NVRM: RISC-V CSR State: [ 363.067687] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 363.067688] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 363.067688] NVRM: RISC-V GPR State: [ 363.067688] NVRM: ra:0x000000000140d0f6 sp:0x00000047f240f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 363.067689] NVRM: a0:0x0000000000000000 a1:0x00000047eb220530 a2:0x0000000000000004 a3:0x00000047f2a41000 [ 363.067690] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 363.067690] NVRM: s0:0x00000047f240f740 s1:0x00000047eb4442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 363.067691] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047eb3805f0 s7:0x0000000000001500 [ 363.067691] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047eb37e5f0 [ 363.067692] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 363.067692] NVRM: t4:0x0000000000000000 t5:0x00000047f240f3c1 t6:0x0000000000000020 [ 363.067693] NVRM: Stack Trace: [ 363.067693] NVRM: 0x000000000140ca4a [ 363.067694] NVRM: 0x00000000017d7c26 [ 363.067694] NVRM: 0x00000000017de386 [ 363.067694] NVRM: 0x00000000017dfca8 [ 363.067694] NVRM: 0x00000000017d66b2 [ 363.067695] NVRM: 0x00000000014164f2 [ 363.067695] NVRM: 0x0000000001a259ee [ 363.067695] NVRM: 0x0000000001a483f8 [ 363.067696] NVRM: 0x0000000001b8486c [ 363.067696] NVRM: 0x0000000001a2a74e 
[ 363.067696] NVRM: Local I/O Register State: [ 363.067696] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 363.067697] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 363.067698] NVRM: ------------[ end crash report ]------------ [ 363.067714] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 363.067715] NVRM: GPU0 RPC history (CPU -> GSP): [ 363.067716] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 363.067717] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fb3247e3c 0x0000000000000000 y [ 363.067718] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fb3247e39 0x0000000000000000 [ 363.067719] NVRM: GPU0 RPC event history (CPU <- GSP): [ 363.067720] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 363.067720] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fb32c446f 0x0006357fb32c4472 3us y [ 363.067724] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 363.067733] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 363.067735] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 363.067737] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 363.067767] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 363.067767] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 363.067815] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 363.070002] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 363.070025] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 363.071689] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 382.048848] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 383.923728] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 383.923746] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 383.923749] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 383.923758] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 383.923760] NVRM: RISC-V CSR State: [ 383.923761] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 383.923764] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 383.923766] NVRM: RISC-V GPR State: [ 383.923767] NVRM: ra:0x000000000140d0f6 sp:0x00000047e3a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 383.923769] NVRM: a0:0x0000000000000000 a1:0x00000047dc820530 a2:0x0000000000000004 a3:0x00000047e4041000 [ 383.923770] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 383.923772] NVRM: s0:0x00000047e3a0f740 s1:0x00000047dca442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 383.923774] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047dc9805f0 s7:0x0000000000001500 [ 383.923776] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047dc97e5f0 [ 383.923777] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 383.923779] NVRM: t4:0x0000000000000000 t5:0x00000047e3a0f3c1 t6:0x0000000000000020 [ 383.923780] NVRM: Stack Trace: [ 383.923781] NVRM: 0x000000000140ca4a [ 383.923782] NVRM: 0x00000000017d7c26 [ 383.923783] NVRM: 0x00000000017de386 [ 383.923784] NVRM: 0x00000000017dfca8 [ 383.923785] NVRM: 0x00000000017d66b2 [ 383.923785] NVRM: 0x00000000014164f2 [ 383.923786] NVRM: 0x0000000001a259ee [ 383.923787] NVRM: 0x0000000001a483f8 [ 383.923788] NVRM: 0x0000000001b8486c [ 383.923789] NVRM: 0x0000000001a2a74e [ 383.923790] 
NVRM: Local I/O Register State: [ 383.923791] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 383.923793] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 383.923796] NVRM: ------------[ end crash report ]------------ [ 383.923826] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 383.923829] NVRM: GPU0 RPC history (CPU -> GSP): [ 383.923830] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 383.923831] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fb462c1c5 0x0000000000000000 y [ 383.923835] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fb462c1c3 0x0000000000000000 [ 383.923838] NVRM: GPU0 RPC event history (CPU <- GSP): [ 383.923839] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 383.923840] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fb46a85fd 0x0006357fb46a85fe 1us y [ 383.923848] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 383.923879] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 383.923883] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 383.923885] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 383.923903] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 383.923904] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 383.923951] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 383.926259] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 383.926279] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 383.928276] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0 [ 386.441224] NVRM: testIfDsmSubFunctionEnabled: GPS ACPI DSM called before _acpiDsmSupportedFuncCacheInit subfunction = 11. 
[ 388.289851] NVRM: kgspHealthCheck_TU102: ****************************** GSP-CrashCat Report ******************************* [ 388.289865] NVRM: GPU at PCI:0000:01:00: GPU-40bddfe9-fde0-15e2-04ba-e09cf5c0b1a8 [ 388.289867] NVRM: Xid (PCI:0000:01:00): 120, GSP task exception: load access page fault (cause:0xd) @ pc:0x140ca4a, partition:4#0, task:3 [ 388.289871] NVRM: Reported by libos partition:4#5 kernel v3.1 [0] @ ts:2 [ 388.289872] NVRM: RISC-V CSR State: [ 388.289873] NVRM: sstatus:0x0000000200000020 sscratch:0xffffffffa30144d0 sie:0x0000000000000220 sip:0x0000000000000000 [ 388.289873] NVRM: sepc:0x000000000140ca4a stval:0x0000000000000000 scause:0x000000000000000d [ 388.289874] NVRM: RISC-V GPR State: [ 388.289874] NVRM: ra:0x000000000140d0f6 sp:0x00000047c6a0f5b0 gp:0x0000000000000000 tp:0x0000000000000000 [ 388.289875] NVRM: a0:0x0000000000000000 a1:0x00000047bf820530 a2:0x0000000000000004 a3:0x00000047c7041000 [ 388.289876] NVRM: a4:0x0000000000000000 a5:0x0000000000000000 a6:0x0000000000001010 a7:0x0000000000000004 [ 388.289876] NVRM: s0:0x00000047c6a0f740 s1:0x00000047bfa442d0 s2:0x0000000000000002 s3:0x00000000017d7c26 [ 388.289877] NVRM: s4:0x00000000040d36b0 s5:0x00000000001a8000 s6:0x00000047bf9805f0 s7:0x0000000000001500 [ 388.289877] NVRM: s8:0x00000000040d3bc8 s9:0x0000000000000000 s10:0x0000000000000000 s11:0x00000047bf97e5f0 [ 388.289878] NVRM: t0:0x0000000000000020 t1:0x0000000000000001 t2:0x0000000000000000 t3:0x0000000000000020 [ 388.289878] NVRM: t4:0x0000000000000000 t5:0x00000047c6a0f3c1 t6:0x0000000000000020 [ 388.289879] NVRM: Stack Trace: [ 388.289879] NVRM: 0x000000000140ca4a [ 388.289879] NVRM: 0x00000000017d7c26 [ 388.289879] NVRM: 0x00000000017de386 [ 388.289880] NVRM: 0x00000000017dfca8 [ 388.289880] NVRM: 0x00000000017d66b2 [ 388.289880] NVRM: 0x00000000014164f2 [ 388.289881] NVRM: 0x0000000001a259ee [ 388.289881] NVRM: 0x0000000001a483f8 [ 388.289881] NVRM: 0x0000000001b8486c [ 388.289881] NVRM: 0x0000000001a2a74e [ 388.289882] 
NVRM: Local I/O Register State: [ 388.289882] NVRM: 0x01450800:0x00000000 0x01450900:0xbadf202b 0x01450a00:0x00000000 0x01450c00:0x00000000 [ 388.289883] NVRM: 0x01454a00:0x810400d0 0x01454b00:0x010800d0 0x01454c00:0x00080000 0x01400200:0x00000040 [ 388.289884] NVRM: ------------[ end crash report ]------------ [ 388.289898] NVRM: GPU0 GSP RPC buffer contains function 4128 (GSP_POST_NOCAT_RECORD) and data 0x0000000000000005 0x00000000017d7c26. [ 388.289899] NVRM: GPU0 RPC history (CPU -> GSP): [ 388.289900] NVRM: entry function data0 data1 ts_start ts_end duration actively_polling [ 388.289901] NVRM: 0 73 SET_REGISTRY 0x0000000000000000 0x0000000000000000 0x0006357fb4a55bc4 0x0000000000000000 y [ 388.289902] NVRM: -1 72 GSP_SET_SYSTEM_INFO 0x0000000000000000 0x0000000000000000 0x0006357fb4a55bc2 0x0000000000000000 [ 388.289903] NVRM: GPU0 RPC event history (CPU <- GSP): [ 388.289903] NVRM: entry function data0 data1 ts_start ts_end duration during_incomplete_rpc [ 388.289904] NVRM: 0 4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000017d7c26 0x0006357fb4ad26c2 0x0006357fb4ad26c4 2us y [ 388.289907] NVRM: kgspRcAndNotifyAllChannels_IMPL: RC all user channels for critical error 120. 
[ 388.289916] NVRM: kgspHealthCheck_TU102: ********************************************************************************** [ 388.289918] NVRM: nvCheckOkFailedNoLog: Check failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from rpcRecvPoll(pGpu, pRpc, NV_VGPU_MSG_EVENT_GSP_INIT_DONE) @ kernel_gsp.c:4877 [ 388.289919] NVRM: nvAssertOkFailedNoLog: Assertion failed: Reset required [NV_ERR_RESET_REQUIRED] (0x00000062) returned from kgspWaitForRmInitDone(pGpu, pKernelGsp) @ kernel_gsp_gh100.c:952 [ 388.289933] NVRM: _kgspBootGspRm: unexpected WPR2 already up, cannot proceed with booting GSP [ 388.289933] NVRM: _kgspBootGspRm: (the GPU is likely in a bad state and may need to be reset) [ 388.289977] NVRM: RmInitAdapter: Cannot initialize GSP firmware RM [ 388.292017] NVRM: iovaspaceDestruct_IMPL: 1 left-over mappings in IOVAS 0x100 [ 388.292031] NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x62:0x40:1941) [ 388.293263] NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0