View Issue Details

ID: 0005677
Project: Rocky-Linux-9
Category: kernel
View Status: public
Last Update: 2024-02-10 19:26
Reporter: Honggang Li
Assigned To:
Priority: normal
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Summary: 0005677: modprobe ib_srpt with srpt_service_guid always panic
Description:
[ 9.351208] XFS (dm-2): Ending clean mount
[ 9.392686] XFS (sda1): Ending recovery (logdev: internal)
[ 9.557040] RPC: Registered named UNIX socket transport module.
[ 9.557044] RPC: Registered udp transport module.
[ 9.557045] RPC: Registered tcp transport module.
[ 9.557058] RPC: Registered tcp NFSv4.1 backchannel transport module.
[ 9.679319] RPC: Registered rdma transport module.
[ 9.679322] RPC: Registered rdma backchannel transport module.
[ 12.087796] mlx5_core 0000:05:00.0 mlx5_roce_p1: Link down
[ 12.192227] mlx5_core 0000:05:00.1 mlx5_roce_p2: Link up
[ 13.145053] IPv6: ADDRCONF(NETDEV_CHANGE): mlx5_roce_p2: link becomes ready
[ 14.978711] e1000e 0000:08:00.0 enp8s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 14.979045] IPv6: ADDRCONF(NETDEV_CHANGE): enp8s0: link becomes ready
[ 15.571706] e1000e 0000:09:00.0 enp9s0: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
[ 15.572047] IPv6: ADDRCONF(NETDEV_CHANGE): enp9s0: link becomes ready
[ 20.055851] block dm-0: the capability attribute has been deprecated.
[ 389.810750] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 389.810776] #PF: supervisor instruction fetch in kernel mode
[ 389.810792] #PF: error_code(0x0010) - not-present page
[ 389.810807] PGD 0 P4D 0
[ 389.810816] Oops: 0010 [#1] PREEMPT SMP PTI
[ 389.810829] CPU: 7 PID: 1644 Comm: modprobe Kdump: loaded Not tainted 5.14.0-362.8.1.el9_3.x86_64 #1
[ 389.810852] Hardware name: Supermicro X9DRL-3F/iF/X9DRL-3F/iF, BIOS 3.0a 08/08/2013
[ 389.810872] RIP: 0010:0x0
[ 389.810906] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[ 389.810922] RSP: 0018:ffffb91c4060fca8 EFLAGS: 00010246
[ 389.810937] RAX: 0000000000000000 RBX: ffffffffc108fb00 RCX: 0000000000000012
[ 389.810955] RDX: ffff9214085e3980 RSI: ffffffffc108fa88 RDI: ffff9214e29e7ed2
[ 389.810972] RBP: ffff9214e29e7ed2 R08: 0000000000000000 R09: 00000000ffff8000
[ 389.810989] R10: 0000000000000011 R11: ffffb91c4060fd38 R12: ffffffffc10950d8
[ 389.811007] R13: ffffffff865e1810 R14: ffff9214e29e7ec0 R15: ffffffffc108fa88
[ 389.811025] FS: 00007fb0ddf99740(0000) GS:ffff921b5fdc0000(0000) knlGS:0000000000000000
[ 389.811044] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 389.811059] CR2: ffffffffffffffd6 CR3: 0000000142056005 CR4: 00000000001706e0
[ 389.811076] Call Trace:
[ 389.811086] <TASK>
[ 389.811093] ? show_trace_log_lvl+0x1c4/0x2df
[ 389.811108] ? show_trace_log_lvl+0x1c4/0x2df
[ 389.811121] ? parse_one+0xdd/0x1f0
[ 389.811133] ? __die_body.cold+0x8/0xd
[ 389.811146] ? page_fault_oops+0x134/0x170
[ 389.811160] ? exc_page_fault+0x62/0x150
[ 389.811173] ? asm_exc_page_fault+0x22/0x30
[ 389.811189] ? __pfx_unknown_module_param_cb+0x10/0x10
[ 389.811205] parse_one+0xdd/0x1f0
[ 389.811219] parse_args+0xeb/0x190
[ 389.811723] ? __pfx_unknown_module_param_cb+0x10/0x10
[ 389.812175] ? __pfx_unknown_module_param_cb+0x10/0x10
[ 389.812619] load_module+0xa61/0xb80
[ 389.813057] ? __pfx_unknown_module_param_cb+0x10/0x10
[ 389.813492] __do_sys_init_module+0x12e/0x1b0
[ 389.813919] do_syscall_64+0x5c/0x90
[ 389.814339] ? do_user_addr_fault+0x1d6/0x6a0
[ 389.814765] ? exc_page_fault+0x62/0x150
[ 389.815186] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 389.815606] RIP: 0033:0x7fb0dd63f69e
[ 389.816024] Code: 48 8b 0d 85 a7 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 52 a7 1b 00 f7 d8 64 89 01 48
[ 389.816903] RSP: 002b:00007ffd4c6b3b68 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[ 389.817344] RAX: ffffffffffffffda RBX: 000055781cc95f80 RCX: 00007fb0dd63f69e
[ 389.817783] RDX: 000055781cc99420 RSI: 000000000003806e RDI: 00007fb0dde39010
[ 389.818230] RBP: 00007fb0dde39010 R08: 000055781cc96af0 R09: 0000000000039000
[ 389.818684] R10: 0000000000000005 R11: 0000000000000246 R12: 000055781cc99420
[ 389.819146] R13: 000055781cc96260 R14: 000055781cc95f80 R15: 000055781cc96aa0
[ 389.819603] </TASK>
[ 389.820045] Modules linked in: ib_srpt(+) rfkill rpcrdma sunrpc rdma_ucm ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp mlx5_ib coretemp kvm_intel ib_uverbs ib_core ipmi_ssif kvm mxm_wmi mei_me mgag200 mei i2c_algo_bit joydev i2c_i801 drm_shmem_helper drm_kms_helper iTCO_wdt iTCO_vendor_support syscopyarea lpc_ich sysfillrect irqbypass sysimgblt i2c_smbus acpi_ipmi pcspkr ipmi_si ipmi_devintf rapl intel_cstate ipmi_msghandler intel_uncore ioatdma dca drm fuse xfs libcrc32c sd_mod t10_pi sg mlx5_core ahci libahci mlxfw crct10dif_pclmul crc32_pclmul crc32c_intel psample e1000e libata ghash_clmulni_intel tls pci_hyperv_intf wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: ib_srpt]
[ 389.823645] CR2: 0000000000000000
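
A note on reading the trace above: entries prefixed with `?` are return addresses the unwinder found on the stack but could not verify, so only the unprefixed entries form the reliable call chain. The short Python sketch below is an illustration only and not part of the original report. It decodes the page-fault error code using the x86 architectural bit layout and lists the reliable frames from an excerpt of the log. The helper names (`decode_pf_error`, `reliable_frames`) are hypothetical. The output is consistent with `parse_one` calling through a NULL function pointer while processing the module parameter, but that reading is an inference from the log, not something the report states.

```python
import re

# Excerpt copied from the oops above (some "?" entries and registers omitted).
OOPS = """\
[ 389.810792] #PF: error_code(0x0010) - not-present page
[ 389.810872] RIP: 0010:0x0
[ 389.811076] Call Trace:
[ 389.811086] <TASK>
[ 389.811093] ? show_trace_log_lvl+0x1c4/0x2df
[ 389.811121] ? parse_one+0xdd/0x1f0
[ 389.811189] ? __pfx_unknown_module_param_cb+0x10/0x10
[ 389.811205] parse_one+0xdd/0x1f0
[ 389.811219] parse_args+0xeb/0x190
[ 389.811723] ? __pfx_unknown_module_param_cb+0x10/0x10
[ 389.812619] load_module+0xa61/0xb80
[ 389.813492] __do_sys_init_module+0x12e/0x1b0
[ 389.813919] do_syscall_64+0x5c/0x90
[ 389.814339] ? do_user_addr_fault+0x1d6/0x6a0
[ 389.815186] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 389.819603] </TASK>
"""

# x86 page-fault error code bits (architectural definition).
# Bit 0 clear means the page was not present.
PF_BITS = {
    0: "protection violation",
    1: "write",
    2: "user mode",
    3: "reserved bit",
    4: "instruction fetch",
}

# Optional dmesg timestamp, optional "? " marker, then symbol+offset/size.
FRAME = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s*)?(\?\s+)?([A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def decode_pf_error(code):
    """Return the names of the bits that are set in a #PF error code."""
    return [name for bit, name in PF_BITS.items() if code & (1 << bit)]


def reliable_frames(text):
    """Return symbol names of trace entries that are not marked with '?'."""
    frames = []
    for line in text.splitlines():
        match = FRAME.match(line)
        if match and not match.group(1):
            frames.append(match.group(2))
    return frames


if __name__ == "__main__":
    code = int(re.search(r"error_code\((0x[0-9a-f]+)\)", OOPS).group(1), 16)
    print("set bits:", decode_pf_error(code))
    print("call chain:", " <- ".join(reliable_frames(OOPS)))
```

For this excerpt the only bit set is "instruction fetch", and the innermost reliable frame is `parse_one`, reached from `parse_args` and `load_module`.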
Steps To Reproduce:
[root@m201 ~]# ibstat
CA 'mlx5_0'
    CA type: MT4117
    Number of ports: 1
    Firmware version: 14.27.6122
    Hardware version: 0
    Node GUID: 0xb8cef60300724be8
    System image GUID: 0xb8cef60300724be8
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 40
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0xbacef6fffe724be8
        Link layer: Ethernet
CA 'mlx5_1'
    CA type: MT4117
    Number of ports: 1
    Firmware version: 14.27.6122
    Hardware version: 0
    Node GUID: 0xb8cef60300724be9
    System image GUID: 0xb8cef60300724be8
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 25
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x00010000
        Port GUID: 0xbacef6fffe724be9
        Link layer: Ethernet

[root@m201 ~]# rmmod ib_srpt
[root@m201 ~]# modprobe -v ib_srpt srpt_service_guid=0xbacef6fffe724be9
insmod /lib/modules/5.14.0-362.8.1.el9_3.x86_64/kernel/drivers/infiniband/ulp/srpt/ib_srpt.ko.xz srpt_service_guid=0xbacef6fffe724be9
(the kernel panics and the system reboots)
Tags: No tags attached.

Activities

Honggang Li

2024-02-05 05:52

reporter   ~0005875

https://patchwork.kernel.org/project/linux-rdma/patch/20240205004207.17031-1-bvanassche@acm.org/

This patch fixes the kernel panic issue for me.

Akemi Yagi

2024-02-10 19:26

reporter   ~0006007

Nice!

It will take a while before the patch makes its way into the Rocky kernel. In the meantime, as soon as it lands in the mainline kernel, ELRepo's kernel-ml should have it.

Issue History

Date Modified     Username     Field                Change
2024-01-31 01:44  Honggang Li  New Issue
2024-02-05 05:52  Honggang Li  Note Added: 0005875
2024-02-10 19:26  Akemi Yagi   Note Added: 0006007