View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0009109 | Rocky-Linux-8 | kernel | public | 2025-03-28 09:21 | 2025-03-28 09:21 |
Reporter | Denis Shipochki | Assigned To | |||
Priority | normal | Severity | minor | Reproducibility | always |
Status | new | Resolution | open | ||
Summary | 0009109: Kdump not working on Rocky Linux 8.10 using kernels "4.18.0-553.30.1.el8_10.x86_64" and newer | ||||
Description | After triggering a crash (via "echo c > /proc/sysrq-trigger") on Rocky Linux 8.10 running kernel "4.18.0-553.30.1.el8_10.x86_64" or newer, kexec does not appear to boot the capture kernel, so no vmcore is collected.

The issue is observed on:
HPE ProLiant DL385 Gen10 Plus (P14280-B21) with an AMD EPYC 7402 CPU.
Dell PowerEdge R6615 with an AMD EPYC 9474F CPU.

We could not reproduce the problem on servers with an Intel CPU, nor on a Supermicro AS -1114CS-TNR-EU server with an AMD EPYC 7543P CPU.

On the serial console, only information about the crash is displayed:

[ 128.509691] sysrq: SysRq : Trigger a crash
[ 128.513931] Kernel panic - not syncing: sysrq triggered crash
[ 128.513931]
[ 128.521367] CPU: 67 PID: 9288 Comm: bash Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.27.1.el8_10.x86_64 #1
[ 128.533368] Hardware name: Dell Inc. PowerEdge R6615/047GPR, BIOS 1.11.2 12/19/2024
[ 128.541317] Call Trace:
[ 128.543842] dump_stack+0x41/0x60
[ 128.547262] panic+0xe7/0x2ac
[ 128.550321] ? printk+0x58/0x73
[ 128.553555] sysrq_handle_crash+0x11/0x20
[ 128.557696] __handle_sysrq.cold.13+0x48/0xff
[ 128.562180] write_sysrq_trigger+0x2b/0x40
[ 128.566407] proc_reg_write+0x39/0x60
[ 128.570177] vfs_write+0xa5/0x1b0
[ 128.573598] ksys_write+0x4f/0xb0
[ 128.577032] do_syscall_64+0x5b/0x1a0
[ 128.580803] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 128.586003] RIP: 0033:0x7fc5f858b5a8
[ 128.595093] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 b5 71 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 128.625703] RSP: 002b:00007fffb75c3988 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 128.639195] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fc5f858b5a8
[ 128.652313] RDX: 0000000000000002 RSI: 000055afad109ea0 RDI: 0000000000000001
[ 128.665573] RBP: 000055afad109ea0 R08: 000000000000000a R09: 00007fc5f85ed220
[ 128.678560] R10: 000000000000000a R11: 0000000000000246 R12: 00007fc5f882e6e0
[ 128.691492] R13: 0000000000000002 R14: 00007fc5f8829860 R15: 0000000000000002

About 20 seconds later, with no additional text on the screen, the server reboots. The kdumpctl utility reports that kdump is operational, and the machine boots with the crashkernel=auto kernel parameter.

Kernel versions "4.18.0-553.27.1.el8_10.x86_64" and earlier do not show this issue, and kdump works as expected on them. Could a change between those two kernel versions have caused this problem? | ||||
Steps To Reproduce | On a server similar to those listed above, with an AMD CPU, running Rocky Linux 8.10: 1. Install kernel "4.18.0-553.30.1.el8_10.x86_64" or newer and boot into it; 2. Simulate a kernel crash via "echo c > /proc/sysrq-trigger". The expected outcome is a successful kdump collection: a new directory in /var/crash/ containing the vmcore dump file and the kexec-dmesg.log and vmcore-dmesg.txt text files. | ||||
Tags | No tags attached. | ||||
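
Before running the reproduction steps, it may help to confirm which side of the reported version boundary the running kernel falls on, and whether a capture kernel is actually loaded. The following is a minimal sketch, not part of the original report. The function names are hypothetical, and the version boundaries are taken from the description above, assuming releases between the two were not tested. It reads the kernel's /sys/kernel/kexec_crash_loaded flag and never triggers a crash itself.

```shell
#!/bin/sh
# Pre-flight sketch for the reproduction steps. It only reports state and
# never triggers a crash. Function names are illustrative, not from any tool.

# Version boundaries as stated in this report (assumption: releases in
# between were not tested by the reporter).
FIRST_REPORTED_BAD="4.18.0-553.30.1"
LAST_REPORTED_GOOD="4.18.0-553.27.1"

# Drop the distribution/architecture suffix, e.g. ".el8_10.x86_64".
strip_suffix() {
    printf '%s\n' "${1%%.el*}"
}

# Succeed if version $1 sorts at or before version $2 (GNU "sort -V").
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Print how the report describes a given kernel release string.
classify_kernel() {
    ck_version="$(strip_suffix "$1")"
    case "$ck_version" in
        4.18.0-*) ;;                       # only EL8 4.18.0 kernels are covered
        *) echo "unknown"; return 0 ;;
    esac
    if version_le "$FIRST_REPORTED_BAD" "$ck_version"; then
        echo "reported-affected"
    elif version_le "$ck_version" "$LAST_REPORTED_GOOD"; then
        echo "reported-working"
    else
        echo "unknown"
    fi
}

# Succeed if the kernel says a capture kernel is loaded. The optional path
# argument exists only so the check can be pointed at a test file.
crash_kernel_loaded() {
    ckl_flag="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$ckl_flag" ] && [ "$(cat "$ckl_flag")" = "1" ]
}

echo "running kernel: $(uname -r) ($(classify_kernel "$(uname -r)"))"
if crash_kernel_loaded; then
    echo "capture kernel: loaded"
else
    echo "capture kernel: not loaded"
fi
```

If the script prints "reported-affected" and "capture kernel: loaded" on one of the AMD systems listed in the description, the machine matches the preconditions of the report, and step 2 can then be run manually from a serial console session.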
Date Modified | Username | Field | Change |
---|---|---|---|
2025-03-28 09:21 | Denis Shipochki | New Issue |