0009109: Kdump not working on Rocky Linux 8.10 using kernels "4.18.0-553.30.1.el8_10.x86_64" and newer - Rocky Linux BugTracker

ID	Project	Category	View Status	Date Submitted	Last Update

0009109	Rocky-Linux-8	kernel	public	2025-03-28 09:21	2025-04-17 20:58

Reporter	Denis Shipochki	Assigned To
Priority	normal	Severity	minor	Reproducibility	always
Status	new	Resolution	open

Summary	0009109: Kdump not working on Rocky Linux 8.10 using kernels "4.18.0-553.30.1.el8_10.x86_64" and newer
Description	After triggering a crash (via "echo c > /proc/sysrq-trigger") on Rocky Linux 8.10 running with kernel versions "4.18.0-553.30.1.el8_10.x86_64" and newer, kexec doesn't seem to start a capture kernel boot sequence to collect vmcore.

The issue is observed on:
HPE ProLiant DL385 Gen10 Plus (P14280-B21) with an AMD EPYC 7402 CPU.
Dell PowerEdge R6615 with an AMD EPYC 9474F CPU.

We couldn't reproduce the problem on servers with an Intel CPU and a Supermicro AS -1114CS-TNR-EU server with an AMD EPYC 7543P CPU.

By checking via a serial console, only information about the crash is displayed:

[ 128.509691] sysrq: SysRq : Trigger a crash
[ 128.513931] Kernel panic - not syncing: sysrq triggered crash
[ 128.513931]
[ 128.521367] CPU: 67 PID: 9288 Comm: bash Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.27.1.el8_10.x86_64 #1
[ 128.533368] Hardware name: Dell Inc. PowerEdge R6615/047GPR, BIOS 1.11.2 12/19/2024
[ 128.541317] Call Trace:
[ 128.543842] dump_stack+0x41/0x60
[ 128.547262] panic+0xe7/0x2ac
[ 128.550321] ? printk+0x58/0x73
[ 128.553555] sysrq_handle_crash+0x11/0x20
[ 128.557696] __handle_sysrq.cold.13+0x48/0xff
[ 128.562180] write_sysrq_trigger+0x2b/0x40
[ 128.566407] proc_reg_write+0x39/0x60
[ 128.570177] vfs_write+0xa5/0x1b0
[ 128.573598] ksys_write+0x4f/0xb0
[ 128.577032] do_syscall_64+0x5b/0x1a0
[ 128.580803] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 128.586003] RIP: 0033:0x7fc5f858b5a8
[ 128.595093] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 b5 71 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 128.625703] RSP: 002b:00007fffb75c3988 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 128.639195] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fc5f858b5a8
[ 128.652313] RDX: 0000000000000002 RSI: 000055afad109ea0 RDI: 0000000000000001
[ 128.665573] RBP: 000055afad109ea0 R08: 000000000000000a R09: 00007fc5f85ed220
[ 128.678560] R10: 000000000000000a R11: 0000000000000246 R12: 00007fc5f882e6e0
[ 128.691492] R13: 0000000000000002 R14: 00007fc5f8829860 R15: 0000000000000002

Subsequently, after ~20 seconds and no additional text on the screen, the server reboots. The kdumpctl utility reports that kdump is operational, and the machine boots with the crashkernel=auto kernel parameter.

Kernel versions "4.18.0-553.27.1.el8_10.x86_64" and earlier do not experience this issue, and kdump is working as expected. So, maybe something has changed between those two kernel versions and caused this problem?
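The description says that kdumpctl reports kdump as operational and that the machine boots with crashkernel=auto. A minimal sketch of how those preconditions could be double-checked before triggering the crash, assuming a standard Rocky Linux 8 install. The helper name crashkernel_arg is hypothetical and introduced here only for illustration; the commented commands are left for manual use on a live host.

```shell
#!/bin/sh
# Hypothetical helper: print the value of every crashkernel= argument
# found in a kernel command line string; prints nothing if none is present.
crashkernel_arg() {
    # $1: kernel command line text, e.g. the contents of /proc/cmdline
    printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Manual checks on a live system (not executed by this sketch):
#   crashkernel_arg "$(cat /proc/cmdline)"   # value passed at boot
#   cat /sys/kernel/kexec_crash_loaded       # 1 if a capture kernel is loaded
#   cat /sys/kernel/kexec_crash_size         # bytes reserved for the capture kernel
#   kdumpctl status
```

If kexec_crash_loaded reads 1 and the reserved size is non-zero on both a working and a failing kernel, the difference is more likely in the crash-time kexec path than in the kdump service setup, though that is only an assumption to verify.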
Steps To Reproduce	On a server similar to the affected ones listed above (AMD-based CPU) running Rocky Linux 8.10:
1. Install kernel "4.18.0-553.30.1.el8_10.x86_64" or newer;
2. Simulate a kernel crash via "echo c > /proc/sysrq-trigger".
The expected outcome is a successful kdump collection: a new directory in /var/crash/ containing a vmcore dump file and the kexec-dmesg.log and vmcore-dmesg.txt text files.
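After the server comes back up, the expected outcome above can be checked with a short script. This is only a sketch: the function name check_kdump_artifacts is hypothetical, and it assumes the default dump target /var/crash with one subdirectory per crash, as described in the steps.

```shell
#!/bin/sh
# Hypothetical helper: check the newest subdirectory of a crash root
# (default /var/crash) for the three files a successful kdump should leave.
check_kdump_artifacts() {
    root="${1:-/var/crash}"
    # Newest subdirectory by modification time; empty if there is none.
    newest=$(ls -1td "$root"/*/ 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no crash directory under $root"
        return 1
    fi
    missing=0
    for f in vmcore kexec-dmesg.log vmcore-dmesg.txt; do
        if [ ! -e "${newest}${f}" ]; then
            echo "missing: ${newest}${f}"
            missing=1
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "ok: $newest"
    fi
    return "$missing"
}

# Usage after the post-crash reboot (not executed by this sketch):
#   check_kdump_artifacts /var/crash
```

On the affected kernels the report implies that no new directory appears at all, so the first branch ("no crash directory", or an older directory being reported) is the one to expect there.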
Tags	No tags attached.

Date Modified	Username	Field	Change
2025-03-28 09:21	Denis Shipochki	New Issue
2025-04-17 20:58	Denis Shipochki	Note Added: 0009835