View Issue Details

ID: 0009109
Project: Rocky-Linux-8
Category: kernel
View Status: public
Last Update: 2025-03-28 09:21
Reporter: Denis Shipochki
Assigned To: (unassigned)
Priority: normal
Severity: minor
Reproducibility: always
Status: new
Resolution: open

Summary: 0009109: Kdump not working on Rocky Linux 8.10 using kernels "4.18.0-553.30.1.el8_10.x86_64" and newer
Description: After triggering a crash (via "echo c > /proc/sysrq-trigger") on Rocky Linux 8.10 running kernel "4.18.0-553.30.1.el8_10.x86_64" or newer, kexec does not appear to boot the capture kernel, so no vmcore is collected.

The issue is observed on:
HPE ProLiant DL385 Gen10 Plus (P14280-B21) with an AMD EPYC 7402 CPU.
Dell PowerEdge R6615 with an AMD EPYC 9474F CPU.

We couldn't reproduce the problem on servers with an Intel CPU, nor on a Supermicro AS -1114CS-TNR-EU server with an AMD EPYC 7543P CPU.

On the serial console, only the information about the crash is displayed:

[ 128.509691] sysrq: SysRq : Trigger a crash
[ 128.513931] Kernel panic - not syncing: sysrq triggered crash
[ 128.513931]
[ 128.521367] CPU: 67 PID: 9288 Comm: bash Kdump: loaded Tainted: G OE -------- - - 4.18.0-553.27.1.el8_10.x86_64 #1
[ 128.533368] Hardware name: Dell Inc. PowerEdge R6615/047GPR, BIOS 1.11.2 12/19/2024
[ 128.541317] Call Trace:
[ 128.543842] dump_stack+0x41/0x60
[ 128.547262] panic+0xe7/0x2ac
[ 128.550321] ? printk+0x58/0x73
[ 128.553555] sysrq_handle_crash+0x11/0x20
[ 128.557696] __handle_sysrq.cold.13+0x48/0xff
[ 128.562180] write_sysrq_trigger+0x2b/0x40
[ 128.566407] proc_reg_write+0x39/0x60
[ 128.570177] vfs_write+0xa5/0x1b0
[ 128.573598] ksys_write+0x4f/0xb0
[ 128.577032] do_syscall_64+0x5b/0x1a0
[ 128.580803] entry_SYSCALL_64_after_hwframe+0x66/0xcb
[ 128.586003] RIP: 0033:0x7fc5f858b5a8
[ 128.595093] Code: 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 f3 0f 1e fa 48 8d 05 b5 71 2a 00 8b 00 85 c0 75 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 41 54 49 89 d4 55
[ 128.625703] RSP: 002b:00007fffb75c3988 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 128.639195] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fc5f858b5a8
[ 128.652313] RDX: 0000000000000002 RSI: 000055afad109ea0 RDI: 0000000000000001
[ 128.665573] RBP: 000055afad109ea0 R08: 000000000000000a R09: 00007fc5f85ed220
[ 128.678560] R10: 000000000000000a R11: 0000000000000246 R12: 00007fc5f882e6e0
[ 128.691492] R13: 0000000000000002 R14: 00007fc5f8829860 R15: 0000000000000002

After about 20 seconds with no further output on the console, the server reboots.

The kdumpctl utility reports that kdump is operational, and the machine boots with the crashkernel=auto kernel parameter.
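
For anyone trying to confirm the same preconditions, here is a minimal sketch of the checks described above. The function names has_crashkernel and report_kdump_state are hypothetical helpers, not part of kexec-tools; /proc/cmdline, /sys/kernel/kexec_crash_loaded and /sys/kernel/kexec_crash_size are standard kernel interfaces.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the kernel command line passed as $1
# contains a crashkernel= parameter.
has_crashkernel() {
    case " $1 " in
        *" crashkernel="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: print the kdump preconditions of the running system.
report_kdump_state() {
    if has_crashkernel "$(cat /proc/cmdline)"; then
        echo "crashkernel parameter: present"
    else
        echo "crashkernel parameter: missing"
    fi
    # Reads 1 when a capture kernel has been loaded with kexec.
    if [ -r /sys/kernel/kexec_crash_loaded ]; then
        echo "capture kernel loaded: $(cat /sys/kernel/kexec_crash_loaded)"
    fi
    # Size in bytes of the memory reserved for the capture kernel.
    if [ -r /sys/kernel/kexec_crash_size ]; then
        echo "reserved crash memory: $(cat /sys/kernel/kexec_crash_size)"
    fi
}
```

Calling report_kdump_state next to "kdumpctl status" on an affected and an unaffected server makes it easy to compare the two. On the affected systems described here all of these checks are expected to pass, which is consistent with the "Kdump: loaded" marker in the panic output.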

Kernel versions "4.18.0-553.27.1.el8_10.x86_64" and earlier are not affected, and kdump works as expected on them. Perhaps a change introduced between those two kernel versions caused this problem?
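
To tell quickly whether a given machine is running a kernel in the affected range, a version-aware comparison can be used. This is only a sketch: kernel_at_least is a hypothetical helper, it relies on GNU "sort -V", and the threshold is simply the first affected version named in this report.

```shell
#!/bin/sh
# First kernel release reported as affected in this issue.
FIRST_AFFECTED="4.18.0-553.30.1.el8_10.x86_64"

# Hypothetical helper: succeeds if release $1 sorts at or above release $2
# under GNU sort's version ordering.
kernel_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

For example, 'kernel_at_least "$(uname -r)" "$FIRST_AFFECTED" && echo "kernel is in the reported range"' flags the running kernel. A plain string comparison would not be reliable here because the release components are numeric.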
Steps To Reproduce: On a server similar to those listed above, with an AMD CPU and running Rocky Linux 8.10:

1. Install kernel "4.18.0-553.30.1.el8_10.x86_64" or newer;
2. Simulate a kernel crash via "echo c > /proc/sysrq-trigger".

The expected outcome is a successful kdump collection: a vmcore dump file plus the kexec-dmesg.log and vmcore-dmesg.txt text files, in a new directory under /var/crash/.
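
The expected result can be checked after the reboot with something like the sketch below. missing_kdump_files is a hypothetical helper; the three file names are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: print each expected kdump artifact that is
# missing from the directory given as $1 (prints nothing if all exist).
missing_kdump_files() {
    for f in vmcore kexec-dmesg.log vmcore-dmesg.txt; do
        [ -e "$1/$f" ] || echo "$f"
    done
}
```

Running 'for d in /var/crash/*/; do missing_kdump_files "$d"; done' prints nothing when every dump directory is complete. On the affected kernels, no new directory appears under /var/crash/ at all.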
Tags: No tags attached.

Activities

There are no notes attached to this issue.

Issue History

Date Modified: 2025-03-28 09:21
Username: Denis Shipochki
Field: New Issue