View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0008086 | Cloud | General | public | 2024-10-18 13:26 | 2024-11-14 01:36 |
Reporter | John O'Brien | Assigned To | |||
Priority | normal | Severity | major | Reproducibility | always |
Status | new | Resolution | open | ||
Platform | EC2 | OS | Rocky | OS Version | 9.4 aarch64 |
Summary | 0008086: EC2 instance bricked after updating kernel to 5.14.0-427.40.1 | ||||
Description | An AWS EC2 instance of Rocky 9.4 on aarch64 becomes unresponsive on next boot after updating the kernel packages to 5.14.0-427.40.1. | ||||
Steps To Reproduce |
1. In us-east-2, launch AMI ID ami-018925a289077b035 ("Rocky-9-EC2-Base-9.4-20240509.0.aarch64") on a t4g.medium instance
2. Observe that the initial kernel version is 5.14.0-427.13.1
3. Run `dnf update -y` and confirm that the kernel is updated to 5.14.0-427.40.1 (or later, presumably)
4. Reboot the instance
5. Instance is bricked
Additional Information | During the update, the kernel-core scriptlet emits this error:

dracut-install: Failed to find module 'xen_netfront'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.NZeuPh/initramfs --kerneldir /lib/modules/5.14.0-427.40.1.el9_4.aarch64/ -m xen_netfront xen-blkfront
dracut-install: Failed to find module 'xen_netfront'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.Tk4Tzu/initramfs --kerneldir /lib/modules/5.14.0-427.40.1.el9_4.aarch64/ -m xen_netfront xen-blkfront

The xen-netfront and xen-blkfront modules are not present on aarch64, as confirmed by searching in /lib/modules and inspecting the output of `lsinitrd` on a newly launched instance. Deleting /etc/dracut.conf.d/xen.conf prior to updating the kernel makes the error go away but does not prevent the instance from becoming bricked.

To avoid the problem, run the following before upgrading the kernel:

sudo dnf install -y python3-dnf-plugin-versionlock
sudo dnf versionlock kernel kernel-core kernel-modules kernel-modules-core
Tags | No tags attached. | ||||
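The report confirms the missing Xen modules by searching /lib/modules. A minimal sketch of that check follows, assuming module files live somewhere under the kernel's module directory and may be compressed. The `module_present` helper is a hypothetical name for illustration and is not part of dracut or kmod.

```shell
#!/bin/sh
# Sketch: report whether a kernel module file exists under a modules tree.
# module_present is a hypothetical helper, not part of dracut or kmod.
module_present() {
    moddir=$1    # e.g. /lib/modules/5.14.0-427.40.1.el9_4.aarch64
    name=$2      # e.g. xen_netfront or xen-blkfront
    # Module file names may use '-' or '_', so accept either at each position.
    pattern=$(printf '%s' "$name" | sed 's/[-_]/[-_]/g')
    # Module files may be compressed (.ko.xz and similar), hence the trailing *.
    [ -n "$(find "$moddir" -type f -name "${pattern}.ko*" 2>/dev/null | head -n 1)" ]
}

# Check the two modules named in the dracut error against the running kernel.
for mod in xen_netfront xen-blkfront; do
    if module_present "/lib/modules/$(uname -r)" "$mod"; then
        echo "$mod: present"
    else
        echo "$mod: missing"
    fi
done
```

To check a kernel that is installed but not yet booted, pass its directory instead of the `uname -r` one, for example /lib/modules/5.14.0-427.40.1.el9_4.aarch64.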
Also, I was unable to reproduce this with Rocky 9.4 x86_64, or with RHEL 9.4 on either architecture. | |
This remains reproducible with 5.14.0-427.42.1 | |
Seeing this in eu-central-1 region as well (ami-0a8d23cee495f671e) but not us-west-2 (at least, so far).
- Create new instance (t4g or c7g)
- sudo dnf update kernel* (to 5.14.0-427.42.1.el9_4.aarch64)
- Reboot
- the instance will get stuck with the following repeating on its console:

Starting dracut initqueue hook...
[ 168.102020] dracut-initqueue[418]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[ 168.170316] dracut-initqueue[418]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2floop0p3.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[ 168.280260] dracut-initqueue[418]: [ -e "/dev/mapper/loop0p3" ]
[ 168.330162] dracut-initqueue[418]: fi"
[ 168.360155] dracut-initqueue[418]: Warning: dracut-initqueue: starting timeout scripts

- Eventually it'll drop into dracut emergency shell |
|
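The console signature in the note above can be searched for in a saved console log. The following is a rough sketch, assuming the log has already been written to a local file; the helper names `has_initqueue_timeout` and `waiting_devices` are hypothetical.

```shell
#!/bin/sh
# Sketch: look for the dracut initqueue timeout signature in a saved console log.
# Both function names are hypothetical helpers for illustration.

# Succeeds if the log contains the initqueue timeout warning.
has_initqueue_timeout() {
    grep -q 'dracut-initqueue: timeout, still waiting for following initqueue hooks' "$1"
}

# Prints the device paths dracut is still waiting for.
# The hook file name encodes '/' as \x2f, so decode it back.
waiting_devices() {
    grep -o 'devexists-[^ ]*\.sh' "$1" |
        sed -e 's/^devexists-//' -e 's/\.sh$//' -e 's/\\x2f/\//g' |
        sort -u
}
```

Given a file holding the console output quoted in this ticket, `has_initqueue_timeout console.log` would succeed and `waiting_devices console.log` would print /dev/mapper/loop0p3. One way to obtain such a file, assuming the AWS CLI is configured, is `aws ec2 get-console-output --instance-id <id> --output text > console.log`.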
I have a few more data points:
- ami-002f56ac86a4e51e7 (el9_4.aarch64 in us-west-2 from May 2024) has the same issue
- ami-043c5144484c4ff60 (el9_3.aarch64 in eu-central-1 from November 2023) does NOT have this issue. I verified with two tests:
1) create new instance; dnf update kernel* (to 5.14.0-427.42.1.el9_4.aarch64); reboot
2) create new instance; dnf update --exclude=kernel*; reboot (still on 362.8.1); dnf update kernel*; reboot (now on 427.42)

PS. In all cases the xen_netfront message mentioned above in this ticket appears, including on the instances that boot fine, so it doesn't seem to be the culprit. |
|
Date Modified | Username | Field | Change |
---|---|---|---|
2024-10-18 13:26 | John O'Brien | New Issue | |
2024-10-18 13:38 | John O'Brien | Note Added: 0008515 | |
2024-11-11 23:14 | John O'Brien | Note Added: 0008746 | |
2024-11-13 18:12 | Vitaliy F. | Note Added: 0008779 | |
2024-11-14 01:36 | Vitaliy F. | Note Added: 0008780 |