
ID: 0008086
Project: Cloud
Category: General
View Status: public
Last Update: 2024-12-05 15:41
Reporter: John O'Brien
Assigned To: Neil Hanlon
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
Platform: EC2
OS: Rocky
OS Version: 9.4 aarch64
Summary: 0008086: EC2 instance bricked after updating kernel to 5.14.0-427.40.1

Description: An AWS EC2 instance of Rocky 9.4 on aarch64 becomes unresponsive on next boot after updating the kernel packages to 5.14.0-427.40.1.

Steps To Reproduce:
1. In us-east-2, launch AMI ID ami-018925a289077b035 ("Rocky-9-EC2-Base-9.4-20240509.0.aarch64") on a t4g.medium instance
2. Observe that the initial kernel version is 5.14.0-427.13.1
3. Run `dnf update -y` and confirm that the kernel is updated to 5.14.0-427.40.1 (or later, presumably)
4. Reboot the instance
5. Instance is bricked

Additional Information: During the update, the kernel-core scriptlet emits this error:

dracut-install: Failed to find module 'xen_netfront'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.NZeuPh/initramfs --kerneldir /lib/modules/5.14.0-427.40.1.el9_4.aarch64/ -m xen_netfront xen-blkfront
dracut-install: Failed to find module 'xen_netfront'
dracut: FAILED: /usr/lib/dracut/dracut-install -D /var/tmp/dracut.Tk4Tzu/initramfs --kerneldir /lib/modules/5.14.0-427.40.1.el9_4.aarch64/ -m xen_netfront xen-blkfront

The xen-netfront and xen-blkfront modules are not present on aarch64, as confirmed by searching in /lib/modules and inspecting the output of `lsinitrd` on a newly launched instance. Deleting /etc/dracut.conf.d/xen.conf before updating the kernel makes the error go away but does not prevent the instance from becoming bricked.
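The /lib/modules search described above can be scripted. The sketch below is a hypothetical helper (the function name `xen_modules_present` is ours, not part of dracut or Rocky). It assumes the module files follow the usual hyphenated kernel naming (xen-netfront.ko, possibly compressed as .ko.xz or .ko.zst) even though dracut refers to the network driver as xen_netfront. It only inspects the module tree on disk; checking what actually landed in the initramfs still needs `lsinitrd`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dracut): report whether the two Xen
# front-end drivers named in the dracut-install command above exist in a
# kernel's module tree.
# Module names use underscores (xen_netfront), but the files on disk use
# hyphens and may be compressed: xen-netfront.ko, .ko.xz, .ko.zst.
xen_modules_present() {
    local moddir="$1"   # e.g. /lib/modules/5.14.0-427.40.1.el9_4.aarch64
    local mod
    local found=0
    for mod in xen-netfront xen-blkfront; do
        # -print -quit stops at the first match; grep -q checks for any output.
        if find "$moddir" -name "${mod}.ko*" -print -quit 2>/dev/null | grep -q .; then
            echo "${mod}: present"
            found=$((found + 1))
        else
            echo "${mod}: missing"
        fi
    done
    # Exit status 0 only when both drivers were found.
    [ "$found" -eq 2 ]
}
```

For example, `xen_modules_present /lib/modules/5.14.0-427.40.1.el9_4.aarch64` prints one line per driver. Keep in mind that, per the later comments in this ticket, the missing-module message appears on both working and broken systems, so it is useful context rather than a diagnosis of the boot failure.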

To avoid the problem, lock the kernel packages at their currently installed version by running the following before upgrading:

sudo dnf install -y python3-dnf-plugin-versionlock
sudo dnf versionlock kernel kernel-core kernel-modules kernel-modules-core

Activities

John O'Brien (reporter), 2024-10-18 13:38, ~0008515

Also, I was unable to reproduce this with Rocky 9.4 on x86_64 or with RHEL 9.4 on either architecture.

John O'Brien (reporter), 2024-11-11 23:14, ~0008746

This remains reproducible with 5.14.0-427.42.1.

Vitaliy F. (reporter), 2024-11-13 18:12, ~0008779

I'm seeing this in the eu-central-1 region as well (ami-0a8d23cee495f671e), but not in us-west-2 (at least so far).

- Create a new instance (t4g or c7g)
- sudo dnf update kernel* (to 5.14.0-427.42.1.el9_4.aarch64)
- Reboot; the instance gets stuck with the following repeating on its console:

         Starting dracut initqueue hook...
[ 168.102020] dracut-initqueue[418]: Warning: dracut-initqueue: timeout, still waiting for following initqueue hooks:
[ 168.170316] dracut-initqueue[418]: Warning: /lib/dracut/hooks/initqueue/finished/devexists-\x2fdev\x2fmapper\x2floop0p3.sh: "if ! grep -q After=remote-fs-pre.target /run/systemd/generator/systemd-cryptsetup@*.service 2>/dev/null; then
[ 168.280260] dracut-initqueue[418]: [ -e "/dev/mapper/loop0p3" ]
[ 168.330162] dracut-initqueue[418]: fi"
[ 168.360155] dracut-initqueue[418]: Warning: dracut-initqueue: starting timeout scripts

- Eventually it drops into the dracut emergency shell.

Vitaliy F. (reporter), 2024-11-14 01:36, ~0008780

I have a few more data points:
- ami-002f56ac86a4e51e7 (el9_4.aarch64 in us-west-2, from May 2024) has the same issue.
- ami-043c5144484c4ff60 (el9_3.aarch64 in eu-central-1, from November 2023) does NOT have this issue. I verified this with two tests:
  1) Create a new instance; dnf update kernel* (to 5.14.0-427.42.1.el9_4.aarch64); reboot.
  2) Create a new instance; dnf update --exclude=kernel*; reboot (still on 362.8.1); dnf update kernel*; reboot (now on 427.42).

P.S. In all cases the xen_netfront message mentioned earlier in this ticket appears, so it doesn't seem to be the culprit.

Vitaliy F. (reporter), 2024-12-05 15:26, ~0009024

I just tested with ami-04de53fd0d752b714 (Rocky-9-EC2-Base-9.5-20241118.0.aarch64 in eu-central-1); the problem is gone.

On a fresh Rocky Linux 9.5 instance, dnf upgraded the kernel from 5.14.0-503.14.1.el9_5.aarch64 to 5.14.0-503.15.1.el9_5, and the instance rebooted without problems afterwards.

Neil Hanlon (administrator), 2024-12-05 15:41, ~0009025

Excellent! Thank you for reporting back, and apologies for the latency in addressing the bug report.

Issue History

Date Modified     Username      Field         Change
2024-10-18 13:26  John O'Brien  New Issue
2024-10-18 13:38  John O'Brien  Note Added    0008515
2024-11-11 23:14  John O'Brien  Note Added    0008746
2024-11-13 18:12  Vitaliy F.    Note Added    0008779
2024-11-14 01:36  Vitaliy F.    Note Added    0008780
2024-12-05 14:56  Neil Hanlon   Assigned To   => Neil Hanlon
2024-12-05 14:56  Neil Hanlon   Status        new => assigned
2024-12-05 15:26  Vitaliy F.    Note Added    0009024
2024-12-05 15:41  Neil Hanlon   Status        assigned => resolved
2024-12-05 15:41  Neil Hanlon   Resolution    open => fixed
2024-12-05 15:41  Neil Hanlon   Note Added    0009025