View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000176 | Rocky-Linux-8 | kernel | public | 2022-08-04 08:08 | 2022-08-25 19:20 |
Reporter | Neil neil | Assigned To | Louis Abel | ||
Priority | high | Severity | block | Reproducibility | sometimes |
Status | closed | Resolution | no change required | ||
Platform | x86-64 Intel | OS | Rocky Linux release 8.6 (Green O | OS Version | Rocky Linux rele |
Summary | 0000176: A physical server in the cluster restarts every day, and there is no related log | ||||
Description | A newly purchased batch of servers runs Ceph storage and OpenStack compute services. Every day, one physical machine in the cluster restarts. This looks like a hardware problem, but when we contacted hardware support, they told us the restart was triggered by a Warm Reset instruction. There is nothing under /var/crash/, and kdump did not capture any useful logs. We hope the community can help us analyze the problem and suggest a direction for troubleshooting. Thank you very much! | ||||
Steps To Reproduce | Before rebooting:
2022-08-03T19:17:25.529508+08:00 xxxhost ceph-osd[5068]: 2022-08-03T19:17:25.528+0800 7f23e3c1c700 -1 --2- [v2:10.1.1.1:6825/5068,v1:10.1.1.1:6831/5068] >> [v2:10.1.1.3:6817/2056885,v1:10.1.1.3:6861/2056885] conn(0x55eec5782000 0x55ee74bda500 unknown :-1 s=BANNER_CONNECTING pgs=30571 cs=11571 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer [v2:10.1.1.3:6817/2056885,v1:10.1.1.3:6861/2056885] is using msgr V1 protocol
2022-08-03T19:17:26.013482+08:00 xxxhost ceph-osd[5086]: 2022-08-03T19:17:26.012+0800 7f8b6e342700 -1 --2- [v2:10.1.1.1:6813/1005086,v1:10.1.1.1:6815/1005086] >> [v2:10.1.1.4:6821/3013756,v1:10.1.1.4:6833/3013756] conn(0x562f3ab58c00 0x562f4a52d700 unknown :-1 s=BANNER_CONNECTING pgs=27487 cs=11571 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer [v2:10.1.1.4:6821/3013756,v1:10.1.1.4:6833/3013756] is using msgr V1 protocol
2022-08-03T19:17:40.506488+08:00 xxxhost ceph-osd[5118]: 2022-08-03T19:17:40.505+0800 7fbb2aace700 -1 --2- [v2:10.1.1.1:6814/5118,v1:10.1.1.1:6818/5118] >> [v2:10.1.1.2:6845/1005121,v1:10.1.1.2:6847/1005121] conn(0x563a454a1000 0x563b04a31e00 unknown :-1 s=BANNER_CONNECTING pgs=9017 cs=11566 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer [v2:10.1.1.2:6845/1005121,v1:10.1.1.2:6847/1005121] is using msgr V1 protocol
2022-08-03T19:17:40.530498+08:00 xxxhost ceph-osd[5068]: 2022-08-03T19:17:40.529+0800 7f23e3c1c700 -1 --2- [v2:10.1.1.1:6825/5068,v1:10.1.1.1:6831/5068] >> [v2:10.1.1.3:6817/2056885,v1:10.1.1.3:6861/2056885] conn(0x55eec5782000 0x55ee74bda500 unknown :-1 s=BANNER_CONNECTING pgs=30571 cs=11572 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer [v2:10.1.1.3:6817/2056885,v1:10.1.1.3:6861/2056885] is using msgr V1 protocol
2022-08-03T19:17:41.014477+08:00 xxxhost ceph-osd[5086]: 2022-08-03T19:17:41.013+0800 7f8b6e342700 -1 --2- [v2:10.1.1.1:6813/1005086,v1:10.1.1.1:6815/1005086] >> [v2:10.1.1.4:6821/3013756,v1:10.1.1.4:6833/3013756] conn(0x562f3ab58c00 0x562f4a52d700 unknown :-1 s=BANNER_CONNECTING pgs=27487 cs=11572 l=0 rev1=1 rx=0 tx=0)._handle_peer_banner peer [v2:10.1.1.4:6821/3013756,v1:10.1.1.4:6833/3013756] is using msgr V1 protocol

After restart:
2022-08-03T19:23:35.955783+08:00 xxxhost kernel: Linux version 4.18.0-372.13.1.el8_6.x86_64 (mockbuild@dal1-prod-builder001.bld.equ.rockylinux.org) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-10) (GCC)) #1 SMP Wed Jun 29 17:21:09 UTC 2022
2022-08-03T19:23:35.955966+08:00 xxxhost kernel: Command line: BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-372.13.1.el8_6.x86_64 root=UUID=a863cc8c-1a0f-421a-bce5-0f435eb74954 ro crashkernel=512M@16M

grep -E "error|Error|ERROR|fail|Fail|FAIL" /var/log/messages
2022-08-03T19:23:36.026616+08:00 xxxhost kernel: pci 0000:65:00.0: BAR 6: failed to assign [mem size 0x00100000 pref]
2022-08-03T19:23:36.032640+08:00 xxxhost kernel: ERST: Error Record Serialization Table (ERST) support is initialized.
2022-08-03T19:23:36.463509+08:00 xxxhost kernel: bnxt_en 0000:31:00.0 (unnamed net_device) (uninitialized): PTP initialization failed.
2022-08-03T19:23:36.644505+08:00 xxxhost kernel: bnxt_en 0000:31:00.1 (unnamed net_device) (uninitialized): PTP initialization failed.
2022-08-03T19:23:36.877340+08:00 xxxhost kernel: bnxt_en 0000:b1:00.0 (unnamed net_device) (uninitialized): PTP initialization failed.
2022-08-03T19:23:37.089581+08:00 xxxhost kernel: bnxt_en 0000:b1:00.1 (unnamed net_device) (uninitialized): PTP initialization failed.
2022-08-03T19:23:51.268879+08:00 xxxhost kernel: ACPI Error: No handler for Region [SYSI] (000000007fd51248) [IPMI] (20210604/evregion-135)
2022-08-03T19:23:51.268972+08:00 xxxhost kernel: ACPI Error: Region IPMI (ID=7) has no handler (20210604/exfldio-265)
2022-08-03T19:23:51.284200+08:00 xxxhost kernel: ACPI Error: Aborting method \_SB.PMI0._GHL due to previous error (AE_NOT_EXIST) (20210604/psparse-531)
2022-08-03T19:23:51.291671+08:00 xxxhost kernel: ACPI Error: Aborting method \_SB.PMI0._PMC due to previous error (AE_NOT_EXIST) (20210604/psparse-531)
2022-08-03T19:23:51.291902+08:00 xxxhost kernel: ACPI Error: AE_NOT_EXIST, Evaluating _PMC (20210604/power_meter-759)
2022-08-03T19:23:52.788965+08:00 xxxhost kernel: bnxt_en 0000:31:00.0: bnxt_re: probe error: RoCE is not supported on this device
2022-08-03T19:23:52.789520+08:00 xxxhost kernel: bnxt_en 0000:31:00.1: bnxt_re: probe error: RoCE is not supported on this device
2022-08-03T19:23:52.789657+08:00 xxxhost kernel: bnxt_en 0000:b1:00.0: bnxt_re: probe error: RoCE is not supported on this device
2022-08-03T19:23:52.789742+08:00 xxxhost kernel: bnxt_en 0000:b1:00.1: bnxt_re: probe error: RoCE is not supported on this device
2022-08-03T19:23:56.328783+08:00 xxxhost augenrules[3942]: failure 1
2022-08-03T19:23:56.328783+08:00 xxxhost augenrules[3942]: failure 1
2022-08-03T19:23:56.328783+08:00 xxxhost augenrules[3942]: failure 1
2022-08-03T19:24:13.887961+08:00 xxxhost rsyslogd[6998]: imjournal: fscanf on state file `/var/lib/rsyslog/imjournal.state' failed [v8.2102.0-7.el8_6.1 try https://www.rsyslog.com/e/2027 ]

kdump log:
+ 2022-08-03 19:24:38 /usr/bin/kdumpctl@698: /sbin/kexec -s -d -p '--command-line=BOOT_IMAGE=(hd0,gpt2)/vmlinuz-4.18.0-372.13.1.el8_6.x86_64 ro irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd hest_disable disable_cpu_apicid=0 iTCO_wdt.pretimeout=0' --initrd=/boot/initramfs-4.18.0-372.13.1.el8_6.x86_64kdump.img /boot/vmlinuz-4.18.0-372.13.1.el8_6.x86_64
Try gzip decompression.
Try LZMA decompression.
lzma_decompress_file: read on /boot/vmlinuz-4.18.0-372.13.1.el8_6.x86_64 of 65536 bytes failed
+ 2022-08-03 19:24:39 /usr/bin/kdumpctl@702: ret=0
+ 2022-08-03 19:24:39 /usr/bin/kdumpctl@703: set +x | ||||
Tags | reboot | ||||
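The logs under Steps To Reproduce jump from ordinary ceph-osd messages straight to a fresh kernel boot banner, with nothing in between. A minimal sketch of how that silent window could be measured from a syslog file follows. It assumes every line begins with an ISO 8601 timestamp, as in the excerpts above; the function name `reboot_gaps` and the choice of boot marker string are illustrative assumptions, not part of any tool mentioned in this report.

```python
from datetime import datetime

# Text that marks the first kernel message of a new boot in the excerpts above.
BOOT_MARKER = "kernel: Linux version"


def reboot_gaps(lines):
    """Return (last_seen, boot_time, gap) for every boot banner in the log.

    last_seen is the timestamp of the line just before the boot banner, so
    gap is the period during which nothing was logged.
    """
    gaps = []
    previous = None
    for line in lines:
        stamp, _, rest = line.partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if BOOT_MARKER in rest and previous is not None:
            gaps.append((previous, when, when - previous))
        previous = when
    return gaps


# Two lines taken from the report, with the ceph-osd message body shortened.
sample = [
    "2022-08-03T19:17:41.014477+08:00 xxxhost ceph-osd[5086]: is using msgr V1 protocol",
    "2022-08-03T19:23:35.955783+08:00 xxxhost kernel: Linux version 4.18.0-372.13.1.el8_6.x86_64",
]

for last_seen, boot_time, gap in reboot_gaps(sample):
    print(f"last log {last_seen}, boot at {boot_time}, silent for {gap}")
```

A gap of several minutes with no shutdown messages suggests the machine did not go down through an orderly shutdown, which narrows the search but does not by itself distinguish a kernel crash from a hardware reset.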
Ceph version: 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
OpenStack version: Victoria
Installed OpenStack packages:
openstack-nova-compute-22.2.2-1.el8.noarch
openstack-neutron-common-17.2.1-1.el8.noarch
openstack-nova-common-22.2.2-1.el8.noarch
openstack-neutron-linuxbridge-17.2.1-1.el8.noarch
python3-openstacksdk-0.50.0-1.el8.noarch |
|
Thank you for the report. We are not able to determine from this bug report whether this is an issue with Rocky Linux or with the hardware, because the following information is missing:
* An SOS report (install sos and run sosreport) or general hardware information
* Other relevant logs
Because you are using software that we do not ship, it is difficult for us to help troubleshoot. If you installed OpenStack and this problem began shortly afterwards, you may need to work with the OpenStack community to resolve the issue. As you are using a version of OpenStack that is in extended maintenance, the community may or may not ask you to upgrade to a supported version. Please note that this bug tracker is not meant for general support questions. |
|
We finally retrieved the kdump log, and it looks like a hardware compatibility issue. Could the community help analyze what caused it? | |
Hello, our kdump environment now cannot mount the hard drive, so it cannot write the kdump core. At present, whether or not we enter the kdump shell, all we can do is reboot. Is there a better way to dump the kdump core files to external media? |
|
You can use the rescue mode of any Rocky Linux ISO to obtain the kernel dumps. The default location will be /var/crash. | |
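Once the installed system's filesystem is reachable (for example from rescue mode), the dumps can be located before copying them to external media. The sketch below assumes kdump's usual layout of one subdirectory per crash under the crash directory, holding files whose names start with `vmcore`. The function name `find_vmcores` and its `crash_root` parameter are hypothetical, and the temporary directory only stands in for a real mount point.

```python
import tempfile
from pathlib import Path


def find_vmcores(crash_root):
    """List dump files one level below crash_root, sorted by path.

    crash_root is wherever /var/crash of the installed system is visible,
    which depends on where the rescue environment mounted it.
    """
    root = Path(crash_root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob("*/vmcore*") if p.is_file())


# Demonstration against a throwaway directory tree (not a real crash dump).
with tempfile.TemporaryDirectory() as tmp:
    dump_dir = Path(tmp) / "var" / "crash" / "example-dump"
    dump_dir.mkdir(parents=True)
    (dump_dir / "vmcore").write_bytes(b"")
    (dump_dir / "vmcore-dmesg.txt").write_text("")
    names = [p.name for p in find_vmcores(Path(tmp) / "var" / "crash")]
    print(names)
```

An empty result means no dump was written to that location, which matches the empty /var/crash described in the original report.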
Thank you very much for the community's help. I think we have found the cause. One of our colleagues had added a harmful configuration to the Ceph service units, which caused the kernel to crash and the machine to restart. |
[Unit]
Description=ceph Slice
Documentation=man:systemd.special(7)
Before=slices.target

[Slice]
MemoryAccounting=true
#MemoryLimit=2048M
MemoryMax=4G
CPUAccounting=true
CPUQuota=90%

/usr/lib/systemd/system/ceph-osd*.service
[Service]
Slice=ceph.slice
CPUAffinity=4-21
Nice=-20
| After we found and removed this configuration, everything returned to normal. Thank you once again for your kind help. You have my deepest respect. |
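To check other hosts for the same kind of override, the resource-control settings can be pulled out of unit file text and reviewed by hand. The following is a rough sketch that parses unit-style text with Python's `configparser`. The list of watched keys is an illustrative choice and the function name `resource_settings` is hypothetical. It only reports what is set and makes no claim about which setting caused the crash.

```python
import configparser

# Keys worth a manual review; this selection is an assumption for illustration.
WATCHED_KEYS = {"MemoryMax", "MemoryLimit", "CPUQuota", "CPUAffinity", "Nice"}


def resource_settings(unit_text):
    """Return {"Section.Key": value} for watched keys that are actually set."""
    parser = configparser.ConfigParser(
        delimiters=("=",),   # unit files use "=" only; values may contain ":"
        interpolation=None,  # values such as "90%" must not be interpolated
        strict=False,        # tolerate repeated keys
    )
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(unit_text)
    found = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            if key in WATCHED_KEYS:
                found[f"{section}.{key}"] = value
    return found


# The two fragments quoted in the note above.
slice_text = """
[Unit]
Description=ceph Slice
Documentation=man:systemd.special(7)
Before=slices.target

[Slice]
MemoryAccounting=true
#MemoryLimit=2048M
MemoryMax=4G
CPUAccounting=true
CPUQuota=90%
"""

service_text = """
[Service]
Slice=ceph.slice
CPUAffinity=4-21
Nice=-20
"""

print(resource_settings(slice_text))
print(resource_settings(service_text))
```

Commented-out lines such as `#MemoryLimit=2048M` are ignored, so only settings that are in effect in the text are reported.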
|
Date Modified | Username | Field | Change |
---|---|---|---|
2022-08-04 08:08 | Neil neil | New Issue | |
2022-08-04 08:08 | Neil neil | Tag Attached: reboot | |
2022-08-04 08:15 | Neil neil | Note Added: 0000320 | |
2022-08-04 16:09 | Louis Abel | Note Added: 0000323 | |
2022-08-10 08:32 | Neil neil | Note Added: 0000334 | |
2022-08-10 08:32 | Neil neil | File Added: 1.jpg | |
2022-08-10 08:32 | Neil neil | File Added: 2.jpg | |
2022-08-10 08:32 | Neil neil | File Added: 3.jpg | |
2022-08-10 08:32 | Neil neil | File Added: 4.jpg | |
2022-08-10 08:35 | Neil neil | Note Added: 0000335 | |
2022-08-10 08:35 | Neil neil | File Added: 1-2.jpg | |
2022-08-10 08:35 | Neil neil | File Added: 2-2.jpg | |
2022-08-10 08:35 | Neil neil | File Added: 3-2.jpg | |
2022-08-10 08:35 | Neil neil | File Added: 4.jpeg | |
2022-08-11 01:31 | Neil neil | Note Added: 0000336 | |
2022-08-11 02:46 | Louis Abel | Assigned To | => Louis Abel |
2022-08-11 02:46 | Louis Abel | Status | new => needinfo |
2022-08-12 08:16 | Neil neil | Note Added: 0000340 | |
2022-08-12 08:16 | Neil neil | File Added: image.png | |
2022-08-12 08:16 | Neil neil | File Added: image-2.png | |
2022-08-13 06:01 | Louis Abel | Note Added: 0000344 | |
2022-08-24 02:20 | Neil neil | Note Added: 0000466 | |
2022-08-25 19:20 | Louis Abel | Status | needinfo => closed |
2022-08-25 19:20 | Louis Abel | Resolution | open => no change required |