View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000166 | Rocky-Linux-9 | NetworkManager | public | 2022-07-29 05:57 | 2022-08-24 07:23 |
Reporter | Pascal Häussler | Assigned To | |||
Priority | high | Severity | block | Reproducibility | always |
Status | new | Resolution | open | ||
Summary | 0000166: NetworkManager fails to configure IP over InfiniBand (IPoIB) connections | ||||
Description | We set up an HPE ProLiant DL380 system with a Rocky 9 minimal install. On top of that, we installed InfiniBand support (`dnf group install "InfiniBand Support"`). The HPE InfiniBand NIC (in fact, a Mellanox ConnectX-5 adapter) is detected and the kernel modules are loaded. Both an ibverbs-capable IB device `mlx5_0` and an IPoIB default device `ibs2` are available. When trying to configure an IPoIB connection with NetworkManager's `nmcli`, following the steps described in the RHEL 9 manual on InfiniBand support, NetworkManager fails to bring the connection up. We see these log entries in `journalctl`:

```
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info>  [1659073597.0438] device (ibs2): Activation: starting connection 'mlx5-ipoib' (e60615fc-b9fc-48fd-9797-db1addb6625e)
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info>  [1659073597.0439] device (ibs2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <warn>  [1659073597.4487] device (ibs2): mtu: failure to set IPv6 MTU
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info>  [1659073597.4487] device (ibs2): state change: prepare -> failed (reason 'config-failed', sys-iface-state: 'managed')
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <warn>  [1659073597.4489] device (ibs2): Activation: failed for connection 'mlx5-ipoib'
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info>  [1659073597.4491] device (ibs2): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <warn>  [1659073597.4493] device (ibs2): mtu: failure to set IPv6 MTU
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info>  [1659073597.4730] device (ibs2): carrier: link connected
```

The connection `mlx5-ipoib` doesn't come online but remains in a disconnected state:

```
[root@master01 ~]# nmcli con sh
NAME        UUID                                  TYPE        DEVICE
br1         cd93090e-f721-4e7b-87f9-7c1dde6fb994  bridge      br1
br0         d7f956d0-f696-43b8-9ba4-86e3541aaa1e  bridge      br0
bond1       ead7ec11-04d5-4d00-8772-d34412e911d2  bond        bond1
bond1-eno5  56affc33-d0bf-4081-b36e-527e6ea02da9  ethernet    eno5
bond1-eno6  a4f443b9-35fa-416e-93f6-befe2b07b0d7  ethernet    eno6
eno1        787aaf7c-58d7-361c-a85b-48b801a7e4ac  ethernet    eno1
eno2        2af9052d-6922-4398-bc96-68b7f5f12bef  ethernet    --
eno3        5af585d0-cd7f-49ba-9e37-f034b5ca0399  ethernet    --
eno4        24b4ebd6-c39b-4b5a-a466-1f56bc8df945  ethernet    --
eno5        e3d6d805-8919-3131-a47e-cbe9d4341037  ethernet    --
eno6        0ad230e1-b47c-389e-bceb-a1722b0fdbee  ethernet    --
mlx5-ipoib  e60615fc-b9fc-48fd-9797-db1addb6625e  infiniband  --
```

The device configured for this connection, namely `ibs2`, is in a disconnected state:

```
[root@master01 ~]# nmcli device
DEVICE  TYPE        STATE         CONNECTION
br1     bridge      connected     br1
br0     bridge      connected     br0
bond1   bond        connected     bond1
eno1    ethernet    connected     eno1
eno5    ethernet    connected     bond1-eno5
eno6    ethernet    connected     bond1-eno6
ibs2    infiniband  disconnected  --
eno2    ethernet    unavailable   --
eno3    ethernet    unavailable   --
eno4    ethernet    unavailable   --
lo      loopback    unmanaged     --
```

I can reproduce this error on a second system with the exact same hardware. Note: both of these systems were running CentOS 7.8 before and used peer-to-peer IPoIB connections on these adapters successfully. |
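For anyone triaging this, the IPoIB device's actual transport mode and MTU can be read back from sysfs. A minimal diagnostic sketch, assuming the device name `ibs2` from the logs above (outputs depend on the hardware and are hypothetical):

```shell
# IPoIB devices expose their transport mode via sysfs:
# "connected" allows an MTU up to 65520; "datagram" is limited by the
# IB link MTU (2044 bytes of payload for a 2048-byte IB MTU).
cat /sys/class/net/ibs2/mode

# Current effective MTU of the interface:
cat /sys/class/net/ibs2/mtu

# Detailed link state, including MTU and operational state:
ip -d link show ibs2
```

Comparing the sysfs `mode` with the `transport-mode` requested in the connection profile shows whether NetworkManager managed to switch the device into connected mode before the MTU change failed.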
Steps To Reproduce |
- Install Rocky Linux 9.0 minimal
- Install InfiniBand support
- Install and start the `opensm` subnet manager
- Configure IPoIB support as explained in the RHEL 9 manual "Configuring InfiniBand and RDMA support"
- Try to bring up the IPoIB connection
Additional Information | Commands used to configure the connection:

```
nmcli connection add type infiniband con-name mlx5_ib0 ifname ibs2 transport-mode Connected mtu 65520
nmcli connection modify mlx5_ib0 ipv4.addresses 10.20.1.1/24
nmcli connection modify mlx5_ib0 ipv4.method manual
nmcli connection modify mlx5_ib0 ipv6.method ignore
nmcli connection up mlx5_ib0
```

Note: As you can see, we set the IPv6 method to "ignore". Nevertheless, the `journalctl` excerpt above shows a warning that setting the IPv6 MTU fails.
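To see exactly what NetworkManager stored for the profile and to capture more detail around the "failure to set IPv6 MTU" warning, a debugging sketch (assumes the connection name `mlx5_ib0` from the commands above; the TRACE level is verbose and should be reverted after capturing):

```shell
# Show only the InfiniBand-, IPv4-, and IPv6-related properties of the profile:
nmcli -f infiniband,ipv4,ipv6 connection show mlx5_ib0

# Temporarily raise NetworkManager logging to TRACE for all domains:
nmcli general logging level TRACE domains ALL

# Follow the NetworkManager journal while retrying the activation:
journalctl -u NetworkManager -f
```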
Tags | No tags attached. | ||||
+++ Update +++ This issue might be caused by Mellanox OFED v5 inside the RHEL InfiniBand support packages. I found a known issue on this NVIDIA/Mellanox page describing exactly this problem on RHEL systems (see issue 1061298 in the table): https://docs.nvidia.com/networking/display/OFEDv501000/Known+Issues I don't know which OFED version is contained in Rocky 9/RHEL 9, but the issue described there is exactly what we observe. The suggested workaround, namely using NO or AUTO as the setting for the connection mode, works, albeit it is not optimal. |
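In `nmcli` terms, the workaround would amount to not requesting connected transport mode. A sketch, assuming the connection name `mlx5_ib0` from the commands above and that the workaround corresponds to falling back to datagram mode (which caps the payload MTU at 2044 for a 2048-byte IB link MTU):

```shell
# Switch the profile to datagram transport mode with a matching MTU,
# then re-activate the connection:
nmcli connection modify mlx5_ib0 infiniband.transport-mode datagram
nmcli connection modify mlx5_ib0 infiniband.mtu 2044
nmcli connection up mlx5_ib0
```

The cost is bandwidth: datagram mode's small MTU typically yields noticeably lower IPoIB throughput than connected mode with a 65520-byte MTU, which is why this is a workaround rather than a fix.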
Date Modified | Username | Field | Change |
---|---|---|---|
2022-07-29 05:57 | Pascal Häussler | New Issue | |
2022-08-24 07:23 | Pascal Häussler | Note Added: 0000467 |