View Issue Details

ID: 0000166
Project: Rocky-Linux-9
Category: NetworkManager
View Status: public
Last Update: 2022-08-24 07:23
Reporter: Pascal Häussler
Assigned To:
Priority: high
Severity: block
Reproducibility: always
Status: new
Resolution: open
Summary: 0000166: NetworkManager fails to configure IP over InfiniBand (IPoIB) connections
Description: We set up an HPE ProLiant DL380 system with a Rocky 9 minimal install and then installed InfiniBand support (`dnf group install "InfiniBand Support"`). The HPE InfiniBand NIC (in fact a Mellanox ConnectX-5 adapter) is detected and the kernel modules are loaded. Both an IB verbs-capable device, `mlx5_0`, and a default IPoIB device, `ibs2`, are available.

When we try to configure an IPoIB connection with NetworkManager (`nmcli`), following the steps described in the RHEL 9 manual on InfiniBand support, NetworkManager fails to bring the connection up. We see these log entries in `journalctl`:

```
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info> [1659073597.0438] device (ibs2): Activation: starting connection 'mlx5-ipoib' (e60615fc-b9fc-48fd-9797-db1addb6625e)
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info> [1659073597.0439] device (ibs2): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <warn> [1659073597.4487] device (ibs2): mtu: failure to set IPv6 MTU
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info> [1659073597.4487] device (ibs2): state change: prepare -> failed (reason 'config-failed', sys-iface-state: 'managed')
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <warn> [1659073597.4489] device (ibs2): Activation: failed for connection 'mlx5-ipoib'
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info> [1659073597.4491] device (ibs2): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <warn> [1659073597.4493] device (ibs2): mtu: failure to set IPv6 MTU
Jul 29 07:46:37 master01.c.hpc.zhaw.ch NetworkManager[1521]: <info> [1659073597.4730] device (ibs2): carrier: link connected
```
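
To isolate these messages on another affected host, a small filter along the following lines may help. It is only a sketch: the function name `nm_device_warnings` is hypothetical, and it assumes the journal text keeps the `device (NAME)` and `<warn>` markers seen in the excerpt above.

```shell
# Hypothetical helper: keep only NetworkManager <warn> lines for one device.
# Reads journal text on stdin; the device name is the first argument.
nm_device_warnings() {
    dev="$1"
    # -F treats the patterns as fixed strings, so the parentheses and
    # angle brackets need no escaping.
    grep -F "device (${dev})" | grep -F '<warn>'
}

# Usage (assumes the systemd unit is named NetworkManager):
#   journalctl -u NetworkManager -b --no-pager | nm_device_warnings ibs2
```

Applied to the excerpt above, this would leave only the two `mtu: failure to set IPv6 MTU` lines and the `Activation: failed` line.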

The connection `mlx5-ipoib` doesn't come online but remains in the disconnected state:

```
[root@master01 ~]# nmcli con sh
NAME UUID TYPE DEVICE
br1 cd93090e-f721-4e7b-87f9-7c1dde6fb994 bridge br1
br0 d7f956d0-f696-43b8-9ba4-86e3541aaa1e bridge br0
bond1 ead7ec11-04d5-4d00-8772-d34412e911d2 bond bond1
bond1-eno5 56affc33-d0bf-4081-b36e-527e6ea02da9 ethernet eno5
bond1-eno6 a4f443b9-35fa-416e-93f6-befe2b07b0d7 ethernet eno6
eno1 787aaf7c-58d7-361c-a85b-48b801a7e4ac ethernet eno1
eno2 2af9052d-6922-4398-bc96-68b7f5f12bef ethernet --
eno3 5af585d0-cd7f-49ba-9e37-f034b5ca0399 ethernet --
eno4 24b4ebd6-c39b-4b5a-a466-1f56bc8df945 ethernet --
eno5 e3d6d805-8919-3131-a47e-cbe9d4341037 ethernet --
eno6 0ad230e1-b47c-389e-bceb-a1722b0fdbee ethernet --
mlx5-ipoib e60615fc-b9fc-48fd-9797-db1addb6625e infiniband --
```

The device configured for this connection, namely `ibs2`, is in a disconnected state:

```
[root@master01 ~]# nmcli device
DEVICE TYPE STATE CONNECTION
br1 bridge connected br1
br0 bridge connected br0
bond1 bond connected bond1
eno1 ethernet connected eno1
eno5 ethernet connected bond1-eno5
eno6 ethernet connected bond1-eno6
ibs2 infiniband disconnected --
eno2 ethernet unavailable --
eno3 ethernet unavailable --
eno4 ethernet unavailable --
lo loopback unmanaged --
```

I can reproduce this error on a second system with the exact same hardware.

Note: Both of these systems previously ran CentOS 7.8 and used peer-to-peer IPoIB connections on these adapters successfully.

Steps To Reproduce:
- Install Rocky Linux 9.0 minimal
- Install InfiniBand support
- Install and start the `opensm` subnet manager
- Configure IPoIB support as explained in the RHEL 9 manual "Configuring InfiniBand and RDMA support"
- Try to bring up the IPoIB connection

Additional Information: Commands used to configure the connection:

```
nmcli connection add type infiniband con-name mlx5_ib0 ifname ibs2 transport-mode Connected mtu 65520

nmcli connection modify mlx5_ib0 ipv4.addresses 10.20.1.1/24
nmcli connection modify mlx5_ib0 ipv4.method manual
nmcli connection modify mlx5_ib0 ipv6.method ignore

nmcli connection up mlx5_ib0
```

Note: As you can see, we set IPv6 support to "ignore". Nevertheless, the `journalctl` log excerpt above contains a warning that setting the IPv6 MTU fails.
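
When narrowing this down, it may help to check which transport mode and MTU the kernel actually applied to the interface, independently of NetworkManager. The sketch below reads both from sysfs. It assumes the IPoIB driver exposes a `mode` attribute next to `mtu` under `/sys/class/net/<iface>/`. The function name `ipoib_state` and its optional second argument (an alternative sysfs root, useful only for testing) are hypothetical.

```shell
# Print the kernel's view of an IPoIB interface: transport mode and MTU.
# $1: interface name, e.g. ibs2
# $2: sysfs root (hypothetical override for testing; defaults to /sys)
ipoib_state() {
    iface="$1"
    root="${2:-/sys}"
    dir="${root}/class/net/${iface}"
    # The mode attribute is assumed to exist only for IPoIB interfaces.
    if [ ! -r "${dir}/mode" ]; then
        echo "${iface}: no IPoIB mode attribute under ${dir}" >&2
        return 1
    fi
    printf '%s mode=%s mtu=%s\n' "${iface}" "$(cat "${dir}/mode")" "$(cat "${dir}/mtu")"
}

# Usage on the affected host:
#   ipoib_state ibs2
```

If the reported mode is still `datagram` while the profile requests an MTU of 65520, that would suggest the switch to connected mode did not take effect before the MTU was set.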

Tags: No tags attached.

Activities

Pascal Häussler

2022-08-24 07:23

reporter   ~0000467

+++ Update +++

This issue might be caused by MELLANOX_OFED v5 inside the RHEL InfiniBand support packages. I found a known issue on this NVIDIA/Mellanox site describing exactly this problem on RHEL systems (see issue 1061298 in the table):

https://docs.nvidia.com/networking/display/OFEDv501000/Known+Issues

I don't know which OFED version is contained in Rocky 9/RHEL 9, but the issue described there is exactly what we observe.

The suggested workaround, namely using NO or AUTO as the setting for the connection mode, works, although this is not optimal.
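
In `nmcli` terms, turning connected mode off presumably corresponds to setting `infiniband.transport-mode` to `datagram`. That mapping is an assumption, not something the known-issue entry states. The sketch below only prints the commands so they can be reviewed first. The function name `ipoib_datagram_cmds` is hypothetical, the default profile name is taken from the commands in this report, and the default MTU of 2044 assumes a 2048-byte InfiniBand link-layer MTU.

```shell
# Print (do not run) nmcli commands that switch a profile to datagram mode.
# $1: connection profile name (default: mlx5_ib0, as used in this report)
# $2: IPoIB MTU (default: 2044, assuming a 2048-byte IB link-layer MTU)
ipoib_datagram_cmds() {
    con="${1:-mlx5_ib0}"
    mtu="${2:-2044}"
    printf 'nmcli connection modify %s infiniband.transport-mode datagram infiniband.mtu %s\n' "$con" "$mtu"
    printf 'nmcli connection up %s\n' "$con"
}

# Review the printed commands, then run them as root, for example:
#   ipoib_datagram_cmds mlx5_ib0 | sh
```

Datagram mode limits the MTU to slightly below the link-layer MTU, which is why this is a workaround and not a replacement for connected mode with an MTU of 65520.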

Issue History

Date Modified Username Field Change
2022-07-29 05:57 Pascal Häussler New Issue
2022-08-24 07:23 Pascal Häussler Note Added: 0000467