View Issue Details

ID: 0000662
Project: Rocky-Linux-8
Category: NetworkManager
View Status: public
Last Update: 2024-11-27 01:40
Reporter: Carl Gromatzky
Assigned To: Louis Abel
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: suspended
Platform: ESXi 7.0.3, open-vm-tools 11.3.5
OS: Rocky 8.6
OS Version: 4.18.0-372.9.1.e
Summary: 0000662: MAC Addr Corruption VMWare open-vm-tools 11.3.5-1.el8
Description
Subject:
Rocky 8.6 VMXNET3 Device Enumeration Corruption

Message:

Running ESXi 7.0.3 build 20328353 (or build 20395099) on a UCS-C220-M5SX with firmware 4.2(2a) and an Intel Xeon Platinum 8168 2.7GHz processor.
VM is Rocky 8.6 (4.18.0-372.9.1.el8.x86_64), using VMXNET3 network devices

Network devices mac addresses enumerate in the VM OS (Rocky 8.6) just fine, up to 3 network devices.
i.e.:
eth0 MAC addr = Network Adapter 1 MAC addr
eth1 MAC addr = Network Adapter 2 MAC addr
eth2 MAC addr = Network Adapter 3 MAC addr

Adding a fourth network device causes nmcli to show corrupted MAC address to device mappings. Rebooting causes dmesg errors and network loss (the driver appears to enumerate all vmxnet3 network devices incorrectly upon adding the 4th vmxnet3 device, mapping connection profiles to incorrect devices and causing kernel errors and network connectivity loss)

MAC addresses enumerate differently in the OS than the network adapters listed in vCenter after the fourth adapter is added to the vm,
i.e.:
eth0 MAC addr = Network Adapter 4 MAC addr
eth1 MAC addr = Network Adapter 1 MAC addr
eth2 MAC addr = Network Adapter 2 MAC addr
eth3 MAC addr = Network Adapter 3 MAC addr

open-vm-tools version experiencing this issue is 11.3.5-1.el8_6.1 from @AppStream repo

I also downloaded open-vm-tools versions 12.0.0, 12.0.5, 12.1.0 from the vmware github and rpm'd them into the build. All provide the same behavior on this vm. 12.1.0 actually started experiencing the MAC enumeration corruption when adding the third Network Adapter.

I opened a case with VMWare. They closed my case, stating that I must open a case with Rocky Linux directly.
Steps To Reproduce
The VM is given 1 vmxnet3 adapter to start. The network is configured and connectivity is good.
The VM is given 1 additional adapter. The network connectivity is good.
The VM is given 1 additional adapter. The network connectivity is good.
The VM is given 1 additional adapter. The network connectivity fails, with mac address to nmcli connection mappings corrupted.
Generally, the error will not present until the VM is rebooted.

OR

The VM is given 4 vmxnet3 adapters to start. The network is configured and connectivity good.
The VM is rebooted, connectivity fails, with mac address to nmcli connection mappings corrupted.
Additional Information
The OS is hardened, running in multi-user.target mode.
Running generic Rocky 8.6 build provides the following behavior in the same VMWare environment:
Add 4 vmxnet3 devices, the nmcli device list shows mac addr enumeration is fine, though the default network device nomenclature is ens[n].

I notice that the first 2 network devices enumerate as ens2[nn] and that the 3rd and 4th devices enumerate as ens1[nn].

In the production, hardened build, the network device nomenclature is eth[n].
Tags: No tags attached.

Activities

Louis Abel

2022-11-01 16:08

administrator   ~0000802

Unfortunately we simply rebuild the open-vm-tools package as provided by Red Hat, which contains the sources that VMWare tools builds. So there's not much of a way for us to investigate.

I disagree with them that you must come to us to resolve this issue when it's their tools and platform. Please reopen a ticket and reference this bug ticket as well as https://bugs.rockylinux.org/view.php?id=200 to open a discussion with their engineering teams to find a solution.
Carl Gromatzky

2022-11-01 16:15

reporter   ~0000803

I've updated the VMWare ticket and am awaiting a response from that support team regarding the supportability dispute. Thank you, Louis.
Carl Gromatzky

2022-11-02 23:44

reporter   ~0000826

I uninstalled open-vm-tools 11.3.5 and confirmed in vcenter. Installed VMWare Tools 10.3.25, confirmed in vcenter, and experienced the same mac address corruption issue.

I've updated the VMWare engineer. Perhaps they will support the official VMWare Tools. The VMWare engineer is also escalating this as an open-vm-tools issue to VMWare global engineers (to see whether they will pick it up at all).

The /sys/class/net/eth[0,1]/address sysfs files show the incorrectly mapped mac addresses. This seems to be a driver issue. Hoping to get feedback. Will update here.
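
The sysfs check described above can be scripted. The following is a minimal sketch, not a tool from this ticket: the names `read_macs` and `find_mismatches` are made up for illustration, and the expected mapping has to be copied by hand from the vCenter adapter list.

```python
from pathlib import Path


def read_macs(sys_net="/sys/class/net"):
    """Return {interface: mac} read from <sys_net>/<iface>/address."""
    macs = {}
    for iface in sorted(Path(sys_net).iterdir()):
        addr = iface / "address"
        # Entries without an address file (e.g. bonding_masters) are skipped.
        if addr.is_file():
            macs[iface.name] = addr.read_text().strip().lower()
    return macs


def find_mismatches(expected, actual):
    """Return {iface: (expected_mac, actual_mac)} for interfaces that differ."""
    return {
        iface: (mac.lower(), actual.get(iface))
        for iface, mac in expected.items()
        if actual.get(iface) != mac.lower()
    }
```

Calling `find_mismatches({"eth0": "00:50:56:ac:64:fa"}, read_macs())` on the affected VM would report eth0 if it currently carries a different adapter's MAC address.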
Louis Abel

2022-11-02 23:49

administrator   ~0000827

Thank you for the update. It's starting to sound like a platform issue (esx level). Perhaps ESX is providing the wrong information to the firmware, and the kernel/driver/udev is just following what it sees. Hoping to hear what they find.

One of our testers is currently moving their infra, so if vmware doesn't come back right away, he can probably try to see if the issue is reproducible in his environment with both Rocky Linux and RHEL. It would be interesting to see if it's also a problem in RHEL.
Carl Gromatzky

2022-11-04 18:48

reporter   ~0000862

That's an interesting note. I researched and tested further, finding the VMWare Firmware/BIOS is indeed the culprit, in that it enumerates Network Adapters on the PCIe bus out of logical sequence. Red Hat has referred to this as "giving the kernel information that doesn't make any sense" since 2016, so it's a long-ongoing issue. Below is text from the updated email (with additional research references that led to better understanding) that I sent to the VMWare engineer.

---------------------------------

To reiterate the change in scope: I replicated the MAC Address to Interface corruption in the official VMWare Tools package. This is no longer only an open-vm-tools issue. This is a VMWare-specific problem, as noted by several vendors, and it has to do with PCI/PCIe bus enumeration problems of VMWare.

I believe that this VMWare article, which pulls information from Red Hat KBs, may prove the next path for testing:
https://docs.vmware.com/en/VMware-Adapter-for-SAP-Landscape-Management/services/Administration-Guide-for-LaMa-Administrators/GUID-0603A3F3-CCB1-42AF-A1E2-6B61979C00CB.html

I need to know more about how the PCIe Bus enumeration and Ethernet Port enumeration in VMWare function in order to create a reasonable udev rule to address VMWare's hardware handling.

Questions:

    Is BDF bus assignment consistent and predictable when adding n number of Network Adapters to a VMWare VM?
    What is the order of operations and resulting output of Bus ID assignment for n number of Network Adapters?
    Are Ethernet port IDs user-definable in the Advanced Configuration Parameters of VM Settings?
        I see that I can adjust advanced configuration port numbers for interfaces, but not sure if port number assignments would change bus ID behavior?
        How are port numbers derived/calculated?
        Would changing a port number persist a reboot, or would VMWare simply automatically reassign a port number?


Root cause of the issue:

Root cause of the MAC Addr corruption that takes Ethernet interfaces down appears to be that VMWare Tools passes bus IDs to the Rocky Linux kernel out of numerical order (this is a VMWare-specific issue, as noted by several OS vendors).

This behavior causes the naming sequence discrepancy for naming convention of Interface, ethX, to Ethernet Device Connection name, ensY.

The old ethX naming style is needed in many environments to comply with devops and secops requirements.

Result:

Interface eth0 may map to Connection ens192 when Network Adapter 1 is first presented to the OS, then eth0 may map to ens161 on the next boot. This is due to ID_NET_NAME_PATH nomenclature assigned by udev, as enpMsN, enumerating out of logical sequence, as based on BUS ID enumeration from the VMWare Firmware/BIOS passing through to kernel udev/rules.d naming rules during POST.

i.e.

    Interface eth0 will be enumerated first by the kernel and mapped to the first-presented virtual PCIe connection, which enumerates based on bus ID

    Generally, Network Adapter 1 maps to enp11s0 reliably

    With 4 network adapters, the BUS ID doesn't matter at first presentation of the Network Adapter (hot-plugging gives Network Adapters 1-4 the appropriate eth0-3 names because they're being hot-plugged in order to the OS by VMWare)

    On the next boot, Network Adapter 4 gets presented to the kernel on enp4s0, connection ens161, so udev then assigns eth0 to ens161 because the lowest numbers in the database get paired together

The MAC Address corruption grows in complexity with the number of VMWare Network Adapters added. If I give the VM 8 Network Adapters, the BUS IDs get injected in no predictable order, which goes back to the questions above. If I can understand how VMWare enumerates the PCIe bus and Ethernet ports, I believe I can create the udev rule to handle ethX naming at scale in el7+ (and other OS kernel) builds and satisfy dev and sec requirements.
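
The reordering described above can be reproduced on paper. This is a rough sketch under two assumptions taken from this note rather than from any VMWare documentation: that ethN names are handed out in ascending bus order at boot, and that Network Adapters 1-4 sit at the PCI addresses shown in the attached lspci and udev output. The function names are illustrative only.

```python
def parse_bdf(bdf):
    """Split 'bus:dev.func' (hex fields, as printed by lspci) into integers."""
    bus, rest = bdf.split(":")
    dev, func = rest.split(".")
    return int(bus, 16), int(dev, 16), int(func, 16)


def path_name(bdf):
    """Build the enp<bus>s<slot> style name; bus and slot are written in decimal."""
    bus, dev, func = parse_bdf(bdf)
    name = f"enp{bus}s{dev}"
    # Simplification: a function suffix is only added when it is non-zero.
    if func:
        name += f"f{func}"
    return name


def boot_order(adapters):
    """adapters: {vCenter label: BDF}. Return [(ethN, label, path name)] by bus order."""
    ordered = sorted(adapters.items(), key=lambda item: parse_bdf(item[1]))
    return [(f"eth{i}", label, path_name(bdf)) for i, (label, bdf) in enumerate(ordered)]


# Adapter-to-address mapping as observed in the attachments to this ticket.
ADAPTERS = {
    "Network Adapter 1": "0b:00.0",
    "Network Adapter 2": "13:00.0",
    "Network Adapter 3": "1b:00.0",
    "Network Adapter 4": "04:00.0",
}
```

With this input, `boot_order(ADAPTERS)` puts Network Adapter 4 (enp4s0) first, so it takes eth0 and the other three adapters shift up by one, which matches the mapping seen after reboot.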

Further links leading to above VMWare KB in research of this problem:
https://github.com/Azure/WALinuxAgent/issues/1750
https://github.com/coreos/bugs/issues/2437
https://github.com/systemd/systemd/pull/8458
https://unix.stackexchange.com/questions/611762/how-to-fix-the-conflict-in-the-naming-scheme-for-network-interfaces-use-by-predi
https://www.linuxsysadmins.com/systemd-network-interface-name-ensx-to-eth0/
https://www.redhat.com/en/blog/red-hat-enterprise-linux-73-achieving-persistent-and-consistent-network-interface-naming-vmware-environments
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/ch-consistent_network_device_naming#sec-Naming_Schemes_Hierarchy
https://communities.vmware.com/t5/vSphere-vNetwork-Discussions/how-to-fix-a-virtual-Network-Adapter-to-be-the-first-one-from/m-p/916181
https://kb.vmware.com/s/article/2047927
https://unix.stackexchange.com/questions/134483/why-is-my-ethernet-interface-called-enp0s10-instead-of-eth0
https://wiki.xenproject.org/wiki/Bus:Device.Function_(BDF)_Notation
https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-understanding_the_predictable_network_interface_device_names
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/consistent-network-interface-device-naming_configuring-and-managing-networking
https://access.redhat.com/solutions/2592561
Carl Gromatzky

2022-11-04 21:29

reporter   ~0000863

I meant to attach some files for reference (see attached)
netid_badrock.txt (1,771 bytes)   
--------------------------------
UDEV
udev test /sys/class/net/eth[0-3]
---------------------------------
$>_
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056ac64fa
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp11s0
ID_NET_NAME_SLOT=ens192
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056acda97
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp19s0
ID_NET_NAME_SLOT=ens224
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056acd390
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp27s0
ID_NET_NAME_SLOT=ens256
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056ac98bd
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp4s0
ID_NET_NAME_SLOT=ens161
-------------------
after reboot
----------------
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056ac98bd
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp4s0
ID_NET_NAME_SLOT=ens161
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056ac64fa
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp11s0
ID_NET_NAME_SLOT=ens192
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056acda97
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp19s0
ID_NET_NAME_SLOT=ens224
ID_NET_NAMING_SCHEME=rhel-8.0
ID_NET_NAME_MAC=enx005056acd390
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_NET_NAME_PATH=enp27s0
ID_NET_NAME_SLOT=ens256
----------------------------
lspci for Ehternet adapters
---------------------------
04:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
0b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
13:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
1b:00.0 Ethernet controller [0200]: VMware VMXNET3 Ethernet Controller [15ad:07b0] (rev 01)
udevtest_badrock_afterreboot.txt (1,091 bytes)   
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:15.1/0000:04:00.0/net/eth0
ID_BUS=pci
ID_MODEL_FROM_DATABASE=VMXNET3 Ethernet Controller
ID_MODEL_ID=0x07b0
ID_NET_DRIVER=vmxnet3
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME_MAC=enx005056acf7d9
ID_NET_NAME_PATH=enp4s0
ID_NET_NAME_SLOT=ens161
ID_NET_NAMING_SCHEME=rhel-8.0
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_PATH=pci-0000:04:00.0
ID_PATH_TAG=pci-0000_04_00_0
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=VMware
ID_VENDOR_ID=0x15ad
IFINDEX=2
INTERFACE=eth0
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth0
TAGS=:systemd:
UDEV_BIOSDEVNAME=0
USEC_INITIALIZED=2574611
biosdevname=0
run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/eth0 --prefix=/net/ipv4/neigh/eth0 --prefix=/net/ipv6/conf/eth0 --prefix=/net/ipv6/neigh/eth0'
udevtest_badrock.txt (1,096 bytes)   
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:16.0/0000:0b:00.0/net/eth0
ID_BUS=pci
ID_MODEL_FROM_DATABASE=VMXNET3 Ethernet Controller
ID_MODEL_ID=0x07b0
ID_NET_DRIVER=vmxnet3
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME_MAC=enx005056ac64fa
ID_NET_NAME_PATH=enp11s0
ID_NET_NAME_SLOT=ens192
ID_NET_NAMING_SCHEME=rhel-8.0
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_PATH=pci-0000:0b:00.0
ID_PATH_TAG=pci-0000_0b_00_0
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=VMware
ID_VENDOR_ID=0x15ad
IFINDEX=10
INTERFACE=eth0
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/eth0
TAGS=:systemd:
UDEV_BIOSDEVNAME=0
USEC_INITIALIZED=1279538639
biosdevname=0
run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/eth0 --prefix=/net/ipv4/neigh/eth0 --prefix=/net/ipv6/conf/eth0 --prefix=/net/ipv6/neigh/eth0'
udevtest_r8default.txt (1,090 bytes)   
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

ACTION=add
DEVPATH=/devices/pci0000:00/0000:00:15.1/0000:04:00.0/net/ens161
ID_BUS=pci
ID_MM_CANDIDATE=1
ID_MODEL_FROM_DATABASE=VMXNET3 Ethernet Controller
ID_MODEL_ID=0x07b0
ID_NET_DRIVER=vmxnet3
ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
ID_NET_NAME_MAC=enx005056ac5de9
ID_NET_NAME_PATH=enp4s0
ID_NET_NAME_SLOT=ens161
ID_NET_NAMING_SCHEME=rhel-8.0
ID_OUI_FROM_DATABASE=VMware, Inc.
ID_PATH=pci-0000:04:00.0
ID_PATH_TAG=pci-0000_04_00_0
ID_PCI_CLASS_FROM_DATABASE=Network controller
ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
ID_VENDOR_FROM_DATABASE=VMware
ID_VENDOR_ID=0x15ad
IFINDEX=2
INTERFACE=ens161
SUBSYSTEM=net
SYSTEMD_ALIAS=/sys/subsystem/net/devices/ens161
TAGS=:systemd:
USEC_INITIALIZED=4634244
run: '/usr/lib/systemd/systemd-sysctl --prefix=/net/ipv4/conf/ens161 --prefix=/net/ipv4/neigh/ens161 --prefix=/net/ipv6/conf/ens161 --prefix=/net/ipv6/neigh/ens161'
Carl Gromatzky

2022-11-05 19:30

reporter   ~0000864

Keeping this string updated with research as discovered:

Based on the VMWare article and the Kemp article below, the community has understood the VMWare PCIe bus enumeration and configured a .vmx file that attaches the maximum number of Ethernet devices and hard-codes their sequence to a logical order for the OS in user-space. This seems like a logical way to scale a template-able fix (which is my primary concern, as addressing this issue at scale any time someone wants more than 3 adapters would be inefficient). Any thoughts?

(I've ported this comment over to the VMWare engineer assigned. Want to make sure this passes VMWare's eyes and get the ok, regarding any pitfalls or obvious errors.)

Kemp article: https://support.kemptechnologies.com/hc/en-us/articles/201978745-When-adding-4-or-more-VMXNET3-NICs-to-a-VLM-in-VMware-the-order-is-incorrect
VMWare article: https://kb.vmware.com/s/article/2047927
Carl Gromatzky

2022-11-07 22:12

reporter   ~0000866

I deployed this config in a .vmx for a test VM, and it worked great:

# 001.00101.00000 # PCI Slot=4 # Parent slot 21: # Bus:Dev.Func = 00h:15h.01h # Guest OS Order eth0
ethernet0.pciSlotNumber = "1184"

# 010.00101.00000 # PCI Slot=4 # Parent slot 21: # Bus:Dev.Func = 00h:15h.02h # Guest OS Order eth1
ethernet1.pciSlotNumber = "2208"

# 011.00101.00000 # PCI Slot=4 # Parent slot 21: # Bus:Dev.Func = 00h:15h.03h # Guest OS Order eth2
ethernet2.pciSlotNumber = "3232"

# 001.00110.00000 # PCI Slot=5 # Parent slot 22: # Bus:Dev.Func = 00h:16h.01h # Guest OS Order eth3
ethernet3.pciSlotNumber = "1216"

# 010.00110.00000 # PCI Slot=5 # Parent slot 22: # Bus:Dev.Func = 00h:16h.02h # Guest OS Order eth4
ethernet4.pciSlotNumber = "2240"

# 011.00110.00000 # PCI Slot=5 # Parent slot 22: # Bus:Dev.Func = 00h:16h.03h # Guest OS Order eth5
ethernet5.pciSlotNumber = "3264"

# 001.00111.00000 # PCI Slot=6 # Parent slot 23: # Bus:Dev.Func = 00h:17h.01h # Guest OS Order eth6
ethernet6.pciSlotNumber = "1248"

# 010.00111.00000 # PCI Slot=6 # Parent slot 23: # Bus:Dev.Func = 00h:17h.02h # Guest OS Order eth7
ethernet7.pciSlotNumber = "2272"

# 001.01000.00000 # PCI Slot=7 # Parent slot 24: # Bus:Dev.Func = 00h:18h.01h # Guest OS Order eth8
ethernet8.pciSlotNumber = "1280"

# 010.01000.00000 # PCI Slot=7 # Parent slot 24: # Bus:Dev.Func = 00h:18h.02h # Guest OS Order eth9
ethernet9.pciSlotNumber = "2304"
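
The slot numbers above follow directly from the bit patterns in the comments. Here is a small sketch of that arithmetic; the field names are my own labels, and the rule "parent bridge device = 10h + middle field" is only read off the comments above (5 pairs with 15h, 6 with 16h, and so on), not taken from VMWare documentation.

```python
def decode_slot(value):
    """Split a pciSlotNumber into the three bit groups used in the comments."""
    function = (value >> 10) & 0x7  # leading 3-bit group
    bridge = (value >> 5) & 0x1F    # middle 5-bit group
    low = value & 0x1F              # trailing 5-bit group, zero in every value listed
    return function, bridge, low


def encode_slot(function, bridge, low=0):
    """Inverse of decode_slot."""
    return (function << 10) | (bridge << 5) | low


def describe(value):
    """Render a slot number in the same notation as the comments."""
    function, bridge, low = decode_slot(value)
    # Assumption: middle field N sits behind bridge device 10h + N.
    return (
        f"{function:03b}.{bridge:05b}.{low:05b} -> "
        f"00h:{0x10 + bridge:02x}h.{function:02x}h"
    )
```

For example, `describe(1184)` gives `001.00101.00000 -> 00h:15h.01h`, the same pattern and Bus:Dev.Func as the comment for ethernet0, and `encode_slot(2, 8)` returns 2304, the value used for ethernet9.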

I also found that these .vmx lines need to be removed (they'll be automatically repopulated at next power state change of the vm)

ethernetN.dvs.portId = ...

ethernetN.generatedAddress = ...

ethernetN.addressType = ...

ethernetN.generatedAddressOffset = ...
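
For templating, the removal and the slot assignment can be done in one pass over the .vmx text. This is a hedged sketch only: `rewrite_vmx` is a hypothetical helper, it assumes the VM is powered off and the original file is backed up, and it matches the offset key in both spellings since the exact spelling may vary.

```python
import re

# Per-adapter keys that the note above says to delete.
STALE_KEY = re.compile(
    r"^\s*ethernet\d+\."
    r"(dvs\.portId|generatedAddress|addressType|generated?AddressOffset)\s*=",
    re.IGNORECASE,
)
SLOT_KEY = re.compile(r"^\s*ethernet(\d+)\.pciSlotNumber\s*=", re.IGNORECASE)


def rewrite_vmx(text, slots):
    """Drop stale ethernetN keys and set ethernetN.pciSlotNumber from slots ({N: value})."""
    kept = []
    for line in text.splitlines():
        if STALE_KEY.match(line):
            continue  # repopulated at the next power state change
        match = SLOT_KEY.match(line)
        if match and int(match.group(1)) in slots:
            continue  # replaced by the value appended below
        kept.append(line)
    for n in sorted(slots):
        kept.append(f'ethernet{n}.pciSlotNumber = "{slots[n]}"')
    return "\n".join(kept) + "\n"
```

Passing `{0: 1184, 1: 2208}` as `slots` would pin the first two adapters to the values from the config above while leaving every unrelated line untouched.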

-------------------------------------------------------------------------------------------

If configuring a VM with the GUI, go to Edit Settings > VM Options (tab at the top) > Advanced > Edit Configuration > Add Configuration Params (particularly useful for new VM configuration).

Then, add ethernetN.pciSlotNumber and the slot ID noted from the lines above for the n-th interface (where N is the n-th interface in ethernetN)

-------------------------------------------------------------------------------------------

The VMWare engineer is also looking into opening a PR for VMWare Tools and Rocky 8.6. (My guess is that this is a way to channel the issue into the right VMWare support workflow, rather than VMWare assuming this is an OS problem. The long history of this issue at the global level leaves interpretation up in the air, but I'm brave enough to feel hopeful lol)
Louis Abel

2024-11-27 01:40

administrator   ~0008915

Closing as Rocky Linux 8.6 is end of life. If you are still having issues, please open a new bug report.

Issue History

Date Modified Username Field Change
2022-11-01 16:01 Carl Gromatzky New Issue
2022-11-01 16:08 Louis Abel Assigned To => Louis Abel
2022-11-01 16:08 Louis Abel Status new => needinfo
2022-11-01 16:08 Louis Abel Note Added: 0000802
2022-11-01 16:15 Carl Gromatzky Note Added: 0000803
2022-11-02 23:44 Carl Gromatzky Note Added: 0000826
2022-11-02 23:49 Louis Abel Note Added: 0000827
2022-11-04 18:48 Carl Gromatzky Note Added: 0000862
2022-11-04 21:29 Carl Gromatzky Note Added: 0000863
2022-11-04 21:29 Carl Gromatzky File Added: netid_badrock.txt
2022-11-04 21:29 Carl Gromatzky File Added: udevtest_badrock_afterreboot.txt
2022-11-04 21:29 Carl Gromatzky File Added: udevtest_badrock.txt
2022-11-04 21:29 Carl Gromatzky File Added: udevtest_r8default.txt
2022-11-05 19:30 Carl Gromatzky Note Added: 0000864
2022-11-07 22:12 Carl Gromatzky Note Added: 0000866
2024-11-27 01:40 Louis Abel Status needinfo => closed
2024-11-27 01:40 Louis Abel Resolution open => suspended
2024-11-27 01:40 Louis Abel Note Added: 0008915