After some blog posts about the latest versions of backup tools, it is time for some technical blogging with this one: vMotion failed because of a duplicated IP.
This week I had a strange issue with one of our vCenters while trying to migrate some VMs off one of the vCenter clusters. The migrations always failed when trying to put one host in maintenance mode, and it was not possible to evacuate the ESXi host.
The error on the task itself was not very helpful. It only said that it was not possible to migrate the xpto VM from ESXi host A to ESXi host B. But I noticed that the ESXi host had a warning that it had lost network redundancy.
So with a vmnic down, I needed to check where the problem was.
I checked the vCenter tasks and events to get more information about this problem. There I saw a lot of issues in the events, with vmnic4 and vmnic5 (the Storage and vMotion interfaces) flapping up and down. But since these interfaces are configured with LACP, that is normal behavior when you have an issue with one of the interfaces.
1. Check events.
Then I noticed the main cause of this problem: “A duplicate IP address was detected for 100.10.58.43 on the interface vmk4. The current owner is 00:50:56:62:19:77.”
That is the root cause: vMotion failed because of a duplicated IP. That is why some of the vMotion migrations were failing while others finished successfully.
Next, I needed to check DNS to see if anyone had made a mistake by deleting or adding an entry that was already in use. I found no issues in DNS; the vMotion vmkernel IP is set in DNS for this ESXi host.
2. Check logs.
Next, I checked the VOB log (vobd.log) in /var/log to see if I could get more information.
[root@ESXi-host:/var/log] cat vobd.log
[esx.problem.net.vmknic.ip.duplicate] Duplicate IP address detected for 100.10.58.43 on interface vmk4, current owner being 00:50:56:62:19:77.
[vob.net.vmknic.ip.duplicate] A duplicate IP address was detected for 100.10.58.43 on interface vmk4. The current owner is 00:50:56:62:19:77.
[esx.problem.net.vmknic.ip.duplicate] Duplicate IP address detected for 100.10.58.43 on interface vmk4, current owner being 00:50:56:62:19:77.
[vob.net.vmknic.ip.duplicate] A duplicate IP address was detected for 100.10.58.43 on interface vmk4. The current owner is 00:50:56:62:19:77.
[esx.problem.net.vmknic.ip.duplicate] Duplicate IP address detected for 100.10.58.43 on interface vmk4, current owner being 00:50:56:62:19:77.
[vob.net.pg.uplink.transition.down] Uplink: vmnic5 is down. Affected portgroup: NFS. 1 uplinks up. Failed criteria: 128
[vob.net.pg.uplink.transition.down] Uplink: vmnic5 is down. Affected portgroup: vMotion10g. 1 uplinks up. Failed criteria: 128
[vob.net.vmnic.linkstate.down] vmnic vmnic5 linkstate down
[vob.net.vmnic.linkstate.up] vmnic vmnic5 linkstate up
[netCorrelator] 5421105189290us: [vob.net.pg.uplink.transition.down] Uplink: vmnic4 is down. Affected portgroup: NFS. 0 uplinks up. Failed criteria: 128
[netCorrelator] 5421105189293us: [vob.net.pg.uplink.transition.down] Uplink: vmnic4 is down. Affected portgroup: vMotion10g. 0 uplinks up. Failed criteria: 128
[netCorrelator] 5421105189309us: [vob.net.vmnic.linkstate.down] vmnic vmnic4 linkstate down
[netCorrelator] 5421105189369us: [vob.net.vmnic.linkstate.up] vmnic vmnic4 linkstate up
[netCorrelator] 5421105289229us: [vob.net.pg.uplink.transition.up] Uplink:vmnic5 is up. Affected portgroup: NFS. 1 uplinks up
[netCorrelator] 5421105289234us: [vob.net.pg.uplink.transition.up] Uplink:vmnic5 is up. Affected portgroup: vMotion10g. 1 uplinks up
[netCorrelator] 5421105289386us: [vob.net.pg.uplink.transition.up] Uplink:vmnic4 is up. Affected portgroup: NFS. 2 uplinks up
[netCorrelator] 5421105289389us: [vob.net.pg.uplink.transition.up] Uplink:vmnic4 is up. Affected portgroup: vMotion10g. 2 uplinks up
[netCorrelator] 5421106820266us: [esx.problem.net.connectivity.lost] Lost network connectivity on virtual switch "vSwitch1". Physical NIC vmnic4 is down. Affected port groups: "NFS", "vMotion10g"
[netCorrelator] 5421106820544us: [esx.clear.net.redundancy.restored] Uplink redundancy restored on virtual switch "vSwitch1", portgroups: "NFS", "vMotion10g". Physical NIC vmnic4 is up
[netCorrelator] 5421106820712us: [esx.problem.net.redundancy.lost] Lost uplink redundancy on virtual switch "vSwitch1". Physical NIC vmnic5 is down. Affected port groups: "NFS", "vMotion10g"
[netCorrelator] 5421106820872us: [esx.clear.net.connectivity.restored] Network connectivity restored on virtual switch "vSwitch1", portgroups: "NFS", "vMotion10g". Physical NIC vmnic5 is u
The information in the VOB log is similar to what I had seen in the events, so I needed to go to the next check.
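When vobd.log is long and full of link flapping messages, it can help to filter out only the duplicate IP entries instead of reading the whole file. The following is a minimal sketch, assuming the entries keep the "[vob.net.vmknic.ip.duplicate]" wording shown in the log; the function name dup_ip_summary is only an illustration and is not part of ESXi.

```shell
#!/bin/sh
# Hypothetical helper (not an ESXi command): list each unique
# "IP interface owner-MAC" triple reported as a duplicate in a vobd log.
# Assumption: entries use the "[vob.net.vmknic.ip.duplicate]" wording
# seen in the log excerpt of this post.
dup_ip_summary() {
    # $1: path to the log file; defaults to the usual ESXi location
    log="${1:-/var/log/vobd.log}"

    # Keep only the vob duplicate-IP lines, pull out the IP, the vmkernel
    # interface and the owner MAC, then collapse repeated entries.
    grep -F '[vob.net.vmknic.ip.duplicate]' "$log" |
        sed -n 's/.*detected for \([0-9.]*\) on interface \([a-z0-9]*\)\. The current owner is \([0-9a-fA-F:]*\).*/\1 \2 \3/p' |
        sort -u
}
```

After defining the function in the ESXi shell, calling dup_ip_summary with no argument reads /var/log/vobd.log and prints one line per conflicting IP, interface, and owner MAC address. It prints nothing if no duplicate was logged.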
3. Check the ARP table.
So I needed to find where this MAC address was. You can check MAC addresses in the ARP table using the command “esxcli network ip neighbor list”, but you need to check each host one by one.
This is an example:
[root@ESXi-host:/var/log] esxcli network ip neighbor list
Neighbor      Mac Address        Vmknic  Expiry    State  Type
------------  -----------------  ------  --------  -----  -------
100.87.10.27  ec:b1:d7:80:63:30  vmk0    1199 sec         Unknown
100.87.10.28  ec:b1:d7:80:b3:34  vmk0    1198 sec         Unknown
100.87.10.19  00:0c:29:78:3a:4b  vmk0    252 sec          Unknown
100.87.10.49  ac:16:2d:74:f6:91  vmk0    1199 sec         Unknown
100.87.10.43  00:50:56:80:62:ed  vmk0    1176 sec         Unknown
100.87.10.40  94:57:a5:6b:6a:40  vmk0    1199 sec         Unknown
100.87.10.45  e0:07:1b:f6:01:ec  vmk0    1200 sec         Unknown
100.87.10.76  00:50:56:80:6c:18  vmk0    1193 sec         Unknown
100.87.10.44  e0:07:1b:f4:2c:7c  vmk0    1199 sec         Unknown
100.87.10.1   00:10:db:ff:10:01  vmk0    1199 sec         Unknown
100.87.10.39  5c:b9:01:fe:22:10  vmk0    1198 sec         Unknown
100.87.10.38  5c:b9:01:fe:04:f4  vmk0    1199 sec         Unknown
100.87.10.37  (incomplete)       vmk0    0 sec            Unknown
100.87.10.36  00:50:56:80:7f:0e  vmk0    579 sec          Unknown
100.10.23.28  02:08:ba:f8:d1:9f  vmk1    474 sec          Unknown
100.10.23.22  02:a0:98:1d:38:bf  vmk1    861 sec          Unknown
100.10.23.21  02:a0:98:1c:ef:17  vmk1    866 sec          Unknown
[root@ESXi-host:/var/log]
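Reading this table by eye for a single MAC address is error-prone, so the output can be piped through a small filter instead. This is a minimal sketch, assuming the MAC address is the second whitespace-separated column, as in the example output; find_mac is a hypothetical helper name, not an esxcli feature.

```shell
#!/bin/sh
# Hypothetical helper (not part of esxcli): read neighbor-list output on
# stdin and print only the rows whose MAC column matches the given address.
# Assumption: the MAC address is the second whitespace-separated field.
find_mac() {
    # $1: MAC address to look for; the comparison ignores letter case
    awk -v mac="$1" 'tolower($2) == tolower(mac)'
}
```

On each host, this would be used as: esxcli network ip neighbor list | find_mac 00:50:56:62:19:77. It prints the matching rows, or nothing when the host has no ARP entry for that MAC address. Keep in mind that the ARP table only holds neighbors the host has seen recently, so an empty result on one host does not rule that MAC out.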
Checking a couple of ESXi hosts, I did not find anything about that duplicated IP and MAC address. If you have dozens or hundreds of hosts, this is an enormous amount of work, so why not use PowerShell and search per vCenter?
4. Using PowerShell.
Open PowerShell, connect to your vCenter, and run the following command:
Get-VMHost | Get-VMHostNetworkAdapter | Where-Object {$_.Mac -eq "00:50:56:62:19:77"} | Select VMHost, PortGroupName, DeviceName, Mac
Running the command in this vCenter still returned nothing, so the problem had to be in another vCenter. I ran the PowerShell command in all the vCenters we have in our infrastructure and in one vCenter… voila!
PS C:\Users\userid> Connect-VIServer 'vCenterIP' -User "administrator@vsphere.local" -Password "xxxxx"

Name       Port  User
----       ----  ----
vCenterIP  443   VSPHERE.LOCAL\Administrator

PS C:\Users\userid> Get-VMHost | Get-VMHostNetworkAdapter | Where-Object {$_.Mac -eq "00:50:56:62:19:77"} | Select VMHost, PortGroupName, DeviceName, Mac

VMHost     PortGroupName  DeviceName  Mac
------     -------------  ----------  ---
esxi-host  vMotion10g     vmk1        00:50:56:62:19:77
I found the vCenter, the ESXi host, and the vmkernel adapter that was using the same IP address for vMotion. It is strange, since both ESXi hosts have been in production for a couple of years now, and this never happened before.
Checking the DNS entries again, the vMotion IP address registered for this ESXi host is 100.10.58.12, not the duplicated one. So someone must have made this change manually and set an IP address that was already in use on this ESXi host. Who did it is something for the next step; for now, I needed to fix this ASAP.
Finally, to prevent this from happening again without me being aware of it, I created a new alarm rule for duplicate IP addresses on all vCenters in the infrastructure. I wrote a post just for this; you can check it here: How to create a vCenter Alarm for Duplicated IPs.
I hope this post helps you troubleshoot and identify where the duplicated IP is.
Note: Share this article if you think it is worth sharing.