Some weeks ago we had a major issue on one of our production Windows vCenter 6.0 Update 3 servers. The vCenter service kept stopping after a while: it could stay online for 5 minutes, 40 minutes, or even an hour, and then just stop.
We spent many hours troubleshooting without finding any reason why the service was stopping. We checked the connections to the SQL Server: no issues. No issues with SSO, and none related to DNS or the firewall. Nothing had changed on that particular vCenter.
We had a quick look at vpx.log to check for entries related to this issue, but at first glance found no problems, just entries saying the service had stopped (nothing about the reason or an error that might explain it).
We did a full restore of the vCenter VMs (vCenter, vCenter service, and vCenter DB). The problem was fixed for a couple of days; then, after we applied some Windows updates, the issue returned.
So I started to suspect that one of the Windows updates was causing this. We spent many hours removing them one by one to find out which update might be the problem, but found nothing there either, since the vCenter service kept stopping.
Then, while we were troubleshooting, one of our team members started to look at the logs in more detail and found this:
2017-05-31T09:52:02.680+02:00 error vpxd [Originator@6876 sub=Default opID=713601b] [Vdb::IsRecoverableErrorCode] Unable to recover from HY000:0
2017-05-31T09:52:02.680+02:00 error vpxd [Originator@6876 sub=Default opID=713601b] [VdbStatement] SQLError was thrown: "ODBC error: (HY000) - [Microsoft][SQL Server Native Client 11.0]Connection is busy with results for another command" is returned when executing SQL statement "select SUPPORTED_FLG, VLAN_ID_START, VLAN_ID_END from VPX_DVHOST_HC_MTU_RESULT where DVS_ID = ? and HOST_ID = ? and UPLINK_PORT_KEY = ?"
2017-05-31T09:52:02.689+02:00 error vpxd [Originator@6876 sub=dbQuery opID=713601b] [HealthCheck::LoadHealthCheckResultFromDB] database error [LoadHealthCheckResultFromDB]: "ODBC error: (HY000) - [Microsoft][SQL Server Native Client 11.0]Connection is busy with results for another command" is returned when executing SQL statement "select SUPPORTED_FLG, VLAN_ID_START, VLAN_ID_END from VPX_DVHOST_HC_MTU_RESULT where DVS_ID = ? and HOST_ID = ? and UPLINK_PORT_KEY = ?"
2017-05-31T09:52:02.689+02:00 error vpxd [Originator@6876 sub=PropertyProvider opID=713601b] Unexpected fault reading property: vim.DistributedVirtualSwitch, GetRuntime: class Vim::Fault::DatabaseError::Exception(vim.fault.DatabaseError)
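Since these entries were easy to miss on a first read, here is a minimal Python sketch of how you could scan a vpx.log for them. It is not a VMware tool. It assumes the line layout seen in the entries above (timestamp, level, process name, message), and the function name, the `Hit` type, and the signature list are my own; the signatures are simply substrings taken from those entries.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed line layout, based only on the entries above:
# <timestamp> <level> <process> <rest of the message>
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>\w+)\s+(?P<proc>\S+)\s+(?P<msg>.*)$")

# Substrings copied from the log entries above (not an official list).
SIGNATURES = (
    "LoadHealthCheckResultFromDB",
    "Connection is busy with results for another command",
)


class Hit(NamedTuple):
    timestamp: str
    signature: str
    message: str


def find_health_check_db_errors(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield error-level lines that contain one of the known signatures."""
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None or match.group("level") != "error":
            continue  # skip non-matching lines and non-error levels
        message = match.group("msg")
        for signature in SIGNATURES:
            if signature in message:
                yield Hit(match.group("ts"), signature, message)
                break  # report each line once
```

To use it, open your own vpx.log (for example with `open(path, errors="replace")`, where `path` is wherever your vCenter keeps its logs) and pass the file object to `find_health_check_db_errors`. Any hit is only a hint that you may be facing the same issue, not proof.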
With this information we rechecked the DB connections and ODBC, and found no problems. We checked the DB and its tables and ran some queries to see if everything was working properly; again, no issues found.
After many hours spent troubleshooting without finding a root cause, and not being sure whether this error was the cause of the vCenter problem, it was time to open a VMware support ticket.
With a VMware engineer on the line looking at the same vpx.log entries, he suspected it was a known issue in vCenter 6.0 Update 3 and 6.5.
The official explanation was: “Issue has been introduced in 6.0 update3 and 6.5 because Microsoft database does not support shared connection. This is a known issue affecting VMware vCenter Server 6.0 Update 3 and 6.5 when the vCenter Server database back-end is a Microsoft SQL Server and the feature vSphere Distributed Switch Health Check is enabled.”
However, there is no patch or official update yet to fix this issue. After some internal discussions with VMware engineering, they confirmed that this was the same issue they had seen before, and that we needed to request a VMware hotfix to fix it.
After 24 hours VMware provided us with the hotfix (a full vCenter install):
Size: 2.8 GB
We also received the upcoming VMware KB article that VMware was already preparing for this problem. However, it was not, and still is not, public. I don’t know why VMware has not published this KB yet, since this is a known issue.
VCenter 6.0 U3 or 6.5 crashes with PanicAccess Violation related to a database error LoadHealthCheckResultFromDB (2150190)
Note: I even waited for this KB to go online before writing this article, but as of today VMware has not made the issue public, so I decided to write it anyway.
After we received the hotfix, we installed it; it brought vCenter to version 6.0.0-5306148. vCenter started working, with no more service problems. So this hotfix did fix the issue.
If we check the list of VMware vCenter version numbers, we don’t see this build number on it.
Searching Google, we found no post or article (official or unofficial) related to this issue. Now VMware users have something to refer to if they encounter the same problem.
So if you run into a similar issue on your vCenter 6.0 Update 3 or 6.5, open a VMware support ticket to request the hotfix for this known issue.
Note: Share this article, if you think it is worth sharing.