I wanted to share an example of unexpected problems with network design in vSphere.
You probably won't experience these problems in a properly protected production cluster, but I like to fix things that are broken.
When I deployed my lab machines, I built a vSphere distributed switch and then followed the recommendations to build out my port groups.
One of the selection points is the Port Binding method. The selections are Static or Ephemeral.
Normally I just leave this as Static so that the system will provision ports from the virtual switch as needed to serve the guests.
However, this caused me some trouble.
At some point, I crashed my cluster and had some trouble recovering vCenter. I tried to get it started, but I couldn’t get it to connect to the network.
I had to create a “Recovery” port group on a new standard switch and assign it to an open network adapter. I then changed the network on the guest to “Recovery”.
With the guest attached to this network, I was able to start it and get networking. So I wondered, what was the problem?
Well, it turns out that without vCenter, Static port binding can’t assign new ports: the assignment is made through vCenter, and a host on its own doesn’t have awareness of all the resources.
http://www.joshodgers.com/2014/11/23/example-architectural-decision-port-binding-setting-for-a-dvportgroup/
“4. New Virtual machines cannot be powered on and connected to a dvPortGroup (VDS) when vCenter is down.”
It seems this also applies when a virtual machine has been registered on another host without vCenter.
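To make the difference concrete, here is a toy Python model of the behavior described above. It is only a sketch and does not call any VMware API: the `PortGroup` class, the `connect` and `reregister` methods, and the `vcenter_up` flag are all made up for illustration, and `reregister` simply assumes (based on what I observed) that a VM moved to another host by hand has to ask for a port again.

```python
from dataclasses import dataclass, field


class PortAssignmentError(Exception):
    """Raised when a VM cannot get a port on the port group."""


@dataclass
class PortGroup:
    """Hypothetical model of a dvPortGroup; not a real VMware object."""
    name: str
    binding: str  # "static" or "ephemeral"
    ports: dict = field(default_factory=dict)  # VM name -> port number
    next_port: int = 0

    def connect(self, vm: str, vcenter_up: bool) -> int:
        # A VM that already holds a port keeps it, even with vCenter down.
        if vm in self.ports:
            return self.ports[vm]
        # Static binding: in this model only vCenter can hand out a new port.
        if self.binding == "static" and not vcenter_up:
            raise PortAssignmentError(
                f"{vm}: no port on {self.name} without vCenter"
            )
        # Ephemeral binding (or vCenter available): create a port on demand.
        port = self.next_port
        self.next_port += 1
        self.ports[vm] = port
        return port

    def reregister(self, vm: str) -> None:
        # Assumption: re-registering on another host drops the old port.
        self.ports.pop(vm, None)


production = PortGroup("Production", "static")
production.connect("vCenter", vcenter_up=True)  # fine while vCenter is running
production.reregister("vCenter")                # moved to another host by hand
try:
    production.connect("vCenter", vcenter_up=False)
except PortAssignmentError as err:
    print(err)

critical = PortGroup("Critical", "ephemeral")
print(critical.connect("vCenter", vcenter_up=False))
```

In this model the Static port group refuses the re-registered VM while vCenter is down, and the Ephemeral one hands out a port anyway, which matches the quote from the article above.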
So, I got the system running like that for a few months and forgot to migrate it back to the vDS, which wasn’t a big deal because I didn’t need vCenter to move to any other hosts anyway. Until I did.
vCENTER FAILURE:
This weekend I had a problem with Host 1, the single host that vCenter could run on. The host was severely crippled and I could not restart any guests properly.
So, I found I had one option: unregister all the machines on Host 1 and re-register them on Host 3.
I re-registered my PSC, vCenter and DC, but once I started them, I realized that I only had networking on my vCenter.
Well, that was because I had created the “Recovery” port group and moved the cable from Host 1 to Host 3, and vCenter was the only guest attached to that network.
So, I shut everything down.
Assigned the DC to the “Recovery” network – Powered On
Waited for start and confirmed networking.
Assigned the PSC to the “Recovery” network – Powered On
Waited for start and confirmed Access to VAMI
Powered on vCenter – Waited for start and confirmed Login.
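The order matters here, so as a small sketch, the start-up sequence can be written down as a dependency map and sorted. The dependencies below are an assumption taken from the order I used (PSC after the DC, vCenter after the PSC), and `boot_order` is just a helper name for this example. It uses `graphlib` from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter

# Each VM maps to the VMs that must be running before it.
# Assumed dependencies, taken from the recovery order used above.
depends_on = {
    "DC": set(),
    "PSC": {"DC"},
    "vCenter": {"PSC"},
}


def boot_order(deps):
    """Return the VMs in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


for vm in boot_order(depends_on):
    print(f"Assign {vm} to the Recovery network, power on, confirm networking")
```

For three machines this is overkill, but the same map is handy if more guests need to come up in a known order.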
FINALLY, I was able to get into vCenter.
I logged in and looked around for signs of problems. Ironically, my vSAN appeared healthy and Host 1 was OK (DRS was disabled), but there were a handful of “GUEST NAME (Orphaned)” machines. There didn’t seem to be any way to remove them from the inventory or delete the reference.
I realized that they would probably clear up if I put Host 1 into maintenance mode. It took a few minutes, but all the machines registered on other hosts and were no longer orphaned.
CLEANUP:
In order to avoid problems in the future, I decided that my “Critical” virtual machines should be associated with a vDS port group that can survive this type of issue. Static port binding did not help once a VM had been disconnected from the switch, so this port group uses Ephemeral binding.
I then migrated my VMs from the Recovery network to the vDS port group.
Completing the migration wizard moved all 3 machines with no errors.
Now I can shut down Host 1 to troubleshoot and turn DRS back on in my cluster without fear of machines moving to bad hosts.
CAUTION: There are warnings about using Ephemeral port binding in situations where security is paramount. This example is from my lab, and it solves a problem in that environment. Always research the appropriateness of a solution for your use case before making changes. Protect your vCenter and this is not an issue.
https://kb.vmware.com/s/article/1022312
http://www.yellow-bricks.com/2011/06/02/ephemeral-ports/
Thank you,
Jason Valentine
Did I explain this wrong? Will I see problems in the future? Let me know.