
Monday, August 5, 2019

vDS Design Decisions and Consequences ...

I wanted to share an example of unexpected problems caused by network design choices in vSphere.

These are problems you probably won’t experience in a properly protected production cluster, but I like to fix things that are broken.

When I deployed my lab machines, I built a vSphere Distributed Switch (vDS) and then followed the recommendations to build out my port groups.
One of the settings is the port binding method, and the choices are Static or Ephemeral.

Normally I just leave this set to Static, so that the system provisions ports from the distributed switch as needed to serve the guests.


However, this caused me some trouble.

At some point, I crashed my cluster and had trouble recovering vCenter. I tried to get it started, but I couldn’t get it to connect to the network.

I had to create a “Recovery” port group on a new standard switch and assign it to an unused physical network adapter. I then changed the guest’s network to “Recovery”.

With the guest attached to this network, I was able to start it and get networking. So what was the problem?

Well, it turns out that static port binding depends on vCenter: ports on a static dvPortGroup are assigned by vCenter, so without it a host cannot hand a VM a new port on the distributed switch.


http://www.joshodgers.com/2014/11/23/example-architectural-decision-port-binding-setting-for-a-dvportgroup/

“4. New Virtual machines cannot be powered on and connected to a dvPortGroup (VDS) when vCenter is down.”

It seems this also applies when a virtual machine has been registered on another host without vCenter.
So, I ran the system like that for a few months and forgot to migrate it back to the vDS. That wasn’t a big deal, because I didn’t need vCenter to move to any other hosts anyway... until I did.
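To make the rule concrete, here is a minimal Python sketch of the behavior described above. It is a simplified model, not VMware code or an API: the `Binding` enum, the `can_connect` function, and its parameters are hypothetical names for this illustration. It assumes, per the quote above, that a static port is assigned by vCenter, while an ephemeral port is created by the host itself at power-on.

```python
from enum import Enum


class Binding(Enum):
    """Port binding choices offered when creating a dvPortGroup."""
    STATIC = "static"
    EPHEMERAL = "ephemeral"


def can_connect(binding: Binding, vcenter_up: bool, has_assigned_port: bool) -> bool:
    """Simplified model (an assumption, not VMware's implementation):
    can a VM get a port on a dvPortGroup when it powers on?

    binding           -- the port group's port binding setting
    vcenter_up        -- whether vCenter is reachable
    has_assigned_port -- whether the VM already holds a port on this host,
                         e.g. it was powered on here before vCenter failed
    """
    if binding is Binding.EPHEMERAL:
        # The host creates an ephemeral port itself at power-on.
        return True
    if vcenter_up:
        # vCenter is available to assign a static port to the VM.
        return True
    # Static binding without vCenter: only a VM that already holds a port works.
    return has_assigned_port


# A VM re-registered on another host while vCenter is down has no port there.
print(can_connect(Binding.STATIC, vcenter_up=False, has_assigned_port=False))     # False
print(can_connect(Binding.EPHEMERAL, vcenter_up=False, has_assigned_port=False))  # True
```

The real behavior has more edge cases (see the VMware KB linked at the end of this post), but the model captures why the re-registered guests lost networking while the Recovery standard switch kept working.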


vCENTER FAILURE:

This weekend I had a problem with Host 1, the only host that vCenter could run on. The host was severely crippled, and I could not restart any guests properly.
So I found I had one option: unregister all the machines on Host 1 and re-register them on Host 3.
I re-registered my PSC, vCenter and DC, but once I started them, I realized that only vCenter had networking.

Well, that was because I had created the “Recovery” port group and moved the cable from Host 1 to Host 3, and vCenter was the only machine attached to it.

So, I shut everything down.

Assigned the DC to the “Recovery” network and powered it on.
Waited for it to start and confirmed networking.

Assigned the PSC to the “Recovery” network and powered it on.
Waited for it to start and confirmed access to the VAMI.

Powered on vCenter, waited for it to start and confirmed I could log in.
FINALLY, I was able to get into vCenter.
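The recovery order above follows the dependencies between the three machines. As a minimal Python sketch, assuming the DC provides the DNS that the PSC and vCenter rely on, and that vCenter also needs the PSC, the standard-library `graphlib.TopologicalSorter` (Python 3.9+) can derive a safe power-on order from a hypothetical dependency map:

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for this lab: each VM lists what must be up first.
# Assumption: the DC provides DNS, which the PSC and vCenter both rely on,
# and vCenter also needs the PSC.
DEPENDS_ON = {
    "DC": set(),
    "PSC": {"DC"},
    "vCenter": {"DC", "PSC"},
}


def power_on_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every VM starts after all of its dependencies."""
    return list(TopologicalSorter(deps).static_order())


for step, vm in enumerate(power_on_order(DEPENDS_ON), start=1):
    print(f"{step}. Power on {vm}, then confirm it is healthy before continuing")
```

With this map the only valid order is DC, then PSC, then vCenter, which matches the sequence that worked. In a larger lab, adding entries to the map keeps the order correct without re-deriving it by hand.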

I log in and look around for signs of problems. Ironically, my vSAN appears healthy and Host 1 looks OK (DRS is disabled), but there are a handful of “GUEST NAME (orphaned)” machines. There doesn’t seem to be any way to remove them from the inventory or delete the reference.

I realize that they will probably clear up if I put Host 1 into maintenance mode. It takes a few minutes, but all the machines register on other hosts and are no longer orphaned.

CLEANUP:
To avoid problems in the future, I decided that my “Critical” virtual machines should be attached to a vDS port group that can survive this type of issue. As long as the port binding was Static, it couldn’t help a VM that had been disconnected from the switch and needed a new port while vCenter was down.
I migrated all other machines off the 192-PtG port group and changed it to Ephemeral port binding. With the port group empty, I was then able to save the configuration.

I then migrated my VMs from the Recovery network to the vDS port group.


Completing the migration wizard moved all three machines with no errors.


Now I can shut down Host 1 to troubleshoot it, and turn DRS back on in my cluster without fear of machines moving to bad hosts.


CAUTION: There are warnings about using Ephemeral port binding in situations where security is paramount. This example comes from my lab, and it solves a problem in that environment. Always research whether a solution is appropriate for your use case before making changes. Protect your vCenter, and this is not an issue.

https://kb.vmware.com/s/article/1022312

http://www.yellow-bricks.com/2011/06/02/ephemeral-ports/


Thank you,
Jason Valentine
MCP, VCP-DCV 2019, vExpert 2019, CMNO

1 comment:

  1. Did I explain this wrong? Will I see problems in the future? Let me know.

