
Monday, August 5, 2019

vDS Design Decisions and Consequences ...

I wanted to share an example of unexpected problems with network design in vSphere.

You probably won't run into these problems in a properly protected production cluster, but I like to fix things that are broken.

When I deployed my lab machines, I built a vSphere distributed switch (vDS) and then followed the recommendations to build out my port groups.
One of the selection points is the Port Binding method. The selections are Static or Ephemeral.

Normally I just leave this as Static so that the system will provision ports from the virtual switch as needed to serve the guests.


However, this caused me some trouble.

At some point, I crashed my cluster and had some trouble recovering vCenter. I tried to get it started, but I couldn’t get it to connect to the network.

I had to create a “Recovery” port group on a new standard switch and assign it to an open network adapter. I then changed the network on the guest to “Recovery”.

With the guest attached to this network, I was able to start it and get networking. So I wondered: what was the problem?

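Well, it turns out that without vCenter, Static port binding can’t hand out new ports. With Static binding, vCenter assigns the port when a VM is connected to the port group, so a host on its own cannot connect a VM that does not already hold a port.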


http://www.joshodgers.com/2014/11/23/example-architectural-decision-port-binding-setting-for-a-dvportgroup/

“4. New Virtual machines cannot be powered on and connected to a dvPortGroup (VDS) when vCenter is down.”
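To make that concrete for myself, here is a toy model of the two binding types. It is only a sketch under my own assumptions: `ToyPortGroup`, `VCenterUnavailable` and the VM names are hypothetical and have nothing to do with the real vSphere API. It just encodes the rule that a new Static port needs vCenter, while an Ephemeral port is created by the host at power-on.

```python
import itertools


class VCenterUnavailable(Exception):
    """Raised when a new static port is needed while vCenter is down."""


class ToyPortGroup:
    """Toy model of a dvPortGroup. Hypothetical; not a vSphere API."""

    def __init__(self, name, binding):
        if binding not in ("static", "ephemeral"):
            raise ValueError(f"unknown binding: {binding!r}")
        self.name = name
        self.binding = binding
        self.assigned = {}  # VM name -> port number
        self._next_port = itertools.count(1)

    def power_on(self, vm, vcenter_up):
        """Return the port the VM ends up on, or raise if none can be given."""
        # A VM that already holds a port simply keeps using it.
        if vm in self.assigned:
            return self.assigned[vm]
        # Assumption: only vCenter can assign a new static port.
        if self.binding == "static" and not vcenter_up:
            raise VCenterUnavailable(
                f"{vm}: no port on {self.name} and vCenter is down"
            )
        port = next(self._next_port)
        self.assigned[vm] = port
        return port

    def power_off(self, vm):
        """Ephemeral ports are released at power-off; static ones persist."""
        if self.binding == "ephemeral":
            self.assigned.pop(vm, None)


if __name__ == "__main__":
    static_pg = ToyPortGroup("static-pg", "static")
    ephemeral_pg = ToyPortGroup("ephemeral-pg", "ephemeral")

    # With vCenter down, a VM without a port cannot join the static group...
    try:
        static_pg.power_on("vcenter", vcenter_up=False)
    except VCenterUnavailable as err:
        print("static:", err)

    # ...but the host can still create an ephemeral port for it.
    print("ephemeral port:", ephemeral_pg.power_on("vcenter", vcenter_up=False))
```

In this model, a VM that already holds a Static port keeps working while vCenter is down. It is the new connection that fails, which matches what I ran into.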

The same limitation seems to apply when a virtual machine has been registered on another host without vCenter.
So, I got the system running for a few months like that and forgot to migrate it back to the vDS… which wasn’t a big deal because I didn’t need vCenter to move to any other hosts anyway… until I did.


vCENTER FAILURE:

This weekend I had a problem with Host 1… the single host that vCenter could run on. The host was severely crippled and I could not restart any guests properly.
So I found I had one option: unregister all the machines on Host 1 and re-register them on Host 3.
I re-registered my PSC, vCenter and DC, but once I started them, I realized that I only had networking on my vCenter…

Well, that was because vCenter was still on the “Recovery” port group I had created, and I had moved the cable from Host 1 to Host 3…

So, I shut everything down.

Assigned the DC to the “Recovery” network – Powered On
Waited for start and confirmed networking.

Assigned the PSC to the “Recovery” network – Powered On
Waited for start and confirmed Access to VAMI

Powered on vCenter – Waited for start and confirmed Login.
FINALLY, I was able to get into vCenter.
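The power-on order I used can be written down as a small dependency graph. This is only a sketch: the dependencies themselves (PSC waits for the DC, vCenter waits for the PSC) are my assumption about why this order worked in my lab, and `DEPENDS_ON` and `start_order` are hypothetical names.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed dependencies: each guest maps to the guests it waits for.
DEPENDS_ON = {
    "DC": set(),
    "PSC": {"DC"},
    "vCenter": {"PSC"},
}


def start_order(depends_on):
    """Return a power-on order in which every guest follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, guest in enumerate(start_order(DEPENDS_ON), start=1):
        print(f"{step}. power on {guest}, wait, confirm networking")
```

Writing the order down like this is mostly a reminder for the next outage; the waiting and confirming at each step still has to be done by hand.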

I log in and look around for signs of problems… ironically, my vSAN appears healthy and Host 1 is OK (DRS is disabled)… But there are a handful of “GUEST NAME (Orphaned)” machines. There doesn’t seem to be any way to remove them from the inventory or delete the reference.

I realize that they will probably clear up if I put Host 1 into maintenance mode. It takes a few minutes, but all the machines register on other hosts and are no longer orphaned.

CLEANUP:
In order to avoid problems in the future, I decided that my “Critical” virtual machines should be associated with a vDS port group that can survive this type of issue. While the Port Binding was Static, the port group was no help once the VM was disconnected from the switch.
I migrated all other machines off the 192-PtG port group and changed it to Ephemeral Port Binding. I was then able to save the configuration.

I then migrated my VMs from the Recovery network to the vDS port group.


Completing this wizard moves all 3 machines with no errors.
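A quick way to double-check the result is to list which critical VMs still sit on a port group that needs vCenter. The sketch below assumes the inventory has been written out by hand as plain dictionaries. The function name, the `Lab-PtG` port group, the `app01` VM and the `"standard"` label for a standard-switch port group are all hypothetical, and nothing here talks to vCenter.

```python
# Bindings assumed to work without vCenter: ephemeral vDS port groups and
# port groups on a standard switch (labeled "standard" here by convention).
SAFE_BINDINGS = {"ephemeral", "standard"}


def find_at_risk(vm_portgroup, portgroup_binding, critical_vms):
    """Return (vm, port group, binding) for critical VMs not on a safe binding."""
    at_risk = []
    for vm in sorted(critical_vms):
        portgroup = vm_portgroup.get(vm)
        binding = portgroup_binding.get(portgroup)
        if binding not in SAFE_BINDINGS:
            at_risk.append((vm, portgroup, binding))
    return at_risk


if __name__ == "__main__":
    # Hand-written example inventory; only 192-PtG comes from my lab.
    portgroup_binding = {"192-PtG": "ephemeral", "Lab-PtG": "static"}
    vm_portgroup = {
        "DC": "192-PtG",
        "PSC": "192-PtG",
        "vCenter": "192-PtG",
        "app01": "Lab-PtG",
    }
    critical = {"DC", "PSC", "vCenter"}
    print(find_at_risk(vm_portgroup, portgroup_binding, critical))
```

An empty list means every critical VM is on a port group the host can serve by itself. A VM missing from the inventory shows up with `None` values, so it gets flagged rather than silently skipped.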


Now I can go shut down Host 1 to troubleshoot and turn DRS back on in my cluster without fear of machines moving to bad hosts.


CAUTION: There are warnings about using Ephemeral port binding in situations where security is paramount. This example was from my lab, and it solves a problem in that environment. Always research the appropriateness of a solution for your use case before making changes. Protect your vCenter and this is not an issue.

https://kb.vmware.com/s/article/1022312

http://www.yellow-bricks.com/2011/06/02/ephemeral-ports/


Thank you,
Jason Valentine
MCP, VCP-DCV 2019, vExpert 2019, CMNO

