Thursday, August 5, 2021

"INVALID" machines after vSAN shutdown.

In some business environments or homelabs it may be necessary to shut down a vSAN cluster entirely.

VMware has a proper procedure outlined on its documentation site.

Shutting Down and Restarting the vSAN Cluster (vmware.com)

Once you have completed your maintenance tasks and fired everything back up, you may be startled when you log in to each of the hosts and attempt to verify your VMs...

They will probably all show a status of "INVALID", with the GUID path in the "name" field.

fig.1 - Host 1 - 2 "invalid" machines


fig.2 - Host 2 - 2 "Invalid" machines


This message certainly caught one of my peers off guard during a recent datacenter update. It brings on that quick rush of butterflies in your gut that tells you all of your data is now corrupt.

However, I quickly reassured him that nothing was lost and the issue was simply that all of the hosts were still in maintenance mode.

When the vSAN nodes are still in maintenance mode after a reboot, none of the stats are correct, because the vSAN components are not yet allowed to work together.

fig.3 - 0Byte vsanDatastore


If you have powered on all of the vSAN nodes in the cluster and waited a sufficient amount of time for all the vSAN nodes to communicate, the remediation is actually quite swift.

Take all of the hosts out of maintenance mode.


fig.4 - Host 2 - Still in Maintenance Mode!
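If you would rather do this from a shell than click through each host client, the step above can be sketched roughly like this. It is only a sketch: it assumes SSH is enabled on every host, the host names are placeholders rather than my real lab names, and it only prints the commands unless you pass dry_run=False.

```python
import subprocess

# Hypothetical host names; replace with the vSAN nodes in your cluster.
HOSTS = ["esx01.lab.example", "esx02.lab.example", "esx03.lab.example"]

# esxcli command that takes a host out of maintenance mode.
EXIT_MAINTENANCE = "esxcli system maintenanceMode set --enable false"


def build_commands(hosts, user="root"):
    """Return one ssh command (as an argument list) per host."""
    return [["ssh", f"{user}@{host}", EXIT_MAINTENANCE] for host in hosts]


def exit_maintenance_mode(hosts, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for command in build_commands(hosts):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    exit_maintenance_mode(HOSTS)
```

Running `esxcli system maintenanceMode get` on a host afterwards should confirm whether it is still in maintenance mode.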



The names will re-appear for each of the Guest objects.

fig.5 - Host 1 - Proper names!

fig.6 - Host 2 - Proper names again!


If all of your workload is on these hosts, you can start your DNS server, then the vCenter server.

fig.7 - vCenter operational again.

Before you know it, you are back in business.




Thursday, February 11, 2021

Congratulations to the vExperts!

 


Today, VMware released the names of new and returning vExperts for 2021.

vExpert 2021 Award Announcement



I was honored and humbled to be able to stay with this program for my 4th year.

I believe in the VMware ecosystem and have seen how transformational it has been in my business, and I enjoy sharing stories and advice with others on the subject.

I was fortunate enough to join the VMUG in Denver at a great time and found a group of people that understood my language. After being a member for several years, I volunteered my time to help organize and plan activities for the local groups along with 2 other amazing leaders. The planning and organization of the 2018 Denver UserCon was intense, but attending on the event day was awe-inspiring. So many VMware enthusiasts that I spoke with that day thanked us for helping to provide them with so much information about the products and services that could help advance their careers and businesses.

I had to relocate to Orlando for work, and I immediately sought out the local chapter and volunteered. Once again I found myself connected with passionate leaders and VMware advocates. I was fortunate enough to attend a few in-person meetings before the pandemic fully took hold, and I am looking forward to the day that we can get together again.

I am looking for new ways to support and add to the community.

I have started this blog as a way of documenting things that I find interesting and new.

I am looking at the option to livestream some lab sessions.

I answer questions on the Spiceworks forums for those that find themselves in a bind.

Jason Valentine's SpiceWorks

Having access to a community that understands your needs and daily challenges is such a profound help, and I am hopeful that I give as much as I take.

Jason

Saturday, January 9, 2021

Updating the DNS Search domains and FQDN Hostname for an ESXi Host.

 Here is a quick tip for anyone that has deployed an ESXi host from a home network or received DHCP values that didn't match your intended configuration.


I deployed a machine that I want to use in my home lab, but my default Comcast domain is displayed as the FQDN.
I have already changed this in the DCUI, but it persists in the vSphere client.

The configuration is found on the default TCP/IP stack. Click Networking.

Select the TCP/IP Stacks tab

The DNS configuration shows my domain name as hsd1.fl.comcast.net, but my proper domain name is in the search domains box. Let's clean this up.

Edit the Default TCP/IP Stack - note the Domain name and Search Domain fields.

Update the Domain name and Search Domains fields to remove and replace the incorrect domain.

Refreshing the host client shows the updated values!
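For anyone who prefers the ESXi shell to the host client, the same cleanup can be sketched as a short list of esxcli commands. This is a rough sketch rather than the exact steps I clicked through: the host name esx03.lab.example and the domain lab.example are made-up placeholders, and the helper function is just something I wrote to print the commands.

```python
def dns_cleanup_commands(fqdn, wrong_domain, search_domain):
    """Build the esxcli commands that fix the host name and search domains."""
    if "." not in fqdn:
        raise ValueError("expected a fully qualified name such as host.domain")
    return [
        # Set the host name and domain name in one step.
        f"esxcli system hostname set --fqdn={fqdn}",
        # Drop the search domain that DHCP handed out.
        f"esxcli network ip dns search remove --domain={wrong_domain}",
        # Add the intended search domain (may be rejected if already listed).
        f"esxcli network ip dns search add --domain={search_domain}",
        # Show the result so it can be checked by eye.
        "esxcli network ip dns search list",
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own host and domain names.
    for command in dns_cleanup_commands(
        "esx03.lab.example", "hsd1.fl.comcast.net", "lab.example"
    ):
        print(command)
```

The script only prints the commands, so you can review them before pasting anything into an SSH session on the host.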


Hope it helps!

Jason


Reusing Disks from a Previous vSAN

 This post may be mostly relevant to home-lab builders, but if you have previously configured vSAN on the disks in a host, you may not be able to easily reuse them in a new vSAN.

Let's take a look at my specific use case.

I want to install vCenter 7 and I want to establish vSAN on the host during this process.


For this installation I am going to install on host ESX03. I previously used the internal disks for a vSAN cluster, but I am upgrading from hybrid to all-flash using a 1TB SSD that I recently acquired.

In the datastore part of the installer, I select "Install on a new vSAN cluster containing the target host".
This will create a standalone vSAN node; configuration of the cluster is completed later when additional hosts are added.


Once I get to the section to claim my disks, I can see that I only have one disk and it is the new 1TB disk. My existing 120GB drive does not appear.

On the host I can see that this 120GB disk does exist.

Selecting the disks shows me that there is an existing vSAN partition table. We need to clear this partition.

Select Actions -> Clear Partition Table

Note the warning: erasing this partition table is destructive and will remove the data on the disk that is essential for the previous vSAN.
I know that I will not need that data, so I can proceed.

There are no partitions on the disk.
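The same check and cleanup can also be approached from the ESXi shell with partedUtil. The sketch below is only an illustration: the sample `partedUtil getptbl` output and the device name naa.EXAMPLE are made up for the example, and the helper simply turns that output into the matching `partedUtil delete` commands so you can review them first. It is just as destructive as the button in the host client, and it assumes the disk is no longer claimed by a vSAN disk group.

```python
# Illustrative getptbl output for an old vSAN disk (not copied from my host).
# Line 1 is the label type, line 2 the geometry, then one line per partition.
SAMPLE_GETPTBL = """\
gpt
15566 255 63 250069680
1 2048 6143 381CFCCDC48811E099A700262D6C94D3 vsan 0
2 6144 250069646 77719A0CA4A011E3A47E000C29745A24 virsto 0
"""


def partition_numbers(getptbl_output):
    """Return the partition numbers listed in partedUtil getptbl output."""
    rows = [line.split() for line in getptbl_output.strip().splitlines()]
    # Skip the label line and the geometry line.
    return [int(fields[0]) for fields in rows[2:] if fields]


def delete_commands(device, getptbl_output):
    """Build one 'partedUtil delete' command per partition, highest first."""
    numbers = sorted(partition_numbers(getptbl_output), reverse=True)
    return [f"partedUtil delete {device} {number}" for number in numbers]


if __name__ == "__main__":
    # Hypothetical device path: use the identifier shown for your own disk.
    for command in delete_commands("/vmfs/devices/disks/naa.EXAMPLE", SAMPLE_GETPTBL):
        print(command)
```

Double-check the device identifier before running anything it prints, since there is no undo.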

After refreshing the vCenter installer by going back and then forward again, I can now see my 120GB drive available to claim.

I select my 1TB drive and claim it for the capacity tier. I can now proceed with the remainder of the vCenter deployment.


Hope this helps someone 
Jason

Saturday, January 2, 2021

Probably a bright spot for HomeLabs in the ESXi 7.0 U1C release.



 I was looking up the release notes for the 7.0U1C update and spotted this bit of good news.

"With ESXi 7.0 Update 1c, you can use the installer boot option systemMediaSize to limit the size of system storage partitions on the boot media."

https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-70u1c.html

I remember this being a design consideration for the new stack that I was deploying at work. I needed to have more than 150GB for the boot media.

Considering that I have 512GB per host there, it probably doesn't make sense to use this new option...

But on my homelab with 24GB of RAM per host, or an embedded deployment in VMware Workstation, this could save money and space.
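As a rough illustration of how the option is used, the sketch below builds the boot option string and appends it to a kernelopt line such as the one in the installer's boot.cfg. The four values (min, small, default, max) are the ones I understand VMware documents for systemMediaSize, so please check them against the release notes; the helper names are my own.

```python
# Values I understand to be accepted by systemMediaSize (verify in the docs).
VALID_SIZES = ("min", "small", "default", "max")


def boot_option(size):
    """Return the installer boot option for the requested size."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}")
    return f"systemMediaSize={size}"


def add_to_kernelopt(kernelopt_line, size):
    """Append the option to a 'kernelopt=' line unless it is already set."""
    if "systemMediaSize=" in kernelopt_line:
        return kernelopt_line
    return f"{kernelopt_line.rstrip()} {boot_option(size)}"


if __name__ == "__main__":
    print(add_to_kernelopt("kernelopt=runweasel", "min"))
```

For a one-off install you can type the same option at the installer boot prompt (Shift+O) instead of editing boot.cfg.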


Jason

Happy New Year!

 This year I am continuing my journey with VMware solutions and I want to share more with the community along the way.

I am currently studying to attempt the VCAP tests for DataCenter Design and Deploy.

I have an expanded homelab with some upgraded components and I will be doing some live streams as I study for the test. 

Jason

Monday, August 5, 2019

vDS Design Decisions and Consequences ...

I wanted to share an example of unexpected problems with Network design in vSphere.

These are probably not problems you would experience in a properly protected production cluster, but I like to fix things that are broken.

When I deployed my lab machines I built a vSphere distributed switch and then followed the recommendations to build out my PortGroups.
One of the selection points is the Port Binding method. The selections are Static or Ephemeral.

Normally I just leave this as Static so that the system will provision ports from the virtual switch as needed to serve the Guests.


However, this caused me some trouble.

At some point, I crashed my cluster and had some trouble recovering vCenter. I tried to get it started, but I couldn’t get it to connect to the network.

I had to create a “Recovery” port group on a new standard switch and assign it to an open network adapter. I then changed the network on the guest to “Recovery”.

With the guest attached to this network I was able to start it and get networking. So I wondered, what was the problem?

Well, it turns out that without vCenter, Static port binding can’t assign ports to newly connected guests, because port assignment is handled by vCenter.


http://www.joshodgers.com/2014/11/23/example-architectural-decision-port-binding-setting-for-a-dvportgroup/

“4. New Virtual machines cannot be powered on and connected to a dvPortGroup (VDS) when vCenter is down.”

It seems this also applies when a virtual machine has been registered on another host without vCenter.
So, I got the system running for a few months like that and forgot to migrate it back to the vDS, which wasn’t a big deal because I didn’t need vCenter to move to any other hosts anyway... until I did.


vCENTER FAILURE:

This weekend I had a problem with Host 1, the single host that vCenter could run on. The host was severely crippled and I could not restart any guests properly.
So I found I had one option: unregister all the machines on Host 1 and re-register them on Host 3.
I re-registered my PSC, vCenter and DC, but once I started them, I realized that I only had networking on my vCenter.

Well, that was because I had created the “Recovery” port group and moved the cable from Host 1 to Host 3.

So, I shut everything down.

Assigned the DC to the “Recovery” network – Powered On
Waited for start and confirmed networking.

Assigned the PSC to the “Recovery” network – Powered On
Waited for start and confirmed Access to VAMI

Powered on vCenter – Waited for start and confirmed Login.
FINALLY, I was able to get into vCenter.
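The power-on sequence above is really just a dependency order, and it can be written down so it is not forgotten during the next outage. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the dependency map is my own reading of the order I used (DC, then PSC, then vCenter), not something vSphere enforces.

```python
from graphlib import TopologicalSorter

# Each guest maps to the guests that must be running before it starts.
# This mapping is an assumption based on the order used in this recovery.
DEPENDS_ON = {
    "DC": set(),
    "PSC": {"DC"},
    "vCenter": {"PSC"},
}


def power_on_order(depends_on):
    """Return the guests in an order that respects their dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, guest in enumerate(power_on_order(DEPENDS_ON), start=1):
        print(f"{step}. Power on {guest} and confirm it is reachable")
```

If you add more guests to the map, a circular dependency will raise graphlib.CycleError, which is a handy sanity check on the plan.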

I log in and look around for signs of problems. Ironically, my vSAN appears healthy and HOST 1 is OK (DRS is disabled), but there are a handful of “GUEST NAME(Orphaned)” machines. There doesn’t seem to be any way to remove them from the inventory or delete the reference.

I realize that they will probably clear up if I put HOST 1 into maintenance mode. It takes a few minutes, but all the machines register on other hosts and are no longer orphaned.

CLEANUP:
To avoid problems in the future, I decided that my “Critical” virtual machines should be associated with a vDS port group that can survive this type of issue. Although the port binding was Static, it did not help when the VM was disconnected from the switch.
I migrated all other machines off the 192-PtG port group and changed it to Ephemeral port binding. I was then able to save the configuration.

I then migrated my VMs from the Recovery network to the vDS port group.


Completing this wizard moves all 3 machines with no errors.


Now I can shut down Host 1 to troubleshoot and turn DRS back on in my cluster without fear of machines moving to bad hosts.


CAUTION: There are warnings about using Ephemeral port binding in situations where security is paramount. This example was from my lab, and it solves a problem in that environment. Always research the appropriateness of a solution for your use case before making changes. Protect your vCenter and this is not an issue.

https://kb.vmware.com/s/article/1022312

http://www.yellow-bricks.com/2011/06/02/ephemeral-ports/


Thank you,
Jason Valentine
MCP, VCP-DCV 2019, vExpert 2019, CMNO