
Wednesday, December 29, 2021

Why can't I shut down my vSAN cluster?

 I ran into a bit of an odd issue today.

After having a load of trouble with bad boot media on my vSAN cluster, I had actually managed to stabilize it and had been running it for a few months.

Today I wanted to shut it down and do some maintenance: dusting, wiring, expansion, that type of stuff.

I followed my usual process of placing the hosts into Maintenance mode with "No Data Migration".

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.vsan-monitoring.doc/GUID-31B4F958-30A9-4BEC-819E-32A18A685688.html

This follows along with the steps I had done many times before in my homelab, but today was different.

In this case I had joined the hosts to a vCenter hosted on another cluster. Previously I would have had the vCenter on the vSAN cluster. I would have shut down the vCenter, then powered off the hosts.

I was able to place Hosts 1 and 2 into maintenance mode, but Host 3 would NOT go into maintenance mode. The tasks showed that it was waiting for the VMs to be powered off or migrated.

I had already disabled DRS, I had disabled HA. The VMs were powered off. So I figured I would just remove them from inventory... they weren't important to me.

Yet the issue persisted.

So I clicked on Host 3 and checked the VMs.

Lo and behold, there were 2 vCLS machines still sitting on the host.

So now I am trying to shut down the vCLS systems, but they are throwing Access Denied errors.

A little bit of digging into the previous article uncovered this note:

"For vSphere 7.0 U1 and later, enable vCLS retreat mode. For more information, see the VMware knowledge base article at https://kb.vmware.com/s/article/80472."

In order to cleanly shut down my vSAN cluster, I would need to enable vCLS retreat mode.

The expectation is that with the addition of an "Advanced Setting" I will be able to enable or disable the Cluster Services in the future. The article makes it pretty clear that it is not meant to be run "disabled" so they also offer advice for scripting the change to make it easier to build into a "shutdown" or "startup" workflow.
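As a minimal sketch of what that scripted change might look like, the helper below only builds the name and value of the advanced setting; it does not talk to vCenter. It assumes the setting name follows the pattern given in KB 80472 (config.vcls.clusters.<cluster ID>.enabled) and that "false" turns retreat mode on. The cluster ID "domain-c8" and the function name are hypothetical, so check the KB and your own cluster ID before relying on it.

```python
from typing import Tuple


def vcls_retreat_setting(cluster_moid: str, retreat: bool) -> Tuple[str, str]:
    """Build the (name, value) pair for the vCenter advanced setting.

    Assumption: the name follows the pattern from KB 80472,
    config.vcls.clusters.<cluster-moid>.enabled, where "false" enables
    retreat mode and "true" lets vCenter bring the vCLS VMs back.
    """
    if not cluster_moid.startswith("domain-c"):
        raise ValueError(
            f"expected a cluster ID like 'domain-c8', got {cluster_moid!r}"
        )
    name = f"config.vcls.clusters.{cluster_moid}.enabled"
    value = "false" if retreat else "true"
    return name, value


# "domain-c8" is a hypothetical cluster ID used only for illustration.
shutdown_setting = vcls_retreat_setting("domain-c8", retreat=True)
startup_setting = vcls_retreat_setting("domain-c8", retreat=False)
print(shutdown_setting)
print(startup_setting)
```

The first pair would go into a "shutdown" workflow and the second into the matching "startup" workflow, so the cluster is not left running with Cluster Services disabled.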

Hope this article helps, I will update with pictures later as I go through the process.


Jason



Tuesday, November 2, 2021

VCAP - DCV Deploy 2021

 I Passed!

Absolutely beside myself. Could not have done it without all the support from my family and my team. On to the next challenge!

https://www.linkedin.com/posts/jasonvalentine_vmware-certified-advanced-professional-activity-6861835156651618304-lIVn

Tuesday, October 26, 2021

VCP-DCV 2021

 I am very excited to have completed my 4th refresh of the VCP-DCV Certification on Oct 20th. I even passed with my highest score ever: 412.


I was fortunate enough to get a Tech+ Pass to VMworld 2021 gifted through the VMUG program and I vowed not to let it go to waste.

So after a brief discussion with my wife, I determined that I could complete 2 certification tests using the discounted vouchers between October 4th and November 4th.

One would be a VCP refresh and the other would be a VCAP-Deploy attempt.

I know some folks like stories, so here is how I prepped and passed this test in only 2 weeks.

Let me preface by saying the test was not at all 'easy'. I have now tested on 3 major versions of ESXi: 5.1, 6.5 and 7.0.

The older tests were very much focused on bare technical specifications, troubleshooting and detailed configuration tasks. The current test adds in so much more content for the "modern hybrid cloud datacenter". Trying to remember the technical, the troubleshooting and the configuration while ADDING information about technologies that you may not be seeing yet is no easy feat.

I was fortunate though that there had been an updated release of the Official Certification guide from Pearson. Where the prior guides were very heavy on screenshots and processes, this seemed to be just walls and walls of text. That is not to say that it is bad; it was extremely useful. The alternative is to RTFM on docs.vmware.com, according to the exam blueprint. At least the Certification guide provided "Key Points" to narrow the focus a bit.

I read it cover to cover and found myself recharged when reading processes that I was familiar with.

Sometimes though, the best way to remember something is to see it done..

So I FIRED UP THE LAB.... and it died... hmmm.. probably should have swapped out that USB boot media when I had the chance. Probably will need to figure out how to recover vSAN with a lost host...

So I FIRED UP THE OTHER LAB! And I performed an upgrade of the vCenter to 7.0.3!

I tried to upgrade the hosts, but the boot media on these were internal SD cards... error "not enough space for upgrade"

So I powered off and removed one of the blades to get a closer look at the SD cards. 2GB Mirror in the Dell M520 blades. 2GB?! Ah, ok, wow.

So I slotted it back in with the intention of addressing that again later, and that's when one of the RAID controllers failed.

I only know that because I tried to refresh vCenter after repowering the blade and realized it was not loading. A quick look at the hosts showed why. The snapshot chain had become corrupt. Trying to commit the snapshots failed. Trying to consolidate the virtual machine, failed.

At least the domain controller survived..... nope. Black screen, restarted, crashed on boot. Direct to Windows recovery options. I could get into Directory Services Restore Mode, but no repairs would stick. Dang.

So the vSAN lab is down, the Dell lab is down.. This would be a great opportunity to learn some things... but I am short on time.

So, on a prior recommendation from a friend and colleague, I checked out a 7 day trial of CBT Nuggets. Fortunately they had an updated course for the VCP-DCV 2021 2V0-21.20 exam presented by Keith Barker. I wanted to see some specific topics so I jumped around a bit, but I did find Keith's style very engaging. More than a few times things didn't go 'as planned'. Sometimes this was on purpose, other times it was the lab he was using, but it was interesting that some of these things were left in.. because, honestly, things do not always go right and it would be wrong to present only 100% success every time.

Ultimately I was able to salvage a domain controller from the vSAN lab (host 2 is dead in the water though)...

I built a new vCenter and Domain controller on the Dell kit.

Read some more, watched Keith for a few hours, ran a couple of scenarios on my labs, ran a couple of scenarios on hol.vmware.com.

Then, I got a good night's rest.

I went to the testing center, sat for the exam, and second-guessed myself a few times. I reviewed the questions that I had "marked" and determined that it would probably do more harm than good to go back to them (there were only 9, and 4 of those were marked because I thought I might need them).

I clicked the button to submit. AND BANG! Screen says congratulations.

Very relieved. I don't have to take it again, I can sit for the VCAP-DCV Deploy now... so it starts all over again..

Wish me luck on November 2nd!


Jason


Thursday, August 5, 2021

"INVALID" machines after vSAN shutdown.

 In some business environments or homelabs it may be necessary to shut down a vSAN cluster entirely.

VMware has a proper procedure outlined on the documents site.

Shutting Down and Restarting the vSAN Cluster (vmware.com)

Once you have completed your maintenance tasks and fired everything up, you may be startled when you log in to each of the hosts and attempt to verify your VMs....

They will probably all display as an "INVALID" status with the GUID Path in the "name" field.

fig.1 - Host 1 - 2 "invalid" machines


fig.2 - Host 2 - 2 "Invalid" machines


This message certainly caught one of my peers off guard during a recent datacenter update. That quick rush of butterflies in your gut that tells you all of your data is now corrupt.

However, I quickly reassured him that nothing was lost and the issue was simply that all of the hosts were still in maintenance mode.

When the vSAN nodes are in maintenance mode after a reboot, none of the stats are correct since none of the vSAN components are allowed to work together. 

fig.3 - 0Byte vsanDatastore


If you have powered on all of the vSAN nodes in the cluster and waited a sufficient amount of time for all the vSAN nodes to communicate, the remediation is actually quite swift.

Remove all the Hosts from Maintenance mode.


fig.4 - Host 2 - Still in Maintenance Mode!



The names will re-appear for each of the Guest objects.

fig.5 - Host 1 - Proper names!

fig.6 - Host 2 - Proper names again!


If all of your workload is on these hosts, you can start your DNS server, then the vCenter server.

fig.7 - vCenter operational again.

Before you know it, you are back in business.
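If clicking through every host is tedious, the "remove all hosts from maintenance mode" step can also be done from the command line. The sketch below only prints the commands rather than running them. It assumes SSH is enabled on each host and uses the esxcli system maintenanceMode namespace; the host names esx01 through esx03 and the function name are hypothetical placeholders.

```python
from typing import List


def exit_maintenance_commands(hosts: List[str], user: str = "root") -> List[str]:
    """Return one SSH command per host that takes it out of maintenance mode.

    Assumption: SSH is enabled on every host. Nothing is executed here;
    the commands are only built as strings so they can be reviewed first.
    """
    return [
        f"ssh {user}@{host} esxcli system maintenanceMode set --enable false"
        for host in hosts
    ]


# Hypothetical host names for a three-node lab cluster.
lab_hosts = ["esx01", "esx02", "esx03"]
commands = exit_maintenance_commands(lab_hosts)
for command in commands:
    print(command)
```

Running esxcli system maintenanceMode get on a host afterwards should report whether it is still in maintenance mode.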




Thursday, February 11, 2021

Congratulations to the vExperts!

 


Today, VMware released the names of new and returning vExperts for 2021.

vExpert 2021 Award Announcement



I was honored and humbled to be able to stay with this program for my 4th year.

I believe in the VMware ecosystem and have seen how transformational it has been in my business, and I enjoy sharing stories and advice with others on the subject.

I was fortunate enough to join the VMUG in Denver at a great time and found a group of people that understood my language. After being a member for several years, I volunteered my time to help organize and plan activities for the local groups along with 2 other amazing leaders. The planning and organization of the 2018 Denver UserCon was intense, but attending on the event day was awe-inspiring. So many VMware enthusiasts that I spoke with that day thanked us for helping to provide them with so much information about the products and services that could help advance their careers and businesses.

I had to relocate to Orlando for work, immediately sought out the local chapter, and volunteered. Once again I found myself connected with passionate leaders and VMware advocates. I was fortunate enough to attend a few in-person meetings before the pandemic fully took hold, and I am looking forward to the day that we can get together again.

I am looking for new ways to support and add to the community.

I have started this blog as a way of documenting things that I find interesting and new.

I am looking at the option to livestream some lab sessions.

I answer questions on the Spiceworks forums for those that find themselves in a bind.

Jason Valentine's SpiceWorks

Having access to a community that understands your needs and daily challenges is such a profound help and I am hopeful that I give as much as I take.

Jason

Saturday, January 9, 2021

Updating the DNS Search domains and FQDN Hostname for an ESXi Host.

 Here is a quick tip for anyone that has deployed an ESXi host from a home network or received DHCP values that didn't match your intended configuration.


I deployed a machine that I want to use in my home lab, but my default Comcast domain is displayed as the FQDN.
I have already changed this in the DCUI, but it persists in the vSphere client.

The configuration is found on the default TCP/IP stack. Click Networking.

Select the TCP/IP Stacks tab.

The DNS configuration shows my Domain name as hsd1.fl.comcast.net, but my proper domain name is in the search domains box. Let's clean this up.

Edit the Default TCP/IP Stack - note the Domain name and Search Domain fields.

Update the Domain name and Search Domains fields to remove and replace the incorrect domain.

Refreshing the host client shows the updated values!
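The same cleanup can probably be done from an SSH session on the host. The sketch below only builds the esxcli commands as strings. It assumes the esxcli system hostname and esxcli network ip dns search namespaces are available on the host; the FQDN esx03.lab.example.com and the function name are hypothetical and only stand in for the real values.

```python
from typing import List


def dns_cleanup_commands(fqdn: str, old_domain: str) -> List[str]:
    """Build esxcli commands that set the FQDN and fix the search domains.

    Assumption: the commands are run in an SSH session on the ESXi host.
    The final "add" may be unnecessary (and may report an error) if the
    correct domain is already in the search list.
    """
    host, _, domain = fqdn.partition(".")
    if not host or not domain:
        raise ValueError(f"expected a fully qualified name, got {fqdn!r}")
    return [
        f"esxcli system hostname set --fqdn={fqdn}",
        f"esxcli network ip dns search remove --domain={old_domain}",
        f"esxcli network ip dns search add --domain={domain}",
    ]


# Hypothetical FQDN; hsd1.fl.comcast.net is the unwanted domain from this post.
commands = dns_cleanup_commands("esx03.lab.example.com", "hsd1.fl.comcast.net")
for command in commands:
    print(command)
```

esxcli network ip dns search list shows the current search domains, which is a quick way to confirm the result before refreshing the host client.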


Hope it helps!

Jason


Reusing Disks from a Previous vSAN

 This post may be mostly relevant to home-lab builders, but if you have previously configured vSAN on the disks used in a host, you may not be able to easily reuse them in a new vSAN.

Let's take a look at my specific use case.

I want to install vCenter 7 and I want to establish vSAN on the host during this process.


For this installation I am going to install on host ESX03. I previously used the internal disks for a vSAN cluster, but I am upgrading from hybrid to all flash using a 1TB SSD that I recently acquired.

In the datastore part of the installer, I select "Install on a new vSAN cluster containing the target host".
This will create a standalone vSAN node; configuration of the cluster is completed later when additional hosts are added.


Once I get to the section to claim my disks, I can see that I only have one disk and it is the new 1TB disk. My existing 120GB drive does not appear.

On the host I can see that this 120GB disk does exist.

Selecting the disks shows me that there is an existing vSAN partition table. We need to clear this partition.

Select Actions -> Clear Partition Table

Note the warning: erasing this partition is destructive and will remove the data from the disk that is essential for the previous vSAN.
I know that I will not need that data, so I can proceed.

There are now no partitions on the disk.
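For anyone who prefers the ESXi shell over the host client, a rough equivalent of "Clear Partition Table" is to list the partitions with partedUtil and delete them one at a time. The sketch below only builds those commands as strings. The device name and the partition numbers are hypothetical placeholders, and like the host client action, the delete commands are destructive, so double-check the device name first.

```python
from typing import List


def clear_partition_commands(device: str, partition_numbers: List[int]) -> List[str]:
    """Build partedUtil commands to inspect and then delete partitions.

    Assumption: the device lives under /vmfs/devices/disks/ and the
    partition numbers were read from the first getptbl output.
    Nothing is executed here; the commands are only returned as strings.
    """
    path = f"/vmfs/devices/disks/{device}"
    commands = [f"partedUtil getptbl {path}"]
    # Delete the highest-numbered partition first, then work downwards.
    for number in sorted(partition_numbers, reverse=True):
        commands.append(f"partedUtil delete {path} {number}")
    # Show the table again to confirm that no partitions remain.
    commands.append(f"partedUtil getptbl {path}")
    return commands


# "t10.EXAMPLE_120GB_DISK" and partitions 1 and 2 are hypothetical values.
commands = clear_partition_commands("t10.EXAMPLE_120GB_DISK", [1, 2])
for command in commands:
    print(command)
```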

After refreshing the vCenter installer by going back and then forward again, I can now see my 120GB drive available to claim.

I select my 1TB drive and claim it for the capacity tier. I can now proceed with the remainder of the vCenter deployment.


Hope this helps someone 
Jason
