It seems like all my posts start off the same way, “I recently had an issue…”, but it’s true 🙂 I recently had an issue with snapshots consuming all of the space in a datastore. The culprit was CommVault performing end-of-month full backups of VMs. During the backup process it takes a snapshot of each VM before the full VMDK backup.
It was first identified with a VM having a pending question in the vCenter Client.
It was also quickly identified as having a snapshot by the xxx-000001.vmdk virtual disk name. A check of the VM’s current snapshots showed that the snapshot had been created by CommVault. A check of the datastores the VM used showed that one of them was full. It was also a good bet that any other VMs on this datastore with snapshots or thin-provisioned disks would be having issues too.
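For anyone who prefers to do those two checks from PowerCLI rather than the client, here is a minimal sketch. It assumes a session already connected with Connect-VIServer, and `my_vm` is a placeholder name, not the VM from this incident.

```powershell
# Placeholder VM name (an assumption for illustration); replace with the affected VM.
$vm = Get-VM -Name 'my_vm'

# List the VM's snapshots with their description, creation date and size.
$vm | Get-Snapshot | Select-Object Name, Description, Created, SizeGB

# Show capacity and free space for each datastore the VM uses.
$vm | Get-Datastore | Select-Object Name, CapacityGB, FreeSpaceGB
```

The snapshot name or description usually gives away which tool created it, and a FreeSpaceGB value at or near zero points to the full datastore.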
The snapshots couldn’t be deleted as there was no free space on the datastore to commit the transactions from the snapshot VMDK. I could have grown the size of the datastore, but that would have involved Change Requests with changes at the SAN and vSphere level. Without panicking too much I identified a rather small working VM and migrated its storage to another datastore. The goal here was to free up space on the datastore in preparation for answering the pending questions and getting the VMs running ASAP.
While the VM was being migrated to a different datastore, I had a few minutes to identify which VMs had a pending question and would therefore be in a paused state. Now there are probably 17 different ways to approach this, but I just went with a direct method.
Using PowerCLI, I connected to the vCenter Server and then ran Get-VMQuestion.
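The storage migration itself can also be done from PowerCLI. This is only a sketch of one way to do it, assuming a connected session; `small_vm` and `other_datastore` are placeholder names.

```powershell
# Placeholder names (assumptions): 'small_vm' is a healthy VM on the full
# datastore, 'other_datastore' is a datastore with enough free space.
$target = Get-Datastore -Name 'other_datastore'

# Passing only -Datastore relocates the VM's storage and leaves it on its current host.
Get-VM -Name 'small_vm' | Move-VM -Datastore $target
```

Picking a small VM keeps the migration short, which matters when other VMs are sitting paused.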
Connect-VIServer my_vcenter_server.domain.local
Get-VMQuestion
This will by default return all VMs in the vCenter instance that have pending questions. If you have a large vCenter with 1000+ VMs this will most likely take a while and you might want to target the command to specific datastores or ESXi servers. In my situation I wanted to make sure I identified all pending questions and didn’t miss anything.
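As a rough example of narrowing the search to a single ESXi server, the sketch below assumes a connected session and uses a placeholder host name.

```powershell
# Placeholder host name (an assumption); replace with a real ESXi server.
Get-VMHost -Name 'my_esxi_host.domain.local' |
    Get-VM |
    Get-VMQuestion |
    Format-List *   # dump every property of each pending question for review
```

Format-List is only there to make the output easier to read; drop it if you plan to pipe the questions on to another cmdlet.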
By this point my VM migration had just completed, so I could now answer all the pending questions and resume the paused VMs. This can be done with Set-VMQuestion.
Get-VMQuestion | Set-VMQuestion -Option "Retry" -Confirm:$false
It doesn’t really get simpler than the above command. Identify pending questions with Get-VMQuestion, pipe them to Set-VMQuestion, answer all with ‘Retry’, and set the ‘Confirm’ parameter to $false so you don’t get prompted to confirm each action. Again, there are probably smarter ways to go about this. You could use the Get-Datastore cmdlet to identify datastores with zero free space, then target those datastores with Get-VMQuestion.
Get-Datastore my_datastore | Get-VM | Get-VMQuestion
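To find the full datastores automatically instead of naming one, something like the following might work. It is a sketch under assumptions: a connected session, and a 1 GB threshold that I picked arbitrarily to stand in for "effectively full".

```powershell
# Assumed threshold: treat anything with less than 1 GB free as effectively full.
$thresholdGB = 1

Get-Datastore |
    Where-Object { $_.FreeSpaceGB -lt $thresholdGB } |   # keep only near-full datastores
    Get-VM |                                             # VMs stored on those datastores
    Get-VMQuestion                                       # their pending questions, if any
```

Run it once without Set-VMQuestion on the end to see what comes back before answering anything.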
The Get-VMQuestion / Set-VMQuestion pair is a great PowerCLI way of working smarter, not harder. Whether answering 1 or 100 questions, it’s all the same via PowerCLI. I don’t really know if there’s an easy way to identify and answer pending questions in bulk through the vCenter Web or C# client.
On a side note, Set-VMQuestion isn’t overly intuitive as a cmdlet name for answering a question, so there is an alias for it called Answer-VMQuestion. I guess the original name sticks with the PowerShell tradition of Get / Set verbs.
Interesting situation that you ran into here. I wonder if the new Datastore Space card in CloudPhysics could have helped you with this problem. Let us know if you’d like to try it out.
Not sure what others do to avoid this happening, but I set up an alarm in vCenter to go off when a VM hits a specific snapshot size. There’s also Alan Renouf’s vCheck script, which shows VMs with snapshots over 14 days old and oversized snapshots, because other admins will take snapshots and forget about them.
Also, the alarm on % datastore space remaining should give some warning, as long as you set an alarm action (email) to get notifications. (It should MAKE you set email for a lot of the alarms by default.)
You found a great outcome for your issue.