A guest post by Erik Zandboer
In the previous episode of breaking stuff we powered off all hosts in one of the datacenters. That was not really eventful, as we were leaning mostly (if not entirely) on VMware vSphere’s HA capabilities. Now let’s kick things up a notch and fail an entire datacenter! In this episode we will break a datacenter while NOT having a witness service.
If you’re not the reading type, you can of course skip straight to the video.
Failing a DC in a metro setup without a witness
I will be failing the datacenter by using the Dell iDRAC out-of-band management modules to power off the hosts, and I will use a smart PDU to pull the power from both controllers of one of the PowerStore arrays. I tried to shut the PowerStore down “nicely”, but it wouldn’t let me: as both controllers are aware of each other’s status, I did not find a way to tear it down from the GUI. So this is where the PDU comes in.
As we are performing this test without a witness, the storage layer takes the basic action of disabling all non-preferred storage. This is because without the witness service, the surviving array cannot distinguish between an interlink failure and a storage array failure. Looking at our four workloads (Datacenter A on the left, Datacenter B on the right, preferred volumes at the top and non-preferred volumes at the bottom), I would expect that only the DCA-preferred VM would survive, and the non-preferred VM from DCB would get restarted on DCA.
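The expectation above can be written out as a tiny model. This is only a sketch of the reasoning in this post, not PowerStore or vSphere code: the `Workload` class, the `expected_outcome` function and the VM names are made up for illustration, and it assumes Datacenter B is the one being failed. It encodes the expected behaviour, not what the video ends up showing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str            # hypothetical VM name, for illustration only
    runs_in: str         # datacenter where the VM is running before the failure
    preferred_site: str  # site holding the preferred role for the VM's metro volume


def expected_outcome(vm: Workload, failed_site: str) -> str:
    """Expected fate of a VM when failed_site goes dark and NO witness exists."""
    # Without a witness the surviving array disables every non-preferred volume,
    # so a volume stays online only if the surviving site holds the preferred role.
    volume_online = vm.preferred_site != failed_site
    if not volume_online:
        return "down"
    if vm.runs_in == failed_site:
        # The hosts are gone but the storage is reachable: vSphere HA can restart the VM.
        return "restarted by HA"
    return "keeps running"


# The four workloads from the test layout (names are assumptions).
WORKLOADS = [
    Workload("vm-dca-preferred", runs_in="DCA", preferred_site="DCA"),
    Workload("vm-dca-nonpreferred", runs_in="DCA", preferred_site="DCB"),
    Workload("vm-dcb-preferred", runs_in="DCB", preferred_site="DCB"),
    Workload("vm-dcb-nonpreferred", runs_in="DCB", preferred_site="DCA"),
]

for vm in WORKLOADS:
    print(f"{vm.name}: {expected_outcome(vm, failed_site='DCB')}")
```

In this model only two of the four workloads end up with usable storage, which is why the recovery without a witness is hard to call enterprise-ready.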
But let’s not spoil too much. It is enough to say that without a witness we do not get to a recovery scenario that is acceptable from an enterprise perspective. Good to know: in the next episode we will be doing this test once again, but this time WITH the support of a witness service. For now, enjoy the video below:
Failing a DC in a metro setup with a witness
In the first part of the post, we shut down an entire datacenter. As we did not have a witness service present, the recovery wasn’t all that enterprise-ready. So in this episode we will repeat the same test, but now WITH the use of a witness. Want to guess what will happen differently?
As throughout this series, if you’re not the reading type you can skip straight to the video as well.
The difference a Witness makes
As we will see in today’s “breaking stuff” experiment, having a witness REALLY matters! Because the surviving array can now actually distinguish between an interlink failure and an array failure, it is capable of maintaining access to even its non-preferred volumes. That allows the vSphere layer on top to restart the workloads that were running in the failing datacenter, even though the volume(s) they are running on are non-preferred in the surviving site.
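The difference can be boiled down to a single decision on the surviving array. The snippet below is a deliberately simplified sketch of the behaviour described in this post, not the actual PowerStore logic; the function name and its two parameters are assumptions made for illustration.

```python
def volume_stays_online(is_preferred_here: bool, witness_confirms_peer_down: bool) -> bool:
    """Simplified model: does the surviving array keep serving a metro volume?

    is_preferred_here          -- the surviving array holds the preferred role
    witness_confirms_peer_down -- a witness has told the array that its peer is
                                  really gone (always False when no witness exists)
    """
    # Preferred volumes stay online either way. Non-preferred volumes stay online
    # only when a witness rules out a plain interlink failure.
    return is_preferred_here or witness_confirms_peer_down


# Compare both setups for a preferred and a non-preferred volume.
for witness in (False, True):
    for preferred in (True, False):
        role = "preferred" if preferred else "non-preferred"
        state = "online" if volume_stays_online(preferred, witness) else "disabled"
        print(f"witness={witness!s:5} {role:13} -> {state}")
```

With the witness confirming the failure, every volume stays reachable in the surviving site, so vSphere HA has storage to restart the failed workloads on.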
To make a long story short: All workloads that matter to you should survive OR be restarted automatically. Watch the video below:
The post Blog Series: Dell PowerStore Metro – Breaking stuff 3&4 – Fail A Datacenter With / Without a witness appeared first on Itzikr's Blog.