Community Forum > HA Pool / SCSI-3 reservations / failures

Hi there!

I'm still testing QuantaStor (thanks for all the help with the 'unrecognized HBAs' etc.) for an HA environment.
I configured two nodes, NodeA and NodeB, with HP H221 (LSI SAS2308) HBAs and two SAS enclosures (HP D2700), all connected per the 'best practice' references.
I configured QS for HA, created two cluster heartbeat rings, a VIF, etc., and configured the HA storage pool as per the QS manual.
Nice, everything seemed to be working as expected.

Then I started to play with/test the HA behavior... killed a node and the other took over the pool and the VIF... nice.
Then I started pulling SAS cables between the controllers and the enclosures... and then the fun started!

The node correctly detected the "missing disks" etc., but all I/O on the affected node got stuck! The pool was not switched over to the other node.
After some twiddling around, the only thing I could do was a hard reset/power cycle of the affected node.

The pool was not taken over by the other node... instead it complained that the "drives are reserved by another initiator". Hmm...
Then I tried to import the pool on NodeB (the original one) and the system complained with the same error - devices are reserved by another initiator.
Tried to import the pool on NodeA (the other, surviving node) - same error - drives reserved???

Restarted both nodes - no go, same error.

How is this supposed to work? Should QS detect the missing drives/enclosure and switch over to the other node?
What do I do now that the "drives are reserved by another initiator"??? I have a system which is all "up & running" but the disks are "inaccessible"?

Should QS HA handle such events (SAS link/enclosure failure)? How should it behave?

Any hints would be really appreciated.

Regards,
M.Culibrk

July 15, 2019 | Registered Commenter M.Culibrk

Hi M,
What is the RAID layout that you selected for the pool? Also, when you view the Storage Pool in the WUI, does it say "Enclosure Redundancy" = "Verified"? QuantaStor tries to ensure enclosure redundancy when you create a new Storage Pool by intelligently selecting the groups of drives that make up the VDEVs, so that the loss of an enclosure will not stop the pool. If you choose a RAID layout where that's not possible, it will choose disks in a simple linear fashion rather than striping across JBODs. So in your case with 2x D2700s you'll want to select a RAID10 layout so that it'll mirror across chassis. Assuming you did so, verify that the pool was made enclosure redundant by looking in the Storage Pools section and checking that property.
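
For reference, a RAID10 pool that's enclosure redundant will have each mirror VDEV built from one drive in each D2700. You can also sanity check the layout from the console with zpool status (pool and device names below are just illustrative):

    # each mirror-N should pair drives from different enclosures
    zpool status qs-pool-1
    #   qs-pool-1                   ONLINE
    #     mirror-0                  ONLINE
    #       scsi-35000c5001111aaaa  ONLINE   <- D2700 #1
    #       scsi-35000c5002222bbbb  ONLINE   <- D2700 #2
    #     mirror-1                  ONLINE
    #       ...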

When you disconnect access to a JBOD on a system that has an active pool with disks in that JBOD, the pool can become degraded. That will not cause a failover, but it will generate a call-home alert. If you lose enough connectivity to the JBODs that the pool goes into an UNAVAIL state, the system with the pool will ask the other system to take over the pool. That system will check connectivity and, assuming it has connectivity, will start the failover process to activate the pool.
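
A quick way to see which of those two states you're in at the ZFS level (the WUI reflects the same thing) is zpool status with -x; the pool name is again just an example:

    # prints only pools with problems; look for DEGRADED vs UNAVAIL in the state line
    zpool status -x qs-pool-1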

So in your case it could be a number of different things:
1) devices are not mirrored?
2) device groups (VDEVs) are not striped across jbods to ensure enclosure redundancy?
3) cabling not set up with multipathing? (see the quick check after this list)
4) dual-redundant heartbeat rings (on separate subnets) must be set up, with at least one VIP on the pool
5) check the HA group failover policy to ensure it's set up as you like (can also set up host-side ping and rules around port link state)
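
For item 3, a quick way to confirm multipathing from the console is with the multipath tools (output shape and WWIDs will differ on your system):

    # every dual-ported SAS drive should show two paths, one per HBA port / expander path
    multipath -ll
    # quick path count; with dual-path cabling you'd expect 2 'active ready' lines per drive
    multipath -ll | grep -c 'active ready'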

If you're doing a PoC with the Trial Edition, do reach out to our support team at support@osnexus.com so that they can assist you with the pre-go-live checks and the above items.
Best,
Steve

July 15, 2019 | Registered Commenter Steve

One more thing: when an HA pool becomes active on a system, a marker is put in place so that one cannot fail back to that system unless the active pool gets a clean export. This ensures that the pool does not get corrupted by write I/O that was in-flight at the time of the failover. The fix is to reboot the server, but if you're having issues with I/O fencing the proper fix is to power everything off, wait 10 seconds, then turn on the JBODs, then turn on the servers.

The disks can have I/O fencing reservations on them, and powering off the JBODs will clear those, but you also need to reboot the servers in this case, as there wasn't a clean export of the pool and the other node didn't successfully import it.
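
If you want to see the fencing state on a given drive before/after the power cycle, sg3_utils can read the persistent reservations without touching them (the device name here is just an example):

    # list the registration keys present on the disk
    sg_persist --in --read-keys /dev/sdb
    # show the active SCSI-3 reservation (holder and type), if any
    sg_persist --in --read-reservation /dev/sdb

Those are read-only queries, so they're safe to run; the 'out' (clear/release) operations are what you want to avoid.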

We'd be happy to have the support@osnexus.com team take a closer look at what happened there. I'm assuming that it lost access to too many disks for the pool to keep running, so the secondary node didn't have enough drives to activate the pool in a degraded state.

July 15, 2019 | Registered Commenter Steve

Grrr.... sorry for the late reply... but I somehow did not get any notification... and forgot to check the forum directly...

Anyway...
Yes, I'm using RAID10 and yes, the WUI reported "enclosure redundancy - verified".

To be totally honest, I have 4x D2700 enclosures attached. I have a single H221 HBA (LSI2308) in each node with 2x SFF-8088 ports (but the WUI reports 0 ports for this HBA).
Each HBA port "handles" two daisy-chained enclosures. The enclosures are dual-controller.
The enclosures are linked "top to bottom" for Node A and "bottom to top" for Node B for maximal redundancy (if needed, I can send some pictures/sketches).

Anyway, both nodes see all the drives and enclosures, and the WUI also correctly places the VDEV mirror pairs in different enclosures.

The nodes are interconnected with two crossover network cables, with different IP networks configured on them. One VIF per HA pool is configured.
All as suggested by the "best practices".

I first disconnected one port of one enclosure and the system correctly reported "degraded" VDEVs. Later on I disconnected both SAS cables from Node B so it lost all the drives/enclosures - at this point the I/O on Node B was "stuck" (as expected, kind of), but the problem is that the other node (Node A - which still normally sees all drives/enclosures) did not take over the pool.
Nothing else "happened" on the nodes other than Node B recording a log of "missing drives" and an "unavail" pool; Node A had no trace of "failures" or attempts to take over the pool. Nothing.

OK... as the node was unresponsive (I/O blocked) I finally restarted Node B... I also tried to manually import the pool on the surviving node (Node A, by scan + import) but that resulted in the mentioned "reservation error"... so the second pool remains unavailable on *both* nodes.

I have not power-cycled the enclosures/drives yet (will do now) - because that would at least degrade the other pool too.
I will try to start over from scratch (clear everything), rebuild the cluster, and see if there are any changes.

I will reach out to support@ if the fresh attempt does not succeed.

Thanks for all!

best regards,
M.Culibrk

July 17, 2019 | Registered Commenter M.Culibrk

Interesting, not sure why the JBOD detection logic didn't trigger the failover in this case. Yes, please do reach out to support@; they may be able to determine more details from the logs, and I'll check with QA to see about getting a test setup using similar hardware and cabling. Usually we do not cascade the configurations but rather use direct cabling to each JBOD, but that shouldn't matter.

July 17, 2019 | Registered Commenter Steve

Yeah...
I finally power-cycled everything... and the "reservation" problem went away... (I even tried to do a manual "clear reservation" with the sg3_utils tools, without success).
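
(For the record, the manual attempt was roughly along these lines - the device and key are just placeholders:

    # read the registered keys first...
    sg_persist --in --read-keys /dev/sdX
    # ...then try to clear the reservation with that key
    sg_persist --out --clear --param-rk=0x123abc /dev/sdX

no luck either way.)
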
But one strange problem still remains: I cannot re-create the second pool, as I'm getting the error "error creating zpool with ID... - error 485"...

I'm also getting some "errors" from mpt2sas/mpt3sas like "mpt2sas_cm0: log_info(0x30030101): originator(IOP), code(0x03), sub_code(0x0101)" and "sd 0:0:66:0: mpt3sas_scsih_issue_tm: ABORT_TASK: FAILED : scmd(00000000906820c1)"...
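
(I'm just pulling those out of the kernel log, nothing fancy:

    # timestamped kernel messages from the LSI SAS driver
    dmesg -T | egrep 'mpt2sas|mpt3sas'
)
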
Given all the strange issues with these HP H221 HBAs, I'll try to replace them with HP P822s in HBA mode (the only ones I have with external connectors)... it's "overkill" to use a P822 just as an HBA, but it's for a test...

After the power cycle I'm also noticing another "strange" behavior of the enclosures... the enclosure controllers have a little 7-segment display showing the enclosure's #ID... I remember those IDs were "OK" (logically) - the first enclosure in the chain had ID 01, the second 02... but now all of them show ID 01... really strange... I suppose the two controllers in one enclosure should show different IDs, as controllers A and B are wired in "opposite directions" (top-to-bottom / bottom-to-top) for each node.
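
(Both nodes still enumerate the enclosures fine, e.g.:

    # list the SES enclosure devices and their /dev/sg handles
    lsscsi -g | grep -i enclosu
    # dump the enclosure status diagnostic page for one of them (sg device is just an example)
    sg_ses --page=2 /dev/sg5

so it really is just the 7-segment ID that looks off.)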

I should probably change the SAS HBAs first...


Also, regarding

check the HA group failover policy to ensure it's setup as you like (can also setup host side ping and rules around port link state)

there are options for checking Ethernet ports and FC ports... but no SAS/SAN links/ports... and the "client checks" work only for IP peers (so iSCSI, NFS, SMB); there is no FC "client checking" or similar.

I'll swap the HBAs and let you know, and start to bother support@ if I still encounter issues.

Best regards!

July 17, 2019 | Registered Commenter M.Culibrk

The P822 in pass-thru may work, but that's not a tested or certified configuration. We've only certified the H241 for use with HPE JBODs and arrays, so it may not work with the P822.

With regard to running the low-level sg3_utils tools to unlock the reservations: that is a bad idea. By doing that you're bypassing the safeguards and could corrupt your storage pool. In your current state it is best to just power cycle everything to clear the reservations.

QuantaStor detects when it loses access to the drives and will automatically activate the beacon LEDs, so that could be why you're seeing the 7-segment display lit up. You can clear the LED blink via the WUI in the Controllers & Enclosures section.

The SAS links are always monitored by checking the state of the pool; it's not something you can disable. If the pool goes to the UNAVAIL state due to too many missing disks, the QuantaStor core service will attempt a failover after doing checks with the other system. Specifically, the other system must be responding as online and must respond back in the affirmative that it has visibility to the drives. This ensures that we don't preemptively fail over when the other node also has no access to the devices.
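
To make the sequence concrete, here's the logic from the paragraph above as a rough shell sketch - this is not the actual QuantaStor code, and the pool/host names are just placeholders:

    # illustrative only: fail over just when the pool is UNAVAIL locally
    # and the peer confirms it can actually see the pool's devices
    POOL=qs-pool-1
    PEER=nodeA
    state=$(zpool status "$POOL" | awk '/state:/ {print $2}')
    if [ "$state" = "UNAVAIL" ]; then
        if ssh "$PEER" "zpool import 2>/dev/null | grep -q $POOL"; then
            echo "peer sees the pool devices - start failover"
        else
            echo "peer offline or missing devices - stay put"
        fi
    fi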

July 17, 2019 | Registered Commenter Steve

Thanks for the quick response!

No... I meant putting the P822 in HBA mode, not "pass-thru" (if you mean RAID-0 per drive). I already did that with some P420s (which use the same chip) and it really acts as a "normal HBA" but is still visible to ssacli (the HP management utility). Hopefully it will behave more "nicely" compared to the H221 (which is just an HP-branded LSI 2308). But, if you recall, the H221 was not even recognized as an HBA until your tech masters revealed a little "trick" to make it "visible" as a normal LSI HBA... (I wonder how many other tricks are out there... ;) )
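
(For reference, the P420 was switched with ssacli; if I remember right it was something along these lines - the slot number is whatever yours is:

    # show the current controller configuration / mode
    ssacli ctrl all show config
    # flip the controller into HBA mode (takes effect after a reboot)
    ssacli ctrl slot=1 modify hbamode=on

and after the reboot the drives show up as plain /dev/sd* devices.)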

The H241 is "nice"... but it has the "mini SAS HD" SFF-8644 connectors... and all my gear is still just 6G with SFF-8088 connectors... and the H221 is "just" a LSI2308 anyway... so I hoped it would still work...

Regarding the reservations... I'm totally aware of the issues/risks you mention... but as the pool would not even import... and this was/is a test/PoC system, I'm not afraid to be "brave" and colossally screw something up... ;)
It was kind of a "last resort" to "reset" the drives without power-cycling (everything)...

The LED I mentioned has nothing to do with the "drive activity/status" LEDs. Those 7-segment LEDs just show the controller ID (or sequence in the chain, if you like) when the HBA initializes the SAS bus... so they should read "01, 02, 03, 04" if I had 4 enclosures daisy-chained (I have just two chained, so the controllers should be numbered 01 and 02... but currently they are not... both controllers in both enclosures show 01 as their ID... go figure... but the enclosures are nicely detected by the HBA).

I also thought the "SAS monitoring" was always active and that the system should initiate a failover when the pool becomes "unavail"... that's exactly what I was naively testing by pulling SAS cables out of the HBA... but I got just a "stuck" node, with the other one not even noticing something went wrong... or trying to take over the pool...

So, let me do some cleanup and maybe change the HBAs, and I'll post my results. And start bothering support@... ;)

THANKS!

Regards,
M.Culibrk

July 17, 2019 | Registered Commenter M.Culibrk