
Community Forum > iSCSI Timeout / Network issues

Today we went live with our QST server and had almost all of our VMs (about 10) go into read-only (RO) mode. Neither XenCenter nor the QST panel showed any errors during the period when the VMs went RO. All VMs went RO at the same time and were spread across multiple XenServer hosts. We then reviewed the Ubuntu logs and found the following:

Nov 14 12:33:22 quantastor1 kernel: [1424797.701263] iscsi-scst: Negotiated parameters: InitialR2T No, ImmediateData Yes, MaxConnections 1, MaxRecvDataSegmentLength 1048576, MaxXmitDataSegmentLength 131072,

Nov 14 12:33:22 quantastor1 kernel: [1424797.701269] iscsi-scst: MaxBurstLength 1048576, FirstBurstLength 262144, DefaultTime2Wait 2, DefaultTime2Retain 0,

Nov 14 12:33:22 quantastor1 kernel: [1424797.701272] iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,

Nov 14 12:33:22 quantastor1 kernel: [1424797.701276] iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048

Nov 14 12:33:22 quantastor1 kernel: [1424797.701281] iscsi-scst: Target parameters set for session 25300001c3d0200: QueuedCommands 32, Response timeout 30, Nop-In interval 30

Nov 14 12:33:22 quantastor1 kernel: [1424797.701295] scst: Processing thread 29247493522_7 (PID 25672) started

The quantastor1 kernel seems to say that it was unable to handle a kernel paging request for the iSCSI device, and that the I/O operations burst the max limit. It also looks like the kernel was unable to move to a recovery point, so the recovery level was marked as 0.

Has anyone seen these issues, and what is the best way to correct them? We had to run fsck on 10 VMs to get them back online.

Note: All VMs have their own Storage Volume as well.

November 14, 2011 | Unregistered Commenter Jason

Jason, we'd like to get on a conference call at your earliest convenience tomorrow. The log information you're showing, with the MaxBurstLength settings and so on, looks normal.
All of the XenServer nodes going RO at the same time is indicative of a network issue. There's a timeout on the iSCSI link after which the VMs go RO, so we'd like to see the XenServer logs.
It's probably unrelated, but we'd also like to verify the configuration of /etc/multipath.conf on your XenServer hosts; we need to do some more checking on that.
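For readers chasing similar symptoms: on a Linux initiator the usual knob behind that "timeout on the iSCSI link" is `node.session.timeo.replacement_timeout` in `/etc/iscsi/iscsid.conf` (XenServer's iSCSI stack uses open-iscsi, so the same key applies). A minimal sketch for reading the configured value out of such a file; the helper name is ours, and nothing here changes any settings:

```shell
# Sketch: report the open-iscsi replacement timeout from an iscsid.conf-style
# file. The key name is standard open-iscsi syntax; the helper is hypothetical.
replacement_timeout_of() {
    # Print the value after "node.session.timeo.replacement_timeout =", if set.
    sed -n 's/^node\.session\.timeo\.replacement_timeout *= *//p' "$1"
}

# On a live host you would point it at the real file (open-iscsi's default
# is 120 seconds when the key is absent):
# replacement_timeout_of /etc/iscsi/iscsid.conf
```

A shorter timeout makes path failures surface faster; a longer one rides out brief network blips at the cost of I/O stalling longer before the guest sees an error.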

Here is our documentation on the correct XenServer configuration settings:

XenServer multipath settings
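For orientation, the shape of a /etc/multipath.conf looks roughly like the sketch below. The section and keyword names are standard dm-multipath syntax, but every value shown is a placeholder, not QuantaStor's recommended setting; use the documentation linked above for the actual values:

```
# /etc/multipath.conf -- structural sketch only; all values are placeholders,
# not the settings from the linked QuantaStor documentation.
defaults {
    user_friendly_names yes
    polling_interval    10        # seconds between path health checks (placeholder)
}
devices {
    device {
        vendor               "EXAMPLE"   # placeholder; match your target's vendor string
        product              "EXAMPLE"   # placeholder
        path_grouping_policy multibus
        no_path_retry        12          # retries before failing I/O when all paths drop (placeholder)
    }
}
```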

Also, could you log in at the console and run this command, which will send us all the system logs so that we can triage:
sudo qs_bug_report.sh jason

Very sorry for the inconvenience; looking forward to meeting with you soon and getting this resolved,
-Steve

November 15, 2011 | Registered Commenter Steve

Just thought I'd write to share our findings with the community after working with Jason & SK on this ticket today. There were a few things we were looking for. One was low disk space in the storage pool; note that if you're using thin provisioning, it's generally best to keep 20-30% of your storage pool space free so that the thin-provisioned storage volumes have adequate room to grow. The second was to verify the multipath configuration settings, and the last was to look for network issues that could perhaps have been remedied by adjusting the iSCSI initiator timeout settings.
In short, our findings were that there were some XenServer multipath configuration issues and that the system was quite low on disk space, so without a 100% definite root cause, our conclusion is that it was likely one of the two. If you're a XenServer user, be sure you have our latest multipath.conf settings, as these do not ship with XenServer by default; we're going to work with them to get that addressed in an upcoming XenServer release.
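The 20-30% free-space guideline above is easy to check mechanically. A small sketch, with a helper name and threshold of our choosing (the 20% floor mirrors the low end of the guideline); on a real system you would feed it the capacity numbers from your pool status output:

```shell
# Sketch: flag a storage pool whose free space falls below the ~20% floor
# of the 20-30% guideline. The function name and message wording are ours.
pool_free_ok() {
    # $1 = total pool capacity, $2 = free space (any consistent unit)
    total=$1
    free=$2
    pct=$(( free * 100 / total ))
    if [ "$pct" -lt 20 ]; then
        echo "WARN: only ${pct}% free; thin volumes may run out of room to grow"
        return 1
    fi
    echo "OK: ${pct}% free"
}

# Example with hypothetical numbers: a 1000 GB pool with 300 GB free passes.
# pool_free_ok 1000 300
```

Running short of pool space is a classic way for thin-provisioned iSCSI volumes to start failing writes, which the guests then see as their filesystems remounting read-only.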
Thanks,
-Steve

November 15, 2011 | Registered Commenter Steve