
Community Forum > Primary grid node stays disconnected - some users can't access shares

Hi all,

although the primary grid node is up and running, the grid management shows its state as "Disconnected". As one result, some users no longer have access to the network shares. In my case this is the backup user, so there are no successful backups! When the backup user accesses the network shares, the error is "Access denied". Other users still have access to the shares. The backup user is the newest local account I created on the QuantaStor grid, and for a few days accessing the shares with this user worked fine.

I can't log on to the CLI of this node. It's not possible to change user settings. Windows can't resolve the users in the security settings of files or folders. It's also not possible to remove the grid node in order to reconnect it to the grid.

After a long testing period all tests of the QuantaStor grid were successful. Now, after going into production, there are big issues! I'm happy that I haven't migrated my production data yet! Currently there are "only" backup and IP camera files on it.

How can I solve this issue?

Kind Regards.

September 1, 2018 | Registered Commenter Stefan Mössner

Hi all,

after switching the grid master to another node with the force option, I was able to remove the disconnected node from the grid. Then I could access the CLI of that node with the qadmin user to reset the password of the "admin" user. After uploading the license to this node I could reconnect it to the grid. Now the grid is complete and working again.

But the backup user still could not access the shares. I had to configure the security settings of the files and folders for this user once again to restore access. Why is QuantaStor losing the access rights? This isn't really reliable!


Kind Regards.

September 1, 2018 | Registered Commenter Stefan Mössner

You did the right thing, Stefan (assigning a new grid master node and resetting the admin password). The other thing you can do is add a "Grid IP" in the "Cluster Resource Management" section. That creates an IP address that floats between appliances for the purpose of auto-electing the master node.

I'm not sure why the Backup user's permissions changed. QuantaStor allows one to modify the top-level ownership of a given share via the 'Network Share Modify' dialog. In a previous version we didn't allow that to be set to root, but I think it was in 4.6.2 that we made it possible to assign ownership to the root user. In this case the owner was the Backup user and then it was changed to a different user? QuantaStor doesn't change share ownership settings without an administrative command being run, so the change must have come either from an ACL change done on one of the client systems or from running Network Share Modify to change the owner or other permission settings. If it happens again and you know the step that caused the permission change, let us know.
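
For reference, the effective top-level ownership and any POSIX ACLs of a share can be checked directly from the console of the node presenting it with the standard Linux tools (the path below is only an example; substitute your share's export path):

ls -ld /export/NAS/backup       # owner, group and mode of the share's top-level directory
getfacl -R /export/NAS/backup   # POSIX ACL entries on the share and its contents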

For our upcoming 4.7 release I will put in a ticket to see if this can be reproduced in some way.
Steve

September 1, 2018 | Registered Commenter Steve

Hi Steve,

I already have a VIF for managing the grid, and it has worked every time. It also worked when I had the issue with the master node.

Regarding the permissions: the ownership is assigned to my local Administrator account and the local Administrators group. I was on holiday and didn't change any permission settings when the issue happened. And the backups had been running fine for a few days before the issue occurred.

Kind Regards

Stefan

September 4, 2018 | Registered Commenter Stefan Mössner

Hi Stefan,
That's very odd; we don't have any logic that would dynamically or automatically change permissions like that without an explicit command. That said, we'll get a QA ticket opened to see if we can reproduce it. The Backup user account is an account you created within the QuantaStor system, correct? Do you know if the VIF for the gluster interface changed nodes during that time?
Best,
Steve

September 4, 2018 | Registered Commenter Steve

Hi Steve,

yes, I created the backup account as a local user within the management UI of the QuantaStor grid.

There were no error messages regarding the failover of the gluster VIF. But the master node had an issue with a full swap partition before it disconnected itself from the grid. I don't know where the gluster VIF was running before and after this issue.

Kind Regards

Stefan

September 4, 2018 | Registered Commenter Stefan Mössner

Hi Steve,

today I got a swap error message again. This time it's the second grid node. Here's the message:

OSNEXUS QuantaStor Storage System Event/Alert

System Information
______________________________________________
Name : quantastor2
Serial # : 67fe48c1
Version : 4.6.2.017
IP Address(es): 192.168.0.32, 192.168.1.32, 192.168.0.40
______________________________________________

Event/Alert Detail
______________________________________________
Name : System Swap Partion Configuration Non-Optimal
Severity : Warning
Time stamp : Thu Sep 6 19:12:29 2018
Description :
Detected swap utilization level over '60%'. System swap size '4.0GiB' configuration may not be optimial, you may want to contact Customer Support for an appliance configuration review.

The backups are still running fine.

Kind Regards

Stefan

September 7, 2018 | Registered Commenter Stefan Mössner

Hi Stefan,
Yes, you can add an additional swap device or more RAM. In general you want a swap device that's the same size as or larger than the amount of physical RAM. Your boot disk should also be 200GB or larger. How big is the boot drive?
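
As a rough sketch (the file name and size here are only examples, and on an appliance it's safest to check with support before changing the boot device layout), an additional swap file can be added from the console like this:

fallocate -l 4G /swapfile2          # reserve 4 GiB on the boot disk
chmod 600 /swapfile2                # swap files must not be world-readable
mkswap /swapfile2                   # format it as swap
swapon /swapfile2                   # enable it immediately
echo '/swapfile2 none swap sw 0 0' >> /etc/fstab   # make it persistent across reboots
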
Best,
Steve

September 8, 2018 | Registered Commenter Steve

Hi Steve,

I have 3 virtual nodes, each of them with 4 GB RAM. The swap size is also 4 GB. The boot disk has 20 GB and the 2 data disks have 1.7 TB each. Here is the output of df -h on node QuantaStor2:

Filesystem      Size  Used  Avail Use% Mounted on
udev            2.0G   12K   2.0G   1% /dev
tmpfs           394M  3.7M   390M   1% /run
/dev/sda1        16G  6.2G   8.8G  42% /
none            4.0K     0   4.0K   0% /sys/fs/cgroup
none            5.0M     0   5.0M   0% /run/lock
none            2.0G     0   2.0G   0% /run/shm
none            100M     0   100M   0% /run/user
/dev/md126      1.7T  124G   1.6T   8% /mnt/storage-pools/qs-a4b928a9-9dff-c6c0-f2a6-a74a3142e947
/dev/md127      1.7T  166G   1.6T  10% /mnt/storage-pools/qs-ae5dd208-9882-9743-25c9-ec948f3327b1
localhost:NAS   6.8T  580G   6.3T   9% /export/NAS

To me it doesn't look like the boot disk is too small, and I don't know why I should configure 200 GB for it. Since it's a virtual machine I can increase the boot disk size. But how can I increase the swap partition and the other partitions?

And here is the output of top on node QuantaStor2:

top - 21:29:13 up 11 days, 12:11, 1 user, load average: 1.42, 1.46, 1.53
Tasks: 481 total, 1 running, 480 sleeping, 0 stopped, 0 zombie
%Cpu(s): 21.9 us, 25.7 sy, 0.0 ni, 50.1 id, 0.0 wa, 0.0 hi, 2.4 si, 0.0 st
KiB Mem: 4028380 total, 3773176 used, 255204 free, 12584 buffers
KiB Swap: 4192252 total, 659244 used, 3533008 free. 89280 cached Mem

The swap usage looks good: 659244 KiB used out of 4192252 KiB is only about 16%, nowhere near 60%. Is there an issue with the monitoring of the swap usage?

I have to say that I have now lost the Gluster VIF, see http://forum.osnexus.org/forum/post/2719986. I can't connect to Gluster via the VIF's IP address. To keep the backups of my environment running, I changed the DNS entry from the VIF IP address to a physical IP address. I can't re-create the VIF because something seems to be wrong with corosync.

Kind Regards

Stefan

September 10, 2018 | Registered Commenter Stefan Mössner

Hi Steve,

how do we proceed with this issue? At the moment there's a lot of manual work to keep my backups working!

Kind Regards

Stefan

September 12, 2018 | Registered Commenter Stefan Mössner

Hi Steve,

in the meantime I have had intermittent performance issues with my backups on the QuantaStor environment: sometimes the performance is really good and sometimes it's very poor! Is this because of GlusterFS? The backup files are between 2 GB and 80 GB - image files of my client systems with incremental backups, made with Veeam Agent for Windows.

Now I don't have access to the network share anymore. Even the admins can't access the files. The share is visible, but after logging in I get an error message. I had to restart the node that currently presents the file services to get access working again. The gv... interface isn't available anymore.

After a lot of issues with this environment I'm thinking about changing something. In the past I didn't have such issues with the EMC UnityVSA: no performance issues and no problems accessing the shares.

So, I have the following questions:

1. When will version 4.7 of QuantaStor be available?
2. Is GlusterFS the right choice for a file server and as a backup target with large files?
3. Would a Ceph-based environment be the better choice? I know that QuantaStor can do Ceph, too. I would then have to mount the volumes via iSCSI on my Windows system to share the files and folders on the network, and could use the local users and groups of that system for the ACLs. But then I would have a single point of failure for accessing the network share.

What do you think?

Thank You

Stefan

October 2, 2018 | Registered Commenter Stefan Mössner

Hi Stefan,
There are various gluster tunables which may help, like the ones here for the smb.conf globals section. There are also a number of performance tunables within gluster itself that may be helpful; it's best to do some Google searches on "gluster large file performance". I'd recommend keeping an eye on the CPU load as that's a common performance bottleneck.
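
As a rough starting point (the values are only illustrative and need testing against your workload; the volume name NAS is taken from your df output above), that kind of tuning looks like this:

gluster volume set NAS performance.cache-size 256MB               # larger read cache for big files
gluster volume set NAS performance.io-thread-count 32             # more I/O threads per brick
gluster volume set NAS performance.write-behind-window-size 4MB   # buffer more writes before flushing

On the Samba side the usual smb.conf [global] candidates are options like aio read size, aio write size and use sendfile, though QuantaStor manages smb.conf, so manual changes there may be overwritten.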

It looks like you've got some networking configuration issues. With regard to the :gv virtual interface, the best thing to do is to delete the Site Cluster by going to the tab marked "Cluster Resource Management", then recreate it, and then recreate the gluster VIF. I'm guessing that you've added or removed a node which was a member of the cluster, so it needs to be recreated.
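
Before doing that, it's worth sanity-checking the cluster membership from the console; assuming the stock corosync/pacemaker tools are present on the node, that would look something like:

corosync-quorumtool -s   # quorum state and the member nodes corosync currently sees
crm_mon -1               # one-shot view of the cluster resources (e.g. the VIFs) and their current owners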

To your questions:

1. QuantaStor v4.7 is due out soon; we're shooting for Friday this week, but it looks like it may push out to next Wednesday.

2. The scale-out nature of GlusterFS makes it somewhat complex to deploy. Network switches, front-end/back-end ports, MTU settings, firmware, hardware build, etc. all need to be checked to have a successful deployment. We do all that as part of the hardware solution design and later the pre-go-live check phase with every paid subscription deployment. You're not getting the benefit of that with the Community Edition, so you're running into various pitfalls, and unfortunately those are not easy to diagnose over the forum. In short, GlusterFS requires much more work to get it tuned for various workloads, and we always recommend ZFS-based configurations first. Unless there's a specific need for the scale-out features GlusterFS brings to the table, a ZFS-based Storage Pool is the better way to go: faster, simpler to set up, requires little to no tuning, and needs less hardware, with fewer servers and network ports.

3. If your backup product can talk S3 then Ceph is a decent option, but using Ceph over iSCSI/FC and then formatting the drive and all that wouldn't be my preference for an archive solution. Archive storage should be file/NAS based rather than block based unless there's a specific use case requiring block. As an example, if you map a block device to Windows and later resize it to make it larger, Windows can run into problems with the block size setting of the NTFS filesystem.

QuantaStor v4.7 has a number of performance updates, and we did improve the VIF recovery logic to handle cases where a VIF gets orphaned because the gluster volume it's associated with was deleted. Anyhow, that shouldn't be affecting you, and recreating the Site Cluster should sort out the gluster VIF problem you're seeing.

October 2, 2018 | Registered Commenter Steve

Hello Steve,

thank you for your detailed answer. If I'm right, QuantaStor isn't using CTDB but Corosync instead, so some of the tunables don't apply.

My 3 nodes are running as virtual machines on VMware ESXi 6.7, and each node has two network interfaces: one for user access and one for the cluster. Each node has two 2 TB hard disk drives which are attached to the VM as virtual disks (VMDKs). Each drive is configured as a separate storage pool and a separate Gluster brick.
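
In case it helps, this is how the resulting brick layout can be checked on any node with the standard Gluster CLI (NAS is the volume name from my df output earlier in the thread):

gluster volume info NAS     # volume type, bricks and configured options
gluster volume status NAS   # brick processes, ports and online state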

My questions regarding Ceph came from searching the internet for known issues with GlusterFS storage performance. There were a lot of hints that Ceph is the better choice when mounting it via iSCSI on a Windows machine. ZFS isn't a good solution for me. First, I can't get a scale-out NAS with it. Second, I have bad experience with it: when a volume ran out of space during a nightly backup I wasn't able to free up space, and once the storage was completely full I couldn't access the data anymore.

I want a scale-out NAS for my personal data files and for backing up my client systems. I also have 2 IP cameras which save their recordings to the NAS. As a result I have a lot of small files (office documents) as well as some very large files, i.e. backup images of client systems and video files. This is what I did in the past with the EMC UnityVSA, very successfully and without fluctuating storage or network performance. But the UnityVSA has some other disadvantages, so it isn't the right storage solution for me anymore.

After restarting the node that currently presents the file services, the performance is very good again! It's much faster than the EMC UnityVSA ever was. But the performance fluctuates a lot; it's not as stable as I knew it from the UnityVSA. When the nightly backup stops, I have to restart it manually during the day and then it works again without issues. When the backup stops, the error messages always point to issues with QuantaStor: the share isn't available for a long time, so the retries of the backup tool run into a timeout.

I'm using FreeFileSync to back up my files from the QuantaStor NAS to a Windows client with a network share on two disks protected by Storage Spaces. This is so I can access the files without any complicated procedures when the NAS is unexpectedly down. FreeFileSync even versions the synced files when newer versions of the same file exist, giving me a real backup at the speed of file syncing.

I will wait for 4.7 and then recreate the cluster and the gv... interface. Hopefully this will bring more stability and consistent network performance.

Kind Regards

Stefan

October 3, 2018 | Registered Commenter Stefan Mössner

Hello Steve,

I can see that you have now released QuantaStor 4.6.3. But what about 4.7?

And after some reboots of QuantaStor and my ESXi hosts for updates (ESXi 6.7 Update 1, BIOS updates of my Dell servers, some Linux updates on the QuantaStor nodes, but not the kernel), I found that the performance is very good after a reboot but decreases day by day until there are timeouts when synchronizing large backup files.

Kind Regards

Stefan

October 18, 2018 | Registered Commenter Stefan Mössner

Hello Steve,

as a follow-up to my last post: after the upgrade to 4.6.3 it is now possible again to manually move the gv... and the sv… interfaces between the nodes. The gv... interface is back in the scale-out file storage tab of the QuantaStor Web Manager. I didn't recreate the QuantaStor grid; it looks like the grid repaired itself through the upgrade of the QuantaStor nodes.

In the Cluster Resource Management tab corosync shows a healthy state for all nodes. But there's now an orphaned fourth node. It also shows a healthy state because it's the old ID of the node I had to re-add to the grid in August. The entry is shown with its ID instead of the node's hostname. How can I delete this orphaned entry?

Thank you

Stefan

October 18, 2018 | Registered Commenter Stefan Mössner