Axigen Migration / DRBD Highly Available Single-Tier Solution

Good day,

Hypothetical question only.

Rather than using the DRBD Highly Available Single-Tier Solution, can I use Axigen Migration continuously instead?

Regards,
Jay

Hello Jay,

I do not see how the Automatic Migration process could be used to replace DRBD for HA purposes, which is why I would like to ask you to provide more details of your plan.

Looking forward to your response.

BR,
Ioan

@indreias

The scenario is:

  • I currently have a running / deployed (main) standalone Axigen server (no downtime of any kind is possible).
  • Without any changes to the schema, I want to build another standalone Axigen server and start replicating the container, even incrementally in real time.
  • I would connect the newly built Axigen server (slave) to the running one for continuous migration.
  • So in case of downtime on the main server, I would just plug in the slave as the main.

Hope you understand what I am trying to do.
Current database size (1.1 - 1.3 TB)

Regards,
Jay

I don’t think that’s going to work. To do a full, proper sync of the data, the mail server needs to be in a stopped state while you copy everything.

If you were to do that and then bring up the slave server, you’d run the risk of some corruption, with certain emails / calendars / contacts potentially missing or corrupt. You might also run the risk of having X number of email clients start re-downloading all their data.

Your best bet would be to have some spam filter sitting in front that caches incoming emails until the main / slave servers are back online. You might even be able to get a caching server that can deliver emails to both servers simultaneously.

External email → Firewall → 3rd party spam filter → Delivers to Server 1 and Server 2 at the same time.

Your only downfall would be that on the primary mail server you’d need to set up a filter that copies everyone’s sent emails to their Sent folders on the slave server so a copy exists there. Might not be ideal.

The other option would be to copy your full data to the slave server:
1.) nightly, stop the primary server
2.) run your rsync to the slave server
3.) bring up your primary server
(a minimal wrapper for these steps is sketched a little further below)

It should sync fairly fast in reality. Your data store and mine are about the same size. For me, syncing my first full copy over 10 GbE took approx. 5 hours using this command:

rsync -aHx --numeric-ids --progress --delete -e 'ssh -T -c aes128-gcm@openssh.com -o Compression=no -x' /var/opt/axigen/ root@10.10.10.230:/var/opt/axigen

Subsequent syncs during the day differ in length, depending on which DB files in the axigen folder get touched. At the end of the night, without stopping the mail service, it takes between 1 and 3 hours.
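Putting that nightly option together, here is a minimal sketch only (it assumes the axigen service is managed by systemd and reuses the rsync command above with the same placeholder host and paths; adjust everything to your own environment):

#!/bin/bash
# Sketch of the nightly "stop -> rsync -> start" option; not a tested drop-in script.

systemctl stop axigen.service

# Full copy of the (now quiescent) store to the slave; host and paths are placeholders.
rsync -aHx --numeric-ids --delete \
    -e 'ssh -T -c aes128-gcm@openssh.com -o Compression=no -x' \
    /var/opt/axigen/ root@10.10.10.230:/var/opt/axigen

# Bring the primary back up regardless of the rsync result.
systemctl start axigen.service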

I’m contemplating doing just what you describe, but currently all my servers are set up as VMs on high-availability storage and servers, so if any of the main host servers fails, the mail server will fail over to another one within a few minutes and continue to run.

The biggest issue I have would be a complete back-end failure of a piece of hardware, which is where this exact process would excel. The problem is that the axigen service needs to be completely stopped before you can do a final successful sync.

For my existing backups, I use the Axigen backup script writing to a volume mounted on my server from a different storage array that’s set up with LVM, and then I have Borg run backups to another storage array where I keep backups basically forever.

@david

Thanks for the insights. This is the process I currently have implemented: stopping the server at 00:10 every night and doing the backup via script and crond; it takes approx. 1 hour to finish.

I’m just trying to figure out another, more efficient way without any downtime, as our clients run 24/7 with critical operations all day long.

Thanks again for your nice and detailed answer.

Regards,
Jay

Hello @Jay

Thank you for the provided details.

Most probably, when you used the word “migration”, I was tricked into thinking that you wanted to use the Automatic Migration process included in Axigen (as described here).

Now, as mentioned by @david, syncing the Axigen storage files to a remote location while the local Axigen process is running will most probably end in disaster, as there is no way to ensure the consistency of the file set obtained this way.

We have recently introduced “storage read lock/unlock” CLI commands, so you can bring some / all domains into read-only mode and get the chance to trigger a “storage snapshot” (which is usually very fast).

[root@mail ~]# /opt/axigen/scripts/run-cli.py 'config server|help' | grep -i 'read lock'
READLOCK Storages [matchingURI <pattern>] - Acquire read lock on all or some storage units registered to AXIGEN; an optional pattern used for filtering based on storage URI can be provided
READUNLOCK Storages [matchingURI <pattern>] - Release read lock on all or some storage units registered to AXIGEN; an optional pattern used for filtering based on storage URI can be provided

Having a storage snapshot (we have heard from some customers that they used LVM snapshots) will allow you to transfer the data to the remote Data Center where your “slave” Axigen instance is located. Unfortunately, we cannot provide any further details here, as we have not been involved in any such setups.
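To make the idea concrete, a minimal sketch of such a sequence could look like the following (this assumes the CLI accepts the same “context|command” form used in the help call above, and the LVM volume / snapshot names are placeholders, not Axigen defaults):

# Put the storages into read-only mode, take a quick LVM snapshot, then release the lock.
/opt/axigen/scripts/run-cli.py 'config server|READLOCK Storages'
lvcreate -s -n axigen_snap -L 4G /dev/vg_axigen/lv_data
/opt/axigen/scripts/run-cli.py 'config server|READUNLOCK Storages'

# The snapshot can then be mounted and transferred to the slave site at leisure
# (e.g. with rsync) and removed with lvremove afterwards.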

The option of using the backup data obtained via FUSE / FTP for a full restore** is acceptable only for quite small domains, as it would be painfully slow to recover a 1 TB domain that way.

** as these backup methods are intended for “targeted” restore operations, like restoring an entire mail folder which the end-user accidentally deleted

Now, if your storage layer does not offer support for creating snapshots, and in case you have access to the account passwords (very improbable, but who knows), then the CLI “migrate” command (from the domain context, to be run periodically from the “slave” system) may be worth evaluating.

HTH,
Ioan

@indreias Ioan,

Thanks for the details as well. My current storage does not support creating snapshots.

This is what I was trying to ask from the beginning, which is why I posed it only as a hypothetical question.

" in case you have access to the accounts password (very improbable but who knows) than the CLI “migrate” command (from domain context, to be ran periodically from the “slave” system) may be interesting to be evaluated."

Can anyone from your side test this scenario?

I cannot set up another test machine myself due to limited resources.

Thanks; it will be helpful not only for me but for the whole Axigen community.

Regards,
Jay

Hello @Jay

I have tested this scenario and can confirm that using the migrate CLI command will transfer any new messages from the remote Axigen server (in your case the “master”) to the local one (the “slave”).

But let’s consider the following possible scenario:
1/ we run the initial migrate command and the folders + messages from the remote server are transferred to the local server
2/ on the remote server, 10 new messages are received and delivered into the Inbox folder
3/ we run the migrate command again and all 10 messages are picked up and transferred from the remote to the local server (into the Inbox folder)
4/ on the remote server, we decide to move 3 messages from Inbox to another folder (let’s say Spam)
5/ we run the migrate command again and those 3 messages are picked up and transferred from the remote to the local server (into the Spam folder)
6/ as the migrate command will only pick up new messages (no deletions are detected), you may end up on the local server with duplicate messages (not in the same folder but in different ones, as a result of the end-user organizing messages on the remote server).

Another possible downside of this method is that the slave Axigen server will not preserve the same IDs for internal objects (like folder and message IDs) as the master Axigen server. This means that an IMAP client, for example, when connecting to the slave server, will most probably trigger a full resync of the account (meaning that all messages will be fetched again from the slave Axigen server).

Because of all the above, this method is far from a perfect solution, which is why I mentioned that the evaluation should be done by yourself, as you may find other negative aspects as well.

HTH,
Ioan

@indreias

Thanks Ioan, this is valuable feedback. I’ll try it myself if I find the time to build and test this scenario.

In any case, I’ll continue to use the backup deployment I currently have.

Thanks to you both, @david and @indreias, for all the insights.

Regards,
Jay

I don’t know about the CLI migration. That would pose a huge issue for me, as I maintain multiple domains and I don’t have everyone’s email account passwords, nor do I want them, due to security concerns. The other problem I see is that if a user changes their password, their emails will never be moved to the backup server.

I think the best option here is as follows:

1.) stop axigen process
2.) snapshot the data store <<-- I understand your back end doesn’t support snapshots right now
3.) start axigen process
4.) mount snapshot
5.) rsync data to slave server
5.a.) I suppose you could also copy the snapshot itself to the remote server, but that’s a whole different kettle of fish in my opinion and would take more time than it’s worth compared to just copying delta changes.
6.) unmount snapshot
7.) delete snapshot (potentially)

The service would be down for no more than a few seconds during the pause and snapshot of the data.

I’m also open to suggestions on this from other users on this forum. Backing up Axigen has always been troublesome for me. The backup feature works, but isn’t good for quick flip-overs in case of emergency. The backup script works well for restoring individual accounts / domains, but is time consuming.

This would allow you to keep a zombie server sitting idle with the latest copy of the data, refreshed every x hours, so in case of disaster you could quickly bring it online and continue running without too much of a hiccup.

For me, I also maintain an email archive server, so every email is siphoned off to an archive where users can log in and restore any emails that might be missing, if that’s the case.

Basically, I want to be a lazy admin and let my users log in, search, and retrieve their missing / deleted emails to their inbox without calling me to hunt and peck through backups.

I’m not the best at scripting, but this should do the trick for anyone else using LVM and wanting to get full copies of the data.

The entire process of stopping and starting the axigen service takes only seconds, so in theory you could run this type of backup nightly or a few times a day during quieter periods to keep more recent copies of your data: overnight, lunchtime, end of day, etc.

This is perfect for a backup server sitting somewhere else to ensure continuity of service in case of disaster.
Happy to take feedback on this as well, in case people have better ideas on how this can be done. The important thing is that the copy comes from a snapshot taken while the service was stopped, so you can bring up the disaster server knowing the data is consistent, even though the actual copy runs while the server is up.

I would add this as a cronjob on your production server. This was built against Debian 10, so your results may vary if you’re using a different distro.

########################################################################################
#!/bin/bash

#You should only need to edit these variables:
#########################################
snapshot=colonysnap
axigenstore=/dev/vg_axigen/lv_data
axigensnap=/dev/vg_axigen/$snapshot
temp=/mnt/snapshot
username=root
remote=10.10.10.230
remotestore=/var/opt/axigen
#########################################
######DO NOT EDIT BELOW THIS LINE######

#Stop mail server
echo "stopping mail server" >> /var/log/syslog
/bin/systemctl stop axigen.service
echo "mail server stopped" >> /var/log/syslog

#Create snapshot
echo "creating temporary snapshot" >> /var/log/syslog
lvcreate -L 4G -s -n $snapshot $axigenstore
echo "snapshot created successfully" >> /var/log/syslog

#Start mail server
echo "starting mail server" >> /var/log/syslog
/bin/systemctl start axigen.service
echo "mail server started" >> /var/log/syslog

#Mount Snapshot
echo "making directory" >> /var/log/syslog
if [ -d "$temp" ]; then rm -Rf "$temp"; fi
mkdir "$temp"
echo "mounting snapshot" >> /var/log/syslog
mount $axigensnap "$temp"
echo "snapshot volumes mounted successfully" >> /var/log/syslog

#Start copy of snapshot data (note the trailing slash on "$temp"/ so only the
#contents of the snapshot land in $remotestore, not the mount directory itself)
echo "starting copy to remote server" >> /var/log/syslog
rsync -aHx --numeric-ids --progress --delete -e 'ssh -T -c aes128-gcm@openssh.com -o Compression=no -x' "$temp"/ $username@$remote:$remotestore
echo "data copy successfully completed" >> /var/log/syslog

#Unmount snapshot
echo "unmounting snapshot" >> /var/log/syslog
umount "$temp"
rm -rf "$temp"

#Remove Snapshot
echo "removing snapshot" >> /var/log/syslog
lvremove -y $axigensnap
echo "done" >> /var/log/syslog
########################################################################################
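To schedule it, a hypothetical crontab entry could look like this (the script path and time are assumptions; pick whatever quiet window suits your environment):

# nightly at 00:10, hypothetical location of the script above
10 0 * * * /usr/local/sbin/axigen-snapshot-sync.sh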

Hello Community,
I tried this solution, but unfortunately Axigen touches the xx.hsf files continuously, even when the md5sum of a file has not changed, which always leads to a full backup. Can you suggest a solution for this behaviour? Why is Axigen touching the files anyway?

@Marcel

Sad to say, there is no other way around it, unless you want to do an FTP / FUSE backup.
But making a directory backup of “/var/opt/axigen/” is the safest way to go.

Jay

@Jay - Yes, but I was searching for a solution to the same question you had initially. I’ll have to fall back to DRBD and DRBD Proxy then. Thanks.

@Marcel

Yes, this will be the best solution if you are just deploying your servers: real-time replication using DRBD.

Hello Marcel,

Have you checked the situation after setting the storage in read-only mode (using the “storage read lock/unlock” CLI commands, as explained in post #6)?

BR,
Ioan

Hey @indreias, just checked this solution.

READLOCK
SNAPSHOT
READUNLOCK
RSYNC

Does not work. Even seconds after rsync finishes the job, the next backup is a full one again. The file times change.

Hello Marcel,

This is normal - sorry that I did not understand the first time.

For the sake of completeness, could you post the rsync options you are using?

Thx,
Ioan

@indreias

-aHx --numeric-ids --delete

syncing to a mounted sshfs drive

Hello Marcel,

I have to confess that I’m not an rsync expert :innocent:

From my knowledge, which files get selected for syncing is one thing, and exactly how much of each file is transferred is another.

When we say that rsync is doing a full transfer each time, we understand that the entire content of all files is transferred from source to destination, instead of only the changed parts of the selected files being transferred (as computed by the delta-transfer algorithm).

Now, could you please let us know whether adding the --checksum option makes any improvement?

Also, I should mention that in some of our own scripts we are using the -P --inplace --no-whole-file options, and maybe you could consider using them as well.

I would also suggest adding --stats in order to have access, at the end, to some useful information (like “Total bytes sent”) that may help confirm whether the transfer was a “full” or a “partial” one.
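For reference, a sketch of what the combined invocation could look like, merging the options Marcel listed with the suggestions above (both paths are placeholders: the source stands in for a mounted snapshot, the destination for the sshfs mount point; --no-whole-file matters here because rsync otherwise defaults to whole-file copies when the destination looks local):

rsync -aHx --numeric-ids --delete --checksum -P --inplace --no-whole-file --stats \
    /mnt/snapshot/ /mnt/backup-sshfs/axigen/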

HTH,
Ioan