Hi everyone, I have a situation with a new client, and I could use some advice. I know very little about them at this point (just an initial call), but they do medical research. They have a big and active database application, and they are looking for a backup solution. It sounds like they are generating 4–6TB of experimental results a month. They are less concerned about recovery time and DR, but terribly concerned about losing even a minute of data. It sounds like they have some pretty strict regulatory requirements, and any missing data would be a red flag.

They have a Datto for their general office and business stuff, but I am not sure just rolling the database server into that backup scheme is going to give them the level of protection they need. I don’t have much experience dealing with big, active databases like this, so I am not sure how best to approach this project. If anyone has any thoughts or suggestions, I would definitely appreciate it.

5 Spice ups

First off, you don’t mention the actual database application involved, and that will dictate your next steps. A tool like Veeam can handle the biggies: Oracle, SQL Server, MySQL, etc. But keep in mind that the granular recovery level they need requires database-aware backup tools.

I am in medical (OTC stuff), and I can tell from your post that validated systems are not your jam (sounds like not theirs either). That’s not a knock, just an observation. Things you MUST have: audit trails, backups, historical records of both, and a verified and tested method to restore from any point within the scope of the product research. Also, if the system you are looking at does not have any of this already, you have literally no provability that the data has not been altered up to this point (if the research has already started, it could be in question). I am hoping you are attempting to “start” this database from scratch after a solution is vetted/tested (maybe they have an evidence trail some other way, though). Those are just a few of the things that need to be considered.

Let us know the database in use, we can make some recommendations that might be usable for you.

I will of course mention that any backup would need to follow 3-2-1-1, right? 3 copies of the data, on at least 2 different media, with 1 copy off site and 1 copy immutable. These are the minimum standards for validated systems.

I hope this helps.

4 Spice ups

If they’re generating 4–6TB/month and can’t afford to lose a minute of data, Datto alone probably won’t cut it, especially for a busy database app.

You’ll want something that’s database-aware with point-in-time recovery. Tools like Veeam, Rubrik, or even native solutions (like pgBackRest for PostgreSQL or transaction log backups for SQL Server) are good places to start.

Also consider:

  • Continuous/incremental backups (not just daily snapshots)
  • Local + cloud/offsite storage for redundancy
  • Automated backup testing so you’re not flying blind
  • Encryption + retention policies that match their compliance needs (HIPAA, etc.)
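To make the point-in-time recovery idea concrete for SQL Server: with a full backup plus transaction log backups in hand, a restore to a specific moment looks roughly like this. (This is a minimal sketch; the database name, file paths, and timestamp are all hypothetical, and a real restore chain would usually involve multiple log files restored in sequence.)

```sql
-- Restore the last full backup, but leave the database
-- in a restoring state so log backups can be applied.
RESTORE DATABASE ResearchDB
    FROM DISK = N'D:\Backups\ResearchDB_full.bak'
    WITH NORECOVERY, REPLACE;

-- Replay transaction log backups up to an exact moment,
-- then bring the database online.
RESTORE LOG ResearchDB
    FROM DISK = N'D:\Backups\ResearchDB_log.trn'
    WITH STOPAT = N'2024-05-01T10:14:00', RECOVERY;
```

The `STOPAT` option is what gives you recovery to an arbitrary point within the log chain, rather than just to the time of the last backup.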

Biggest first step: figure out what DB engine they’re using—everything hinges on that.

2 Spice ups

Thank you, that is all very helpful. I do need to get more specifics from them, so this gives me some better questions to ask. I will get more information on the application they are using, and what the underlying database is. Unfortunately, I am fairly certain that they already have the application installed and running (I need specifics on the hardware as well), so I am going to have to build a solution around what is already there.

2 Spice ups

Thank you, good points. I will get the specifics on the database and their server hardware, and hopefully some direct vendor contacts. I am pretty comfortable with Veeam, so if it will work for this I will probably go that direction. I have never set up a system for automating backups before, so I will need to look into that as well.

1 Spice up

There are a whole bunch of compliance things that might be in play here too, and I don’t want you to get blindsided. As soon as you say healthcare, testing, and data, HIPAA may apply, and at the very least there is PII that might require some special handling, as well as the possibility of FDA oversight if it’s drug/product research, which drives other requirements under 21CFR…

3 Spice ups

Yes, I am worried about that as well. I have done some work that required HIPAA compliance, but this will be the first time I need to follow 21CFR.

1 Spice up

+1 for Veeam or Rubrik

For MSSQL (and similar) database backups, you have three basic options:

  1. Simple recovery model - full backups only
  2. Full recovery model - full backups plus transaction log backups (plus optional differential backups)
  3. Log shipping/mirroring

With the first two, you have to be able to tolerate losing the data from each backup interval. If you take transaction log backups every 15 minutes, then you accept that you can lose up to 15 minutes of data entry/changes.
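For option 2, the moving parts look something like this. (A minimal T-SQL sketch; the database name and file paths are hypothetical, and in practice the log backup statement would be run on a schedule, e.g. every 15 minutes, by SQL Agent or your backup tool rather than by hand.)

```sql
-- Transaction log backups require the full recovery model.
ALTER DATABASE ResearchDB SET RECOVERY FULL;

-- Periodic (e.g. nightly) full backup.
BACKUP DATABASE ResearchDB
    TO DISK = N'D:\Backups\ResearchDB_full.bak'
    WITH CHECKSUM, COMPRESSION;

-- Frequent transaction log backup; each one captures
-- the changes since the previous log backup.
BACKUP LOG ResearchDB
    TO DISK = N'D:\Backups\ResearchDB_log.trn'
    WITH CHECKSUM, COMPRESSION;
```

The interval between log backups is exactly your worst-case data loss, which is why the schedule, not the tool, is the real RPO decision here.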

For the final option, where you cannot tolerate loss of data, you’re looking at: Database Mirroring and Log Shipping (SQL Server) - SQL Server Database Mirroring | Microsoft Learn

You’d technically need fully independent servers and storage to make that a reality (so there are no single points of failure, and no loss of data regardless of what happens).

Everybody likes to think they can’t lose a minute of data, until it comes time to budget that.

1 Spice up

Most of what @phildrew states is correct if the database is something common like SQL Server or MySQL. Other databases have similar capabilities, of course, but each one is unique.

I’d just say find out what the database is, then find a backup tool that works well with it. I’d stipulate to them though that they need to be up front about what compliance is in play.

2 Spice ups

You’re right. I’ve updated my post to reflect MSSQL.

Whatever database you’re dealing with, you’d research what options exist for backup and replication. And also understand the practical implications of those options so you can explain any restrictions to the client.

2 Spice ups

That makes sense — if you’re already comfortable with Veeam, that could definitely work depending on what their setup looks like.

One other angle to think about: if they’re using Google Workspace (Drive, Gmail, etc.) for storing or sharing any of that research data, it might be worth looking at a tool like cubeBackup. I’ve used it in a couple of cases where the client wanted more control over their backups — especially to keep a local or cloud copy outside of Google’s ecosystem.
It’s self-hosted and pretty lightweight, so it might pair nicely with whatever you set up on the server/database side. Just something to keep in mind depending on how their workflows are split.

1 Spice up

If you are talking about DB data recovery only, then there are a few solutions out there, but if your RPO is 1 minute, it is likely going to cost a lot, and you likely need to pin down a more decisive RPO…

What we have is an approx. 20TB SQL database that, sadly, we cannot modify on the SQL Server side (it is managed by the vendor), so we use Veeam Backup & Replication to back up the VM, with Veeam backup copy jobs to other backup repositories as well. Currently it runs a Veeam reverse incremental backup every 6 hours, because VBR needs time to create the synthetic full (although the backup itself takes approx. 30 minutes to process and create the increment).

The DB uses SIOS DataKeeper to sync to a secondary DB, which then runs a SQL dump (literally an incremental SQL backup) every 15 minutes to another data repository.
This does two things…

  1. Only committed data is synced, which prevents corruption propagating from the primary DB to the secondary DB.
  2. The “backup”/data dump does not affect performance of the primary DB.

But do note that all the backup and data repositories are NAS units with at least two 1Gbps NICs and SSDs. You can easily work out how much each Synology NAS with 12 x 4TB SSDs would cost. Some of the newer Synology NAS models have 10Gbps NICs and can also take 8TB SSDs.
Then throw in re-silvering the SSDs every 3-5 years, with a NAS refresh every 7-9 years…

1 Spice up

Thank you, that makes a lot of sense. I was thinking that a combination of backup jobs would be good (frequent RIs and a nightly schedule), but adding a second database to sync to makes a ton of sense. I agree this isn’t going to be a cheap project; I hope they’re ready for that. :grimacing:

But you also need to know what type of sync…

Because there is a saying, “rubbish in, rubbish out.”
In simple terms, if you are going to sync something like an xls file, you need to ensure that somebody did not blank out or corrupt the xls before the sync; otherwise you may end up with a 0KB xls, or an xls that is corrupted on both sides.

If they can’t afford to lose a minute of data, then you’re not looking for backup software; you’re looking for a storage platform with real-time replication at the hardware level to an offsite location. This assumes the database software is set up correctly, cutting transaction logs accordingly, etc. You’d also need real-time-capable bandwidth for real-time replication. That handles a replicated copy of the data; you’d then need backup software on top of it for long-term retention. Hopefully the client knows this is going to be a costly solution to implement.