Backups and snapshots are fundamental to enterprise data protection, but are often seen as interchangeable, or used incorrectly.
In this article, we’ll define backups and snapshots and look at how best to use them together.
Backups essential, and rule for longer RPOs
In short, backups are copies, come with an overhead, and are taken relatively infrequently, generally once a day. They provide a copy of the selected datasets from which customers can recover data to varying degrees of granularity.
It is quite possible for backups to be built from what are effectively snapshots – that is, an incrementally constructed record of deltas (see below for more detail) – but what results are compressed and/or deduplicated copies that are retained for potentially lengthy periods of time.
Backups take time to run, involve considerable processing overhead and are therefore taken outside the busiest production hours. But they are kept for months and even years, and provide the ability to recover files that may subsequently have been deleted or corrupted, or that simply need to be re-accessed.
They are the gold standard, copper-bottomed means of protecting data in the enterprise. But because they generally run only once a day, backups fall short on tight recovery point objectives (RPOs) compared with snapshots, hence the need for the two technologies to work in tandem.
Snapshots for short RPOs, but delete them often
Snapshots are taken more frequently – every 30 or 60 minutes, for example – and barely intrude on production processes. They give the ability to rapidly roll back to previous versions of a file at numerous points in time.
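To put the RPO difference into numbers, here is a minimal sketch that works out how much recent work would be lost if a failure struck at a given moment. The schedule, the dates and the function name `data_loss_window` are illustrative assumptions, not taken from any backup or storage product.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def data_loss_window(protection_points, failure_time):
    """Return the time between the latest protection point and the failure.

    Returns None if no protection point exists at or before the failure.
    """
    ordered = sorted(protection_points)
    idx = bisect_right(ordered, failure_time)
    if idx == 0:
        return None
    return failure_time - ordered[idx - 1]


# Hypothetical schedule: one nightly backup at 01:00,
# then a snapshot every 30 minutes for the following 24 hours.
backup_time = datetime(2024, 1, 15, 1, 0)
backups = [backup_time]
snapshots = [backup_time + timedelta(minutes=30 * i) for i in range(48)]

# Hypothetical failure late in the working day.
failure = datetime(2024, 1, 15, 16, 50)

print("Loss window with backup only:", data_loss_window(backups, failure))    # 15:50:00
print("Loss window with snapshots:  ", data_loss_window(snapshots, failure))  # 0:20:00
```

With these assumed figures, the nightly backup alone would leave almost 16 hours of changes unprotected, while the most recent snapshot is only 20 minutes old.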
Snapshots are not copies. Fundamentally, they are a record of changes in state in the blocks and files in a unit of storage (file, volume, drive, etc). Often, snapshots are a feature of NAS or SAN storage products and are held on that storage. That means they take up what could be relatively expensive capacity and if there is an outage on that storage, you lose access to recent snapshots too.
Snapshots build on an original or parent copy, with each one showing which blocks and/or files existed and where, at the time it was taken. When rolling back to previous versions, a copy of the unit of storage is changed to a state that reflects the snapshot, by adding, removing and moving blocks, etc.
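The idea of a snapshot as a record of which blocks existed, and of roll-back as adding, removing and changing blocks to match that record, can be illustrated with a toy in-memory model. This is only a sketch: the `Volume` class and its methods are hypothetical, and real storage arrays use their own mechanisms rather than anything this simple.

```python
class Volume:
    """Toy model of a unit of storage with snapshot and roll-back."""

    def __init__(self):
        self.blocks = {}     # block number -> data (immutable bytes)
        self.snapshots = {}  # snapshot name -> block map at that time

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def delete(self, block_no):
        self.blocks.pop(block_no, None)

    def snapshot(self, name):
        # Record which blocks exist and what each one points to.
        # Only references are stored, so block data is not duplicated.
        self.snapshots[name] = dict(self.blocks)

    def rollback(self, name):
        target = self.snapshots[name]
        # Remove blocks that did not exist when the snapshot was taken.
        for block_no in list(self.blocks):
            if block_no not in target:
                del self.blocks[block_no]
        # Restore blocks that were changed or deleted since the snapshot.
        for block_no, data in target.items():
            self.blocks[block_no] = data


vol = Volume()
vol.write(0, b"alpha")
vol.write(1, b"beta")
vol.snapshot("09:00")

# Changes made after the snapshot: overwrite, add and delete.
vol.write(1, b"BETA")
vol.write(2, b"gamma")
vol.delete(0)

vol.rollback("09:00")
print(vol.blocks)  # {1: b'beta', 0: b'alpha'}
```

Note that the snapshot lives inside the same `Volume` object as the live blocks, which mirrors the point above: lose the storage and you lose the snapshots with it.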
So, snapshots are not copies, although you can create copies from them. They don’t take up much space individually, but their total volume can grow, and that brings a processing overhead as they are rebuilt, so it is good practice to limit the number of snapshots that are retained.
Best practice is not to keep snapshots from further back than your most recent full backup. That way, you should have access to any data created or changed since then, and you get a tighter RPO than is possible with backups alone. Also, you won’t end up with a tottering tower of snapshots that would be complex and CPU-hungry to rebuild.
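That retention rule is simple enough to express in a few lines. The sketch below splits a list of snapshot timestamps into those to keep and those to expire, using the time of the last full backup as the cut-off. The function name `split_by_retention` and the sample timestamps are assumptions made for illustration. In practice, the deletion itself would be carried out with the storage product's own tools.

```python
from datetime import datetime


def split_by_retention(snapshot_times, last_full_backup):
    """Split snapshots into (keep, expire) around the last full backup.

    Snapshots taken at or after the last full backup are kept.
    Older ones are already covered by the backup and can be expired.
    """
    keep = sorted(t for t in snapshot_times if t >= last_full_backup)
    expire = sorted(t for t in snapshot_times if t < last_full_backup)
    return keep, expire


# Hypothetical example: full backup completed at 01:00 on 15 January.
last_full_backup = datetime(2024, 1, 15, 1, 0)
snapshot_times = [
    datetime(2024, 1, 14, 22, 0),
    datetime(2024, 1, 14, 23, 0),
    datetime(2024, 1, 15, 2, 0),
    datetime(2024, 1, 15, 3, 0),
]

keep, expire = split_by_retention(snapshot_times, last_full_backup)
print("Keep:  ", [t.isoformat() for t in keep])
print("Expire:", [t.isoformat() for t in expire])
```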
To sum up, backups provide the ability to restore at longer RPOs, often quickly and in a granular fashion, down to file level.
Snapshots allow fast roll-back to more recent points in time than backups can offer.