One of the great things about NetApp® storage is that all the capabilities to protect your critical data are closely integrated with NetApp® hardware and Data ONTAP. Often, all that’s needed is a license key. You never have to buy a specialized appliance or do a complicated software installation to add functionality, and all our data protection solutions take advantage of built-in data management capabilities.
The fundamentals of NetApp integrated data protection were explored in a previous Tech OnTap article. In this article, I want to dig into some details of our replication technologies. Most of the important elements of NetApp data protection, such as volume SnapMirror®, qtree SnapMirror, SnapVault®, and MetroCluster™, use either mirroring or replication. Understanding how these technologies work and how they differ from each other can make it much easier to choose the best data protection strategy. I begin by discussing the various technologies and then provide some guidance on choosing the right options for your requirements.
NetApp Replication Options
Tech OnTap has published quite a bit on SnapMirror, SnapVault, and MetroCluster over the years. However, there are some key capabilities of these products—and some important distinctions between them—that to my knowledge have never been fully explored in a single article. I start with SnapMirror and then explain the other two in relation to it. (Don’t be too alarmed if the SnapMirror explanation seems overly lengthy. You won’t have to read a similar-length description to understand SnapVault and MetroCluster.) I also include several comparative tables that should help answer any remaining questions you have.
Everyone probably knows that SnapMirror is primarily intended to create mirrors in remote locations for disaster recovery. What’s less well known is that there are actually two SnapMirror operating modes.
Volume SnapMirror operates at the physical-block level. It replicates the contents of an entire volume and all volume attributes verbatim from the source (primary) volume to the target (secondary) volume. As a result, the target storage system must be running a version of Data ONTAP that is the same as or later than that on the source. If deduplication or NetApp data compression (added in Data ONTAP 8.0.1) is running on the primary system, the destination volume inherits those savings because it is an identical copy, and the same savings reduce the amount of data sent over the WAN.
Qtree SnapMirror replicates individual qtrees. Because qtrees are subsets of a volume, qtree SnapMirror operates at a logical level. You can’t simply replicate a qtree verbatim, because some of the necessary volume-level bookkeeping information for the qtree would be missing on the target system.
Because replication is happening at a logical level, there are a few important differences versus volume SnapMirror. First, qtree SnapMirror does not inherit deduplication savings. Again, this makes sense if you think about it in the context of the source and the target. On the source, a qtree can contain a deduplicated block that is just a pointer to a block that lies outside the qtree. That block obviously won’t exist on the target, and therefore the block must be replicated with the qtree rather than just the pointer. In this scenario, qtree SnapMirror is less network and capacity efficient than volume SnapMirror.
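To make the difference concrete, here is a minimal sketch in Python. It is a toy model, not how Data ONTAP represents data: the `volume` dictionary, the file names, and both helper functions are hypothetical, and files are modeled simply as lists of physical block IDs, with deduplicated files sharing IDs.

```python
# Hypothetical model: each file is a list of physical block IDs.
# Deduplicated files point at the same block IDs.
volume = {
    "qtree_a/report.doc": [1, 2, 3],
    "qtree_a/report_copy.doc": [1, 2, 3],  # fully deduplicated against report.doc
    "qtree_b/archive.doc": [1, 2, 4],      # shares blocks 1 and 2 with qtree_a
}


def physical_blocks_sent(files):
    """Physical-block replication: each unique block is sent once."""
    return len({block for blocks in files.values() for block in blocks})


def logical_blocks_sent(files, qtree):
    """Logical replication of one qtree: every block a file references is sent."""
    prefix = qtree + "/"
    return sum(len(blocks) for path, blocks in files.items() if path.startswith(prefix))


volume_transfer = physical_blocks_sent(volume)
qtree_transfer = (
    logical_blocks_sent(volume, "qtree_a") + logical_blocks_sent(volume, "qtree_b")
)
print(volume_transfer, qtree_transfer)  # 4 blocks versus 9 blocks in this toy model
```

In this model the physical-block approach moves four blocks while the two logical qtree transfers move nine, which is the efficiency gap described above.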
By default, qtree SnapMirror replicates only the last created Snapshot copy, and so it maintains an asymmetrical number of Snapshot copies at source and target locations. (Volume SnapMirror by definition has the same Snapshot copies on both source and target.) Qtree SnapMirror maintains only the pair of common Snapshot copies necessary to perform replication updates. In other words, qtree SnapMirror does not have Snapshot retention capabilities.
Both forms of SnapMirror begin with a baseline copy in which all data in the volume or qtree is replicated from source to target. Once the baseline is completed, replication occurs on a regular basis. Volume SnapMirror supports asynchronous, semi-synchronous, and synchronous replication, while qtree SnapMirror supports only asynchronous replication.
In async mode, Snapshot copies of the volume or qtree are created periodically on the source. Only blocks that have changed or have been newly created since the last replication cycle are transferred to the target, making this method very efficient in terms of storage system overhead and network bandwidth.
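The incremental idea can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the SnapMirror implementation: each Snapshot copy is modeled as a dictionary of block number to contents, and `changed_blocks` and `apply_update` are hypothetical helper names.

```python
def changed_blocks(previous, current):
    """Return blocks that are new or different in `current` relative to `previous`."""
    return {
        block_no: data
        for block_no, data in current.items()
        if previous.get(block_no) != data
    }


def apply_update(target, delta, current):
    """Apply the delta on the target, then drop blocks freed on the source."""
    target.update(delta)
    for block_no in list(target):
        if block_no not in current:
            del target[block_no]
    return target


baseline = {0: b"boot", 1: b"alpha", 2: b"beta"}  # copy sent by the baseline transfer
latest = {0: b"boot", 1: b"ALPHA", 3: b"gamma"}   # newest copy on the source

target = dict(baseline)                   # target state after the baseline
delta = changed_blocks(baseline, latest)  # only blocks 1 and 3 travel over the network
apply_update(target, delta, latest)
```

Block 0 is unchanged and is never resent; only the modified block 1 and the new block 3 are transferred, after which the target matches the latest copy.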
Sync mode sends updates from the source to the destination as they occur, rather than according to a predetermined schedule. This helps ensure that data written on the source system is protected on the destination even if the entire source system fails. NVLOG and Consistency Point (CP) forwarding are used to keep the target completely up to date. NVLOG forwarding enables data from the write log that is normally cached in NVRAM on a NetApp storage system to be synchronized with the target. Consistency Point forwarding enables the on-disk file system images to be kept synchronized.
Semi-sync mode differs from sync mode in two ways. Writes to the source aren’t required to wait for acknowledgement from the target before they are committed and acknowledged, and NVLOG forwarding is not used. These two changes speed up application response with only a very small hit in terms of achievable recovery point objective (RPO).
You can learn more about all of these modes by referring to TR-3446: SnapMirror Async Overview and Best Practices Guide and TR-3326: SnapMirror Sync and SnapMirror Semi-Sync Overview and Design Considerations.
Finally, one of the key things to know about SnapMirror is that both volume and qtree SnapMirror result in targets that can be made writable. In other words, if a failure occurs that affects the source or primary systems, you can fail over operations and start writing to the target. Once the failure has been corrected, you can do a failback resync to copy delta changes back to the source and restore normal operation. This capability is a key differentiator versus SnapVault.
SnapVault is primarily intended for disk-to-disk backup. Like async SnapMirror, SnapVault leverages NetApp Snapshot technology to back up and restore systems at the block level. Similarly, SnapVault identifies and copies only the changed blocks on a system (not changed files) to secondary storage. This not only increases performance by limiting the amount of data transferred during backup and restore operations, it limits the capacity needed to store backups, allowing you to perform backups more frequently if needed.
In terms of its basic operation, SnapVault is very similar to qtree SnapMirror—it performs replication on a logical basis at the qtree level. Like qtree SnapMirror, therefore, it’s not an exact replica of the source volume and doesn’t inherit deduplication or the data compression state from the source. (You can run deduplication and/or data compression on the target as you would with any other NetApp volume.)
In addition, you can’t make a SnapVault volume writable (for immediate recovery) as you can with SnapMirror; as a result, recovery times with SnapVault may be much longer than with SnapMirror if you transfer a lot of data across a network. If you also own SnapMirror, it is possible to make a SnapVault volume writable, but keep in mind that SnapVault is one-directional; it doesn’t have failback resync to bring the source back to currency.
The key weapons in the SnapVault arsenal—because it operates at a logical level—are Snapshot retention and Snapshot coalescing. You can retain as many Snapshot copies as you want (up to the limit of 255 per volume) on a SnapVault volume and expire Snapshot copies automatically according to a schedule that you set. Coalescing allows you to run multiple SnapVault processes from multiple sources to a single target and then create a single Snapshot copy on the target that includes all the different sources. This reduces the number of saved Snapshot copies; if you run deduplication on the target system, you can then deduplicate identical blocks across all qtrees in the backup.
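As a rough illustration of schedule-based retention, the following Python sketch keeps the newest N copies for a schedule and expires the oldest automatically. The `RetentionSchedule` class and the `sv_nightly` name are hypothetical and are not SnapVault commands or APIs; the only figure taken from the text is the 255-copy per-volume limit.

```python
from collections import deque

MAX_SNAPSHOTS_PER_VOLUME = 255  # per-volume limit cited in the article


class RetentionSchedule:
    """Keep the newest `keep` Snapshot copies for one schedule (toy model)."""

    def __init__(self, name, keep):
        # A single schedule can never retain more than the per-volume limit.
        if not 1 <= keep <= MAX_SNAPSHOTS_PER_VOLUME:
            raise ValueError("keep must be between 1 and 255")
        self.name = name
        self.copies = deque(maxlen=keep)  # the oldest copy is expired automatically

    def take(self, label):
        """Record a new copy and return the copies currently retained."""
        self.copies.append(f"{self.name}.{label}")
        return list(self.copies)


nightly = RetentionSchedule("sv_nightly", keep=3)
for day in ["mon", "tue", "wed", "thu", "fri"]:
    retained = nightly.take(day)
```

After five nightly runs with a retention count of three, only the Wednesday, Thursday, and Friday copies remain. A real deployment would also need to keep the total across all schedules on a volume under the limit, which this sketch does not check.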
You can learn more about all aspects of SnapVault from the SnapVault Best Practices Guide.
The NetApp solution for continuous data availability is MetroCluster. This solution is an outlier relative to SnapMirror and SnapVault because it works in a very different way, but conceptually it’s very easy to understand. As the name implies, MetroCluster provides “stretch” clustering. It lets you take a standard NetApp HA pair and separate the nodes by up to 100 km. MetroCluster uses a fully mirrored active-active configuration that maintains two complete copies of all mirrored data—one on each side of the cluster. These copies are called plexes and are continually and synchronously updated each time Data ONTAP writes data to disk.
Each controller owns storage volumes (plexes) on both nodes. This not only allows deduplication to occur on both nodes, it allows read operations to be split across both disk sets, which increases read performance by up to 80%. You can read more about MetroCluster in a recent Tech OnTap case study or you can watch a complete video explanation.
Which Option Should I Choose?
Tables 1 and 2 in the previous section are designed to help you choose the best replication option for your particular needs. There are a few considerations that can help you choose from the various technologies discussed above. The first—and most obvious—question to ask yourself is whether what you need is backup or DR.
If what you’re going for is backup, most people find that a regular Snapshot schedule on primary storage (hourly is common), possibly combined with a nightly SnapVault copy to secondary storage (either local or remote), meets their backup needs. Most file restores can be satisfied from Snapshot copies on primary storage, while SnapVault provides the ability to reach further back in time, plus the ability to do big restores in the event of more serious failures.
See the sidebar to view a video on NetApp Syncsort Integrated Backup, which combines the benefits of Syncsort data management and NetApp replication for a variety of important application environments.
For protection from site-wide disasters and to enable business continuity, you’ll probably want to choose between MetroCluster and SnapMirror. By far the most popular alternative in terms of the number of deployments is volume SnapMirror with asynchronous replication. People tend to choose this because it offers simplicity and great economy with efficient use of storage and network resources. NetApp has invested a lot of development effort in SnapMirror, creating valuable features such as bandwidth throttling, network compression, and integration with the SnapManager suite of products for application integration.
Both qtree and volume SnapMirror can achieve recovery time objectives (RTOs) ranging from seconds to minutes and recovery point objectives (RPOs) as low as one minute (this requires replicating data every minute), although NetApp does not typically recommend asynchronously replicating every minute. For recovery points between one and three minutes, SnapMirror in semi-sync mode is a better choice. (If you aren’t familiar with RPO and RTO, see the sidebar.)
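A quick back-of-the-envelope calculation shows why very short async intervals are hard to sustain. The sketch below is an assumption-laden model, not a NetApp formula: it treats the worst-case exposure as the replication interval plus the time an update takes to transfer, which is slightly more conservative than equating RPO with the interval alone. The function name and the sample numbers are hypothetical.

```python
def worst_case_rpo_minutes(interval_min, transfer_min):
    """Upper bound on data loss for schedule-based replication (simplified model).

    A write made just after an update starts waits one full interval for the
    next update, and is safe on the target only once that transfer completes.
    """
    if interval_min <= 0 or transfer_min < 0:
        raise ValueError("interval must be positive and transfer time non-negative")
    if transfer_min > interval_min:
        raise ValueError("transfers that outlast the interval fall behind schedule")
    return interval_min + transfer_min


hourly = worst_case_rpo_minutes(60, 10)
every_minute = worst_case_rpo_minutes(1, 0.5)
```

With an hourly schedule and a 10-minute transfer, the exposure in this model is 70 minutes. At a one-minute interval the transfer time becomes a large fraction of the interval, and any update that takes longer than a minute means the schedule can no longer keep up.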
If you need a more aggressive RPO than async SnapMirror can achieve, you can choose between MetroCluster and synchronous SnapMirror. Keep in mind that synchronous solutions typically require much greater network bandwidth and specialized network equipment to implement, which makes them significantly more expensive.
MetroCluster is the preferred solution for distances up to 100 km, since it offers continuous data availability and automatic failover and recovery. SnapMirror Sync doubles the supported range to 200 km, and SnapMirror Semi-Sync can reach further than that should you need the lowest possible RPO over a longer distance.
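The guidance in this section can be condensed into a small decision helper. This Python sketch only restates the rules of thumb given above (backup versus DR, and the 100 km and 200 km distance figures); the function name, its parameters, and the returned strings are hypothetical, and real deployments involve many more factors, such as bandwidth, cost, and licensing.

```python
def suggest_replication(goal, distance_km=0, needs_rpo_below_async=False):
    """Map a protection goal to the option this article recommends first."""
    if goal == "backup":
        return "Snapshot schedule on primary plus SnapVault"
    if goal != "dr":
        raise ValueError("goal must be 'backup' or 'dr'")
    if not needs_rpo_below_async:
        return "volume SnapMirror (async)"
    if distance_km <= 100:
        return "MetroCluster"
    if distance_km <= 200:
        return "SnapMirror Sync"
    return "SnapMirror Semi-Sync"


print(suggest_replication("backup"))
print(suggest_replication("dr", distance_km=500))
print(suggest_replication("dr", distance_km=150, needs_rpo_below_async=True))
```

Treat the output as a starting point for discussion rather than a final answer; the corner cases described next show where people deliberately depart from these defaults.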
The approaches I outline above should cover most of the situations out there, but, naturally, there are always corner cases. Some people use SnapMirror for backup, usually because they want the ability to quickly and easily make a backup volume writable should that become necessary. Conversely, others use SnapVault for DR because it lets them recover to any point in time. SnapVault volumes cannot be made writable by SnapVault alone, but, as I mentioned (although I haven’t explained how), this is possible using SnapVault and SnapMirror.
Naturally, many NetApp users implement a combination of the solutions I discuss in this article to cover both backup and DR needs. A fairly common scenario is SnapMirror for critical volumes to a remote site combined with a regular SnapVault schedule at the remote site for backup purposes. Some sites even deploy a combination of MetroCluster, SnapMirror, and SnapVault to address data protection needs.
You can read more about advanced configurations and all the topics I cover in this article, plus topics I didn’t have space for such as data protection planning, in the NetApp Data Protection Handbook. You can also check out the other resources I mention in this article for more details. NetApp has developed a lot of expertise with all sorts of data protection solutions, so don’t hesitate to go online to the NetApp community or ask your NetApp team if you need help making the right decisions.