In this NetApp training tutorial, we will dicuss the NetApp SnapMirror Engine. Scroll down for the video and also text tutorial.
NetApp SnapMirror Engine Video Tutorial
I have never seen any training course in Netapp the way Neil conducted this and the way he breaks down every concept is like watching a movie…I am feeling so motivated and pumped up to retain all this knowledge…thank you!
The NetApp SnapMirror Engine
The SnapMirror engine is used to replicate data from a source volume to a destination volume. It's used for these features:
- Load Sharing Mirrors
- Data Protection Mirrors
I'll be talking about each of those different features in this post.
With Clustered ONTAP, the volume is the unit of replication in the SnapMirror engine. In 7-Mode you could replicate at either the Qtree or the volume level, but Cluster Mode only does it at the volume level.
When you run a replication using the SnapMirror engine, the source volume will be a read-write copy and the destination will be a read-only copy. This is to ensure that you have a single, consistent copy of the data. If you were able to write to both locations, the two wouldn't be the same.
The initial replication from the source to destination volume is a complete baseline transfer. It copies all of the data from the source to the destination volume. Once that has completed, all of the following replications are incremental only.
The system uses source volume Snapshot copies to update the destination volumes. Updates can be performed manually (on demand) or, more typically, they can run automatically based on a configured schedule.
The mirror copies are updated asynchronously. The minimum time between replications is one minute. You can get synchronous replication in Clustered ONTAP by using MetroCluster, but SnapMirror is asynchronous.
Synchronous vs Asynchronous Replication
The difference between synchronous and asynchronous replication is shown in the diagram below.
We have synchronous replication at the top. A client sends in a write request and it gets written to the source cluster. This change then gets replicated over to the destination cluster. It's only when the destination cluster has acknowledged the change back to the source that the final acknowledgement is sent back to the client.
With asynchronous replication on the other hand, the client sends in the write request and the source storage system immediately sends an acknowledgement back. The source will then replicate the data to the final destination storage system when it is next scheduled to do so, and another acknowledgement will come back from there.
Recovery Point Objective (RPO)
Your Recovery Point Objective (RPO) is the maximum amount of data you could lose if you had to fail over from your main site to a Disaster Recovery site.
If you're using synchronous replication, your recovery point objective (RPO) is going to basically be zero. If you have to failover from your primary site to your Disaster Recovery site you're not going to lose any data.
If you're using asynchronous replication and you have a failure of the primary site, the amount of data you lose when you failover to the secondary site will depend on the amount of data that had been written to the primary since the last replication. Let’s say your system was scheduled to replicate once every ten minutes. The worst case scenario would be that you have a failure just before you were about to do the next replication. You could lose up to ten minutes' worth of writes that happened at the primary site since the last scheduled replication, so your RPO would be 10 minutes.
You’re probably thinking "well then it that case I’ll always use synchronous replication", but there’s a trade-off. This is because the acknowledgement with synchronous replication isn’t sent back to the client until it's been written to both sites. Unless there’s minimal delay between the two sites, this can break applications on the client if its acknowledgement doesn't get back to it in time. For synchronous replication, you will typically need a really fast network between the two sites. That can be prohibitively expensive, in which case asynchronous replication would be a better solution for you.
How Replication Works
Let’s look at how replication works with SnapMirror. To begin with, the initial baseline transfer is completed. Then, a snapshot copy of all data on the source volume is created. This is then transferred to the destination volume. After that, you can run manual and/or scheduled updates. Typically you'll have scheduled updates. You can just run manual ones if you prefer, or you can do a combination of both.
When you do an update, a new snapshot copy of the source is taken. The current SnapMirror snapshot copy is compared with the previous SnapMirror snapshot copy, and then only changes are synchronized from the source to the destination incrementally. If any files have changed, it doesn’t replicate the entire file. Replication is done at the block level, so it's only the changed blocks that get sent across. It's very efficient after the initial baseline transfer. Whenever you do a new replication, only incremental changes at the block level get replicated across.
Storage Efficiency and Volume Moves
The SnapMirror engine is compatible with storage efficiency and volume moves. Deduplication and compression savings are replicated from the source to the destination volume. The data is not inflated or decompressed for the transfer. If you've got storage efficiency turned on at the source site, the space savings are going to be automatically replicated to the destination site as well. This also helps save on your network bandwidth and the amount of time it takes to do the transfer.
SnapMirror source or destination volumes can be moved to another aggregate in the cluster without breaking the SnapMirror relationships. If you want to move a volume because your aggregate is getting too full or you want to move it to higher or lower performance disks, you can. Any SnapMirror relationships will be retained even after you perform the move.
NetApp Load Sharing Mirrors
Let's now talk about the three different features that use the SnapMirror engine. They are Load Sharing (LS) mirrors, Data Protection (DP) mirrors, and SnapVault. I'll cover Load Sharing mirrors first. This will be a quick overview because they’re going to be covered in more detail in a later post.
Load Sharing mirrors are mirror copies of FlexVol volumes which provide redundancy and load balancing. They provide load balancing for read traffic only, not for write. Write requests always go to the one source volume so that you have that single, consistent copy of the data.
Load Sharing mirror destination volumes are always in the same cluster as the source volume. This is unlike DP mirrors and SnapVault. Data Protection Mirrors and SnapVault can perform their function either within the same cluster or to a different cluster, whereas Load Sharing mirrors only work within the same cluster.
Load Sharing mirrors are automatically mounted into the namespace with the same path as the source volume, and they provide redundancy for read access with no administrator intervention required. If the source volume goes down, clients can still get read-only access the data without requiring any action from the administrator. (Getting write access back if the original source volume cannot be recovered does require the administrator to manually promote one of the mirror copies to be the new source).
NetApp Data Protection Mirrors
Next we have Data Protection mirrors (DP mirrors). With DP mirrors you can replicate a source volume to a destination volume in the same or in a different cluster. Most often it'll be a different cluster. They can be used for the following reasons:
- Replicate data between volumes in different clusters for Disaster Recovery. In this scenario you have a main primary site and a Disaster Recovery site. The way that you replicate data from the main site to the DR site is using SnapMirror DP mirrors. With DP mirrors, intervention is required to failover to the DR site. This differs from our Load Sharing mirrors. This will require either manual intervention, or you can configure software to do this.
- Provide load balancing for read access across different sites. We always have only one writable copy of the data to keep it consistent, but if we have a read only data set which is accessed by clients in multiple locations, we can keep separate mirrored copies of the data close to the clients to provide load balancing and better performance.
- Data migration between clusters or SVMs. This is for when you want to move some data from cluster A to cluster B or between SVMs in the same cluster. If you want to move data within the same SVM and you're moving the entire volume, you can use a volume copy. Volume copy only works within the same SVM, so if you want to move data between different SVMs you would use SnapMirror DP mirrors.
- Replicate data to a single centralised tape backup location. If you have clusters in multiple locations this removes the need to backup each one locally. (SnapVault can also do this, but uses disk to disk rather than disk to tape backups).
When we talk about ‘SnapMirror’ in general, we're talking about DP mirrors. Load Sharing mirrors, Data Protection mirrors and SnapVault all use the SnapMirror engine but if we refer to ‘SnapMirror’, we’re usually talking about DP mirrors.
As mentioned earlier (and unlike Load Sharing mirrors), DP mirror copies are not automatically mounted into the namespace and implicitly accessed by clients. DP mirror copies can be mounted through a junction into the namespace by the administrator if they need to be accessed at the destination.
Data Protection mirrors and SnapVault are both licensed features but Load Sharing mirrors are not. You don't need a license to implement Load Sharing mirrors.
The main functionality of DP mirrors is as a Disaster Recovery solution. If your primary site is lost, you can make the destination volumes writable in your DR site. This requires a failover which breaks the mirror relationship.
A FlexClone copy can also be taken on the destination without doing a failover. This will give us a separate writable copy without disrupting SnapMirror operations.
Our DP mirrors maintain two snapshots on the destination volume. When a replication occurs, the oldest snapshot is deleted. When it’s time for the next replication, the new snapshot is compared to the previous one to determine the incremental changes to replicate across.
DP mirrors keep the source and destination volumes in the same state with some lag as determined by your replication schedule. The source and destination are kept in sync with each other. If data is corrupted in the source volume it will therefore be corrupted in the destination volume as well. DP mirrors provide Disaster Recovery, but they don't provide long term backups. The feature won’t help if you developed a problem in the main site, it then replicated across to your DR site, and you don't have backups going further back in time. It's a Disaster Recovery solution, not a backup solution.
That leads us on to the other feature which uses the SnapMirror engine, which is SnapVault. SnapVault is ONTAP's long term disk-to-disk backup solution. It has the same functionality as traditional tape backups but is much faster, more convenient, and requires less storage space.
Data is replicated from the source volume to a destination volume on a centralized backup cluster. Like our Data Protection mirrors, SnapVault is also a licensed feature. SnapMirror DP mirrors and SnapVault have separate licenses.
Unlike DP mirrors, SnapVault can retain multiple snapshots as backups over a long time period. Data can be restored to the original source volume or to a different volume.
The destination volume cannot be made writable so it's a backup, not a DR solution. Data Protection mirrors are our Disaster Recovery solution, SnapVault is our long term disk-to-disk backup solution. Separate FlexClone copies can, however, be made of the snapshot backups at the destination site to get a separate writable copy of the data there.
LS Mirror, DP Mirror and SnapVault Summary
Let's summarize the similarities and differences between the features that use the SnapMirror engine and where we would use each one.
Load Sharing mirrors, Data Protection mirrors, and SnapVault all use the SnapMirror engine to replicate data between volumes. They all do their replication in the same way.
Load Sharing mirrors are used for redundancy and load balancing within the same cluster for read-only volumes. No license is required.
The main function of DP mirrors is as a Disaster Recovery solution between clusters. They can also be used to move data between SVMs or clusters.
SnapVault is our long term disk-to-disk backup solution.
DP Mirror and SnapVault Compatibility
With regard to hardware compatibility, the source and destination nodes can be different models of controller. Also, you can use inexpensive SATA drives on the destination volume while your main site can still be running SSD or SAS disks. When you're replicating across to a SnapMirror Disaster Recovery location or to a SnapVault backup location, you can use cheaper drives in the remote location to achieve cost savings.
SnapMirror became available in Clustered ONTAP version 8.1, and SnapVault became available in version 8.2. At that point, the version of ONTAP that was running on the destination volume had to be the same or a later version than the one running on the source volume.
In ONTAP 8.3, a version flexible SnapMirror became available which breaks that limitation and allows different versions to be run on both the source and destination. This will become more useful as further versions are released.
ONTAP 7-Mode volumes cannot be used in Clustered ONTAP systems, and vice versa. However, if you are migrating from 7-Mode to Cluster Mode, you can use the 7-Mode Transition Tool (7MTT) which uses the SnapMirror engine to migrate your volumes across. That's the one exception where you can actually use SnapMirror to replicate from 7-Mode to Cluster Mode. You can't have normal SnapMirror relationships between 7-Mode and Cluster Mode systems.
Text by Alex Papas.
Alex Papas has been working with Data Center technologies for the last 20 years. His first job was in local government, since then he has worked in areas such as the Building sector, Finance, Education and IT Consulting. Currently he is the Network Lead for Costa, one of the largest agricultural companies in Australia. The project he’s working on right now involves upgrading a VMware environment running on NetApp storage with migration to a hybrid cloud DR solution. When he’s not knee deep in technology you can find Alex performing with his band 2am