In this NetApp training tutorial, I explain how to perform a NetApp ONTAP 9 upgrade. The information applies whether you’re upgrading from ONTAP 8 or between versions in ONTAP 9. Scroll down for the video and the text tutorial. This tutorial covers:
- Where to find the documentation you should check before doing the upgrade.
- Manual checks that you should do before the NetApp ONTAP upgrade.
- How to actually perform the upgrade.
- Caveats about whether the process is going to be disruptive or not.
- Firmware upgrades.
You’ll see a live demonstration of an upgrade in Part 2.
NetApp ONTAP 9 Upgrades Part 1: Pre-Upgrade Tasks Video Tutorial
NetApp ONTAP 9 Upgrade Documentation
Doing an upgrade is a fairly major process on your storage system, so you want to make sure that you’ve got everything lined up before you go ahead with it. Check all of the relevant documentation. The first thing you should do is use Upgrade Advisor in Active IQ to generate an Update Plan. The Update Plan comes in either PDF or Excel format and takes you through the upgrade process for your particular system, step by step.
Manual Checks before ANDU – Automatic Non-Disruptive Upgrade
The recommended way of doing the upgrade is using ANDU, the Automatic Non-Disruptive Upgrade. ANDU uses a wizard in System Manager and most of the process is done automatically for you, so it’s a very easy way to perform the NetApp ONTAP upgrade.
ANDU automates most of the process for you, but there are a few things that you need to check manually first. These checks are documented in the Update Plan from Upgrade Advisor, and you can also find them in the Express Guide PDF.
Review the Release Notes for the new ONTAP version, and check that there aren’t any issues in there which would be relevant to your particular environment.
Confirm the hardware platform you’re using is supported. If you’re using an older hardware platform, it’s possible that there’s a certain version of ONTAP that it will support up to but not the latest version.
Check the switches you’re using (cluster and network switches) are supported and have the correct configuration.
If you’re using SAN protocols on your system, make sure the clients are using an operating system and version of NetApp Host Utilities which is supported on the new NetApp ONTAP version. You might need to upgrade the Host Utilities software on the clients as well as upgrading ONTAP on the NetApp storage system.
Verify you can upgrade to the target NetApp ONTAP version from the current version. You may need to stage the upgrade. For example, if you’re currently at ONTAP version 9.2 and you want to upgrade to version 9.4, you need to upgrade from 9.2 to 9.3 first, and then from 9.3 you can upgrade to 9.4. It’s not possible to go straight from ONTAP 9.2 to ONTAP 9.4.
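To make the staging idea concrete, here is a minimal Python sketch that lists the releases you would install in order. It assumes the rule implied by the 9.2 to 9.4 example above (one ONTAP 9.x release per hop); the function name `staged_upgrade_path` is my own, and the real supported paths for your system are the ones shown in Upgrade Advisor and the upgrade documentation, which may differ from this simple rule.

```python
from typing import List


def staged_upgrade_path(current: str, target: str) -> List[str]:
    """List the releases to install, in order.

    Assumption (from the 9.2 -> 9.4 example only): each hop moves
    exactly one ONTAP 9.x release forward. Real supported paths vary.
    """
    cur_major, cur_minor = (int(part) for part in current.split("."))
    tgt_major, tgt_minor = (int(part) for part in target.split("."))
    if (cur_major, tgt_major) != (9, 9):
        raise ValueError("this sketch only handles ONTAP 9.x to 9.x")
    if tgt_minor <= cur_minor:
        raise ValueError("target must be newer than the current version")
    # Every release after the current one, up to and including the target.
    return [f"9.{minor}" for minor in range(cur_minor + 1, tgt_minor + 1)]


print(staged_upgrade_path("9.2", "9.4"))  # ['9.3', '9.4']
```

Each entry in the returned list is a separate upgrade, with its own pre-upgrade checks.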
Verify the cluster does not exceed system limits for your platform. For example, make sure you don’t have too many snapshots.
Verify the CPU and disk utilisation is below 50%. High Availability is used when upgrading to a new version. While one node is rebooting to load the new ONTAP version, its HA partner temporarily takes ownership of its disks. While one of the nodes is down, the other node is taking care of all the storage, so it’s doing double the normal work. Because of that, you don’t want your CPU or disk utilisation to be above 50%, which would put too much load on the HA partner.
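As a rough illustration of that check, the sketch below flags any node at or above the 50% figure. The node names and utilisation numbers are made-up sample data, and the function name `nodes_over_threshold` is hypothetical; in practice you would read the real figures from your own monitoring tools.

```python
from typing import Dict, List

THRESHOLD_PERCENT = 50.0  # the figure given in the text above


def nodes_over_threshold(
    utilisation: Dict[str, Dict[str, float]],
    threshold: float = THRESHOLD_PERCENT,
) -> List[str]:
    """Return the nodes whose CPU or disk utilisation is not below the threshold."""
    return sorted(
        name
        for name, stats in utilisation.items()
        if stats["cpu"] >= threshold or stats["disk"] >= threshold
    )


# Hypothetical sample figures, in percent.
sample = {
    "node1": {"cpu": 35.0, "disk": 20.0},
    "node2": {"cpu": 62.0, "disk": 41.0},
}

print(nodes_over_threshold(sample))  # ['node2']
```

An empty list means every node has the headroom to carry its partner’s workload during takeover.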
Suspend any jobs such as SnapMirror or backups until after the upgrade is complete.
Lastly, download the ONTAP software image from the NetApp website and copy it to an HTTP or FTP server, from where it’s going to be downloaded onto the storage system. Starting in ONTAP 9.4, you don’t have to use an external HTTP or FTP server. You can download the software image to your laptop, and browse directly there from the ANDU upgrade wizard.
ANDU Automatic Non-Disruptive Upgrade
As mentioned earlier, the normal way to perform the upgrade is by using ANDU, the Automatic Non-Disruptive Upgrade. There are three stages as you go through the wizard in System Manager.
The first stage is ‘Select’. At this stage the ONTAP software image is uploaded to the cluster by the administrator and selected. You should have already gone to the NetApp website, downloaded the new ONTAP software image and copied it to an HTTP or FTP server. Select it in the first stage in the wizard then click Next.
The next stage is the ‘Validate’ stage. At this stage the System Manager wizard automatically validates cluster health to verify that the cluster is ready to be updated. If there are issues in your cluster, you don’t want to be trying to upgrade it at that time. If the system doesn’t pass the validate stage, the wizard won’t let you move on with the upgrade. The validate stage gives you a full report of what the issues are and tells you the remedial action that you need to take. You then fix those issues and come back to try the upgrade again.
The last stage is ‘Upgrade’, where the upgrade is automatically performed. So ANDU is a very simple wizard, just three steps which are basically clicking next, next, next, and then the system will do most of the work for you. You’ll see a demo in the next post.
Rolling vs Batch Upgrades
There are two types of upgrade that can be performed: a rolling upgrade or a batch upgrade. With a rolling upgrade, the nodes are upgraded one at a time. When a node is being upgraded, it is taken offline, upgraded and then rebooted into the new ONTAP version. While that is happening, its HA partner takes over its storage. When it reboots with the new ONTAP version it takes control of its storage back.
The process is then repeated on the partner node, and then each HA pair is upgraded like this one at a time in sequence, until all nodes in the cluster are completed. It takes about half an hour for each node to be upgraded.
The other option is the batch upgrade. With a batch upgrade, the cluster is split into two batches of multiple HA pairs. Half of the nodes are upgraded at the same time in the first batch, and then their partners, and then that is repeated on the second batch. So it’s actually split into four parts. Half of the first batch is upgraded first, and then the second half of the first batch, then half of the second batch, and then the second half of the second batch.
The batch upgrade is only supported on clusters of eight or more nodes, but when you do have eight or more nodes it means that the upgrade is completed more quickly than if you were doing a rolling upgrade.
Rolling Upgrade Example
Let’s have a look at that with a diagram. The first type was a rolling upgrade where the nodes are upgraded one at a time. In our example diagram below we’ve got an eight-node cluster which is made up of four HA pairs.
We upgrade node one. While that is happening, node two (its HA partner) takes control of its storage.
After node one is done, node two is next. While node two is being upgraded, node one will take care of its storage. When node two is completed, nodes one and two have completed their upgrade to the new ONTAP version and go back to functioning as normal under the new version.
Node three is then upgraded, followed by each of the remaining nodes in turn.
And finally, node eight. At this point the cluster upgrade is completed. So with the rolling upgrade, one node is upgraded at a time.
Batch Upgrade Example
Now let’s look at a batch upgrade. In this example we’ve got the same eight-node cluster, made up of four HA pairs. This time it’s been split into two different batches for the batch upgrade process.
First off, node one and node three are upgraded at the same time.
Obviously we wouldn’t do node one and node two together, because they’re HA partners for each other. When node one is being upgraded, we need node two to be online and taking care of node one’s storage. So node one and node three are upgraded first, at the same time.
When nodes one and three have completed upgrading, nodes two and four (their HA partners) in the same batch are done.
Next are nodes five and seven in batch two.
And finally nodes six and eight to complete the cluster upgrade.
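The ordering in this example can be sketched in a few lines of Python. This is only an illustration of the four-part sequence described above, not how ONTAP itself chooses its batches: the function name `batch_upgrade_waves` is hypothetical, and the way it splits an odd number of HA pairs is my own assumption.

```python
from typing import List, Tuple


def batch_upgrade_waves(ha_pairs: List[Tuple[str, str]]) -> List[List[str]]:
    """Split the HA pairs into two batches and return the four upgrade waves."""
    if len(ha_pairs) < 4:
        raise ValueError("batch upgrades need at least eight nodes (four HA pairs)")
    half = (len(ha_pairs) + 1) // 2  # assumption: first batch gets the extra pair
    waves = []
    for batch in (ha_pairs[:half], ha_pairs[half:]):
        # One node from each pair first, so every partner stays online...
        waves.append([first for first, _ in batch])
        # ...then the partners, once the first nodes are back up.
        waves.append([second for _, second in batch])
    return waves


pairs = [
    ("node1", "node2"),
    ("node3", "node4"),
    ("node5", "node6"),
    ("node7", "node8"),
]

for number, wave in enumerate(batch_upgrade_waves(pairs), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

For the eight-node example this prints nodes one and three, then two and four, then five and seven, then six and eight. Two HA partners never appear in the same wave.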
With the rolling upgrade, upgrading one node at a time, it would take about four hours to do the upgrade (using an estimate of 30 minutes per node). When using a batch upgrade that time is cut in half. With eight nodes, it would only take about two hours. The benefit of using a batch upgrade is it cuts down the amount of time required to complete.
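The arithmetic behind those figures is simple enough to write down. The sketch below uses the same 30-minute estimate and assumes that a wave of nodes upgrading in parallel takes about as long as a single node; both function names are my own, and real durations will vary with your environment.

```python
MINUTES_PER_NODE = 30  # the rough estimate used in the text above


def rolling_minutes(node_count: int, minutes_per_node: int = MINUTES_PER_NODE) -> int:
    """Rolling upgrade: nodes are upgraded strictly one after another."""
    return node_count * minutes_per_node


def batch_minutes(minutes_per_wave: int = MINUTES_PER_NODE) -> int:
    """Batch upgrade: four waves, however many nodes are in each wave.

    Assumption: nodes in the same wave upgrade in parallel, so one wave
    takes roughly as long as one node.
    """
    return 4 * minutes_per_wave


print(rolling_minutes(8) / 60)  # 4.0 hours
print(batch_minutes() / 60)     # 2.0 hours
```

Under these assumptions the batch estimate stays at about two hours as the cluster grows, while the rolling estimate keeps increasing with every node added.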
NetApp ONTAP Manual Upgrade
The standard upgrade method of ANDU is not supported on MetroCluster systems. In that case you have to do a manual upgrade (you can also do manual upgrades on non-MetroCluster systems if the mood takes you).
Manual upgrades use a rolling upgrade with the nodes upgraded one at a time, and are performed by the administrator at the command line. The automated validation process done for you by ANDU in the System Manager wizard is not available so you have to validate cluster health manually by running several commands in the CLI prior to the upgrade. This makes the upgrade take a lot longer than when using ANDU.
The entire process including all of the commands that you do have to enter is documented in the Upgrade and Revert/Downgrade Guide.
Single Node Clusters
As you saw earlier, during a typical upgrade one node of an HA pair is done at a time and while it is being upgraded its HA partner takes ownership of its storage. This allows us to have a non-disruptive upgrade. The cluster as a whole and all of its storage is still available to clients while you do the upgrade. There are some caveats to that which I’ll get to in a minute, but that’s how it works as long as you’ve got at least two nodes in the cluster.
Now obviously, an upgrade of a single-node cluster is not going to be non-disruptive because the node doesn’t have an HA partner that it can fail over to. So if you’ve got a single-node cluster, your data is not going to be available to clients while that node is offline being upgraded.
Disruption Considerations – Stateless Protocols
Let’s consider those caveats about whether the NetApp ONTAP upgrade is going to be disruptive or not. Even though it’s called an Automatic Non-Disruptive Upgrade, there are some situations where it can actually cause some disruption, mostly if you have Windows NAS clients in your environment.
Stateless protocols do not constantly maintain the state of their connection to the server, so with them we’re not likely to have an issue during an upgrade. If there is a temporary break in connectivity between the client and the server, any operation in progress will be completed when connectivity is restored. The session will go down if no communication is possible between client and server for a certain period of time (the timeout period), which can vary between client applications. If you’ve got a temporary break in connectivity which is shorter than the timeout then there are no issues. It’s only if the break lasts long enough to reach the timeout period that the connection is going to be torn down. In this case check if it’s possible to make the timeout on the application longer to avoid disruption.
Stateless protocols such as NFSv3, Fibre Channel, and iSCSI are going to be less susceptible to service interruptions during the upgrade than session-oriented protocols such as SMB. There is no disruption to a client using a stateless protocol if the timeout is greater than any disruption period on the cluster, such as the amount of time that an HA giveback takes.
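The timeout rule boils down to a single comparison, sketched below. The numbers are hypothetical examples rather than measured giveback times or real client defaults, and the function name `client_is_disrupted` is my own.

```python
def client_is_disrupted(outage_seconds: float, client_timeout_seconds: float) -> bool:
    """True when the break in connectivity lasts long enough to reach the timeout."""
    return outage_seconds >= client_timeout_seconds


# Hypothetical numbers: a 60 second client timeout and two outage lengths.
print(client_is_disrupted(outage_seconds=20, client_timeout_seconds=60))  # False
print(client_is_disrupted(outage_seconds=90, client_timeout_seconds=60))  # True
```

If the second case applies to one of your applications, that’s the one where you would look at lengthening the timeout before the upgrade.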
Disruption Considerations – Stateful Protocols
(Note: CIFS and SMB are not exactly the same thing but NetApp group them both together under the term ‘CIFS’ in their documentation so I’ll do the same here.)
Stateful protocols maintain the session between client and server constantly. They do not have a timeout. With stateful protocols, you need to direct users to end their sessions before you do an upgrade of the NetApp system.
CIFS is a stateful protocol, so if services are disrupted, the state information about any operation in progress is lost, and the user must restart the operation. The failover and giveback with the High Availability peer does cause a short outage that’s long enough to cause problems with your CIFS sessions.
If you are using the CIFS protocol and you’re going to do an upgrade, it’s highly recommended that you do this in a maintenance window when you don’t have any clients connecting to the storage system.
NFSv4 is also a stateful protocol, but it tends to handle short outages a bit better than CIFS. The clients will automatically recover from upgrade connection losses.
NetApp ONTAP System, Disk and Disk Shelf Firmware Upgrades
‘System firmware’ is included on the motherboard and disks. ‘Disk shelf firmware’ is separately configured. The latest system, disk and disk shelf firmware is bundled with the ONTAP upgrade packages, so when you download the new version of ONTAP the file doesn’t just include the operating system, it includes the firmware as well. Upgrading ONTAP also upgrades your firmware non-disruptively at the same time.
You can also upgrade firmware manually in between ONTAP upgrades. You would do this if for example there was a bug on your particular model of disk and a fix for it in a firmware update.
If new disks or shelves are added to the system at any time, their firmware is automatically upgraded to the current version on the storage system.
DQP – Disk Qualification Package
Only approved (‘qualified’) disks are supported in NetApp systems. When a new disk is added it is checked against the system’s Disk Qualification Package (DQP).
Unlike disk firmware, the DQP is not updated as part of an ONTAP upgrade. You should download and install the latest DQP before you add a new drive type or size which is not already on the system, before you upgrade to a new version of ONTAP, and before you update disk firmware. So before you do your ONTAP upgrade, upgrade the DQP first.
It’s very easy to do. You just download the file from the NetApp website then enter a couple of commands at the command line. I’ll show a demonstration in the 3rd post in this series.