New Features in ONTAP 9.6

ONTAP 9.6RC1 is out now, which is no surprise to those who follow the new ONTAP release cadence at NetApp. For several years, we’ve been releasing two versions a year, in the fall and spring. Typically, a long-term support (LTS) release comes out in the fall, but this year the model has changed a bit. Going forward, every release will be a long-term support release with 3 years of full support, 2 years of limited support, and 3 years of self-service support. If you were holding back on the spring release of ONTAP out of concern that it wasn’t an LTS release, go ahead and upgrade! It will be a great experience with a simple automated upgrade, like any ONTAP upgrade.


The primary theme of ONTAP 9.6 is simplicity. I’ve talked to many customers and partners who will happily reduce the tunability of a product in exchange for a simpler user experience. With ONTAP 9.6, there are a number of improvements that will deliver a simpler experience for administrators.

The first of these features is an excellent out-of-box experience that reduces setup to 5 simple steps. After the initial setup, quick provisioning workflows and guided LUN placement let you get your applications configured faster. Configuration of replication continues to be simplified, so you can ensure all your data is protected and available. Finally, the upgrade process has been simplified to allow faster and more convenient upgrades from a laptop.

The management ecosystem has changed as well. OnCommand System Manager is now ONTAP System Manager. This is where you’ll go to manage a single ONTAP cluster. The look and feel has been updated to be more intuitive and provide simpler workflows. You’ll see some of these improvements the first time you log in. In the background, System Manager is using REST APIs to deliver a simpler management experience.


AFF A320 and NVMe

Along with ONTAP 9.6, NetApp is releasing a new controller, the AFF A320, with onboard 100GbE ports for high performance connectivity. The AFF A320 will support the new NS224 NVMe expansion shelf. This combination of high performance and low latency will be a great fit for artificial intelligence and deep learning workloads.


Aggregate encryption

ONTAP supports a couple of different types of encryption. Encryption with NSE drives means that everything on the cluster is encrypted. This feature is a great fit for secure environments, but sometimes you only want to encrypt some of the data on the cluster. Volume-level encryption allows you to encrypt data for an individual volume, but it can be tedious to maintain for many volumes. Aggregate-level encryption fills the gap between these two, providing a simpler experience without needing to purchase special hardware or encrypt all the data on a cluster.
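As a sketch of how this looks on the CLI, aggregate-level encryption (NAE) is enabled when the aggregate is created – the aggregate, node, and volume names below are just examples:

```
storage aggregate create -aggregate aggr1_nae -node node1 -diskcount 8 -encrypt-with-aggr-key true
volume create -vserver svm1 -volume vol1 -aggregate aggr1_nae -size 100g
```

Volumes placed on the encrypted aggregate then share the aggregate’s key, rather than each volume carrying its own.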

New MetroCluster Support

As a past MetroCluster engineer, I always like to keep track of the new things the team is doing. MetroCluster has been supported with an IP backend for over a year now. The team has slowly increased the distance between the sites, now allowing up to 700km of distance. New in ONTAP 9.6 is support for smaller systems, the AFF A220 and the FAS2750.


Find VM Name by IP Address

What do you do when you’ve got a couple hundred VMs in vCenter and you need to find one? If you know its IP address, you can use this nice little PowerCLI snippet (substitute the IP address you’re looking for). It helped me solve a problem for one of my coworkers.

PS /Users/amkirk> Get-View -ViewType VirtualMachine | ?{ ($_.Guest.Net | %{ $_.IpAddress }) -contains "<ip-address>" } | select Name



Cisco Live NOC and PowerCLI

I wrote in a previous post about how Cisco Live runs on NetApp Storage. I’ll write a few posts describing some of the automation we use to get ready for the show and some ways that we monitor the hardware during the show. This is the first of those posts.

One of the things we need to do to prepare for Cisco Live US is move the data from a FAS8040 to the AFF8060 that we use during the show. NetApp SnapMirror makes the data easy to move but we also need to rediscover all of the VMs from the replicated data. This turns out to be a long process if you need to manually click on all of the .vmx files from the replicated data. To speed this process along, I used a VMware PowerCLI script. We’ve got four volumes but I’ll demonstrate the process for one of them, CLUS-A-01.

First, all of the VMs need to be powered down. We’re going to have to remove all of them, so they need to be powered down anyway. Once they’re powered down, check whether any of them have a CD drive with an ISO mounted from the datastore, and disconnect any that do.
get-vm -datastore CLUS-A-01 | stop-vm
get-vm | Get-CDDrive | select @{N="VM";E="Parent"},HostDevice,IsoPath | where {$_.IsoPath -ne $null}
get-vm -Name clnoc-wifi-checker3 | get-cddrive | Set-CDDrive -NoMedia -Confirm:$false
Next, I’ll take an inventory of all the VMs we’re going to move so that we can make sure they show up later. The VM name has often been changed and has nothing to do with the VMX name, so I’ll pull both of those and dump them to a file as a reference. I’ll also dump the VMX paths into an object that I can import and use later.
get-vm -datastore CLUS-A-01 | select Name,@{E={$_.ExtensionData.Config.Files.VmPathName};L="VM Path"} | out-file /CLUS-A-01.txt
get-vm -datastore CLUS-A-01 | select @{E={$_.ExtensionData.Config.Files.VmPathName};L="VM Path"} | export-clixml /CLUS-A-01.xml
Now all of the VMs on the target datastore can be removed.
get-vm -Datastore CLUS-A-01 | where {$_.PowerState -eq "PoweredOff"} | remove-vm -Confirm:$false
I don’t have good steps for this part. I’ll come back in the future and add the commands, but the SnapMirror needs one final update, and then I need to break it. This makes the destination volume read/write. After the volume is accessible, mount it from the NetApp controller. Now, back to the PowerCLI…
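Until I capture the exact steps we used, here is a rough sketch of that ONTAP CLI sequence – the SVM name is just an example (check snapmirror show for your actual destination path), and the junction path matches the datastore path used in the next step:

```
snapmirror update -destination-path svm1:CLUS_A_1
snapmirror break -destination-path svm1:CLUS_A_1
volume mount -vserver svm1 -volume CLUS_A_1 -junction-path /CLUS_A_1
```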

Add the datastore in vCenter from its new location.
Get-VMHost -Location CLUS | new-datastore -Name CLUS-A-01 -Path /CLUS_A_1 -NFS -NfsHost <nfs-lif-address>
Now that the volume is accessible, I just need to grab the VMX paths that I dumped to a file earlier, loop through the VMX paths and add them back to vCenter.
$vmhosts = Get-VMHost -Location CLUS
import-clixml /CLUS-A-01.xml | foreach-object {
  $vmhost = $vmhosts | get-random -count 1
  new-vm -RunAsync:$true -VMFilePath $_.'VM Path' -VMHost $vmhost
}

There you have it! We actually combined all of these commands into a little script that handles everything for us. This makes an easy way to get the FlexPod ready for Cisco Live!


Inside the NOC at Cisco Live

Several times a year, Cisco Live offers thousands of attendees the opportunity to learn about new and exciting technologies, network with a lot of smart folks, and have a blast while doing it. For me, Cisco Live offers an exciting opportunity as well. As a Technical Marketing Engineer on the Converged Infrastructure team at NetApp, I get the opportunity to create a lot of data center designs in a year but I don’t typically get to do the day-to-day support of the FlexPod for thousands of people. Cisco Live gives me the opportunity to do that and talk to attendees about the experience throughout the week as a member of the Network Operations Center (NOC) team. NetApp has been the official storage provider of the NOC for several years now, ever since the decision to collaborate on the infrastructure and run Cisco Live on a FlexPod.


The Cisco Live NOC team plays a vital role at the conference. We are a service provider for all of the Cisco employees, vendors, and attendees – ensuring everyone receives a reliable internet connection with great performance. We deploy a staggering amount of hardware during the week before the show, which we refer to as the setup week. Before we can get to the thousands of access points and switches that need to be deployed, we need to get our FlexPod up and running. For almost 5 years now, the data center at the core of Cisco Live has been a FlexPod Datacenter – a Cisco and NetApp converged infrastructure that combines best practices and industry-leading support. The FlexPod Datacenter is where we run all of the applications required to configure, monitor, and maintain the network. These applications include video surveillance, WAN optimization, wireless network management, and a lot of custom applications, just to name a few.

This summer at Cisco Live US in Orlando, we’re excited to once again be running Cisco Live on a FlexPod containing 2 Cisco UCS Blade chassis, 2 Cisco Nexus 7Ks, and 4 NetApp AFF 8060s in a MetroCluster configuration. We designed this infrastructure with a few considerations in mind.

The primary design consideration for our infrastructure is business continuity. During the setup week, there is a lot going on. With hundreds of people on site tearing down from past conferences while also setting up for Cisco Live, there is plenty of opportunity for accidents. At Cisco Live Europe in Barcelona, an electrician pulled the power on one of our data centers, thinking it was used for the hairstyling convention that had just ended. It’s very important that through any issue we may encounter with any part of the infrastructure – even the loss of an entire data center – we continue to serve data and run the applications. For that, we turned to NetApp MetroCluster – a solution which synchronously mirrors your data between two data centers and has enough redundancy built in that you could lose a full data center, fail over, and continue serving data. With MetroCluster as our storage solution, we were able to fail over to the surviving data center when the breaker to our data center was cut, continue serving data, and switch control back once we had regained power.


In addition to business continuity, the flexibility and performance of the infrastructure is very important. Because of the fast-moving environment at Cisco Live, we often don’t have good requirements until right before the show, so we need a data center infrastructure that is flexible enough to handle all kinds of different workloads and protocols. Regardless of the chosen design, it needs to perform well. All Flash FAS is perfect as a platform that provides all the features we need combined with great performance. For example, at Cisco Live Barcelona this year, we planned on implementing Fibre Channel through the Cisco Nexus 5548. This required 2 Fibre Channel ISLs between the sites in addition to the Ethernet ISLs we had for NFS traffic. At the last minute, the venue communicated that there were not enough links between the data centers and the plans would need to change. All Flash FAS made this an easy decision: it cost us just a small amount of time to convert our SAN boot infrastructure from Fibre Channel to iSCSI. With some storage controllers, this flexibility isn’t available. Regardless of the design we’ve chosen for Cisco Live, the All Flash FAS has consistently responded to I/Os with sub-millisecond latency. The NOC team has found that the controllers are capable of great performance with any workload we choose.

One great thing about the NOC at Cisco Live US is that you can see all the infrastructure being used by stopping by the NOC booth in The Hub. We’d love for any attendees at CLUS in Orlando to swing by and talk about the infrastructure and any other NOC related things you’re interested in!

New Features in ONTAP 9.4

While NetApp first created ONTAP over 25 years ago, innovations are still being added today with ONTAP 9.4. ONTAP 9.4 brings NVMe, 100GbE, 30TB SSDs, and enhancements to several recently released features.

Fabric Pool

Fabric Pool was first released in ONTAP 9.2 as a feature that allows you to tier data off to cheaper object storage. Originally, Amazon S3 and StorageGRID were the available tiers – Azure Blob Storage was added as a tier in ONTAP 9.3.

Fabric Pool works by running two processes in ONTAP to move ‘cold’ blocks of data to the cloud. The first is a temperature scanner which is constantly evaluating the ‘temperature’ of a block. Active blocks are ‘hot’ and blocks that haven’t been used in a while are ‘cold’. The second process finds cold blocks and moves them to the object storage tier if the aggregate containing the cold blocks is over 50% full.

Previously, ONTAP had two policies for Fabric Pool: one that moved backup data and another that moved blocks only used by Snapshots. A new policy has been added in ONTAP 9.4 that will move any cold block in the volume to the object storage tier. This new policy also allows the user to specify how long it takes for a block to become eligible to move to object storage. This information is also reported back to the storage administrator through the CLI and ONTAP System Manager.
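As a sketch of how this looks on the CLI (the SVM and volume names are just examples), the new policy is called auto, and the cooling period is set per volume in days:

```
volume modify -vserver svm1 -volume vol1 -tiering-policy auto -tiering-minimum-cooling-days 31
```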

NVE Secure Purge

NetApp Volume Encryption (NVE) Secure Purge is important for any enterprise looking to comply with the new GDPR standards. The goal of secure purge is that deleted data cannot be recovered from the physical media at a later point in time. To do this, ONTAP removes any data from the filesystem which contains remnants of the deleted files. After this, it re-encrypts the leftover data with new keys. This ensures that the deleted data cannot be recovered.
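The purge itself is driven from the CLI at the advanced privilege level; a minimal sketch, assuming an NVE volume named vol1 on svm1, looks like this (run after the files have been deleted):

```
set -privilege advanced
volume encryption secure-purge start -vserver svm1 -volume vol1
volume encryption secure-purge show -vserver svm1 -volume vol1
```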



NVMe

NVMe deserves its own post in the future, but I’ll give a quick overview of the capabilities of NVMe in ONTAP 9.4 here.

With ONTAP 9.4 and the AFF A800, NetApp is first to market with end-to-end NVMe. It includes NVMe drives in the AFF A800, NVMe over Fabrics with Brocade Gen 6 Fibre Channel switches, and frontend host connectivity. FC-NVMe can be used to deliver lower latency, more bandwidth, and more IOPS. It can also be implemented with a non-disruptive upgrade to existing AFF A-Series controllers including the A300, A700, and A700s.

For more information about ONTAP 9.4 and other things NetApp has going on, head over to the NetApp Blog.


Reclaim FC Datastore Space

Reclaim unused space on Thin Provisioned NetApp LUN

Something that’s annoying when you’re implementing thin provisioning for your Fibre Channel LUNs is that when you delete or move VMs from the LUN, the freed-up space is not seen on the NetApp storage controller.

You can see this problem here: I’ve deleted files from the datastore, so VMware sees plenty of free space, but NetApp still sees a 70% full LUN. Space has become available in the VMFS filesystem, but the NetApp storage controller doesn’t recognize it because it doesn’t know what’s going on inside that filesystem.
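If you want to see this from the ONTAP side, lun show can report the space the controller thinks is in use – the SVM and LUN path here match the examples later in this post:

```
lun show -vserver Infra-SVM -path /vol/workload2/lun1 -fields size,size-used
```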

There’s an easy way around this that can come in handy if you need that extra space back in NetApp ONTAP. Make sure that space-allocation is enabled on your LUN before you try this. If it isn’t enabled (it’s disabled by default in ONTAP), ESXi will report that SCSI UNMAP is unsupported:

esxcli storage core device vaai status get
ATS Status: supported
Clone Status: supported
Zero Status: supported
Delete Status: unsupported

You can follow these steps to enable it. Unfortunately, the LUN has to be taken offline for the change to take effect, so you’ll obviously want to move any VMs off the LUN first.

  1. Offline the LUN
    lun offline -vserver Infra-SVM -path /vol/workload2/lun1
  2. Modify the LUN to ensure space-allocation is enabled
    lun modify -vserver Infra-SVM -path /vol/workload2/lun1 -space-allocation enabled
  3. Online the LUN
    lun online -vserver Infra-SVM -path /vol/workload2/lun1

Now you can see that Delete Status is supported; we can continue on to free up some space.

esxcli storage core device vaai status get
ATS Status: supported
Clone Status: supported
Zero Status: supported
Delete Status: supported
esxcli storage vmfs unmap -l Test

After this you can see that my LUN in ONTAP’s System Manager reflects the correct size used!

Just a warning… you will probably take a performance hit in your vSphere cluster when you run the unmap command. Keep that in mind and run it during off-peak hours.
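One way to soften that impact is the optional reclaim-unit argument to the unmap command, which controls how many VMFS blocks are reclaimed per iteration (200 is the default; smaller values spread the work out over a longer time):

```
esxcli storage vmfs unmap -l Test -n 200
```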


Clear Cisco MDS Zoning

I’m getting the NetApp MetroCluster set up for Cisco Live EMEA in 2017. One of the things I needed to do was clear the zoning on the Fibre Channel switches and create new zones. The current set of zones was created for the ATTO FibreBridge 6500N, and I’m swapping those out for the newer ATTO FibreBridge 7500N. The zoning needs to be changed to account for the two Fibre Channel target ports per FibreBridge on the 7500N. Here are the steps to remove zoning from Cisco MDS switches if you ever need to.

I’ve got two active zones, an FCVI zone on VSAN 10 and a storage zone on VSAN 20. The first step is to deactivate those:

no zoneset activate name FAB_1_FCVI_ZONESET_VSAN_10 vsan 10
no zoneset activate name FAB_1_STOR_ZONESET_VSAN_20 vsan 20

Next delete the zonesets:

no zoneset name FAB_1_FCVI_ZONESET_VSAN_10 vsan 10
no zoneset name FAB_1_STOR_ZONESET_VSAN_20 vsan 20

Now clear the zone database, copy the running configuration to the startup configuration and reload the switch:

clear zone database vsan 10
clear zone database vsan 20
copy run start
reload

Overall, pretty simple. You should probably take a quick backup of the switch config before you do this, just in case you want the zoning back. I didn’t ¯\_(ツ)_/¯ but it’s probably a good idea.
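If you do want that safety net, backing up the config before you clear anything is a single command per switch (the filename here is just an example):

```
copy running-config bootflash:pre-zoning-cleanup.cfg
```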