SnapManager for SQL Sizing Case Study
Posted: July 10, 2013 | Filed under: NetApp, SQL Server | Tags: netapp smsql, netapp smsql sizing, smsql, smsql sizing, snapmanager, snapmanager for sql, sql sizing
This SnapManager for SQL case study was conducted for a real world client. Anything in this study that could identify the client has been removed from the article to protect their business. I wanted to take this opportunity to document the procedures and explanations for sizing such an environment.
This particular implementation involved a three node SQL Server 2012 AlwaysOn Availability Group running on Server 2008R2 physical servers. The databases are new and haven’t been populated with data, yet, so the sizing had to take these “known unknowns” into account. SnapManager for SQL 6.0 and SnapDrive for Windows 6.5 were used. The NetApp system includes a FAS3220 in an HA pair running Data ONTAP 8.1.2 7-mode.
Typical best practices were used such as using volume autogrow, letting SnapManager take care of Snapshot deletions, etc. I don’t address thin provisioning, deduplication, or space reservations in this document beyond saying that Fractional Reserve is kept at its default 0% and SnapReserve is changed to 0%. I suggested the LUNs and volumes be thinly provisioned because the client has a trained and dedicated NetApp Administrator on staff with the tools and alerts necessary to manage aggregate capacity properly. The storage deployment is a new, mid-size deployment and capacity is already at a premium. Thin provisioning now, monitoring, and growing or shrinking volumes and LUNs as actual growth is observed was advised so as not to waste space. Deduplication was used on the database volumes and CIFS shares – not the transaction logs, SnapInfo, TempDB, or System Databases.
A logical diagram of the SQL Server replication scheme is shown below. There is an OLTP database that is relatively large compared to the many smaller databases that comprise the Data Warehouse (DW). Under normal conditions, each Primary Replica will be hosted at Site A on a separate node. These nodes will synchronously replicate to each other within the same site. Asynchronous replication will happen across sites to a third node at Site B.
The following sizings are based on the chart below, provided by the client, which projects growth through 2016.

| Metric | Year 1 | Year 2 | Year 3 |
| --- | --- | --- | --- |
| KB per person | 85.00 | 85.00 | 85.00 |
| MB per month | 23,352.86 | 27,447.92 | 37,298.18 |
| GB per year | 273.67 | 321.66 | 437.09 |
OLTP Database LUN Sizing
The OLTP database will start out at 100GB. It will add 274GB the first year, 322GB the second year, and 438GB the third year, for a total of 1134GB of data over the next three years. Because these are approximations, and to err on the side of caution so the LUN does not go offline, 20% headroom is added and the final number rounded up. 1134GB plus 20% equals roughly 1361GB, rounded up to a 1400GB LUN.
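To make the arithmetic explicit, the method above can be sketched in a few lines of Python (the helper name and structure are mine, purely illustrative, not part of any NetApp tool):

```python
def lun_size_gb(initial_gb, yearly_growth_gb, headroom=0.20):
    """Initial data size plus projected yearly growth, padded with headroom."""
    raw = initial_gb + sum(yearly_growth_gb)
    return raw * (1 + headroom)

# 100GB starting size plus three years of growth from the client's chart
oltp_lun = lun_size_gb(100, [274, 322, 438])
print(round(oltp_lun))  # 1361 -- rounded up to a 1400GB LUN in practice
```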
OLTP Database Volume Sizing
Given a 1400GB LUN, the volume has to be at least twice that size to take into account the possibility that 100% of the data in the LUN will change and the initial writes are locked in Snapshots. This is an extremely conservative number but it guarantees that the LUN can be completely overwritten and not go offline. The formula used to calculate volume size is
(<LUNsize> x 2) + (<DailyChangeRate> x <DaystoKeepSnaps>)
The daily change rate is calculated based on the final year of sizing, 2016. That year, there will be approximately 38GB of change to the LUN per month, or about 1.3GB of change per day. It was decided in a design meeting that three days of backups would be kept online; the contracted RPO is 48 hours, so this strategy beats the required RPO. The resulting volume size is then
(1400 x 2) + (1.3 x 3) = 2804GB
Since three days' worth of Snapshots contributes only a tiny fraction of the total size of the volume, it's safe to drop that term and round down to a 2800GB volume for the largest database, given the conservative estimates used above.
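The volume formula above can be expressed directly (a sketch; the function name is illustrative):

```python
def volume_size_gb(lun_gb, daily_change_gb, days_to_keep):
    """(<LUNsize> x 2) + (<DailyChangeRate> x <DaystoKeepSnaps>)"""
    return (lun_gb * 2) + (daily_change_gb * days_to_keep)

# 1400GB LUN, ~1.3GB of change per day, three days of Snapshots retained
oltp_volume = volume_size_gb(1400, 1.3, 3)
print(round(oltp_volume))  # 2804 -- rounded down to a 2800GB volume in the article
```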
OLTP Database Transaction Log LUN Sizing
Typical transaction log LUN sizing takes into account actual transaction log backup size over a period of time. When actual transaction log growth data is gathered, LUNs can be grown or shrunk accordingly. Because the OLTP database is new and this information is not available, the rule of thumb for initial sizing is to use 20% of the database size. In this case, however, the client expects the OLTP database to follow the pattern of some of their largest databases currently in production, whose transaction logs are all less than 10% of the size of the data volume. This size can be adjusted after some time spent monitoring the size of each transaction log Snapshot. So the initial size of this transaction log LUN will be
2800GB x 0.1 = 280GB
OLTP DB Transaction Log Volume Sizing
To match the backups retained of the data volume, the transaction log volume will also keep three days’ worth of Snapshots. The method to size the transaction log volume is then
<Tlog-LUN-size> + (<TlogSizePerDay> x <NumDays>)
Because this is a new database and the transaction log size cannot be measured quite yet, we’ll simply add 20% headroom to the LUN size to accommodate three days’ worth of Snapshots. Monitoring will need to take place during the initial database load to ensure the transaction log volume will not fill up. This brings the transaction log volume size to
280GB x 1.2 = 336GB
This can be rounded up to 340GB. When the database is in full swing and transaction log growth can be measured for several 24 hour periods, the volume and LUN sizes can be adjusted up or down so as not to waste space or run out of it.
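A quick sketch of the transaction log sizing, using the 10% rule of thumb and 20% headroom described above:

```python
data_volume_gb = 2800
tlog_lun = data_volume_gb * 0.10   # 10% of the data volume -> 280GB
tlog_volume = tlog_lun * 1.20      # 20% headroom for three days of Snapshots
print(round(tlog_lun), round(tlog_volume))  # 280 336 -- volume rounded up to 340GB
```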
Smaller Database LUN and Volume Sizing
The client also has several other smaller databases that will require SnapManager for SQL backups. Each database has the same backup requirements so it is advisable to place them all on the same LUN in the same volume. Even though several databases will share a single LUN, it’s still possible to restore a single database on that LUN via streaming restore. A separate transaction log LUN in a dedicated volume will be used to ensure up-to-the-minute restores are possible for any of these small databases.
The client has estimated that together, these databases will consume less than 10% of the space of the OLTP database. 10% of the 1400GB OLTP database is a 140GB LUN for the smaller databases. Based on the OLTP database's average annual growth of roughly 30%, this 140GB LUN will grow to about 237GB through 2016. This can be rounded up to 240GB.
The volume size must account for the possibility that the entire contents of the LUN are overwritten while the initial writes are locked in Snapshots. This means the data volume must be at least twice the size of the 240GB LUN, or at least 480GB. The daily change rate for each database is unknown at this time because they're new databases, so 20% headroom will be added to accommodate three days' worth of Snapshots. The total size of the volume hosting the LUN for all the smaller databases comes to
480GB x 1.2 = 576GB
Rounded up, this comes to a 580GB volume. Monitoring the total size of Snapshots taken over several 24-hour periods will help determine if the LUN and volume need to grow or shrink.
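The smaller-database sizing can be sketched the same way. Note that two years of 30% compound growth is my reading of "through 2016"; adjust if the client's horizon differs:

```python
small_lun_initial = 0.10 * 1400                 # 10% of the OLTP LUN -> 140GB
small_lun_2016 = small_lun_initial * 1.30 ** 2  # ~30% yearly growth through 2016
small_volume = (2 * 240) * 1.20                 # rounded 240GB LUN, doubled, plus 20% headroom
print(round(small_lun_2016), round(small_volume))  # 237 576 -- volume rounded up to 580GB
```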
Smaller Database Transaction Log LUN and Volume Sizing
Since actual transaction log sizes are not available, a LUN sized at 20% of the data volume will be used. Again, monitoring transaction log Snapshots over several 24-hour periods should indicate whether the LUN needs to grow or shrink. 20% of the 580GB data volume is
580GB x 0.2 = 116GB
This can be rounded up to a 120GB transaction log LUN.
Again, adding 20% headroom for the volume to accommodate three days’ worth of Snapshots, we have
120GB x 1.2 = 144GB
Again, rounded up to 150GB for the volume.
System Database LUN and Volume Sizing
The client’s DBA team determined that a 40GB TempDB LUN and a 5GB LUN for the System Databases are sufficient, based on their expectations of the new databases. Because these LUNs will not be protected with Snapshots (the System Databases are backed up via streaming to the SnapInfo LUN), their volume sizes will also be 40GB and 5GB. The volumes will have scheduled Snapshots turned off, any existing Snapshots deleted, and SnapReserve set to 0%.
SnapInfo LUN and Volume Sizing
Traditionally, the SnapInfo LUN serves as the backup location for transaction logs, which requires the SnapInfo LUN to be the same size as the transaction log LUN. The SnapInfo LUN also holds the stream-based backups of the System Databases as well as SnapManager backup metadata. The metadata amounts to about 1% of the data stored in the instance.
If Availability Groups are used, however, the transaction logs are not backed up to the SnapInfo LUN. Rather, they’re streamed to a CIFS share that all Availability Group (AG) nodes can see. So the capacity of the SnapInfo LUN is greatly reduced only to be added to a CIFS share roughly the same size.
In this case, at least 14GB (1% of 1400GB) of metadata will be used just for the OLTP database. Additional capacity will be needed when more databases are added to the instance. The remaining, smaller databases are estimated at no more than 10% of the OLTP database, so 140GB can be used to approximate their size. Backup metadata will use 1% of 140GB, which is 1.4GB. Rounding that up to 2GB and adding it to the 14GB requirement for the OLTP database gives at least 16GB for backup metadata. Another 4GB is allocated for streaming system database backups, for a total SnapInfo LUN size of 20GB.
Snapshots are taken of the SnapInfo volume, so 20% is added to the size of the SnapInfo LUN to accommodate three days’ worth of backups. This gives us a 24GB SnapInfo volume.
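The SnapInfo math, sketched in the same style (the 1% metadata ratio and 4GB streaming allocation come from the text above):

```python
oltp_meta = 1400 * 0.01     # ~14GB of backup metadata for the OLTP database
small_meta = 140 * 0.01     # ~1.4GB for the smaller databases, rounded up to 2GB
sysdb_stream = 4            # GB reserved for streamed system database backups
snapinfo_lun = 14 + 2 + sysdb_stream    # 20GB LUN
snapinfo_volume = snapinfo_lun * 1.20   # 20% headroom -> 24GB volume
print(snapinfo_lun, round(snapinfo_volume))  # 20 24
```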
AG CIFS Transaction Log Share
The CIFS share used by the AG will be the same size as the sum of the AG's transaction log volumes: 340GB + 150GB = 490GB, rounded up to 500GB. It can be grown or shrunk based on transaction log size monitoring, and it will also hold three days' worth of backups.
Because the AG is stretched across sites and between NetApp systems, the CIFS share will be SnapMirrored to the secondary data center. The destination CIFS share will be read-only and accessible for transaction log replay should the primary site fail. If an entire site should fail, the SnapMirror relationship will be broken and the CIFS share will be made writable.
So the final sizing for the OLTP database and several smaller, DW databases in a SQL Server 2012 Availability Group is laid out in the table below. There will be two nodes at Site A and a single node at Site B.
| Function | Volume Size (GB) | LUN Size (GB) |
| --- | --- | --- |
| **Availability Group 1, per SQL Node** | | |
| OLTP DB Data | 2,800 | 1,400 |
| OLTP DB Tlog | 340 | 280 |
| Other DB Data | 580 | 240 |
| Other DB Tlog | 150 | 120 |
| TempDB | 40 | 40 |
| System DBs | 5 | 5 |
| SnapInfo | 24 | 20 |
| **Sub-total per Node** | **3,939** | **2,105** |
| **Shared per Availability Group** | | |
| CIFS share for t-logs | 500 | – |
| **Site A Total (2 Nodes)** | **8,378** | **4,210** |
| **Site B Total (1 Node)** | **4,439** | **2,105** |
Since replication between nodes uses the exact same amount of space for each node, the sizing for one node is tripled for the entire system. Node 1 and Node 2 will each have 3,939GB allocated to their volumes which equals 7,878GB total. Add to this the 500GB CIFS share for transaction log backups and you have a total capacity of 8,378GB at Site A. Of course, through thin provisioning and deduplication, a lot of this space is saved.
Please feel free to leave comments on this study. I’m eager to hear stories from the field.
Hi Mike, this is a great case study. I’ve had the chance to deploy SMSQL on clients, but only on a small scale. I’ve always wondered if there were administrator friendly layouts.
Here’s an example:
You have one application and a SQL database. It's recommended to have the following:
For a database of 40GB, I still wonder if it's possible to aggregate certain LUNs. Deploying the solution is OK for one application, but when you start to have new apps in the works, the LUN management becomes unbearable.
What do you think?
I tend to lean towards a global SQL server for an entire site which facilitates any additional databases down the road.
I agree – smaller deployments benefit from less administration on the storage side for SQL LUNs. The only items I wouldn't necessarily put on a NetApp LUN are the SQL binaries – they work fine on local storage or VMDKs and usually don't have a requirement to be on remote storage. But that still leaves at least 2 LUNs for every database (per AG node). Remember the SnapInfo LUN can be used to stream t-log backups of multiple databases and can be shared across SQL Instances or unique to each SQL Instance. TempDB is also shared, as are the System DBs. So again, that leaves 2 LUNs for every database (data and t-log) per AG node, plus the supporting LUNs. That adds up quickly. Fortunately, NetApp has put out several documents on the subject. Check out TR-4003, updated in June 2013, on the NetApp Support site.
To your question, it is possible to keep several LUNs in a single volume and even several databases on a single LUN. But as with most things in computing, there’s going to be a trade-off.
Because NetApp Snapshots are performed at the volume level, any LUNs that share a volume will have a Snapshot taken at the same time and therefore, must share the same backup schedule. So if you have databases that can share a backup schedule, go ahead and put them in the same volume. This will reduce administration overhead because you’ll have fewer volumes to manage.
On the flip-side, if you need granular control, create separate volumes for each LUN. The catch is that this obviously increases the number of volumes you have to manage.
It’s also ok to keep many *small* databases on a single LUN. An issue quickly arises, though, when performing restores of large databases with this configuration because the restore copies out to the active file system. Large databases can make this process take many hours whereas keeping one database per LUN will allow LUN clone splits for restores. These are much quicker.
Hope this helps.