SnapMirror: replication.dst.err:error

Posted: May 14, 2013
I ran across an issue today that my various sources of troubleshooting (ok, Google) couldn’t help solve – at least not directly. I configured SnapMirror between two disparate systems for a data migration. 16 of the 17 volumes initialized just fine, but I was getting an error on the one volume that had a LUN inside. It was a SnapDrive for Windows LUN, so I knew that just prior to the final cutover I’d have to take a Snapshot via SnapDrive, but I should be able to start the baseline transfer via the standard CLI. Here’s what I was seeing:
ControllerA> snapmirror initialize -S ControllerZ-vif01:vol_server2008 ControllerA:vol_server2008
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
Mon May 13 14:21:26 CDT [ControllerA:replication.dst.err:error]: SnapMirror: destination transfer from ControllerZ-vif01:vol_server2008 to vol_server2008 : process was aborted.
Here are the relevant excerpts from my config files – in short, everything was configured correctly, but the initialization wouldn’t start.
The source controller:
ControllerZ> rdfile /etc/snapmirror.allow
10.1.1.8
ControllerA-VIF01-6
ControllerZ> rdfile /etc/hosts
#---used for SnapMirror data migration---#
10.1.1.8 ControllerA-VIF01-6
And the destination controller:
ControllerA> rdfile /etc/snapmirror.conf
ControllerZ-vif01:vol_server2008 ControllerA:vol_server2008 - - - - -
ControllerA> rdfile /etc/hosts
#---used for SnapMirror data migration---#
10.4.1.3 ControllerZ-vif01
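As an aside (this context is mine, not from the original configs): in 7-Mode, each snapmirror.conf entry is source, destination, an arguments field, and a four-part cron-style schedule, with a dash meaning "default" or "not set". So the five dashes above break down roughly like this:

```
# source:volume            destination:volume        args  min hour day-of-month day-of-week
# ControllerZ-vif01:vol_server2008 ControllerA:vol_server2008 -   -   -    -            -
#
# args "-"      -> default transfer arguments (e.g., no kbs throttle)
# schedule "- - - -" -> no automatic update schedule; transfers are manual
```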
The controllers could also ping each other. I ran a traceroute from the destination to the source, forcing the use of specific replication links like this:
ControllerA> traceroute -s ControllerA-VIF01-6 -v ControllerZ-vif01
traceroute to ControllerZ-vif01 (10.4.0.113) from ControllerA-VIF01-6, 30 hops max, 40 byte packets
 1  10.1.1.2 (10.1.1.2)  36 bytes to 10.1.1.8  0.000 ms  1.000 ms  0.000 ms
 2  ControllerZ-vif01 (10.4.1.3)  36 bytes to 10.1.1.8  0.000 ms  0.000 ms  0.000 ms
I tried to initialize the baseline transfer like so but received an immediate error.
ControllerA> snapmirror initialize -S ControllerZ-vif01:vol_server2008 ControllerA:vol_server2008
Transfer started.
Monitor progress with 'snapmirror status' or the snapmirror log.
ControllerA> Mon May 13 14:21:26 CDT [ControllerA:replication.dst.err:error]: SnapMirror: destination transfer from ControllerZ-vif01:vol_server2008 to vol_server2008 : process was aborted.
Everything seemed right to me and the googalizer was coming up empty, so I was at a bit of a loss. Finally, Matt Oswalt’s article, while not addressing this error specifically, did mention looking at the source controller’s CLI for errors written to the console. That was the key. Here’s what I found:
ControllerZ> Mon May 13 14:27:25 CDT [ControllerZ: replication.src.err:error]: SnapMirror: source transfer from vol_server2008 to ControllerA:vol_server2008 : cannot create incremental snapshot: No space left on device.
Of course, SnapMirror takes a snapshot before each transfer, but apparently there’s not enough space for that snapshot. Looking at the vol options for the offending volume on the source, I see that fractional reserve is set to 100%. The volume is 30GB and the LUN is 15GB, and there’s no snapshot reserve allocated, so I see where the problem is. With fractional reserve set to 100% and those volume and LUN sizes, there is no space in the volume for anything but the original LUN and the fractional reserve space, hence the “no space left on device” error.
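The arithmetic is simple enough to sketch out (the numbers are from above; the helper function is just illustrative):

```python
def free_space_gb(volume_gb, lun_gb, fractional_reserve_pct):
    """Space left in the volume after the LUN and its fractional
    reserve are accounted for."""
    reserve_gb = lun_gb * fractional_reserve_pct / 100
    return volume_gb - lun_gb - reserve_gb

# 30GB volume, 15GB LUN, fractional_reserve=100:
# 15GB LUN + 15GB reserve = 30GB, leaving nothing for a snapshot.
print(free_space_gb(30, 15, 100))  # -> 0.0
```

Zero free space, so the pre-transfer snapshot has nowhere to land.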
ControllerZ> vol options vol_server2008
nosnap=on, nosnapdir=off, minra=off, no_atime_update=off, nvfail=off,
ignore_inconsistent=off, snapmirrored=off, create_ucode=on, convert_ucode=on,
maxdirsize=20971, schedsnapname=ordinal, fs_size_fixed=off, compression=off,
guarantee=none, svo_enable=off, svo_checksum=off, svo_allow_rman=off,
svo_reject_errors=off, no_i2p=on, fractional_reserve=100, extent=off,
try_first=volume_grow, read_realloc=off, snapshot_clone_dependency=off,
nbu_archival_snap=off
So I resized the volume to 50GB, gave it 20% snapshot reserve and looked at the results.
The resize is a bit more than what was actually needed, but the idea was to make the volume big enough to avoid hitting the same error. No snapshots had been taken, so I had no historical reference for sizing the snapshot reserve. You can see that the data space used in the volume is about 30GB, which includes the 15GB LUN and the 100% fractional reserve.
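Running the same math against the new sizing shows the headroom (the snapshot reserve is carved out of the volume first, which is my assumption about how to model it; the function is illustrative only):

```python
def data_area_free_gb(volume_gb, snap_reserve_pct, lun_gb, fractional_reserve_pct):
    # Snapshot reserve comes off the top of the volume...
    data_area_gb = volume_gb * (100 - snap_reserve_pct) / 100
    # ...then the LUN plus its fractional reserve consume the data area.
    used_gb = lun_gb + lun_gb * fractional_reserve_pct / 100
    return data_area_gb - used_gb

# 50GB volume with 20% snap reserve -> 40GB data area;
# 15GB LUN + 15GB fractional reserve = 30GB used, ~10GB free.
print(data_area_free_gb(50, 20, 15, 100))  # -> 10.0
```

Roughly 10GB free in the data area, plus a 10GB snapshot reserve, instead of zero.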
Restarting the SnapMirror initialization produced no error, and snapmirror status showed bits being transferred. Huzzah!