Rsync – copy is bigger than source


#1

Post by ridesbikes » 25 Apr 2019 17:15

Summary: The primary NAS pool is rsynced to a secondary pool in the same tower, and the copy ends up bigger than the source.

I have tried to sort this out and am at a loss on how to proceed. Any advice and suggestions are welcome. What I have tried is listed below.

Goal: to have reliable data copies without burning up my storage.

I suspect some extra data is being copied (.zfs? hidden files?), as the third NAS I use is almost the same size as the second (the numbers are out a bit, as I have not recently rsynced the third NAS).

11.2.0.4 - Omnius (revision 6625), Supermicro X10SL7-F, 32GB RAM.

In same Tower:

Primary Pool – mirrored array, snapshots active daily.
NAS_6TB
32% of 11.95TB
Total: 11.95TB | Alloc: 3.85TB | Free: 8.09TB | State: ONLINE

Secondary Pool – Raidz (twice daily copy to this drive)
NAS_4TB
39% of 15.94TB
Total: 15.94TB | Alloc: 6.22TB | Free: 9.71TB | State: ONLINE

On the same network, in an outbuilding:

RemoteNAS
57% of 11.95TB
Total: 11.95TB | Alloc: 6.84TB | Free: 5.11TB | State: ONLINE

First setup
Primary to secondary pool via an rsync cron job, using Siftu’s script, which basically emails when the job is done, reporting either success or failure.

$rsync_exec -avP --delete --progress --log-file "$log" "$1" "$2"

Files were sent to the root (/) level of the pool.

Second setup
Made datasets, rsync directly into datasets.

Third setup
Added lz4 compression to non-primary datasets.


#2

Post by tony1 » 01 May 2019 23:02

Maybe look at the output of the "df" and "du" commands and see if they are the same?


#3

Post by Lee Sharp » 07 May 2019 06:37

I am seeing similar behavior. I have not investigated yet, but I will...


#4

Post by JoseMR » 15 May 2019 19:05

I had a similar issue some time ago when testing on Webmin; bug report/findings here.

What I've found is that some programs that are unaware of ZFS snapshots may behave differently. For example, "du" will report different sizes for zpools/datasets depending on whether .zfs (the read-only snapshot location) is visible or hidden.

To complicate things even further, "nullfs" filesystems will also be counted (duplicated) in the final "du" disk usage report, even if the "nullfs" is just a map of the same disk/pool/dataset. This is expected. Likewise, "rsync" will copy these "nullfs" mounts as a whole if they are not excluded in the "rsync" command.
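As a rough sketch of what excluding these could look like (not a tested recipe for this setup): rsync's -x (--one-file-system) option should keep it from descending into other mounted filesystems such as a "nullfs" mount, and an explicit --exclude covers a visible .zfs directory. The helper name and the mount paths below are assumptions for illustration only, and the function prints the command rather than running it so the options can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: prints an rsync command line instead of running it.
build_rsync_cmd() {
    src=$1
    dst=$2
    # -a: archive mode
    # -x: do not cross filesystem boundaries (nullfs mounts, mounted snapshots)
    # --delete: remove files from the destination that are gone from the source
    # --exclude='.zfs/': skip the snapshot control directory if it is visible
    printf "rsync -ax --delete --exclude='.zfs/' %s %s\n" "$src" "$dst"
}

# Assumed mount points; adjust to the real pool/dataset paths.
build_rsync_cmd /mnt/NAS_6TB/ /mnt/NAS_4TB/backup/
```

Adding -n (--dry-run) to the printed command is a cheap way to see what would be transferred before letting it loose on a real pool.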

How to reproduce:

Code:

root@nas-mserver: ~# du -h -c /mnt/media/mymedia
...
607G	total
root@nas-mserver: ~#

Now let's try with the ZFS snapshot directory visible:

Code:

root@nas-mserver: ~# zfs set snapdir=visible media/mymedia
root@nas-mserver: ~# du -h -c /mnt/media/mymedia
...
18T	total
root@nas-mserver: ~#

As you can see, every single snapshot has been counted in the final usage report. Despite the snapshots using 0B on the ZFS side, "du" does not know that. So let's do the math: my "mymedia" dataset has 607G of disk usage and I have 30 ZFS snapshots, so 607G*30 = 18210G (about 18T), just as "du" reported.


Now let's try adding a "nullfs" mount to the media pool (ZFS snapshots hidden):

Code:

root@nas-mserver: ~# mount -t nullfs -o ro /mnt/media/mymedia /mnt/media/nulltest
root@nas-mserver: ~# du -h -c /mnt/media
...
1.2T	total
root@nas-mserver: ~#

As you can see, the 607G is duplicated here: 607G ZFS usage + 607G from "nullfs" = 1.2T.
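One way to keep "du" from counting such a mount twice is its -x option, which stays on the filesystem of the starting path. A minimal sketch follows; the function name is made up for illustration, and whether -x also skips a visible .zfs directory on a given system is something to verify rather than assume.

```shell
#!/bin/sh
# Size of a directory tree in KiB, staying on the filesystem of the start path.
# -x: do not descend into other mounted filesystems (e.g. a nullfs mount)
# -s: print only the grand total
# -k: report in 1024-byte blocks
tree_kib_one_fs() {
    du -xsk "$1" | awk '{ print $1 }'
}

# /mnt/media is the path used in the post; fall back to "." on other machines.
target=/mnt/media
[ -d "$target" ] || target=.
echo "$(tree_kib_one_fs "$target") KiB in $target (one filesystem only)"
```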

Now let's try with ZFS snapshots visible + the nullfs:

Code:

root@nas-mserver: ~# zfs set snapdir=visible media/mymedia
root@nas-mserver: ~# du -h -c /mnt/media
...
19T	total
root@nas-mserver: ~#

Yep, that's 607G*30 + 1.2T = ~19T.
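The arithmetic above can be checked with a few lines of shell. This is only a worked example using the figures from the post (607G, 30 snapshots, and the 1.2T taken as two copies of 607G); the variable and function names are arbitrary.

```shell
#!/bin/sh
dataset_gib=607     # "du" total with snapshots hidden, from the post
snapshots=30        # number of ZFS snapshots, from the post
nullfs_copies=2     # the dataset plus its nullfs map (the 1.2T case)

snap_gib=$((dataset_gib * snapshots))
all_gib=$((snap_gib + dataset_gib * nullfs_copies))

# Convert GiB to TiB with one decimal place (1 TiB = 1024 GiB).
to_tib() { LC_ALL=C awk -v g="$1" 'BEGIN { printf "%.1f", g / 1024 }'; }

echo "snapshots only:     ${snap_gib}G = $(to_tib "$snap_gib")T"
echo "snapshots + nullfs: ${all_gib}G = $(to_tib "$all_gib")T"
```

This gives 18210G (17.8T) and 19424G (19.0T), consistent with the 18T and 19T that "du -h" printed.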

This does not account for UnionFS (if any) and/or symlinks, which may produce unwanted copies and misleading disk usage statistics if the command options are not set up to handle symbolic links and file hierarchies properly, or to exclude/mask unwanted ones.

To sum this up, ZFS data replication may be the best choice when working with large ZFS pools with lots of snapshots, many "nullfs" mounts inside the same pools, etc. ZFS will know what to do by design.

An existing option for these use cases is zrep, also available as the Zrep Extension, which you can use once you are familiar with it and then script/cron on your own.
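For reference, plain ZFS replication without zrep comes down to a snapshot followed by "zfs send" piped into "zfs receive". The sketch below only prints the commands for an incremental run so they can be inspected; the dataset and snapshot names are hypothetical, and this is not how zrep itself is invoked.

```shell
#!/bin/sh
# Prints (does not run) the commands for one incremental replication step.
print_replication_cmd() {
    src_ds=$1   # source dataset
    prev=$2     # snapshot that already exists on both sides
    curr=$3     # new snapshot to create and send
    dst_ds=$4   # destination dataset
    echo "zfs snapshot ${src_ds}@${curr}"
    # -i: send only the changes between the two snapshots
    # -F: roll the destination back to its latest snapshot before receiving
    echo "zfs send -i ${src_ds}@${prev} ${src_ds}@${curr} | zfs receive -F ${dst_ds}"
}

# Hypothetical dataset and snapshot names.
print_replication_cmd NAS_6TB/data daily-1 daily-2 NAS_4TB/data
```

Because only changed blocks are sent and snapshots arrive as snapshots, the destination should not grow the way a file-level copy of every snapshot directory does.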

Regards

#5

Post by cookiemonster » 15 May 2019 22:43

Thank you, JoseMR, for taking the time to explain it so well.
