Duplicate file finder, server-side

ku-gew
Advanced User
Posts: 173
Joined: 29 Nov 2012 09:02
Location: Den Haag, The Netherlands

Duplicate file finder, server-side

#1

Post by ku-gew »

On my N4F (NAS4Free) box I have a folder where backups are made, plus some other folders where users made partial backups by hand.

I would like to find duplicates and leave only those in the backup folder.

I have used Duplicate File Finder https://www.digitalvolcano.co.uk/ for Windows in the past, and it's just wonderful, but on my 100 Mbps network it is slow when scanning shares (it works fine, though!).

I'm looking for a server-side solution.

I already found three (apparently) good alternatives:
http://www.pixelbeat.org/fslint/
https://www.hardcoded.net/dupeguru/
https://github.com/adrianlopezroche/fdupes

Since I have an embedded installation, I tried FSlint: it consists only of shell scripts, so I hoped to be able to use it directly. However, I get this error:

Code:

Error: uniq must have --all-repeated=prepend option.   Get the latest version from GNU coreutils or textutils-2.0.21

I guess that getting it to work on FreeBSD would require some coding.

So, what are my options for finding duplicates server-side?
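
For context on that error: GNU `uniq --all-repeated=prepend` prints every line that belongs to a run of repeated adjacent lines and puts an empty line before each run. A minimal sketch of the same grouping in Python follows, assuming Python 3 is available on the NAS (the thread does not say it is). The function name `all_repeated`, the `width` parameter, and the sample checksum lines are made up for illustration and are not part of FSlint.

```python
from itertools import groupby


def all_repeated(lines, width=None):
    """Yield each line that is part of a run of equal adjacent lines.

    An empty string is yielded before every run, similar to the
    'prepend' method of GNU uniq. If width is given, only the first
    `width` characters of each line are compared.
    """
    key = (lambda s: s[:width]) if width else (lambda s: s)
    for _, run in groupby(lines, key=key):
        run = list(run)
        if len(run) > 1:  # keep only groups that actually repeat
            yield ""
            yield from run


if __name__ == "__main__":
    # Hypothetical, already sorted "checksum  path" lines.
    sample = [
        "aaa  /mnt/backup/a.txt",
        "aaa  /mnt/users/a-copy.txt",
        "bbb  /mnt/backup/b.txt",
    ]
    for line in all_repeated(sample, width=3):
        print(line)
```

Fed a sorted list of `checksum  path` lines, with `width` set to the checksum length, this yields only the groups of files that share a checksum.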
HP Microserver N40L, 8 GB ECC, 2x 3TB WD Red, 2x 4TB WD Red
XigmaNAS stable branch, always latest version
SMB, rsync

Re: Duplicate file finder, server-side

#2

Post by ku-gew »

I found this:
http://stackoverflow.com/a/22822755/2131851
I will sort the output manually, then remove the non-duplicates, and so on.

Better than nothing.

Re: Duplicate file finder, server-side

#3

Post by ku-gew »

HP Microserver N40L, 8 GB ECC, 2x 3TB WD Red, 2x 4TB WD Red
XigmaNAS stable branch, always latest version
SMB, rsync

MikeMac
Forum Moderator
Posts: 444
Joined: 07 Oct 2012 23:12
Location: Moscow, Russia

Re: Duplicate file finder, server-side

#4

Post by MikeMac »

This one does not work:

Code:

nas4free tmp/ root~$ ./dupFinder.sh  /mnt/WD2T/log/
usage: tr [-Ccsu] string1 string2
       tr [-Ccu] -d string1
       tr [-Ccu] -s string1
       tr [-Ccu] -ds string1 string2
./dupFinder.sh: line 29: \0: command not found
awk: syntax error at source line 2
 context is
        {if ($1 in used) {if  >>>
 <<<
awk: illegal statement at source line 2

Too young probably ;)
Author: Olaf Marzocchi

First revision: 2017-01-01.
Last revision: 2017-01-02.

Re: Duplicate file finder, server-side

#5

Post by ku-gew »

I used it yesterday on N4F 11 and, apart from some files being listed twice (maybe soft links? I don't know), it worked.
Are you also on FreeBSD 11? It was tested only on that.

However, there must be a problem with your copy of the script: your error mentions line 29, but the script has only 25 lines in total.

Get the script directly, I provided a link.
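
One possible cause of files being listed twice is that the same physical file is reached through more than one name, either via a symbolic link or via a hard link. Whether that is what happened here is not confirmed. As a sketch, assuming Python 3 is available on the server, the hypothetical helper below walks a tree and reports each physical file only once: symbolic links are skipped, and hard links are collapsed by comparing device and inode numbers.

```python
import os
import stat


def unique_regular_files(root):
    """Yield one path per physical regular file under root.

    Symbolic links are skipped, and additional hard links to a file
    that was already seen (same device and inode) are not reported.
    """
    seen = set()
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat describes the link itself
            if not stat.S_ISREG(info.st_mode):
                continue  # symlink, socket, device node, ...
            identity = (info.st_dev, info.st_ino)
            if identity in seen:
                continue  # another hard link to the same data
            seen.add(identity)
            yield path
```

Feeding only these paths to the checksum step would keep a link and its target from showing up as a "duplicate" pair.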

frank1982
NewUser
Posts: 1
Joined: 20 Jan 2017 10:34

Re: Duplicate file finder, server-side

#6

Post by frank1982 »

Please install Duplicate Files Deleter as this is the best software available right now.

gomario
experienced User
Posts: 113
Joined: 17 Dec 2016 08:45

Re: Duplicate file finder, server-side

#7

Post by gomario »

Isn't that for Windows?

Re: Duplicate file finder, server-side

#8

Post by ku-gew »

I used Duplicate File Finder https://www.digitalvolcano.co.uk/ for Windows too, but it is limited by Ethernet speed, while my script runs server-side and can read as fast as the HDDs allow.

Also, I doubt that "Duplicate Files Deleter" is "the best software available now".
Maybe it is for you. Have you even tried Duplicate File Finder?

dat_junk
NewUser
Posts: 3
Joined: 16 Mar 2017 10:38

Re: Duplicate file finder, server-side

#9

Post by dat_junk »

For these purposes I use Duplicate File Finder & Remover. It helps to free up space and also avoids confusion when you're looking for a particular file, so you don't have to sort through various versions to pick the right one. The free version looks promising: it is fast and accurate at what it does. Maybe this will help you.

jacob2017
NewUser
Posts: 2
Joined: 26 Apr 2017 01:46

Re: Duplicate file finder, server-side

#10

Post by jacob2017 »

Hello, good day. I use a program called Duplicate Files Deleter. It's very easy to use, and after it finds the duplicate files it lets you choose what to do with them (copy/delete/move). You can even check network files, and you can check multiple paths in the same scan. This helps me a lot; I hope it helps you too.

Re: Duplicate file finder, server-side

#11

Post by gomario »

People, pleeease don't post programs in this thread that are meant to run on other operating systems! The OP specifically asks for local solutions that run on NAS4Free, not for programs that run on networked devices (Android, Windows, etc.).

ernie
Forum Moderator
Posts: 1452
Joined: 26 Aug 2012 19:09
Location: France - Val d'Oise

Re: Duplicate file finder, server-side

#12

Post by ernie »

Hello,
I am interested in such a tool on the server side (on XigmaNAS).
@ku-gew: did you find a solution?
Any tool or script is welcome.

Thanks
NAS 1&2:
System: GA-6LXGH(BIOS: R01 04/30/2014) / 16 Go ECC
XigmaNAS 12.1.0.4 - Ingva (revision 7542) embedded
NAS1: Xeon E3 1241@3.5GHz, 4HDD@2To/raidz2 (WD red), 3HDD@300Go/sas/raidz1 (Hitachi), 1SSD cache, Zlog on sas mirror
NAS2: G3220@3GHz, 3HDD@2To/raidz1 (Seagate), 1SSD cache, 1HDD@300Go/UFS
UPS: APC Back-UPS RS 900G
Case : Fractal Design XL R2

Extensions & services:
NAS1: OBI (Plex, BTSync, zrep, rclone, themes), nfs, UPS,
NAS2: OBI (zrep (backup mode), themes)

coatmaker618
Starter
Posts: 42
Joined: 23 Feb 2014 07:55

Re: Duplicate file finder, server-side

#13

Post by coatmaker618 »

I have a similar problem: stupid amounts of potentially duplicate files (easily >100,000 files, probably 1-10M, multiple TBs), so even 10 Gb Ethernet would be unreasonably slow, let alone the 1 Gb I actually have.

Anyway, I actually wrote a bash script that's specifically designed to do this. I've been running it on my XigmaNAS so it definitely runs locally ;)

The general approach is that you give it two folders and it makes a recursive list of every file in those folders, then hashes those files and compares the hashes of all the files. So it's still up to the user to interpret the results and delete the appropriate folders (for better or worse).

That said, it's VERY clunky and still requires some hand-holding, but it could be made WAY better by someone who's actually decent, let alone good, at XigmaNAS scripting.

If this is the best approach to date I'd be happy to share my script and go from there...but I'd be happy to find an existing approach!
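
The two-folder approach described above can be sketched as follows. This is not coatmaker618's bash script (which was not posted in the thread); it is a minimal Python illustration, assuming Python 3 is available on the server. SHA-256 and the names `sha256_of`, `hash_tree`, and `common_files` are choices made for this example only.

```python
import hashlib
import os
import sys


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root):
    """Map content hash -> list of regular file paths found under root."""
    by_hash = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # ignore symlinks and special files
            by_hash.setdefault(sha256_of(path), []).append(path)
    return by_hash


def common_files(folder_a, folder_b):
    """Return {hash: (paths_in_a, paths_in_b)} for content found in both."""
    a, b = hash_tree(folder_a), hash_tree(folder_b)
    return {h: (a[h], b[h]) for h in a.keys() & b.keys()}


# Runs only when called with exactly two folder arguments.
if __name__ == "__main__" and len(sys.argv) == 3:
    matches = common_files(sys.argv[1], sys.argv[2])
    for digest, (in_a, in_b) in sorted(matches.items()):
        print(digest)
        for path in in_a + in_b:
            print("   ", path)
```

The sketch only reports matches and deletes nothing, so interpreting the list is still left to the user. It hashes every file in both trees, which on multiple TBs will take a long time; comparing file sizes first and hashing only files whose size occurs in both folders would be one way to reduce the work.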

Re: Duplicate file finder, server-side

#14

Post by ernie »

I am interested in your script.
Today I handle this from a Linux computer with FSlint, pointing it at the NFS-mounted disks.
Last month I recovered 150 GB after 3-4 years of accumulated data.
Of course, I visually check the list of files and verify for some entries that they are real duplicates.

BR