Each dataset has a compression property, which defaults to off.
Compression can be set per dataset or per volume. If the property is not explicitly set, a dataset or volume inherits the setting of its parent.
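As a rough illustration of setting and inheriting the property, the sketch below only builds the zfs command lines and does not run them. The pool and dataset names (tank, tank/home) and the helper function names are hypothetical. The subcommands themselves (zfs set, zfs inherit, zfs get -r) are the standard ones.

```python
def set_compression(dataset, algorithm):
    """Build the argv that sets compression on one dataset or volume."""
    return ["zfs", "set", f"compression={algorithm}", dataset]


def inherit_compression(dataset):
    """Build the argv that clears a local setting so the parent's value applies."""
    return ["zfs", "inherit", "compression", dataset]


def show_compression(root):
    """Build the argv that lists the property (and its source) recursively."""
    return ["zfs", "get", "-r", "compression", root]


if __name__ == "__main__":
    # "tank" and "tank/home" are made-up names used only for illustration.
    for argv in (
        set_compression("tank", "lz4"),
        inherit_compression("tank/home"),
        show_compression("tank"),
    ):
        print(" ".join(argv))
```

The lists can be passed to subprocess.run on a system with ZFS installed. In the output of zfs get, the SOURCE column shows whether a value is local or inherited.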
Compression occurs transparently, as data is written to disk. ZFS does not go back and recompress data that has already been written.
This means that if you create a pool without compression, populate some datasets, and only later enable compression on the pool (from which the datasets inherit it), the data already stored in those datasets remains uncompressed. To compress the existing data, it has to be read and rewritten to the datasets. Rsync is probably the best tool for this.
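For a single file, the read-and-rewrite step can be sketched without rsync. The helper below is a minimal example, not a complete migration tool. The name rewrite_in_place is hypothetical, and it assumes the file is not being modified while it runs. It copies the file to a temporary sibling and renames it over the original, so every block is written again under the dataset's current compression setting.

```python
import os
import shutil
import tempfile


def rewrite_in_place(path):
    """Rewrite a file's contents so its blocks are written out again."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary copy in the same directory so the final rename
    # stays within one file system (and therefore one dataset).
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            shutil.copyfileobj(src, dst)
        shutil.copystat(path, tmp_path)  # keep permissions and timestamps
        os.replace(tmp_path, path)       # atomically swap in the new copy
    except BaseException:
        # Clean up the temporary file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

This sketch does not preserve ownership or hard links, and any snapshot that still references the old blocks keeps the uncompressed copy on disk until it is destroyed.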
When To Use ZFS Compression
on - Equivalent to LZJB.
LZ4 - Added in ZFS pool version 5000 (feature flags), LZ4 is now the recommended compression algorithm.
LZ4 compresses approximately 50% faster than LZJB when operating on compressible data, and is over three times faster when operating on uncompressible data. LZ4 also decompresses approximately 80% faster than LZJB. On modern CPUs, LZ4 can often compress at over 500 MB/s and decompress at over 1.5 GB/s (per single CPU core).
LZJB - Offers good compression with less CPU overhead compared to GZIP.
ZLE - Zero Length Encoding is a special compression algorithm that only compresses continuous runs of zeros. This compression algorithm is only useful when the dataset contains large blocks of zeros.
GZIP - A popular stream compression algorithm available in ZFS. One of the main advantages of using GZIP is its configurable level of compression. When setting the compression property, the administrator can choose the level of compression, ranging from gzip-1, the lowest level of compression, to gzip-9, the highest level of compression. This gives the administrator control over how much CPU time to trade for saved disk space.
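To get a feel for the space side of the GZIP trade-off, the sketch below compares a few levels using Python's zlib module, which implements the same DEFLATE family of compression. This is a stand-in chosen for illustration, not ZFS itself: ZFS compresses each record separately, so the sizes it achieves will differ. The function name deflate_sizes and the sample data are hypothetical.

```python
import zlib


def deflate_sizes(data, levels=(1, 6, 9)):
    """Return a {level: compressed_size_in_bytes} mapping for the given levels."""
    return {level: len(zlib.compress(data, level)) for level in levels}


if __name__ == "__main__":
    # Synthetic, fairly repetitive sample data (made up for this example).
    sample = b"".join(b"record %d: status=ok\n" % i for i in range(20000))
    print(f"original: {len(sample)} bytes")
    for level, size in deflate_sizes(sample).items():
        print(f"level {level}: {size} bytes ({len(sample) / size:.2f}x)")
```

Timing each call, for example with time.perf_counter, shows the CPU side of the trade-off on the machine in question. Running the comparison on a sample of the real data is more informative than relying on synthetic input.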