A Random Summary of Compressed Linux Backup

Update: if you just want to compress the filesystem before making a dump, use Clonezilla expert mode & part_to_local_part instead. For details see the compression (压缩) section of LFS 编译记录 (VirtualBox + Void).

TL;DR

Just pipe your data to zstd -T0, which reads from stdin.
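
To make that concrete, here is a minimal sketch with placeholder paths and file names (adjust to taste):

# compress: stream a tar of the directory into zstd using all cores
tar cf - /path/to/dir | zstd -T0 -o backup.tar.zst
# restore: decompress to stdout and unpack
zstd -dc backup.tar.zst | tar xf - -C /path/to/restore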

Intro

Recently I’ve been busy updating & rebuilding my Internet infrastructure like:

  • Updated Linux From Scratch to 12.1
  • Rebuilt all local WSL distros & updated WSL
  • Rebuilt Custom-WSL2-Kernel & made a reproducible CI
  • Completed periodical rkhunter & lynis audits
  • Set up a secondary vertical monitor
  • Moved out of Vultr due to its controversial ToS change
  • Finally ditched Ubuntu on my VPS & switched to Debian testing & Rocky Linux 9
  • Migrated from onedrive-vercel-index to Alist
  • Updated the service index, shut down unmaintained projects
  • Began migrating the mirroring service from sourcehut to self-hosted Forgejo
  • Began the partial migration of hosting services from serverless platforms/VPS to my tiny rpi
  • Began another wave of distro-hopping on the rpi by swapping/flashing SD cards (and I began to understand why Snowden loves these tiny microSD cards so much when bringing data out of the NSA office)
  • And much more I don’t want to share publicly

And I accidentally broke the VHDX containing the LFS 12.1 build when attempting to fix it by attaching it directly in Explorer, after HOURS of waiting during the backup procedure… Ending up with only lfs-tmp-tools, I had to compile everything from the temporary toolchain again, so I spent some time looking for a better backup tool than tar & xz. And I haven’t even mentioned the notorious xz backdoor yet (although I’m not affected at all since Void uses runit instead of systemd).

Boring Theory

The first thing I found was pigz, which is basically gzip with multi-threading and multi-core support.

Later, while unpacking Seifuku Kanojo, I discovered that NeXAS uses Zstd compression and GARbro (or its active fork) does not support it yet. Fortunately, PR#464 was made exactly for this, so I spent some time installing Visual Studio, which I had deliberately avoided for years (and it still sucks after all those years!), compiled it against the latest commit of the fork I mentioned above, and it works.

Back to the point: Zstandard is another lossless data compression algorithm (a.k.a. RFC 8878), and zstd is the corresponding reference implementation in C. Its benchmarks are really impressive and it supports multi-threading as well.

Let me talk about an extreme example. I have an unfinished 16-bit OS, not based on Linux/BSD kernel, written from scratch in C & Assembly, with a size of 1.5M. Yes, you read that right. The complete system is 1.5M. Although I should mention that it only has a barely working desktop, cursor and console. I have not added UTF-8 support or developed any meaningful program on it yet.

And when I tried to compress it using zstd, a miracle happened:

$ zstd vinfallos.img
vinfallos.img        :  0.69%   (  1.41 MiB =>  10.00 KiB, vinfallos.img.zst)
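
If a ratio like this looks too good to be true, zstd itself can verify and inspect the archive; these are just the standard -t (test) and -l (list) flags, nothing specific to my image:

# test integrity of the compressed file
zstd -t vinfallos.img.zst
# list frame and size information
zstd -l vinfallos.img.zst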

List

I’ll explain the usage of compressed backups with a few examples I use on a regular basis.

LFS

I think this works for whatever Linux distro you use, but anyway:

# Exit the chroot
exit
# WSL, run as ROOT
mountpoint -q $LFS/dev/shm && umount $LFS/dev/shm
umount $LFS/dev/pts
umount $LFS/{sys,proc,run,dev}
# Make sure $LFS is set for root
echo $LFS
# Backup
cd $LFS

# xz
tar -cJpf '/mnt/c/WSL/lfs-temp-tools-12.1.tar.xz' .
# pigz
tar cf - . | pigz -p 4 > /mnt/c/WSL/lfs-12.1-rootfs.tar.gz
# zstd
tar cf - . | zstd -T4 > /mnt/c/WSL/lfs-12.1-rootfs.tar.zst
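
For completeness, a restore sketch that mirrors the zstd variant above (a recent GNU tar can auto-detect .zst, but I pipe through zstd explicitly here; run as root with $LFS mounted):

zstd -d -c /mnt/c/WSL/lfs-12.1-rootfs.tar.zst | tar -xpf - -C $LFS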

A rough benchmark based on Sarasa-Gothic:

$ tar -cJpf 'sarasa-1.0.8.tar.xz' Sarasa-1.0.8
56s
$ tar cf - Sarasa-1.0.8 | pigz -p 4 > sarasa-1.0.8.tar.gz
12s
$ tar cf - Sarasa-1.0.8 | zstd -T4 > sarasa-1.0.8.tar.zst
7s

WSL

For years I’ve been using the following command to back up WSL, assuming it was compressed with xz since WSL supports importing .tar.xz:

wsl --export Devuan Devuan-$(Get-Date -UFormat "%Y%m%d").tar.xz

Sadly, that’s not the case, and I only realized it after testing against issue#6056. It’s just a file extension change that means nothing.

So after some investigation, I installed the dependencies via Chocolatey like this:

# For zstd
sudo choco install zstandard
# A bit misleading, but this is actually for xz support
sudo choco install 7zip-zstd

BTW you can find more codecs in the 7-Zip-zstd GitHub README. I think Brotli and LZ4 are widely adopted as well.

Then I can export a WSL distro like this (I think pigz has no Windows binary):

# zstd
cmd /c "wsl --export void-glibc - | zstd -T4 -o void-glibc-$(Get-Date -UFormat "%Y%m%d").tar.zst"
# xz
cmd /c "wsl --export void-glibc - | xz -T4 > void-glibc-$(Get-Date -UFormat "%Y%m%d").tar.xz"

This way my backup of (six) WSL distros shrank from 13G to 5.2G. cmd is called to avoid a pipe bug in PowerShell below v7.4, according to SuperUser.
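
On PowerShell 7.4 or later, where that pipe bug is reportedly fixed, the cmd wrapper should no longer be necessary; an untested sketch:

# PowerShell 7.4+, byte stream piped straight into zstd
wsl --export void-glibc - | zstd -T4 -o void-glibc-$(Get-Date -UFormat "%Y%m%d").tar.zst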

Worried about importing? That’s not an issue, since WSL supports importing .tar.zst too, although the official documentation says nothing about it.
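
For the record, the matching import looks like this; the distro name and install path below are placeholders:

wsl --import void-glibc C:\WSL\void-glibc .\void-glibc-20240412.tar.zst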

Termux PRoot-Distro

Enough talk, let me just show you the code:

# prep
cd $PREFIX/var/lib/proot-distro/installed-rootfs
PAR=$(pwd)
echo $PAR
cd ~/downloads/

# zstd
tar --use-compress-program="zstd -T4" -cvf void.tar.zst -C $PAR/void .
14s 964M
# pigz
tar -cvf - -C $PAR/void . | pigz -p 4 > void.tar.gz
25s 976M
# xz
tar -I 'xz -T4' -cvf void.tar.xz -C $PAR/void .
4m46s 893M
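
And the matching restore, a sketch assuming the rootfs directory is empty or was removed first (GNU tar appends -d to the compress program when extracting, so the same zstd invocation works):

mkdir -p $PAR/void
tar --use-compress-program="zstd -T4" -xvf void.tar.zst -C $PAR/void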

I had some doubts about the size since I rarely use proot-distro, so I found a way on Reddit to clean up the package cache and tested again:

proot-distro login void
# clean up pkg cache like ./var/cache/xbps/glibc-2.38_2.aarch64.xbps
sudo xbps-remove -oO
exit

# zstd after clean cache
tar --use-compress-program="zstd -T4" -cvf proot-distro-backup.tar.zst -C $PAR/void .
8s 300M

8s is fairly impressive and the size is down to a third of the original, perfect. At this point I concluded that my previously gigantic 13G VoidWSL was full of package caches…

Raspberry Pi

Theory

Before actually putting my hands on this, I imagined it would go something like the following.

zstd:

# Prep
sudo lsblk

# Backup
sudo dd if=/dev/sdb bs=4M | zstd -T4 -o rpi4-void-20240412.img.zst

# Restore
sudo zstd -d -c -T0 rpi4-void-20240412.img.zst | sudo dd of=/dev/sdx

pigz:

sudo lsblk
sudo dd if=/dev/sdb bs=4M | pigz -p 4 > PiSDBackup.img.gz
sudo gzip -dc ~/PiSDBackup.img.gz | sudo dd of=/dev/sdx

xz:

sudo lsblk
sudo dd if=/dev/sdb bs=4M | xz -T4 -c > PiSDBackup.img.xz
xz -d -T0 -c ~/PiSDBackup.img.xz | sudo dd of=/dev/sdx
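
Whichever compressor you pick, two dd niceties I would add to the sketches above (purely optional): status=progress for a live byte counter, and conv=fsync on the restore so the data is flushed before the card is pulled:

sudo zstd -d -c rpi4-void-20240412.img.zst | sudo dd of=/dev/sdx bs=4M status=progress conv=fsync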

Trouble

Just like all the other examples in the list, huh?

Well, not really:

# Backup using dd
$ sudo dd if=/dev/sdb bs=4M | zstd -T4 -o rpi4-void-20240412.img.zst
# sudo dd if=/dev/sdb conv=sparse bs=4M | zstd -T4 -o rpi4-void-sparse-20240412.img.zst
Read:  29.7 GiB  ==> 34%
31914983424 bytes (32 GB, 30 GiB) copied, 365.197 s, 87.4 MB/s
/*stdin*\            : 34.45%   (  29.7 GiB =>   10.2 GiB, rpi4-void-20240412.img.zst)

Now you see, dd cannot tell used sectors from empty ones (I hope I said that correctly), so it dumps the entire SD card, including the free space, even though only 4.2G is actually used. And the backup piped through zstd still ends up at an unacceptable 10.2G.

Therapy

Following “dd on entire disk, but do not want empty portion”, there are several solutions:

  1. Add the conv=sparse param to dd: I tried, and it did not work
  2. Shrink the partition to as small as possible using GParted, then back up: should work, but this sounds quite stupid. And I would have to resize it again on the rpi after finishing the backup, so no
  3. dd if=/dev/zero of=asdf.txt, i.e. fill the empty space with zeros so it compresses away under gzip: I don’t use gzip, and it’s unacceptable to fill up the disk and literally kill the NAND flash (although I don’t attach an external SSD and just use an SD card)
  4. Following this, use Partimage or Clonezilla: I discovered that the Kali repository has clonezilla, so I gave it a try and it works perfectly, although the first impression was miserable

Clonezilla supports many compression algorithms, but I did not mess with those since the default works fine. And a 1.32G backup already seems far more reasonable than a 10.2G one.
