Linux From Scratch to 12.1
And I accidentally broke the VHDX containing the LFS 12.1 build when attempting to fix it by attaching it directly in Explorer, after HOURS of waiting during the backup procedure… I ended up with only lfs-tmp-tools and had to compile everything from the temporary toolchain again, so I spent some time looking for a decent backup tool other than tar & xz. And I haven't even mentioned the notorious xz backdoor yet (although I'm not affected at all since Void uses runit instead of systemd).
The first thing I found was pigz, which is basically gzip with multi-thread and multi-core support.
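For reference, a minimal sketch of how I'd use it as a drop-in replacement for gzip; the -p flag sets the thread count and the file name is just a placeholder:
# compress with 4 threads; produces example.log.gz, just like gzip would
pigz -p 4 example.log
# decompress (pigz also installs an unpigz alias)
pigz -d example.log.gz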
And later, while unpacking Seifuku Kanojo, I discovered that NeXAS uses ZSTD compression and that GARbro (or its active fork) does not support it yet. Fortunately, PR#464 was made exactly for this, so I spent some time installing Visual Studio, which I had avoided on purpose for years (and it still sucks after all those years!), compiled the PR against the latest commit of the fork I mentioned above, and it works.
Back to the point: Zstandard is another lossless data compression algorithm (a.k.a. RFC 8878), and zstd is the corresponding reference implementation in C. The benchmarks are really impressive, and it supports multi-threading as well.
Let me talk about an extreme example. I have an unfinished 16-bit OS, not based on the Linux/BSD kernel, written from scratch in C & Assembly, with a size of 1.5M. Yes, you read that right. The complete system is 1.5M, although I should mention that it only has a barely working desktop, cursor and console; I have not added UTF-8 support or developed any meaningful program on it yet.
And when I tried to compress it using zstd, a miracle happened:
$ zstd vinfallos.img
vinfallos.img : 0.69% ( 1.41 MiB => 10.00 KiB, vinfallos.img.zst)
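For completeness, the round trip looks roughly like this; -l and -d are standard zstd flags and the file names match the example above:
# show frame info for the compressed file
zstd -l vinfallos.img.zst
# decompress to a new file
zstd -d vinfallos.img.zst -o vinfallos-restored.img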
I'll explain the usage of compressed backups in a few examples I use on a regular basis.
I think this works for whatever Linux distro, but anyway:
# Chroot
exit
# WSL, run as ROOT
mountpoint -q $LFS/dev/shm && umount $LFS/dev/shm
umount $LFS/dev/pts
umount $LFS/{sys,proc,run,dev}
# Make sure $LFS is set for root
echo $LFS
# Backup
cd $LFS
# xz
tar -cJpf '/mnt/c/WSL/lfs-temp-tools-12.1.tar.xz' .
# pigz
tar cf - . | pigz -p 4 > /mnt/c/WSL/lfs-12.1-rootfs.tar.gz
# zstd
tar cf - . | zstd -T4 > /mnt/c/WSL/lfs-12.1-rootfs.tar.zst
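And the reverse direction, a hedged restore sketch using the same paths as above (run as root in WSL with $LFS set; a reasonably recent GNU tar detects the compression automatically on extraction):
cd $LFS
tar -xpf /mnt/c/WSL/lfs-temp-tools-12.1.tar.xz
# or pipe through the decompressor explicitly, e.g. for the zstd archive
zstd -dc -T4 /mnt/c/WSL/lfs-12.1-rootfs.tar.zst | tar -xp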
A rough benchmark based on Sarasa-Gothic:
$ tar -cJpf 'sarasa-1.0.8.tar.xz' Sarasa-1.0.8
56s
$ tar cf - Sarasa-1.0.8 | pigz -p 4 > sarasa-1.0.8.tar.gz
12s
$ tar cf - Sarasa-1.0.8 | zstd -T4 > sarasa-1.0.8.tar.zst
7s
For years I've been using the following command to back up WSL, assuming it's compressed by xz since WSL supports importing .tar.xz:
wsl --export Devuan Devuan-$(Get-Date -UFormat "%Y%m%d").tar.xz
Sadly, it's not the case, and I only realized that after testing against issue #6056. It's just a file extension change, which means nothing.
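A quick way to verify is the file utility from any WSL shell; an uncompressed export should be reported as a plain POSIX tar archive rather than XZ compressed data (the date and path here are just examples):
file Devuan-20240412.tar.xz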
So after some investigation, I installed the dependencies via Chocolatey like this:
# For zstd
sudo choco install zstandard
# A bit misleading, but this is actually for xz support
sudo choco install 7zip-zstd
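A quick sanity check that the tools actually landed on PATH (assuming the packages expose the standard zstd and xz command-line binaries):
zstd --version
xz --version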
BTW you can find more codecs in the 7-Zip-zstd GitHub README. I think Brotli and LZ4 are widely adopted as well.
Then I can export a WSL distro like this (I think pigz has no Windows binary):
# zstd
cmd /c "wsl --export void-glibc - | zstd -T4 -o void-glibc-$(Get-Date -UFormat "%Y%m%d").tar.zst"
# xz
cmd /c "wsl --export void-glibc - | xz -T4 > void-glibc-$(Get-Date -UFormat "%Y%m%d").tar.xz"
This way my (six) WSL backups shrank from 13G to 5.2G.
And cmd is called to avoid a pipe bug in PowerShell below v7.4, according to SuperUser.
Worried about import? That's not an issue, since WSL supports importing .tar.zst too, although the official documentation says nothing about it.
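For the record, an import sketch; the distro name and install path below are made up, the syntax is the usual wsl --import <Distro> <InstallLocation> <FileName>:
wsl --import void-glibc-restored C:\WSL\void-glibc-restored .\void-glibc-20240412.tar.zst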
Enough talk, let me just show you the code:
# prep
cd $PREFIX/var/lib/proot-distro/installed-rootfs
PAR=$(pwd)
echo $PAR
cd ~/downloads/
# zstd
tar --use-compress-program="zstd -T4" -cvf void.tar.zst -C $PAR/void .
14s 964M
# pigz
tar -cvf - -C $PAR/void . | pigz -p 4 > void.tar.gz
25s 976M
# xz
tar -I 'xz -T4' -cvf void.tar.xz -C $PAR/void .
4m46s 893M
I had some doubts about the size, since I rarely use proot-distro, so I found a way on Reddit to clean up the package cache and tested again:
proot-distro login void
# clean up pkg cache like ./var/cache/xbps/glibc-2.38_2.aarch64.xbps
sudo xbps-remove -oO
exit
# zstd after clean cache
tar --use-compress-program="zstd -T4" -cvf proot-distro-backup.tar.zst -C $PAR/void .
8s 300M
8s is fairly impressive and the size is down to 1/3 of the original, perfect. At this point I concluded that my previous gigantic 13G VoidWSL was full of package caches…
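Restoring is just the reverse; a sketch under the same paths as above (the target rootfs directory must exist, and I pipe through zstd explicitly to keep the thread flag):
mkdir -p $PAR/void
zstd -dc -T4 ~/downloads/void.tar.zst | tar -xv -C $PAR/void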
Before actually putting my hands on this, I imagined it would look like the following.
zstd:
# Prep
sudo lsblk
# Backup
sudo dd if=/dev/sdb bs=4M | zstd -T4 -o rpi4-void-20240412.img.zst
# Restore
sudo zstd -dc -T0 ~/PiSDBackup.img.zst | sudo dd of=/dev/sdx
pigz:
sudo lsblk
sudo dd if=/dev/sdb bs=4M | pigz -p 4 > PiSDBackup.img.gz
sudo gzip -dc ~/PiSDBackup.img.gz | sudo dd of=/dev/sdx
xz:
sudo lsblk
sudo dd if=/dev/sdb bs=4M | xz -T4 -c > PiSDBackup.img.xz
xz -d -T0 -c ~/PiSDBackup.img.xz | sudo dd of=/dev/sdx
Just like all the other examples in the list, huh?
Well, not really:
# Backup using dd
$ sudo dd if=/dev/sdb bs=4M | zstd -T4 -o rpi4-void-20240412.img.zst
# sudo dd if=/dev/sdb conv=sparse bs=4M | zstd -T4 -o rpi4-void-sparse-20240412.img.zst
Read: 29.7 GiB ==> 34%
31914983424 bytes (32 GB, 30 GiB) copied, 365.197 s, 87.4 MB/s
/*stdin*\ : 34.45% ( 29.7 GiB => 10.2 GiB, rpi4-void-20240412.img.zst)
Now you see, dd cannot tell used sectors from empty (zeroed) ones (I hope I said that correctly), so it dumps the entire SD card, including the empty space, even though only 4.2G is actually used. And the backup piped through zstd ends up at an unacceptable 10.2G.
So, for dd'ing an entire disk without keeping the empty portion, there are several solutions:
- The conv=sparse param for dd: I tried it, and it did not work.
- dd if=/dev/zero of=asdf.txt, aka. filling the empty space with zeros so it gets compressed away by gzip (sketched below): I don't use gzip, and it's unacceptable to fill up the disk and literally kill the SSD's NAND flash (although I don't attach an external SSD and just use an SD card).
- Clonezilla: it supports many algorithms, but I do not mess with those since the default works fine. And a 1.32G backup already seems far more reasonable than a 10.2G one.
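The zero-fill trick from the second item above, as a sketch I did not actually run; the partition, mount point and output name are placeholders, and it wears the flash exactly as described:
# mount the SD card's data partition and fill the free space with zeros
sudo mount /dev/sdb2 /mnt
sudo dd if=/dev/zero of=/mnt/zero.fill bs=4M status=progress || true  # dd errors out once the disk is full
sudo rm /mnt/zero.fill
sudo umount /mnt
# now the unused space is all zeros and compresses to almost nothing
sudo dd if=/dev/sdb bs=4M status=progress | zstd -T4 -o rpi4-void-zerofilled-20240412.img.zst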
Sine īrā et studiō