zstd: Backport Huffman speed improvement from upstream
Backport upstream commit c7269ad [0] to improve zstd decoding speed. Updating the kernel to zstd v1.5.5 earlier in this patch series regressed zstd decoding speed. This turned out to be because gcc was not unrolling the inner loops of the Huffman decoder which are executed a constant number of times [1]. This really hurts performance, as we expect this loop to be completely branch-free. This commit fixes the issue by unrolling the loop manually [2]. The commit fixes one more minor issue, which is to mask a variable shift by 0x3F. The shift was guaranteed to be less than 64, but gcc couldn't prove that, and emitted suboptimal code. Finally, the upstream commit added a build macro `HUF_DISABLE_FAST_DECODE` which is not used in the kernel, but is maintained to keep a clean import from upstream. This commit was generated from upstream signed tag v1.5.5-kernel [3] by: export ZSTD=/path/to/repo/zstd/ export LINUX=/path/to/repo/linux/ cd "$ZSTD/contrib/linux-kernel" git checkout v1.5.5-kernel make import LINUX="$LINUX" I ran my benchmark & test suite before and after this commit to measure the overall decompression speed benefit. It benchmarks zstd at several compression levels. These benchmarks measure the total time it takes to read data from the compressed filesystem. Component, Level, Read time delta Btrfs , 1, -7.0% Btrfs , 3, -3.9% Btrfs , 5, -4.7% Btrfs , 7, -5.5% Btrfs , 9, -2.4% Squashfs , 1, -9.1% Link: https://github.com/facebook/zstd/commit/c7269add7eaf028ed828d9af41e732cf01993aad Link: https://gist.github.com/terrelln/2e14ff1fb197102a08d7823d8044978d Link: https://gist.github.com/terrelln/a70bde22a2abc800691fb65c21eabc2a Link: https://github.com/facebook/zstd/tree/v1.5.5-kernel Signed-off-by:Nick Terrell <terrelln@fb.com>
Loading
Please register or sign in to comment