- Nov 20, 2023
-
-
Nick Terrell authored
gcc in the Linux kernel was not unrolling the inner loops of the Huffman decoder, which was destroying decoding performance. The compiler was generating crazy code with all sorts of branches. I suspect this is because of Spectre mitigations, but I'm not certain. Once the loops were manually unrolled, performance was restored. Additionally, when gcc couldn't prove that the variable left shift in the 4X2 decode loop wasn't greater than 63, it inserted checks to verify it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide this check. This is a no-op, because `entry.nbBits` is guaranteed to be less than 64. Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the fast C loops for Issue #3762. That way, if a performance regression remains even after this change, users can opt out at compile time.
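A minimal sketch of why the `& 0x3F` mask is a no-op, assuming a 64-bit bit container. The function names below are hypothetical and are not taken from the zstd source; only the masking idea comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the zstd decoder: shift a 64-bit container left
 * by nbBits. The caller guarantees nbBits < 64, as the commit states. */
uint64_t consume_bits_plain(uint64_t container, unsigned nbBits)
{
    assert(nbBits < 64);
    return container << nbBits;
}

/* Same operation with the mask applied. For every nbBits in [0, 63],
 * (nbBits & 0x3F) == nbBits, so the result is unchanged; the mask only
 * makes the bound on the shift count visible to the compiler. */
uint64_t consume_bits_masked(uint64_t container, unsigned nbBits)
{
    assert(nbBits < 64);
    return container << (nbBits & 0x3F);
}
```

Both functions return identical values for every shift count from 0 to 63, which is what makes the mask safe to add.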
-
- Nov 17, 2023
-
-
Nick Terrell authored
ZSTD_resetDStream() is deprecated and replaced by ZSTD_DCtx_reset(). This removes deprecation warnings from the kernel build. This change is a no-op; see the docs suggesting this replacement: https://github.com/facebook/zstd/blob/fcbf2fde9ac7ce1562c7b3a394350e764bcb580f/lib/zstd.h#L2655-L2663
-
- Nov 16, 2023
-
-
Nick Terrell authored
Backport the fix that is already in the kernel; the dev branch fixes the same issue in a different way. https://lkml.kernel.org/r/20231012213428.1390905-1-nickrterrell@gmail.com
-
Nick Terrell authored
Linux started providing intptr_t in <linux/types.h> so we no longer need to define it here. https://lkml.kernel.org/r/ed66b9e4-1fb7-45be-9bb9-d4bc291c691f@p183
-
- Apr 04, 2023
-
-
Yann Collet authored
v1.5.5 last changes
-
Yann Collet authored
removed Appveyor Badge
-
Yann Collet authored
fix potential over-reads
-
Yann Collet authored
as we don't use Appveyor CI anymore.
-
- Apr 03, 2023
-
-
Yann Collet authored
Detected by @terrelln, these issues could be triggered in specific scenarios, namely decompression of certain invalid magic-less frames, or requesting properties from certain invalid skippable frames.
-
Yann Collet authored
v1.5.5
-
Felix Handte authored
Rename/Restructure Windows Release Artifact
-
W. Felix Handte authored
https://github.com/facebook/zstd/releases/tag/v1.5.0 describes the structure we want to adhere to. This commit tries to accomplish that automatically, so we can avoid manual fixups on future releases.
-
Yann Collet authored
fix #3583
-
Yann Collet authored
As reported by @georgmu, the previous fix is undone by the later initialization. Switch the order, so that the initialization is adjusted by the special case.
-
- Apr 02, 2023
-
-
Yann Collet authored
Preparation for release v1.5.5
-
- Apr 01, 2023
-
-
Yann Collet authored
in preparation for v1.5.5
-
Yann Collet authored
also : updated man pages
-
Yann Collet authored
fix decompression with -o writing into a block device
-
daniellerozenblit authored
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0
* add uptrval to linux-kernel
* remove bin files
* get rid of uptrval
* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
-
- Mar 31, 2023
-
-
Yann Collet authored
-
Yann Collet authored
Seekable format read optimization
-
Yann Collet authored
added a Clang-CL Windows test to CI
-
Yann Collet authored
Increase tests timeout
-
Yann Collet authored
Couple tweaks to improve decompression speed with clang PGO compilation
-
Yann Collet authored
Decompression features automatic support for sparse files, a form of "compression" where entire blocks consist only of zeroes. This only works on some compatible file systems (like ext4); others simply ignore it (like afs). Triggering this feature relies on `fseek()`. But `fseek()` is not compatible with non-seekable devices, such as pipes. Therefore it's disabled for pipes. However, there are other objects which are not compatible with `fseek()`, such as block devices. Changed the logic, so that `fseek()` (and therefore sparse write) is only automatically enabled on regular files. Note that this automatic behavior can always be overridden by the explicit commands `--sparse` and `--no-sparse`. fix #3583
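The "regular files only" rule can be sketched with POSIX `stat()` and `S_ISREG`. This is an illustration under assumptions, not the zstd CLI code: the function name `sparse_write_allowed` is hypothetical, and the explicit `--sparse` / `--no-sparse` overrides are left out.

```c
#include <sys/stat.h>

/* Hypothetical helper, not the zstd CLI implementation: returns 1 when
 * `path` names a regular file, and 0 for anything else (pipes, block or
 * character devices, directories, or paths that cannot be stat'ed). */
int sparse_write_allowed(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        return 0;  /* unknown target: stay on the safe, non-sparse path */
    }
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

With a check of this shape, a destination such as a block device falls into the "not regular" branch and sparse writes via `fseek()` are not attempted by default.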
-
- Mar 30, 2023
-
-
Yoni Gilad authored
This does the following:
1. Compress test data into multiple frames
2. Perform a series of small decompressions and seeks forward, checking that compressed data wasn't reread unnecessarily.
3. Perform some seeks forward and backward to ensure correctness.
-
Yoni Gilad authored
When decompressing a seekable file, if seeking forward within a frame (by issuing multiple ZSTD_seekable_decompress calls with a small gap between them), the frame will be unnecessarily reread from the beginning. This patch makes it continue using the current frame data and simply skip over the unneeded bytes.
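The decision described here can be sketched as follows. The `SeekState` type and `bytes_to_skip` function are hypothetical and only model the bookkeeping; they are not part of the seekable-format API.

```c
#include <stddef.h>

/* Hypothetical state, not the seekable-format API: which frame is currently
 * loaded and how many decompressed bytes of it have been produced so far. */
typedef struct {
    size_t frameIndex;
    size_t decodedInFrame;
} SeekState;

/* Returns how many decompressed bytes must be decoded and discarded to
 * reach targetOffset inside targetFrame. */
size_t bytes_to_skip(SeekState *s, size_t targetFrame, size_t targetOffset)
{
    if (s->frameIndex == targetFrame && targetOffset >= s->decodedInFrame) {
        /* Forward seek within the current frame: keep the decoder position
         * and skip only the gap. */
        return targetOffset - s->decodedInFrame;
    }
    /* Different frame or backward seek: restart from the frame's start. */
    s->frameIndex = targetFrame;
    s->decodedInFrame = 0;
    return targetOffset;
}
```

For a small forward gap the skip count is just the gap size, whereas restarting the frame costs the full offset from the frame's beginning.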
-
- Mar 29, 2023
-
-
Yann Collet authored
If I understand correctly, this should trigger the issue reported in #3569.
-
Yann Collet authored
Disable linker flag detection on MSVC/ClangCL.
-
- Mar 28, 2023
-
-
Yann Collet authored
Bump github/codeql-action from 2.2.6 to 2.2.8
-
daniellerozenblit authored
* mmap for windows
* remove enabling mmap for testing
* rename FIO dictionary initialization methods + un-const dictionary objects in free functions
* remove enabling mmap for testing
* initDict returns void, underlying setDictBuffer methods return the size of the set buffer
* fix comment
-
Han Zhu authored
Looking at the __builtin_expect in ZSTD_decodeSequence:

    {   size_t offset;
    #if defined(__clang__)
        if (LIKELY(ofBits > 1)) {
    #else
        if (ofBits > 1) {
    #endif
            ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);

From profile-annotated assembly, the probability of ofBits > 1 is about 75% (101k counts out of 135k counts). This is much smaller than the recommended likelihood to use __builtin_expect, which is 99%. As a result, clang moved the else block further away, which hurts cache locality. Removing this __builtin_expect along with two others in ZSTD_decodeSequence gave better performance when PGO is enabled. I suggest removing these branch hints and relying on PGO, which leverages runtime profiles from the actual workload to calculate branch probability instead.
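For readers unfamiliar with the hint, here is a minimal sketch of a hinted versus an unhinted branch. `EXAMPLE_LIKELY`, `count_hinted`, and `count_plain` are hypothetical names chosen for this illustration; they are not the zstd definitions.

```c
/* Assumed macro for illustration only, not the zstd header definition. */
#if defined(__GNUC__) || defined(__clang__)
#  define EXAMPLE_LIKELY(x) (__builtin_expect(!!(x), 1))
#else
#  define EXAMPLE_LIKELY(x) (x)
#endif

/* Counts values greater than 1, telling the compiler the branch is
 * almost always taken. */
unsigned count_hinted(const unsigned *v, unsigned n)
{
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++) {
        if (EXAMPLE_LIKELY(v[i] > 1)) count++;
    }
    return count;
}

/* Same logic without the hint: block layout is left to the compiler's own
 * heuristics, or to profile data when PGO is enabled. */
unsigned count_plain(const unsigned *v, unsigned n)
{
    unsigned count = 0;
    for (unsigned i = 0; i < n; i++) {
        if (v[i] > 1) count++;
    }
    return count;
}
```

The hint never changes the result, only the code layout the compiler may choose, which is why dropping it is safe from a correctness standpoint.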
-
Han Zhu authored
Inlining `BIT_reloadDStream` provided a >3% decompression speed improvement for a clang PGO-optimized zstd binary, measured using the Silesia corpus with compression level 1. The win comes from improved register allocation, which leads to fewer spills and reloads. Take a look at this comparison of profile-annotated hot assembly before and after this change: https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three fewer moves after inlining. In general, LLVM's register allocator works better when it can see more code. For example, when the register allocator sees a call instruction, it partitions the registers into caller registers and callee registers, and it is not free to do whatever it wants with all the registers for the current function. Inlining the callee lets the register allocator access all registers and use them more flexibly.
-
Elliot Gorokhovsky authored
Provide an interface for fuzzing sequence producer plugins
-
Elliot Gorokhovsky authored
-
Yann Collet authored
Add instructions for building Universal2 on macOS via CMake
-
- Mar 27, 2023
-
-
Felix Handte authored
[contrib/pzstd] Detect and Select Maximum Available C++ Standard
-
W. Felix Handte authored
-
Yann Collet authored
fix minor doc mistake (`ninja build` doesn't work)
-
Yann Collet authored
[easy] minor doc update for --rsyncable
-