  1. Nov 20, 2023
    • Nick Terrell's avatar
      [huf] Improve fast huffman decoding speed in linux kernel · 3ff79dbe
      Nick Terrell authored
      gcc in the linux kernel was not unrolling the inner loops of the Huffman
      decoder, which was destroying decoding performance. The compiler was
      generating crazy code with all sorts of branches, I suspect because of
      Spectre mitigations, but I'm not certain. Once the loops were manually
      unrolled, performance was restored.
      
      Additionally, when gcc couldn't prove that the variable left shift in
      the 4X2 decode loop wasn't greater than 63, it inserted checks to verify
      it. To fix this, mask the shift amount with `entry.nbBits & 0x3F`, which
      allows gcc to eliminate the check (a short sketch of the trick follows
      this entry). This is a no-op, because `entry.nbBits` is guaranteed to be
      less than 64.
      
      Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the
      fast C loops for Issue #3762. That way, if there is still a performance
      regression even after this change, users can opt out at compile time.
      v1.5.5-kernel
      3ff79dbe
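      A minimal sketch of the masking trick described above. This is not the
      actual zstd decode loop; `bitContainer`, `nbBits`, and the helper name
      are illustrative stand-ins for the real decoder state:

      /* Hypothetical helper, not the zstd source. */
      static inline unsigned long long
      HUF_consumeBits_sketch(unsigned long long bitContainer, unsigned nbBits)
      {
          /* nbBits is already guaranteed to be < 64, so the mask is a runtime
           * no-op, but it makes the bound visible to the compiler and lets it
           * drop the range-check branch around the variable shift. */
          return bitContainer << (nbBits & 0x3F);
      }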
  2. Nov 17, 2023
  3. Nov 16, 2023
  4. Apr 04, 2023
  5. Apr 03, 2023
  6. Apr 02, 2023
  7. Apr 01, 2023
  8. Mar 31, 2023
  9. Mar 30, 2023
    • Yoni Gilad's avatar
      seekable_format: Add unit test for multiple decompress calls · 649a9c85
      Yoni Gilad authored
      This does the following:
      1. Compress test data into multiple frames
      2. Perform a series of small decompressions and seeks forward, checking
         that compressed data wasn't reread unnecessarily.
      3. Perform some seeks forward and backward to ensure correctness.
      649a9c85
    • Yoni Gilad's avatar
      seekable_format: Prevent rereading frame when seeking forward · 618bf84e
      Yoni Gilad authored
      When decompressing a seekable file, if seeking forward within
      a frame (by issuing multiple ZSTD_seekable_decompress calls
      with a small gap between them), the frame will be unnecessarily
      reread from the beginning. This patch makes it continue using
      the current frame data and simply skip over the unneeded bytes
      (a sketch of this access pattern follows the entry).
      618bf84e
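      A minimal sketch of the access pattern described above, assuming the
      seekable-format entry points (ZSTD_seekable_create/initBuff/decompress/free)
      from contrib/seekable_format/zstd_seekable.h; the buffer size, gap size,
      and `read_with_forward_seeks` are illustrative, and error handling is
      abbreviated:

      #include "zstd.h"            /* ZSTD_isError */
      #include "zstd_seekable.h"

      int read_with_forward_seeks(const void* archive, size_t archiveSize)
      {
          ZSTD_seekable* const zs = ZSTD_seekable_create();
          size_t ret = ZSTD_seekable_initBuff(zs, archive, archiveSize);
          if (ZSTD_isError(ret)) { ZSTD_seekable_free(zs); return -1; }

          char buf[64];
          unsigned long long offset = 0;
          for (int i = 0; i < 4; ++i) {
              /* Small decompression ... */
              ret = ZSTD_seekable_decompress(zs, buf, sizeof(buf), offset);
              if (ZSTD_isError(ret)) break;
              /* ... then seek forward by a small gap within the same frame.
               * Before this patch, each call restarted decompression from the
               * beginning of the frame; now the frame state is reused and the
               * unneeded bytes are simply skipped. */
              offset += sizeof(buf) + 16;
          }

          ZSTD_seekable_free(zs);
          return ZSTD_isError(ret) ? -1 : 0;
      }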
  10. Mar 29, 2023
  11. Mar 28, 2023
    • Yann Collet's avatar
      Merge pull request #3573 from facebook/dependabot/github_actions/github/codeql-action-2.2.8 · 262e553b
      Yann Collet authored
      Bump github/codeql-action from 2.2.6 to 2.2.8
    • daniellerozenblit's avatar
      mmap for windows (#3557) · b2ad17a6
      daniellerozenblit authored
      * mmap for windows
      
      * remove enabling mmap for testing
      
      * rename FIO dictionary initialization methods + un-const dictionary objects in free functions
      
      * remove enabling mmap for testing
      
      * initDict returns void, underlying setDictBuffer methods return the size of the set buffer
      
      * fix comment
    • Han Zhu's avatar
      Remove clang-only branch hints from ZSTD_decodeSequence · b558190a
      Han Zhu authored
      Looking at the __builtin_expect in ZSTD_decodeSequence:
      
      {   size_t offset;
      #if defined(__clang__)
          if (LIKELY(ofBits > 1)) {
      #else
          if (ofBits > 1) {
      #endif
              ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);
      
      From profile-annotated assembly, the probability of ofBits > 1 is about 75%
      (101k counts out of 135k counts). This is much smaller than the recommended
      likelihood for using __builtin_expect, which is 99%. As a result, clang moved
      the else block further away, which hurts cache locality. Removing this
      __builtin_expect along with two others in ZSTD_decodeSequence gave better
      performance when PGO is enabled. I suggest removing these branch hints and
      relying on PGO, which leverages runtime profiles from actual workloads to
      calculate branch probabilities instead (a small sketch of the hint pattern
      follows this entry).
      b558190a
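      A minimal sketch of the branch-hint pattern discussed above, assuming
      LIKELY is a thin wrapper over __builtin_expect (zstd defines a similar
      macro in lib/common/compiler.h); the decode helpers here are purely
      illustrative stand-ins:

      #include <stddef.h>

      #if defined(__GNUC__) || defined(__clang__)
      #  define LIKELY(x) (__builtin_expect(!!(x), 1))
      #else
      #  define LIKELY(x) (x)
      #endif

      /* Stand-ins for the real long/short offset decode paths. */
      static size_t decode_long_offset(void)  { return 1; }
      static size_t decode_short_offset(void) { return 0; }

      /* With the hint: the compiler lays out the else block far away, which
       * hurts when the branch is only ~75% taken. */
      size_t decode_offset_hinted(unsigned ofBits)
      {
          if (LIKELY(ofBits > 1)) return decode_long_offset();
          return decode_short_offset();
      }

      /* Without the hint: block layout is left to PGO's measured
       * branch probabilities. */
      size_t decode_offset_unhinted(unsigned ofBits)
      {
          if (ofBits > 1) return decode_long_offset();
          return decode_short_offset();
      }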
    • Han Zhu's avatar
      Inline BIT_reloadDStream · e6dccbf4
      Han Zhu authored
      Inlining `BIT_reloadDStream` provided a >3% decompression speed improvement for
      the clang PGO-optimized zstd binary, measured using the Silesia corpus at
      compression level 1. The win comes from improved register allocation, which leads
      to fewer spills and reloads. Take a look at this comparison of
      profile-annotated hot assembly before and after this change:
      https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three
      fewer moves after inlining.
      
      In general, LLVM's register allocator works better when it can see more code. For
      example, when the register allocator sees a call instruction, it partitions the
      registers into caller-saved and callee-saved registers, and it is not free to do
      whatever it wants with all the registers for the current function. Inlining the
      callee lets the register allocator access all registers and use them more
      flexibly (a small sketch of forcing inlining follows this entry).
      e6dccbf4
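      A minimal sketch of forcing a hot helper to be inlined in C, assuming an
      always_inline attribute is available (zstd has its own FORCE_INLINE
      macros in lib/common/compiler.h; the names and elided refill logic below
      are illustrative only):

      #if defined(__GNUC__) || defined(__clang__)
      #  define ALWAYS_INLINE static inline __attribute__((always_inline))
      #else
      #  define ALWAYS_INLINE static inline
      #endif

      /* Hot bit-reload helper: once inlined, the register allocator sees the
       * whole decode loop and can keep more state in registers across what
       * used to be a call boundary, reducing spills and reloads. */
      ALWAYS_INLINE unsigned reload_stream(unsigned long long* bitContainer,
                                           unsigned* bitsConsumed)
      {
          /* ...refill logic elided... */
          (void)bitContainer;
          *bitsConsumed = 0;
          return 0;
      }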
    • Elliot Gorokhovsky's avatar
      Merge pull request #3551 from embg/seq_prod_fuzz · 57e1b459
      Elliot Gorokhovsky authored
      Provide an interface for fuzzing sequence producer plugins
    • Yann Collet's avatar
      Merge pull request #3568 from facebook/readme_cmake_fat · abb3585c
      Yann Collet authored
      Add instructions for building Universal2 on macOS via CMake
  12. Mar 27, 2023