zstd commit 8d0ee37a: Align decompress sequences loop to 32+16 bytes

Commit 8d0ee37a authored 5 years ago by Nick Terrell

Align decompress sequences loop to 32+16 bytes

The alignment is added before the loop, so this shouldn't hurt
performance in any case. The only way it hurts is if there is already
performance instability, and we force it to be stable but in the bad
case.

This consistently gets us into the good case with gcc-{7,8,9} and
clang-9 on an Intel i9-9900K. gcc-5 is 5% worse than its best case but
has stable performance. We get consistently good behavior on my MacBook
Pro when compiled with both clang and gcc-8. It ends up in the case
where 50% of uops come from the DSB and 50% from MITE, but the
performance is the same as the 85% DSB case, so that's fine.
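
The "32+16" alignment described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `sum_bytes` is a hypothetical stand-in for the hot loop, and `p2align` and `model_loop_start` are made-up helpers that model the address arithmetic. It assumes the padding is emitted as `.p2align 5`, a one-byte `nop`, then `.p2align 4`, which leaves the next instruction at an address of the form 32k + 16. The compiler may still place instructions between the padding and the loop body, so the result should be checked in the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round addr up to the next multiple of 2^log2_align, as `.p2align` does. */
static uint64_t p2align(uint64_t addr, unsigned log2_align)
{
    uint64_t const mask = ((uint64_t)1 << log2_align) - 1;
    assert(log2_align < 64);
    return (addr + mask) & ~mask;
}

/* Model of the sequence `.p2align 5; nop; .p2align 4`.
 * The result is always congruent to 16 modulo 32. */
static uint64_t model_loop_start(uint64_t addr)
{
    addr = p2align(addr, 5);  /* move to a 32-byte boundary */
    addr += 1;                /* one-byte nop steps past that boundary */
    return p2align(addr, 4);  /* next 16-byte boundary: 32k + 16 */
}

/* Hypothetical hot loop (not zstd's decoder) with the padding placed
 * before it. The padding runs once on entry, not on every iteration. */
static uint64_t sum_bytes(const unsigned char* src, size_t n)
{
    uint64_t sum = 0;
    size_t i = 0;
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__(".p2align 5");
    __asm__("nop");
    __asm__(".p2align 4");
#endif
    for (; i < n; ++i) sum += src[i];
    return sum;
}
```

Because the directives sit before the loop rather than inside it, their cost is a handful of padding bytes executed once per call, which matches the commit's argument that the change should not hurt performance by itself.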

parent 66607d0e

Changes: 41 additions and 0 deletions
