Inline BIT_reloadDStream (e6dccbf4) · Commits · CodeLinaro / yocto-mirrors / zstd

Commit e6dccbf4 authored 1 year ago by Han Zhu

Inline BIT_reloadDStream

Inlining `BIT_reloadDStream` provided a >3% decompression speed improvement for
a clang PGO-optimized zstd binary, measured on the Silesia corpus at
compression level 1. The win comes from improved register allocation, which
leads to fewer spills and reloads. Compare the profile-annotated hot assembly
before and after this change:
https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but note the
three fewer moves after inlining.

In general, LLVM's register allocator works better when it can see more code.
For example, when the register allocator sees a call instruction, it must
partition the registers into caller-saved and callee-saved registers, so it is
not free to do whatever it wants with all the registers in the current function.
Inlining the callee lets the register allocator access all registers and use
them more flexibly.
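
To illustrate the call-versus-inline trade-off described above, here is a
minimal sketch. It is not zstd's code: the `demo_*` names, the reader struct,
and the `DEMO_*` macros are all hypothetical, and the attributes assume GCC or
clang. Both loops compute the same result; the only difference is whether the
refill step is a real call or is inlined into the hot loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macros; the attributes are GCC/clang extensions. */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_NOINLINE     __attribute__((noinline))
#  define DEMO_FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define DEMO_NOINLINE
#  define DEMO_FORCE_INLINE static inline
#endif

/* Hypothetical LSB-first bit reader; not zstd's BIT_DStream_t. */
typedef struct {
    const uint8_t *ptr;  /* next byte to load */
    const uint8_t *end;  /* one past the last input byte */
    uint64_t bits;       /* bit container */
    unsigned nbits;      /* number of valid bits in the container */
} demo_reader;

/* Refill the container one byte at a time while there is room. */
DEMO_FORCE_INLINE void demo_reload_inline(demo_reader *r) {
    while (r->nbits <= 56 && r->ptr < r->end) {
        r->bits |= (uint64_t)(*r->ptr++) << r->nbits;
        r->nbits += 8;
    }
}

/* Same refill, but forced to stay an out-of-line call. */
static DEMO_NOINLINE void demo_reload_call(demo_reader *r) {
    demo_reload_inline(r);
}

/* Consume the n lowest bits of the container (1 <= n <= 16). */
DEMO_FORCE_INLINE uint32_t demo_read(demo_reader *r, unsigned n) {
    assert(n >= 1 && n <= 16 && n <= r->nbits);
    uint32_t v = (uint32_t)(r->bits & ((1u << n) - 1u));
    r->bits >>= n;
    r->nbits -= n;
    return v;
}

/* Hot loop with a call: the allocator must respect the calling
 * convention around every demo_reload_call. */
uint64_t demo_sum_call(const uint8_t *src, size_t len) {
    demo_reader r = { src, src + len, 0, 0 };
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        demo_reload_call(&r);
        sum += demo_read(&r, 8);
    }
    return sum;
}

/* Hot loop with the refill inlined: the allocator sees the whole
 * loop body and may keep the reader state in registers. */
uint64_t demo_sum_inlined(const uint8_t *src, size_t len) {
    demo_reader r = { src, src + len, 0, 0 };
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        demo_reload_inline(&r);
        sum += demo_read(&r, 8);
    }
    return sum;
}
```

Compiling this with something like `clang -O2 -S` and comparing the two
`demo_sum_*` functions is one way to see how a call site can change spills and
register moves. Whether there is a measurable difference depends on the
compiler, target, and surrounding code.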

parent 57e1b459


Showing with 1 addition and 1 deletion
