[lazy] Use switch instead of indirect function calls.
Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win.

This PR is a modification of the approach taken in PR #2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR #2828 was merged.

This PR is necessary for Issue #3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral.

gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      | 5     | 102.6            | 113.7           | +10.8% |
| gcc      | 7     | 66.6             | 74.8            | +12.3% |
| gcc      | 9     | 51.5             | 58.9            | +14.3% |
| gcc      | 13    | 14.3             | 14.3            | +0.0%  |
| clang    | 5     | 108.1            | 114.8           | +6.2%  |
| clang    | 7     | 68.5             | 72.3            | +5.5%  |
| clang    | 9     | 53.2             | 56.2            | +5.6%  |
| clang    | 13    | 14.3             | 14.7            | +2.8%  |

The binary size stays just about the same for clang and gcc, measured using the `size` command:

| Compiler | Branch | Text    | Data | BSS | Total   |
|----------|--------|---------|------|-----|---------|
| gcc      | dev    | 1127950 | 3312 | 280 | 1131542 |
| gcc      | PR     | 1123422 | 2512 | 280 | 1126214 |
| clang    | dev    | 1046254 | 3256 | 216 | 1049726 |
| clang    | PR     | 1048198 | 2296 | 216 | 1050710 |
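
For readers unfamiliar with the two dispatch styles compared above, here is a minimal sketch of the difference. It is not the code from this PR: the enum, the function names, and the toy search functions are all hypothetical, chosen only for illustration. A plausible reason for the speedup (an assumption, not something measured here) is that direct calls inside a `switch` give the compiler a known target that it can inline and specialize, whereas a call through a function pointer usually cannot be inlined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical selector for the search method. These names are
 * illustrative and are not taken from the zstd sources. */
typedef enum {
    SEARCH_HASH_CHAIN,
    SEARCH_BINARY_TREE,
    SEARCH_ROW_HASH
} search_method_e;

typedef size_t (*search_fn)(const unsigned char* src, size_t len);

/* Toy stand-ins for real match finders. Each returns a distinct value
 * so the two dispatch styles can be compared. */
static size_t search_hash_chain(const unsigned char* src, size_t len)
{
    (void)src;
    return len + 1;
}

static size_t search_binary_tree(const unsigned char* src, size_t len)
{
    (void)src;
    return len + 2;
}

static size_t search_row_hash(const unsigned char* src, size_t len)
{
    (void)src;
    return len + 3;
}

/* Indirect dispatch: the call target is only known at run time. */
static size_t find_match_indirect(search_fn fn,
                                  const unsigned char* src, size_t len)
{
    return fn(src, len);
}

/* Switch dispatch: every case is a direct call to a known function. */
static size_t find_match_switch(search_method_e method,
                                const unsigned char* src, size_t len)
{
    switch (method) {
    case SEARCH_HASH_CHAIN:  return search_hash_chain(src, len);
    case SEARCH_BINARY_TREE: return search_binary_tree(src, len);
    case SEARCH_ROW_HASH:    return search_row_hash(src, len);
    default:
        assert(0 && "unknown search method");
        return 0;
    }
}
```

Both helpers return the same result for the same search method; only the way the callee is selected differs.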