Increase inflate speed: read decoder input into a uint64_t
The chunk-copy code contribution deals with writing decoded DEFLATE data to the output with SIMD methods to increase inflate decode speed. Modern compilers such as gcc/clang/msvc elide the portable memcpy() calls used, replacing them with much faster SIMD machine instructions.

Similarly, reading the input data to the DEFLATE decoder with wide, SIMD methods can also increase decode speed. See https://crbug.com/760853#c32 for details; content-encoding: gzip decoding speed improves by 2.17x, in the median over the snappy testdata corpus, when this method is combined with the chunk-copy, and the adler32, and crc32 SIMD contributions (this method improves our current inflate decode speed by 20-30%).

Update the chunk-copy code with a wide input data reader, which consumes input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the wide reader. This feature is supported on little endian machines, and is enabled with the INFLATE_CHUNK_READ_64LE build flag in BUILD.gn on Intel CPU only for now.

The wide reader idea is due to nigeltao@chromium.org who did the initial work. This patch is based on his patch [1].

No change in behavior (other than more inflate decode speed), so no new tests.

[1] https://chromium-review.googlesource.com/c/chromium/src/+/601694/16

Bug: 760853
Change-Id: Ia806d9a225737039367e1b803624cd59e286ce51
Reviewed-on: https://chromium-review.googlesource.com/900982
Commit-Queue: Noel Gordon <noel@chromium.org>
Reviewed-by: Mike Klein <mtklein@chromium.org>
Cr-Original-Commit-Position: refs/heads/master@{#535365}
Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
Cr-Mirrored-Commit: 6e212423a214e0e41794e8c9969c2896e2c33121
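To illustrate the wide-reader idea described in the commit message, here is a minimal sketch in C. It is not the code from this patch: the names load_64_bits_le, bit_reader, refill, and take_bits are hypothetical, and so are the refill threshold of 15 bits and the 6-byte advance. It assumes a little-endian host and that the caller guarantees at least 8 readable bytes at the current input position.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 input bytes into a uint64_t.
 * memcpy() sidesteps unaligned-access problems; compilers usually
 * lower it to a single 64-bit load. On a little-endian host the first
 * input byte lands in the low 8 bits, which is the order a DEFLATE
 * bit accumulator expects. */
static uint64_t load_64_bits_le(const unsigned char *in) {
    uint64_t v;
    memcpy(&v, in, sizeof(v));
    return v;
}

/* Hypothetical decoder state: input cursor plus a 64-bit accumulator. */
typedef struct {
    const unsigned char *in; /* next unread input byte */
    uint64_t hold;           /* bit accumulator, LSB is the next bit */
    unsigned bits;           /* number of valid bits in hold */
} bit_reader;

/* Top up the accumulator with one wide read instead of byte-by-byte.
 * Assumption: at least 8 bytes are readable at r->in.
 * Only 48 of the 64 loaded bits are counted as valid. Any extra bits
 * that land above the counted range are the true next bits of the
 * stream, so the following refill ORs identical values over them. */
static void refill(bit_reader *r) {
    if (r->bits < 15) {
        r->hold |= load_64_bits_le(r->in) << r->bits;
        r->in += 6;
        r->bits += 48;
    }
}

/* Consume n bits (1 <= n <= 15) from the accumulator. */
static unsigned take_bits(bit_reader *r, unsigned n) {
    unsigned v = (unsigned)(r->hold & ((1ULL << n) - 1));
    r->hold >>= n;
    r->bits -= n;
    return v;
}
```

A decoder loop would call refill() before decoding each symbol and then take_bits() for the code and any extra bits. A byte-at-a-time reader would instead loop on the input for every few bits consumed; the wide read replaces that loop with a single load and shift.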
Showing 4 changed files:
- BUILD.gn: 4 additions, 0 deletions
- contrib/optimizations/chunkcopy.h: 31 additions, 1 deletion
- contrib/optimizations/inffast_chunk.c: 45 additions, 4 deletions
- contrib/optimizations/inffast_chunk.h: 6 additions, 1 deletion