-
Noel Gordon authored
Add SSSE3 implementation of the adler32 checksum, suitable for both large workloads, and small workloads commonly seen during PNG image decoding. Add a NEON implementation. Speed is comparable to the serial adler32 computation but near 64 bytes of input data, the SIMD code paths begin to be faster than the serial path: 3x faster at 256 bytes of input data, to ~8x faster for 1M of input data (~4x on ARMv8 NEON). For the PNG 140 image corpus, PNG decoding speed is ~8% faster on average on the desktop machines tested, and ~2% on an ARMv8 Pixel C Android (N) tablet, https://crbug.com/762564#c41 Update x86.{c,h} to runtime detect SSSE3 support and use it to enable the adler32_simd code path and update inflate.c to call x86_check_features(). Update the name mangler file names.h for the new symbols added, add FIXME about simd.patch. Ignore data alignment in the SSSE3 case since unaligned access is no longer penalized on current generation Intel CPU. Use it in the NEON case however to avoid the extra costs of unaligned memory access on ARMv8/v7. NEON credits: the v_s1/s2 vector component accumulate code was provided by Adenilson Cavalcanti. The uint16 column vector sum code is from libdeflate with corrections to process NMAX input bytes which improves performance by 3% for large buffers. Update BUILD.gn to put the code in its own source set, and add it conditionally to the zlib library build rule. On ARM, build the SIMD with max-speed config to produce the smallest code. No change in behavior, covered by many existing tests. Bug: 762564 Change-Id: I14a39940ae113b5a67ba70a99c3741e289b1796b Reviewed-on: https://chromium-review.googlesource.com/660019 Commit-Queue: Chris Blume <cblume@chromium.org> Reviewed-by:
Adenilson Cavalcanti <cavalcantii@chromium.org> Reviewed-by:
Chris Blume <cblume@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#505447} Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src Cr-Mirrored-Commit: 09b784fd12f255a9da38107ac6e0386f4dde6d68
17bbb3d7