-
Noel Gordon authored
Using SSE2 chunk copies improves the decoding rate of the PNG 140 corpus by an average 17%, giving a total 37% performance increase when combined with SIMD adler32 code (https://crbug.com/772870#c3 for details). Move the arm-specific code back into the main chunk copy code and generalize the SIMD parts of chunkset_core() with inline function calls for ARM, and Intel SSE2 devices. This removes the TODO from arm/chunkcopy_arm.h, and that file can be deleted as a result. Add SSE2 vector load / store SSE helpers for chunkset_core(). The existing NEON load code had alignment issues, as noted in review. Fix that: use unaligned loads in the ARM helper code. Change chunkcopy.h to use __builtin_memcpy if it's available, use zmemcpy otherwise such as on MSVC. Also call x86_check_features() in inflateInit2_() to keep the adler32 SIMD code path enabled. Update BUILD.gn to conditionally compile the SIMD chunk copy code on Intel SSE2 and ARM NEON devices. Update names.h to add the new symbol defined by the inflate chunk copy code path. Code had various comment styles; pick one and use it consistently everywhere. Add inffast_chunk.h TODO(cblume). Bug: 772870 Change-Id: I47004c68ee675acf418825fb0e1f8fa8018d4342 Reviewed-on: https://chromium-review.googlesource.com/708834 Commit-Queue: Noel Gordon <noel@chromium.org> Reviewed-by:
Chris Blume <cblume@chromium.org> Cr-Original-Commit-Position: refs/heads/master@{#522764} Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src Cr-Mirrored-Commit: c293a3255eb27dee8879f85f2c45dedff58e2452
64ffef0b