    Improve zlib inflate speed by using SSE2 chunk copy · 64ffef0b
    Noel Gordon authored
    Using SSE2 chunk copies improves the decoding rate of the PNG 140
    corpus by an average of 17%, giving a total 37% performance increase
    when combined with the SIMD adler32 code (see
    https://crbug.com/772870#c3 for details).
    
    Move the arm-specific code back into the main chunk copy code and
    generalize the SIMD parts of chunkset_core() with inline function
    calls for ARM and Intel SSE2 devices. This removes the TODO from
    arm/chunkcopy_arm.h, and that file can be deleted as a result.
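
    A minimal sketch of the idea, not the code in this change: one shared
    core loop calls small per-architecture inline helpers. The helper names
    (chunk_splat_u8, chunk_store, chunkset_period1) are hypothetical, and
    only the simplest case is shown, a run that repeats a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 16

/* Per-architecture helpers (hypothetical names): broadcast one byte into a
 * 16-byte chunk, and store a chunk to a possibly unaligned address. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
typedef uint8x16_t chunk_t;
static inline chunk_t chunk_splat_u8(const unsigned char *from) {
    return vdupq_n_u8(*from);
}
static inline void chunk_store(unsigned char *out, chunk_t v) {
    vst1q_u8(out, v);
}
#elif defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
typedef __m128i chunk_t;
static inline chunk_t chunk_splat_u8(const unsigned char *from) {
    return _mm_set1_epi8((char)*from);
}
static inline void chunk_store(unsigned char *out, chunk_t v) {
    _mm_storeu_si128((__m128i *)out, v);
}
#else
/* Portable fallback so the sketch builds without SIMD support. */
typedef struct { unsigned char b[CHUNK_SIZE]; } chunk_t;
static inline chunk_t chunk_splat_u8(const unsigned char *from) {
    chunk_t v;
    memset(v.b, *from, sizeof v.b);
    return v;
}
static inline void chunk_store(unsigned char *out, chunk_t v) {
    memcpy(out, v.b, sizeof v.b);
}
#endif

/* Shared core: extend a run with a period of one byte (match distance 1).
 * out[-1] holds the byte to repeat. Whole chunks are stored, so the buffer
 * needs up to CHUNK_SIZE - 1 bytes of slack after out + len. */
static unsigned char *chunkset_period1(unsigned char *out, size_t len) {
    unsigned char *end = out + len;
    chunk_t v;
    assert(len > 0);
    v = chunk_splat_u8(out - 1);
    while (out < end) {
        chunk_store(out, v);
        out += CHUNK_SIZE;
    }
    return end;
}
```

    Only the helpers differ per architecture; the loop itself is written
    once, which is what makes a separate ARM-only header unnecessary.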
    
    Add SSE2 vector load / store helpers for chunkset_core(). The
    existing NEON load code had alignment issues, as noted in review.
    Fix that: use unaligned loads in the ARM helper code.
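
    As an illustration, unaligned chunk load / store helpers on SSE2 could
    look like the sketch below. The names load_chunk, store_chunk and
    chunk_copy are assumptions made for this example, and a memcpy fallback
    is included so the sketch also builds on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 16

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
typedef __m128i chunk_t;
/* _mm_loadu_si128 / _mm_storeu_si128 accept any address; the aligned
 * variants (_mm_load_si128 / _mm_store_si128) require 16-byte alignment. */
static inline chunk_t load_chunk(const unsigned char *p) {
    return _mm_loadu_si128((const __m128i *)p);
}
static inline void store_chunk(unsigned char *p, chunk_t v) {
    _mm_storeu_si128((__m128i *)p, v);
}
#else
/* Fallback: memcpy is alignment-safe on every target. */
typedef struct { unsigned char b[CHUNK_SIZE]; } chunk_t;
static inline chunk_t load_chunk(const unsigned char *p) {
    chunk_t v;
    memcpy(v.b, p, sizeof v.b);
    return v;
}
static inline void store_chunk(unsigned char *p, chunk_t v) {
    memcpy(p, v.b, sizeof v.b);
}
#endif

/* Copy len bytes (len > 0) between non-overlapping buffers in whole chunks.
 * The final chunk may read and write up to CHUNK_SIZE - 1 bytes past len,
 * so both buffers must provide that much slack. */
static unsigned char *chunk_copy(unsigned char *out,
                                 const unsigned char *from, size_t len) {
    unsigned char *end = out + len;
    assert(len > 0);
    while (out < end) {
        store_chunk(out, load_chunk(from));
        out += CHUNK_SIZE;
        from += CHUNK_SIZE;
    }
    return end;
}
```

    Inflate output and window pointers can land on any byte offset, so the
    helpers must not assume 16-byte alignment.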
    
    Change chunkcopy.h to use __builtin_memcpy if it's available, and
    zmemcpy otherwise, such as on MSVC. Also call x86_check_features()
    in inflateInit2_() to keep the adler32 SIMD code path enabled.
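
    One way to express the memcpy selection is a compile-time macro, as in
    the sketch below. The macro name COPY_BYTES and the helper load_u64 are
    hypothetical, and plain memcpy stands in for zlib's zmemcpy so that the
    example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* GCC and Clang provide __builtin_memcpy, which they can expand inline for
 * small constant sizes. Other compilers (MSVC, for example) take the
 * library call; memcpy is an assumed stand-in for zmemcpy here. */
#if defined(__GNUC__) || defined(__clang__)
#define COPY_BYTES __builtin_memcpy
#else
#define COPY_BYTES memcpy
#endif

/* Read 8 bytes from an arbitrarily aligned address without casting the
 * pointer to uint64_t *, which would be undefined for unaligned input. */
static uint64_t load_u64(const unsigned char *p) {
    uint64_t v;
    assert(p != NULL);
    COPY_BYTES(&v, p, sizeof v);
    return v;
}
```

    With a constant size such as sizeof v, optimizing compilers typically
    reduce the copy to a single load instruction.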
    
    Update BUILD.gn to conditionally compile the SIMD chunk copy code
    on Intel SSE2 and ARM NEON devices. Update names.h to add the new
    symbol defined by the inflate chunk copy code path.
    
    Code had various comment styles; pick one and use it consistently
    everywhere. Add inffast_chunk.h TODO(cblume).
    
    Bug: 772870
    Change-Id: I47004c68ee675acf418825fb0e1f8fa8018d4342
    Reviewed-on: https://chromium-review.googlesource.com/708834
    
    
    Commit-Queue: Noel Gordon <noel@chromium.org>
    Reviewed-by: Chris Blume <cblume@chromium.org>
    Cr-Original-Commit-Position: refs/heads/master@{#522764}
    Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
    Cr-Mirrored-Commit: c293a3255eb27dee8879f85f2c45dedff58e2452