  1. Feb 11, 2018
  2. Feb 10, 2018
  3. Feb 09, 2018
  4. Feb 08, 2018
    • Increase inflate speed: read decoder input into a uint64_t · 8a8edc1c
      Noel Gordon authored
      The chunk-copy code contribution deals with writing decoded DEFLATE data
      to the output with SIMD methods to increase inflate decode speed. Modern
      compilers such as gcc/clang/msvc elide the portable memcpy() calls used,
      replacing them with much faster SIMD machine instructions.
      
      Similarly, reading the DEFLATE decoder's input data with wide, SIMD
      methods can also increase decode speed. See https://crbug.com/760853#c32
      for details: Content-Encoding: gzip decoding speed improves by 2.17x (the
      median over the snappy testdata corpus) when this method is combined
      with the chunk-copy, adler32, and crc32 SIMD contributions (this
      method improves our current inflate decode speed by 20-30%).
      
      Update the chunk-copy code with a wide input data reader, which consumes
      input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the
      wide reader. This feature is supported on little-endian machines and is
      enabled with the INFLATE_CHUNK_READ_64LE build flag in BUILD.gn, on Intel
      CPUs only for now.
      
      The wide reader idea is due to nigeltao@chromium.org who did the initial
      work. This patch is based on his patch [1]. No change in behavior (other
      than more inflate decode speed), so no new tests.
      
      [1] https://chromium-review.googlesource.com/c/chromium/src/+/601694/16
      
      Bug: 760853
      Change-Id: Ia806d9a225737039367e1b803624cd59e286ce51
      Reviewed-on: https://chromium-review.googlesource.com/900982
      
      
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Reviewed-by: Mike Klein <mtklein@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#535365}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 6e212423a214e0e41794e8c9969c2896e2c33121
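
As a rough illustration of the wide reader described in this commit message, the sketch below tops up a bit accumulator from a single 64-bit load. It is not the Chromium code: `read64le`, `bit_reader`, `refill`, and `take_bits` are hypothetical names, and the sketch assumes a little-endian host and at least 8 readable bytes at the input pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load 8 input bytes as one 64-bit value.
 * memcpy() sidesteps alignment and aliasing problems, and compilers
 * typically lower it to a single load. Assumes a little-endian host. */
static uint64_t read64le(const unsigned char *in)
{
    uint64_t v;
    assert(in != NULL);
    memcpy(&v, in, sizeof v);
    return v;
}

/* Hypothetical bit accumulator in the style of an inflate decoder. */
typedef struct {
    uint64_t hold; /* unconsumed bits, least significant bit first */
    unsigned bits; /* number of valid bits in hold (0..63) */
} bit_reader;

/* Top up the accumulator with as many whole bytes as still fit.
 * Returns the number of input bytes consumed (0..7). */
static size_t refill(bit_reader *br, const unsigned char *in)
{
    size_t nbytes = (63u - br->bits) >> 3;
    uint64_t mask = ((uint64_t)1 << (8 * nbytes)) - 1;

    /* Keep only the bytes we account for, then append them above the
     * bits that are already buffered. */
    br->hold |= (read64le(in) & mask) << br->bits;
    br->bits += (unsigned)(8 * nbytes);
    return nbytes;
}

/* Remove and return the low n bits (1 <= n <= 32, n <= br->bits). */
static uint32_t take_bits(bit_reader *br, unsigned n)
{
    uint32_t v = (uint32_t)(br->hold & (((uint64_t)1 << n) - 1));
    br->hold >>= n;
    br->bits -= n;
    return v;
}
```

One wide load replaces several byte-at-a-time loads in the hot loop; the caller advances its input pointer by the value `refill` returns.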
  5. Feb 07, 2018
  6. Feb 06, 2018
  7. Feb 05, 2018
  8. Jan 30, 2018
  9. Jan 23, 2018
  10. Jan 21, 2018
  11. Jan 11, 2018
    • Avoid exporting read_buf renaming from zlib · 2eb38892
      Daniel Bratell authored
      zlib.h includes a macro that renames read_buf->Cr_z_read_buf. Since
      read_buf is a common name in other parts of the code, it causes
      some random confusion depending on whether zlib.h has been
      included or not.
      
      The renaming macro is a side effect of the 0001-simd.patch that
      exposes an internal read_buf method to other files in zlib. This
      patch renames read_buf as it is exposed so that it has the less
      common name deflate_read_buf.
      
      Bug: 799448
      Change-Id: Icdc4eba973891dfd28d82017415048eded62d577
      Reviewed-on: https://chromium-review.googlesource.com/852257
      
      
      Commit-Queue: Daniel Bratell <bratell@opera.com>
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#528512}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 2c709d38a1c6f812da205c03f5448fe4ac5679f3
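
The confusion described in this commit message can be reproduced in a few lines. This is a minimal sketch, not the zlib headers: the `#define` stands in for the renaming macro, and `struct parser`, `parser_get`, and `read_buf_spelling` are hypothetical names for unrelated code that happens to use the identifier `read_buf`.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the symbol-prefixing macro that the header exported. */
#define read_buf Cr_z_read_buf

#define STRINGIFY_(x) #x
#define STRINGIFY(x) STRINGIFY_(x)

/* Unrelated code that uses the common name read_buf. After the macro
 * above, the compiler actually sees a member called Cr_z_read_buf. */
struct parser {
    int read_buf;
};

static int parser_get(const struct parser *p)
{
    assert(p != NULL);
    return p->read_buf; /* also rewritten, so this file still compiles */
}

/* Returns the identifier as the compiler sees it after expansion. */
static const char *read_buf_spelling(void)
{
    return STRINGIFY(read_buf);
}
```

Code compiled with the macro in effect and code compiled without it disagree on the name, which is why a less common exported name avoids the problem.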
  12. Jan 04, 2018
    • Improve zlib inflate speed by using SSE4.2 crc32 · 8e904b33
      Noel Gordon authored
      Using an SSE4.2-based crc32 improves the decoding rate of the PNG
      140 corpus by 4% on average, giving a total 40% performance increase
      when combined with adler32 SIMD code and inflate chunk copy code,
      see https://crbug.com/796178#c2 for details.
      
      Raw crc32 speed is 5x - 25x faster than the zlib default "BYFOUR"
      crc32, and gzip- and zlib-wrapped inflate performance improves by
      69% and 50% for the snappy corpus (https://crbug.com/796178#c3 #4
      for details).
      
      Add crc32 SIMD implementation and update the call-site in crc32.c
      to use the new crc32 code, using run-time detection of the SSE4.2
      and PCLMUL support required by the crc32 SIMD code.
      
      Update BUILD.gn to compile the crc32 SIMD code for Intel devices,
      also update names.h with the new symbol defined by the crc32 SIMD
      code path.
      
      Bug: 796178
      Change-Id: I1bb94b47c9a4934eed01ba3d4feda51d67c4bf85
      Reviewed-on: https://chromium-review.googlesource.com/833820
      
      
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#526935}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 65e2abcb74b1c07fa14f46abaa1fb1717892eec3
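
The run-time detection and call-site dispatch mentioned in this commit message might look roughly like the sketch below. It is an assumption-laden illustration rather than the zlib code: `crc32_portable`, `crc32_simd_placeholder`, `cpu_has_simd_crc32`, and `crc32_dispatch` are hypothetical names, the "SIMD" path simply reuses the portable routine, and the CPU check relies on the GCC/Clang `__builtin_cpu_supports` builtin on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable bit-at-a-time CRC-32 (reflected polynomial 0xEDB88320). */
static uint32_t crc32_portable(uint32_t crc, const unsigned char *buf,
                               size_t len)
{
    size_t i;
    int k;

    crc = ~crc;
    for (i = 0; i < len; i++) {
        crc ^= buf[i];
        for (k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Placeholder for a real SSE4.2 + PCLMUL implementation; it only
 * forwards to the portable code so that the sketch stays runnable. */
static uint32_t crc32_simd_placeholder(uint32_t crc,
                                       const unsigned char *buf, size_t len)
{
    return crc32_portable(crc, buf, len);
}

/* Returns 1 when both required CPU features are present, else 0. */
static int cpu_has_simd_crc32(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") &&
           __builtin_cpu_supports("pclmul");
#else
    return 0; /* unknown compiler or CPU: use the portable path */
#endif
}

typedef uint32_t (*crc32_fn)(uint32_t, const unsigned char *, size_t);

/* Call site: pick an implementation on first use, then reuse it. */
static uint32_t crc32_dispatch(uint32_t crc, const unsigned char *buf,
                               size_t len)
{
    static crc32_fn impl = NULL;

    assert(buf != NULL || len == 0);
    if (impl == NULL)
        impl = cpu_has_simd_crc32() ? crc32_simd_placeholder
                                    : crc32_portable;
    return impl(crc, buf, len);
}
```

Because both paths must return identical checksums, the portable routine doubles as a reference when testing a faster implementation.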
  13. Dec 20, 2017
  14. Dec 13, 2017
  15. Dec 12, 2017
  16. Dec 08, 2017
    • Improve zlib inflate speed by using SSE2 chunk copy · 64ffef0b
      Noel Gordon authored
      Using SSE2 chunk copies improves the decoding rate of the PNG 140
      corpus by an average 17%, giving a total 37% performance increase
      when combined with SIMD adler32 code (https://crbug.com/772870#c3
      for details).
      
      Move the arm-specific code back into the main chunk copy code and
      generalize the SIMD parts of chunkset_core() with inline function
      calls for ARM, and Intel SSE2 devices. This removes the TODO from
      arm/chunkcopy_arm.h, and that file can be deleted as a result.
      
      Add SSE2 vector load / store helpers for chunkset_core(). The
      existing NEON load code had alignment issues, as noted in review.
      Fix that: use unaligned loads in the ARM helper code.
      
      Change chunkcopy.h to use __builtin_memcpy if it's available, and
      zmemcpy otherwise, such as on MSVC. Also call x86_check_features()
      in inflateInit2_() to keep the adler32 SIMD code path enabled.
      
      Update BUILD.gn to conditionally compile the SIMD chunk copy code
      on Intel SSE2 and ARM NEON devices. Update names.h to add the new
      symbol defined by the inflate chunk copy code path.
      
      Code had various comment styles; pick one and use it consistently
      everywhere. Add inffast_chunk.h TODO(cblume).
      
      Bug: 772870
      Change-Id: I47004c68ee675acf418825fb0e1f8fa8018d4342
      Reviewed-on: https://chromium-review.googlesource.com/708834
      
      
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#522764}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: c293a3255eb27dee8879f85f2c45dedff58e2452
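
To make the chunk-copy idea in this commit message concrete, here is a minimal sketch of copying in fixed 16-byte chunks through a memcpy that the compiler can turn into vector loads and stores. The names `CHUNK_MEMCPY`, `CHUNK_SIZE`, and `chunk_copy` are hypothetical, plain `memcpy` stands in for the zmemcpy fallback, and the sketch assumes non-overlapping buffers with at least 16 readable and writable bytes even for short copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Prefer the compiler builtin when it is known to exist. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_memcpy)
#    define CHUNK_MEMCPY __builtin_memcpy
#  endif
#endif
#ifndef CHUNK_MEMCPY
#  define CHUNK_MEMCPY memcpy /* stand-in for the fallback copy routine */
#endif

enum { CHUNK_SIZE = 16 };

/* Copy len bytes (len > 0) in whole chunks and return out + len.
 * The first chunk is stored in full but only advances by the odd
 * remainder, so every later store is exactly one chunk. For
 * len < CHUNK_SIZE this writes up to CHUNK_SIZE - len bytes of slack
 * past out + len, which the caller must allow for. */
static unsigned char *chunk_copy(unsigned char *out,
                                 const unsigned char *from, size_t len)
{
    unsigned char chunk[CHUNK_SIZE];
    size_t first;

    assert(len > 0);
    first = ((len - 1) % CHUNK_SIZE) + 1;

    CHUNK_MEMCPY(chunk, from, CHUNK_SIZE);
    CHUNK_MEMCPY(out, chunk, CHUNK_SIZE);
    out += first;
    from += first;
    len -= first;

    while (len > 0) {
        CHUNK_MEMCPY(chunk, from, CHUNK_SIZE);
        CHUNK_MEMCPY(out, chunk, CHUNK_SIZE);
        out += CHUNK_SIZE;
        from += CHUNK_SIZE;
        len -= CHUNK_SIZE;
    }
    return out;
}
```

Because each memcpy has a constant size, an optimizing compiler is free to replace it with one unaligned vector load and one store.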
  17. Nov 30, 2017
    • Revert "Using ARMv8 CRC32 specific instruction" · 0f473a1d
      Boris Sazonov authored
      This reverts commit 35988c821c051a57e30c76f9fcd87b7b677bd9bd.
      
      Reason for revert: broke build ('cpu-features.h' not found)
      https://uberchromegw.corp.google.com/i/internal.client.clank/builders/x64-builder/builds/13697
      
      Original change's description:
      > Using ARMv8 CRC32 specific instruction
      > 
      > CRC32 affects performance for both image decompression (PNG) and
      > general browsing, when accessing websites that serve compressed
      > content (e.g. Content-Encoding: gzip).
      >
      > This patch implements an optimized CRC32 function using the
      > dedicated instruction available in ARMv8. This instruction is available
      > in new Android devices featuring an ARMv8 SoC, like the Nexus 5x and
      > Google Pixel.
      >
      > It should be between 6x (A53: 116 ms vs. 22 ms for a 4Kx4Kx4 buffer) and
      > 10x (A72: 91 ms vs. 9 ms) faster than the C implementation currently
      > used by zlib.
      >
      > PNG decoding performance gains should be around 5-9%.
      >
      > Finally, it also introduces code to perform ARM CPU feature detection
      > using getauxval() on Linux/CrOS or android_getCpuFeatures() on Android.
      > We pre-build and link the CRC32 instruction-dependent code, but decide
      > whether to use it at run time.
      >
      > If the feature is not supported, we fall back to the C implementation.
      >
      > This approach allows the instruction to be used in both 32-bit and
      > 64-bit builds, and works on both ARMv7 and ARMv8 processors. I tested
      > the generated Chromium APK on both ARMv7 (Nexus 4 and 6) and ARMv8
      > (Nexus 5x and Google Pixel) devices.
      > 
      > Change-Id: I069408ebc06c49a3c2be4ba3253319e025ee09d7
      > Bug: 709716
      > Reviewed-on: https://chromium-review.googlesource.com/612629
      
      
      > Reviewed-by: Chris Blume <cblume@chromium.org>
      > Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#520377}
      
      TBR=agl@chromium.org,noel@chromium.org,cavalcantii@chromium.org,cblume@chromium.org,mtklein@chromium.org,adenilson.cavalcanti@arm.com
      
      Change-Id: Ief2c32df5c8a37635f937cd6a671f5574f5a53a3
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
      Bug: 709716
      Reviewed-on: https://chromium-review.googlesource.com/799930
      
      
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Reviewed-by: Boris Sazonov <bsazonov@chromium.org>
      Commit-Queue: Boris Sazonov <bsazonov@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#520497}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: e7d9a4649bde6f047105d29f0026dd8c3d54143a
    • Using ARMv8 CRC32 specific instruction · d7601c23
      Adenilson Cavalcanti authored
      CRC32 affects performance for both image decompression (PNG) and
      general browsing, when accessing websites that serve compressed
      content (e.g. Content-Encoding: gzip).
      
      This patch implements an optimized CRC32 function using the
      dedicated instruction available in ARMv8. This instruction is available
      in new Android devices featuring an ARMv8 SoC, like the Nexus 5x and
      Google Pixel.
      
      It should be between 6x (A53: 116 ms vs. 22 ms for a 4Kx4Kx4 buffer) and
      10x (A72: 91 ms vs. 9 ms) faster than the C implementation currently
      used by zlib.
      
      PNG decoding performance gains should be around 5-9%.
      
      Finally, it also introduces code to perform ARM CPU feature detection
      using getauxval() on Linux/CrOS or android_getCpuFeatures() on Android.
      We pre-build and link the CRC32 instruction-dependent code, but decide
      whether to use it at run time.
      
      If the feature is not supported, we fall back to the C implementation.
      
      This approach allows the instruction to be used in both 32-bit and
      64-bit builds, and works on both ARMv7 and ARMv8 processors. I tested
      the generated Chromium APK on both ARMv7 (Nexus 4 and 6) and ARMv8
      (Nexus 5x and Google Pixel) devices.
      
      Change-Id: I069408ebc06c49a3c2be4ba3253319e025ee09d7
      Bug: 709716
      Reviewed-on: https://chromium-review.googlesource.com/612629
      
      
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#520377}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 35988c821c051a57e30c76f9fcd87b7b677bd9bd
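
The getauxval()-based feature check described in this commit message could be sketched as follows. This is an illustration under stated assumptions, not the patch itself: `arm_cpu_has_crc32` and `crc32_impl_name` are hypothetical names, only the Linux getauxval() path is shown (the Android android_getCpuFeatures() path is omitted), and the fallback bit values are the ones used by the Linux kernel headers in case the system headers do not provide them.

```c
#include <assert.h>
#include <string.h>

#if defined(__linux__) && (defined(__aarch64__) || defined(__arm__))
#include <sys/auxv.h>
#  ifndef HWCAP_CRC32
#    define HWCAP_CRC32 (1UL << 7)  /* AArch64: bit in AT_HWCAP */
#  endif
#  ifndef HWCAP2_CRC32
#    define HWCAP2_CRC32 (1UL << 4) /* 32-bit ARM: bit in AT_HWCAP2 */
#  endif
#endif

/* Returns 1 if the kernel reports the CRC32 instructions, else 0.
 * On anything other than Linux on ARM it reports 0, so callers take
 * the portable C path. */
static int arm_cpu_has_crc32(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#elif defined(__linux__) && defined(__arm__)
    return (getauxval(AT_HWCAP2) & HWCAP2_CRC32) != 0;
#else
    return 0;
#endif
}

/* Both code paths are compiled in; the choice is made at run time. */
static const char *crc32_impl_name(int has_crc32)
{
    assert(has_crc32 == 0 || has_crc32 == 1);
    return has_crc32 ? "ARMv8 CRC32 instruction" : "portable C";
}
```

A caller would evaluate `arm_cpu_has_crc32()` once, cache the result, and branch on it before each checksum call.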
  18. Nov 15, 2017
  19. Nov 13, 2017
  20. Nov 10, 2017
  21. Nov 04, 2017
  22. Nov 03, 2017
  23. Nov 02, 2017
  24. Oct 31, 2017
  25. Oct 30, 2017
    • Isolating ARM specific code in inffast · f44229bb
      Adenilson Cavalcanti authored
      The NEON-specific code will be hosted in the folder
      'contrib/optimizations/arm', while the platform-independent
      C code is hosted in the upper directory.
      
      This makes it easy to implement the inffast optimization for other
      architectures: implement 2 functions and include the necessary
      header in chunk_copy.h (which is used by inflate and inffast).
      
      The idea is to move all optimizations to this new folder over time.
      
      Bug: 769880
      Change-Id: I404ec0fdf3f6867c9c124da859ca38bf57b25447
      Reviewed-on: https://chromium-review.googlesource.com/740907
      
      
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#512542}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 626df311481f7fac07b58799fbc94e09c848f01d
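
One way to picture the layout this commit message describes is a pair of small per-architecture helpers that shared code calls without knowing which variant was compiled in. The commit does not name the two functions, so `arch_chunk_load`, `arch_chunk_store`, `chunk_vec_t`, and `copy_one_chunk` below are hypothetical, and the three variants are shown in one file purely for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
/* Would live in an ARM-specific header. */
#include <arm_neon.h>
typedef uint8x16_t chunk_vec_t;
static inline chunk_vec_t arch_chunk_load(const unsigned char *p)
{
    return vld1q_u8(p); /* no alignment requirement */
}
static inline void arch_chunk_store(unsigned char *p, chunk_vec_t v)
{
    vst1q_u8(p, v);
}
#elif defined(__SSE2__)
/* Would live in an Intel-specific header. */
#include <emmintrin.h>
typedef __m128i chunk_vec_t;
static inline chunk_vec_t arch_chunk_load(const unsigned char *p)
{
    return _mm_loadu_si128((const __m128i *)p); /* unaligned load */
}
static inline void arch_chunk_store(unsigned char *p, chunk_vec_t v)
{
    _mm_storeu_si128((__m128i *)p, v);
}
#else
/* Portable fallback: a 16-byte struct moved with memcpy(). */
typedef struct { unsigned char b[16]; } chunk_vec_t;
static inline chunk_vec_t arch_chunk_load(const unsigned char *p)
{
    chunk_vec_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
static inline void arch_chunk_store(unsigned char *p, chunk_vec_t v)
{
    memcpy(p, &v, sizeof v);
}
#endif

/* Platform-independent code: copies exactly 16 bytes from src to dst. */
static void copy_one_chunk(unsigned char *dst, const unsigned char *src)
{
    assert(dst != NULL && src != NULL);
    arch_chunk_store(dst, arch_chunk_load(src));
}
```

Supporting another architecture then means supplying one more load/store pair, while the shared code stays untouched.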
  26. Oct 18, 2017
    • Changing the FileAccessor API in zip.h to improve perfs over IPC. · 945bc9fb
      Jay Civelli authored
      When using zip::Zip() with an IPC-based FileAccessor, zipping
      directories with a large number of files triggers many IPC calls,
      making the entire operation significantly slower than with direct file
      access.
      
      To alleviate this performance hit, this patch groups file reads by
      modifying the FileAccessor read method so that it reads multiple
      files at once. zip::Zip() can then group these reads when writing the
      ZIP file.
      
      The writing code has been factored out into a new ZipWriter class to
      make that code more readable.
      
      Bug: 773310
      Change-Id: I8121980bf05d87a174c63164840ec6bf325c7e52
      Reviewed-on: https://chromium-review.googlesource.com/719356
      
      
      Commit-Queue: Jay Civelli <jcivelli@chromium.org>
      Reviewed-by: Ilya Sherman <isherman@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#509693}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: d0cb5e408404d652492171bbed9c8ecd3d44a9aa
  27. Oct 13, 2017
  28. Oct 12, 2017