  1. Apr 22, 2020
  2. Apr 21, 2020
  3. Apr 20, 2020
  4. Apr 08, 2020
    • Adding a utest for small payloads · 61bddccf
      Adenilson Cavalcanti authored
      One of the optimizations (i.e. chunk_copy) will perform vector stores on
      16-byte chunks instead of the original 3-byte scalar operations.
      
      It is interesting to validate its safety while operating with small
      payloads (i.e. data input smaller than a single load/store).
      
      Even though it is a corner case (i.e. the payload would be smaller than
      the wrapper used for the DEFLATE stream for GZIP), it is good to certify
      that the optimization works as expected.
      
      This will also add gtest as a dependency as the plan is to write some
      tests to stress the optimizations we ship.
      
      Bug: 1032721
      Change-Id: Ifc6a81879e3dba6a9c4b7cfde80e7207258b934c
      Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2128836
      
      
      Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Reviewed-by: Victor Costan <pwnall@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#757639}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 272595ed5f469ee379e28dd5c40ef0230b6680a5
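
The corner case described in this commit can be illustrated with a minimal sketch. This is not the chunk_copy code from zlib: `sketch_copy_chunked`, `sketch_small_payload_ok` and `SKETCH_CHUNK` are hypothetical names, and the buffers are assumed not to overlap. Full 16-byte chunks go through a fixed-size memcpy(), and any tail shorter than a chunk is copied byte by byte, so a payload of only a few bytes never causes a write past its end. The helper checks this with guard bytes placed after the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical chunk width: one 16-byte vector load/store. */
#define SKETCH_CHUNK 16

/* Copy len bytes from src to dst (non-overlapping buffers).
 * Whole chunks use a fixed-size memcpy, which compilers typically lower to
 * a single vector load/store. The tail (< SKETCH_CHUNK bytes) is copied
 * byte by byte so that nothing past dst + len is written. */
static unsigned char *sketch_copy_chunked(unsigned char *dst,
                                          const unsigned char *src,
                                          size_t len) {
  assert(dst != NULL && src != NULL);
  while (len >= SKETCH_CHUNK) {
    memcpy(dst, src, SKETCH_CHUNK);
    dst += SKETCH_CHUNK;
    src += SKETCH_CHUNK;
    len -= SKETCH_CHUNK;
  }
  while (len > 0) {
    *dst++ = *src++;
    len--;
  }
  return dst;
}

/* Returns 1 if a payload of len bytes (len <= 64) is copied intact and the
 * guard bytes that follow it in the destination are left untouched. */
static int sketch_small_payload_ok(size_t len) {
  unsigned char src[64];
  unsigned char dst[64 + SKETCH_CHUNK];
  size_t i;
  assert(len <= sizeof(src));
  for (i = 0; i < sizeof(src); i++) src[i] = (unsigned char)(i + 1);
  memset(dst, 0xAA, sizeof(dst)); /* guard pattern, never present in src */
  sketch_copy_chunked(dst, src, len);
  if (memcmp(dst, src, len) != 0) return 0;
  for (i = len; i < sizeof(dst); i++)
    if (dst[i] != 0xAA) return 0;
  return 1;
}
```

Calling `sketch_small_payload_ok(3)` exercises a payload smaller than a single chunk, which is the situation the unit test is meant to cover.
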
  5. Feb 14, 2020
  6. Jan 28, 2020
    • Reformat remaining gn files. · b9b9a5af
      Nico Weber authored
      `gn format` recently changed its formatting behavior
      for deps, source, and a few other elements when they
      are assigned (with =) single-element lists, to be consistent
      with the formatting of updates (with +=) with single-element lists.
      
      Now that we've rolled in a GN binary with the change,
      reformat all files so that people don't get presubmit
      warnings due to this.
      
      Most changes have landed by now via `git cl split`.
      This is what remains after two weeks.
      
      Bug: 1041419
      Change-Id: Ia813d744e57e5647266a91d4f6c725bf921fb11c
      Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2024471
      
      
      Commit-Queue: Nico Weber <thakis@chromium.org>
      Auto-Submit: Nico Weber <thakis@chromium.org>
      Reviewed-by: Kentaro Hara <haraken@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#735958}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 852532f442a478b767bb452c63b5aa9b2e5e19fe
  7. Jan 24, 2020
  8. Jan 08, 2020
  9. Dec 21, 2019
  10. Dec 12, 2019
  11. Aug 27, 2019
  12. Jun 24, 2019
  13. Jun 10, 2019
    • zlib: fix ARMv8 CRC32 compilation in GCC · bbacb136
      Jose Dapena Paz authored
      GCC compilation in ARM architectures with CRC32 extension was
      broken, as the extension was guarded for clang.
      
      For GCC we are enforcing armv8-a+crc architecture at module
      level, so the builtin extensions are available. Then we
      just include arm_acle.h to declare the required builtins.
      
      ThinLTO requires all modules to use the same target, so this
      change makes GCC fail with ThinLTO (which was not supported
      anyway). Added a GN assert to explicitly fail in this case.
      
      Adapted from Vladislav Mukulov <vladislav.mukulov@lge.com>
      original patch.
      
      Bug: 819294
      Change-Id: Ifa5cf64318f88220052c44126db90bef999b7113
      Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/1642730
      
      
      Reviewed-by: Adenilson Cavalcanti <cavalcantii@chromium.org>
      Commit-Queue: José Dapena Paz <jose.dapena@lge.com>
      Cr-Original-Commit-Position: refs/heads/master@{#667541}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 68e95088b6f73f489aa1e1023b7864794627cae1
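
As a rough illustration of the approach in this commit, the sketch below includes arm_acle.h only when the compiler targets an architecture with the CRC32 extension (for example with GCC and -march=armv8-a+crc, which defines the ACLE macro `__ARM_FEATURE_CRC32`). It is not the code from the patch: `sketch_crc32_step` and `sketch_crc32` are hypothetical names, the loop handles one byte at a time for brevity, and the bitwise fallback exists only so the example also builds on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h> /* declares the __crc32b/h/w/d builtins */
#endif

/* Advance the (already inverted) CRC-32 state by one input byte. */
static uint32_t sketch_crc32_step(uint32_t c, uint8_t byte) {
#if defined(__ARM_FEATURE_CRC32)
  return __crc32b(c, byte); /* dedicated ARMv8 CRC32 instruction */
#else
  int k;
  c ^= byte;
  for (k = 0; k < 8; k++) /* portable bit-at-a-time fallback */
    c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
  return c;
#endif
}

/* Same calling convention as zlib's crc32(): pass 0 as the initial value. */
static uint32_t sketch_crc32(uint32_t crc, const unsigned char *buf,
                             size_t len) {
  uint32_t c = ~crc;
  assert(buf != NULL || len == 0);
  while (len--) c = sketch_crc32_step(c, *buf++);
  return ~c;
}
```

Both paths compute the same checksum: the standard CRC-32 check value for the ASCII string "123456789" is 0xCBF43926.
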
  14. Apr 18, 2019
  15. Apr 15, 2019
  16. Apr 08, 2019
  17. Mar 28, 2019
  18. Jan 31, 2019
  19. Dec 13, 2018
  20. Oct 31, 2018
  21. Aug 23, 2018
  22. Aug 07, 2018
  23. May 30, 2018
  24. Mar 27, 2018
  25. Feb 16, 2018
    • Compute crc32 using ARMv8 specific instruction · 72356729
      Adenilson Cavalcanti authored
      CRC32 affects performance both for image decoding (PNG)
      and for general browsing when accessing websites that serve
      compressed content (e.g. Content-Encoding: gzip).
      
      This patch implements an optimized CRC32 function using the
      dedicated instruction available in ARMv8a. We only support
      ARM Little-Endian (LE).
      
      This instruction is available in new Android devices featuring an
      ARMv8 SoC, like Nexus 5x and Google Pixel. It should be between
      3x (A72) and 7x (A53) faster than the C implementation currently used
      by zlib for 8KB vectors.
      
      This is performance-critical code and can be called with either large (8KB)
      or small vectors, so we must avoid extraneous function calls and
      branching (otherwise the performance benefits are negated). Hence the use
      of 'public' variables to read the CPU feature status flags
      (i.e. arm_cpu_enable_crc32 | pmull).
      
      Finally it also introduces code to perform run-time ARM CPU feature
      detection on the supported platforms: Android and Linux/CrOS. We build
      and link the CRC32 instruction dependent code, but will decide to use it
      at run-time if the ARM CPU supports the CRC32 instruction. Otherwise,
      we fall back to using zlib's default C implementation.
      
      This approach allows the instruction to be used in both 32-bit and 64-bit builds and
      works fine on either an ARMv7 or an ARMv8 processor. I tested the generated
      Chrome apk in both a Nexus 6 (ARMv7) and a Google Pixel (ARMv8).
      
      The crc32 function benefited from input from Yang Zang and Mike Klein,
      while the arm_features benefited from input from Noel Gordon.
      
      Bug: 709716
      Change-Id: I315c1216f8b3a8d88607630a28737c41f52a2f5d
      Reviewed-on: https://chromium-review.googlesource.com/801108
      
      
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Reviewed-by: Noel Gordon <noel@chromium.org>
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#537179}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 28c9623083688b3a354c33bf77746f4c51f58826
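
The run-time detection and flag-based dispatch described above might look roughly like the following sketch. It makes several assumptions: all `sketch_*` names are hypothetical, only 64-bit ARM Linux is probed (via getauxval() and the `HWCAP_CRC32` bit), 32-bit ARM and Android are omitted, and the hardware path is compiled only when the compiler itself targets the CRC32 extension, which is simpler than the build setup the commit describes. On any other platform the flag stays 0 and the portable C path is used.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP */
#endif
#if defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>
#endif

/* Hypothetical 'public' flag: read directly on the hot path, so checking
 * it costs one load and one predictable branch, not a function call. */
int sketch_arm_cpu_enable_crc32 = 0;

/* Probe the CPU once, for example during library initialization. */
void sketch_arm_check_features(void) {
#if defined(__linux__) && defined(__aarch64__) && defined(HWCAP_CRC32)
  sketch_arm_cpu_enable_crc32 = (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#else
  sketch_arm_cpu_enable_crc32 = 0;
#endif
}

/* Portable fallback: bit-at-a-time CRC-32. */
static uint32_t sketch_crc32_c(uint32_t crc, const unsigned char *buf,
                               size_t len) {
  uint32_t c = ~crc;
  while (len--) {
    int k;
    c ^= *buf++;
    for (k = 0; k < 8; k++)
      c = (c >> 1) ^ (0xEDB88320u & (0u - (c & 1u)));
  }
  return ~c;
}

#if defined(__ARM_FEATURE_CRC32)
/* Hardware path: one CRC32 instruction per input byte. */
static uint32_t sketch_crc32_hw(uint32_t crc, const unsigned char *buf,
                                size_t len) {
  uint32_t c = ~crc;
  while (len--) c = __crc32b(c, *buf++);
  return ~c;
}
#endif

/* Entry point: choose the implementation from the flag at run time. */
uint32_t sketch_crc32_dispatch(uint32_t crc, const unsigned char *buf,
                               size_t len) {
#if defined(__ARM_FEATURE_CRC32)
  if (sketch_arm_cpu_enable_crc32) return sketch_crc32_hw(crc, buf, len);
#endif
  return sketch_crc32_c(crc, buf, len);
}
```

Keeping the flag in a plain global variable, rather than behind an accessor function, reflects the commit's point that extra calls on small inputs would negate the benefit.
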
  26. Feb 12, 2018
  27. Feb 09, 2018
  28. Feb 08, 2018
    • Increase inflate speed: read decoder input into a uint64_t · 8a8edc1c
      Noel Gordon authored
      The chunk-copy code contribution deals with writing decoded DEFLATE data
      to the output with SIMD methods to increase inflate decode speed. Modern
      compilers such as gcc/clang/msvc elide the portable memcpy() calls used,
      replacing them with much faster SIMD machine instructions.
      
      Similarly, reading the input data to the DEFLATE decoder with wide, SIMD
      methods can also increase decode speed. See https://crbug.com/760853#c32
      for details; content-encoding: gzip decoding speed improves by 2.17x, in
      the median over the snappy testdata corpus, when this method is combined
      with the chunk-copy, and the adler32, and crc32 SIMD contributions (this
      method improves our current inflate decode speed by 20-30%).
      
      Update the chunk-copy code with a wide input data reader, which consumes
      input in 64-bit (8 byte) chunks. Update inflate_fast_chunk_() to use the
      wide reader. This feature is supported on little endian machines, and is
      enabled with the INFLATE_CHUNK_READ_64LE build flag in BUILD.gn on Intel
      CPU only for now.
      
      The wide reader idea is due to nigeltao@chromium.org who did the initial
      work. This patch is based on his patch [1]. No change in behavior (other
      than more inflate decode speed), so no new tests.
      
      [1] https://chromium-review.googlesource.com/c/chromium/src/+/601694/16
      
      Bug: 760853
      Change-Id: Ia806d9a225737039367e1b803624cd59e286ce51
      Reviewed-on: https://chromium-review.googlesource.com/900982
      
      
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Reviewed-by: Mike Klein <mtklein@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#535365}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 6e212423a214e0e41794e8c9969c2896e2c33121
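
A wide input reader of the kind described in this commit can be sketched as below. The names (`sketch_load_64le`, `sketch_bit_reader`, `sketch_take_bits`) are hypothetical and the code is an illustration rather than the patch itself. It assumes a little-endian host and at least 8 readable bytes at the current input position, so the caller has to provide slack at the end of the input. Each refill loads 8 bytes with memcpy() but only accounts for 6 of them; the two extra bytes already sitting in the accumulator are identical to what the next load will OR in, so they are harmless.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load 8 bytes into a 64-bit value. memcpy keeps the access valid for
 * unaligned pointers. Assumes a little-endian host, where compilers
 * generally reduce this to a single 64-bit load. */
static uint64_t sketch_load_64le(const unsigned char *in) {
  uint64_t v;
  memcpy(&v, in, sizeof(v));
  return v;
}

typedef struct {
  const unsigned char *in; /* next input byte not yet accounted for */
  uint64_t hold;           /* bit accumulator, least significant bit first */
  unsigned bits;           /* number of valid bits in hold */
} sketch_bit_reader;

/* Return the next n bits (1..15), refilling the accumulator when it runs
 * low. The caller guarantees 8 readable bytes at r->in. */
static unsigned sketch_take_bits(sketch_bit_reader *r, unsigned n) {
  unsigned v;
  assert(n >= 1 && n <= 15);
  if (r->bits < 15) {
    r->hold |= sketch_load_64le(r->in) << r->bits;
    r->in += 6;    /* account for 6 of the 8 loaded bytes */
    r->bits += 48;
  }
  v = (unsigned)(r->hold & ((1u << n) - 1u));
  r->hold >>= n;
  r->bits -= n;
  return v;
}

/* Demo: sum the sixteen 4-bit fields held in 8 payload bytes. */
static unsigned sketch_wide_reader_demo(void) {
  /* 8 payload bytes followed by 8 zero bytes of slack for the wide loads. */
  static const unsigned char buf[16] = {0x21, 0x21, 0x21, 0x21,
                                        0x21, 0x21, 0x21, 0x21};
  sketch_bit_reader r = {buf, 0, 0};
  unsigned i, sum = 0;
  for (i = 0; i < 16; i++) sum += sketch_take_bits(&r, 4);
  return sum;
}
```

With this layout a single refill serves several table lookups in a row, instead of the decoder fetching input one byte at a time.
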
  29. Jan 21, 2018
  30. Jan 04, 2018
    • Improve zlib inflate speed by using SSE4.2 crc32 · 8e904b33
      Noel Gordon authored
      Using an SSE4.2-based crc32 improves the decoding rate of the PNG
      140 corpus by 4% average, giving a total 40% performance increase
      when combined with adler32 SIMD code and inflate chunk copy code,
      see https://crbug.com/796178#c2 for details.
      
      Raw crc32 speed is 5x - 25x faster than the zlib default "BYFOUR"
      crc32, and gzip- and zlib-wrapped inflate performance improves by
      69% and 50% for the snappy corpus (https://crbug.com/796178#c3 #4
      for details).
      
      Add crc32 SIMD implementation and update the call-site in crc32.c
      to use the new crc32 code, using run-time detection of the SSE4.2
      and PCLMUL support required by the crc32 SIMD code.
      
      Update BUILD.gn to compile the crc32 SIMD code for Intel devices,
      also update names.h with the new symbol defined by the crc32 SIMD
      code path.
      
      Bug: 796178
      Change-Id: I1bb94b47c9a4934eed01ba3d4feda51d67c4bf85
      Reviewed-on: https://chromium-review.googlesource.com/833820
      
      
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#526935}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 65e2abcb74b1c07fa14f46abaa1fb1717892eec3
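
Run-time detection of SSE4.2 and PCLMUL, as mentioned in this commit, can be sketched with the CPUID instruction. The sketch assumes GCC or Clang on x86, where cpuid.h provides `__get_cpuid`; the flag and function names are hypothetical and do not correspond to symbols in zlib. On other compilers or architectures the flag simply stays 0.

```c
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h> /* __get_cpuid() */
#define SKETCH_HAVE_CPUID 1
#else
#define SKETCH_HAVE_CPUID 0
#endif

/* Hypothetical flag consulted by the crc32 call site. */
int sketch_x86_enable_simd_crc32 = 0;

/* Probe the CPU once and record whether both features are present. */
void sketch_x86_check_features(void) {
#if SKETCH_HAVE_CPUID
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
    int has_sse42 = (ecx >> 20) & 1; /* CPUID leaf 1, ECX bit 20: SSE4.2 */
    int has_pclmul = (ecx >> 1) & 1; /* CPUID leaf 1, ECX bit 1: PCLMULQDQ */
    sketch_x86_enable_simd_crc32 = has_sse42 && has_pclmul;
    return;
  }
#endif
  sketch_x86_enable_simd_crc32 = 0;
}
```

The call site would then test the flag and fall back to the default C crc32 when either feature is missing.
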
  31. Dec 20, 2017
    • zlib adler_simd.c: unsigned cast |blocks| on assignment · e1769aea
      Noel Gordon authored
      MSVC noted the unsigned |n| = size_t |blocks| could be a possible
      loss in precision. No loss in precision occurs since (n > blocks)
      at this point: |blocks| fits in an unsigned type.
      
      To silence compiler warnings, first update BUILD.gn for the adler
      SIMD code to use chromium compiler:chromium_code rule (more error
      checking), rather than the permissive "compiler:no_chromium_code"
      rule. Then cast |blocks| to unsigned on assignment to |n| (this is
      safe to do as mentioned above).
      
      No change in behavior, no new tests.
      
      Tbr: cblume@chromium.org
      Bug: 762564
      Change-Id: Ia97120bcca206287fd42b97674f8a6215283e4a5
      Reviewed-on: https://chromium-review.googlesource.com/835927
      
      
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Reviewed-by: Sam McNally <sammc@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#525285}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 0cb9a22e2fb55e092342192d66f7e33c14432d27
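
The reasoning behind the cast can be shown with a small sketch. The function name and the 32-byte block size are assumptions made for illustration; 5552 is zlib's NMAX constant for Adler-32. The narrowing cast sits on a branch that is only taken when the size_t value is already smaller than a value held in an unsigned, so no precision can be lost.

```c
#include <stddef.h>

#define SKETCH_NMAX 5552u     /* zlib's NMAX for Adler-32 */
#define SKETCH_BLOCK_SIZE 32u /* hypothetical SIMD block size in bytes */

/* Number of blocks to process before the sums must be reduced.
 * Never exceeds SKETCH_NMAX / SKETCH_BLOCK_SIZE. */
static unsigned sketch_blocks_this_round(size_t blocks) {
  unsigned n = SKETCH_NMAX / SKETCH_BLOCK_SIZE;
  if (n > blocks)
    n = (unsigned)blocks; /* safe: blocks < n here, so it fits in unsigned */
  return n;
}
```
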
  32. Dec 13, 2017
  33. Dec 12, 2017
  34. Dec 08, 2017
    • Improve zlib inflate speed by using SSE2 chunk copy · 64ffef0b
      Noel Gordon authored
      Using SSE2 chunk copies improves the decoding rate of the PNG 140
      corpus by an average 17%, giving a total 37% performance increase
      when combined with SIMD adler32 code (https://crbug.com/772870#c3
      for details).
      
      Move the arm-specific code back into the main chunk copy code and
      generalize the SIMD parts of chunkset_core() with inline function
      calls for ARM, and Intel SSE2 devices. This removes the TODO from
      arm/chunkcopy_arm.h, and that file can be deleted as a result.
      
      Add SSE2 vector load / store SSE helpers for chunkset_core(). The
      existing NEON load code had alignment issues, as noted in review.
      Fix that: use unaligned loads in the ARM helper code.
      
      Change chunkcopy.h to use __builtin_memcpy if it's available, use
      zmemcpy otherwise such as on MSVC. Also call x86_check_features()
      in inflateInit2_() to keep the adler32 SIMD code path enabled.
      
      Update BUILD.gn to conditionally compile the SIMD chunk copy code
      on Intel SSE2 and ARM NEON devices. Update names.h to add the new
      symbol defined by the inflate chunk copy code path.
      
      Code had various comment styles; pick one and use it consistently
      everywhere. Add inffast_chunk.h TODO(cblume).
      
      Bug: 772870
      Change-Id: I47004c68ee675acf418825fb0e1f8fa8018d4342
      Reviewed-on: https://chromium-review.googlesource.com/708834
      
      
      Commit-Queue: Noel Gordon <noel@chromium.org>
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#522764}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: c293a3255eb27dee8879f85f2c45dedff58e2452
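
The load/store helpers and the chunked copy loop described in this commit could be sketched as follows. All names are hypothetical and the code is a simplified illustration, not the contents of chunkcopy.h. With SSE2 available the helpers use unaligned vector loads and stores; otherwise they fall back to memcpy() on a 16-byte struct. The copy always moves whole 16-byte chunks, so it may read and write up to 15 bytes beyond `len`; the caller is assumed to provide that slack and non-overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i sketch_chunk_t;
static sketch_chunk_t sketch_load_chunk(const unsigned char *p) {
  return _mm_loadu_si128((const __m128i *)p); /* unaligned 16-byte load */
}
static void sketch_store_chunk(unsigned char *p, sketch_chunk_t v) {
  _mm_storeu_si128((__m128i *)p, v); /* unaligned 16-byte store */
}
#else
typedef struct { unsigned char b[16]; } sketch_chunk_t;
static sketch_chunk_t sketch_load_chunk(const unsigned char *p) {
  sketch_chunk_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}
static void sketch_store_chunk(unsigned char *p, sketch_chunk_t v) {
  memcpy(p, &v, sizeof(v));
}
#endif

/* Copy len (> 0) bytes using whole chunks only. The first chunk covers the
 * odd-sized head (1..16 bytes), after which the remaining length is a
 * multiple of 16. Returns dst + len. */
static unsigned char *sketch_chunkcopy(unsigned char *dst,
                                       const unsigned char *src, size_t len) {
  size_t head;
  assert(len > 0);
  head = (len - 1) % 16 + 1;
  sketch_store_chunk(dst, sketch_load_chunk(src));
  dst += head;
  src += head;
  len -= head;
  while (len > 0) {
    sketch_store_chunk(dst, sketch_load_chunk(src));
    dst += 16;
    src += 16;
    len -= 16;
  }
  return dst;
}

/* Returns 1 if len bytes (1..64) arrive intact, using buffers that carry
 * 16 bytes of slack for the over-read and over-write. */
static int sketch_chunkcopy_ok(size_t len) {
  unsigned char src[64 + 16], dst[64 + 16];
  size_t i;
  assert(len >= 1 && len <= 64);
  for (i = 0; i < sizeof(src); i++) src[i] = (unsigned char)(i * 7 + 1);
  memset(dst, 0, sizeof(dst));
  return sketch_chunkcopy(dst, src, len) == dst + len &&
         memcmp(dst, src, len) == 0;
}
```

The deliberate over-write is what makes very short payloads a case worth testing: the destination must always have room for one full chunk.
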
  35. Nov 30, 2017
    • Revert "Using ARMv8 CRC32 specific instruction" · 0f473a1d
      Boris Sazonov authored
      This reverts commit 35988c821c051a57e30c76f9fcd87b7b677bd9bd.
      
      Reason for revert: broke build ('cpu-features.h' not found)
      https://uberchromegw.corp.google.com/i/internal.client.clank/builders/x64-builder/builds/13697
      
      Original change's description:
      > Using ARMv8 CRC32 specific instruction
      > 
      > CRC32 affects performance both for image decompression (PNG)
      > and for general browsing when accessing websites that serve
      > compressed content (e.g. Content-Encoding: gzip).
      > 
      > This patch implements an optimized CRC32 function using the
      > dedicated instruction available in ARMv8. This instruction is available
      > in new Android devices featuring an ARMv8 SoC, like Nexus 5x and
      > Google Pixel.
      > 
      > It should be between 6x (A53: 116ms vs. 22ms for a 4Kx4Kx4 buffer) and
      > 10x faster (A72: 91ms vs. 9ms) than the C implementation currently used
      > by zlib.
      > 
      > PNG decoding performance gains should be around 5-9%.
      > 
      > Finally it also introduces code to perform ARM CPU feature detection
      > using getauxval() on Linux/CrOS or android_getCpuFeatures(). We pre-build
      > and link the CRC32 instruction dependent code but decide whether to
      > use it at run time.
      > 
      > If the feature is not supported, we fall back to the C implementation.
      > 
      > This approach allows the instruction to be used in both 32-bit and 64-bit
      > builds and works fine on either an ARMv7 or an ARMv8 processor. I tested the
      > generated Chromium apk on both ARMv7 (Nexus 4 and 6) and ARMv8 (Nexus 5x and
      > Google Pixel) devices.
      > 
      > Change-Id: I069408ebc06c49a3c2be4ba3253319e025ee09d7
      > Bug: 709716
      > Reviewed-on: https://chromium-review.googlesource.com/612629
      
      
      > Reviewed-by: Chris Blume <cblume@chromium.org>
      > Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
      > Cr-Commit-Position: refs/heads/master@{#520377}
      
      TBR=agl@chromium.org,noel@chromium.org,cavalcantii@chromium.org,cblume@chromium.org,mtklein@chromium.org,adenilson.cavalcanti@arm.com
      
      Change-Id: Ief2c32df5c8a37635f937cd6a671f5574f5a53a3
      No-Presubmit: true
      No-Tree-Checks: true
      No-Try: true
      Bug: 709716
      Reviewed-on: https://chromium-review.googlesource.com/799930
      
      
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Reviewed-by: Boris Sazonov <bsazonov@chromium.org>
      Commit-Queue: Boris Sazonov <bsazonov@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#520497}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: e7d9a4649bde6f047105d29f0026dd8c3d54143a
    • Using ARMv8 CRC32 specific instruction · d7601c23
      Adenilson Cavalcanti authored
      CRC32 affects performance both for image decompression (PNG)
      and for general browsing when accessing websites that serve
      compressed content (e.g. Content-Encoding: gzip).
      
      This patch implements an optimized CRC32 function using the
      dedicated instruction available in ARMv8. This instruction is available
      in new Android devices featuring an ARMv8 SoC, like Nexus 5x and
      Google Pixel.
      
      It should be between 6x (A53: 116ms vs. 22ms for a 4Kx4Kx4 buffer) and
      10x faster (A72: 91ms vs. 9ms) than the C implementation currently used
      by zlib.
      
      PNG decoding performance gains should be around 5-9%.
      
      Finally it also introduces code to perform ARM CPU feature detection
      using getauxval() on Linux/CrOS or android_getCpuFeatures(). We pre-build
      and link the CRC32 instruction dependent code but decide whether to
      use it at run time.
      
      If the feature is not supported, we fall back to the C implementation.
      
      This approach allows the instruction to be used in both 32-bit and 64-bit
      builds and works fine on either an ARMv7 or an ARMv8 processor. I tested the
      generated Chromium apk on both ARMv7 (Nexus 4 and 6) and ARMv8 (Nexus 5x and
      Google Pixel) devices.
      
      Change-Id: I069408ebc06c49a3c2be4ba3253319e025ee09d7
      Bug: 709716
      Reviewed-on: https://chromium-review.googlesource.com/612629
      
      
      Reviewed-by: Chris Blume <cblume@chromium.org>
      Commit-Queue: Adenilson Cavalcanti <cavalcantii@chromium.org>
      Cr-Original-Commit-Position: refs/heads/master@{#520377}
      Cr-Mirrored-From: https://chromium.googlesource.com/chromium/src
      Cr-Mirrored-Commit: 35988c821c051a57e30c76f9fcd87b7b677bd9bd