    Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of... · 0df82189
    Linus Torvalds authored
    Merge tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
    
    Pull perf tools updates from Arnaldo Carvalho de Melo:
     "Miscellaneous:
    
       - Add Ian Rogers to MAINTAINERS as a perf tools reviewer.
    
       - Add support for the retire latency feature (pipeline stall of an
         instruction compared to the previous one, in cycles) present on
         some Intel processors.
    
       - Add 'perf c2c' report option to show false sharing with adjacent
         cachelines, to be used in machines with cacheline prefetching,
         where an access to a cacheline brings in the next one too.
    
       - Skip 'perf test bpf' when the required kernel-debuginfo package
         isn't installed.
    
       - Avoid d3-flame-graph package dependency in 'perf script flamegraph',
         making this feature more generally available.
    
       - Add JSON metric events to present CPI stall cycles in Power10.
    
       - Assorted improvements/refactorings on the JSON metrics parsing
         code.
    
      perf lock contention:
    
       - Add -o/--lock-owner option:
    
            $ sudo ./perf lock contention -abo -- ./perf bench sched pipe
            # Running 'sched/pipe' benchmark:
            # Executed 1000000 pipe operations between two processes
    
                 Total time: 4.766 [sec]
    
                   4.766540 usecs/op
                     209795 ops/sec
             contended   total wait     max wait     avg wait          pid   owner
    
                   403    565.32 us     26.81 us      1.40 us           -1   Unknown
                     4     27.99 us      8.57 us      7.00 us      1583145   sched-pipe
                     1      8.25 us      8.25 us      8.25 us      1583144   sched-pipe
                     1      2.03 us      2.03 us      2.03 us         5068   chrome
    
             The owner is unknown in most cases. Filtering for mutex locks
             only makes it more likely that the owners are found.
    
       - Add -S/--callstack-filter option to limit the displayed entries to
         those having the given string in the callstack:
    
            $ sudo ./perf lock contention -abv -S net sleep 1
            ...
             contended   total wait     max wait     avg wait         type   caller
    
                     5     70.20 us     16.13 us     14.04 us     spinlock   __dev_queue_xmit+0xb6d
                                    0xffffffffa5dd1c60  _raw_spin_lock+0x30
                                    0xffffffffa5b8f6ed  __dev_queue_xmit+0xb6d
                                    0xffffffffa5cd8267  ip6_finish_output2+0x2c7
                                    0xffffffffa5cdac14  ip6_finish_output+0x1d4
                                    0xffffffffa5cdb477  ip6_xmit+0x457
                                    0xffffffffa5d1fd17  inet6_csk_xmit+0xd7
                                    0xffffffffa5c5f4aa  __tcp_transmit_skb+0x54a
                                    0xffffffffa5c6467d  tcp_keepalive_timer+0x2fd
    
         Note that for the -b (BPF) option above to work, perf has to be
         built with BUILD_BPF_SKEL=1.
    
       - Add more 'perf test' entries to test these new features.
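
The 'avg wait' column in the -o/--lock-owner output above appears to be the total wait divided by the number of contentions. That relation is inferred from the numbers shown, not stated in the pull message. A minimal sketch to check it, with the rows copied from the table and avg_wait being a hypothetical helper name:

```shell
#!/bin/sh
# avg_wait is a hypothetical helper, not part of perf.
# Input:  one row per line, "<contended> <total wait in us>".
# Output: "<contended> <total wait / contended>", rounded to 2 decimals.
# Assumption: avg wait = total wait / contended.
avg_wait() {
    awk '{ printf "%d %.2f\n", $1, $2 / $1 }'
}

# Rows copied from the 'perf lock contention -abo' table above.
printf '%s\n' '403 565.32' '4 27.99' '1 8.25' '1 2.03' | avg_wait
```

The printed averages can be compared by eye with the 'avg wait' column of the table.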
    
      perf script:
    
       - Add 'cgroup' field for 'perf script' output:
    
            $ perf record --all-cgroups -- true
            $ perf script -F comm,pid,cgroup
                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
                      true 337112  /user.slice/user-657345.slice/user@657345.service/...
    
       - Add support for showing branch speculation information in 'perf
         script' and in the 'perf report' raw dump (-D).
    
      perf record:
    
       - Fix 'perf record' segfault with --overwrite and --max-size.
    
      perf test/bench:
    
       - Switch the basic BPF filtering test to use a syscall tracepoint,
         avoiding the variable number of probes that the previous probe
         point (do_epoll_wait) ends up inserting on different CPU
         architectures.
    
       - Fix the DWARF unwind test by marking a function expected in the
         backtrace as non-inline.
    
       - Use 'grep -c' where the longer form 'grep | wc -l' was being used.
    
       - Add getpid and execve benchmarks to 'perf bench syscall'.
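
The 'grep -c' change listed above relies on both forms producing the same count. A small sketch of that equivalence follows; the sample input is made up for illustration and is not taken from any perf test script:

```shell
#!/bin/sh
# Hypothetical sample input, one event name per line.
sample=$(printf 'cycles\ninstructions\ncycles:u\nbranches\n')

# Longer form: filter with grep, then count the lines with wc.
count_wc=$(printf '%s\n' "$sample" | grep cycles | wc -l)

# Shorter form: grep counts the matching lines itself.
count_c=$(printf '%s\n' "$sample" | grep -c cycles)

echo "grep | wc -l: $count_wc"
echo "grep -c:      $count_c"
```

One difference to keep in mind when converting scripts: 'grep -c' still exits with status 1 when nothing matches, whereas in the pipeline form the exit status is that of 'wc'.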
    
      Intel PT:
    
       - Add support for synthesizing "cycle" events from Intel PT traces,
         as is already done for "instruction" events, when Intel PT CYC
         packets are available. This enables much more accurate profiles
         than the regular 'perf record -e cycles' (the default) when the
         workload lasts for very short periods (<10ms).
    
       - .plt symbol handling improvements: better handling of IBT (in the
         past MPX), done in the context of decoding Intel PT processor
         traces, of IFUNC symbols on x86_64 and of static executables, plus
         understanding .plt.got symbols on x86_64.
    
       - Add a 'perf test' for symbol resolution, part of the .plt
         improvements series. It tests things like symbol size in contexts
         where only the symbol start is available (kallsyms), etc.
    
       - Better handle auxtrace/Intel PT data when using pipe mode (perf
         record sleep 1|perf report).
    
       - Fix symbol lookup with kcore when multiple segments match stext,
         which made symbol resolution just show DSOs as unknown.
    
      ARM:
    
       - Timestamp improvements for ARM64 systems with ETMv4 (Embedded Trace
         Macrocell v4).
    
       - Ensure ARM64 CoreSight timestamps don't go backwards.
    
       - Document that ARM64 SPE (Statistical Profiling Extension) is used
         with 'perf c2c/mem'.
    
       - Add raw decoding for ARM64 SPEv1.2 previous branch address.
    
       - Update neoverse-n2-v2 ARM vendor events (JSON tables): topdown L1,
         TLB, cache, branch, PE utilization and instruction mix metrics.
    
       - Update decoder code for OpenCSD version 1.4, on ARM64 systems.
    
       - Fix command line auto-complete of CPU events on aarch64.
    
      Build:
    
       - Fix 'perf probe' and 'perf test' when libtraceevent isn't linked:
         several tests use tracepoints and should be skipped in that case.
    
       - More fallout fixes for the removal of tools/lib/traceevent/.
    
       - Fix build error when linking with libpfm"
    
    * tag 'perf-tools-for-v6.3-1-2023-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (114 commits)
      perf tests stat_all_metrics: Change true workload to sleep workload for system wide check
      perf vendor events power10: Add JSON metric events to present CPI stall cycles in powerpc
      perf intel-pt: Synthesize cycle events
      perf c2c: Add report option to show false sharing in adjacent cachelines
      perf record: Fix segfault with --overwrite and --max-size
      perf stat: Avoid merging/aggregating metric counts twice
      perf tools: Fix perf tool build error in util/pfm.c
      perf tools: Fix auto-complete on aarch64
      perf lock contention: Support old rw_semaphore type
      perf lock contention: Add -o/--lock-owner option
      perf lock contention: Fix to save callstack for the default modified
      perf test bpf: Skip test if kernel-debuginfo is not present
      perf probe: Update the exit error codes in function try_to_find_probe_trace_event
      perf script: Fix missing Retire Latency fields option documentation
      perf event x86: Add retire_lat when synthesizing PERF_SAMPLE_WEIGHT_STRUCT
      perf test x86: Support the retire_lat (Retire Latency) sample_type check
      perf test bpf: Check for libtraceevent support
      perf script: Support Retire Latency
      perf report: Support Retire Latency
      perf lock contention: Support filters for different aggregation
      ...