    powerpc64/ftrace: Move ftrace sequence out of line · eec37961
    Naveen N Rao authored
    
    The function profile sequence on powerpc includes two instructions at the
    beginning of each function:
    	mflr	r0
    	bl	ftrace_caller
    
    The call to ftrace_caller() gets nop'ed out during kernel boot and is
    patched in when ftrace is enabled.
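    To make the boot-time nop and the enable-time patch concrete, here is a
    minimal sketch, assuming the two entry instructions are held in a
    hypothetical struct profile_seq. The encodings (nop as ori r0,r0,0,
    mflr r0, and the I-form relative bl) are standard Power ISA encodings;
    the helper names are made up for illustration and are not the kernel's
    code-patching API.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_INST_NOP      0x60000000u /* ori r0,r0,0 */
#define PPC_INST_MFLR_R0  0x7c0802a6u /* mflr r0 */

/*
 * Encode "bl target" located at address 'site'. I-form: primary opcode 18,
 * 24-bit word offset (LI), AA=0 (relative), LK=1 (set the link register).
 */
static uint32_t encode_bl(uint64_t site, uint64_t target)
{
    int64_t off = (int64_t)(target - site);
    /* Offsets must be word-aligned and within +/-32 MB. */
    assert((off & 3) == 0);
    assert(off >= -0x2000000 && off < 0x2000000);
    return 0x48000000u | ((uint32_t)off & 0x03fffffcu) | 1u;
}

/* Hypothetical model of the two-instruction sequence at function entry. */
struct profile_seq {
    uint32_t insn[2]; /* insn[0] = mflr r0, insn[1] = bl ftrace_caller or nop */
};

/* Boot: the call is replaced with a nop; mflr r0 stays in place. */
static void ftrace_disable_site(struct profile_seq *s)
{
    s->insn[1] = PPC_INST_NOP;
}

/* Enable: patch "bl ftrace_caller" back into the second slot. */
static void ftrace_enable_site(struct profile_seq *s, uint64_t seq_addr,
                               uint64_t caller_addr)
{
    s->insn[1] = encode_bl(seq_addr + 4, caller_addr);
}
```

    With the sequence at 0x1000 and ftrace_caller at 0x2000, enabling
    produces 0x48000ffd (bl +0xffc) in the second slot.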
    
    Given the sequence, we cannot return from ftrace_caller with 'blr' as we
    need to keep LR and r0 intact. This results in link stack (return
    address predictor) imbalance when ftrace is enabled. To address that, we
    would like to use a three instruction sequence:
    	mflr	r0
    	bl	ftrace_caller
    	mtlr	r0
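    The link stack imbalance can be illustrated with a toy return-address
    predictor in which every bl pushes and every blr pops. This is a
    simplified model, not a description of any real core: in the
    two-instruction variant, ftrace_caller is modeled as restoring LR from
    r0 and branching back without a blr, so the entry pushed by
    "bl ftrace_caller" is never popped and the function's own return is
    mispredicted. The function and struct names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy return-address predictor: bl pushes, blr pops and predicts. */
struct link_stack {
    uint64_t entry[16];
    int top;          /* number of valid entries */
    int mispredicts;  /* blr whose real target differed from the prediction */
};

/* bl: push the return address and set LR to it. */
static void ls_call(struct link_stack *ls, uint64_t *lr, uint64_t ret_addr)
{
    assert(ls->top < 16);
    ls->entry[ls->top++] = ret_addr;
    *lr = ret_addr;
}

/* blr: the real target is LR; the prediction is the popped entry. */
static void ls_return(struct link_stack *ls, uint64_t lr)
{
    uint64_t predicted = ls->top > 0 ? ls->entry[--ls->top] : 0;
    if (predicted != lr)
        ls->mispredicts++;
}

/*
 * One call from call site 'site' to a traced function 'fn'.
 * three_insn == 0: mflr r0; bl ftrace_caller (return without blr).
 * three_insn == 1: mflr r0; bl ftrace_caller; mtlr r0 (return with blr).
 * Returns the number of mispredicted returns.
 */
static int simulate(int three_insn, uint64_t site, uint64_t fn)
{
    struct link_stack ls = { .top = 0, .mispredicts = 0 };
    uint64_t lr = 0, r0;

    ls_call(&ls, &lr, site + 4);  /* caller: bl fn */
    r0 = lr;                      /* fn: mflr r0 */
    ls_call(&ls, &lr, fn + 8);    /* fn: bl ftrace_caller */
    if (three_insn) {
        ls_return(&ls, lr);       /* ftrace_caller: blr back to fn+8 */
        lr = r0;                  /* fn: mtlr r0 */
    } else {
        lr = r0;                  /* ftrace_caller restores LR, branches back without blr */
    }
    ls_return(&ls, lr);           /* fn: blr to the caller */
    return ls.mispredicts;
}
```

    In this model simulate(0, ...) reports one mispredicted return and
    simulate(1, ...) reports none; in a real call chain the stale entry
    would also skew the predictions for the returns further up.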
    
    Furthermore, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to
    reserve two instruction slots before the function. This results in a
    total of five instruction slots reserved for ftrace on each traced
    function.
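    The slot count above is simple arithmetic, sketched below: three slots
    for the profile sequence plus two for CALL_OPS, at 4 bytes per Power
    instruction, compared with the single entry nop kept by the
    out-of-line scheme. The constant and function names are hypothetical,
    and the sketch counts only bytes inside or next to function bodies,
    not the separate stub area.

```c
#include <assert.h>
#include <stddef.h>

#define PPC_INSN_BYTES 4 /* every Power ISA instruction is 4 bytes */

enum {
    PROFILE_SEQ_SLOTS = 3, /* mflr r0; bl ftrace_caller; mtlr r0 */
    CALL_OPS_SLOTS    = 2, /* reserved before the function entry */
    INLINE_SLOTS      = PROFILE_SEQ_SLOTS + CALL_OPS_SLOTS,
};

static_assert(INLINE_SLOTS == 5, "five slots per traced function");

/* Bytes reserved around function bodies if the sequence stayed inline. */
static size_t inline_bytes(size_t nfuncs)
{
    return nfuncs * INLINE_SLOTS * PPC_INSN_BYTES;
}

/* Out-of-line: only a single nop remains at each function entry. */
static size_t ool_inline_bytes(size_t nfuncs)
{
    return nfuncs * PPC_INSN_BYTES;
}
```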
    
    Move the function profile sequence out-of-line to minimize its impact.
    To do this, we reserve a single nop at function entry using
    -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine
    the total number of functions that can be traced. This is then used to
    generate a .S file reserving the appropriate amount of space for use as
    ftrace stubs, which is built and linked into vmlinux.
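    A generator for such a .S file could be as small as the sketch below,
    which formats assembler directives reserving one stub per traceable
    function. This is a hypothetical illustration, not the build step used
    by the kernel: the symbol names are placeholders, and the 16-byte stub
    size assumes the four-instruction stub shown in the example later in
    this message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define OOL_STUB_INSNS 4 /* assumed: mflr, bl, mtlr, b */
#define PPC_INSN_BYTES 4

/*
 * Write the body of a .S file reserving space for nfuncs stubs.
 * Symbol names are hypothetical placeholders.
 * Returns the snprintf() result (characters needed, excluding the NUL).
 */
static int gen_stub_asm(char *buf, size_t len, unsigned long nfuncs)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len,
        "\t.text\n"
        "\t.balign 4\n"
        "\t.globl demo_ool_stub_text\n"
        "demo_ool_stub_text:\n"
        "\t.space %lu\n"
        "demo_ool_stub_text_end:\n",
        nfuncs * OOL_STUB_INSNS * PPC_INSN_BYTES);
}
```

    For 100 traceable functions this emits ".space 1600".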
    
    On bootup, the stub space is split into separate stubs per function and
    populated with the proper instruction sequence. A pointer to the
    associated stub is maintained in dyn_arch_ftrace.
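    The boot-time step can be sketched as a loop over the reserved area,
    assuming a four-instruction stub layout matching the example below and
    modeling kernel memory as a plain array. struct ool_stub,
    struct demo_arch_ftrace and setup_ool_stubs() are hypothetical; the
    real dyn_arch_ftrace carries more state than the single stub pointer
    modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPC_INST_MFLR_R0 0x7c0802a6u
#define PPC_INST_MTLR_R0 0x7c0803a6u

/* Hypothetical stub layout: mflr r0; bl ftrace_caller; mtlr r0; b func+nop+4 */
struct ool_stub {
    uint32_t insn[4];
};

/* Hypothetical stand-in for per-function arch data: just the stub pointer. */
struct demo_arch_ftrace {
    struct ool_stub *ool_stub;
};

/* I-form relative branch: opcode 18, 24-bit word offset, AA=0, LK=link. */
static uint32_t encode_branch(uint64_t from, uint64_t to, int link)
{
    int64_t off = (int64_t)(to - from);
    assert((off & 3) == 0 && off >= -0x2000000 && off < 0x2000000);
    return 0x48000000u | ((uint32_t)off & 0x03fffffcu) | (link ? 1u : 0u);
}

/*
 * Split the stub area (at address 'base', backed by 'stubs') into one stub
 * per function and fill it in. nop_addr[i] is function i's entry nop.
 */
static void setup_ool_stubs(struct ool_stub *stubs, uint64_t base,
                            const uint64_t *nop_addr, size_t n,
                            uint64_t ftrace_caller,
                            struct demo_arch_ftrace *arch)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t stub_addr = base + i * sizeof(struct ool_stub);
        struct ool_stub *s = &stubs[i];

        s->insn[0] = PPC_INST_MFLR_R0;
        s->insn[1] = encode_branch(stub_addr + 4, ftrace_caller, 1);    /* bl ftrace_caller */
        s->insn[2] = PPC_INST_MTLR_R0;
        s->insn[3] = encode_branch(stub_addr + 12, nop_addr[i] + 4, 0); /* b past the nop */
        arch[i].ool_stub = s;  /* remember which stub belongs to function i */
    }
}
```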
    
    For modules, space for ftrace stubs is reserved from the generic module
    stub space.
    
    This is restricted to 64-bit powerpc, where it is enabled by default,
    though some changes are included to accommodate 32-bit powerpc so that
    it can choose to opt in later based on further tests and benchmarks.
    
    As an example, after this patch, kernel functions will have a single nop
    at function entry:
    <kernel_clone>:
    	addis	r2,r12,467
    	addi	r2,r2,-16028
    	nop
    	mfocrf	r11,8
    	...
    
    When ftrace is enabled, the nop is converted to an unconditional branch
    to the stub associated with that function:
    <kernel_clone>:
    	addis	r2,r12,467
    	addi	r2,r2,-16028
    	b	ftrace_ool_stub_text_end+0x11b28
    	mfocrf	r11,8
    	...
    
    The associated stub:
    <ftrace_ool_stub_text_end+0x11b28>:
    	mflr	r0
    	bl	ftrace_caller
    	mtlr	r0
    	b	kernel_clone+0xc
    	...
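    Enabling a function then amounts to rewriting one word: the entry nop
    at kernel_clone+0x8 becomes a relative "b" to its stub, and the stub's
    final branch returns to kernel_clone+0xc. The sketch below encodes and
    decodes such branches; the helper names and the addresses used in the
    checks are hypothetical, since the real addresses depend on the image
    layout.

```c
#include <assert.h>
#include <stdint.h>

#define PPC_INST_NOP 0x60000000u

/* Encode an unconditional relative branch "b target" located at 'site'. */
static uint32_t encode_b(uint64_t site, uint64_t target)
{
    int64_t off = (int64_t)(target - site);
    assert((off & 3) == 0 && off >= -0x2000000 && off < 0x2000000);
    return 0x48000000u | ((uint32_t)off & 0x03fffffcu);
}

/* Decode the target of a relative I-form branch at 'site' (AA=0 assumed). */
static uint64_t decode_b_target(uint64_t site, uint32_t insn)
{
    int32_t off = (int32_t)(insn & 0x03fffffcu);
    if (off & 0x02000000)  /* sign-extend the 26-bit byte offset */
        off -= 0x04000000;
    return site + (uint64_t)(int64_t)off;
}

/* Enable tracing for one function: replace its entry nop with "b stub". */
static void enable_site(uint32_t *nop_slot, uint64_t nop_addr,
                        uint64_t stub_addr)
{
    assert(*nop_slot == PPC_INST_NOP);
    *nop_slot = encode_b(nop_addr, stub_addr);
}
```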
    
    This change showed an improvement of ~10% in the null_syscall benchmark
    on a Power 10 system with ftrace enabled.
    
    Signed-off-by: Naveen N Rao <naveen@kernel.org>
    Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Link: https://patch.msgid.link/20241030070850.1361304-13-hbathini@linux.ibm.com