  1. Apr 07, 2022
  2. Mar 31, 2022
  3. Mar 29, 2022
  4. Mar 25, 2022
  5. Mar 22, 2022
  6. Mar 18, 2022
  7. Mar 15, 2022
  8. Mar 04, 2022
    • signal, x86: Delay calling signals in atomic on RT enabled kernels · bf9ad37d
      Oleg Nesterov authored
      
      On x86_64 we must disable preemption before we enable interrupts
      for stack faults, int3 and debugging, because the current task is using
      a per CPU debug stack defined by the IST. If we schedule out, another task
      can come in and use the same stack and cause the stack to be corrupted
      and crash the kernel on return.
      
      When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and
      one of these is the spin lock used in signal handling.
      
      Some of the debug code (int3) causes do_trap() to send a signal.
      This function takes a spinlock_t lock that has been converted to a
      sleeping lock. If this happens, the above stack-corruption issue
      becomes possible.
      
      Instead of sending the signal right away, for PREEMPT_RT and x86 the
      signal information is stored in the task's task_struct and
      TIF_NOTIFY_RESUME is set. Then, on exit from the trap, the signal
      resume code will send the signal when preemption is enabled.
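
      In rough C terms, the deferral pattern looks like the sketch below
      (illustrative only; the field and helper names are placeholders, not
      the literal patch):

      	/*
      	 * Sketch of deferred signal delivery for PREEMPT_RT, not the
      	 * actual patch; 'forced_info' stands in for wherever the signal
      	 * is parked in task_struct.
      	 */
      	if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_atomic()) {
      		/* Remember the signal instead of taking the sleeping lock now. */
      		current->forced_info = *info;
      		set_thread_flag(TIF_NOTIFY_RESUME);
      		return;
      	}
      	/* Otherwise deliver immediately, as before. */
      	force_sig_info(info);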
      
      [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to
        ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]
      [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ]
      [ tglx: Use a config option ]
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de
  9. Mar 02, 2022
  10. Feb 28, 2022
    • x86/Kconfig: move and modify CONFIG_I8K · a7a6f65a
      Mateusz Jończyk authored
      
      In Kconfig, inside the "Processor type and features" menu, there is
      the CONFIG_I8K option: "Dell i8k legacy laptop support". This is
      very confusing - enabling CONFIG_I8K is not required for the kernel to
      support old Dell laptops. This option is specific to the dell-smm-hwmon
      driver, which mostly exports some hardware monitoring information and
      allows the user to change fan speed.
      
      This option is misplaced, so move CONFIG_I8K to drivers/hwmon/Kconfig,
      where it belongs.
      
      Also, modify the dependency order - change
              select SENSORS_DELL_SMM
      to
              depends on SENSORS_DELL_SMM
      as it is just a configuration option of dell-smm-hwmon. This includes
      changing the option type from tristate to bool. It was tristate because
      it could select CONFIG_SENSORS_DELL_SMM=m.
      
      When running "make oldconfig" on configurations with
      CONFIG_SENSORS_DELL_SMM enabled, this change will result in an
      additional question (which could be printed several times during
      bisecting). I think that tidying up the configuration is worth it,
      though.
      
      Next patch tweaks the description of CONFIG_I8K.
      
      Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl>
      Cc: Pali Rohár <pali@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Jean Delvare <jdelvare@suse.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Mark Gross <markgross@kernel.org>
      Reviewed-by: Hans de Goede <hdegoede@redhat.com>
      Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/20220212125654.357408-1-mat.jonczyk@o2.pl
      
      
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
  11. Feb 26, 2022
    • usercopy: Check valid lifetime via stack depth · 2792d84e
      Kees Cook authored
      One of the things that CONFIG_HARDENED_USERCOPY sanity-checks is whether
      an object that is about to be copied to/from userspace is overlapping
      the stack at all. If it is, it performs a number of inexpensive
      bounds checks. One of the finer-grained checks is whether an object
      crosses stack frames within the stack region. Doing this on x86 with
      CONFIG_FRAME_POINTER was cheap/easy. Doing it with ORC was deemed too
      heavy, and was left out (a while ago), leaving the coarser whole-stack
      check.
      
      The LKDTM tests USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM
      try to exercise these cross-frame cases to validate the defense is
      working. They have been failing ever since ORC was added (which was
      expected). While Muhammad was investigating various LKDTM failures[1],
      he asked me for additional details on them, and I realized that when
      exact stack frame boundary checking is not available (i.e. everything
      except x86 with FRAME_POINTER), the check could at least verify that a
      stack object is "current depth valid", in the sense that any object
      within the stack region but not between start-of-stack and
      current_stack_pointer should be considered unavailable (i.e. its
      lifetime belongs to a call no longer present on the stack).
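
      As a standalone illustration of that "current depth valid" idea (a
      sketch, not the kernel's own check), with a downward-growing stack the
      live region sits between the current stack pointer and the start of
      the stack:

      	#include <stdbool.h>
      	#include <stddef.h>
      	#include <stdint.h>

      	/*
      	 * Sketch only: an object inside the stack region but below the
      	 * current stack pointer belongs to a frame that already returned,
      	 * so its lifetime is over.
      	 */
      	static bool object_is_depth_valid(uintptr_t obj, size_t len,
      					  uintptr_t sp, uintptr_t stack_start)
      	{
      		/* The whole object must lie within the live [sp, stack_start) span. */
      		return obj >= sp && obj + len <= stack_start;
      	}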
      
      Introduce ARCH_HAS_CURRENT_STACK_POINTER to track which architectures
      have actually implemented the common global register alias.
      
      Additionally report usercopy bounds checking failures with an offset
      from current_stack_pointer, which may assist with diagnosing failures.
      
      The LKDTM USERCOPY_STACK_FRAME_TO and USERCOPY_STACK_FRAME_FROM tests
      (once slightly adjusted in a separate patch) pass again with this fixed.
      
      [1] https://github.com/kernelci/kernelci-project/issues/84
      
      
      
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mm@kvack.org
      Reported-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      ---
      v1: https://lore.kernel.org/lkml/20220216201449.2087956-1-keescook@chromium.org
      v2: https://lore.kernel.org/lkml/20220224060342.1855457-1-keescook@chromium.org
      v3: https://lore.kernel.org/lkml/20220225173345.3358109-1-keescook@chromium.org
      v4: - improve commit log (akpm)
  12. Feb 19, 2022
    • sched/preempt: Add PREEMPT_DYNAMIC using static keys · 99cf983c
      Mark Rutland authored
      
      Where an architecture selects HAVE_STATIC_CALL but not
      HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline
      which will either branch to a callee or return to the caller.
      
      On such architectures, a number of constraints can conspire to make
      those trampolines more complicated and potentially less useful than we'd
      like. For example:
      
      * Hardware and software control flow integrity schemes can require the
        addition of "landing pad" instructions (e.g. `BTI` for arm64), which
        will also be present at the "real" callee.
      
      * Limited branch ranges can require that trampolines generate or load an
        address into a register and perform an indirect branch (or at least
        have a slow path that does so). This loses some of the benefits of
        having a direct branch.
      
      * Interaction with SW CFI schemes can be complicated and fragile, e.g.
        requiring that we can recognise idiomatic codegen and remove
        indirections, at least until clang provides more helpful mechanisms
        for dealing with this.
      
      For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we
      really only need to enable/disable specific preemption functions. We can
      achieve the same effect without a number of the pain points above by
      using static keys to fold early returns into the preemption functions
      themselves rather than in an out-of-line trampoline, effectively
      inlining the trampoline into the start of the function.
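
      In C terms this boils down to roughly the following sketch
      (illustrative; the kernel generates such wrappers via a macro, and the
      key name used here is only an example):

      	/* Kernel-context sketch: static key folded into the function itself. */
      	DEFINE_STATIC_KEY_TRUE(sk_dynamic_cond_resched);

      	int dynamic_cond_resched(void)
      	{
      		/* With the key disabled this folds into an early "return 0". */
      		if (!static_branch_unlikely(&sk_dynamic_cond_resched))
      			return 0;
      		return __cond_resched();
      	}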
      
      For arm64, this results in good code generation. For example, the
      dynamic_cond_resched() wrapper looks as follows when enabled. When
      disabled, the first `B` is replaced with a `NOP`, resulting in an early
      return.
      
      | <dynamic_cond_resched>:
      |        bti     c
      |        b       <dynamic_cond_resched+0x10>     // or `nop`
      |        mov     w0, #0x0
      |        ret
      |        mrs     x0, sp_el0
      |        ldr     x0, [x0, #8]
      |        cbnz    x0, <dynamic_cond_resched+0x8>
      |        paciasp
      |        stp     x29, x30, [sp, #-16]!
      |        mov     x29, sp
      |        bl      <preempt_schedule_common>
      |        mov     w0, #0x1
      |        ldp     x29, x30, [sp], #16
      |        autiasp
      |        ret
      
      ... compared to the regular form of the function:
      
      | <__cond_resched>:
      |        bti     c
      |        mrs     x0, sp_el0
      |        ldr     x1, [x0, #8]
      |        cbz     x1, <__cond_resched+0x18>
      |        mov     w0, #0x0
      |        ret
      |        paciasp
      |        stp     x29, x30, [sp, #-16]!
      |        mov     x29, sp
      |        bl      <preempt_schedule_common>
      |        mov     w0, #0x1
      |        ldp     x29, x30, [sp], #16
      |        autiasp
      |        ret
      
      Any architecture which implements static keys should be able to use this
      to implement PREEMPT_DYNAMIC with similar cost to non-inlined static
      calls. Since this is likely to have greater overhead than (inlined)
      static calls, PREEMPT_DYNAMIC is only defaulted to enabled when
      HAVE_PREEMPT_DYNAMIC_CALL is selected.
      
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Ard Biesheuvel <ardb@kernel.org>
      Acked-by: Frederic Weisbecker <frederic@kernel.org>
      Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com
  13. Feb 08, 2022
  14. Jan 28, 2022
  15. Jan 23, 2022
  16. Jan 20, 2022
    • kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR · bece04b5
      Marco Elver authored
      Until recent versions of GCC and Clang, it was not possible to disable
      KCOV instrumentation via a function attribute.  The relevant function
      attribute was introduced in 540540d0 ("kcov: add
      __no_sanitize_coverage to fix noinstr for all architectures").
      
      x86 was the first architecture to want a working noinstr, and at the
      time no compiler support for the attribute existed yet.  Therefore,
      commit 0f1441b4 ("objtool: Fix noinstr vs KCOV") introduced the
      ability to NOP __sanitizer_cov_*() calls in .noinstr.text.
      
      However, this doesn't work for other architectures like arm64 and s390
      that want a working noinstr per ARCH_WANTS_NO_INSTR.
      
      At the time of 0f1441b4, we didn't yet have ARCH_WANTS_NO_INSTR,
      but now we can move the Kconfig dependency checks to the generic KCOV
      option.  KCOV will be available if:
      
      	- architecture does not care about noinstr, OR
      	- we have objtool support (like on x86), OR
      	- GCC is 12.0 or newer, OR
      	- Clang is 13.0 or newer.
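
      For reference, the per-function opt-out those newer compilers provide
      looks roughly like the sketch below (GCC spelling shown; Clang 13
      spells it no_sanitize("coverage"), and the kernel's real wrappers live
      in its compiler headers):

      	/* Sketch of the attribute-based opt-out, not the kernel headers. */
      	#if defined(__has_attribute)
      	# if __has_attribute(no_sanitize_coverage)
      	#  define __no_sanitize_coverage __attribute__((no_sanitize_coverage))
      	# endif
      	#endif
      	#ifndef __no_sanitize_coverage
      	# define __no_sanitize_coverage	/* older compilers: no per-function opt-out */
      	#endif

      	/* Code meant to stay uninstrumented must not hit the KCOV hooks. */
      	static void __no_sanitize_coverage noinstr_like_function(void)
      	{
      	}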
      
      Link: https://lkml.kernel.org/r/20211201152604.3984495-1-elver@google.com
      
      
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: percpu: generalize percpu related config · 7ecd19cf
      Kefeng Wang authored
      Patch series "mm: percpu: Cleanup percpu first chunk function".
      
      When supporting the page mapping percpu first chunk allocator on arm64,
      we found a lot of duplicated code in the percpu embed/page first chunk
      allocators. This patch set aims to clean that up and should introduce
      no functional change.
      
      The current support status of 'embed' and 'page' across architectures
      is shown below; the corresponding Kconfig options are:

      	embed: NEED_PER_CPU_EMBED_FIRST_CHUNK
      	page:  NEED_PER_CPU_PAGE_FIRST_CHUNK
      
      		embed	page
      	------------------------
      	arm64	  Y	 Y
      	mips	  Y	 N
      	powerpc	  Y	 Y
      	riscv	  Y	 N
      	sparc	  Y	 Y
      	x86	  Y	 Y
      	------------------------
      
      There are two interfaces to the percpu first chunk allocator,
      
       extern int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                      size_t atom_size,
                                      pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
       extern int __init pcpu_page_first_chunk(size_t reserved_size,
      -                               pcpu_fc_alloc_fn_t alloc_fn,
      -                               pcpu_fc_free_fn_t free_fn,
      -                               pcpu_fc_populate_pte_fn_t populate_pte_fn);
      +                               pcpu_fc_cpu_to_node_fn_t cpu_to_nd_fn);
      
      pcpu_fc_alloc_fn_t/pcpu_fc_free_fn_t are killed; generic pcpu_fc_alloc()
      and pcpu_fc_free() functions are provided instead and are called from
      pcpu_embed/page_first_chunk().
      
      1) For pcpu_embed_first_chunk(), a pcpu_fc_cpu_to_node_fn_t needs to be
         provided when the architecture supports NUMA.
      
      2) For pcpu_page_first_chunk(), pcpu_fc_populate_pte_fn_t is killed too;
         a generic pcpu_populate_pte() marked '__weak' is provided. If an
         architecture (like x86) needs a different function to populate the
         PTEs, it should provide its own implementation (see the sketch below).
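
      The '__weak' default/override pattern referred to above works roughly
      like this standalone sketch (the function name follows the changelog;
      the bodies are illustrative):

      	/* Generic default: any non-weak definition elsewhere overrides it at link time. */
      	void __attribute__((weak)) pcpu_populate_pte(unsigned long addr)
      	{
      		/* generic page-table population for the percpu first chunk */
      	}

      	/*
      	 * An architecture such as x86 can then provide its own, non-weak
      	 * pcpu_populate_pte(unsigned long addr) with arch-specific behaviour.
      	 */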
      
      [1] https://github.com/kevin78/linux.git percpu-cleanup
      
      This patch (of 4):
      
      The HAVE_SETUP_PER_CPU_AREA/NEED_PER_CPU_EMBED_FIRST_CHUNK/
      NEED_PER_CPU_PAGE_FIRST_CHUNK/USE_PERCPU_NUMA_NODE_ID configs have
      duplicate definitions on the platforms that subscribe to them.

      Move them into mm, drop the redundant definitions and instead just
      select them on the applicable platforms.
      
      Link: https://lkml.kernel.org/r/20211216112359.103822-1-wangkefeng.wang@huawei.com
      Link: https://lkml.kernel.org/r/20211216112359.103822-2-wangkefeng.wang@huawei.com
      
      
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Will Deacon <will@kernel.org>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. Jan 15, 2022
  18. Dec 11, 2021
  19. Dec 09, 2021
    • x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node · 50468e43
      Jarkko Sakkinen authored
      
      == Problem ==
      
      The amount of SGX memory on a system is determined by the BIOS and it
      varies wildly between systems.  It can be as small as dozens of MB's
      and as large as many GB's on servers.  Just like how applications need
      to know how much regular RAM is available, enclave builders need to
      know how much SGX memory an enclave can consume.
      
      == Solution ==
      
      Introduce a new sysfs file:
      
      	/sys/devices/system/node/nodeX/x86/sgx_total_bytes
      
      to enumerate the amount of SGX memory available in each NUMA node.
      This serves the same function for SGX as /proc/meminfo or
      /sys/devices/system/node/nodeX/meminfo does for normal RAM.
      
      'sgx_total_bytes' is needed today to help drive the SGX selftests.
      SGX-specific swap code is exercised by creating overcommitted enclaves
      which are larger than the physical SGX memory on the system.  They
      currently use a CPUID-based approach which can diverge from the actual
      amount of SGX memory available.  'sgx_total_bytes' ensures that the
      selftests can work efficiently and do not attempt stupid things like
      creating a 100,000 MB enclave on a system with 128 MB of SGX memory.
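
      For example, a user-space consumer such as a selftest could read the
      per-node value along these lines (an illustrative sketch, not the
      actual selftest code):

      	#include <stdio.h>

      	/* Return the SGX capacity of a NUMA node in bytes, or -1 on error. */
      	static long long read_sgx_total_bytes(int node)
      	{
      		char path[128];
      		long long bytes = -1;
      		FILE *f;

      		snprintf(path, sizeof(path),
      			 "/sys/devices/system/node/node%d/x86/sgx_total_bytes", node);
      		f = fopen(path, "r");
      		if (!f)
      			return -1;
      		if (fscanf(f, "%lld", &bytes) != 1)
      			bytes = -1;
      		fclose(f);
      		return bytes;
      	}

      	int main(void)
      	{
      		printf("node0 SGX EPC: %lld bytes\n", read_sgx_total_bytes(0));
      		return 0;
      	}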
      
      == Implementation Details ==
      
      Introduce a CONFIG_HAVE_ARCH_NODE_DEV_GROUP opt-in flag to expose an
      arch-specific attribute group, and add an attribute for the amount of
      SGX memory in bytes to each NUMA node.
      
      == ABI Design Discussion ==
      
      As opposed to the per-node ABI, a single, global ABI was considered.
      However, this would prevent enclaves from being able to size
      themselves so that they fit on a single NUMA node.  Essentially, a
      single value would rule out NUMA optimizations for enclaves.
      
      Create a new "x86/" directory inside each "nodeX/" sysfs directory.
      'sgx_total_bytes' is expected to be the first of at least a few
      sgx-specific files to be placed in the new directory.  Just scanning
      /proc/meminfo, these are the no-brainers that we have for RAM, but we
      need for SGX:
      
      	MemTotal:       xxxx kB // sgx_total_bytes (implemented here)
      	MemFree:        yyyy kB // sgx_free_bytes
      	SwapTotal:      zzzz kB // sgx_swapped_bytes
      
      So, at *least* three.  I think we will eventually end up needing
      something more along the lines of a dozen.  A new directory (as
      opposed to being in the nodeX/ "root") directory avoids cluttering the
      root with several "sgx_*" files.
      
      Place the new file in a new "nodeX/x86/" directory because SGX is
      highly x86-specific.  It is very unlikely that any other architecture
      (or even non-Intel x86 vendor) will ever implement SGX.  Using "sgx/"
      as opposed to "x86/" was also considered.  But, there is a real chance
      this can get used for other arch-specific purposes.
      
      [ dhansen: rewrite changelog ]
      
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20211116162116.93081-2-jarkko@kernel.org
    • x86: Add straight-line-speculation mitigation · e463a09a
      Peter Zijlstra authored
      Make use of an upcoming GCC feature to mitigate
      straight-line-speculation for x86:
      
        https://gcc.gnu.org/g:53a643f8568067d7700a9f2facc8ba39974973d3
        https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952
        https://bugs.llvm.org/show_bug.cgi?id=52323
      
      
      
      It's build-tested on x86_64-allyesconfig using GCC-12 and GCC-11.
      
      Maintenance overhead of this should be fairly low due to objtool
      validation.
      
      Size overhead of all these additional int3 instructions comes to:
      
           text	   data	    bss	    dec	    hex	filename
        22267751	6933356	2011368	31212475	1dc43bb	defconfig-build/vmlinux
        22804126	6933356	1470696	31208178	1dc32f2	defconfig-build/vmlinux.sls
      
      Or roughly 2.4% additional text.
      
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lore.kernel.org/r/20211204134908.140103474@infradead.org
  20. Dec 08, 2021
  21. Dec 05, 2021
    • x86/sme: Explicitly map new EFI memmap table as encrypted · 1ff2fc02
      Tom Lendacky authored
      Reserving memory using efi_mem_reserve() calls into the x86
      efi_arch_mem_reserve() function. This function will insert a new EFI
      memory descriptor into the EFI memory map representing the area of
      memory to be reserved and marking it as EFI runtime memory. As part
      of adding this new entry, a new EFI memory map is allocated and mapped.
      The mapping is where a problem can occur. This new memory map is mapped
      using early_memremap() and generally mapped encrypted, unless the new
      memory for the mapping happens to come from an area of memory that is
      marked as EFI_BOOT_SERVICES_DATA memory. In this case, the new memory will
      be mapped unencrypted. However, during replacement of the old memory map,
      efi_mem_type() is disabled, so the new memory map will now be long-term
      mapped encrypted (in efi.memmap), resulting in the map containing invalid
      data and causing the kernel boot to crash.
      
      Since it is known that the area will be mapped encrypted going forward,
      explicitly map the new memory map as encrypted using early_memremap_prot().
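
      The change amounts to roughly the following (a sketch of the idea in
      efi_arch_mem_reserve(); pgprot_encrypted() and FIXMAP_PAGE_NORMAL are
      the existing x86 helpers used for such early mappings):

      	/* Before: may end up decrypted if the backing memory happens to be
      	 * EFI_BOOT_SERVICES_DATA. */
      	new = early_memremap(new_phys, new_size);

      	/* After (sketch): force an encrypted mapping for the new memmap. */
      	new = early_memremap_prot(new_phys, new_size,
      				  pgprot_val(pgprot_encrypted(FIXMAP_PAGE_NORMAL)));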
      
      Cc: <stable@vger.kernel.org> # 4.14.x
      Fixes: 8f716c9b ("x86/mm: Add support to access boot related data in the clear")
      Link: https://lore.kernel.org/all/ebf1eb2940405438a09d51d121ec0d02c8755558.1634752931.git.thomas.lendacky@amd.com/
      
      
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      [ardb: incorporate Kconfig fix by Arnd]
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
  22. Nov 18, 2021
  23. Nov 15, 2021
    • x86/sgx: Add infrastructure to identify SGX EPC pages · 40e0e784
      Tony Luck authored
      
      X86 machine check architecture reports a physical address when there
      is a memory error. Handling that error requires a method to determine
      whether the physical address reported is in any of the areas reserved
      for EPC pages by BIOS.
      
      SGX EPC pages do not have Linux "struct page" associated with them.
      
      Keep track of the mapping from ranges of EPC pages to the sections
      that contain them using an xarray. N.B. this adds CONFIG_XARRAY_MULTI
      to the SGX dependencies, so "select" that in arch/x86/Kconfig for X86/SGX.
      
      Create a function arch_is_platform_page() that simply reports whether an
      address is an EPC page for use elsewhere in the kernel. The ACPI error
      injection code needs this function and is typically built as a module,
      so export it.
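
      A rough sketch of that xarray usage (illustrative, not the literal
      patch; XARRAY_MULTI is what enables the multi-index range store):

      	/* Map physical EPC address ranges to their owning section (sketch). */
      	static DEFINE_XARRAY(sgx_epc_address_space);

      	static int sgx_record_epc_section(struct sgx_epc_section *section,
      					  u64 phys_addr, u64 size)
      	{
      		return xa_err(xa_store_range(&sgx_epc_address_space, phys_addr,
      					     phys_addr + size - 1, section,
      					     GFP_KERNEL));
      	}

      	bool arch_is_platform_page(u64 paddr)
      	{
      		return !!xa_load(&sgx_epc_address_space, paddr);
      	}
      	EXPORT_SYMBOL_GPL(arch_is_platform_page);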
      
      Note that arch_is_platform_page() will be slower than other similar
      "what type is this page" functions that can simply check bits in the
      "struct page".  If there is some future performance critical user of
      this function it may need to be implemented in a more efficient way.
      
      Note also that the current implementation of xarray allocates a few
      hundred kilobytes for this usage on a system with 4GB of SGX EPC memory
      configured. This isn't ideal, but worth it for the code simplicity.
      
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
      Tested-by: Reinette Chatre <reinette.chatre@intel.com>
      Link: https://lkml.kernel.org/r/20211026220050.697075-3-tony.luck@intel.com
  24. Nov 06, 2021
  25. Oct 26, 2021
    • kprobes: Add a test case for stacktrace from kretprobe handler · 1f6d3a8f
      Masami Hiramatsu authored
      Add a test case for stacktrace from kretprobe handler and
      nested kretprobe handlers.
      
      This test checks both the stack trace inside the kretprobe handler
      and the stack trace from pt_regs. Those stack traces must include the
      actual function return address instead of the kretprobe trampoline.
      The nested kretprobe stacktrace test checks whether the unwinder
      can correctly unwind a call frame on the stack which has been
      modified by the kretprobe.
      
      Since the stacktrace on kretprobe is correctly fixed only on x86,
      this introduces a meta kconfig ARCH_CORRECT_STACKTRACE_ON_KRETPROBE
      which tells the user whether the stacktrace on kretprobe is correct.
      
      The test results will be shown like below:
      
       TAP version 14
       1..1
           # Subtest: kprobes_test
           1..6
           ok 1 - test_kprobe
           ok 2 - test_kprobes
           ok 3 - test_kretprobe
           ok 4 - test_kretprobes
           ok 5 - test_stacktrace_on_kretprobe
           ok 6 - test_stacktrace_on_nested_kretprobe
       # kprobes_test: pass:6 fail:0 skip:0 total:6
       # Totals: pass:6 fail:0 skip:0 total:6
       ok 1 - kprobes_test
      
      Link: https://lkml.kernel.org/r/163516211244.604541.18350507860972214415.stgit@devnote2
      
      
      
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • x86/signal: Implement sigaltstack size validation · 3aac3ebe
      Thomas Gleixner authored
      
      For historical reasons MINSIGSTKSZ is a constant, and it already became
      too small with AVX512 support.
      
      Add a mechanism to enforce strict checking of the sigaltstack size against
      the real size of the FPU frame.
      
      The strict check can be enabled via a config option and can also be
      controlled via the kernel command line option 'strict_sas_size' independent
      of the config switch.
      
      Enabling it might break existing applications which allocate a too small
      sigaltstack but 'work' because they never get a signal delivered. Though it
      can be handy to filter out binaries which are not yet aware of
      AT_MINSIGSTKSZ.
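
      An application that wants to stay on the right side of the strict
      check would size its alternate stack from AT_MINSIGSTKSZ rather than
      the legacy MINSIGSTKSZ constant, roughly as in this sketch:

      	#include <signal.h>
      	#include <stdlib.h>
      	#include <sys/auxv.h>

      	#ifndef AT_MINSIGSTKSZ
      	#define AT_MINSIGSTKSZ 51	/* older userspace headers may lack it */
      	#endif

      	/* Install an alternate signal stack sized for the real FPU frame. */
      	static int setup_altstack(void)
      	{
      		stack_t ss = { 0 };
      		size_t sz = (size_t)getauxval(AT_MINSIGSTKSZ);

      		if (sz < MINSIGSTKSZ)	/* fall back to the legacy minimum */
      			sz = MINSIGSTKSZ;

      		ss.ss_sp = malloc(sz);
      		if (!ss.ss_sp)
      			return -1;
      		ss.ss_size = sz;
      		return sigaltstack(&ss, NULL);
      	}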
      
      Also the upcoming support for dynamically enabled FPU features requires a
      strict sanity check to ensure that:
      
         - Enabling of a dynamic feature, which changes the sigframe size fits
           into an enabled sigaltstack
      
         - Installing a too small sigaltstack after a dynamic feature has been
           added is not possible.
      
      Implement the base check which is controlled by config and command line
      options.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20211021225527.10184-3-chang.seok.bae@intel.com
  26. Oct 21, 2021
  27. Oct 19, 2021
  28. Oct 15, 2021
    • sched: Add cluster scheduler level for x86 · 66558b73
      Tim Chen authored
      
      There are x86 CPU architectures (e.g. Jacobsville) where the L2 cache is
      shared among a cluster of cores instead of being exclusive to one
      single core.
      
      To prevent oversubscription of the L2 cache, load should be balanced
      between such L2 clusters, especially for tasks with no shared data.
      On benchmarks such as the SPECrate mcf test, this change provides a
      performance boost, especially on medium-load systems. On a Jacobsville
      system that has 24 Atom cores, arranged into 6 clusters of 4 cores
      each, the benchmark numbers are as follows:
      
       Improvement over baseline kernel for mcf_r
       copies		run time	base rate
       1		-0.1%		-0.2%
       6		25.1%		25.1%
       12		18.8%		19.0%
       24		0.3%		0.3%
      
      So this looks pretty good. In terms of the system's task distribution,
      some pretty bad clumping can be seen for the vanilla kernel without
      the L2 cluster domain for the 6 and 12 copies case. With the extra
      domain for cluster, the load does get evened out between the clusters.
      
      Note this patch isn't a universal win as spreading isn't necessarily
      a win, particularly for workloads that can benefit from packing.
      
      Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
      Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/r/20210924085104.44806-4-21cnbao@gmail.com
  29. Oct 11, 2021
    • x86/Kconfig: Do not enable AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT automatically · 71188590
      Borislav Petkov authored
      
      This Kconfig option was added initially so that memory encryption is
      enabled by default on machines which support it.
      
      However, devices which have DMA masks that are less than the bit
      position of the encryption bit, aka C-bit, require the use of an IOMMU
      or the use of SWIOTLB.
      
      If the IOMMU is disabled or in passthrough mode, the kernel would switch
      to SWIOTLB bounce-buffering for those transfers.
      
      In order to avoid that,
      
        2cc13bb4 ("iommu: Disable passthrough mode when SME is active")
      
      disables the default IOMMU passthrough mode so that devices for which the
      default 256K DMA is insufficient can use the IOMMU instead.
      
      However 2, there are cases where the IOMMU is disabled in the BIOS, etc.
      (think the usual hardware folk "oops, I dropped the ball there" cases) or a
      driver doesn't properly use the DMA APIs or a device has a firmware or
      hardware bug, e.g.:
      
        ea68573d ("drm/amdgpu: Fail to load on RAVEN if SME is active")
      
      However 3, in the above GPU use case, there are APIs like Vulkan and
      some OpenGL/OpenCL extensions which are under the assumption that
      user-allocated memory can be passed in to the kernel driver and both the
      GPU and CPU can do coherent and concurrent access to the same memory.
      That cannot work with SWIOTLB bounce buffers, of course.
      
      So, in order for those devices to function, drop the "default y" for the
      SME-by-default-active option so that users who want SME enabled will need
      to either enable it in their config or use "mem_encrypt=on" on the kernel
      command line.
      
       [ tlendacky: Generalize commit message. ]
      
      Fixes: 7744ccdb ("x86/mm: Add Secure Memory Encryption (SME) support")
      Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Acked-by: Alex Deucher <alexander.deucher@amd.com>
      Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/8bbacd0e-4580-3194-19d2-a0ecad7df09c@molgen.mpg.de
  30. Oct 07, 2021