  1. Jan 08, 2018
  2. Jan 05, 2018
  3. Jan 02, 2018
    • aio: mark AIO pseudo-fs noexec · 6d3fee53
      Jann Horn authored
      
      commit 22f6b4d3 upstream.
      
      This ensures that do_mmap() won't implicitly make AIO memory mappings
      executable if the READ_IMPLIES_EXEC personality flag is set.  Such
      behavior is problematic because the security_mmap_file LSM hook doesn't
      catch this case, potentially permitting an attacker to bypass a W^X
      policy enforced by SELinux.
      
      I have tested the patch on my machine.
      
      To test the behavior, compile and run this:
      
          #define _GNU_SOURCE
          #include <unistd.h>
          #include <sys/personality.h>
          #include <linux/aio_abi.h>
          #include <err.h>
          #include <stdlib.h>
          #include <stdio.h>
          #include <sys/syscall.h>
      
          int main(void) {
              personality(READ_IMPLIES_EXEC);
              aio_context_t ctx = 0;
              if (syscall(__NR_io_setup, 1, &ctx))
                  err(1, "io_setup");
      
              char cmd[1000];
              sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'",
                  (int)getpid());
              system(cmd);
              return 0;
          }
      
      In the output, "rw-s" is good, "rwxs" is bad.
      
      Signed-off-by: Jann Horn <jann@thejh.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • pids: make task_tgid_nr_ns() safe · da1c5144
      Oleg Nesterov authored
      
      commit dd1c1f2f upstream.
      
      This was reported many times, and this was even mentioned in commit
      52ee2dfd ("pids: refactor vnr/nr_ns helpers to make them safe") but
      somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is
      not safe because task->group_leader points to nowhere after the exiting
      task passes exit_notify(), and rcu_read_lock() cannot help.
      
      We really need to change __unhash_process() to nullify group_leader,
      parent, and real_parent, but this needs some cleanups.  Until then we
      can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
      fix the problem.
      
      Reported-by: Troy Kensinger <tkensinger@google.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/fpu: Don't let userspace set bogus xcomp_bv · 97373752
      Eric Biggers authored
      
      commit 814fb7bb upstream.
      
      [Please apply to 4.4-stable.  Note: the backport includes the
      fpstate_init() call in xstateregs_set(), since the fix is useless without
      it.  It was added by commit 91c3dba7 ("x86/fpu/xstate: Fix PTRACE
      frames for XSAVES"), but it doesn't make sense to backport that whole
      commit.]
      
      On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
      set a task's extended state (xstate) or "FPU" registers.  ptrace() can
      set them for another task using the PTRACE_SETREGSET request with
      NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
      In either case, registers can be set to any value, but the kernel
      assumes that the XSAVE area itself remains valid in the sense that the
      CPU can restore it.
      
      However, in the case where the kernel is using the uncompacted xstate
      format (which it does whenever the XSAVES instruction is unavailable),
      it was possible for userspace to set the xcomp_bv field in the
      xstate_header to an arbitrary value.  All bits in that field
      are reserved in the uncompacted case, so when switching to a task with
      nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault.  This
      caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit.  In
      addition, since the error is otherwise ignored, the FPU registers from
      the task previously executing on the CPU were leaked.
      
      Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
      the uncompacted case, and returning an error otherwise.
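
The check described above can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel patch: `struct xstate_header_sketch`, `validate_user_xstate_header()` and the `using_compacted_format` flag are hypothetical names, and the struct merely mirrors the 64-byte XSAVE header (XSTATE_BV, XCOMP_BV, reserved bytes) that follows the 512-byte legacy region.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 64-byte XSAVE header that follows the
 * 512-byte legacy region: XSTATE_BV, XCOMP_BV, then reserved bytes. */
struct xstate_header_sketch {
    uint64_t xfeatures;
    uint64_t xcomp_bv;
    uint64_t reserved[6];
};

/* Returns 0 if the user-supplied header passes this check, -EINVAL if not.
 * In the uncompacted format every bit of xcomp_bv is reserved, so a
 * nonzero value is rejected rather than silently overwritten with 0.
 * Validation of the compacted format is deliberately left out of this
 * sketch, so that case always returns 0 here. */
static int validate_user_xstate_header(const struct xstate_header_sketch *hdr,
                                       bool using_compacted_format)
{
    if (!using_compacted_format && hdr->xcomp_bv != 0)
        return -EINVAL;
    return 0;
}
```

Viewed as an array of 64-bit words, the buffer has xcomp_bv at index 65 (byte offset 520), which is the word a ptrace()-based test would overwrite to trigger the bug.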
      
      The reason for validating xcomp_bv rather than simply overwriting it
      with 0 is that we want userspace to see an error if it (incorrectly)
      provides an XSAVE area in compacted format rather than in uncompacted
      format.
      
      Note that as before, in case of error we clear the task's FPU state.
      This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
      better to return an error before changing anything.  But it seems the
      "clear on error" behavior is fine for now, and it's a little tricky to
      do otherwise because it would mean we couldn't simply copy the full
      userspace state into kernel memory in one __copy_from_user().
      
      This bug was found by syzkaller, which hit the above-mentioned
      WARN_ON_FPU():
      
          WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
          CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 #453
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
          RIP: 0010:__switch_to+0x5b5/0x5d0
          RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
          RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
          RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
          RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
          R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
          R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
          FS:  00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
          CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
          Call Trace:
          Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f
      
      Here is a C reproducer.  The expected behavior is that the program spins
      forever with no output.  However, on a buggy kernel running on a
      processor with the "xsave" feature but without the "xsaves" feature
      (e.g. Sandy Bridge through Broadwell for Intel), within a second or two
      the program reports that the xmm registers were corrupted, i.e. were not
      restored correctly.  With CONFIG_X86_DEBUG_FPU=y it also hits the above
      kernel warning.
      
          #define _GNU_SOURCE
          #include <stdbool.h>
          #include <inttypes.h>
          #include <linux/elf.h>
          #include <stdio.h>
          #include <sys/ptrace.h>
          #include <sys/uio.h>
          #include <sys/wait.h>
          #include <unistd.h>
      
          int main(void)
          {
              int pid = fork();
              uint64_t xstate[512];
              struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };
      
              if (pid == 0) {
                  bool tracee = true;
                  for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
                      tracee = (fork() != 0);
                  uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
                  asm volatile("   movdqu %0, %%xmm0\n"
                               "   mov %0, %%rbx\n"
                               "1: movdqu %%xmm0, %0\n"
                               "   mov %0, %%rax\n"
                               "   cmp %%rax, %%rbx\n"
                               "   je 1b\n"
                               : "+m" (xmm0) : : "rax", "rbx", "xmm0");
                  printf("BUG: xmm registers corrupted!  tracee=%d, xmm0=%08X%08X%08X%08X\n",
                         tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
              } else {
                  usleep(100000);
                  ptrace(PTRACE_ATTACH, pid, 0, 0);
                  wait(NULL);
                  ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
                  xstate[65] = -1;
                  ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
                  ptrace(PTRACE_CONT, pid, 0, 0);
                  wait(NULL);
              }
              return 1;
          }
      
      Note: the program only tests for the bug using the ptrace() system call.
      The bug can also be reproduced using the rt_sigreturn() system call, but
      only when called from a 32-bit program, since for 64-bit programs the
      kernel restores the FPU state from the signal frame by doing XRSTOR
      directly from userspace memory (with proper error checking).
      
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Kevin Hao <haokexin@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Halcrow <mhalcrow@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
      Cc: kernel-hardening@lists.openwall.com
      Fixes: 0b29643a ("x86/xsaves: Change compacted format xsave area header")
      Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
      Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Sanitize 'move_pages()' permission checks · 064cd1be
      Linus Torvalds authored
      
      commit 197e7e52 upstream.
      
      The 'move_pages()' system call was introduced long long ago with the
      same permission checks as for sending a signal (except using
      CAP_SYS_NICE instead of CAP_KILL for the overriding capability).
      
      That turns out to not be a great choice - while the system call really
      only moves physical page allocations around (and you need other
      capabilities to do a lot of it), you can check the return value to map
      out some of the virtual address choices and defeat ASLR of a binary that
      still shares your uid.
      
      So change the access checks to the more common 'ptrace_may_access()'
      model instead.
      
      This tightens the access checks for the uid, and also effectively
      changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
      anybody really _uses_ this legacy system call any more (we have better
      NUMA placement models these days), so I expect nobody to notice.
      
      Famous last words.
      
      Reported-by: Otto Ebeling <otto.ebeling@iki.fi>
      Acked-by: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ANDROID: sched/fair: Select correct capacity state for energy_diff · 71063cda
      Chris Redpath authored
      
      The util returned from group_max_util is not capped at the max util
      present in the group, so it can be larger than the capacity stored in
      the array. Ensure that when this happens, we always use the last entry
      in the array to fetch energy from.
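
The selection rule described above can be sketched in plain C. This is a hedged illustration, not the scheduler code: `struct cap_state_sketch` and `find_cap_state_sketch()` are hypothetical names, and the sketch assumes the capacity states are sorted by ascending capacity and that the array is non-empty.

```c
#include <stddef.h>

/* Hypothetical capacity state: a capacity and the power drawn at it. */
struct cap_state_sketch {
    unsigned long cap;
    unsigned long power;
};

/* Return the index of the first state whose capacity covers util.
 * If util exceeds every capacity in the array, fall back to the last
 * entry instead of running past the end.
 * Assumes states[] is sorted by ascending cap and nr_states >= 1. */
static size_t find_cap_state_sketch(const struct cap_state_sketch *states,
                                    size_t nr_states, unsigned long util)
{
    size_t idx;

    for (idx = 0; idx < nr_states; idx++) {
        if (states[idx].cap >= util)
            return idx;
    }
    return nr_states - 1;
}
```

With capacities 100, 200 and 400, a utilization of 150 selects index 1, and a utilization of 500 (larger than any stored capacity) selects the last index, 2.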
      
      Tested with synthetics on Juno board.
      
      Bug: 38159576
      Change-Id: I89fb52fb7e68fa3e682e308acc232596672d03f7
      Signed-off-by: Chris Redpath <chris.redpath@arm.com>
    • x86/acpi: Prevent out of bound access caused by broken ACPI tables · bd205bf0
      Seunghun Han authored
      
      commit dad5ab0d upstream.
      
      The bus_irq argument of mp_override_legacy_irq() is used as the index into
      the isa_irq_to_gsi[] array. The bus_irq argument originates from
      ACPI_MADT_TYPE_IO_APIC and ACPI_MADT_TYPE_INTERRUPT items in the ACPI
      tables, but is nowhere sanity checked.
      
      That allows broken or malicious ACPI tables to overwrite memory, which
      might cause malfunction, panic or arbitrary code execution.
      
      Add a sanity check and emit a warning when that triggers.
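
A minimal sketch of this kind of sanity check, written as stand-alone C rather than kernel code. The names `LEGACY_IRQ_COUNT`, `irq_to_gsi_sketch` and `override_legacy_irq_sketch()` are hypothetical, and the table size of 16 is an assumption made for the example.

```c
#include <stdint.h>
#include <stdio.h>

#define LEGACY_IRQ_COUNT 16 /* assumed table size for this sketch */

/* Hypothetical stand-in for a table indexed by legacy bus IRQ number. */
static uint32_t irq_to_gsi_sketch[LEGACY_IRQ_COUNT];

/* Record an override for bus_irq. The index comes from untrusted
 * firmware tables, so it is range-checked before it is used.
 * Returns 0 on success, -1 if the index was rejected. */
static int override_legacy_irq_sketch(uint8_t bus_irq, uint32_t gsi)
{
    if (bus_irq >= LEGACY_IRQ_COUNT) {
        fprintf(stderr, "Invalid bus_irq %u for legacy override\n",
                (unsigned int)bus_irq);
        return -1;
    }
    irq_to_gsi_sketch[bus_irq] = gsi;
    return 0;
}
```

Without the range check, a bus_irq of 16 or more would write past the end of the table, which is the out-of-bounds access the commit message describes.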
      
      [ tglx: Added warning and rewrote changelog ]
      
      Signed-off-by: Seunghun Han <kkamagui@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • crypto: algif_skcipher - Load TX SG list after waiting · 24bbeb9a
      Herbert Xu authored
      
      commit 4f0414e5 upstream.
      
      We need to load the TX SG list in sendmsg(2) after waiting for
      incoming data, not before.
      
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Tested-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. Dec 05, 2017
  5. Oct 06, 2017
    • mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] · d9b92736
      Andrey Ryabinin authored
      commit f5527fff upstream.
      
      This fixes CVE-2016-8650.
      
      If mpi_powm() is given a zero exponent, it wants to immediately return
      either 1 or 0, depending on the modulus.  However, if the result was
      initialised with zero limb space, no limb space is allocated and a
      NULL-pointer exception ensues.
      
      Fix this by allocating a minimal amount of limb space for the result in
      the 0-exponent case when the result is 1, and by not touching the limb space
      when the result is 0.
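
The zero-exponent handling described above can be sketched in user-space C. This is an illustration under stated assumptions, not the kernel's MPI code: `struct mpi_sketch`, `mpi_sketch_resize()` and `powm_zero_exponent_sketch()` are hypothetical names, and a value of zero is represented here by `nlimbs == 0`.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical multi-precision integer: nlimbs == 0 represents zero. */
struct mpi_sketch {
    uint64_t *limbs; /* may be NULL when alloced == 0 */
    size_t alloced;  /* number of limbs allocated */
    size_t nlimbs;   /* number of limbs in use */
};

/* Make sure at least n limbs are allocated. Returns 0 or -1. */
static int mpi_sketch_resize(struct mpi_sketch *a, size_t n)
{
    uint64_t *p;

    if (a->alloced >= n)
        return 0;
    p = realloc(a->limbs, n * sizeof(*p));
    if (!p)
        return -1;
    a->limbs = p;
    a->alloced = n;
    return 0;
}

/* res = base^0 mod mod, which is 0 when mod == 1 and 1 otherwise. */
static int powm_zero_exponent_sketch(struct mpi_sketch *res,
                                     const struct mpi_sketch *mod)
{
    if (mod->nlimbs == 1 && mod->limbs[0] == 1) {
        res->nlimbs = 0; /* result is 0: limb space is not touched */
        return 0;
    }
    if (mpi_sketch_resize(res, 1) < 0) /* result is 1: needs one limb */
        return -1;
    res->limbs[0] = 1;
    res->nlimbs = 1;
    return 0;
}
```

The point of the sketch is the ordering: the limb is written only after space for it has been guaranteed, so a result that starts with zero limb space no longer leads to a store through a NULL pointer.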
      
      This affects the use of RSA keys and X.509 certificates that carry them.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804011944c0 task.stack: ffff880401294000
      RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
      RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
      RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
      RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
      R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
      FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
      Stack:
       ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
       0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
       ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
      Call Trace:
       [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
       [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
       [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
       [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
       [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
       [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
       [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
       [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
       [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
       [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
       [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
       [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
       [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
       [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
       [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
       [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
      Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
      RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
       RSP <ffff880401297ad8>
      CR2: 0000000000000000
      ---[ end trace d82015255d4a5d8d ]---
      
      Basically, this is a backport of a libgcrypt patch:
      
      	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526
      
      
      
      Fixes: cdec9cb5 ("crypto: GnuPG based MPI lib - source files (part 1)")
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      cc: linux-ima-devel@lists.sourceforge.net
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sctp: do not inherit ipv6_{mc|ac|fl}_list from parent · 45f5c85d
      Eric Dumazet authored
      
      
      [ Upstream commit fdcee2cb ]
      
      SCTP needs fixes similar to 83eaddab ("ipv6/dccp: do not inherit
      ipv6_mc_list from parent"), otherwise bad things can happen.
      
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fscrypt: remove broken support for detecting keyring key revocation · b32f4b0c
      Eric Biggers authored
      
      commit 1b53cf98 upstream.
      
      Filesystem encryption ostensibly supported revoking a keyring key that
      had been used to "unlock" encrypted files, causing those files to become
      "locked" again.  This was, however, buggy for several reasons, the most
      severe of which was that when key revocation happened to be detected for
      an inode, its fscrypt_info was immediately freed, even while other
      threads could be using it for encryption or decryption concurrently.
      This could be exploited to crash the kernel or worse.
      
      This patch fixes the use-after-free by removing the code which detects
      the keyring key having been revoked, invalidated, or expired.  Instead,
      an encrypted inode that is "unlocked" now simply remains unlocked until
      it is evicted from memory.  Note that this is no worse than the case for
      block device-level encryption, e.g. dm-crypt, and it still remains
      possible for a privileged user to evict unused pages, inodes, and
      dentries by running 'sync; echo 3 > /proc/sys/vm/drop_caches', or by
      simply unmounting the filesystem.  In fact, one of those actions was
      already needed anyway for key revocation to work even somewhat sanely.
      This change is not expected to break any applications.
      
      In the future I'd like to implement a real API for fscrypt key
      revocation that interacts sanely with ongoing filesystem operations ---
      waiting for existing operations to complete and blocking new operations,
      and invalidating and sanitizing key material and plaintext from the VFS
      caches.  But this is a hard problem, and for now this bug must be fixed.
      
      This bug affected almost all versions of ext4, f2fs, and ubifs
      encryption, and it was potentially reachable in any kernel configured
      with encryption support (CONFIG_EXT4_ENCRYPTION=y,
      CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
      CONFIG_UBIFS_FS_ENCRYPTION=y).  Note that older kernels did not use the
      shared fs/crypto/ code, but due to the potential security implications
      of this bug, it may still be worthwhile to backport this fix to them.
      
      Fixes: b7236e21 ("ext4 crypto: reorganize how we store keys in the inode")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Acked-by: Michael Halcrow <mhalcrow@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>