  1. Aug 03, 2022
    • watch_queue: Fix missing locking in add_watch_to_object() · 8a2482fc
      Linus Torvalds authored
      
      commit e64ab2db upstream.
      
      If a watch is being added to a queue, it needs to guard against
      interference from addition of a new watch, manual removal of a watch and
      removal of a watch due to some other queue being destroyed.
      
      KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
      holding the key->sem writelocked and by holding refs on both the key and
      the queue - but that doesn't prevent interaction from other {key,queue}
      pairs.
      
      While add_watch_to_object() does take the spinlock on the event queue,
      it doesn't take the lock on the source's watch list.  The assumption was
      that the caller would prevent that (say by taking key->sem) - but that
      doesn't prevent interference from the destruction of another queue.
      
      Fix this by locking the watcher list in add_watch_to_object().
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Reported-by: <syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: keyrings@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • watch_queue: Fix missing rcu annotation · dbf4265d
      David Howells authored
      
      commit e0339f03 upstream.
      
      Since __post_watch_notification() walks wlist->watchers with only the
      RCU read lock held, we need to use RCU methods to add to the list (we
      already use RCU methods to remove from the list).
      
      Fix add_watch_to_object() to use hlist_add_head_rcu() instead of
      hlist_add_head() for that list.
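
      The publish ordering that an RCU-style list add relies on can be
      sketched in user space with C11 atomics. This is only an illustration
      under stated assumptions: struct node, struct list and the helper names
      below are hypothetical, the release store stands in for what
      hlist_add_head_rcu() provides, and the acquire loads stand in for the
      reader-side RCU accessors. Writers are assumed to be serialized by some
      external lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical singly linked list; not the kernel's hlist. */
struct node {
	int value;
	struct node *_Atomic next;
};

struct list {
	struct node *_Atomic first;
};

/* Writer side: initialise the node fully, then publish it with release. */
static void list_add_head_publish(struct list *l, struct node *n)
{
	struct node *old = atomic_load_explicit(&l->first, memory_order_relaxed);

	assert(n != NULL);
	atomic_store_explicit(&n->next, old, memory_order_relaxed);
	/* A lockless reader that sees n also sees n->value and n->next. */
	atomic_store_explicit(&l->first, n, memory_order_release);
}

/* Reader side: walk the list without taking the writers' lock. */
static int list_sum(struct list *l)
{
	int sum = 0;
	struct node *n = atomic_load_explicit(&l->first, memory_order_acquire);

	while (n) {
		sum += n->value;
		n = atomic_load_explicit(&n->next, memory_order_acquire);
	}
	return sum;
}

/* Single-threaded demonstration: add two nodes, then sum them. */
static int demo_sum(void)
{
	struct list l;
	struct node a, b;

	atomic_init(&l.first, NULL);
	a.value = 1;
	b.value = 2;
	atomic_init(&a.next, NULL);
	atomic_init(&b.next, NULL);
	list_add_head_publish(&l, &a);
	list_add_head_publish(&l, &b);
	return list_sum(&l);
}
```

      A plain (non-release) store to the list head would allow a concurrent
      reader to observe the new node before its fields are initialised,
      which is the hazard the RCU add variant avoids.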
      
      Fixes: c73be61c ("pipe: Add general notification queue support")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid() · 1cc98fa5
      Nathan Chancellor authored
      commit 0c09bc33 upstream.
      
      When booting a kernel compiled with clang's CFI protection
      (CONFIG_CFI_CLANG), there is a CFI failure in
      drm_simple_kms_crtc_mode_valid() when trying to call
      simpledrm_simple_display_pipe_mode_valid() through ->mode_valid():
      
      [    0.322802] CFI failure (target: simpledrm_simple_display_pipe_mode_valid+0x0/0x8):
      ...
      [    0.324928] Call trace:
      [    0.324969]  __ubsan_handle_cfi_check_fail+0x58/0x60
      [    0.325053]  __cfi_check_fail+0x3c/0x44
      [    0.325120]  __cfi_slowpath_diag+0x178/0x200
      [    0.325192]  drm_simple_kms_crtc_mode_valid+0x58/0x80
      [    0.325279]  __drm_helper_update_and_validate+0x31c/0x464
      ...
      
      The ->mode_valid() member in 'struct drm_simple_display_pipe_funcs'
      expects a return type of 'enum drm_mode_status', not 'int'. Correct it
      to fix the CFI failure.
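
      A minimal sketch of the pattern, assuming simplified stand-ins: enum
      mode_status, struct pipe_funcs and the demo_* names are hypothetical
      and only mirror the shape of 'enum drm_mode_status' and
      'struct drm_simple_display_pipe_funcs'. The point is that the callback
      is declared with exactly the return type the function pointer member
      uses, which is what an indirect-call type check compares.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum drm_mode_status. */
enum mode_status { MODE_OK = 0, MODE_BAD = 1 };

/* Hypothetical stand-in for the funcs table holding ->mode_valid(). */
struct pipe_funcs {
	enum mode_status (*mode_valid)(int width, int height);
};

/* The return type matches the pointer type exactly (enum, not int). */
static enum mode_status demo_mode_valid(int width, int height)
{
	return (width > 0 && height > 0) ? MODE_OK : MODE_BAD;
}

static const struct pipe_funcs demo_funcs = {
	.mode_valid = demo_mode_valid,
};

/* Indirect call through the table, as a helper layer would do. */
static enum mode_status check_mode(const struct pipe_funcs *funcs, int w, int h)
{
	assert(funcs != NULL && funcs->mode_valid != NULL);
	return funcs->mode_valid(w, h);
}
```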
      
      Cc: stable@vger.kernel.org
      Fixes: 11e8f5fd ("drm: Add simpledrm driver")
      Link: https://github.com/ClangBuiltLinux/linux/issues/1647
      Reported-by: Tomasz Paweł Gajc <tpgxyz@gmail.com>
      Signed-off-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
      Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220725233629.223223-1-nathan@kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nouveau/svm: Fix to migrate all requested pages · e6badb93
      Alistair Popple authored
      
      commit 66cee909 upstream.
      
      Users may request that pages from an OpenCL SVM allocation be migrated
      to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
      nouveau_dmem_migrate_vma() to do the migration. If the total range to be
      migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
      chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
      starting address means that only the first chunk will get migrated.
      
      Fix the calculation so that the entire range will get migrated if
      possible.
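
      The chunked walk can be sketched in user space as follows. This is an
      illustration only: CHUNK_BYTES is a hypothetical stand-in for the
      SG_MAX_SINGLE_ALLOC limit, walk_range() is not the nouveau code, and
      "migrating" a chunk is reduced to counting its bytes. What it shows is
      that the cursor must advance by one chunk per iteration for the loop
      to cover the whole range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chunk limit standing in for SG_MAX_SINGLE_ALLOC pages. */
#define CHUNK_BYTES 4096UL

/*
 * Walk [start, end) in chunks of at most CHUNK_BYTES and return the
 * number of bytes visited.
 */
static unsigned long walk_range(unsigned long start, unsigned long end)
{
	unsigned long visited = 0;
	unsigned long cur = start;

	assert(start <= end);
	while (cur < end) {
		unsigned long next = cur + CHUNK_BYTES;

		if (next > end)
			next = end;
		visited += next - cur;	/* "migrate" [cur, next) */
		cur = next;		/* advance to the next chunk */
	}
	return visited;
}
```

      If the cursor update were wrong, the loop would either stop after the
      first chunk or revisit it; the return value makes that easy to check
      against end - start.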
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Fixes: e3d8b089 ("drm/nouveau/svm: map pages after migration")
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Lyude Paul <lyude@redhat.com>
      Signed-off-by: Lyude Paul <lyude@redhat.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com
      Cc: <stable@vger.kernel.org> # v5.8+
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • intel_idle: Fix false positive RCU splats due to incorrect hardirqs state · cc9aace0
      Waiman Long authored
      
      commit d295ad34 upstream.
      
      Commit 32d4fd57 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
      uses raw_local_irq_enable/local_irq_disable() around the call to
      __intel_idle() in intel_idle_irq().
      
      With interrupts enabled, a timer tick interrupt can occur, and a
      subsequent call to __do_softirq() may change the lockdep hardirqs state
      of a debug kernel back to 'on'. This results in a mismatch between
      the CPU hardirqs state (off) and the lockdep hardirqs state (on), causing
      a number of false positive "WARNING: suspicious RCU usage" splats.
      
      Fix that by using local_irq_disable() to disable interrupts in
      intel_idle_irq().
      
      Fixes: 32d4fd57 ("cpuidle,intel_idle: Fix CPUIDLE_FLAG_IRQ_ENABLE")
      Signed-off-by: Waiman Long <longman@redhat.com>
      Cc: 5.16+ <stable@vger.kernel.org> # 5.16+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • s390/archrandom: prevent CPACF trng invocations in interrupt context · 7560d827
      Harald Freudenberger authored
      
      commit 918e75f7 upstream.
      
      This patch slightly reworks the s390 arch_get_random_seed_{int,long}
      implementation: Make sure the CPACF trng instruction is never
      called in any interrupt context. This is done by adding an
      additional condition in_task().
      
      Justification:
      
      There are some constraints to satisfy for the invocation of the
      arch_get_random_seed_{int,long}() functions:
      - They should provide good random data during kernel initialization.
      - They should not be called in interrupt context, as the TRNG
        instruction is relatively heavyweight and may, for example,
        cause some network loads to time out and buck.
      
      However, it was not clear exactly what kind of interrupt context is
      encountered during kernel init or when network traffic eventually calls
      arch_get_random_seed_long().
      
      After some days of investigation it is clear that the s390
      start_kernel function is not running in any interrupt context, and
      so the trng is called:
      
      Jul 11 18:33:39 t35lp54 kernel:  [<00000001064e90ca>] arch_get_random_seed_long.part.0+0x32/0x70
      Jul 11 18:33:39 t35lp54 kernel:  [<000000010715f246>] random_init+0xf6/0x238
      Jul 11 18:33:39 t35lp54 kernel:  [<000000010712545c>] start_kernel+0x4a4/0x628
      Jul 11 18:33:39 t35lp54 kernel:  [<000000010590402a>] startup_continue+0x2a/0x40
      
      The condition in_task() is true and the CPACF trng provides random data
      during kernel startup.
      
      The network traffic case, however, is more difficult. A typical call
      stack looks like this:
      
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5600fc>] extract_entropy.constprop.0+0x23c/0x240
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b560136>] crng_reseed+0x36/0xd8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5604b8>] crng_make_state+0x78/0x340
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b5607e0>] _get_random_bytes+0x60/0xf8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b56108a>] get_random_u32+0xda/0x248
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008aefe7a8>] kfence_guarded_alloc+0x48/0x4b8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008aeff35e>] __kfence_alloc+0x18e/0x1b8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008aef7f10>] __kmalloc_node_track_caller+0x368/0x4d8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611eac>] kmalloc_reserve+0x44/0xa0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b611f98>] __alloc_skb+0x90/0x178
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b6120dc>] __napi_alloc_skb+0x5c/0x118
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f06b4>] qeth_extract_skb+0x13c/0x680
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b8f6526>] qeth_poll+0x256/0x3f8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63d76e>] __napi_poll.constprop.0+0x46/0x2f8
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b63dbec>] net_rx_action+0x1cc/0x408
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b937302>] __do_softirq+0x132/0x6b0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf46ce>] __irq_exit_rcu+0x13e/0x170
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008abf531a>] irq_exit_rcu+0x22/0x50
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b922506>] do_io_irq+0xe6/0x198
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935826>] io_int_handler+0xd6/0x110
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b9358a6>] psw_idle_exit+0x0/0xa
      Jul 06 17:37:07 t35lp54 kernel: ([<000000008ab9c59a>] arch_cpu_idle+0x52/0xe0)
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b933cfe>] default_idle_call+0x6e/0xd0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac59f4e>] do_idle+0xf6/0x1b0
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008ac5a28e>] cpu_startup_entry+0x36/0x40
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008abb0d90>] smp_start_secondary+0x148/0x158
      Jul 06 17:37:07 t35lp54 kernel:  [<000000008b935b9e>] restart_int_handler+0x6e/0x90
      
      which confirms that the call is in softirq context. So in_task() covers exactly
      the cases where we want to have CPACF trng called: not in nmi, not in hard irq,
      not in soft irq but in normal task context and during kernel init.
      
      Signed-off-by: Harald Freudenberger <freude@linux.ibm.com>
      Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
      Reviewed-by: Juergen Christ <jchrist@linux.ibm.com>
      Link: https://lore.kernel.org/r/20220713131721.257907-1-freude@linux.ibm.com
      Fixes: e4f74400 ("s390/archrandom: simplify back to earlier design and initialize earlier")
      [agordeev@linux.ibm.com changed desc, added Fixes and Link, removed -stable]
      Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • asm-generic: remove a broken and needless ifdef conditional · 686b3f89
      Lukas Bulwahn authored
      
      commit e2a619ca upstream.
      
      Commit 527701ed ("lib: Add a generic version of devmem_is_allowed()")
      introduces the config symbol GENERIC_LIB_DEVMEM_IS_ALLOWED, but then
      falsely refers to CONFIG_GENERIC_DEVMEM_IS_ALLOWED (note the missing LIB
      in the reference) in ./include/asm-generic/io.h.
      
      Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs:
      
      GENERIC_DEVMEM_IS_ALLOWED
      Referencing files: include/asm-generic/io.h
      
      The actual fix, though, is simply not to make this function declaration
      dependent on any kernel config. For architectures that intend to use
      the generic version, the arch's 'select GENERIC_LIB_DEVMEM_IS_ALLOWED' will
      lead to picking the function definition, and for other architectures, this
      function is simply defined elsewhere.
      
      The wrong '#ifndef' on a non-existing config symbol also always had the
      same effect (although more by mistake than by intent). So, there is no
      functional change.
      
      Remove this broken and needless ifdef conditional.
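
      Why the misspelled guard never had any effect can be shown with a
      small preprocessor sketch. The CONFIG_ macro is defined by hand here
      purely for illustration (in the kernel it would come from Kconfig),
      and DECLARATION_VISIBLE and declaration_visible() are hypothetical
      names.

```c
#include <assert.h>

/* Pretend the architecture selected the real symbol. */
#define CONFIG_GENERIC_LIB_DEVMEM_IS_ALLOWED 1

/*
 * Misspelled guard: CONFIG_GENERIC_DEVMEM_IS_ALLOWED (no "LIB") is never
 * defined, so this #ifndef is always true and the guarded declaration is
 * always visible, whatever the real symbol is set to.
 */
#ifndef CONFIG_GENERIC_DEVMEM_IS_ALLOWED
#define DECLARATION_VISIBLE 1
#else
#define DECLARATION_VISIBLE 0
#endif

/* Checked at compile time: the guard did not hide anything. */
static_assert(DECLARATION_VISIBLE == 1, "guard on undefined symbol is a no-op");

static int declaration_visible(void)
{
	return DECLARATION_VISIBLE;
}
```

      Since the guard was always true, dropping it leaves the set of
      visible declarations unchanged, which matches the "no functional
      change" statement above.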
      
      Fixes: 527701ed ("lib: Add a generic version of devmem_is_allowed()")
      Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • hugetlb: fix memoryleak in hugetlb_mcopy_atomic_pte · 9fd5096c
      Miaohe Lin authored
      commit da9a298f upstream.
      
      When alloc_huge_page fails, *pagep is set to NULL without put_page first.
      So the hugepage indicated by *pagep is leaked.
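
      The error-path ordering can be sketched with a toy reference count.
      Everything here is a hypothetical stand-in (struct page is reduced to
      a counter, and demo_put_page() and release_on_alloc_failure() are
      illustrative names, not the hugetlb code): the reference is dropped
      first, and only then is the caller's pointer cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted page. */
struct page {
	int refcount;
};

static void demo_put_page(struct page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/* Error path: drop the reference before forgetting the pointer. */
static void release_on_alloc_failure(struct page **pagep)
{
	if (*pagep) {
		demo_put_page(*pagep);
		*pagep = NULL;
	}
}

/* Returns the references left on the page after the error path ran. */
static int demo_remaining_refs(void)
{
	struct page pg = { .refcount = 1 };
	struct page *held = &pg;

	release_on_alloc_failure(&held);
	assert(held == NULL);
	return pg.refcount;
}
```

      Clearing the pointer without the put would leave the count at 1 with
      no remaining owner, which is the leak described above.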
      
      Link: https://lkml.kernel.org/r/20220709092629.54291-1-linmiaohe@huawei.com
      Fixes: 8cc5fcbb ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Acked-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fix missing wake-up event for FSDAX pages · e786be43
      Muchun Song authored
      commit f4f451a1 upstream.
      
      FSDAX page refcounts are 1-based, rather than 0-based: if refcount is
      1, then the page is freed.  FSDAX pages can be pinned through GUP and
      are then unpinned via unpin_user_page(), which uses a folio variant
      to put the page.  However, the folio variants did not consider this
      special case, so a wakeup event is missed (for example by the user of
      __fuse_dax_break_layouts()).  This results in a task being permanently
      stuck in TASK_INTERRUPTIBLE state.
      
      Since FSDAX pages can only be obtained by GUP users, fix GUP
      instead of folio_put() to lower overhead.
      
      Link: https://lkml.kernel.org/r/20220705123532.283-1-songmuchun@bytedance.com
      Fixes: d8ddc099 ("mm/gup: Add gup_put_folio()")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • mm: fix page leak with multiple threads mapping the same page · f1a0a81e
      Josef Bacik authored
      commit 3fe2895c upstream.
      
      We have an application with a lot of threads that use a shared mmap backed
      by tmpfs mounted with -o huge=within_size.  This application started
      leaking loads of huge pages when we upgraded to a recent kernel.
      
      Using the page ref tracepoints and a BPF program written by Tejun Heo we
      were able to determine that these pages would have multiple refcounts from
      the page fault path, but when it came to unmap time we wouldn't drop the
      number of refs we had added from the faults.
      
      I wrote a reproducer that mmap'ed a file backed by tmpfs with -o
      huge=always, and then spawned 20 threads all looping faulting random
      offsets in this map, while using madvise(MADV_DONTNEED) randomly for huge
      page aligned ranges.  This very quickly reproduced the problem.
      
      The problem here is that we check for the case that we have multiple
      threads faulting in a range that was previously unmapped.  One thread maps
      the PMD, the other thread loses the race and then returns 0.  However at
      this point we already have the page, and we are no longer putting this
      page into the process's address space, and so we leak the page.  We
      actually did the correct thing prior to f9ce0be7, however it looks
      like Kirill copied what we do in the anonymous page case.  In the
      anonymous page case we don't yet have a page, so we don't have to drop a
      reference on anything.  Previously we did the correct thing for file based
      faults by returning VM_FAULT_NOPAGE so we correctly drop the reference on
      the page we faulted in.
      
      Fix this by returning VM_FAULT_NOPAGE in the pmd_devmap_trans_unstable()
      case.  This makes us drop the ref on the page properly, and now my
      reproducer no longer leaks the huge pages.
      
      [josef@toxicpanda.com: v2]
        Link: https://lkml.kernel.org/r/e90c8f0dbae836632b669c2afc434006a00d4a67.1657721478.git.josef@toxicpanda.com
      Link: https://lkml.kernel.org/r/2b798acfd95c9ab9395fe85e8d5a835e2e10a920.1657051137.git.josef@toxicpanda.com
      Fixes: f9ce0be7 ("mm: Cleanup faultaround and finish_fault() codepaths")
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • secretmem: fix unhandled fault in truncate · afc21041
      Mike Rapoport authored
      commit 84ac0130 upstream.
      
      syzkaller reports the following issue:
      
      BUG: unable to handle page fault for address: ffff888021f7e005
      PGD 11401067 P4D 11401067 PUD 11402067 PMD 21f7d063 PTE 800fffffde081060
      Oops: 0002 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 3761 Comm: syz-executor281 Not tainted 5.19.0-rc4-syzkaller-00014-g941e3e791269 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:memset_erms+0x9/0x10 arch/x86/lib/memset_64.S:64
      Code: c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 f3 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 <f3> aa 4c 89 c8 c3 90 49 89 fa 40 0f b6 ce 48 b8 01 01 01 01 01 01
      RSP: 0018:ffffc9000329fa90 EFLAGS: 00010202
      RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000ffb
      RDX: 0000000000000ffb RSI: 0000000000000000 RDI: ffff888021f7e005
      RBP: ffffea000087df80 R08: 0000000000000001 R09: ffff888021f7e005
      R10: ffffed10043efdff R11: 0000000000000000 R12: 0000000000000005
      R13: 0000000000000000 R14: 0000000000001000 R15: 0000000000000ffb
      FS:  00007fb29d8b2700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: ffff888021f7e005 CR3: 0000000026e7b000 CR4: 00000000003506f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       zero_user_segments include/linux/highmem.h:272 [inline]
       folio_zero_range include/linux/highmem.h:428 [inline]
       truncate_inode_partial_folio+0x76a/0xdf0 mm/truncate.c:237
       truncate_inode_pages_range+0x83b/0x1530 mm/truncate.c:381
       truncate_inode_pages mm/truncate.c:452 [inline]
       truncate_pagecache+0x63/0x90 mm/truncate.c:753
       simple_setattr+0xed/0x110 fs/libfs.c:535
       secretmem_setattr+0xae/0xf0 mm/secretmem.c:170
       notify_change+0xb8c/0x12b0 fs/attr.c:424
       do_truncate+0x13c/0x200 fs/open.c:65
       do_sys_ftruncate+0x536/0x730 fs/open.c:193
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fb29d900899
      Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fb29d8b2318 EFLAGS: 00000246 ORIG_RAX: 000000000000004d
      RAX: ffffffffffffffda RBX: 00007fb29d988408 RCX: 00007fb29d900899
      RDX: 00007fb29d900899 RSI: 0000000000000005 RDI: 0000000000000003
      RBP: 00007fb29d988400 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb29d98840c
      R13: 00007ffca01a23bf R14: 00007fb29d8b2400 R15: 0000000000022000
       </TASK>
      Modules linked in:
      CR2: ffff888021f7e005
      ---[ end trace 0000000000000000 ]---
      
      Eric Biggers suggested that this happens when
      secretmem_setattr()->simple_setattr() races with secretmem_fault() so that
      a page that is faulted in by secretmem_fault() (and thus removed from the
      direct map) is zeroed by inode truncation right afterwards.
      
      Use mapping->invalidate_lock to make secretmem_fault() and
      secretmem_setattr() mutually exclusive.
      
      [rppt@linux.ibm.com: v3]
        Link: https://lkml.kernel.org/r/20220714091337.412297-1-rppt@kernel.org
      Link: https://lkml.kernel.org/r/20220707165650.248088-1-rppt@kernel.org
      Reported-by: <syzbot+9bd2b7adbd34b30b87e4@syzkaller.appspotmail.com>
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Suggested-by: Eric Biggers <ebiggers@kernel.org>
      Reviewed-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Hillf Danton <hdanton@sina.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • fs: sendfile handles O_NONBLOCK of out_fd · 58b70647
      Andrei Vagin authored
      commit bdeb77bc upstream.
      
      sendfile has to return EAGAIN if out_fd is nonblocking and the write into
      it would block.
      
      Here is a small reproducer for the problem:
      
      #define _GNU_SOURCE /* See feature_test_macros(7) */
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <errno.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <sys/sendfile.h>
      
      
      #define FILE_SIZE (1UL << 30)
      int main(int argc, char **argv) {
              int p[2], fd;
      
              if (pipe2(p, O_NONBLOCK))
                      return 1;
      
              fd = open(argv[1], O_RDWR | O_TMPFILE, 0666);
              if (fd < 0)
                      return 1;
              ftruncate(fd, FILE_SIZE);
      
              if (sendfile(p[1], fd, 0, FILE_SIZE) == -1) {
                      fprintf(stderr, "FAIL\n");
              }
              if (sendfile(p[1], fd, 0, FILE_SIZE) != -1 || errno != EAGAIN) {
                      fprintf(stderr, "FAIL\n");
              }
              return 0;
      }
      
      It worked before b964bf53, it is stuck after b964bf53, and it
      works again with this fix.
      
      This regression occurred because do_splice_direct() calls pipe_write
      that handles O_NONBLOCK.  Here is a trace log from the reproducer:
      
       1)               |  __x64_sys_sendfile64() {
       1)               |    do_sendfile() {
       1)               |      __fdget()
       1)               |      rw_verify_area()
       1)               |      __fdget()
       1)               |      rw_verify_area()
       1)               |      do_splice_direct() {
       1)               |        rw_verify_area()
       1)               |        splice_direct_to_actor() {
       1)               |          do_splice_to() {
       1)               |            rw_verify_area()
       1)               |            generic_file_splice_read()
       1) + 74.153 us   |          }
       1)               |          direct_splice_actor() {
       1)               |            iter_file_splice_write() {
       1)               |              __kmalloc()
       1)   0.148 us    |              pipe_lock();
       1)   0.153 us    |              splice_from_pipe_next.part.0();
       1)   0.162 us    |              page_cache_pipe_buf_confirm();
      ... 16 times
       1)   0.159 us    |              page_cache_pipe_buf_confirm();
       1)               |              vfs_iter_write() {
       1)               |                do_iter_write() {
       1)               |                  rw_verify_area()
       1)               |                  do_iter_readv_writev() {
       1)               |                    pipe_write() {
       1)               |                      mutex_lock()
       1)   0.153 us    |                      mutex_unlock();
       1)   1.368 us    |                    }
       1)   1.686 us    |                  }
       1)   5.798 us    |                }
       1)   6.084 us    |              }
       1)   0.174 us    |              kfree();
       1)   0.152 us    |              pipe_unlock();
       1) + 14.461 us   |            }
       1) + 14.783 us   |          }
       1)   0.164 us    |          page_cache_pipe_buf_release();
      ... 16 times
       1)   0.161 us    |          page_cache_pipe_buf_release();
       1)               |          touch_atime()
       1) + 95.854 us   |        }
       1) + 99.784 us   |      }
       1) ! 107.393 us  |    }
       1) ! 107.699 us  |  }
      
      Link: https://lkml.kernel.org/r/20220415005015.525191-1-avagin@gmail.com
      Fixes: b964bf53 ("teach sendfile(2) to handle send-to-pipe directly")
      Signed-off-by: Andrei Vagin <avagin@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ntfs: fix use-after-free in ntfs_ucsncmp() · 24fdba11
      ChenXiaoSong authored
      commit 38c9c22a upstream.
      
      Syzkaller reported use-after-free bug as follows:
      
      ==================================================================
      BUG: KASAN: use-after-free in ntfs_ucsncmp+0x123/0x130
      Read of size 2 at addr ffff8880751acee8 by task a.out/879
      
      CPU: 7 PID: 879 Comm: a.out Not tainted 5.19.0-rc4-next-20220630-00001-gcc5218c8bd2c-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x1c0/0x2b0
       print_address_description.constprop.0.cold+0xd4/0x484
       print_report.cold+0x55/0x232
       kasan_report+0xbf/0xf0
       ntfs_ucsncmp+0x123/0x130
       ntfs_are_names_equal.cold+0x2b/0x41
       ntfs_attr_find+0x43b/0xb90
       ntfs_attr_lookup+0x16d/0x1e0
       ntfs_read_locked_attr_inode+0x4aa/0x2360
       ntfs_attr_iget+0x1af/0x220
       ntfs_read_locked_inode+0x246c/0x5120
       ntfs_iget+0x132/0x180
       load_system_files+0x1cc6/0x3480
       ntfs_fill_super+0xa66/0x1cf0
       mount_bdev+0x38d/0x460
       legacy_get_tree+0x10d/0x220
       vfs_get_tree+0x93/0x300
       do_new_mount+0x2da/0x6d0
       path_mount+0x496/0x19d0
       __x64_sys_mount+0x284/0x300
       do_syscall_64+0x3b/0xc0
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7f3f2118d9ea
      Code: 48 8b 0d a9 f4 0b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 76 f4 0b 00 f7 d8 64 89 01 48
      RSP: 002b:00007ffc269deac8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3f2118d9ea
      RDX: 0000000020000000 RSI: 0000000020000100 RDI: 00007ffc269dec00
      RBP: 00007ffc269dec80 R08: 00007ffc269deb00 R09: 00007ffc269dec44
      R10: 0000000000000000 R11: 0000000000000202 R12: 000055f81ab1d220
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
       </TASK>
      
      The buggy address belongs to the physical page:
      page:0000000085430378 refcount:1 mapcount:1 mapping:0000000000000000 index:0x555c6a81d pfn:0x751ac
      memcg:ffff888101f7e180
      anon flags: 0xfffffc00a0014(uptodate|lru|mappedtodisk|swapbacked|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc00a0014 ffffea0001bf2988 ffffea0001de2448 ffff88801712e201
      raw: 0000000555c6a81d 0000000000000000 0000000100000000 ffff888101f7e180
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8880751acd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8880751ace00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff8880751ace80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
                                                                ^
       ffff8880751acf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffff8880751acf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ==================================================================
      
      The reason is that struct ATTR_RECORD->name_offset is 6485, so the end
      address of the name string is out of bounds.
      
      Fix this by adding sanity check on end address of attribute name string.
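
      A bounds check of this kind can be sketched as follows. This is a
      simplified illustration, not the ntfs code: struct attr_hdr is a
      hypothetical cut-down header (the on-disk ATTR_RECORD has more
      fields), and attr_name_in_bounds() and its bytes_available parameter
      are assumed names. It verifies that the name, given as an offset and
      a length in 16-bit characters, ends inside the record and that the
      record fits in the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified attribute header. */
struct attr_hdr {
	uint32_t length;	/* total size of the record in bytes */
	uint16_t name_offset;	/* byte offset of the name from record start */
	uint8_t name_length;	/* name length in 16-bit characters */
};

/*
 * Return true only if the record fits in the available bytes and the
 * attribute name ends inside the record.
 */
static bool attr_name_in_bounds(const struct attr_hdr *a, size_t bytes_available)
{
	size_t name_end;

	assert(a != NULL);
	name_end = (size_t)a->name_offset +
		   (size_t)a->name_length * sizeof(uint16_t);
	return a->length <= bytes_available && name_end <= a->length;
}
```

      With a 1024-byte record and name_offset 6485 as in the report, the
      computed end lies far past the record, so the attribute would be
      rejected before any name comparison reads from it.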
      
      [akpm@linux-foundation.org: coding-style cleanups]
      [chenxiaosong2@huawei.com: cleanup suggested by Hawkins Jiawei]
        Link: https://lkml.kernel.org/r/20220709064511.3304299-1-chenxiaosong2@huawei.com
      Link: https://lkml.kernel.org/r/20220707105329.4020708-1-chenxiaosong2@huawei.com
      Signed-off-by: ChenXiaoSong <chenxiaosong2@huawei.com>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Cc: ChenXiaoSong <chenxiaosong2@huawei.com>
      Cc: Yongqiang Liu <liuyongqiang13@huawei.com>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • userfaultfd: provide properly masked address for huge-pages · 759572c3
      Nadav Amit authored
      commit d172b1a3 upstream.
      
      Commit 824ddc60 ("userfaultfd: provide unmasked address on
      page-fault") was introduced to fix an old bug, in which the offset in the
      address of a page-fault was masked.  Concerns were raised - although were
      never backed by actual code - that some userspace code might break because
      the bug has been around for quite a while.  To address these concerns a
      new flag was introduced, and only when this flag is set by the user,
      userfaultfd provides the exact address of the page-fault.
      
      The commit, however, had a bug: if the flag is unset, the offset is
      always masked at base-page granularity.  Yet for huge pages, the
      behavior prior to the commit was that the address was masked to the
      huge-page granularity.
      
      While there are no reports on real breakage, fix this issue.  If the flag
      is unset, use the address with the masking that was done before.
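
The masking rule described above can be sketched in plain C. This is an illustration only, not the kernel code: `reported_fault_address`, its parameters and the example page sizes are hypothetical, and `page_size` is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: the address handed to userspace for a fault.
 * page_size is the size of the page backing the faulting mapping
 * (for example 4096 for a base page, 2 MiB for a huge page).
 */
static uint64_t reported_fault_address(uint64_t fault_addr,
				       uint64_t page_size,
				       bool exact_address)
{
	if (exact_address)
		return fault_addr;		/* flag set: no masking */

	/* Flag unset: round down to the start of the backing page. */
	return fault_addr & ~(page_size - 1);
}
```

For a fault at 0x7f0000201234 in a 2 MiB huge page, the sketch reports 0x7f0000200000 when the flag is unset; masking with a 4 KiB base page instead would report 0x7f0000201000, which is the mismatch this commit addresses.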
      
      Link: https://lkml.kernel.org/r/20220711165906.2682-1-namit@vmware.com
      
      
      Fixes: 824ddc60 ("userfaultfd: provide unmasked address on page-fault")
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Reported-by: James Houghton <jthoughton@google.com>
      Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: James Houghton <jthoughton@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      759572c3
    • Junxiao Bi's avatar
      Revert "ocfs2: mount shared volume without ha stack" · 125db359
      Junxiao Bi authored
      commit c80af0c2 upstream.
      
      This reverts commit 912f655d.
      
      This commit introduced a regression that can cause a mount to hang.  The
      changes in __ocfs2_find_empty_slot allow any node with a non-zero node
      number to grab the slot that was already taken by node 0, so node 1
      will access the same journal as node 0; when it tries to grab the journal
      cluster lock, it will hang because the lock was already acquired by node 0.
      It's very easy to reproduce this: in one cluster, mount node 0 first, then
      node 1, and you will see the following call trace from node 1.
      
      [13148.735424] INFO: task mount.ocfs2:53045 blocked for more than 122 seconds.
      [13148.739691]       Not tainted 5.15.0-2148.0.4.el8uek.mountracev2.x86_64 #2
      [13148.742560] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [13148.745846] task:mount.ocfs2     state:D stack:    0 pid:53045 ppid: 53044 flags:0x00004000
      [13148.749354] Call Trace:
      [13148.750718]  <TASK>
      [13148.752019]  ? usleep_range+0x90/0x89
      [13148.753882]  __schedule+0x210/0x567
      [13148.755684]  schedule+0x44/0xa8
      [13148.757270]  schedule_timeout+0x106/0x13c
      [13148.759273]  ? __prepare_to_swait+0x53/0x78
      [13148.761218]  __wait_for_common+0xae/0x163
      [13148.763144]  __ocfs2_cluster_lock.constprop.0+0x1d6/0x870 [ocfs2]
      [13148.765780]  ? ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
      [13148.768312]  ocfs2_inode_lock_full_nested+0x18d/0x398 [ocfs2]
      [13148.770968]  ocfs2_journal_init+0x91/0x340 [ocfs2]
      [13148.773202]  ocfs2_check_volume+0x39/0x461 [ocfs2]
      [13148.775401]  ? iput+0x69/0xba
      [13148.777047]  ocfs2_mount_volume.isra.0.cold+0x40/0x1f5 [ocfs2]
      [13148.779646]  ocfs2_fill_super+0x54b/0x853 [ocfs2]
      [13148.781756]  mount_bdev+0x190/0x1b7
      [13148.783443]  ? ocfs2_remount+0x440/0x440 [ocfs2]
      [13148.785634]  legacy_get_tree+0x27/0x48
      [13148.787466]  vfs_get_tree+0x25/0xd0
      [13148.789270]  do_new_mount+0x18c/0x2d9
      [13148.791046]  __x64_sys_mount+0x10e/0x142
      [13148.792911]  do_syscall_64+0x3b/0x89
      [13148.794667]  entry_SYSCALL_64_after_hwframe+0x170/0x0
      [13148.797051] RIP: 0033:0x7f2309f6e26e
      [13148.798784] RSP: 002b:00007ffdcee7d408 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      [13148.801974] RAX: ffffffffffffffda RBX: 00007ffdcee7d4a0 RCX: 00007f2309f6e26e
      [13148.804815] RDX: 0000559aa762a8ae RSI: 0000559aa939d340 RDI: 0000559aa93a22b0
      [13148.807719] RBP: 00007ffdcee7d5b0 R08: 0000559aa93a2290 R09: 00007f230a0b4820
      [13148.810659] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffdcee7d420
      [13148.813609] R13: 0000000000000000 R14: 0000559aa939f000 R15: 0000000000000000
      [13148.816564]  </TASK>
      
      To fix it, we could just fix __ocfs2_find_empty_slot.  But the original
      commit introduced the feature of mounting ocfs2 locally even when it is
      cluster based, which is very dangerous: it can easily cause serious data
      corruption, as there is no way to stop other nodes from mounting the fs
      and corrupting it.  Setting up ha or another cluster-aware stack is just
      the cost that we have to pay to avoid corruption; otherwise we would have
      to do it in the kernel.
      
      Link: https://lkml.kernel.org/r/20220603222801.42488-1-junxiao.bi@oracle.com
      
      
      Fixes: 912f655d ("ocfs2: mount shared volume without ha stack")
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <heming.zhao@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      125db359
    • Linus Walleij's avatar
      ARM: pxa2xx: Fix GPIO descriptor tables · a854fc06
      Linus Walleij authored
      
      commit c5cdb928 upstream.
      
      Laurence reports:
      
      "Kernel >5.18 on Zaurus has a bug where the power management code can't
      talk to devices, emitting the following errors:
      
      sharpsl-pm sharpsl-pm: Error: AC check failed: voltage -22.
      sharpsl-pm sharpsl-pm: Charging Error!
      sharpsl-pm sharpsl-pm: Warning: Cannot read main battery!
      
      Looking at the recent changes, I found that commit 31455bbd ("spi:
      pxa2xx_spi: Convert to use GPIO descriptors") replaced the deprecated
      SPI chip select platform device code with a gpiod lookup table. However,
      this didn't seem to work until I changed the `dev_id` member from the
      device name to the bus id. I'm not entirely sure why this is necessary,
      but I suspect it is related to the fact that in sysfs SPI devices are
      attached under /sys/devices/.../dev_name/spi_master/spiB/spiB.C, rather
      than directly to the device."
      
      After reviewing the change I conclude that the same fix is needed
      for all affected boards.
      
      Fixes: 31455bbd ("spi: pxa2xx_spi: Convert to use GPIO descriptors")
      Reported-by: Laurence de Bruxelles <lfdebrux@gmail.com>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/20220722114611.1517414-1-linus.walleij@linaro.org
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a854fc06
    • Michael Walle's avatar
      ARM: dts: lan966x: fix sys_clk frequency · b357179e
      Michael Walle authored
      
      commit ef0324b6 upstream.
      
      The sys_clk frequency is 165.625MHz. The register reference of the
      Generic Clock controller lists the CPU clock as 600MHz, the DDR clock as
      300MHz and the SYS clock as 162.5MHz. This is wrong. It was first
      noticed during the fan driver development and it was measured and
      verified via the CLK_MON output of the SoC which can be configured to
      output sys_clk/64.
      
      The core PLL settings (which drive the SYS clock) seem to be as
      follows:
        DIVF = 52
        DIVQ = 3
        DIVR = 1
      
      With a reference clock of 25MHz, this means we have a post divider clock
        Fpfd = Fref / (DIVR + 1) = 25MHz / (1 + 1) = 12.5MHz
      
      The resulting VCO frequency is then
        Fvco = Fpfd * (DIVF + 1) * 2 = 12.5MHz * (52 + 1) * 2 = 1325MHz
      
      And the output frequency is
        Fout = Fvco / 2^DIVQ = 1325MHz / 2^3 = 165.625MHz
      
      This all satisfies the constraints of the PLL:
          10MHz <= Fpfd <= 200MHz
          20MHz <= Fout <= 1000MHz
        1000MHz <= Fvco <= 2000MHz
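
The arithmetic above can be replayed with a small C sketch. The struct and function names are hypothetical; only the formulas, the divider values and the limits come from the commit message. Frequencies are kept in Hz as 64-bit integers so that the result is exact.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three frequencies, all in Hz. */
struct pll_freqs {
	uint64_t fpfd_hz;
	uint64_t fvco_hz;
	uint64_t fout_hz;
};

/* Apply the three formulas from the commit message in order. */
static struct pll_freqs pll_compute(uint64_t fref_hz, unsigned int divr,
				    unsigned int divf, unsigned int divq)
{
	struct pll_freqs f;

	f.fpfd_hz = fref_hz / (divr + 1);		/* Fpfd = Fref / (DIVR + 1) */
	f.fvco_hz = f.fpfd_hz * (divf + 1) * 2;	/* Fvco = Fpfd * (DIVF + 1) * 2 */
	f.fout_hz = f.fvco_hz >> divq;		/* Fout = Fvco / 2^DIVQ */
	return f;
}

/* Check the PLL limits quoted in the commit message. */
static bool pll_within_limits(const struct pll_freqs *f)
{
	return f->fpfd_hz >= 10000000ULL   && f->fpfd_hz <= 200000000ULL &&
	       f->fout_hz >= 20000000ULL   && f->fout_hz <= 1000000000ULL &&
	       f->fvco_hz >= 1000000000ULL && f->fvco_hz <= 2000000000ULL;
}
```

Calling `pll_compute(25000000, 1, 52, 3)` gives Fpfd = 12500000 Hz, Fvco = 1325000000 Hz and Fout = 165625000 Hz, and `pll_within_limits()` accepts that result.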
      
      Fixes: 290deaa1 ("ARM: dts: add DT for lan966 SoC and 2-port board pcb8291")
      Signed-off-by: Michael Walle <michael@walle.cc>
      Reviewed-by: Kavyasree Kotagiri <kavyasree.kotagiri@microchip.com>
      Signed-off-by: Claudiu Beznea <claudiu.beznea@microchip.com>
      Link: https://lore.kernel.org/r/20220326194028.2945985-1-michael@walle.cc
      
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b357179e
    • Luiz Augusto von Dentz's avatar
      Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put · 59a55ec3
      Luiz Augusto von Dentz authored
      
      commit d0be8347 upstream.
      
      This fixes the following trace, which is caused by hci_rx_work starting up
      *after* the final channel reference has been put() during sock_close() but
      *before* the references to the channel have been destroyed.  The code now
      relies on kref_get_unless_zero/l2cap_chan_hold_unless_zero to prevent
      referencing a channel that is about to be destroyed.
      
        refcount_t: increment on 0; use-after-free.
        BUG: KASAN: use-after-free in refcount_dec_and_test+0x20/0xd0
        Read of size 4 at addr ffffffc114f5bf18 by task kworker/u17:14/705
      
        CPU: 4 PID: 705 Comm: kworker/u17:14 Tainted: G S      W
        4.14.234-00003-g1fb6d0bd49a4-dirty #28
        Hardware name: Qualcomm Technologies, Inc. SM8150 V2 PM8150
        Google Inc. MSM sm8150 Flame DVT (DT)
        Workqueue: hci0 hci_rx_work
        Call trace:
         dump_backtrace+0x0/0x378
         show_stack+0x20/0x2c
         dump_stack+0x124/0x148
         print_address_description+0x80/0x2e8
         __kasan_report+0x168/0x188
         kasan_report+0x10/0x18
         __asan_load4+0x84/0x8c
         refcount_dec_and_test+0x20/0xd0
         l2cap_chan_put+0x48/0x12c
         l2cap_recv_frame+0x4770/0x6550
         l2cap_recv_acldata+0x44c/0x7a4
         hci_acldata_packet+0x100/0x188
         hci_rx_work+0x178/0x23c
         process_one_work+0x35c/0x95c
         worker_thread+0x4cc/0x960
         kthread+0x1a8/0x1c4
         ret_from_fork+0x10/0x18
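
The "hold unless zero" idea behind kref_get_unless_zero can be illustrated in user space with C11 atomics. This is a sketch of the general pattern, not the kernel's kref or the Bluetooth code: `struct chan_sketch` and both helpers are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct chan_sketch {
	atomic_uint refcount;
};

/*
 * Take a reference only if the object is still alive. Returns false when
 * the count has already reached zero, i.e. the object is being destroyed.
 */
static bool chan_hold_unless_zero(struct chan_sketch *c)
{
	unsigned int old = atomic_load(&c->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&c->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool chan_put(struct chan_sketch *c)
{
	return atomic_fetch_sub(&c->refcount, 1) == 1;
}
```

A plain increment would move the count from 0 to 1 and hand the caller an object that is already being torn down, which matches the "increment on 0" warning in the trace. The compare-and-swap loop refuses in that case, so the caller can treat the channel as gone.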
      
      Cc: stable@kernel.org
      Reported-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Tested-by: Lee Jones <lee.jones@linaro.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      59a55ec3
    • Abhishek Pandit-Subedi's avatar
      Bluetooth: Always set event mask on suspend · d50f2557
      Abhishek Pandit-Subedi authored
      
      commit ef61b6ea upstream.
      
      When suspending, always set the event mask once disconnects are
      successful. Otherwise, if wakeup is disallowed, the event mask is not
      set before suspend continues and can result in an early wakeup.
      
      Fixes: 182ee45d ("Bluetooth: hci_sync: Rework hci_suspend_notifier")
      Cc: stable@vger.kernel.org
      Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      d50f2557
  2. Jul 29, 2022