  1. Mar 18, 2025
  2. Mar 17, 2025
    • Merge tag 'soc-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc · fc444ada
      Linus Torvalds authored
      Pull SoC fixes from Arnd Bergmann:
       "The majority of these last fixes are for devicetree files.
      
        These address two important regressions for the Qualcomm SMMU and the
        Raspberry Pi 4 USB controller, as well as a larger number of patches
        fixing minor mistakes in board specific files for Rockchips, i.MX,
        starfive and broadcom.
      
        The non-DT changes are
      
         - A fix for an old boot regression on Renesas shmobile chips
      
         - Another boot time regression for the Qualcomm PDR SoC driver,
           among a few other Qualcomm firmware driver fixes for efivars and
           tzmem
      
         - Minor Kconfig fixes for davinci and OMAP1
      
         - Minor code fixes for sparx5 reset controllers, OMAP memory
           controller, i.MX SCU, cpufreq and SoC drivers and a Hisilicon SoC
           driver
      
         - One more update to the Asahi maintainers, adding Neal Gompa as a
           reviewer"
      
      * tag 'soc-fixes-6.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
        ARM: davinci: da850: fix selecting ARCH_DAVINCI_DA8XX
        soc: hisilicon: kunpeng_hccs: Fix incorrect string assembly
        memory: omap-gpmc: drop no compatible check
        reset: mchp: sparx5: Fix for lan966x
        ARM: shmobile: smp: Enforce shmobile_smp_* alignment
        MAINTAINERS: Add myself (Neal Gompa) as a reviewer for ARM Apple support
        MAINTAINERS: Add apple-spi driver & binding files
        arm64: dts: rockchip: slow down emmc freq for rock 5 itx
        ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC3200
        ARM: dts: BCM5301X: Fix switch port labels of ASUS RT-AC5300
        ARM: dts: bcm2711: Don't mark timer regs unconfigured
        ARM: OMAP1: select CONFIG_GENERIC_IRQ_CHIP
        arm64: dts: rockchip: Add missing PCIe supplies to RockPro64 board dtsi
        arm64: dts: rockchip: Add avdd HDMI supplies to RockPro64 board dtsi
        arm64: dts: rockchip: Remove undocumented sdmmc property from lubancat-1
        arm64: dts: rockchip: fix pinmux of UART5 for PX30 Ringneck on Haikou
        arm64: dts: rockchip: fix pinmux of UART0 for PX30 Ringneck on Haikou
        arm64: dts: rockchip: fix u2phy1_host status for NanoPi R4S
        arm64: dts: bcm2712: PL011 UARTs are actually r1p5
        ARM: dts: bcm2711: PL011 UARTs are actually r1p5
        ...
    • Merge tag 'probes-fixes-v6.14-rc6' of... · 47c7efa4
      Linus Torvalds authored
      Merge tag 'probes-fixes-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
      
      Pull probes fixes from Masami Hiramatsu:
      
       - Clean up tprobe correctly when a module is unloaded
      
         Tracepoint probes do not set TRACEPOINT_STUB on the 'tpoint' pointer
         when a module is unloaded, so they show up as a normal 'fprobe'
         instead of a 'tprobe' and never come back.
      
       - Fix leakage of tprobe module refcount
      
         When a tprobe's target module is loaded, it gets the module's
         refcount in the module notifier but forgot to put it after
         registering the probe on it.
      
         Fix it by getting the refcount only when registering tprobe.
      
      * tag 'probes-fixes-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: tprobe-events: Fix leakage of module refcount
        tracing: tprobe-events: Fix to clean up tprobe correctly when module unload
    • mm/page_alloc: fix memory accept before watermarks gets initialized · 800f1059
      Kirill A. Shutemov authored
      Watermarks are initialized during the postcore initcall.  Until then, all
      watermarks are set to zero.  This causes cond_accept_memory() to
      incorrectly skip memory acceptance because a watermark of 0 is always met.
      
      This can lead to a premature OOM on boot.
      
      To ensure progress, accept one MAX_ORDER page if the watermark is zero.
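      A minimal userspace model of the decision described above, as a sketch
      only and not the kernel's cond_accept_memory(). struct zone_model,
      model_cond_accept() and the block size MODEL_MAX_ORDER_PAGES are
      hypothetical. It shows why treating a zero watermark as "always met"
      stalls acceptance, and how accepting one block forces progress.

```c
#include <assert.h>

/* Hypothetical, simplified zone: not the kernel's struct zone. */
struct zone_model {
	unsigned long free_pages;  /* accepted pages currently free */
	unsigned long unaccepted;  /* pages still waiting to be accepted */
	unsigned long wmark;       /* stays 0 until watermarks are initialized */
};

/* Assumption: pages in one MAX_ORDER block (4 MiB of 4 KiB pages). */
#define MODEL_MAX_ORDER_PAGES 1024UL

/* Returns how many pages this call accepted. */
static unsigned long model_cond_accept(struct zone_model *z)
{
	unsigned long want;

	if (z->wmark == 0)
		/* "0 is always met" would mean never accepting anything,
		 * so accept one MAX_ORDER block to guarantee progress. */
		want = MODEL_MAX_ORDER_PAGES;
	else if (z->free_pages >= z->wmark)
		return 0;                 /* watermark met: nothing to do */
	else
		want = z->wmark - z->free_pages;

	if (want > z->unaccepted)
		want = z->unaccepted;
	z->unaccepted -= want;
	z->free_pages += want;
	return want;
}
```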
      
      Link: https://lkml.kernel.org/r/20250310082855.2587122-1-kirill.shutemov@linux.intel.com
      
      
      Fixes: dcdfdd40 ("mm: Add support for unaccepted memory")
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Farrah Chen <farrah.chen@intel.com>
      Reported-by: Farrah Chen <farrah.chen@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: "Mike Rapoport (IBM)" <rppt@kernel.org>
      Cc: Thomas Lendacky <thomas.lendacky@amd.com>
      Cc: <stable@vger.kernel.org>	[6.5+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: decline to manipulate the refcount on a slab page · b9c0e49a
      Matthew Wilcox (Oracle) authored
      Slab pages now have a refcount of 0, so nobody should be trying to
      manipulate the refcount on them.  Doing so has little effect; the object
      could be freed and reallocated to a different purpose, although the slab
      itself would not be freed until the refcount was put, making it behave
      rather like TYPESAFE_BY_RCU.
      
      Unfortunately, __iov_iter_get_pages_alloc() does take a refcount.  Fix
      that to not change the refcount, and make put_page() silently not change
      the refcount.  get_page() warns so that we can fix any other callers that
      need to be changed.
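      A hedged userspace sketch of the refcount policy described above.
      struct page_model and the model_* helpers are hypothetical, and the
      sketch assumes get_page() also leaves the count unchanged after
      warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical page model; not the kernel's struct page. */
struct page_model {
	bool is_slab;   /* slab pages now keep a refcount of 0 */
	int refcount;
	int warnings;   /* stands in for a one-time kernel warning */
};

static void model_get_page(struct page_model *p)
{
	if (p->is_slab) {
		p->warnings++;  /* flag the caller so it can be fixed */
		return;         /* assumption: the count is left untouched */
	}
	p->refcount++;
}

static void model_put_page(struct page_model *p)
{
	if (p->is_slab)
		return;         /* silently leave the count alone */
	p->refcount--;
}
```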
      
      Long-term, networking needs to stop taking a refcount on the pages that it
      uses and rely on the caller to hold whatever references are necessary to
      make the memory stable.  In the medium term, more page types are going to
      have a zero refcount, so we'll want to move get_page() and put_page() out
      of line.
      
      Link: https://lkml.kernel.org/r/20250310143544.1216127-1-willy@infradead.org
      
      
      Fixes: 9aec2fb0 (slab: allocate frozen pages)
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reported-by: Hannes Reinecke <hare@suse.de>
      Closes: https://lore.kernel.org/all/08c29e4b-2f71-4b6d-8046-27e407214d8c@suse.com/
      
      
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • memcg: drain obj stock on cpu hotplug teardown · 9f01b495
      Shakeel Butt authored
      Currently on CPU hotplug teardown, only the memcg stock is drained, but
      we need to drain the obj stock as well; otherwise we miss the stats
      accumulated on the target CPU as well as the cached nr_bytes.  The stats
      include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B.
      In addition, we leak a reference to the struct obj_cgroup object.
      
      Link: https://lkml.kernel.org/r/20250310230934.2913113-1-shakeel.butt@linux.dev
      
      
      Fixes: bf4f0599 ("mm: memcg/slab: obj_cgroup API")
      Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
      Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/huge_memory: drop beyond-EOF folios with the right number of refs · 14efb479
      Zi Yan authored
      When an after-split folio is large and needs to be dropped due to EOF,
      folio_put_refs(folio, folio_nr_pages(folio)) should be used to drop all
      page cache refs.  Otherwise, the folio will not be freed, causing a
      memory leak.

      This leak would happen on a filesystem with blocksize > page_size when a
      truncate is performed, where the blocksize makes folios split to >0
      order ones, so the truncated folios are never freed.
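      A small userspace model of the reference drop, assuming the page cache
      holds exactly one reference per page of the folio and nothing else holds
      one. struct folio_model and the model_* helpers are hypothetical
      stand-ins for folio_nr_pages() and folio_put_refs().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical folio model; not the kernel's struct folio. */
struct folio_model {
	unsigned int order;  /* folio spans 1 << order pages */
	int refcount;
	bool freed;
};

static int model_folio_nr_pages(const struct folio_model *f)
{
	return 1 << f->order;
}

/* Drop @refs references; free the folio when the count reaches zero. */
static void model_folio_put_refs(struct folio_model *f, int refs)
{
	f->refcount -= refs;
	if (f->refcount == 0)
		f->freed = true;
}

/* Drop the page cache's references to a beyond-EOF folio after a split. */
static void model_drop_beyond_eof(struct folio_model *f, bool fixed)
{
	if (fixed)
		model_folio_put_refs(f, model_folio_nr_pages(f));
	else
		model_folio_put_refs(f, 1);  /* old behavior: one ref only */
}
```

      For an order-2 folio, the old path leaves three references behind, so
      the folio is never freed.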
      
      Link: https://lkml.kernel.org/r/20250310155727.472846-1-ziy@nvidia.com
      
      
      Fixes: c010d47f ("mm: thp: split huge page to any lower order pages")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Closes: https://lore.kernel.org/all/fcbadb7f-dd3e-21df-f9a7-2853b53183c4@google.com/
      
      
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Pankaj Raghav <p.raghav@samsung.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <yang@os.amperecomputing.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • selftests/mm: run_vmtests.sh: fix half_ufd_size_MB calculation · 67a2f868
      Rafael Aquini authored
      We noticed that the uffd-stress test always failed to run when invoked
      for the hugetlb profiles on x86_64 systems with 64 or more processors:
      
        ...
        # ------------------------------------
        # running ./uffd-stress hugetlb 128 32
        # ------------------------------------
        # ERROR: invalid MiB (errno=9, @uffd-stress.c:459)
        ...
        # [FAIL]
        not ok 3 uffd-stress hugetlb 128 32 # exit=1
        ...
      
      The problem boils down to how run_vmtests.sh (mis)calculates the size of
      the region it feeds to uffd-stress.  The latter expects an amount in
      MiB, while the former just passes the number of free hugepages halved.
      This unit mismatch ends up violating uffd-stress' assertion on the
      number of hugetlb pages allocated per CPU, causing it to bail out with
      the error above.
      
      This commit fixes that issue by adjusting run_vmtests.sh's
      half_ufd_size_MB calculation so it properly renders the region size in
      MiB, as expected, while maintaining all of its original constraints in
      place.
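      The unit mismatch, written in C rather than shell purely for
      illustration. The function names are hypothetical, hpgsize_kb is assumed
      to be the hugepage size in KiB (as reported by Hugepagesize in
      /proc/meminfo), and the exact expression in run_vmtests.sh may differ.

```c
#include <assert.h>

/* Old computation: half the free hugepage count, i.e. a page count,
 * not a size in MiB. */
static unsigned long half_ufd_size_old(unsigned long freepgs)
{
	return freepgs / 2;
}

/* Corrected computation: convert pages to MiB first, then halve. */
static unsigned long half_ufd_size_mb(unsigned long freepgs,
				      unsigned long hpgsize_kb)
{
	return (freepgs * hpgsize_kb / 1024) / 2;
}
```

      With 256 free 2 MiB hugepages, the old expression yields 128 (pages),
      while half of the free hugetlb memory is 256 MiB.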
      
      Link: https://lkml.kernel.org/r/20250218192251.53243-1-aquini@redhat.com
      
      
      Fixes: 2e47a445 ("selftests/mm: run_vmtests.sh: fix hugetlb mem size calculation")
      Signed-off-by: Rafael Aquini <raquini@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: fix error handling in __filemap_get_folio() with FGP_NOWAIT · 182db972
      Raphael S. Carvalho authored
      original report:
      https://lore.kernel.org/all/CAKhLTr1UL3ePTpYjXOx2AJfNk8Ku2EdcEfu+CH1sf3Asr=B-Dw@mail.gmail.com/T/
      
      When doing buffered writes with FGP_NOWAIT under memory pressure, the
      system returned ENOMEM despite there being plenty of memory available to
      be reclaimed from the page cache.  The user space used the io_uring
      interface, which in turn submits I/O with FGP_NOWAIT (the fast path).
      
      retsnoop pointed to iomap_get_folio:
      
      00:34:16.180612 -> 00:34:16.180651 TID/PID 253786/253721
      (reactor-1/combined_tests):
      
                          entry_SYSCALL_64_after_hwframe+0x76
                          do_syscall_64+0x82
                          __do_sys_io_uring_enter+0x265
                          io_submit_sqes+0x209
                          io_issue_sqe+0x5b
                          io_write+0xdd
                          xfs_file_buffered_write+0x84
                          iomap_file_buffered_write+0x1a6
          32us [-ENOMEM]  iomap_write_begin+0x408
      iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x…
      pos=0 len=4096 foliop=0xffffb32c296b7b80
      !    4us [-ENOMEM]  iomap_get_folio
      iter=&{.inode=0xffff8c67aa031138,.len=4096,.flags=33,.iomap={.addr=0xffffffffffffffff,.length=4096,.type=1,.flags=3,.bdev=0x…
      pos=0 len=4096
      
      This is likely a regression caused by 66dabbb6 ("mm: return an ERR_PTR
      from __filemap_get_folio"), which moved error handling from
      iomap_get_folio() to __filemap_get_folio() but broke FGP_NOWAIT
      handling, so ENOMEM leaks out to user space.  Had it correctly
      returned -EAGAIN with NOWAIT, either io_uring or user space itself would
      be able to retry the request.
      
      It's not enough to patch io_uring since the iomap interface is the one
      responsible for it, and pwritev2(RWF_NOWAIT) and AIO interfaces must
      return the proper error too.
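      A minimal sketch of the intended error mapping, not the actual patch.
      MODEL_FGP_NOWAIT and model_folio_alloc_error() are hypothetical; the
      point is only that a failed allocation under NOWAIT should surface as
      -EAGAIN so the caller can retry.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag value; the kernel's FGP_NOWAIT is defined elsewhere. */
#define MODEL_FGP_NOWAIT 0x1u

/* Map a failed folio allocation to the error returned to the caller. */
static int model_folio_alloc_error(unsigned int fgp_flags)
{
	/* A NOWAIT caller did not allow blocking reclaim, so the failure
	 * is transient: ask the caller to retry instead of reporting OOM. */
	if (fgp_flags & MODEL_FGP_NOWAIT)
		return -EAGAIN;
	return -ENOMEM;
}
```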
      
      The patch was tested with the ScyllaDB test suite (its original
      reproducer), and all the tests now pass under memory pressure.
      
      Link: https://lkml.kernel.org/r/20250224143700.23035-1-raphaelsc@scylladb.com
      
      
      Fixes: 66dabbb6 ("mm: return an ERR_PTR from __filemap_get_folio")
      Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Cc: "Darrick J. Wong" <djwong@kernel.org>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: memcontrol: fix swap counter leak from offline cgroup · 73f839b6
      Muchun Song authored
      Commit 67691831 removed the id parameter from swap_cgroup_record() and
      instead gets the memcg id from mem_cgroup_id(folio_memcg(folio)).
      However, its caller may update the counter of a memcg other than
      folio_memcg(folio).
      
      E.g. in mem_cgroup_swapout(), a caller of swap_cgroup_record(),
      @swap_memcg can differ from @memcg, and the counter of @swap_memcg is
      updated, but swap_cgroup_record() records the wrong memcg's ID.  When
      the swap is later uncharged in __mem_cgroup_uncharge_swap(), the swap
      counter leaks because the wrong ID was recorded.

      Fix it by bringing back the id parameter.
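      A hedged userspace model of the leak. The counter array and model_*
      functions are hypothetical; ID 1 stands for the folio's (offline) memcg
      and ID 2 for the memcg that is actually charged.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_MEMCG 3

/* Hypothetical per-memcg swap counters, indexed by memcg ID. */
static long swap_usage[MODEL_NR_MEMCG];

/* Swap out a folio whose memcg is @folio_id while the charge goes to
 * @swap_id.  Returns the ID written into the swap_cgroup record. */
static unsigned short model_swapout(unsigned short folio_id,
				    unsigned short swap_id, bool fixed)
{
	swap_usage[swap_id]++;
	/* Old behavior derived the ID from the folio's memcg; the fix
	 * passes the charged memcg's ID explicitly. */
	return fixed ? swap_id : folio_id;
}

/* Uncharging trusts whatever ID was recorded. */
static void model_uncharge_swap(unsigned short recorded_id)
{
	swap_usage[recorded_id]--;
}
```

      With the old behavior, the charged memcg's counter stays at 1 forever
      while another memcg's counter goes negative.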
      
      Link: https://lkml.kernel.org/r/20250306023133.44838-1-songmuchun@bytedance.com
      
      
      Fixes: 67691831 ("mm/swap_cgroup: decouple swap cgroup recording and clearing")
      Signed-off-by: Muchun Song <songmuchun@bytedance.com>
      Reviewed-by: Kairui Song <kasong@tencent.com>
      Cc: Chris Li <chrisl@kernel.org>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Cc: Shakeel Butt <shakeel.butt@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/vma: do not register private-anon mappings with khugepaged during mmap · 8c6ff7f1
      Dev Jain authored
      We are already registering private-anon VMAs with khugepaged at fault
      time, in do_huge_pmd_anonymous_page().  Commit "register suitable
      readonly file vmas for khugepaged" moved the khugepaged registration
      logic from shmem_mmap to the generic mmap path, so private-anon
      mappings are now registered at mmap time as well.
      
      The userspace-visible effect should be this: khugepaged will unnecessarily
      scan mm's which haven't yet faulted in.  Note that it won't actually
      collapse because all PTEs are none.
      
      Now that I think about it, the mm is going to have a file VMA anyways
      during fork+exec, so the mm already gets registered during mmap due to the
      non-anon case (I *think*), so at least one of either the mmap registration
      or fault-time registration is redundant.
      
      Make this logic specific for non-anon mappings.
      
      Link: https://lkml.kernel.org/r/20250306063037.16299-1-dev.jain@arm.com
      
      
      Fixes: 613bec09 ("mm: mmap: register suitable readonly file vmas for khugepaged")
      Signed-off-by: Dev Jain <dev.jain@arm.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Liam Howlett <liam.howlett@oracle.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Yang Shi <yang@os.amperecomputing.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • squashfs: fix invalid pointer dereference in squashfs_cache_delete · d7147a33
      Zhiyu Zhang authored
      When mounting a squashfs fails, squashfs_cache_init() may return an error
      pointer (e.g., -ENOMEM) instead of NULL.  However, squashfs_cache_delete()
      only checks for a NULL cache, and attempts to dereference the invalid
      pointer.  This leads to a kernel crash (BUG: unable to handle kernel
      paging request in squashfs_cache_delete).
      
      This patch fixes the issue by checking IS_ERR(cache) before accessing it.
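      A userspace sketch of the check. ERR_PTR() and IS_ERR() are re-implemented
      here following the kernel's convention (the top MAX_ERRNO addresses encode
      errors); struct cache_model and the model_* functions are hypothetical
      stand-ins for the squashfs cache code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace re-implementations of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in for struct squashfs_cache. */
struct cache_model {
	int entries;
};

/* Like squashfs_cache_init(): an error pointer on failure, not NULL. */
static struct cache_model *model_cache_init(bool fail)
{
	if (fail)
		return ERR_PTR(-ENOMEM);
	return calloc(1, sizeof(struct cache_model));
}

/* Returns 1 if a cache was freed, 0 if there was nothing to free. */
static int model_cache_delete(struct cache_model *cache)
{
	/* Checking only for NULL would dereference an error pointer. */
	if (cache == NULL || IS_ERR(cache))
		return 0;
	free(cache);
	return 1;
}
```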
      
      Link: https://lkml.kernel.org/r/20250306132855.2030-1-zhiyuzhang999@gmail.com
      
      
      Fixes: 49ff2924 ("squashfs: make squashfs_cache_init() return ERR_PTR(-ENOMEM)")
      Signed-off-by: Zhiyu Zhang <zhiyuzhang999@gmail.com>
      Reported-by: Zhiyu Zhang <zhiyuzhang999@gmail.com>
      Closes: https://lore.kernel.org/linux-fsdevel/CALf2hKvaq8B4u5yfrE+BYt7aNguao99mfWxHngA+=o5hwzjdOg@mail.gmail.com/
      
      
      Tested-by: Zhiyu Zhang <zhiyuzhang999@gmail.com>
      Reviewed-by: Phillip Lougher <phillip@squashfs.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/migrate: fix shmem xarray update during migration · 60cf233b
      Zi Yan authored
      A shmem folio can be either in page cache or in swap cache, but not at the
      same time.  Namely, once it is in swap cache, folio->mapping should be
      NULL, and the folio is no longer in a shmem mapping.
      
      In __folio_migrate_mapping(), to determine the number of xarray entries to
      update, folio_test_swapbacked() is used, but that conflates shmem in page
      cache case and shmem in swap cache case.  It leads to xarray multi-index
      entry corruption, since it turns a sibling entry to a normal entry during
      xas_store() (see [1] for a userspace reproduction).  Fix it by only using
      folio_test_swapcache() to determine whether xarray is storing swap cache
      entries or not to choose the right number of xarray entries to update.
      
      [1] https://lore.kernel.org/linux-mm/Z8idPCkaJW1IChjT@casper.infradead.org/
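      A hedged model of the entry-count decision only, not of
      __folio_migrate_mapping() itself. struct folio_state and both functions
      are hypothetical; the sketch assumes, as the description implies, that
      the swap cache holds one xarray entry per page while the page cache
      holds a single multi-index entry for the folio.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the folio flags that matter here. */
struct folio_state {
	bool swapbacked;   /* shmem and anon folios */
	bool swapcache;    /* currently in the swap cache */
	unsigned int nr_pages;
};

/* Old test: conflates shmem in the page cache with shmem in swap cache. */
static unsigned int entries_old(const struct folio_state *f)
{
	return f->swapbacked ? f->nr_pages : 1;
}

/* Fixed test: only swap cache folios need one update per page. */
static unsigned int entries_fixed(const struct folio_state *f)
{
	return f->swapcache ? f->nr_pages : 1;
}
```

      For a 4-page shmem folio in the page cache, the old test asks for four
      stores into what is really one multi-index entry.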
      
      Note:
      In __split_huge_page(), folio_test_anon() && folio_test_swapcache() is
      used to get swap_cache address space, but that ignores the shmem folio in
      swap cache case.  It could lead to a NULL pointer dereference when an
      in-swap-cache shmem folio is split at __xa_store(), since
      !folio_test_anon() is true and folio->mapping is NULL.  But fortunately,
      its caller split_huge_page_to_list_to_order() bails out early with EBUSY
      when folio->mapping is NULL.  So no need to take care of it here.
      
      Link: https://lkml.kernel.org/r/20250305200403.2822855-1-ziy@nvidia.com
      
      
      Fixes: fc346d0a ("mm: migrate high-order folios in swap cache correctly")
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Reported-by: Liu Shixin <liushixin2@huawei.com>
      Closes: https://lore.kernel.org/all/28546fb4-5210-bf75-16d6-43e1f8646080@huawei.com/
      
      
      Suggested-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Charan Teja Kalla <quic_charante@quicinc.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/hugetlb: fix surplus pages in dissolve_free_huge_page() · cb402bbd
      Jinjiang Tu authored
      In dissolve_free_huge_page(), free huge pages are dissolved without
      adjusting the surplus count.  However, free huge pages may be accounted
      as surplus pages, which leads to a wrong surplus count.
      
      I reproduced this issue on qemu. The steps are:
      1) Node1 is memory-less at first. Hot-add memory to node1 by executing
      the two commands in qemu monitor:
        object_add memory-backend-ram,id=mem1,size=1G
        device_add pc-dimm,id=dimm1,memdev=mem1,node=1
      2) online one memory block of Node1 with:
        echo online_movable > /sys/devices/system/node/node1/memoryX/state
      3) create 64 huge pages for node1
      4) run a program to reserve (don't consume) all the huge pages
      5) echo 0 > nr_huge_pages for node1. After this step, free huge pages in
      Node1 are surplus.
      6) create 80 huge pages for node0
      7) offline the memory of node1. The memory range to offline contains the
      free surplus huge pages created in steps 3) to 5).
        echo offline > /sys/devices/system/node/node1/memoryX/state
      8) kill the program in step 4)
      
      The result:
                 Node0     Node1
      total       80        0
      free        80        0
      surplus     0         61
      
      To fix it, adjust surplus when destroying huge pages if the node has
      surplus pages in dissolve_free_hugetlb_folio().
      
      The result with this patch:
                 Node0     Node1
      total       80        0
      free        80        0
      surplus     0         0
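      A simplified per-node model of the adjustment described above, not the
      kernel's dissolve_free_hugetlb_folio(). struct node_hugetlb and
      model_dissolve_free() are hypothetical, and global counters are left
      out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-node hugetlb counters. */
struct node_hugetlb {
	unsigned long nr_huge;
	unsigned long free_huge;
	unsigned long surplus;
};

/* Dissolve one free huge page on this node back to the buddy allocator. */
static void model_dissolve_free(struct node_hugetlb *n, bool fixed)
{
	n->nr_huge--;
	n->free_huge--;
	/* The fix: a free page may be counted as surplus, so adjust it. */
	if (fixed && n->surplus)
		n->surplus--;
}
```

      Dissolving every page of a node whose free pages are all surplus leaves
      a stale surplus count without the adjustment, matching the shape of the
      first result table.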
      
      Link: https://lkml.kernel.org/r/20250304132106.2872754-1-tujinjiang@huawei.com
      
      
      Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
      Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Acked-by: Oscar Salvador <osalvador@suse.de>
      Cc: Jinjiang Tu <tujinjiang@huawei.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Nanyong Sun <sunnanyong@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/core: initialize damos->walk_completed in damon_new_scheme() · 73d7a69d
      SeongJae Park authored
      The function for allocating and initializing a 'struct damos' object,
      damon_new_scheme(), does not initialize the damos->walk_completed field.
      Only damos_walk_complete() sets the field.  Hence the field will
      eventually be set and used correctly from the second damos_walk() call
      for the scheme onward, but the first damos_walk() could mistakenly skip
      walking the regions.  A common usage of DAMOS for taking an access
      pattern snapshot is installing a monitoring-purpose DAMOS scheme, doing
      damos_walk() to retrieve the snapshot, and then removing the scheme.
      The DAMON user-space tool (damo) also gets runtime snapshots this way.
      Hence the problem can happen repeatedly in such use cases.  Initialize
      the field properly in the allocation function.
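      A minimal sketch of why the missing initialization matters for the
      first walk. struct scheme_model, model_new_scheme() and model_walk() are
      hypothetical; malloc() stands in for an allocator that, like kmalloc(),
      does not zero memory, and the per-request reset of the flag in the real
      code is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down scheme object. */
struct scheme_model {
	int action;
	bool walk_completed;
};

static struct scheme_model *model_new_scheme(int action)
{
	/* malloc() leaves the memory uninitialized. */
	struct scheme_model *s = malloc(sizeof(*s));

	if (!s)
		return NULL;
	s->action = action;
	s->walk_completed = false;  /* the fix: never leave this as garbage */
	return s;
}

/* Returns the number of regions walked; 0 if the walk was skipped. */
static int model_walk(struct scheme_model *s, int nr_regions)
{
	if (s->walk_completed)
		return 0;  /* a stale 'true' would make the first walk a no-op */
	s->walk_completed = true;
	return nr_regions;
}
```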
      
      Link: https://lkml.kernel.org/r/20250228174450.41472-1-sj@kernel.org
      
      
      Fixes: bf0eaba0 ("mm/damon/core: implement damos_walk()")
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon: respect core layer filters' allowance decision on ops layer · 39a326e6
      SeongJae Park authored
      Filtering decisions are made in filter evaluation order.  Once a
      decision is made by a filter, filters scheduled to be evaluated after
      the deciding filter should just respect it.  This is the intended and
      documented behavior.  Since core layer-handled filters are evaluated
      before operations layer-handled filters, decisions made in the core
      layer should be respected by the ops layer.

      In the case of reject filters, the decision is respected, since core
      layer-rejected regions are not passed to the ops layer.  But in the case
      of allow filters, ops layer filters don't know whether the region was
      passed to them because it was allowed by core filters or just because
      it didn't match any core layer filter.  The current, wrong
      implementation assumes it was due to not being matched by any core
      filter.  As a result, the decision is not respected.  Pass the missing
      information to the ops layer using a new field in 'struct damos', and
      make the ops layer filters respect it.
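      A hedged sketch of first-match filter evaluation across the two layers.
      struct filter_model, eval_filters() and region_allowed() are
      hypothetical, the local core_allowed flag stands in for the new field in
      'struct damos', and the model assumes a region no filter matches is
      allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical filter: does it match this region, and does it allow it? */
struct filter_model {
	bool matches;
	bool allow;
};

/* First matching filter decides.  Returns true if any filter decided. */
static bool eval_filters(const struct filter_model *f, int n, bool *allowed)
{
	for (int i = 0; i < n; i++) {
		if (f[i].matches) {
			*allowed = f[i].allow;
			return true;
		}
	}
	return false;
}

static bool region_allowed(const struct filter_model *core, int nc,
			   const struct filter_model *ops, int no, bool fixed)
{
	bool allowed = false;
	bool core_allowed = false;  /* stand-in for the new damos field */

	if (eval_filters(core, nc, &allowed)) {
		if (!allowed)
			return false;   /* rejected regions never reach ops */
		core_allowed = true;
	}
	if (fixed && core_allowed)
		return true;            /* respect the earlier decision */
	if (eval_filters(ops, no, &allowed))
		return allowed;
	return true;                    /* assumption: no match means allowed */
}
```

      With a matching core allow filter followed by a matching ops reject
      filter, the old flow rejects the region; the fixed flow keeps the core
      layer's decision.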
      
      Link: https://lkml.kernel.org/r/20250228175336.42781-1-sj@kernel.org
      
      
      Fixes: 491fee28 ("mm/damon/core: support damos_filter->allow")
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • filemap: move prefaulting out of hot write path · 665575cf
      Dave Hansen authored
      There is a generic anti-pattern that shows up in the VFS and several
      filesystems where the hot write paths touch userspace twice when they
      could get away with doing it once.
      
      Dave Chinner suggested that they should all be fixed up[1].  I agree[2]. 
      But, the series to do that fixup spans a bunch of filesystems and a lot of
      people.  This patch fixes common code that absolutely everyone uses.  It
      has measurable performance benefits[3].
      
      I think this patch can go in and not be held up by the others.
      
      I will post them separately to their separate maintainers for
      consideration. But, honestly, I'm not going to lose any sleep if
      the maintainers don't pick those up.
      
      1. https://lore.kernel.org/all/Z5f-x278Z3wTIugL@dread.disaster.area/
      2. https://lore.kernel.org/all/20250129181749.C229F6F3@davehans-spike.ostc.intel.com/
      3. https://lore.kernel.org/all/202502121529.d62a409e-lkp@intel.com/
      
      
      This patch:
      
      There is a bit of a sordid history here. I originally wrote
      998ef75d ("fs: do not prefault sys_write() user buffer pages")
      to fix a performance issue that showed up on early SMAP hardware.
      But that was reverted with 00a3d660 because it exposed an
      underlying filesystem bug.
      
      This is a reimplementation of the original commit along with some
      simplification and comment improvements.
      
      The basic problem is that the generic write path has two userspace
      accesses: one to prefault the write source buffer and then another to
      perform the actual write. On x86, this means an extra STAC/CLAC pair.
      These are relatively expensive instructions because they function as
      barriers.
      
      Keep the prefaulting behavior but move it into the slow path that gets
      run when the write did not make any progress. This avoids livelocks
      that can happen when the write's source and destination target the
      same folio. Contrary to the existing comments, the fault-in does not
      prevent deadlocks. That's accomplished by using an "atomic" usercopy
      that disables page faults.
      
      The end result is that the generic write fast path now touches
      userspace once instead of twice.
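      A userspace model of the write loop, only to count userspace accesses;
      it is not generic_perform_write(). struct user_buf_model and the model_*
      functions are hypothetical, the copy is all-or-nothing, and each call
      that touches the user buffer counts as one access (on x86, one STAC/CLAC
      pair).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user buffer state. */
struct user_buf_model {
	bool resident;  /* source pages are already mapped */
	bool bad;       /* fault-in cannot succeed (e.g. unmapped address) */
	int accesses;   /* userspace touches so far */
};

/* "Atomic" copy with page faults disabled: all or nothing here. */
static size_t model_copy_atomic(struct user_buf_model *u, size_t len)
{
	u->accesses++;
	return u->resident ? len : 0;
}

/* Returns the number of bytes that could NOT be faulted in. */
static size_t model_fault_in(struct user_buf_model *u, size_t len)
{
	u->accesses++;
	if (u->bad)
		return len;
	u->resident = true;
	return 0;
}

static long model_perform_write(struct user_buf_model *u, size_t len,
				bool prefault_first)
{
	size_t done = 0;

	while (done < len) {
		size_t chunk = len - done;
		size_t copied;

		/* Old fast path: always touch userspace before copying. */
		if (prefault_first && model_fault_in(u, chunk) == chunk)
			return done ? (long)done : -EFAULT;

		copied = model_copy_atomic(u, chunk);
		if (copied == 0) {
			/* New slow path: fault in only after no progress. */
			if (model_fault_in(u, chunk) == chunk)
				return done ? (long)done : -EFAULT;
			continue;
		}
		done += copied;
	}
	return (long)done;
}
```

      With a resident buffer, the old ordering touches userspace twice per
      write and the new ordering once; a non-resident buffer takes the slow
      path and still completes.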
      
      0day has shown some improvements on a couple of microbenchmarks:
      
      	https://lore.kernel.org/all/202502121529.d62a409e-lkp@intel.com/
      
      Link: https://lkml.kernel.org/r/20250228203722.CAEB63AC@davehans-spike.ostc.intel.com
      
      
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Link: https://lore.kernel.org/all/yxyuijjfd6yknryji2q64j3keq2ygw6ca6fs5jwyolklzvo45s@4u63qqqyosy2/
      
      
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mateusz Guzik <mjguzik@gmail.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • proc: fix UAF in proc_get_inode() · 654b33ad
      Ye Bin authored
      Fix a race between rmmod and /proc/XXX's inode instantiation.

      The bug is that pde->proc_ops doesn't belong to /proc; it belongs to a
      module, so dereferencing it after the /proc entry has been registered
      is a bug unless a use_pde()/unuse_pde() pair has been used.

      use_pde()/unuse_pde() can be avoided (2 atomic ops!) because
      pde->proc_ops never changes, so the information necessary for inode
      instantiation can be saved _before_ proc_register() in the PDE itself
      and used later, avoiding the pde->proc_ops->...  dereference.
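      A hedged sketch of the "cache before registering" idea. The structures,
      the flag MODEL_PDE_HAS_READ_ITER and both functions are hypothetical,
      not the names used by the patch; the point is that inode instantiation
      reads a flag owned by /proc instead of module-owned proc_ops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for the procfs structures. */
struct proc_ops_model {
	bool has_read_iter;                /* owned by the module */
};

struct pde_model {
	const struct proc_ops_model *ops;  /* invalid once the module is gone */
	unsigned int flags;                /* owned by /proc */
};

#define MODEL_PDE_HAS_READ_ITER 0x1u   /* hypothetical flag name */

/* Run before registration, while the module is guaranteed alive. */
static void model_pde_set_flags(struct pde_model *pde)
{
	if (pde->ops->has_read_iter)
		pde->flags |= MODEL_PDE_HAS_READ_ITER;
}

/* Inode instantiation consults only the cached flag, never pde->ops. */
static bool model_inode_uses_read_iter(const struct pde_model *pde)
{
	return (pde->flags & MODEL_PDE_HAS_READ_ITER) != 0;
}
```

      Clearing pde->ops after the flags are set simulates the module being
      freed; the lookup path still gets the right answer without touching it.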
      
            rmmod                         lookup
      sys_delete_module
                                    proc_lookup_de
                                      pde_get(de);
                                      proc_get_inode(dir->i_sb, de);
        mod->exit()
          proc_remove
            remove_proc_subtree
             proc_entry_rundown(de);
        free_module(mod);

                                      if (S_ISREG(inode->i_mode))
                                        if (de->proc_ops->proc_read_iter)
                                      --> As module is already freed, will trigger UAF
      
      BUG: unable to handle page fault for address: fffffbfff80a702b
      PGD 817fc4067 P4D 817fc4067 PUD 817fc0067 PMD 102ef4067 PTE 0
      Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
      CPU: 26 UID: 0 PID: 2667 Comm: ls Tainted: G
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
      RIP: 0010:proc_get_inode+0x302/0x6e0
      RSP: 0018:ffff88811c837998 EFLAGS: 00010a06
      RAX: dffffc0000000000 RBX: ffffffffc0538140 RCX: 0000000000000007
      RDX: 1ffffffff80a702b RSI: 0000000000000001 RDI: ffffffffc0538158
      RBP: ffff8881299a6000 R08: 0000000067bbe1e5 R09: 1ffff11023906f20
      R10: ffffffffb560ca07 R11: ffffffffb2b43a58 R12: ffff888105bb78f0
      R13: ffff888100518048 R14: ffff8881299a6004 R15: 0000000000000001
      FS:  00007f95b9686840(0000) GS:ffff8883af100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: fffffbfff80a702b CR3: 0000000117dd2000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       proc_lookup_de+0x11f/0x2e0
       __lookup_slow+0x188/0x350
       walk_component+0x2ab/0x4f0
       path_lookupat+0x120/0x660
       filename_lookup+0x1ce/0x560
       vfs_statx+0xac/0x150
       __do_sys_newstat+0x96/0x110
       do_syscall_64+0x5f/0x170
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      [adobriyan@gmail.com: don't do 2 atomic ops on the common path]
      Link: https://lkml.kernel.org/r/3d25ded0-1739-447e-812b-e34da7990dcf@p183
      
      
      Fixes: 778f3dd5 ("Fix procfs compat_ioctl regression")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. Mar 16, 2025
    • Linux 6.14-rc7 · 4701f33a
      Linus Torvalds authored
      v6.14-rc7
    • Merge tag 'media/v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · d1275e99
      Linus Torvalds authored
      Pull media fix from Mauro Carvalho Chehab:
       "rtl2832 driver regression fix"
      
      * tag 'media/v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        media: rtl2832_sdr: assign vb2 lock before vb2_queue_init
    • Merge tag 'i2c-for-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · 0990528b
      Linus Torvalds authored
      Pull i2c fixes from Wolfram Sang:
      
       - omap: fix IRQ ACKs to avoid IRQ storms and a system hang

       - ali1535, ali15x3, sis630: fix an error handling path in probe
      
      * tag 'i2c-for-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: sis630: Fix an error handling path in sis630_probe()
        i2c: ali15x3: Fix an error handling path in ali15x3_probe()
        i2c: ali1535: Fix an error handling path in ali1535_probe()
        i2c: omap: fix IRQ storms
    • Merge tag 'trace-v6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace · ad87a8d0
      Linus Torvalds authored
      Pull tracing fix from Steven Rostedt:
       "Fix ref count of trace_array in error path of histogram file open
      
        Tracing instances have a ref count to keep them around while files
        within their directories are open. This prevents them from being
        deleted while they are used.
      
        The histogram code had some files that needed to take the ref count,
        and that was added, but the error paths did not decrement the ref
        count. This prevented the instances from ever being removed if a
        histogram file failed to open due to some error"
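The pattern being fixed is a common one: take a reference when the file is opened and drop it on close, but also drop it on every error path taken after the reference was acquired. A simplified, single-threaded sketch follows; struct instance, instance_get()/instance_put(), setup_step(), hist_file_open() and hist_file_release() are hypothetical names, and a plain int stands in for the real reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a tracing instance with a refcount. */
struct instance {
    int ref;
};

static void instance_get(struct instance *tr)
{
    tr->ref++;
}

static void instance_put(struct instance *tr)
{
    assert(tr->ref > 0); /* an extra put would indicate unbalanced refs */
    tr->ref--;
}

/* Stand-in for a later setup step that may fail after the ref is taken. */
static int setup_step(bool fail)
{
    return fail ? -ENOMEM : 0;
}

/*
 * Open a file inside the instance directory: pin the instance first, and
 * undo that pin if any later step fails, so the instance can still be
 * removed afterwards.
 */
static int hist_file_open(struct instance *tr, bool fail_setup)
{
    int ret;

    instance_get(tr);

    ret = setup_step(fail_setup);
    if (ret)
        instance_put(tr); /* the decrement the error path was missing */

    return ret;
}

/* Matching release for a successful open. */
static void hist_file_release(struct instance *tr)
{
    instance_put(tr);
}
```

Without the instance_put() in the error branch, each failed open would leave the count one higher than it should be, and the instance could never drop back to zero and be removed.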
      
      * tag 'trace-v6.14-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
        tracing: Correct the refcount if the hist/hist_debug file fails to open
    • Merge tag 'usb-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb · cb82ca15
      Linus Torvalds authored
      Pull USB fixes from Greg KH:
       "Here are some small USB and Thunderbolt driver fixes and new
        usb-serial device ids. Included in here are:
      
         - new usb-serial device ids
      
         - typec driver bugfix
      
         - thunderbolt driver resume bugfix
      
        All of these have been in linux-next with no reported issues"
      
      * tag 'usb-6.14-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
        usb: typec: tcpm: fix state transition for SNK_WAIT_CAPABILITIES state in run_state_machine()
        USB: serial: ftdi_sio: add support for Altera USB Blaster 3
        thunderbolt: Prevent use-after-free in resume from hibernate
        USB: serial: option: fix Telit Cinterion FE990A name
        USB: serial: option: add Telit Cinterion FE990B compositions
        USB: serial: option: match on interface class for Telit FN990B
    • Merge tag 'input-for-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 31d7109a
      Linus Torvalds authored
      Pull input updates from Dmitry Torokhov:
      
       - several new device IDs added to the xpad game controller driver

       - support for the Imagis IST3038H chip variant added to the imagis
         touch controller driver

       - a fix for GPIO allocation in the ads7846 touch controller driver

       - a fix for the iqs7222 driver to properly preserve the system status
         register

       - a fix for the goodix-berlin touch controller driver to use the right
         name for the regulator
      
       - more i8042 quirks to better handle several old Clevo devices.
      
      * tag 'input-for-v6.14-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        MAINTAINERS: Remove myself from the goodix touchscreen maintainers
        Input: iqs7222 - preserve system status register
        Input: i8042 - swap old quirk combination with new quirk for more devices
        Input: i8042 - swap old quirk combination with new quirk for several devices
        Input: i8042 - add required quirks for missing old boardnames
        Input: i8042 - swap old quirk combination with new quirk for NHxxRZQ
        Input: xpad - rename QH controller to Legion Go S
        Input: xpad - add support for TECNO Pocket Go
        Input: xpad - add support for ZOTAC Gaming Zone
        Input: goodix-berlin - fix vddio regulator references
        Input: goodix-berlin - fix comment referencing wrong regulator
        Input: imagis - add support for imagis IST3038H
        dt-bindings: input/touchscreen: imagis: add compatible for ist3038h
        Input: xpad - add multiple supported devices
        Input: xpad - add 8BitDo SN30 Pro, Hyperkin X91 and Gamesir G7 SE controllers
        Input: ads7846 - fix gpiod allocation
        Input: wdt87xx_i2c - fix compiler warning
    • Merge tag 'rust-fixes-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux · cd3a56ac
      Linus Torvalds authored
      Pull rust fixes from Miguel Ojeda:
       "Toolchain and infrastructure:
      
         - Disallow BTF generation with Rust + LTO
      
         - Improve rust-analyzer support
      
        'kernel' crate:
      
         - 'init' module: remove 'Zeroable' implementation for a couple of
           types that should not have it
      
         - 'alloc' module: fix macOS failure in host test by satisfying POSIX
           alignment requirement
      
         - Add missing '\n's to 'pr_*!()' calls
      
        And a couple of other minor cleanups"
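On the 'alloc' item: the POSIX rule most likely involved is the one posix_memalign() imposes, namely that the alignment be a power of two and a multiple of sizeof(void *), a rule that aligned_alloc() implementations on some POSIX-based systems may inherit. Rust layouts can legitimately request smaller alignments (1 for a u8, for instance), so a host-side allocator subject to that rule has to round the alignment up first. The sketch below shows that adjustment in C; it assumes posix_memalign() semantics are what matter, and posix_alignment()/alloc_aligned() are hypothetical names, not the kernel crate's code.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Round a requested alignment up to what posix_memalign() accepts:
 * a power of two that is also a multiple of sizeof(void *). Since both
 * values are powers of two, the larger one satisfies both conditions.
 */
static size_t posix_alignment(size_t align)
{
    /* Callers pass a power of two, as a Rust Layout guarantees. */
    assert(align != 0 && (align & (align - 1)) == 0);
    return align < sizeof(void *) ? sizeof(void *) : align;
}

static void *alloc_aligned(size_t size, size_t align)
{
    void *p = NULL;

    /* posix_memalign() returns an error number instead of setting errno. */
    if (posix_memalign(&p, posix_alignment(align), size) != 0)
        return NULL;
    return p;
}
```

Over-aligning is harmless for the caller: memory aligned to sizeof(void *) is also aligned to any smaller power of two the layout asked for.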
      
      * tag 'rust-fixes-6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
        scripts: generate_rust_analyzer: add uapi crate
        scripts: generate_rust_analyzer: add missing include_dirs
        scripts: generate_rust_analyzer: add missing macros deps
        rust: Disallow BTF generation with Rust + LTO
        rust: task: fix `SAFETY` comment in `Task::wake_up`
        rust: workqueue: add missing newline to pr_info! examples
        rust: sync: add missing newline in locked_by log example
        rust: init: add missing newline to pr_info! calls
        rust: error: add missing newline to pr_warn! calls
        rust: docs: add missing newline to printing macro examples
        rust: alloc: satisfy POSIX alignment requirement
        rust: init: fix `Zeroable` implementation for `Option<NonNull<T>>` and `Option<KBox<T>>`
        rust: remove leftover mentions of the `alloc` crate
  4. Mar 15, 2025
  5. Mar 14, 2025