- Mar 21, 2025
Lokesh Gidra authored
The following issues were reported in the MOVE ioctl:

1. Panic when trying to move a source page which is in the swap-cache [1]
2. Livelock when multiple threads try to move the same source page [2]

Three patches have been upstreamed to fix these issues [3, 4, 5].

The MOVE ioctl was backported to ACK 6.1 and 6.6 for the ART GC to use [6]. This mode is therefore added so that, on these kernels, userspace can identify whether the fixes are included.

NOTE: the UFFDIO_MOVE_MODE_CONFIRM_FIXED mode is only for the 6.1 and 6.6 kernels, and will go away afterwards.

[1] https://lore.kernel.org/linux-mm/20250219112519.92853-1-21cnbao@gmail.com/
[2] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock
[3] https://web.git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-hotfixes-stable&id=c50f8e6053b0503375c2975bf47f182445aebb4c
[4] https://web.git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-hotfixes-stable&id=37b338eed10581784e854d4262da05c8d960c748
[5] https://web.git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git/commit/?h=mm-hotfixes-stable&id=927e926d72d9155fde3264459fe9bfd7b5e40d28
[6] b/274911254

Bug: 401790618
Bug: 405066974
Change-Id: Ibd854ec7ac9ae6a2ca416767d032b6c71f1bc688
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
(cherry picked from commit 9bcabbda)
Signed-off-by: Yinchu Chen <chenyc5@motorola.com>
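A minimal userspace probe for this mode might look like the sketch below. It assumes a userfaultfd has already been created and registered over src/dst, and that UFFDIO_MOVE and struct uffdio_move come from the kernel UAPI header; the numeric value of the ACK-only UFFDIO_MOVE_MODE_CONFIRM_FIXED flag is an assumption here and should be taken from the ACK header. On a 6.1/6.6 kernel without the fixes, the unknown mode bit is rejected with EINVAL.

    #include <errno.h>
    #include <stddef.h>
    #include <sys/ioctl.h>
    #include <linux/userfaultfd.h>

    /* ACK-specific flag; value assumed for illustration, use the ACK uapi header. */
    #ifndef UFFDIO_MOVE_MODE_CONFIRM_FIXED
    #define UFFDIO_MOVE_MODE_CONFIRM_FIXED ((__u64)1 << 2)
    #endif

    /* Returns 1 if the MOVE fixes are present, 0 if not, -1 on other errors. */
    int uffd_move_fixes_present(int uffd, void *src, void *dst, size_t len)
    {
            struct uffdio_move move = {
                    .src  = (unsigned long)src,
                    .dst  = (unsigned long)dst,
                    .len  = len,
                    .mode = UFFDIO_MOVE_MODE_CONFIRM_FIXED,
            };

            if (ioctl(uffd, UFFDIO_MOVE, &move) == 0)
                    return 1;       /* mode accepted: fixed kernel */
            if (errno == EINVAL)
                    return 0;       /* unknown mode bit: fixes missing */
            return -1;
    }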
Suren Baghdasaryan authored
Current implementation of move_pages_pte() copies source and destination PTEs in order to detect concurrent changes to PTEs involved in the move. However these copies are also used to unmap the PTEs, which will fail if CONFIG_HIGHPTE is enabled because the copies are allocated on the stack. Fix this by using the actual PTEs which were kmap()ed.

Link: https://lkml.kernel.org/r/20250226185510.2732648-3-surenb@google.com
Fixes: adef4406 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
(cherry-picked from commit 927e926d https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-hotfixes-stable)
Change-Id: I0ee6c1b509ea7c4fa68056d6e512d4ac167c9234
Bug: 401790618
Bug: 405066974
(cherry picked from commit 8d8d44ff)
Signed-off-by: Yinchu Chen <chenyc5@motorola.com>
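For illustration, a simplified sketch of the pattern being fixed, assuming the pte_offset_map_nolock() helper of 6.6-era kernels; with CONFIG_HIGHPTE the PTE page is kmap()ed, so pte_unmap() must receive that mapped pointer, never the address of a stack copy. This is not the literal patch, just the shape of it:

    spinlock_t *ptl;
    pte_t *src_pte = pte_offset_map_nolock(mm, src_pmd, src_addr, &ptl);
    pte_t orig_src_pte = ptep_get(src_pte);  /* stack copy: fine for comparisons */

    /* Buggy with CONFIG_HIGHPTE: kunmap of a stack address, not the kmap */
    /* pte_unmap(&orig_src_pte); */

    /* Fixed: unmap the pointer that pte_offset_map_nolock() returned */
    pte_unmap(src_pte);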
Suren Baghdasaryan authored
Lokesh recently raised an issue about UFFDIO_MOVE getting into a deadlock state when it goes into split_folio() with raised folio refcount. split_folio() expects the reference count to be exactly mapcount + num_pages_in_folio + 1 (see can_split_folio()) and fails with EAGAIN otherwise.

If multiple processes are trying to move the same large folio, they raise the refcount (all tasks succeed in that) then one of them succeeds in locking the folio, while others will block in folio_lock() while keeping the refcount raised. The winner of this race will proceed with calling split_folio() and will fail returning EAGAIN to the caller and unlocking the folio. The next competing process will get the folio locked and will go through the same flow. In the meantime the original winner will be retried and will block in folio_lock(), getting into the queue of waiting processes only to repeat the same path. All this results in a livelock.

An easy fix would be to avoid waiting for the folio lock while holding folio refcount, similar to madvise_free_huge_pmd() where folio lock is acquired before raising the folio refcount. Since we lock and take a refcount of the folio while holding the PTE lock, changing the order of these operations should not break anything.

Modify move_pages_pte() to try locking the folio first and if that fails and the folio is large then return EAGAIN without touching the folio refcount. If the folio is single-page then split_folio() is not called, so we don't have this issue.

Lokesh has a reproducer [1] and I verified that this change fixes the issue.

[1] https://github.com/lokeshgidra/uffd_move_ioctl_deadlock

[akpm@linux-foundation.org: reflow comment to 80 cols, s/end/end up/]
Link: https://lkml.kernel.org/r/20250226185510.2732648-2-surenb@google.com
Fixes: adef4406 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Lokesh Gidra <lokeshgidra@google.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Acked-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Barry Song <21cnbao@gmail.com>
Cc: Barry Song <v-songbaohua@oppo.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
(cherry-picked from commit 37b338ee https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-hotfixes-stable)
Change-Id: I71b307add9707ad3518a44623aea2e2ca417b95a
Bug: 401790618
Bug: 405066974
(cherry picked from commit af439acc)
Signed-off-by: Yinchu Chen <chenyc5@motorola.com>
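The ordering change can be sketched as below. This is a simplified fragment, not the literal diff; the real move_pages_pte() in mm/userfaultfd.c also handles retries and treats single-page folios differently:

    /* Try-lock before taking a reference, all under the PTE lock, so a
     * competing mover never sleeps in folio_lock() while inflating the
     * refcount that split_folio() will check. */
    if (!folio_trylock(src_folio)) {
            pte_unmap(src_pte);
            spin_unlock(ptl);
            /* large folio: bail out without a raised refcount */
            return -EAGAIN;
    }
    folio_get(src_folio);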
Barry Song authored
userfaultfd_move() checks whether the PTE entry is present or a swap entry.

- If the PTE entry is present, move_present_pte() handles folio migration by setting:

      src_folio->index = linear_page_index(dst_vma, dst_addr);

- If the PTE entry is a swap entry, move_swap_pte() simply copies the PTE to the new dst_addr.

This approach is incorrect because, even if the PTE is a swap entry, it can still reference a folio that remains in the swap cache. This creates a race window between steps 2 and 4:

1. add_to_swap: The folio is added to the swapcache.
2. try_to_unmap: PTEs are converted to swap entries.
3. pageout: The folio is written back.
4. Swapcache is cleared.

If userfaultfd_move() occurs in the window between steps 2 and 4, after the swap PTE has been moved to the destination, accessing the destination triggers do_swap_page(), which may locate the folio in the swapcache. However, since the folio's index has not been updated to match the destination VMA, do_swap_page() will detect a mismatch.

This can result in two critical issues depending on the system configuration. If KSM is disabled, both small and large folios can trigger a BUG during the add_rmap operation due to:

    page_pgoff(folio, page) != linear_page_index(vma, address)

    [   13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
    [   13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
    [   13.337716] memcg:ffff00000405f000
    [   13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
    [   13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
    [   13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
    [   13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
    [   13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
    [   13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
    [   13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
    [   13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
    [   13.340190] ------------[ cut here ]------------
    [   13.340316] kernel BUG at mm/rmap.c:1380!
    [   13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
    [   13.340969] Modules linked in:
    [   13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
    [   13.341470] Hardware name: linux,dummy-virt (DT)
    [   13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [   13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
    [   13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
    [   13.342018] sp : ffff80008752bb20
    [   13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
    [   13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
    [   13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
    [   13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
    [   13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
    [   13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
    [   13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
    [   13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
    [   13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
    [   13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
    [   13.343876] Call trace:
    [   13.344045]  __page_check_anon_rmap+0xa0/0xb0 (P)
    [   13.344234]  folio_add_anon_rmap_ptes+0x22c/0x320
    [   13.344333]  do_swap_page+0x1060/0x1400
    [   13.344417]  __handle_mm_fault+0x61c/0xbc8
    [   13.344504]  handle_mm_fault+0xd8/0x2e8
    [   13.344586]  do_page_fault+0x20c/0x770
    [   13.344673]  do_translation_fault+0xb4/0xf0
    [   13.344759]  do_mem_abort+0x48/0xa0
    [   13.344842]  el0_da+0x58/0x130
    [   13.344914]  el0t_64_sync_handler+0xc4/0x138
    [   13.345002]  el0t_64_sync+0x1ac/0x1b0
    [   13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
    [   13.345504] ---[ end trace 0000000000000000 ]---
    [   13.345715] note: a.out[107] exited with irqs disabled
    [   13.345954] note: a.out[107] exited with preempt_count 2

If KSM is enabled, Peter Xu also discovered that do_swap_page() may trigger an unexpected CoW operation for small folios because ksm_might_need_to_copy() allocates a new folio when the folio index does not match linear_page_index(vma, addr).

This patch also checks the swapcache when handling swap entries. If a match is found in the swapcache, it processes it similarly to a present PTE. However, there are some differences. For example, the folio is no longer exclusive because folio_try_share_anon_rmap_pte() is performed during unmapping. Furthermore, in the case of swapcache, the folio has already been unmapped, eliminating the risk of concurrent rmap walks and removing the need to acquire src_folio's anon_vma or lock.

Note that for large folios, in the swapcache handling path, we directly return -EBUSY since split_folio() will return -EBUSY regardless of whether the folio is under writeback or unmapped. This is not an urgent issue, so a follow-up patch may address it separately.

[v-songbaohua@oppo.com: minor cleanup according to Peter Xu]
Link: https://lkml.kernel.org/r/20250226024411.47092-1-21cnbao@gmail.com
Link: https://lkml.kernel.org/r/20250226001400.9129-1-21cnbao@gmail.com
Fixes: adef4406 ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kalesh Singh <kaleshsingh@google.com>
Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Nicolas Geoffray <ngeoffray@google.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: ZhangPeng <zhangpeng362@huawei.com>
Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Conflicts:
    1. mm/userfaultfd.c
    [Removed pmd arguments being passed to move_swap_pte() to resolve conflicts - Lokesh Gidra]
    [Replaced swap_cache_index() with swp_offset() as the former doesn't exist - Lokesh Gidra]
    [Replaced folio_move_anon_rmap() with page_move_anon_rmap() as the former doesn't exist - Lokesh Gidra]
Signed-off-by: Lokesh Gidra <lokeshgidra@google.com>
(cherry-picked from commit c50f8e60 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-hotfixes-stable)
Change-Id: I94caeac5bf78add4d78650929303a25d54d8a638
Bug: 401790618
Bug: 405066974
(cherry picked from commit 7d6124b6)
Signed-off-by: Yinchu Chen <chenyc5@motorola.com>
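A heavily simplified sketch of the added swapcache check, written against the 6.1/6.6 backport environment described in the Conflicts note (swp_offset() instead of swap_cache_index()); the real patch additionally takes the folio lock, moves the anon rmap, and handles the large-folio -EBUSY case:

    struct folio *folio;
    swp_entry_t entry = pte_to_swp_entry(orig_src_pte);

    /* A swap PTE may still have a folio sitting in the swap cache. */
    folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
    if (!IS_ERR_OR_NULL(folio)) {       /* NULL on 6.1, ERR_PTR on 6.6+ misses */
            /* Treat it like a present PTE: fix up the anon index so a later
             * do_swap_page() at dst_addr finds a matching folio. */
            folio->index = linear_page_index(dst_vma, dst_addr);
            folio_put(folio);
    }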
- Mar 20, 2025
Michal Vrastil authored
commit 51cdd69d upstream.

This reverts commit ec6ce707.

Fix installation of the WinUSB driver using OS descriptors. Without the fix the drivers are not installed correctly and the property 'DeviceInterfaceGUID' is missing on the host side.

The original change was based on the assumption that the interface number is in the high byte of wValue, but it is in the low byte instead. Unfortunately, the fix is based on MS documentation which is also wrong.

The actual USB request for OS descriptors (using a USB analyzer) looks like:

    Offset  0  1  2  3  4  5  6  7
    0x000   C1 A1 02 00 05 00 8E 00

    C1:   bmRequestType (device to host, vendor, interface)
    A1:   nas magic number
    0002: wValue (2: nas interface)
    0005: wIndex (5: get extended property, i.e. nas interface GUID)
    008E: wLength (142)

The fix was tested on Windows 10 and Windows 11.

Cc: stable@vger.kernel.org
Fixes: ec6ce707 ("usb: gadget: composite: fix OS descriptors w_value logic")
Signed-off-by: Michal Vrastil <michal.vrastil@hidglobal.com>
Signed-off-by: Elson Roy Serrao <quic_eserrao@quicinc.com>
Acked-by: Peter Korsgaard <peter@korsgaard.com>
Link: https://lore.kernel.org/r/20241113235433.20244-1-quic_eserrao@quicinc.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Bug: 405046934
(cherry picked from commit c17418f4)
Change-Id: Ibed2f523154a016106e48d345d3d368adaeddd48
Signed-off-by: Yinchu Chen <chenyc5@motorola.com>
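The byte layout above implies the following parsing in the gadget's setup handler. This is a sketch of the point being made, not the literal composite.c diff:

    /* wValue = 0x0002 in the capture above: the interface number lives in
     * the low byte, so the restored logic reads it from there. */
    u8 interface = w_value & 0xFF;      /* correct: low byte */
    /* u8 interface = w_value >> 8; */  /* reverted assumption: high byte */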
- Mar 10, 2025
zhanghui authored
1 function symbol(s) added:
    'int __traceiter_android_vh_filemap_map_pages_range(void*, struct file*, unsigned long, unsigned long, vm_fault_t)'

1 variable symbol(s) added:
    'struct tracepoint __tracepoint_android_vh_filemap_map_pages_range'

Bug: 398130226
Bug: 400290767
Bug: 401380632
Change-Id: I789a16f5d0bc3d11b9518c548276b2ce19514ead
Signed-off-by: zhanghui <zhanghui31@xiaomi.com>
(cherry picked from commit 6b227a1f)
(cherry picked from commit 08e9e947)
zhanghui authored
In the current vendor hook, if next_uptodate_folio() returns NULL, first_pgoff is set to zero and last_pgoff is set to start_pgoff. As a result, the collected range runs from 0 to start_pgoff:

    |-----------|------------|-------------|------------------|
    0      start_pgoff  first_pgoff   last_pgoff      end_pgoff

We want to collect the range from first_pgoff to last_pgoff, so we have to add a new vendor hook, as sketched below.

Bug: 398130226
Bug: 400290767
Bug: 401380632
Change-Id: I19d54c601e2ffc5de5ec2dafcd43fbdcdc84b0d2
Signed-off-by: zhanghui <zhanghui31@xiaomi.com>
(cherry picked from commit eaffa3e3)
(cherry picked from commit 5ef6cde4)
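Based on the traceiter signature exported in the previous entry, a vendor module would attach to the new hook roughly like this; the handler name is hypothetical and the header path is an assumption (ACK vendor hooks for mm usually live under trace/hooks/):

    #include <linux/module.h>
    #include <trace/hooks/mm.h>   /* assumed location of the vendor hook */

    static void vh_filemap_map_pages_range(void *data, struct file *file,
                                           unsigned long first_pgoff,
                                           unsigned long last_pgoff,
                                           vm_fault_t ret)
    {
            /* collect exactly the [first_pgoff, last_pgoff] range mapped */
    }

    static int __init vh_init(void)
    {
            return register_trace_android_vh_filemap_map_pages_range(
                            vh_filemap_map_pages_range, NULL);
    }
    module_init(vh_init);
    MODULE_LICENSE("GPL");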
Marcus Ma authored
Adding the following symbols:
    - page_swap_info

Bug: 397308736
Bug: 400290767
Bug: 401380632
Change-Id: Ica1c945fd0401c0276d0409ff284fe9debc352a3
Signed-off-by: Marcus Ma <maminghui5@xiaomi.corp-partner.google.com>
(cherry picked from commit 4deb2cd7)
(cherry picked from commit 24104369)
Marcus Ma authored
We have a specific requirement regarding memory management and I/O operations. In our project, we focus on handling scenarios where I/O delays are triggered by anonymous pages. During this period, we need to obtain the swap_info_struct for a page in order to get the corresponding block device id.

Bug: 397308736
Bug: 400290767
Bug: 401380632
Change-Id: Ibc11f412964245658cec60af42cf9486adc96e1a
Signed-off-by: Marcus Ma <maminghui5@xiaomi.corp-partner.google.com>
(cherry picked from commit 75c1d11b)
(cherry picked from commit be18f50e)
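With page_swap_info() exported (see the previous entry), the lookup described here might look like the following sketch, assuming the caller knows the page is swap-backed and its swap entry is still valid:

    #include <linux/swap.h>
    #include <linux/blkdev.h>

    /* Sketch: map an anonymous page to the block device backing its swap slot. */
    static dev_t swap_bdev_of_page(struct page *page)
    {
            struct swap_info_struct *si = page_swap_info(page);

            return (si && si->bdev) ? si->bdev->bd_dev : 0;
    }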
- Feb 26, 2025
Chao Yu authored
There is very similar code in inc_valid_block_count() and inc_valid_node_count(), used for the available user block count calculation. This patch introduces a new helper, get_available_block_count(), to hold that common code, and uses it to clean things up.

Bug: 399286969
Change-Id: Ie2ce55bdac091bc4880478eeba2a76e1608726e3
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 0f1c6ede)
[Added line for F2FS_IO_ALIGNED, which was removed in later kernels]
Signed-off-by: Daniel Rosenberg <drosen@google.com>
(cherry picked from commit 38149d53)
Qi Han authored
UPSTREAM: f2fs: modify f2fs_is_checkpoint_ready logic to allow more data to be written with the CP disable

When the free segments are used up while checkpointing is disabled, many write or ioctl operations will get ENOSPC error codes, even if there are still many blocks available. We can reproduce it with the following steps:

    dd if=/dev/zero of=f2fs.img bs=1M count=65
    mkfs.f2fs -f f2fs.img
    mount f2fs.img f2fs_dir -o checkpoint=disable:10%
    cd f2fs_dir
    i=1 ; while [[ $i -lt 50 ]] ; do (file_name=./2M_file$i ; dd \
        if=/dev/random of=$file_name bs=1M count=2); i=$((i+1)); done
    sync
    i=1 ; while [[ $i -lt 50 ]] ; do (file_name=./2M_file$i ; truncate \
        -s 1K $file_name); i=$((i+1)); done
    sync
    dd if=/dev/zero of=./file bs=1M count=20

Since f2fs_need_SSR() allows SSR block allocation while checkpointing is disabled, f2fs_is_checkpoint_ready() can likewise consider the number of invalid blocks when free segments are not enough, and return ENOSPC only if the number of invalid blocks is also not enough.

Bug: 399286969
Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 84b5bb8b)
Signed-off-by: Daniel Rosenberg <drosen@google.com>
(cherry picked from https://android-review.googlesource.com/q/commit:225caf3bdf7a4977ae50ba9b5c5470a16d480100)
Merged-In: I41ad315f603cd764d0e9b8ef984447e7116b1514
Change-Id: I41ad315f603cd764d0e9b8ef984447e7116b1514
(cherry picked from commit 182b090f)
- Feb 17, 2025
Commit "mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations" (16867664) introduced balancing of movable allocations between CMA and normal areas. Commit "ANDROID: cma: redirect page allocation to CMA" (f60c5572) removes it, making allocations go in CMA area first. 1. Reintroduce the condition, so that CMA and normal area are used in a balanced way(as it used to be), so it prevents depleting of CMA region; 2. Back-port a command line option(from 6.6), "restrict_cma_redirect", that can be used if only MOVABLE allocations marked as __GFP_CMA are eligible to be redirected to CMA region. By default it is true. The purpose of this change is to keep using CMA for movable allocations, but at the same time, to have enough free CMA pages for critical system areas such as modem initialization, GPU initialization and so on. Bug: 397184449 Bug: 381168812 Signed-off-by:
Sebastian Achim <sebastian.1.achim@sony.com> Signed-off-by:
Uladzislau Rezki <uladzislau.rezki@sony.com> Signed-off-by:
Oleksiy Avramchenko <oleksiy.avramchenko@sony.com> Change-Id: I5fd6d022340715e27754c687189c5ea0e56d9ee6 (cherry picked from commit ccc91578)
-
- Jan 22, 2025
Thiébaud Weksteen authored
commit 900f83cf upstream.

When evaluating extended permissions, ignore unknown permissions instead of calling BUG(). This commit ensures that future permissions can be added without interfering with older kernels.

Bug: 381754752
Cc: stable@vger.kernel.org
Fixes: fa1aa143 ("selinux: extended permissions for ioctls")
Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thiébaud Weksteen <tweek@google.com>
Change-Id: I1689d8c5084a24c1a34ef3d15d71f8cfa4122447
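The shape of the change in security/selinux/ss/services.c is roughly the following sketch (simplified; the upstream patch touches each switch over the avtab xperms kind, and the exact warning text is an assumption):

    switch (node->datum.u.xperms->specified) {
    case AVTAB_XPERMS_IOCTLFUNCTION:
    case AVTAB_XPERMS_IOCTLDRIVER:
            /* decode the ioctl ranges as before */
            break;
    default:
            /* was: BUG(); now unknown kinds are skipped so newer
             * policies don't crash older kernels */
            pr_warn_once("SELinux: unknown extended permission (%u) ignored\n",
                         node->datum.u.xperms->specified);
            return;
    }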
- Jan 21, 2025
Hyunwoo Kim authored
commit 6ca57537 upstream.

During loopback communication, a dangling pointer can be created in vsk->trans, potentially leading to a use-after-free condition. This issue is resolved by initializing vsk->trans to NULL.

Bug: 378870958
Cc: stable <stable@kernel.org>
Fixes: 06a8fc78 ("VSOCK: Introduce virtio_vsock_common.ko")
Signed-off-by: Hyunwoo Kim <v4bel@theori.io>
Signed-off-by: Wongi Lee <qwerty@theori.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <2024102245-strive-crib-c8d3@gregkh>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b110196f)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I5eb7b5ccf7f0d96644cc4313548c0114e8836149
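Judging from the description, the fix amounts to clearing the back-pointer when the transport frees its private state, along the lines of this sketch of virtio_transport_destruct() in net/vmw_vsock/virtio_transport_common.c:

    static void virtio_transport_destruct(struct vsock_sock *vsk)
    {
            struct virtio_vsock_sock *vvs = vsk->trans;

            kfree(vvs);
            vsk->trans = NULL;      /* don't leave a dangling pointer behind */
    }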
- Jan 13, 2025
Yosry Ahmed authored
Previous patches removed the only caller of cgroup_rstat_flush_atomic(). Remove the function and simplify the code.

Link: https://lkml.kernel.org/r/20230421174020.2994750-6-yosryahmed@google.com
Change-Id: I28939a92d4dcc6b8ad03dfaf7243e6d2b6af9100
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0a2dc6ac)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
Previous patches removed all callers of mem_cgroup_flush_stats_atomic(). Remove the function and simplify the code.

Link: https://lkml.kernel.org/r/20230421174020.2994750-5-yosryahmed@google.com
Change-Id: Ieef476cf2c0aacb48af4e82e4bb96e5cc69292c5
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 35822fda)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
Currently, we approximate the root usage by adding the memcg stats for anon, file, and conditionally swap (for memsw). To read the memcg stats we need to invoke an rstat flush. rstat flushes can be expensive; they scale with the number of cpus and cgroups on the system. mem_cgroup_usage() is called by memcg_events()->mem_cgroup_threshold() with irqs disabled, so such an expensive operation with irqs disabled can cause problems.

Instead, approximate the root usage from global state. This is not 100% accurate, but the root usage has always been ill-defined anyway.

Link: https://lkml.kernel.org/r/20230421174020.2994750-4-yosryahmed@google.com
Change-Id: Ifbc895f45fc119f3079c17fd8f5e3d87267428c4
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f82a7a86)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
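The global-state approximation can be sketched as follows, paraphrased from the description above (simplified; the non-root branch keeps reading the page counters as before):

    if (mem_cgroup_is_root(memcg)) {
            /* Approximate root usage from global counters: no rstat flush,
             * so it is safe with irqs disabled. Anon + file, plus the
             * swapped-out pages when accounting memsw. */
            val = global_node_page_state(NR_FILE_PAGES) +
                  global_node_page_state(NR_ANON_MAPPED);
            if (swap)
                    val += total_swap_pages - get_nr_swap_pages();
    } else {
            val = swap ? page_counter_read(&memcg->memsw) :
                         page_counter_read(&memcg->memory);
    }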
Yosry Ahmed authored
The previous patch moved the wb_over_bg_thresh()->mem_cgroup_wb_stats() code path in wb_writeback() outside the lock section. We no longer need to flush the stats atomically. Flush the stats non-atomically.

Link: https://lkml.kernel.org/r/20230421174020.2994750-3-yosryahmed@google.com
Change-Id: Ibbb2dd41fac7c26f1e18d0a984d7197b45294f6f
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 190409ca)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
Patch series "cgroup: eliminate atomic rstat flushing", v5. A previous patch series [1] changed most atomic rstat flushing contexts to become non-atomic. This was done to avoid an expensive operation that scales with # cgroups and # cpus to happen with irqs disabled and scheduling not permitted. There were two remaining atomic flushing contexts after that series. This series tries to eliminate them as well, eliminating atomic rstat flushing completely. The two remaining atomic flushing contexts are: (a) wb_over_bg_thresh()->mem_cgroup_wb_stats() (b) mem_cgroup_threshold()->mem_cgroup_usage() For (a), flushing needs to be atomic as wb_writeback() calls wb_over_bg_thresh() with a spinlock held. However, it seems like the call to wb_over_bg_thresh() doesn't need to be protected by that spinlock, so this series proposes a refactoring that moves the call outside the lock criticial section and makes the stats flushing in mem_cgroup_wb_stats() non-atomic. For (b), flushing needs to be atomic as mem_cgroup_threshold() is called with irqs disabled. We only flush the stats when calculating the root usage, as it is approximated as the sum of some memcg stats (file, anon, and optionally swap) instead of the conventional page counter. This series proposes changing this calculation to use the global stats instead, eliminating the need for a memcg stat flush. After these 2 contexts are eliminated, we no longer need mem_cgroup_flush_stats_atomic() or cgroup_rstat_flush_atomic(). We can remove them and simplify the code. [1] https://lore.kernel.org/linux-mm/20230330191801.1967435-1-yosryahmed@google.com/ This patch (of 5): wb_over_bg_thresh() calls mem_cgroup_wb_stats() which invokes an rstat flush, which can be expensive on large systems. Currently, wb_writeback() calls wb_over_bg_thresh() within a lock section, so we have to do the rstat flush atomically. On systems with a lot of cpus and/or cgroups, this can cause us to disable irqs for a long time, potentially causing problems. Move the call to wb_over_bg_thresh() outside the lock section in preparation to make the rstat flush in mem_cgroup_wb_stats() non-atomic. The list_empty(&wb->work_list) check should be okay outside the lock section of wb->list_lock as it is protected by a separate lock (wb->work_lock), and wb_over_bg_thresh() doesn't seem like it is modifying any of wb->b_* lists the wb->list_lock is protecting. Also, the loop seems to be already releasing and reacquring the lock, so this refactoring looks safe. Link: https://lkml.kernel.org/r/20230421174020.2994750-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230421174020.2994750-2-yosryahmed@google.com Change-Id: Id9dbfad96cfd32f1381d7640e1e2cf62c0a56189 Signed-off-by:
Yosry Ahmed <yosryahmed@google.com> Reviewed-by:
Michal Koutný <mkoutny@suse.com> Reviewed-by:
Jan Kara <jack@suse.cz> Acked-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Tejun Heo <tj@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 2816ea2a) Bug: 322544714 Bug: 389118629 Signed-off-by:
T.J. Mercier <tjmercier@google.com>
-
Yosry Ahmed authored
In some situations, we may end up calling memcg_rstat_updated() with a value of 0, which means the stat was not actually updated. An example is if we fail to reclaim any pages in shrink_folio_list(). Do not add the cgroup to the rstat updated tree in this case, to avoid unnecessarily flushing it.

Link: https://lkml.kernel.org/r/20230330191801.1967435-9-yosryahmed@google.com
Change-Id: I8142aa6c2962e5af590a5e93f89fbf2313d4b741
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Michal Koutný <mkoutny@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit f9d911ca)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
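The change boils down to an early return in memcg_rstat_updated() in mm/memcontrol.c, roughly:

    static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
    {
            /* A zero delta means no stat actually changed: don't put this
             * memcg on the per-cpu updated tree just to flush nothing later. */
            if (!val)
                    return;

            cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
            /* ... accumulate |val| into stats_flush_threshold as before ... */
    }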
Yosry Ahmed authored
Memory reclaim is a sleepable context. Flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically unnecessarily. This can slow down reclaim code if flushing stats is taking too long, but there are already multiple cond_resched()'s in reclaim code.

Link: https://lkml.kernel.org/r/20230330191801.1967435-8-yosryahmed@google.com
Change-Id: Ia0f0d42131e67a060dc7c9b868ef5247d78e05c8
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 0d856cfe)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
In workingset_refault(), we call mem_cgroup_flush_stats_atomic_ratelimited() to read accurate stats within an RCU read section and with sleeping disallowed. Move the call above the RCU read section to make it non-atomic.

Flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically where possible.

Since workingset_refault() is the only caller of mem_cgroup_flush_stats_atomic_ratelimited(), just make it non-atomic, and rename it to mem_cgroup_flush_stats_ratelimited().

Link: https://lkml.kernel.org/r/20230330191801.1967435-7-yosryahmed@google.com
Change-Id: Ia073b9bcef12dfced053ebea7a5da96ce942596c
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 4009b2f1)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
Currently, all contexts that flush memcg stats do so with sleeping not allowed. Some of these contexts are perfectly safe to sleep in, such as reading cgroup files from userspace or the background periodic flusher. Flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system, so avoid doing it atomically where possible.

Refactor the code to make mem_cgroup_flush_stats() non-atomic (aka sleepable), and provide a separate atomic version. The atomic version is used in reclaim, refault, writeback, and in mem_cgroup_usage(). All other code paths are left to use the non-atomic version. This includes callbacks for userspace reads and the periodic flusher.

Since refault is the only caller of mem_cgroup_flush_stats_ratelimited(), change it to mem_cgroup_flush_stats_atomic_ratelimited(). Reclaim and refault code paths are modified to do non-atomic flushing in separate later patches -- so it will eventually be changed back to mem_cgroup_flush_stats_ratelimited().

Link: https://lkml.kernel.org/r/20230330191801.1967435-6-yosryahmed@google.com
Change-Id: I9c28e852e1a37202fbd3ee419c72acf667d63404
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 9fad9aee)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
As Johannes notes in [1], stats_flush_lock is currently used to:
(a) Protect updates to stats_flush_threshold.
(b) Protect updates to flush_next_time.
(c) Serialize calls to cgroup_rstat_flush() based on those ratelimits.

However:

1. stats_flush_threshold is already an atomic.

2. flush_next_time is not atomic. The writer is locked, but the reader is lockless. If the reader races with a flush, you could see this:

        if (time_after(jiffies, flush_next_time))
        spin_trylock()
         flush_next_time = now + delay
         flush()
        spin_unlock()
                                        spin_trylock()
                                         flush_next_time = now + delay
                                         flush()
                                        spin_unlock()

   which means we already can get flushes at a higher frequency than FLUSH_TIME during races. But it isn't really a problem. The reader could also see garbled partial updates if the compiler decides to split the write, so it needs at least READ_ONCE and WRITE_ONCE protection.

3. Serializing cgroup_rstat_flush() calls against the ratelimit factors is currently broken because of the race in 2. But the race is actually harmless; all we might get is the occasional earlier flush. If there is no delta, the flush won't do much. And if there is, the flush is justified.

So the lock can be removed altogether. However, the lock also served the purpose of preventing a thundering herd problem for concurrent flushers, see [2]. Use an atomic instead to serve the purpose of unifying concurrent flushers.

[1] https://lore.kernel.org/lkml/20230323172732.GE739026@cmpxchg.org/
[2] https://lore.kernel.org/lkml/20210716212137.1391164-2-shakeelb@google.com/

Link: https://lkml.kernel.org/r/20230330191801.1967435-5-yosryahmed@google.com
Change-Id: I98e8344b440486162426186c4abdf21e02eebd43
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 3cd9992b)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
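The replacement for the lock looks roughly like this sketch of mm/memcontrol.c (paraphrased; at this point in the series the flush itself is still the atomic variant):

    static atomic_t stats_flush_ongoing = ATOMIC_INIT(0);

    static void __mem_cgroup_flush_stats(void)
    {
            /* A flush is already in flight: piggyback on it instead of
             * piling up behind a lock (the old thundering-herd problem). */
            if (atomic_read(&stats_flush_ongoing) ||
                atomic_xchg(&stats_flush_ongoing, 1))
                    return;

            WRITE_ONCE(flush_next_time, jiffies_64 + 2 * FLUSH_TIME);
            cgroup_rstat_flush_atomic(root_mem_cgroup->css.cgroup);
            atomic_set(&stats_flush_threshold, 0);
            atomic_set(&stats_flush_ongoing, 0);
    }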
Yosry Ahmed authored
Currently, the only context in which we can invoke an rstat flush from irq context is through mem_cgroup_usage() on the root memcg when called from memcg_check_events(). An rstat flush is an expensive operation that should not be done in irq context, so do not flush stats and use the stale stats in this case.

Arguably, usage threshold events are not reliable on the root memcg anyway since its usage is ill-defined.

Link: https://lkml.kernel.org/r/20230330191801.1967435-4-yosryahmed@google.com
Change-Id: If230311168f126e3741afaeab1f20cb1949190f0
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit a2174e95)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
mem_cgroup_flush_stats_delayed() suggests this is using a delayed_work, but it actually sometimes flushes directly from the callsite. What it is doing is ratelimited calls. A better name would be mem_cgroup_flush_stats_ratelimited().

Link: https://lkml.kernel.org/r/20230330191801.1967435-3-yosryahmed@google.com
Change-Id: Ib418ae17599d10c520b766f0c0e4396a43906256
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Koutný <mkoutny@suse.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vasily Averin <vasily.averin@linux.dev>
Cc: Zefan Li <lizefan.x@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(cherry picked from commit 92fbbc72)
Bug: 322544714
Bug: 389118629
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Yosry Ahmed authored
Patch series "memcg: avoid flushing stats atomically where possible", v3. rstat flushing is an expensive operation that scales with the number of cpus and the number of cgroups in the system. The purpose of this series is to minimize the contexts where we flush stats atomically. Patches 1 and 2 are cleanups requested during reviews of prior versions of this series. Patch 3 makes sure we never try to flush from within an irq context. Patches 4 to 7 introduce separate variants of mem_cgroup_flush_stats() for atomic and non-atomic flushing, and make sure we only flush the stats atomically when necessary. Patch 8 is a slightly tangential optimization that limits the work done by rstat flushing in some scenarios. This patch (of 8): cgroup_rstat_flush_irqsafe() can be a confusing name. It may read as "irqs are disabled throughout", which is what the current implementation does (currently under discussion [1]), but is not the intention. The intention is that this function is safe to call from atomic contexts. Name it as such. Link: https://lkml.kernel.org/r/20230330191801.1967435-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20230330191801.1967435-2-yosryahmed@google.com Change-Id: I7a030bc657b330ce700a29ded19f995e26f3aec1 Signed-off-by:
Yosry Ahmed <yosryahmed@google.com> Suggested-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Vasily Averin <vasily.averin@linux.dev> Cc: Zefan Li <lizefan.x@bytedance.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> (cherry picked from commit 8bff9a04) Bug: 322544714 Bug: 389118629 Signed-off-by:
T.J. Mercier <tjmercier@google.com>
-
- Jan 10, 2025
Qun-Wei Lin authored
commit 70457385 upstream.

This patch addresses an issue introduced by commit 1a83a716 ("mm: krealloc: consider spare memory for __GFP_ZERO") which causes MTE (Memory Tagging Extension) to falsely report a slab-out-of-bounds error.

The problem occurs when zeroing out spare memory in __do_krealloc. The original code only considered software-based KASAN and did not account for MTE. It does not reset the KASAN tag before calling memset, leading to a mismatch between the pointer tag and the memory tag, resulting in a false positive.

Example of the error:

    ==================================================================
    swapper/0: BUG: KASAN: slab-out-of-bounds in __memset+0x84/0x188
    swapper/0: Write at addr f4ffff8005f0fdf0 by task swapper/0/1
    swapper/0: Pointer tag: [f4], memory tag: [fe]
    swapper/0:
    swapper/0: CPU: 4 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.
    swapper/0: Hardware name: MT6991(ENG) (DT)
    swapper/0: Call trace:
    swapper/0:  dump_backtrace+0xfc/0x17c
    swapper/0:  show_stack+0x18/0x28
    swapper/0:  dump_stack_lvl+0x40/0xa0
    swapper/0:  print_report+0x1b8/0x71c
    swapper/0:  kasan_report+0xec/0x14c
    swapper/0:  __do_kernel_fault+0x60/0x29c
    swapper/0:  do_bad_area+0x30/0xdc
    swapper/0:  do_tag_check_fault+0x20/0x34
    swapper/0:  do_mem_abort+0x58/0x104
    swapper/0:  el1_abort+0x3c/0x5c
    swapper/0:  el1h_64_sync_handler+0x80/0xcc
    swapper/0:  el1h_64_sync+0x68/0x6c
    swapper/0:  __memset+0x84/0x188
    swapper/0:  btf_populate_kfunc_set+0x280/0x3d8
    swapper/0:  __register_btf_kfunc_id_set+0x43c/0x468
    swapper/0:  register_btf_kfunc_id_set+0x48/0x60
    swapper/0:  register_nf_nat_bpf+0x1c/0x40
    swapper/0:  nf_nat_init+0xc0/0x128
    swapper/0:  do_one_initcall+0x184/0x464
    swapper/0:  do_initcall_level+0xdc/0x1b0
    swapper/0:  do_initcalls+0x70/0xc0
    swapper/0:  do_basic_setup+0x1c/0x28
    swapper/0:  kernel_init_freeable+0x144/0x1b8
    swapper/0:  kernel_init+0x20/0x1a8
    swapper/0:  ret_from_fork+0x10/0x20
    ==================================================================

Bug: 388132060
(cherry picked from commit 486aeb5f https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ linux-6.1.y)
Fixes: 1a83a716 ("mm: krealloc: consider spare memory for __GFP_ZERO")
Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Seiya Wang <seiya.wang@mediatek.com>
Change-Id: Iea0ba629183042d594665ab51b410965963d167e
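The fix described here is essentially one line in __do_krealloc(), sketched below; kasan_reset_tag() strips the pointer's tag bits so the memset() write is not checked against whatever tag the spare memory currently carries under MTE (condition shown as I understand the original commit):

    /* Zero out the spare region [ks, new_size) of the old allocation. */
    if (flags & __GFP_ZERO && new_size > ks)
            memset(kasan_reset_tag(p) + ks, 0, new_size - ks);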
Seiya Wang authored
Add bt_sock_linked to protected symbol list.

Bug: 387804010
Bug: 388980392
Change-Id: I96abbc18d9cb122708a07d80ae9f8fa2da276ef2
Signed-off-by: Seiya Wang <seiya.wang@mediatek.com>
(cherry picked from commit 770852bf)
- Jan 08, 2025
Dan Carpenter authored
commit f7d306b4 upstream.

The usb_get_descriptor() function does DMA, so we're not allowed to use a stack buffer for that. Doing DMA to the stack is not portable to all architectures. Move the "new_device_descriptor" from being stored on the stack and allocate it with kmalloc() instead.

Bug: 382243530
Fixes: b909df18 ("ALSA: usb-audio: Fix potential out-of-bound accesses for Extigy and Mbox devices")
Cc: stable@kernel.org
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/60e3aa09-039d-46d2-934c-6f123026c2eb@stanley.mountain
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Benoît Sevens <bsevens@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 44a7b041)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I469212aa538584e3d8cc5b0087b68c99acf43f64
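The resulting pattern looks roughly like this sketch of the quirk code in sound/usb/quirks.c (simplified; error handling abbreviated):

    struct usb_device_descriptor *new_device_descriptor;
    int err;

    /* DMA-safe: heap allocation instead of an on-stack descriptor */
    new_device_descriptor = kmalloc(sizeof(*new_device_descriptor), GFP_KERNEL);
    if (!new_device_descriptor)
            return -ENOMEM;

    /* usb_get_descriptor() DMAs into the buffer, so it must not be on the stack */
    err = usb_get_descriptor(dev, USB_DT_DEVICE, 0, new_device_descriptor,
                             sizeof(*new_device_descriptor));
    if (err < 0)
            dev_dbg(&dev->dev, "error usb_get_descriptor: %d\n", err);
    kfree(new_device_descriptor);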
Benoît Sevens authored
commit b909df18 upstream.

A bogus device can provide a bNumConfigurations value that exceeds the initial value used in usb_get_configuration for allocating dev->config. This can lead to out-of-bounds accesses later, e.g. in usb_destroy_configuration.

Bug: 382243530
Signed-off-by: Benoît Sevens <bsevens@google.com>
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Cc: stable@kernel.org
Link: https://patch.msgid.link/20241120124144.3814457-1-bsevens@google.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b8f8b81d)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I1aa1a442a5c87116200dcab02f84e1bd48f86bb5
- Dec 19, 2024
Takashi Iwai authored
commit a3dd4d63 upstream.

The current USB-audio driver code doesn't check bLength of each descriptor at traversing for clock descriptors. That is, when a device provides a bogus descriptor with a shorter bLength, the driver might hit out-of-bounds reads.

For addressing it, this patch adds sanity checks to the validator functions for the clock descriptor traversal. When the descriptor length is shorter than expected, it's skipped in the loop. For the clock source and clock multiplier descriptors, we can just check bLength against the sizeof() of each descriptor type. OTOH, the clock selector descriptor of UAC2 and UAC3 has an array of bNrInPins elements and two more fields at its tail, hence those have to be checked in addition to the sizeof() check.

Bug: 382239029
Reported-by: Benoît Sevens <bsevens@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/20241121140613.3651-1-bsevens@google.com
Link: https://patch.msgid.link/20241125144629.20757-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 74cb86e1)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I13e916ffd46fce6fd08f7b9f96cea82bb4bc475d
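For the UAC2 clock selector, the check described above amounts to something like the following sketch (not the literal sound/usb/validate.c diff; the struct layout follows include/linux/usb/audio-v2.h, where baCSourceID is a flexible array followed by bmControls and iClockSelector):

    static bool validate_clock_selector_v2(const void *p)
    {
            const struct uac_clock_selector_descriptor *cs = p;

            /* the fixed header must be complete before bNrInPins is trusted */
            if (cs->bLength < sizeof(*cs))
                    return false;
            /* variable-length pin array plus the two trailing bytes */
            return cs->bLength >= sizeof(*cs) + cs->bNrInPins + 2;
    }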
- Dec 16, 2024
Seiya Wang authored
1 function symbol(s) added:
    'unsigned int tty_buffer_space_avail(struct tty_port*)'

Bug: 383039133
Bug: 384396059
Change-Id: Ic92096ecc482c120b8cda325a7f2ae44f99e8527
Signed-off-by: Seiya Wang <seiya.wang@mediatek.com>
(cherry picked from commit d8ccecb7)
Seiya Wang authored
1 function symbol(s) added:
    'int snd_pcm_format_set_silence(snd_pcm_format_t, void*, unsigned int)'

Bug: 382303912
Bug: 384396059
Signed-off-by: Seiya Wang <seiya.wang@mediatek.com>
Change-Id: I7de7c367b52114b9770697cf3eaeb47008a0e52b
(cherry picked from commit c276c539)
Seiya Wang authored
5 function symbol(s) added:
    'void* devm_memremap_pages(struct device*, struct dev_pagemap*)'
    'int rproc_detach(struct rproc*)'
    'struct socket* tun_get_socket(struct file*)'
    'struct ptr_ring* tun_get_tx_ring(struct file*)'
    'void tun_ptr_free(void*)'

Bug: 381284575
Bug: 384396059
Signed-off-by: Seiya Wang <seiya.wang@mediatek.com>
Change-Id: I593a5dd82f5b7fde7bb3ebbd74fead839fcfdf9f
(cherry picked from commit cdea241b)
- Dec 12, 2024
Jiri Kosina authored
[ Upstream commit 177f25d1 ]

Since the report buffer is used by all kinds of drivers in various ways, let's zero-initialize it during allocation to make sure that it can't ever be used to leak kernel memory via a specially-crafted report.

Bug: 380395346
Fixes: 27ce4050 ("HID: fix data access in implement()")
Reported-by: Benoît Sevens <bsevens@google.com>
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Signed-off-by: Jiri Kosina <jkosina@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9d9f5c75)
Signed-off-by: Lee Jones <joneslee@google.com>
Change-Id: I31f64f2745347137bbc415eb35b7fab5761867f3
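In drivers/hid/hid-core.c this is essentially a one-word change in hid_alloc_report_buf(), roughly:

    u8 *hid_alloc_report_buf(struct hid_report *report, gfp_t flags)
    {
            /* +7 bytes of padding for 64-bit report-number handling;
             * kzalloc (was kmalloc) so stale kernel memory can never
             * leak out through a crafted report */
            u32 len = hid_report_len(report) + 7;

            return kzalloc(len, flags);
    }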
Todd Kjos authored
This reverts commit 62bbb08a.

Reason for revert: b/382800956

Bug: 382800956
Change-Id: Ic7a0cdbb060c12c1628a5859d795e78cd6b9341d
Signed-off-by: Todd Kjos <tkjos@google.com>
(cherry picked from commit c3766284)
Signed-off-by: Lee Jones <joneslee@google.com>
- Dec 02, 2024
Todd Kjos authored
Bug: 380360698
Signed-off-by: Todd Kjos <tkjos@google.com>
Change-Id: I71aeaa612f23804e376a3d9ebe33ac991e62e3b6
- Nov 29, 2024
Quentin Perret authored
Similar to how we failed to cross-check the state from the completer's PoV on the hyp_ack_unshare() path, we fail to do so from host_ack_unshare(). This shouldn't cause problems in practice as this can only be called on the guest_unshare_host() path, and guests currently don't have the ability to share their pages with anybody other than the host. But this again is rather fragile, so let's simply do the proper check -- it isn't very costly thanks to the hyp_vmemmap optimisation.

Bug: 381409114
Change-Id: I3770b7db55c579758863e41f50ab30f6a8bb4a0c
Signed-off-by: Quentin Perret <qperret@google.com>
Quentin Perret authored
There are multiple pKVM memory transitions where the state of a page is not cross-checked from the completer's PoV for performance reasons. For example, if a page is PKVM_PAGE_OWNED from the initiator's PoV, we should be guaranteed by construction that it is PKVM_NOPAGE for everybody else, hence allowing us to save a page-table lookup.

When it was introduced, hyp_ack_unshare() followed that logic and bailed out without checking the PKVM_PAGE_SHARED_BORROWED state in the hypervisor's stage-1. This was correct as we could safely assume that all host-initiated shares were directed at the hypervisor at the time. But with the introduction of other types of shares (e.g. for FF-A or non-protected guests), it is now very much required to cross-check this state to prevent the host from running __pkvm_host_unshare_hyp() on a page shared with TZ or a non-protected guest.

Thankfully, if an attacker were to try this, the hyp_unmap() call from hyp_complete_unshare() would fail, causing a WARN() from __do_unshare() with the host lock held, which is fatal. But this is fragile at best, and can hardly be considered a security measure. Let's just do the right thing and always check the state from hyp_ack_unshare().

Bug: 381409114
Link: https://lore.kernel.org/kvmarm/20241128154406.602875-1-qperret@google.com/
Change-Id: Id3bbd1fc3c75df506b0919f4d6f7be74b6f013f3
Signed-off-by: Quentin Perret <qperret@google.com>
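In arch/arm64/kvm/hyp/nvhe/mem_protect.c the always-check version would look roughly like this sketch (helper and state names follow the existing pKVM code; the real function also handles the transition descriptor in more detail):

    static int hyp_ack_unshare(u64 addr, const struct pkvm_mem_transition *tx)
    {
            u64 size = tx->nr_pages * PAGE_SIZE;

            /* No initiator-only fast path anymore: always verify that the
             * hypervisor really sees these pages as borrowed from the host. */
            return __hyp_check_page_state_range(addr, size,
                                                PKVM_PAGE_SHARED_BORROWED);
    }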