  1. Mar 04, 2025
• ANDROID: cma: Change restrict_cma_redirect to be set by default · ce561872
      Suren Baghdasaryan authored
      
When the restrict_cma_redirect boot parameter was introduced, its
default was set to follow the upstream behavior, which does not
restrict any movable allocation from using CMA. However, this poses an
issue when partners upgrade from previous Android kernels and expect
the earlier behavior, which restricts CMA usage to movable allocations
with __GFP_CMA.
Change the default value of the restrict_cma_redirect boot parameter
to keep backward compatibility with earlier ACK versions. Partners who
need the upstream behavior will need to set restrict_cma_redirect=false
explicitly.
      
      Bug: 399727765
      Change-Id: Ia88008578557e38da54a455bc4ce3dc6f86fe52e
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
• ANDROID: cma_get_first_virtzone_base: fix !CONFIG_CMA error · 3c4e8061
      Kalesh Singh authored
      
      cma_get_first_virtzone_base() was not defined if CONFIG_CMA is not set.
      
      Define it to return 0 (no-op) if !CONFIG_CMA.
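
A minimal sketch of the stub pattern this describes (the return type is
an assumption for illustration; the commit only says it returns 0):

 #ifdef CONFIG_CMA
 phys_addr_t cma_get_first_virtzone_base(void);
 #else
 static inline phys_addr_t cma_get_first_virtzone_base(void)
 {
 	return 0;	/* no CMA configured: nothing to report */
 }
 #endif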
      
      Bug: 400651191
      Bug: 313807618
      Test: tools/bazel run --lto=none
                 //common:kernel_aarch64_microdroid_16k_dist
      Change-Id: I9083ece26d60cc967baf93e8e29b16d6b0901a15
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• ANDROID: GKI: Update symbol list for xiaomi · 8f602e19
      pengzhongcui authored
      
      2 variable symbol(s) added
        'struct tracepoint __tracepoint_android_vh_tune_swappiness'
        'struct tracepoint __tracepoint_android_vh_shrink_slab_async'
      
      Bug: 399777353
      
      Change-Id: If3fb7fa00349160e5b939b53208725396237c999
Signed-off-by: pengzhongcui <pengzhongcui@xiaomi.corp-partner.google.com>
• ANDROID: vendor_hook: Add hook to optimize the time consumption of shrink slab · 05ab4ba8
      pengzhongcui authored
      
One vendor hook added:
    android_vh_do_shrink_slab_ex
      
Add a vendor hook point in do_shrink_slab() to optimize for
user-experience-related threads and time-consuming shrinkers.
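
A hedged sketch of how a vendor module would typically attach to an ACK
vendor hook like this one; the handler signature is an assumption for
illustration, since the commit does not show the hook's prototype:

 #include <trace/hooks/vmscan.h>

 /* assumed handler signature: hook data plus the shrinker in question */
 static void do_shrink_slab_ex_handler(void *data,
 				      struct shrinker *shrinker, bool *bypass)
 {
 	/* vendor policy, e.g. defer slow shrinkers on UX-critical paths */
 }

 static int __init vendor_shrink_init(void)
 {
 	/* register_trace_android_vh_<name>() is the generated helper */
 	return register_trace_android_vh_do_shrink_slab_ex(
 			do_shrink_slab_ex_handler, NULL);
 }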
      
      Bug: 399777353
      
      Change-Id: I63778c73f76930fe27869e33ba6cdb97d50cf543
Signed-off-by: pengzhongcui <pengzhongcui@xiaomi.corp-partner.google.com>
• ANDROID: mm: Allow non-movable allocations to use virtual zones · 46d2359d
      Tangquan Zheng authored
      
      In some cases, our demand for mTHP is not as urgent, while the demand
      for other resources, such as dma-buf, becomes more pressing. However,
      the reserved virtual zones may not be efficiently utilized by dma-buf
      and similar use cases. Therefore, if we are absolutely certain that
      the product will not require movable zones, we can allow virtual zones
      to be allocated for requests with unmovable flags.
      
      After supporting the use of virtual zones for non-movable allocations,
      we need to address the large page migration issue triggered by the
      pin_user_pages function:
During large folio performance profiling, we found unnecessary
performance loss due to a large number of migrations caused by the
pin_user_pages function during boot-up and continuous application
startup testing.
      Call trace:
      	dump_stack+0x18/0x24
      	folio_add_anon_rmap_ptes+0x294/0x338
      	remove_migration_pte+0x268/0x514
      	rmap_walk_anon+0x1c8/0x278
      	rmap_walk+0x28/0x38
      	migrate_pages_batch+0xbc8/0x126c
      	migrate_pages+0x16c/0x7a4
      	__gup_longterm_locked+0x4a4/0x85c
      	pin_user_pages+0x68/0xc4
      	gup_local_repeat+0x38/0x1cc [mcDrvModule_ffa]
      	tee_mmu_create+0x368/0x804 [mcDrvModule_ffa]
      	client_mmu_create+0x58/0xd4 [mcDrvModule_ffa]
      	wsm_create+0x44/0x11c [mcDrvModule_ffa]
      	session_mc_map+0x174/0x244 [mcDrvModule_ffa]
      	client_mc_map+0x34/0x64 [mcDrvModule_ffa]
      	user_ioctl+0x704/0x830 [mcDrvModule_ffa]
      	__arm64_sys_ioctl+0xa8/0xe4
      
With TAO enabled, large folios come from either the ZONE_NOSPLIT or the
ZONE_NOMERGE policy zone. Both policy zones are movable zones (refer to
folio_is_zone_movable). Therefore folio_is_longterm_pinnable() returns
false, and the large folio is added to the movable_page_list and
migrated in the migrate_longterm_unpinnable_pages() function.
On the other hand, migrating a large folio is costly and involves
frequent calls to the deferred_split_folio() function. What's worse,
the dst folio is still allocated from the two policy zones, since the
gfp flag is set to GFP_TRANSHUGE in the alloc_migration_target()
function. So the whole migration behaviour becomes meaningless!
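
For reference, a simplified sketch of the mainline long-term pin path
that triggers these migrations (not the exact ACK/TAO code):

 /* in the long-term pin path of GUP, simplified */
 if (!folio_is_longterm_pinnable(folio)) {
 	/* folios in movable/policy zones fail this check ... */
 	list_add_tail(&folio->lru, &movable_page_list);
 	/* ... and are migrated out before the pin is granted */
 }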
      
      Bug: 313807618
      Change-Id: I2fdfc4df8b03daa96fd6c2c8c6630d26a8509ad0
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Shuai Yuanyuan <yuanshuai@oppo.com>
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
• ANDROID: mm: Skip the virtzones area when reserving the CMA region · cb268664
      Tangquan Zheng authored
      
      We have discovered a bug: when there is an overlap between CMA and TAO,
      the cma_alloc() function internally causes splitting, which directly
      triggers a kernel warning.
      	Call trace:
      	split_free_page+0x29c/0x2f0
      	isolate_single_pageblock+0x38c/0x478
      	start_isolate_page_range+0x8c/0x178
      	alloc_contig_range+0xf8/0x2f0
      	__cma_alloc+0x3dc/0x668
      	cma_alloc+0x28/0x40
      	dma_alloc_from_contiguous+0x4c/0x60
      	atomic_pool_expand+0x9c/0x338
      	__dma_atomic_pool_init+0x54/0xc8
      	dma_atomic_pool_init+0xb8/0x200
      	do_one_initcall+0x80/0x360
      	kernel_init_freeable+0x2ac/0x568
      	kernel_init+0x2c/0x1f0
      	ret_from_fork+0x10/0x20
      The fix for this issue is to skip the virtzones area when reserving the CMA region,
      ensuring that CMA and virtzones do not overlap.
      
      Bug: 313807618
      Change-Id: I121c75defa6652777491818fcad1e87d14d0f02f
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
• ANDROID: mm: userfaultfd: fix userfaultfd_move while large folios are from virtual zones · c6d8de0d
      Tangquan Zheng authored
      
When large folios originate from virtual zones, split_folio() migrates
them into nr_pages small folios and returns a value greater than 0. In
this case, we should retry the move operation using the new small
folios as sources. Otherwise, this may trigger a kernel BUG.
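
A hedged sketch of the retry described above (control flow only; the
surrounding move_pages() logic is omitted):

 err = split_folio(src_folio);
 if (err > 0) {
 	/* the large folio was replaced by err small folios: retry
 	 * the move with the new small folios as sources */
 	goto retry;
 }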
      
      [   64.788670] ------------[ cut here ]------------
      [   64.789179] WARNING: CPU: 0 PID: 126 at mm/userfaultfd.c:1760 move_pages+0x2bc/0x1960
      [   64.790059] Modules linked in:
      [   64.790866] CPU: 0 PID: 126 Comm: a.out Tainted: G        W          6.6.66-g29bb63ce7190-dirty #216
      [   64.791467] Hardware name: linux,dummy-virt (DT)
      [   64.791933] pstate: 21402005 (nzCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
      [   64.792412] pc : move_pages+0x2bc/0x1960
      [   64.792810] lr : move_pages+0x1a4/0x1960
      [   64.793194] sp : ffff800083ffbbc0
      [   64.793552] x29: ffff800083ffbc50 x28: 0000ffff850a0000 x27: 0000000000000001
      [   64.794412] x26: 0000ffff850b1000 x25: ffff00000576cd80 x24: ffff80008275aaf0
      [   64.795182] x23: ffff0000057625e8 x22: 0000000000000000 x21: ffff000005762a20
      [   64.795951] x20: 0000000000001000 x19: 0000ffff850b0000 x18: 0000000000000000
      [   64.796738] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000028
      [   64.797534] x14: 000000000000467c x13: 0000000000004679 x12: ffff8000834693c8
      [   64.798309] x11: 0000000000000000 x10: ffff8000825bdc20 x9 : ffff8000803e893c
      [   64.799107] x8 : ffff000005123900 x7 : ffff8000826c3000 x6 : 0000000000000000
      [   64.799882] x5 : 0000000000000001 x4 : ffff000005123900 x3 : ffff8000825bc008
      [   64.800665] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000005123900
      [   64.801525] Call trace:
      [   64.801929]  move_pages+0x2bc/0x1960
      [   64.802346]  userfaultfd_ioctl+0x484/0x1b98
      [   64.802742]  __arm64_sys_ioctl+0xb4/0x100
      [   64.803137]  invoke_syscall+0x50/0x120
      [   64.803521]  el0_svc_common.constprop.0+0x48/0xf0
      [   64.803916]  do_el0_svc+0x24/0x38
      [   64.804288]  el0_svc+0x58/0x148
      [   64.804659]  el0t_64_sync_handler+0x120/0x130
      [   64.805041]  el0t_64_sync+0x1a4/0x1a8
      [   64.805472] irq event stamp: 492
      [   64.805830] hardirqs last  enabled at (491): [<ffff8000803ce140>] uncharge_batch+0xd0/0x198
      [   64.806333] hardirqs last disabled at (492): [<ffff8000816761dc>] el1_dbg+0x24/0x98
      [   64.806829] softirqs last  enabled at (486): [<ffff800080063368>] handle_softirqs+0x548/0x570
      [   64.807323] softirqs last disabled at (475): [<ffff800080010934>] __do_softirq+0x1c/0x28
      [   64.807813] ---[ end trace 0000000000000000 ]---
      
      Bug: 313807618
      Change-Id: Ia8aef8301ed2c8bad3ce690f129c55788330cd26
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
• ANDROID: GKI: Update oplus symbol list · 6f195bb9
      Tangquan Zheng authored
      
      10 function symbol(s) added
        'int __ptep_set_access_flags(struct vm_area_struct*, unsigned long, pte_t*, pte_t, int)'
        'int __traceiter_android_vh_alloc_swap_folio_gfp(void*, struct vm_area_struct*, gfp_t*)'
        'int __traceiter_android_vh_get_swap_pages_bypass(void*, struct swap_info_struct*, int, bool*)'
        'int __traceiter_android_vh_replace_anon_vma_name(void*, struct vm_area_struct*, struct anon_vma_name*)'
        'int __traceiter_android_vh_reuse_whole_anon_folio(void*, struct folio*, struct vm_fault*, bool*)'
        'int __traceiter_android_vh_should_skip_zone(void*, struct zone*, gfp_t, unsigned int, int, bool*)'
        'int __traceiter_android_vh_should_split_folio_to_list(void*, struct folio*, bool*)'
        'int __traceiter_android_vh_update_unmapped_area_info(void*, struct vm_unmapped_area_info*)'
        'pte_t contpte_ptep_get(pte_t*, pte_t)'
        'int contpte_ptep_set_access_flags(struct vm_area_struct*, unsigned long, pte_t*, pte_t, int)'
      
      Bug: 313807618
      Change-Id: I5f2884ec964be2a15e2052dc6b5c55a88b14a424
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
• ANDROID: vendor_hooks: Add hooks to entirely reuse the whole anonymous mTHP · 482ee073
      Tangquan Zheng authored
Barry Song reported a problem here and provided an RFC patch:
https://lore.kernel.org/linux-mm/20240831092339.66085-1-21cnbao@gmail.com/
do_wp_page() has no ability to reuse mTHP, so it will CoW many small
folios when a write-protect fault occurs on a read-only mTHP (for
example, due to fork). This can sometimes waste lots of memory.
David also addressed this in a more generic approach:
https://lkml.kernel.org/r/20240829165627.2256514-1-david@redhat.com

Neither has been merged into mm. Until either David's or Barry's
approach is ready for the mainline, we need this GKI hook to support
the functionality.

android_vh_reuse_whole_anon_folio ---- This vendor hook is used to
entirely reuse the whole anonymous mTHP in do_wp_page().
We also need to export the symbol __ptep_set_access_flags because it
is called in our vendor hook function.
      
      Bug: 313807618
      Change-Id: I366569dd645a4a9e5f14c0d87e3768959e63ae17
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
• ANDROID: vendor_hooks: Add hooks to enhance mTHP functionality · 58aa298d
      Tangquan Zheng authored
      
We are adding these hooks to customize the mTHP functionality.
1. android_vh_alloc_swap_folio_gfp ---- We use this vendor hook
to update the allocation flags for swapping in large pages.

2. android_vh_get_swap_pages_bypass ---- We use dual zram to avoid
swap fragmentation, as Chris’s swap reservation has not yet been
merged. This vendor hook is used to select different swap devices.

3. android_vh_should_split_folio_to_list ---- This vendor hook is
used to split shared-mapped anonymous mTHP during swap-out.

4. android_vh_should_skip_zone ---- This vendor hook is used to
prevent mTHP from occupying too much non-virtzone memory.

5. android_vh_update_unmapped_area_info ---- This is used to
update vm_unmapped_area_info.

6. android_vh_replace_anon_vma_name ---- This is used to mark
anon_vma_name for mTHP.
      
      Bug: 313807618
      Change-Id: Ibadb440f89dacad91be17ada9bbff8424e9244d3
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
  2. Mar 03, 2025
• ANDROID: ABI: Update symbol list for sunxi · 2f50c21a
      Aran Dalton authored
      
      1 function symbol(s) added
        'void* devm_pci_remap_cfgspace(struct device*, resource_size_t, resource_size_t)'
      
      Bug: 400289337
      Change-Id: Ie641be053339715e699f95829c0a17aa4927242c
Signed-off-by: Aran Dalton <arda@allwinnertech.com>
• ANDROID: KVM: arm64: Enable RCU for pinned pages mtree · ccc8abc3
      Vincent Donnefort authored
      
A maple tree might need memory allocation during insertion. This is not
possible under the rwlock. Therefore, we need to preallocate memory
before that lock is taken. This is done with the KVM_DUMMY_PPAGE
insert_range. However, to do so, we need to enable the maple tree RCU
locking that will ensure the tree is stable between this lock-less
insertion and concurrent tree walks. RCU protection must be enabled
manually.
      
      The sole limitation to the RCU protection is the mt_find() returned
      entry might not be valid. This is however not a problem as all the
      readers are protected from modifiers by the mmu_lock.
      
      On the VM destroy path, we can relax the RCU protection. No vCPU can run
      and as a consequence no concurrent tree access will occur.
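
A minimal sketch of enabling RCU mode with the mainline maple tree API
(illustrative only; the names here are not the pKVM code itself):

 #include <linux/maple_tree.h>
 #include <linux/rcupdate.h>

 static struct maple_tree ppage_mtree;

 static void ppage_mtree_init(void)
 {
 	/* opt the tree into RCU mode so lock-less walkers see a stable tree */
 	mt_init_flags(&ppage_mtree, MT_FLAGS_USE_RCU);
 }

 static void *ppage_find(unsigned long *index)
 {
 	void *entry;

 	/* as noted above, the returned entry may already be stale; readers
 	 * still rely on the mmu_lock to serialize with modifiers */
 	rcu_read_lock();
 	entry = mt_find(&ppage_mtree, index, ULONG_MAX);
 	rcu_read_unlock();
 	return entry;
 }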
      
      Bug: 278749606
      Bug: 395429108
      Change-Id: I8400f3b7bdda76d2a60ddcaeb3ea027607898eb2
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
• UPSTREAM: usb: gadget: core: flush gadget workqueue after device removal · de3fe451
      Roy Luo authored
      
      [ Upstream commit 399a45e5 ]
      
      device_del() can lead to new work being scheduled in gadget->work
      workqueue. This is observed, for example, with the dwc3 driver with the
      following call stack:
        device_del()
          gadget_unbind_driver()
            usb_gadget_disconnect_locked()
              dwc3_gadget_pullup()
      	  dwc3_gadget_soft_disconnect()
      	    usb_gadget_set_state()
      	      schedule_work(&gadget->work)
      
      Move flush_work() after device_del() to ensure the workqueue is cleaned
      up.
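
A sketch of the reordering the fix describes, as it would look at the
removal site (simplified; surrounding teardown omitted):

 device_del(&gadget->dev);
 /* device_del() can schedule new work on gadget->work, so flush last */
 flush_work(&gadget->work);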
      
      Fixes: 5702f753 ("usb: gadget: udc-core: move sysfs_notify() to a workqueue")
      Cc: stable <stable@kernel.org>
      
      Bug: 400301689
      Change-Id: Icf64956f8a17b1876388546b679cfd203d9701dc
Signed-off-by: Roy Luo <royluo@google.com>
Reviewed-by: Alan Stern <stern@rowland.harvard.edu>
Reviewed-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
      Link: https://lore.kernel.org/r/20250204233642.666991-1-royluo@google.com
      
      
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
      (cherry picked from commit 859cb45a)
Signed-off-by: wei li <sirius.liwei@honor.corp-partner.google.com>
• ANDROID: ABI: Update pixel symbol list · 9dd3f48d
      Carlos Llamas authored
      
      Adding the following symbols:
        - xhci_create_secondary_interrupter
        - xhci_disable_interrupter
        - xhci_enable_interrupter
        - xhci_initialize_ring_info
        - xhci_remove_secondary_interrupter
        - xhci_set_interrupter_moderation
        - xhci_stop_endpoint_sync
      
      Bug: 391779198
      Change-Id: I932747120c850b93468328900db99a1eb7821f47
Signed-off-by: Carlos Llamas <cmllamas@google.com>
      [Lee: Rebased to avoid merge conflict - no changes required]
Signed-off-by: Lee Jones <joneslee@google.com>
• UPSTREAM: usb: gadget: f_midi: Fixing wMaxPacketSize exceeded issue during MIDI bind retries · a0baa34d
      Selvarasu Ganesan authored
      
The current implementation sets the wMaxPacketSize of the bulk in/out
endpoints to 1024 bytes at the end of f_midi_bind(). However, if the
first MIDI bind attempt fails and a rebind is attempted, f_midi_bind()
may fail again because the previous bind left the bulk endpoints'
wMaxPacketSize at 1024 bytes, which exceeds ep->maxpacket_limit: a dwc3
whose TX/RX FIFOs are configured for HS-only operation has a maxpacket
size of 512 bytes for IN/OUT endpoints.
      
      Here the term "rebind" in this context refers to attempting to bind the
      MIDI function a second time in certain scenarios. The situations where
      rebinding is considered include:
      
       * When there is a failure in the first UDC write attempt, which may be
         caused by other functions bind along with MIDI.
       * Runtime composition change : Example : MIDI,ADB to MIDI. Or MIDI to
         MIDI,ADB.
      
This commit addresses the issue by resetting wMaxPacketSize before the
endpoint claim. There is no need to reset all values in the USB
endpoint descriptor structure, as all members except wMaxPacketSize and
bEndpointAddress have predefined values.

This restores the endpoint to its expected configuration, preventing
conflicts with ep->maxpacket_limit. It also aligns with the approach
used in other function drivers, which treat endpoint descriptors as if
they were full speed before the endpoint claim.
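
A hedged sketch of the reset described above (the descriptor variable
names are assumptions for illustration):

 /* before claiming endpoints again, fall back to a full-speed size */
 bulk_in_desc.wMaxPacketSize  = cpu_to_le16(64);
 bulk_out_desc.wMaxPacketSize = cpu_to_le16(64);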
      
      Fixes: 46decc82 ("usb: gadget: unconditionally allocate hs/ss descriptor in bind operation")
      Cc: stable@vger.kernel.org
Signed-off-by: Selvarasu Ganesan <selvarasu.g@samsung.com>
      Link: https://lore.kernel.org/r/20250118060134.927-1-selvarasu.g@samsung.com
      
      
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Bug: 399689221
      Change-Id: Ib90cffc2a0b1a8b25042b4fa7fcad7947bdf0995
      (cherry picked from commit 9e8b2141)
Signed-off-by: Meitao Gao <meitaogao@asrmicro.com>
  3. Feb 28, 2025
• Revert "ANDROID: gki_defconfig: enable CONFIG_KFENCE_STATIC_KEYS" · 511c84d7
      Mukesh Ojha authored
      
      This reverts commit ef1134dd.
      
Some time back, commit cfb00a35 ("arm64: jump_label: Ensure patched
jump_labels are visible to all CPUs") was merged into all applicable
stable branches. It cites a bug where static-key patching is not
synchronized among the CPUs, and fixes it by sending an IPI to all
cores.

KFENCE is one of the users of static keys, and it has recently been
observed that, after the above commit, toggling kfence_allocation_key
sends IPIs to cores that are in low-power mode, which has regressed
power numbers. After disabling CONFIG_KFENCE_STATIC_KEYS we see
workloads improve in the range of 1% - 10%, resulting in 1% - 4% power
savings for a few audio playback, video decode and display cases, with
no regression on benchmarks.
      
      Bug: 394509835
      Change-Id: I8efa3280bf115c33cc957f83ccb8e578730aa5f5
Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com>
• ANDROID: ABI: Update pixel symbol list · 48a5719d
      John Scheible authored
      
      Adding the following symbols:
        - dev_pm_clear_wake_irq
        - dev_pm_set_wake_irq
        - dma_buf_vmap_unlocked
        - dma_buf_vunmap_unlocked
        - kvm_iommu_cma_alloc
        - kvm_iommu_cma_release
        - lru_cache_disable
        - lru_disable_count
        - __traceiter_android_vh_typec_store_partner_src_caps
        - __traceiter_android_vh_typec_tcpm_log
        - __traceiter_android_vh_typec_tcpm_modify_src_caps
        - __tracepoint_android_vh_typec_store_partner_src_caps
        - __tracepoint_android_vh_typec_tcpm_log
        - __tracepoint_android_vh_typec_tcpm_modify_src_caps
        - ufshcd_populate_vreg
        - ufshcd_resume_complete
        - ufshcd_runtime_resume
        - ufshcd_runtime_suspend
        - ufshcd_suspend_prepare
      
      Bug: 399486531
      Change-Id: Idf6a99e32cb9330968310ee0e364985cdcd0e087
Signed-off-by: John Scheible <johnscheible@google.com>
• ANDROID: Fix the KMI breakage caused by commit 72d04bdc · 783d6780
      Yang Yang authored
      
Commit 72d04bdc ("sbitmap: fix io hung due to race on sbitmap_word
::cleared") broke the KMI by adding spinlock_t swap_lock directly to
struct sbitmap_word in sbitmap.h. To achieve the same functionality
without breaking the KMI, we instead allocate a single block of memory
of size map_nr * (sizeof(*sb->map) + sizeof(spinlock_t)) so that each
struct sbitmap_word still receives protection from a spinlock.
The actual memory layout used is as follows:
-----------------------
struct sbitmap_word[0]
.......................
struct sbitmap_word[n]
-----------------------
spinlock_t swap_lock[0]
.......................
spinlock_t swap_lock[n]
-----------------------
sbitmap_word[0] corresponds to swap_lock[0], sbitmap_word[n]
corresponds to swap_lock[n], and so on.
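
A minimal sketch of the single-allocation layout described above (the
allocator choice and loop scaffolding are illustrative):

 size_t sz = map_nr * (sizeof(*sb->map) + sizeof(spinlock_t));
 spinlock_t *swap_locks;
 int i;

 sb->map = kvzalloc_node(sz, GFP_KERNEL, node);
 /* the lock array lives immediately after the word array */
 swap_locks = (spinlock_t *)(sb->map + map_nr);
 for (i = 0; i < map_nr; i++)
 	spin_lock_init(&swap_locks[i]);	/* swap_locks[i] guards sb->map[i] */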
      
      Fixes: ea86ea2c ("sbitmap: ammortize cost of clearing bits")
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      
      Bug: 382398521
      Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com
      
      
      Change-Id: Idcab0dd5fd7c3147efd05dd6cc51757c2b0464f6
Signed-off-by: liuyu <liuyu@allwinnertech.com>
• UPSTREAM: sbitmap: fix io hung due to race on sbitmap_word::cleared · 378c357b
      Yang Yang authored
      
      Configuration for sbq:
        depth=64, wake_batch=6, shift=6, map_nr=1
      
      1. There are 64 requests in progress:
        map->word = 0xFFFFFFFFFFFFFFFF
      2. After all the 64 requests complete, and no more requests come:
        map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF
      3. Now two tasks try to allocate requests:
        T1:                                       T2:
        __blk_mq_get_tag                          .
        __sbitmap_queue_get                       .
        sbitmap_get                               .
        sbitmap_find_bit                          .
        sbitmap_find_bit_in_word                  .
        __sbitmap_get_word  -> nr=-1              __blk_mq_get_tag
        sbitmap_deferred_clear                    __sbitmap_queue_get
        /* map->cleared=0xFFFFFFFFFFFFFFFF */     sbitmap_find_bit
          if (!READ_ONCE(map->cleared))           sbitmap_find_bit_in_word
            return false;                         __sbitmap_get_word -> nr=-1
          mask = xchg(&map->cleared, 0)           sbitmap_deferred_clear
          atomic_long_andnot()                    /* map->cleared=0 */
                                                    if (!(map->cleared))
                                                      return false;
                                           /*
                                            * map->cleared is cleared by T1
                                            * T2 fail to acquire the tag
                                            */
      
      4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken
      up due to the wake_batch being set at 6. If no more requests come, T1
      will wait here indefinitely.
      
This patch achieves two purposes:
1. The check on ->cleared and the updates on both ->cleared and ->word
need to be done atomically, and using a spinlock could be the simplest
solution (see the sketch below).
2. Add an extra check in sbitmap_deferred_clear() to identify whether
->word has free bits.
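
A hedged sketch of the serialized clear from purpose 1 (simplified; the
upstream function takes additional arguments for the extra check):

 static bool sbitmap_deferred_clear(struct sbitmap_word *map)
 {
 	unsigned long mask, flags;
 	bool ret = false;

 	spin_lock_irqsave(&map->swap_lock, flags);
 	if (!map->cleared)
 		goto out;
 	/* grab the deferred bits and fold them into ->word under the same
 	 * lock, closing the window shown in the T1/T2 trace above */
 	mask = xchg(&map->cleared, 0);
 	atomic_long_andnot(mask, (atomic_long_t *)&map->word);
 	ret = true;
 out:
 	spin_unlock_irqrestore(&map->swap_lock, flags);
 	return ret;
 }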
      
      Fixes: ea86ea2c ("sbitmap: ammortize cost of clearing bits")
Signed-off-by: Yang Yang <yang.yang@vivo.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Link: https://lore.kernel.org/r/20240716082644.659566-1-yang.yang@vivo.com
      
      
Signed-off-by: Jens Axboe <axboe@kernel.dk>
(cherry picked from commit 72d04bdc)
Signed-off-by: liuyu <liuyu@allwinnertech.com>
Change-Id: Ibab11ef6a94d4db33fae5c4b314b119abc1cabc8
• ANDROID: GKI: Update asr symbol list · 87313319
      meitaogao authored
      
      2 function symbol(s) added
        'struct backlight_device* devm_of_find_backlight(struct device*)'
        'void sdhci_reset_tuning(struct sdhci_host*)'
      
      Bug: 399689222
      Change-Id: I1746ed7c4b0eef2e8f5363681b556af4eb5e7dcb
Signed-off-by: meitaogao <meitaogao@asrmicro.com>
• BACKPORT: FROMGIT: mm: avoid splitting pmd for lazyfree pmd-mapped THP in try_to_unmap · 64d98b55
      Barry Song authored
      The try_to_unmap_one() function currently handles PMD-mapped THPs
      inefficiently.  It first splits the PMD into PTEs, copies the dirty state
      from the PMD to the PTEs, iterates over the PTEs to locate the dirty
      state, and then marks the THP as swap-backed.  This process involves
      unnecessary PMD splitting and redundant iteration.  Instead, this
      functionality can be efficiently managed in
      __discard_anon_folio_pmd_locked(), avoiding the extra steps and improving
      performance.
      
      The following microbenchmark redirties folios after invoking MADV_FREE,
      then measures the time taken to perform memory reclamation (actually set
      those folios swapbacked again) on the redirtied folios.
      
       #include <stdio.h>
       #include <sys/mman.h>
       #include <string.h>
       #include <time.h>
      
       #define SIZE 128*1024*1024  // 128 MB
      
       int main(int argc, char *argv[])
       {
       	while(1) {
       		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
       				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      
       		memset((void *)p, 1, SIZE);
       		madvise((void *)p, SIZE, MADV_FREE);
       		/* redirty after MADV_FREE */
       		memset((void *)p, 1, SIZE);
      
      		clock_t start_time = clock();
       		madvise((void *)p, SIZE, MADV_PAGEOUT);
       		clock_t end_time = clock();
      
       		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
       		printf("Time taken by reclamation: %f seconds\n", elapsed_time);
      
       		munmap((void *)p, SIZE);
       	}
       	return 0;
       }
      
      Testing results are as below,
      w/o patch:
      ~ # ./a.out
      Time taken by reclamation: 0.007300 seconds
      Time taken by reclamation: 0.007226 seconds
      Time taken by reclamation: 0.007295 seconds
      Time taken by reclamation: 0.007731 seconds
      Time taken by reclamation: 0.007134 seconds
      Time taken by reclamation: 0.007285 seconds
      Time taken by reclamation: 0.007720 seconds
      Time taken by reclamation: 0.007128 seconds
      Time taken by reclamation: 0.007710 seconds
      Time taken by reclamation: 0.007712 seconds
      Time taken by reclamation: 0.007236 seconds
      Time taken by reclamation: 0.007690 seconds
      Time taken by reclamation: 0.007174 seconds
      Time taken by reclamation: 0.007670 seconds
      Time taken by reclamation: 0.007169 seconds
      Time taken by reclamation: 0.007305 seconds
      Time taken by reclamation: 0.007432 seconds
      Time taken by reclamation: 0.007158 seconds
      Time taken by reclamation: 0.007133 seconds
      …
      
      w/ patch
      
      ~ # ./a.out
      Time taken by reclamation: 0.002124 seconds
      Time taken by reclamation: 0.002116 seconds
      Time taken by reclamation: 0.002150 seconds
      Time taken by reclamation: 0.002261 seconds
      Time taken by reclamation: 0.002137 seconds
      Time taken by reclamation: 0.002173 seconds
      Time taken by reclamation: 0.002063 seconds
      Time taken by reclamation: 0.002088 seconds
      Time taken by reclamation: 0.002169 seconds
      Time taken by reclamation: 0.002124 seconds
      Time taken by reclamation: 0.002111 seconds
      Time taken by reclamation: 0.002224 seconds
      Time taken by reclamation: 0.002297 seconds
      Time taken by reclamation: 0.002260 seconds
      Time taken by reclamation: 0.002246 seconds
      Time taken by reclamation: 0.002272 seconds
      Time taken by reclamation: 0.002277 seconds
      Time taken by reclamation: 0.002462 seconds
      …
      
      This patch significantly speeds up try_to_unmap_one() by allowing it
      to skip redirtied THPs without splitting the PMD.
      
      Link: https://lkml.kernel.org/r/20250214093015.51024-5-21cnbao@gmail.com
      
      
      Change-Id: Ifaca70178abd5b22e00d6e59ed4dcff0fc5cb0b6
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Suggested-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0@gmail.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Li <chrisl@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Shaoqin Huang <shahuang@redhat.com>
      Cc: Tangquan Zheng <zhengtangquan@oppo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      (cherry picked from commit 76a230cb https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git
      
       mm-unstable)
      Bug: 313807618
      [ Fix trivial conflicts in unmap_huge_pmd_locked() - Kalesh Singh ]
      [ __discard_anon_folio_pmd_locked() drop changes related
to VM_DROPPABLE which doesn't exist on 6.6 - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• BACKPORT: FROMGIT: mm: support batched unmap for lazyfree large folios during reclamation · 5ec30d7e
      Barry Song authored
      Currently, the PTEs and rmap of a large folio are removed one at a time.
      This is not only slow but also causes the large folio to be unnecessarily
      added to deferred_split, which can lead to races between the
      deferred_split shrinker callback and memory reclamation.  This patch
      releases all PTEs and rmap entries in a batch.  Currently, it only handles
      lazyfree large folios.
      
The microbenchmark below tries to reclaim 128MB of lazyfree large
folios whose sizes are 64KiB:
      
       #include <stdio.h>
       #include <sys/mman.h>
       #include <string.h>
       #include <time.h>
      
       #define SIZE 128*1024*1024  // 128 MB
      
       unsigned long read_split_deferred()
       {
       	FILE *file = fopen("/sys/kernel/mm/transparent_hugepage"
      			"/hugepages-64kB/stats/split_deferred", "r");
       	if (!file) {
       		perror("Error opening file");
       		return 0;
       	}
      
       	unsigned long value;
       	if (fscanf(file, "%lu", &value) != 1) {
       		perror("Error reading value");
       		fclose(file);
       		return 0;
       	}
      
       	fclose(file);
       	return value;
       }
      
       int main(int argc, char *argv[])
       {
       	while(1) {
       		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
       				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
      
       		memset((void *)p, 1, SIZE);
      
       		madvise((void *)p, SIZE, MADV_FREE);
      
       		clock_t start_time = clock();
       		unsigned long start_split = read_split_deferred();
       		madvise((void *)p, SIZE, MADV_PAGEOUT);
       		clock_t end_time = clock();
       		unsigned long end_split = read_split_deferred();
      
       		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
       		printf("Time taken by reclamation: %f seconds, split_deferred: %ld\n",
       			elapsed_time, end_split - start_split);
      
       		munmap((void *)p, SIZE);
       	}
       	return 0;
       }
      
      w/o patch:
      ~ # ./a.out
      Time taken by reclamation: 0.177418 seconds, split_deferred: 2048
      Time taken by reclamation: 0.178348 seconds, split_deferred: 2048
      Time taken by reclamation: 0.174525 seconds, split_deferred: 2048
      Time taken by reclamation: 0.171620 seconds, split_deferred: 2048
      Time taken by reclamation: 0.172241 seconds, split_deferred: 2048
      Time taken by reclamation: 0.174003 seconds, split_deferred: 2048
      Time taken by reclamation: 0.171058 seconds, split_deferred: 2048
      Time taken by reclamation: 0.171993 seconds, split_deferred: 2048
      Time taken by reclamation: 0.169829 seconds, split_deferred: 2048
      Time taken by reclamation: 0.172895 seconds, split_deferred: 2048
      Time taken by reclamation: 0.176063 seconds, split_deferred: 2048
      Time taken by reclamation: 0.172568 seconds, split_deferred: 2048
      Time taken by reclamation: 0.171185 seconds, split_deferred: 2048
      Time taken by reclamation: 0.170632 seconds, split_deferred: 2048
      Time taken by reclamation: 0.170208 seconds, split_deferred: 2048
      Time taken by reclamation: 0.174192 seconds, split_deferred: 2048
      ...
      
      w/ patch:
      ~ # ./a.out
      Time taken by reclamation: 0.074231 seconds, split_deferred: 0
      Time taken by reclamation: 0.071026 seconds, split_deferred: 0
      Time taken by reclamation: 0.072029 seconds, split_deferred: 0
      Time taken by reclamation: 0.071873 seconds, split_deferred: 0
      Time taken by reclamation: 0.073573 seconds, split_deferred: 0
      Time taken by reclamation: 0.071906 seconds, split_deferred: 0
      Time taken by reclamation: 0.073604 seconds, split_deferred: 0
      Time taken by reclamation: 0.075903 seconds, split_deferred: 0
      Time taken by reclamation: 0.073191 seconds, split_deferred: 0
      Time taken by reclamation: 0.071228 seconds, split_deferred: 0
      Time taken by reclamation: 0.071391 seconds, split_deferred: 0
      Time taken by reclamation: 0.071468 seconds, split_deferred: 0
      Time taken by reclamation: 0.071896 seconds, split_deferred: 0
      Time taken by reclamation: 0.072508 seconds, split_deferred: 0
      Time taken by reclamation: 0.071884 seconds, split_deferred: 0
      Time taken by reclamation: 0.072433 seconds, split_deferred: 0
      Time taken by reclamation: 0.071939 seconds, split_deferred: 0
      ...
      
      Link: https://lkml.kernel.org/r/20250214093015.51024-4-21cnbao@gmail.com
      
      
      Change-Id: If4df73981837946621ec25247aa426c06ab7dd28
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Li <chrisl@kernel.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Shaoqin Huang <shahuang@redhat.com>
      Cc: Tangquan Zheng <zhengtangquan@oppo.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      (cherry picked from commit a0188db7 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git
      
       mm-unstable)
      Bug: 313807618
      [ Fix conflicts in try_to_unmap_one() - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• BACKPORT: FROMGIT: mm: support tlbbatch flush for a range of PTEs · d1e2d797
      Barry Song authored
      This patch lays the groundwork for supporting batch PTE unmapping in
      try_to_unmap_one().  It introduces range handling for TLB batch flushing,
      with the range currently set to the size of PAGE_SIZE.
      
      The function __flush_tlb_range_nosync() is architecture-specific and is
      only used within arch/arm64.  This function requires the mm structure
      instead of the vma structure.  To allow its reuse by
      arch_tlbbatch_add_pending(), which operates with mm but not vma, this
      patch modifies the argument of __flush_tlb_range_nosync() to take mm as
      its parameter.
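
A sketch of the signature change described above (parameter lists are
abbreviated; the mm-vs-vma switch is the point):

 /* before: tied to a VMA, although only vma->vm_mm was really needed */
 static void __flush_tlb_range_nosync(struct vm_area_struct *vma,
 				     unsigned long start, unsigned long end);

 /* after: takes the mm directly, so arch_tlbbatch_add_pending(),
  * which has an mm but no vma, can reuse it */
 static void __flush_tlb_range_nosync(struct mm_struct *mm,
 				     unsigned long start, unsigned long end);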
      
      Link: https://lkml.kernel.org/r/20250214093015.51024-3-21cnbao@gmail.com
      
      
      Change-Id: Icfca8715feda7e6298a7e2ff76be8ca9a646d8b2
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Acked-by: Will Deacon <will@kernel.org>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Shaoqin Huang <shahuang@redhat.com>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Lance Yang <ioworker0@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Chris Li <chrisl@kernel.org>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
      Cc: Tangquan Zheng <zhengtangquan@oppo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Bug: 313807618
      (cherry picked from commit e00a2e56 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git
      
       mm-unstable)
      [ Drop changes to riscv which don't exist in 6.6.
        Fix trivial conflicts in arm64 tlbflush.h
          - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• BACKPORT: FROMGIT: mm: set folio swapbacked iff folios are dirty in try_to_unmap_one · afbafb14
      Barry Song authored
      Patch series "mm: batched unmap lazyfree large folios during reclamation",
      v4.
      
      Commit 735ecdfa ("mm/vmscan: avoid split lazyfree THP during
      shrink_folio_list()") prevents the splitting of MADV_FREE'd THP in
      madvise.c.
      
      However, those folios are still added to the deferred_split list in
      try_to_unmap_one() because we are unmapping PTEs and removing rmap entries
      one by one.
      
      Firstly, this has rendered the following counter somewhat confusing,
      /sys/kernel/mm/transparent_hugepage/hugepages-size/stats/split_deferred
      The split_deferred counter was originally designed to track operations
      such as partial unmap or madvise of large folios.  However, in practice,
      most split_deferred cases arise from memory reclamation of aligned
      lazyfree mTHPs as observed by Tangquan.  This discrepancy has made the
      split_deferred counter highly misleading.
      
      Secondly, this approach is slow because it requires iterating through each
      PTE and removing the rmap one by one for a large folio.  In fact, all PTEs
      of a pte-mapped large folio should be unmapped at once, and the entire
      folio should be removed from the rmap as a whole.
      
      Thirdly, it also increases the risk of a race condition where lazyfree
      folios are incorrectly set back to swapbacked, as a speculative folio_get
      may occur in the shrinker's callback.
      
      deferred_split_scan() might call folio_try_get(folio) since we have added
      the folio to split_deferred list while removing rmap for the 1st subpage,
      and while we are scanning the 2nd to nr_pages PTEs of this folio in
      try_to_unmap_one(), the entire mTHP could be transitioned back to
      swap-backed because the reference count is incremented, which can make
      "ref_count == 1 + map_count" within try_to_unmap_one() false.
      
         /*
          * The only page refs must be one from isolation
          * plus the rmap(s) (dropped by discard:).
          */
         if (ref_count == 1 + map_count &&
             (!folio_test_dirty(folio) ||
              ...
              (vma->vm_flags & VM_DROPPABLE))) {
                 dec_mm_counter(mm, MM_ANONPAGES);
                 goto discard;
         }
      
      This patchset resolves the issue by marking only genuinely dirty folios as
      swap-backed, as suggested by David, and transitioning to batched unmapping
      of entire folios in try_to_unmap_one().  Consequently, the deferred_split
      count drops to zero, and memory reclamation performance improves
      significantly — reclaiming 64KiB lazyfree large folios is now 2.5x
faster (the specific data is embedded in the changelog of patch 3/4).
      
      By the way, while the patchset is primarily aimed at PTE-mapped large
      folios, Baolin and Lance also found that try_to_unmap_one() handles
      lazyfree redirtied PMD-mapped large folios inefficiently — it splits the
      PMD into PTEs and iterates over them.  This patchset removes the
      unnecessary splitting, enabling us to skip redirtied PMD-mapped large
      folios 3.5X faster during memory reclamation.  (The specific data can be
      found in the changelog of patch 4/4).
      
      This patch (of 4):
      
The refcount may be temporarily or long-term increased, but this does not
change the fundamental nature of the folio already being lazy-freed.
Therefore, we only reset 'swapbacked' when we are certain the folio is
dirty and not droppable.
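
A hedged sketch of the resulting policy in mainline form (the 6.6
backport drops the VM_DROPPABLE part, per the conflict note below):

 /* only a genuinely dirty, non-droppable folio goes back to
  * swapbacked; a raised refcount alone no longer does */
 if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE))
 	folio_set_swapbacked(folio);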
      
      Link: https://lkml.kernel.org/r/20250214093015.51024-1-21cnbao@gmail.com
      Link: https://lkml.kernel.org/r/20250214093015.51024-2-21cnbao@gmail.com
      
      
      Fixes: 6c8e2a25 ("mm: fix race between MADV_FREE reclaim and blkdev direct IO read")
      Change-Id: Ifb1d1851924ad6264caab7e0178b0f910f4b62a1
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lance Yang <ioworker0@gmail.com>
      Cc: Mauricio Faria de Oliveira <mfo@canonical.com>
Cc: Chris Li <chrisl@kernel.org> (Google)
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Kairui Song <kasong@tencent.com>
      Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Tangquan Zheng <zhengtangquan@oppo.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Gavin Shan <gshan@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Shaoqin Huang <shahuang@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yicong Yang <yangyicong@hisilicon.com>
      Cc: Yosry Ahmed <yosryahmed@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Bug: 313807618
      (cherry picked from commit 2e595b90 https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git
      
       mm-unstable)
      [ Fix conflicts in try_to_unmap_one() and drop changes for VM_DROPPABLE
        - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• UPSTREAM: mm/huge_memory.c: fix used-uninitialized · 10447947
      Andrew Morton authored
      
      Fix used-uninitialized of `page'.
      
      Fixes: dce7d10b ("mm/madvise: optimize lazyfreeing with mTHP in madvise_free")
Reported-by: kernel test robot <lkp@intel.com>
      Closes: https://lore.kernel.org/oe-kbuild-all/202406260514.SLhNM9kQ-lkp@intel.com
      
      
      Cc: Lance Yang <ioworker0@gmail.com>
      Change-Id: I35d79caf4fc6b2cabdcc435b6fa259681d3ee10f
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      (cherry picked from commit d40f74ab)
      Bug: 313807618
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• UPSTREAM: mm/vmscan: avoid split lazyfree THP during shrink_folio_list() · 704a7624
      Lance Yang authored
      When the user no longer requires the pages, they would use
      madvise(MADV_FREE) to mark the pages as lazy free.  Subsequently, they
      typically would not re-write to that memory again.
      
      During memory reclaim, if we detect that the large folio and its PMD are
      both still marked as clean and there are no unexpected references (such as
GUP), we can just discard the memory lazily, improving the efficiency
      of memory reclamation in this case.
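
A hedged sketch of the discard condition described above (simplified;
the variable names are illustrative and TLB handling is omitted):

 /* clean folio, clean PMD, and no references beyond isolation + rmap */
 if (!folio_test_dirty(folio) && !pmd_dirty(orig_pmd) &&
     ref_count == map_count + 1) {
 	/* safe to discard the PMD-mapped THP without splitting it */
 }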
      
      On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
      mem_cgroup_force_empty() results in the following runtimes in seconds
      (shorter is better):
      
      --------------------------------------------
      |     Old       |      New       |  Change  |
      --------------------------------------------
      |   0.683426    |    0.049197    |  -92.80% |
      --------------------------------------------
      
      [ioworker0@gmail.com: minor changes per David]
        Link: https://lkml.kernel.org/r/20240622100057.3352-1-ioworker0@gmail.com
      Link: https://lkml.kernel.org/r/20240614015138.31461-4-ioworker0@gmail.com
      
      
      Change-Id: I716b3f00627134eb58fbaa44a8cc81fd11f52f8c
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Suggested-by: Zi Yan <ziy@nvidia.com>
Suggested-by: David Hildenbrand <david@redhat.com>
      Cc: Bang Li <libang.li@antgroup.com>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Fangrui Song <maskray@google.com>
      Cc: Jeff Xie <xiehuan09@gmail.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      (cherry picked from commit 735ecdfa)
      Bug: 313807618
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• BACKPORT: mm/rmap: integrate PMD-mapped folio splitting into pagewalk loop · 11d1e09d
      Lance Yang authored
      In preparation for supporting try_to_unmap_one() to unmap PMD-mapped
      folios, start the pagewalk first, then call split_huge_pmd_address() to
      split the folio.
      
      Link: https://lkml.kernel.org/r/20240614015138.31461-3-ioworker0@gmail.com
      
      
      Change-Id: I43f84f3e1d528bbacb239ad61e75e7c76487bc0d
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Acked-by: David Hildenbrand <david@redhat.com>
Suggested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Acked-by: Zi Yan <ziy@nvidia.com>
      Cc: Bang Li <libang.li@antgroup.com>
      Cc: Barry Song <baohua@kernel.org>
      Cc: Fangrui Song <maskray@google.com>
      Cc: Jeff Xie <xiehuan09@gmail.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      (cherry picked from commit 29e847d2)
      Bug: 313807618
      [ Fix trivial conflict in __split_huge_pmd();
        due to pmd_folio() not present in 6.6. instead use
        the equivalent page_folio(pmd_page(pmd))
          - Kalesh Singh ]
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
• UPSTREAM: mm/rmap: remove duplicated exit code in pagewalk loop · 4da55323
      Lance Yang authored
      Patch series "Reclaim lazyfree THP without splitting", v8.
      
      This series adds support for reclaiming PMD-mapped THP marked as lazyfree
      without needing to first split the large folio via
      split_huge_pmd_address().
      
      When the user no longer requires the pages, they would use
      madvise(MADV_FREE) to mark the pages as lazy free.  Subsequently, they
      typically would not re-write to that memory again.
      
      During memory reclaim, if we detect that the large folio and its PMD are
both still marked as clean and there are no unexpected references (such
as GUP), we can just discard the memory lazily, improving the efficiency
      of memory reclamation in this case.
      
      Performance Testing
      ===================
      
      On an Intel i5 CPU, reclaiming 1GiB of lazyfree THPs using
      mem_cgroup_force_empty() results in the following runtimes in seconds
      (shorter is better):
      
      --------------------------------------------
      |     Old       |      New       |  Change  |
      --------------------------------------------
      |   0.683426    |    0.049197    |  -92.80% |
      --------------------------------------------
      
      This patch (of 8):
      
      Introduce the labels walk_done and walk_abort as exit points to eliminate
      duplicated exit code in the pagewalk loop.
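
A minimal sketch of the label pattern described above (illustrative;
the real loop body in try_to_unmap_one() is much larger):

 while (page_vma_mapped_walk(&pvmw)) {
 	if (must_abort)
 		goto walk_abort;
 	if (finished_early)
 		goto walk_done;
 	continue;
 walk_abort:
 	ret = false;		/* shared failure path */
 walk_done:
 	page_vma_mapped_walk_done(&pvmw);
 	break;
 }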
      
      Link: https://lkml.kernel.org/r/20240614015138.31461-1-ioworker0@gmail.com
      Link: https://lkml.kernel.org/r/20240614015138.31461-2-ioworker0@gmail.com
      
      
      Change-Id: I82a14587672d6cbfb977795ffb6c5b06841adcd9
Signed-off-by: Lance Yang <ioworker0@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Barry Song <baohua@kernel.org>
      Cc: Bang Li <libang.li@antgroup.com>
      Cc: Fangrui Song <maskray@google.com>
      Cc: Jeff Xie <xiehuan09@gmail.com>
      Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: SeongJae Park <sj@kernel.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Zach O'Keefe <zokeefe@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      (cherry picked from commit 26d21b18)
      Bug: 313807618
Signed-off-by: Kalesh Singh <kaleshsingh@google.com>
4. Feb 26, 2025
• ANDROID: firmware_loader: Fix buffer redzone overwritten issue · fe163fe3
      Ram Prakash Gupta authored
      
Buffers allocated by seq_file are used by the firmware loader. When no
vendor firmware path is used, the firmware uploader modifies memory in
the redzone.
      
      =============================================================================
      BUG kmalloc-4k (Tainted: G        W  OE     ): Left Redzone overwritten
      -----------------------------------------------------------------------------
      0xffffff8854ae0fff-0xffffff8854ae0fff @offset=4095. First byte 0x0
      instead of 0xcc
      Allocated in kvmalloc_node+0x194/0x2b4 age=10 cpu=2 pid=4758
      __kmem_cache_alloc_node+0x2a8/0x388
      __kmalloc_node+0x60/0x1e0
      kvmalloc_node+0x194/0x2b4
      seq_read_iter+0x8c/0x4f0
      kernfs_fop_read_iter+0x70/0x1ec
      vfs_read+0x238/0x2d8
      ksys_read+0x78/0xe8
      __arm64_sys_read+0x1c/0x2c
      invoke_syscall+0x58/0x114
      el0_svc_common+0xac/0xe0
      do_el0_svc+0x1c/0x28
      el0_svc+0x3c/0x74
      el0t_64_sync_handler+0x68/0xbc
      el0t_64_sync+0x1a8/0x1ac
      
      Freed in kfree_link+0x10/0x20 age=46 cpu=1 pid=4786
      __kmem_cache_free+0x268/0x358
      kfree+0xa0/0x168
      kfree_link+0x10/0x20
      walk_component+0x90/0x128
      link_path_walk+0x27c/0x3cc
      path_openat+0x94/0xc7c
      do_filp_open+0xb8/0x164
      do_sys_openat2+0x84/0xf0
      __arm64_sys_openat+0x70/0x9c
      invoke_syscall+0x58/0x114
      el0_svc_common+0xac/0xe0
      do_el0_svc+0x1c/0x28
      el0_svc+0x3c/0x74
      el0t_64_sync_handler+0x68/0xbc
      el0t_64_sync+0x1a8/0x1ac
      
      Redzone  ffffff8854ae0ff0: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc 00
                                                             redzone modified ^^
      Object   ffffff8854ae1000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
Add a check to avoid updating memory in the redzone when no vendor
firmware path is used.
      
      Fixes: 2a46f357 ("ANDROID: firmware_loader: Add support for customer firmware paths")
      Bug: 395517985
      Change-Id: If58a44c0c8a26f3fe58b0e37b0fcc1f0e88e28cb
Signed-off-by: Ram Prakash Gupta <quic_rampraka@quicinc.com>
Signed-off-by: Souradeep Chowdhury <quic_schowdhu@quicinc.com>