- Dec 31, 2024
-
-
Baolin Wang authored
The '/proc/PID/smaps' does not have the 'FileHugeMapped' field to count the file transparent huge pages, instead, the 'FilePmdMapped' field should be used. Fix it. Link: https://lkml.kernel.org/r/d520ce3aba2b03b088be30bece732426a939049a.1734425264.git.baolin.wang@linux.alibaba.com Signed-off-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Dec 29, 2024
-
-
Vishnu Sankar authored
F8 mode key on Lenovo 2025 platforms use a different key code. Adding support for the new keycode 0x1401. Tested on X1 Carbon Gen 13 and X1 2-in-1 Gen 10. Signed-off-by:
Vishnu Sankar <vishnuocv@gmail.com> Reviewed-by:
Mark Pearson <mpearson-lenovo@squebb.ca> Link: https://lore.kernel.org/r/20241227231840.21334-1-vishnuocv@gmail.com Reviewed-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by:
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
-
- Dec 10, 2024
-
-
Mario Limonciello authored
commit ad4caad5 ("cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()") changed the semantics for highest perf and commit 18d9b522 ("cpufreq/amd-pstate: Use nominal perf for limits when boost is disabled") worked around those semantic changes. This however is a confusing result and furthermore makes it awkward to change frequency limits and boost due to the scaling differences. Restore the boost numerator to highest perf again. Suggested-by:
Dhananjay Ugwekar <Dhananjay.Ugwekar@amd.com> Reviewed-by:
Gautham R. Shenoy <gautham.shenoy@amd.com> Fixes: ad4caad5 ("cpufreq: amd-pstate: Merge amd_pstate_highest_perf_set() into amd_get_boost_ratio_numerator()") Link: https://lore.kernel.org/r/20241209185248.16301-2-mario.limonciello@amd.com Signed-off-by:
Mario Limonciello <mario.limonciello@amd.com>
-
- Dec 02, 2024
-
-
Sebastian Andrzej Siewior authored
Update the documentation for the `preempt=' parameter which now also accepts `lazy'. Fixes: 7c70cb94 ("sched: Add Lazy preemption model") Reported-by:
Shrikanth Hegde <sshegde@linux.ibm.com> Signed-off-by:
Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://lore.kernel.org/r/20241122173557.MYOtT95Q@linutronix.de
-
- Nov 27, 2024
-
-
Siddharth Menon authored
After the deprecation of CONFIG_DEFAULT_SECURITY, it is no longer used to enable and configure AppArmor. Since kernel 5.0, `CONFIG_SECURITY_APPARMOR_BOOTPARAM_VALUE` is not used either. Instead, the CONFIG_LSM parameter manages the order and selection of LSMs. Signed-off-by:
Siddharth Menon <simeddon@gmail.com> Signed-off-by:
John Johansen <john.johansen@canonical.com>
-
- Nov 22, 2024
-
-
Sebastian Fricke authored
Provide a guide for developers on how to debug code with a focus on the media subsystem. This document aims to provide a rough overview over the possibilities and a rational to help choosing the right tool for the given circumstances. Signed-off-by:
Sebastian Fricke <sebastian.fricke@collabora.com> Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241028-media_docs_improve_v3-v3-2-edf5c5b3746f@collabora.com
-
- Nov 18, 2024
-
-
Mauro Carvalho Chehab authored
Due to recent changes on the way we're maintaining media, the location of the main tree was updated. Change docs accordingly. Cc: stable@vger.kernel.org Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by:
Hans Verkuil <hverkuil@xs4all.nl>
-
- Nov 15, 2024
-
-
Joshua Hahn authored
This patch introduces a new counter to memory.stat that tracks hugeTLB usage, only if hugeTLB accounting is done to memory.current. This feature is enabled the same way hugeTLB accounting is enabled, via the memory_hugetlb_accounting mount flag for cgroupsv2. 1. Why is this patch necessary? Currently, memcg hugeTLB accounting is an opt-in feature [1] that adds hugeTLB usage to memory.current. However, the metric is not reported in memory.stat. Given that users often interpret memory.stat as a breakdown of the value reported in memory.current, the disparity between the two reports can be confusing. This patch solves this problem by including the metric in memory.stat as well, but only if it is also reported in memory.current (it would also be confusing if the value was reported in memory.stat, but not in memory.current) Aside from the consistency between the two files, we also see benefits in observability. Userspace might be interested in the hugeTLB footprint of cgroups for many reasons. For instance, system admins might want to verify that hugeTLB usage is distributed as expected across tasks: i.e. memory-intensive tasks are using more hugeTLB pages than tasks that don't consume a lot of memory, or are seen to fault frequently. Note that this is separate from wanting to inspect the distribution for limiting purposes (in which case, hugeTLB controller makes more sense). 2. We already have a hugeTLB controller. Why not use that? It is true that hugeTLB tracks the exact value that we want. In fact, by enabling the hugeTLB controller, we get all of the observability benefits that I mentioned above, and users can check the total hugeTLB usage, verify if it is distributed as expected, etc. With this said, there are 2 problems: (a) They are still not reported in memory.stat, which means the disparity between the memcg reports are still there. (b) We cannot reasonably expect users to enable the hugeTLB controller just for the sake of hugeTLB usage reporting, especially since they don't have any use for hugeTLB usage enforcing [2]. 3. Implementation Details: In the alloc / free hugetlb functions, we call lruvec_stat_mod_folio regardless of whether memcg accounts hugetlb. mem_cgroup_commit_charge which is called from alloc_hugetlb_folio will set memcg for the folio only if the CGRP_ROOT_MEMORY_HUGETLB_ACCOUNTING cgroup mount option is used, so lruvec_stat_mod_folio accounts per-memcg hugetlb counters only if the feature is enabled. Regardless of whether memcg accounts for hugetlb, the newly added global counter is updated and shown in /proc/vmstat. The global counter is added because vmstats is the preferred framework for cgroup stats. It makes stat items consistent between global and cgroups. It also provides a per-node breakdown, which is useful. Because it does not use cgroup-specific hooks, we also keep generic MM code separate from memcg code. [1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@gmail.com/ [2] Of course, we can't make a new patch for every feature that can be duplicated. However, since the existing solution of enabling the hugeTLB controller is an imperfect solution that still leaves a discrepancy between memory.stat and memory.curent, I think that it is reasonable to isolate the feature in this case. Link: https://lkml.kernel.org/r/20241101204402.1885383-1-joshua.hahnjy@gmail.com Signed-off-by:
Joshua Hahn <joshua.hahnjy@gmail.com> Suggested-by:
Nhat Pham <nphamcs@gmail.com> Suggested-by:
Shakeel Butt <shakeel.butt@linux.dev> Suggested-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Shakeel Butt <shakeel.butt@linux.dev> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Chris Down <chris@chrisdown.name> Acked-by:
Michal Hocko <mhocko@suse.com> Reviewed-by:
Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by:
Nhat Pham <nphamcs@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Koutný <mkoutny@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Zefan Li <lizefan.x@bytedance.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 13, 2024
-
-
Jarkko Sakkinen authored
The initial HMAC session feature added TPM bus encryption and/or integrity protection to various in-kernel TPM operations. This can cause performance bottlenecks with IMA, as it heavily utilizes PCR extend operations. In order to mitigate this performance issue, introduce a kernel command-line parameter to the TPM driver for disabling the integrity protection for PCR extend operations (i.e. TPM2_PCR_Extend). Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Link: https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/ Fixes: 6519fea6 ("tpm: add hmac checks to tpm2_pcr_extend()") Tested-by:
Mimi Zohar <zohar@linux.ibm.com> Co-developed-by:
Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by:
Roberto Sassu <roberto.sassu@huawei.com> Co-developed-by:
Mimi Zohar <zohar@linux.ibm.com> Signed-off-by:
Mimi Zohar <zohar@linux.ibm.com> Signed-off-by:
Jarkko Sakkinen <jarkko@kernel.org>
-
- Nov 12, 2024
-
-
Paul E. McKenney authored
There is only ever the one read-exit task, and there is no module parameter named rcutorture.read_exit, so remove the bogus documentation. Instead, use rcutorture.read_exit_burst to enable/disable read-exit race testing. Signed-off-by:
Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by:
Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org>
-
Paul E. McKenney authored
This commit adds the rcuog kthreads to the list of callback-offloading kthreads that can be affinitied away from worker CPUs. Signed-off-by:
Paul E. McKenney <paulmck@kernel.org> Reviewed-by:
Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org>
-
Thorsten Leemhuis authored
Explicitly mention how to bisect -next, as nothing in the kernel tree currently explains that bisects between -next versions won't work well and it's better to bisect between mainline and -next. Co-developed-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Mark Brown <broonie@kernel.org> Reviewed-by:
Mark Brown <broonie@kernel.org> Signed-off-by:
Thorsten Leemhuis <linux@leemhuis.info> Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/ec19d5fc503ff7db3d4c4ff9e97fff24cc78f72a.1730808651.git.linux@leemhuis.info
-
Paul E. McKenney authored
This commit causes bit 0x4 of rcutorture.reader_flavor to select the new srcu_read_lock_lite() and srcu_read_unlock_lite() functions. Signed-off-by:
Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by:
Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org>
-
Paul E. McKenney authored
This commit adds an rcutorture.reader_flavor parameter whose bits correspond to reader flavors. For example, SRCU's readers are 0x1 for normal and 0x2 for NMI-safe. Signed-off-by:
Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by:
Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by:
Frederic Weisbecker <frederic@kernel.org>
-
Breno Leitao authored
Introduce a fault injection mechanism to force skb reallocation. The primary goal is to catch bugs related to pointer invalidation after potential skb reallocation. The fault injection mechanism aims to identify scenarios where callers retain pointers to various headers in the skb but fail to reload these pointers after calling a function that may reallocate the data. This type of bug can lead to memory corruption or crashes if the old, now-invalid pointers are used. By forcing reallocation through fault injection, we can stress-test code paths and ensure proper pointer management after potential skb reallocations. Add a hook for fault injection in the following functions: * pskb_trim_rcsum() * pskb_may_pull_reason() * pskb_trim() As the other fault injection mechanism, protect it under a debug Kconfig called CONFIG_FAIL_SKB_REALLOC. This patch was *heavily* inspired by Jakub's proposal from: https://lore.kernel.org/all/20240719174140.47a868e6@kernel.org/ CC: Akinobu Mita <akinobu.mita@gmail.com> Suggested-by:
Jakub Kicinski <kuba@kernel.org> Signed-off-by:
Breno Leitao <leitao@debian.org> Reviewed-by:
Akinobu Mita <akinobu.mita@gmail.com> Acked-by:
Paolo Abeni <pabeni@redhat.com> Acked-by:
Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20241107-fault_v6-v6-1-1b82cb6ecacd@debian.org Signed-off-by:
Paolo Abeni <pabeni@redhat.com>
-
Lance Yang authored
This commit introduces documentation for hung_task_detect_count in kernel.rst. Link: https://lkml.kernel.org/r/20241027120747.42833-3-ioworker0@gmail.com Signed-off-by:
Mingzhe Yang <mingzhe.yang@ly.com> Signed-off-by:
Lance Yang <ioworker0@gmail.com> Cc: Bang Li <libang.li@antgroup.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: David Hildenbrand <david@redhat.com> Cc: Huang Cun <cunhuang@tencent.com> Cc: Joel Granados <j.granados@samsung.com> Cc: Joel Granados <joel.granados@kernel.org> Cc: John Siddle <jsiddle@redhat.com> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Yongliang Gao <leonylgao@tencent.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 11, 2024
-
-
Maíra Canal authored
Add the ``thp_shmem=`` kernel command line to allow specifying the default policy of each supported shmem hugepage size. The kernel parameter accepts the following format: thp_shmem=<size>[KMG],<size>[KMG]:<policy>;<size>[KMG]-<size>[KMG]:<policy> For example, thp_shmem=16K-64K:always;128K,512K:inherit;256K:advise;1M-2M:never;4M-8M:within_size Some GPUs may benefit from using huge pages. Since DRM GEM uses shmem to allocate anonymous pageable memory, it's essential to control the huge page allocation policy for the internal shmem mount. This control can be achieved through the ``transparent_hugepage_shmem=`` parameter. Beyond just setting the allocation policy, it's crucial to have granular control over the size of huge pages that can be allocated. The GPU may support only specific huge page sizes, and allocating pages larger/smaller than those sizes would be ineffective. Link: https://lkml.kernel.org/r/20241101165719.1074234-6-mcanal@igalia.com Signed-off-by:
Maíra Canal <mcanal@igalia.com> Reviewed-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Maíra Canal authored
Patch series "mm: add more kernel parameters to control mTHP", v5. This series introduces four patches related to the kernel parameters controlling mTHP and a fifth patch replacing `strcpy()` for `strscpy()` in the file `mm/huge_memory.c`. The first patch is a straightforward documentation update, correcting the format of the kernel parameter ``thp_anon=``. The second, third, and fourth patches focus on controlling THP support for shmem via the kernel command line. The second patch introduces a parameter to control the global default huge page allocation policy for the internal shmem mount. The third patch moves a piece of code to a shared header to ease the implementation of the fourth patch. Finally, the fourth patch implements a parameter similar to ``thp_anon=``, but for shmem. The goal of these changes is to simplify the configuration of systems that rely on mTHP support for shmem. For instance, a platform with a GPU that benefits from huge pages may want to enable huge pages for shmem. Having these kernel parameters streamlines the configuration process and ensures consistency across setups. This patch (of 4): Add a new kernel command line to control the hugepage allocation policy for the internal shmem mount, ``transparent_hugepage_shmem``. The parameter is similar to ``transparent_hugepage`` and has the following format: transparent_hugepage_shmem=<policy> where ``<policy>`` is one of the seven valid policies available for shmem. Configuring the default huge page allocation policy for the internal shmem mount can be beneficial for DRM GPU drivers. Just as CPU architectures, GPUs can also take advantage of huge pages, but this is possible only if DRM GEM objects are backed by huge pages. Since GEM uses shmem to allocate anonymous pageable memory, having control over the default huge page allocation policy allows for the exploration of huge pages use on GPUs that rely on GEM objects backed by shmem. Link: https://lkml.kernel.org/r/20241101165719.1074234-2-mcanal@igalia.com Link: https://lkml.kernel.org/r/20241101165719.1074234-4-mcanal@igalia.com Signed-off-by:
Maíra Canal <mcanal@igalia.com> Reviewed-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by:
David Hildenbrand <david@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: dri-devel@lists.freedesktop.org Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: kernel-dev@igalia.com Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
This helps profile the sizes of folios being swapped in. Currently, only mTHP swap-out is being counted. The new interface can be found at: /sys/kernel/mm/transparent_hugepage/hugepages-<size>/stats swpin For example, cat /sys/kernel/mm/transparent_hugepage/hugepages-64kB/stats/swpin 12809 cat /sys/kernel/mm/transparent_hugepage/hugepages-32kB/stats/swpin 4763 [v-songbaohua@oppo.com: add a blank line in doc] Link: https://lkml.kernel.org/r/20241030233423.80759-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20241026082423.26298-1-21cnbao@gmail.com Signed-off-by:
Barry Song <v-songbaohua@oppo.com> Reviewed-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by:
David Hildenbrand <david@redhat.com> Cc: Chris Li <chrisl@kernel.org> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kairui Song <kasong@tencent.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Kanchana P Sridhar authored
Added a new MTHP_STAT_ZSWPOUT entry to the sysfs transparent_hugepage stats so that successful large folio zswap stores can be accounted under the per-order sysfs "zswpout" stats: /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/zswpout Other non-zswap swap device swap-out events will be counted under the existing sysfs "swpout" stats: /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/swpout Also, added documentation for the newly added sysfs per-order hugepage "zswpout" stats. The documentation clarifies that only non-zswap swapouts will be accounted in the existing "swpout" stats. Link: https://lkml.kernel.org/r/20241001053222.6944-8-kanchana.p.sridhar@intel.com Signed-off-by:
Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Reviewed-by:
Nhat Pham <nphamcs@gmail.com> Cc: Chengming Zhou <chengming.zhou@linux.dev> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Wajdi Feghali <wajdi.k.feghali@intel.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: "Zou, Nanhai" <nanhai.zou@intel.com> Cc: Barry Song <21cnbao@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
When the proportion of folios from the zeromap is small, missing their accounting may not significantly impact profiling. However, it's easy to construct a scenario where this becomes an issue—for example, allocating 1 GB of memory, writing zeros from userspace, followed by MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out and swap-in counts seem to vanish into a black hole, potentially causing semantic ambiguity. On the other hand, Usama reported that zero-filled pages can exceed 10% in workloads utilizing zswap, while Hailong noted that some app in Android have more than 6% zero-filled pages. Before commit 0ca0c24e ("mm: store zero pages to be swapped out in a bitmap"), both zswap and zRAM implemented similar optimizations, leading to these optimized-out pages being counted in either zswap or zRAM counters (with pswpin/pswpout also increasing for zRAM). With zeromap functioning prior to both zswap and zRAM, userspace will no longer detect these swap-out and swap-in actions. We have three ways to address this: 1. Introduce a dedicated counter specifically for the zeromap. 2. Use pswpin/pswpout accounting, treating the zero map as a standard backend. This approach aligns with zRAM's current handling of same-page fills at the device level. However, it would mean losing the optimized-out page counters previously available in zRAM and would not align with systems using zswap. Additionally, as noted by Nhat Pham, pswpin/pswpout counters apply only to I/O done directly to the backend device. 3. Count zeromap pages under zswap, aligning with system behavior when zswap is enabled. However, this would not be consistent with zRAM, nor would it align with systems lacking both zswap and zRAM. Given the complications with options 2 and 3, this patch selects option 1. We can find these counters from /proc/vmstat (counters for the whole system) and memcg's memory.stat (counters for the interested memcg). For example: $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat swpin_zero 1648 swpout_zero 33536 $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat swpin_zero 3905 swpout_zero 3985 This patch does not address any specific zeromap bug, but the missing swpout and swpin counts for zero-filled pages can be highly confusing and may mislead user-space agents that rely on changes in these counters as indicators. Therefore, we add a Fixes tag to encourage the inclusion of this counter in any kernel versions with zeromap. Many thanks to Kanchana for the contribution of changing count_objcg_event() to count_objcg_events() to support large folios[1], which has now been incorporated into this patch. [1] https://lkml.kernel.org/r/20241001053222.6944-5-kanchana.p.sridhar@intel.com Link: https://lkml.kernel.org/r/20241107011246.59137-1-21cnbao@gmail.com Fixes: 0ca0c24e ("mm: store zero pages to be swapped out in a bitmap") Co-developed-by:
Kanchana P Sridhar <kanchana.p.sridhar@intel.com> Signed-off-by:
Barry Song <v-songbaohua@oppo.com> Reviewed-by:
Nhat Pham <nphamcs@gmail.com> Reviewed-by:
Chengming Zhou <chengming.zhou@linux.dev> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Cc: Hailong Liu <hailong.liu@oppo.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Andi Kleen <ak@linux.intel.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Chris Li <chrisl@kernel.org> Cc: "Huang, Ying" <ying.huang@intel.com> Cc: Kairui Song <kasong@tencent.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 07, 2024
-
-
Maíra Canal authored
If we add ``thp_anon=32,64K:always`` to the kernel command line, we will see the following error: [ 0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>```, as [KMG] must follow each number to especify its unit. So, the correct format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>```. Therefore, adjust the documentation to reflect the correct format of the parameter ``thp_anon=``. Link: https://lkml.kernel.org/r/20241101165719.1074234-3-mcanal@igalia.com Fixes: dd4d30d1 ("mm: override mTHP "enabled" defaults at kernel cmdline") Signed-off-by:
Maíra Canal <mcanal@igalia.com> Acked-by:
Barry Song <baohua@kernel.org> Acked-by:
David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Shakeel Butt authored
Patch series "memcg-v1: fully deprecate charge moving". The memcg v1's charge moving feature has been deprecated for almost 2 years and the kernel warns if someone try to use it. This warning has been backported to all stable kernel and there have not been any report of the warning or the request to support this feature anymore. Let's proceed to fully deprecate this feature. This patch (of 6): Proceed with the complete deprecation of memcg v1's charge moving feature. The deprecation warning has been in the kernel for almost two years and has been ported to all stable kernel since. Now is the time to fully deprecate this feature. Link: https://lkml.kernel.org/r/20241025012304.2473312-1-shakeel.butt@linux.dev Link: https://lkml.kernel.org/r/20241025012304.2473312-2-shakeel.butt@linux.dev Signed-off-by:
Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by:
Roman Gushchin <roman.gushchin@linux.dev> Acked-by:
Michal Hocko <mhocko@suse.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 06, 2024
-
-
Sergey Senozhatsky authored
Both recompress and writeback soon will unlock slots during processing, which makes things too complex wrt possible race-conditions. We still want to clear PP_SLOT in slot_free, because this is how we figure out that slot that was selected for post-processing has been released under us and when we start post-processing we check if slot still has PP_SLOT set. At the same time, theoretically, we can have something like this: CPU0 CPU1 recompress scan slots set PP_SLOT unlock slot slot_free clear PP_SLOT allocate PP_SLOT writeback scan slots set PP_SLOT unlock slot select PP-slot test PP_SLOT So recompress will not detect that slot has been re-used and re-selected for concurrent writeback post-processing. Make sure that we only permit on post-processing operation at a time. So now recompress and writeback post-processing don't race against each other, we only need to handle slot re-use (slot_free and write), which is handled individually by each pp operation. Having recompress and writeback competing for the same slots is not exactly good anyway (can't imagine anyone doing that). Link: https://lkml.kernel.org/r/20240917021020.883356-3-senozhatsky@chromium.org Signed-off-by:
Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 04, 2024
-
-
Guilherme G. Piccoli authored
The crash_kexec_post_notifiers description could be improved a bit, by clarifying its upsides (yes, there are some!) and be more descriptive about the downsides, specially mentioning code that enables the option unconditionally, like Hyper-V[0], PowerPC (fadump)[1] and more recently, AMD SEV-SNP[2]. [0] Commit a1158956 ("x86/Hyper-V: Report crash register data or kmsg before running crash kernel"). [1] Commit 06e629c2 ("powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic"). [2] Commit 8ef97958 ("crypto: ccp: Add panic notifier for SEV/SNP firmware shutdown on kdump"). Reviewed-by:
Stephen Brennan <stephen.s.brennan@oracle.com> Signed-off-by:
Guilherme G. Piccoli <gpiccoli@igalia.com> Reviewed-by:
Michael Kelley <mhklinux@outlook.com> Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241027204159.985163-1-gpiccoli@igalia.com
-
Randy Dunlap authored
Reorganize the introduction to the kernel-parameters file to place related paragraphs together: - move module info together and near the beginning - add a Special Handling section for dashes, underscores, double quotes, cpu lists, and metric (KMG) suffixes. Expand the KMG suffixes to include TPE as well. - add a Kernel Build Options section Signed-off-by:
Randy Dunlap <rdunlap@infradead.org> Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241029180320.412629-1-rdunlap@infradead.org
-
- Oct 29, 2024
-
-
Christoph Lameter authored
The old SLAB allocator used to support memory policies on a per allocation bases. In SLUB the memory policies are applied on a per page frame / folio bases. Doing so avoids having to check memory policies in critical code paths for kmalloc and friends. This worked on general well on Intel/AMD/PowerPC because the interconnect technology is mature and can minimize the latencies through intelligent caching even if a small object is not placed optimally. However, on ARM we have an emergence of new NUMA interconnect technology based more on embedded devices. Caching of remote content can currently be ineffective using the standard building blocks / mesh available on that platform. Such architectures benefit if each slab object is individually placed according to memory policies and other restrictions. This patch adds another kernel parameter slab_strict_numa If that is set then a static branch is activated that will cause the hotpaths of the allocator to evaluate the current memory allocation policy. Each object will be properly placed by paying the price of extra processing and SLUB will no longer defer to the page allocator to apply memory policies at the folio level. This patch improves performance of memcached running on Ampere Altra 2P system (ARM Neoverse N1 processor) by 3.6% due to accurate placement of small kernel objects. Tested-by:
Huang Shijie <shijie@os.amperecomputing.com> Signed-off-by:
Christoph Lameter (Ampere) <cl@gentwo.org> Signed-off-by:
Vlastimil Babka <vbabka@suse.cz>
-
- Oct 28, 2024
-
-
Gowthami Thiagarajan authored
PCI Express Interface PMU includes various performance counters to monitor the data that is transmitted over the PCIe link. The counters track various inbound and outbound transactions which includes separate counters for posted/non-posted/completion TLPs. Also, inbound and outbound memory read requests along with their latencies can also be monitored. Address Translation Services(ATS)events such as ATS Translation, ATS Page Request, ATS Invalidation along with their corresponding latencies are also supported. The performance counters are 64 bits wide. For instance, perf stat -e ib_tlp_pr <workload> tracks the inbound posted TLPs for the workload. Co-developed-by:
Linu Cherian <lcherian@marvell.com> Signed-off-by:
Linu Cherian <lcherian@marvell.com> Signed-off-by:
Gowthami Thiagarajan <gthiagarajan@marvell.com> Link: https://lore.kernel.org/r/20241028055309.17893-1-gthiagarajan@marvell.com Signed-off-by:
Will Deacon <will@kernel.org>
-
Pankaj Raghav authored
Most of the callers of wbc_account_cgroup_owner() are converting a folio to page before calling the function. wbc_account_cgroup_owner() is converting the page back to a folio to call mem_cgroup_css_from_folio(). Convert wbc_account_cgroup_owner() to take a folio instead of a page, and convert all callers to pass a folio directly except f2fs. Convert the page to folio for all the callers from f2fs as they were the only callers calling wbc_account_cgroup_owner() with a page. As f2fs is already in the process of converting to folios, these call sites might also soon be calling wbc_account_cgroup_owner() with a folio directly in the future. No functional changes. Only compile tested. Signed-off-by:
Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20240926140121.203821-1-kernel@pankajraghav.com Acked-by:
David Sterba <dsterba@suse.com> Acked-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Oct 25, 2024
-
-
Joanne Koong authored
Introduce the capability to dynamically configure the max pages limit (FUSE_MAX_MAX_PAGES) through a sysctl. This allows system administrators to dynamically set the maximum number of pages that can be used for servicing requests in fuse. Previously, this is gated by FUSE_MAX_MAX_PAGES which is statically set to 256 pages. One result of this is that the buffer size for a write request is limited to 1 MiB on a 4k-page system. The default value for this sysctl is the original limit (256 pages). $ sysctl -a | grep max_pages_limit fs.fuse.max_pages_limit = 256 $ sysctl -n fs.fuse.max_pages_limit 256 $ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit 1024 $ sysctl -n fs.fuse.max_pages_limit 1024 $ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument $ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument $ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit 65535 $ sysctl -n fs.fuse.max_pages_limit 65535 Signed-off-by:
Joanne Koong <joannelkoong@gmail.com> Reviewed-by:
Josef Bacik <josef@toxicpanda.com> Reviewed-by:
Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by:
Miklos Szeredi <mszeredi@redhat.com>
-
- Oct 22, 2024
-
-
Yafang Shao authored
Commit 681ce862 ("vfs: Delete the associated dentry when deleting a file") introduced an unconditional deletion of the associated dentry when a file is removed. However, this led to performance regressions in specific benchmarks, such as ilebench.sum_operations/s [0], prompting a revert in commit 4a4be1ad ("Revert "vfs: Delete the associated dentry when deleting a file""). This patch seeks to reintroduce the concept conditionally, where the associated dentry is deleted only when the user explicitly opts for it during file removal. A new sysctl fs.automated_deletion_of_dentry is added for this purpose. Its default value is set to 0. There are practical use cases for this proactive dentry reclamation. Besides the Elasticsearch use case mentioned in commit 681ce862, additional examples have surfaced in our production environment. For instance, in video rendering services that continuously generate temporary files, upload them to persistent storage servers, and then delete them, a large number of negative dentries—serving no useful purpose—accumulate. Users in such cases would benefit from proactively reclaiming these negative dentries. Link: https://lore.kernel.org/linux-fsdevel/202405291318.4dfbb352-oliver.sang@intel.com [0] Link: https://lore.kernel.org/all/20240912-programm-umgibt-a1145fa73bb6@brauner/ Suggested-by:
Christian Brauner <brauner@kernel.org> Acked-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/r/20240929122831.92515-1-laoar.shao@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by:
Christian Brauner <brauner@kernel.org>
-
- Oct 21, 2024
-
-
Christian Loehle authored
There were two changes related to transition latency recently. Namely commit e13aa799 ("cpufreq: Change default transition delay to 2ms") and commit 37c6dccd ("cpufreq: Remove LATENCY_MULTIPLIER"). Both changed the defaults / maximums so let the documentation reflect that. Signed-off-by:
Christian Loehle <christian.loehle@arm.com> Link: https://patch.msgid.link/46853b6e-bad5-4ace-9b23-ff157f234ae3@arm.com Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- Oct 17, 2024
-
-
Luca Boccassi authored
The current policy management makes it impossible to use IPE in a general purpose distribution. In such cases the users are not building the kernel, the distribution is, and access to the private key included in the trusted keyring is, for obvious reason, not available. This means that users have no way to enable IPE, since there will be no built-in generic policy, and no access to the key to sign updates validated by the trusted keyring. Just as we do for dm-verity, kernel modules and more, allow the secondary and platform keyrings to also validate policies. This allows users enrolling their own keys in UEFI db or MOK to also sign policies, and enroll them. This makes it sensible to enable IPE in general purpose distributions, as it becomes usable by any user wishing to do so. Keys in these keyrings can already load kernels and kernel modules, so there is no security downgrade. Add a kconfig each, like dm-verity does, but default to enabled if the dependencies are available. Signed-off-by:
Luca Boccassi <bluca@debian.org> Reviewed-by:
Serge Hallyn <serge@hallyn.com> [FW: fixed some style issues] Signed-off-by:
Fan Wu <wufan@kernel.org>
-
Luca Boccassi authored
Currently IPE accepts an update that has the same version as the policy being updated, but it doesn't make it a no-op nor it checks that the old and new policyes are the same. So it is possible to change the content of a policy, without changing its version. This is very confusing from userspace when managing policies. Instead change the update logic to reject updates that have the same version with ESTALE, as that is much clearer and intuitive behaviour. Signed-off-by:
Luca Boccassi <bluca@debian.org> Reviewed-by:
Serge Hallyn <serge@hallyn.com> Signed-off-by:
Fan Wu <wufan@kernel.org>
-
- Oct 16, 2024
-
-
Tomi Valkeinen authored
Add documentation for rp1-cfe driver. Signed-off-by:
Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Signed-off-by:
Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by:
Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
-
- Oct 14, 2024
-
-
Steven Rostedt authored
At the 2024 Linux Plumbers Conference, I was talking with Hans de Goede about the persistent buffer to display traces from previous boots. He mentioned that UEFI can clear memory. In my own tests I have not seen this. He later informed me that it requires the config option: CONFIG_RESET_ATTACK_MITIGATION It appears that setting this will allow the memory to be cleared on boot up, which will definitely clear out the trace of the previous boot. Add this information under the trace_instance in kernel-parameters.txt to let people know that this can cause issues. Link: https://lore.kernel.org/all/20170825155019.6740-2-ard.biesheuvel@linaro.org/ Reported-by:
Hans de Goede <hdegoede@redhat.com> Reviewed-by:
Hans de Goede <hdegoede@redhat.com> Acked-by:
Ard Biesheuvel <ardb@kernel.org> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241007131653.35837081@gandalf.local.home
-
- Oct 10, 2024
-
-
Pierre-Louis Bossart authored
Add a kernel parameter to work-around discrepancies between hardware and platform firmware, it's not unusual to see e.g. 38.4MHz listed in _DSD properties as the SoundWire clock source, but the hardware may be based on a 19.2 MHz mclk source. Signed-off-by:
Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by:
Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20241004021850.9758-1-yung-chuan.liao@linux.intel.com Signed-off-by:
Vinod Koul <vkoul@kernel.org>
-
- Oct 08, 2024
-
-
Hans Verkuil authored
The omap4 camera driver has seen no progress since forever, and now OMAP4 support has also been dropped from u-boot (1). So it is time to retire this driver. (1): https://lists.denx.de/pipermail/u-boot/2024-July/558846.html Signed-off-by:
Hans Verkuil <hverkuil-cisco@xs4all.nl> Acked-by:
Laurent Pinchart <laurent.pinchart@ideasonboard.com>
-
- Oct 04, 2024
-
-
Mark Brown authored
Hook up an override for GCS, allowing it to be disabled from the command line by specifying arm64.nogcs in case there are problems. Reviewed-by:
Thiago Jung Bauermann <thiago.bauermann@linaro.org> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Acked-by:
Catalin Marinas <catalin.marinas@arm.com> Signed-off-by:
Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-17-222b78d87eee@kernel.org Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com>
-
- Oct 02, 2024
-
-
Wei Huang authored
Add support for PCIe TLP Processing Hints (TPH) support (see PCIe r6.2, sec 6.17). Add TPH register definitions in pci_regs.h, including the TPH Requester capability register, TPH Requester control register, TPH Completer capability, and the ST fields of MSI-X entry. Introduce pcie_enable_tph() and pcie_disable_tph(), enabling drivers to toggle TPH support and configure specific ST mode as needed. Also add a new kernel parameter, "pci=notph", allowing users to disable TPH support across the entire system. Link: https://lore.kernel.org/r/20241002165954.128085-2-wei.huang2@amd.com Co-developed-by:
Jing Liu <jing2.liu@intel.com> Co-developed-by:
Paul Luse <paul.e.luse@linux.intel.com> Co-developed-by:
Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by:
Jing Liu <jing2.liu@intel.com> Signed-off-by:
Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by:
Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by:
Wei Huang <wei.huang2@amd.com> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Reviewed-by:
Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by:
Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by:
Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by:
Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by:
Lukas Wunner <lukas@wunner.de>
-