- Nov 07, 2024
-
-
Maíra Canal authored
If we add ``thp_anon=32,64K:always`` to the kernel command line, we will see the following error: [ 0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>```, as [KMG] must follow each number to especify its unit. So, the correct format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>```. Therefore, adjust the documentation to reflect the correct format of the parameter ``thp_anon=``. Link: https://lkml.kernel.org/r/20241101165719.1074234-3-mcanal@igalia.com Fixes: dd4d30d1 ("mm: override mTHP "enabled" defaults at kernel cmdline") Signed-off-by:
Maíra Canal <mcanal@igalia.com> Acked-by:
Barry Song <baohua@kernel.org> Acked-by:
David Hildenbrand <david@redhat.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
damon_feed_loop_next_input() is inefficient and fragile to overflows. Specifically, 'score_goal_diff_bp' calculation can overflow when 'score' is high. The calculation is actually unnecessary at all because 'goal' is a constant of value 10,000. Calculation of 'compensation' is again fragile to overflow. Final calculation of return value for under-achiving case is again fragile to overflow when the current score is under-achieving the target. Add two corner cases handling at the beginning of the function to make the body easier to read, and rewrite the body of the function to avoid overflows and the unnecessary bp value calcuation. Link: https://lkml.kernel.org/r/20241031161203.47751-1-sj@kernel.org Fixes: 9294a037 ("mm/damon/core: implement goal-oriented feedback-driven quota auto-tuning") Signed-off-by:
SeongJae Park <sj@kernel.org> Reported-by:
Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/944f3d5b-9177-48e7-8ec9-7f1331a3fea3@roeck-us.net Tested-by:
Guenter Roeck <linux@roeck-us.net> Cc: <stable@vger.kernel.org> [6.8.x] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
DAMON's logics to determine if this is the time to apply damos schemes assumes next_apply_sis is always set larger than current passed_sample_intervals. And therefore assume continuously incrementing passed_sample_intervals will make it reaches to the next_apply_sis in future. The logic hence does apply the scheme and update next_apply_sis only if passed_sample_intervals is same to next_apply_sis. If Schemes apply interval is set as zero, however, next_apply_sis is set same to current passed_sample_intervals, respectively. And passed_sample_intervals is incremented before doing the next_apply_sis check. Hence, next_apply_sis becomes larger than next_apply_sis, and the logic says it is not the time to apply schemes and update next_apply_sis. In other words, DAMON stops applying schemes until passed_sample_intervals overflows. Based on the documents and the common sense, a reasonable behavior for such inputs would be applying the schemes for every sampling interval. Handle the case by removing the assumption. Link: https://lkml.kernel.org/r/20241031183757.49610-3-sj@kernel.org Fixes: 42f994b7 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by:
SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.7.x] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
SeongJae Park authored
Patch series "mm/damon/core: fix handling of zero non-sampling intervals". DAMON's internal intervals accounting logic is not correctly handling non-sampling intervals of zero values for a wrong assumption. This could cause unexpected monitoring behavior, and even result in infinite hang of DAMON sysfs interface user threads in case of zero aggregation interval. Fix those by updating the intervals accounting logic. For details of the root case and solutions, please refer to commit messages of fixes. This patch (of 2): DAMON's logics to determine if this is the time to do aggregation and ops update assumes next_{aggregation,ops_update}_sis are always set larger than current passed_sample_intervals. And therefore it further assumes continuously incrementing passed_sample_intervals every sampling interval will make it reaches to the next_{aggregation,ops_update}_sis in future. The logic therefore make the action and update next_{aggregation,ops_updaste}_sis only if passed_sample_intervals is same to the counts, respectively. If Aggregation interval or Ops update interval are zero, however, next_aggregation_sis or next_ops_update_sis are set same to current passed_sample_intervals, respectively. And passed_sample_intervals is incremented before doing the next_{aggregation,ops_update}_sis check. Hence, passed_sample_intervals becomes larger than next_{aggregation,ops_update}_sis, and the logic says it is not the time to do the action and update next_{aggregation,ops_update}_sis forever, until an overflow happens. In other words, DAMON stops doing aggregations or ops updates effectively forever, and users cannot get monitoring results. Based on the documents and the common sense, a reasonable behavior for such inputs is doing an aggregation and an ops update for every sampling interval. Handle the case by removing the assumption. Note that this could incur particular real issue for DAMON sysfs interface users, in case of zero Aggregation interval. When user starts DAMON with zero Aggregation interval and asks online DAMON parameter tuning via DAMON sysfs interface, the request is handled by the aggregation callback. Until the callback finishes the work, the user who requested the online tuning just waits. Hence, the user will be stuck until the passed_sample_intervals overflows. Link: https://lkml.kernel.org/r/20241031183757.49610-1-sj@kernel.org Link: https://lkml.kernel.org/r/20241031183757.49610-2-sj@kernel.org Fixes: 4472edf6 ("mm/damon/core: use number of passed access sampling as a timer") Signed-off-by:
SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> [6.7.x] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Wei Yang authored
After commit 94d7d923 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al."), if vma_modify_flags() return error, the vma is set to an error code. This will lead to an invalid prev be returned. Generally this shouldn't matter as the caller should treat an error as indicating state is now invalidated, however unfortunately apply_mlockall_flags() does not check for errors and assumes that mlock_fixup() correctly maintains prev even if an error were to occur. This patch fixes that assumption. [lorenzo.stoakes@oracle.com: provide a better fix and rephrase the log] Link: https://lkml.kernel.org/r/20241027123321.19511-1-richard.weiyang@gmail.com Fixes: 94d7d923 ("mm: abstract the vma_merge()/split_vma() pattern for mprotect() et al.") Signed-off-by:
Wei Yang <richard.weiyang@gmail.com> Reviewed-by:
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reviewed-by:
Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Masami Hiramatsu (Google) authored
Since gfp & GFP_ATOMIC == GFP_ATOMIC is true for GFP_KERNEL | GFP_HIGH, it will use kmalloc if user specifies that combination. Here the reason why combining the __vmalloc_node() and kmalloc_node() is that the vmalloc does not support all GFP flag, especially GFP_ATOMIC. So we should check if gfp & (GFP_ATOMIC | GFP_KERNEL) != GFP_ATOMIC for vmalloc first. This ensures caller can sleep. And for the robustness, even if vmalloc fails, it should retry with kmalloc to allocate it. Link: https://lkml.kernel.org/r/173008598713.1262174.2959179484209897252.stgit@mhiramat.roam.corp.google.com Fixes: aff1871b ("objpool: fix choosing allocation for percpu slots") Signed-off-by:
Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by:
Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/all/CAHk-=whO+vSH+XVRio8byJU8idAWES0SPGVZ7KAVdc4qrV0VUA@mail.gmail.com/ Cc: Leo Yan <leo.yan@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Wu <wuqiang.matt@bytedance.com> Cc: Mikel Rychliski <mikel@mikelr.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Viktor Malik <vmalik@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Yu Zhao authored
OOM kills due to vastly overestimated free highatomic reserves were observed: ... invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0 ... Node 0 Normal free:1482936kB boost:0kB min:410416kB low:739404kB high:1068392kB reserved_highatomic:1073152KB ... Node 0 Normal: 1292*4kB (ME) 1920*8kB (E) 383*16kB (UE) 220*32kB (ME) 340*64kB (E) 2155*128kB (UE) 3243*256kB (UE) 615*512kB (U) 1*1024kB (M) 0*2048kB 0*4096kB = 1477408kB The second line above shows that the OOM kill was due to the following condition: free (1482936kB) - reserved_highatomic (1073152kB) = 409784KB < min (410416kB) And the third line shows there were no free pages in any MIGRATE_HIGHATOMIC pageblocks, which otherwise would show up as type 'H'. Therefore __zone_watermark_unusable_free() underestimated the usable free memory by over 1GB, which resulted in the unnecessary OOM kill above. The comments in __zone_watermark_unusable_free() warns about the potential risk, i.e., If the caller does not have rights to reserves below the min watermark then subtract the high-atomic reserves. This will over-estimate the size of the atomic reserve but it avoids a search. However, it is possible to keep track of free pages in reserved highatomic pageblocks with a new per-zone counter nr_free_highatomic protected by the zone lock, to avoid a search when calculating the usable free memory. And the cost would be minimal, i.e., simple arithmetics in the highatomic alloc/free/move paths. Note that since nr_free_highatomic can be relatively small, using a per-cpu counter might cause too much drift and defeat its purpose, in addition to the extra memory overhead. Dependson e0932b6c ("mm: page_alloc: consolidate free page accounting") - see [1] [akpm@linux-foundation.org: s/if/else if/, per Johannes, stealth whitespace tweak] Link: https://lkml.kernel.org/r/20241028182653.3420139-1-yuzhao@google.com Link: https://lkml.kernel.org/r/0d0ddb33-fcdc-43e2-801f-0c1df2031afb@suse.cz [1] Fixes: 0aaa29a5 ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand") Signed-off-by:
Yu Zhao <yuzhao@google.com> Reported-by:
Link Lin <linkl@google.com> Acked-by:
David Rientjes <rientjes@google.com> Acked-by:
Vlastimil Babka <vbabka@suse.cz> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 06, 2024
-
-
Lorenzo Stoakes authored
The mmap_region() function is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. Taking advantage of previous patches in this series we move a number of checks earlier in the code, simplifying things by moving the core of the logic into a static internal function __mmap_region(). Doing this allows us to perform a number of checks up front before we do any real work, and allows us to unwind the writable unmap check unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE validation unconditionally also. We move a number of things here: 1. We preallocate memory for the iterator before we call the file-backed memory hook, allowing us to exit early and avoid having to perform complicated and error-prone close/free logic. We carefully free iterator state on both success and error paths. 2. The enclosing mmap_region() function handles the mapping_map_writable() logic early. Previously the logic had the mapping_map_writable() at the point of mapping a newly allocated file-backed VMA, and a matching mapping_unmap_writable() on success and error paths. We now do this unconditionally if this is a file-backed, shared writable mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however doing so does not invalidate the seal check we just performed, and we in any case always decrement the counter in the wrapper. We perform a debug assert to ensure a driver does not attempt to do the opposite. 3. We also move arch_validate_flags() up into the mmap_region() function. This is only relevant on arm64 and sparc64, and the check is only meaningful for SPARC with ADI enabled. We explicitly add a warning for this arch if a driver invalidates this check, though the code ought eventually to be fixed to eliminate the need for this. With all of these measures in place, we no longer need to explicitly close the VMA on error paths, as we place all checks which might fail prior to a call to any driver mmap hook. This eliminates an entire class of errors, makes the code easier to reason about and more robust. Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f656 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by:
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by:
Jann Horn <jannh@google.com> Reviewed-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by:
Vlastimil Babka <vbabka@suse.cz> Tested-by:
Mark Brown <broonie@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Lorenzo Stoakes authored
Currently MTE is permitted in two circumstances (desiring to use MTE having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is specified, as checked by arch_calc_vm_flag_bits() and actualised by setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is activated in mmap_region(). The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also set is the arm64 implementation of arch_validate_flags(). Unfortunately, we intend to refactor mmap_region() to perform this check earlier, meaning that in the case of a shmem backing we will not have invoked shmem_mmap() yet, causing the mapping to fail spuriously. It is inappropriate to set this architecture-specific flag in general mm code anyway, so a sensible resolution of this issue is to instead move the check somewhere else. We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via the arch_calc_vm_flag_bits() call. This is an appropriate place to do this as we already check for the MAP_ANONYMOUS case here, and the shmem file case is simply a variant of the same idea - we permit RAM-backed memory. This requires a modification to the arch_calc_vm_flag_bits() signature to pass in a pointer to the struct file associated with the mapping, however this is not too egregious as this is only used by two architectures anyway - arm64 and parisc. So this patch performs this adjustment and removes the unnecessary assignment of VM_MTE_ALLOWED in shmem_mmap(). [akpm@linux-foundation.org: fix whitespace, per Catalin] Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f656 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by:
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Suggested-by:
Catalin Marinas <catalin.marinas@arm.com> Reported-by:
Jann Horn <jannh@google.com> Reviewed-by:
Catalin Marinas <catalin.marinas@arm.com> Reviewed-by:
Vlastimil Babka <vbabka@suse.cz> Cc: Andreas Larsson <andreas@gaisler.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Liam R. Howlett <Liam.Howlett@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Lorenzo Stoakes authored
Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit. While we're here, we refactor the function to be more readable and add some additional documentation. Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f656 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by:
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by:
Jann Horn <jannh@google.com> Reviewed-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
Jann Horn <jannh@google.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Lorenzo Stoakes authored
Incorrect invocation of VMA callbacks when the VMA is no longer in a consistent state is bug prone and risky to perform. With regards to the important vm_ops->close() callback We have gone to great lengths to try to track whether or not we ought to close VMAs. Rather than doing so and risking making a mistake somewhere, instead unconditionally close and reset vma->vm_ops to an empty dummy operations set with a NULL .close operator. We introduce a new function to do so - vma_close() - and simplify existing vms logic which tracked whether we needed to close or not. This simplifies the logic, avoids incorrect double-calling of the .close() callback and allows us to update error paths to simply call vma_close() unconditionally - making VMA closure idempotent. Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f656 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by:
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by:
Jann Horn <jannh@google.com> Reviewed-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by:
Jann Horn <jannh@google.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Lorenzo Stoakes authored
Patch series "fix error handling in mmap_region() and refactor (hotfixes)", v4. mmap_region() is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. This series goes to great lengths to simplify how mmap_region() works and to avoid unwinding errors late on in the process of setting up the VMA for the new mapping, and equally avoids such operations occurring while the VMA is in an inconsistent state. The patches in this series comprise the minimal changes required to resolve existing issues in mmap_region() error handling, in order that they can be hotfixed and backported. There is additionally a follow up series which goes further, separated out from the v1 series and sent and updated separately. This patch (of 5): After an attempted mmap() fails, we are no longer in a situation where we can safely interact with VMA hooks. This is currently not enforced, meaning that we need complicated handling to ensure we do not incorrectly call these hooks. We can avoid the whole issue by treating the VMA as suspect the moment that the file->f_ops->mmap() function reports an error by replacing whatever VMA operations were installed with a dummy empty set of VMA operations. We do so through a new helper function internal to mm - mmap_file() - which is both more logically named than the existing call_mmap() function and correctly isolates handling of the vm_op reassignment to mm. All the existing invocations of call_mmap() outside of mm are ultimately nested within the call_mmap() from mm, which we now replace. It is therefore safe to leave call_mmap() in place as a convenience function (and to avoid churn). The invokers are: ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap() coda_file_operations -> mmap -> coda_file_mmap() shm_file_operations -> shm_mmap() shm_file_operations_huge -> shm_mmap() dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops -> i915_gem_dmabuf_mmap() None of these callers interact with vm_ops or mappings in a problematic way on error, quickly exiting out. Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.1730224667.git.lorenzo.stoakes@oracle.com Fixes: deb0f656 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by:
Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Reported-by:
Jann Horn <jannh@google.com> Reviewed-by:
Liam R. Howlett <Liam.Howlett@oracle.com> Reviewed-by:
Vlastimil Babka <vbabka@suse.cz> Reviewed-by:
Jann Horn <jannh@google.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mark Brown <broonie@kernel.org> Cc: Peter Xu <peterx@redhat.com> Cc: Will Deacon <will@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4 ("mm: split underused THPs") Signed-off-by:
Hugh Dickins <hughd@google.com> Acked-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Yang Shi <shy828301@gmail.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Usama Arif <usamaarif642@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Hugh Dickins authored
Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. The new unlocked list_del_init() in deferred_split_scan() is buggy. I gave bad advice, it looks plausible since that's a local on-stack list, but the fact is that it can race with a third party freeing or migrating the preceding folio (properly unqueueing it with refcount 0 while holding split_queue_lock), thereby corrupting the list linkage. The obvious answer would be to take split_queue_lock there: but it has a long history of contention, so I'm reluctant to add to that. Instead, make sure that there is always one safe (raised refcount) folio before, by delaying its folio_put(). (And of course I was wrong to suggest updating split_queue_len without the lock: leave that until the splice.) And remove two over-eager partially_mapped checks, restoring those tests to how they were before: if uncharge_folio() or free_tail_page_prepare() finds _deferred_list non-empty, it's in trouble whether or not that folio is partially_mapped (and the flag was already cleared in the latter case). Link: https://lkml.kernel.org/r/81e34a8b-113a-0701-740e-2135c97eb1d7@google.com Fixes: dafff3f4 ("mm: split underused THPs") Signed-off-by:
Hugh Dickins <hughd@google.com> Acked-by:
Usama Arif <usamaarif642@gmail.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by:
Zi Yan <ziy@nvidia.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Nov 04, 2024
-
-
Linus Torvalds authored
-
- Nov 03, 2024
-
-
Linus Torvalds authored
Merge tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "17 hotfixes. 9 are cc:stable. 13 are MM and 4 are non-MM. The usual collection of singletons - please see the changelogs" * tag 'mm-hotfixes-stable-2024-11-03-10-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats mm, mmap: limit THP alignment of anonymous mappings to PMD-aligned sizes mm: shrinker: avoid memleak in alloc_shrinker_info .mailmap: update e-mail address for Eugen Hristev vmscan,migrate: fix page count imbalance on node stats when demoting pages mailmap: update Jarkko's email addresses mm: allow set/clear page_type again nilfs2: fix potential deadlock with newly created symlinks Squashfs: fix variable overflow in squashfs_readpage_block kasan: remove vmalloc_percpu test tools/mm: -Werror fixes in page-types/slabinfo mm, swap: avoid over reclaim of full clusters mm: fix PSWPIN counter for large folios swap-in mm: avoid VM_BUG_ON when try to map an anon large folio to zero page. mm/codetag: fix null pointer check logic for ref and tag mm/gup: stop leaking pinned pages in low memory conditions
-
git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phyLinus Torvalds authored
Pull phy fixes from Vinod Koul: - Qualcomm QMP driver fixes for null deref on suspend, bogus supplies fix and reset entries fix - BCM usb driver init array fix - cadence array offset fix - starfive link configuration fix - config dependency fix for rockchip driver - freescale reset signal fix before pll lock - tegra driver fix for error pointer check * tag 'phy-fixes-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy: phy: tegra: xusb: Add error pointer check in xusb.c dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: Fix X1E80100 resets entries phy: freescale: imx8m-pcie: Do CMN_RST just before PHY PLL lock check phy: phy-rockchip-samsung-hdptx: Depend on CONFIG_COMMON_CLK phy: ti: phy-j721e-wiz: fix usxgmii configuration phy: starfive: jh7110-usb: Fix link configuration to controller phy: qcom: qmp-pcie: drop bogus x1e80100 qref supplies phy: qcom: qmp-combo: move driver data initialisation earlier phy: qcom: qmp-usbc: fix NULL-deref on runtime suspend phy: qcom: qmp-usb-legacy: fix NULL-deref on runtime suspend phy: qcom: qmp-usb: fix NULL-deref on runtime suspend dt-bindings: phy: qcom,sc8280xp-qmp-pcie-phy: add missing x1e80100 pipediv2 clocks phy: usb: disable COMMONONN for dual mode phy: cadence: Sierra: Fix offset of DEQ open eye algorithm control register phy: usb: Fix missing elements in BCM4908 USB init array
-
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengineLinus Torvalds authored
Pull dmaengine fixes from Vinod Koul: - TI driver fix to set EOP for cyclic BCDMA transfers - sh rz-dmac driver fix for handling config with zero address * tag 'dmaengine-fix-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: dmaengine: ti: k3-udma: Set EOP for all TRs in cyclic BCDMA transfer dmaengine: sh: rz-dmac: handle configs where one address is zero
-
Linus Torvalds authored
Merge tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core revert from Greg KH: "Here is a single driver core revert for 6.12-rc6. It reverts a change that came in -rc1 that was supposed to resolve a reported problem, but caused another one, so revert it for now so that we can get this all worked out properly in 6.13. The revert has been in linux-next all week with no reported issues" * tag 'driver-core-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Revert "driver core: Fix uevent_show() vs driver detach race"
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usbLinus Torvalds authored
Pull USB / Thunderbolt fixes from Greg KH: "Here are some small USB and Thunderbolt driver fixes for 6.12-rc6 that have been sitting in my tree this week. Included in here are the following: - thunderbolt driver fixes for reported issues - USB typec driver fixes - xhci driver fixes for reported problems - dwc2 driver revert for a broken change - usb phy driver fix - usbip tool fix All of these have been in linux-next this week with no reported issues" * tag 'usb-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tcpm: restrict SNK_WAIT_CAPABILITIES_TIMEOUT transitions to non self-powered devices usb: phy: Fix API devm_usb_put_phy() can not release the phy usb: typec: use cleanup facility for 'altmodes_node' usb: typec: fix unreleased fwnode_handle in typec_port_register_altmodes() usb: typec: qcom-pmic-typec: fix missing fwnode removal in error path usb: typec: qcom-pmic-typec: use fwnode_handle_put() to release fwnodes usb: acpi: fix boot hang due to early incorrect 'tunneled' USB3 device links Revert "usb: dwc2: Skip clock gating on Broadcom SoCs" xhci: Fix Link TRB DMA in command ring stopped completion event xhci: Use pm_runtime_get to prevent RPM on unsupported systems usbip: tools: Fix detach_port() invalid port error path thunderbolt: Honor TMU requirements in the domain when setting TMU mode thunderbolt: Fix KASAN reported stack out-of-bounds read in tb_retimer_scan()
-
Yu Zhao authored
When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton@google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdae ("mm: multi-gen LRU: support page table walks") Signed-off-by:
Yu Zhao <yuzhao@google.com> Signed-off-by:
James Houghton <jthoughton@google.com> Reported-by:
David Stevens <stevensd@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Cc: kernel test robot <lkp@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Yu Zhao authored
Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google.com/ This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdae ("mm: multi-gen LRU: support page table walks") Signed-off-by:
Yu Zhao <yuzhao@google.com> Signed-off-by:
James Houghton <jthoughton@google.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: David Matlack <dmatlack@google.com> Cc: David Rientjes <rientjes@google.com> Cc: David Stevens <stevensd@google.com> Cc: Oliver Upton <oliver.upton@linux.dev> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Wei Xu <weixugc@google.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-miscLinus Torvalds authored
Pull misc driver fixes from Greg KH: "Here are some small char/misc/iio fixes for 6.12-rc6 that resolve some reported issues. Included in here are the following: - small IIO driver fixes for many reported issues - mei driver fix for a suddenly much reported issue for an "old" issue. - MAINTAINERS update for a developer who has moved companies and forgot to update their old entry. All of these have been in linux-next this week with no reported issues" * tag 'char-misc-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: mei: use kvmalloc for read buffer MAINTAINERS: add netup_unidvb maintainer iio: dac: Kconfig: Fix build error for ltc2664 iio: adc: ad7124: fix division by zero in ad7124_set_channel_odr() staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg() docs: iio: ad7380: fix supply for ad7380-4 iio: adc: ad7380: fix supplies for ad7380-4 iio: adc: ad7380: add missing supplies iio: adc: ad7380: use devm_regulator_get_enable_read_voltage() dt-bindings: iio: adc: ad7380: fix ad7380-4 reference supply iio: light: veml6030: fix microlux value calculation iio: gts-helper: Fix memory leaks for the error path of iio_gts_build_avail_scale_table() iio: gts-helper: Fix memory leaks in iio_gts_build_avail_scale_table()
-
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/inputLinus Torvalds authored
Pull input fixes from Dmitry Torokhov: - a fix for regression in input core introduced in 6.11 preventing re-registering input handlers - a fix for adp5588-keys driver tyring to disable interrupt 0 at suspend when devices is used without interrupt - a fix for edt-ft5x06 to stop leaking regmap structure when probing fails and to make sure it is not released too early on removal. * tag 'input-for-v6.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: fix regression when re-registering input handlers Input: adp5588-keys - do not try to disable interrupt 0 Input: edt-ft5x06 - fix regmap leak when probe fails
-
Linus Torvalds authored
Merge tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - Fix a memory leak in modpost - Resolve build issues when cross-compiling RPM and Debian packages - Fix another regression in Kconfig - Fix incorrect MODULE_ALIAS() output in modpost * tag 'kbuild-fixes-v6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: modpost: fix input MODULE_DEVICE_TABLE() built for 64-bit on 32-bit host modpost: fix acpi MODULE_DEVICE_TABLE built with mismatched endianness kconfig: show sub-menu entries even if the prompt is hidden kbuild: deb-pkg: add pkg.linux-upstream.nokerneldbg build profile kbuild: deb-pkg: add pkg.linux-upstream.nokernelheaders build profile kbuild: rpm-pkg: disable kernel-devel package when cross-compiling sumversion: Fix a memory leak in get_src_version()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull x86 fix from Thomas Gleixner: "A trivial compile test fix for x86: When CONFIG_AMD_NB is not set a COMPILE_TEST of an AMD specific driver fails due to a missing inline stub. Add the stub to cure it" * tag 'x86-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/amd_nb: Fix compile-testing without CONFIG_AMD_NB
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull timer fix from Thomas Gleixner: "A single fix for posix CPU timers. When a thread is cloned, the posix CPU timers are not inherited. If the parent has a CPU timer armed the corresponding tick dependency in the tasks tick_dep_mask is set and copied to the new thread, which means the new thread and all decendants will prevent the system to go into full NOHZ operation. Clear the tick dependency mask in copy_process() to fix this" * tag 'timers-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix-cpu-timers: Clear TICK_DEP_BIT_POSIX_TIMER on clone
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull scheduler fixes from Thomas Gleixner: - Plug a race between pick_next_task_fair() and try_to_wake_up() where both try to write to the same task, even though both paths hold a runqueue lock, but obviously from different runqueues. The problem is that the store to task::on_rq in __block_task() is visible to try_to_wake_up() which assumes that the task is not queued. Both sides then operate on the same task. Cure it by rearranging __block_task() so the the store to task::on_rq is the last operation on the task. - Prevent a potential NULL pointer dereference in task_numa_work() task_numa_work() iterates the VMAs of a process. A concurrent unmap of the address space can result in a NULL pointer return from vma_next() which is unchecked. Add the missing NULL pointer check to prevent this. - Operate on the correct scheduler policy in task_should_scx() task_should_scx() returns true when a task should be handled by sched EXT. It checks the tasks scheduling policy. This fails when the check is done before a policy has been set. Cure it by handing the policy into task_should_scx() so it operates on the requested value. - Add the missing handling of sched EXT in the delayed dequeue mechanism. This was simply forgotten. * tag 'sched-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/ext: Fix scx vs sched_delayed sched: Pass correct scheduling policy to __setscheduler_class sched/numa: Fix the potential null pointer dereference in task_numa_work() sched: Fix pick_next_task_fair() vs try_to_wake_up() race
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull perf fix from Thomas Gleixner: "perf_event_clear_cpumask() uses list_for_each_entry_rcu() without being in a RCU read side critical section, which triggers a 'suspicious RCU usage' warning. It turns out that the list walk does not be RCU protected because the write side lock is held in this context. Change it to a regular list walk" * tag 'perf-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix missing RCU reader protection in perf_event_clear_cpumask()
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull irq fixes from Thomas Gleixner: - Fix an off-by-one error in the failure path of msi_domain_alloc(), which causes the cleanup loop to terminate early and leaking the first allocated interrupt. - Handle a corner case in GIC-V4 versus a lazily mapped Virtual Processing Element (VPE). If the VPE has not been mapped because the guest has not yet emitted a mapping command, then the set_affinity() callback returns an error code, which causes the vCPU management to fail. Return success in this case without touching the hardware. This will be done later when the guest issues the mapping command. * tag 'irq-urgent-2024-11-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v4: Correctly deal with set_affinity on lazily-mapped VPEs genirq/msi: Fix off-by-one error in msi_domain_alloc()
-
Masahiro Yamada authored
When building a 64-bit kernel on a 32-bit build host, incorrect input MODULE_ALIAS() entries may be generated. For example, when compiling a 64-bit kernel with CONFIG_INPUT_MOUSEDEV=m on a 64-bit build machine, you will get the correct output: $ grep MODULE_ALIAS drivers/input/mousedev.mod.c MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*r*0,*1,*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*r*a*0,*1,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*r*a*0,*1,*18,*1C,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*r*a*0,*1,*m*l*s*f*w*"); However, building the same kernel on a 32-bit machine results in incorrect output: $ grep MODULE_ALIAS drivers/input/mousedev.mod.c MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*110,*130,*r*0,*1,*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*2,*k*r*8,*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*14A,*16A,*r*a*0,*1,*20,*21,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*145,*165,*r*a*0,*1,*18,*1C,*20,*21,*38,*3C,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*3,*k*110,*130,*r*a*0,*1,*20,*21,*m*l*s*f*w*"); A similar issue occurs with CONFIG_INPUT_JOYDEV=m. On a 64-bit build machine, the output is: $ grep MODULE_ALIAS drivers/input/joydev.mod.c MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*k*120,*r*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*k*130,*r*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*r*a*m*l*s*f*w*"); However, on a 32-bit machine, the output is incorrect: $ grep MODULE_ALIAS drivers/input/joydev.mod.c MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*0,*20,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*2,*22,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*8,*28,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*3,*k*r*a*6,*26,*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*k*11F,*13F,*r*a*m*l*s*f*w*"); MODULE_ALIAS("input:b*v*p*e*-e*1,*k*2C0,*2E0,*r*a*m*l*s*f*w*"); When building a 64-bit kernel, BITS_PER_LONG is defined as 64. However, on a 32-bit build machine, the constant 1L is a signed 32-bit value. Left-shifting it beyond 32 bits causes wraparound, and shifting by 31 or 63 bits makes it a negative value. The fix in commit e0e92632 ("[PATCH] PATCH: 1 line 2.6.18 bugfix: modpost-64bit-fix.patch") is incorrect; it only addresses cases where a 64-bit kernel is built on a 64-bit build machine, overlooking cases on a 32-bit build machine. Using 1ULL ensures a 64-bit width on both 32-bit and 64-bit machines, avoiding the wraparound issue. Fixes: e0e92632 ("[PATCH] PATCH: 1 line 2.6.18 bugfix: modpost-64bit-fix.patch") Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
Masahiro Yamada authored
When CONFIG_SATA_AHCI_PLATFORM=m, modpost outputs incorect acpi MODULE_ALIAS() if the endianness of the target and the build machine do not match. When the endianness of the target kernel and the build machine match, the output is correct: $ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c MODULE_ALIAS("acpi*:APMC0D33:*"); MODULE_ALIAS("acpi*:010601:*"); However, when building a little-endian kernel on a big-endian machine (or vice versa), the output is incorrect: $ grep 'MODULE_ALIAS("acpi' drivers/ata/ahci_platform.mod.c MODULE_ALIAS("acpi*:APMC0D33:*"); MODULE_ALIAS("acpi*:0601??:*"); The 'cls' and 'cls_msk' fields are 32-bit. DEF_FIELD() must be used instead of DEF_FIELD_ADDR() to correctly handle endianness of these 32-bit fields. The check 'if (cls)' was unnecessary; it never became NULL, as it was the pointer to 'symval' plus the offset to the 'cls' field. Fixes: 26095a01 ("ACPI / scan: Add support for ACPI _CLS device matching") Signed-off-by:
Masahiro Yamada <masahiroy@kernel.org>
-
Dmitry Torokhov authored
Commit d469647b ("Input: simplify event handling logic") introduced code that would set handler->events() method to either input_handler_events_filter() or input_handler_events_default() or input_handler_events_null(), depending on the kind of input handler (a filter or a regular one) we are dealing with. Unfortunately this breaks cases when we try to re-register the same filter (as is the case with sysrq handler): after initial registration the handler will have 2 event handling methods defined, and will run afoul of the check in input_handler_check_methods(): input: input_handler_check_methods: only one event processing method can be defined (sysrq) sysrq: Failed to register input handler, error -22 Fix this by adding handle_events() method to input_handle structure and setting it up when registering a new input handle according to event handling methods defined in associated input_handler structure, thus avoiding modifying the input_handler structure. Reported-by:
"Ned T. Crigler" <crigler@gmail.com> Reported-by:
Christian Heusel <christian@heusel.eu> Tested-by:
"Ned T. Crigler" <crigler@gmail.com> Tested-by:
Peter Seiderer <ps.report@gmx.net> Fixes: d469647b ("Input: simplify event handling logic") Link: https://lore.kernel.org/r/Zx2iQp6csn42PJA7@xavtug Cc: stable@vger.kernel.org Signed-off-by:
Dmitry Torokhov <dmitry.torokhov@gmail.com>
-
- Nov 02, 2024
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds authored
Pull nfsd fixes from Chuck Lever: - Fix two async COPY bugs found during NFS bake-a-thon - Fix an svcrdma memory leak * tag 'nfsd-6.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: rpcrdma: Always release the rpcrdma_device's xa_array NFSD: Never decrement pending_async_copies on error NFSD: Initialize struct nfsd4_copy earlier
-
git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds authored
Pull xfs fixes from Carlos Maiolino: - fix a sysbot reported crash on filestreams - Reduce cpu time spent searching for extents in a very fragmented FS - Check for delayed allocations before setting extsize * tag 'xfs-6.12-fixes-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: streamline xfs_filestream_pick_ag xfs: fix finding a last resort AG in xfs_filestream_pick_ag xfs: Reduce unnecessary searches when searching for the best extents xfs: Check for delayed allocations before setting extsize
-
Linus Torvalds authored
Merge tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: - fix syntax error in frequency calculation arithmetic expression in intel_pstate run.sh - add missing cpupower dependency check intel_pstate run.sh - fix idmap_mount_tree_invalid test failure due to incorrect argument - fix watchdog-test run leaving the watchdog timer enabled causing system reboot. With this fix, the test disables the watchdog timer when it gets terminated with SIGTERM, SIGKILL, and SIGQUIT in addition to SIGINT * tag 'linux_kselftest-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/watchdog-test: Fix system accidentally reset after watchdog-test selftests/intel_pstate: check if cpupower is installed selftests/intel_pstate: fix operand expected error selftests/mount_setattr: fix idmap_mount_tree_invalid failed to run
-
https://github.com/Rust-for-Linux/linuxLinus Torvalds authored
Pull rust fixes from Miguel Ojeda: "Toolchain and infrastructure: - Avoid build errors with old 'rustc's without LLVM patch version (important since it impacts people that do not even enable Rust) - Update LLVM version for 'HAVE_CFI_ICALL_NORMALIZE_INTEGERS' in 'depends on' condition (the fix was eventually backported rather than land in LLVM 19)" * tag 'rust-fixes-6.12-3' of https://github.com/Rust-for-Linux/linux: cfi: tweak llvm version for HAVE_CFI_ICALL_NORMALIZE_INTEGERS kbuild: rust: avoid errors with old `rustc`s without LLVM patch version
-
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pciLinus Torvalds authored
Pull pci fix from Bjorn Helgaas: - Enable device-specific ACS-like functionality even if the device doesn't advertise an ACS capability, which got broken when adding fancy ACS kernel parameter (Jason Gunthorpe) * tag 'pci-v6.12-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: PCI: Fix pci_enable_acs() support for the ACS quirks
-
https://gitlab.freedesktop.org/drm/kernelLinus Torvalds authored
Pull drm fixes from Dave Airlie: "Regular fixes pull, nothing too out of the ordinary, the mediatek fixes came in a batch that I might have preferred a bit earlier but all seem fine, otherwise regular xe/amdgpu and a few misc ones. xe: - Fix missing HPD interrupt enabling, bringing one PM refactor with it - Workaround LNL GGTT invalidation not being visible to GuC - Avoid getting jobs stuck without a protecting timeout ivpu: - Fix firewall IRQ handling panthor: - Fix firmware initialization wrt page sizes - Fix handling and reporting of dead job groups sched: - Guarantee forward progress via WC_MEM_RECLAIM tests: - Fix memory leak in drm_display_mode_from_cea_vic() amdgpu: - DCN 3.5 fix - Vangogh SMU KASAN fix - SMU 13 profile reporting fix mediatek: - Fix degradation problem of alpha blending - Fix color format MACROs in OVL - Fix get efuse issue for MT8188 DPTX - Fix potential NULL dereference in mtk_crtc_destroy() - Correct dpi power-domains property - Add split subschema property constraints" * tag 'drm-fixes-2024-11-02' of https://gitlab.freedesktop.org/drm/kernel: (27 commits) drm/xe: Don't short circuit TDR on jobs not started drm/xe: Add mmio read before GGTT invalidate drm/tests: hdmi: Fix memory leaks in drm_display_mode_from_cea_vic() drm/connector: hdmi: Fix memory leak in drm_display_mode_from_cea_vic() drm/tests: helpers: Add helper for drm_display_mode_from_cea_vic() drm/panthor: Report group as timedout when we fail to properly suspend drm/panthor: Fail job creation when the group is dead drm/panthor: Fix firmware initialization on systems with a page size > 4k accel/ivpu: Fix NOC firewall interrupt handling drm/xe/display: Add missing HPD interrupt enabling during non-d3cold RPM resume drm/xe/display: Separate the d3cold and non-d3cold runtime PM handling drm/xe: Remove runtime argument from display s/r functions drm/amdgpu/smu13: fix profile reporting drm/amd/pm: Vangogh: Fix kernel memory out of bounds write Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" drm/sched: Mark scheduler work queues with WQ_MEM_RECLAIM drm/tegra: Fix NULL vs IS_ERR() check in probe() dt-bindings: display: mediatek: split: add subschema property constraints dt-bindings: display: mediatek: dpi: correct power-domains property drm/mediatek: Fix potential NULL dereference in mtk_crtc_destroy() ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxlLinus Torvalds authored
Pull cxl fixes from Ira Weiny: "The bulk of these fixes center around an initialization order bug reported by Gregory Price and some additional fall out from the debugging effort. In summary, cxl_acpi and cxl_mem race and previously worked because of a bus_rescan_devices() while testing without modules built in. Unfortunately with modules built in the rescan would fail due to the cxl_port driver being registered late via the build order. Furthermore it was found bus_rescan_devices() did not guarantee a probe barrier which CXL was expecting. Additional fixes to cxl-test and decoder allocation came along as they were found in this debugging effort. The other fixes are pretty minor but one affects trace point data seen by user space. Summary: - Fix crashes when running with cxl-test code - Fix Trace DRAM Event Record field decodes - Fix module/built in initialization order errors - Fix use after free on decoder shutdowns - Fix out of order decoder allocations - Improve cxl-test to better reflect real world systems" * tag 'cxl-fixes-6.12-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/test: Improve init-order fidelity relative to real-world systems cxl/port: Prevent out-of-order decoder allocation cxl/port: Fix use-after-free, permit out-of-order decoder shutdown cxl/acpi: Ensure ports ready at cxl_acpi_probe() return cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() cxl/port: Fix CXL port initialization order when the subsystem is built-in cxl/events: Fix Trace DRAM Event Record cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
-