  1. Mar 20, 2023
  2. Mar 08, 2023
    • mm/damon/paddr: fix folio_nr_pages() after folio_put() in damon_pa_mark_accessed_or_deactivate() · dd52a61d
      SeongJae Park authored
      damon_pa_mark_accessed_or_deactivate() accesses the folio via
      folio_nr_pages() after folio_put() has already been invoked for it.
      Fix it.
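
      In sketch form (illustrative, not the exact DAMON code), the pattern
      and the fix look like this:

       /* buggy: the folio is used after its reference was dropped */
       folio_put(folio);
       applied += folio_nr_pages(folio);	/* use after put */

       /* fixed: read what is needed, then drop the reference */
       applied += folio_nr_pages(folio);
       folio_put(folio);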
      
      Link: https://lkml.kernel.org/r/20230304193949.296391-3-sj@kernel.org
      
      
      Fixes: f70da5ee ("mm/damon: convert damon_pa_mark_accessed_or_deactivate() to use folios")
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/damon/paddr: fix folio_size() call after folio_put() in damon_pa_young() · 751688b8
      SeongJae Park authored
      Patch series "mm/damon/paddr: Fix folio-use-after-put bugs".
      
      There are two folio accesses after folio_put() in the mm/damon/paddr.c
      file.  Fix them.
      
      
      This patch (of 2):
      
      damon_pa_young() accesses the folio via folio_size() after folio_put()
      has already been invoked for it.  Fix it.
      
      Link: https://lkml.kernel.org/r/20230304193949.296391-1-sj@kernel.org
      Link: https://lkml.kernel.org/r/20230304193949.296391-2-sj@kernel.org
      
      
      Fixes: 397b0c3a ("mm/damon/paddr: remove folio_sz field from damon_pa_access_chk_result")
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com>
      Cc: <stable@vger.kernel.org>	[6.2.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • ocfs2: fix data corruption after failed write · 90410bcf
      Jan Kara via Ocfs2-devel authored
      When a buffered write fails to copy data into the underlying page cache
      page, ocfs2_write_end_nolock() just zeroes out and dirties the page.
      This can leave a dirty page beyond EOF, and if page writeback tries to
      write that page out before the write succeeds and expands i_size, the
      page gets into an inconsistent state where the page dirty bit is clear
      but the buffer dirty bits stay set, resulting in the page data never
      getting written and so the data copied to the page being lost.  Fix
      the problem by invalidating the page beyond EOF after a failed write.
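
      A minimal sketch of the idea (names are illustrative and
      zero_new_buffers_in_page() is a hypothetical helper; the actual
      ocfs2_write_end_nolock() change differs in detail):

       if (unlikely(copied < len)) {
       	loff_t new_isize = max_t(loff_t, i_size_read(inode), pos + copied);

       	if (new_isize <= page_offset(page))
       		/*
       		 * The page is fully beyond the new i_size: invalidate it
       		 * instead of zeroing and dirtying it, so writeback never
       		 * sees a clean page carrying dirty buffers.
       		 */
       		block_invalidate_folio(page_folio(page), 0, PAGE_SIZE);
       	else
       		zero_new_buffers_in_page(page, copied, len);	/* hypothetical */
       }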
      
      Link: https://lkml.kernel.org/r/20230302153843.18499-1-jack@suse.cz
      
      
      Fixes: 6dbf7bb5 ("fs: Don't invalidate page buffers in block_write_full_page()")
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • migrate_pages: try migrate in batch asynchronously firstly · 2ef7dbb2
      Huang Ying authored
      When we have locked more than one folio, we cannot wait for a lock or
      bit (e.g., page lock, buffer head lock, writeback bit) synchronously;
      otherwise a deadlock may be triggered.  This makes it hard to batch
      synchronous migration directly.
      
      This patch re-enables batching for synchronous migration by first
      trying to migrate the folios in batch asynchronously.  Any folios that
      fail to be migrated asynchronously are then migrated synchronously,
      one by one.
      
      Tests show that this effectively restores the TLB-flush batching
      performance for synchronous migration.
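
      In sketch form, with illustrative helper names (the real logic lives
      in migrate_pages() and migrate_pages_batch()):

       /* 1) optimistic pass: migrate everything in batch, asynchronously */
       nr_failed = migrate_batch_async(folios);	/* illustrative name */

       /* 2) fallback: retry each failed folio synchronously, one by one */
       list_for_each_entry_safe(folio, next, &failed_folios, lru)
       	migrate_one_sync(folio, mode);		/* illustrative name */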
      
      Link: https://lkml.kernel.org/r/20230303030155.160983-4-ying.huang@intel.com
      
      
      Fixes: 5dfab109 ("migrate_pages: batch _unmap and _move")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Tested-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: "Xu, Pengfei" <pengfei.xu@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Stefan Roesch <shr@devkernel.io>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Xin Hao <xhao@linux.alibaba.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • migrate_pages: move split folios processing out of migrate_pages_batch() · a21d2133
      Huang Ying authored
      Move the split folio processing out of migrate_pages_batch() to
      simplify the code logic and reduce the line count.
      
      Link: https://lkml.kernel.org/r/20230303030155.160983-3-ying.huang@intel.com
      
      
      Fixes: 5dfab109 ("migrate_pages: batch _unmap and _move")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Xu, Pengfei" <pengfei.xu@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Stefan Roesch <shr@devkernel.io>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Xin Hao <xhao@linux.alibaba.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • migrate_pages: fix deadlock in batched migration · fb3592c4
      Huang Ying authored
      Patch series "migrate_pages: fix deadlock in batched synchronous
      migration", v2.
      
      Two deadlock bugs were reported for the migrate_pages() batching
      series.  Thanks Hugh and Pengfei.  Analysis shows that if we have
      locked folios other than the one we are migrating, it is not safe in
      general to wait synchronously, for example to wait for writeback to
      complete or to lock a buffer head.
      
      Patch 1/3 fixes the deadlock in a simple way by disabling batching
      support for synchronous migration.  The change is straightforward and
      easy to understand.  Patch 3/3 re-introduces batching for synchronous
      migration by optimistically trying to migrate the folios in batch
      asynchronously, then falling back to migrating any failed folios
      synchronously, one by one.  Tests show that this effectively restores
      the TLB-flush batching performance for synchronous migration.
      
      
      This patch (of 3):
      
      Two deadlock bugs were reported for the migrate_pages() batching
      series.  Thanks Hugh and Pengfei!  For example, consider the following
      deadlock trace snippet:
      
       INFO: task kworker/u4:0:9 blocked for more than 147 seconds.
             Not tainted 6.2.0-rc4-kvm+ #1314
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       task:kworker/u4:0    state:D stack:0     pid:9     ppid:2      flags:0x00004000
       Workqueue: loop4 loop_rootcg_workfn
       Call Trace:
        <TASK>
        __schedule+0x43b/0xd00
        schedule+0x6a/0xf0
        io_schedule+0x4a/0x80
        folio_wait_bit_common+0x1b5/0x4e0
        ? __pfx_wake_page_function+0x10/0x10
        __filemap_get_folio+0x73d/0x770
        shmem_get_folio_gfp+0x1fd/0xc80
        shmem_write_begin+0x91/0x220
        generic_perform_write+0x10e/0x2e0
        __generic_file_write_iter+0x17e/0x290
        ? generic_write_checks+0x12b/0x1a0
        generic_file_write_iter+0x97/0x180
        ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
        do_iter_readv_writev+0x13c/0x210
        ? __sanitizer_cov_trace_const_cmp4+0x1a/0x20
        do_iter_write+0xf6/0x330
        vfs_iter_write+0x46/0x70
        loop_process_work+0x723/0xfe0
        loop_rootcg_workfn+0x28/0x40
        process_one_work+0x3cc/0x8d0
        worker_thread+0x66/0x630
        ? __pfx_worker_thread+0x10/0x10
        kthread+0x153/0x190
        ? __pfx_kthread+0x10/0x10
        ret_from_fork+0x29/0x50
        </TASK>
      
       INFO: task repro:1023 blocked for more than 147 seconds.
             Not tainted 6.2.0-rc4-kvm+ #1314
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       task:repro           state:D stack:0     pid:1023  ppid:360    flags:0x00004004
       Call Trace:
        <TASK>
        __schedule+0x43b/0xd00
        schedule+0x6a/0xf0
        io_schedule+0x4a/0x80
        folio_wait_bit_common+0x1b5/0x4e0
        ? compaction_alloc+0x77/0x1150
        ? __pfx_wake_page_function+0x10/0x10
        folio_wait_bit+0x30/0x40
        folio_wait_writeback+0x2e/0x1e0
        migrate_pages_batch+0x555/0x1ac0
        ? __pfx_compaction_alloc+0x10/0x10
        ? __pfx_compaction_free+0x10/0x10
        ? __this_cpu_preempt_check+0x17/0x20
        ? lock_is_held_type+0xe6/0x140
        migrate_pages+0x100e/0x1180
        ? __pfx_compaction_free+0x10/0x10
        ? __pfx_compaction_alloc+0x10/0x10
        compact_zone+0xe10/0x1b50
        ? lock_is_held_type+0xe6/0x140
        ? check_preemption_disabled+0x80/0xf0
        compact_node+0xa3/0x100
        ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30
        ? _find_first_bit+0x7b/0x90
        sysctl_compaction_handler+0x5d/0xb0
        proc_sys_call_handler+0x29d/0x420
        proc_sys_write+0x2b/0x40
        vfs_write+0x3a3/0x780
        ksys_write+0xb7/0x180
        __x64_sys_write+0x26/0x30
        do_syscall_64+0x3b/0x90
        entry_SYSCALL_64_after_hwframe+0x72/0xdc
       RIP: 0033:0x7f3a2471f59d
       RSP: 002b:00007ffe567f7288 EFLAGS: 00000217 ORIG_RAX: 0000000000000001
       RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f3a2471f59d
       RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000005
       RBP: 00007ffe567f72a0 R08: 0000000000000010 R09: 0000000000000010
       R10: 0000000000000010 R11: 0000000000000217 R12: 00000000004012e0
       R13: 00007ffe567f73e0 R14: 0000000000000000 R15: 0000000000000000
        </TASK>
      
      The page migration task holds the lock of shmem folio A and is waiting
      for the writeback of folio B, which belongs to the file system on the
      loop block device, to complete.  Meanwhile, the loop worker task that
      writes back folio B is waiting to lock shmem folio A, because folio A
      backs folio B in the loop device.  Thus a deadlock is triggered.
      
      In general, if we have locked folios other than the one we are
      migrating, it's not safe to wait synchronously, for example to wait
      for writeback to complete or to lock a buffer head.
      
      To fix the deadlock, this patch avoids batching page migration except
      in MIGRATE_ASYNC mode, where synchronous waiting is avoided.
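
      In sketch form (illustrative; the real check sits in migrate_pages(),
      and the parameter lists are simplified):

       /* batch only when the mode never waits synchronously */
       if (mode == MIGRATE_ASYNC)
       	nr_failed = migrate_pages_batch(from, mode);
       else
       	nr_failed = migrate_pages_one_by_one(from, mode);	/* illustrative */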
      
      The fix can be improved further.  We will do that as soon as possible.
      
      Link: https://lkml.kernel.org/r/20230303030155.160983-1-ying.huang@intel.com
      Link: https://lore.kernel.org/linux-mm/87a6c8c-c5c1-67dc-1e32-eb30831d6e3d@google.com/
      Link: https://lore.kernel.org/linux-mm/874jrg7kke.fsf@yhuang6-desk2.ccr.corp.intel.com/
      Link: https://lore.kernel.org/linux-mm/20230227110614.dngdub2j3exr6dfp@quack3/
      Link: https://lkml.kernel.org/r/20230303030155.160983-2-ying.huang@intel.com
      
      
      Fixes: 5dfab109 ("migrate_pages: batch _unmap and _move")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Reported-by: "Xu, Pengfei" <pengfei.xu@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Stefan Roesch <shr@devkernel.io>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Xin Hao <xhao@linux.alibaba.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • .mailmap: add Alexandre Ghiti personal email address · 89a00450
      Alexandre Ghiti authored
      I'm no longer employed by Canonical, which results in my email
      bouncing, so add an entry mapping it to my personal email address.
      
      Link: https://lkml.kernel.org/r/20230301090132.280475-1-alexghiti@rivosinc.com
      
      
      Signed-off-by: Alexandre Ghiti <alex@ghiti.fr>
      Reported-by: Conor Dooley <conor.dooley@microchip.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: correct Dikshita Agarwal's Qualcomm email address · 071ca76d
      Konrad Dybcio authored
      I recently sent a patch to map Dikshita's old CAF address to his
      current one at Qualcomm.  It turned out, however, that he has two of
      them, with the @quicinc.com one meant for upstream contributions.
      Fix it.
      
      Link: https://lkml.kernel.org/r/20230301110012.1290379-1-konrad.dybcio@linaro.org
      
      
      Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
      Cc: Dikshita Agarwal <quic_dikshita@quicinc.com>
      Cc: Andy Gross <agross@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Bjorn Andersson <andersson@kernel.org>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Marijn Suijten <marijn.suijten@somainline.org>
      Cc: Qais Yousef <qyousef@layalina.io>
      Cc: Vasily Averin <vasily.averin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mailmap: updates for Jarkko Sakkinen · af665b40
      Jarkko Sakkinen authored
      Update to my current employer:
      
      https://research.tuni.fi/nisec/
      
      Link: https://lkml.kernel.org/r/20230301235443.6663-1-jarkko@kernel.org
      
      
      Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
      Cc: Ben Widawsky <bwidawsk@kernel.org>
      Cc: Bjorn Andersson <andersson@kernel.org>
      Cc: Colin Ian King <colin.i.king@gmail.com>
      Cc: Kirill Tkhai <tkhai@ya.ru>
      Cc: Qais Yousef <qyousef@layalina.io>
      Cc: Vasily Averin <vasily.averin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/userfaultfd: propagate uffd-wp bit when PTE-mapping the huge zeropage · 42b2af2c
      David Hildenbrand authored
      Currently, we'd lose the userfaultfd-wp marker when PTE-mapping a huge
      zeropage, resulting in subsequent write faults in the PMD range not
      triggering uffd-wp events.
      
      Various actions (partial MADV_DONTNEED, partial mremap, partial munmap,
      partial mprotect) could trigger this.  However, most importantly,
      un-protecting a single sub-page from the userfaultfd-wp handler when
      processing a uffd-wp event will PTE-map the shared huge zeropage and lose
      the uffd-wp bit for the remainder of the PMD.
      
      Let's properly propagate the uffd-wp bit to the PTEs when PTE-mapping
      the huge zeropage.
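
      The reproducer below registers uffd-wp over a THP-backed range,
      write-protects it, and then checks that every per-page write fault
      raises a uffd-wp event; without the fix, faults in the PTE-mapped
      remainder of the PMD stop triggering.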
      
       #define _GNU_SOURCE
       #include <stdio.h>
       #include <stdlib.h>
       #include <stdint.h>
       #include <stdbool.h>
       #include <inttypes.h>
       #include <fcntl.h>
       #include <unistd.h>
       #include <errno.h>
       #include <poll.h>
       #include <pthread.h>
       #include <sys/mman.h>
       #include <sys/syscall.h>
       #include <sys/ioctl.h>
       #include <linux/userfaultfd.h>
      
       static size_t pagesize;
       static int uffd;
       static volatile bool uffd_triggered;
      
       #define barrier() __asm__ __volatile__("": : :"memory")
      
       static void uffd_wp_range(char *start, size_t size, bool wp)
       {
       	struct uffdio_writeprotect uffd_writeprotect;
      
       	uffd_writeprotect.range.start = (unsigned long) start;
       	uffd_writeprotect.range.len = size;
       	if (wp) {
       		uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
       	} else {
       		uffd_writeprotect.mode = 0;
       	}
       	if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
       		fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
       		exit(1);
       	}
       }
      
       static void *uffd_thread_fn(void *arg)
       {
       	static struct uffd_msg msg;
       	ssize_t nread;
      
       	while (1) {
       		struct pollfd pollfd;
       		int nready;
      
       		pollfd.fd = uffd;
       		pollfd.events = POLLIN;
       		nready = poll(&pollfd, 1, -1);
       		if (nready == -1) {
       			fprintf(stderr, "poll() failed: %d\n", errno);
       			exit(1);
       		}
      
       		nread = read(uffd, &msg, sizeof(msg));
       		if (nread <= 0)
       			continue;
      
       		if (msg.event != UFFD_EVENT_PAGEFAULT ||
       		    !(msg.arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP)) {
       			printf("FAIL: wrong uffd-wp event fired\n");
       			exit(1);
       		}
      
       		/* un-protect the single page. */
       		uffd_triggered = true;
       		uffd_wp_range((char *)(uintptr_t)msg.arg.pagefault.address,
       			      pagesize, false);
       	}
       	return arg;
       }
      
       static int setup_uffd(char *map, size_t size)
       {
       	struct uffdio_api uffdio_api;
       	struct uffdio_register uffdio_register;
       	pthread_t thread;
      
       	uffd = syscall(__NR_userfaultfd,
       		       O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
       	if (uffd < 0) {
       		fprintf(stderr, "syscall() failed: %d\n", errno);
       		return -errno;
       	}
      
       	uffdio_api.api = UFFD_API;
       	uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
       	if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
       		fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
       		return -errno;
       	}
      
       	if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
       		fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
       		return -ENOSYS;
       	}
      
       	uffdio_register.range.start = (unsigned long) map;
       	uffdio_register.range.len = size;
       	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
       	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
       		fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
       		return -errno;
       	}
      
       	pthread_create(&thread, NULL, uffd_thread_fn, NULL);
      
       	return 0;
       }
      
       int main(void)
       {
       	const size_t size = 4 * 1024 * 1024ull;
       	char *map, *cur;
      
       	pagesize = getpagesize();
      
       	map = mmap(NULL, size, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
       	if (map == MAP_FAILED) {
       		fprintf(stderr, "mmap() failed\n");
       		return -errno;
       	}
      
       	if (madvise(map, size, MADV_HUGEPAGE)) {
       		fprintf(stderr, "MADV_HUGEPAGE failed\n");
       		return -errno;
       	}
      
       	if (setup_uffd(map, size))
       		return 1;
      
       	/* Read the whole range, populating zeropages. */
       	madvise(map, size, MADV_POPULATE_READ);
      
       	/* Write-protect the whole range. */
       	uffd_wp_range(map, size, true);
      
       	/* Make sure uffd-wp triggers on each page. */
       	for (cur = map; cur < map + size; cur += pagesize) {
       		uffd_triggered = false;
      
       		barrier();
       		/* Trigger a write fault. */
       		*cur = 1;
       		barrier();
      
       		if (!uffd_triggered) {
       			printf("FAIL: uffd-wp did not trigger\n");
       			return 1;
       		}
       	}
      
       	printf("PASS: uffd-wp triggered\n");
       	return 0;
       }
      
      Link: https://lkml.kernel.org/r/20230302175423.589164-1-david@redhat.com
      
      
      Fixes: e06f1e1d ("userfaultfd: wp: enabled write protection in userfaultfd API")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: teach mincore_hugetlb about pte markers · 63cf5842
      James Houghton authored
      By checking huge_pte_none(), we incorrectly classify PTE markers as
      "present".  Instead, check huge_pte_none_mostly(), classifying PTE markers
      the same as if the PTE were completely blank.
      
      PTE markers, unlike other kinds of swap entries, don't reference any
      physical page and don't indicate that a physical page was mapped
      previously.  As such, treat them as non-present for the sake of mincore().
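
      A hedged sketch of the changed check, assuming the 6.2-era shape of
      mincore_hugetlb():

       pte_t pte = huge_ptep_get(ptep);

       /*
        * huge_pte_none() is false for PTE markers, so markers used to be
        * reported as present; huge_pte_none_mostly() also returns true
        * for PTE markers.
        */
       present = !huge_pte_none_mostly(pte);	/* was: !huge_pte_none(pte) */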
      
      Link: https://lkml.kernel.org/r/20230302222404.175303-1-jthoughton@google.com
      
      
      Fixes: 5c041f5d ("mm: teach core mm about pte markers")
      Signed-off-by: James Houghton <jthoughton@google.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: James Houghton <jthoughton@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. Mar 05, 2023
    • Linux 6.3-rc1 · fe15c26e
      Linus Torvalds authored
      v6.3-rc1
    • cpumask: re-introduce constant-sized cpumask optimizations · 596ff4a0
      Linus Torvalds authored
      
      Commit aa47a7c2 ("lib/cpumask: deprecate nr_cpumask_bits") resulted
      in the cpumask operations potentially becoming hugely less efficient,
      because suddenly the cpumask was always considered to be variable-sized.
      
      The optimization was then later added back in a limited form by commit
      6f9c07be ("lib/cpumask: add FORCE_NR_CPUS config option"), but that
      FORCE_NR_CPUS option is not useful in a generic kernel and more of a
      special case for embedded situations with fixed hardware.
      
      Instead, just re-introduce the optimization, with some changes.
      
      Instead of depending on CPUMASK_OFFSTACK being false, and then always
      using the full constant cpumask width, this introduces three different
      cpumask "sizes":
      
       - the exact size (nr_cpumask_bits) remains identical to nr_cpu_ids.
      
         This is used for situations where we should use the exact size.
      
       - the "small" size (small_cpumask_bits) is the NR_CPUS constant if it
         fits in a single word and the bitmap operations thus end up able
         to trigger the "small_const_nbits()" optimizations.
      
         This is used for the operations that have optimized single-word
         cases that get inlined, notably the bit find and scanning functions.
      
       - the "large" size (large_cpumask_bits) is the NR_CPUS constant if it
         is a sufficiently small constant that makes simple "copy" and
         "clear" operations more efficient.
      
         This is arbitrarily set at four words or less.
      
      As an example of this situation, without this fixed-size optimization,
      cpumask_clear() will generate code like
      
              movl    nr_cpu_ids(%rip), %edx
              addq    $63, %rdx
              shrq    $3, %rdx
              andl    $-8, %edx
              callq   memset@PLT
      
      on x86-64, because it would calculate the "exact" number of longwords
      that need to be cleared.
      
      In contrast, with this patch, using an NR_CPUS of 64 (which is quite a
      reasonable value to use), the above becomes a single
      
      	movq $0,cpumask
      
      instruction instead, because instead of caring to figure out exactly how
      many CPUs the system has, it just knows that the cpumask will be a
      single word and can just clear it all.
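
      As a userspace model of why the constant size helps (a sketch, not the
      actual kernel implementation):

       #include <string.h>

       #define NR_CPUS 64	/* compile-time constant that fits in one word */

       struct cpumask { unsigned long bits[(NR_CPUS + 63) / 64]; };

       static inline void cpumask_clear(struct cpumask *dstp)
       {
       	/*
       	 * sizeof(dstp->bits) is a compile-time constant (8 bytes here),
       	 * so the compiler can fold this memset() into a single store.
       	 */
       	memset(dstp->bits, 0, sizeof(dstp->bits));
       }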
      
      Note that this does end up tightening the rules a bit from the original
      version in another way: operations that set bits in the cpumask are now
      limited to the actual nr_cpu_ids limit, whereas we used to do the
      nr_cpumask_bits thing almost everywhere in the cpumask code.
      
      But if you just clear bits, or scan for bits, we can use the simpler
      compile-time constants.
      
      In the process, remove 'cpumask_complement()' and 'for_each_cpu_not()'
      which were not useful, and which fundamentally have to be limited to
      'nr_cpu_ids'.  Better remove them now than have somebody introduce use
      of them later.
      
      Of course, on x86-64 with MAXSMP there is no sane small compile-time
      constant for the cpumask sizes, and we end up using the actual CPU bits,
      and will generate the above kind of horrors regardless.  Please don't
      use MAXSMP unless you really expect to have machines with thousands of
      cores.
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'v6.3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 · f915322f
      Linus Torvalds authored
      Pull crypto fix from Herbert Xu:
       "Fix a regression in the caam driver"
      
      * tag 'v6.3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
        crypto: caam - Fix edesc/iv ordering mixup
    • Merge tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f9ec7d8
      Linus Torvalds authored
      Pull x86 updates from Thomas Gleixner:
       "A small set of updates for x86:
      
         - Return -EIO instead of success when the certificate buffer for SEV
           guests is not large enough
      
         - Allow STIBP to be enabled with legacy IBRS. Legacy IBRS is cleared
           on return to userspace for performance reasons, but that leaves
           user space vulnerable to cross-thread attacks which STIBP prevents.
           Update the documentation accordingly"
      
      * tag 'x86-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        virt/sev-guest: Return -EIO if certificate buffer is not large enough
        Documentation/hw-vuln: Document the interaction between IBRS and STIBP
        x86/speculation: Allow enabling STIBP with legacy IBRS
    • Merge tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 4e9c542c
      Linus Torvalds authored
      Pull irq updates from Thomas Gleixner:
       "A set of updates for the interrupt subsystem:
      
         - Prevent possible NULL pointer dereferences in
           irq_data_get_affinity_mask() and irq_domain_create_hierarchy()
      
         - Take the per device MSI lock before invoking code which relies on
           it being held
      
         - Make sure that MSI descriptors are unreferenced before freeing
           them. This was overlooked when the platform MSI code was converted
           to use core infrastructure and results in a false positive warning
      
         - Remove dead code in the MSI subsystem
      
         - Clarify the documentation for pci_msix_free_irq()
      
         - More kobj_type constification"
      
      * tag 'irq-urgent-2023-03-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        genirq/msi, platform-msi: Ensure that MSI descriptors are unreferenced
        genirq/msi: Drop dead domain name assignment
        irqdomain: Add missing NULL pointer check in irq_domain_create_hierarchy()
        genirq/irqdesc: Make kobj_type structures constant
        PCI/MSI: Clarify usage of pci_msix_free_irq()
        genirq/msi: Take the per-device MSI lock before validating the control structure
        genirq/ipi: Fix NULL pointer deref in irq_data_get_affinity_mask()
    • Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1a90673e
      Linus Torvalds authored
      Pull vfs update from Al Viro:
       "Adding Christian Brauner as VFS co-maintainer"
      
      * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        Adding VFS co-maintainer
    • Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 1a8d05a7
      Linus Torvalds authored
      Pull VM_FAULT_RETRY fixes from Al Viro:
       "Some of the page fault handlers do not deal with the following case
        correctly:
      
         - handle_mm_fault() has returned VM_FAULT_RETRY
      
         - there is a pending fatal signal
      
         - fault had happened in kernel mode
      
        Correct action in such a case is not to "return unconditionally" -
        fatal signals are handled only upon return to userland, and something
        like copy_to_user() would end up retrying the faulting instruction
        and triggering the same fault again and again.
      
        What we need to do in such a case is to make the caller treat it as a
        failed uaccess attempt - handle the exception if there is an exception
        handler for the faulting instruction, or oops if there isn't one.
      
        Over the years some architectures had been fixed and now are handling
        that case properly; some still do not. This series should fix the
        remaining ones.
      
        Status:
      
         - m68k, riscv, hexagon, parisc: tested/acked by maintainers.
      
         - alpha, sparc32, sparc64: tested locally - bug has been reproduced
           on the unpatched kernel and verified to be fixed by this series.
      
         - ia64, microblaze, nios2, openrisc: build, but otherwise completely
           untested"
      
      * tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        openrisc: fix livelock in uaccess
        nios2: fix livelock in uaccess
        microblaze: fix livelock in uaccess
        ia64: fix livelock in uaccess
        sparc: fix livelock in uaccess
        alpha: fix livelock in uaccess
        parisc: fix livelock in uaccess
        hexagon: fix livelock in uaccess
        riscv: fix livelock in uaccess
        m68k: fix livelock in uaccess
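
      As a sketch of the fault-handler pattern the series fixes
      (illustrative; every architecture's handler differs, and no_context()
      stands in for the arch's exception-fixup/oops path):

       fault = handle_mm_fault(vma, address, flags, regs);
       if (fault_signal_pending(fault, regs)) {
       	/*
       	 * A fatal signal is pending. For a kernel-mode fault we must
       	 * not just return, or the uaccess would retry the faulting
       	 * instruction forever; run the exception fixup (or oops).
       	 */
       	if (!user_mode(regs))
       		no_context(regs, address);
       	return;
       }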
    • Remove Intel compiler support · 95207db8
      Masahiro Yamada authored
      include/linux/compiler-intel.h had no update in the past 3 years.
      
      We often forget about the third C compiler to build the kernel.
      
      For example, commit a0a12c3e ("asm goto: eradicate CC_HAS_ASM_GOTO")
      only mentioned GCC and Clang.
      
      init/Kconfig defines CC_IS_GCC and CC_IS_CLANG but not CC_IS_ICC,
      and nobody has reported any issue.
      
      I guess the Intel Compiler support is broken, and nobody cares
      about it.
      
      Harald Arnesen pointed out ICC (classic Intel C/C++ compiler) is
      deprecated:
      
          $ icc -v
          icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
          deprecated and will be removed from product release in the second half
          of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended
          compiler moving forward. Please transition to use this compiler. Use
          '-diag-disable=10441' to disable this message.
          icc version 2021.7.0 (gcc version 12.1.0 compatibility)
      
      Arnd Bergmann provided a link to the article, "Intel C/C++ compilers
      complete adoption of LLVM".
      
      lib/zstd/common/compiler.h and lib/zstd/compress/zstd_fast.c were kept
      untouched for better sync with https://github.com/facebook/zstd
      
      Link: https://www.intel.com/content/www/us/en/developer/articles/technical/adoption-of-llvm-complete-icx.html
      
      
      Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Adding VFS co-maintainer · 3304f18b
      Al Viro authored
      
      Acked-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. Mar 04, 2023
    • Merge tag 'i2c-for-6.3-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · b01fe98d
      Linus Torvalds authored
      Pull more i2c updates from Wolfram Sang:
       "Some improvements/fixes for the newly added GXP driver and a Kconfig
        dependency fix"
      
      * tag 'i2c-for-6.3-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
        i2c: gxp: fix an error code in probe
        i2c: gxp: return proper error on address NACK
        i2c: gxp: remove "empty" switch statement
        i2c: Disable I2C_APPLE when I2C_PASEMI is a builtin
    • mm: avoid gcc complaint about pointer casting · e77d587a
      Linus Torvalds authored
      
      The migration code ends up temporarily stashing information of the wrong
      type in unused fields of the newly allocated destination folio.  That
      all works fine, but gcc does complain about the pointer type mis-use:
      
          mm/migrate.c: In function ‘__migrate_folio_extract’:
          mm/migrate.c:1050:20: note: randstruct: casting between randomized structure pointer types (ssa): ‘struct anon_vma’ and ‘struct address_space’
      
           1050 |         *anon_vmap = (void *)dst->mapping;
                |         ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~
      
      and gcc is actually right to complain since it really doesn't understand
      that this is a very temporary special case where this is ok.
      
      This could be fixed in different ways by just obfuscating the assignment
      sufficiently that gcc doesn't see what is going on, but the truly
      "proper C" way to do this is by explicitly using a union.
      
      Using unions for type conversions like this is normally hugely ugly and
      syntactically nasty, but this really is one of the few cases where we
      want to make it clear that we're not doing type conversion, we're really
      re-using the value bit-for-bit just using another type.
      
      IOW, this should not become a common pattern, but in this one case using
      that odd union is probably the best way to document to the compiler what
      is conceptually going on here.
      
      [ Side note: there are valid cases where we convert pointers to other
        pointer types, notably the whole "folio vs page" situation, where the
        types actually have fundamental commonalities.
      
        The fact that the gcc note is limited to just randomized structures
        means that we don't see equivalent warnings for those cases, but it
        might also mean that we miss other cases where we do play these kinds
        of dodgy games, and this kind of explicit conversion might be a good
        idea. ]
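
      A self-contained userspace illustration of the union idiom, with opaque
      stand-ins for the kernel types (not the actual patch):

       #include <stdio.h>

       struct anon_vma;		/* stand-in, never dereferenced */
       struct address_space;	/* stand-in, never dereferenced */

       int main(void)
       {
       	int object;
       	union {
       		struct anon_vma *anon_vma;
       		struct address_space *mapping;
       	} u;

       	/* store the value as one pointer type... */
       	u.mapping = (struct address_space *)&object;
       	/* ...and read it back bit-for-bit as the other type,
       	 * without a cast between the two structure pointers */
       	struct anon_vma *anon_vma = u.anon_vma;

       	printf("%p == %p\n", (void *)u.mapping, (void *)anon_vma);
       	return 0;
       }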
      
      I verified that at least for an allmodconfig build on x86-64, this
      generates the exact same code, apart from line numbers and assembler
      comment changes.
      
      Fixes: 64c8902e ("migrate_pages: split unmap_and_move() to _unmap() and _move()")
      Cc: Huang, Ying <ying.huang@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>