  1. Oct 16, 2020
  2. Aug 12, 2020
  3. Jun 09, 2020
  4. Mar 22, 2020
    • mm/mmu_notifier: silence PROVE_RCU_LIST warnings · 63886bad
      Qian Cai authored
      
      It is safe to traverse mm->notifier_subscriptions->list either under
      SRCU read lock or mm->notifier_subscriptions->lock using
      hlist_for_each_entry_rcu().  Silence the PROVE_RCU_LIST false positives,
      for example,
      
        WARNING: suspicious RCU usage
        -----------------------------
        mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!!
      
        other info that might help us debug this:
      
        rcu_scheduler_active = 2, debug_locks = 1
        3 locks held by libvirtd/802:
         #0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0
         #1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800
         #2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460
      
        stack backtrace:
        CPU: 7 PID: 802 Comm: libvirtd Tainted: G          I       5.6.0-rc6-next-20200317+ #2
        Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
        Call Trace:
          dump_stack+0xa4/0xfe
          lockdep_rcu_suspicious+0xeb/0xf5
          __mmu_notifier_invalidate_range_start+0x3ff/0x460
          change_p4d_range+0x746/0x800
          change_protection+0x1df/0x300
          mprotect_fixup+0x245/0x3e0
          do_mprotect_pkey+0x23b/0x3e0
          __x64_sys_mprotect+0x51/0x70
          do_syscall_64+0x91/0xae8
          entry_SYSCALL_64_after_hwframe+0x49/0xb3
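      The silenced traversal simply passes a lockdep condition to
      hlist_for_each_entry_rcu(); a minimal sketch of the pattern, assuming the
      subscription/list names used in mm/mmu_notifier.c around v5.6:

          /* Legal under either the SRCU read lock or subscriptions->lock;
           * the extra argument tells PROVE_RCU_LIST which locks make the
           * lockless traversal safe. */
          hlist_for_each_entry_rcu(subscription, &subscriptions->list, hlist,
                                   srcu_read_lock_held(&srcu)) {
                  if (subscription->ops->invalidate_range_start)
                          subscription->ops->invalidate_range_start(subscription,
                                                                    range);
          }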
      
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      63886bad
  5. Jan 14, 2020
  6. Nov 23, 2019
    • mm/mmu_notifier: add an interval tree notifier · 99cb252f
      Jason Gunthorpe authored
      Of the 13 users of mmu_notifiers, 8 of them use only
      invalidate_range_start/end() and immediately intersect the
      mmu_notifier_range with some kind of internal list of VAs.  4 use an
      interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
      of some kind (scif_dma, vhost, gntdev, hmm).
      
      And the remaining 5 either don't use invalidate_range_start() or do some
      special thing with it.
      
      It turns out that building a correct scheme with an interval tree is
      pretty complicated, particularly if the use case is synchronizing against
      another thread doing get_user_pages().  Many of these implementations have
      various subtle and difficult to fix races.
      
      This approach puts the interval tree as common code at the top of the mmu
      notifier call tree and implements a shareable locking scheme.
      
      It includes:
       - An interval tree tracking VA ranges, with per-range callbacks
       - A read/write locking scheme for the interval tree that avoids
         sleeping in the notifier path (for OOM killer)
       - A sequence counter based collision-retry locking scheme to tell
         device page fault that a VA range is being concurrently invalidated.
      
      This is based on various ideas:
      - hmm accumulates invalidated VA ranges and releases them when all
        invalidates are done, via active_invalidate_ranges count.
        This approach avoids having to intersect the interval tree twice (as
        umem_odp does) at the potential cost of a longer device page fault.
      
      - kvm/umem_odp use a sequence counter to drive the collision retry,
        via invalidate_seq
      
      - a deferred work todo list on unlock scheme like RTNL, via deferred_list.
        This makes adding/removing interval tree members more deterministic
      
      - seqlock, except this version makes the seqlock idea multi-holder on the
        write side by protecting it with active_invalidate_ranges and a spinlock
      
      To minimize MM overhead when only the interval tree is being used, the
      entire SRCU and hlist overheads are dropped using some simple
      branches. Similarly the interval tree overhead is dropped when in hlist
      mode.
      
      The overhead from the mandatory spinlock is broadly the same as for most
      existing users, which already had a lock (or two) of some sort on the
      invalidation path.
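      A rough sketch of the collision-retry pattern described above, as a driver
      page-fault path might use it (dev_mirror, dev_get_user_pages and
      dev_program_page_table are hypothetical driver pieces; the mmu_interval_*
      helpers are the ones this patch introduces):

          int dev_fault(struct dev_mirror *mirror,
                        unsigned long start, unsigned long end)
          {
                  unsigned long seq;

          again:
                  seq = mmu_interval_read_begin(&mirror->notifier);

                  /* Gather pages outside any lock; may sleep. */
                  if (dev_get_user_pages(mirror, start, end))
                          return -EFAULT;

                  mutex_lock(&mirror->dev_lock);
                  if (mmu_interval_read_retry(&mirror->notifier, seq)) {
                          /* A concurrent invalidation raced with us; retry. */
                          mutex_unlock(&mirror->dev_lock);
                          goto again;
                  }
                  dev_program_page_table(mirror, start, end);
                  mutex_unlock(&mirror->dev_lock);
                  return 0;
          }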
      
      Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca
      
      
      Acked-by: Christian König <christian.koenig@amd.com>
      Tested-by: Philip Yang <Philip.Yang@amd.com>
      Tested-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      99cb252f
  7. Nov 13, 2019
  8. Nov 06, 2019
  9. Sep 07, 2019
  10. Aug 28, 2019
  11. Aug 21, 2019
  12. Aug 20, 2019
    • mm/mmu_notifiers: check if mmu notifier callbacks are allowed to fail · 8402ce61
      Daniel Vetter authored
      Just a bit of paranoia, since if we start pushing this deep into
      callchains it's hard to spot all places where an mmu notifier
      implementation might fail when it's not allowed to.
      
      Inspired by some confusion we had discussing i915 mmu notifiers and
      whether we could use the newly-introduced return value to handle some
      corner cases, until we realized that failing is only allowed when a task
      has been killed by the oom reaper.
      
      An alternative approach would be to split the callback into two versions,
      one with the int return value, and the other with void return value like
      in older kernels. But that's a lot more churn for fairly little gain I
      think.
      
      Summary from the mailing-list discussion on why we want something at warning level:
      This allows automated tooling in CI to catch bugs without humans having to
      look at everything. If we just upgrade the existing pr_info to a pr_warn,
      then we'll have false positives. And as-is, no one will ever spot the
      problem since it's lost in the massive amounts of overall dmesg noise.
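      The added paranoia boils down to a warning whenever a callback fails in a
      context where failure is not allowed; roughly this shape (variable names
      are illustrative):

          int ret = subscription->ops->invalidate_range_start(subscription, range);

          if (ret) {
                  /* Only a non-blockable (oom reaper) invalidation may fail,
                   * and then only with -EAGAIN. */
                  WARN_ON(mmu_notifier_range_blockable(range) || ret != -EAGAIN);
          }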
      
      Link: https://lore.kernel.org/r/20190814202027.18735-2-daniel.vetter@ffwll.ch
      
      
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      8402ce61
  13. Aug 16, 2019
  14. Jul 12, 2019
    • mm/mmu_notifier: use hlist_add_head_rcu() · 543bdb2d
      Jean-Philippe Brucker authored
      Make mmu_notifier_register() safer by issuing a memory barrier before
      registering a new notifier.  This fixes a theoretical bug on weakly
      ordered CPUs.  For example, take this simplified use of notifiers by a
      driver:
      
      	my_struct->mn.ops = &my_ops; /* (1) */
      	mmu_notifier_register(&my_struct->mn, mm)
      		...
      		hlist_add_head(&mn->hlist, &mm->mmu_notifiers); /* (2) */
      		...
      
      Once mmu_notifier_register() releases the mm locks, another thread can
      invalidate a range:
      
      	mmu_notifier_invalidate_range()
      		...
      		hlist_for_each_entry_rcu(mn, &mm->mmu_notifiers, hlist) {
      			if (mn->ops->invalidate_range)
      
      The read side relies on the data dependency between mn and ops to ensure
      that the pointer is properly initialized.  But the write side doesn't have
      any dependency between (1) and (2), so they could be reordered and the
      readers could dereference an invalid mn->ops.  mmu_notifier_register()
      does take all the mm locks before adding to the hlist, but those have
      acquire semantics which isn't sufficient.
      
      By calling hlist_add_head_rcu() instead of hlist_add_head() we update the
      hlist using a store-release, ensuring that readers see prior
      initialization of my_struct.  This situation is better illustrated by
      the litmus test MP+onceassign+derefonce.
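      In terms of the simplified example above, the fix is a one-line change on
      the write side (the list head name follows the simplified example, not the
      real field):

          my_struct->mn.ops = &my_ops;                          /* (1) */
          ...
          /* store-release: orders (1) before the publication at (2) */
          hlist_add_head_rcu(&mn->hlist, &mm->mmu_notifiers);   /* (2) */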
      
      Link: http://lkml.kernel.org/r/20190502133532.24981-1-jean-philippe.brucker@arm.com
      
      
      Fixes: cddb8a5c ("mmu-notifiers: core")
      Signed-off-by: Jean-Philippe Brucker <jean-philippe.brucker@arm.com>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      543bdb2d
  15. Jun 19, 2019
  16. May 14, 2019
  17. Dec 28, 2018
    • mm/mmu_notifier: use structure for invalidate_range_start/end calls v2 · ac46d4f3
      Jérôme Glisse authored
      To avoid having to change many call sites every time we want to add a
      parameter, use a structure to group all parameters for the mmu_notifier
      invalidate_range_start/end calls.  No functional changes with this patch.
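      On the caller side this turns a growing argument list into one struct on
      the stack; a sketch, noting that the exact fields and the init helper's
      argument list have changed across kernel versions:

          struct mmu_notifier_range range;

          /* Group mm/start/end (and, later, event and flags) in one place. */
          mmu_notifier_range_init(&range, mm, start, end);

          mmu_notifier_invalidate_range_start(&range);
          /* ... clear page table entries for [start, end) ... */
          mmu_notifier_invalidate_range_end(&range);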
      
      [akpm@linux-foundation.org: coding style fixes]
      Link: http://lkml.kernel.org/r/20181205053628.3210-3-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      From: Jérôme Glisse <jglisse@redhat.com>
      Subject: mm/mmu_notifier: use structure for invalidate_range_start/end calls v3
      
      fix build warning in migrate.c when CONFIG_MMU_NOTIFIER=n
      
      Link: http://lkml.kernel.org/r/20181213171330.8489-3-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac46d4f3
    • mm/mmu_notifier: use structure for invalidate_range_start/end callback · 5d6527a7
      Jérôme Glisse authored
      Patch series "mmu notifier contextual informations", v2.
      
      This patchset adds contextual information, i.e. why an invalidation is
      happening, to the mmu notifier callbacks.  This is necessary for users of
      mmu notifiers that wish to maintain their own data structures without
      having to add new fields to struct vm_area_struct (vma).
      
      For instance, a device can have its own page table that mirrors the
      process address space.  When a vma is unmapped (munmap() syscall) the
      device driver can free the device page table for the range.
      
      Today we do not have any information on why an mmu notifier callback is
      happening, and thus device drivers have to assume that it is always a
      munmap().  This is inefficient as it means they need to re-allocate the
      device page table on the next page fault and rebuild the whole device
      driver data structure for the range.
      
      Other use cases besides munmap() also exist: for instance, it is pointless
      for a device driver to invalidate the device page table when the
      invalidation is for soft-dirtiness tracking, and a device driver can
      optimize away an mprotect() that changes the page table permission access
      for the range.
      
      This patchset enables all these optimizations for device drivers.  I do not
      include any of those in this series but another patchset I am posting will
      leverage this.
      
      The patchset is pretty simple from a code point of view.  The first two
      patches consolidate all mmu notifier arguments into a struct so that it is
      easier to add/change arguments.  The last patch adds the contextual
      information (munmap, protection, soft dirty, clear, ...).
      
      This patch (of 3):
      
      To avoid having to change many callback definitions every time we want to
      add a parameter, use a structure to group all parameters for the
      mmu_notifier invalidate_range_start/end callbacks.  No functional changes
      with this patch.
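      For callback implementers the change is mostly mechanical: the
      range-start/end hooks now take the struct instead of separate mm/start/end
      arguments.  A sketch (my_invalidate_range_start and my_drop_mappings are
      hypothetical driver code):

          static int my_invalidate_range_start(struct mmu_notifier *mn,
                                  const struct mmu_notifier_range *range)
          {
                  /* All the former arguments now live in *range. */
                  return my_drop_mappings(range->mm, range->start, range->end);
          }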
      
      [akpm@linux-foundation.org: fix drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c kerneldoc]
      Link: http://lkml.kernel.org/r/20181205053628.3210-2-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Jason Gunthorpe <jgg@mellanox.com>	[infiniband]
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Ross Zwisler <zwisler@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Christian Koenig <christian.koenig@amd.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5d6527a7
    • mm/mmu_notifier.c: remove mmu_notifier_synchronize() · 6a90a83f
      Sean Christopherson authored
      Contrary to its name, mmu_notifier_synchronize() does not synchronize the
      notifier's SRCU instance, but rather waits for RCU callbacks to finish,
      i.e. it invokes rcu_barrier().  The RCU documentation is quite clear on
      this matter, explicitly calling out that rcu_barrier() does not imply
      synchronize_rcu().
      
      As there are no callers of mmu_notifier_synchronize() and it's unclear
      whether any user of mmu_notifier_call_srcu() will ever want to barrier on
      their callbacks, simply remove the function.
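      For reference, the SRCU primitives make the distinction explicit (generic
      SRCU semantics, not code from this file):

          /* Waits for all pending call_srcu() callbacks to be invoked;
           * it does NOT wait for srcu_read_lock() readers to finish. */
          srcu_barrier(&srcu);

          /* Waits for all pre-existing SRCU read-side critical sections. */
          synchronize_srcu(&srcu);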
      
      Link: http://lkml.kernel.org/r/20181106134705.14197-1-sean.j.christopherson@intel.com
      
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Jérôme Glisse <jglisse@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a90a83f
  18. Oct 26, 2018
  19. Aug 22, 2018
    • mm, oom: distinguish blockable mode for mmu notifiers · 93065ac7
      Michal Hocko authored
      There are several blockable mmu notifiers which might sleep in
      mmu_notifier_invalidate_range_start and that is a problem for the
      oom_reaper because it needs to guarantee forward progress, so it cannot
      depend on any sleepable locks.
      
      Currently we simply back off and mark an oom victim with blockable mmu
      notifiers as done after a short sleep.  That can result in selecting a new
      oom victim prematurely because the previous one still hasn't torn its
      memory down yet.
      
      We can do much better though.  Even if mmu notifiers use sleepable locks
      there is no reason to automatically assume those locks are held.  Moreover,
      the majority of notifiers only care about a portion of the address space,
      and there is absolutely zero reason to fail when we are unmapping an
      unrelated range.  Many notifiers do really block and wait for HW, which is
      harder to handle, and there we have to bail out.
      
      This patch handles the low hanging fruit.
      __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks
      are not allowed to sleep if the flag is set to false.  This is achieved by
      using trylock instead of the sleepable lock for most callbacks and
      continuing as long as we do not block down the call chain.
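      A sketch of the resulting callback pattern with the interface this patch
      introduces (my_ctx and its mutex are hypothetical driver state):

          static int my_invalidate_range_start(struct mmu_notifier *mn,
                                  struct mm_struct *mm,
                                  unsigned long start, unsigned long end,
                                  bool blockable)
          {
                  struct my_ctx *ctx = container_of(mn, struct my_ctx, mn);

                  if (blockable)
                          mutex_lock(&ctx->lock);
                  else if (!mutex_trylock(&ctx->lock))
                          return -EAGAIN; /* let the oom_reaper retry */

                  /* ... invalidate device mappings for [start, end) ... */
                  mutex_unlock(&ctx->lock);
                  return 0;
          }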
      
      I think we can improve that even further because there is a common pattern
      to do a range lookup first and then do something about that.  The first
      part can be done without a sleeping lock in most cases AFAICS.
      
      The oom_reaper end then simply retries if there is at least one notifier
      which couldn't make any progress in !blockable mode.  A retry loop is
      already implemented to wait for the mmap_sem and this is basically the
      same thing.
      
      The simplest way for driver developers to test this code path is to wrap
      userspace code which uses these notifiers into a memcg and set the hard
      limit to hit the oom.  This can be done e.g.  after the test faults in all
      the mmu notifier managed memory and then sets the hard limit to something
      really small.  Then we are looking for a proper process tear down.
      
      [akpm@linux-foundation.org: coding style fixes]
      [akpm@linux-foundation.org: minor code simplification]
      Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.org
      
      
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp
      Reported-by: David Rientjes <rientjes@google.com>
      Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Cc: Sudeep Dutt <sudeep.dutt@intel.com>
      Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Cc: Dimitri Sivanich <sivanich@sgi.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93065ac7
  20. Feb 01, 2018
  21. Nov 16, 2017
    • mm/mmu_notifier: avoid call to invalidate_range() in range_end() · 4645b9fe
      Jérôme Glisse authored
      This is an optimization patch that only affects mmu_notifier users which
      rely on the invalidate_range() callback.  This patch avoids calling that
      callback twice in a row from inside __mmu_notifier_invalidate_range_end().
      
      Existing pattern (before this patch):
          mmu_notifier_invalidate_range_start()
              pte/pmd/pud_clear_flush_notify()
                  mmu_notifier_invalidate_range()
          mmu_notifier_invalidate_range_end()
              mmu_notifier_invalidate_range()
      
      New pattern (after this patch):
          mmu_notifier_invalidate_range_start()
              pte/pmd/pud_clear_flush_notify()
                  mmu_notifier_invalidate_range()
          mmu_notifier_invalidate_range_only_end()
      
      We call the invalidate_range callback after clearing the page table
      under the page table lock and we skip the call to invalidate_range
      inside the __mmu_notifier_invalidate_range_end() function.
      
      Idea from Andrea Arcangeli
      
      Link: http://lkml.kernel.org/r/20171017031003.7481-3-jglisse@redhat.com
      
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Alistair Popple <alistair@popple.id.au>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4645b9fe
  22. Aug 31, 2017
    • mm/mmu_notifier: kill invalidate_page · 5f32b265
      Jérôme Glisse authored
      
      The invalidate_page callback suffered from two pitfalls.  First it used
      to happen after the page table lock was released and thus a new page
      might have been set up before the call to invalidate_page() happened.
      
      This is in a weird way fixed by commit c7ab0d2f ("mm: convert
      try_to_unmap_one() to use page_vma_mapped_walk()") that moved the
      callback under the page table lock but this also broke several existing
      users of the mmu_notifier API that assumed they could sleep inside this
      callback.
      
      The second pitfall was that invalidate_page() was the only callback not
      taking a range of addresses with respect to invalidation, instead being
      given a single address and a page.  Lots of the callback implementers
      assumed this could never be THP and thus failed to invalidate the
      appropriate range for THP.
      
      By killing this callback we unify the mmu_notifier callback API to
      always take a virtual address range as input.
      
      Finally this also simplifies the end user's life, as there are now two
      clear choices (sketched below):
        - invalidate_range_start()/end() callbacks (which allow you to sleep)
        - invalidate_range(), where you cannot sleep, but which happens right
          after the page table update under the page table lock
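      A sketch of the unified ops after the removal (the my_* hooks are
      placeholders):

          static const struct mmu_notifier_ops my_mn_ops = {
                  /* may sleep; brackets the whole invalidation */
                  .invalidate_range_start = my_range_start,
                  .invalidate_range_end   = my_range_end,
                  /* must not sleep; runs right after the page table update,
                   * under the page table lock */
                  .invalidate_range       = my_range,
          };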
      
      Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
      Cc: Bernhard Held <berny156@gmx.de>
      Cc: Adam Borowski <kilobyte@angband.pl>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: axie <axie@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5f32b265
  23. Apr 18, 2017
    • mm: Use static initialization for "srcu" · dde8da6c
      Paul E. McKenney authored
      
      The MM-notifier code currently dynamically initializes the srcu_struct
      named "srcu" at subsys_initcall() time, and includes a BUG_ON() to check
      this initialization in do_mmu_notifier_register().  Unfortunately, there
      is no foolproof way to verify that an srcu_struct has been initialized,
      given the possibility of an srcu_struct being allocated on the stack or
      on the heap.  This means that creating an srcu_struct_is_initialized()
      function is not a reasonable course of action.  Nor is peppering
      do_mmu_notifier_register() with SRCU-specific #ifdefs an attractive
      alternative.
      
      This commit therefore uses DEFINE_STATIC_SRCU() to initialize
      this srcu_struct at compile time, thus eliminating both the
      subsys_initcall()-time initialization and the runtime BUG_ON().
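      The change amounts to replacing the dynamic setup with a compile-time
      definition, roughly:

          /* before: zero-filled struct, initialized at subsys_initcall() time */
          static struct srcu_struct srcu;
          /* ... init_srcu_struct(&srcu) in an initcall, BUG_ON() if used early ... */

          /* after: fully initialized at compile time */
          DEFINE_STATIC_SRCU(srcu);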
      
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: <linux-mm@kvack.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      dde8da6c
  24. Mar 02, 2017
    • sched/headers: Prepare for new header dependencies before moving code to <linux/sched/mm.h> · 6e84f315
      Ingo Molnar authored
      
      We are going to split <linux/sched/mm.h> out of <linux/sched.h>, which
      will have to be picked up from other headers and a couple of .c files.
      
      Create a trivial placeholder <linux/sched/mm.h> file that just
      maps to <linux/sched.h> to make this patch obviously correct and
      bisectable.
      
      The APIs that are going to be moved first are:
      
         mm_alloc()
         __mmdrop()
         mmdrop()
         mmdrop_async_fn()
         mmdrop_async()
         mmget_not_zero()
         mmput()
         mmput_async()
         get_task_mm()
         mm_access()
         mm_release()
      
      Include the new header in the files that are going to need it.
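      The placeholder is essentially just an include redirect; something along
      these lines (the guard name is assumed):

          /* include/linux/sched/mm.h */
          #ifndef _LINUX_SCHED_MM_H
          #define _LINUX_SCHED_MM_H

          #include <linux/sched.h>

          #endif /* _LINUX_SCHED_MM_H */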
      
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6e84f315
  25. Feb 28, 2017
  26. Mar 17, 2016
  27. Sep 10, 2015
    • mmu-notifier: add clear_young callback · 1d7715c6
      Vladimir Davydov authored
      
      In the scope of the idle memory tracking feature, which is introduced by
      the following patch, we need to clear the referenced/accessed bit not only
      in primary, but also in secondary ptes.  The latter is required in order
      to estimate the working set size (wss) of KVM VMs.  At the same time we
      want to avoid flushing the TLB, because it is quite expensive and it won't
      really affect the final result.
      
      Currently, there is no function for clearing the pte young bit that would
      meet our requirements, so this patch introduces one.  To achieve that we
      have to add a new mmu-notifier callback, clear_young, since there is no
      method for testing-and-clearing a secondary pte without flushing the TLB.
      The new method is not mandatory and currently only implemented by KVM.
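      The new optional hook is a range-based test-and-clear that deliberately
      skips the TLB flush; its shape is roughly:

          /* in struct mmu_notifier_ops; optional, currently implemented by KVM */
          int (*clear_young)(struct mmu_notifier *mn, struct mm_struct *mm,
                             unsigned long start, unsigned long end);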
      
      Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
      Reviewed-by: Andres Lagar-Cavilla <andreslc@google.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1d7715c6
  28. Sep 24, 2014
    • kvm: Fix page ageing bugs · 57128468
      Andres Lagar-Cavilla authored
      
      1. We were calling clear_flush_young_notify in unmap_one, but we are
      within an mmu notifier invalidate range scope. The spte exists no more
      (due to range_start) and the accessed bit info has already been
      propagated (due to kvm_pfn_set_accessed). Simply call
      clear_flush_young.
      
      2. We clear_flush_young on a primary MMU PMD, but this may be mapped
      as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
      This required expanding the interface of the clear_flush_young mmu
      notifier, so a lot of code has been trivially touched.
      
      3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
      the access bit by blowing the spte. This requires proper synchronizing
      with MMU notifier consumers, like every other removal of spte's does.
      
      Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      57128468