Skip to content
Snippets Groups Projects
  1. Apr 07, 2014
  2. Apr 03, 2014
  3. Mar 20, 2014
  4. Mar 03, 2014
  5. Feb 11, 2014
    • Tejun Heo's avatar
      cgroup: convert to kernfs · 2bd59d48
      Tejun Heo authored
      
      cgroup filesystem code was derived from the original sysfs
      implementation which was heavily intertwined with vfs objects and
      locking with the goal of re-using the existing vfs infrastructure.
      That experiment turned out rather disastrous and sysfs switched, a
      long time ago, to distributed filesystem model where a separate
      representation is maintained which is queried by vfs.  Unfortunately,
      cgroup stuck with the failed experiment all these years and
      accumulated even more problems over time.
      
      Locking and object lifetime management being entangled with vfs is
      probably the most egregious.  vfs is never designed to be misused like
      this and cgroup ends up jumping through various convoluted dancing to
      make things work.  Even then, operations across multiple cgroups can't
      be done safely as it'll deadlock with rename locking.
      
      Recently, kernfs is separated out from sysfs so that it can be used by
      users other than sysfs.  This patch converts cgroup to use kernfs,
      which will bring the following benefits.
      
      * Separation from vfs internals.  Locking and object lifetime
        management is contained in cgroup proper making things a lot
        simpler.  This removes significant amount of locking convolutions,
        hairy object lifetime rules and the restriction on multi-cgroup
        operations.
      
      * Can drop a lot of code to implement filesystem interface as most are
        provided by kernfs.
      
      * Proper "severing" semantics, which allows controllers to not worry
        about lingering file accesses after offline.
      
      While the preceding patches did as much as possible to make the
      transition less painful, large part of the conversion has to be one
      discrete step making this patch rather large.  The rest of the commit
      message lists notable changes in different areas.
      
      Overall
      -------
      
      * vfs constructs replaced with kernfs ones.  cgroup->dentry w/ ->kn,
        cgroupfs_root->sb w/ ->kf_root.
      
      * All dentry accessors are removed.  Helpers to map from kernfs
        constructs are added.
      
      * All vfs plumbing around dentry, inode and bdi removed.
      
      * cgroup_mount() now directly looks for matching root and then
        proceeds to create a new one if not found.
      
      Synchronization and object lifetime
      -----------------------------------
      
      * vfs inode locking removed.  Among other things, this removes the
        need for the convolution in cgroup_cfts_commit().  Future patches
        will further simplify it.
      
      * vfs refcnting replaced with cgroup internal ones.  cgroup->refcnt,
        cgroupfs_root->refcnt added.  cgroup_put_root() now directly puts
        root->refcnt and when it reaches zero proceeds to destroy it thus
        merging cgroup_put_root() and the former cgroup_kill_sb().
        Simliarly, cgroup_put() now directly schedules cgroup_free_rcu()
        when refcnt reaches zero.
      
      * Unlike before, kernfs objects don't hold onto cgroup objects.  When
        cgroup destroys a kernfs node, all existing operations are drained
        and the association is broken immediately.  The same for
        cgroupfs_roots and mounts.
      
      * All operations which come through kernfs guarantee that the
        associated cgroup is and stays valid for the duration of operation;
        however, there are two paths which need to find out the associated
        cgroup from dentry without going through kernfs -
        css_tryget_from_dir() and cgroupstats_build().  For these two,
        kernfs_node->priv is RCU managed so that they can dereference it
        under RCU read lock.
      
      File and directory handling
      ---------------------------
      
      * File and directory operations converted to kernfs_ops and
        kernfs_syscall_ops.
      
      * xattrs is implicitly supported by kernfs.  No need to worry about it
        from cgroup.  This means that "xattr" mount option is no longer
        necessary.  A future patch will add a deprecated warning message
        when sane_behavior.
      
      * When cftype->max_write_len > PAGE_SIZE, it's necessary to make a
        private copy of one of the kernfs_ops to set its atomic_write_len.
        cftype->kf_ops is added and cgroup_init/exit_cftypes() are updated
        to handle it.
      
      * cftype->lockdep_key added so that kernfs lockdep annotation can be
        per cftype.
      
      * Inidividual file entries and open states are now managed by kernfs.
        No need to worry about them from cgroup.  cfent, cgroup_open_file
        and their friends are removed.
      
      * kernfs_nodes are created deactivated and kernfs_activate()
        invocations added to places where creation of new nodes are
        committed.
      
      * cgroup_rmdir() uses kernfs_[un]break_active_protection() for
        self-removal.
      
      v2: - Li pointed out in an earlier patch that specifying "name="
            during mount without subsystem specification should succeed if
            there's an existing hierarchy with a matching name although it
            should fail with -EINVAL if a new hierarchy should be created.
            Prior to the conversion, this used by handled by deferring
            failure from NULL return from cgroup_root_from_opts(), which was
            necessary because root was being created before checking for
            existing ones.  Note that cgroup_root_from_opts() returned an
            ERR_PTR() value for error conditions which require immediate
            mount failure.
      
            As we now have separate search and creation steps, deferring
            failure from cgroup_root_from_opts() is no longer necessary.
            cgroup_root_from_opts() is updated to always return ERR_PTR()
            value on failure.
      
          - The logic to match existing roots is updated so that a mount
            attempt with a matching name but different subsys_mask are
            rejected.  This was handled by a separate matching loop under
            the comment "Check for name clashes with existing mounts" but
            got lost during conversion.  Merge the check into the main
            search loop.
      
          - Add __rcu __force casting in RCU_INIT_POINTER() in
            cgroup_destroy_locked() to avoid the sparse address space
            warning reported by kbuild test bot.  Maybe we want an explicit
            interface to use kn->priv as RCU protected pointer?
      
      v3: Make CONFIG_CGROUPS select CONFIG_KERNFS.
      
      v4: Rebased on top of 0ab02ca8 ("cgroup: protect modifications to
          cgroup_idr with cgroup_mutex").
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      Cc: kbuild test robot fengguang.wu@intel.com>
      2bd59d48
  6. Jan 31, 2014
  7. Dec 11, 2013
  8. Dec 02, 2013
  9. Nov 27, 2013
  10. Nov 22, 2013
    • Tejun Heo's avatar
      cgroup, memcg: move cgroup_event implementation to memcg · 79bd9814
      Tejun Heo authored
      
      cgroup_event is way over-designed and tries to build a generic
      flexible event mechanism into cgroup - fully customizable event
      specification for each user of the interface.  This is utterly
      unnecessary and overboard especially in the light of the planned
      unified hierarchy as there's gonna be single agent.  Simply generating
      events at fixed points, or if that's too restrictive, configureable
      cadence or single set of configureable points should be enough.
      
      Thankfully, memcg is the only user and gets to keep it.  Replacing it
      with something simpler on sane_behavior is strongly recommended.
      
      This patch moves cgroup_event and "cgroup.event_control"
      implementation to mm/memcontrol.c.  Clearing of events on cgroup
      destruction is moved from cgroup_destroy_locked() to
      mem_cgroup_css_offline(), which shouldn't make any noticeable
      difference.
      
      cgroup_css() and __file_cft() are exported to enable the move;
      however, this will soon be reverted once the event code is updated to
      be memcg specific.
      
      Note that "cgroup.event_control" will now exist only on the hierarchy
      with memcg attached to it.  While this change is visible to userland,
      it is unlikely to be noticeable as the file has never been meaningful
      outside memcg.
      
      Aside from the above change, this is pure code relocation.
      
      v2: Per Li Zefan's comments, init/Kconfig updated accordingly and
          poll.h inclusion moved from cgroup.c to memcontrol.c.
      
      Signed-off-by: default avatarTejun Heo <tj@kernel.org>
      Acked-by: default avatarLi Zefan <lizefan@huawei.com>
      Acked-by: default avatarKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: default avatarMichal Hocko <mhocko@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <bsingharora@gmail.com>
      79bd9814
  11. Nov 17, 2013
  12. Nov 13, 2013
  13. Nov 07, 2013
  14. Nov 05, 2013
    • Eric Paris's avatar
      audit: remove CONFIG_AUDIT_LOGINUID_IMMUTABLE · 83fa6bbe
      Eric Paris authored
      
      After trying to use this feature in Fedora we found the hard coding
      policy like this into the kernel was a bad idea.  Surprise surprise.
      We ran into these problems because it was impossible to launch a
      container as a logged in user and run a login daemon inside that container.
      This reverts back to the old behavior before this option was added.  The
      option will be re-added in a userspace selectable manor such that
      userspace can choose when it is and when it is not appropriate.
      
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      Signed-off-by: default avatarRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: default avatarEric Paris <eparis@redhat.com>
      83fa6bbe
  15. Oct 14, 2013
  16. Sep 30, 2013
    • Kevin Hilman's avatar
      nohz: Drop generic vtime obsolete dependency on CONFIG_64BIT · ff3fb254
      Kevin Hilman authored
      
      The CONFIG_64BIT requirement on vtime can finally be removed
      since we now depend on HAVE_VIRT_CPU_ACCOUNTING_GEN which
      already takes care of the arch ability to handle nsecs based
      cputime_t safely.
      
      Signed-off-by: default avatarKevin Hilman <khilman@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      ff3fb254
    • Kevin Hilman's avatar
      vtime: Add HAVE_VIRT_CPU_ACCOUNTING_GEN Kconfig · 554b0004
      Kevin Hilman authored
      
      With VIRT_CPU_ACCOUNTING_GEN, cputime_t becomes 64-bit. In order
      to use that feature, arch code should be audited to ensure there are no
      races in concurrent read/write of cputime_t. For example,
      reading/writing 64-bit cputime_t on some 32-bit arches may require
      multiple accesses for low and high value parts, so proper locking
      is needed to protect against concurrent accesses.
      
      Therefore, add CONFIG_HAVE_VIRT_CPU_ACCOUNTING_GEN which arches can
      enable after they've been audited for potential races.
      
      This option is automatically enabled on 64-bit platforms.
      
      Feature requested by Frederic Weisbecker.
      
      Signed-off-by: default avatarKevin Hilman <khilman@linaro.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Arm Linux <linux-arm-kernel@lists.infradead.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      554b0004
  17. Sep 25, 2013
  18. Aug 23, 2013
  19. Aug 19, 2013
  20. Aug 15, 2013
  21. Aug 13, 2013
  22. Aug 12, 2013
    • Frederic Weisbecker's avatar
      context_tracking: Remove full dynticks' hacky dependency on wide context tracking · d84d27a4
      Frederic Weisbecker authored
      
      Now that the full dynticks subsystem only enables the context tracking
      on full dynticks CPUs, lets remove the dependency on CONTEXT_TRACKING_FORCE
      
      This dependency was a hack to enable the context tracking widely for the
      full dynticks susbsystem until the latter becomes able to enable it in a
      more CPU-finegrained fashion.
      
      Now CONTEXT_TRACKING_FORCE only stands for testing on archs that
      work on support for the context tracking while full dynticks can't be
      used yet due to unmet dependencies. It simulates a system where all CPUs
      are full dynticks so that RCU user extended quiescent states and dynticks
      cputime accounting can be tested on the given arch.
      
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Kevin Hilman <khilman@linaro.org>
      d84d27a4
  23. Jul 09, 2013
  24. Jul 07, 2013
  25. Jul 03, 2013
  26. Jun 24, 2013
    • Jiri Slaby's avatar
      build some drivers only when compile-testing · 4bb16672
      Jiri Slaby authored
      
      Some drivers can be built on more platforms than they run on. This is
      a burden for users and distributors who package a kernel. They have to
      manually deselect some (for them useless) drivers when updating their
      configs via oldconfig. And yet, sometimes it is even impossible to
      disable the drivers without patching the kernel.
      
      Introduce a new config option COMPILE_TEST and make all those drivers
      to depend on the platform they run on, or on the COMPILE_TEST option.
      Now, when users/distributors choose COMPILE_TEST=n they will not have
      the drivers in their allmodconfig setups, but developers still can
      compile-test them with COMPILE_TEST=y.
      
      Now the drivers where we use this new option:
      * PTP_1588_CLOCK_PCH: The PCH EG20T is only compatible with Intel Atom
        processors so it should depend on x86.
      * FB_GEODE: Geode is 32-bit only so only enable it for X86_32.
      * USB_CHIPIDEA_IMX: The OF_DEVICE dependency will be met on powerpc
        systems -- which do not actually support the hardware via that
        method.
      * INTEL_MID_PTI: It is specific to the Penwell type of Intel Atom
        device.
      
      [v2]
      * remove EXPERT dependency
      
      [gregkh - remove chipidea portion, as it's incorrect, and also doesn't
       apply to my driver-core tree]
      
      Signed-off-by: default avatarJiri Slaby <jslaby@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: linux-usb@vger.kernel.org
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: linux-geode@lists.infradead.org
      Cc: linux-fbdev@vger.kernel.org
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: netdev@vger.kernel.org
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: "Keller, Jacob E" <jacob.e.keller@intel.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4bb16672
  27. Jun 12, 2013
  28. Jun 10, 2013
    • Paul E. McKenney's avatar
      rcu: Remove TINY_PREEMPT_RCU · 127781d1
      Paul E. McKenney authored
      TINY_PREEMPT_RCU adds significant code and complexity, but does not
      offer commensurate benefits.  People currently using TINY_PREEMPT_RCU
      can get much better memory footprint with TINY_RCU, or, if they really
      need preemptible RCU, they can use TREE_PREEMPT_RCU with a relatively
      minor degradation in memory footprint.  Please note that this move
      has been widely publicized on LKML (https://lkml.org/lkml/2012/11/12/545)
      and on LWN (http://lwn.net/Articles/541037/
      
      ).
      
      This commit therefore removes TINY_PREEMPT_RCU.
      
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      [ paulmck: Updated to eliminate #else in rcutiny.h as suggested by Josh ]
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      127781d1
    • Paul E. McKenney's avatar
      rcu: Apply Dave Jones's NOCB Kconfig help feedback · 676c3dc2
      Paul E. McKenney authored
      
      The Kconfig help text for the RCU_NOCB_CPU_NONE, RCU_NOCB_CPU_ZERO,
      and RCU_NOCB_CPU_ALL Kconfig options was unclear, so this commit
      adds a bit more detail.
      
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      676c3dc2
    • Paul E. McKenney's avatar
      rcu: Remove "Experimental" flags · 9a5739d7
      Paul E. McKenney authored
      
      After a release or two, features are no longer experimental.  Therefore,
      this commit removes the "Experimental" tag from them.
      
      Reported-by: default avatarPaul Gortmaker <paul.gortmaker@windriver.com>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reviewed-by: default avatarJosh Triplett <josh@joshtriplett.org>
      9a5739d7
    • Steven Rostedt's avatar
      rcu: Don't call wakeup() with rcu_node structure ->lock held · 016a8d5b
      Steven Rostedt authored
      
      This commit fixes a lockdep-detected deadlock by moving a wake_up()
      call out from a rnp->lock critical section.  Please see below for
      the long version of this story.
      
      On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote:
      
      > [12572.705832] ======================================================
      > [12572.750317] [ INFO: possible circular locking dependency detected ]
      > [12572.796978] 3.10.0-rc3+ #39 Not tainted
      > [12572.833381] -------------------------------------------------------
      > [12572.862233] trinity-child17/31341 is trying to acquire lock:
      > [12572.870390]  (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12572.878859]
      > but task is already holding lock:
      > [12572.894894]  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12572.903381]
      > which lock already depends on the new lock.
      >
      > [12572.927541]
      > the existing dependency chain (in reverse order) is:
      > [12572.943736]
      > -> #4 (&ctx->lock){-.-...}:
      > [12572.960032]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12572.968337]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12572.976633]        [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0
      > [12572.984969]        [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0
      > [12572.993326]        [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0
      > [12573.001652]        [<ffffffff816eacfe>] schedule_user+0x2e/0x70
      > [12573.009998]        [<ffffffff816ecd64>] retint_careful+0x12/0x2e
      > [12573.018321]
      > -> #3 (&rq->lock){-.-.-.}:
      > [12573.034628]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.042930]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.051248]        [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260
      > [12573.059579]        [<ffffffff810492f5>] do_fork+0x105/0x470
      > [12573.067880]        [<ffffffff81049686>] kernel_thread+0x26/0x30
      > [12573.076202]        [<ffffffff816cee63>] rest_init+0x23/0x140
      > [12573.084508]        [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe
      > [12573.092852]        [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c
      > [12573.101233]        [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf
      > [12573.109528]
      > -> #2 (&p->pi_lock){-.-.-.}:
      > [12573.125675]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.133829]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.141964]        [<ffffffff8108e881>] try_to_wake_up+0x31/0x320
      > [12573.150065]        [<ffffffff8108ebe2>] default_wake_function+0x12/0x20
      > [12573.158151]        [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40
      > [12573.166195]        [<ffffffff81085398>] __wake_up_common+0x58/0x90
      > [12573.174215]        [<ffffffff81086909>] __wake_up+0x39/0x50
      > [12573.182146]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.190119]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.198023]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.205860]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.213656]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      > [12573.221379]
      > -> #1 (&rsp->gp_wq){..-.-.}:
      > [12573.236329]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.243783]        [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90
      > [12573.251178]        [<ffffffff810868f3>] __wake_up+0x23/0x50
      > [12573.258505]        [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50
      > [12573.265891]        [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0
      > [12573.273248]        [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930
      > [12573.280564]        [<ffffffff8107a91d>] kthread+0xed/0x100
      > [12573.287807]        [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0
      
      Notice the above call chain.
      
      rcu_start_future_gp() is called with the rnp->lock held. Then it calls
      rcu_start_gp_advance, which does a wakeup.
      
      You can't do wakeups while holding the rnp->lock, as that would mean
      that you could not do a rcu_read_unlock() while holding the rq lock, or
      any lock that was taken while holding the rq lock. This is because...
      (See below).
      
      > [12573.295067]
      > -> #0 (rcu_node_0){..-.-.}:
      > [12573.309293]        [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.316568]        [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.323825]        [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.331081]        [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.338377]        [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.345648]        [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.352942]        [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.360211]        [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.367514]        [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.374816]        [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      Notice the above trace.
      
      perf took its own ctx->lock, which can be taken while holding the rq
      lock. While holding this lock, it did a rcu_read_unlock(). The
      perf_lock_task_context() basically looks like:
      
      rcu_read_lock();
      raw_spin_lock(ctx->lock);
      rcu_read_unlock();
      
      Now, what looks to have happened, is that we scheduled after taking that
      first rcu_read_lock() but before taking the spin lock. When we scheduled
      back in and took the ctx->lock, the following rcu_read_unlock()
      triggered the "special" code.
      
      The rcu_read_unlock_special() takes the rnp->lock, which gives us a
      possible deadlock scenario.
      
      	CPU0		CPU1		CPU2
      	----		----		----
      
      				     rcu_nocb_kthread()
          lock(rq->lock);
      		    lock(ctx->lock);
      				     lock(rnp->lock);
      
      				     wake_up();
      
      				     lock(rq->lock);
      
      		    rcu_read_unlock();
      
      		    rcu_read_unlock_special();
      
      		    lock(rnp->lock);
          lock(ctx->lock);
      
      **** DEADLOCK ****
      
      > [12573.382068]
      > other info that might help us debug this:
      >
      > [12573.403229] Chain exists of:
      >   rcu_node_0 --> &rq->lock --> &ctx->lock
      >
      > [12573.424471]  Possible unsafe locking scenario:
      >
      > [12573.438499]        CPU0                    CPU1
      > [12573.445599]        ----                    ----
      > [12573.452691]   lock(&ctx->lock);
      > [12573.459799]                                lock(&rq->lock);
      > [12573.467010]                                lock(&ctx->lock);
      > [12573.474192]   lock(rcu_node_0);
      > [12573.481262]
      >  *** DEADLOCK ***
      >
      > [12573.501931] 1 lock held by trinity-child17/31341:
      > [12573.508990]  #0:  (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0
      > [12573.516475]
      > stack backtrace:
      > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39
      > [12573.545357]  ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00
      > [12573.552868]  ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40
      > [12573.560353]  0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0
      > [12573.567856] Call Trace:
      > [12573.575011]  [<ffffffff816e375b>] dump_stack+0x19/0x1b
      > [12573.582284]  [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f
      > [12573.589637]  [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0
      > [12573.596982]  [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100
      > [12573.604344]  [<ffffffff810b9851>] lock_acquire+0x91/0x1f0
      > [12573.611652]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.619030]  [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80
      > [12573.626331]  [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0
      > [12573.633671]  [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0
      > [12573.640992]  [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0
      > [12573.648330]  [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40
      > [12573.655662]  [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0
      > [12573.662964]  [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0
      > [12573.670276]  [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0
      > [12573.677622]  [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370
      > [12573.684981]  [<ffffffff8113938e>] find_get_context+0x4e/0x1f0
      > [12573.692358]  [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0
      > [12573.699753]  [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50
      > [12573.707135]  [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0
      > [12573.714599]  [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10
      > [12573.721996]  [<ffffffff816f4dd4>] tracesys+0xdd/0xe2
      
      This commit delays the wakeup via irq_work(), which is what
      perf and ftrace use to perform wakeups in critical sections.
      
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      016a8d5b
  29. Jun 03, 2013
  30. May 04, 2013
    • Frederic Weisbecker's avatar
      rcu: Fix full dynticks' dependency on wide RCU nocb mode · 73c30828
      Frederic Weisbecker authored
      
      Commit 0637e029
      ("nohz: Select wide RCU nocb for full dynticks") intended
      to force CONFIG_RCU_NOCB_CPU_ALL=y when full dynticks is
      enabled.
      
      However this option is part of a choice menu and Kconfig's
      "select" instruction has no effect on such targets.
      
      Fix this by using reverse dependencies on the targets we
      don't want instead.
      
      Reviewed-by: default avatarPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      73c30828
  31. May 01, 2013
    • Mike Frysinger's avatar
      init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display · 657a5209
      Mike Frysinger authored
      
      The kconfig language requires that dependent options all follow the
      menuconfig symbol in order to be collapsed below it.  Recently some hidden
      options were added below the EXPERT menuconfig, but did not depend on
      EXPERT (because hidden options can't).  This broke the display.  So
      re-order all these options, and while we're here stick the PCI quirks
      under the EXPERT menu (since it isn't sitting with any related options).
      
      Before this commit, we get:
      	[*] Configure standard kernel features (expert users)  --->
      	[ ] Sysctl syscall support
      	[*] Load all symbols for debugging/ksymoops
      	...
      	[ ] Embedded system
      
      Now we get the older (and correct) behavior:
      	[*] Configure standard kernel features (expert users)  --->
      	[ ] Embedded system
      And if you go into the expert menu you get the expert options:
      	[ ] Sysctl syscall support
      	[*] Load all symbols for debugging/ksymoops
      	...
      
      Signed-off-by: default avatarMike Frysinger <vapier@gentoo.org>
      Acked-by: default avatarRandy Dunlap <rdunlap@infradead.org>
      Cc: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
      Cc: Michal Marek <mmarek@suse.cz>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      657a5209
  32. Apr 26, 2013
    • Frederic Weisbecker's avatar
      nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config · c58b0df1
      Frederic Weisbecker authored
      
      Turn the full dynticks passive dependency on VIRT_CPU_ACCOUNTING_GEN
      to an active one.
      
      The full dynticks Kconfig is currently hidden behind the full dynticks
      cputime accounting, which is an awkward and counter-intuitive layout:
      the user first has to select the dynticks cputime accounting in order
      to make the full dynticks feature to be visible.
      
      We definetly want it the other way around. The usual way to perform
      this kind of active dependency is use "select" on the depended target.
      Now we can't use the Kconfig "select" instruction when the target is
      a "choice".
      
      So this patch inspires on how the RCU subsystem Kconfig interact
      with its dependencies on SMP and PREEMPT: we make sure that cputime
      accounting can't propose another option than VIRT_CPU_ACCOUNTING_GEN
      when NO_HZ_FULL is selected by using the right "depends on" instruction
      for each cputime accounting choices.
      
      v2: Keep full dynticks cputime accounting available even without
      full dynticks, as per Paul McKenney's suggestion.
      
      Reported-by: default avatarIngo Molnar <mingo@kernel.org>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      c58b0df1
Loading