Skip to content
Snippets Groups Projects
Commit 9a00a192 authored by Waiman Long's avatar Waiman Long
Browse files

cgroup: Use separate src/dst nodes when preloading css_sets for migration

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2077665



commit 07fd5b6c
Author: Tejun Heo <tj@kernel.org>
Date:   Mon, 13 Jun 2022 12:19:50 -1000

    cgroup: Use separate src/dst nodes when preloading css_sets for migration

    Each cset (css_set) is pinned by its tasks. When we're moving tasks around
    across csets for a migration, we need to hold the source and destination
    csets to ensure that they don't go away while we're moving tasks about. This
    is done by linking cset->mg_preload_node on either the
    mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the
    same cset->mg_preload_node for both the src and dst lists was deemed okay as
    a cset can't be both the source and destination at the same time.

    Unfortunately, this overloading becomes problematic when multiple tasks are
    involved in a migration and some of them are identity noop migrations while
    others are actually moving across cgroups. For example, this can happen with
    the following sequence on cgroup1:

     #1> mkdir -p /sys/fs/cgroup/misc/a/b
     #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs
     #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS &
     #4> PID=$!
     #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks
     #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs

    the process including the group leader back into a. In this final migration,
    non-leader threads would be doing identity migration while the group leader
    is doing an actual one.

    After #3, let's say the whole process was in cset A, and that after #4, the
    leader moves to cset B. Then, during #6, the following happens:

     1. cgroup_migrate_add_src() is called on B for the leader.

     2. cgroup_migrate_add_src() is called on A for the other threads.

     3. cgroup_migrate_prepare_dst() is called. It scans the src list.

     4. It notices that B wants to migrate to A, so it tries to A to the dst
        list but realizes that its ->mg_preload_node is already busy.

     5. and then it notices A wants to migrate to A as it's an identity
        migration, it culls it by list_del_init()'ing its ->mg_preload_node and
        putting references accordingly.

     6. The rest of migration takes place with B on the src list but nothing on
        the dst list.

    This means that A isn't held while migration is in progress. If all tasks
    leave A before the migration finishes and the incoming task pins it, the
    cset will be destroyed leading to use-after-free.

    This is caused by overloading cset->mg_preload_node for both src and dst
    preload lists. We wanted to exclude the cset from the src list but ended up
    inadvertently excluding it from the dst list too.

    This patch fixes the issue by separating out cset->mg_preload_node into
    ->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst
    preloadings don't interfere with each other.

Signed-off-by: default avatarTejun Heo <tj@kernel.org>
Reported-by: default avatarMukesh Ojha <quic_mojha@quicinc.com>
Reported-by: default avatarshisiyuan <shisiyuan19870131@gmail.com>
    Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com
    Link: https://www.spinics.net/lists/cgroups/msg33313.html


    Fixes: f817de98 ("cgroup: prepare migration path for unified hierarchy")
    Cc: stable@vger.kernel.org # v3.16+

Signed-off-by: default avatarWaiman Long <longman@redhat.com>
parent 478a7136
No related branches found
No related tags found
Loading
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment