  1. Feb 01, 2025
  2. Jan 23, 2025
    • net: fix data-races around sk->sk_forward_alloc · be7c61ea
      Wang Liang authored
      
      commit 073d8980 upstream.
      
      Syzkaller reported this warning:
       ------------[ cut here ]------------
       WARNING: CPU: 0 PID: 16 at net/ipv4/af_inet.c:156 inet_sock_destruct+0x1c5/0x1e0
       Modules linked in:
       CPU: 0 UID: 0 PID: 16 Comm: ksoftirqd/0 Not tainted 6.12.0-rc5 #26
       Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
       RIP: 0010:inet_sock_destruct+0x1c5/0x1e0
       Code: 24 12 4c 89 e2 5b 48 c7 c7 98 ec bb 82 41 5c e9 d1 18 17 ff 4c 89 e6 5b 48 c7 c7 d0 ec bb 82 41 5c e9 bf 18 17 ff 0f 0b eb 83 <0f> 0b eb 97 0f 0b eb 87 0f 0b e9 68 ff ff ff 66 66 2e 0f 1f 84 00
       RSP: 0018:ffffc9000008bd90 EFLAGS: 00010206
       RAX: 0000000000000300 RBX: ffff88810b172a90 RCX: 0000000000000007
       RDX: 0000000000000002 RSI: 0000000000000300 RDI: ffff88810b172a00
       RBP: ffff88810b172a00 R08: ffff888104273c00 R09: 0000000000100007
       R10: 0000000000020000 R11: 0000000000000006 R12: ffff88810b172a00
       R13: 0000000000000004 R14: 0000000000000000 R15: ffff888237c31f78
       FS:  0000000000000000(0000) GS:ffff888237c00000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00007ffc63fecac8 CR3: 000000000342e000 CR4: 00000000000006f0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       Call Trace:
        <TASK>
        ? __warn+0x88/0x130
        ? inet_sock_destruct+0x1c5/0x1e0
        ? report_bug+0x18e/0x1a0
        ? handle_bug+0x53/0x90
        ? exc_invalid_op+0x18/0x70
        ? asm_exc_invalid_op+0x1a/0x20
        ? inet_sock_destruct+0x1c5/0x1e0
        __sk_destruct+0x2a/0x200
        rcu_do_batch+0x1aa/0x530
        ? rcu_do_batch+0x13b/0x530
        rcu_core+0x159/0x2f0
        handle_softirqs+0xd3/0x2b0
        ? __pfx_smpboot_thread_fn+0x10/0x10
        run_ksoftirqd+0x25/0x30
        smpboot_thread_fn+0xdd/0x1d0
        kthread+0xd3/0x100
        ? __pfx_kthread+0x10/0x10
        ret_from_fork+0x34/0x50
        ? __pfx_kthread+0x10/0x10
        ret_from_fork_asm+0x1a/0x30
        </TASK>
       ---[ end trace 0000000000000000 ]---
      
      It's possible that two threads call tcp_v6_do_rcv()/sk_forward_alloc_add()
      concurrently when sk->sk_state == TCP_LISTEN with sk->sk_lock unlocked,
      which triggers a data-race around sk->sk_forward_alloc:
      tcp_v6_rcv
          tcp_v6_do_rcv
              skb_clone_and_charge_r
                  sk_rmem_schedule
                      __sk_mem_schedule
                          sk_forward_alloc_add()
                  skb_set_owner_r
                      sk_mem_charge
                          sk_forward_alloc_add()
              __kfree_skb
                  skb_release_all
                      skb_release_head_state
                          sock_rfree
                              sk_mem_uncharge
                                  sk_forward_alloc_add()
                                  sk_mem_reclaim
                                      // set local var reclaimable
                                      __sk_mem_reclaim
                                          sk_forward_alloc_add()
      
      In this syzkaller testcase, two threads call tcp_v6_do_rcv() with
      skb->truesize=768, and sk_forward_alloc changes like this:
       (cpu 1)             | (cpu 2)             | sk_forward_alloc
       ...                 | ...                 | 0
       __sk_mem_schedule() |                     | +4096 = 4096
                           | __sk_mem_schedule() | +4096 = 8192
       sk_mem_charge()     |                     | -768  = 7424
                           | sk_mem_charge()     | -768  = 6656
       ...                 |    ...              |
       sk_mem_uncharge()   |                     | +768  = 7424
       reclaimable=7424    |                     |
                           | sk_mem_uncharge()   | +768  = 8192
                           | reclaimable=8192    |
       __sk_mem_reclaim()  |                     | -4096 = 4096
                           | __sk_mem_reclaim()  | -8192 = -4096 != 0
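
The arithmetic in the table can be replayed deterministically. The following is a simplified, single-threaded user-space model, not kernel code: PAGE_SIZE_MODEL and the helper names are hypothetical stand-ins, and reclaim is modeled as subtracting the previously sampled value rounded down to whole pages, which matches the amounts shown in the table.

```c
/*
 * Simplified model of sk->sk_forward_alloc accounting.
 * PAGE_SIZE_MODEL and all helpers are hypothetical stand-ins for
 * __sk_mem_schedule(), sk_mem_charge(), sk_mem_uncharge() and
 * __sk_mem_reclaim(); they are not the kernel implementations.
 */
#define PAGE_SIZE_MODEL 4096

static int forward_alloc; /* the shared counter both CPUs update */

static void mem_schedule(void) { forward_alloc += PAGE_SIZE_MODEL; }
static void mem_charge(int size) { forward_alloc -= size; }
static void mem_uncharge(int size) { forward_alloc += size; }

/* Reclaim the value sampled earlier, rounded down to whole pages. */
static void mem_reclaim(int reclaimable)
{
	forward_alloc -= reclaimable & ~(PAGE_SIZE_MODEL - 1);
}

/* Replays the table's interleaving in order and returns the final value.
 * The per-step values in the comments assume truesize == 768. */
static int replay_race(int truesize)
{
	int reclaimable_cpu1, reclaimable_cpu2;

	forward_alloc = 0;
	mem_schedule();                   /* cpu 1: 4096 */
	mem_schedule();                   /* cpu 2: 8192 */
	mem_charge(truesize);             /* cpu 1: 7424 */
	mem_charge(truesize);             /* cpu 2: 6656 */
	mem_uncharge(truesize);           /* cpu 1: 7424 */
	reclaimable_cpu1 = forward_alloc; /* cpu 1 samples 7424 */
	mem_uncharge(truesize);           /* cpu 2: 8192 */
	reclaimable_cpu2 = forward_alloc; /* cpu 2 samples 8192 */
	mem_reclaim(reclaimable_cpu1);    /* cpu 1: 8192 - 4096 = 4096 */
	mem_reclaim(reclaimable_cpu2);    /* cpu 2: 4096 - 8192 = -4096 */
	return forward_alloc;
}
```

Calling replay_race(768) returns -4096, the value in the table's last row. The damage comes from both CPUs sampling the shared counter before either reclaim runs, so the second reclaim subtracts pages the first one already returned.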
      
      skb_clone_and_charge_r() should not be called in tcp_v6_do_rcv() when
      sk->sk_state is TCP_LISTEN; it is done later in tcp_v6_syn_recv_sock().
      Fix the same issue in dccp_v6_do_rcv().
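
The fix is described here only in prose. As a hedged sketch of the idea, assuming it amounts to a socket-state check in front of the clone-and-charge step, the guard can be modeled in user space as below; model_sock, clone_and_charge() and model_do_rcv() are hypothetical stand-ins, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical socket states; only LISTEN vs. non-LISTEN matters here. */
enum model_state { MODEL_ESTABLISHED, MODEL_LISTEN };

struct model_sock {
	enum model_state state;
	int charged; /* bytes charged to this socket's memory accounting */
};

/* Hypothetical stand-in for skb_clone_and_charge_r(): charges the socket. */
static bool clone_and_charge(struct model_sock *sk, int truesize)
{
	sk->charged += truesize;
	return true;
}

/*
 * Guarded receive path: a listener, which is processed without
 * sk->sk_lock, is never charged here, so concurrent callers cannot race
 * on its counter. In the real stack the child socket is charged later.
 */
static bool model_do_rcv(struct model_sock *sk, int truesize)
{
	if (sk->state != MODEL_LISTEN)
		return clone_and_charge(sk, truesize);
	return false;
}
```

The design point is that the listener's accounting is simply never touched on the lockless path, rather than being protected with extra locking.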
      
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Fixes: e994b2f0 ("tcp: do not lock listener to process SYN packets")
      Signed-off-by: Wang Liang <wangliang74@huawei.com>
      Link: https://patch.msgid.link/20241107023405.889239-1-wangliang74@huawei.com
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Alva Lan <alvalan9@foxmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • x86/xen: fix SLS mitigation in xen_hypercall_iret() · 060de371
      Juergen Gross authored
      
      The backport of upstream patch a2796dff ("x86/xen: don't do PV iret
      hypercall through hypercall page") failed to adapt the SLS mitigation
      config check from CONFIG_MITIGATION_SLS to CONFIG_SLS.
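
A config check written against an option name the tree does not define fails silently: #ifdef on an undefined symbol just evaluates false, with no build error. The sketch below simulates a tree where only CONFIG_SLS is defined; the two functions are hypothetical and only illustrate why a check spelled CONFIG_MITIGATION_SLS never takes effect there. Whether the real check uses #ifdef or another form is not stated in the commit.

```c
#include <stdbool.h>

/*
 * Simulate a tree that only knows the old option name.
 * CONFIG_MITIGATION_SLS is deliberately left undefined.
 */
#define CONFIG_SLS 1

/* A check written against the newer name: silently false in this tree. */
static bool sls_check_new_name(void)
{
#ifdef CONFIG_MITIGATION_SLS
	return true;
#else
	return false;
#endif
}

/* The same check adapted to this tree's option name. */
static bool sls_check_old_name(void)
{
#ifdef CONFIG_SLS
	return true;
#else
	return false;
#endif
}
```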
      
      Signed-off-by: Juergen Gross <jgross@suse.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • nfsd: add list_head nf_gc to struct nfsd_file · 400fb0e9
      Youzhong Yang authored
      
      commit 8e6e2ffa upstream.
      
      nfsd_file_put() in one thread can race with another thread doing
      garbage collection (running nfsd_file_gc() -> list_lru_walk() ->
      nfsd_file_lru_cb()):
      
        * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
        * nfsd_file_lru_add() returns true (with NFSD_FILE_REFERENCED bit set)
        * garbage collector kicks in, nfsd_file_lru_cb() clears REFERENCED bit and
          returns LRU_ROTATE.
        * garbage collector kicks in again, nfsd_file_lru_cb() now decrements nf->nf_ref
          to 0, runs nfsd_file_unhash(), removes it from the LRU and adds to the dispose
          list [list_lru_isolate_move(lru, &nf->nf_lru, head)]
        * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
          the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. The 'nf' has been added
          to the 'dispose' list by nfsd_file_lru_cb(), so nfsd_file_lru_remove(nf) simply
          treats it as part of the LRU and removes it, which leads to its removal from
          the 'dispose' list.
        * At this moment, 'nf' is unhashed with its nf_ref being 0, and not on the LRU.
          nfsd_file_put() continues its execution [if (refcount_dec_and_test(&nf->nf_ref))];
          since nf->nf_ref is already 0, nf->nf_ref is set to REFCOUNT_SATURATED, and the 'nf'
          gets no chance of being freed.
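
The core problem in this race is that one list_head cannot be on two lists at once, so an unlink intended for the LRU also unlinks the entry from whatever list it is actually on. The single-threaded model below replays that sequence with a minimal hand-written circular list; struct nf_model and the list helpers are hypothetical simplifications, not the nfsd, list.h, or list_lru code.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *h) { h->prev = h->next = h; }
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink n from whatever list it is on, then self-link it. */
static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical pre-fix layout: one link serves both LRU and dispose list. */
struct nf_model {
	struct list_node nf_lru;
};

/* Returns true if nf is still queued for disposal at the end. */
static bool replay_shared_link(void)
{
	struct list_node lru, dispose;
	struct nf_model nf;

	list_init(&lru);
	list_init(&dispose);
	list_init(&nf.nf_lru);

	list_add_tail_node(&nf.nf_lru, &lru);     /* put path: add to LRU */
	list_del_node(&nf.nf_lru);                /* GC: isolate from LRU ... */
	list_add_tail_node(&nf.nf_lru, &dispose); /* ... and move to dispose */
	list_del_node(&nf.nf_lru);                /* put path: "remove from LRU" */

	return !list_is_empty(&dispose);
}
```

replay_shared_link() returns false: after the last unlink the dispose list is empty, even though the garbage collector had queued nf on it.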
      
      nfsd_file_put() can also race with nfsd_file_cond_queue():
        * In nfsd_file_put(), nf->nf_ref is 1, so it tries to do nfsd_file_lru_add().
        * nfsd_file_lru_add() sets REFERENCED bit and returns true.
        * Some userland application runs 'exportfs -f' or something like that, which triggers
          __nfsd_file_cache_purge() -> nfsd_file_cond_queue().
        * nfsd_file_cond_queue() runs [if (!nfsd_file_unhash(nf))]; the unhash succeeds.
        * nfsd_file_cond_queue() runs [if (!nfsd_file_get(nf))], now nf->nf_ref goes to 2.
        * nfsd_file_cond_queue() runs [if (nfsd_file_lru_remove(nf))], it succeeds.
        * nfsd_file_cond_queue() runs [if (refcount_sub_and_test(decrement, &nf->nf_ref))]
          (with "decrement" being 2), so the nf->nf_ref goes to 0, the 'nf' is added to the
          dispose list [list_add(&nf->nf_lru, dispose)]
        * nfsd_file_put() detects NFSD_FILE_HASHED bit is cleared, so it tries to remove
          the 'nf' from the LRU [if (!nfsd_file_lru_remove(nf))]. Although the 'nf' is not
          in the LRU, it is linked in the 'dispose' list, and nfsd_file_lru_remove() simply
          treats it as part of the LRU and removes it. This leads to its removal from
          the 'dispose' list!
        * Now nf->nf_ref is 0 and the 'nf' is unhashed. nfsd_file_put() continues its
          execution and sets nf->nf_ref to REFCOUNT_SATURATED.
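
The reference-count side of this race can be modeled the same way. This sketch assumes the kernel's refcount_t semantics: a decrement that would drop below zero saturates the counter at REFCOUNT_SATURATED (INT_MIN / 2) instead of wrapping. struct model_ref and its helpers are hypothetical, non-atomic stand-ins.

```c
#include <limits.h>
#include <stdbool.h>

/* Same value as the kernel's REFCOUNT_SATURATED. */
#define MODEL_REFCOUNT_SATURATED (INT_MIN / 2)

/* Hypothetical, non-atomic model of refcount_t. */
struct model_ref {
	int refs;
};

/* True when the count drops to exactly zero; underflow saturates. */
static bool model_sub_and_test(int i, struct model_ref *r)
{
	int old = r->refs;

	r->refs = old - i;
	if (old == i)
		return true;
	if (old < 0 || old - i < 0)
		r->refs = MODEL_REFCOUNT_SATURATED; /* the kernel also warns */
	return false;
}

static bool model_dec_and_test(struct model_ref *r)
{
	return model_sub_and_test(1, r);
}

/* Replays the nfsd_file_cond_queue() race and returns the final count. */
static int replay_cond_queue_race(void)
{
	struct model_ref nf_ref = { 1 }; /* nfsd_file_put() holds the last ref */

	nf_ref.refs++;                   /* nfsd_file_get(): 1 -> 2 */
	model_sub_and_test(2, &nf_ref);  /* cond_queue: 2 -> 0, nf queued */
	model_dec_and_test(&nf_ref);     /* nfsd_file_put(): 0 -> saturated */
	return nf_ref.refs;
}
```

replay_cond_queue_race() ends with the counter at INT_MIN / 2, a value from which no later put can reach zero, so the 'nf' is never freed.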
      
      As shown in the above analysis, using nf_lru for both the LRU list and dispose list
      can cause leaks. This patch adds a new list_head nf_gc in struct nfsd_file, and uses
      it for the dispose list. This does not fix the nfsd_file leaking issue completely.
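
With a dedicated link for the dispose list, the same unlink sequence leaves the entry queued. The sketch below uses a minimal hand-written circular list (hypothetical, not the kernel's list.h) and a hypothetical struct nf_model with separate nf_lru and nf_gc links. It only shows that dispose-list membership survives the put path's LRU removal; as the commit notes, other leak paths are not addressed.

```c
#include <stdbool.h>

/* Minimal circular doubly linked list, loosely modeled on list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *h) { h->prev = h->next = h; }
static bool list_is_empty(const struct list_node *h) { return h->next == h; }

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Unlink n, then self-link it; on a self-linked node this is a no-op. */
static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical post-fix layout: separate links for the two lists. */
struct nf_model {
	struct list_node nf_lru; /* LRU membership only */
	struct list_node nf_gc;  /* dispose-list membership only */
};

/* Returns true if nf is still queued for disposal at the end. */
static bool replay_separate_links(void)
{
	struct list_node lru, dispose;
	struct nf_model nf;

	list_init(&lru);
	list_init(&dispose);
	list_init(&nf.nf_lru);
	list_init(&nf.nf_gc);

	list_add_tail_node(&nf.nf_lru, &lru);    /* put path: add to LRU */
	list_del_node(&nf.nf_lru);               /* GC: isolate from LRU ... */
	list_add_tail_node(&nf.nf_gc, &dispose); /* ... and queue via nf_gc */
	list_del_node(&nf.nf_lru);               /* put path: nf_lru is self-linked */

	return !list_is_empty(&dispose);
}
```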
      
      Signed-off-by: Youzhong Yang <youzhong@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • erofs: handle NONHEAD !delta[1] lclusters gracefully · 75a0a6dd
      Gao Xiang authored
      
      commit 0bc8061f upstream.
      
      syzbot reported a WARNING in iomap_iter_done:
       iomap_fiemap+0x73b/0x9b0 fs/iomap/fiemap.c:80
       ioctl_fiemap fs/ioctl.c:220 [inline]
      
      Generally, NONHEAD lclusters won't have delta[1]==0, except for crafted
      images and filesystems created by pre-1.0 mkfs versions.
      
      Previously, it would immediately bail out if delta[1]==0, which led to
      inadequate decompressed lengths (thus FIEMAP is impacted).  Treat it as
      delta[1]=1 to work around these legacy mkfs versions.
      
      `lclusterbits > 14` is illegal for compact indexes; error out in that case too.
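
The delta[1] workaround can be illustrated with a deliberately simplified model in which a HEAD lcluster starts an extent and each NONHEAD lcluster's delta1 gives the distance to the next HEAD. The types and helpers below are hypothetical and do not reflect the real EROFS on-disk format or the kernel's lookup code. With a step of 0 such a walk cannot advance, which is why stopping early was the old behavior; the model advances by one lcluster instead, and also rejects `lclusterbits > 14` for compact indexes.

```c
#include <stdbool.h>

/* Hypothetical, simplified lcluster model; not the EROFS on-disk format. */
enum lcluster_type { LC_HEAD, LC_NONHEAD };

struct lcluster {
	enum lcluster_type type;
	unsigned int delta1; /* NONHEAD: distance to the next HEAD lcluster */
};

/*
 * Number of lclusters in the extent starting at HEAD lcluster `head`,
 * or -1 if `head` is not a HEAD. delta1 == 0 (crafted images or legacy
 * mkfs) is treated as 1 so the walk still makes progress.
 */
static int extent_lclusters(const struct lcluster *lc, int n, int head)
{
	int pos;

	if (head < 0 || head >= n || lc[head].type != LC_HEAD)
		return -1;
	pos = head + 1;
	while (pos < n && lc[pos].type == LC_NONHEAD) {
		unsigned int step = lc[pos].delta1 ? lc[pos].delta1 : 1;

		if (step >= (unsigned int)(n - pos)) { /* reaches or passes the end */
			pos = n;
			break;
		}
		pos += (int)step;
	}
	return pos - head;
}

/* Compact indexes cannot describe lclusters larger than 2^14 bytes. */
static bool lclusterbits_valid(bool compact_index, unsigned int lclusterbits)
{
	return !(compact_index && lclusterbits > 14);
}
```

For lclusters {HEAD, NONHEAD(delta1=0), NONHEAD(delta1=0), HEAD}, the extent starting at index 0 spans 3 lclusters, the same result as a well-formed image with correct delta1 values.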
      
      Reported-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com
      Closes: https://lore.kernel.org/r/67373c0c.050a0220.2a2fcc.0079.GAE@google.com
      
      Tested-by: syzbot+6c0b301317aa0156f9eb@syzkaller.appspotmail.com
      Fixes: d95ae5e2 ("erofs: add support for the full decompressed length")
      Fixes: 001b8ccd ("erofs: fix compact 4B support for 16k block size")
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Link: https://lore.kernel.org/r/20241115173651.3339514-1-hsiangkao@linux.alibaba.com
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • erofs: tidy up EROFS on-disk naming · 6326a3dc
      Gao Xiang authored
      
      commit 1c7f49a7 upstream.
      
       - Get rid of all "vle" (variable-length extents) expressions
         since they only expand overall name lengths unnecessarily;
       - Rename COMPRESSION_LEGACY to COMPRESSED_FULL;
       - Move on-disk directory definitions ahead of compression;
       - Drop unused extended attribute definitions;
       - Move inode on-disk union `i_u` out as `union erofs_inode_i_u`.
      
      No actual logical change.
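
Moving an anonymous union out as a named type lets several structures, and helper functions, share one definition. The sketch below illustrates that C pattern with hypothetical demo_* types; the fields are illustrative only and are not the actual EROFS on-disk definitions, which use little-endian types.

```c
#include <stdint.h>

/* Hypothetical named union hoisted out of two inode layouts. */
union demo_inode_i_u {
	uint32_t compressed_blocks;
	uint32_t raw_blkaddr;
	uint32_t rdev;
};

struct demo_inode_compact {
	uint16_t i_format;
	union demo_inode_i_u i_u; /* same named type in both layouts */
};

struct demo_inode_extended {
	uint16_t i_format;
	uint64_t i_size;
	union demo_inode_i_u i_u;
};

/* Helpers can take the shared union type, whichever layout it came from. */
static uint32_t demo_raw_blkaddr(const union demo_inode_i_u *u)
{
	return u->raw_blkaddr;
}
```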
      
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Reviewed-by: Yue Hu <huyue2@coolpad.com>
      Reviewed-by: Chao Yu <chao@kernel.org>
      Link: https://lore.kernel.org/r/20230331063149.25611-1-hsiangkao@linux.alibaba.com
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>