  1. Jan 17, 2025
    • drm/mediatek: Add return value check when reading DPCD · f05f5ab5
      Liankun Yang authored
      
      [ Upstream commit 52290814 ]
      
      Check the return value of drm_dp_dpcd_readb() to confirm that
      AUX communication is successful. To simplify the code, replace
      drm_dp_dpcd_readb() and DP_GET_SINK_COUNT() with drm_dp_read_sink_count().
      
      Fixes: f70ac097 ("drm/mediatek: Add MT8195 Embedded DisplayPort driver")
      Signed-off-by: Liankun Yang <liankun.yang@mediatek.com>
      Reviewed-by: Guillaume Ranquet <granquet@baylibre.com>
      Link: https://patchwork.kernel.org/project/dri-devel/patch/20241218113448.2992-1-liankun.yang@mediatek.com/
      Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/mediatek: Fix mode valid issue for dp · 4e674923
      Liankun Yang authored
      
      [ Upstream commit 0d68b558 ]
      
      Fix dp mode valid issue to avoid abnormal display of limit state.
      
      Once DP link training passes, the lane count of the current link
      is known to be good. Calculate the maximum bandwidth supported
      by DP using that lane count.
      
      The best color format is selected based on the bandwidth
      requirements of the current timing mode. If the current timing mode
      uses RGB and meets the DP link bandwidth requirements, RGB will be used.
      
      If the timing mode uses RGB but does not meet the DP link bandwidth
      requirements, the driver goes on to check whether YUV422 meets
      the DP link bandwidth.
      
      FEC overhead is approximately 2.4% from DP 1.4a spec 2.2.1.4.2.
      The down-spread amplitude shall either be disabled (0.0%) or up
      to 0.5% from 1.4a 3.5.2.6. Add up to approximately 3% total overhead.
      
      Because rate is already divided by 10,
      mode->clock does not need to be multiplied by 10.
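      
      The bandwidth check described above can be sketched in plain C. This
      is a minimal user-space illustration, not the driver code: the names
      pick_format() and mode_fits() are hypothetical, the 24 and 16 bits per
      pixel for RGB and YCbCr422 assume 8 bits per component, and the link
      rate is assumed to be in units where it compares directly against
      clock (kHz) times bytes per pixel.

```c
#include <stdint.h>

enum pixel_format { FMT_RGB, FMT_YCBCR422, FMT_UNSUPPORTED };

/*
 * Return non-zero if the mode fits into the link after reserving about
 * 3% for FEC and down-spread overhead (the 97/100 factor).
 * link_rate is assumed to be "already divided by 10", so clock_khz is
 * used without an extra factor of 10.
 */
static int mode_fits(uint64_t link_rate, uint32_t clock_khz, uint32_t bpp)
{
	return (link_rate * 97 / 100) >= ((uint64_t)clock_khz * bpp / 8);
}

/* Prefer RGB; fall back to YCbCr422 when RGB exceeds the link budget. */
static enum pixel_format pick_format(uint32_t lane_rate, uint32_t lanes,
				     uint32_t clock_khz)
{
	uint64_t link_rate = (uint64_t)lane_rate * lanes;

	if (mode_fits(link_rate, clock_khz, 24))	/* assumed RGB, 8 bpc */
		return FMT_RGB;
	if (mode_fits(link_rate, clock_khz, 16))	/* assumed YCbCr422, 8 bpc */
		return FMT_YCBCR422;
	return FMT_UNSUPPORTED;
}
```

      With a per-lane rate of 540000 and two lanes, a 400000 kHz mode does
      not fit as RGB but does fit as YCbCr422 in this sketch.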
      
      Fixes: f70ac097 ("drm/mediatek: Add MT8195 Embedded DisplayPort driver")
      Signed-off-by: Liankun Yang <liankun.yang@mediatek.com>
      Link: https://patchwork.kernel.org/project/dri-devel/patch/20241025083036.8829-3-liankun.yang@mediatek.com/
      Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/mediatek: Fix YCbCr422 color format issue for DP · e0ad4b01
      Liankun Yang authored
      
      [ Upstream commit ef24fbd8 ]
      
      Set up misc0 for the Pixel Encoding Format.
      
      According to the definition of YCbCr in spec 1.2a Table 2-96,
      0x1 << 1 should be written to the register.
      
      Use switch case to distinguish RGB, YCbCr422,
      and unsupported color formats.
      
      Fixes: f70ac097 ("drm/mediatek: Add MT8195 Embedded DisplayPort driver")
      Signed-off-by: Liankun Yang <liankun.yang@mediatek.com>
      Link: https://patchwork.kernel.org/project/dri-devel/patch/20241025083036.8829-2-liankun.yang@mediatek.com/
      Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/mediatek: stop selecting foreign drivers · 21c501e6
      Arnd Bergmann authored
      
      [ Upstream commit 924d6601 ]
      
      The PHY portion of the mediatek hdmi driver was originally part of
      the driver itself and was later split out into drivers/phy, with a
      'select' to keep the prior behavior.
      
      However, this leads to build failures when the PHY driver cannot
      be built:
      
      WARNING: unmet direct dependencies detected for PHY_MTK_HDMI
        Depends on [n]: (ARCH_MEDIATEK || COMPILE_TEST [=y]) && COMMON_CLK [=y] && OF [=y] && REGULATOR [=n]
        Selected by [m]:
        - DRM_MEDIATEK_HDMI [=m] && HAS_IOMEM [=y] && DRM [=m] && DRM_MEDIATEK [=m]
      ERROR: modpost: "devm_regulator_register" [drivers/phy/mediatek/phy-mtk-hdmi-drv.ko] undefined!
      ERROR: modpost: "rdev_get_drvdata" [drivers/phy/mediatek/phy-mtk-hdmi-drv.ko] undefined!
      
      The best option here is to just not select the phy driver and leave that
      up to the defconfig. Do the same for the other PHY and memory drivers
      selected here as well for consistency.
      
      Fixes: a481bf2f ("drm/mediatek: Separate mtk_hdmi_phy to an independent module")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
      Reviewed-by: CK Hu <ck.hu@mediatek.com>
      Link: https://patchwork.kernel.org/project/dri-devel/patch/20241218085837.2670434-1-arnd@kernel.org/
      Signed-off-by: Chun-Kuang Hu <chunkuang.hu@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/mlx5: Fix variable not being completed when function returns · f0a28087
      Chenguang Zhao authored
      
      [ Upstream commit 0e2909c6 ]
      
      When cmd_alloc_index() fails, cmd_work_handler() needs
      to complete ent->slotted before returning early.
      Otherwise the task which issued the command may hang:
      
         mlx5_core 0000:01:00.0: cmd_work_handler:877:(pid 3880418): failed to allocate command entry
         INFO: task kworker/13:2:4055883 blocked for more than 120 seconds.
               Not tainted 4.19.90-25.44.v2101.ky10.aarch64 #1
         "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
         kworker/13:2    D    0 4055883      2 0x00000228
         Workqueue: events mlx5e_tx_dim_work [mlx5_core]
         Call trace:
            __switch_to+0xe8/0x150
            __schedule+0x2a8/0x9b8
            schedule+0x2c/0x88
            schedule_timeout+0x204/0x478
            wait_for_common+0x154/0x250
            wait_for_completion+0x28/0x38
            cmd_exec+0x7a0/0xa00 [mlx5_core]
            mlx5_cmd_exec+0x54/0x80 [mlx5_core]
            mlx5_core_modify_cq+0x6c/0x80 [mlx5_core]
            mlx5_core_modify_cq_moderation+0xa0/0xb8 [mlx5_core]
            mlx5e_tx_dim_work+0x54/0x68 [mlx5_core]
            process_one_work+0x1b0/0x448
            worker_thread+0x54/0x468
            kthread+0x134/0x138
            ret_from_fork+0x10/0x18
      
      Fixes: 485d65e1 ("net/mlx5: Add a timeout to acquire the command queue semaphore")
      Signed-off-by: Chenguang Zhao <zhaochenguang@kylinos.cn>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Acked-by: Tariq Toukan <tariqt@nvidia.com>
      Link: https://patch.msgid.link/20250108030009.68520-1-zhaochenguang@kylinos.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sched: sch_cake: add bounds checks to host bulk flow fairness counts · a777e06d
      Toke Høiland-Jørgensen authored
      
      [ Upstream commit 737d4d91 ]
      
      Even though we fixed a logic error in the commit cited below, syzbot
      still managed to trigger an underflow of the per-host bulk flow
      counters, leading to an out of bounds memory access.
      
      To avoid any such logic errors causing out of bounds memory accesses,
      this commit factors out all accesses to the per-host bulk flow counters
      to a series of helpers that perform bounds-checking before any
      increments and decrements. This also has the benefit of improving
      readability by moving the conditional checks for the flow mode into
      these helpers, instead of having them spread out throughout the
      code (which was the cause of the original logic error).
      
      As part of this change, the flow quantum calculation is consolidated
      into a helper function, which means that the dithering applied to the
      host load scaling is now applied both in the DRR rotation and when a
      sparse flow's quantum is first initiated. The only user-visible effect
      of this is that the maximum packet size that can be sent while a flow
      stays sparse will now vary with +/- one byte in some cases. This should
      not make a noticeable difference in practice, and thus it's not worth
      complicating the code to preserve the old behaviour.
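      
      The idea of routing every counter update through a bounds-checked
      helper can be sketched as follows. This is a simplified user-space
      illustration under stated assumptions, not the sch_cake code: the
      struct, the helper names and the MAX_BULK_FLOWS limit are all
      hypothetical, and the host_fairness flag stands in for the flow-mode
      check that the commit moves into the helpers.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BULK_FLOWS 1024u	/* hypothetical upper bound for the counter */

struct host_entry {
	uint16_t bulk_flow_count;
};

/* Increment only when host fairness is active and the bound allows it. */
static void host_bulk_inc(struct host_entry *h, bool host_fairness)
{
	if (host_fairness && h->bulk_flow_count < MAX_BULK_FLOWS)
		h->bulk_flow_count++;
}

/* Decrement only when the counter is non-zero, so it can never underflow. */
static void host_bulk_dec(struct host_entry *h, bool host_fairness)
{
	if (host_fairness && h->bulk_flow_count > 0)
		h->bulk_flow_count--;
}
```

      Because the counter can no longer wrap below zero or grow past the
      bound, any later use of it as an array index stays in range even if
      a caller gets the accounting wrong.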
      
      Fixes: 546ea84d ("sched: sch_cake: fix bulk flow accounting logic for host fairness")
      Reported-by: <syzbot+f63600d288bfb7057424@syzkaller.appspotmail.com>
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Dave Taht <dave.taht@gmail.com>
      Link: https://patch.msgid.link/20250107120105.70685-1-toke@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: conntrack: clamp maximum hashtable size to INT_MAX · 5552b4fd
      Pablo Neira Ayuso authored
      
      [ Upstream commit b541ba7d ]
      
      Use INT_MAX as maximum size for the conntrack hashtable. Otherwise, it
      is possible to hit WARN_ON_ONCE in __kvmalloc_node_noprof() when
      resizing hashtable because __GFP_NOWARN is unset. See:
      
        0708a0af ("mm: Consider __GFP_NOWARN flag for oversized kvmalloc() calls")
      
      Note: hashtable resize is only possible from init_netns.
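      
      A clamp of this kind might look like the sketch below. It is an
      illustration only: clamp_buckets() is a hypothetical helper, and
      treating INT_MAX as a limit on the total allocation size in bytes
      (bucket count times bucket size) is an assumption, not something the
      commit message spells out.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Limit the requested bucket count so that the total table size,
 * requested * bucket_size, cannot exceed INT_MAX bytes.
 * bucket_size must be non-zero.
 */
static size_t clamp_buckets(size_t requested, size_t bucket_size)
{
	size_t max_buckets = (size_t)INT_MAX / bucket_size;

	return requested > max_buckets ? max_buckets : requested;
}
```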
      
      Fixes: 9cc1c73a ("netfilter: conntrack: avoid integer overflow when resizing")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: nf_tables: imbalance in flowtable binding · d470b925
      Pablo Neira Ayuso authored
      
      [ Upstream commit 13210fc6 ]
      
      All these cases cause imbalance between BIND and UNBIND calls:
      
      - Delete an interface from a flowtable with multiple interfaces
      
      - Add a (device to a) flowtable with --check flag
      
      - Delete a netns containing a flowtable
      
      - In an interactive nft session, create a table with owner flag and
        flowtable inside, then quit.
      
      Fix it by calling FLOW_BLOCK_UNBIND when unregistering hooks, then
      remove late FLOW_BLOCK_UNBIND call when destroying flowtable.
      
      Fixes: ff4bf2f4 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()")
      Reported-by: Phil Sutter <phil@nwl.cc>
      Tested-by: Phil Sutter <phil@nwl.cc>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Annotate data-race around sk->sk_mark in tcp_v4_send_reset · 636d7b95
      Daniel Borkmann authored
      
      [ Upstream commit 80fb40ba ]
      
      This is a follow-up to 3c5b4d69 ("net: annotate data-races around
      sk->sk_mark"). sk->sk_mark can be read and written without holding
      the socket lock. IPv6 equivalent is already covered with READ_ONCE()
      annotation in tcp_v6_send_response().
      
      Fixes: 3c5b4d69 ("net: annotate data-races around sk->sk_mark")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/f459d1fc44f205e13f6d8bdca2c8bfb9902ffac9.1736244569.git.daniel@iogearbox.net
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: hci_sync: Fix not setting Random Address when required · faa8a33e
      Luiz Augusto von Dentz authored
      
      [ Upstream commit c2994b00 ]
      
      This fixes errors such as the following when Own address type is set to
      Random Address but the address has not been programmed yet because the
      controller is either advertising or connecting:
      
      < HCI Command: LE Set Exte.. (0x08|0x0041) plen 13
              Own address type: Random (0x03)
              Filter policy: Ignore not in accept list (0x01)
              PHYs: 0x05
              Entry 0: LE 1M
                Type: Passive (0x00)
                Interval: 60.000 msec (0x0060)
                Window: 30.000 msec (0x0030)
              Entry 1: LE Coded
                Type: Passive (0x00)
                Interval: 180.000 msec (0x0120)
                Window: 90.000 msec (0x0090)
      > HCI Event: Command Complete (0x0e) plen 4
            LE Set Extended Scan Parameters (0x08|0x0041) ncmd 1
              Status: Success (0x00)
      < HCI Command: LE Set Exten.. (0x08|0x0042) plen 6
              Extended scan: Enabled (0x01)
              Filter duplicates: Enabled (0x01)
              Duration: 0 msec (0x0000)
              Period: 0.00 sec (0x0000)
      > HCI Event: Command Complete (0x0e) plen 4
            LE Set Extended Scan Enable (0x08|0x0042) ncmd 1
              Status: Invalid HCI Command Parameters (0x12)
      
      Fixes: c45074d6 ("Bluetooth: Fix not generating RPA when required")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tls: Fix tls_sw_sendmsg error handling · ecb1356a
      Benjamin Coddington authored
      
      [ Upstream commit b341ca51 ]
      
      We've noticed that NFS can hang when using RPC over TLS on an unstable
      connection, and investigation shows that the RPC layer is stuck in a tight
      loop attempting to transmit, but forever getting -EBADMSG back from the
      underlying network.  The loop begins when tcp_sendmsg_locked() returns
      -EPIPE to tls_tx_records(), but that error is converted to -EBADMSG when
      calling the socket's error reporting handler.
      
      Instead of converting errors from tcp_sendmsg_locked(), let's pass them
      along in this path.  The RPC layer handles -EPIPE by reconnecting the
      transport, which prevents the endless attempts to transmit on a broken
      connection.
      
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Fixes: a42055e8 ("net/tls: Add support for async encryption of records for performance")
      Link: https://patch.msgid.link/9594185559881679d81f071b181a10eb07cd079f.1736004079.git.bcodding@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ice: fix incorrect PHY settings for 100 GB/s · 657a87c2
      Przemyslaw Korba authored
      
      [ Upstream commit 6c5b9891 ]
      
      The ptp4l application reports too high an offset when run on an E823
      device with a 100 GB/s link. The offset values never drop below 100 ns,
      as they do in a working case using a 100 GB/s cable.
      
      This is due to incorrect frequency settings on the PHY clocks for
      100 GB/s speed. Changes are introduced to align with the internal
      hardware documentation, and correctly initialize frequency in PHY
      clocks with the frequency values that are in our HW spec.
      
      To reproduce the issue run ptp4l as a Time Receiver on E823 device,
      and observe the offset, which will never approach values seen
      in the PTP working case.
      
      Reproduction output:
      ptp4l -i enp137s0f3 -m -2 -s -f /etc/ptp4l_8275.conf
      ptp4l[5278.775]: master offset      12470 s2 freq  +41288 path delay -3002
      ptp4l[5278.837]: master offset      10525 s2 freq  +39202 path delay -3002
      ptp4l[5278.900]: master offset     -24840 s2 freq  -20130 path delay -3002
      ptp4l[5278.963]: master offset      10597 s2 freq  +37908 path delay -3002
      ptp4l[5279.025]: master offset       8883 s2 freq  +36031 path delay -3002
      ptp4l[5279.088]: master offset       7267 s2 freq  +34151 path delay -3002
      ptp4l[5279.150]: master offset       5771 s2 freq  +32316 path delay -3002
      ptp4l[5279.213]: master offset       4388 s2 freq  +30526 path delay -3002
      ptp4l[5279.275]: master offset     -30434 s2 freq  -28485 path delay -3002
      ptp4l[5279.338]: master offset     -28041 s2 freq  -27412 path delay -3002
      ptp4l[5279.400]: master offset       7870 s2 freq  +31118 path delay -3002
      
      Fixes: 3a749623 ("ice: implement basic E822 PTP support")
      Reviewed-by: Milena Olech <milena.olech@intel.com>
      Signed-off-by: Przemyslaw Korba <przemyslaw.korba@intel.com>
      Tested-by: Rinitha S <sx.rinitha@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • cxgb4: Avoid removal of uninserted tid · 8a7b73f1
      Anumula Murali Mohan Reddy authored
      
      [ Upstream commit 4c122450 ]
      
      During ARP failure, tid is not inserted but _c4iw_free_ep()
      attempts to remove tid which results in error.
      This patch fixes the issue by avoiding removal of uninserted tid.
      
      Fixes: 59437d78 ("cxgb4/chtls: fix ULD connection failures due to wrong TID base")
      Signed-off-by: Anumula Murali Mohan Reddy <anumula@chelsio.com>
      Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
      Link: https://patch.msgid.link/20250103092327.1011925-1-anumula@chelsio.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bnxt_en: Fix possible memory leak when hwrm_req_replace fails · b9582838
      Kalesh AP authored
      
      [ Upstream commit c8dafb0e ]
      
      When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop()
      which could cause a memory leak.
      
      Fixes: bbf33d1d ("bnxt_en: update all firmware calls to use the new APIs")
      Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
      Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
      Signed-off-by: Michael Chan <michael.chan@broadcom.com>
      Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net_sched: cls_flow: validate TCA_FLOW_RSHIFT attribute · 2011749c
      Eric Dumazet authored
      
      [ Upstream commit a039e543 ]
      
      syzbot found that the TCA_FLOW_RSHIFT attribute was not validated.
      Right shifting a 32-bit integer is undefined for large shift values.
      
      UBSAN: shift-out-of-bounds in net/sched/cls_flow.c:329:23
      shift exponent 9445 is too large for 32-bit type 'u32' (aka 'unsigned int')
      CPU: 1 UID: 0 PID: 54 Comm: kworker/u8:3 Not tainted 6.13.0-rc3-syzkaller-00180-g4f619d518db9 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
      Workqueue: ipv6_addrconf addrconf_dad_work
      Call Trace:
       <TASK>
        __dump_stack lib/dump_stack.c:94 [inline]
        dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
        ubsan_epilogue lib/ubsan.c:231 [inline]
        __ubsan_handle_shift_out_of_bounds+0x3c8/0x420 lib/ubsan.c:468
        flow_classify+0x24d5/0x25b0 net/sched/cls_flow.c:329
        tc_classify include/net/tc_wrapper.h:197 [inline]
        __tcf_classify net/sched/cls_api.c:1771 [inline]
        tcf_classify+0x420/0x1160 net/sched/cls_api.c:1867
        sfb_classify net/sched/sch_sfb.c:260 [inline]
        sfb_enqueue+0x3ad/0x18b0 net/sched/sch_sfb.c:318
        dev_qdisc_enqueue+0x4b/0x290 net/core/dev.c:3793
        __dev_xmit_skb net/core/dev.c:3889 [inline]
        __dev_queue_xmit+0xf0e/0x3f50 net/core/dev.c:4400
        dev_queue_xmit include/linux/netdevice.h:3168 [inline]
        neigh_hh_output include/net/neighbour.h:523 [inline]
        neigh_output include/net/neighbour.h:537 [inline]
        ip_finish_output2+0xd41/0x1390 net/ipv4/ip_output.c:236
        iptunnel_xmit+0x55d/0x9b0 net/ipv4/ip_tunnel_core.c:82
        udp_tunnel_xmit_skb+0x262/0x3b0 net/ipv4/udp_tunnel_core.c:173
        geneve_xmit_skb drivers/net/geneve.c:916 [inline]
        geneve_xmit+0x21dc/0x2d00 drivers/net/geneve.c:1039
        __netdev_start_xmit include/linux/netdevice.h:5002 [inline]
        netdev_start_xmit include/linux/netdevice.h:5011 [inline]
        xmit_one net/core/dev.c:3590 [inline]
        dev_hard_start_xmit+0x27a/0x7d0 net/core/dev.c:3606
        __dev_queue_xmit+0x1b73/0x3f50 net/core/dev.c:4434
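      
      The kind of validation this calls for can be sketched in user-space
      C. The helper names below are hypothetical and this is not the
      cls_flow code; it only shows a shift count being rejected before it
      is applied to a 32-bit value, since shifting by 32 or more is
      undefined in C.

```c
#include <errno.h>
#include <stdint.h>

/* Reject shift counts that are undefined for a 32-bit operand. */
static int validate_rshift(uint32_t rshift)
{
	return rshift > 31 ? -EINVAL : 0;
}

/* Apply the shift only after the count has been validated. */
static int apply_rshift(uint32_t key, uint32_t rshift, uint32_t *out)
{
	int err = validate_rshift(rshift);

	if (err)
		return err;
	*out = key >> rshift;
	return 0;
}
```

      A shift exponent such as the 9445 seen in the report above would be
      refused with -EINVAL here instead of reaching the shift.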
      
      Fixes: e5dfb815 ("[NET_SCHED]: Add flow classifier")
      Reported-by: <syzbot+1dbb57d994e54aaa04d2@syzkaller.appspotmail.com>
      Closes: https://lore.kernel.org/netdev/6777bf49.050a0220.178762.0040.GAE@google.com/T/#u
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20250103104546.3714168-1-edumazet@google.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp/dccp: allow a connection when sk_max_ack_backlog is zero · 2d230410
      Zhongqiu Duan authored
      [ Upstream commit 3479c754 ]
      
      If the backlog of listen() is set to zero, sk_acceptq_is_full() allows
      one connection to be made, but inet_csk_reqsk_queue_is_full() does not.
      When net.ipv4.tcp_syncookies is zero, inet_csk_reqsk_queue_is_full()
      will cause an immediate drop before the sk_acceptq_is_full() check in
      tcp_conn_request(), so that no connection can be made.
      
      This patch tries to keep consistent with 64a14651 ("[NET]: Revert
      incorrect accept queue backlog changes.").
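      
      The inconsistency comes down to a strict versus non-strict
      comparison. The sketch below uses hypothetical stand-alone functions
      operating on plain integers, not the kernel helpers, and the "old"
      and "new" forms are an assumption about how the two checks differ,
      based on the behaviour described above.

```c
#include <stdbool.h>

/* Accept queue check: a backlog of zero still admits one connection. */
static bool acceptq_is_full(unsigned int ack_backlog, unsigned int max_backlog)
{
	return ack_backlog > max_backlog;
}

/* Assumed old request queue check: full as soon as qlen reaches the limit. */
static bool reqsk_queue_is_full_old(unsigned int qlen, unsigned int max_backlog)
{
	return qlen >= max_backlog;
}

/* Assumed new request queue check: consistent with acceptq_is_full(). */
static bool reqsk_queue_is_full_new(unsigned int qlen, unsigned int max_backlog)
{
	return qlen > max_backlog;
}
```

      With a backlog of zero and empty queues, the old form reports "full"
      immediately while the other two do not, which matches the drop
      described in the commit message.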
      
      Link: https://lore.kernel.org/netdev/20250102080258.53858-1-kuniyu@amazon.com/
      Fixes: ef547f2a ("tcp: remove max_qlen_log")
      Signed-off-by: Zhongqiu Duan <dzq.aishenghu0@gmail.com>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20250102171426.915276-1-dzq.aishenghu0@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp/dccp: complete lockless accesses to sk->sk_max_ack_backlog · c0b0d9ae
      Jason Xing authored
      
      [ Upstream commit 9a79c65f ]
      
      Since commit 099ecf59 ("net: annotate lockless accesses to
      sk->sk_max_ack_backlog") decided to handle the sk_max_ack_backlog
      locklessly, there is one more function mostly called in TCP/DCCP
      cases. So this patch completes it:)
      
      Signed-off-by: Jason Xing <kernelxing@tencent.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20240331090521.71965-1-kerneljasonxing@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Stable-dep-of: 3479c754 ("tcp/dccp: allow a connection when sk_max_ack_backlog is zero")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: 802: LLC+SNAP OID:PID lookup on start of skb data · 0a5026be
      Antonio Pastor authored
      
      [ Upstream commit 1e9b0e1c ]
      
      802.2+LLC+SNAP frames received by napi_complete_done() with GRO and DSA
      have skb->transport_header set two bytes short, or pointing 2 bytes
      before network_header & skb->data. This was an issue as snap_rcv()
      expected offset to point to SNAP header (OID:PID), causing packet to
      be dropped.
      
      A fix at llc_fixup_skb() (a024e377) resets transport_header for any
      LLC consumers that may care about it, and stops SNAP packets from being
      dropped, but doesn't fix the problem which is that LLC and SNAP should
      not use transport_header offset.
      
      This patch eliminates the use of transport_header offset for SNAP lookup
      of OID:PID so that SNAP does not rely on the offset at all.
      The offset is reset after pull for any SNAP packet consumers that may
      (but shouldn't) use it.
      
      Fixes: fda55eca ("net: introduce skb_transport_header_was_set()")
      Signed-off-by: Antonio Pastor <antonio.pastor@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20250103012303.746521-1-antonio.pastor@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ieee802154: ca8210: Add missing check for kfifo_alloc() in ca8210_probe() · 4589abf8
      Keisuke Nishimura authored
      
      [ Upstream commit 2c87309e ]
      
      ca8210_test_interface_init() returns the result of kfifo_alloc(),
      which can be non-zero in case of an error. The caller, ca8210_probe(),
      should check the return value and do error-handling if it fails.
      
      Fixes: ded845a7 ("ieee802154: Add CA8210 IEEE 802.15.4 device driver")
      Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
      Link: https://lore.kernel.org/20241029182712.318271-1-keisuke.nishimura@inria.fr
      Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: mediatek: disable buffer pre-allocation · f6dce4dc
      Chen-Yu Tsai authored
      
      [ Upstream commit 32c9c06a ]
      
      On Chromebooks based on Mediatek MT8195 or MT8188, the audio frontend
      (AFE) is limited to accessing a very small window (1 MiB) of memory,
      which is described as a reserved memory region in the device tree.
      
      On these two platforms, the maximum buffer size is given as 512 KiB.
      The MediaTek common code uses the same value for preallocations. This
      means that only the first two PCM substreams get preallocations, and
      then the whole space is exhausted, barring any other substreams from
      working. Since the substreams used are not always the first two, this
      means audio won't work correctly.
      
      This is observed on the MT8188 Geralt Chromebooks, on which the
      "mediatek,dai-link" property was dropped when it was upstreamed. That
      property causes the driver to only register the PCM substreams listed
      in the property, and in the order given.
      
      Instead of trying to compute an optimal value and figuring out which
      streams are used, simply disable preallocation. The PCM buffers are
      managed by the core and are allocated and released on the fly. There
      should be no impact to any of the other MediaTek platforms.
      
      Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
      Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
      Link: https://patch.msgid.link/20241219105303.548437-1-wenst@chromium.org
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • scripts/sorttable: fix orc_sort_cmp() to maintain symmetry and transitivity · 939d239f
      Kuan-Wei Chiu authored
      The orc_sort_cmp() function, used with qsort(), previously violated the
      symmetry and transitivity rules required by the C standard.  Specifically,
      when both entries are ORC_REG_UNDEFINED, it could result in both a < b
      and b < a, which breaks the required symmetry and transitivity.  This can
      lead to undefined behavior and incorrect sorting results, potentially
      causing memory corruption in glibc implementations [1].
      
      Symmetry: If x < y, then y > x.
      Transitivity: If x < y and y < z, then x < z.
      
      Fix the comparison logic to return 0 when both entries are
      ORC_REG_UNDEFINED, ensuring compliance with qsort() requirements.
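      
      A comparator that satisfies these rules can be sketched as below. It
      is a simplified stand-in, not orc_sort_cmp(): struct entry, its
      fields and entry_cmp() are hypothetical, and returning 0 whenever
      the two flags are equal goes slightly beyond the fix described
      above, which only covers the case where both entries are undefined.

```c
#include <stdbool.h>
#include <stdlib.h>

struct entry {
	unsigned long ip;	/* primary sort key */
	bool undefined;		/* stand-in for the ORC_REG_UNDEFINED case */
};

/*
 * qsort() comparator: order by ip, and on ties place "undefined"
 * entries first. Two entries with equal ip and equal flags compare
 * as equal, so cmp(a, b) and cmp(b, a) never both claim "less than".
 */
static int entry_cmp(const void *pa, const void *pb)
{
	const struct entry *a = pa;
	const struct entry *b = pb;

	if (a->ip > b->ip)
		return 1;
	if (a->ip < b->ip)
		return -1;
	if (a->undefined == b->undefined)
		return 0;
	return a->undefined ? -1 : 1;
}
```

      An array of struct entry can then be sorted with
      qsort(arr, n, sizeof(arr[0]), entry_cmp).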
      
      Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
      Link: https://lkml.kernel.org/r/20241226140332.2670689-1-visitorckw@gmail.com
      Fixes: 57fa1899 ("scripts/sorttable: Implement build-time ORC unwind table sorting")
      Fixes: fb799447 ("x86,objtool: Split UNWIND_HINT_EMPTY in two")
      Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
      Cc: Ching-Chun (Jim) Huang <jserv@ccns.ncku.edu.tw>
      Cc: <chuang@cs.nycu.edu.tw>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      (cherry picked from commit 0210d251)
      Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • exfat: fix the infinite loop in __exfat_free_cluster() · d23f2621
      Yuezhang Mo authored
      
      [ Upstream commit a5324b3a ]
      
      In __exfat_free_cluster(), the cluster chain is traversed until the
      EOF cluster. If the cluster chain includes a loop due to file system
      corruption, the EOF cluster cannot be traversed, resulting in an
      infinite loop.
      
      This commit uses the total number of clusters to prevent this infinite
      loop.
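      
      Bounding a chain walk by the total number of clusters can be
      sketched as follows. The flat fat[] array, the FAT_EOF marker and
      walk_chain() are hypothetical simplifications for illustration and
      do not mirror the exfat data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_EOF UINT32_MAX	/* hypothetical end-of-chain marker */

/*
 * Walk a cluster chain starting at 'start', where fat[i] holds the
 * cluster that follows cluster i. Returns the chain length, or -1 if
 * the chain leaves the table or visits more clusters than exist,
 * which can only happen when the chain contains a loop.
 */
static long walk_chain(const uint32_t *fat, size_t total, uint32_t start)
{
	size_t visited = 0;
	uint32_t clu = start;

	while (clu != FAT_EOF) {
		if (clu >= total || visited >= total)
			return -1;
		visited++;
		clu = fat[clu];
	}
	return (long)visited;
}
```

      A healthy chain can never be longer than the number of clusters, so
      hitting the bound is a reliable sign of corruption.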
      
      Reported-by: <syzbot+1de5a37cb85a2d536330@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=1de5a37cb85a2d536330
      Tested-by: <syzbot+1de5a37cb85a2d536330@syzkaller.appspotmail.com>
      Fixes: 31023864 ("exfat: add fat entry operations")
      Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
      Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • exfat: fix the infinite loop in exfat_readdir() · 31beabd0
      Yuezhang Mo authored
      
      [ Upstream commit fee87376 ]
      
      If the file system is corrupted so that a cluster is linked to
      itself in the cluster chain, and there is an unused directory
      entry in the cluster, 'dentry' will not be incremented, so the
      condition 'dentry < max_dentries' cannot prevent an infinite
      loop.
      
      This infinite loop causes s_lock not to be released, and other
      tasks will hang, such as exfat_sync_fs().
      
      This commit stops traversing the cluster chain when there is unused
      directory entry in the cluster to avoid this infinite loop.
      
      Reported-by: <syzbot+205c2644abdff9d3f9fc@syzkaller.appspotmail.com>
      Closes: https://syzkaller.appspot.com/bug?extid=205c2644abdff9d3f9fc

      Tested-by: <syzbot+205c2644abdff9d3f9fc@syzkaller.appspotmail.com>
      Fixes: ca061973 ("exfat: add directory operations")
      Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com>
      Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
      Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      31beabd0
    • Ming-Hung Tsai's avatar
      dm array: fix cursor index when skipping across block boundaries · 43c38c3b
      Ming-Hung Tsai authored
      
      [ Upstream commit 0bb1968d ]
      
      dm_array_cursor_skip() seeks to the target position by loading array
      blocks iteratively until the specified number of entries to skip is
      reached. When seeking across block boundaries, it uses
      dm_array_cursor_next() to step into the next block.
      dm_array_cursor_skip() must first move the cursor index to the end
      of the current block; otherwise, the cursor position could incorrectly
      remain in the same block, causing the actual number of skipped entries
      to be much smaller than expected.
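
The skip logic described above can be modelled in user space roughly as follows. This is a sketch under simplifying assumptions, not the dm-array code: every block is assumed to hold the same number of entries, and `struct cursor`, `cursor_next()` and `cursor_skip()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flat model: nr_blocks blocks of per_block entries each. */
struct cursor {
    uint32_t nr_blocks;
    uint32_t per_block;
    uint32_t block;     /* current block number */
    uint32_t index;     /* entry index inside the current block */
};

/* Advance by one entry, stepping into the next block at a boundary. */
int cursor_next(struct cursor *c)
{
    c->index++;
    if (c->index >= c->per_block) {
        if (c->block + 1 >= c->nr_blocks)
            return -1;              /* ran off the end of the array */
        c->block++;
        c->index = 0;
    }
    return 0;
}

/* Skip 'count' entries, possibly crossing several block boundaries. */
int cursor_skip(struct cursor *c, uint32_t count)
{
    int r;

    assert(c->index < c->per_block);
    do {
        uint32_t remaining = c->per_block - c->index;

        if (count < remaining) {
            c->index += count;      /* target lies in this block */
            return 0;
        }
        count -= remaining;
        /* Move to the last entry first, so that the single step taken by
         * cursor_next() really crosses into the next block. */
        c->index += remaining - 1;
        r = cursor_next(c);
    } while (!r);

    return r;
}
```

Without the line that moves `index` to the end of the block, `cursor_next()` would advance by one entry inside the same block while `count` had already been reduced by `remaining`, which is the mismatch the commit message describes.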
      
      This bug affects cache resizing in v2 metadata and could lead to data
      loss if the fast device is shrunk during the first-time resume. For
      example:
      
      1. create cache metadata consisting of 32768 blocks, with a dirty block
         assigned to the second bitmap block. cache_restore v1.0 is required.
      
      cat <<EOF >> cmeta.xml
      <superblock uuid="" block_size="64" nr_cache_blocks="32768" \
      policy="smq" hint_width="4">
        <mappings>
          <mapping cache_block="32767" origin_block="0" dirty="true"/>
        </mappings>
      </superblock>
      EOF
      dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
      cache_restore -i cmeta.xml -o /dev/mapper/cmeta --metadata-version=2
      
      2. bring up the cache while attempting to discard all the blocks belonging
         to the second bitmap block (block# 32576 to 32767). The last command
         is expected to fail, but it actually succeeds.
      
      dmsetup create cdata --table "0 2084864 linear /dev/sdc 8192"
      dmsetup create corig --table "0 65536 linear /dev/sdc 2105344"
      dmsetup create cache --table "0 65536 cache /dev/mapper/cmeta \
      /dev/mapper/cdata /dev/mapper/corig 64 2 metadata2 writeback smq \
      2 migration_threshold 0"
      
      In addition to the reproducer described above, this fix can be
      verified using the "array_cursor/skip" tests in dm-unit:
        dm-unit run /pdata/array_cursor/skip/ --kernel-dir <KERNEL_DIR>
      
      Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com>
      Fixes: 9b696229 ("dm persistent data: add cursor skip functions to the cursor APIs")
      Reviewed-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      43c38c3b
    • Ming-Hung Tsai's avatar
      dm array: fix unreleased btree blocks on closing a faulty array cursor · 956a74b2
      Ming-Hung Tsai authored
      
      [ Upstream commit 626f128e ]
      
      The cached block pointer in dm_array_cursor might be NULL if it reaches
      an unreadable array block, or the array is empty. Therefore,
      dm_array_cursor_end() should call dm_btree_cursor_end() unconditionally,
      to prevent leaving unreleased btree blocks.
      
      This fix can be verified using the "array_cursor/iterate/empty" test
      in dm-unit:
        dm-unit run /pdata/array_cursor/iterate/empty --kernel-dir <KERNEL_DIR>
      
      Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com>
      Fixes: fdd1315a ("dm array: introduce cursor api")
      Reviewed-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      956a74b2
    • Ming-Hung Tsai's avatar
      dm array: fix releasing a faulty array block twice in dm_array_cursor_end · e477021d
      Ming-Hung Tsai authored
      
      [ Upstream commit f2893c08 ]
      
      When dm_bm_read_lock() fails due to locking or checksum errors, it
      releases the faulty block implicitly while leaving an invalid output
      pointer behind. The caller of dm_bm_read_lock() should not operate on
      this invalid dm_block pointer, or the result is undefined.
      For example, the dm_array_cursor incorrectly caches the invalid pointer
      on reading a faulty array block, causing a double release in
      dm_array_cursor_end(), then hitting the BUG_ON in dm-bufio cache_put().
      
      Reproduce steps:
      
      1. initialize a cache device
      
      dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
      dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
      dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
      dd if=/dev/zero of=/dev/mapper/cmeta bs=4k count=1
      dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \
      /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
      
      2. wipe the second array block offline
      
      dmsetup remove cache cmeta cdata corig
      mapping_root=$(dd if=/dev/sdc bs=1c count=8 skip=192 \
      2>/dev/null | hexdump -e '1/8 "%u\n"')
      ablock=$(dd if=/dev/sdc bs=1c count=8 skip=$((4096*mapping_root+2056)) \
      2>/dev/null | hexdump -e '1/8 "%u\n"')
      dd if=/dev/zero of=/dev/sdc bs=4k count=1 seek=$ablock
      
      3. try to reopen the cache device
      
      dmsetup create cmeta --table "0 8192 linear /dev/sdc 0"
      dmsetup create cdata --table "0 65536 linear /dev/sdc 8192"
      dmsetup create corig --table "0 524288 linear /dev/sdc 262144"
      dmsetup create cache --table "0 524288 cache /dev/mapper/cmeta \
      /dev/mapper/cdata /dev/mapper/corig 128 2 metadata2 writethrough smq 0"
      
      Kernel logs:
      
      (snip)
      device-mapper: array: array_block_check failed: blocknr 0 != wanted 10
      device-mapper: block manager: array validator check failed for block 10
      device-mapper: array: get_ablock failed
      device-mapper: cache metadata: dm_array_cursor_next for mapping failed
      ------------[ cut here ]------------
      kernel BUG at drivers/md/dm-bufio.c:638!
      
      Fix by setting the cached block pointer to NULL on errors.
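
The pattern behind this fix can be sketched in user space as follows. The sketch is illustrative only: `read_lock()` is a hypothetical stand-in for a lock call that writes its output pointer even when it fails, and `struct block`, `struct cursor`, `cursor_load()` and `cursor_end()` are assumed names, not the dm-array API.

```c
#include <assert.h>
#include <stddef.h>

struct block  { int refcount; };
struct cursor { struct block *block; };   /* cached block, NULL if none */

/* Hypothetical lock call: on failure no reference is held, yet *out has
 * already been written and must not be used by the caller. */
static int read_lock(struct block *b, int fail, struct block **out)
{
    *out = b;
    if (fail)
        return -1;
    b->refcount++;
    return 0;
}

/* Load a block into the cursor, never caching a pointer from a failed lock. */
int cursor_load(struct cursor *c, struct block *b, int fail)
{
    int r;

    assert(c != NULL && b != NULL);
    r = read_lock(b, fail, &c->block);
    if (r)
        c->block = NULL;    /* forget the invalid pointer */
    return r;
}

/* Release the cached block, if any. */
void cursor_end(struct cursor *c)
{
    if (c->block) {
        c->block->refcount--;
        c->block = NULL;
    }
}
```

If `cursor_load()` kept the stale pointer after a failure, `cursor_end()` would drop a reference that was never taken, which corresponds to the double release described above.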
      
      In addition to the reproducer described above, this fix can be
      verified using the "array_cursor/damaged" test in dm-unit:
        dm-unit run /pdata/array_cursor/damaged --kernel-dir <KERNEL_DIR>
      
      Signed-off-by: Ming-Hung Tsai <mtsai@redhat.com>
      Fixes: fdd1315a ("dm array: introduce cursor api")
      Reviewed-by: Joe Thornber <thornber@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      e477021d
    • Zhang Yi's avatar
      jbd2: flush filesystem device before updating tail sequence · 5af095cb
      Zhang Yi authored
      
      [ Upstream commit a0851ea9 ]
      
      When committing a transaction in jbd2_journal_commit_transaction(), the
      disk caches for the filesystem device should be flushed before updating
      the journal tail sequence. However, this step is missed if the journal
      is not located on the filesystem device. As a result, the filesystem may
      become inconsistent following a power failure or system crash. Fix it by
      ensuring that the filesystem device is flushed appropriately.
      
      Fixes: 3339578f ("jbd2: cleanup journal tail after transaction commit")
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Link: https://lore.kernel.org/r/20241203014407.805916-3-yi.zhang@huaweicloud.com

      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      5af095cb
    • Zhang Yi's avatar
      jbd2: increase IO priority for writing revoke records · 62834f5b
      Zhang Yi authored
      
      [ Upstream commit ac1e21bd ]
      
      Commit '6a3afb6a ("jbd2: increase the journal IO's priority")'
      increases the priority of journal I/O by marking I/O with the
      JBD2_JOURNAL_REQ_FLAGS. However, that commit missed the revoke buffers,
      so this commit also covers that kind of I/O.
      
      Fixes: 6a3afb6a ("jbd2: increase the journal IO's priority")
      Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
      Link: https://lore.kernel.org/r/20241203014407.805916-2-yi.zhang@huaweicloud.com

      Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Christian Brauner <brauner@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      62834f5b
    • Qun-Wei Lin's avatar
      sched/task_stack: fix object_is_on_stack() for KASAN tagged pointers · 397383db
      Qun-Wei Lin authored
      commit fd7b4f9f upstream.
      
      When CONFIG_KASAN_SW_TAGS and CONFIG_KASAN_STACK are enabled, the
      object_is_on_stack() function may produce incorrect results due to the
      presence of tags in the obj pointer, while the stack pointer does not have
      tags.  This discrepancy can lead to incorrect stack object detection and
      subsequently trigger warnings if CONFIG_DEBUG_OBJECTS is also enabled.
      
      Example of the warning:
      
      ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
      ------------[ cut here ]------------
      WARNING: CPU: 0 PID: 1 at lib/debugobjects.c:557 __debug_object_init+0x330/0x364
      Modules linked in:
      CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5 #4
      Hardware name: linux,dummy-virt (DT)
      pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
      pc : __debug_object_init+0x330/0x364
      lr : __debug_object_init+0x330/0x364
      sp : ffff800082ea7b40
      x29: ffff800082ea7b40 x28: 98ff0000c0164518 x27: 98ff0000c0164534
      x26: ffff800082d93ec8 x25: 0000000000000001 x24: 1cff0000c00172a0
      x23: 0000000000000000 x22: ffff800082d93ed0 x21: ffff800081a24418
      x20: 3eff800082ea7bb0 x19: efff800000000000 x18: 0000000000000000
      x17: 00000000000000ff x16: 0000000000000047 x15: 206b63617473206e
      x14: 0000000000000018 x13: ffff800082ea7780 x12: 0ffff800082ea78e
      x11: 0ffff800082ea790 x10: 0ffff800082ea79d x9 : 34d77febe173e800
      x8 : 34d77febe173e800 x7 : 0000000000000001 x6 : 0000000000000001
      x5 : feff800082ea74b8 x4 : ffff800082870a90 x3 : ffff80008018d3c4
      x2 : 0000000000000001 x1 : ffff800082858810 x0 : 0000000000000050
      Call trace:
       __debug_object_init+0x330/0x364
       debug_object_init_on_stack+0x30/0x3c
       schedule_hrtimeout_range_clock+0xac/0x26c
       schedule_hrtimeout+0x1c/0x30
       wait_task_inactive+0x1d4/0x25c
       kthread_bind_mask+0x28/0x98
       init_rescuer+0x1e8/0x280
       workqueue_init+0x1a0/0x3cc
       kernel_init_freeable+0x118/0x200
       kernel_init+0x28/0x1f0
       ret_from_fork+0x10/0x20
      ---[ end trace 0000000000000000 ]---
      ODEBUG: object 3eff800082ea7bb0 is NOT on stack ffff800082ea0000, but annotated.
      ------------[ cut here ]------------
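
In the warning above, the object address 3eff800082ea7bb0 differs from the stack base ffff800082ea0000 in its top byte, so a plain range comparison fails. A user-space sketch of the effect, modelling pointers as 64-bit integers, is shown below. It assumes the tag occupies the top byte, that 0xff is the "untagged" value, and that the stack is 32 KiB; `reset_tag()`, `on_stack_naive()` and `on_stack_untagged()` are hypothetical names for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_SHIFT   56
#define TAG_KERNEL  0xffULL            /* assumed "no tag" value */
#define STACK_SIZE  (32 * 1024ULL)     /* assumed stack size, illustration only */

/* Overwrite the top byte with the untagged value. */
static uint64_t reset_tag(uint64_t addr)
{
    return addr | (TAG_KERNEL << TAG_SHIFT);
}

/* Range check on the raw value: gives wrong answers for tagged addresses. */
bool on_stack_naive(uint64_t obj, uint64_t stack)
{
    assert(stack + STACK_SIZE > stack);   /* no wrap-around in this model */
    return obj >= stack && obj < stack + STACK_SIZE;
}

/* Range check after stripping the tag from the object address. */
bool on_stack_untagged(uint64_t obj, uint64_t stack)
{
    return on_stack_naive(reset_tag(obj), stack);
}
```

With the addresses from the warning, the naive check reports the object as not on the stack, while the check on the untagged address places it 0x7bb0 bytes above the stack base.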
      
      Link: https://lkml.kernel.org/r/20241113042544.19095-1-qun-wei.lin@mediatek.com
      
      
      Signed-off-by: Qun-Wei Lin <qun-wei.lin@mediatek.com>
      Cc: Andrew Yang <andrew.yang@mediatek.com>
      Cc: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
      Cc: Casper Li <casper.li@mediatek.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: Kent Overstreet <kent.overstreet@linux.dev>
      Cc: Matthias Brugger <matthias.bgg@gmail.com>
      Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Shakeel Butt <shakeel.butt@linux.dev>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Alva Lan <alvalan9@foxmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      397383db
    • Michal Luczaj's avatar
      bpf, sockmap: Fix race between element replace and close() · b79a0d1e
      Michal Luczaj authored
      
      commit ed1fc5d7 upstream.
      
      Element replace (with a socket different from the one stored) may race
      with socket's close() link popping & unlinking. __sock_map_delete()
      unconditionally unrefs the (wrong) element:
      
      // set map[0] = s0
      map_update_elem(map, 0, s0)
      
      // drop fd of s0
      close(s0)
        sock_map_close()
          lock_sock(sk)               (s0!)
          sock_map_remove_links(sk)
            link = sk_psock_link_pop()
            sock_map_unlink(sk, link)
              sock_map_delete_from_link
                                              // replace map[0] with s1
                                              map_update_elem(map, 0, s1)
                                                sock_map_update_elem
                                      (s1!)       lock_sock(sk)
                                                  sock_map_update_common
                                                    psock = sk_psock(sk)
                                                    spin_lock(&stab->lock)
                                                    osk = stab->sks[idx]
                                                    sock_map_add_link(..., &stab->sks[idx])
                                                    sock_map_unref(osk, &stab->sks[idx])
                                                      psock = sk_psock(osk)
                                                      sk_psock_put(sk, psock)
                                                        if (refcount_dec_and_test(&psock))
                                                          sk_psock_drop(sk, psock)
                                                    spin_unlock(&stab->lock)
                                                  unlock_sock(sk)
                __sock_map_delete
                  spin_lock(&stab->lock)
                  sk = *psk                        // s1 replaced s0; sk == s1
                  if (!sk_test || sk_test == sk)   // sk_test (s0) != sk (s1); no branch
                    sk = xchg(psk, NULL)
                  if (sk)
                    sock_map_unref(sk, psk)        // unref s1; sks[idx] will dangle
                      psock = sk_psock(sk)
                      sk_psock_put(sk, psock)
                        if (refcount_dec_and_test())
                          sk_psock_drop(sk, psock)
                  spin_unlock(&stab->lock)
          release_sock(sk)
      
      Then close(map) enqueues bpf_map_free_deferred, which finally calls
      sock_map_free(). This results in some refcount_t warnings along with
      a KASAN splat [1].
      
      Fix __sock_map_delete(), do not allow sock_map_unref() on elements that
      may have been replaced.
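
The idea of the fix can be sketched as a conditional delete: the slot is cleared and the reference dropped only if the slot still holds the element the caller expects. The code below is a single-threaded user-space model with the locking left out; `struct sock` here is a toy type, and `slot_delete()` is a hypothetical name rather than the sockmap function.

```c
#include <assert.h>
#include <stddef.h>

struct sock { int refs; };   /* toy stand-in, not the kernel structure */

/*
 * Delete the element stored in *slot.
 * If 'expected' is non-NULL, delete only when the slot still holds it.
 * Returns 0 when an element was removed and unreferenced, -1 otherwise.
 */
int slot_delete(struct sock **slot, struct sock *expected)
{
    struct sock *sk = NULL;   /* stays NULL unless we really take the slot */

    assert(slot != NULL);
    if (!expected || expected == *slot) {
        sk = *slot;
        *slot = NULL;
    }
    if (!sk)
        return -1;            /* empty slot, or element was replaced */

    sk->refs--;               /* drop the reference the slot was holding */
    return 0;
}
```

In the race above, the slot already holds s1 when the delete for s0 arrives. With this shape of check the delete is a no-op, so the reference held on s1 is left alone and the slot does not dangle.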
      
      [1]:
      BUG: KASAN: slab-use-after-free in sock_map_free+0x10e/0x330
      Write of size 4 at addr ffff88811f5b9100 by task kworker/u64:12/1063
      
      CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Not tainted 6.12.0+ #125
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
      Workqueue: events_unbound bpf_map_free_deferred
      Call Trace:
       <TASK>
       dump_stack_lvl+0x68/0x90
       print_report+0x174/0x4f6
       kasan_report+0xb9/0x190
       kasan_check_range+0x10f/0x1e0
       sock_map_free+0x10e/0x330
       bpf_map_free_deferred+0x173/0x320
       process_one_work+0x846/0x1420
       worker_thread+0x5b3/0xf80
       kthread+0x29e/0x360
       ret_from_fork+0x2d/0x70
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      Allocated by task 1202:
       kasan_save_stack+0x1e/0x40
       kasan_save_track+0x10/0x30
       __kasan_slab_alloc+0x85/0x90
       kmem_cache_alloc_noprof+0x131/0x450
       sk_prot_alloc+0x5b/0x220
       sk_alloc+0x2c/0x870
       unix_create1+0x88/0x8a0
       unix_create+0xc5/0x180
       __sock_create+0x241/0x650
       __sys_socketpair+0x1ce/0x420
       __x64_sys_socketpair+0x92/0x100
       do_syscall_64+0x93/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Freed by task 46:
       kasan_save_stack+0x1e/0x40
       kasan_save_track+0x10/0x30
       kasan_save_free_info+0x37/0x60
       __kasan_slab_free+0x4b/0x70
       kmem_cache_free+0x1a1/0x590
       __sk_destruct+0x388/0x5a0
       sk_psock_destroy+0x73e/0xa50
       process_one_work+0x846/0x1420
       worker_thread+0x5b3/0xf80
       kthread+0x29e/0x360
       ret_from_fork+0x2d/0x70
       ret_from_fork_asm+0x1a/0x30
      
      The buggy address belongs to the object at ffff88811f5b9080
       which belongs to the cache UNIX-STREAM of size 1984
      The buggy address is located 128 bytes inside of
       freed 1984-byte region [ffff88811f5b9080, ffff88811f5b9840)
      
      The buggy address belongs to the physical page:
      page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x11f5b8
      head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
      memcg:ffff888127d49401
      flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
      page_type: f5(slab)
      raw: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
      raw: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
      head: 0017ffffc0000040 ffff8881042e4500 dead000000000122 0000000000000000
      head: 0000000000000000 00000000800f000f 00000001f5000000 ffff888127d49401
      head: 0017ffffc0000003 ffffea00047d6e01 ffffffffffffffff 0000000000000000
      head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88811f5b9000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff88811f5b9080: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                         ^
       ffff88811f5b9180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88811f5b9200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      Disabling lock debugging due to kernel taint
      
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 14 PID: 1063 at lib/refcount.c:25 refcount_warn_saturate+0xce/0x150
      CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G    B              6.12.0+ #125
      Tainted: [B]=BAD_PAGE
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
      Workqueue: events_unbound bpf_map_free_deferred
      RIP: 0010:refcount_warn_saturate+0xce/0x150
      Code: 34 73 eb 03 01 e8 82 53 ad fe 0f 0b eb b1 80 3d 27 73 eb 03 00 75 a8 48 c7 c7 80 bd 95 84 c6 05 17 73 eb 03 01 e8 62 53 ad fe <0f> 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05
      RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
      RBP: 0000000000000002 R08: 0000000000000001 R09: ffffed10bcde6349
      R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
      R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
      FS:  0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? __warn.cold+0x5f/0x1ff
       ? refcount_warn_saturate+0xce/0x150
       ? report_bug+0x1ec/0x390
       ? handle_bug+0x58/0x90
       ? exc_invalid_op+0x13/0x40
       ? asm_exc_invalid_op+0x16/0x20
       ? refcount_warn_saturate+0xce/0x150
       sock_map_free+0x2e5/0x330
       bpf_map_free_deferred+0x173/0x320
       process_one_work+0x846/0x1420
       worker_thread+0x5b3/0xf80
       kthread+0x29e/0x360
       ret_from_fork+0x2d/0x70
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      irq event stamp: 10741
      hardirqs last  enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
      hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
      softirqs last  enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
      softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
      
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 14 PID: 1063 at lib/refcount.c:28 refcount_warn_saturate+0xee/0x150
      CPU: 14 UID: 0 PID: 1063 Comm: kworker/u64:12 Tainted: G    B   W          6.12.0+ #125
      Tainted: [B]=BAD_PAGE, [W]=WARN
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014
      Workqueue: events_unbound bpf_map_free_deferred
      RIP: 0010:refcount_warn_saturate+0xee/0x150
      Code: 17 73 eb 03 01 e8 62 53 ad fe 0f 0b eb 91 80 3d 06 73 eb 03 00 75 88 48 c7 c7 e0 bd 95 84 c6 05 f6 72 eb 03 01 e8 42 53 ad fe <0f> 0b e9 6e ff ff ff 80 3d e6 72 eb 03 00 0f 85 61 ff ff ff 48 c7
      RSP: 0018:ffff88815c49fc70 EFLAGS: 00010282
      RAX: 0000000000000000 RBX: ffff88811f5b9100 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
      RBP: 0000000000000003 R08: 0000000000000001 R09: ffffed10bcde6349
      R10: ffff8885e6f31a4b R11: 0000000000000000 R12: ffff88813be0b000
      R13: ffff88811f5b9100 R14: ffff88811f5b9080 R15: ffff88813be0b024
      FS:  0000000000000000(0000) GS:ffff8885e6f00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000055dda99b0250 CR3: 000000015dbac000 CR4: 0000000000752ef0
      PKRU: 55555554
      Call Trace:
       <TASK>
       ? __warn.cold+0x5f/0x1ff
       ? refcount_warn_saturate+0xee/0x150
       ? report_bug+0x1ec/0x390
       ? handle_bug+0x58/0x90
       ? exc_invalid_op+0x13/0x40
       ? asm_exc_invalid_op+0x16/0x20
       ? refcount_warn_saturate+0xee/0x150
       sock_map_free+0x2d3/0x330
       bpf_map_free_deferred+0x173/0x320
       process_one_work+0x846/0x1420
       worker_thread+0x5b3/0xf80
       kthread+0x29e/0x360
       ret_from_fork+0x2d/0x70
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      irq event stamp: 10741
      hardirqs last  enabled at (10741): [<ffffffff84400ec6>] asm_sysvec_apic_timer_interrupt+0x16/0x20
      hardirqs last disabled at (10740): [<ffffffff811e532d>] handle_softirqs+0x60d/0x770
      softirqs last  enabled at (10506): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
      softirqs last disabled at (10301): [<ffffffff811e55a9>] __irq_exit_rcu+0x109/0x210
      
      Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface")
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20241202-sockmap-replace-v1-3-1e88579e7bd5@rbox.co

      Signed-off-by: Alva Lan <alvalan9@foxmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b79a0d1e
    • Max Kellermann's avatar
      ceph: give up on paths longer than PATH_MAX · e4b168c6
      Max Kellermann authored
      
      commit 550f7ca9 upstream.
      
      If the full path to be built by ceph_mdsc_build_path() happens to be
      longer than PATH_MAX, then this function will enter an endless (retry)
      loop, effectively blocking the whole task.  Most of the machine
      becomes unusable, making this a very simple and effective DoS
      vulnerability.
      
      I cannot imagine why this retry was ever implemented, but it seems
      rather useless and harmful to me.  Let's remove it and fail with
      ENAMETOOLONG instead.
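
A fail-fast path builder of the kind described above might look like the following user-space sketch. It is not the ceph implementation: `build_path()` and its component-array interface are assumptions made for illustration. The only point it demonstrates is returning -ENAMETOOLONG as soon as the result cannot fit, instead of retrying.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Join path components with '/' into buf (bufsize bytes including the NUL).
 * Returns the resulting length, or -ENAMETOOLONG if the path does not fit.
 * There is no retry: a path that is too long is simply rejected.
 */
int build_path(const char *const *parts, size_t nparts,
               char *buf, size_t bufsize)
{
    size_t pos = 0;

    assert(buf != NULL);
    if (bufsize == 0)
        return -ENAMETOOLONG;

    for (size_t i = 0; i < nparts; i++) {
        size_t len = strlen(parts[i]);

        /* Need room for '/', the component, and the trailing NUL. */
        if (len + 2 > bufsize - pos)
            return -ENAMETOOLONG;

        buf[pos++] = '/';
        memcpy(buf + pos, parts[i], len);
        pos += len;
    }
    buf[pos] = '\0';
    return (int)pos;
}
```

A caller would pass a buffer of PATH_MAX bytes and propagate the error to user space, so an over-long path produces a failed operation rather than a task stuck in a loop.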
      
      Cc: stable@vger.kernel.org
      Reported-by: Dario Weißer <dario@cure53.de>
      Signed-off-by: Max Kellermann <max.kellermann@ionos.com>
      Reviewed-by: Alex Markuze <amarkuze@redhat.com>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      [idryomov@gmail.com: backport to 6.1: pr_warn() is still in use]
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4b168c6
  2. Jan 09, 2025