  1. Feb 16, 2024
  2. Jan 25, 2024
    • Revert "Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"" · 87165c64
      Song Liu authored
      
      This reverts commit bed9e27baf52a09b7ba2a3714f1e24e17ced386d.
      
      The original set [1][2] was expected to undo a suboptimal fix in [2], and
      replace it with a better fix [1]. However, as reported by Dan Moulding, [2]
      causes an issue with raid5 with journal device.
      
      Revert [2] for now to close the issue. We will follow up on another issue
      reported by Junxiao Bi, as [2] is expected to fix it. We believe this is a
      good trade-off, because the latter issue happens less frequently.
      
      In the meantime, we will NOT revert [1], as it contains the right logic.
      
      [1] commit d6e035aad6c0 ("md: bypass block throttle for superblock update")
      [2] commit bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
      
      Reported-by: Dan Moulding <dan@danm.net>
      Closes: https://lore.kernel.org/linux-raid/20240123005700.9302-1-dan@danm.net/
      Fixes: bed9e27baf52 ("Revert "md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d"")
      Cc: stable@vger.kernel.org # v5.19+
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Song Liu <song@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • arm64: dts: armada-3720-turris-mox: set irq type for RTC · cba7f07e
      Sjoerd Simons authored
      
      commit fca8a117c1c9a0f8b8feed117db34cf58134dc2c upstream.
      
      The rtc on the mox shares its interrupt line with the moxtet bus. Set
      the interrupt type to be consistent between both devices. This ensures
      correct setup of the interrupt line regardless of probing order.
      
      Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
      Cc: <stable@vger.kernel.org> # v6.2+
      Fixes: 21aad8ba ("arm64: dts: armada-3720-turris-mox: Add missing interrupt for RTC")
      Reviewed-by: Marek Behún <kabel@kernel.org>
      Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Revert "KEYS: encrypted: Add check for strsep" · b5beb861
      Mimi Zohar authored
      
      commit 1ed4b563100230ea68821a2b25a3d9f25388a3e6 upstream.
      
      This reverts commit b4af096b5df5dd131ab796c79cedc7069d8f4882.
      
      New encrypted keys are created either from kernel-generated random
      numbers or user-provided decrypted data.  Revert the change requiring
      user-provided decrypted data.
      
      Reported-by: Vishal Verma <vishal.l.verma@intel.com>
      Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • i2c: s3c24xx: fix transferring more than one message in polling mode · 63dee5ad
      Marek Szyprowski authored
      
      [ Upstream commit 990489e1042c6c5d6bccf56deca68f8dbeed8180 ]
      
      To properly handle ACK on the bus when transferring more than one
      message in polling mode, move the polling handling loop from
      s3c24xx_i2c_message_start() to s3c24xx_i2c_doxfer(). This way
      i2c_s3c_irq_nextbyte() is always executed till the end, properly
      acknowledging the IRQ bits and no recursive calls to
      i2c_s3c_irq_nextbyte() are made.
      
      While touching this, also fix finishing transfers in polling mode by
      using common code path and always waiting for the bus to become idle
      and disabled.
      
      Fixes: 117053f7 ("i2c: s3c2410: Add polling mode support")
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • i2c: s3c24xx: fix read transfers in polling mode · a4f8ee0f
      Marek Szyprowski authored
      
      [ Upstream commit 0d9cf23ed55d7ba3ab26d617a3ae507863674c8f ]
      
      To properly handle read transfers in polling mode, no waiting for the ACK
      state is needed as it will never come. Just wait a bit to ensure start
      state is on the bus and continue processing next bytes.
      
      Fixes: 117053f7 ("i2c: s3c2410: Add polling mode support")
      Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Chanho Park <chanho61.park@samsung.com>
      Reviewed-by: Andi Shyti <andi.shyti@kernel.org>
      Signed-off-by: Wolfram Sang <wsa@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipv6: mcast: fix data-race in ipv6_mc_down / mld_ifc_work · 3bb58496
      Nikita Zhandarovich authored
      
      [ Upstream commit 2e7ef287f07c74985f1bf2858bedc62bd9ebf155 ]
      
      idev->mc_ifc_count can be written over without proper locking.
      
      Originally found by syzbot [1], fix this issue by encapsulating calls
      to mld_ifc_stop_work() (and mld_gq_stop_work() for good measure) with
      mutex_lock() and mutex_unlock() accordingly as these functions
      should only be called with mc_lock per their declarations.
      
      [1]
      BUG: KCSAN: data-race in ipv6_mc_down / mld_ifc_work
      
      write to 0xffff88813a80c832 of 1 bytes by task 3771 on cpu 0:
       mld_ifc_stop_work net/ipv6/mcast.c:1080 [inline]
       ipv6_mc_down+0x10a/0x280 net/ipv6/mcast.c:2725
       addrconf_ifdown+0xe32/0xf10 net/ipv6/addrconf.c:3949
       addrconf_notify+0x310/0x980
       notifier_call_chain kernel/notifier.c:93 [inline]
       raw_notifier_call_chain+0x6b/0x1c0 kernel/notifier.c:461
       __dev_notify_flags+0x205/0x3d0
       dev_change_flags+0xab/0xd0 net/core/dev.c:8685
       do_setlink+0x9f6/0x2430 net/core/rtnetlink.c:2916
       rtnl_group_changelink net/core/rtnetlink.c:3458 [inline]
       __rtnl_newlink net/core/rtnetlink.c:3717 [inline]
       rtnl_newlink+0xbb3/0x1670 net/core/rtnetlink.c:3754
       rtnetlink_rcv_msg+0x807/0x8c0 net/core/rtnetlink.c:6558
       netlink_rcv_skb+0x126/0x220 net/netlink/af_netlink.c:2545
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:6576
       netlink_unicast_kernel net/netlink/af_netlink.c:1342 [inline]
       netlink_unicast+0x589/0x650 net/netlink/af_netlink.c:1368
       netlink_sendmsg+0x66e/0x770 net/netlink/af_netlink.c:1910
       ...
      
      write to 0xffff88813a80c832 of 1 bytes by task 22 on cpu 1:
       mld_ifc_work+0x54c/0x7b0 net/ipv6/mcast.c:2653
       process_one_work kernel/workqueue.c:2627 [inline]
       process_scheduled_works+0x5b8/0xa30 kernel/workqueue.c:2700
       worker_thread+0x525/0x730 kernel/workqueue.c:2781
       ...
      
      Fixes: 2d9a93b4 ("mld: convert from timer to delayed work")
      Reported-by: <syzbot+a9400cabb1d784e49abf@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/all/000000000000994e09060ebcdffb@google.com/
      Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
      Acked-by: Taehee Yoo <ap420073@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Link: https://lore.kernel.org/r/20240117172102.12001-1-n.zhandarovich@fintech.ru
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes · ff21a0e2
      Amit Cohen authored
      
      [ Upstream commit b34f4de6d30cbaa8fed905a5080b6eace8c84dc7 ]
      
      The 'qos_pfc' test checks PFC behavior. The idea is to limit the traffic
      using a shaper somewhere in the flow of the packets. In this area, the
      buffer is smaller than the buffer at the beginning of the flow, so it fills
      up until there is no more space left. The test configures PFC there,
      which is supposed to notice that the headroom is filling up and send PFC
      Xoff to tell the transmitter to stop sending traffic for the priorities
      sharing this PG.
      
      The Xon/Xoff threshold is auto-configured and always equal to
      2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet,
      traffic will keep arriving until the transmitter receives and processes
      the PFC packet. This amount of traffic is known as the PFC delay allowance.
      
      Currently the buffer for the delay traffic is configured as 100KB. The
      MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB.
      This allows 80KB extra to be stored in this buffer.
      
      8-lane ports use two buffers among which the configured buffer is split,
      the Xoff threshold then applies to each buffer in parallel.
      
      The test does not take into account the behavior of 8-lane ports. When the
      ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes,
      packets are dropped and the test fails.
      
      Check whether the relevant ports use 8 lanes and, if so, double the size of
      the buffer, as the headroom is split half-half.
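
The buffer arithmetic described above can be sketched as follows. This is an illustrative model only, not code from the selftest: the function name and its parameters are hypothetical, sizes are in KB, and the rounding of the MTU up to the cell size is ignored.

```c
#include <assert.h>

/*
 * Hypothetical helper: room left for PFC delay traffic in each buffer once
 * the Xoff threshold (2 * MTU) has been subtracted.
 */
static unsigned int delay_allowance_kb(unsigned int buffer_kb,
                                       unsigned int mtu_kb,
                                       unsigned int lanes)
{
    /* Per the commit message, 8-lane ports split the configured buffer in two. */
    unsigned int parts = (lanes == 8) ? 2 : 1;
    unsigned int per_buffer_kb = buffer_kb / parts;
    /* The Xoff threshold applies to each buffer separately. */
    unsigned int xoff_kb = 2 * mtu_kb;

    assert(per_buffer_kb >= xoff_kb);
    return per_buffer_kb - xoff_kb;
}
```

With the numbers from the message (100KB buffer, 10KB MTU), a port that does not split its buffer keeps 80KB for delay traffic, an 8-lane port keeps only 30KB per buffer, and doubling the configured buffer to 200KB restores 80KB per buffer.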
      
      Cc: Shuah Khan <shuah@kernel.org>
      Fixes: bfa80478 ("selftests: mlxsw: Add a PFC test")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/23ff11b7dff031eb04a41c0f5254a2b636cd8ebb.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: spectrum_router: Register netdevice notifier before nexthop · c5aa144f
      Petr Machata authored
      
      [ Upstream commit 62bef63646c194e0f82b40304a0f2d060b28687b ]
      
      If there are IPIP nexthops at the time when the driver is loaded (or the
      devlink instance reloaded), the driver looks up the corresponding IPIP
      entry. But IPIP entries are only created as a result of netdevice
      notifications. Since the netdevice notifier is registered after the nexthop
      notifier, mlxsw_sp_nexthop_type_init() never finds the IPIP entry,
      registers the nexthop MLXSW_SP_NEXTHOP_TYPE_ETH, and fails to assign a CRIF
      to the nexthop. Later on when the CRIF is necessary, the WARN_ON in
      mlxsw_sp_nexthop_rif() triggers, causing the splat [1].
      
      In order to fix the issue, reorder the netdevice notifier to be registered
      before the nexthop one.
      
      [1] (edited for clarity):
      
          WARNING: CPU: 1 PID: 1364 at drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3245 mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum
          Hardware name: Mellanox Technologies Ltd. MSN4410/VMOD0010, BIOS 5.11 01/06/2019
          Call Trace:
          ? mlxsw_sp_nexthop_rif (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3246 (discriminator 1)) mlxsw_spectrum
          __mlxsw_sp_nexthop_eth_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3637) mlxsw_spectrum
          mlxsw_sp_nexthop_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3679 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3727) mlxsw_spectrum
          mlxsw_sp_nexthop_group_update (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:3757) mlxsw_spectrum
          mlxsw_sp_nexthop_group_refresh (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:4112) mlxsw_spectrum
          mlxsw_sp_nexthop_obj_event (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5118 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5191 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5315 drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:5500) mlxsw_spectrum
          nexthops_dump (net/ipv4/nexthop.c:217 net/ipv4/nexthop.c:440 net/ipv4/nexthop.c:3609)
          register_nexthop_notifier (net/ipv4/nexthop.c:3624)
          mlxsw_sp_router_init (drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c:11486) mlxsw_spectrum
          mlxsw_sp_init (drivers/net/ethernet/mellanox/mlxsw/spectrum.c:3267) mlxsw_spectrum
          __mlxsw_core_bus_device_register (drivers/net/ethernet/mellanox/mlxsw/core.c:2202) mlxsw_core
          mlxsw_devlink_core_bus_device_reload_up (drivers/net/ethernet/mellanox/mlxsw/core.c:2265 drivers/net/ethernet/mellanox/mlxsw/core.c:1603) mlxsw_core
          devlink_reload (net/devlink/dev.c:314 net/devlink/dev.c:475)
          [...]
      
      Fixes: 9464a3d6 ("mlxsw: spectrum_router: Track next hops at CRIFs")
      Reported-by: Maksym Yaremchuk <maksymy@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Reviewed-by: Ido Schimmel <idosch@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/74edb8d45d004e8d8f5318eede6ccc3d786d8ba9.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: spectrum_acl_tcam: Fix stack corruption · a361c2c1
      Ido Schimmel authored
      
      [ Upstream commit 483ae90d8f976f8339cf81066312e1329f2d3706 ]
      
      When tc filters are first added to a net device, the corresponding local
      port gets bound to an ACL group in the device. The group contains a list
      of ACLs. In turn, each ACL points to a different TCAM region where the
      filters are stored. During forwarding, the ACLs are sequentially
      evaluated until a match is found.
      
      One reason to place filters in different regions is when they are added
      with decreasing priorities and in an alternating order so that two
      consecutive filters can never fit in the same region because of their
      key usage.
      
      In Spectrum-2 and newer ASICs the firmware started to report that the
      maximum number of ACLs in a group is more than 16, but the layout of the
      register that configures ACL groups (PAGT) was not updated to account
      for that. It is therefore possible to hit stack corruption [1] in the
      rare case where more than 16 ACLs in a group are required.
      
      Fix by limiting the maximum ACL group size to the minimum between what
      the firmware reports and the maximum ACLs that fit in the PAGT register.
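
A minimal sketch of the clamping idea, assuming a register image with room for 16 ACLs as the message implies. All names here (TOY_PAGT_MAX_ACLS, toy_max_group_size(), toy_pack_group()) are hypothetical and are not the driver's identifiers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capacity of the register layout, taken from the "16" in the message. */
#define TOY_PAGT_MAX_ACLS 16

/* Effective group size: the smaller of the firmware value and the register capacity. */
static size_t toy_max_group_size(size_t fw_reported_max)
{
    return fw_reported_max < TOY_PAGT_MAX_ACLS ? fw_reported_max
                                               : TOY_PAGT_MAX_ACLS;
}

/* Copy ACL ids into a fixed-size register image; refuse rather than overflow it. */
static int toy_pack_group(unsigned short *reg, const unsigned short *acl_ids,
                          size_t count, size_t fw_reported_max)
{
    assert(reg != NULL && acl_ids != NULL);
    if (count > toy_max_group_size(fw_reported_max))
        return -1;
    for (size_t i = 0; i < count; i++)
        reg[i] = acl_ids[i];
    return 0;
}
```

Trusting only the firmware-reported maximum would let a request for, say, 20 ACLs write past a 16-entry buffer; the clamp turns that case into an error return.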
      
      Add a test case to make sure the machine does not crash when this
      condition is hit.
      
      [1]
      Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120
      [...]
       dump_stack_lvl+0x36/0x50
       panic+0x305/0x330
       __stack_chk_fail+0x15/0x20
       mlxsw_sp_acl_tcam_group_update+0x116/0x120
       mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110
       mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20
       mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
       mlxsw_sp_acl_rule_add+0x47/0x240
       mlxsw_sp_flower_replace+0x1a9/0x1d0
       tc_setup_cb_add+0xdc/0x1c0
       fl_hw_replace_filter+0x146/0x1f0
       fl_change+0xc17/0x1360
       tc_new_tfilter+0x472/0xb90
       rtnetlink_rcv_msg+0x313/0x3b0
       netlink_rcv_skb+0x58/0x100
       netlink_unicast+0x244/0x390
       netlink_sendmsg+0x1e4/0x440
       ____sys_sendmsg+0x164/0x260
       ___sys_sendmsg+0x9a/0xe0
       __sys_sendmsg+0x7a/0xc0
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Reported-by: Orel Hagag <orelh@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/2d91c89afba59c22587b444994ae419dbea8d876.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path · d0a1efe4
      Ido Schimmel authored
      
      [ Upstream commit efeb7dfea8ee10cdec11b6b6ba4e405edbe75809 ]
      
      When calling mlxsw_sp_acl_tcam_region_destroy() from an error path after
      failing to attach the region to an ACL group, we hit a NULL pointer
      dereference upon 'region->group->tcam' [1].
      
      Fix by retrieving the 'tcam' pointer using mlxsw_sp_acl_to_tcam().
      
      [1]
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      [...]
      RIP: 0010:mlxsw_sp_acl_tcam_region_destroy+0xa0/0xd0
      [...]
      Call Trace:
       mlxsw_sp_acl_tcam_vchunk_get+0x88b/0xa20
       mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
       mlxsw_sp_acl_rule_add+0x47/0x240
       mlxsw_sp_flower_replace+0x1a9/0x1d0
       tc_setup_cb_add+0xdc/0x1c0
       fl_hw_replace_filter+0x146/0x1f0
       fl_change+0xc17/0x1360
       tc_new_tfilter+0x472/0xb90
       rtnetlink_rcv_msg+0x313/0x3b0
       netlink_rcv_skb+0x58/0x100
       netlink_unicast+0x244/0x390
       netlink_sendmsg+0x1e4/0x440
       ____sys_sendmsg+0x164/0x260
       ___sys_sendmsg+0x9a/0xe0
       __sys_sendmsg+0x7a/0xc0
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fixes: 22a67766 ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/fb6a4542bbc9fcab5a523802d97059bffbca7126.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure · 1a720f3e
      Amit Cohen authored
      
      [ Upstream commit 6d6eeabcfaba2fcadf5443b575789ea606f9de83 ]
      
      Recently, a bug was found when many TC filters are added: at some point,
      several bugs are printed to dmesg [1] and the switch crashes with a
      segmentation fault.
      
      The issue starts when gen_pool_free() fails because of unexpected
      behavior: an attempt to free memory that has already been freed. This leads
      to a BUG() call, which crashes the switch and triggers many other bugs.
      
      Tracking down the unexpected behavior led to a bug in the eRP code. The
      function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
      index, sets the value and returns an error code. When gen_pool_alloc()
      fails it returns address 0; we detect this and return -ENOBUFS, BUT
      the call to gen_pool_alloc() has already overwritten the index in the
      erp_table structure. This is a problem when such an allocation is done as
      part of table expansion. In that case the table is not a new one that would
      simply go unused on allocation failure: we try to expand the eRP table and
      overwrite the current (non-zero) index with zero. This then leads to
      unexpected behavior when address 0 is freed twice. Note that address 0 is
      valid in erp_table->base_index and indeed other tables use it.
      
      gen_pool_alloc() fails when there is no space left in the
      pre-allocated pool. In our case, the pool is limited to
      ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than the
      maximum number of eRP entries are required, we exceed the limit and return
      an error; this error leads to the "Failed to migrate vregion" print.
      
      Fix this by changing erp_table->base_index only in case of a successful
      allocation.
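
The pattern behind this fix can be sketched in plain C: allocate into a local variable and write through the caller's pointer only on success. This is a toy model, not the driver code. The toy_pool type, TOY_POOL_OFFSET and both functions are hypothetical; the only property borrowed from the message is that the allocator returns 0 on failure while index 0 is a valid result.

```c
#include <assert.h>
#include <errno.h>

/* Pool addresses start at a non-zero offset so that 0 can mean "failed". */
#define TOY_POOL_OFFSET 0x100UL

struct toy_pool {
    unsigned long next; /* next free address */
    unsigned long end;  /* one past the last usable address */
};

/* Bump allocator that returns 0 on failure, like gen_pool_alloc(). */
static unsigned long toy_pool_alloc(struct toy_pool *pool, unsigned long size)
{
    unsigned long addr;

    assert(pool->next >= TOY_POOL_OFFSET && pool->next <= pool->end);
    if (pool->end - pool->next < size)
        return 0;
    addr = pool->next;
    pool->next += size;
    return addr;
}

/* Only touch *p_index once the allocation is known to have succeeded. */
static int toy_table_alloc(struct toy_pool *pool, unsigned long size,
                           unsigned long *p_index)
{
    unsigned long addr = toy_pool_alloc(pool, size);

    if (addr == 0)
        return -ENOBUFS; /* the caller's current index is left intact */
    *p_index = addr - TOY_POOL_OFFSET;
    return 0;
}
```

A caller expanding an existing table can pass a pointer to its live index: on failure the index still names the old allocation, so a later free releases the right region instead of address 0.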
      
      Add a test case for such a scenario. Without this fix it causes
      segmentation fault:
      
      $ TESTS="max_erp_entries_test" ./tc_flower.sh
      ./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null
      
      [1]:
      kernel BUG at lib/genalloc.c:508!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
      Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
      RIP: 0010:gen_pool_free_owner+0xc9/0xe0
      ...
      Call Trace:
       <TASK>
       __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
       mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
       objagg_obj_root_destroy+0x18/0x80 [objagg]
       objagg_obj_destroy+0x12c/0x130 [objagg]
       mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
       mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
       mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
       mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
       mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
       mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
       tc_setup_cb_destroy+0xc1/0x180
       fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
       __fl_delete+0x1ac/0x1c0 [cls_flower]
       fl_destroy+0xc2/0x150 [cls_flower]
       tcf_proto_destroy+0x1a/0xa0
      ...
      mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
      mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
      
      Fixes: f465261a ("mlxsw: spectrum_acl: Implement common eRP core")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • loop: fix the the direct I/O support check when used on top of block devices · df4bb784
      Christoph Hellwig authored
      
      [ Upstream commit baa7d536077dcdfe2b70c476a8873d1745d3de0f ]
      
      __loop_update_dio only checks the alignment requirement for block backed
      file systems, but misses them for the case where the loop device is
      created directly on top of another block device.  Because of this, creating
      a loop device with default options plus the direct I/O flag on a > 512 byte
      sector size file system will lead to incorrect I/O being submitted to the
      lower block device and a lot of errors from the block layer.  This can
      be seen with xfstests generic/563.
      
      Fix the code in __loop_update_dio by factoring the alignment check into
      a helper, and calling that also for the struct block_device of a block
      device inode.
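
As a rough illustration of what such an alignment helper has to decide, the sketch below models the check in user space. It is an assumption about the shape of the test, not the kernel helper: toy_can_use_dio() and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: direct I/O is only safe when the loop device's logical
 * block size is at least that of the backing device and the loop offset is
 * aligned to the backing block size.
 */
static bool toy_can_use_dio(unsigned int loop_block_size,
                            unsigned int backing_block_size,
                            unsigned long long offset)
{
    /* Block sizes are assumed to be non-zero powers of two. */
    assert(backing_block_size != 0 &&
           (backing_block_size & (backing_block_size - 1)) == 0);

    if (loop_block_size < backing_block_size)
        return false;
    if (offset & (backing_block_size - 1))
        return false;
    return true;
}
```

In this model, a loop device left at a 512-byte block size on top of a 4096-byte backing device is rejected, which corresponds to the failing case described above.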
      
      Also remove the TODO comment talking about dynamically switching between
      buffered and direct I/O, which would be a recipe for horrible
      performance and occasional data loss.
      
      Fixes: 2e5ab5f3 ("block: loop: prepare for supporing direct IO")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20240117175901.871796-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ethtool: netlink: Add missing ethnl_ops_begin/complete · 21d86d37
      Ludvig Pärsson authored
      
      [ Upstream commit f1172f3ee3a98754d95b968968920a7d03fdebcc ]
      
      Accessing an ethernet device that is powered off or clock gated might
      cause the CPU to hang. Add ethnl_ops_begin/complete in
      ethnl_set_features() to protect against this.
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Ludvig Pärsson <ludvig.parsson@axis.com>
      Link: https://lore.kernel.org/r/20240117-etht2-v2-1-1a96b6e8c650@axis.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • arm64/ptrace: Don't flush ZA/ZT storage when writing ZA via ptrace · ae836b7f
      Mark Brown authored
      
      [ Upstream commit b7c510d049049409e8945b932f4b0b357fa17415 ]
      
      When writing ZA we currently unconditionally flush the buffer used to store
      it as part of ensuring that it is allocated. Since this buffer is shared
      with ZT0 this means that a write to ZA when PSTATE.ZA is already set will
      corrupt the value of ZT0 on a SME2 system. Fix this by only flushing the
      backing storage if PSTATE.ZA was not previously set.
      
      This will mean that short or failed writes may leave stale data in the
      buffer; this seems as correct as our current behaviour and unlikely to be
      something that userspace will rely on.
      
      Fixes: f90b529b ("arm64/sme: Implement ZT0 ptrace support")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240115-arm64-fix-ptrace-za-zt-v1-1-48617517028a@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kdb: Fix a potential buffer overflow in kdb_local() · 4daed382
      Christophe JAILLET authored
      
      [ Upstream commit 4f41d30cd6dc865c3cbc1a852372321eba6d4e4c ]
      
      When appending "[defcmd]" to 'kdb_prompt_str', the size of the string
      already in the buffer should be taken into account.
      
      An option could be to switch from strncat() to strlcat() which does the
      correct test to avoid such an overflow.
      
      However, this actually looks like dead code, because 'defcmd_in_progress'
      can't be true here.
      See a more detailed explanation at [1].
      
      [1]: https://lore.kernel.org/all/CAD=FV=WSh7wKN7Yp-3wWiDgX4E3isQ8uh0LCzTmd1v9Cg9j+nQ@mail.gmail.com/
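
The overflow risk comes from the fact that the size argument of strncat() limits how many characters are appended, not the total size of the destination. A bounded append that takes the existing contents into account can be sketched as below; bounded_append() is a hypothetical user-space helper written for illustration, not the kernel's strlcat() and not the change made by this commit.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: append src to the NUL-terminated string in dst without
 * writing past dst[size - 1]. Returns 0 on success, -1 if src was truncated.
 */
static int bounded_append(char *dst, size_t size, const char *src)
{
    size_t used = strlen(dst);
    size_t room, len, n;

    assert(used < size);    /* dst must already fit in its buffer */
    room = size - used - 1; /* space left, keeping one byte for the NUL */
    len = strlen(src);
    n = len < room ? len : room;

    memcpy(dst + used, src, n);
    dst[used + n] = '\0';
    return len <= room ? 0 : -1;
}
```

Appending "[defcmd]" to a prompt that already holds "kdb> " succeeds in a 16-byte buffer, while in an 8-byte buffer the helper truncates the result and reports it instead of overrunning the array.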
      
      
      
      Fixes: 5d5314d6 ("kdb: core for kgdb back end (1 of 2)")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • io_uring: adjust defer tw counting · e24bf5b4
      Pavel Begunkov authored
      
      [ Upstream commit dc12d1799ce710fd90abbe0ced71e7e1ae0894fc ]
      
      The UINT_MAX work item counting bias in io_req_local_work_add() in case
      of !IOU_F_TWQ_LAZY_WAKE works in the sense that we will not miss a wake up,
      however it's still eerie. In particular, if we add a lazy work item
      after a non-lazy one, we'll increment it and get nr_tw==0, and
      subsequent adds may try to unnecessarily wake up the task, though this
      is not so likely to happen in real workloads.
      
      Halve the bias: it's still large enough to be larger than any valid
      ->cq_wait_nr, which is limited by IORING_MAX_CQ_ENTRIES, but leaves
      plenty of space before it overflows.
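
The wraparound being described is ordinary unsigned arithmetic and can be shown in isolation. In the sketch below the macro and function names are hypothetical, and TOY_MAX_CQ_ENTRIES is an assumed stand-in for the real upper bound on ->cq_wait_nr, chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>

#define TOY_MAX_CQ_ENTRIES 65536u        /* assumed bound on ->cq_wait_nr */
#define TOY_BIAS_OLD       UINT_MAX      /* bias described as problematic */
#define TOY_BIAS_NEW       (UINT_MAX / 2) /* halved bias */

/* The halved bias must still exceed every valid wait count. */
_Static_assert(TOY_BIAS_NEW > TOY_MAX_CQ_ENTRIES, "bias too small");

/* A lazy add increments the counter; unsigned arithmetic wraps modulo 2^N. */
static unsigned int toy_lazy_add(unsigned int nr_tw)
{
    return nr_tw + 1;
}
```

Starting from UINT_MAX, a single lazy add wraps the counter to 0, which is the nr_tw==0 case from the message. Starting from the halved bias, the counter stays far above any valid wait count and has roughly two billion increments of headroom before wrapping.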
      
      Fixes: 8751d154 ("io_uring: reduce scheduling due to tw")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/108b971e958deaf7048342930c341ba90f75d806.1705438669.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipvs: avoid stat macros calls from preemptible context · c149cc7c
      Fedor Pchelkin authored
      
      [ Upstream commit d6938c1c76c64f42363d0d1f051e1b4641c2ad40 ]
      
      Inside decrement_ttl() upon discovering that the packet ttl has exceeded,
      __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible
      context having the following backtrace:
      
      check_preemption_disabled: 48 callbacks suppressed
      BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177
      caller is decrement_ttl+0x217/0x830
      CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ #34
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0xbd/0xe0
       check_preemption_disabled+0xd1/0xe0
       decrement_ttl+0x217/0x830
       __ip_vs_get_out_rt+0x4e0/0x1ef0
       ip_vs_nat_xmit+0x205/0xcd0
       ip_vs_in_hook+0x9b1/0x26a0
       nf_hook_slow+0xc2/0x210
       nf_hook+0x1fb/0x770
       __ip_local_out+0x33b/0x640
       ip_local_out+0x2a/0x490
       __ip_queue_xmit+0x990/0x1d10
       __tcp_transmit_skb+0x288b/0x3d10
       tcp_connect+0x3466/0x5180
       tcp_v4_connect+0x1535/0x1bb0
       __inet_stream_connect+0x40d/0x1040
       inet_stream_connect+0x57/0xa0
       __sys_connect_file+0x162/0x1a0
       __sys_connect+0x137/0x160
       __x64_sys_connect+0x72/0xb0
       do_syscall_64+0x6f/0x140
       entry_SYSCALL_64_after_hwframe+0x6e/0x76
      RIP: 0033:0x7fe6dbbc34e0
      
      Use the corresponding preemption-aware variants: IP_INC_STATS and
      IP6_INC_STATS.
      
      Found by Linux Verification Center (linuxtesting.org).
      
      Fixes: 8d8e20e2 ("ipvs: Decrement ttl")
      Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description · ce2189d3
      Pablo Neira Ayuso authored
      
      [ Upstream commit 113661e07460a6604aacc8ae1b23695a89e7d4b3 ]
      
      It is still possible to set the NFT_SET_CONCAT flag by specifying a
      set size and no field description; report EINVAL in such a case.
      
      Fixes: 1b6345d4 ("netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netfilter: nf_tables: skip dead set elements in netlink dump · 9f025447
      Pablo Neira Ayuso authored
      
      [ Upstream commit 6b1ca88e4bb63673dc9f9c7f23c899f22c3cb17a ]
      
      Delete from packet path relies on the garbage collector to purge
      elements with NFT_SET_ELEM_DEAD_BIT on.
      
      Skip these dead elements in the nf_tables_dump_setelem() path. I very
      rarely see tests/shell/testcases/maps/typeof_maps_add_delete report
      [DUMP FAILED], showing a mismatch in the expected output with an element
      that should not be there.

      If the netlink dump happens before the GC worker runs, it might show dead
      elements in the ruleset listing.
      
      nft_rhash_get() already skips dead elements in nft_rhash_cmp(),
      therefore, it already does not show the element when getting a single
      element via netlink control plane.
      
      Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      9f025447
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: do not allow mismatch field size and set key length · ff67e3e4
      Pablo Neira Ayuso authored
      
      [ Upstream commit 3ce67e3793f48c1b9635beb9bb71116ca1e51b58 ]
      
      The set description provides the size of each field in the set. The sum
      of these sizes must match the set key length; bail out otherwise.

      I did not manage to crash nft_set_pipapo with mismatching fields and set
      key length so far, but this is UB which must be disallowed.
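
      The check described above can be sketched in userspace C. This is only
      an illustration under stated assumptions: struct set_desc_sketch,
      SKETCH_MAX_FIELDS and check_field_lengths() are hypothetical names, and
      the sketch compares plain byte sums, which may be simpler than what the
      kernel actually compares.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FIELDS 16	/* hypothetical limit, for illustration only */

/* Hypothetical stand-in for a set description; not the kernel's struct. */
struct set_desc_sketch {
	uint32_t klen;				/* declared key length, in bytes */
	uint8_t field_count;			/* number of concatenated fields */
	uint8_t field_len[SKETCH_MAX_FIELDS];	/* per-field length, in bytes */
};

/* Return 0 if the field lengths add up to the key length, -EINVAL otherwise. */
static int check_field_lengths(const struct set_desc_sketch *desc)
{
	uint32_t sum = 0;
	size_t i;

	if (desc->field_count > SKETCH_MAX_FIELDS)
		return -EINVAL;

	for (i = 0; i < desc->field_count; i++)
		sum += desc->field_len[i];

	return sum == desc->klen ? 0 : -EINVAL;
}
```

      With an 8-byte key, two 4-byte fields pass the check, while a 4-byte
      field followed by a 2-byte field is rejected.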
      
      Fixes: f3a2181e ("netfilter: nf_tables: Support for sets with multiple ranged fields")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ff67e3e4
    • Pavel Tikhomirov's avatar
      netfilter: bridge: replace physindev with physinif in nf_bridge_info · 544add1f
      Pavel Tikhomirov authored
      
      [ Upstream commit 9874808878d9eed407e3977fd11fee49de1e1d86 ]
      
      An skb can be added to a neigh->arp_queue while waiting for an arp
      reply, and the original skb's skb->dev can differ from the neigh's
      neigh->dev. For instance, when bridging a dnated skb from one veth to
      another, the skb would be added to a neigh->arp_queue of the bridge.

      As skb->dev can be reset back to nf_bridge->physindev and used, and as
      there is no explicit mechanism that prevents this physindev from being
      freed under us (for instance neigh_flush_dev doesn't clean up skbs from
      a different device's neigh queue), we can crash on e.g. this stack:
      
      arp_process
        neigh_update
          skb = __skb_dequeue(&neigh->arp_queue)
            neigh_resolve_output(..., skb)
              ...
                br_nf_dev_xmit
                  br_nf_pre_routing_finish_bridge_slow
                    skb->dev = nf_bridge->physindev
                    br_handle_frame_finish
      
      Let's use a plain ifindex instead of a net_device link. To peek into the
      original net_device we will use dev_get_by_index_rcu(). Thus either we
      get the device and are safe to use it, or we don't get it and drop the skb.
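
      The idea of storing an index and resolving it on use can be sketched in
      userspace C. Everything here is hypothetical (dev_table, dev_lookup(),
      struct pkt_sketch, pkt_restore_indev()); a flat table stands in for the
      per-netns device list and RCU is not modeled at all.

```c
#include <stddef.h>

#define SKETCH_MAX_DEVS 8	/* hypothetical table size */

/* Hypothetical device entry standing in for a net_device. */
struct dev_sketch {
	int ifindex;
	int registered;		/* cleared when the device goes away */
};

static struct dev_sketch dev_table[SKETCH_MAX_DEVS];

/* Look a device up by index; NULL means it no longer exists. */
static struct dev_sketch *dev_lookup(int ifindex)
{
	size_t i;

	for (i = 0; i < SKETCH_MAX_DEVS; i++)
		if (dev_table[i].registered && dev_table[i].ifindex == ifindex)
			return &dev_table[i];
	return NULL;
}

/* The packet remembers only the index, never a pointer that could dangle. */
struct pkt_sketch {
	int physinif;
	struct dev_sketch *dev;
};

/* Return 0 and set pkt->dev if the device still exists, -1 to drop. */
static int pkt_restore_indev(struct pkt_sketch *pkt)
{
	struct dev_sketch *dev = dev_lookup(pkt->physinif);

	if (!dev)
		return -1;
	pkt->dev = dev;
	return 0;
}
```

      If the device was unregistered while the packet sat in a queue, the
      lookup fails and the caller drops the packet instead of dereferencing a
      stale pointer.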
      
      Fixes: c4e70a87 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      544add1f
    • Pavel Tikhomirov's avatar
      netfilter: propagate net to nf_bridge_get_physindev · eb417043
      Pavel Tikhomirov authored
      
      [ Upstream commit a54e72197037d2c9bfcd70dddaac8c8ccb5b41ba ]
      
      This is a preparation patch for replacing physindev with physinif in the
      nf_bridge_info structure. We will use dev_get_by_index_rcu() to resolve
      the device when needed, and it requires net to be available.
      
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: 9874808878d9 ("netfilter: bridge: replace physindev with physinif in nf_bridge_info")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      eb417043
    • Pavel Tikhomirov's avatar
      netfilter: nf_queue: remove excess nf_bridge variable · 10849493
      Pavel Tikhomirov authored
      
      [ Upstream commit aeaa44075f8e49e2e0ad4507d925e690b7950145 ]
      
      We don't really need the nf_bridge variable here, and
      nf_bridge_info_exists is a better replacement for nf_bridge_info_get
      when we are only checking for existence.
      
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: 9874808878d9 ("netfilter: bridge: replace physindev with physinif in nf_bridge_info")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      10849493
    • Pavel Tikhomirov's avatar
      netfilter: nfnetlink_log: use proper helper for fetching physinif · 0a12e679
      Pavel Tikhomirov authored
      
      [ Upstream commit c3f9fd54cd87233f53bdf0e191a86b3a5e960e02 ]
      
      We don't use physindev in __build_packet_message except for getting
      physinif from it. So let's switch to nf_bridge_get_physinif to get what
      we want directly.
      
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: 9874808878d9 ("netfilter: bridge: replace physindev with physinif in nf_bridge_info")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0a12e679
    • Pablo Neira Ayuso's avatar
      netfilter: nft_limit: do not ignore unsupported flags · ae6c0543
      Pablo Neira Ayuso authored
      
      [ Upstream commit 91a139cee1202a4599a380810d93c69b5bac6197 ]
      
      Bail out if userspace provides unsupported flags, otherwise future
      extensions to the limit expression will be silently ignored by the
      kernel.
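
      A minimal sketch of this kind of flag validation, in userspace C. The
      macro and function names are hypothetical, and the error code returned
      for unknown bits is an assumption rather than something this message
      states.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag names for illustration. */
#define SKETCH_LIMIT_F_INV		(1u << 0)
#define SKETCH_LIMIT_F_SUPPORTED	SKETCH_LIMIT_F_INV

/* Accept only known flag bits; any other bit is refused, not ignored. */
static int limit_check_flags(uint32_t flags)
{
	if (flags & ~SKETCH_LIMIT_F_SUPPORTED)
		return -EOPNOTSUPP;	/* assumed error code */
	return 0;
}
```

      Refusing unknown bits up front means a newer userspace talking to an
      older kernel gets an error instead of a rule that silently behaves
      differently from what was requested.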
      
      Fixes: c7862a5f ("netfilter: nft_limit: allow to invert matching criteria")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ae6c0543
    • Pablo Neira Ayuso's avatar
      netfilter: nf_tables: reject invalid set policy · 7d2d0393
      Pablo Neira Ayuso authored
      
      [ Upstream commit 0617c3de9b4026b87be12b0cb5c35f42c7c66fcb ]
      
      Report -EINVAL in case userspace provides an unsupported set backend
      policy.
      
      Fixes: c50b960c ("netfilter: nf_tables: implement proper set selection")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7d2d0393
    • Jakub Kicinski's avatar
      net: netdevsim: don't try to destroy PHC on VFs · c5068e44
      Jakub Kicinski authored
      
      [ Upstream commit ea937f77208323d35ffe2f8d8fc81b00118bfcda ]
      
      PHC gets initialized in nsim_init_netdevsim(), which
      is only called if (nsim_dev_port_is_pf()).
      
      Create a counterpart of nsim_init_netdevsim() and
      move the mock_phc_destroy() there.
      
      This fixes a crash trying to destroy netdevsim with
      VFs instantiated, as caught by running the devlink.sh test:
      
          BUG: kernel NULL pointer dereference, address: 00000000000000b8
          RIP: 0010:mock_phc_destroy+0xd/0x30
          Call Trace:
           <TASK>
           nsim_destroy+0x4a/0x70 [netdevsim]
           __nsim_dev_port_del+0x47/0x70 [netdevsim]
           nsim_dev_reload_destroy+0x105/0x120 [netdevsim]
           nsim_drv_remove+0x2f/0xb0 [netdevsim]
           device_release_driver_internal+0x1a1/0x210
           bus_remove_device+0xd5/0x120
           device_del+0x159/0x490
           device_unregister+0x12/0x30
           del_device_store+0x11a/0x1a0 [netdevsim]
           kernfs_fop_write_iter+0x130/0x1d0
           vfs_write+0x30b/0x4b0
           ksys_write+0x69/0xf0
           do_syscall_64+0xcc/0x1e0
           entry_SYSCALL_64_after_hwframe+0x6f/0x77
      
      Fixes: b63e78fc ("net: netdevsim: use mock PHC driver")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      c5068e44
    • Paolo Abeni's avatar
      mptcp: relax check on MPC passive fallback · 77c63a08
      Paolo Abeni authored
      
      [ Upstream commit c0f5aec28edf98906d28f08daace6522adf9ee7a ]
      
      While testing the blamed commit below, I was able to miss (!)
      packetdrill failures in the fastopen test-cases.
      
      On passive fastopen the child socket is created by the incoming TCP MPC
      syn, so allow for both the MPC_SYN and MPC_ACK headers.
      
      Fixes: 724b00c12957 ("mptcp: refine opt_mp_capable determination")
      Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      77c63a08
    • Hengqi Chen's avatar
      LoongArch: BPF: Prevent out-of-bounds memory access · 7924ade1
      Hengqi Chen authored
      
      [ Upstream commit 36a87385e31c9343af9a4756598e704741250a67 ]
      
      The test_tag test triggers an unhandled page fault:
      
        # ./test_tag
        [  130.640218] CPU 0 Unable to handle kernel paging request at virtual address ffff80001b898004, era == 9000000003137f7c, ra == 9000000003139e70
        [  130.640501] Oops[#3]:
        [  130.640553] CPU: 0 PID: 1326 Comm: test_tag Tainted: G      D    O       6.7.0-rc4-loong-devel-gb62ab1a397cf #47 61985c1d94084daa2432f771daa45b56b10d8d2a
        [  130.640764] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
        [  130.640874] pc 9000000003137f7c ra 9000000003139e70 tp 9000000104cb4000 sp 9000000104cb7a40
        [  130.641001] a0 ffff80001b894000 a1 ffff80001b897ff8 a2 000000006ba210be a3 0000000000000000
        [  130.641128] a4 000000006ba210be a5 00000000000000f1 a6 00000000000000b3 a7 0000000000000000
        [  130.641256] t0 0000000000000000 t1 00000000000007f6 t2 0000000000000000 t3 9000000004091b70
        [  130.641387] t4 000000006ba210be t5 0000000000000004 t6 fffffffffffffff0 t7 90000000040913e0
        [  130.641512] t8 0000000000000005 u0 0000000000000dc0 s9 0000000000000009 s0 9000000104cb7ae0
        [  130.641641] s1 00000000000007f6 s2 0000000000000009 s3 0000000000000095 s4 0000000000000000
        [  130.641771] s5 ffff80001b894000 s6 ffff80001b897fb0 s7 9000000004090c50 s8 0000000000000000
        [  130.641900]    ra: 9000000003139e70 build_body+0x1fcc/0x4988
        [  130.642007]   ERA: 9000000003137f7c build_body+0xd8/0x4988
        [  130.642112]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
        [  130.642261]  PRMD: 00000004 (PPLV0 +PIE -PWE)
        [  130.642353]  EUEN: 00000003 (+FPE +SXE -ASXE -BTE)
        [  130.642458]  ECFG: 00071c1c (LIE=2-4,10-12 VS=7)
        [  130.642554] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
        [  130.642658]  BADV: ffff80001b898004
        [  130.642719]  PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
        [  130.642815] Modules linked in: [last unloaded: bpf_testmod(O)]
        [  130.642924] Process test_tag (pid: 1326, threadinfo=00000000f7f4015f, task=000000006499f9fd)
        [  130.643062] Stack : 0000000000000000 9000000003380724 0000000000000000 0000000104cb7be8
        [  130.643213]         0000000000000000 25af8d9b6e600558 9000000106250ea0 9000000104cb7ae0
        [  130.643378]         0000000000000000 0000000000000000 9000000104cb7be8 90000000049f6000
        [  130.643538]         0000000000000090 9000000106250ea0 ffff80001b894000 ffff80001b894000
        [  130.643685]         00007ffffb917790 900000000313ca94 0000000000000000 0000000000000000
        [  130.643831]         ffff80001b894000 0000000000000ff7 0000000000000000 9000000100468000
        [  130.643983]         0000000000000000 0000000000000000 0000000000000040 25af8d9b6e600558
        [  130.644131]         0000000000000bb7 ffff80001b894048 0000000000000000 0000000000000000
        [  130.644276]         9000000104cb7be8 90000000049f6000 0000000000000090 9000000104cb7bdc
        [  130.644423]         ffff80001b894000 0000000000000000 00007ffffb917790 90000000032acfb0
        [  130.644572]         ...
        [  130.644629] Call Trace:
        [  130.644641] [<9000000003137f7c>] build_body+0xd8/0x4988
        [  130.644785] [<900000000313ca94>] bpf_int_jit_compile+0x228/0x4ec
        [  130.644891] [<90000000032acfb0>] bpf_prog_select_runtime+0x158/0x1b0
        [  130.645003] [<90000000032b3504>] bpf_prog_load+0x760/0xb44
        [  130.645089] [<90000000032b6744>] __sys_bpf+0xbb8/0x2588
        [  130.645175] [<90000000032b8388>] sys_bpf+0x20/0x2c
        [  130.645259] [<9000000003f6ab38>] do_syscall+0x7c/0x94
        [  130.645369] [<9000000003121c5c>] handle_syscall+0xbc/0x158
        [  130.645507]
        [  130.645539] Code: 380839f6  380831f9  28412bae <24000ca6> 004081ad  0014cb50  004083e8  02bff34c  58008e91
        [  130.645729]
        [  130.646418] ---[ end trace 0000000000000000 ]---
      
      On my machine, which has CONFIG_PAGE_SIZE_16KB=y, the test failed at
      loading a BPF prog with 2039 instructions:
      
        prog = (struct bpf_prog *)ffff80001b894000
        insn = (struct bpf_insn *)(prog->insnsi)ffff80001b894048
        insn + 2039 = (struct bpf_insn *)ffff80001b898000 <- end of the page
      
      In the build_insn() function, we are trying to access the next instruction
      unconditionally, i.e. `(insn + 1)->imm`. The address lies in the next
      page and may not be owned by the current process, in which case a page
      fault and then a segfault follow.

      So, let's access the next instruction only under the `dst = imm64` context.
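
      The pattern can be sketched in userspace C as follows. struct
      insn_sketch and insn_read_imm() are hypothetical, simplified stand-ins
      and not the JIT's code; the explicit length check is an extra assumption
      made so the sketch is safe on its own.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified instruction slot; the real struct bpf_insn has more fields. */
struct insn_sketch {
	uint8_t code;
	int32_t imm;
};

#define SKETCH_LD_IMM64 0x18	/* BPF_LD | BPF_IMM | BPF_DW */

/*
 * Read the immediate of prog[idx]. Only a 64-bit immediate load spans two
 * slots, so the next slot is touched only in that case.
 */
static int insn_read_imm(const struct insn_sketch *prog, size_t len,
			 size_t idx, uint64_t *out)
{
	const struct insn_sketch *insn;

	if (idx >= len)
		return -EINVAL;
	insn = &prog[idx];

	if (insn->code == SKETCH_LD_IMM64) {
		if (idx + 1 >= len)
			return -EINVAL;	/* second half would be out of bounds */
		*out = ((uint64_t)(uint32_t)insn[1].imm << 32) |
		       (uint32_t)insn->imm;
		return 0;
	}

	/* Every other opcode uses only its own sign-extended immediate. */
	*out = (uint64_t)(int64_t)insn->imm;
	return 0;
}
```

      When the last instruction of a program is not a 64-bit immediate load,
      nothing past the end of the instruction array is read.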
      
      With this fix, we have:
      
        # ./test_tag
        test_tag: OK (40945 tests)
      
      Fixes: bbfddb90 ("LoongArch: BPF: Avoid declare variables in switch-case")
      Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7924ade1
    • Kunwu Chan's avatar
      net: dsa: vsc73xx: Add null pointer check to vsc73xx_gpio_probe · 91f9ecae
      Kunwu Chan authored
      
      [ Upstream commit 776dac5a662774f07a876b650ba578d0a62d20db ]
      
      devm_kasprintf() returns a pointer to dynamically allocated memory
      which can be NULL upon failure.
      
      Fixes: 05bd97fc ("net: dsa: Add Vitesse VSC73xx DSA router driver")
      Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240111072018.75971-1-chentao@kylinos.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      91f9ecae
    • Hao Sun's avatar
      bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS · 1b500d5d
      Hao Sun authored
      
      [ Upstream commit 22c7fa171a02d310e3a3f6ed46a698ca8a0060ed ]
      
      For PTR_TO_FLOW_KEYS, check_flow_keys_access() only uses fixed off
      for validation. However, variable offset ptr alu is not prohibited
      for this ptr kind. So the variable offset is not checked.
      
      The following prog is accepted:
      
        func#0 @0
        0: R1=ctx() R10=fp0
        0: (bf) r6 = r1                       ; R1=ctx() R6_w=ctx()
        1: (79) r7 = *(u64 *)(r6 +144)        ; R6_w=ctx() R7_w=flow_keys()
        2: (b7) r8 = 1024                     ; R8_w=1024
        3: (37) r8 /= 1                       ; R8_w=scalar()
        4: (57) r8 &= 1024                    ; R8_w=scalar(smin=smin32=0,
        smax=umax=smax32=umax32=1024,var_off=(0x0; 0x400))
        5: (0f) r7 += r8
        mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
        mark_precise: frame0: regs=r8 stack= before 4: (57) r8 &= 1024
        mark_precise: frame0: regs=r8 stack= before 3: (37) r8 /= 1
        mark_precise: frame0: regs=r8 stack= before 2: (b7) r8 = 1024
        6: R7_w=flow_keys(smin=smin32=0,smax=umax=smax32=umax32=1024,var_off
        =(0x0; 0x400)) R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1024,
        var_off=(0x0; 0x400))
        6: (79) r0 = *(u64 *)(r7 +0)          ; R0_w=scalar()
        7: (95) exit
      
      This prog loads flow_keys to r7, and adds the variable offset r8
      to r7, and finally causes out-of-bounds access:
      
        BUG: unable to handle page fault for address: ffffc90014c80038
        [...]
        Call Trace:
         <TASK>
         bpf_dispatcher_nop_func include/linux/bpf.h:1231 [inline]
         __bpf_prog_run include/linux/filter.h:651 [inline]
         bpf_prog_run include/linux/filter.h:658 [inline]
         bpf_prog_run_pin_on_cpu include/linux/filter.h:675 [inline]
         bpf_flow_dissect+0x15f/0x350 net/core/flow_dissector.c:991
         bpf_prog_test_run_flow_dissector+0x39d/0x620 net/bpf/test_run.c:1359
         bpf_prog_test_run kernel/bpf/syscall.c:4107 [inline]
         __sys_bpf+0xf8f/0x4560 kernel/bpf/syscall.c:5475
         __do_sys_bpf kernel/bpf/syscall.c:5561 [inline]
         __se_sys_bpf kernel/bpf/syscall.c:5559 [inline]
         __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5559
         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
         do_syscall_64+0x3f/0x110 arch/x86/entry/common.c:83
         entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fix this by rejecting ptr alu with variable offset on flow_keys.
      Applying the patch rejects the program with "R7 pointer arithmetic
      on flow_keys prohibited".
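
      The rule can be illustrated with a small userspace C sketch. The type
      and function names are hypothetical and only the flow_keys case is
      modeled; the -EACCES return value is an assumption. The offset is
      represented as a (value, mask) pair like the var_off shown in the log
      above, where a zero mask means the offset is a known constant.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Tracked number: bits set in mask are unknown, the rest equal value. */
struct tnum_sketch {
	uint64_t value;
	uint64_t mask;
};

enum ptr_kind_sketch {
	SKETCH_PTR_OTHER,		/* other kinds are not modeled here */
	SKETCH_PTR_TO_FLOW_KEYS,
};

static bool tnum_sketch_is_const(struct tnum_sketch t)
{
	return t.mask == 0;
}

/* Reject pointer += offset on flow_keys unless the offset is a constant. */
static int check_ptr_add(enum ptr_kind_sketch kind, struct tnum_sketch off)
{
	if (kind == SKETCH_PTR_TO_FLOW_KEYS && !tnum_sketch_is_const(off))
		return -EACCES;	/* assumed error code */
	return 0;
}
```

      In the program above, r8 has var_off=(0x0; 0x400), so its mask is
      non-zero and the addition at instruction 5 would be refused by this
      check.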
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Signed-off-by: Hao Sun <sunhao.th@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/bpf/20240115082028.9992-1-sunhao.th@gmail.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1b500d5d
    • Qiang Ma's avatar
      net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls · 7a4d71c9
      Qiang Ma authored
      
      [ Upstream commit a23aa04042187cbde16f470b49d4ad60d32e9206 ]
      
      We found the following dmesg calltrace when testing the GMAC NIC notebook:
      
      [9.448656] ------------[ cut here ]------------
      [9.448658] Unbalanced IRQ 43 wake disable
      [9.448673] WARNING: CPU: 3 PID: 1083 at kernel/irq/manage.c:688 irq_set_irq_wake+0xe0/0x128
      [9.448717] CPU: 3 PID: 1083 Comm: ethtool Tainted: G           O      4.19 #1
      [9.448773]         ...
      [9.448774] Call Trace:
      [9.448781] [<9000000000209b5c>] show_stack+0x34/0x140
      [9.448788] [<9000000000d52700>] dump_stack+0x98/0xd0
      [9.448794] [<9000000000228610>] __warn+0xa8/0x120
      [9.448797] [<9000000000d2fb60>] report_bug+0x98/0x130
      [9.448800] [<900000000020a418>] do_bp+0x248/0x2f0
      [9.448805] [<90000000002035f4>] handle_bp_int+0x4c/0x78
      [9.448808] [<900000000029ea40>] irq_set_irq_wake+0xe0/0x128
      [9.448813] [<9000000000a96a7c>] stmmac_set_wol+0x134/0x150
      [9.448819] [<9000000000be6ed0>] dev_ethtool+0x1368/0x2440
      [9.448824] [<9000000000c08350>] dev_ioctl+0x1f8/0x3e0
      [9.448827] [<9000000000bb2a34>] sock_ioctl+0x2a4/0x450
      [9.448832] [<900000000046f044>] do_vfs_ioctl+0xa4/0x738
      [9.448834] [<900000000046f778>] ksys_ioctl+0xa0/0xe8
      [9.448837] [<900000000046f7d8>] sys_ioctl+0x18/0x28
      [9.448840] [<9000000000211ab4>] syscall_common+0x20/0x34
      [9.448842] ---[ end trace 40c18d9aec863c3e ]---
      
      Multiple disable_irq_wake() calls keep decreasing the IRQ wake_depth.
      When wake_depth is already 0, calling disable_irq_wake() again reports
      the above calltrace.

      Because the two calls must be paired, we cannot call disable_irq_wake()
      without a prior enable_irq_wake(). Fix this by making sure there are
      no unbalanced disable_irq_wake() calls.
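
      One way to keep the calls balanced is to remember whether a wake
      reference is currently held. The userspace C model below is a sketch
      under that assumption; all names (struct irq_sketch, struct wol_sketch,
      sketch_set_wol() and the sketch_* wake helpers) are hypothetical and do
      not come from the driver.

```c
#include <stdbool.h>

/* Hypothetical model of an IRQ's wake reference count. */
struct irq_sketch {
	int wake_depth;
	int unbalanced_warnings;
};

static void sketch_enable_irq_wake(struct irq_sketch *irq)
{
	irq->wake_depth++;
}

static void sketch_disable_irq_wake(struct irq_sketch *irq)
{
	if (irq->wake_depth == 0) {
		/* Models the "Unbalanced IRQ wake disable" warning. */
		irq->unbalanced_warnings++;
		return;
	}
	irq->wake_depth--;
}

/* Driver-side state: do we currently hold a wake reference? */
struct wol_sketch {
	struct irq_sketch *irq;
	bool wake_enabled;
};

/* Touch the wake count only when the requested state actually changes. */
static void sketch_set_wol(struct wol_sketch *wol, bool enable)
{
	if (enable && !wol->wake_enabled) {
		sketch_enable_irq_wake(wol->irq);
		wol->wake_enabled = true;
	} else if (!enable && wol->wake_enabled) {
		sketch_disable_irq_wake(wol->irq);
		wol->wake_enabled = false;
	}
}
```

      Turning wake-on-LAN off repeatedly then leaves wake_depth at 0 without
      ever reaching the warning path.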
      
      Fixes: 3172d3af ("stmmac: support wake up irq from external sources (v3)")
      Signed-off-by: Qiang Ma <maqianga@uniontech.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240112021249.24598-1-maqianga@uniontech.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7a4d71c9
    • Benjamin Poirier's avatar
      selftests: bonding: Change script interpreter · 2b973b5b
      Benjamin Poirier authored
      
      [ Upstream commit c2518da8e6b0e248cfff1d4b6682e14020bd4d3f ]
      
      The tests changed by this patch, as well as the scripts they source, use
      features which are not part of POSIX sh (e.g. 'source' and 'local'). As a
      result, these tests fail when /bin/sh is dash, as on Debian. Change the
      interpreter to bash so that these tests can run successfully.
      
      Fixes: d43eff0b ("selftests: bonding: up/down delay w/ slave link flapping")
      Tested-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      2b973b5b
    • Alex Deucher's avatar
      drm/amdgpu: fall back to INPUT power for AVG power via INFO IOCTL · 1db180f5
      Alex Deucher authored

      [ Upstream commit d02069850fc102b07ae923535d5e212f2c8a34e9 ]
      
      For backwards compatibility with userspace.
      
      Fixes: 47f1724d ("drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`")
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2897
      Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1db180f5
    • Dafna Hirschfeld's avatar
      drm/amdkfd: fixes for HMM mem allocation · 1b37284a
      Dafna Hirschfeld authored
      
      [ Upstream commit 02eed83abc1395a1207591aafad9bcfc5cb1abcb ]
      
      Fix err return value and reset pgmap->type after checking it.
      
      Fixes: c83dee9b ("drm/amdkfd: add SPM support for SVM")
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
      Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      1b37284a