  1. Jan 25, 2024
    • mlxsw: spectrum_acl_tcam: Fix stack corruption · a361c2c1
      Ido Schimmel authored
      
      [ Upstream commit 483ae90d8f976f8339cf81066312e1329f2d3706 ]
      
      When tc filters are first added to a net device, the corresponding local
      port gets bound to an ACL group in the device. The group contains a list
      of ACLs. In turn, each ACL points to a different TCAM region where the
      filters are stored. During forwarding, the ACLs are sequentially
      evaluated until a match is found.
      
      One reason to place filters in different regions is when they are added
      with decreasing priorities and in an alternating order so that two
      consecutive filters can never fit in the same region because of their
      key usage.
      
      In Spectrum-2 and newer ASICs the firmware started to report that the
      maximum number of ACLs in a group is more than 16, but the layout of the
      register that configures ACL groups (PAGT) was not updated to account
      for that. It is therefore possible to hit stack corruption [1] in the
      rare case where more than 16 ACLs in a group are required.
      
      Fix by limiting the maximum ACL group size to the minimum between what
      the firmware reports and the maximum ACLs that fit in the PAGT register.
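
      The clamping described above can be sketched in plain C as follows.
      This is a minimal userspace illustration, not the driver code: the
      macro PAGT_ACL_SLOTS (taken as 16 from the text above) and the helper
      acl_group_max_size() are hypothetical names.

```c
/* Assumption: the PAGT register layout has room for 16 ACLs, as implied by
 * the commit message. The macro name is hypothetical. */
#define PAGT_ACL_SLOTS 16u

/* Return the usable ACL group size: never more than the register can hold,
 * even if the firmware reports a larger maximum. */
static unsigned int acl_group_max_size(unsigned int fw_reported_max)
{
    return fw_reported_max < PAGT_ACL_SLOTS ? fw_reported_max : PAGT_ACL_SLOTS;
}
```

      With this clamp, a firmware-reported maximum of 32 yields a group
      size of 16, while a smaller reported value is used unchanged.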
      
      Add a test case to make sure the machine does not crash when this
      condition is hit.
      
      [1]
      Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: mlxsw_sp_acl_tcam_group_update+0x116/0x120
      [...]
       dump_stack_lvl+0x36/0x50
       panic+0x305/0x330
       __stack_chk_fail+0x15/0x20
       mlxsw_sp_acl_tcam_group_update+0x116/0x120
       mlxsw_sp_acl_tcam_group_region_attach+0x69/0x110
       mlxsw_sp_acl_tcam_vchunk_get+0x492/0xa20
       mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
       mlxsw_sp_acl_rule_add+0x47/0x240
       mlxsw_sp_flower_replace+0x1a9/0x1d0
       tc_setup_cb_add+0xdc/0x1c0
       fl_hw_replace_filter+0x146/0x1f0
       fl_change+0xc17/0x1360
       tc_new_tfilter+0x472/0xb90
       rtnetlink_rcv_msg+0x313/0x3b0
       netlink_rcv_skb+0x58/0x100
       netlink_unicast+0x244/0x390
       netlink_sendmsg+0x1e4/0x440
       ____sys_sendmsg+0x164/0x260
       ___sys_sendmsg+0x9a/0xe0
       __sys_sendmsg+0x7a/0xc0
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fixes: c3ab4354 ("mlxsw: spectrum: Extend to support Spectrum-2 ASIC")
      Reported-by: Orel Hagag <orelh@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/2d91c89afba59c22587b444994ae419dbea8d876.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: spectrum_acl_tcam: Fix NULL pointer dereference in error path · d0a1efe4
      Ido Schimmel authored
      
      [ Upstream commit efeb7dfea8ee10cdec11b6b6ba4e405edbe75809 ]
      
      When calling mlxsw_sp_acl_tcam_region_destroy() from an error path after
      failing to attach the region to an ACL group, we hit a NULL pointer
      dereference upon 'region->group->tcam' [1].
      
      Fix by retrieving the 'tcam' pointer using mlxsw_sp_acl_to_tcam().
      
      [1]
      BUG: kernel NULL pointer dereference, address: 0000000000000000
      [...]
      RIP: 0010:mlxsw_sp_acl_tcam_region_destroy+0xa0/0xd0
      [...]
      Call Trace:
       mlxsw_sp_acl_tcam_vchunk_get+0x88b/0xa20
       mlxsw_sp_acl_tcam_ventry_add+0x25/0xe0
       mlxsw_sp_acl_rule_add+0x47/0x240
       mlxsw_sp_flower_replace+0x1a9/0x1d0
       tc_setup_cb_add+0xdc/0x1c0
       fl_hw_replace_filter+0x146/0x1f0
       fl_change+0xc17/0x1360
       tc_new_tfilter+0x472/0xb90
       rtnetlink_rcv_msg+0x313/0x3b0
       netlink_rcv_skb+0x58/0x100
       netlink_unicast+0x244/0x390
       netlink_sendmsg+0x1e4/0x440
       ____sys_sendmsg+0x164/0x260
       ___sys_sendmsg+0x9a/0xe0
       __sys_sendmsg+0x7a/0xc0
       do_syscall_64+0x40/0xe0
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fixes: 22a67766 ("mlxsw: spectrum: Introduce ACL core with simple TCAM implementation")
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Reviewed-by: Amit Cohen <amcohen@nvidia.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/fb6a4542bbc9fcab5a523802d97059bffbca7126.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • mlxsw: spectrum_acl_erp: Fix error flow of pool allocation failure · 1a720f3e
      Amit Cohen authored
      
      [ Upstream commit 6d6eeabcfaba2fcadf5443b575789ea606f9de83 ]
      
      Recently, a bug was found when many TC filters are added - at some
      point, several bugs are printed to dmesg [1] and the switch crashes
      with a segmentation fault.
      
      The issue starts when gen_pool_free() fails because of unexpected
      behavior - an attempt to free memory that has already been freed. This
      leads to a BUG() call, which crashes the switch and triggers many other
      bugs.
      
      Tracking down the unexpected behavior led to a bug in the eRP code. The
      function mlxsw_sp_acl_erp_table_alloc() gets a pointer to the allocated
      index, sets the value and returns an error code. When gen_pool_alloc()
      fails it returns address 0; we detect this and return -ENOBUFS, but by
      then the index in the erp_table structure has already been overwritten.
      This is a problem when the allocation is done as part of table
      expansion: unlike a new table, which is simply not used if allocation
      fails, an existing table keeps being used. Expanding the eRP table
      overwrites the current (non-zero) index with zero, which then leads to
      unexpected behavior when address 0 is freed twice. Note that address 0
      is valid in erp_table->base_index and indeed other tables use it.
      
      gen_pool_alloc() fails when there is no space left in the
      pre-allocated pool. In our case, the pool is limited to
      ACL_MAX_ERPT_BANK_SIZE, which is read from hardware. When more than the
      maximum number of eRP entries are required, we exceed the limit and
      return an error, which leads to the "Failed to migrate vregion" print.
      
      Fix this by changing erp_table->base_index only in case of a successful
      allocation.
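
      The pattern behind this fix can be sketched in userspace C. This is
      an illustration under stated assumptions, not the driver code:
      toy_pool, toy_pool_alloc(), table_alloc() and POOL_OFFSET are all
      hypothetical names, and the toy allocator only mimics the convention
      that a return value of 0 means failure.

```c
#include <errno.h>

/* Hypothetical bias so that a returned address of 0 can mean "failure"
 * while index 0 remains a valid table index. */
#define POOL_OFFSET 0x100UL

struct toy_pool {
    unsigned long next; /* first free index */
    unsigned long end;  /* one past the last index */
};

/* Return a biased address, or 0 when the pool is exhausted. */
static unsigned long toy_pool_alloc(struct toy_pool *pool, unsigned long size)
{
    unsigned long addr;

    if (size > pool->end - pool->next)
        return 0;
    addr = pool->next + POOL_OFFSET;
    pool->next += size;
    return addr;
}

/* Write *p_index only after the allocation is known to have succeeded, so
 * a caller that passes a pointer to a live index keeps its old value on
 * failure. */
static int table_alloc(struct toy_pool *pool, unsigned long size,
                       unsigned long *p_index)
{
    unsigned long addr = toy_pool_alloc(pool, size);

    if (!addr)
        return -ENOBUFS;
    *p_index = addr - POOL_OFFSET;
    return 0;
}
```

      The key design point is the local variable: the result is staged in
      addr and copied to the caller's storage only on the success path.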
      
      Add a test case for such a scenario. Without this fix it causes a
      segmentation fault:
      
      $ TESTS="max_erp_entries_test" ./tc_flower.sh
      ./tc_flower.sh: line 988:  1560 Segmentation fault      tc filter del dev $h2 ingress chain $i protocol ip pref $i handle $j flower &>/dev/null
      
      [1]:
      kernel BUG at lib/genalloc.c:508!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 6 PID: 3531 Comm: tc Not tainted 6.7.0-rc5-custom-ga6893f479f5e #1
      Hardware name: Mellanox Technologies Ltd. MSN4700/VMOD0010, BIOS 5.11 07/12/2021
      RIP: 0010:gen_pool_free_owner+0xc9/0xe0
      ...
      Call Trace:
       <TASK>
       __mlxsw_sp_acl_erp_table_other_dec+0x70/0xa0 [mlxsw_spectrum]
       mlxsw_sp_acl_erp_mask_destroy+0xf5/0x110 [mlxsw_spectrum]
       objagg_obj_root_destroy+0x18/0x80 [objagg]
       objagg_obj_destroy+0x12c/0x130 [objagg]
       mlxsw_sp_acl_erp_mask_put+0x37/0x50 [mlxsw_spectrum]
       mlxsw_sp_acl_ctcam_region_entry_remove+0x74/0xa0 [mlxsw_spectrum]
       mlxsw_sp_acl_ctcam_entry_del+0x1e/0x40 [mlxsw_spectrum]
       mlxsw_sp_acl_tcam_ventry_del+0x78/0xd0 [mlxsw_spectrum]
       mlxsw_sp_flower_destroy+0x4d/0x70 [mlxsw_spectrum]
       mlxsw_sp_flow_block_cb+0x73/0xb0 [mlxsw_spectrum]
       tc_setup_cb_destroy+0xc1/0x180
       fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
       __fl_delete+0x1ac/0x1c0 [cls_flower]
       fl_destroy+0xc2/0x150 [cls_flower]
       tcf_proto_destroy+0x1a/0xa0
      ...
      mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
      mlxsw_spectrum3 0000:07:00.0: Failed to migrate vregion
      
      Fixes: f465261a ("mlxsw: spectrum_acl: Implement common eRP core")
      Signed-off-by: Amit Cohen <amcohen@nvidia.com>
      Signed-off-by: Ido Schimmel <idosch@nvidia.com>
      Signed-off-by: Petr Machata <petrm@nvidia.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Link: https://lore.kernel.org/r/4cfca254dfc0e5d283974801a24371c7b6db5989.1705502064.git.petrm@nvidia.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      loop: fix the the direct I/O support check when used on top of block devices · df4bb784
      Christoph Hellwig authored
      
      [ Upstream commit baa7d536077dcdfe2b70c476a8873d1745d3de0f ]
      
      __loop_update_dio only checks the alignment requirement for block backed
      file systems, but misses them for the case where the loop device is
      created directly on top of another block device.  Due to this, creating
      a loop device with default options plus the direct I/O flag on a > 512 byte
      sector size file system will lead to incorrect I/O being submitted to the
      lower block device and a lot of errors from the block layer.  This can
      be seen with xfstests generic/563.
      
      Fix the code in __loop_update_dio by factoring the alignment check into
      a helper, and calling that also for the struct block_device of a block
      device inode.
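
      One plausible form of such an alignment helper is sketched below in
      userspace C. The function and parameter names are hypothetical, and
      the two conditions shown (block size and offset alignment) are an
      assumption about what the check covers, not a quote of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. Direct I/O is treated as safe only when the loop
 * device's logical block size is at least the backing device's, and the
 * loop offset is aligned to the backing block size (assumed to be a power
 * of two). */
static bool loop_can_use_dio(unsigned int loop_bsize,
                             unsigned int backing_bsize,
                             uint64_t loop_offset)
{
    if (loop_bsize < backing_bsize)
        return false;
    if (loop_offset & (backing_bsize - 1))
        return false;
    return true;
}
```

      Because the helper takes plain sizes, the same check can be applied
      whether the backing sizes come from a file system or directly from a
      block device.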
      
      Also remove the TODO comment talking about dynamically switching between
      buffered and direct I/O, which would be a recipe for horrible
      performance and occasional data loss.
      
      Fixes: 2e5ab5f3 ("block: loop: prepare for supporing direct IO")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20240117175901.871796-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      ethtool: netlink: Add missing ethnl_ops_begin/complete · 21d86d37
      Ludvig Pärsson authored
      
      [ Upstream commit f1172f3ee3a98754d95b968968920a7d03fdebcc ]
      
      Accessing an ethernet device that is powered off or clock gated might
      cause the CPU to hang. Add ethnl_ops_begin/complete in
      ethnl_set_features() to protect against this.
      
      Fixes: 0980bfcd ("ethtool: set netdev features with FEATURES_SET request")
      Signed-off-by: Ludvig Pärsson <ludvig.parsson@axis.com>
      Link: https://lore.kernel.org/r/20240117-etht2-v2-1-1a96b6e8c650@axis.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • arm64/ptrace: Don't flush ZA/ZT storage when writing ZA via ptrace · ae836b7f
      Mark Brown authored
      
      [ Upstream commit b7c510d049049409e8945b932f4b0b357fa17415 ]
      
      When writing ZA we currently unconditionally flush the buffer used to store
      it as part of ensuring that it is allocated. Since this buffer is shared
      with ZT0 this means that a write to ZA when PSTATE.ZA is already set will
      corrupt the value of ZT0 on a SME2 system. Fix this by only flushing the
      backing storage if PSTATE.ZA was not previously set.
      
      This will mean that short or failed writes may leave stale data in the
      buffer. This seems as correct as our current behaviour and is unlikely
      to be something that userspace will rely on.
      
      Fixes: f90b529b ("arm64/sme: Implement ZT0 ptrace support")
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Link: https://lore.kernel.org/r/20240115-arm64-fix-ptrace-za-zt-v1-1-48617517028a@kernel.org
      Signed-off-by: Will Deacon <will@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • kdb: Fix a potential buffer overflow in kdb_local() · 4daed382
      Christophe JAILLET authored
      
      [ Upstream commit 4f41d30cd6dc865c3cbc1a852372321eba6d4e4c ]
      
      When appending "[defcmd]" to 'kdb_prompt_str', the size of the string
      already in the buffer should be taken into account.
      
      An option could be to switch from strncat() to strlcat() which does the
      correct test to avoid such an overflow.
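
      The general pitfall can be illustrated in userspace C: the third
      argument of strncat() bounds the number of characters appended, not
      the total buffer size. The helper name append_bounded() is
      hypothetical and this sketch is not the kdb code.

```c
#include <string.h>

/* Append src to the NUL-terminated string in dst without writing past
 * dst_size bytes. The length already in use, plus one byte for the
 * terminating NUL, is subtracted before calling strncat(). */
static void append_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);

    if (used + 1 >= dst_size)
        return; /* no room left for even one more character */
    strncat(dst, src, dst_size - used - 1);
}
```

      Passing dst_size directly as the strncat() bound would allow up to
      dst_size extra characters on top of what the buffer already holds,
      which is the kind of overflow described above.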
      
      However, this actually looks like dead code, because
      'defcmd_in_progress' can't be true here.
      See a more detailed explanation at [1].
      
      [1]: https://lore.kernel.org/all/CAD=FV=WSh7wKN7Yp-3wWiDgX4E3isQ8uh0LCzTmd1v9Cg9j+nQ@mail.gmail.com/
      
      
      
      Fixes: 5d5314d6 ("kdb: core for kgdb back end (1 of 2)")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Reviewed-by: Douglas Anderson <dianders@chromium.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      io_uring: adjust defer tw counting · e24bf5b4
      Pavel Begunkov authored
      
      [ Upstream commit dc12d1799ce710fd90abbe0ced71e7e1ae0894fc ]
      
      The UINT_MAX work item counting bias in io_req_local_work_add() in case
      of !IOU_F_TWQ_LAZY_WAKE works in the sense that we will not miss a
      wake up, however it's still eerie. In particular, if we add a lazy work
      item after a non-lazy one, we'll increment it and get nr_tw==0, and
      subsequent adds may try to unnecessarily wake up the task, although
      that is not likely to happen in real workloads.
      
      Halve the bias. It's still large enough to be larger than any valid
      ->cq_wait_nr, which is limited by IORING_MAX_CQ_ENTRIES, while leaving
      plenty of room before it overflows.
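
      The wrap-around can be checked with a few lines of userspace C. This
      only models the arithmetic; the helper name is hypothetical, and
      UINT_MAX / 2 is used as an assumed stand-in for the halved bias, since
      the exact constant is not stated here.

```c
#include <limits.h>

/* Illustrative only: the counter value after one extra lazy work item is
 * added on top of a forced (non-lazy) item that set the counter to `bias`.
 * Unsigned arithmetic wraps modulo UINT_MAX + 1. */
static unsigned int count_after_lazy_add(unsigned int bias)
{
    return bias + 1u;
}
```

      With a bias of UINT_MAX the next increment wraps to 0, whereas with
      half of that the counter stays far above any small wait threshold and
      still has about UINT_MAX / 2 increments left before wrapping.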
      
      Fixes: 8751d154 ("io_uring: reduce scheduling due to tw")
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/108b971e958deaf7048342930c341ba90f75d806.1705438669.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      ipvs: avoid stat macros calls from preemptible context · c149cc7c
      Fedor Pchelkin authored
      
      [ Upstream commit d6938c1c76c64f42363d0d1f051e1b4641c2ad40 ]
      
      Inside decrement_ttl(), upon discovering that the packet TTL has been
      exceeded, the __IP_INC_STATS and __IP6_INC_STATS macros can be called
      from preemptible context, giving the following backtrace:
      
      check_preemption_disabled: 48 callbacks suppressed
      BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177
      caller is decrement_ttl+0x217/0x830
      CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ #34
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0xbd/0xe0
       check_preemption_disabled+0xd1/0xe0
       decrement_ttl+0x217/0x830
       __ip_vs_get_out_rt+0x4e0/0x1ef0
       ip_vs_nat_xmit+0x205/0xcd0
       ip_vs_in_hook+0x9b1/0x26a0
       nf_hook_slow+0xc2/0x210
       nf_hook+0x1fb/0x770
       __ip_local_out+0x33b/0x640
       ip_local_out+0x2a/0x490
       __ip_queue_xmit+0x990/0x1d10
       __tcp_transmit_skb+0x288b/0x3d10
       tcp_connect+0x3466/0x5180
       tcp_v4_connect+0x1535/0x1bb0
       __inet_stream_connect+0x40d/0x1040
       inet_stream_connect+0x57/0xa0
       __sys_connect_file+0x162/0x1a0
       __sys_connect+0x137/0x160
       __x64_sys_connect+0x72/0xb0
       do_syscall_64+0x6f/0x140
       entry_SYSCALL_64_after_hwframe+0x6e/0x76
      RIP: 0033:0x7fe6dbbc34e0
      
      Use the corresponding preemption-aware variants: IP_INC_STATS and
      IP6_INC_STATS.
      
      Found by Linux Verification Center (linuxtesting.org).
      
      Fixes: 8d8e20e2 ("ipvs: Decrement ttl")
      Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: nf_tables: reject NFT_SET_CONCAT with not field length description · ce2189d3
      Pablo Neira Ayuso authored
      
      [ Upstream commit 113661e07460a6604aacc8ae1b23695a89e7d4b3 ]
      
      It is still possible to set the NFT_SET_CONCAT flag by specifying a
      set size and no field description; report EINVAL in such a case.
      
      Fixes: 1b6345d4 ("netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: nf_tables: skip dead set elements in netlink dump · 9f025447
      Pablo Neira Ayuso authored
      
      [ Upstream commit 6b1ca88e4bb63673dc9f9c7f23c899f22c3cb17a ]
      
      Deletion from the packet path relies on the garbage collector to purge
      elements with NFT_SET_ELEM_DEAD_BIT set.
      
      Skip these dead elements in the nf_tables_dump_setelem() path. I very
      rarely see tests/shell/testcases/maps/typeof_maps_add_delete report
      [DUMP FAILED], showing a mismatch in the expected output with an
      element that should not be there.
      
      If the netlink dump happens before the GC worker runs, it might show
      dead elements in the ruleset listing.
      
      nft_rhash_get() already skips dead elements in nft_rhash_cmp(), so
      such an element is already not shown when getting a single element via
      the netlink control plane.
      
      Fixes: 5f68718b ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: nf_tables: do not allow mismatch field size and set key length · ff67e3e4
      Pablo Neira Ayuso authored
      
      [ Upstream commit 3ce67e3793f48c1b9635beb9bb71116ca1e51b58 ]
      
      The set description provides the size of each field in the set; the
      sum of these sizes must match the set key length, bail out otherwise.
      
      I did not manage to crash nft_set_pipapo with mismatched field sizes
      and set key length so far, but this is UB which must be disallowed.
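
      The consistency rule can be sketched as a small validator in
      userspace C. The function name is hypothetical and the comparison is
      done on raw byte counts, which is an assumption: the kernel may
      compare in other units (for example after rounding each field up to
      a register size), and the error code used here is also assumed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical validator: reject a set description whose per-field sizes
 * do not add up to the declared key length (all sizes in bytes). */
static int check_concat_desc(const unsigned int *field_len,
                             size_t field_count, unsigned int klen)
{
    unsigned int total = 0;
    size_t i;

    for (i = 0; i < field_count; i++)
        total += field_len[i];

    return total == klen ? 0 : -EINVAL;
}
```

      For fields of 4, 2 and 4 bytes, a declared key length of 10 passes
      and any other key length is rejected.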
      
      Fixes: f3a2181e ("netfilter: nf_tables: Support for sets with multiple ranged fields")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: bridge: replace physindev with physinif in nf_bridge_info · 544add1f
      Pavel Tikhomirov authored
      
      [ Upstream commit 9874808878d9eed407e3977fd11fee49de1e1d86 ]
      
      An skb can be added to a neigh->arp_queue while waiting for an ARP
      reply, and the original skb's skb->dev can differ from the neigh's
      neigh->dev. For instance, when bridging a DNATed skb from one veth to
      another, the skb would be added to a neigh->arp_queue of the bridge.
      
      As skb->dev can be reset back to nf_bridge->physindev and used, and as
      there is no explicit mechanism that prevents this physindev from being
      freed under us (for instance neigh_flush_dev doesn't clean up skbs from
      a different device's neigh queue), we can crash on e.g. this stack:
      
      arp_process
        neigh_update
          skb = __skb_dequeue(&neigh->arp_queue)
            neigh_resolve_output(..., skb)
              ...
                br_nf_dev_xmit
                  br_nf_pre_routing_finish_bridge_slow
                    skb->dev = nf_bridge->physindev
                    br_handle_frame_finish
      
      Let's use a plain ifindex instead of a net_device link. To peek into
      the original net_device we will use dev_get_by_index_rcu(). Thus either
      we get the device and are safe to use it, or we don't get it and drop
      the skb.
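
      The idea of storing an index and resolving it at use time can be
      modelled in userspace C. Everything here is hypothetical (toy_dev,
      toy_dev_get_by_index(), toy_restore_indev()); the lookup is a plain
      array scan standing in for an index-based device lookup, and no RCU
      is modelled.

```c
#include <stddef.h>

struct toy_dev {
    int ifindex;
    int registered; /* cleared when the device goes away */
};

/* Hypothetical stand-in for an index-based device lookup: returns NULL
 * when no registered device has the given ifindex. */
static struct toy_dev *toy_dev_get_by_index(struct toy_dev *devs, size_t n,
                                            int ifindex)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (devs[i].registered && devs[i].ifindex == ifindex)
            return &devs[i];
    return NULL;
}

/* Returns 0 and sets *out if the original input device still exists,
 * or -1 if it is gone and the packet has to be dropped. */
static int toy_restore_indev(struct toy_dev *devs, size_t n, int physinif,
                             struct toy_dev **out)
{
    struct toy_dev *dev = toy_dev_get_by_index(devs, n, physinif);

    if (!dev)
        return -1;
    *out = dev;
    return 0;
}
```

      Since only an integer is stored with the packet, a device that has
      been removed in the meantime results in a failed lookup rather than a
      dangling pointer.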
      
      Fixes: c4e70a87 ("netfilter: bridge: rename br_netfilter.c to br_netfilter_hooks.c")
      Suggested-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: propagate net to nf_bridge_get_physindev · eb417043
      Pavel Tikhomirov authored
      
      [ Upstream commit a54e72197037d2c9bfcd70dddaac8c8ccb5b41ba ]
      
      This is a preparation patch for replacing physindev with physinif on
      nf_bridge_info structure. We will use dev_get_by_index_rcu to resolve
      device, when needed, and it requires net to be available.
      
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: 9874808878d9 ("netfilter: bridge: replace physindev with physinif in nf_bridge_info")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: nf_queue: remove excess nf_bridge variable · 10849493
      Pavel Tikhomirov authored
      
      [ Upstream commit aeaa44075f8e49e2e0ad4507d925e690b7950145 ]
      
      We don't really need the nf_bridge variable here, and
      nf_bridge_info_exists is a better replacement for nf_bridge_info_get
      when we are only checking for existence.
      
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: 9874808878d9 ("netfilter: bridge: replace physindev with physinif in nf_bridge_info")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: nfnetlink_log: use proper helper for fetching physinif · 0a12e679
      Pavel Tikhomirov authored
      
      [ Upstream commit c3f9fd54cd87233f53bdf0e191a86b3a5e960e02 ]
      
      We don't use physindev in __build_packet_message except for getting
      physinif from it. So let's switch to nf_bridge_get_physinif to get what
      we want directly.
      
      Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Stable-dep-of: 9874808878d9 ("netfilter: bridge: replace physindev with physinif in nf_bridge_info")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: nft_limit: do not ignore unsupported flags · ae6c0543
      Pablo Neira Ayuso authored
      
      [ Upstream commit 91a139cee1202a4599a380810d93c69b5bac6197 ]
      
      Bail out if userspace provides unsupported flags, otherwise future
      extensions to the limit expression will be silently ignored by the
      kernel.
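
      A common way to express this kind of check is to mask off the known
      flags and reject anything left over. The sketch below is userspace C
      with hypothetical names (TOY_LIMIT_F_INV, toy_limit_check_flags());
      the error code returned is an assumption, as the text above does not
      name one.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag set: bit 0 stands for the one flag this version
 * understands. */
#define TOY_LIMIT_F_INV       (1u << 0)
#define TOY_LIMIT_F_SUPPORTED TOY_LIMIT_F_INV

/* Reject any flag bit outside the supported mask instead of silently
 * ignoring it. */
static int toy_limit_check_flags(uint32_t flags)
{
    if (flags & ~TOY_LIMIT_F_SUPPORTED)
        return -EOPNOTSUPP;
    return 0;
}
```

      When a later version adds a flag, it only has to extend the supported
      mask; older kernels keep rejecting the new bit, so userspace can
      detect the missing feature.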
      
      Fixes: c7862a5f ("netfilter: nft_limit: allow to invert matching criteria")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      netfilter: nf_tables: reject invalid set policy · 7d2d0393
      Pablo Neira Ayuso authored
      
      [ Upstream commit 0617c3de9b4026b87be12b0cb5c35f42c7c66fcb ]
      
      Report -EINVAL in case userspace provides an unsupported set backend
      policy.
      
      Fixes: c50b960c ("netfilter: nf_tables: implement proper set selection")
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      net: netdevsim: don't try to destroy PHC on VFs · c5068e44
      Jakub Kicinski authored
      
      [ Upstream commit ea937f77208323d35ffe2f8d8fc81b00118bfcda ]
      
      PHC gets initialized in nsim_init_netdevsim(), which
      is only called if (nsim_dev_port_is_pf()).
      
      Create a counterpart of nsim_init_netdevsim() and
      move the mock_phc_destroy() there.
      
      This fixes a crash trying to destroy netdevsim with
      VFs instantiated, as caught by running the devlink.sh test:
      
          BUG: kernel NULL pointer dereference, address: 00000000000000b8
          RIP: 0010:mock_phc_destroy+0xd/0x30
          Call Trace:
           <TASK>
           nsim_destroy+0x4a/0x70 [netdevsim]
           __nsim_dev_port_del+0x47/0x70 [netdevsim]
           nsim_dev_reload_destroy+0x105/0x120 [netdevsim]
           nsim_drv_remove+0x2f/0xb0 [netdevsim]
           device_release_driver_internal+0x1a1/0x210
           bus_remove_device+0xd5/0x120
           device_del+0x159/0x490
           device_unregister+0x12/0x30
           del_device_store+0x11a/0x1a0 [netdevsim]
           kernfs_fop_write_iter+0x130/0x1d0
           vfs_write+0x30b/0x4b0
           ksys_write+0x69/0xf0
           do_syscall_64+0xcc/0x1e0
           entry_SYSCALL_64_after_hwframe+0x6f/0x77
      
      Fixes: b63e78fc ("net: netdevsim: use mock PHC driver")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      mptcp: relax check on MPC passive fallback · 77c63a08
      Paolo Abeni authored
      
      [ Upstream commit c0f5aec28edf98906d28f08daace6522adf9ee7a ]
      
      While testing the blamed commit below, I was able to miss (!)
      packetdrill failures in the fastopen test-cases.
      
      On passive fastopen the child socket is created by the incoming TCP MPC
      SYN, so allow for both the MPC_SYN and MPC_ACK headers.
      
      Fixes: 724b00c12957 ("mptcp: refine opt_mp_capable determination")
      Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    •
      LoongArch: BPF: Prevent out-of-bounds memory access · 7924ade1
      Hengqi Chen authored
      
      [ Upstream commit 36a87385e31c9343af9a4756598e704741250a67 ]
      
      The test_tag test triggers an unhandled page fault:
      
        # ./test_tag
        [  130.640218] CPU 0 Unable to handle kernel paging request at virtual address ffff80001b898004, era == 9000000003137f7c, ra == 9000000003139e70
        [  130.640501] Oops[#3]:
        [  130.640553] CPU: 0 PID: 1326 Comm: test_tag Tainted: G      D    O       6.7.0-rc4-loong-devel-gb62ab1a397cf #47 61985c1d94084daa2432f771daa45b56b10d8d2a
        [  130.640764] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 2/2/2022
        [  130.640874] pc 9000000003137f7c ra 9000000003139e70 tp 9000000104cb4000 sp 9000000104cb7a40
        [  130.641001] a0 ffff80001b894000 a1 ffff80001b897ff8 a2 000000006ba210be a3 0000000000000000
        [  130.641128] a4 000000006ba210be a5 00000000000000f1 a6 00000000000000b3 a7 0000000000000000
        [  130.641256] t0 0000000000000000 t1 00000000000007f6 t2 0000000000000000 t3 9000000004091b70
        [  130.641387] t4 000000006ba210be t5 0000000000000004 t6 fffffffffffffff0 t7 90000000040913e0
        [  130.641512] t8 0000000000000005 u0 0000000000000dc0 s9 0000000000000009 s0 9000000104cb7ae0
        [  130.641641] s1 00000000000007f6 s2 0000000000000009 s3 0000000000000095 s4 0000000000000000
        [  130.641771] s5 ffff80001b894000 s6 ffff80001b897fb0 s7 9000000004090c50 s8 0000000000000000
        [  130.641900]    ra: 9000000003139e70 build_body+0x1fcc/0x4988
        [  130.642007]   ERA: 9000000003137f7c build_body+0xd8/0x4988
        [  130.642112]  CRMD: 000000b0 (PLV0 -IE -DA +PG DACF=CC DACM=CC -WE)
        [  130.642261]  PRMD: 00000004 (PPLV0 +PIE -PWE)
        [  130.642353]  EUEN: 00000003 (+FPE +SXE -ASXE -BTE)
        [  130.642458]  ECFG: 00071c1c (LIE=2-4,10-12 VS=7)
        [  130.642554] ESTAT: 00010000 [PIL] (IS= ECode=1 EsubCode=0)
        [  130.642658]  BADV: ffff80001b898004
        [  130.642719]  PRID: 0014c010 (Loongson-64bit, Loongson-3A5000)
        [  130.642815] Modules linked in: [last unloaded: bpf_testmod(O)]
        [  130.642924] Process test_tag (pid: 1326, threadinfo=00000000f7f4015f, task=000000006499f9fd)
        [  130.643062] Stack : 0000000000000000 9000000003380724 0000000000000000 0000000104cb7be8
        [  130.643213]         0000000000000000 25af8d9b6e600558 9000000106250ea0 9000000104cb7ae0
        [  130.643378]         0000000000000000 0000000000000000 9000000104cb7be8 90000000049f6000
        [  130.643538]         0000000000000090 9000000106250ea0 ffff80001b894000 ffff80001b894000
        [  130.643685]         00007ffffb917790 900000000313ca94 0000000000000000 0000000000000000
        [  130.643831]         ffff80001b894000 0000000000000ff7 0000000000000000 9000000100468000
        [  130.643983]         0000000000000000 0000000000000000 0000000000000040 25af8d9b6e600558
        [  130.644131]         0000000000000bb7 ffff80001b894048 0000000000000000 0000000000000000
        [  130.644276]         9000000104cb7be8 90000000049f6000 0000000000000090 9000000104cb7bdc
        [  130.644423]         ffff80001b894000 0000000000000000 00007ffffb917790 90000000032acfb0
        [  130.644572]         ...
        [  130.644629] Call Trace:
        [  130.644641] [<9000000003137f7c>] build_body+0xd8/0x4988
        [  130.644785] [<900000000313ca94>] bpf_int_jit_compile+0x228/0x4ec
        [  130.644891] [<90000000032acfb0>] bpf_prog_select_runtime+0x158/0x1b0
        [  130.645003] [<90000000032b3504>] bpf_prog_load+0x760/0xb44
        [  130.645089] [<90000000032b6744>] __sys_bpf+0xbb8/0x2588
        [  130.645175] [<90000000032b8388>] sys_bpf+0x20/0x2c
        [  130.645259] [<9000000003f6ab38>] do_syscall+0x7c/0x94
        [  130.645369] [<9000000003121c5c>] handle_syscall+0xbc/0x158
        [  130.645507]
        [  130.645539] Code: 380839f6  380831f9  28412bae <24000ca6> 004081ad  0014cb50  004083e8  02bff34c  58008e91
        [  130.645729]
        [  130.646418] ---[ end trace 0000000000000000 ]---
      
      On my machine, which has CONFIG_PAGE_SIZE_16KB=y, the test failed at
      loading a BPF prog with 2039 instructions:
      
        prog = (struct bpf_prog *)ffff80001b894000
        insn = (struct bpf_insn *)(prog->insnsi)ffff80001b894048
        insn + 2039 = (struct bpf_insn *)ffff80001b898000 <- end of the page
      
      The build_insn() function accesses the next instruction
      unconditionally, i.e. `(insn + 1)->imm`. For the last instruction this
      address lies in the next page, which may not be owned by the current
      process, so a page fault is inevitable, followed by a segfault.
      
      So, let's access the next instruction only in the `dst = imm64` context.
      
      With this fix, we have:
      
        # ./test_tag
        test_tag: OK (40945 tests)
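
The guarded access described in this commit message can be illustrated with a minimal userspace sketch in C. The `struct insn` layout, the `OP_LD_IMM64` name and the `insn_imm64()` helper are assumptions made for this example and are not taken from the LoongArch JIT; only the `(insn + 1)->imm` access and the `dst = imm64` condition come from the text above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a BPF instruction slot (8 bytes). */
struct insn {
    uint8_t code;   /* opcode */
    uint8_t regs;   /* dst/src registers, packed; unused in this sketch */
    int16_t off;
    int32_t imm;
};

/* Opcode of the two-slot "dst = imm64" load (BPF_LD | BPF_IMM | BPF_DW). */
#define OP_LD_IMM64 0x18

/*
 * Hypothetical helper: build the 64-bit immediate only when the current
 * instruction really is "dst = imm64". For every other opcode the second
 * slot is never dereferenced, so the last instruction of a program cannot
 * cause a read past the end of the instruction array.
 */
static bool insn_imm64(const struct insn *insn, uint64_t *out)
{
    if (insn->code != OP_LD_IMM64)
        return false;   /* single-slot instruction: do not touch insn + 1 */

    *out = ((uint64_t)(uint32_t)(insn + 1)->imm << 32) | (uint32_t)insn->imm;
    return true;
}
```

A caller would invoke `insn_imm64()` from the opcode dispatch and skip the second slot only when it returns true.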
      
      Fixes: bbfddb90 ("LoongArch: BPF: Avoid declare variables in switch-case")
      Tested-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
      Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      7924ade1
    • net: dsa: vsc73xx: Add null pointer check to vsc73xx_gpio_probe · 91f9ecae
      Kunwu Chan authored
      
      [ Upstream commit 776dac5a662774f07a876b650ba578d0a62d20db ]
      
      devm_kasprintf() returns a pointer to dynamically allocated memory
      which can be NULL upon failure.
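
The kind of check this commit adds can be sketched in userspace C. Everything here is hypothetical: `label_alloc()` stands in for an allocating formatter such as devm_kasprintf(), the injectable `alloc` callback exists only so the failure path can be exercised, and the "VSC%s" format is an illustration rather than the driver's actual label format.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for an allocating printf: returns NULL on failure. */
static char *label_alloc(void *(*alloc)(size_t), const char *name)
{
    int len = snprintf(NULL, 0, "VSC%s", name);
    char *buf;

    if (len < 0)
        return NULL;
    buf = alloc((size_t)len + 1);
    if (buf)
        snprintf(buf, (size_t)len + 1, "VSC%s", name);
    return buf;
}

struct chip {
    char *label;
};

/* Probe-style function: propagate -ENOMEM instead of keeping a NULL label. */
static int chip_probe(struct chip *c, void *(*alloc)(size_t), const char *name)
{
    c->label = label_alloc(alloc, name);
    if (!c->label)
        return -ENOMEM;
    return 0;
}

/* Allocator that always fails, used to exercise the error path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```

Calling `chip_probe(&c, malloc, "7385")` succeeds and sets the label, while passing `failing_alloc` makes it return -ENOMEM.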
      
      Fixes: 05bd97fc ("net: dsa: Add Vitesse VSC73xx DSA router driver")
      Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240111072018.75971-1-chentao@kylinos.cn
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: Reject variable offset alu on PTR_TO_FLOW_KEYS · 1b500d5d
      Hao Sun authored
      
      [ Upstream commit 22c7fa171a02d310e3a3f6ed46a698ca8a0060ed ]
      
      For PTR_TO_FLOW_KEYS, check_flow_keys_access() only uses fixed off
      for validation. However, variable offset ptr alu is not prohibited
      for this ptr kind. So the variable offset is not checked.
      
      The following prog is accepted:
      
        func#0 @0
        0: R1=ctx() R10=fp0
        0: (bf) r6 = r1                       ; R1=ctx() R6_w=ctx()
        1: (79) r7 = *(u64 *)(r6 +144)        ; R6_w=ctx() R7_w=flow_keys()
        2: (b7) r8 = 1024                     ; R8_w=1024
        3: (37) r8 /= 1                       ; R8_w=scalar()
        4: (57) r8 &= 1024                    ; R8_w=scalar(smin=smin32=0,
        smax=umax=smax32=umax32=1024,var_off=(0x0; 0x400))
        5: (0f) r7 += r8
        mark_precise: frame0: last_idx 5 first_idx 0 subseq_idx -1
        mark_precise: frame0: regs=r8 stack= before 4: (57) r8 &= 1024
        mark_precise: frame0: regs=r8 stack= before 3: (37) r8 /= 1
        mark_precise: frame0: regs=r8 stack= before 2: (b7) r8 = 1024
        6: R7_w=flow_keys(smin=smin32=0,smax=umax=smax32=umax32=1024,var_off
        =(0x0; 0x400)) R8_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=1024,
        var_off=(0x0; 0x400))
        6: (79) r0 = *(u64 *)(r7 +0)          ; R0_w=scalar()
        7: (95) exit
      
      This prog loads flow_keys to r7, and adds the variable offset r8
      to r7, and finally causes out-of-bounds access:
      
        BUG: unable to handle page fault for address: ffffc90014c80038
        [...]
        Call Trace:
         <TASK>
         bpf_dispatcher_nop_func include/linux/bpf.h:1231 [inline]
         __bpf_prog_run include/linux/filter.h:651 [inline]
         bpf_prog_run include/linux/filter.h:658 [inline]
         bpf_prog_run_pin_on_cpu include/linux/filter.h:675 [inline]
         bpf_flow_dissect+0x15f/0x350 net/core/flow_dissector.c:991
         bpf_prog_test_run_flow_dissector+0x39d/0x620 net/bpf/test_run.c:1359
         bpf_prog_test_run kernel/bpf/syscall.c:4107 [inline]
         __sys_bpf+0xf8f/0x4560 kernel/bpf/syscall.c:5475
         __do_sys_bpf kernel/bpf/syscall.c:5561 [inline]
         __se_sys_bpf kernel/bpf/syscall.c:5559 [inline]
         __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:5559
         do_syscall_x64 arch/x86/entry/common.c:52 [inline]
         do_syscall_64+0x3f/0x110 arch/x86/entry/common.c:83
         entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Fix this by rejecting ptr alu with variable offset on flow_keys.
      Applying the patch rejects the program with "R7 pointer arithmetic
      on flow_keys prohibited".
      
      Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook")
      Signed-off-by: Hao Sun <sunhao.th@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Link: https://lore.kernel.org/bpf/20240115082028.9992-1-sunhao.th@gmail.com
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: ethtool: Fixed calltrace caused by unbalanced disable_irq_wake calls · 7a4d71c9
      Qiang Ma authored
      
      [ Upstream commit a23aa04042187cbde16f470b49d4ad60d32e9206 ]
      
      We found the following calltrace in dmesg when testing a notebook with a GMAC NIC:
      
      [9.448656] ------------[ cut here ]------------
      [9.448658] Unbalanced IRQ 43 wake disable
      [9.448673] WARNING: CPU: 3 PID: 1083 at kernel/irq/manage.c:688 irq_set_irq_wake+0xe0/0x128
      [9.448717] CPU: 3 PID: 1083 Comm: ethtool Tainted: G           O      4.19 #1
      [9.448773]         ...
      [9.448774] Call Trace:
      [9.448781] [<9000000000209b5c>] show_stack+0x34/0x140
      [9.448788] [<9000000000d52700>] dump_stack+0x98/0xd0
      [9.448794] [<9000000000228610>] __warn+0xa8/0x120
      [9.448797] [<9000000000d2fb60>] report_bug+0x98/0x130
      [9.448800] [<900000000020a418>] do_bp+0x248/0x2f0
      [9.448805] [<90000000002035f4>] handle_bp_int+0x4c/0x78
      [9.448808] [<900000000029ea40>] irq_set_irq_wake+0xe0/0x128
      [9.448813] [<9000000000a96a7c>] stmmac_set_wol+0x134/0x150
      [9.448819] [<9000000000be6ed0>] dev_ethtool+0x1368/0x2440
      [9.448824] [<9000000000c08350>] dev_ioctl+0x1f8/0x3e0
      [9.448827] [<9000000000bb2a34>] sock_ioctl+0x2a4/0x450
      [9.448832] [<900000000046f044>] do_vfs_ioctl+0xa4/0x738
      [9.448834] [<900000000046f778>] ksys_ioctl+0xa0/0xe8
      [9.448837] [<900000000046f7d8>] sys_ioctl+0x18/0x28
      [9.448840] [<9000000000211ab4>] syscall_common+0x20/0x34
      [9.448842] ---[ end trace 40c18d9aec863c3e ]---
      
      Each disable_irq_wake() call decrements the IRQ's wake_depth. When
      wake_depth is already 0, calling disable_irq_wake() again triggers
      the above calltrace.
      
      Because the calls must be paired, we cannot call disable_irq_wake()
      without a preceding enable_irq_wake(). Fix this by making sure there
      are no unbalanced disable_irq_wake() calls.
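
One way to keep such calls balanced is to remember in the driver whether wake is currently enabled. The following is a toy userspace model under that assumption; `irq_model`, `nic_model` and `nic_set_wol()` are hypothetical names and do not reproduce the stmmac code.

```c
#include <stdbool.h>

/* Toy model of an IRQ's wake reference count (wake_depth in the text). */
struct irq_model {
    unsigned int wake_depth;
    unsigned int warnings;  /* counts "Unbalanced IRQ wake disable" events */
};

static void model_enable_irq_wake(struct irq_model *irq)
{
    irq->wake_depth++;
}

static void model_disable_irq_wake(struct irq_model *irq)
{
    if (irq->wake_depth == 0) {
        irq->warnings++;    /* the situation reported in the calltrace */
        return;
    }
    irq->wake_depth--;
}

/* Hypothetical driver state: remembers whether wake is currently enabled. */
struct nic_model {
    struct irq_model irq;
    bool irq_wake;
};

/* Only toggle the wake state when it actually changes. */
static void nic_set_wol(struct nic_model *nic, bool enable)
{
    if (enable && !nic->irq_wake) {
        model_enable_irq_wake(&nic->irq);
        nic->irq_wake = true;
    } else if (!enable && nic->irq_wake) {
        model_disable_irq_wake(&nic->irq);
        nic->irq_wake = false;
    }
}
```

With this guard, repeated requests to disable wake-on-LAN leave wake_depth at 0 without ever reaching the unbalanced case.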
      
      Fixes: 3172d3af ("stmmac: support wake up irq from external sources (v3)")
      Signed-off-by: Qiang Ma <maqianga@uniontech.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Link: https://lore.kernel.org/r/20240112021249.24598-1-maqianga@uniontech.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • selftests: bonding: Change script interpreter · 2b973b5b
      Benjamin Poirier authored
      
      [ Upstream commit c2518da8e6b0e248cfff1d4b6682e14020bd4d3f ]
      
      The tests changed by this patch, as well as the scripts they source, use
      features which are not part of POSIX sh (e.g. 'source' and 'local'). As a
      result, these tests fail when /bin/sh is dash, as on Debian. Change the
      interpreter to bash so that these tests can run successfully.
      
      Fixes: d43eff0b ("selftests: bonding: up/down delay w/ slave link flapping")
      Tested-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Petr Machata <petrm@nvidia.com>
      Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
      Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amdgpu: fall back to INPUT power for AVG power via INFO IOCTL · 1db180f5
      Alex Deucher authored
      [ Upstream commit d02069850fc102b07ae923535d5e212f2c8a34e9 ]
      
      For backwards compatibility with userspace.
      
      Fixes: 47f1724d ("drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`")
      Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2897
      Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • drm/amdkfd: fixes for HMM mem allocation · 1b37284a
      Dafna Hirschfeld authored
      
      [ Upstream commit 02eed83abc1395a1207591aafad9bcfc5cb1abcb ]
      
      Fix err return value and reset pgmap->type after checking it.
      
      Fixes: c83dee9b ("drm/amdkfd: add SPM support for SVM")
      Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
      Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • gpiolib: Fix scope-based gpio_device refcounting · 79e1bbfb
      Lukas Wunner authored
      
      [ Upstream commit 832b371097eb928d077c827b8f117bf5b99d35c0 ]
      
      Commit 9e4555d1 ("gpiolib: add support for scope-based management to
      gpio_device") sought to add scope-based gpio_device refcounting, but
      erroneously forgot a negation of IS_ERR_OR_NULL().
      
      As a result, gpio_device_put() is not called if the gpio_device pointer
      is valid (meaning the ref is leaked), but only called if the pointer is
      NULL or an ERR_PTR().
      
      While at it drop a superfluous trailing semicolon.
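
The effect of the missing negation can be shown with a small userspace model. `is_err_or_null()` mimics the kernel's IS_ERR_OR_NULL() test (NULL, or an address in the top 4095 values), while `dev_model`, `dev_put()` and `dev_cleanup()` are hypothetical names used only for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace model of IS_ERR_OR_NULL(): NULL or an encoded errno value. */
static bool is_err_or_null(const void *ptr)
{
    return ptr == NULL || (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical refcounted object. */
struct dev_model {
    int refs;
};

static void dev_put(struct dev_model *d)
{
    d->refs--;
}

/*
 * Scope-exit cleanup: drop the reference only for a valid pointer.
 * Without the '!' the put would run for NULL/error pointers and be
 * skipped for valid ones, leaking the reference.
 */
static void dev_cleanup(struct dev_model *d)
{
    if (!is_err_or_null(d))
        dev_put(d);
}
```

`dev_cleanup()` is safe to call with NULL or an error-encoded pointer and releases the reference exactly once for a valid object.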
      
      Fixes: 9e4555d1 ("gpiolib: add support for scope-based management to gpio_device")
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ASoC: SOF: ipc4-loader: remove the CPC check warnings · 768708af
      Kai Vehmanen authored
      
      [ Upstream commit ab09fb9c629ed3aaea6a82467f08595dbc549726 ]
      
      Warnings related to missing data in firmware manifest have
      proven to be too verbose. This relates to description of
      DSP module cost expressed in cycles per chunk (CPC). If
      a matching value is not found in the manifest, kernel will
      pass a zero value and DSP firmware will use a conservative
      value in its place.
      
      Downgrade the warnings to dev_dbg().
      
      Fixes: d8a2c987 ("ASoC: SOF: ipc4-loader/topology: Query the CPC value from manifest")
      Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
      Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
      Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
      Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
      Signed-off-by: Peter Ujfalusi <peter.ujfalusi@linux.intel.com>
      Link: https://msgid.link/r/20240115092209.7184-3-peter.ujfalusi@linux.intel.com
      Signed-off-by: Mark Brown <broonie@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • gpio: mlxbf3: add an error code check in mlxbf3_gpio_probe · 63bf2258
      Su Hui authored
      
      [ Upstream commit d460e9c2075164e9b1fa9c4c95f8c05517bd8752 ]
      
      The Clang static checker warns: Value stored to 'ret' is never read.
      bgpio_init() returns an error code on failure, so add a check for
      it.
      
      Fixes: cd33f216 ("gpio: mlxbf3: Add gpio driver support")
      Signed-off-by: Su Hui <suhui@nfschina.com>
      [Bartosz: add the Fixes: tag]
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • dt-bindings: gpio: xilinx: Fix node address in gpio · 7a1fff06
      Michal Simek authored
      
      [ Upstream commit 314c020c4ed3de72b15603eb6892250bc4b51702 ]
      
      The node address doesn't match the reg property, which is incorrect.
      
      Fixes: ba96b2e7 ("dt-bindings: gpio: gpio-xilinx: Convert Xilinx axi gpio binding to YAML")
      Signed-off-by: Michal Simek <michal.simek@amd.com>
      Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
      Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: ravb: Fix dma_addr_t truncation in error case · 129d5832
      Nikita Yushchenko authored
      
      [ Upstream commit e327b2372bc0f18c30433ac40be07741b59231c5 ]
      
      In ravb_start_xmit(), the ravb driver uses a u32 variable to store the
      result of the dma_map_single() call. Since ravb hardware has 32-bit
      address fields in descriptors, this works properly when mapping is
      successful - it is the platform's job to provide mapping addresses that
      fit into hardware limitations.
      
      However, in the failure case dma_map_single() returns the
      DMA_MAPPING_ERROR constant, which is 64-bit when dma_addr_t is 64-bit.
      Storing this constant in a u32 leads to truncation, and the subsequent
      call to dma_mapping_error() fails to notice the error.
      
      Fix that by storing the result of dma_map_single() in a dma_addr_t
      variable.
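
The truncation can be reproduced in plain userspace C. This sketch assumes a 64-bit dma_addr_t and models the error constant as all-ones; `dma_addr_model_t`, `DMA_MAPPING_ERROR_MODEL` and the helper names are made up for the example and are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models a 64-bit dma_addr_t and an all-ones mapping-error constant. */
typedef uint64_t dma_addr_model_t;
#define DMA_MAPPING_ERROR_MODEL (~(dma_addr_model_t)0)

/* Models the error check: compares against the full-width constant. */
static bool mapping_failed(dma_addr_model_t addr)
{
    return addr == DMA_MAPPING_ERROR_MODEL;
}

/* Buggy pattern: the mapping result is stored in a 32-bit variable first. */
static bool check_via_u32(dma_addr_model_t map_result)
{
    uint32_t dma_addr = (uint32_t)map_result;   /* all-ones 64-bit -> 0xffffffff */

    return mapping_failed(dma_addr);            /* widened back with zero upper bits */
}

/* Fixed pattern: keep the full-width type until after the error check. */
static bool check_via_dma_addr(dma_addr_model_t map_result)
{
    dma_addr_model_t dma_addr = map_result;

    return mapping_failed(dma_addr);
}
```

Passing `DMA_MAPPING_ERROR_MODEL` to `check_via_u32()` reports no failure, whereas `check_via_dma_addr()` detects it.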
      
      Fixes: c156633f ("Renesas Ethernet AVB driver proper")
      Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
      Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
      Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
      Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: tls, fix WARNIING in __sk_msg_free · 294e7ea8
      John Fastabend authored
      
      [ Upstream commit dc9dfc8dc629e42f2234e3327b75324ffc752bc9 ]
      
      A splice with MSG_SPLICE_PAGES will cause the tls code to use the
      tls_sw_sendmsg_splice path in the TLS sendmsg code to move the
      user-provided pages from the msg into the msg_pl. This will loop over
      the msg until msg_pl is full, checked by sk_msg_full(msg_pl). The user
      can also set the MORE flag to hint the stack to delay sending until
      receiving more pages and ideally a full buffer.
      
      If the user adds more pages to the msg than can fit in the msg_pl
      scatterlist (MAX_MSG_FRAGS), we should ignore the MORE flag and send
      the buffer anyway.
      
      What actually happens, though, is that we abort the msg to msg_pl
      scatterlist setup and then, because we forget to set 'full record'
      (indicating we can no longer consume data without a send), we fall
      through to the 'continue' path, which checks whether
      msg_data_left(msg) has more bytes to send and then attempts to fit
      them in the already full msg_pl. The next iteration of the sender
      doing a send then encounters a full msg_pl and throws the warning in
      the syzbot report.
      
      To fix this, simply check whether we have a full_record in the splice
      code path and, if not, send the msg regardless of the MORE flag.
      
      Reported-and-tested-by: <syzbot+f2977222e0e95cec15c8@syzkaller.appspotmail.com>
      Reported-by: Edward Adam Davis <eadavis@qq.com>
      Fixes: fe1e81d4 ("tls/sw: Support MSG_SPLICE_PAGES")
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: Avoid iter->offset making backward progress in bpf_iter_udp · d1d27950
      Martin KaFai Lau authored
      
      [ Upstream commit 2242fd537fab52d5f4d2fbb1845f047c01fad0cf ]
      
      There is a bug in the bpf_iter_udp_batch() function that stops
      the userspace from making forward progress.
      
      The bug is triggered when userspace passes in a very small read
      buffer. When the bpf prog does bpf_seq_printf, the userspace read
      buffer is not large enough to capture the whole bucket.
      
      When the read buffer is not large enough, the kernel will remember
      the offset of the bucket in iter->offset such that the next userspace
      read() can continue from where it left off.
      
      The kernel will skip the number (== "iter->offset") of sockets in
      the next read(). However, the code decrements "iter->offset"
      directly ("--iter->offset"). This is incorrect because the next read() may
      not consume the whole bucket either and then the next-next read()
      will start from offset 0. The net effect is the userspace will
      keep reading from the beginning of a bucket and the process will
      never finish. "iter->offset" must always go forward until the
      whole bucket is consumed.
      
      This patch fixes it by using a local variable "resume_offset"
      and "resume_bucket". "iter->offset" is always reset to 0 before
      it may be used. "iter->offset" will be advanced to the
      "resume_offset" when it continues from the "resume_bucket" (i.e.
      "state->bucket == resume_bucket"). This brings it closer to
      the bpf_iter_tcp's offset handling which does not suffer
      the same bug.
      
      Cc: Aditi Ghag <aditi.ghag@isovalent.com>
      Fixes: c96dac8d ("bpf: udp: Implement batching for sockets iterator")
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Reviewed-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20240112190530.3751661-3-martin.lau@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: iter_udp: Retry with a larger batch size without going back to the previous bucket · 588d3874
      Martin KaFai Lau authored
      
      [ Upstream commit 19ca0823f6eaad01d18f664a00550abe912c034c ]
      
      The current logic is to use a default size 16 to batch the whole bucket.
      If it is too small, it will retry with a larger batch size.
      
      The current code accidentally does a state->bucket-- before retrying.
      This retries with the previous bucket, which has already been
      processed. This patch fixes it.
      
      It is hard to create a selftest. I added a WARN_ON(state->bucket < 0),
      forced a particular port to be hashed to the first bucket,
      created >16 sockets, and observed the for-loop went back
      to the "-1" bucket.
      
      Cc: Aditi Ghag <aditi.ghag@isovalent.com>
      Fixes: c96dac8d ("bpf: udp: Implement batching for sockets iterator")
      Acked-by: Yonghong Song <yonghong.song@linux.dev>
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Reviewed-by: Aditi Ghag <aditi.ghag@isovalent.com>
      Link: https://lore.kernel.org/r/20240112190530.3751661-2-martin.lau@linux.dev
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: netdev_queue: netdev_txq_completed_mb(): fix wake condition · d36f0df8
      Marc Kleine-Budde authored
      
      [ Upstream commit 894d7508316e7ad722df597d68b4b1797a9eee11 ]
      
      netif_txq_try_stop() uses "get_desc >= start_thrs" as the check for
      the call to netif_tx_start_queue().
      
      Use ">=" in netdev_txq_completed_mb(), too.
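
The point of the change is that both paths agree at the boundary value. A minimal sketch, with hypothetical helper names that only model the two comparisons and none of the real locking or memory barriers:

```c
#include <stdbool.h>

/* Stop path: after stopping the queue, restart it at once if enough
 * descriptors are free ("get_desc >= start_thrs" in the text). */
static bool restart_after_stop(unsigned int free_desc, unsigned int start_thrs)
{
    return free_desc >= start_thrs;
}

/* Completion path: wake a stopped queue using the same comparison, so a
 * queue with exactly start_thrs free descriptors is not left stopped. */
static bool wake_on_completion(unsigned int free_desc, unsigned int start_thrs)
{
    return free_desc >= start_thrs;
}
```

With `free_desc == start_thrs`, both helpers return true, so the stop path and the completion path make the same decision.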
      
      Fixes: c91c46de ("net: provide macros for commonly copied lockless queue stop/wake code")
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: add more sanity check in virtio_net_hdr_to_skb() · 765290b6
      Eric Dumazet authored
      
      [ Upstream commit 9181d6f8a2bb32d158de66a84164fac05e3ddd18 ]
      
      syzbot/KMSAN reports access to uninitialized data from gso_features_check() [1]
      
      The repro uses af_packet, injecting a gso packet with hdrlen == 0.
      
      We could fix the issue making gso_features_check() more careful
      while dealing with NETIF_F_TSO_MANGLEID in fast path.
      
      Or we can make sure virtio_net_hdr_to_skb() pulls minimal network and
      transport headers as intended.
      
      Note that for GSO packets coming from untrusted sources, the SKB_GSO_DODGY
      bit forces a proper header validation (and pull) before the packet can
      hit any device ndo_start_xmit(), thus we do not need a precise dissection
      at the virtio_net_hdr_to_skb() stage.
      
      [1]
      BUG: KMSAN: uninit-value in skb_gso_segment include/net/gso.h:83 [inline]
      BUG: KMSAN: uninit-value in validate_xmit_skb+0x10f2/0x1930 net/core/dev.c:3629
       skb_gso_segment include/net/gso.h:83 [inline]
       validate_xmit_skb+0x10f2/0x1930 net/core/dev.c:3629
       __dev_queue_xmit+0x1eac/0x5130 net/core/dev.c:4341
       dev_queue_xmit include/linux/netdevice.h:3134 [inline]
       packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
       packet_snd net/packet/af_packet.c:3087 [inline]
       packet_sendmsg+0x8b1d/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
       ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
       __sys_sendmsg net/socket.c:2667 [inline]
       __do_sys_sendmsg net/socket.c:2676 [inline]
       __se_sys_sendmsg net/socket.c:2674 [inline]
       __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      Uninit was created at:
       slab_post_alloc_hook+0x129/0xa70 mm/slab.h:768
       slab_alloc_node mm/slub.c:3478 [inline]
       kmem_cache_alloc_node+0x5e9/0xb10 mm/slub.c:3523
       kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:560
       __alloc_skb+0x318/0x740 net/core/skbuff.c:651
       alloc_skb include/linux/skbuff.h:1286 [inline]
       alloc_skb_with_frags+0xc8/0xbd0 net/core/skbuff.c:6334
       sock_alloc_send_pskb+0xa80/0xbf0 net/core/sock.c:2780
       packet_alloc_skb net/packet/af_packet.c:2936 [inline]
       packet_snd net/packet/af_packet.c:3030 [inline]
       packet_sendmsg+0x70e8/0x9f30 net/packet/af_packet.c:3119
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       ____sys_sendmsg+0x9c2/0xd60 net/socket.c:2584
       ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2638
       __sys_sendmsg net/socket.c:2667 [inline]
       __do_sys_sendmsg net/socket.c:2676 [inline]
       __se_sys_sendmsg net/socket.c:2674 [inline]
       __x64_sys_sendmsg+0x307/0x490 net/socket.c:2674
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      CPU: 0 PID: 5025 Comm: syz-executor279 Not tainted 6.7.0-rc7-syzkaller-00003-gfbafc3e621c3 #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
      
      Reported-by: <syzbot+7f4d0ea3df4d4fa9a65f@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/netdev/0000000000005abd7b060eb160cd@google.com/
      Fixes: 9274124f ("net: stricter validation of untrusted gso packets")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Reviewed-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • erofs: fix inconsistent per-file compression format · eed24b81
      Gao Xiang authored
      
      [ Upstream commit 118a8cf504d7dfa519562d000f423ee3ca75d2c4 ]
      
      EROFS can select compression algorithms on a per-file basis, and each
      per-file compression algorithm needs to be marked in the on-disk
      superblock for initialization.
      
      However, syzkaller can generate inconsistent crafted images that use
      an unsupported algorithmtype for specific inodes, e.g. use the MicroLZMA
      algorithmtype even if it's not set in `sbi->available_compr_algs`. This
      can lead to an unexpected "BUG: kernel NULL pointer dereference" if
      the corresponding decompressor isn't built-in.
      
      Fix this by checking against `sbi->available_compr_algs` for each
      m_algorithmformat request. The incorrect preset bitmap for the
      !erofs_sb_has_compr_cfgs case is fixed as well; it was harmless previously.
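
The per-request check against the superblock bitmap can be sketched as follows. The algorithm IDs, the `check_algorithm()` helper and the choice of error codes are assumptions made for this example and do not reproduce the EROFS code.

```c
#include <errno.h>
#include <stdint.h>

/* Illustrative algorithm IDs; each one corresponds to a bit in the bitmap. */
enum {
    ALG_LZ4 = 0,
    ALG_LZMA = 1,
    ALG_DEFLATE = 2,
    ALG_MAX = 3
};

/*
 * Validate a per-inode algorithm request against the bitmap of algorithms
 * advertised by the superblock. Returns 0 if usable, a negative errno
 * otherwise (the specific codes are an assumption of this sketch).
 */
static int check_algorithm(uint32_t available_algs, unsigned int alg)
{
    if (alg >= ALG_MAX)
        return -EOPNOTSUPP;     /* unknown algorithm ID */
    if (!(available_algs & (UINT32_C(1) << alg)))
        return -EINVAL;         /* inode uses an algorithm the sb never declared */
    return 0;
}
```

An image whose superblock only advertises `ALG_LZ4` but whose inode requests `ALG_LZMA` is rejected before any decompressor is looked up.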
      
      Reported-by: <bugreport@ubisectech.com>
      Fixes: 8f899262 ("erofs: get compression algorithms directly on mapping")
      Fixes: 622ceadd ("erofs: lzma compression support")
      Reviewed-by: Yue Hu <huyue2@coolpad.com>
      Link: https://lore.kernel.org/r/20240113150602.1471050-1-hsiangkao@linux.alibaba.com
      Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • udp: annotate data-races around up->pending · ac1815b6
      Eric Dumazet authored
      
      [ Upstream commit 482521d8e0c6520429478aa6866cd44128b33d5d ]
      
      up->pending can be read without holding the socket lock,
      as pointed out by syzbot [1]
      
      Add READ_ONCE() in lockless contexts, and WRITE_ONCE()
      on write side.
      
      [1]
      BUG: KCSAN: data-race in udpv6_sendmsg / udpv6_sendmsg
      
      write to 0xffff88814e5eadf0 of 4 bytes by task 15547 on cpu 1:
       udpv6_sendmsg+0x1405/0x1530 net/ipv6/udp.c:1596
       inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       __sys_sendto+0x257/0x310 net/socket.c:2192
       __do_sys_sendto net/socket.c:2204 [inline]
       __se_sys_sendto net/socket.c:2200 [inline]
       __x64_sys_sendto+0x78/0x90 net/socket.c:2200
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      read to 0xffff88814e5eadf0 of 4 bytes by task 15551 on cpu 0:
       udpv6_sendmsg+0x22c/0x1530 net/ipv6/udp.c:1373
       inet6_sendmsg+0x63/0x80 net/ipv6/af_inet6.c:657
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg net/socket.c:745 [inline]
       ____sys_sendmsg+0x37c/0x4d0 net/socket.c:2586
       ___sys_sendmsg net/socket.c:2640 [inline]
       __sys_sendmmsg+0x269/0x500 net/socket.c:2726
       __do_sys_sendmmsg net/socket.c:2755 [inline]
       __se_sys_sendmmsg net/socket.c:2752 [inline]
       __x64_sys_sendmmsg+0x57/0x60 net/socket.c:2752
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0x44/0x110 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x63/0x6b
      
      value changed: 0x00000000 -> 0x0000000a
      
      Reported by Kernel Concurrency Sanitizer on:
      CPU: 0 PID: 15551 Comm: syz-executor.1 Tainted: G        W          6.7.0-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
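
The READ_ONCE()/WRITE_ONCE() annotation described at the top of this commit message can be approximated in userspace with volatile accesses. The macros below are int-only analogues written for this sketch, and `udp_model` with its helpers is hypothetical; like the kernel macros, they make each access a single marked load or store but do not add locking or ordering.

```c
#include <stdbool.h>

/* Int-only userspace analogues of READ_ONCE()/WRITE_ONCE(). */
#define READ_ONCE_INT(x)     (*(const volatile int *)&(x))
#define WRITE_ONCE_INT(x, v) (*(volatile int *)&(x) = (v))

/* Hypothetical stand-in for the socket field discussed in the text. */
struct udp_model {
    int pending;
};

/* Lockless reader: one marked load of the field. */
static bool udp_model_is_pending(struct udp_model *up)
{
    return READ_ONCE_INT(up->pending) != 0;
}

/* Writer: one marked store, pairing with the lockless reader. */
static void udp_model_set_pending(struct udp_model *up, int family)
{
    WRITE_ONCE_INT(up->pending, family);
}
```

The reader may still observe either the old or the new value; the annotation only documents the race and keeps the compiler from tearing or re-reading the access.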
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Reported-by: <syzbot+8d482d0e407f665d9d10@syzkaller.appspotmail.com>
      Link: https://lore.kernel.org/netdev/0000000000009e46c3060ebcdffd@google.com/
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jiri Pirko <jiri@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: stmmac: Fix ethool link settings ops for integrated PCS · 204f2d03
      Sneh Shah authored
      
      [ Upstream commit 08300adac3b8dab9e2fd3be0155c7d3093c755f4 ]
      
      Currently get/set_link_ksettings ethtool ops are dependent on PCS.
      When PCS is integrated, it will not have separate link config.
      Bypass configuring and checking PCS for integrated PCS.
      
      Fixes: aa571b62 ("net: stmmac: add new switch to struct plat_stmmacenet_data")
      Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8775p-ride
      Signed-off-by: Sneh Shah <quic_snehshah@quicinc.com>
      Reviewed-by: Simon Horman <horms@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>