- May 14, 2024
-
-
Gregory Detal authored
The sysctl lists the available schedulers that can be set using net.mptcp.scheduler similarly to net.ipv4.tcp_available_congestion_control. Signed-off-by:
Gregory Detal <gregory.detal@gmail.com> Reviewed-by:
Mat Martineau <martineau@kernel.org> Tested-by:
Geliang Tang <geliang@kernel.org> Reviewed-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by:
Mat Martineau <martineau@kernel.org> Link: https://lore.kernel.org/r/20240514011335.176158-5-martineau@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Jason Xing authored
We're going to send an RST due to invalid syn packet which is already checked whether 1) it is in sequence, 2) it is a retransmitted skb. As RFC 793 says, if the state of socket is not CLOSED/LISTEN/SYN-SENT, then we should send an RST when receiving bad syn packet: "fourth, check the SYN bit,...If the SYN is in the window it is an error, send a reset" Signed-off-by:
Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-6-kerneljasonxing@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Jason Xing authored
There are two possible cases where TCP layer can send an RST. Since they happen in the same place, I think using one independent reason is enough to identify this special situation. Signed-off-by:
Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-5-kerneljasonxing@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Jason Xing authored
Like the previous patch does in this series, finish the conversion map is enough to let rstreason mechanism work in this function. Signed-off-by:
Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-4-kerneljasonxing@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Jason Xing authored
Based on the existing skb drop reason, updating the rstreason map can help us finish the rstreason job in this function. Signed-off-by:
Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-3-kerneljasonxing@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Jason Xing authored
In this function, only updating the map can finish the job for socket reset reason because the corresponding drop reasons are ready. Signed-off-by:
Jason Xing <kernelxing@tencent.com> Link: https://lore.kernel.org/r/20240510122502.27850-2-kerneljasonxing@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Russell King (Oracle) authored
Introduce a mechanism whereby platforms can create their PCS instances prior to the network device being published to userspace, but after some of the core stmmac initialisation has been completed. This means that the data structures that platforms need will be available. Signed-off-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by:
Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by:
Serge Semin <fancer.lancer@gmail.com> Co-developed-by:
Romain Gantois <romain.gantois@bootlin.com> Signed-off-by:
Romain Gantois <romain.gantois@bootlin.com> Reviewed-by:
Hariprasad Kelam <hkelam@marvell.com> Link: https://lore.kernel.org/r/20240513-rzn1-gmac1-v7-4-6acf58b5440d@bootlin.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
- May 13, 2024
-
-
Parav Pandit authored
MSIX irq allocation and free APIs are no longer in use. Hence, remove the dead code. Signed-off-by:
Parav Pandit <parav@nvidia.com> Reviewed-by:
Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by:
Tariq Toukan <tariqt@nvidia.com> Reviewed-by:
Simon Horman <horms@kernel.org> Reviewed-by:
Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://lore.kernel.org/r/20240512124306.740898-4-tariqt@nvidia.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Shay Drory authored
This patch adds to mlx5 drivers support for 8 ports HCAs. Starting with ConnectX-8 HCAs with 8 ports are possible. As most driver parts aren't affected by such configuration most driver code is unchanged. Specially the only affected areas are: - Lag - Multiport E-Switch - Single FDB E-Switch All of the above are already factored in generic way, and LAG and VF LAG are tested, so all that left is to change a #define and remove checks which are no longer needed. However, Multiport E-Switch is not tested yet, so it is left untouched. This patch will allow to create hardware LAG/VF LAG when all 8 ports are added to the same bond device. for example, In order to activate the hardware lag a user can execute the following: ip link add bond0 type bond ip link set bond0 type bond miimon 100 mode 2 ip link set eth2 master bond0 ip link set eth3 master bond0 ip link set eth4 master bond0 ip link set eth5 master bond0 ip link set eth6 master bond0 ip link set eth7 master bond0 ip link set eth8 master bond0 ip link set eth9 master bond0 Where eth2, eth3, eth4, eth5, eth6, eth7, eth8 and eth9 are the PFs of the same HCA. Signed-off-by:
Shay Drory <shayd@nvidia.com> Reviewed-by:
Mark Bloch <mbloch@nvidia.com> Signed-off-by:
Tariq Toukan <tariqt@nvidia.com> Reviewed-by:
Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240512124306.740898-2-tariqt@nvidia.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Daniel Jurgens authored
TX queue stop and wake are counted by some drivers. Support reporting these via netdev-genl queue stats. Signed-off-by:
Daniel Jurgens <danielj@nvidia.com> Reviewed-by:
Jiri Pirko <jiri@nvidia.com> Reviewed-by:
Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240510201927.1821109-2-danielj@nvidia.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Matthieu Baerts (NGI0) authored
A way for an application to know if an MPTCP connection fell back to TCP is to use getsockopt(MPTCP_INFO) and look for errors. The issue with this technique is that the same errors -- EOPNOTSUPP (IPv4) and ENOPROTOOPT (IPv6) -- are returned if there was a fallback, *or* if the kernel doesn't support this socket option. The userspace then has to look at the kernel version to understand what the errors mean. It is not clean, and it doesn't take into account older kernels where the socket option has been backported. A cleaner way would be to expose this info to the TCP socket level. In case of MPTCP socket where no fallback happened, the socket options for the TCP level will be handled in MPTCP code, in mptcp_getsockopt_sol_tcp(). If not, that will be in TCP code, in do_tcp_getsockopt(). So MPTCP simply has to set the value 1, while TCP has to set 0. If the socket option is not supported, one of these two errors will be reported: - EOPNOTSUPP (95 - Operation not supported) for MPTCP sockets - ENOPROTOOPT (92 - Protocol not available) for TCP sockets, e.g. on the socket received after an 'accept()', when the client didn't request to use MPTCP: this socket will be a TCP one, even if the listen socket was an MPTCP one. With this new option, the kernel can return a clear answer to both "Is this kernel new enough to tell me the fallback status?" and "If it is new enough, is it currently a TCP or MPTCP socket?" questions, while not breaking the previous method. Acked-by:
Mat Martineau <martineau@kernel.org> Signed-off-by:
Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://lore.kernel.org/r/20240509-upstream-net-next-20240509-mptcp-tcp_is_mptcp-v1-1-f846df999202@kernel.org Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Richard Gobert authored
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags, iph->id, ...) against all packets in a loop. These flush checks are used in all merging UDP and TCP flows. These checks need to be done only once and only against the found p skb, since they only affect flush and not same_flow. This patch leverages correct network header offsets from the cb for both outer and inner network headers - allowing these checks to be done only once, in tcp_gro_receive and udp_gro_receive_segment. As a result, NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are more declarative and contained in inet_gro_flush, thus removing the need for flush_id in napi_gro_cb. This results in less parsing code for non-loop flush tests for TCP and UDP flows. To make sure results are not within noise range - I've made netfilter drop all TCP packets, and measured CPU performance in GRO (in this case GRO is responsible for about 50% of the CPU utilization). perf top while replaying 64 parallel IP/TCP streams merging in GRO: (gro_receive_network_flush is compiled inline to tcp_gro_receive) net-next: 6.94% [kernel] [k] inet_gro_receive 3.02% [kernel] [k] tcp_gro_receive patch applied: 4.27% [kernel] [k] tcp_gro_receive 4.22% [kernel] [k] inet_gro_receive perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same results for any encapsulation, in this case inet_gro_receive is top offender in net-next) net-next: 10.09% [kernel] [k] inet_gro_receive 2.08% [kernel] [k] tcp_gro_receive patch applied: 6.97% [kernel] [k] inet_gro_receive 3.68% [kernel] [k] tcp_gro_receive Signed-off-by:
Richard Gobert <richardbgobert@gmail.com> Reviewed-by:
Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240509190819.2985-3-richardbgobert@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Richard Gobert authored
This patch converts references of skb->network_header to napi_gro_cb's network_offset and inner_network_offset. Signed-off-by:
Richard Gobert <richardbgobert@gmail.com> Reviewed-by:
Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/20240509190819.2985-2-richardbgobert@gmail.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
- May 12, 2024
-
-
Puranjay Mohan authored
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit. RISCV saves the pointer to the CPU's task_struct in the TP (thread pointer) register. This makes it trivial to get the CPU's processor id. As thread_info is the first member of task_struct, we can read the processor id from TP + offsetof(struct thread_info, cpu). RISCV64 JIT output for `call bpf_get_smp_processor_id` ====================================================== Before After -------- ------- auipc t1,0x848c ld a5,32(tp) jalr 604(t1) mv a5,a0 Benchmark using [1] on Qemu. ./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc +---------------+------------------+------------------+--------------+ | Name | Before | After | % change | |---------------+------------------+------------------+--------------| | glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% | | arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% | | hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% | +---------------+------------------+------------------+--------------+ NOTE: This benchmark includes changes from this patch and the previous patch that implemented the per-cpu insn. [1] https://github.com/anakryiko/linux/commit/8dec900975ef Signed-off-by:
Puranjay Mohan <puranjay@kernel.org> Acked-by:
Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by:
Andrii Nakryiko <andrii@kernel.org> Acked-by:
Björn Töpel <bjorn@kernel.org> Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org Signed-off-by:
Alexei Starovoitov <ast@kernel.org>
-
- May 11, 2024
-
-
Kuniyuki Iwashima authored
Commit 1af2dfac ("af_unix: Don't access successor in unix_del_edges() during GC.") fixed use-after-free by avoid accessing edge->successor while GC is in progress. However, there could be a small race window where another process could call unix_del_edges() while gc_in_progress is true and __skb_queue_purge() is on the way. So, we need another marker for struct scm_fp_list which indicates if the skb is garbage-collected. This patch adds dead flag in struct scm_fp_list and set it true before calling __skb_queue_purge(). Fixes: 1af2dfac ("af_unix: Don't access successor in unix_del_edges() during GC.") Signed-off-by:
Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by:
Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
- May 10, 2024
-
-
Florian Westphal authored
Sven Auhagen reports transaction failures with following error: ./main.nft:13:1-26: Error: Could not process rule: Cannot allocate memory percpu: allocation failed, size=16 align=8 atomic=1, atomic alloc failed, no space left This points to failing pcpu allocation with GFP_ATOMIC flag. However, transactions happen from user context and are allowed to sleep. One case where we can call into percpu allocator with GFP_ATOMIC is nft_counter expression. Normally this happens from control plane, so this could use GFP_KERNEL instead. But one use case, element insertion from packet path, needs to use GFP_ATOMIC allocations (nft_dynset expression). At this time, .clone callbacks always use GFP_ATOMIC for this reason. Add gfp_t argument to the .clone function and pass GFP_KERNEL or GFP_ATOMIC flag depending on context, this allows all clone memory allocations to sleep for the normal (transaction) case. Cc: Sven Auhagen <sven.auhagen@voleatech.de> Signed-off-by:
Florian Westphal <fw@strlen.de> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Eric Dumazet authored
DCCP is going away soon, and had no twsk_unique() method. We can directly call tcp_twsk_unique() for TCP sockets. Signed-off-by:
Eric Dumazet <edumazet@google.com> Reviewed-by:
Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20240507164140.940547-1-edumazet@google.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
- May 09, 2024
-
-
Eric Dumazet authored
dst_cache->reset_ts is read or written locklessly, add READ_ONCE() and WRITE_ONCE() annotations. Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
Paolo Abeni <pabeni@redhat.com> Link: https://lore.kernel.org/r/20240507132000.614591-1-edumazet@google.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
- May 08, 2024
-
-
Peilin He authored
Introduce a tracepoint for icmp_send, which can help users to get more detail information conveniently when icmp abnormal events happen. 1. Giving an usecase example: ============================= When an application experiences packet loss due to an unreachable UDP destination port, the kernel will send an exception message through the icmp_send function. By adding a trace point for icmp_send, developers or system administrators can obtain detailed information about the UDP packet loss, including the type, code, source address, destination address, source port, and destination port. This facilitates the trouble-shooting of UDP packet loss issues especially for those network-service applications. 2. Operation Instructions: ========================== Switch to the tracing directory. cd /sys/kernel/tracing Filter for destination port unreachable. echo "type==3 && code==3" > events/icmp/icmp_send/filter Enable trace event. echo 1 > events/icmp/icmp_send/enable 3. Result View: ================ udp_client_erro-11370 [002] ...s.12 124.728002: icmp_send: icmp_send: type=3, code=3. From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23 skbaddr=00000000589b167a Signed-off-by:
Peilin He <he.peilin@zte.com.cn> Signed-off-by:
xu xin <xu.xin16@zte.com.cn> Reviewed-by:
Yunkai Zhang <zhang.yunkai@zte.com.cn> Cc: Yang Yang <yang.yang29@zte.com.cn> Cc: Liu Chun <liu.chun2@zte.com.cn> Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn> Reviewed-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Oleksij Rempel authored
Some switches like Microchip KSZ variants do not support per port DSCP priority configuration. Instead there is a global DSCP mapping table. To handle it, we will accept set/del request to any of user ports to make global configuration and update dcb app entries for all other ports. Signed-off-by:
Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Oleksij Rempel authored
IEEE 802.1q specification provides recommendation and examples which can be used as good default values for different drivers. This patch implements mapping examples documented in IEEE 802.1Q-2022 in Annex I "I.3 Traffic type to traffic class mapping" and IETF DSCP naming and mapping DSCP to Traffic Type inspired by RFC8325. This helpers will be used in followup patches for dsa/microchip DCB implementation. Signed-off-by:
Oleksij Rempel <o.rempel@pengutronix.de> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Oleksij Rempel authored
Add DCB support to get/set trust configuration for different packet priority information sources. Some switch allow to chose different source of packet priority classification. For example on KSZ switches it is possible to configure VLAN PCP and/or DSCP sources. Signed-off-by:
Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by:
Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- May 07, 2024
-
-
Matthias Schiffer authored
The embedded PHYs of the 88E6250 family switches are very basic - they do not even have an Extended Address / Page register. This adds support for the PHYs to the driver to set up PHY interrupts and retrieve error stats. To deal with PHYs without a page register, "simple" variants of all stat handling functions are introduced. The code should work with all 88E6250 family switches (6250/6220/6071/ 6070/6020). The PHY ID 0x01410db0 was read from a 88E6020, under the assumption that all switches of this family use the same ID. The spec only lists the prefix 0x01410c00 and leaves the last 10 bits as reserved, but that seems too unspecific to be useful, as it would cover several existing PHY IDs already supported by the driver; therefore, the ID read from the actual hardware is used. Signed-off-by:
Matthias Schiffer <matthias.schiffer@ew.tq-group.com> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/0695f699cd942e6e06da9d30daeedfd47785bc01.1714643285.git.matthias.schiffer@ew.tq-group.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
- May 06, 2024
-
-
Pablo Neira Ayuso authored
Add new iflink attributes to configure in-kernel UDP listener socket address: IFLA_GTP_LOCAL and IFLA_GTP_LOCAL6. If none of these attributes are specified, default is still to IPv4 INADDR_ANY for backward compatibility. Add new attributes to set up family and IPv6 address of GTP tunnels: GTPA_FAMILY, GTPA_PEER_ADDR6 and GTPA_MS_ADDR6. If no GTPA_FAMILY is specified, AF_INET is assumed for backward compatibility. setsockopt IPV6_ADDRFORM allows to downgrade socket from IPv6 to IPv4 after socket is bound. Assumption is that socket listener that is attached to the gtp device needs to be either IPv4 or IPv6. Therefore, GTP socket listener does not allow for IPv4-mapped-IPv6 listener. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Pablo Neira Ayuso authored
Currently GTP packets are dropped if the next extension field is set to non-zero value, but this are valid GTP packets. TS 29.281 provides a longer header format, which is defined as struct gtp1_header_long. Such long header format is used if any of the S, PN, E flags is set. This long header is 4 bytes longer than struct gtp1_header, plus variable length (optional) extension headers. The next extension header field is zero is no extension header is provided. The extension header is composed of a length field which includes total number of 4 byte words including the extension header itself (1 byte), payload (variable length) and next type (1 byte). The extension header size and its payload is aligned to 4 bytes. A GTP packet might come with a chain extensions headers, which makes it slightly cumbersome to parse because the extension next header field comes at the end of the extension header, and there is a need to check if this field becomes zero to stop the extension header parser. Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Linus Torvalds authored
This reverts commit 07ed11af. Stephen Rostedt reports: "I went to run my tests on my VMs and the tests hung on boot up. Unfortunately, the most I ever got out was: [ 93.607888] Testing event system initcall: OK [ 93.667730] Running tests on all trace events: [ 93.669757] Testing all events: OK [ 95.631064] ------------[ cut here ]------------ Timed out after 60 seconds" and further debugging points to a possible circular locking dependency between the console_owner locking and the worker pool locking. Reverting the commit allows Steve's VM to boot to completion again. [ This may obviously result in the "[TTM] Buffer eviction failed" messages again, which was the reason for that original revert. But at this point this seems preferable to a non-booting system... ] Reported-and-bisected-by:
Steven Rostedt <rostedt@goodmis.org> Link: https://lore.kernel.org/all/20240502081641.457aa25f@gandalf.local.home/ Acked-by:
Maxime Ripard <mripard@kernel.org> Cc: Alex Constantino <dreaming.about.electric.sheep@gmail.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Timo Lindfors <timo.lindfors@iki.fi> Cc: Dave Airlie <airlied@redhat.com> Cc: Gerd Hoffmann <kraxel@redhat.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Daniel Vetter <daniel@ffwll.ch> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Felix Fietkau authored
Pull the code out of tcp_gro_receive in order to access the tcp header from tcp4/6_gro_receive. Acked-by:
Paolo Abeni <pabeni@redhat.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Felix Fietkau <nbd@nbd.name> Reviewed-by:
David Ahern <dsahern@kernel.org> Reviewed-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
Paolo Abeni <pabeni@redhat.com>
-
Felix Fietkau authored
This pulls the flow port matching out of tcp_gro_receive, so that it can be reused for the next change, which adds the TCP fraglist GRO heuristic. Acked-by:
Paolo Abeni <pabeni@redhat.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Felix Fietkau <nbd@nbd.name> Reviewed-by:
David Ahern <dsahern@kernel.org> Reviewed-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
Paolo Abeni <pabeni@redhat.com>
-
Felix Fietkau authored
This helper function will be used for TCP fraglist GRO support Acked-by:
Paolo Abeni <pabeni@redhat.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Felix Fietkau <nbd@nbd.name> Reviewed-by:
David Ahern <dsahern@kernel.org> Reviewed-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
Paolo Abeni <pabeni@redhat.com>
-
Linus Lüssing authored
So far Multicast Router Advertisements and Multicast Router Solicitations from the Multicast Router Discovery protocol (RFC4286) would be marked as INVALID for IPv6, even if they are in fact intact and adhering to RFC4286. This broke MRA reception and by that multicast reception on IPv6 multicast routers in a Proxmox managed setup, where Proxmox would install a rule like "-m conntrack --ctstate INVALID -j DROP" at the top of the FORWARD chain with br-nf-call-ip6tables enabled by default. Similar to as it's done for MLDv1, MLDv2 and IPv6 Neighbor Discovery already, fix this issue by excluding MRD from connection tracking handling as MRD always uses predefined multicast destinations for its messages, too. This changes the ct-state for ICMPv6 MRD messages from INVALID to UNTRACKED. This issue was found and fixed with the help of the mrdisc tool (https://github.com/troglobit/mrdisc ). Signed-off-by:
Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
- May 05, 2024
-
-
Mina Almasry authored
This API enables the net stack to reset the queues used for devmem TCP. Signed-off-by:
Mina Almasry <almasrymina@google.com> Signed-off-by:
Shailend Chand <shailend@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- May 04, 2024
-
-
Steven Rostedt (Google) authored
Synthetic events create and destroy tracefs files when they are created and removed. The tracing subsystem has its own file descriptor representing the state of the events attached to the tracefs files. There's a race between the eventfs files and this file descriptor of the tracing system where the following can cause an issue: With two scripts 'A' and 'B' doing: Script 'A': echo "hello int aaa" > /sys/kernel/tracing/synthetic_events while : do echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable done Script 'B': echo > /sys/kernel/tracing/synthetic_events Script 'A' creates a synthetic event "hello" and then just writes zero into its enable file. Script 'B' removes all synthetic events (including the newly created "hello" event). What happens is that the opening of the "enable" file has: { struct trace_event_file *file = inode->i_private; int ret; ret = tracing_check_open_get_tr(file->tr); [..] But deleting the events frees the "file" descriptor, and a "use after free" happens with the dereference at "file->tr". The file descriptor does have a reference counter, but there needs to be a way to decrement it from the eventfs when the eventfs_inode is removed that represents this file descriptor. Add an optional "release" callback to the eventfs_entry array structure, that gets called when the eventfs file is about to be removed. This allows for the creating on the eventfs file to increment the tracing file descriptor ref counter. When the eventfs file is deleted, it can call the release function that will call the put function for the tracing file descriptor. This will protect the tracing file from being freed while a eventfs file that references it is being opened. Link: https://lore.kernel.org/linux-trace-kernel/20240426073410.17154-1-Tze-nan.Wu@mediatek.com/ Link: https://lore.kernel.org/linux-trace-kernel/20240502090315.448cba46@gandalf.local.home Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Fixes: 5790b1fb ("eventfs: Remove eventfs_file and just use eventfs_inode") Reported-by:
Tze-nan wu <Tze-nan.Wu@mediatek.com> Tested-by:
Tze-nan Wu (吳澤南) <Tze-nan.Wu@mediatek.com> Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org>
-
- May 03, 2024
-
-
Mina Almasry authored
This reverts commit a580ea99. This revert is to resolve Dragos's report of page_pool leak here: https://lore.kernel.org/lkml/20240424165646.1625690-2-dtatulea@nvidia.com/ The reverted patch interacts very badly with commit 2cc3aeb5 ("skbuff: Fix a potential race while recycling page_pool packets"). The reverted commit hopes that the pp_recycle + is_pp_page variables do not change between the skb_frag_ref and skb_frag_unref operation. If such a change occurs, the skb_frag_ref/unref will not operate on the same reference type. In the case of Dragos's report, the grabbed ref was a pp ref, but the unref was a page ref, because the pp_recycle setting on the skb was changed. Attempting to fix this issue on the fly is risky. Lets revert and I hope to reland this with better understanding and testing to ensure we don't regress some edge case while streamlining skb reffing. Fixes: a580ea99 ("net: mirror skb frag ref/unref helpers") Reported-by:
Dragos Tatulea <dtatulea@nvidia.com> Signed-off-by:
Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20240502175423.2456544-1-almasrymina@google.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Eric Dumazet authored
dev->threaded can be read locklessly, if we add corresponding READ_ONCE()/WRITE_ONCE() annotations. Signed-off-by:
Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240502173926.2010646-1-edumazet@google.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-
Joel Granados authored
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/ ) Avoid a buffer overflow when traversing the ctl_table by ensuring that AX25_MAX_VALUES is the same as the size of ax25_param_table. This is done with a BUILD_BUG_ON where ax25_param_table is defined and a CONFIG_AX25_DAMA_SLAVE guard in the unnamed enum definition as well as in the ax25_dev_device_up and ax25_ds_set_timer functions. The overflow happened when the sentinel was removed from ax25_param_table. The sentinel's data element was changed when CONFIG_AX25_DAMA_SLAVE was undefined. This had no adverse effects as it still stopped on the sentinel's null procname but needed to be addressed once the sentinel was removed. Signed-off-by:
Joel Granados <j.granados@samsung.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Aditya Kumar Singh authored
In order to support color change with MLO, handle the link ID now passed from cfg80211, adjust the code to do everything per link and call the notifications to cfg80211 correctly. Signed-off-by:
Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240422053412.2024075-4-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-5-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-6-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-7-quic_adisi@quicinc.com [squash, move API call updates to this patch] Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
Aditya Kumar Singh authored
Currently, during color change, no link id information is passed down. In order to support color change during Multi Link Operation, it is required to pass link id as well. Additionally, update notification APIs to allow drivers/mac80211 to pass the link ID. Signed-off-by:
Aditya Kumar Singh <quic_adisi@quicinc.com> Link: https://msgid.link/20240422053412.2024075-2-quic_adisi@quicinc.com Link: https://msgid.link/20240422053412.2024075-3-quic_adisi@quicinc.com [squash, actually only pass 0 from mac80211] Signed-off-by:
Johannes Berg <johannes.berg@intel.com>
-
- May 02, 2024
-
-
Miao Xu authored
This patch adds two new arguments for cong_control of struct tcp_congestion_ops: - ack - flag These two arguments are inherited from the caller tcp_cong_control in tcp_intput.c. One use case of them is to update cwnd and pacing rate inside cong_control based on the info they provide. For example, the flag can be used to decide if it is the right time to raise or reduce a sender's cwnd. Reviewed-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
Miao Xu <miaxu@meta.com> Link: https://lore.kernel.org/r/20240502042318.801932-2-miaxu@meta.com Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org>
-
Richard Gobert authored
Commits a6024562 ("udp: Add GRO functions to UDP socket") and 57c67ff4 ("udp: additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the complete phase of gro. The functions always return skb->network_header, which in the case of encapsulated packets at the gro complete phase, is always set to the innermost L3 of the packet. That means that calling {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ L3/L4 may return an unexpected value. This incorrect usage leads to a bug in GRO's UDP socket lookup. udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These *_hdr functions return network_header which will point to the innermost L3, resulting in the wrong offset being used in __udp{4,6}_lib_lookup with encapsulated packets. This patch adds network_offset and inner_network_offset to napi_gro_cb, and makes sure both are set correctly. To fix the issue, network_offsets union is used inside napi_gro_cb, in which both the outer and the inner network offsets are saved. Reproduction example: Endpoint configuration example (fou + local address bind) # ip fou add port 6666 ipproto 4 # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip # ip link set tun1 up # ip a add 1.1.1.2/24 dev tun1 Netperf TCP_STREAM result on net-next before patch is applied: net-next main, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.28 2.37 net-next main, GRO disabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2745.06 patch applied, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2877.38 Fixes: a6024562 ("udp: Add GRO functions to UDP socket") Signed-off-by:
Richard Gobert <richardbgobert@gmail.com> Reviewed-by:
Eric Dumazet <edumazet@google.com> Reviewed-by:
Willem de Bruijn <willemb@google.com> Signed-off-by:
Paolo Abeni <pabeni@redhat.com>
-
Florian Fainelli authored
Now that we no longer any drivers using PHYLIB's adjust_link callback, remove all paths that made use of adjust_link as well as the associated functions. Signed-off-by:
Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://lore.kernel.org/r/20240430164816.2400606-3-florian.fainelli@broadcom.com Signed-off-by:
Jakub Kicinski <kuba@kernel.org>
-