- Jan 02, 2025
-
-
Zichen Xie authored
commit 9b458e8b upstream. There may be a potential integer overflow issue in inftl_partscan(). parts[0].size is defined as "uint64_t" while mtd->erasesize and ip->firstUnit are defined as 32-bit unsigned integer. The result of the calculation will be limited to 32 bits without correct casting. Fixes: 1da177e4 ("Linux-2.6.12-rc2") Signed-off-by:
Zichen Xie <zichenxie0106@gmail.com> Cc: stable@vger.kernel.org Signed-off-by:
Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
NeilBrown authored
[ Upstream commit 7917f01a ] A recent patch inadvertently broke callbacks for NFSv4.0. In the 4.0 case we do not expect a session to be found but still need to call setup_callback_client() which will not try to dereference it. This patch moves the check for failure to find a session into the 4.1+ branch of setup_callback_client() Fixes: 1e02c641 ("NFSD: Prevent NULL dereference in nfsd4_process_cb_update()") Signed-off-by:
NeilBrown <neilb@suse.de> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yang Erkun authored
[ Upstream commit 69d803c4 ] This reverts commit f8c989a0. Before this commit, svc_export_put or expkey_put will call path_put with sync mode. After this commit, path_put will be called with async mode. And this can lead the unexpected results show as follow. mkfs.xfs -f /dev/sda echo "/ *(rw,no_root_squash,fsid=0)" > /etc/exports echo "/mnt *(rw,no_root_squash,fsid=1)" >> /etc/exports exportfs -ra service nfs-server start mount -t nfs -o vers=4.0 127.0.0.1:/mnt /mnt1 mount /dev/sda /mnt/sda touch /mnt1/sda/file exportfs -r umount /mnt/sda # failed unexcepted The touch will finally call nfsd_cross_mnt, add refcount to mount, and then add cache_head. Before this commit, exportfs -r will call cache_flush to cleanup all cache_head, and path_put in svc_export_put/expkey_put will be finished with sync mode. So, the latter umount will always success. However, after this commit, path_put will be called with async mode, the latter umount may failed, and if we add some delay, umount will success too. Personally I think this bug and should be fixed. We first revert before bugfix patch, and then fix the original bug with a different way. Fixes: f8c989a0 ("nfsd: release svc_expkey/svc_export with rcu_work") Signed-off-by:
Yang Erkun <yangerkun@huawei.com> Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Cong Wang authored
[ Upstream commit 9ecc4d85 ] skb_network_offset() and skb_transport_offset() can be negative when they are called after we pull the transport header, for example, when we use eBPF sockmap at the point of ->sk_data_ready(). __bpf_skb_min_len() uses an unsigned int to get these offsets, this leads to a very large number which then causes bpf_skb_change_tail() failed unexpectedly. Fix this by using a signed int to get these offsets and ensure the minimum is at least zero. Fixes: 5293efe6 ("bpf: add bpf_skb_change_tail helper") Signed-off-by:
Cong Wang <cong.wang@bytedance.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Acked-by:
John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241213034057.246437-2-xiyou.wangcong@gmail.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Zijian Zhang authored
[ Upstream commit d888b7af ] When we do sk_psock_verdict_apply->sk_psock_skb_ingress, an sk_msg will be created out of the skb, and the rmem accounting of the sk_msg will be handled by the skb. For skmsgs in __SK_REDIRECT case of tcp_bpf_send_verdict, when redirecting to the ingress of a socket, although we sk_rmem_schedule and add sk_msg to the ingress_msg of sk_redir, we do not update sk_rmem_alloc. As a result, except for the global memory limit, the rmem of sk_redir is nearly unlimited. Thus, add sk_rmem_alloc related logic to limit the recv buffer. Since the function sk_msg_recvmsg and __sk_psock_purge_ingress_msg are used in these two paths. We use "msg->skb" to test whether the sk_msg is skb backed up. If it's not, we shall do the memory accounting explicitly. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by:
Zijian Zhang <zijianzhang@bytedance.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Reviewed-by:
John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-3-zijianzhang@bytedance.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Cong Wang authored
[ Upstream commit 54f89b31 ] When bpf_tcp_ingress() is called, the skmsg is being redirected to the ingress of the destination socket. Therefore, we should charge its receive socket buffer, instead of sending socket buffer. Because sk_rmem_schedule() tests pfmemalloc of skb, we need to introduce a wrapper and call it for skmsg. Fixes: 604326b4 ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by:
Cong Wang <cong.wang@bytedance.com> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Reviewed-by:
John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20241210012039.1669389-2-zijianzhang@bytedance.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Bart Van Assche authored
[ Upstream commit 30c2de0a ] Fix the following clang compiler warning that is reported if the kernel is built with W=1: ./include/linux/vmstat.h:518:36: error: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Werror,-Wenum-enum-conversion] 518 | return node_stat_name(NR_LRU_BASE + lru) + 3; // skip "nr_" | ~~~~~~~~~~~ ^ ~~~ Link: https://lkml.kernel.org/r/20241212213126.1269116-1-bvanassche@acm.org Fixes: 9d7ea9a2 ("mm/vmstat: add helpers to get vmstat item names for each enum type") Signed-off-by:
Bart Van Assche <bvanassche@acm.org> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ilya Dryomov authored
[ Upstream commit 18d44c5d ] If mounted with sparseread option, ceph_direct_read_write() ends up making an unnecessarily allocation for O_DIRECT writes. Fixes: 03bc06c7 ("ceph: add new mount option to enable sparse reads") Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Alex Markuze <amarkuze@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ilya Dryomov authored
[ Upstream commit 66e0c4f9 ] The bvecs array which is allocated in iter_get_bvecs_alloc() is leaked and pages remain pinned if ceph_alloc_sparse_ext_map() fails. There is no need to delay the allocation of sparse_ext map until after the bvecs array is set up, so fix this by moving sparse_ext allocation a bit earlier. Also, make a similar adjustment in __ceph_sync_read() for consistency (a leak of the same kind in __ceph_sync_read() has been addressed differently). Cc: stable@vger.kernel.org Fixes: 03bc06c7 ("ceph: add new mount option to enable sparse reads") Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Alex Markuze <amarkuze@redhat.com> Stable-dep-of: 18d44c5d ("ceph: allocate sparse_ext map only for sparse reads") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Xiubo Li authored
[ Upstream commit aaefabc4 ] In fscrypt case and for a smaller read length we can predict the max count of the extent map. And for small read length use cases this could save some memories. [ idryomov: squash into a single patch to avoid build break, drop redundant variable in ceph_alloc_sparse_ext_map() ] Signed-off-by:
Xiubo Li <xiubli@redhat.com> Reviewed-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Stable-dep-of: 18d44c5d ("ceph: allocate sparse_ext map only for sparse reads") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Nikita Zhandarovich authored
[ Upstream commit 2dd59fe0 ] Syzbot reports [1] an uninitialized value issue found by KMSAN in dib3000_read_reg(). Local u8 rb[2] is used in i2c_transfer() as a read buffer; in case that call fails, the buffer may end up with some undefined values. Since no elaborate error handling is expected in dib3000_write_reg(), simply zero out rb buffer to mitigate the problem. [1] Syzkaller report dvb-usb: bulk message failed: -22 (6/0) ===================================================== BUG: KMSAN: uninit-value in dib3000mb_attach+0x2d8/0x3c0 drivers/media/dvb-frontends/dib3000mb.c:758 dib3000mb_attach+0x2d8/0x3c0 drivers/media/dvb-frontends/dib3000mb.c:758 dibusb_dib3000mb_frontend_attach+0x155/0x2f0 drivers/media/usb/dvb-usb/dibusb-mb.c:31 dvb_usb_adapter_frontend_init+0xed/0x9a0 drivers/media/usb/dvb-usb/dvb-usb-dvb.c:290 dvb_usb_adapter_init drivers/media/usb/dvb-usb/dvb-usb-init.c:90 [inline] dvb_usb_init drivers/media/usb/dvb-usb/dvb-usb-init.c:186 [inline] dvb_usb_device_init+0x25a8/0x3760 drivers/media/usb/dvb-usb/dvb-usb-init.c:310 dibusb_probe+0x46/0x250 drivers/media/usb/dvb-usb/dibusb-mb.c:110 ... Local variable rb created at: dib3000_read_reg+0x86/0x4e0 drivers/media/dvb-frontends/dib3000mb.c:54 dib3000mb_attach+0x123/0x3c0 drivers/media/dvb-frontends/dib3000mb.c:758 ... Fixes: 74340b0a ("V4L/DVB (4457): Remove dib3000-common-module") Reported-by:
<syzbot+c88fc0ebe0d5935c70da@syzkaller.appspotmail.com> Signed-off-by:
Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20240517155800.9881-1-n.zhandarovich@fintech.ru Signed-off-by:
Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Dec 27, 2024
-
-
Greg Kroah-Hartman authored
Tested-by:
Hardik Garg <hargar@linux.microsoft.com> Link: https://lore.kernel.org/r/20241223155359.534468176@linuxfoundation.org Tested-by:
SeongJae Park <sj@kernel.org> Tested-by:
Shuah Khan <skhan@linuxfoundation.org> Tested-by:
Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Tested-by:
Ron Economos <re@w6rz.net> Tested-by:
Peter Schneider <pschneider1968@googlemail.com> Tested-by:
Jon Hunter <jonathanh@nvidia.com> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
kernelci.org bot <bot@kernelci.org> Tested-by:
Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Francesco Dolcini authored
commit 1aa772be upstream. Add fsl,pps-channel property to select where to connect the PPS signal. This depends on the internal SoC routing and on the board, for example on the i.MX8 SoC it can be connected to an external pin (using channel 1) or to internal eDMA as DMA request (channel 0). Signed-off-by:
Francesco Dolcini <francesco.dolcini@toradex.com> Acked-by:
Conor Dooley <conor.dooley@microchip.com> Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Michel Dänzer authored
commit 85230ee3 upstream. Third time's the charm, I hope? Fixes: d3116756 ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by:
Christian König <christian.koenig@amd.com> Signed-off-by:
Michel Dänzer <mdaenzer@redhat.com> Signed-off-by:
Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 695c2c74) Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Francesco Dolcini authored
commit 566c2d83 upstream. Depending on the SoC where the FEC is integrated into the PPS channel might be routed to different timer instances. Make this configurable from the devicetree. When the related DT property is not present fallback to the previous default and use channel 0. Reviewed-by:
Frank Li <Frank.Li@nxp.com> Tested-by:
Rafael Beims <rafael.beims@toradex.com> Signed-off-by:
Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by:
Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Francesco Dolcini authored
commit bf8ca67e upstream. Preparation patch to allow for PPS channel configuration, no functional change intended. Signed-off-by:
Francesco Dolcini <francesco.dolcini@toradex.com> Reviewed-by:
Frank Li <Frank.Li@nxp.com> Reviewed-by:
Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Csókás, Bence <csokas.bence@prolan.hu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
Commit 6e6b8c62 upstream. kiocb_done() should care to specifically redirecting requests to io-wq. Remove the hopping to tw to then queue an io-wq, return -EAGAIN and let the core code io_uring handle offloading. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Tested-by:
Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/413564e550fe23744a970e1783dfa566291b0e6f.1710799188.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit 6e6b8c62) Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
Commit c0a9d496 upstream. Some file systems, ocfs2 in this case, will return -EOPNOTSUPP for an IOCB_NOWAIT read/write attempt. While this can be argued to be correct, the usual return value for something that requires blocking issue is -EAGAIN. A refactoring io_uring commit dropped calling kiocb_done() for negative return values, which is otherwise where we already do that transformation. To ensure we catch it in both spots, check it in __io_read() itself as well. Reported-by:
Robert Sander <r.sander@heinlein-support.de> Link: https://fosstodon.org/@gurubert@mastodon.gurubert.de/113112431889638440 Cc: stable@vger.kernel.org Fixes: a08d195b ("io_uring/rw: split io_read() into a helper") Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
Commit a08d195b upstream. Add __io_read() which does the grunt of the work, leaving the completion side to the new io_read(). No functional changes in this patch. Reviewed-by:
Gabriel Krisman Bertazi <krisman@suse.de> Signed-off-by:
Jens Axboe <axboe@kernel.dk> (cherry picked from commit a08d195b) Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xuewen Yan authored
commit 900bbaae upstream. Now, the epoll only use wake_up() interface to wake up task. However, sometimes, there are epoll users which want to use the synchronous wakeup flag to hint the scheduler, such as Android binder driver. So add a wake_up_sync() define, and use the wake_up_sync() when the sync is true in ep_poll_callback(). Co-developed-by:
Jing Xia <jing.xia@unisoc.com> Signed-off-by:
Jing Xia <jing.xia@unisoc.com> Signed-off-by:
Xuewen Yan <xuewen.yan@unisoc.com> Link: https://lore.kernel.org/r/20240426080548.8203-1-xuewen.yan@unisoc.com Tested-by:
Brian Geffon <bgeffon@google.com> Reviewed-by:
Brian Geffon <bgeffon@google.com> Reported-by:
Benoit Lize <lizeb@google.com> Signed-off-by:
Christian Brauner <brauner@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Max Kellermann authored
commit d6fd6f82 upstream. In two `break` statements, the call to ceph_release_page_vector() was missing, leaking the allocation from ceph_alloc_page_vector(). Instead of adding the missing ceph_release_page_vector() calls, the Ceph maintainers preferred to transfer page ownership to the `ceph_osd_request` by passing `own_pages=true` to osd_req_op_extent_osd_data_pages(). This requires postponing the ceph_osdc_put_request() call until after the block that accesses the `pages`. Cc: stable@vger.kernel.org Fixes: 03bc06c7 ("ceph: add new mount option to enable sparse reads") Fixes: f0fe1e54 ("ceph: plumb in decryption during reads") Signed-off-by:
Max Kellermann <max.kellermann@ionos.com> Reviewed-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alex Markuze authored
commit 9abee475 upstream. This patch refines the read logic in __ceph_sync_read() to ensure more predictable and efficient behavior in various edge cases. - Return early if the requested read length is zero or if the file size (`i_size`) is zero. - Initialize the index variable (`idx`) where needed and reorder some code to ensure it is always set before use. - Improve error handling by checking for negative return values earlier. - Remove redundant encrypted file checks after failures. Only attempt filesystem-level decryption if the read succeeded. - Simplify leftover calculations to correctly handle cases where the read extends beyond the end of the file or stops short. This can be hit by continuously reading a file while, on another client, we keep truncating and writing new data into it. - This resolves multiple issues caused by integer and consequent buffer overflow (`pages` array being accessed beyond `num_pages`): - https://tracker.ceph.com/issues/67524 - https://tracker.ceph.com/issues/68980 - https://tracker.ceph.com/issues/68981 Cc: stable@vger.kernel.org Fixes: 1065da21 ("ceph: stop copying to iter at EOF on sync reads") Reported-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Signed-off-by:
Alex Markuze <amarkuze@redhat.com> Reviewed-by:
Viacheslav Dubeyko <Slava.Dubeyko@ibm.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ilya Dryomov authored
commit 12eb22a5 upstream. It becomes a path component, so it shouldn't exceed NAME_MAX characters. This was hardened in commit c152737b ("ceph: Use strscpy() instead of strcpy() in __get_snap_name()"), but no actual check was put in place. Cc: stable@vger.kernel.org Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Reviewed-by:
Alex Markuze <amarkuze@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zijun Hu authored
commit 5d009e02 upstream. __of_get_dma_parent() returns OF device node @args.np, but the node's refcount is increased twice, by both of_parse_phandle_with_args() and of_node_get(), so causes refcount leakage for the node. Fix by directly returning the node got by of_parse_phandle_with_args(). Fixes: f83a6e5d ("of: address: Add support for the parent DMA bus") Cc: stable@vger.kernel.org Signed-off-by:
Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241206-of_core_fix-v1-4-dc28ed56bec3@quicinc.com Signed-off-by:
Rob Herring (Arm) <robh@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Herve Codina authored
commit d7dfa7fd upstream. The current code uses some 'goto put;' to cancel the parsing operation and can lead to a return code value of 0 even on error cases. Indeed, some goto calls are done from a loop without setting the ret value explicitly before the goto call and so the ret value can be set to 0 due to operation done in previous loop iteration. For instance match can be set to 0 in the previous loop iteration (leading to a new iteration) but ret can also be set to 0 it the of_property_read_u32() call succeed. In that case if no match are found or if an error is detected the new iteration, the return value can be wrongly 0. Avoid those cases setting the ret value explicitly before the goto calls. Fixes: bd6f2fd5 ("of: Support parsing phandle argument lists through a nexus node") Cc: stable@vger.kernel.org Signed-off-by:
Herve Codina <herve.codina@bootlin.com> Link: https://lore.kernel.org/r/20241202165819.158681-1-herve.codina@bootlin.com Signed-off-by:
Rob Herring (Arm) <robh@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jann Horn authored
commit 0a16e24e upstream. When F_SEAL_FUTURE_WRITE was introduced, it was overlooked that udmabuf must reject memfds with this flag, just like ones with F_SEAL_WRITE. Fix it by adding F_SEAL_FUTURE_WRITE to SEALS_DENIED. Fixes: ab3948f5 ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd") Cc: stable@vger.kernel.org Acked-by:
Vivek Kasireddy <vivek.kasireddy@intel.com> Signed-off-by:
Jann Horn <jannh@google.com> Reviewed-by:
Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by:
Vivek Kasireddy <vivek.kasireddy@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241204-udmabuf-fixes-v2-2-23887289de1c@google.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Edward Adam Davis authored
commit 901ce970 upstream. syzbot reported a WARNING in nilfs_rmdir. [1] Because the inode bitmap is corrupted, an inode with an inode number that should exist as a ".nilfs" file was reassigned by nilfs_mkdir for "file0", causing an inode duplication during execution. And this causes an underflow of i_nlink in rmdir operations. The inode is used twice by the same task to unmount and remove directories ".nilfs" and "file0", it trigger warning in nilfs_rmdir. Avoid to this issue, check i_nlink in nilfs_iget(), if it is 0, it means that this inode has been deleted, and iput is executed to reclaim it. [1] WARNING: CPU: 1 PID: 5824 at fs/inode.c:407 drop_nlink+0xc4/0x110 fs/inode.c:407 ... Call Trace: <TASK> nilfs_rmdir+0x1b0/0x250 fs/nilfs2/namei.c:342 vfs_rmdir+0x3a3/0x510 fs/namei.c:4394 do_rmdir+0x3b5/0x580 fs/namei.c:4453 __do_sys_rmdir fs/namei.c:4472 [inline] __se_sys_rmdir fs/namei.c:4470 [inline] __x64_sys_rmdir+0x47/0x50 fs/namei.c:4470 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Link: https://lkml.kernel.org/r/20241209065759.6781-1-konishi.ryusuke@gmail.com Fixes: d2500652 ("nilfs2: pathname operations") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+9260555647a5132edd48@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=9260555647a5132edd48 Tested-by:
<syzbot+9260555647a5132edd48@syzkaller.appspotmail.com> Signed-off-by:
Edward Adam Davis <eadavis@qq.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ryusuke Konishi authored
commit 6309b8ce upstream. When block_invalidatepage was converted to block_invalidate_folio, the fallback to block_invalidatepage in folio_invalidate() if the address_space_operations method invalidatepage (currently invalidate_folio) was not set, was removed. Unfortunately, some pseudo-inodes in nilfs2 use empty_aops set by inode_init_always_gfp() as is, or explicitly set it to address_space_operations. Therefore, with this change, block_invalidatepage() is no longer called from folio_invalidate(), and as a result, the buffer_head structures attached to these pages/folios are no longer freed via try_to_free_buffers(). Thus, these buffer heads are now leaked by truncate_inode_pages(), which cleans up the page cache from inode evict(), etc. Three types of caches use empty_aops: gc inode caches and the DAT shadow inode used by GC, and b-tree node caches. Of these, b-tree node caches explicitly call invalidate_mapping_pages() during cleanup, which involves calling try_to_free_buffers(), so the leak was not visible during normal operation but worsened when GC was performed. Fix this issue by using address_space_operations with invalidate_folio set to block_invalidate_folio instead of empty_aops, which will ensure the same behavior as before. Link: https://lkml.kernel.org/r/20241212164556.21338-1-konishi.ryusuke@gmail.com Fixes: 7ba13abb ("fs: Turn block_invalidatepage into block_invalidate_folio") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [5.18+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zijun Hu authored
commit 0f7ca6f6 upstream. of_irq_parse_one() may use uninitialized variable @addr_len as shown below: // @addr_len is uninitialized int addr_len; // This operation does not touch @addr_len if it fails. addr = of_get_property(device, "reg", &addr_len); // Use uninitialized @addr_len if the operation fails. if (addr_len > sizeof(addr_buf)) addr_len = sizeof(addr_buf); // Check the operation result here. if (addr) memcpy(addr_buf, addr, addr_len); Fix by initializing @addr_len before the operation. Fixes: b739dffa ("of/irq: Prevent device address out-of-bounds read in interrupt map walk") Cc: stable@vger.kernel.org Signed-off-by:
Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241209-of_irq_fix-v1-4-782f1419c8a1@quicinc.com Signed-off-by:
Rob Herring (Arm) <robh@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zijun Hu authored
commit fec3edc4 upstream. On a malformed interrupt-map property which is shorter than expected by 1 cell, we may read bogus data past the end of the property instead of returning an error in of_irq_parse_imap_parent(). Decrement the remaining length when skipping over the interrupt parent phandle cell. Fixes: 935df1bd ("of/irq: Factor out parsing of interrupt-map parent phandle+args from of_irq_parse_raw()") Cc: stable@vger.kernel.org Signed-off-by:
Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/r/20241209-of_irq_fix-v1-1-782f1419c8a1@quicinc.com [rh: reword commit msg] Signed-off-by:
Rob Herring (Arm) <robh@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Trond Myklebust authored
commit 62e2a47c upstream. When the server is recalling a layout, we should ignore the count of outstanding layoutget calls, since the server is expected to return either NFS4ERR_RECALLCONFLICT or NFS4ERR_RETURNCONFLICT for as long as the recall is outstanding. Currently, we may end up livelocking, causing the layout to eventually be forcibly revoked. Fixes: bf0291dd ("pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised") Cc: stable@vger.kernel.org Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pavel Begunkov authored
commit dbd2ca93 upstream. task work can be executed after the task has gone through io_uring termination, whether it's the final task_work run or the fallback path. In this case, task work will find ->io_wq being already killed and null'ed, which is a problem if it then tries to forward the request to io_queue_iowq(). Make io_queue_iowq() fail requests in this case. Note that it also checks PF_KTHREAD, because the user can first close a DEFER_TASKRUN ring and shortly after kill the task, in which case ->iowq check would race. Cc: stable@vger.kernel.org Fixes: 50c52250 ("block: implement async io_uring discard cmd") Fixes: 773af691 ("io_uring: always reissue from task_work context") Reported-by:
Will <willsroot@protonmail.com> Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.1734637909.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jann Horn authored
commit 12d90811 upstream. Currently, io_uring_unreg_ringfd() (which cleans up registered rings) is only called on exit, but __io_uring_free (which frees the tctx in which the registered ring pointers are stored) is also called on execve (via begin_new_exec -> io_uring_task_cancel -> __io_uring_cancel -> io_uring_cancel_generic -> __io_uring_free). This means: A process going through execve while having registered rings will leak references to the rings' `struct file`. Fix it by zapping registered rings on execve(). This is implemented by moving the io_uring_unreg_ringfd() from io_uring_files_cancel() into its callee __io_uring_cancel(), which is called from io_uring_task_cancel() on execve. This could probably be exploited *on 32-bit kernels* by leaking 2^32 references to the same ring, because the file refcount is stored in a pointer-sized field and get_file() doesn't have protection against refcount overflow, just a WARN_ONCE(); but on 64-bit it should have no impact beyond a memory leak. Cc: stable@vger.kernel.org Fixes: e7a6c00d ("io_uring: add support for registering ring file descriptors") Signed-off-by:
Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20241218-uring-reg-ring-cleanup-v1-1-8f63e999045b@google.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Tiezhu Yang authored
commit 29d44cce upstream. Currently, LoongArch LLVM does not support the constraint "o" and no plan to support it, it only supports the similar constraint "m", so change the constraints from "nor" in the "else" case to arch-specific "nmr" to avoid the build error such as "unexpected asm memory constraint" for LoongArch. Fixes: 630301b0 ("selftests/bpf: Add basic USDT selftests") Suggested-by:
Weining Lu <luweining@loongson.cn> Suggested-by:
Li Chen <chenli@loongson.cn> Signed-off-by:
Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by:
Daniel Borkmann <daniel@iogearbox.net> Reviewed-by:
Huacai Chen <chenhuacai@loongson.cn> Cc: stable@vger.kernel.org Link: https://llvm.org/docs/LangRef.html#supported-constraint-code-list Link: https://github.com/llvm/llvm-project/blob/main/llvm/lib/Target/LoongArch/LoongArchISelDAGToDAG.cpp#L172 Link: https://lore.kernel.org/bpf/20241219111506.20643-1-yangtiezhu@loongson.cn Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Isaac Manjarres authored
commit 6a75f19a upstream. The sysctl tests for vm.memfd_noexec rely on the kernel to support PID namespaces (i.e. the kernel is built with CONFIG_PID_NS=y). If the kernel the test runs on does not support PID namespaces, the first sysctl test will fail when attempting to spawn a new thread in a new PID namespace, abort the test, preventing the remaining tests from being run. This is not desirable, as not all kernels need PID namespaces, but can still use the other features provided by memfd. Therefore, only run the sysctl tests if the kernel supports PID namespaces. Otherwise, skip those tests and emit an informative message to let the user know why the sysctl tests are not being run. Link: https://lkml.kernel.org/r/20241205192943.3228757-1-isaacmanjarres@google.com Fixes: 11f75a01 ("selftests/memfd: add tests for MFD_NOEXEC_SEAL MFD_EXEC") Signed-off-by:
Isaac J. Manjarres <isaacmanjarres@google.com> Reviewed-by:
Jeff Xu <jeffxu@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Kalesh Singh <kaleshsingh@google.com> Cc: <stable@vger.kernel.org> [6.6+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt authored
commit 65a25d9f upstream. The test_event_printk() code makes sure that when a trace event is registered, any dereferenced pointers in from the event's TP_printk() are pointing to content in the ring buffer. But currently it does not handle "%s", as there's cases where the string pointer saved in the ring buffer points to a static string in the kernel that will never be freed. As that is a valid case, the pointer needs to be checked at runtime. Currently the runtime check is done via trace_check_vprintf(), but to not have to replicate everything in vsnprintf() it does some logic with the va_list that may not be reliable across architectures. In order to get rid of that logic, more work in the test_event_printk() needs to be done. Some of the strings can be validated at this time when it is obvious the string is valid because the string will be saved in the ring buffer content. Do all the validation of strings in the ring buffer at boot in test_event_printk(), and make sure that the field of the strings that point into the kernel are accessible. This will allow adding checks at runtime that will validate the fields themselves and not rely on paring the TP_printk() format at runtime. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20241217024720.685917008@goodmis.org Fixes: 5013f454 ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt authored
commit 91711048 upstream. The process_pointer() helper function looks to see if various trace event macros are used. These macros are for storing data in the event. This makes it safe to dereference as the dereference will then point into the event on the ring buffer where the content of the data stays with the event itself. A few helper functions were missing. Those were: __get_rel_dynamic_array() __get_dynamic_array_len() __get_rel_dynamic_array_len() __get_rel_sockaddr() Also add a helper function find_print_string() to not need to use a middle man variable to test if the string exists. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20241217024720.521836792@goodmis.org Fixes: 5013f454 ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Rostedt authored
commit a6629626 upstream. The test_event_printk() analyzes print formats of trace events looking for cases where it may dereference a pointer that is not in the ring buffer which can possibly be a bug when the trace event is read from the ring buffer and the content of that pointer no longer exists. The function needs to accurately go from one print format argument to the next. It handles quotes and parenthesis that may be included in an argument. When it finds the start of the next argument, it uses a simple "c = strstr(fmt + i, ',')" to find the end of that argument! In order to include "%s" dereferencing, it needs to process the entire content of the print format argument and not just the content of the first ',' it finds. As there may be content like: ({ const char *saved_ptr = trace_seq_buffer_ptr(p); static const char *access_str[] = { "---", "--x", "w--", "w-x", "-u-", "-ux", "wu-", "wux" }; union kvm_mmu_page_role role; role.word = REC->role; trace_seq_printf(p, "sp gen %u gfn %llx l%u %u-byte q%u%s %s%s" " %snxe %sad root %u %s%c", REC->mmu_valid_gen, REC->gfn, role.level, role.has_4_byte_gpte ? 4 : 8, role.quadrant, role.direct ? " direct" : "", access_str[role.access], role.invalid ? " invalid" : "", role.efer_nx ? "" : "!", role.ad_disabled ? "!" : "", REC->root_count, REC->unsync ? "unsync" : "sync", 0); saved_ptr; }) Which is an example of a full argument of an existing event. As the code already handles finding the next print format argument, process the argument at the end of it and not the start of it. This way it has both the start of the argument as well as the end of it. Add a helper function "process_pointer()" that will do the processing during the loop as well as at the end. It also makes the code cleaner and easier to read. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/20241217024720.362271189@goodmis.org Fixes: 5013f454 ("tracing: Add check of trace event print fmts for dereferencing pointers") Signed-off-by:
Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Enzo Matsumiya authored
commit e9f2517a upstream. Commit ef7134c7 ("smb: client: Fix use-after-free of network namespace.") fixed a netns UAF by manually enabled socket refcounting (sk->sk_net_refcnt=1 and sock_inuse_add(net, 1)). The reason the patch worked for that bug was because we now hold references to the netns (get_net_track() gets a ref internally) and they're properly released (internally, on __sk_destruct()), but only because sk->sk_net_refcnt was set. Problem: (this happens regardless of CONFIG_NET_NS_REFCNT_TRACKER and regardless if init_net or other) Setting sk->sk_net_refcnt=1 *manually* and *after* socket creation is not only out of cifs scope, but also technically wrong -- it's set conditionally based on user (=1) vs kernel (=0) sockets. And net/ implementations seem to base their user vs kernel space operations on it. e.g. upon TCP socket close, the TCP timers are not cleared because sk->sk_net_refcnt=1: (cf. commit 151c9c72 ("tcp: properly terminate timers for kernel sockets")) net/ipv4/tcp.c: void tcp_close(struct sock *sk, long timeout) { lock_sock(sk); __tcp_close(sk, timeout); release_sock(sk); if (!sk->sk_net_refcnt) inet_csk_clear_xmit_timers_sync(sk); sock_put(sk); } Which will throw a lockdep warning and then, as expected, deadlock on tcp_write_timer(). A way to reproduce this is by running the reproducer from ef7134c7 and then 'rmmod cifs'. A few seconds later, the deadlock/lockdep warning shows up. Fix: We shouldn't mess with socket internals ourselves, so do not set sk_net_refcnt manually. Also change __sock_create() to sock_create_kern() for explicitness. As for non-init_net network namespaces, we deal with it the best way we can -- hold an extra netns reference for server->ssocket and drop it when it's released. This ensures that the netns still exists whenever we need to create/destroy server->ssocket, but is not directly tied to it. Fixes: ef7134c7 ("smb: client: Fix use-after-free of network namespace.") Cc: stable@vger.kernel.org Signed-off-by:
Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by:
Steve French <stfrench@microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sean Christopherson authored
commit 9b42d1e8 upstream. Use is_64_bit_hypercall() instead of is_64_bit_mode() to detect a 64-bit hypercall when completing said hypercall. For guests with protected state, e.g. SEV-ES and SEV-SNP, KVM must assume the hypercall was made in 64-bit mode as the vCPU state needed to detect 64-bit mode is unavailable. Hacking the sev_smoke_test selftest to generate a KVM_HC_MAP_GPA_RANGE hypercall via VMGEXIT trips the WARN: ------------[ cut here ]------------ WARNING: CPU: 273 PID: 326626 at arch/x86/kvm/x86.h:180 complete_hypercall_exit+0x44/0xe0 [kvm] Modules linked in: kvm_amd kvm ... [last unloaded: kvm] CPU: 273 UID: 0 PID: 326626 Comm: sev_smoke_test Not tainted 6.12.0-smp--392e932fa0f3-feat #470 Hardware name: Google Astoria/astoria, BIOS 0.20240617.0-0 06/17/2024 RIP: 0010:complete_hypercall_exit+0x44/0xe0 [kvm] Call Trace: <TASK> kvm_arch_vcpu_ioctl_run+0x2400/0x2720 [kvm] kvm_vcpu_ioctl+0x54f/0x630 [kvm] __se_sys_ioctl+0x6b/0xc0 do_syscall_64+0x83/0x160 entry_SYSCALL_64_after_hwframe+0x76/0x7e </TASK> ---[ end trace 0000000000000000 ]--- Fixes: b5aead00 ("KVM: x86: Assume a 64-bit hypercall for guests with protected state") Cc: stable@vger.kernel.org Cc: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by:
Xiaoyao Li <xiaoyao.li@intel.com> Reviewed-by:
Nikunj A Dadhania <nikunj@amd.com> Reviewed-by:
Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by:
Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by:
Kai Huang <kai.huang@intel.com> Link: https://lore.kernel.org/r/20241128004344.4072099-2-seanjc@google.com Signed-off-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-