- Jan 24, 2023
-
-
Jens Axboe authored
commit 10c87333 upstream. We currently check REQ_F_POLLED before arming async poll for a notification to retry. If it's set, then we don't allow poll and will punt to io-wq instead. This is done to prevent a situation where a buggy driver will repeatedly return that there's space/data available yet we get -EAGAIN. However, if we already transferred data, then it should be safe to rely on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is also set. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jens Axboe authored
commit 4c3c0943 upstream. Like commit 7ba89d2a for recv/recvmsg, support MSG_WAITALL for the send side. If this flag is set and we do a short send, retry for a stream of seqpacket socket. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jens Axboe authored
commit 8a3e8ee5 upstream. If we need to continue doing this IO, then we don't want a potentially selected buffer recycled. Add a flag for that. Set this for recv/recvmsg if they do partial IO. Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jens Axboe authored
commit 7ba89d2a upstream. We currently don't attempt to get the full asked for length even if MSG_WAITALL is set, if we get a partial receive. If we do see a partial receive, then just note how many bytes we did and return -EAGAIN to get it retried. The iov is advanced appropriately for the vector based case, and we manually bump the buffer and remainder for the non-vector case. Cc: stable@vger.kernel.org Reported-by:
Constantine Gavrilov <constantine.gavrilov@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Pavel Begunkov authored
commit 7297ce3d upstream. Hide all error handling under common if block, removes two extra ifs on the success path and keeps the handling more condensed. Signed-off-by:
Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5761545158a12968f3caf30f747eea65ed75dfc1.1637524285.git.asml.silence@gmail.com Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jens Axboe authored
commit 46a525e1 upstream. This isn't a reliable mechanism to tell if we have task_work pending, we really should be looking at whether we have any items queued. This is problematic if forward progress is gated on running said task_work. One such example is reading from a pipe, where the write side has been closed right before the read is started. The fput() of the file queues TWA_RESUME task_work, and we need that task_work to be run before ->release() is called for the pipe. If ->release() isn't called, then the read will sit forever waiting on data that will never arise. Fix this by io_run_task_work() so it checks if we have task_work pending rather than rely on TIF_NOTIFY_SIGNAL for that. The latter obviously doesn't work for task_work that is queued without TWA_SIGNAL. Reported-by:
Christiano Haesbaert <haesbaert@haesbaert.org> Cc: stable@vger.kernel.org Link: https://github.com/axboe/liburing/issues/665 Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Krzysztof authored
[ Upstream commit 272970be ] The driver shutdown callback (which sends EDL_SOC_RESET to the device over serdev) should not be invoked when HCI device is not open (e.g. if hci_dev_open_sync() failed), because the serdev and its TTY are not open either. Also skip this step if device is powered off (qca_power_shutdown()). The shutdown callback causes use-after-free during system reboot with Qualcomm Atheros Bluetooth: Unable to handle kernel paging request at virtual address 0072662f67726fd7 ... CPU: 6 PID: 1 Comm: systemd-shutdow Tainted: G W 6.1.0-rt5-00325-g8a5f56bcfcca #8 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: tty_driver_flush_buffer+0x4/0x30 serdev_device_write_flush+0x24/0x34 qca_serdev_shutdown+0x80/0x130 [hci_uart] device_shutdown+0x15c/0x260 kernel_restart+0x48/0xac KASAN report: BUG: KASAN: use-after-free in tty_driver_flush_buffer+0x1c/0x50 Read of size 8 at addr ffff16270c2e0018 by task systemd-shutdow/1 CPU: 7 PID: 1 Comm: systemd-shutdow Not tainted 6.1.0-next-20221220-00014-gb85aaf97fb01-dirty #28 Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT) Call trace: dump_backtrace.part.0+0xdc/0xf0 show_stack+0x18/0x30 dump_stack_lvl+0x68/0x84 print_report+0x188/0x488 kasan_report+0xa4/0xf0 __asan_load8+0x80/0xac tty_driver_flush_buffer+0x1c/0x50 ttyport_write_flush+0x34/0x44 serdev_device_write_flush+0x48/0x60 qca_serdev_shutdown+0x124/0x274 device_shutdown+0x1e8/0x350 kernel_restart+0x48/0xb0 __do_sys_reboot+0x244/0x2d0 __arm64_sys_reboot+0x54/0x70 invoke_syscall+0x60/0x190 el0_svc_common.constprop.0+0x7c/0x160 do_el0_svc+0x44/0xf0 el0_svc+0x2c/0x6c el0t_64_sync_handler+0xbc/0x140 el0t_64_sync+0x190/0x194 Fixes: 7e7bbddd ("Bluetooth: hci_qca: Fix qca6390 enable failure after warm reboot") Cc: <stable@vger.kernel.org> Signed-off-by:
Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by:
Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Venkata Lakshmi Narayana Gubba authored
[ Upstream commit 2be43aba ] Currently qca_suspend() is relied on IBS mechanism. During FW download and memory dump collections, IBS will be disabled. In those cases, driver will allow suspend and still uses the serdev port, which results to errors. Now added a wait timeout if suspend is triggered during FW download and memory collections. Signed-off-by:
Venkata Lakshmi Narayana Gubba <gubbaven@codeaurora.org> Signed-off-by:
Balakrishna Godavarthi <bgodavar@codeaurora.org> Reviewed-by:
Abhishek Pandit-Subedi <abhishekpandit@chromium.org> Signed-off-by:
Marcel Holtmann <marcel@holtmann.org> Stable-dep-of: 272970be ("Bluetooth: hci_qca: Fix driver shutdown on closed serdev") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Chris Wilson authored
[ Upstream commit d3de5616 ] After applying an engine reset, on some platforms like Jasperlake, we occasionally detect that the engine state is not cleared until shortly after the resume. As we try to resume the engine with volatile internal state, the first request fails with a spurious CS event (it looks like it reports a lite-restore to the hung context, instead of the expected idle->active context switch). Signed-off-by:
Chris Wilson <chris@chris-wilson.co.uk> Cc: stable@vger.kernel.org Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by:
Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by:
Gwan-gyeong Mun <gwan-gyeong.mun@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221212161338.1007659-1-andi.shyti@linux.intel.com (cherry picked from commit 3db9d590) Signed-off-by:
Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yuchi Yang authored
[ Upstream commit 1f680609 ] Turn on power early to avoid wrong state for power relation register. This can earlier update JD state when resume back. Signed-off-by:
Yuchi Yang <yangyuchi66@gmail.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/e35d8f4fa18f4448a2315cc7d4a3715f@realtek.com Signed-off-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ding Hui authored
[ Upstream commit e006ac30 ] After [1][2], if we catch exceptions due to EFI runtime service, we will clear EFI_RUNTIME_SERVICES bit to disable EFI runtime service, then the subsequent routine which invoke the EFI runtime service should fail. But the userspace cat efivars through /sys/firmware/efi/efivars/ will stuck and infinite loop calling read() due to efivarfs_file_read() return -EINTR. The -EINTR is converted from EFI_ABORTED by efi_status_to_err(), and is an improper return value in this situation, so let virt_efi_xxx() return EFI_DEVICE_ERROR and converted to -EIO to invoker. Cc: <stable@vger.kernel.org> Fixes: 3425d934 ("efi/x86: Handle page faults occurring while running EFI runtime services") Fixes: 23715a26 ("arm64: efi: Recover from synchronous exceptions occurring in firmware") Signed-off-by:
Ding Hui <dinghui@sangfor.com.cn> Signed-off-by:
Ard Biesheuvel <ardb@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ryusuke Konishi authored
commit 7633355e upstream. If nilfs2 reads a corrupted disk image and tries to reads a b-tree node block by calling __nilfs_btree_get_block() against an invalid virtual block address, it returns -ENOENT because conversion of the virtual block address to a disk block address fails. However, this return value is the same as the internal code that b-tree lookup routines return to indicate that the block being searched does not exist, so functions that operate on that b-tree may misbehave. When nilfs_btree_insert() receives this spurious 'not found' code from nilfs_btree_do_lookup(), it misunderstands that the 'not found' check was successful and continues the insert operation using incomplete lookup path data, causing the following crash: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] ... RIP: 0010:nilfs_btree_get_nonroot_node fs/nilfs2/btree.c:418 [inline] RIP: 0010:nilfs_btree_prepare_insert fs/nilfs2/btree.c:1077 [inline] RIP: 0010:nilfs_btree_insert+0x6d3/0x1c10 fs/nilfs2/btree.c:1238 Code: bc 24 80 00 00 00 4c 89 f8 48 c1 e8 03 42 80 3c 28 00 74 08 4c 89 ff e8 4b 02 92 fe 4d 8b 3f 49 83 c7 28 4c 89 f8 48 c1 e8 03 <42> 80 3c 28 00 74 08 4c 89 ff e8 2e 02 92 fe 4d 8b 3f 49 83 c7 02 ... Call Trace: <TASK> nilfs_bmap_do_insert fs/nilfs2/bmap.c:121 [inline] nilfs_bmap_insert+0x20d/0x360 fs/nilfs2/bmap.c:147 nilfs_get_block+0x414/0x8d0 fs/nilfs2/inode.c:101 __block_write_begin_int+0x54c/0x1a80 fs/buffer.c:1991 __block_write_begin fs/buffer.c:2041 [inline] block_write_begin+0x93/0x1e0 fs/buffer.c:2102 nilfs_write_begin+0x9c/0x110 fs/nilfs2/inode.c:261 generic_perform_write+0x2e4/0x5e0 mm/filemap.c:3772 __generic_file_write_iter+0x176/0x400 mm/filemap.c:3900 generic_file_write_iter+0xab/0x310 mm/filemap.c:3932 call_write_iter include/linux/fs.h:2186 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x7dc/0xc50 fs/read_write.c:584 ksys_write+0x177/0x2a0 fs/read_write.c:637 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd ... </TASK> This patch fixes the root cause of this problem by replacing the error code that __nilfs_btree_get_block() returns on block address conversion failure from -ENOENT to another internal code -EINVAL which means that the b-tree metadata is corrupted. By returning -EINVAL, it propagates without glitches, and for all relevant b-tree operations, functions in the upper bmap layer output an error message indicating corrupted b-tree metadata via nilfs_bmap_convert_error(), and code -EIO will be eventually returned as it should be. Link: https://lkml.kernel.org/r/000000000000bd89e205f0e38355@google.com Link: https://lkml.kernel.org/r/20230105055356.8811-1-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+ede796cecd5296353515@syzkaller.appspotmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Damien Le Moal authored
commit a608da3b upstream. Using REQ_OP_ZONE_APPEND operations for synchronous writes to sequential files succeeds regardless of the zone write pointer position, as long as the target zone is not full. This means that if an external (buggy) application writes to the zone of a sequential file underneath the file system, subsequent file write() operation will succeed but the file size will not be correct and the file will contain invalid data written by another application. Modify zonefs_file_dio_append() to check the written sector of an append write (returned in bio->bi_iter.bi_sector) and return -EIO if there is a mismatch with the file zone wp offset field. This change triggers a call to zonefs_io_error() and a zone check. Modify zonefs_io_error_cb() to not expose the unexpected data after the current inode size when the errors=remount-ro mode is used. Other error modes are correctly handled already. Fixes: 02ef12a6 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO") Cc: stable@vger.kernel.org Signed-off-by:
Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shawn.Shao authored
commit 57054fe5 upstream. Since there is no protection for vd, a kernel panic will be triggered here in exceptional cases. You can refer to the processing of axi_chan_block_xfer_complete function The triggered kernel panic is as follows: [ 67.848444] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000060 [ 67.848447] Mem abort info: [ 67.848449] ESR = 0x96000004 [ 67.848451] EC = 0x25: DABT (current EL), IL = 32 bits [ 67.848454] SET = 0, FnV = 0 [ 67.848456] EA = 0, S1PTW = 0 [ 67.848458] Data abort info: [ 67.848460] ISV = 0, ISS = 0x00000004 [ 67.848462] CM = 0, WnR = 0 [ 67.848465] user pgtable: 4k pages, 48-bit VAs, pgdp=00000800c4c0b000 [ 67.848468] [0000000000000060] pgd=0000000000000000, p4d=0000000000000000 [ 67.848472] Internal error: Oops: 96000004 [#1] SMP [ 67.848475] Modules linked in: dmatest [ 67.848479] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.100-emu_x2rc+ #11 [ 67.848483] pstate: 62000085 (nZCv daIf -PAN -UAO +TCO BTYPE=--) [ 67.848487] pc : axi_chan_handle_err+0xc4/0x230 [ 67.848491] lr : axi_chan_handle_err+0x30/0x230 [ 67.848493] sp : ffff0803fe55ae50 [ 67.848495] x29: ffff0803fe55ae50 x28: ffff800011212200 [ 67.848500] x27: ffff0800c42c0080 x26: ffff0800c097c080 [ 67.848504] x25: ffff800010d33880 x24: ffff80001139d850 [ 67.848508] x23: ffff0800c097c168 x22: 0000000000000000 [ 67.848512] x21: 0000000000000080 x20: 0000000000002000 [ 67.848517] x19: ffff0800c097c080 x18: 0000000000000000 [ 67.848521] x17: 0000000000000000 x16: 0000000000000000 [ 67.848525] x15: 0000000000000000 x14: 0000000000000000 [ 67.848529] x13: 0000000000000000 x12: 0000000000000040 [ 67.848533] x11: ffff0800c0400248 x10: ffff0800c040024a [ 67.848538] x9 : ffff800010576cd4 x8 : ffff0800c0400270 [ 67.848542] x7 : 0000000000000000 x6 : ffff0800c04003e0 [ 67.848546] x5 : ffff0800c0400248 x4 : ffff0800c4294480 [ 67.848550] x3 : dead000000000100 x2 : dead000000000122 [ 67.848555] x1 : 0000000000000100 x0 : ffff0800c097c168 [ 67.848559] Call trace: [ 67.848562] axi_chan_handle_err+0xc4/0x230 [ 67.848566] dw_axi_dma_interrupt+0xf4/0x590 [ 67.848569] __handle_irq_event_percpu+0x60/0x220 [ 67.848573] handle_irq_event+0x64/0x120 [ 67.848576] handle_fasteoi_irq+0xc4/0x220 [ 67.848580] __handle_domain_irq+0x80/0xe0 [ 67.848583] gic_handle_irq+0xc0/0x138 [ 67.848585] el1_irq+0xc8/0x180 [ 67.848588] arch_cpu_idle+0x14/0x2c [ 67.848591] default_idle_call+0x40/0x16c [ 67.848594] do_idle+0x1f0/0x250 [ 67.848597] cpu_startup_entry+0x2c/0x60 [ 67.848600] rest_init+0xc0/0xcc [ 67.848603] arch_call_rest_init+0x14/0x1c [ 67.848606] start_kernel+0x4cc/0x500 [ 67.848610] Code: eb0002ff 9a9f12d6 f2fbd5a2 f2fbd5a3 (a94602c1) [ 67.848613] ---[ end trace 585a97036f88203a ]--- Signed-off-by:
Shawn.Shao <shawn.shao@jaguarmicro.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230112055802.1764-1-shawn.shao@jaguarmicro.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexander Wetzel authored
commit 69403bad upstream. ieee80211_tx_ba_session_handle_start() may get NULL for sdata when a deauthentication is ongoing. Here a trace triggering the race with the hostapd test multi_ap_fronthaul_on_ap: (gdb) list *drv_ampdu_action+0x46 0x8b16 is in drv_ampdu_action (net/mac80211/driver-ops.c:396). 391 int ret = -EOPNOTSUPP; 392 393 might_sleep(); 394 395 sdata = get_bss_sdata(sdata); 396 if (!check_sdata_in_driver(sdata)) 397 return -EIO; 398 399 trace_drv_ampdu_action(local, sdata, params); 400 wlan0: moving STA 02:00:00:00:03:00 to state 3 wlan0: associated wlan0: deauthenticating from 02:00:00:00:03:00 by local choice (Reason: 3=DEAUTH_LEAVING) wlan3.sta1: Open BA session requested for 02:00:00:00:00:00 tid 0 wlan3.sta1: dropped frame to 02:00:00:00:00:00 (unauthorized port) wlan0: moving STA 02:00:00:00:03:00 to state 2 wlan0: moving STA 02:00:00:00:03:00 to state 1 wlan0: Removed STA 02:00:00:00:03:00 wlan0: Destroyed STA 02:00:00:00:03:00 BUG: unable to handle page fault for address: fffffffffffffb48 PGD 11814067 P4D 11814067 PUD 11816067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 2 PID: 133397 Comm: kworker/u16:1 Tainted: G W 6.1.0-rc8-wt+ #59 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 Workqueue: phy3 ieee80211_ba_session_work [mac80211] RIP: 0010:drv_ampdu_action+0x46/0x280 [mac80211] Code: 53 48 89 f3 be 89 01 00 00 e8 d6 43 bf ef e8 21 46 81 f0 83 bb a0 1b 00 00 04 75 0e 48 8b 9b 28 0d 00 00 48 81 eb 10 0e 00 00 <8b> 93 58 09 00 00 f6 c2 20 0f 84 3b 01 00 00 8b 05 dd 1c 0f 00 85 RSP: 0018:ffffc900025ebd20 EFLAGS: 00010287 RAX: 0000000000000000 RBX: fffffffffffff1f0 RCX: ffff888102228240 RDX: 0000000080000000 RSI: ffffffff918c5de0 RDI: ffff888102228b40 RBP: ffffc900025ebd40 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000000 R12: ffff888118c18ec0 R13: 0000000000000000 R14: ffffc900025ebd60 R15: ffff888018b7efb8 FS: 0000000000000000(0000) GS:ffff88817a600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: fffffffffffffb48 CR3: 0000000105228006 CR4: 0000000000170ee0 Call Trace: <TASK> ieee80211_tx_ba_session_handle_start+0xd0/0x190 [mac80211] ieee80211_ba_session_work+0xff/0x2e0 [mac80211] process_one_work+0x29f/0x620 worker_thread+0x4d/0x3d0 ? process_one_work+0x620/0x620 kthread+0xfb/0x120 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 </TASK> Signed-off-by:
Alexander Wetzel <alexander@wetzel-home.de> Link: https://lore.kernel.org/r/20221230121850.218810-2-alexander@wetzel-home.de Cc: stable@vger.kernel.org Signed-off-by:
Johannes Berg <johannes.berg@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Arend van Spriel authored
commit ed05cb17 upstream. A sanity check was introduced considering maximum flowrings above 256 as insane and effectively aborting the device probe. This resulted in regression for number of users as the value turns out to be sane after all. Fixes: 2aca4f37 ("brcmfmac: return error when getting invalid max_flowrings from dongle") Reported-by:
chainofflowers <chainofflowers@posteo.net> Link: https://lore.kernel.org/all/4781984.GXAFRqVoOG@luna/ Reported-by:
Christian Marillat <marillat@debian.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216894 Cc: stable@vger.kernel.org Signed-off-by:
Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by:
Kalle Valo <kvalo@kernel.org> Link: https://lore.kernel.org/r/20230111112419.24185-1-arend.vanspriel@broadcom.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jaegeuk Kim authored
[ Upstream commit df9d44b6 ] This patch avoids the below panic. pc : __lookup_extent_tree+0xd8/0x760 lr : f2fs_do_write_data_page+0x104/0x87c sp : ffffffc010cbb3c0 x29: ffffffc010cbb3e0 x28: 0000000000000000 x27: ffffff8803e7f020 x26: ffffff8803e7ed40 x25: ffffff8803e7f020 x24: ffffffc010cbb460 x23: ffffffc010cbb480 x22: 0000000000000000 x21: 0000000000000000 x20: ffffffff22e90900 x19: 0000000000000000 x18: ffffffc010c5d080 x17: 0000000000000000 x16: 0000000000000020 x15: ffffffdb1acdbb88 x14: ffffff888759e2b0 x13: 0000000000000000 x12: ffffff802da49000 x11: 000000000a001200 x10: ffffff8803e7ed40 x9 : ffffff8023195800 x8 : ffffff802da49078 x7 : 0000000000000001 x6 : 0000000000000000 x5 : 0000000000000006 x4 : ffffffc010cbba28 x3 : 0000000000000000 x2 : ffffffc010cbb480 x1 : 0000000000000000 x0 : ffffff8803e7ed40 Call trace: __lookup_extent_tree+0xd8/0x760 f2fs_do_write_data_page+0x104/0x87c f2fs_write_single_data_page+0x420/0xb60 f2fs_write_cache_pages+0x418/0xb1c __f2fs_write_data_pages+0x428/0x58c f2fs_write_data_pages+0x30/0x40 do_writepages+0x88/0x190 __writeback_single_inode+0x48/0x448 writeback_sb_inodes+0x468/0x9e8 __writeback_inodes_wb+0xb8/0x2a4 wb_writeback+0x33c/0x740 wb_do_writeback+0x2b4/0x400 wb_workfn+0xe4/0x34c process_one_work+0x24c/0x5bc worker_thread+0x3e8/0xa50 kthread+0x150/0x1b4 Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Mikulas Patocka authored
[ Upstream commit 55d23536 ] Fix a warning: "found `movsd'; assuming `movsl' was meant" Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Qu Wenruo authored
[ Upstream commit 39f501d6 ] Currently we have a btrfs_debug() for run_one_delayed_ref() failure, but if end users hit such problem, there will be no chance that btrfs_debug() is enabled. This can lead to very little useful info for debugging. This patch will: - Add extra info for error reporting Including: * logical bytenr * num_bytes * type * action * ref_mod - Replace the btrfs_debug() with btrfs_err() - Move the error reporting into run_one_delayed_ref() This is to avoid use-after-free, the @node can be freed in the caller. This error should only be triggered at most once. As if run_one_delayed_ref() failed, we trigger the error message, then causing the call chain to error out: btrfs_run_delayed_refs() `- btrfs_run_delayed_refs() `- btrfs_run_delayed_refs_for_head() `- run_one_delayed_ref() And we will abort the current transaction in btrfs_run_delayed_refs(). If we have to run delayed refs for the abort transaction, run_one_delayed_ref() will just cleanup the refs and do nothing, thus no new error messages would be output. Reviewed-by:
Anand Jain <anand.jain@oracle.com> Signed-off-by:
Qu Wenruo <wqu@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jiri Slaby (SUSE) authored
[ Upstream commit 56c5dab2 ] Since gcc13, each member of an enum has the same type as the enum [1]. And that is inherited from its members. Provided these two: SRP_TAG_NO_REQ = ~0U, SRP_TAG_TSK_MGMT = 1U << 31 all other members are unsigned ints. Esp. with SRP_MAX_SGE and SRP_TSK_MGMT_SQ_SIZE and their use in min(), this results in the following warnings: include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast drivers/infiniband/ulp/srp/ib_srp.c:563:42: note: in expansion of macro 'min' include/linux/minmax.h:20:35: error: comparison of distinct pointer types lacks a cast drivers/infiniband/ulp/srp/ib_srp.c:2369:27: note: in expansion of macro 'min' So move the large values away to a separate enum, so that they don't affect other members. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=36113 Link: https://lore.kernel.org/r/20221212120411.13750-1-jirislaby@kernel.org Signed-off-by:
Jiri Slaby (SUSE) <jirislaby@kernel.org> Reviewed-by:
Bart Van Assche <bvanassche@acm.org> Signed-off-by:
Leon Romanovsky <leon@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Daniil Tatianin authored
[ Upstream commit 9deb1e9f ] It's not very useful to copy back an empty ethtool_stats struct and return 0 if we didn't actually have any stats. This also allows for further simplification of this function in the future commits. Signed-off-by:
Daniil Tatianin <d-tatianin@yandex-team.ru> Reviewed-by:
Andrew Lunn <andrew@lunn.ch> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ricardo Cañuelo authored
[ Upstream commit c262f75c ] The virtio_device vqs_list spinlocks must be initialized before use to prevent functions that manipulate the device virtualqueues, such as vring_new_virtqueue(), from blocking indefinitely. Signed-off-by:
Ricardo Cañuelo <ricardo.canuelo@collabora.com> Message-Id: <20221012062949.1526176-1-ricardo.canuelo@collabora.com> Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Reviewed-by:
Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Hao Sun authored
[ Upstream commit cedebd74 ] Verify that nullness information is not porpagated in the branches of register to register JEQ and JNE operations if one of them is PTR_TO_BTF_ID. Implement this in C level so we can use CO-RE. Signed-off-by:
Hao Sun <sunhao.th@gmail.com> Suggested-by:
Martin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221222024414.29539-2-sunhao.th@gmail.com Signed-off-by:
Martin KaFai Lau <martin.lau@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Olga Kornievskaia authored
[ Upstream commit a6b9d2fa ] When there is a single DS no striping constraints need to be placed on the IO. When such constraint is applied then buffered reads don't coalesce to the DS's rsize. Signed-off-by:
Olga Kornievskaia <kolga@netapp.com> Signed-off-by:
Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Naohiro Aota authored
[ Upstream commit 0a3212de ] Fix a typo of printing FLUSH_DELAYED_REFS event in flush_space() as FLUSH_ELAYED_REFS. Reviewed-by:
Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by:
Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by:
David Sterba <dsterba@suse.com> Signed-off-by:
David Sterba <dsterba@suse.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Jan 18, 2023
-
-
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20230116154743.577276578@linuxfoundation.org Tested-by:
Salvatore Bonaccorso <carnil@debian.org> Tested-by:
Shuah Khan <skhan@linuxfoundation.org> Tested-by:
Sudip Mukherjee <sudip.mukherjee@codethink.co.uk> Link: https://lore.kernel.org/r/20230117124526.766388541@linuxfoundation.org Tested-by:
Florian Fainelli <f.fainelli@gmail.com> Tested-by:
Guenter Roeck <linux@roeck-us.net> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ferry Toth authored
commit b659b613 upstream. This reverts commit 8a7b31d5. This patch results in some qemu test failures, specifically xilinx-zynq-a9 machine and zynq-zc702 as well as zynq-zed devicetree files, when trying to boot from USB drive. Link: https://lore.kernel.org/lkml/20221220194334.GA942039@roeck-us.net/ Fixes: 8a7b31d5 ("usb: ulpi: defer ulpi_register on ulpi_read_id timeout") Cc: stable@vger.kernel.org Reported-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Ferry Toth <ftoth@exalondelft.nl> Link: https://lore.kernel.org/r/20221222205302.45761-1-ftoth@exalondelft.nl Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit e6db6f93 upstream. We have two types of task_work based creation, one is using an existing worker to setup a new one (eg when going to sleep and we have no free workers), and the other is allocating a new worker. Only the latter should be freed when we cancel task_work creation for a new worker. Fixes: af82425c ("io_uring/io-wq: free worker if task_work creation is canceled") Reported-by:
<syzbot+d56ec896af3637bdb7e4@syzkaller.appspotmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jens Axboe authored
commit af82425c upstream. If we cancel the task_work, the worker will never come into existance. As this is the last reference to it, ensure that we get it freed appropriately. Cc: stable@vger.kernel.org Reported-by:
진호 <wnwlsgh98@gmail.com> Signed-off-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Rob Clark authored
[ Upstream commit 52531258 ] Userspace can guess the handle value and try to race GEM object creation with handle close, resulting in a use-after-free if we dereference the object after dropping the handle's reference. For that reason, dropping the handle's reference must be done *after* we are done dereferencing the object. Signed-off-by:
Rob Clark <robdclark@chromium.org> Reviewed-by:
Chia-I Wu <olvaffe@gmail.com> Fixes: 62fb7a5e ("virtio-gpu: add 3d/virgl support") Cc: stable@vger.kernel.org Signed-off-by:
Dmitry Osipenko <dmitry.osipenko@collabora.com> Link: https://patchwork.freedesktop.org/patch/msgid/20221216233355.542197-2-robdclark@gmail.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Johan Hovold authored
[ Upstream commit 703c13fe ] In cases where runtime services are not supported or have been disabled, the runtime services workqueue will never have been allocated. Do not try to destroy the workqueue unconditionally in the unlikely event that EFI initialisation fails to avoid dereferencing a NULL pointer. Fixes: 98086df8 ("efi: add missed destroy_workqueue when efisubsys_init fails") Cc: stable@vger.kernel.org Cc: Li Heng <liheng40@huawei.com> Signed-off-by:
Johan Hovold <johan+linaro@kernel.org> Signed-off-by:
Ard Biesheuvel <ardb@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Mark Rutland authored
[ Upstream commit 031af500 ] The inline assembly for arm64's cmpxchg_double*() implementations use a +Q constraint to hazard against other accesses to the memory location being exchanged. However, the pointer passed to the constraint is a pointer to unsigned long, and thus the hazard only applies to the first 8 bytes of the location. GCC can take advantage of this, assuming that other portions of the location are unchanged, leading to a number of potential problems. This is similar to what we fixed back in commit: fee960be ("arm64: xchg: hazard against entire exchange variable") ... but we forgot to adjust cmpxchg_double*() similarly at the same time. The same problem applies, as demonstrated with the following test: | struct big { | u64 lo, hi; | } __aligned(128); | | unsigned long foo(struct big *b) | { | u64 hi_old, hi_new; | | hi_old = b->hi; | cmpxchg_double_local(&b->lo, &b->hi, 0x12, 0x34, 0x56, 0x78); | hi_new = b->hi; | | return hi_old ^ hi_new; | } ... which GCC 12.1.0 compiles as: | 0000000000000000 <foo>: | 0: d503233f paciasp | 4: aa0003e4 mov x4, x0 | 8: 1400000e b 40 <foo+0x40> | c: d2800240 mov x0, #0x12 // #18 | 10: d2800681 mov x1, #0x34 // #52 | 14: aa0003e5 mov x5, x0 | 18: aa0103e6 mov x6, x1 | 1c: d2800ac2 mov x2, #0x56 // #86 | 20: d2800f03 mov x3, #0x78 // #120 | 24: 48207c82 casp x0, x1, x2, x3, [x4] | 28: ca050000 eor x0, x0, x5 | 2c: ca060021 eor x1, x1, x6 | 30: aa010000 orr x0, x0, x1 | 34: d2800000 mov x0, #0x0 // #0 <--- BANG | 38: d50323bf autiasp | 3c: d65f03c0 ret | 40: d2800240 mov x0, #0x12 // #18 | 44: d2800681 mov x1, #0x34 // #52 | 48: d2800ac2 mov x2, #0x56 // #86 | 4c: d2800f03 mov x3, #0x78 // #120 | 50: f9800091 prfm pstl1strm, [x4] | 54: c87f1885 ldxp x5, x6, [x4] | 58: ca0000a5 eor x5, x5, x0 | 5c: ca0100c6 eor x6, x6, x1 | 60: aa0600a6 orr x6, x5, x6 | 64: b5000066 cbnz x6, 70 <foo+0x70> | 68: c8250c82 stxp w5, x2, x3, [x4] | 6c: 35ffff45 cbnz w5, 54 <foo+0x54> | 70: d2800000 mov x0, #0x0 // #0 <--- BANG | 74: d50323bf autiasp | 78: d65f03c0 ret Notice that at the lines with "BANG" comments, GCC has assumed that the higher 8 bytes are unchanged by the cmpxchg_double() call, and that `hi_old ^ hi_new` can be reduced to a constant zero, for both LSE and LL/SC versions of cmpxchg_double(). This patch fixes the issue by passing a pointer to __uint128_t into the +Q constraint, ensuring that the compiler hazards against the entire 16 bytes being modified. With this change, GCC 12.1.0 compiles the above test as: | 0000000000000000 <foo>: | 0: f9400407 ldr x7, [x0, #8] | 4: d503233f paciasp | 8: aa0003e4 mov x4, x0 | c: 1400000f b 48 <foo+0x48> | 10: d2800240 mov x0, #0x12 // #18 | 14: d2800681 mov x1, #0x34 // #52 | 18: aa0003e5 mov x5, x0 | 1c: aa0103e6 mov x6, x1 | 20: d2800ac2 mov x2, #0x56 // #86 | 24: d2800f03 mov x3, #0x78 // #120 | 28: 48207c82 casp x0, x1, x2, x3, [x4] | 2c: ca050000 eor x0, x0, x5 | 30: ca060021 eor x1, x1, x6 | 34: aa010000 orr x0, x0, x1 | 38: f9400480 ldr x0, [x4, #8] | 3c: d50323bf autiasp | 40: ca0000e0 eor x0, x7, x0 | 44: d65f03c0 ret | 48: d2800240 mov x0, #0x12 // #18 | 4c: d2800681 mov x1, #0x34 // #52 | 50: d2800ac2 mov x2, #0x56 // #86 | 54: d2800f03 mov x3, #0x78 // #120 | 58: f9800091 prfm pstl1strm, [x4] | 5c: c87f1885 ldxp x5, x6, [x4] | 60: ca0000a5 eor x5, x5, x0 | 64: ca0100c6 eor x6, x6, x1 | 68: aa0600a6 orr x6, x5, x6 | 6c: b5000066 cbnz x6, 78 <foo+0x78> | 70: c8250c82 stxp w5, x2, x3, [x4] | 74: 35ffff45 cbnz w5, 5c <foo+0x5c> | 78: f9400480 ldr x0, [x4, #8] | 7c: d50323bf autiasp | 80: ca0000e0 eor x0, x7, x0 | 84: d65f03c0 ret ... sampling the high 8 bytes before and after the cmpxchg, and performing an EOR, as we'd expect. For backporting, I've tested this atop linux-4.9.y with GCC 5.5.0. Note that linux-4.9.y is oldest currently supported stable release, and mandates GCC 5.1+. Unfortunately I couldn't get a GCC 5.1 binary to run on my machines due to library incompatibilities. I've also used a standalone test to check that we can use a __uint128_t pointer in a +Q constraint at least as far back as GCC 4.8.5 and LLVM 3.9.1. Fixes: 5284e1b4 ("arm64: xchg: Implement cmpxchg_double") Fixes: e9a4b795 ("arm64: cmpxchg_dbl: patch in lse instructions when supported by the CPU") Reported-by:
Boqun Feng <boqun.feng@gmail.com> Link: https://lore.kernel.org/lkml/Y6DEfQXymYVgL3oJ@boqun-archlinux/ Reported-by:
Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Y6GXoO4qmH9OIZ5Q@hirez.programming.kicks-ass.net/ Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: stable@vger.kernel.org Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20230104151626.3262137-1-mark.rutland@arm.com Signed-off-by:
Will Deacon <will@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Mark Rutland authored
[ Upstream commit b2c3ccbd ] When CONFIG_ARM64_LSE_ATOMICS=y, each use of an LL/SC atomic results in a fragment of code being generated in a subsection without a clear association with its caller. A trampoline in the caller branches to the LL/SC atomic with with a direct branch, and the atomic directly branches back into its trampoline. This breaks backtracing, as any PC within the out-of-line fragment will be symbolized as an offset from the nearest prior symbol (which may not be the function using the atomic), and since the atomic returns with a direct branch, the caller's PC may be missing from the backtrace. For example, with secondary_start_kernel() hacked to contain atomic_inc(NULL), the resulting exception can be reported as being taken from cpus_are_stuck_in_kernel(): | Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 | Mem abort info: | ESR = 0x0000000096000004 | EC = 0x25: DABT (current EL), IL = 32 bits | SET = 0, FnV = 0 | EA = 0, S1PTW = 0 | FSC = 0x04: level 0 translation fault | Data abort info: | ISV = 0, ISS = 0x00000004 | CM = 0, WnR = 0 | [0000000000000000] user address but active_mm is swapper | Internal error: Oops: 96000004 [#1] PREEMPT SMP | Modules linked in: | CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.19.0-11219-geb555cb5b794-dirty #3 | Hardware name: linux,dummy-virt (DT) | pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : cpus_are_stuck_in_kernel+0xa4/0x120 | lr : secondary_start_kernel+0x164/0x170 | sp : ffff80000a4cbe90 | x29: ffff80000a4cbe90 x28: 0000000000000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: 0000000000000000 x22: 0000000000000000 x21: 0000000000000000 | x20: 0000000000000001 x19: 0000000000000001 x18: 0000000000000008 | x17: 3030383832343030 x16: 3030303030307830 x15: ffff80000a4cbab0 | x14: 0000000000000001 x13: 5d31666130663133 x12: 3478305b20313030 | x11: 3030303030303078 x10: 3020726f73736563 x9 : 726f737365636f72 | x8 : ffff800009ff2ef0 x7 : 0000000000000003 x6 : 0000000000000000 | x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000100 | x2 : 0000000000000000 x1 : ffff0000029bd880 x0 : 0000000000000000 | Call trace: | cpus_are_stuck_in_kernel+0xa4/0x120 | __secondary_switched+0xb0/0xb4 | Code: 35ffffa3 17fffc6c d53cd040 f9800011 (885f7c01) | ---[ end trace 0000000000000000 ]--- This is confusing and hinders debugging, and will be problematic for CONFIG_LIVEPATCH as these cases cannot be unwound reliably. This is very similar to recent issues with out-of-line exception fixups, which were removed in commits: 35d67794 ("arm64: lib: __arch_clear_user(): fold fixups into body") 4012e0e2 ("arm64: lib: __arch_copy_from_user(): fold fixups into body") 139f9ab7 ("arm64: lib: __arch_copy_to_user(): fold fixups into body") When the trampolines were introduced in commit: addfc386 ("arm64: atomics: avoid out-of-line ll/sc atomics") The rationale was to improve icache performance by grouping the LL/SC atomics together. This has never been measured, and this theoretical benefit is outweighed by other factors: * As the subsections are collapsed into sections at object file granularity, these are spread out throughout the kernel and can share cachelines with unrelated code regardless. * GCC 12.1.0 has been observed to place the trampoline out-of-line in specialised __ll_sc_*() functions, introducing more branching than was intended. * Removing the trampolines has been observed to shrink a defconfig kernel Image by 64KiB when building with GCC 12.1.0. This patch removes the LL/SC trampolines, meaning that the LL/SC atomics will be inlined into their callers (or placed in out-of line functions using regular BL/RET pairs). When CONFIG_ARM64_LSE_ATOMICS=y, the LL/SC atomics are always called in an unlikely branch, and will be placed in a cold portion of the function, so this should have minimal impact to the hot paths. Other than the improved backtracing, there should be no functional change as a result of this patch. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Link: https://lore.kernel.org/r/20220817155914.3975112-2-mark.rutland@arm.com Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Stable-dep-of: 031af500 ("arm64: cmpxchg_double*: hazard against entire exchange variable") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Mark Rutland authored
[ Upstream commit 8e6082e9 ] The code for the atomic ops is formatted inconsistently, and while this is not a functional problem it is rather distracting when working on them. Some have ops have consistent indentation, e.g. | #define ATOMIC_OP_ADD_RETURN(name, mb, cl...) \ | static inline int __lse_atomic_add_return##name(int i, atomic_t *v) \ | { \ | u32 tmp; \ | \ | asm volatile( \ | __LSE_PREAMBLE \ | " ldadd" #mb " %w[i], %w[tmp], %[v]\n" \ | " add %w[i], %w[i], %w[tmp]" \ | : [i] "+r" (i), [v] "+Q" (v->counter), [tmp] "=&r" (tmp) \ | : "r" (v) \ | : cl); \ | \ | return i; \ | } While others have negative indentation for some lines, and/or have misaligned trailing backslashes, e.g. | static inline void __lse_atomic_##op(int i, atomic_t *v) \ | { \ | asm volatile( \ | __LSE_PREAMBLE \ | " " #asm_op " %w[i], %[v]\n" \ | : [i] "+r" (i), [v] "+Q" (v->counter) \ | : "r" (v)); \ | } This patch makes the indentation consistent and also aligns the trailing backslashes. This makes the code easier to read for those (like myself) who are easily distracted by these inconsistencies. This is intended as a cleanup. There should be no functional change as a result of this patch. Signed-off-by:
Mark Rutland <mark.rutland@arm.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Acked-by:
Will Deacon <will@kernel.org> Acked-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20211210151410.2782645-2-mark.rutland@arm.com Signed-off-by:
Catalin Marinas <catalin.marinas@arm.com> Stable-dep-of: 031af500 ("arm64: cmpxchg_double*: hazard against entire exchange variable") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Peter Newman authored
[ Upstream commit fe1f0714 ] When the user moves a running task to a new rdtgroup using the task's file interface or by deleting its rdtgroup, the resulting change in CLOSID/RMID must be immediately propagated to the PQR_ASSOC MSR on the task(s) CPUs. x86 allows reordering loads with prior stores, so if the task starts running between a task_curr() check that the CPU hoisted before the stores in the CLOSID/RMID update then it can start running with the old CLOSID/RMID until it is switched again because __rdtgroup_move_task() failed to determine that it needs to be interrupted to obtain the new CLOSID/RMID. Refer to the diagram below: CPU 0 CPU 1 ----- ----- __rdtgroup_move_task(): curr <- t1->cpu->rq->curr __schedule(): rq->curr <- t1 resctrl_sched_in(): t1->{closid,rmid} -> {1,1} t1->{closid,rmid} <- {2,2} if (curr == t1) // false IPI(t1->cpu) A similar race impacts rdt_move_group_tasks(), which updates tasks in a deleted rdtgroup. In both cases, use smp_mb() to order the task_struct::{closid,rmid} stores before the loads in task_curr(). In particular, in the rdt_move_group_tasks() case, simply execute an smp_mb() on every iteration with a matching task. It is possible to use a single smp_mb() in rdt_move_group_tasks(), but this would require two passes and a means of remembering which task_structs were updated in the first loop. However, benchmarking results below showed too little performance impact in the simple approach to justify implementing the two-pass approach. Times below were collected using `perf stat` to measure the time to remove a group containing a 1600-task, parallel workload. CPU: Intel(R) Xeon(R) Platinum P-8136 CPU @ 2.00GHz (112 threads) # mkdir /sys/fs/resctrl/test # echo $$ > /sys/fs/resctrl/test/tasks # perf bench sched messaging -g 40 -l 100000 task-clock time ranges collected using: # perf stat rmdir /sys/fs/resctrl/test Baseline: 1.54 - 1.60 ms smp_mb() every matching task: 1.57 - 1.67 ms [ bp: Massage commit message. ] Fixes: ae28d1aa ("x86/resctrl: Use an IPI instead of task_work_add() to update PQR_ASSOC MSR") Fixes: 0efc89be ("x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount") Signed-off-by:
Peter Newman <peternewman@google.com> Signed-off-by:
Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by:
Reinette Chatre <reinette.chatre@intel.com> Reviewed-by:
Babu Moger <babu.moger@amd.com> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20221220161123.432120-1-peternewman@google.com Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Reinette Chatre authored
[ Upstream commit e0ad6dc8 ] James reported in [1] that there could be two tasks running on the same CPU with task_struct->on_cpu set. Using task_struct->on_cpu as a test if a task is running on a CPU may thus match the old task for a CPU while the scheduler is running and IPI it unnecessarily. task_curr() is the correct helper to use. While doing so move the #ifdef check of the CONFIG_SMP symbol to be a C conditional used to determine if this helper should be used to ensure the code is always checked for correctness by the compiler. [1] https://lore.kernel.org/lkml/a782d2f3-d2f6-795f-f4b1-9462205fd581@arm.com Reported-by:
James Morse <james.morse@arm.com> Signed-off-by:
Reinette Chatre <reinette.chatre@intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/e9e68ce1441a73401e08b641cc3b9a3cf13fe6d4.1608243147.git.reinette.chatre@intel.com Stable-dep-of: fe1f0714 ("x86/resctrl: Fix task CLOSID/RMID update race") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Paolo Bonzini authored
[ Upstream commit 45e966fc ] Passing the host topology to the guest is almost certainly wrong and will confuse the scheduler. In addition, several fields of these CPUID leaves vary on each processor; it is simply impossible to return the right values from KVM_GET_SUPPORTED_CPUID in such a way that they can be passed to KVM_SET_CPUID2. The values that will most likely prevent confusion are all zeroes. Userspace will have to override it anyway if it wishes to present a specific topology to the guest. Cc: stable@vger.kernel.org Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Paolo Bonzini authored
[ Upstream commit cde363ab ] Add a section to document all the different ways in which the KVM API sucks. I am sure there are way more, give people a place to vent so that userspace authors are aware. Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20220322110712.222449-4-pbonzini@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Christophe JAILLET authored
[ Upstream commit 142e821f ] A clk, prepared and enabled in mtk_iommu_v1_hw_init(), is not released in the error handling path of mtk_iommu_v1_probe(). Add the corresponding clk_disable_unprepare(), as already done in the remove function. Fixes: b17336c5 ("iommu/mediatek: add support for mtk iommu generation one HW") Signed-off-by:
Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by:
Yong Wu <yong.wu@mediatek.com> Reviewed-by:
AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Reviewed-by:
Matthias Brugger <matthias.bgg@gmail.com> Link: https://lore.kernel.org/r/593e7b7d97c6e064b29716b091a9d4fd122241fb.1671473163.git.christophe.jaillet@wanadoo.fr Signed-off-by:
Joerg Roedel <jroedel@suse.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Yong Wu authored
[ Upstream commit ac304c07 ] In the original code, we lack the error handle. This patch adds them. Signed-off-by:
Yong Wu <yong.wu@mediatek.com> Link: https://lore.kernel.org/r/20210412064843.11614-2-yong.wu@mediatek.com Signed-off-by:
Joerg Roedel <jroedel@suse.de> Stable-dep-of: 142e821f ("iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-