- Oct 15, 2022
-
-
Lukas Straub authored
[ Upstream commit 61670b4d ] Like in f4f03f29 "um: Cleanup syscall_handler_t definition/cast, fix warning", remove the cast to to fix the compiler warning. Signed-off-by:
Lukas Straub <lukasstraub2@web.de> Acked-by: Randy Dunlap <rdunlap@infradead.org> # build-tested Signed-off-by:
Richard Weinberger <richard@nod.at> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Haimin Zhang authored
[ Upstream commit 94160108 ] There is uninit value bug in dgram_sendmsg function in net/ieee802154/socket.c when the length of valid data pointed by the msg->msg_name isn't verified. We introducing a helper function ieee802154_sockaddr_check_size to check namelen. First we check there is addr_type in ieee802154_addr_sa. Then, we check namelen according to addr_type. Also fixed in raw_bind, dgram_bind, dgram_connect. Signed-off-by:
Haimin Zhang <tcs_kernel@tencent.com> Signed-off-by:
David S. Miller <davem@davemloft.net> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Letu Ren authored
[ Upstream commit fbfe9686 ] In __qedf_probe(), if qedf->cdev is NULL which means qed_ops->common->probe() failed, then the program will goto label err1, and scsi_host_put() will free lport->host pointer. Because the memory qedf points to is allocated by libfc_host_alloc(), it will be freed by scsi_host_put(). However, the if statement below label err0 only checks whether qedf is NULL but doesn't check whether the memory has been freed. So a UAF bug can occur. There are two ways to reach the statements below err0. The first one is described as before, "qedf" should be set to NULL. The second one is goto "err0" directly. In the latter scenario qedf hasn't been changed and it has the initial value NULL. As a result the if statement is not reachable in any situation. The KASAN logs are as follows: [ 2.312969] BUG: KASAN: use-after-free in __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] [ 2.312969] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 [ 2.312969] Call Trace: [ 2.312969] dump_stack_lvl+0x59/0x7b [ 2.312969] print_address_description+0x7c/0x3b0 [ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] __kasan_report+0x160/0x1c0 [ 2.312969] ? __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] kasan_report+0x4b/0x70 [ 2.312969] ? kobject_put+0x25d/0x290 [ 2.312969] kasan_check_range+0x2ca/0x310 [ 2.312969] __qedf_probe+0x5dcf/0x6bc0 [ 2.312969] ? selinux_kernfs_init_security+0xdc/0x5f0 [ 2.312969] ? trace_rpm_return_int_rcuidle+0x18/0x120 [ 2.312969] ? rpm_resume+0xa5c/0x16e0 [ 2.312969] ? qedf_get_generic_tlv_data+0x160/0x160 [ 2.312969] local_pci_probe+0x13c/0x1f0 [ 2.312969] pci_device_probe+0x37e/0x6c0 Link: https://lore.kernel.org/r/20211112120641.16073-1-fantasquex@gmail.com Reported-by:
Zheyu Ma <zheyuma97@gmail.com> Acked-by:
Saurav Kashyap <skashyap@marvell.com> Co-developed-by:
Wende Tan <twd2.me@gmail.com> Signed-off-by:
Wende Tan <twd2.me@gmail.com> Signed-off-by:
Letu Ren <fantasquex@gmail.com> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Sergei Antonov authored
[ Upstream commit 02181e68 ] Driver moxart-mmc.c has .compatible = "moxa,moxart-mmc". But moxart .dts/.dtsi and the documentation file moxa,moxart-dma.txt contain compatible = "moxa,moxart-sdhci". Change moxart .dts/.dtsi files and moxa,moxart-dma.txt to match the driver. Replace 'sdhci' with 'mmc' in names too, since SDHCI is a different controller from FTSDC010. Suggested-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Sergei Antonov <saproj@gmail.com> Cc: Jonas Jensen <jonas.jensen@gmail.com> Link: https://lore.kernel.org/r/20220907175341.1477383-1-saproj@gmail.com ' Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Swati Agarwal authored
[ Upstream commit 8f2b6bc7 ] The driver does not handle the failure case while calling dma_set_mask_and_coherent API. In case of failure, capture the return value of API and then report an error. Addresses-coverity: Unchecked return value (CHECKED_RETURN) Signed-off-by:
Swati Agarwal <swati.agarwal@xilinx.com> Reviewed-by:
Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Link: https://lore.kernel.org/r/20220817061125.4720-4-swati.agarwal@xilinx.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Swati Agarwal authored
[ Upstream commit 462bce79 ] Free the allocated resources for missing xlnx,num-fstores property. Signed-off-by:
Swati Agarwal <swati.agarwal@xilinx.com> Link: https://lore.kernel.org/r/20220817061125.4720-3-swati.agarwal@xilinx.com Signed-off-by:
Vinod Koul <vkoul@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Cristian Marussi authored
[ Upstream commit dea796fc ] Currently, when removing the SCMI PM driver not all the resources registered with genpd subsystem are properly de-registered. As a side effect of this after a driver unload/load cycle you get a splat with a few warnings like this: | debugfs: Directory 'BIG_CPU0' with parent 'pm_genpd' already present! | debugfs: Directory 'BIG_CPU1' with parent 'pm_genpd' already present! | debugfs: Directory 'LITTLE_CPU0' with parent 'pm_genpd' already present! | debugfs: Directory 'LITTLE_CPU1' with parent 'pm_genpd' already present! | debugfs: Directory 'LITTLE_CPU2' with parent 'pm_genpd' already present! | debugfs: Directory 'LITTLE_CPU3' with parent 'pm_genpd' already present! | debugfs: Directory 'BIG_SSTOP' with parent 'pm_genpd' already present! | debugfs: Directory 'LITTLE_SSTOP' with parent 'pm_genpd' already present! | debugfs: Directory 'DBGSYS' with parent 'pm_genpd' already present! | debugfs: Directory 'GPUTOP' with parent 'pm_genpd' already present! Add a proper scmi_pm_domain_remove callback to the driver in order to take care of all the needed cleanups not handled by devres framework. Link: https://lore.kernel.org/r/20220817172731.1185305-7-cristian.marussi@arm.com Signed-off-by:
Cristian Marussi <cristian.marussi@arm.com> Signed-off-by:
Sudeep Holla <sudeep.holla@arm.com> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Dongliang Mu authored
commit 2e488f13 upstream. In alloc_inode, inode_init_always() could return -ENOMEM if security_inode_alloc() fails, which causes inode->i_private uninitialized. Then nilfs_is_metadata_file_inode() returns true and nilfs_free_inode() wrongly calls nilfs_mdt_destroy(), which frees the uninitialized inode->i_private and leads to crashes(e.g., UAF/GPF). Fix this by moving security_inode_alloc just prior to this_cpu_inc(nr_inodes) Link: https://lkml.kernel.org/r/CAFcO6XOcf1Jj2SeGt=jJV59wmhESeSKpfR0omdFRq+J9nD1vfQ@mail.gmail.com Reported-by:
butt3rflyh4ck <butterflyhuangxx@gmail.com> Reported-by:
Hao Sun <sunhao.th@gmail.com> Reported-by:
Jiacheng Xu <stitch@zju.edu.cn> Reviewed-by:
Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by:
Dongliang Mu <mudongliangabcd@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexey Dobriyan authored
commit 128dbd78 upstream. strdup() prototype doesn't live in stdlib.h . Add limits.h for PATH_MAX definition as well. This fixes the build on Android. Signed-off-by:
Alexey Dobriyan (SK hynix) <adobriyan@gmail.com> Acked-by:
Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/YRukaQbrgDWhiwGr@localhost.localdomain Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Steven Price authored
commit 8782fb61 upstream. The mmap lock protects the page walker from changes to the page tables during the walk. However a read lock is insufficient to protect those areas which don't have a VMA as munmap() detaches the VMAs before downgrading to a read lock and actually tearing down PTEs/page tables. For users of walk_page_range() the solution is to simply call pte_hole() immediately without checking the actual page tables when a VMA is not present. We now never call __walk_page_range() without a valid vma. For walk_page_range_novma() the locking requirements are tightened to require the mmap write lock to be taken, and then walking the pgd directly with 'no_vma' set. This in turn means that all page walkers either have a valid vma, or it's that special 'novma' case for page table debugging. As a result, all the odd '(!walk->vma && !walk->no_vma)' tests can be removed. Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap") Reported-by:
Jann Horn <jannh@google.com> Signed-off-by:
Steven Price <steven.price@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> [manually backported. backport note: walk_page_range_novma() does not exist in 5.4, so I'm omitting it from the backport] Signed-off-by:
Jann Horn <jannh@google.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Oct 07, 2022
-
-
Greg Kroah-Hartman authored
Link: https://lore.kernel.org/r/20221005113210.255710920@linuxfoundation.org Reviewed-by:
Guenter Roeck <linux@roeck-us.net> Tested-by:
Jon Hunter <jonathanh@nvidia.com> Tested-by:
Linux Kernel Functional Testing <lkft@linaro.org> Tested-by:
Slade Watkins <srw@sladewatkins.net> Tested-by:
Allen Pais <apais@linux.microsoft.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Shuah Khan authored
commit 8bfdfa0d upstream. Update mediator information in the CoC interpretation document. Signed-off-by:
Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/20220901212319.56644-1-skhan@linuxfoundation.org Cc: stable@vger.kernel.org Signed-off-by:
Jonathan Corbet <corbet@lwn.net> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Sami Tolvanen authored
commit 21206351 upstream. We enable -Wcast-function-type globally in the kernel to warn about mismatching types in function pointer casts. Compilers currently warn only about ABI incompability with this flag, but Clang 16 will enable a stricter version of the check by default that checks for an exact type match. This will be very noisy in the kernel, so disable -Wcast-function-type-strict without W=1 until the new warnings have been addressed. Cc: stable@vger.kernel.org Link: https://reviews.llvm.org/D134831 Link: https://github.com/ClangBuiltLinux/linux/issues/1724 Suggested-by:
Nathan Chancellor <nathan@kernel.org> Signed-off-by:
Sami Tolvanen <samitolvanen@google.com> Signed-off-by:
Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220930203310.4010564-1-samitolvanen@google.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Greg Kroah-Hartman authored
This reverts commit c89849ec which is commit 66f99628 upstream. It is reported to cause problems on 5.4.y so it should be reverted for now. Reported-by:
Shuah Khan <skhan@linuxfoundation.org> Link: https://lore.kernel.org/r/7af02bc3-c0f2-7326-e467-02549e88c9ce@linuxfoundation.org Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Sasha Levin <sashal@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
YueHaibing authored
commit b3531f5f upstream. fs/xfs/xfs_inode.c: In function 'xfs_itruncate_extents_flags': fs/xfs/xfs_inode.c:1523:8: warning: unused variable 'done' [-Wunused-variable] commit 4bbb04ab ("xfs: truncate should remove all blocks, not just to the end of the page cache") left behind this, so remove it. Fixes: 4bbb04ab ("xfs: truncate should remove all blocks, not just to the end of the page cache") Reported-by:
Hulk Robot <hulkci@huawei.com> Reported-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
YueHaibing <yuehaibing@huawei.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit 54027a49 upstream. Dan Carpenter pointed out that error is uninitialized. While there never should be an attr leaf block with zero entries, let's not leave that logic bomb there. Fixes: 0bb9d159 ("xfs: streamline xfs_attr3_leaf_inactive") Reported-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Allison Collins <allison.henderson@oracle.com> Reviewed-by:
Eric Sandeen <sandeen@redhat.com> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit 0bb9d159 upstream. Now that we know we don't have to take a transaction to stale the incore buffers for a remote value, get rid of the unnecessary memory allocation in the leaf walker and call the rmt_stale function directly. Flatten the loop while we're at it. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit a39f089a upstream. Move the abstract in-memory version of various btree block headers out of xfs_da_format.h as they aren't on-disk formats. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit e8db2aaf upstream. [Replaced XFS_IS_CORRUPT() calls with ASSERT() for 5.4.y backport] While running generic/103, I observed what looks like memory corruption and (with slub debugging turned on) a slub redzone warning on i386 when inactivating an inode with a 64k remote attr value. On a v5 filesystem, maximally sized remote attr values require one block more than 64k worth of space to hold both the remote attribute value header (64 bytes). On a 4k block filesystem this results in a 68k buffer; on a 64k block filesystem, this would be a 128k buffer. Note that even though we'll never use more than 65,600 bytes of this buffer, XFS_MAX_BLOCKSIZE is 64k. This is a problem because the definition of struct xfs_buf_log_format allows for XFS_MAX_BLOCKSIZE worth of dirty bitmap (64k). On i386 when we invalidate a remote attribute, xfs_trans_binval zeroes all 68k worth of the dirty map, writing right off the end of the log item and corrupting memory. We've gotten away with this on x86_64 for years because the compiler inserts a u32 padding on the end of struct xfs_buf_log_format. Fortunately for us, remote attribute values are written to disk with xfs_bwrite(), which is to say that they are not logged. Fix the problem by removing all places where we could end up creating a buffer log item for a remote attribute value and leave a note explaining why. Next, replace the open-coded buffer invalidation with a call to the helper we created in the previous patch that does better checking for bad metadata before marking the buffer stale. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit 8edbb26b upstream. [Replaced XFS_IS_CORRUPT() calls with ASSERT() for 5.4.y backport] Hoist the code that invalidates remote extended attribute value buffers into a separate helper function. This prepares us for a memory corruption fix in the next patch. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 7b53b868 upstream. Direct I/O reads can also be used with RWF_NOWAIT & co. Fix the inode locking in xfs_file_dio_aio_read to take IOCB_NOWAIT into account. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit 932befe3 upstream. I observed a hang in generic/308 while running fstests on a i686 kernel. The hang occurred when trying to purge the pagecache on a large sparse file that had a page created past MAX_LFS_FILESIZE, which caused an integer overflow in the pagecache xarray and resulted in an infinite loop. I then noticed that Linus changed the definition of MAX_LFS_FILESIZE in commit 0cc3b0ec ("Clarify (and fix) MAX_LFS_FILESIZE macros") so that it is now one page short of the maximum page index on 32-bit kernels. Because the XFS function to compute max offset open-codes the 2005-era MAX_LFS_FILESIZE computation and neither the vfs nor mm perform any sanity checking of s_maxbytes, the code in generic/308 can create a page above the pagecache's limit and kaboom. Fix all this by setting s_maxbytes to MAX_LFS_FILESIZE directly and aborting the mount with a warning if our assumptions ever break. I have no answer for why this seems to have been broken for years and nobody noticed. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit 4bbb04ab upstream. xfs_itruncate_extents_flags() is supposed to unmap every block in a file from EOF onwards. Oddly, it uses s_maxbytes as the upper limit to the bunmapi range, even though s_maxbytes reflects the highest offset the pagecache can support, not the highest offset that XFS supports. The result of this confusion is that if you create a 20T file on a 64-bit machine, mount the filesystem on a 32-bit machine, and remove the file, we leak everything above 16T. Fix this by capping the bunmapi request at the maximum possible block offset, not s_maxbytes. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Darrick J. Wong authored
commit a5084865 upstream. Introduce a new #define for the maximum supported file block offset. We'll use this in the next patch to make it more obvious that we're doing some operation for all possible inode fork mappings after a given offset. We can't use ULLONG_MAX here because bunmapi uses that to detect when it's done. Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by:
Christoph Hellwig <hch@lst.de> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Christoph Hellwig authored
commit 780d2905 upstream. XFS_ATTR_INCOMPLETE is a flag in the on-disk attribute format, and thus in a different namespace as the ATTR_* flags in xfs_da_args.flags. Switch to using a XFS_DA_OP_INCOMPLETE flag in op_flags instead. Without this users might be able to inject this flag into operations using the attr by handle ioctl. Signed-off-by:
Christoph Hellwig <hch@lst.de> Reviewed-by:
Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by:
Darrick J. Wong <darrick.wong@oracle.com> Acked-by:
Darrick J. Wong <djwong@kernel.org> Signed-off-by:
Chandan Babu R <chandan.babu@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Daniel Sneddon authored
commit 2b129932 upstream. tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as documented for RET instructions after VM exits. Mitigate it with a new one-entry RSB stuffing mechanism and a new LFENCE. == Background == Indirect Branch Restricted Speculation (IBRS) was designed to help mitigate Branch Target Injection and Speculative Store Bypass, i.e. Spectre, attacks. IBRS prevents software run in less privileged modes from affecting branch prediction in more privileged modes. IBRS requires the MSR to be written on every privilege level change. To overcome some of the performance issues of IBRS, Enhanced IBRS was introduced. eIBRS is an "always on" IBRS, in other words, just turn it on once instead of writing the MSR on every privilege level change. When eIBRS is enabled, more privileged modes should be protected from less privileged modes, including protecting VMMs from guests. == Problem == Here's a simplification of how guests are run on Linux' KVM: void run_kvm_guest(void) { // Prepare to run guest VMRESUME(); // Clean up after guest runs } The execution flow for that would look something like this to the processor: 1. Host-side: call run_kvm_guest() 2. Host-side: VMRESUME 3. Guest runs, does "CALL guest_function" 4. VM exit, host runs again 5. Host might make some "cleanup" function calls 6. Host-side: RET from run_kvm_guest() Now, when back on the host, there are a couple of possible scenarios of post-guest activity the host needs to do before executing host code: * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not touched and Linux has to do a 32-entry stuffing. * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host IBRS=1 shortly after VM exit, has a documented side effect of flushing the RSB except in this PBRSB situation where the software needs to stuff the last RSB entry "by hand". IOW, with eIBRS supported, host RET instructions should no longer be influenced by guest behavior after the host retires a single CALL instruction. However, if the RET instructions are "unbalanced" with CALLs after a VM exit as is the RET in #6, it might speculatively use the address for the instruction after the CALL in #3 as an RSB prediction. This is a problem since the (untrusted) guest controls this address. Balanced CALL/RET instruction pairs such as in step #5 are not affected. == Solution == The PBRSB issue affects a wide variety of Intel processors which support eIBRS. But not all of them need mitigation. Today, X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e., eIBRS systems which enable legacy IBRS explicitly. However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT and most of them need a new mitigation. Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT. The lighter-weight mitigation performs a CALL instruction which is immediately followed by a speculative execution barrier (INT3). This steers speculative execution to the barrier -- just like a retpoline -- which ensures that speculation can never reach an unbalanced RET. Then, ensure this CALL is retired before continuing execution with an LFENCE. In other words, the window of exposure is opened at VM exit where RET behavior is troublesome. While the window is open, force RSB predictions sampling for RET targets to a dead end at the INT3. Close the window with the LFENCE. There is a subset of eIBRS systems which are not vulnerable to PBRSB. Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB. Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO. [ bp: Massage, incorporate review comments from Andy Cooper. ] Signed-off-by:
Daniel Sneddon <daniel.sneddon@linux.intel.com> Co-developed-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> [cascardo: no intra-function validation] Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit eb23b5ef upstream. IBRS mitigation for spectre_v2 forces write to MSR_IA32_SPEC_CTRL at every kernel entry/exit. On Enhanced IBRS parts setting MSR_IA32_SPEC_CTRL[IBRS] only once at boot is sufficient. MSR writes at every kernel entry/exit incur unnecessary performance loss. When Enhanced IBRS feature is present, print a warning about this unnecessary performance loss. Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/2a5eaf54583c2bfe0edc4fea64006656256cca17.1657814857.git.pawan.kumar.gupta@linux.intel.com Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Nathan Chancellor authored
commit db886979 upstream. Clang warns: arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection] DEFINE_PER_CPU(u64, x86_spec_ctrl_current); ^ arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here extern u64 x86_spec_ctrl_current; ^ 1 error generated. The declaration should be using DECLARE_PER_CPU instead so all attributes stay in sync. Cc: stable@vger.kernel.org Fixes: fc02735b ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS") Reported-by:
kernel test robot <lkp@intel.com> Signed-off-by:
Nathan Chancellor <nathan@kernel.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit 4ad3278d upstream. Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> [cascardo: no tools/arch/x86/include/asm/msr-index.h] Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Pawan Gupta authored
commit f54d4537 upstream. Cannon lake is also affected by RETBleed, add it to the list. Fixes: 6ad0ad2b ("x86/bugs: Report Intel retbleed vulnerability") Signed-off-by:
Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrew Cooper authored
commit 26aae8cc upstream. BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC. Hypervisors are expected to synthesise BTC_NO when it is appropriate given the migration pool, to prevent kernels using heuristics. [ bp: Massage. ] Signed-off-by:
Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Zijlstra authored
commit 7a05bc95 upstream. The whole MMIO/RETBLEED enumeration went overboard on steppings. Get rid of all that and simply use ANY. If a future stepping of these models would not be affected, it had better set the relevant ARCH_CAP_$FOO_NO bit in IA32_ARCH_CAPABILITIES. Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Acked-by:
Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Poimboeuf authored
commit 9756bba2 upstream. Prevent RSB underflow/poisoning attacks with RSB. While at it, add a bunch of comments to attempt to document the current state of tribal knowledge about RSB attacks and what exactly is being mitigated. Signed-off-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Poimboeuf authored
commit bea7e31a upstream. For legacy IBRS to work, the IBRS bit needs to be always re-written after vmexit, even if it's already on. Signed-off-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Poimboeuf authored
commit fc02735b upstream. On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thadeu Lima de Souza Cascardo authored
commit bb066506 upstream. Convert __vmx_vcpu_run()'s 'launched' argument to 'flags', in preparation for doing SPEC_CTRL handling immediately after vmexit, which will need another flag. This is much easier than adding a fourth argument, because this code supports both 32-bit and 64-bit, and the fourth argument on 32-bit would have to be pushed on the stack. Note that __vmx_vcpu_run_flags() is called outside of the noinstr critical section because it will soon start calling potentially traceable functions. Signed-off-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Poimboeuf authored
commit 8bd200d2 upstream. Move the vmx_vm{enter,exit}() functionality into __vmx_vcpu_run(). This will make it easier to do the spec_ctrl handling before the first RET. Signed-off-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by:
Borislav Petkov <bp@suse.de> [cascardo: remove ENDBR] Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Ben Hutchings <ben@decadent.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org> [cascardo: no unwinding save/restore] Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Uros Bizjak authored
commit 150f17bf upstream. Replace inline assembly in nested_vmx_check_vmentry_hw with a call to __vmx_vcpu_run. The function is not performance critical, so (double) GPR save/restore in __vmx_vcpu_run can be tolerated, as far as performance effects are concerned. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Reviewed-and-tested-by:
Sean Christopherson <seanjc@google.com> Signed-off-by:
Uros Bizjak <ubizjak@gmail.com> [sean: dropped versioning info from changelog] Signed-off-by:
Sean Christopherson <seanjc@google.com> Message-Id: <20201231002702.2223707-5-seanjc@google.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> [cascardo: small fixups] Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Uros Bizjak authored
commit 6c44221b upstream. Saves one byte in __vmx_vcpu_run for the same functionality. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by:
Uros Bizjak <ubizjak@gmail.com> Message-Id: <20201029140457.126965-1-ubizjak@gmail.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Josh Poimboeuf authored
commit acac5e98 upstream. This mask has been made redundant by kvm_spec_ctrl_test_value(). And it doesn't even work when MSR interception is disabled, as the guest can just write to SPEC_CTRL directly. Signed-off-by:
Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by:
Borislav Petkov <bp@suse.de> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com> Signed-off-by:
Borislav Petkov <bp@suse.de> Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-