Oct 25, 2022

tools headers UAPI: Sync linux/kvm.h with the kernel sources · 3f4c9741

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 7fe718fb
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Sun May 9 09:39:02 2021 -0300

    tools headers UAPI: Sync linux/kvm.h with the kernel sources

    To pick the changes in:

      bfbab445 ("KVM: arm64: Implement PSCI SYSTEM_SUSPEND")
      7b33a09d ("KVM: arm64: Add support for userspace to suspend a vCPU")
      ffbb61d0 ("KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl.")
      661a20fa ("KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND")
      fde0451b ("KVM: x86/xen: Support per-vCPU event channel upcall via local APIC")
      28d1629f ("KVM: x86/xen: Kernel acceleration for XENVER_version")
      53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
      942c2490 ("KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID")
      2fd6df2f ("KVM: x86/xen: intercept EVTCHNOP_send from guests")
      35025735 ("KVM: x86/xen: Support direct injection of event channel events")

    That automatically adds support for this new ioctl:

      $ tools/perf/trace/beauty/kvm_ioctl.sh > before
      $ cp include/uapi/linux/kvm.h tools/include/uapi/linux/kvm.h
      $ tools/perf/trace/beauty/kvm_ioctl.sh > after
      $ diff -u before after
      --- before    2022-06-28 12:13:07.281150509 -0300
      +++ after     2022-06-28 12:13:16.423392896 -0300
      @@ -98,6 +98,7 @@
            [0xcc] = "GET_SREGS2",
            [0xcd] = "SET_SREGS2",
            [0xce] = "GET_STATS_FD",
      +     [0xd0] = "XEN_HVM_EVTCHN_SEND",
            [0xe0] = "CREATE_DEVICE",
            [0xe1] = "SET_DEVICE_ATTR",
            [0xe2] = "GET_DEVICE_ATTR",
      $

    This silences this perf build warning:

      Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
      diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Ian Rogers <irogers@google.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Joao Martins <joao.m.martins@oracle.com>
    Cc: Marc Zyngier <maz@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Oliver Upton <oupton@google.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Link: http://lore.kernel.org/lkml/Yrs4RE+qfgTaWdAt@kernel.org


Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
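
The diff above shows the generated table gaining one entry. A minimal sketch of how such a number-to-name table can be consulted follows; the table contents are copied from the diff, while the names `kvm_ioctl_names` and `kvm_ioctl_name()` are hypothetical and are not taken from the perf sources.

```c
#include <stddef.h>

/* Hypothetical excerpt of an ioctl-number-to-name table, using entries
 * visible in the diff above. Slots that are not listed stay NULL. */
static const char *const kvm_ioctl_names[256] = {
	[0xcc] = "GET_SREGS2",
	[0xcd] = "SET_SREGS2",
	[0xce] = "GET_STATS_FD",
	[0xd0] = "XEN_HVM_EVTCHN_SEND",
	[0xe0] = "CREATE_DEVICE",
};

/* Return the symbolic name for an ioctl number, or NULL if unknown. */
static const char *kvm_ioctl_name(unsigned int nr)
{
	if (nr >= sizeof(kvm_ioctl_names) / sizeof(kvm_ioctl_names[0]))
		return NULL;
	return kvm_ioctl_names[nr];
}
```

Before the header sync, a lookup of 0xd0 in a table like this would find no name; after it, the new ioctl resolves to a readable string.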

tools headers UAPI: Sync linux/kvm.h with the kernel sources · a75b83cc

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 474e76c4
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Sun May 9 09:39:02 2021 -0300

    tools headers UAPI: Sync linux/kvm.h with the kernel sources

    To pick the changes in:

      d495f942 ("KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT")

    That just rebuilds perf, as these patches don't add any new KVM ioctl to
    be harvested for the 'perf trace' ioctl syscall argument
    beautifiers.

    This header is by now also used by tools/testing/selftests/kvm/; a simple
    test build succeeded.

    This silences this perf build warning:

      Warning: Kernel ABI header at 'tools/include/uapi/linux/kvm.h' differs from latest version at 'include/uapi/linux/kvm.h'
      diff -u tools/include/uapi/linux/kvm.h include/uapi/linux/kvm.h

    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Jiri Olsa <jolsa@kernel.org>
    Cc: Namhyung Kim <namhyung@kernel.org>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Link: http://lore.kernel.org/lkml/YnE5BIweGmCkpOTN@kernel.org


Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources · 44220d02

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 2e323f36
Author: Arnaldo Carvalho de Melo <acme@redhat.com>
Date:   Fri Sep 10 11:46:54 2021 -0300

    tools headers UAPI: Sync x86's asm/kvm.h with the kernel sources

    To pick the changes in:

      f1a9761f ("KVM: x86: Allow userspace to opt out of hypercall patching")

    That just rebuilds kvm-stat.c on x86, no change in functionality.

    This silences this perf build warning:

      Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/kvm.h' differs from latest version at 'arch/x86/include/uapi/asm/kvm.h'
      diff -u tools/arch/x86/include/uapi/asm/kvm.h arch/x86/include/uapi/asm/kvm.h

    Cc: Oliver Upton <oupton@google.com>
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Link: https://lore.kernel.org/lkml/Yq8qgiMwRcl9ds+f@kernel.org


Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

x86/kvm: Simplify FOP_SETCC() · 4720a8f2

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 22472d12
Author: Josh Poimboeuf <jpoimboe@kernel.org>
Date:   Thu Aug 18 08:53:42 2022 -0700

    x86/kvm: Simplify FOP_SETCC()

    SETCC_ALIGN and FOP_ALIGN are both 16.  Remove the special casing for
    FOP_SETCC() and just make it a normal fastop.

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: x86: check validity of argument to KVM_SET_MP_STATE · bfd1786d

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 22c6a0ef
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Thu Aug 11 12:41:25 2022 -0400

    KVM: x86: check validity of argument to KVM_SET_MP_STATE

    An invalid argument to KVM_SET_MP_STATE has no effect other than making the
    vCPU fail to run at the next KVM_RUN.  Since it is extremely unlikely that
    any userspace is relying on it, fail with -EINVAL just like for other
    architectures.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
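
The validation described above can be sketched as an up-front check that returns -EINVAL. This is only an illustration: the `sketch_` types and the set of accepted states are assumptions, and the real state values live in include/uapi/linux/kvm.h.

```c
#include <errno.h>

/* Hypothetical state numbers, for illustration only. */
enum sketch_mp_state {
	SKETCH_MP_STATE_RUNNABLE = 0,
	SKETCH_MP_STATE_UNINITIALIZED = 1,
	SKETCH_MP_STATE_HALTED = 3,
};

struct sketch_vcpu {
	int mp_state;
};

/* Reject unknown states immediately instead of storing them and letting
 * the next run attempt fail. */
static int sketch_set_mp_state(struct sketch_vcpu *vcpu, int new_state)
{
	switch (new_state) {
	case SKETCH_MP_STATE_RUNNABLE:
	case SKETCH_MP_STATE_UNINITIALIZED:
	case SKETCH_MP_STATE_HALTED:
		break;
	default:
		return -EINVAL;
	}

	vcpu->mp_state = new_state;
	return 0;
}
```

With this shape an invalid argument leaves the stored state untouched, so the error surfaces at the ioctl that caused it.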

KVM: x86: fix memoryleak in kvm_arch_vcpu_create() · 627041ac

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 3c0ba05c
Author: Miaohe Lin <linmiaohe@huawei.com>
Date:   Thu Sep 1 20:23:00 2022 +0800

    KVM: x86: fix memoryleak in kvm_arch_vcpu_create()

    When allocating memory for mci_ctl2_banks fails, KVM doesn't release
    mce_banks, leading to a memory leak. Fix this issue by calling kfree()
    on it when kcalloc() fails.

    Fixes: 281b5278 ("KVM: x86: Add emulation for MSR_IA32_MCx_CTL2 MSRs.")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
    Message-Id: <20220901122300.22298-1-linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
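
The leak pattern and its fix can be modeled in userspace with calloc() and free() standing in for kcalloc() and kfree(). This is a sketch under that assumption; the struct, the function names and the injectable allocator are hypothetical and exist only so that a failure can be simulated.

```c
#include <errno.h>
#include <stdlib.h>

struct sketch_vcpu {
	unsigned long long *mce_banks;
	unsigned long long *mci_ctl2_banks;
};

/* The second allocator is passed in so a failure can be simulated. */
typedef void *(*sketch_alloc_fn)(size_t n, size_t size);

static int sketch_alloc_banks(struct sketch_vcpu *vcpu, size_t nbanks,
			      sketch_alloc_fn alloc_ctl2)
{
	vcpu->mce_banks = calloc(nbanks, sizeof(*vcpu->mce_banks));
	if (!vcpu->mce_banks)
		return -ENOMEM;

	vcpu->mci_ctl2_banks = alloc_ctl2(nbanks, sizeof(*vcpu->mci_ctl2_banks));
	if (!vcpu->mci_ctl2_banks) {
		/* Release the first buffer; omitting this is the leak. */
		free(vcpu->mce_banks);
		vcpu->mce_banks = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Helper that always fails, standing in for an out-of-memory kcalloc(). */
static void *sketch_failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}
```

Passing `sketch_failing_alloc` exercises the error path; passing `calloc` exercises the success path, after which the caller owns both buffers.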

KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES · 713b7d27

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 0204750b
Author: Jim Mattson <jmattson@google.com>
Date:   Tue Aug 30 10:49:47 2022 -0700

    KVM: x86: Mask off unsupported and unknown bits of IA32_ARCH_CAPABILITIES

    KVM should not claim to virtualize unknown IA32_ARCH_CAPABILITIES
    bits. When kvm_get_arch_capabilities() was originally written, there
    were only a few bits defined in this MSR, and KVM could virtualize all
    of them. However, over the years, several bits have been defined that
    KVM cannot just blindly pass through to the guest without additional
    work (such as virtualizing an MSR promised by the
    IA32_ARCH_CAPABILITIES feature bit).

    Define a mask of supported IA32_ARCH_CAPABILITIES bits, and mask off
    any other bits that are set in the hardware MSR.

    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Fixes: 5b76a3cf ("KVM: VMX: Tell the nested hypervisor to skip L1D flush on vmentry")
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Vipin Sharma <vipinsh@google.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Message-Id: <20220830174947.2182144-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
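
The masking approach can be sketched in a few lines. The bit positions and the `SKETCH_` names below are placeholders chosen for illustration, not the real IA32_ARCH_CAPABILITIES bit definitions.

```c
#include <stdint.h>

/* Hypothetical capability bits that the hypervisor knows how to virtualize. */
#define SKETCH_CAP_A (1ULL << 0)
#define SKETCH_CAP_B (1ULL << 1)
#define SKETCH_CAP_C (1ULL << 3)

#define SKETCH_SUPPORTED_CAPS (SKETCH_CAP_A | SKETCH_CAP_B | SKETCH_CAP_C)

/* Expose only the bits that are both set in hardware and known to be
 * supported; unknown or unsupported bits are dropped. */
static uint64_t sketch_guest_arch_caps(uint64_t host_msr)
{
	return host_msr & SKETCH_SUPPORTED_CAPS;
}
```

An allow-list like this fails closed: a bit defined by future hardware stays hidden from the guest until someone adds it to the mask on purpose.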

KVM: selftests: Fix ambiguous mov in KVM_ASM_SAFE() · afe44531

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 372d0708
Author: David Matlack <dmatlack@google.com>
Date:   Fri Jul 22 23:48:38 2022 +0000

    KVM: selftests: Fix ambiguous mov in KVM_ASM_SAFE()

    Change the mov in KVM_ASM_SAFE() that zeroes @vector to a movb to
    make it unambiguous.

    This fixes a build failure with Clang since, unlike the GNU assembler,
    the LLVM integrated assembler rejects ambiguous X86 instructions that
    don't have suffixes:

      In file included from x86_64/hyperv_features.c:13:
      include/x86_64/processor.h:825:9: error: ambiguous instructions require an explicit suffix (could be 'movb', 'movw', 'movl', or 'movq')
              return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                     ^
      include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
              asm volatile(KVM_ASM_SAFE(insn)                 \
                           ^
      include/x86_64/processor.h:788:16: note: expanded from macro 'KVM_ASM_SAFE'
              "1: " insn "\n\t"                                       \
                            ^
      <inline asm>:5:2: note: instantiated into assembly here
              mov $0, 15(%rsp)
              ^

    It seems like this change could introduce undesirable behavior in the
    future, e.g. if someone used a type larger than a u8 for @vector, since
    KVM_ASM_SAFE() will only zero the bottom byte. I tried changing the type
    of @vector to an int to see what would happen. GCC failed to compile due
    to a size mismatch between `movb` and `%eax`. Clang succeeded in
    compiling, but the generated code looked correct, so perhaps it will not
    be an issue. That being said it seems like there could be a better
    solution to this issue that does not assume @vector is a u8.

    Fixes: 3b23054c ("KVM: selftests: Add x86-64 support for exception fixup")
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220722234838.2160385-3-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
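
The caveat raised in the commit message, that a byte-sized store only clears the bottom byte of a wider variable, can be illustrated with a plain C model. This is not the selftest code; `sketch_store_low_byte()` is a hypothetical helper that mimics the effect of a byte store on a 32-bit value.

```c
#include <stdint.h>

/* Model of a byte-sized store into the low byte of a wider value: only
 * bits 7:0 change, the upper bits keep whatever they held before. */
static uint32_t sketch_store_low_byte(uint32_t old, uint8_t byte)
{
	return (old & ~UINT32_C(0xff)) | byte;
}
```

If the wider value held stale data in its upper bytes, storing a zero byte would not yield zero, which is why the message notes that the fix assumes @vector is a u8.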

KVM: selftests: Fix KVM_EXCEPTION_MAGIC build with Clang · b72b69ba

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 67ef8664
Author: David Matlack <dmatlack@google.com>
Date:   Fri Jul 22 23:48:37 2022 +0000

    KVM: selftests: Fix KVM_EXCEPTION_MAGIC build with Clang

    Change KVM_EXCEPTION_MAGIC to use the all-caps "ULL", rather than lower
    case. This fixes a build failure with Clang:

      In file included from x86_64/hyperv_features.c:13:
      include/x86_64/processor.h:825:9: error: unexpected token in argument list
              return kvm_asm_safe("wrmsr", "a"(val & -1u), "d"(val >> 32), "c"(msr));
                     ^
      include/x86_64/processor.h:802:15: note: expanded from macro 'kvm_asm_safe'
              asm volatile(KVM_ASM_SAFE(insn)                 \
                           ^
      include/x86_64/processor.h:785:2: note: expanded from macro 'KVM_ASM_SAFE'
              "mov $" __stringify(KVM_EXCEPTION_MAGIC) ", %%r9\n\t"   \
              ^
      <inline asm>:1:18: note: instantiated into assembly here
              mov $0xabacadabaull, %r9
                              ^

    Fixes: 3b23054c ("KVM: selftests: Add x86-64 support for exception fixup")
Signed-off-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220722234838.2160385-2-dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
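
The error output above shows the constant's suffix ending up inside the assembly text. A small sketch of why: stringification copies the literal's spelling verbatim, suffix included. The `SKETCH_` macros are local to this example and only mirror the two-level pattern that a `__stringify()`-style helper uses.

```c
/* Two-level stringification so that the macro argument is expanded
 * before being turned into a string. */
#define SKETCH_STR_1(x) #x
#define SKETCH_STR(x) SKETCH_STR_1(x)

#define SKETCH_MAGIC 0xabacadabaULL

/* The text that would be pasted after "mov $" in an asm template. */
static const char *sketch_magic_text(void)
{
	return SKETCH_STR(SKETCH_MAGIC);
}
```

Because the assembler, not the C compiler, parses the resulting string, the spelling of the suffix matters there even though C treats "ull" and "ULL" identically.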

KVM: VMX: Heed the 'msr' argument in msr_write_intercepted() · 8a355916

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 020dac41
Author: Jim Mattson <jmattson@google.com>
Date:   Wed Aug 10 14:30:50 2022 -0700

    KVM: VMX: Heed the 'msr' argument in msr_write_intercepted()

    Regardless of the 'msr' argument passed to the VMX version of
    msr_write_intercepted(), the function always checks to see if a
    specific MSR (IA32_SPEC_CTRL) is intercepted for write.  This behavior
    seems unintentional and unexpected.

    Modify the function so that it checks to see if the provided 'msr'
    index is intercepted for write.

    Fixes: 67f4b996 ("KVM: nVMX: Handle dynamic MSR intercept toggling")
    Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220810213050.2655000-1-jmattson@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
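
The bug class described above, a helper that ignores its parameter and checks a fixed MSR, can be sketched with a simplified bitmap. The layout here (one write bit per MSR index below 128) is an assumption made for brevity and does not match the real VMX MSR bitmap; all `sketch_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_MSRS   128
#define SKETCH_SPEC_CTRL 0x48	/* index of IA32_SPEC_CTRL */

struct sketch_msr_bitmap {
	uint64_t write[SKETCH_NR_MSRS / 64];
};

static bool sketch_test_write(const struct sketch_msr_bitmap *bm, uint32_t msr)
{
	if (msr >= SKETCH_NR_MSRS)
		return true;	/* outside the bitmap: treat as intercepted */
	return (bm->write[msr / 64] >> (msr % 64)) & 1;
}

/* Buggy shape: the 'msr' parameter is ignored. */
static bool sketch_write_intercepted_buggy(const struct sketch_msr_bitmap *bm,
					   uint32_t msr)
{
	(void)msr;
	return sketch_test_write(bm, SKETCH_SPEC_CTRL);
}

/* Fixed shape: the caller's 'msr' is the one that gets checked. */
static bool sketch_write_intercepted(const struct sketch_msr_bitmap *bm,
				     uint32_t msr)
{
	return sketch_test_write(bm, msr);
}
```

The two versions agree whenever the caller happens to ask about IA32_SPEC_CTRL, which is one plausible reason such a bug can go unnoticed.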

kvm: x86: mmu: Always flush TLBs when enabling dirty logging · ce8d23c2

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit b64d740e
Author: Junaid Shahid <junaids@google.com>
Date: Wed Aug 10 15:49:39 2022 -0700

kvm: x86: mmu: Always flush TLBs when enabling dirty logging

When A/D bits are not available, KVM uses a software access tracking
mechanism, which involves making the SPTEs inaccessible. However,
the clear_young() MMU notifier does not flush TLBs. So it is possible
that there may still be stale, potentially writable, TLB entries.
This is usually fine, but can be problematic when enabling dirty
logging, because it currently only does a TLB flush if any SPTEs were
modified. But if all SPTEs are in access-tracked state, then there
won't be a TLB flush, which means that the guest could still possibly
write to memory and not have it reflected in the dirty bitmap.

So just unconditionally flush the TLBs when enabling dirty logging.
As an alternative, KVM could explicitly check the MMU-Writable bit when
write-protecting SPTEs to decide if a flush is needed (instead of
checking the Writable bit), but given that a flush almost always happens
anyway, just making it unconditional seems simpler.

Signed-off-by: Junaid Shahid <junaids@google.com>
Message-Id: <20220810224939.2611160-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

kvm: x86: mmu: Drop the need_remote_flush() function · 95ceb5eb

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 1441ca14
Author: Junaid Shahid <junaids@google.com>
Date:   Fri Jul 22 19:43:16 2022 -0700

    kvm: x86: mmu: Drop the need_remote_flush() function

    This is only used by kvm_mmu_pte_write(), which no longer actually
    creates the new SPTE and instead just clears the old SPTE. So we
    just need to check if the old SPTE was shadow-present instead of
    calling need_remote_flush(). Hence we can drop this function. It was
    incomplete anyway as it didn't take access-tracking into account.

    This patch should not result in any functional change.

Signed-off-by: Junaid Shahid <junaids@google.com>
Reviewed-by: David Matlack <dmatlack@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220723024316.2725328-1-junaids@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device() · 2ec519fe

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit eceb6e1d
Author: Li kunyu <kunyu@nfschina.com>
Date:   Fri Aug 19 10:15:35 2022 +0800

    KVM: Drop unnecessary initialization of "ops" in kvm_ioctl_create_device()

    The variable is initialized but it is only used after its assignment.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Li kunyu <kunyu@nfschina.com>
    Message-Id: <20220819021535.483702-1-kunyu@nfschina.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow() · e28cc16c

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 28249139
Author: Li kunyu <kunyu@nfschina.com>
Date:   Fri Aug 19 10:28:04 2022 +0800

    KVM: Drop unnecessary initialization of "npages" in hva_to_pfn_slow()

    The variable is initialized but it is only used after its assignment.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Li kunyu <kunyu@nfschina.com>
    Message-Id: <20220819022804.483914-1-kunyu@nfschina.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: Rename mmu_notifier_* to mmu_invalidate_* · ce4b723d

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 20ec3ebd
Author: Chao Peng <chao.p.peng@linux.intel.com>
Date:   Tue Aug 16 20:53:22 2022 +0800

    KVM: Rename mmu_notifier_* to mmu_invalidate_*

    The motivation for this renaming is to make these variables and the
    related helper functions less bound to mmu_notifier, so that they can
    also be used for page invalidation that is not mmu_notifier based.
    mmu_invalidate_* was chosen to better describe the purpose of these
    variables, which is 'invalidating' a page.

      - mmu_notifier_seq/range_start/range_end are renamed to
        mmu_invalidate_seq/range_start/range_end.

      - mmu_notifier_retry{_hva} helper functions are renamed to
        mmu_invalidate_retry{_hva}.

      - mmu_notifier_count is renamed to mmu_invalidate_in_progress to
        avoid confusion with mn_active_invalidate_count.

      - While here, also update kvm_inc/dec_notifier_count() to
        kvm_mmu_invalidate_begin/end() to match the change for
        mmu_notifier_count.

    No functional change intended.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Conflicts:
	arch/riscv/kvm/mmu.c (dropping)

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS · 4dedf7c8

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit bdd1c37a
Author: Chao Peng <chao.p.peng@linux.intel.com>
Date:   Tue Aug 16 20:53:21 2022 +0800

    KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS

    KVM_INTERNAL_MEM_SLOTS better reflects the fact that those slots are used
    internally by KVM (invisible to userspace) and avoids confusion with
    future private slots, which may have a different meaning.

Signed-off-by: Chao Peng <chao.p.peng@linux.intel.com>
    Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: Move coalesced MMIO initialization (back) into kvm_create_vm() · 78128481

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit c2b82397
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Aug 16 05:39:37 2022 +0000

KVM: Move coalesced MMIO initialization (back) into kvm_create_vm()

Invoke kvm_coalesced_mmio_init() from kvm_create_vm() now that allocating
and initializing coalesced MMIO objects is separate from registering any
associated devices. Moving coalesced MMIO cleans up the last oddity
where KVM does VM creation/initialization after kvm_create_vm(), and more
importantly after kvm_arch_post_init_vm() is called and the VM is added
to the global vm_list, i.e. after the VM is fully created as far as KVM
is concerned.

Originally, kvm_coalesced_mmio_init() was called by kvm_create_vm(), but
the original implementation was completely devoid of error handling.
Commit 6ce5a090 ("KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s
error handling") fixed the various bugs, and in doing so rightly moved the
call to after kvm_create_vm() because kvm_coalesced_mmio_init() also
registered the coalesced MMIO device. Commit 2b3c246a ("KVM: Make
coalesced mmio use a device per zone") cleaned up that mess by having
each zone register a separate device, i.e. moved device registration to
its logical home in kvm_vm_ioctl_register_coalesced_mmio(). As a result,
kvm_coalesced_mmio_init() is now a "pure" initialization helper and can
be safely called from kvm_create_vm().

Opportunistically drop the #ifdef; KVM provides stubs for
kvm_coalesced_mmio_{init,free}() when CONFIG_KVM_MMIO=n (s390).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220816053937.2477106-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: Unconditionally get a ref to /dev/kvm module when creating a VM · e6a3fa79

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 405294f2
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Aug 16 05:39:36 2022 +0000

KVM: Unconditionally get a ref to /dev/kvm module when creating a VM

Unconditionally get a reference to the /dev/kvm module when creating a VM
instead of using try_module_get(), which will fail if the module is in
the process of being forcefully unloaded. The error handling when
try_module_get() fails doesn't properly unwind all that has been done,
e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM
from the global list. Not removing VMs from the global list tends to be
fatal, e.g. leads to use-after-free explosions.

The obvious alternative would be to add proper unwinding, but the
justification for using try_module_get(), "rmmod --wait", is completely
bogus as support for "rmmod --wait", i.e. delete_module() without
O_NONBLOCK, was removed by commit 3f2b9c9c ("module: remove rmmod
--wait option.") nearly a decade ago.

It's still possible for try_module_get() to fail due to the module dying
(more like being killed), as the module will be tagged MODULE_STATE_GOING
by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice
with forced unloading is an exercise in futility and gives a false sense
of security. Using try_module_get() only prevents acquiring _new_
references, it doesn't magically put the references held by other VMs,
and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but
guaranteed to cause spectacular fireworks; the window where KVM will fail
try_module_get() is tiny compared to the window where KVM is building and
running the VM with an elevated module refcount.

Addressing KVM's inability to play nice with "rmmod --force" is firmly
out-of-scope. Forcefully unloading any module taints the kernel (for obvious
reasons) _and_ requires the kernel to be built with
CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the
amusing disclaimer that it's "mainly for kernel developers and desperate
users". In other words, KVM is free to scoff at bug reports due to using
"rmmod --force" while VMs may be running.

Fixes: 5f6de5cb ("KVM: Prevent module exit until all VMs are freed")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220816053937.2477106-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
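
The difference between a conditional and an unconditional module reference can be modeled with a toy refcount. This is a sketch only: the `sketch_` types are hypothetical and much simpler than the kernel's module refcounting, but they show why the conditional form needs an error path and the unconditional form does not.

```c
#include <stdbool.h>

enum sketch_module_state {
	SKETCH_MODULE_LIVE,
	SKETCH_MODULE_GOING,
};

struct sketch_module {
	enum sketch_module_state state;
	int refcnt;
};

/* Conditional get: refuses new references once the module is going away.
 * References that are already held are not affected. */
static bool sketch_try_module_get(struct sketch_module *mod)
{
	if (mod->state == SKETCH_MODULE_GOING)
		return false;
	mod->refcnt++;
	return true;
}

/* Unconditional get: cannot fail, so the caller needs no unwind path. */
static void sketch_module_get(struct sketch_module *mod)
{
	mod->refcnt++;
}
```

In the model, a module in the GOING state with three existing references still has three references after a failed conditional get, which mirrors the point that refusing new references does nothing about the VMs that are already running.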

KVM: Properly unwind VM creation if creating debugfs fails · 9802cb62

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 4ba4f419
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Aug 16 05:39:35 2022 +0000

KVM: Properly unwind VM creation if creating debugfs fails

Properly unwind VM creation if kvm_create_vm_debugfs() fails. A recent
change to invoke kvm_create_vm_debugfs() in kvm_create_vm() was led astray
by buggy try_module_get() handling added by commit 5f6de5cb ("KVM:
Prevent module exit until all VMs are freed"). The debugfs error path
effectively inherits the bad error path of try_module_get(), e.g. KVM
leaves the to-be-freed VM on vm_list even though KVM appears to do the
right thing by calling module_put() and falling through.

Opportunistically hoist kvm_create_vm_debugfs() above the call to
kvm_arch_post_init_vm() so that the "post-init" arch hook is actually
invoked after the VM is initialized (ignoring kvm_coalesced_mmio_init()
for the moment). x86 is the only non-nop implementation of the post-init
hook, and it doesn't allocate/initialize any objects that are reachable
via debugfs code (spawns a kthread worker for the NX huge page mitigation).

Leave the buggy try_module_get() alone for now, it will be fixed in a
separate commit.

Fixes: b74ed7a6 ("KVM: Actually create debugfs in kvm_create_vm()")
Reported-by: <syzbot+744e173caec2e1627ee0@syzkaller.appspotmail.com>
Cc: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Message-Id: <20220816053937.2477106-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
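
The ordering and unwinding idea can be sketched with the common goto-based error path. The step names, the `sketch_vm` fields and the failure injection flag are all assumptions for illustration; the sketch only shows that a step which can fail runs before the VM is published, and that the error label undoes exactly what was done.

```c
#include <errno.h>
#include <stdbool.h>

struct sketch_vm {
	bool arch_inited;
	bool has_debugfs;
	bool on_list;
};

static int sketch_create_vm(struct sketch_vm *vm, bool debugfs_fails)
{
	int r;

	vm->arch_inited = true;			/* step 1: earlier setup */

	r = debugfs_fails ? -ENOMEM : 0;	/* step 2: debugfs creation */
	if (r)
		goto out_err_arch;
	vm->has_debugfs = true;

	vm->on_list = true;			/* step 3: publish the VM */
	return 0;

out_err_arch:
	vm->arch_inited = false;		/* undo step 1; never listed */
	return r;
}
```

Because publication is the last step, a failure in step 2 can never leave a half-built VM reachable from the global list.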

KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh · a3522737

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 6348aafa
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Jul 27 23:34:24 2022 +0000

    KVM: VMX: Adjust number of LBR records for PERF_CAPABILITIES at refresh

    Now that the PMU is refreshed when MSR_IA32_PERF_CAPABILITIES is written
    by host userspace, zero out the number of LBR records for a vCPU during
    PMU refresh if PMU_CAP_LBR_FMT is not set in PERF_CAPABILITIES instead of
    handling the check at run-time.

    guest_cpuid_has() is expensive due to the linear search of guest CPUID
    entries, intel_pmu_lbr_is_enabled() is checked on every VM-Enter, _and_
    simply enumerating the same "Model" as the host causes KVM to set the
    number of LBR records to a non-zero value.

Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220727233424.2968356-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
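
Moving the check from run time to refresh time can be sketched as follows. The struct layout, the function names and the 0x3f field mask are assumptions for illustration; the point is only that the expensive decision is made once per refresh and the hot path reduces to an integer test.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PMU_CAP_LBR_FMT 0x3f	/* assumed LBR-format field mask */

struct sketch_vcpu {
	uint64_t perf_capabilities;
	unsigned int nr_lbr;
};

/* Slow path, run on PMU refresh: decide once whether LBRs are usable. */
static void sketch_pmu_refresh(struct sketch_vcpu *vcpu,
			       unsigned int host_nr_lbr)
{
	if (vcpu->perf_capabilities & SKETCH_PMU_CAP_LBR_FMT)
		vcpu->nr_lbr = host_nr_lbr;
	else
		vcpu->nr_lbr = 0;
}

/* Hot path, run on every VM-Enter: a single integer test. */
static bool sketch_lbr_is_enabled(const struct sketch_vcpu *vcpu)
{
	return vcpu->nr_lbr != 0;
}
```

This only stays correct if the refresh runs whenever PERF_CAPABILITIES changes, which is the precondition the commit message names in its first sentence.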

KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers · 984d831f

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 7de8e5b6
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Jul 27 23:34:23 2022 +0000

    KVM: VMX: Use proper type-safe functions for vCPU => LBRs helpers

    Turn vcpu_to_lbr_desc() and vcpu_to_lbr_records() into functions in order
    to provide type safety, to document exactly what they return, and to
    allow consuming the helpers in vmx.h.  Move the definitions as necessary
    (the macros "reference" to_vmx() before its definition).

    No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220727233424.2968356-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
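
The type-safety argument can be illustrated by putting a macro helper next to an inline function that does the same thing. The structs and names below are hypothetical stand-ins, not the VMX definitions.

```c
struct sketch_lbr_desc {
	int nr;
};

struct sketch_vcpu_vmx {
	struct sketch_lbr_desc lbr_desc;
};

struct sketch_vcpu {
	struct sketch_vcpu_vmx vmx;
};

/* Macro form: accepts any pointer whose target happens to have a
 * matching member chain, and documents nothing about its result. */
#define sketch_vcpu_to_lbr_desc_macro(vcpu) (&(vcpu)->vmx.lbr_desc)

/* Function form: the compiler checks the argument type and the
 * signature states exactly what is returned. */
static inline struct sketch_lbr_desc *
sketch_vcpu_to_lbr_desc(struct sketch_vcpu *vcpu)
{
	return &vcpu->vmx.lbr_desc;
}
```

Both forms yield the same pointer for a valid argument; the function form additionally requires the types it uses to be defined before it, which is why such a conversion can force definitions to move.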

KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES · 0298c074

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 17a024a8
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Jul 27 23:34:22 2022 +0000

    KVM: x86: Refresh PMU after writes to MSR_IA32_PERF_CAPABILITIES

    Refresh the PMU if userspace modifies MSR_IA32_PERF_CAPABILITIES.  KVM
    consumes the vCPU's PERF_CAPABILITIES when enumerating PEBS support, but
    relies on CPUID updates to refresh the PMU.  I.e. KVM will do the wrong
    thing if userspace stuffs PERF_CAPABILITIES _after_ setting guest CPUID.

    Opportunistically fix a curly-brace indentation.

    Fixes: c59a1f10 ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
    Cc: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220727233424.2968356-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: selftests: Test all possible "invalid" PERF_CAPABILITIES.LBR_FMT vals · 925e2840

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 9d27d461
Author: Sean Christopherson <seanjc@google.com>
Date:   Thu Aug 4 12:18:15 2022 -0700

    KVM: selftests: Test all possible "invalid" PERF_CAPABILITIES.LBR_FMT vals

    Test all possible input values to verify that KVM rejects all values
    except the exact host value.  Due to the LBR format affecting the core
    functionality of LBRs, KVM can't emulate "other" formats, so even though
    there are a variety of legal values, KVM should reject anything but an
    exact host match.

Suggested-by: Like Xu <like.xu.linux@gmail.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
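
The exhaustive test strategy can be sketched as a loop over every value of the format field. The setter below is a stand-in for the behavior under test, and the 0x3f field mask and all `sketch_` names are assumptions; this is not the selftest's code.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_LBR_FMT_MASK 0x3f	/* assumed width of the LBR_FMT field */

/* Stand-in for the behavior under test: accept only the host's format. */
static int sketch_set_perf_caps(uint64_t host_caps, uint64_t val)
{
	if ((val & SKETCH_LBR_FMT_MASK) != (host_caps & SKETCH_LBR_FMT_MASK))
		return -EINVAL;
	return 0;
}

/* Try every value of the format field and count how many are accepted. */
static int sketch_count_accepted(uint64_t host_caps)
{
	int accepted = 0;

	for (uint64_t fmt = 0; fmt <= SKETCH_LBR_FMT_MASK; fmt++) {
		if (sketch_set_perf_caps(host_caps, fmt) == 0)
			accepted++;
	}
	return accepted;
}
```

A count of exactly one accepted value is the property the test is after: every format other than the host's must be rejected, including ones that are architecturally legal.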

KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test · aa5d17d7

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 0fcc1029
Author: Gavin Shan <gshan@redhat.com>
Date:   Wed Aug 10 18:41:14 2022 +0800

    KVM: selftests: Use getcpu() instead of sched_getcpu() in rseq_test

    sched_getcpu() is glibc dependent and it can simply return the CPU
    ID from the registered rseq information, as Florian Weimer pointed out.
    In this case, it's pointless to compare the return value from
    sched_getcpu() and that fetched from the registered rseq information.

    Fix the issue by replacing sched_getcpu() with getcpu(), as Florian
    suggested. The comments are modified accordingly by replacing
    "sched_getcpu()" with "getcpu()".

Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
    Message-Id: <20220810104114.6838-3-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

KVM: selftests: Make rseq compatible with glibc-2.35 · 513f0abc

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 66d42ac7
Author: Gavin Shan <gshan@redhat.com>
Date:   Wed Aug 10 18:41:13 2022 +0800

    KVM: selftests: Make rseq compatible with glibc-2.35

    The rseq information is registered by TLS, starting from glibc-2.35.
    In this case, the test always fails because its own syscall(__NR_rseq)
    registration is rejected. For example, on RHEL9.1 where upstream
    glibc-2.35 features are enabled on downstream glibc-2.34, the test
    fails like below.

      # ./rseq_test
      ==== Test Assertion Failure ====
        rseq_test.c:60: !r
        pid=112043 tid=112043 errno=22 - Invalid argument
           1        0x0000000000401973: main at rseq_test.c:226
           2        0x0000ffff84b6c79b: ?? ??:0
           3        0x0000ffff84b6c86b: ?? ??:0
           4        0x0000000000401b6f: _start at ??:?
        rseq failed, errno = 22 (Invalid argument)
      # rpm -aq | grep glibc-2
      glibc-2.34-39.el9.aarch64

    Fix the issue by using "../rseq/rseq.c" to fetch the rseq information,
    registered by TLS if it exists. Otherwise, we're going to register our
    own rseq information as before.

Reported-by: Yihuang Yu <yihyu@redhat.com>
Suggested-by: Florian Weimer <fweimer@redhat.com>
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
    Message-Id: <20220810104114.6838-2-gshan@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

513f0abc

KVM: Actually create debugfs in kvm_create_vm() · 8f9d924d

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit b74ed7a6
Author: Oliver Upton <oupton@google.com>
Date:   Wed Jul 20 09:22:51 2022 +0000

    KVM: Actually create debugfs in kvm_create_vm()

    Doing debugfs creation after vm creation leaves things in a
    quasi-initialized state for a while. This is further complicated by the
    fact that we tear down debugfs from kvm_destroy_vm(). Align debugfs and
    stats init/destroy with the vm init/destroy pattern to avoid any
    headaches.

    Note the fix for a benign mistake in error handling for calls to
    kvm_arch_create_vm_debugfs() rolled in. Since all implementations of
    the function return 0 unconditionally it isn't actually a bug at
    the moment.

    Lastly, tear down debugfs/stats data in the kvm_create_vm_debugfs()
    error path. Previously it was safe to assume that kvm_destroy_vm() would
    take out the garbage, that is no longer the case.

Signed-off-by: Oliver Upton <oupton@google.com>
    Message-Id: <20220720092259.3491733-6-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

8f9d924d
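
The init/destroy pairing described in 8f9d924d above can be pictured with a small userspace model. Everything here (struct toy_vm and the toy_* helpers) is hypothetical and only mimics the ordering: debugfs is set up inside the create path, and a failure there unwinds what was already built instead of relying on the destroy path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct kvm; it only tracks what is live. */
struct toy_vm {
	bool core_ready;
	bool debugfs_ready;
};

/* Pretend debugfs setup that can be made to fail on demand. */
static int toy_create_debugfs(struct toy_vm *vm, bool inject_failure)
{
	if (inject_failure)
		return -1;	/* nothing was created, nothing to undo here */
	vm->debugfs_ready = true;
	return 0;
}

static void toy_destroy_debugfs(struct toy_vm *vm)
{
	vm->debugfs_ready = false;
}

/* Create path: debugfs is part of creation, with explicit unwinding. */
static int toy_create_vm(struct toy_vm *vm, bool fail_debugfs)
{
	int r;

	vm->core_ready = true;
	vm->debugfs_ready = false;

	r = toy_create_debugfs(vm, fail_debugfs);
	if (r)
		goto out_err_core;

	return 0;

out_err_core:
	vm->core_ready = false;
	return r;
}

/* Destroy path mirrors creation in reverse order. */
static void toy_destroy_vm(struct toy_vm *vm)
{
	toy_destroy_debugfs(vm);
	vm->core_ready = false;
}
```

With this shape the caller never sees a VM that exists without its debugfs state, which is the quasi-initialized window the commit message describes.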

KVM: Pass the name of the VM fd to kvm_create_vm_debugfs() · c9ed3f05

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 59f82aad
Author: Oliver Upton <oupton@google.com>
Date:   Wed Jul 20 09:22:50 2022 +0000

    KVM: Pass the name of the VM fd to kvm_create_vm_debugfs()

    At the time the VM fd is used in kvm_create_vm_debugfs(), the fd has
    been allocated but not yet installed. It is only really useful as an
    identifier in strings for the VM (such as debugfs).

    Treat it exactly as such by passing the string name of the fd to
    kvm_create_vm_debugfs(), futureproofing against possible misuse of the
    VM fd.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
    Message-Id: <20220720092259.3491733-5-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

c9ed3f05
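
To illustrate the idea in c9ed3f05 above, the sketch below turns a reserved fd number into a string once and hands only that string to the naming code. The helper names, the buffer bound and the "<pid>-<fdname>" format are assumptions made for this example, not taken from the patch.

```c
#include <stdio.h>

/* Assumed bound: a decimal int with sign plus the terminating NUL. */
#define TOY_FDNAME_LEN 12

/* Render the reserved (not yet installed) fd number as a string. */
static void toy_fd_to_name(int fd, char name[TOY_FDNAME_LEN])
{
	snprintf(name, TOY_FDNAME_LEN, "%d", fd);
}

/*
 * Build a directory name from a pid and the fd's string name.
 * The callee never sees the fd itself, so it cannot misuse it.
 * Returns the snprintf() result (length the full name needs).
 */
static int toy_debugfs_dirname(char *buf, size_t len, int pid,
			       const char *fdname)
{
	return snprintf(buf, len, "%d-%s", pid, fdname);
}
```

Passing the string rather than the integer makes the intent explicit: at this point the fd is an identifier, not a usable descriptor.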

KVM: Get an fd before creating the VM · 2b4d877d

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 20020f4c
Author: Oliver Upton <oupton@google.com>
Date:   Wed Jul 20 09:22:49 2022 +0000

    KVM: Get an fd before creating the VM

    Allocate a VM's fd at the very beginning of kvm_dev_ioctl_create_vm() so
    that KVM can use the fd value to generate strings, e.g. for debugfs,
    when creating and initializing the VM.

Signed-off-by: Oliver Upton <oupton@google.com>
    Message-Id: <20220720092259.3491733-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

2b4d877d

KVM: Shove vcpu stats_id init into kvm_vcpu_init() · ca68ceae

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 58fc1166
Author: Oliver Upton <oupton@google.com>
Date:   Wed Jul 20 09:22:48 2022 +0000

    KVM: Shove vcpu stats_id init into kvm_vcpu_init()

    Initialize stats_id alongside other kvm_vcpu fields to make it more
    difficult to unintentionally access stats_id before it's set.

    No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
    Message-Id: <20220720092259.3491733-3-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

ca68ceae

KVM: Shove vm stats_id init into kvm_create_vm() · 8b633663

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit f2759c08
Author: Oliver Upton <oupton@google.com>
Date:   Wed Jul 20 09:22:47 2022 +0000

    KVM: Shove vm stats_id init into kvm_create_vm()

    Initialize stats_id alongside other struct kvm fields to make it more
    difficult to unintentionally access stats_id before it's set.  While at
    it, move the format string to the first line of the call and fix the
    indentation of the second line.

    No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
    Message-Id: <20220720092259.3491733-2-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

8b633663

KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen · 07f1314d

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 8bad4606
Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Aug 5 19:41:33 2022 +0000

    KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen

    Add compile-time and init-time sanity checks to ensure that the MMIO SPTE
    mask doesn't overlap the MMIO SPTE generation or the MMU-present bit.
    The generation currently avoids using bit 63, but that's as much
    coincidence as it is strictly necessary.  That will change in the future,
    as TDX support will require setting bit 63 (SUPPRESS_VE) in the mask.

    Explicitly carve out the bits that are allowed in the mask so that any
    future shuffling of SPTE bits doesn't silently break MMIO caching (KVM
    has broken MMIO caching more than once due to overlapping the generation
    with other things).

Suggested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
    Message-Id: <20220805194133.86299-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

07f1314d
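
A compile-time plus init-time overlap check of the kind described in 07f1314d above could look roughly like this. The TOY_* bit layout is illustrative only and is not copied from the kernel headers; BIT_ULL() and GENMASK_ULL() are redefined locally so the example builds in userspace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)        (1ULL << (n))
/* Bits h..l inclusive, with 63 >= h >= l >= 0. */
#define GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Illustrative layout (assumption for this sketch). */
#define TOY_MMU_PRESENT_MASK  BIT_ULL(11)
#define TOY_MMIO_GEN_MASK     (GENMASK_ULL(10, 3) | GENMASK_ULL(62, 52))
#define TOY_MMIO_ALLOWED_MASK (BIT_ULL(63) | GENMASK_ULL(51, 12) | \
			       GENMASK_ULL(2, 0))

/* Compile-time checks: the allowed bits must stay clear of both users. */
static_assert(!(TOY_MMIO_ALLOWED_MASK & TOY_MMIO_GEN_MASK),
	      "MMIO mask bits overlap the generation bits");
static_assert(!(TOY_MMIO_ALLOWED_MASK & TOY_MMU_PRESENT_MASK),
	      "MMIO mask bits overlap the MMU-present bit");

/* Init-time check: reject any mask that strays outside the carve-out. */
static bool toy_mmio_mask_is_allowed(uint64_t mmio_mask)
{
	return (mmio_mask & ~TOY_MMIO_ALLOWED_MASK) == 0;
}
```

Listing the allowed bits explicitly means a later reshuffle of the layout trips the static assertion at build time instead of silently breaking MMIO caching.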

KVM: x86/mmu: rename trace function name for asynchronous page fault · 68f636eb

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 1685c0f3
Author: Mingwei Zhang <mizhang@google.com>
Date:   Sun Aug 7 05:21:41 2022 +0000

    KVM: x86/mmu: rename trace function name for asynchronous page fault

    Rename the tracepoint function from trace_kvm_async_pf_doublefault() to
    trace_kvm_async_pf_repeated_fault() to make it clear, since double fault
    has nothing to do with this trace function.

    Asynchronous Page Fault (APF) is an artifact generated by KVM when it
    cannot find a physical page to satisfy an EPT violation. KVM uses APF to
    tell the guest OS to do something else such as scheduling other guest
    processes to make forward progress. However, when another guest process
    also touches a previously APFed page, KVM halts the vCPU instead of
    generating a repeated APF to avoid wasting cycles.

    Double fault (#DF) clearly has a different meaning and a different
    consequence when triggered. #DF requires two nested contributory exceptions
    instead of two page faults faulting at the same address. A previous bug on
    APF indicates that it may trigger a double fault in the guest [1] and
    clearly this trace function has nothing to do with it. So renaming this
    function should be a valid choice.

    No functional change intended.

    [1] https://www.spinics.net/lists/kvm/msg214957.html



Signed-off-by: Mingwei Zhang <mizhang@google.com>
    Message-Id: <20220807052141.69186-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

68f636eb

KVM: x86/xen: Stop Xen timer before changing IRQ · 4f37fdc5

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit c0368991
Author: Coleman Dietsch <dietschc@csp.edu>
Date:   Mon Aug 8 14:06:07 2022 -0500

    KVM: x86/xen: Stop Xen timer before changing IRQ

    Stop Xen timer (if it's running) prior to changing the IRQ vector and
    potentially (re)starting the timer. Changing the IRQ vector while the
    timer is still running can result in KVM injecting a garbage event, e.g.
    kvm_xen_inject_timer_irqs() could see a non-zero xen.timer_pending from
    a previous timer but inject the new xen.timer_virq.

    Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
    Cc: stable@vger.kernel.org
    Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42


Reported-by: <syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com>
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
    Message-Id: <20220808190607.323899-3-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

4f37fdc5
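
The ordering argument in 4f37fdc5 above can be modelled in a few lines. The struct and helpers below are hypothetical toys, not the KVM code; they only show why the timer is quiesced, and its pending count cleared, before the vector is replaced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-vCPU Xen timer state. */
struct toy_xen_timer {
	bool     running;
	int      pending;	/* expirations not yet delivered */
	uint32_t virq;		/* event injected when pending is seen */
};

/* Stopping also discards expirations that belong to the old setup. */
static void toy_stop_timer(struct toy_xen_timer *t)
{
	t->running = false;
	t->pending = 0;
}

static void toy_start_timer(struct toy_xen_timer *t)
{
	t->running = true;
}

/* Update path: quiesce first, then change the vector, then maybe restart. */
static void toy_set_timer_attr(struct toy_xen_timer *t, uint32_t virq,
			       bool start)
{
	toy_stop_timer(t);
	t->virq = virq;
	if (start)
		toy_start_timer(t);
}
```

Without the stop step, a stale pending count from the old timer would be delivered using the new virq, which is the garbage event the commit message warns about.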

KVM: x86/xen: Initialize Xen timer only once · 8fe9890b

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit af735db3
Author: Coleman Dietsch <dietschc@csp.edu>
Date:   Mon Aug 8 14:06:06 2022 -0500

    KVM: x86/xen: Initialize Xen timer only once

    Add a check for existing xen timers before initializing a new one.

    Currently kvm_xen_init_timer() is called on every
    KVM_XEN_VCPU_ATTR_TYPE_TIMER, which is causing the following ODEBUG
    crash when vcpu->arch.xen.timer is already set.

    ODEBUG: init active (active state 0)
    object type: hrtimer hint: xen_timer_callbac0
    RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:502
    Call Trace:
    __debug_object_init
    debug_hrtimer_init
    debug_init
    hrtimer_init
    kvm_xen_init_timer
    kvm_xen_vcpu_set_attr
    kvm_arch_vcpu_ioctl
    kvm_vcpu_ioctl
    vfs_ioctl

    Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
    Cc: stable@vger.kernel.org
    Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42


Reported-by: <syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com>
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220808190607.323899-2-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

8fe9890b
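
The "initialize only once" guard from 8fe9890b above is sketched here with a hypothetical toy timer. Using the callback pointer as the "already initialized" marker is an assumption of this example; the point is only that repeated attribute updates must not re-run the init on a live object.

```c
#include <stddef.h>

/* Hypothetical timer object; function stays NULL until initialized. */
struct toy_timer {
	void (*function)(struct toy_timer *);
	int init_calls;		/* counts how often init really ran */
};

static void toy_timer_callback(struct toy_timer *t)
{
	(void)t;
}

static void toy_init_timer(struct toy_timer *t)
{
	t->function = toy_timer_callback;
	t->init_calls++;
}

/* Attribute handler: may be called many times for the same vCPU. */
static void toy_handle_timer_attr(struct toy_timer *t)
{
	if (!t->function)
		toy_init_timer(t);
}
```

Calling the handler repeatedly on a zero-initialized struct toy_timer runs the init exactly once, which is what keeps a debug-object checker from seeing an "init active" object.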

KVM: SVM: Disable SEV-ES support if MMIO caching is disable · fb27bfa6

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 0c29397a
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Aug 3 22:49:57 2022 +0000

    KVM: SVM: Disable SEV-ES support if MMIO caching is disable

    Disable SEV-ES if MMIO caching is disabled as SEV-ES relies on MMIO SPTEs
    generating #NPF(RSVD), which are reflected by the CPU into the guest as
    a #VC.  With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access
    to the guest instruction stream or register state and so can't directly
    emulate in response to a #NPF on an emulated MMIO GPA.  Disabling MMIO
    caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT),
    and those flavors of #NPF cause automatic VM-Exits, not #VC.

    Adjust KVM's MMIO masks to account for the C-bit location prior to doing
    SEV(-ES) setup, and document that dependency between adjusting the MMIO
    SPTE mask and SEV(-ES) setup.

    Fixes: b09763da ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
Reported-by: Michael Roth <michael.roth@amd.com>
Tested-by: Michael Roth <michael.roth@amd.com>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220803224957.1285926-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

fb27bfa6

KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change · df199b0b

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit c3e0c8c2
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Aug 3 22:49:56 2022 +0000

    KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change

    Fully re-evaluate whether or not MMIO caching can be enabled when SPTE
    masks change; simply clearing enable_mmio_caching when a configuration
    isn't compatible with caching fails to handle the scenario where the
    masks are updated, e.g. by VMX for EPT or by SVM to account for the C-bit
    location, and toggle compatibility from false=>true.

    Snapshot the original module param so that re-evaluating MMIO caching
    preserves userspace's desire to allow caching.  Use a snapshot approach
    so that enable_mmio_caching still reflects KVM's actual behavior.

    Fixes: 8b9e74bf ("KVM: x86/mmu: Use enable_mmio_caching to track if MMIO caching is enabled")
Reported-by: Michael Roth <michael.roth@amd.com>
    Cc: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: stable@vger.kernel.org
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
    Message-Id: <20220803224957.1285926-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

df199b0b
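
The snapshot approach from df199b0b above can be sketched as follows. All toy_* names are hypothetical, and "a zero MMIO value means caching is impossible" is a simplifying assumption of this model. The key point is that every mask update starts again from the user's original request rather than from the previous verdict.

```c
#include <stdbool.h>
#include <stdint.h>

static bool toy_allow_mmio_caching;	/* snapshot of the module param */
static bool toy_enable_mmio_caching;	/* what is actually in effect */

/* Take the snapshot once, when the module parameter is first read. */
static void toy_module_init(bool param)
{
	toy_allow_mmio_caching = param;
	toy_enable_mmio_caching = param;
}

/* Simplified compatibility test (assumption for this sketch). */
static bool toy_masks_support_caching(uint64_t mmio_value)
{
	return mmio_value != 0;
}

/* Called whenever the SPTE masks change. */
static void toy_set_mmio_spte_mask(uint64_t mmio_value)
{
	/* Re-evaluate from the snapshot so false=>true transitions work. */
	toy_enable_mmio_caching = toy_allow_mmio_caching;
	if (toy_enable_mmio_caching && !toy_masks_support_caching(mmio_value))
		toy_enable_mmio_caching = false;
}
```

If the code only ever cleared the enable flag, an incompatible first mask would disable caching for good, even after a later update made the configuration compatible again.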

KVM: x86: Tag kvm_mmu_x86_module_init() with __init · d49b93c3

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 982bae43
Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Aug 3 22:49:55 2022 +0000

    KVM: x86: Tag kvm_mmu_x86_module_init() with __init

    Mark kvm_mmu_x86_module_init() with __init; the entire reason it exists
    is to initialize variables when kvm.ko is loaded, i.e. it must never be
    called after module initialization.

    Fixes: 1d0e8480 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded")
    Cc: stable@vger.kernel.org
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220803224957.1285926-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

d49b93c3

KVM: x86: emulator: Fix illegal LEA handling · dc82804d

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 4ac5b423
Author: Michal Luczaj <mhal@rbox.co>
Date:   Fri Jul 29 15:48:01 2022 +0200

    KVM: x86: emulator: Fix illegal LEA handling

    The emulator mishandles LEA with a register source operand. Even though
    such a LEA is illegal, it can be encoded and fed to the CPU, in which
    case real hardware throws #UD. The emulator instead returns the address
    of x86_emulate_ctxt._regs. This info leak hurts the host's kASLR.

    Tell the decoder that illegal LEA is not to be emulated.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Message-Id: <20220729134801.1120-1-mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

dc82804d
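
For the LEA case in dc82804d above, the illegal encoding is the one whose ModRM byte selects a register operand (mod field equal to 3). The decoder sketch below is hypothetical and far simpler than the real emulator tables; it only shows the rejection rule.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_LEA 0x8d

/* ModRM.mod is the top two bits; the value 3 selects a register operand. */
static bool modrm_is_register_form(uint8_t modrm)
{
	return (modrm >> 6) == 3;
}

/*
 * Hypothetical decode step: returns false when the instruction must not
 * be emulated, in which case the caller would raise #UD.
 */
static bool toy_decode_ok(uint8_t opcode, uint8_t modrm)
{
	if (opcode == OPCODE_LEA && modrm_is_register_form(modrm))
		return false;	/* LEA needs a memory operand */
	return true;
}
```

Rejecting the encoding at decode time means no effective address is ever computed for it, so no host pointer can end up in a guest register.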

KVM: X86: avoid uninitialized 'fault.async_page_fault' from fixed-up #PF · 8630b2f8

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 2bc685e6
Author: Yu Zhang <yu.c.zhang@linux.intel.com>
Date:   Mon Jul 18 15:47:56 2022 +0800

    KVM: X86: avoid uninitialized 'fault.async_page_fault' from fixed-up #PF

    kvm_fixup_and_inject_pf_error() was introduced to fix up the error code
    (e.g., to add the RSVD flag) and inject the #PF into the guest when the
    guest MAXPHYADDR is smaller than the host one.

    When it comes to nested, L0 is expected to intercept and fix up the #PF
    and then inject to L2 directly if
    - L2.MAXPHYADDR < L0.MAXPHYADDR and
    - L1 has no intention to intercept L2's #PF (e.g., L2 and L1 have the
      same MAXPHYADDR value && L1 is using EPT for L2),
    instead of constructing a #PF VM Exit to L1. Currently, with PFEC_MASK
    and PFEC_MATCH both set to 0 in vmcs02, the interception and injection
    may happen on all L2 #PFs.

    However, failing to initialize 'fault' in kvm_fixup_and_inject_pf_error()
    may leave fault.async_page_fault non-zero, so that the #PF is later
    treated as a nested async page fault and injected into L1. Instead of
    zeroing 'fault' at the beginning of this function, we manually set the
    value of 'fault.async_page_fault', because false is the value we
    really expect.

    Fixes: 89786147 ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216178


Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220718074756.53788-1-yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

8630b2f8
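
The hazard fixed in 8630b2f8 above is an on-stack struct with one field left unset. The sketch below uses a hypothetical struct toy_exception, whose layout is an assumption and not the kernel's definition. It shows the chosen remedy: assign every consumed field explicitly, including the flag that must be false.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PF_VECTOR 14

/* Hypothetical fault descriptor, loosely modelled on the description. */
struct toy_exception {
	uint8_t  vector;
	bool     error_code_valid;
	uint16_t error_code;
	bool     nested_page_fault;
	uint64_t address;
	bool     async_page_fault;
};

/* Mimic leftover stack contents so a missed field is easy to spot. */
static void toy_simulate_stack_garbage(struct toy_exception *fault)
{
	memset(fault, 0xa5, sizeof(*fault));
}

/* Fill in the fault; no field is left to whatever the stack held. */
static void toy_fill_page_fault(struct toy_exception *fault, uint64_t gva,
				uint16_t error_code)
{
	fault->vector = TOY_PF_VECTOR;
	fault->error_code_valid = true;
	fault->error_code = error_code;
	fault->nested_page_fault = false;
	fault->address = gva;
	fault->async_page_fault = false;	/* the field that was missed */
}
```

Setting the flag by hand, rather than zeroing the whole struct first, documents that false is the intended value and not merely a default.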

KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg · 32c1c915

Vitaly Kuznetsov authored 2 years ago

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111



commit 70c8327c
Author: Sean Christopherson <seanjc@google.com>
Date:   Thu Aug 4 23:50:28 2022 +0000

    KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg

    Bug the VM if retrieving the x2APIC MSR/register while processing an
    accelerated vAPIC trap VM-Exit fails.  In theory it's impossible for the
    lookup to fail as hardware has already validated the register, but bugs
    happen, and not checking the result of kvm_lapic_msr_read() would result
    in consuming the uninitialized "val" if a KVM or hardware bug occurs.

    Fixes: 1bd9dfec ("KVM: x86: Do not block APIC write for non ICR registers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
    Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-Id: <20220804235028.1766253-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

32c1c915
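
The pattern behind 32c1c915 above is "check the lookup result before touching the output". In this hypothetical model, toy_reg_read() leaves *val untouched on failure, so the caller marks the VM as bugged and bails out instead of consuming an uninitialized value. The names and the register range are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM state: only records whether it was marked as bugged. */
struct toy_vm {
	bool bugged;
};

/* Lookup that fails for unknown registers and leaves *val untouched. */
static int toy_reg_read(uint32_t reg, uint64_t *val)
{
	if (reg > 0x3f)
		return 1;
	*val = 0x1000u + reg;
	return 0;
}

/* Trap handler: returns true and stores the value only on success. */
static bool toy_handle_write_trap(struct toy_vm *vm, uint32_t reg,
				  uint64_t *out)
{
	uint64_t val;

	if (toy_reg_read(reg, &val)) {
		vm->bugged = true;	/* "impossible" failure: stop using the VM */
		return false;
	}
	*out = val;
	return true;
}
```

Even when hardware is expected to have validated the register already, the check costs one branch and turns a silent read of stack garbage into a visible failure.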