- Oct 25, 2022
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 58fc1166
Author: Oliver Upton <oupton@google.com>
Date: Wed Jul 20 09:22:48 2022 +0000

KVM: Shove vcpu stats_id init into kvm_vcpu_init()

Initialize stats_id alongside other kvm_vcpu fields to make it more difficult to unintentionally access stats_id before it's set. No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220720092259.3491733-3-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
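For illustration, a minimal sketch of the shape of this change (not the verbatim upstream diff; the "kvm-%d/vcpu-%d" naming is the convention used for the vCPU's binary-stats file):

    /*
     * Sketch: build the stats file name while the rest of the vCPU is
     * initialized, so no code path can observe an unset stats_id.
     */
    static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
    {
        vcpu->kvm = kvm;
        vcpu->vcpu_id = id;
        /* ... other field initialization elided ... */
        snprintf(vcpu->stats_id, sizeof(vcpu->stats_id), "kvm-%d/vcpu-%d",
                 task_pid_nr(current), id);
    }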
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit f2759c08
Author: Oliver Upton <oupton@google.com>
Date: Wed Jul 20 09:22:47 2022 +0000

KVM: Shove vm stats_id init into kvm_create_vm()

Initialize stats_id alongside other struct kvm fields to make it more difficult to unintentionally access stats_id before it's set. While at it, move the format string to the first line of the call and fix the indentation of the second line. No functional change intended.

Signed-off-by: Oliver Upton <oupton@google.com>
Message-Id: <20220720092259.3491733-2-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 8bad4606
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Aug 5 19:41:33 2022 +0000

KVM: x86/mmu: Add sanity check that MMIO SPTE mask doesn't overlap gen

Add compile-time and init-time sanity checks to ensure that the MMIO SPTE mask doesn't overlap the MMIO SPTE generation or the MMU-present bit. The generation currently avoids using bit 63, but that's as much coincidence as it is strictly necessary. That will change in the future, as TDX support will require setting bit 63 (SUPPRESS_VE) in the mask. Explicitly carve out the bits that are allowed in the mask so that any future shuffling of SPTE bits doesn't silently break MMIO caching (KVM has broken MMIO caching more than once due to overlapping the generation with other things).

Suggested-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220805194133.86299-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
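A hedged sketch of such an init-time check; the mask macros are the generation and MMU-present bits from KVM's spte.h, but the exact form of the upstream check may differ:

    void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
    {
        /*
         * The configured MMIO mask must not claim bits reserved for the
         * SPTE generation or the MMU-present bit; overlapping either
         * silently breaks MMIO caching.
         */
        WARN_ON(mmio_mask & (MMIO_SPTE_GEN_LOW_MASK | MMIO_SPTE_GEN_HIGH_MASK |
                             SPTE_MMU_PRESENT_MASK));
        /* ... */
    }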
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 1685c0f3
Author: Mingwei Zhang <mizhang@google.com>
Date: Sun Aug 7 05:21:41 2022 +0000

KVM: x86/mmu: rename trace function name for asynchronous page fault

Rename the tracepoint function from trace_kvm_async_pf_doublefault() to trace_kvm_async_pf_repeated_fault() to make it clear, since double fault has nothing to do with this trace function.

Asynchronous Page Fault (APF) is an artifact generated by KVM when it cannot find a physical page to satisfy an EPT violation. KVM uses APF to tell the guest OS to do something else, such as scheduling other guest processes to make forward progress. However, when another guest process also touches a previously APFed page, KVM halts the vCPU instead of generating a repeated APF, to avoid wasting cycles.

Double fault (#DF) clearly has a different meaning and a different consequence when triggered. A #DF requires two nested contributory exceptions, not two page faults at the same address. A previous APF bug indicates that it may trigger a double fault in the guest [1], and this trace function clearly has nothing to do with that. So renaming the function is a valid choice. No functional change intended.

[1] https://www.spinics.net/lists/kvm/msg214957.html

Signed-off-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220807052141.69186-1-mizhang@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit c0368991
Author: Coleman Dietsch <dietschc@csp.edu>
Date: Mon Aug 8 14:06:07 2022 -0500

KVM: x86/xen: Stop Xen timer before changing IRQ

Stop the Xen timer (if it's running) prior to changing the IRQ vector and potentially (re)starting the timer. Changing the IRQ vector while the timer is still running can result in KVM injecting a garbage event, e.g. kvm_xen_inject_timer_irqs() could see a non-zero xen.timer_pending from a previous timer but inject the new xen.timer_virq.

Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: <syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com>
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Acked-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220808190607.323899-3-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
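The ordering fix amounts to stopping the timer at the top of the KVM_XEN_VCPU_ATTR_TYPE_TIMER handler; a simplified sketch (helper and field names follow KVM's Xen code):

    case KVM_XEN_VCPU_ATTR_TYPE_TIMER:
        /*
         * Stop a running timer before (re)programming the vector, so a
         * stale xen.timer_pending can't be injected as the new virq.
         */
        if (kvm_xen_timer_enabled(vcpu))
            kvm_xen_stop_timer(vcpu);

        vcpu->arch.xen.timer_virq = data->u.timer.port;
        /* ... (re)start the timer if requested ... */
        break;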
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit af735db3
Author: Coleman Dietsch <dietschc@csp.edu>
Date: Mon Aug 8 14:06:06 2022 -0500

KVM: x86/xen: Initialize Xen timer only once

Add a check for existing xen timers before initializing a new one. Currently kvm_xen_init_timer() is called on every KVM_XEN_VCPU_ATTR_TYPE_TIMER, which is causing the following ODEBUG crash when vcpu->arch.xen.timer is already set:

    ODEBUG: init active (active state 0) object type: hrtimer hint: xen_timer_callbac0
    RIP: 0010:debug_print_object+0x16e/0x250 lib/debugobjects.c:502
    Call Trace:
     __debug_object_init
     debug_hrtimer_init
     debug_init
     hrtimer_init
     kvm_xen_init_timer
     kvm_xen_vcpu_set_attr
     kvm_arch_vcpu_ioctl
     kvm_vcpu_ioctl
     vfs_ioctl

Fixes: 53639526 ("KVM: x86/xen: handle PV timers oneshot mode")
Cc: stable@vger.kernel.org
Link: https://syzkaller.appspot.com/bug?id=8234a9dfd3aafbf092cc5a7cd9842e3ebc45fc42
Reported-by: <syzbot+e54f930ed78eb0f85281@syzkaller.appspotmail.com>
Signed-off-by: Coleman Dietsch <dietschc@csp.edu>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220808190607.323899-2-dietschc@csp.edu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
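The guard is a one-time-initialization check; roughly:

    /*
     * Initialize the hrtimer only on first use; calling hrtimer_init()
     * again on an already-initialized (and possibly active) timer is
     * exactly what trips the ODEBUG warning above.
     */
    if (!vcpu->arch.xen.timer.function)
        kvm_xen_init_timer(vcpu);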
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 0c29397a
Author: Sean Christopherson <seanjc@google.com>
Date: Wed Aug 3 22:49:57 2022 +0000

KVM: SVM: Disable SEV-ES support if MMIO caching is disabled

Disable SEV-ES if MMIO caching is disabled, as SEV-ES relies on MMIO SPTEs generating #NPF(RSVD), which are reflected by the CPU into the guest as a #VC. With SEV-ES, the untrusted host, a.k.a. KVM, doesn't have access to the guest instruction stream or register state and so can't directly emulate in response to a #NPF on an emulated MMIO GPA. Disabling MMIO caching means guest accesses to emulated MMIO ranges cause #NPF(!PRESENT), and those flavors of #NPF cause automatic VM-Exits, not #VC.

Adjust KVM's MMIO masks to account for the C-bit location prior to doing SEV(-ES) setup, and document that dependency between adjusting the MMIO SPTE mask and SEV(-ES) setup.

Fixes: b09763da ("KVM: x86/mmu: Add module param to disable MMIO caching (for testing)")
Reported-by: Michael Roth <michael.roth@amd.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
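A sketch of the gate in SEV setup (control flow simplified; the "out" label stands in for skipping SEV/SEV-ES enablement):

    /*
     * SEV-ES requires MMIO caching: KVM can't read guest state to
     * emulate MMIO itself and relies on #NPF(RSVD) being reflected
     * into the guest as #VC.
     */
    if (!enable_mmio_caching)
        goto out;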
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit c3e0c8c2
Author: Sean Christopherson <seanjc@google.com>
Date: Wed Aug 3 22:49:56 2022 +0000

KVM: x86/mmu: Fully re-evaluate MMIO caching when SPTE masks change

Fully re-evaluate whether or not MMIO caching can be enabled when SPTE masks change; simply clearing enable_mmio_caching when a configuration isn't compatible with caching fails to handle the scenario where the masks are updated, e.g. by VMX for EPT or by SVM to account for the C-bit location, and toggle compatibility from false=>true.

Snapshot the original module param so that re-evaluating MMIO caching preserves userspace's desire to allow caching. Use a snapshot approach so that enable_mmio_caching still reflects KVM's actual behavior.

Fixes: 8b9e74bf ("KVM: x86/mmu: Use enable_mmio_caching to track if MMIO caching is enabled")
Reported-by: Michael Roth <michael.roth@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: stable@vger.kernel.org
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220803224957.1285926-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
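A sketch of the snapshot approach (the allow_mmio_caching name reflects the commit's description of preserving userspace's desire):

    bool __read_mostly enable_mmio_caching = true;
    /* Snapshot of the module param, taken once at kvm.ko load. */
    static bool __ro_after_init allow_mmio_caching;

    void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
    {
        /*
         * Start from userspace's intent, then (re)apply the compatibility
         * checks; an earlier incompatible configuration can no longer
         * permanently disable caching.
         */
        enable_mmio_caching = allow_mmio_caching;
        /* ... clear enable_mmio_caching if the new masks are incompatible ... */
    }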
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 982bae43
Author: Sean Christopherson <seanjc@google.com>
Date: Wed Aug 3 22:49:55 2022 +0000

KVM: x86: Tag kvm_mmu_x86_module_init() with __init

Mark kvm_mmu_x86_module_init() with __init; the entire reason it exists is to initialize variables when kvm.ko is loaded, i.e. it must never be called after module initialization.

Fixes: 1d0e8480 ("KVM: x86/mmu: Resolve nx_huge_pages when kvm.ko is loaded")
Cc: stable@vger.kernel.org
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Michael Roth <michael.roth@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220803224957.1285926-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
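The annotation itself is essentially the whole fix; a sketch, assuming the body still only resolves load-time state such as nx_huge_pages:

    /*
     * __init code is freed once module initialization completes, so this
     * both saves memory and enforces the "only at load time" contract.
     */
    void __init kvm_mmu_x86_module_init(void)
    {
        if (nx_huge_pages == -1)
            __set_nx_huge_pages(get_nx_auto_mode());
    }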
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 4ac5b423
Author: Michal Luczaj <mhal@rbox.co>
Date: Fri Jul 29 15:48:01 2022 +0200

KVM: x86: emulator: Fix illegal LEA handling

The emulator mishandles LEA with a register source operand. Even though such a LEA is illegal, it can be encoded and fed to the CPU, in which case real hardware throws #UD. The emulator, instead, returns the address of x86_emulate_ctxt._regs. This info leak hurts the host's kASLR. Tell the decoder that illegal LEA is not to be emulated.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20220729134801.1120-1-mhal@rbox.co>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
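A hypothetical sketch of the rule the decoder change enforces, expressed in em_lea() for clarity (the actual patch adjusts the decode flags rather than the handler):

    static int em_lea(struct x86_emulate_ctxt *ctxt)
    {
        /*
         * LEA requires a memory operand; ModRM.mod == 3 encodes a
         * register, for which real hardware raises #UD. Emulating it
         * would hand the guest the address of ctxt->_regs.
         */
        if (ctxt->modrm_mod == 3)
            return emulate_ud(ctxt);

        ctxt->dst.val = ctxt->src.addr.mem.ea;
        return X86EMUL_CONTINUE;
    }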
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 2bc685e6
Author: Yu Zhang <yu.c.zhang@linux.intel.com>
Date: Mon Jul 18 15:47:56 2022 +0800

KVM: X86: avoid uninitialized 'fault.async_page_fault' from fixed-up #PF

kvm_fixup_and_inject_pf_error() was introduced to fix up the error code (e.g., to add the RSVD flag) and inject the #PF to the guest when the guest MAXPHYADDR is smaller than the host one. When it comes to nested, L0 is expected to intercept and fix up the #PF and then inject it to L2 directly if
- L2.MAXPHYADDR < L0.MAXPHYADDR and
- L1 has no intention to intercept L2's #PF (e.g., L2 and L1 have the same MAXPHYADDR value && L1 is using EPT for L2),
instead of constructing a #PF VM-Exit to L1. Currently, with PFEC_MASK and PFEC_MATCH both set to 0 in vmcs02, the interception and injection may happen on all L2 #PFs.

However, failing to initialize 'fault' in kvm_fixup_and_inject_pf_error() may leave fault.async_page_fault not zeroed, causing the #PF to later be treated as a nested async page fault and injected into L1. Instead of zeroing 'fault' at the beginning of this function, we manually set the value of 'fault.async_page_fault', because false is the value we really expect.

Fixes: 89786147 ("KVM: x86: Add helper functions for illegal GPA checking and page fault injection")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216178
Reported-by: Yang Lixiao <lixiao.yang@intel.com>
Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220718074756.53788-1-yu.c.zhang@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
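An abbreviated sketch of the fix inside kvm_fixup_and_inject_pf_error() (surrounding logic elided):

    struct x86_exception fault;    /* on-stack, not zero-initialized */

    fault.vector = PF_VECTOR;
    fault.error_code_valid = true;
    fault.error_code = error_code;
    fault.nested_page_fault = false;
    fault.address = gva;
    /*
     * The fix: without this, stack garbage could mark the #PF as an
     * async page fault and misroute the injection to L1.
     */
    fault.async_page_fault = false;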
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 70c8327c
Author: Sean Christopherson <seanjc@google.com>
Date: Thu Aug 4 23:50:28 2022 +0000

KVM: x86: Bug the VM if an accelerated x2APIC trap occurs on a "bad" reg

Bug the VM if retrieving the x2APIC MSR/register while processing an accelerated vAPIC trap VM-Exit fails. In theory it's impossible for the lookup to fail as hardware has already validated the register, but bugs happen, and not checking the result of kvm_lapic_msr_read() would result in consuming the uninitialized "val" if a KVM or hardware bug occurs.

Fixes: 1bd9dfec ("KVM: x86: Do not block APIC write for non ICR registers")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220804235028.1766253-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
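A sketch of the check in the trap-exit path (KVM_BUG_ON marks the VM as bugged and evaluates to the condition):

    u64 val;

    /*
     * Hardware has already validated the register, so a lookup failure
     * means a KVM or hardware bug; bug the VM rather than consume the
     * uninitialized val.
     */
    if (KVM_BUG_ON(kvm_lapic_msr_read(apic, offset, &val), vcpu->kvm))
        return;

    kvm_lapic_reg_write(apic, offset, (u32)val);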
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit baea2ce5
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Wed Aug 10 14:55:27 2022 -0400

selftests: kvm: fix compilation

Commit 49de12ba ("selftests: drop KSFT_KHDR_INSTALL make target") dropped from tools/testing/selftests/lib.mk the code related to KSFT_KHDR_INSTALL, but in doing so it also dropped the definition of the ARCH variable. The ARCH variable is used in several subdirectories, but kvm/ is the only one of these that was using KSFT_KHDR_INSTALL. As a result, kvm selftests cannot be built anymore:

    In file included from include/x86_64/vmx.h:12,
                     from x86_64/vmx_pmu_caps_test.c:18:
    include/x86_64/processor.h:15:10: fatal error: asm/msr-index.h: No such file or directory
       15 | #include <asm/msr-index.h>
          |          ^~~~~~~~~~~~~~~~~

    In file included from ../../../../tools/include/asm/atomic.h:6,
                     from ../../../../tools/include/linux/atomic.h:5,
                     from rseq_test.c:15:
    ../../../../tools/include/asm/../../arch/x86/include/asm/atomic.h:11:10: fatal error: asm/cmpxchg.h: No such file or directory
       11 | #include <asm/cmpxchg.h>
          |          ^~~~~~~~~~~~~~~

Fix it by including the definition that was present in lib.mk.

Fixes: 49de12ba ("selftests: drop KSFT_KHDR_INSTALL make target")
Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: linux-kselftest@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Conflicts: tools/testing/selftests/kvm/Makefile (context, skipping f2745dc0)
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 281106f9
Author: Andrei Vagin <avagin@google.com>
Date: Fri Jul 22 16:02:40 2022 -0700

selftests: kvm: set rax before vmcall

kvm_hypercall has to place the hypercall number in rax. Trace events show that kvm_pv_test doesn't work properly:

    kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0
    kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0
    kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0

With this change, it starts working as expected:

    kvm_pv_test-54285: kvm_hypercall: nr 0x5 a0 0x0 a1 0x0 a2 0x0 a3 0x0
    kvm_pv_test-54285: kvm_hypercall: nr 0xa a0 0x0 a1 0x0 a2 0x0 a3 0x0
    kvm_pv_test-54285: kvm_hypercall: nr 0xb a0 0x0 a1 0x0 a2 0x0 a3 0x0

Signed-off-by: Andrei Vagin <avagin@google.com>
Message-Id: <20220722230241.1944655-5-avagin@google.com>
Fixes: ac4a4d6d ("selftests: kvm: test enforcement of paravirtual cpuid features")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
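The selftest's guest-side helper after the fix looks roughly like this; the "a"(nr) input constraint is what was missing:

    uint64_t kvm_hypercall(uint64_t nr, uint64_t a0, uint64_t a1,
                           uint64_t a2, uint64_t a3)
    {
        uint64_t r;

        /* rax carries the hypercall number in and the return value out. */
        asm volatile("vmcall"
                     : "=a"(r)
                     : "a"(nr), "b"(a0), "c"(a1), "d"(a2), "S"(a3));
        return r;
    }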
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit dd4d1c3b
Author: Oliver Upton <oupton@google.com>
Date: Tue Jul 19 14:31:34 2022 +0000

selftests: KVM: Add exponent check for boolean stats

The only sensible exponent for a boolean stat is 0. Add a test assertion requiring all boolean statistics to have an exponent of 0.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Message-Id: <20220719143134.3246798-4-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
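A sketch of the assertion (descriptor fields per the KVM binary-stats ABI):

    /* Scaling a boolean by anything other than 10^0 / 2^0 is nonsense. */
    if ((pdesc->flags & KVM_STATS_TYPE_MASK) == KVM_STATS_TYPE_BOOLEAN)
        TEST_ASSERT(pdesc->exponent == 0,
                    "Boolean stat '%s' has non-zero exponent: %d",
                    pdesc->name, pdesc->exponent);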
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 7eebae78
Author: Oliver Upton <oupton@google.com>
Date: Tue Jul 19 14:31:33 2022 +0000

selftests: KVM: Provide descriptive assertions in kvm_binary_stats_test

As it turns out, tests sometimes fail. When that is the case, packing the test assertion with as much relevant information as possible helps track down the problem more quickly. Sharpen up the stat descriptor assertions in kvm_binary_stats_test to more precisely describe the reason for the test assertion and which stat is to blame.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Message-Id: <20220719143134.3246798-3-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit ad5b0727
Author: Oliver Upton <oupton@google.com>
Date: Tue Jul 19 14:31:32 2022 +0000

selftests: KVM: Check stat name before other fields

In order to provide more useful test assertions that describe the broken stats descriptor, perform a sanity check on the stat name before any other descriptor field. While at it, avoid dereferencing the name field if the sanity check fails, as it is more likely to contain garbage.

Signed-off-by: Oliver Upton <oupton@google.com>
Reviewed-by: Andrew Jones <andrew.jones@linux.dev>
Message-Id: <20220719143134.3246798-2-oliver.upton@linux.dev>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 31f6e383
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon Aug 1 07:27:18 2022 -0400

KVM: x86/mmu: remove unused variable

The last use of 'pfn' went away with the same-named argument to host_pfn_mapping_level; now that the hugepage level is obtained exclusively from the host page tables, kvm_mmu_zap_collapsible_spte does not need to know host pfns at all.

Fixes: a8ac499b ("KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 4ab0e470
Author: Anup Patel <apatel@ventanamicro.com>
Date: Fri Jul 29 17:15:00 2022 +0530

KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache

kvm_mmu_topup_memory_cache() always uses GFP_KERNEL_ACCOUNT for memory allocation, which prevents its use in atomic context. To address this limitation, add a gfp_custom flag in struct kvm_mmu_memory_cache. When gfp_custom is set to some GFP_xyz flags, kvm_mmu_topup_memory_cache() will use that instead of GFP_KERNEL_ACCOUNT (see the sketch after this entry).

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>

commit 63f4b210
Merge: 2e2e9115 7edc3a68
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri Jul 29 09:46:01 2022 -0400

Merge remote-tracking branch 'kvm/next' into kvm-next-5.20

KVM/s390, KVM/x86 and common infrastructure changes for 5.20

x86:
* Permit guests to ignore single-bit ECC errors
* Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache
* Intel IPI virtualization
* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS
* PEBS virtualization
* Simplify PMU emulation by just using PERF_TYPE_RAW events
* More accurate event reinjection on SVM (avoid retrying instructions)
* Allow getting/setting the state of the speaker port data bit
* Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent
* "Notify" VM exit (detect microarchitectural hangs) for Intel
* Cleanups for MCE MSR emulation

s390:
* add an interface to provide a hypervisor dump for secure guests
* improve selftests to use TAP interface
* enable interpretive execution of zPCI instructions (for PCI passthrough)
* First part of deferred teardown
* CPU Topology
* PV attestation
* Minor fixes

Generic:
* new selftests API using struct kvm_vcpu instead of a (vm, id) tuple

x86:
* Use try_cmpxchg64 instead of cmpxchg64
* Bugfixes
* Ignore benign host accesses to PMU MSRs when PMU is disabled
* Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior
* x86/MMU: Allow NX huge pages to be disabled on a per-vm basis
* Port eager page splitting to shadow MMU as well
* Enable CMCI capability by default and handle injected UCNA errors
* Expose pid of vcpu threads in debugfs
* x2AVIC support for AMD
* cleanup PIO emulation
* Fixes for LLDT/LTR emulation
* Don't require refcounted "struct page" to create huge SPTEs

x86 cleanups:
* Use separate namespaces for guest PTEs and shadow PTEs bitmasks
* PIO emulation
* Reorganize rmap API, mostly around rmap destruction
* Do not workaround very old KVM bugs for L0 that runs with nesting enabled
* new selftests API for CPUID

Conflicts: virt/kvm/kvm_main.c (upstream merge conflict)
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
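Returning to the gfp_custom change at the top of this entry, a minimal sketch of the override; mmu_cache_gfp() is a hypothetical helper used for illustration:

    /*
     * When mc->gfp_custom is non-zero (e.g. GFP_ATOMIC for callers in
     * atomic context), use it instead of the default GFP_KERNEL_ACCOUNT
     * when topping up the cache.
     */
    static inline gfp_t mmu_cache_gfp(struct kvm_mmu_memory_cache *mc)
    {
        return mc->gfp_custom ? mc->gfp_custom : GFP_KERNEL_ACCOUNT;
    }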
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 7edc3a68
Author: Kai Huang <kai.huang@intel.com>
Date: Thu Jul 28 15:04:52 2022 +1200

KVM, x86/mmu: Fix the comment around kvm_tdp_mmu_zap_leafs()

Now kvm_tdp_mmu_zap_leafs() only zaps leaf SPTEs but not any non-root pages within that GFN range anymore, so the comment around it isn't right. Fix it by shifting the comment from tdp_mmu_zap_leafs() instead of duplicating it, as tdp_mmu_zap_leafs() is static and is only called by kvm_tdp_mmu_zap_leafs().

Opportunistically tweak the blurb about SPTEs being cleared to (a) say "zapped" instead of "cleared", because "cleared" will be wrong if/when KVM allows a non-zero value for non-present SPTEs (i.e. for Intel TDX), and (b) clarify that a flush is needed if and only if a SPTE has been zapped since MMU lock was last acquired.

Fixes: f47e5bbb ("KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmap")
Suggested-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Message-Id: <20220728030452.484261-1-kai.huang@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 6fac42f1
Author: Jarkko Sakkinen <jarkko@profian.com>
Date: Thu Jul 28 08:09:19 2022 +0300

KVM: SVM: Dump Virtual Machine Save Area (VMSA) to klog

As the Virtual Machine Save Area (VMSA) is essential in troubleshooting attestation, dump it to the klog at the KERN_DEBUG level of priority.

Cc: Jarkko Sakkinen <jarkko@kernel.org>
Suggested-by: Harald Hoyer <harald@profian.com>
Signed-off-by: Jarkko Sakkinen <jarkko@profian.com>
Message-Id: <20220728050919.24113-1-jarkko@profian.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
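A sketch of the dump, assuming "save" points at the VMSA; print_hex_dump_debug() emits at KERN_DEBUG, so the contents only appear when debug printing is enabled:

    pr_debug("Virtual Machine Save Area (VMSA):\n");
    print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);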
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 6c6ab524
Author: Sean Christopherson <seanjc@google.com>
Date: Sat Jul 23 01:30:29 2022 +0000

KVM: x86/mmu: Treat NX as a valid SPTE bit for NPT

Treat the NX bit as valid when using NPT, as KVM will set the NX bit when the NX huge page mitigation is enabled (mindblowing) and trigger the WARN that fires on reserved SPTE bits being set. KVM has required NX support for SVM since commit b26a71a1 ("KVM: SVM: Refuse to load kvm_amd if NX support is not available") for exactly this reason, but apparently it never occurred to anyone to actually test NPT with the mitigation enabled.

    ------------[ cut here ]------------
    spte = 0x800000018a600ee7, level = 2, rsvd bits = 0x800f0000001fe000
    WARNING: CPU: 152 PID: 15966 at arch/x86/kvm/mmu/spte.c:215 make_spte+0x327/0x340 [kvm]
    Hardware name: Google, Inc. Arcadia_IT_80/Arcadia_IT_80, BIOS 10.48.0 01/27/2022
    RIP: 0010:make_spte+0x327/0x340 [kvm]
    Call Trace:
     <TASK>
     tdp_mmu_map_handle_target_level+0xc3/0x230 [kvm]
     kvm_tdp_mmu_map+0x343/0x3b0 [kvm]
     direct_page_fault+0x1ae/0x2a0 [kvm]
     kvm_tdp_page_fault+0x7d/0x90 [kvm]
     kvm_mmu_page_fault+0xfb/0x2e0 [kvm]
     npf_interception+0x55/0x90 [kvm_amd]
     svm_invoke_exit_handler+0x31/0xf0 [kvm_amd]
     svm_handle_exit+0xf6/0x1d0 [kvm_amd]
     vcpu_enter_guest+0xb6d/0xee0 [kvm]
     ? kvm_pmu_trigger_event+0x6d/0x230 [kvm]
     vcpu_run+0x65/0x2c0 [kvm]
     kvm_arch_vcpu_ioctl_run+0x355/0x610 [kvm]
     kvm_vcpu_ioctl+0x551/0x610 [kvm]
     __se_sys_ioctl+0x77/0xc0
     __x64_sys_ioctl+0x1d/0x20
     do_syscall_64+0x44/0xa0
     entry_SYSCALL_64_after_hwframe+0x46/0xb0
     </TASK>
    ---[ end trace 0000000000000000 ]---

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220723013029.1753623-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 1bd9dfec
Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Date: Mon Jul 25 00:33:56 2022 -0500

KVM: x86: Do not block APIC write for non ICR registers

Commit 5413bcba ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode") introduced logic to prevent APIC writes for offsets other than ICR in the kvm_apic_write_nodecode() function. This breaks x2AVIC support, which requires KVM to trap and emulate x2APIC MSR writes. Therefore, remove the warning and modify the logic to allow the MSR writes.

Fixes: 5413bcba ("KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode")
Cc: Zeng Guang <guang.zeng@intel.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220725053356.4275-1-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 0a8735a6
Author: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Date: Sun Jul 24 22:34:28 2022 -0500

KVM: SVM: Do not virtualize MSR accesses for APIC LVTT register

AMD does not support the APIC TSC-deadline timer mode. AVIC hardware will generate a #GP fault when the guest kernel writes 1 to bit 18 of the APIC LVTT register (offset 0x32) to set the timer mode. (Note: bit 18 is reserved on AMD systems.) Therefore, always intercept and let KVM emulate the MSR accesses.

Fixes: f3d7c8aa6882 ("KVM: SVM: Fix x2APIC MSRs interception")
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Message-Id: <20220725033428.3699-1-suravee.suthikulpanit@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit ce30d8b9
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Jun 7 21:36:04 2022 +0000

KVM: selftests: Verify VMX MSRs can be restored to KVM-supported values

Verify that KVM allows toggling VMX MSR bits to be "more" restrictive, and also allows restoring each MSR to KVM's original, less restrictive value.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit a910b5ab
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Jun 7 21:36:00 2022 +0000

KVM: nVMX: Set UMIP bit CR4_FIXED1 MSR when emulating UMIP

Make UMIP an "allowed-1" bit in the CR4_FIXED1 MSR when KVM is emulating UMIP. KVM emulates UMIP for both L1 and L2, and so should enumerate that L2 is allowed to have CR4.UMIP=1. Not setting the bit doesn't immediately break nVMX, as KVM does set/clear the bit in CR4_FIXED1 in response to a guest CPUID update, i.e. KVM will correctly (dis)allow nested VM-Entry based on whether or not UMIP is exposed to L1. That said, KVM should enumerate the bit as being allowed from time zero, e.g. userspace will see the wrong value if the MSR is read before CPUID is written.

Fixes: 0367f205 ("KVM: vmx: add support for emulating UMIP")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 9389d577
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri Jul 22 22:44:09 2022 +0000

Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control"

This reverts commit 03a8871a.

Since commit 03a8871a ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control"), KVM has taken ownership of the "load IA32_PERF_GLOBAL_CTRL" VMX entry/exit control bits, trying to set these bits in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports the architectural PMU (CPUID[EAX=0Ah].EAX[7:0]=1), and clear otherwise.

This was a misguided attempt at mimicking what commit 5f76f6f5 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled", 2018-10-01) did for MPX. However, that commit was a workaround for another KVM bug and not something that should be imitated. Mucking with the VMX MSRs creates a subtle, difficult to maintain ABI as KVM must ensure that any internal changes, e.g. to how KVM handles _any_ guest CPUID changes, yield the same functional result. Therefore, KVM's policy is to let userspace have full control of the guest vCPU model so long as the host kernel is not at risk.

Now that KVM really truly ensures kvm_set_msr() will succeed by loading PERF_GLOBAL_CTRL if and only if it exists, revert KVM's misguided and roundabout behavior.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[sean: make it a pure revert]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 4496a6f9
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 22 22:44:08 2022 +0000

KVM: nVMX: Attempt to load PERF_GLOBAL_CTRL on nVMX xfer iff it exists

Attempt to load PERF_GLOBAL_CTRL during nested VM-Enter/VM-Exit if and only if the MSR exists (according to the guest vCPU model). KVM has very misguided handling of VM_{ENTRY,EXIT}_LOAD_IA32_PERF_GLOBAL_CTRL and attempts to force the nVMX MSR settings to match the vPMU model, i.e. to hide/expose the control based on whether or not the MSR exists from the guest's perspective.

KVM's modifications fail to handle the scenario where the vPMU is hidden from the guest _after_ being exposed to the guest, e.g. by userspace doing multiple KVM_SET_CPUID2 calls, which is allowed if done before any KVM_RUN. nested_vmx_pmu_refresh() is called if and only if there's a recognized vPMU, i.e. KVM will leave the bits in the allow state and then ultimately reject the MSR load and WARN.

KVM should not force the VMX MSRs in the first place. KVM taking control of the MSRs was a misguided attempt at mimicking what commit 5f76f6f5 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled", 2018-10-01) did for MPX. However, the MPX commit was a workaround for another KVM bug and not something that should be imitated (and it should never have been done in the first place). In other words, KVM's ABI _should_ be that userspace has full control over the MSRs, at which point triggering the WARN that loading the MSR must not fail is trivial.

The intent of the WARN is still valid; KVM has consistency checks to ensure that vmcs12->{guest,host}_ia32_perf_global_ctrl is valid. The problem is that '0' must be considered a valid value at all times, and so the simple/obvious solution is to just not actually load the MSR when it does not exist. It is userspace's responsibility to provide a sane vCPU model, i.e. KVM is well within its ABI and Intel's VMX architecture to skip the loads if the MSR does not exist.

Fixes: 03a8871a ("KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL VM-{Entry,Exit} control")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
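A sketch of the VM-Enter side (VM-Exit is symmetric); once the load is attempted only when the MSR exists, a failure is a legitimate WARN:

    if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) &&
        intel_pmu_has_perf_global_ctrl(vcpu_to_pmu(vcpu)) &&
        WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL,
                                 vmcs12->guest_ia32_perf_global_ctrl)))
        return -EINVAL;    /* entry failure, simplified */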
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit b663f0b5
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 22 22:44:07 2022 +0000

KVM: VMX: Add helper to check if the guest PMU has PERF_GLOBAL_CTRL

Add a helper to check if the guest PMU has PERF_GLOBAL_CTRL; the check is unintuitive _and_ diverges from Intel's architecturally defined behavior. Even worse, KVM currently implements the check using two different (but equivalent) checks, _and_ there has been at least one attempt to add a _third_ flavor.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
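The helper boils down to a version check; per the commit, KVM only emulates PERF_GLOBAL_CTRL for PMU v2+ even though the SDM defines it for any PMU version > 0:

    static inline bool intel_pmu_has_perf_global_ctrl(struct kvm_pmu *pmu)
    {
        /*
         * Unintuitive by design: the architectural cutoff is version > 0,
         * but KVM's emulated MSR only exists from version 2 onward.
         */
        return pmu->version > 1;
    }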
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 93255bf9
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 22 22:44:06 2022 +0000

KVM: VMX: Mark all PERF_GLOBAL_(OVF)_CTRL bits reserved if there's no vPMU

Mark all MSR_CORE_PERF_GLOBAL_CTRL and MSR_CORE_PERF_GLOBAL_OVF_CTRL bits as reserved if there is no guest vPMU. The nVMX VM-Entry consistency checks do not check for a valid vPMU prior to consuming the masks via kvm_valid_perf_global_ctrl(), i.e. may incorrectly allow a non-zero mask to be loaded via VM-Enter or VM-Exit (well, attempted to be loaded; the actual MSR load will be rejected by intel_is_valid_msr()).

Fixes: f5132b01 ("KVM: Expose a version 2 architectural PMU to a guests")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220722224409.1336532-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
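A hedged sketch of the effect (mask field names assumed from KVM's struct kvm_pmu; a set bit means "reserved"):

    if (!pmu->version) {
        /*
         * No vPMU: every bit of PERF_GLOBAL_CTRL/OVF_CTRL is reserved,
         * so the nVMX consistency checks reject any non-zero value.
         */
        pmu->global_ctrl_mask = ~0ull;
        pmu->global_ovf_ctrl_mask = ~0ull;
    }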
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 8805875a
Author: Paolo Bonzini <pbonzini@redhat.com>
Date: Fri Jul 22 05:07:39 2022 -0400

Revert "KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"

Since commit 5f76f6f5 ("KVM: nVMX: Do not expose MPX VMX controls when guest MPX disabled"), KVM has taken ownership of the "load IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls, trying to set these bits in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS MSRs if the guest's CPUID supports MPX, and clear otherwise.

The intent of the patch was to apply it to L0 in order to work around L1 kernels that lack the fix in commit 691bd434 ("kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS", 2017-07-04): by hiding the control bits from L0, L1 hides BNDCFGS from KVM_GET_MSR_INDEX_LIST, and the L1 bug is neutralized even in the absence of commit 691bd434.

This was perhaps a sensible kludge at the time, but a horrible idea in the long term, and in fact it has not been extended to other CPUID bits like these:

  X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE, VMX_MISC_SAVE_EFER_LMA
  X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING, SECONDARY_EXEC_TSC_SCALING
  X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID
  X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING
  X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA, VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL
  X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES

These days it's sort of common knowledge that any MSR in KVM_GET_MSR_INDEX_LIST must allow *at least* setting it with KVM_SET_MSR to a default value, so it is unlikely that something like commit 5f76f6f5 will be needed again. So revert it, at the potential cost of breaking L1s with a 6-year-old kernel. While in principle the L0 owner doesn't control what runs on L1, such an old hypervisor would probably have many other bugs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit f8ae08f9
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Jun 7 21:35:54 2022 +0000

KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value

Restrict the nVMX MSRs based on KVM's config, not based on the guest's current config. Using the guest's config to audit the new config prevents userspace from restoring the original config (KVM's config) if at any point in the past the guest's config was restricted in any way.

Fixes: 62cc6b9d ("KVM: nVMX: support restore of VMX capability MSRs")
Cc: stable@vger.kernel.org
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit a645c2b5
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Jun 7 21:35:53 2022 +0000

KVM: nVMX: Rename handle_vm{on,off}() to handle_vmx{on,off}()

Rename the exit handlers for VMXON and VMXOFF to match the instruction names; the terms "vmon" and "vmoff" are not used anywhere in Intel's documentation, nor are they used elsewhere in KVM. Sadly, the exit reasons are exposed to userspace and so cannot be renamed without breaking userspace. :-(

Fixes: ec378aee ("KVM: nVMX: Implement VMXON and VMXOFF")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit ca58f3aa
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Jun 7 21:35:51 2022 +0000

KVM: nVMX: Account for KVM reserved CR4 bits in consistency checks

Check that the guest (L2) and host (L1) CR4 values that would be loaded by nested VM-Enter and VM-Exit respectively are valid with respect to KVM's (L0 host) allowed CR4 bits. Failure to check KVM reserved bits would allow L1 to load an illegal CR4 (or trigger hardware VM-Fail or failed VM-Entry) by massaging guest CPUID to allow features that are not supported by KVM. Amusingly, KVM itself is an accomplice in its doom, as KVM adjusts L1's MSR_IA32_VMX_CR4_FIXED1 to allow L1 to enable bits for L2 based on L1's CPUID model.

Note, although nested_{guest,host}_cr4_valid() are _currently_ used if and only if the vCPU is post-VMXON (nested.vmxon == true), that may not be true in the future, e.g. emulating VMXON has a bug where it doesn't check the allowed/required CR0/CR4 bits.

Cc: stable@vger.kernel.org
Fixes: 3899152c ("KVM: nVMX: fix checks on CR{0,4} during virtual VMX operation")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit c33f6f22
Author: Sean Christopherson <seanjc@google.com>
Date: Tue Jun 7 21:35:50 2022 +0000

KVM: x86: Split kvm_is_valid_cr4() and export only the non-vendor bits

Split the common x86 parts of kvm_is_valid_cr4(), i.e. the reserved bits checks, into a separate helper, __kvm_is_valid_cr4(), and export only the inner helper to vendor code in order to prevent nested VMX from calling back into vmx_is_valid_cr4() via kvm_is_valid_cr4().

On SVM, this is a nop as SVM doesn't place any additional restrictions on CR4. On VMX, this is also currently a nop, but only because nested VMX is missing checks on reserved CR4 bits for nested VM-Enter. That bug will be fixed in a future patch, and could simply use kvm_is_valid_cr4() as-is, but nVMX has _another_ bug where VMXON emulation doesn't enforce VMX's restrictions on CR0/CR4. The cleanest and most intuitive way to fix the VMXON bug is to use nested_host_cr{0,4}_valid(). If the CR4 variant routes through kvm_is_valid_cr4(), using nested_host_cr4_valid() won't do the right thing for the VMXON case, as vmx_is_valid_cr4() enforces VMX's restrictions if and only if the vCPU is post-VMXON.

Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220607213604.3346000-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
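A sketch of the split (matching the naming described above):

    bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
    {
        /* Common x86 reserved-bit checks; safe for vendor code to call. */
        if (cr4 & cr4_reserved_bits)
            return false;
        if (cr4 & vcpu->arch.cr4_guest_rsvd_bits)
            return false;
        return true;
    }

    static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
    {
        /*
         * Adds the vendor restrictions; nVMX must not route through this,
         * since vmx_is_valid_cr4() only enforces VMX rules post-VMXON.
         */
        return __kvm_is_valid_cr4(vcpu, cr4) &&
               static_call(kvm_x86_is_valid_cr4)(vcpu, cr4);
    }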
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit cfe12e64
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 15 23:21:07 2022 +0000

KVM: selftests: Add an option to run vCPUs while disabling dirty logging

Add a command line option to dirty_log_perf_test to run vCPUs for the entire duration of disabling dirty logging. By default, the test stops running vCPUs before disabling dirty logging, which is faster but less interesting as it doesn't stress KVM's handling of contention between page faults and the zapping of collapsible SPTEs. Enabling the flag also lets the user verify that KVM is indeed rebuilding zapped SPTEs as huge pages by checking KVM's pages_{1g,2m,4k} stats. Without vCPUs to fault in the zapped SPTEs, the stats will show that KVM is zapping pages, but they never show whether or not KVM actually allows huge pages to be recreated.

Note! Enabling the flag can _significantly_ increase runtime, especially if the thread that's disabling dirty logging doesn't have a dedicated pCPU, e.g. if all pCPUs are used to run vCPUs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220715232107.3775620-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 85f44f8c
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 15 23:21:06 2022 +0000

KVM: x86/mmu: Don't bottom out on leafs when zapping collapsible SPTEs

When zapping collapsible SPTEs in the TDP MMU, don't bottom out on a leaf SPTE now that KVM doesn't require a PFN to compute the host mapping level, i.e. now that there's no need to first find a leaf SPTE and then step back up.

Drop the now unused tdp_iter_step_up(), as it is not the safest of helpers (using any of the low level iterators requires some understanding of the various side effects).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220715232107.3775620-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit 65e3b446
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 15 23:21:05 2022 +0000

KVM: x86/mmu: Document the "rules" for using host_pfn_mapping_level()

Add a comment to document how host_pfn_mapping_level() can be used safely, as the line between safe and dangerous is quite thin. E.g. if KVM were to ever support in-place promotion to create huge pages, consuming the level is safe if the caller holds mmu_lock and checks that there's an existing _leaf_ SPTE, but unsafe if the caller only checks that there's a non-leaf SPTE.

Opportunistically tweak the existing comments to explicitly document why KVM needs to use READ_ONCE(). No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20220715232107.3775620-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit a8ac499b
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 15 23:21:04 2022 +0000

KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs

Drop the requirement that a pfn be backed by a refcounted, compound, or ZONE_DEVICE struct page, and instead rely solely on the host page tables to identify huge pages. The PageCompound() check is a remnant of an old implementation that identified (well, attempted to identify) huge pages without walking the host page tables. The ZONE_DEVICE check was added as an exception to the PageCompound() requirement. In other words, neither check is actually a hard requirement; if the primary MMU has a pfn backed with a huge page, then KVM can back the pfn with a huge page regardless of the backing store.

Dropping the @pfn parameter will also allow KVM to query the max host mapping level without having to first get the pfn, which is advantageous for use outside of the page fault path where KVM wants to take action if and only if a page can be mapped huge, i.e. avoids the pfn lookup for gfns that can't be backed with a huge page.

Cc: Mingwei Zhang <mizhang@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Mingwei Zhang <mizhang@google.com>
Message-Id: <20220715232107.3775620-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
-
Vitaly Kuznetsov authored
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2119111

commit d5e90a69
Author: Sean Christopherson <seanjc@google.com>
Date: Fri Jul 15 23:00:16 2022 +0000

KVM: x86/mmu: Restrict mapping level based on guest MTRR iff they're used

Restrict the mapping level for SPTEs based on the guest MTRRs if and only if KVM may actually use the guest MTRRs to compute the "real" memtype. For all forms of paging, guest MTRRs are purely virtual in the sense that they are completely ignored by hardware, i.e. they affect the memtype only if software manually consumes them. The only scenario where KVM consumes the guest MTRRs is when shadow_memtype_mask is non-zero and the guest has non-coherent DMA; in all other cases KVM simply leaves the PAT field in SPTEs as '0' to encode WB memtype.

Note, KVM may still ultimately ignore guest MTRRs, e.g. if the backing pfn is host MMIO, but false positives are ok as they only cause a slight performance blip (unless the guest is doing weird things with its MTRRs, which is extremely unlikely).

Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220715230016.3762909-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
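A sketch of the gate in the TDP page-fault path (structure follows the commit's description; details simplified):

    /*
     * Guest MTRRs only matter when KVM programs a non-zero shadow
     * memtype (EPT) and the guest has non-coherent DMA; only then
     * restrict the hugepage level so a single SPTE spans one memtype.
     */
    if (shadow_memtype_mask && kvm_arch_has_noncoherent_dma(vcpu->kvm)) {
        for ( ; fault->max_level > PG_LEVEL_4K; --fault->max_level) {
            int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
            gfn_t base = (fault->addr >> PAGE_SHIFT) & ~(page_num - 1);

            if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
                break;
        }
    }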
-