Skip to content
Snippets Groups Projects
verify-bugs-and-bisect-regressions.rst 80.9 KiB
Newer Older
.. SPDX-License-Identifier: (GPL-2.0+ OR CC-BY-4.0)
.. [see the bottom of this file for redistribution information]

=========================================
How to verify bugs and bisect regressions
=========================================

This document describes how to check if some Linux kernel problem occurs in code
currently supported by developers -- to then explain how to locate the change
causing the issue, if it is a regression (e.g. did not happen with earlier
versions).

The text aims at people running kernels from mainstream Linux distributions on
commodity hardware who want to report a kernel bug to the upstream Linux
developers. Despite this intent, the instructions work just as well for users
who are already familiar with building their own kernels: they help avoid
mistakes occasionally made even by experienced developers.

..
   Note: if you see this note, you are reading the text's source file. You
   might want to switch to a rendered version: it makes it a lot easier to
   read and navigate this document -- especially when you want to look something
   up in the reference section, then jump back to where you left off.
..
   Find the latest rendered version of this text here:
   https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.rst.html

The essence of the process (aka 'TL;DR')
========================================

*[If you are new to building or bisecting Linux, ignore this section and head
over to the* ":ref:`step-by-step guide<introguide_bissbs>`" *below. It utilizes
the same commands as this section while describing them in brief fashion. The
steps are nevertheless easy to follow and together with accompanying entries
in a reference section mention many alternatives, pitfalls, and additional
aspects, all of which might be essential in your present case.]*

**In case you want to check if a bug is present in code currently supported by
developers**, execute just the *preparations* and *segment 1*; while doing so,
consider the newest Linux kernel you regularly use to be the 'working' kernel.
In the following example that's assumed to be 6.0.13, which is why the sources
of v6.0 will be used to prepare the .config file.

**In case you face a regression**, follow the steps at least till the end of
*segment 2*. Then you can submit a preliminary report -- or continue with
*segment 3*, which describes how to perform a bisection needed for a
full-fledged regression report. In the following example 6.0.13 is assumed to be
the 'working' kernel and 6.1.5 to be the first 'broken', which is why v6.0
will be considered the 'good' release and used to prepare the .config file.

* **Preparations**: set up everything to build your own kernels::

    # * Remove any software that depends on externally maintained kernel modules
    #   or builds any automatically during bootup.
    # * Ensure Secure Boot permits booting self-compiled Linux kernels.
    # * If you are not already running the 'working' kernel, reboot into it.
    # * Install compilers and everything else needed for building Linux.
    # * Ensure to have 15 Gigabyte free space in your home directory.
    git clone -o mainline --no-checkout \
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/
    cd ~/linux/
    git remote add -t master stable \
      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    git checkout --detach v6.0
    # * Hint: if you used an existing clone, ensure no stale .config is around.
    make olddefconfig
    # * Ensure the former command picked the .config of the 'working' kernel.
    # * Connect external hardware (USB keys, tokens, ...), start a VM, bring up
    #   VPNs, mount network shares, and briefly try the feature that is broken.
    yes '' | make localmodconfig
    ./scripts/config --set-str CONFIG_LOCALVERSION '-local'
    ./scripts/config -e CONFIG_LOCALVERSION_AUTO
    # * Note, when short on storage space, check the guide for an alternative:
    ./scripts/config -d DEBUG_INFO_NONE -e KALLSYMS_ALL -e DEBUG_KERNEL \
      -e DEBUG_INFO -e DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -e KALLSYMS
    # * Hint: at this point you might want to adjust the build configuration;
    #   you'll have to, if you are running Debian.
    make olddefconfig
    cp .config ~/kernel-config-working

* **Segment 1**: build a kernel from the latest mainline codebase.

  This among others checks if the problem was fixed already and which developers
  later need to be told about the problem; in case of a regression, this rules
  out a .config change as root of the problem.

  a) Checking out latest mainline code::

       cd ~/linux/
       git checkout --force --detach mainline/master

  b) Build, install, and boot a kernel::

       cp ~/kernel-config-working .config
       make olddefconfig
       make -j $(nproc --all)
       # * Make sure there is enough disk space to hold another kernel:
       df -h /boot/ /lib/modules/
       # * Note: on Arch Linux, its derivatives and a few other distributions
       #   the following commands will do nothing at all or only part of the
       #   job. See the step-by-step guide for further details.
       command -v installkernel && sudo make modules_install install
       # * Check how much space your self-built kernel actually needs, which
       #   enables you to make better estimates later:
       du -ch /boot/*$(make -s kernelrelease)* | tail -n 1
       du -sh /lib/modules/$(make -s kernelrelease)/
       # * Hint: the output of the following command will help you pick the
       #   right kernel from the boot menu:
       make -s kernelrelease | tee -a ~/kernels-built
       reboot
       # * Once booted, ensure you are running the kernel you just built by
       #   checking if the output of the next two commands matches:
       tail -n 1 ~/kernels-built
       uname -r

  c) Check if the problem occurs with this kernel as well.

* **Segment 2**: ensure the 'good' kernel is also a 'working' kernel.

  This among others verifies the trimmed .config file actually works well, as
  bisecting with it otherwise would be a waste of time:

  a) Start by checking out the sources of the 'good' version::

       cd ~/linux/
       git checkout --force --detach v6.0

  b) Build, install, and boot a kernel as described earlier in *segment 1,
     section b* -- just feel free to skip the 'du' commands, as you have a rough
     estimate already.

  c) Ensure the feature that regressed with the 'broken' kernel actually works
     with this one.

* **Segment 3**: perform and validate the bisection.

  a) In case your 'broken' version is a stable/longterm release, add the Git
     branch holding it::

       git remote set-branches --add stable linux-6.1.y
       git fetch stable

  b) Initialize the bisection::

       cd ~/linux/
       git bisect start
       git bisect good v6.0
       git bisect bad v6.1.5

  c) Build, install, and boot a kernel as described earlier in *segment 1,
     section b*.

     In case building or booting the kernel fails for unrelated reasons, run
     ``git bisect skip``. In all other outcomes, check if the regressed feature
     works with the newly built kernel. If it does, tell Git by executing
     ``git bisect good``; if it does not, run ``git bisect bad`` instead.

     All three commands will make Git checkout another commit; then re-execute
     this step (e.g. build, install, boot, and test a kernel to then tell Git
     the outcome). Do so again and again until Git shows which commit broke
     things. If you run short of disk space during this process, check the
     "Supplementary tasks" section below.

  d) Once your finished the bisection, put a few things away::

       cd ~/linux/
       git bisect log > ~/bisect-log
       cp .config ~/bisection-config-culprit
       git bisect reset

  e) Try to verify the bisection result::

       git checkout --force --detach mainline/master
       git revert --no-edit cafec0cacaca0

    This is optional, as some commits are impossible to revert. But if the
    second command worked flawlessly, build, install, and boot one more kernel
    kernel, which should not show the regression.

* **Supplementary tasks**: cleanup during and after the process.

  a) To avoid running out of disk space during a bisection, you might need to
     remove some kernels you built earlier. You most likely want to keep those
     you built during segment 1 and 2 around for a while, but you will most
     likely no longer need kernels tested during the actual bisection
     (Segment 3 c). You can list them in build order using::

       ls -ltr /lib/modules/*-local*

    To then for example erase a kernel that identifies itself as
    '6.0-rc1-local-gcafec0cacaca0', use this::

       sudo rm -rf /lib/modules/6.0-rc1-local-gcafec0cacaca0
       sudo kernel-install -v remove 6.0-rc1-local-gcafec0cacaca0
       # * Note, if kernel-install is missing, you will have to
       #   manually remove the kernel image and related files.

  b) If you performed a bisection and successfully validated the result, feel
     free to remove all kernels built during the actual bisection (Segment 3 c);
     the kernels you built earlier and later you might want to keep around for
Loading
Loading full blame...