Thorsten Leemhuis authored
Add and fetch all required stable branches ahead of time. This fixes a bug, as readers that wanted to bisect a regression within a stable or longterm series otherwise did not have them available at the right time. This way also matches the flow somewhat better and avoids some "if you haven't already added it" phrases that otherwise become necessary in future changes.
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/57dcf312959476abe6151bf3d35eb79e3e9a83d1.1712647788.git.linux@leemhuis.info
932c9a53
How to verify bugs and bisect regressions
This document describes how to check whether a Linux kernel problem occurs in code currently supported by developers, and then explains how to locate the change causing the issue if it is a regression (i.e. it did not happen with earlier versions).
The text aims at people running kernels from mainstream Linux distributions on commodity hardware who want to report a kernel bug to the upstream Linux developers. Despite this intent, the instructions work just as well for users who are already familiar with building their own kernels: they help avoid mistakes occasionally made even by experienced developers.
The essence of the process (aka 'TL;DR')
[If you are new to building or bisecting Linux, ignore this section and head over to the step-by-step guide below. It uses the same commands as this section while describing them briefly. The steps are nevertheless easy to follow and, together with accompanying entries in a reference section, mention many alternatives, pitfalls, and additional aspects, all of which might be essential in your case.]
If you want to check whether a bug is present in code currently supported by developers, execute just the preparations and segment 1; while doing so, consider the newest Linux kernel you regularly use to be the 'working' kernel. In the following example, that is assumed to be 6.0, which is why its sources will be used to prepare the .config file.
If you face a regression, follow the steps at least until the end of segment 2. Then you can submit a preliminary report, or continue with segment 3, which describes how to perform the bisection needed for a full-fledged regression report. In the following example, 6.0.13 is assumed to be the 'working' kernel and 6.1.5 the first 'broken' one, which is why 6.0 will be considered the 'good' release and used to prepare the .config file.
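The relation between the 'working' kernel and the 'good' release in this example can be sketched as a tiny shell function. It is only an illustration under stated assumptions: the name 'good_release_tag' is made up, and the input is assumed to look like '6.0.13' or like typical 'uname -r' output such as '6.0.13-200.fc37.x86_64'; other version schemes are not considered.

```shell
# Hypothetical helper: print the tag of the mainline release a stable or
# distribution kernel is based on (e.g. 6.0.13 -> v6.0) by keeping only the
# first two dot-separated fields of the version string.
good_release_tag() {
    printf 'v%s\n' "$(printf '%s\n' "$1" | cut -d. -f1,2)"
}

# Usage examples:
#   good_release_tag 6.0.13           # prints v6.0
#   good_release_tag "$(uname -r)"    # mainline tag the running kernel is based on
```

The result is the tag you would use with 'git switch --detach' and 'git bisect good'; double-check it exists in your clone, for example with 'git tag -l v6.0'.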
Preparations: set up everything to build your own kernels:
    # * Remove any software that depends on externally maintained kernel modules
    #   or builds any automatically during bootup.
    # * Ensure Secure Boot permits booting self-compiled Linux kernels.
    # * If you are not already running the 'working' kernel, reboot into it.
    # * Install compilers and everything else needed for building Linux.
    # * Ensure you have 15 gigabytes of free space in your home directory.
    git clone -o mainline --no-checkout \
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git ~/linux/
    cd ~/linux/
    git remote add -t master stable \
      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git
    git switch --detach v6.0
    # * Hint: if you used an existing clone, ensure no stale .config is around.
    make olddefconfig
    # * Ensure the former command picked the .config of the 'working' kernel.
    # * Connect external hardware (USB keys, tokens, ...), start a VM, bring up
    #   VPNs, mount network shares, and briefly try the feature that is broken.
    yes '' | make localmodconfig
    ./scripts/config --set-str CONFIG_LOCALVERSION '-local'
    ./scripts/config -e CONFIG_LOCALVERSION_AUTO
    # * Note: when short on storage space, check the guide for an alternative:
    ./scripts/config -d DEBUG_INFO_NONE -e KALLSYMS_ALL -e DEBUG_KERNEL \
      -e DEBUG_INFO -e DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT -e KALLSYMS
    # * Hint: at this point you might want to adjust the build configuration;
    #   you'll have to, if you are running Debian.
    make olddefconfig
    cp .config ~/kernel-config-working
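The free-space requirement mentioned in the preparations can be checked with a small shell function before cloning. This is a sketch assuming GNU coreutils, whose 'df' supports '--output=avail' and '-BG'; the function name 'has_free_space' is hypothetical, and the figure is only approximate, as 'df' reports sizes in whole gibibytes.

```shell
# Hypothetical helper: succeed if the file system holding DIR has at least
# MIN_GIB gibibytes available. Requires GNU coreutils 'df'.
has_free_space() {
    local dir min_gib avail
    dir=$1
    min_gib=$2
    # 'df --output=avail -BG' prints a header line plus a value like "42G";
    # keep only the digits of the last line.
    avail=$(df --output=avail -BG "$dir" | tail -n 1 | tr -dc '0-9')
    # Fail if df produced nothing (e.g. DIR does not exist) or too little space.
    [ -n "$avail" ] && [ "$avail" -ge "$min_gib" ]
}

# Usage example for the preparations:
#   has_free_space ~ 15 || echo "Less than 15 GiB free in your home directory"
```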
Segment 1: build a kernel from the latest mainline codebase.
Among other things, this checks whether the problem was already fixed and which developers later need to be told about it; in case of a regression, it also rules out a .config change as the root of the problem.
a) Checking out the latest mainline code:
    cd ~/linux/
    git switch --discard-changes --detach mainline/master
b) Build, install, and boot a kernel:
    cp ~/kernel-config-working .config
    make olddefconfig
    make -j $(nproc --all)
    # * Make sure there is enough disk space to hold another kernel:
    df -h /boot/ /lib/modules/
    # * Note: on Arch Linux, its derivatives and a few other distributions,
    #   the following commands will do nothing at all or only part of the
    #   job. See the step-by-step guide for further details.
    sudo make modules_install
    command -v installkernel && sudo make install
    # * Check how much space your self-built kernel actually needs, which
    #   enables you to make better estimates later:
    du -ch /boot/*$(make -s kernelrelease)* | tail -n 1
    du -sh /lib/modules/$(make -s kernelrelease)/
    # * Hint: the output of the following command will help you pick the
    #   right kernel from the boot menu:
    make -s kernelrelease | tee -a ~/kernels-built
    reboot
    # * Once booted, ensure you are running the kernel you just built by
    #   checking if the output of the next two commands matches:
    tail -n 1 ~/kernels-built
    uname -r
    cat /proc/sys/kernel/tainted
c) Check if the problem occurs with this kernel as well.
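The manual comparison at the end of the build, install, and boot step in segment 1 can be automated with a helper like the following. It is a sketch: 'check_booted_kernel' is a made-up name, and it assumes the '~/kernels-built' file filled by the 'make -s kernelrelease | tee -a ~/kernels-built' command. A taint value of 0 means the kernel is not tainted.

```shell
# Hypothetical helper: check that the running kernel is the one built last
# and show its taint status. Optional first argument: path of the build log.
check_booted_kernel() {
    local log expected running
    log=${1:-$HOME/kernels-built}
    # Last line of the build log = release name of the kernel built last.
    expected=$(tail -n 1 "$log") || return 1
    running=$(uname -r)
    if [ "$expected" != "$running" ]; then
        echo "mismatch: running '$running', last built '$expected'" >&2
        return 1
    fi
    # A value of 0 means the kernel is not tainted.
    echo "running $running, taint value: $(cat /proc/sys/kernel/tainted)"
}

# Usage example right after rebooting:
#   check_booted_kernel
```

Pass another file name as first argument if you keep your build log elsewhere.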
Segment 2: ensure the 'good' kernel is also a 'working' kernel.
Among other things, this verifies that the trimmed .config file actually works well, as bisecting with it would otherwise be a waste of time:
a) Start by checking out the sources of the 'good' version:
    cd ~/linux/
    git switch --discard-changes --detach v6.0
b) Build, install, and boot a kernel as described earlier in segment 1, section b; just feel free to skip the 'du' commands, as you already have a rough estimate.
c) Ensure the feature that regressed with the 'broken' kernel actually works with this one.
Segment 3: perform and validate the bisection.
a) Retrieve the sources for your 'bad' version:
    git remote set-branches --add stable linux-6.1.y
    git fetch stable
b) Initialize the bisection:
    cd ~/linux/
    git bisect start
    git bisect good v6.0
    git bisect bad v6.1.5
c) Build, install, and boot a kernel as described earlier in segment 1, section b.
In case building or booting the kernel fails for unrelated reasons, run 'git bisect skip'. In all other outcomes, check if the regressed feature works with the newly built kernel. If it does, tell Git by executing 'git bisect good'; if it does not, run 'git bisect bad' instead.
All three commands will make Git check out another commit; then re-execute this step (i.e. build, install, boot, and test a kernel, then tell Git the outcome). Repeat this until Git shows which commit broke things. If you run short of disk space during this process, check the section 'Supplementary tasks: cleanup during and after the process' below.
d) Once you have finished the bisection, put a few things away:
    cd ~/linux/
    git bisect log > ~/bisect-log
    cp .config ~/bisection-config-culprit
    git bisect reset
e) Try to verify the bisection result:
    git switch --discard-changes --detach mainline/master
    git revert --no-edit cafec0cacaca0
    cp ~/kernel-config-working .config
    ./scripts/config --set-str CONFIG_LOCALVERSION '-local-cafec0cacaca0-reverted'
This is optional, as some commits are impossible to revert. But if the second command worked flawlessly, build, install, and boot one more kernel; just this time skip the first command, which copies the base .config file over, as that has already been taken care of.
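The bisection log saved with 'git bisect log > ~/bisect-log' is plain text, which makes two tasks easy to script: correcting a wrongly marked step and picking the culprit's commit id for the verification. The helpers below are a sketch with made-up names ('drop_last_verdict', 'culprit_from_log'); they assume Git's default 'good'/'bad' terms and the '# first bad commit: [<id>] <subject>' line Git writes once a bisection has finished. Correcting a step relies on 'git bisect replay', which re-applies an edited log after 'git bisect reset'.

```shell
# Hypothetical helper: print a bisection log without its last
# good/bad/skip verdict (and anything after it).
drop_last_verdict() {
    local last
    # Line number of the last verdict line in the log.
    last=$(grep -nE '^git bisect (good|bad|skip)' "$1" | tail -n 1 | cut -d: -f1)
    [ -n "$last" ] || return 1
    head -n "$((last - 1))" "$1"
}

# Hypothetical helper: print the id Git reported as the first bad commit.
culprit_from_log() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]*\)\].*/\1/p' "$1"
}

# Usage example for undoing a wrong verdict during a bisection:
#   git bisect log > /tmp/bisect-log
#   drop_last_verdict /tmp/bisect-log > /tmp/bisect-log-fixed
#   git bisect reset && git bisect replay /tmp/bisect-log-fixed
```

After the replay, Git checks out the commit whose verdict was dropped so you can test it again. Once the bisection is done, 'culprit_from_log ~/bisect-log' prints the id that the verification commands above refer to as 'cafec0cacaca0'.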
Supplementary tasks: cleanup during and after the process.
a) To avoid running out of disk space during a bisection, you might need to remove some kernels you built earlier. You most likely want to keep those you built during segments 1 and 2 around for a while, but you will most likely no longer need kernels tested during the actual bisection (segment 3 c). You can list them in build order using:
    ls -ltr /lib/modules/*-local*
For example, to then erase a kernel that identifies itself as '6.0-rc1-local-gcafec0cacaca0', use this:
    sudo rm -rf /lib/modules/6.0-rc1-local-gcafec0cacaca0
    sudo kernel-install -v remove 6.0-rc1-local-gcafec0cacaca0
    # * Note: on some distributions kernel-install is missing
    #   or only does part of the job.
b) If you performed a bisection and successfully validated the result, feel free to remove all kernels built during the actual bisection (segment 3 c); you might want to keep the kernels you built before and after it around for a week or two.
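When removing several bisection kernels, a small safety net helps against deleting the wrong one. The following wrapper around the 'rm' and 'kernel-install' removal commands shown above is a sketch: the name 'remove_local_kernel' is hypothetical, and it assumes every kernel you built carries the '-local' tag set via CONFIG_LOCALVERSION during the preparations. It refuses to act on the running kernel and on names without that tag.

```shell
# Hypothetical wrapper: remove one self-built kernel, but refuse anything
# that looks unsafe. Assumes the '-local' tag from CONFIG_LOCALVERSION.
remove_local_kernel() {
    local release
    release=$1
    case $release in
        ''|*/*)
            # Empty names or path separators could hit unrelated files.
            echo "refusing: invalid kernel release '$release'" >&2
            return 1 ;;
        *-local*)
            ;;
        *)
            echo "refusing: '$release' is not a self-built '-local' kernel" >&2
            return 1 ;;
    esac
    if [ "$release" = "$(uname -r)" ]; then
        echo "refusing: '$release' is the running kernel" >&2
        return 1
    fi
    sudo rm -rf "/lib/modules/$release"
    # kernel-install might be missing or only do part of the job on
    # some distributions.
    sudo kernel-install -v remove "$release"
}

# Usage example with the release name from the text:
#   remove_local_kernel 6.0-rc1-local-gcafec0cacaca0
```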