diff --git a/README.md b/README.md
index d357ceddf09bbd9058a9969e9242149524e19416..43355b6bf3b45e7f3ff49bfa9fffe54cfb5448cf 100644
--- a/README.md
+++ b/README.md
@@ -83,6 +83,7 @@ Tools:
 - tools/[memleak](tools/memleak.py): Display outstanding memory allocations to find memory leaks. [Examples](tools/memleak_examples.txt).
 - tools/[offcputime](tools/offcputime.py): Summarize off-CPU time by kernel stack trace. [Examples](tools/offcputime_example.txt).
 - tools/[offwaketime](tools/offwaketime.py): Summarize blocked time by kernel off-CPU stack and waker stack. [Examples](tools/offwaketime_example.txt).
+- tools/[oomkill](tools/oomkill.py): Trace the out-of-memory (OOM) killer. [Examples](tools/oomkill_example.txt).
 - tools/[opensnoop](tools/opensnoop.py): Trace open() syscalls. [Examples](tools/opensnoop_example.txt).
 - tools/[pidpersec](tools/pidpersec.py): Count new processes (via fork). [Examples](tools/pidpersec_example.txt).
 - tools/[runqlat](tools/runqlat.py): Run queue (scheduler) latency as a histogram. [Examples](tools/runqlat_example.txt).
diff --git a/man/man8/oomkill.8 b/man/man8/oomkill.8
new file mode 100644
index 0000000000000000000000000000000000000000..8ddb42f89be04d0bfe08b77756231804ef667047
--- /dev/null
+++ b/man/man8/oomkill.8
@@ -0,0 +1,55 @@
+.TH oomkill 8  "2016-02-09" "USER COMMANDS"
+.SH NAME
+oomkill \- Trace oom_kill_process(). Uses Linux eBPF/bcc.
+.SH SYNOPSIS
+.B oomkill
+.SH DESCRIPTION
+This traces the kernel out-of-memory killer, and prints basic details,
+including the system load averages at the time of the OOM kill. This can
+provide more context on the system state at the time: was it getting busier
+or steady, based on the load averages? This tool may also be useful to
+customize for investigations; for example, by adding other task_struct
+details at the time of OOM.
+
+This program is also a basic example of eBPF/bcc.
+
+Since this uses BPF, only the root user can use this tool.
+.SH REQUIREMENTS
+CONFIG_BPF and bcc.
+.SH EXAMPLES
+.TP
+Trace OOM kill events:
+#
+.B oomkill
+.SH FIELDS
+.TP
+Triggered by ...
+The process ID and process name of the task that was running when another task was OOM
+killed.
+.TP
+OOM kill of ...
+The process ID and name of the target process that was OOM killed.
+.TP
+loadavg
+Contents of /proc/loadavg. The first three numbers are 1, 5, and 15 minute
+load averages (where the average is an exponentially damped moving sum, and
+those numbers are constants in the equation); then there is the number of
+running tasks, a slash, and the total number of tasks; and then the last number
+is the last PID to be created.
+.SH OVERHEAD
+Negligible.
+.SH SOURCE
+This is from bcc.
+.IP
+https://github.com/iovisor/bcc
+.PP
+Also look in the bcc distribution for a companion _examples.txt file containing
+example usage, output, and commentary for this tool.
+.SH OS
+Linux
+.SH STABILITY
+Unstable - in development.
+.SH AUTHOR
+Brendan Gregg
+.SH SEE ALSO
+memleak(8)
diff --git a/tools/oomkill.py b/tools/oomkill.py
new file mode 100755
index 0000000000000000000000000000000000000000..3d9fda3531703d19826274d790e0e01cbd4ec953
--- /dev/null
+++ b/tools/oomkill.py
@@ -0,0 +1,42 @@
+#!/usr/bin/env python
+#
+# oomkill   Trace oom_kill_process(). For Linux, uses BCC, eBPF.
+#
+# This traces the kernel out-of-memory killer, and prints basic details,
+# including the system load averages. This can provide more context on the
+# system state at the time of OOM: was it getting busier or steady, based
+# on the load averages? This tool may also be useful to customize for
+# investigations; for example, by adding other task_struct details at the time
+# of OOM.
+#
+# Copyright 2016 Netflix, Inc.
+# Licensed under the Apache License, Version 2.0 (the "License")
+#
+# 09-Feb-2016   Brendan Gregg   Created this.
+
+from bcc import BPF
+from time import strftime
+
+# linux stats
+loadavg = "/proc/loadavg"
+
+# initialize BPF
+b = BPF(text="""
+#include <uapi/linux/ptrace.h>
+#include <linux/oom.h>
+void kprobe__oom_kill_process(struct pt_regs *ctx, struct oom_control *oc,
+    struct task_struct *p, unsigned int points, unsigned long totalpages)
+{
+    bpf_trace_printk("OOM kill of PID %d (\\"%s\\"), %d pages\\n", p->pid,
+        p->comm, totalpages);
+}
+""")
+
+# print output
+print("Tracing oom_kill_process()... Ctrl-C to end.")
+while 1:
+    (task, pid, cpu, flags, ts, msg) = b.trace_fields()
+    with open(loadavg) as stats:
+        avgline = stats.read().rstrip()
+    print("%s Triggered by PID %d (\"%s\"), %s, loadavg: %s" % (
+        strftime("%H:%M:%S"), pid, task, msg, avgline))
diff --git a/tools/oomkill_example.txt b/tools/oomkill_example.txt
new file mode 100644
index 0000000000000000000000000000000000000000..ceeb1b700db51b8fe8b031de0f38d24c5d183dc1
--- /dev/null
+++ b/tools/oomkill_example.txt
@@ -0,0 +1,39 @@
+Demonstrations of oomkill, the Linux eBPF/bcc version.
+
+
+oomkill is a simple program that traces the Linux out-of-memory (OOM) killer,
+and shows basic details on one line per OOM kill:
+
+# ./oomkill
+Tracing oom_kill_process()... Ctrl-C to end.
+21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
+21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932
+
+The first line shows that PID 22516, with process name "perl", was OOM killed
+when it reached 3850642 pages (usually 4 Kbytes per page). This OOM kill
+happened to be triggered by PID 3297, process name "ntpd", doing some memory
+allocation.
+
+The system log (dmesg) shows pages of details and system context about an OOM
+kill. What it currently lacks, however, is context on how the system had been
+changing over time. I've seen OOM kills where I wanted to know if the system
+was at steady state at the time, or if there had been a recent increase in
+workload that triggered the OOM event. oomkill provides some context: at the
+end of the line is the load average information from /proc/loadavg. For both
+of the OOM kills here, we can see that the system was getting busier at the
+time (a higher 1 minute "average" of 0.99, compared to the 15 minute "average"
+of 0.30).
+
+oomkill can also be the basis of other tools and customizations. For example,
+you can edit it to include other task_struct details from the target PID at
+the time of the OOM kill.
+
+
+The following commands can be used to test this program, invoking a
+memory-consuming process that exhausts system memory and is OOM killed:
+
+sysctl -w vm.overcommit_memory=1        # always overcommit
+perl -e 'while (1) { $a .= "A" x 1024; }'       # eat all memory
+
+WARNING: This exhausts system memory after disabling some overcommit checks.
+Only test in a lab environment.
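A note on the customization idea raised in both the man page and the example file: bpf_trace_printk() accepts at most three format arguments per call, so extra task_struct details are easiest to emit as an additional trace line. The following is a minimal sketch of that approach, not part of the patch itself; it assumes the same oom_kill_process() signature traced above, and simply re-emits the badness score ("points") that the kprobe already receives as an argument:

from bcc import BPF

# Sketch: the oomkill.py probe, extended with a second trace line carrying
# the victim's badness score ("points"). A separate bpf_trace_printk() call
# is used because each call is limited to three format arguments.
b = BPF(text="""
#include <uapi/linux/ptrace.h>
#include <linux/oom.h>
void kprobe__oom_kill_process(struct pt_regs *ctx, struct oom_control *oc,
    struct task_struct *p, unsigned int points, unsigned long totalpages)
{
    bpf_trace_printk("OOM kill of PID %d (\\"%s\\"), %d pages\\n", p->pid,
        p->comm, totalpages);
    bpf_trace_printk("OOM badness of PID %d (\\"%s\\"): %u\\n", p->pid,
        p->comm, points);
}
""")
b.trace_print()    # dump raw trace lines as-is

Note that the trace_fields() loop in oomkill.py expects one trace line per event, so a customization like this would also need the userspace reader updated to pair the two lines.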
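A small aside on the /proc/loadavg layout described in the man page's FIELDS section: the line splits on whitespace, and the fourth field splits again on "/". A standalone sketch of pulling out the named values:

# Split /proc/loadavg into the fields described in the man page: three load
# averages, running/total task counts, and the last PID created.
with open("/proc/loadavg") as f:
    fields = f.read().split()

avg1, avg5, avg15 = (float(x) for x in fields[0:3])
running, total = (int(x) for x in fields[3].split("/"))
last_pid = int(fields[4])

print("1 min: %.2f, 5 min: %.2f, 15 min: %.2f, tasks: %d/%d, last PID: %d" %
    (avg1, avg5, avg15, running, total, last_pid))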