kdump is part of the kexec-tools package, which provides the kexec binary. kexec uses the kernel’s kexec feature to boot a new kernel, either on a normal reboot or after a panic. With the help of kdump, kexec and a debug kernel, you have a much better chance of finding out why the kernel failed. When a kernel panic occurs, kexec boots a second kernel, which collects the crash data and saves it to a dump file that helps in troubleshooting the failure. This guide shows how to configure kdump on CentOS 6, but it should also apply to Red Hat Enterprise Linux and Fedora. This CentOS installation is a guest OS under VirtualBox 4.2.8 and it has the latest kernel installed at the time of writing.
First some info about the machine

    cat /etc/redhat-release
    CentOS release 6.3 (Final)

    uname -r
    2.6.32-279.22.1.el6.x86_64

    rpm -qa | grep `uname -r`
    kernel-2.6.32-279.22.1.el6.x86_64
    kernel-headers-2.6.32-279.22.1.el6.x86_64
    kernel-devel-2.6.32-279.22.1.el6.x86_64
Install the required packages

    yum --enablerepo=debug install kexec-tools crash kernel-debug kernel-debuginfo-`uname -r`

This installs all required packages and their dependencies. Make sure you use `uname -r` or $(uname -r) when installing the debuginfo rpms, otherwise yum could install the latest packages available in the debug repository rather than those matching your kernel version. Also note that kernel-debuginfo is quite large (1.5-1.7 GB installed), so check your free disk space before the installation.
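The free-space check mentioned above can be scripted. The following is a minimal sketch, not part of the original setup: `has_free_kb` is a hypothetical helper name, and the 2 GiB threshold in the usage comment is an assumption chosen to leave some headroom over the 1.5-1.7 GB figure. It checks /usr because the debuginfo files end up under /usr/lib/debug.

```shell
#!/bin/sh
# Sketch only: has_free_kb is a hypothetical helper, not part of yum or kexec-tools.

# Succeeds if the filesystem holding PATH has at least REQUIRED_KB kilobytes free.
has_free_kb() {
    path=$1
    required_kb=$2
    # df -P prints one POSIX-format line per filesystem;
    # with -k, column 4 is the available space in 1K blocks.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge "$required_kb" ]
}

# Example usage (assumed threshold of 2 GiB = 2097152 KB):
#   if has_free_kb /usr 2097152; then
#       yum --enablerepo=debug install kernel-debuginfo-$(uname -r)
#   else
#       echo "Not enough free space under /usr" >&2
#   fi
```

If /usr is not a separate filesystem, df simply reports the filesystem that contains it, so the same call works either way.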
Modify grub

A kernel argument must be added to /etc/grub.conf to enable kdump. It’s called crashkernel and it can be set either to auto or to a fixed value, e.g. 128M, 256M or 512M. The value defines the amount of memory reserved for the capture kernel. I chose 128M for my testing.

    title CentOS (2.6.32-279.22.1.el6.x86_64.debug)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.22.1.el6.x86_64.debug ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_centos6/lv_swap rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_centos6/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM crashkernel=128M
        initrd /initramfs-2.6.32-279.22.1.el6.x86_64.debug.img
    title CentOS (2.6.32-279.22.1.el6.x86_64)
        root (hd0,0)
        kernel /vmlinuz-2.6.32-279.22.1.el6.x86_64 ro root=/dev/mapper/vg_centos6-lv_root rd_NO_LUKS LANG=en_US.UTF-8 rd_LVM_LV=vg_centos6/lv_swap rd_NO_MD SYSFONT=latarcyrheb-sun16 rd_LVM_LV=vg_centos6/lv_root KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM crashkernel=128M
        initrd /initramfs-2.6.32-279.22.1.el6.x86_64.img
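If you would rather not edit every kernel line by hand, the change can be previewed with a small filter. This is a sketch under assumptions: `add_crashkernel` is a hypothetical helper, it only understands grub legacy files where kernel lines start with the word "kernel", and it prints the result to standard output instead of modifying the file, so you can review it first.

```shell
#!/bin/sh
# Sketch only: add_crashkernel is a hypothetical helper, not a grub tool.

# Print CONF with "crashkernel=SIZE" appended to every kernel line
# that does not already carry a crashkernel= argument.
add_crashkernel() {
    conf=$1
    size=$2
    awk -v arg="crashkernel=$size" '
        # Match lines whose first word is "kernel" and that lack crashkernel=.
        /^[ \t]*kernel[ \t]/ && $0 !~ /crashkernel=/ {
            sub(/[ \t]+$/, "")      # drop trailing whitespace
            $0 = $0 " " arg         # append the new argument
        }
        { print }                   # pass every line through
    ' "$conf"
}

# Example usage: review the output, then replace the file yourself.
#   add_crashkernel /etc/grub.conf 128M > /tmp/grub.conf.new
#   diff -u /etc/grub.conf /tmp/grub.conf.new
```

Lines that already contain a crashkernel= argument are left untouched, so running the filter twice gives the same result as running it once.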
Enable kdump

    chkconfig kdump on
    service kdump start
    No kdump initial ramdisk found. [WARNING]
    Rebuilding /boot/initrd-2.6.32-279.22.1.el6.x86_64kdump.img
    Starting kdump: [ OK ]

After this step a reboot is required in order to boot the kernel with the new argument.

    shutdown -r now
Confirm kdump is active

    service kdump status
    Kdump is operational

    cat /sys/kernel/kexec_crash_loaded
    1

    cat /proc/iomem | grep Crash
    03000000-12ffffff : Crash kernel
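The "Crash kernel" line in /proc/iomem is a hexadecimal start-end range, so the size of the reserved region can be worked out from it. The following is a minimal sketch: `crash_region_mib` is a hypothetical helper that reads /proc/iomem-style text on standard input and prints the size of the first "Crash kernel" range in MiB.

```shell
#!/bin/sh
# Sketch only: crash_region_mib is a hypothetical helper.

# Read /proc/iomem-style text on stdin and print the size (in MiB)
# of the first "Crash kernel" range. Fails if no such line exists.
crash_region_mib() {
    # First field of the matching line has the form START-END (hex, no 0x prefix).
    range=$(awk '/Crash kernel/ { print $1; exit }')
    [ -n "$range" ] || return 1
    start=${range%-*}
    end=${range#*-}
    # The range is inclusive, hence the +1; 1048576 bytes = 1 MiB.
    echo $(( (0x$end - 0x$start + 1) / 1048576 ))
}

# Example usage (run as root so the addresses are visible):
#   crash_region_mib < /proc/iomem
```

A failing return status means no memory was reserved at all, which usually points to a missing or rejected crashkernel argument.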
Test kdump i.e. trigger a kernel crash

### Clearly you shouldn’t do this on a production machine! ###

    echo 1 > /proc/sys/kernel/sysrq
    echo c > /proc/sysrq-trigger
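If the capture kernel is not loaded, the crash still happens but no dump is collected, so it can be worth guarding the trigger. The following is a sketch under assumptions: `crash_if_ready` is a hypothetical wrapper, and both file paths are passed explicitly so that nothing is triggered by accident.

```shell
#!/bin/sh
# Sketch only: crash_if_ready is a hypothetical wrapper around the sysrq trigger.
# WARNING: with the real paths from the usage comment this panics the machine.

# Write "c" to TRIGGER_FILE only if LOADED_FILE contains "1".
crash_if_ready() {
    loaded_file=$1
    trigger_file=$2
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "Capture kernel not loaded; refusing to trigger a crash" >&2
        return 1
    fi
    echo c > "$trigger_file"
}

# Example usage on a test machine, as root:
#   crash_if_ready /sys/kernel/kexec_crash_loaded /proc/sysrq-trigger
```

The script itself only defines the function; the crash happens only when you call it with the real paths.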
The kernel panic should happen instantly. In theory kexec then boots the capture kernel, which gathers the crash data, and after that the machine reboots into the default kernel. In practice this doesn’t always happen. You may need to tweak the configuration files (/etc/kdump.conf and /etc/sysconfig/kdump) or try different crashkernel values in grub. There can also be issues between the debug kernel and some kernel modules (e.g. megaraid), so you might need to add those explicitly to the extra_modules line in /etc/kdump.conf, or keep them out of the initrd by using the mkdumprd utility (and its --omit-raid-modules option).
Analysing the log file

By default the dump file (vmcore) is stored under /var/crash. With the help of the crash utility you can try to investigate what happened. Most of the data is pretty cryptic, but the built-in commands give you at least some idea of what went wrong.

    crash /usr/lib/debug/lib/modules/2.6.32-279.22.1.el6.x86_64/vmlinux /var/crash/127.0.0.1-2013-03-03-20\:14\:21/vmcore

    KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.22.1.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2013-03-03-20:14:21/vmcore [PARTIAL DUMP]
    CPUS: 2
    DATE: Sun Mar 3 20:13:14 2013
    UPTIME: 00:00:56
    LOAD AVERAGE: 0.08, 0.03, 0.01
    TASKS: 188
    NODENAME: centos6.3
    RELEASE: 2.6.32-279.22.1.el6.x86_64
    VERSION: #1 SMP Wed Feb 6 03:10:46 UTC 2013
    MACHINE: x86_64 (2467 Mhz)
    MEMORY: 4 GB
    PANIC: "Oops: 0002 [#1] SMP " (check log for details)
    PID: 8473
    COMMAND: "bash"
    TASK: ffff88011b550040 [THREAD_INFO: ffff880119322000]
    CPU: 0
    STATE: TASK_RUNNING (PANIC)

In my case the issue was quite easy to spot, as the log command of the crash tool exposed the SysRq-triggered crash:

    SysRq : Trigger a crash

The bt command revealed the same thing:

    KERNEL-MODE EXCEPTION FRAME AT: ffff8801193238d8
    [exception RIP: sysrq_handle_crash+22]
    RIP: ffffffff81321d66  RSP: ffff880119323e18  RFLAGS: 00010096
    RAX: 0000000000000010  RBX: 0000000000000063  RCX: 0000000000002388
    RDX: 0000000000000000  RSI: 0000000000000000  RDI: 0000000000000063
    RBP: ffff880119323e18  R8: 0000000000000000  R9: ffffffff8163ac60
    R10: 0000000000000001  R11: 0000000000000000  R12: 0000000000000000
    R13: ffffffff81afb7a0  R14: 0000000000000286  R15: 0000000000000004
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

There are other commands that you can run with the crash utility; type help at the crash prompt to get the full list. See also the screenshots taken while booting the crash kernel after the panic.
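Each crash creates a new timestamped directory under /var/crash, so picking out the most recent vmcore is a common first step. The following is a minimal sketch: `latest_vmcore` is a hypothetical helper, and it assumes the directory/vmcore layout shown in the path above.

```shell
#!/bin/sh
# Sketch only: latest_vmcore is a hypothetical helper, not part of crash or kdump.

# Print the most recently modified vmcore under CRASH_DIR (default /var/crash).
# Fails if no vmcore is found.
latest_vmcore() {
    crash_dir=${1:-/var/crash}
    # ls -t sorts newest first; kdump directory names contain no newlines.
    core=$(ls -t "$crash_dir"/*/vmcore 2>/dev/null | head -n 1)
    [ -n "$core" ] && printf '%s\n' "$core"
}

# Example usage, assuming the running kernel is the same release that crashed:
#   crash /usr/lib/debug/lib/modules/$(uname -r)/vmlinux "$(latest_vmcore)"
```

If the machine has since booted a different kernel release, replace $(uname -r) with the release of the kernel that crashed, because the vmlinux file has to match the dump.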