Friday, 23 September 2011

kexec/kdump

The issue


Kexec is a fast-boot mechanism that allows booting a Linux kernel from the context of an already running kernel without going through the BIOS. BIOS initialisation can be very time consuming, especially on big servers with numerous peripherals, so this can save a lot of time for developers who end up booting a machine numerous times.

Kdump is a kexec-based kernel crash dumping mechanism. It is very reliable because the crash dump is captured from the context of a freshly booted kernel and not from the context of the crashed kernel. Kdump uses kexec to boot into a second kernel whenever the system crashes. This second kernel, often called a capture kernel, boots with very little memory and captures the dump image.

Because kexec boots the capture kernel without going through the BIOS, the contents of the first kernel's memory are preserved, and that memory is essentially the kernel crash dump.
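As a rough illustration of using kexec by hand, the sketch below only prints the commands rather than running them. The -l, --initrd, --append and -e options are standard kexec-tools options; the helper name kexec_commands is my own, and the kernel version and command line are assumptions borrowed from the grub.conf example in this post.

```shell
# Dry-run sketch: print the kexec commands instead of executing them.
# kexec_commands is a hypothetical helper, not part of kexec-tools.
kexec_commands() {
    version="$1"    # kernel version, e.g. 2.6.18-194.el5
    cmdline="$2"    # kernel command line to pass to the new kernel
    # -l loads the new kernel image; --initrd and --append supply the initrd and command line
    echo "kexec -l /boot/vmlinuz-${version} --initrd=/boot/initrd-${version}.img --append=\"${cmdline}\""
    # -e jumps straight into the loaded kernel, skipping the BIOS
    echo "kexec -e"
}

kexec_commands "2.6.18-194.el5" "ro root=/dev/rootvg/rootlv"
```

Note that kexec -e does not shut services down first, so on a real system filesystems should be synced and unmounted before running it.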

Packages


The package kexec-tools is added to the LBG Build 2010 Activation Key.

Configuration file 1


/etc/kdump.conf

This file is added as a configuration file to the Configuration Channel LBG Build 21062010.

# kdump.conf

ext3 /dev/mapper/rootvg-crashlv
core_collector makedumpfile -c -d 20

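The first line tells kdump to mount the crashlv logical volume as ext3 so that the core dump can be written to /var/crash on that filesystem.

The second line runs makedumpfile as the core collector: -c compresses the dump and -d 20 sets the dump level, which discards pages that are not needed and vastly reduces the size of the resultant file.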
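To show what a dump level such as 20 selects, here is a small sketch that decodes the value as a bitmask. The bit meanings are taken from my reading of the makedumpfile man page table (1 zero pages, 2 and 4 cache pages, 8 user data, 16 free pages), so check them against the man page for your version; decode_dump_level is a hypothetical helper name.

```shell
# Sketch: list the page types excluded by a makedumpfile dump level.
# Bit meanings are assumed from the makedumpfile man page table.
decode_dump_level() {
    level="$1"
    [ $((level & 1)) -ne 0 ] && echo "zero pages"
    # In the man page table, level 4 also excludes cache pages without private data
    [ $((level & 6)) -ne 0 ] && echo "cache pages without private data"
    [ $((level & 4)) -ne 0 ] && echo "cache pages with private data"
    [ $((level & 8)) -ne 0 ] && echo "user data pages"
    [ $((level & 16)) -ne 0 ] && echo "free pages"
    return 0
}

decode_dump_level 20
```

Under those assumptions, level 20 (16 + 4) drops free pages and cache pages but keeps zero pages and user data.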

Configuration file 2


The configuration file

/boot/grub/grub.conf

requires the crashkernel parameter to be appended to the end of the kernel statement. This reserves an area of memory into which the capture kernel is loaded, so that it can be booted via kexec rather than via the BIOS.

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/rootvg/rootlv
#          initrd /initrd-version.img
#boot=/dev/sda
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Red Hat Enterprise Linux Server (2.6.18-194.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-194.el5 ro root=/dev/rootvg/rootlv crashkernel=128M@16M
        initrd /initrd-2.6.18-194.el5.img
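The crashkernel value has the form SIZE@OFFSET, so 128M@16M reserves 128 MB starting at the 16 MB mark of physical memory. A minimal sketch for pulling the two parts out of a kernel command line follows; parse_crashkernel is a hypothetical helper name.

```shell
# Sketch: print "SIZE OFFSET" for the first crashkernel=SIZE@OFFSET argument.
parse_crashkernel() {
    # $1 is a kernel command line; word splitting on spaces is intentional
    for arg in $1; do
        case "$arg" in
            crashkernel=*@*)
                value="${arg#crashkernel=}"
                # Text before the @ is the size, text after it is the offset
                echo "${value%@*} ${value#*@}"
                return 0
                ;;
        esac
    done
    return 1
}

parse_crashkernel "ro root=/dev/rootvg/rootlv crashkernel=128M@16M"
```

After a reboot, running parse_crashkernel "$(cat /proc/cmdline)" should confirm that the running kernel was booted with the reservation.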

On reboot during installation the kdump initrd is created in /boot, with a name of the form

initrd-<kernel version>kdump.img

Configuration file 3


Further to previous discussions on kernel panic behaviour, I have added the following settings to /etc/sysctl.conf, to be applied at boot time, and have uploaded this as a configuration file to the Satellite Server. We discussed these in relation to enabling core dumps.

# Amount of time (in seconds) kernel will wait before rebooting if it reaches a "kernel panic"
# Setting of zero (0) seconds will disable rebooting on kernel panic
# Default is 0
kernel.panic = 180

# Kernel oops occurs when kernel code makes an invalid access to memory
# If panic_on_oops=1, then kernel will panic, else if panic_on_oops=0, kernel will try to continue running.
# Default is 1
kernel.panic_on_oops = 0
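To double-check what a sysctl.conf-style file will set, something like the sketch below can be used. The helper name sysctl_conf_value is my own, and the sample file is a temporary stand-in for /etc/sysctl.conf.

```shell
# Sketch: read a key from a sysctl.conf-style file (hypothetical helper).
sysctl_conf_value() {
    file="$1"
    # Escape the dots in the key so they match literally in the sed pattern
    key_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    # Print the value from the last uncommented "key = value" line
    sed -n "s/^[[:space:]]*${key_re}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}

# Temporary sample file holding the two settings from this section
sample=$(mktemp)
printf 'kernel.panic = 180\nkernel.panic_on_oops = 0\n' > "$sample"
sysctl_conf_value "$sample" kernel.panic
sysctl_conf_value "$sample" kernel.panic_on_oops
rm -f "$sample"
```

On a live system, sysctl -p applies /etc/sysctl.conf without a reboot, and sysctl kernel.panic kernel.panic_on_oops shows the values currently in effect.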


Kickstart Post script


The filesystem where the vmcore file will be saved in the event of a crash must not be mounted during normal operation.  To handle this, the logical volume is not added to the partition details of the kickstart file; instead, the commands to create it are part of the post-install script.

# enable the kdump service at boot
/sbin/chkconfig kdump on
# create and format the crashlv logical volume (a bare --size value is in megabytes)
/usr/sbin/lvcreate --size 1000 --name crashlv rootvg
/sbin/mkfs.ext3 /dev/rootvg/crashlv
# perform a yum update, including the kernel
yum update -y


Testing


[root@localhost ~]# echo "c" > /proc/sysrq-trigger

The capture kernel will be booted in the memory space set aside, mount crashlv and store the vmcore there.  The system will then reboot into the existing kernel.

Once the system is back up we can confirm that the dump file has been created by mounting crashlv.  If this happens in a production environment, the vmcore file should be sent to Red Hat as part of a support case.
Before sending the file we can analyse the dump information ourselves.
From the directory where the dump file has been created, run the following:-

# crash /usr/lib/debug/lib/modules/2.6.18-194.3.1.el5/vmlinux vmcore

The following packages need to be installed before this can be done:

kernel-debuginfo-common
kernel-debuginfo
kernel-debug-debuginfo

These can be uploaded to your server from the same location as the iso file on the shared drive or alternatively from the ftp site:-


The following knowledgebase doc gives further information on what can be done to analyse the vmcore file

How can I use crash to send Red Hat some vmcore pre-analysis information before or while uploading the vmcore image?
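Assuming the analysis is done with the crash utility, as the knowledgebase title suggests, the invocation can be built from the kernel version as sketched here. The vmlinux path follows the layout shown earlier in this section; crash_command is a hypothetical helper that only prints the command.

```shell
# Sketch: print the crash invocation for a given kernel version and vmcore file.
crash_command() {
    version="$1"    # version of the kernel that crashed
    vmcore="$2"     # path to the vmcore file
    # The debuginfo vmlinux must match the crashed kernel's version
    echo "crash /usr/lib/debug/lib/modules/${version}/vmlinux ${vmcore}"
}

crash_command "2.6.18-194.3.1.el5" vmcore
```

Once inside crash, commands such as bt (backtrace), log (kernel message buffer) and sys (system summary) are a reasonable starting point.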


Initial problem
Make a dummy change to /etc/kdump.conf.  At the moment this is required because of a bug in this process: otherwise a full dump will be taken (i.e. the entire contents of memory) and there will not be enough room to store this in the crashlv logical volume.  Rhys Oxenham has raised this internally within Red Hat.

# touch /etc/kdump.conf

Then restart the service:

[root@localhost ~]# service kdump restart

Stopping kdump:                                            [  OK  ]
Detected change(s) the following file(s):
 
  /etc/kdump.conf
Rebuilding /boot/initrd-2.6.18-194.3.1.el5kdump.img
Warning: There is not enough space to save a vmcore.
         The size of /dev/mapper/rootvg-crashlv should be much greater than 8044268 kilo bytes.
Starting kdump:                                            [  OK  ]
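The warning above is easy to check by hand. This sketch compares the crashlv size from the post-install script (1000, which lvcreate treats as megabytes when no unit is given) with the figure kdump reports in kilobytes; lv_is_big_enough is a hypothetical helper.

```shell
# Sketch: compare a logical volume size in MB with a requirement in KB.
lv_is_big_enough() {
    lv_mb="$1"
    required_kb="$2"
    lv_kb=$((lv_mb * 1024))
    echo "crashlv: ${lv_kb} KB, required: more than ${required_kb} KB"
    # Succeeds only when the volume is larger than the requirement
    [ "$lv_kb" -gt "$required_kb" ]
}

if lv_is_big_enough 1000 8044268; then
    echo "large enough for a full dump"
else
    echo "too small for a full dump"
fi
```

With these numbers the volume is 1024000 KB against a requirement of more than 8044268 KB, so a full dump cannot fit and the filtered, compressed dump from makedumpfile is what makes the 1000 MB volume workable.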

#### end of document #####
