Introduction to Core Dumps - 炽翼铁冰's Blog

kokogood 2011-04-18

Posted by 炽翼铁冰 @ 2009-12-16 23:21 in Linux (reposted)

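Article 1: An Introduction to Core Dumps

When developing or using programs on Linux, nothing is more frustrating than a program that crashes for no apparent reason. The rest of the system is usually unaffected, but for users, and especially for developers, such crashes are hard to tolerate. Fortunately, the operating system can save the state of a program at the moment it crashes, which gives us something to examine while debugging.
1. What is a core dump?
"Core" comes from core memory, a kind of memory built from many small doughnut-shaped rings of magnetic material; the name has simply stuck. "Dump" means to unload the contents in bulk. When a program hits a fatal error and terminates abnormally, the operating system writes the program's memory image at that moment to a core file. This is called a core dump.
2. How do I enable core dumps?
Some systems do not enable core dumps by default; run ulimit -c unlimited to turn them on. The core file is normally written to the program's current working directory and named core.PID (this varies between systems; consult your system's manual to change the path and file name).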
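3. Using a core dump
First compile with gcc's -g option so that the executable carries debugging information. Suppose the executable is named ex and is run as ./ex. If it crashes, a core file is produced, say core.1568. Run gdb ex core.1568 to load it into gdb, then use the where command to see where the program died.
4. An example
Suppose main.c contains: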

#include <stdio.h>

/* Integer division; crashes when j is 0. */
int div(int i, int j)
{
    return i / j;
}

int main(void)
{
    int i = 2;
    int j = 0;

    printf("%d ", div(i, j));
    return 0;
}

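The code obviously divides by zero. Compile it with gcc -g main.c -o main and run ./main; the program is bound to crash. Then run gdb main followed by the name of the core file to analyze it, and gdb shows where the program died. This approach tracks down a large share of the bugs that crash a program.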
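5. Miscellaneous: looking up signal names and numbers with kill
Sometimes a core dump reports only a signal number. The kill command can translate it:
kill -l          list all signal names and numbers
kill -l val      print the name of the signal whose number is val
kill -l signame  print the number of the signal named signame
This article has covered the basics of working with core dumps, with an example; it should be of some help to programmers who are new to Linux development.

Note: in this case kill -l 8 prints FPE,
and man 7 signal in a terminal shows: SIGFPE        8       Core    Floating point exception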

Appendix:

Standard Signals
Linux supports the standard signals listed below. Several signal numbers are architecture dependent, as indicated in
the "Value" column. (Where three values are given, the first one is usually valid for alpha and sparc, the middle one
for i386, ppc and sh, and the last one for mips. A - denotes that a signal is absent on the corresponding architecture.)

First the signals described in the original POSIX.1-1990 standard.

Signal     Value      Action   Comment
-------------------------------------------------------------------------
SIGHUP        1        Term    Hangup detected on controlling terminal
                               or death of controlling process
SIGINT        2        Term    Interrupt from keyboard
SIGQUIT       3        Core    Quit from keyboard
SIGILL        4        Core    Illegal Instruction
SIGABRT       6        Core    Abort signal from abort(3)
SIGFPE        8        Core    Floating point exception
SIGKILL       9        Term    Kill signal
SIGSEGV      11        Core    Invalid memory reference
SIGPIPE      13        Term    Broken pipe: write to pipe with no readers
SIGALRM      14        Term    Timer signal from alarm(2)
SIGTERM      15        Term    Termination signal
SIGUSR1   30,10,16     Term    User-defined signal 1
SIGUSR2   31,12,17     Term    User-defined signal 2
SIGCHLD   20,17,18     Ign     Child stopped or terminated
SIGCONT   19,18,25     Cont    Continue if stopped
SIGSTOP   17,19,23     Stop    Stop process
SIGTSTP   18,20,24     Stop    Stop typed at tty

………………

Article 2:
Generating and Debugging Core Files on Linux

First, the machine I am using:

$ uname -a
Linux dev 2.4.21-9.30AXsmp #1 SMP Wed May 26 23:37:09 EDT 2004 i686 i686 i386 GNU/Linux

Next, look at the default limits. Note that core file size is 0, so no core file is produced when a program crashes.

$ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) 4
max memory size (kbytes, -m) unlimited
open files (-n) 2048
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 7168
virtual memory (kbytes, -v) unlimited

Now write a simple program and see whether a core file gets produced.

$ more foo.c

#include <stdio.h>

static void sub(void);

int main(void)
{
    sub();
    return 0;
}

static void sub(void)
{
    int *p = NULL;

    /* dereference a null pointer, expect core dump. */
    printf("%d", *p);
}

$ gcc -Wall -g foo.c
$ ./a.out
Segmentation fault

$ ls -l core.*
ls: core.*: No such file or directory

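No core file was found. Let's change the ulimit setting so that one is produced. The value 1024 is arbitrary; a core file larger than 1024 blocks cannot be written in full.

$ ulimit -c 1024    (reposter's note: use -c unlimited to place no limit on the core file size)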

$ ulimit -a
core file size (blocks, -c) 1024
data seg size (kbytes, -d) unlimited
file size (blocks, -f) unlimited
max locked memory (kbytes, -l) 4
max memory size (kbytes, -m) unlimited
open files (-n) 2048
pipe size (512 bytes, -p) 8
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 7168
virtual memory (kbytes, -v) unlimited

$ ./a.out
Segmentation fault (core dumped)
$ ls -l core.*
-rw------- 1 uniware uniware 53248 Jun 30 17:10 core.9128

Note the extra (core dumped) in the output. A core file was indeed produced; 9128 is the PID of the process. Let's examine this core file with GDB.

$ gdb --core=core.9128
GNU gdb Asianux (6.0post-0.20040223.17.1AX)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-asianux-linux-gnu".
Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
#0 0x08048373 in ?? ()
(gdb) bt
#0 0x08048373 in ?? ()
#1 0xbfffd8f8 in ?? ()
#2 0x0804839e in ?? ()
#3 0xb74cc6b3 in ?? ()
#4 0x00000000 in ?? ()

At this point bt does not show a usable backtrace (the call stack), because GDB does not yet know where the symbol information is. Let's tell it:

(gdb) file ./a.out
Reading symbols from ./a.out...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) bt
#0 0x08048373 in sub () at foo.c:17
#1 0x08048359 in main () at foo.c:8

Now the backtrace is readable.

(gdb) l
8 sub();
9 return 0;
10 }
11
12 static void sub(void)
13 {
14 int *p = NULL;
15
16 /* dereference a null pointer, expect core dump. */
17 printf("%d", *p);

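1. Enabling core dumps

ulimit -c core_file_size_in_kb

To disable core dumps, set core_file_size_in_kb to 0.

2. Setting the core file directory and naming rule

The naming rule is stored in the file

/proc/sys/kernel/core_name_format

and can be changed with sysctl -w "kernel.core_name_format=/coredump/%N.core"

With this setting, core files are written to the /coredump directory and named after the process, with a .core suffix. (This file is not present on every kernel; mainline Linux kernels provide /proc/sys/kernel/core_pattern instead, with lowercase specifiers such as %p, %u, %e and %h.)

The format specifiers are:

    %P   The Process ID (current->pid)
    %U   The UID of the process (current->uid)
    %N   The command name of the process (current->comm)
    %H   The nodename of the system (system_utsname.nodename)
    %%   A "%"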

3. Analyzing a core file

The program is as follows:

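#include <stdio.h>

int main(void)
{
    int i = 0;
    int j = 5;
    int tmp;

    /* j counts down from 5 and reaches 0 when i == 5 */
    for (; i < 10; i++, j--)
    {
        tmp = i / j;    /* integer division by zero on the sixth iteration */
        printf("%d/%d=%d\n", i, j, tmp);
    }

    return 0;
}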

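When i reaches 5, j is 0 and the program dies with an arithmetic error because the divisor is zero. The kernel delivers SIGFPE, which is reported as "Floating point exception" even though this is an integer division.

Compile the program:

gcc -g main.c -o eg

./eg

After the core dump, if the core file is core.2098, run the following command:

gdb eg core.2098

gdb then shows the state of the program at the time of the crash; the output is not reproduced here.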

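4. Miscellaneous

kill -l

lists all signal names and numbers.

kill -l val

prints the name of the signal whose number is val.

kill -l signame

prints the number of the signal named signame.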

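Appendix A. Generating a Core File on IBM AIX (source: IBM cn)

Document #: 1311993F06001

Environment (product, platform, model, software version, etc.):
Platform: RS
Software version: AIX 4.3 or later

Problem description:
How can a user generate a complete core file for an application process for analysis?

Answer:
1. Prerequisites

Before generating the core file, configure the system parameters so that the system is able to produce a complete core file. The file system must also have enough free space to hold it. Core files are normally stored in the home directory of the user who owns the process.

2. When a complete core file is needed

By default a process does not produce a complete core file. To trace and debug data in an application's shared memory segments, and especially data on thread stacks, a complete core dump file is needed.

3. To have complete core files generated, first run the following command as root:

# chdev -l sys0 -a fullcore=true

The same change can be made through smitty: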

smitty --> System Environments --> Change/ Show Characteristics of Operating System

Change / Show Characteristics of Operating System
  Maximum number of PROCESSES allowed per user          [128]
  Maximum number of pages in block I/O BUFFER CACHE     [20]
  Maximum Kbytes of real memory allowed for MBUFS       [0]
  Automatically REBOOT system after a crash             false
  Continuously maintain DISK I/O history                false
  HIGH water mark for pending write I/Os per file       [33]
  LOW water mark for pending write I/Os per file        [24]
  Amount of usable physical memory in Kbytes            262144
  State of system keylock at boot time                  normal
  Enable full CORE dump                                 true
  Use pre-430 style CORE dump                           false
  Enable CPU Guard                                      disable

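Set the "Enable full CORE dump" item in the list above to "true".

4. Generate a core file with the following command, where <pid> is the ID of the target process:
# kill -11 <pid>
Note: this command also kills the specified process.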

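Appendix B. How to Examine a Core Dump Produced by a C Program (source: IBM cn)
Document #: 1317181000005

Environment

Product: IBM C and C++ Compiler
Platform: AIX 4.3.0 or later
Version: C for AIX version 3 or later

Problem description

How can I examine the core dump produced by a C program and find the root cause of the problem?

Answer

On AIX, a core dump produced by a C program can be examined with dbx, a command provided by the operating system. If the command is missing, install the operating system fileset bos.adt.debug.
To use dbx:
1. Compile the C program with the -g option:
cc -g -o samp1.o samp1.c
2. After the program has run and produced the core dump file core, run dbx samp1.o core to examine it.
To find out which function or statement caused the problem, type where at the dbx prompt. dbx offers many other commands for inspecting the dump; use the help command to learn about them.

Related link: http://www./gnu/linux/core.html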

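Core Dump?!

Compiled by: Wilbur Lang

What is core?

Before semiconductors were used for memory, memory was built from small magnetic rings (An Wang is credited with the invention). The rings were called cores, and memory built from them was called core memory. The semiconductor industry has long since taken over and nobody uses core memory any more, but in many contexts people still refer to main memory as core.
What is a core dump?

When developing (or using) a program, what we fear most is a crash with no apparent cause. The system itself may be fine, but we may well run into the same problem again. So the operating system dumps the program's memory contents at the moment of the crash (nowadays usually into a file named core) for us, or for a debugger, to consult. This action is called a core dump.
Why does a core dump happen?

As mentioned above, it happens when a program crashes on an error. In C and C++ the most common source of such errors is bad pointers. You can use the core file and a debugger to track the error down (how do you use a core file in a debugger? See man gdb).
Can I delete core files?

If you will not, cannot, or need not fix the program, go ahead and delete them.
How do I stop core files from appearing?

If you use tcsh, try adding this line to .tcshrc:
limit coredumpsize 0
If you use bash, add (or modify) this line in /etc/profile:
ulimit -c 0

Here is a trick that shows what core files are best at :)

Run gdb -c core and type where. It shows which line the program crashed on, which function it was in at the time, which function called that function, which function called that one in turn, and so on all the way up to main().

This information alone is enough to find fifty or sixty percent of bugs. It works every time.

There is one precondition: debug information must be enabled when you compile. Otherwise you will see a pile of unintelligible output instead of the source code you were hoping for.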