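This article is a real-world case study of a complex server crash that 羽千叶 ran into at work. 羽千叶 says the 奔跑吧 crash and black-screen series was both inspiring and very helpful in solving this kind of problem.
I. Environment
OS: CentOS 7.7
kernel: 3.10.0-1062.7.1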
II. Crash information
[482661.362612] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[482661.367822] IP: [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.368337] PGD 0
[482661.368522] Oops: 0000 [#1] SMP
[482661.368806] Modules linked in: veth vxlan ip6_udp_tunnel udp_tunnel ip6table_nat
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables nf_conntrack_netlink xt_conntrack
[482661.376311] CPU: 1 PID: 29670 Comm: runc:[2:INIT] Kdump: loaded Tainted: G------------ T 3.10.0-1062.7.1.el7.x86_64 #1
[482661.377199] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[482661.377637] task: ffff96808d678000 ti: ffff968091738000 task.ti: ffff968091738000
[482661.378196] RIP: 0010:[<ffffffffba2e5b49>] [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.378863] RSP: 0018:ffff96809173be30 EFLAGS: 00010006
[482661.379270] RAX: 0000000000000002 RBX: ffff96809025af00 RCX: 0000000000000000
[482661.379806] RDX: 0000000000000002 RSI: ffff9680406f9070 RDI: ffff96809fc9ad00
[482661.380340] RBP: ffff96809173be68 R08: ffffffffbaa1e3c0 R09: 0000000000000000
[482661.380875] R10: 000000000000b8ff R11: f448000000000000 R12: 0000000000000000
[482661.381411] R13: ffff96808d678000 R14: ffff96809fc9ac80 R15: 0000000000000001
[482661.381946] FS: 00007f9ca6c4d740(0000) GS:ffff96809fc80000(0000) knlGS:0000000000000000
[482661.382550] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[482661.382983] CR2: 0000000000000070 CR3: 000000020f69c000 CR4: 00000000003606e0
[482661.383522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[482661.384054] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[482661.384594] Call Trace:
[482661.384806] [<ffffffffba2d7782>] check_preempt_curr+0x92/0xa0
[482661.385250] [<ffffffffba2dad24>] wake_up_new_task+0x104/0x1a0
[482661.385700] [<ffffffffba29a9f1>] do_fork+0xf1/0x330
[482661.386095] [<ffffffffba988a26>] ? trace_do_page_fault+0x56/0x150
[482661.386565] [<ffffffffba29acb6>] SyS_clone+0x16/0x20
[482661.386950] [<ffffffffba98e2b4>] stub_clone+0x44/0x70
[482661.387343] [<ffffffffba98dede>] ? system_call_fastpath+0x25/0x2a
[482661.387791] Code: 00 00 83 e8 01 48 8b 5b 68 39 d0 75 f5 49 8b 7c 24 70 48 3b 7b 70 74 1e 66 2e 0f 1f 84 00 00 00 00 00 4d 8b 64 24 68 48 8b 5b 68 <49> 8b 7c 24 70 48 3b 7b 70 75 ec 48 85 ff 74 e7 89 4d d0 e8 9f
[482661.389736] RIP [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.390208] RSP <ffff96809173be30>
[482661.390475] CR2: 0000000000000070
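The faulting location is reported on the second line (the IP: line): offset 0xe9 into check_preempt_wakeup.
In the Code: line, <49> marks the first byte of the faulting instruction's machine code.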
crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
crash> rd -8 0xffffffffba2e5b49 5
ffffffffba2e5b49: 49 8b 7c 24 70
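The match between the Code: line and the rd output can also be checked mechanically. The following is a minimal sketch, not part of the original analysis: the helper name faulting_bytes is made up for illustration, and the input string is the Code: line copied from the Oops above.

```python
import re

# Code: line copied from the Oops above; <49> marks the byte at RIP.
CODE_LINE = (
    "Code: 00 00 83 e8 01 48 8b 5b 68 39 d0 75 f5 49 8b 7c 24 70 48 3b 7b 70 "
    "74 1e 66 2e 0f 1f 84 00 00 00 00 00 4d 8b 64 24 68 48 8b 5b 68 <49> 8b 7c "
    "24 70 48 3b 7b 70 75 ec 48 85 ff 74 e7 89 4d d0 e8 9f"
)

def faulting_bytes(code_line, count=5):
    """Return (bytes_before_rip, first `count` bytes at RIP) from a Code: line."""
    tokens = code_line.split()
    if tokens and tokens[0] == "Code:":
        tokens = tokens[1:]
    for index, token in enumerate(tokens):
        match = re.fullmatch(r"<([0-9a-fA-F]{2})>", token)
        if match:
            # The marked byte plus the bytes that follow it.
            window = [match.group(1)] + tokens[index + 1:index + count]
            return index, [int(byte, 16) for byte in window]
    raise ValueError("no <xx> marker in Code: line")

before, insn = faulting_bytes(CODE_LINE)
print(before, " ".join(f"{b:02x}" for b in insn))
```

The five bytes starting at the marker are the same bytes that rd shows at 0xffffffffba2e5b49. The last byte, 0x70, is the 8-bit displacement of the mov 0x70(%r12),%rdi instruction.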
III. Debugging
Download the debuginfo packages kernel-debuginfo-3.10.0-1062.7.1.el7.x86_64.rpm and kernel-debuginfo-common-x86_64-3.10.0-1062.7.1.el7.x86_64.rpm.
Unpack the kernel-debuginfo package and start crash against the vmcore:
rpm2cpio kernel-debuginfo-3.10.0-1062.7.1.el7.x86_64.rpm | cpio -div;
crash ./usr/lib/debug/lib/modules/3.10.0-1062.7.1.el7.x86_64/vmlinux vmcore
1. View the call stack
crash> bt
PID: 29670 TASK: ffff96808d678000 CPU: 1 COMMAND: 'runc:[2:INIT]'
#0 [ffff96809173ba90] machine_kexec at ffffffffba265b24
#1 [ffff96809173baf0] __crash_kexec at ffffffffba322422
#2 [ffff96809173bbc0] crash_kexec at ffffffffba322510
#3 [ffff96809173bbd8] oops_end at ffffffffba985798
#4 [ffff96809173bc00] no_context at ffffffffba275bb4
#5 [ffff96809173bc50] __bad_area_nosemaphore at ffffffffba275e82
#6 [ffff96809173bca0] bad_area_nosemaphore at ffffffffba275fa4
#7 [ffff96809173bcb0] __do_page_fault at ffffffffba988750
#8 [ffff96809173bd20] trace_do_page_fault at ffffffffba988a26
#9 [ffff96809173bd60] do_async_page_fault at ffffffffba987fa2
#10 [ffff96809173bd80] async_page_fault at ffffffffba9847a8
[exception RIP: check_preempt_wakeup+233]
RIP: ffffffffba2e5b49 RSP: ffff96809173be30 RFLAGS: 00010006
RAX: 0000000000000002 RBX: ffff96809025af00 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffff9680406f9070 RDI: ffff96809fc9ad00
RBP: ffff96809173be68 R8: ffffffffbaa1e3c0 R9: 0000000000000000
R10: 000000000000b8ff R11: f448000000000000 R12: 0000000000000000
R13: ffff96808d678000 R14: ffff96809fc9ac80 R15: 0000000000000001
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#11 [ffff96809173be70] check_preempt_curr at ffffffffba2d7782
#12 [ffff96809173be88] wake_up_new_task at ffffffffba2dad24
#13 [ffff96809173bec0] do_fork at ffffffffba29a9f1
#14 [ffff96809173bf38] sys_clone at ffffffffba29acb6
#15 [ffff96809173bf48] stub_clone at ffffffffba98e2b4
#16 [ffff96809173bf50] system_call_fastpath at ffffffffba98dede
RIP: 00007f9ca630e851 RSP: 00007ffe97b8c2e8 RFLAGS: 00000202
RAX: 0000000000000038 RBX: 00007f9ca2ffc700 RCX: ffffffffffffffff
RDX: 00007f9ca2ffc9d0 RSI: 00007f9ca2ffbfb0 RDI: 00000000003d0f00
RBP: 00007ffe97b8c410 R8: 00007f9ca2ffc700 R9: 00007f9ca2ffc700
R10: 00007f9ca2ffc9d0 R11: 0000000000000202 R12: 0000000000000000
R13: 0000000000801000 R14: 0000000000000000 R15: 00007f9ca2ffc700
ORIG_RAX: 0000000000000038 CS: 0033 SS: 002b
2. Disassemble the faulting location
crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
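The instruction loads the contents at address r12+0x70 into rdi. At the time of the fault r12 was 0, so the access went to address 0x70 (the value reported in CR2), which is not a valid address, and the kernel crashed.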
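As a quick sanity check, the address computed from the operand can be compared with CR2. This is only an illustrative sketch; the function name effective_address is an assumption, and the register values are taken from the Oops above.

```python
def effective_address(base, displacement, bits=64):
    """Address accessed by an AT&T operand of the form disp(%base)."""
    return (base + displacement) & ((1 << bits) - 1)

R12 = 0x0000000000000000   # from the register dump at crash time
CR2 = 0x0000000000000070   # faulting address reported by the Oops

fault_addr = effective_address(R12, 0x70)
print(hex(fault_addr), fault_addr == CR2)
```

If the computed address did not match CR2, the register dump and the disassembled instruction would not belong together, which would point to a wrong vmlinux or a wrong offset.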
3. Find the source line for the faulting location
crash> dis check_preempt_wakeup+233 -l
/usr/src/debug/kernel-3.10.0-1062.7.1.el7/linux-3.10.0-1062.7.1.el7.x86_64/kernel/sched/fair.c: 343
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
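crash reports line 343 of the source file, but that is not actually the failing line; the location obtained this way is not precise. The next section uses two approaches to find the exact location.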

IV. Finding the exact faulting location
1. By analyzing the source code
The call chain is: do_fork ----> wake_up_new_task ----> check_preempt_curr ----> check_preempt_wakeup
wake_up_new_task calls check_preempt_curr as follows:
void wake_up_new_task(struct task_struct *p)
{
……
check_preempt_curr(rq, p, WF_FORK);
……
}
void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
……
rq->curr->sched_class->check_preempt_curr(rq, p, flags);
……
}
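Here the check_preempt_curr callback is check_preempt_wakeup, the implementation used by the CFS (fair) scheduling class.
check_preempt_wakeup takes three parameters, which check_preempt_curr passes through the call rq->curr->sched_class->check_preempt_curr.
On x86_64, arguments are passed in registers: %rdi, %rsi, %rdx, %rcx, %r8 and %r9 hold the 1st through 6th integer or pointer arguments in that order; any further arguments are passed on the stack.
From the call chain above, the third argument is wake_flags=WF_FORK.
Next we derive the first and second arguments from the assembly. The first argument, rq, is passed in %rdi; the second is passed in %rsi.
If we know the values of these two registers at function entry, we know the argument values.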
static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
……
}
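The register assignment described above can be written down as a small lookup. This is a sketch for illustration only; the helper name arg_location is made up, and it covers integer and pointer arguments only (floating-point arguments use different registers).

```python
# Integer/pointer argument registers on x86_64 Linux, in order.
INTEGER_ARG_REGS = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")

def arg_location(index):
    """Where the index-th (0-based) integer or pointer argument is passed."""
    if index < len(INTEGER_ARG_REGS):
        return "%" + INTEGER_ARG_REGS[index]
    return "stack"

# Parameters of check_preempt_wakeup, in declaration order.
PARAMS = ("struct rq *rq", "struct task_struct *p", "int wake_flags")

for position, param in enumerate(PARAMS):
    print(f"{param:<24} -> {arg_location(position)}")
```

For check_preempt_wakeup this gives rq in %rdi, p in %rsi and wake_flags in %rdx, which is the starting point for the derivation that follows.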
1.1 Deriving the function arguments from the assembly
The assembly for check_preempt_wakeup is as follows:
0xffffffffba2e5a60 <check_preempt_wakeup>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffba2e5a65 <check_preempt_wakeup+5>: push %rbp
0xffffffffba2e5a66 <check_preempt_wakeup+6>: mov %rsp,%rbp
0xffffffffba2e5a69 <check_preempt_wakeup+9>: push %r15
0xffffffffba2e5a6b <check_preempt_wakeup+11>: push %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>: mov %rdi,%r14
0xffffffffba2e5a70 <check_preempt_wakeup+16>: push %r13
0xffffffffba2e5a72 <check_preempt_wakeup+18>: push %r12
0xffffffffba2e5a74 <check_preempt_wakeup+20>: push %rbx
0xffffffffba2e5a75 <check_preempt_wakeup+21>: lea 0x68(%rsi),%rbx
0xffffffffba2e5a79 <check_preempt_wakeup+25>: sub $0x10,%rsp
……
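1.2 Deriving the first argument
The first argument is passed in %rdi. At offset 13 in the assembly, %rdi is copied into %r14,
and from that assignment until the crash %r14 is never modified,
so %r14 still holds the first argument as it was at function entry.
crash> dis check_preempt_wakeup | grep r14 //show only the instructions that reference r14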
0xffffffffba2e5a6b <check_preempt_wakeup+11>: push %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>: mov %rdi,%r14
0xffffffffba2e5b9e <check_preempt_wakeup+318>: mov %r14,%rdi
0xffffffffba2e5bb0 <check_preempt_wakeup+336>: cmp 0x8b0(%r14),%r13
0xffffffffba2e5c09 <check_preempt_wakeup+425>: pop %r14
0xffffffffba2e5c27 <check_preempt_wakeup+455>: pop %r14
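The register dump printed at crash time gives %r14=ffff96809fc9ac80, so the first argument at function entry is rq=ffff96809fc9ac80.
1.3 Deriving the second argument
The second argument is passed in %rsi. Filtering for the instructions that reference %rsi shows that, from entry into check_preempt_wakeup until the crash,
the value of the %rsi register never changes, so at crash time %rsi still holds the second argument. %rsi=ffff9680406f9070, so the second argument at function entry is
p=ffff9680406f9070.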
crash> dis check_preempt_wakeup | grep rsi
0xffffffffba2e5a75 <check_preempt_wakeup+21>: lea 0x68(%rsi),%rbx
0xffffffffba2e5aa1 <check_preempt_wakeup+65>: mov 0xd8(%rsi),%rdi
0xffffffffba2e5adb <check_preempt_wakeup+123>: mov 0x188(%rsi),%r9d
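0xffffffffba2e5af7 <check_preempt_wakeup+151>: mov 0x110(%rsi),%eax ----->this instruction and the ones above precede the faulting instruction and can execute; none of them modifies rsi
0xffffffffba2e5c47 <check_preempt_wakeup+487>: mov %rsi,-0x30(%rbp) ----->from this instruction on, the code lies after the faulting instruction and is not reached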
0xffffffffba2e5c55 <check_preempt_wakeup+501>: mov -0x30(%rbp),%rsi
0xffffffffba2e5c6b <check_preempt_wakeup+523>: cmpl $0x5,0x188(%rsi)
From the derivation above, the three arguments of check_preempt_wakeup are:
struct rq *rq=ffff96809fc9ac80;
struct task_struct *p=ffff9680406f9070;
int wake_flags=WF_FORK;
If the argument values cannot be derived within check_preempt_wakeup itself, they can be inferred further up, from the caller's code.
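The "filter the disassembly and look for writes" step used in 1.2 and 1.3 can be approximated with a short script. This is a rough sketch under several assumptions: the helper writes_before and the NO_WRITE list are made up for illustration, only the last AT&T operand is treated as a destination, sub-registers such as %esi are ignored, and instruction offset is used as a stand-in for execution order, which is not exact when the code jumps around. A call instruction would also clobber caller-saved registers such as %rsi; that is not modelled here.

```python
import re

# Matches "<symbol+offset>: mnemonic operands" in crash `dis` output.
LINE_RE = re.compile(r"<\w+\+(\d+)>:\s+(\S+)\s*(.*)")
# Mnemonics that never write their last operand (incomplete, illustrative list).
NO_WRITE = {"push", "cmp", "cmpl", "cmpq", "test", "jmpq", "callq"}

def writes_before(reg, dis_lines, fault_offset):
    """Offsets below fault_offset whose destination operand is %reg."""
    hits = []
    for line in dis_lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        offset, mnemonic, operands = int(match.group(1)), match.group(2), match.group(3)
        if offset >= fault_offset or mnemonic in NO_WRITE:
            continue
        destination = operands.split(",")[-1].strip()
        if destination == "%" + reg:
            hits.append(offset)
    return hits

# grep output copied from the crash sessions above.
R14_LINES = [
    "0xffffffffba2e5a6b <check_preempt_wakeup+11>: push %r14",
    "0xffffffffba2e5a6d <check_preempt_wakeup+13>: mov %rdi,%r14",
    "0xffffffffba2e5b9e <check_preempt_wakeup+318>: mov %r14,%rdi",
    "0xffffffffba2e5bb0 <check_preempt_wakeup+336>: cmp 0x8b0(%r14),%r13",
    "0xffffffffba2e5c09 <check_preempt_wakeup+425>: pop %r14",
    "0xffffffffba2e5c27 <check_preempt_wakeup+455>: pop %r14",
]
RSI_LINES = [
    "0xffffffffba2e5a75 <check_preempt_wakeup+21>: lea 0x68(%rsi),%rbx",
    "0xffffffffba2e5aa1 <check_preempt_wakeup+65>: mov 0xd8(%rsi),%rdi",
    "0xffffffffba2e5adb <check_preempt_wakeup+123>: mov 0x188(%rsi),%r9d",
    "0xffffffffba2e5af7 <check_preempt_wakeup+151>: mov 0x110(%rsi),%eax",
    "0xffffffffba2e5c47 <check_preempt_wakeup+487>: mov %rsi,-0x30(%rbp)",
    "0xffffffffba2e5c55 <check_preempt_wakeup+501>: mov -0x30(%rbp),%rsi",
    "0xffffffffba2e5c6b <check_preempt_wakeup+523>: cmpl $0x5,0x188(%rsi)",
]

print("r14 written at offsets:", writes_before("r14", R14_LINES, 233))
print("rsi written at offsets:", writes_before("rsi", RSI_LINES, 233))
```

For %r14 the only write before offset 233 is the save of %rdi at offset 13, and for %rsi there is none, which agrees with the manual reading of the grep output.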
1.4 Analyzing check_preempt_wakeup
struct rq *rq=ffff96809fc9ac80, struct task_struct *p=ffff9680406f9070, int wake_flags=WF_FORK;
static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
struct task_struct *curr = rq->curr; //crash> struct rq.curr ffff96809fc9ac80
//------>curr = 0xffff96808d678000
struct sched_entity *se = &curr->se, *pse = &p->se; //crash> struct task_struct -x -o | grep sched_entity
//[0x68] struct sched_entity se;
//------>se=0xffff96808d678068, pse=0xffff9680406f90d8
struct cfs_rq *cfs_rq = task_cfs_rq(curr); // curr->se.cfs_rq
//crash> struct task_struct.se.cfs_rq 0xffff96808d678000
//se.cfs_rq = 0xffff967f41400400
//------>cfs_rq=0xffff967f41400400
int scale = cfs_rq->nr_running >= sched_nr_latency; //crash> p sched_nr_latency crash> struct cfs_rq.nr_running 0xffff967f41400400
//sched_nr_latency = $1 = 2 nr_running = 2
//------>scale=(2>=2)=1;
int next_buddy_marked = 0;
if (unlikely(se == pse)) //se=0xffff96808d678068 and pse=0xffff9680406f90d8 differ, so no return here
return;
if (unlikely(throttled_hierarchy(cfs_rq_of(pse)))) //cfs_rq_of(pse): pse->cfs_rq
return; //crash> struct sched_entity.cfs_rq 0xffff9680406f90d8 crash> struct cfs_rq.throttle_count 0xffff967f41400400
//cfs_rq = 0xffff967f41400400 throttle_count = 0
//so this condition is false as well; no return
if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK)) { //wake_flags=WF_FORK, so !(wake_flags & WF_FORK) is false and the body is skipped
set_next_buddy(pse);
next_buddy_marked = 1;
}
if (test_tsk_need_resched(curr)) //crash> struct task_struct.stack 0xffff96808d678000 crash> struct thread_info.flags 0xffff968091738000
return; //stack = 0xffff968091738000 flags = 128
//TIF_NEED_RESCHED (bit 3, 0x8) is not set in thread_info->flags (128 = 0x80), so the condition is false; no return
if (unlikely(curr->policy == SCHED_IDLE) && //crash> struct task_struct.policy 0xffff96808d678000
likely(p->policy != SCHED_IDLE)) //policy = 0
goto preempt; //curr->policy=0 is not SCHED_IDLE=5, so the condition is false; no jump
if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION)) //this condition is false as well; no return
return;
find_matching_se(&se, &pse); //------------>execution enters find_matching_se
update_curr(cfs_rq_of(se));
BUG_ON(!pse);
if (wakeup_preempt_entity(se, pse) == 1) {
if (!next_buddy_marked)
set_next_buddy(pse);
goto preempt;
}
return;
preempt:
resched_curr(rq);
if (unlikely(!se->on_rq || curr == rq->idle))
return;
if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se))
set_last_buddy(se);
}
static void find_matching_se(struct sched_entity **se, struct sched_entity **pse)
{
int se_depth, pse_depth;
se_depth = (*se)->depth; //crash> struct sched_entity.depth 0xffff96808d678068
//depth = 3
//------>se_depth=3
pse_depth = (*pse)->depth; //crash> struct sched_entity.depth 0xffff9680406f90d8
//depth = 2
//------>pse_depth=2
while (se_depth > pse_depth) { //se_depth=3, pse_depth=2: the loop body runs exactly once
se_depth--; //se_depth=3-1=2
*se = parent_entity(*se); //crash> struct sched_entity.parent 0xffff96808d678068
} //parent = 0xffff968095227900
//------->*se=0xffff968095227900
while (pse_depth > se_depth) { //pse_depth=2 and se_depth=2 are equal, so the condition is false and the loop body is skipped
pse_depth--;
*pse = parent_entity(*pse);
}
/*
1. Round 1: *se=0xffff968095227900, *pse=0xffff9680406f90d8
crash> struct sched_entity.cfs_rq 0xffff968095227900 crash> struct sched_entity.cfs_rq 0xffff9680406f90d8
cfs_rq = 0xffff96804d2cfc00 cfs_rq = 0xffff967f41400400
(*se)->cfs_rq=0xffff96804d2cfc00 and (*pse)->cfs_rq=0xffff967f41400400 differ, so both move up to sched_entity->parent
crash> struct sched_entity.parent 0xffff968095227900 crash> struct sched_entity.parent 0xffff9680406f90d8
parent = 0xffff967f2e069300 parent = 0xffff968095227900
*se=0xffff967f2e069300 *pse=0xffff968095227900
2. Round 2: *se=0xffff967f2e069300, *pse=0xffff968095227900
crash> struct sched_entity.cfs_rq 0xffff967f2e069300 crash> struct sched_entity.cfs_rq 0xffff968095227900
cfs_rq = 0xffff967f540ee600 cfs_rq = 0xffff96804d2cfc00
(*se)->cfs_rq=0xffff967f540ee600 and (*pse)->cfs_rq=0xffff96804d2cfc00 differ, so both move up to sched_entity->parent
crash> struct sched_entity.parent 0xffff967f2e069300 crash> struct sched_entity.parent 0xffff968095227900
parent = 0xffff96809025af00 parent = 0xffff967f2e069300
*se=0xffff96809025af00 *pse=0xffff967f2e069300
3. Round 3: *se=0xffff96809025af00, *pse=0xffff967f2e069300
crash> struct sched_entity.cfs_rq 0xffff96809025af00 crash> struct sched_entity.cfs_rq 0xffff967f2e069300
cfs_rq = 0xffff96809fc9ad00 cfs_rq = 0xffff967f540ee600
(*se)->cfs_rq=0xffff96809fc9ad00 and (*pse)->cfs_rq=0xffff967f540ee600 differ, so both move up to sched_entity->parent
crash> struct sched_entity.parent 0xffff96809025af00 crash> struct sched_entity.parent 0xffff967f2e069300
parent = 0x0 parent = 0xffff96809025af00
*se=0x0 *pse=0xffff96809025af00
4. Round 4: *se=0x0, *pse=0xffff96809025af00
*se is 0, so evaluating (*se)->cfs_rq faults; the faulting location is therefore inside is_same_group
*/
while (!is_same_group(*se, *pse)) { //is (*se)->cfs_rq equal to (*pse)->cfs_rq?
*se = parent_entity(*se);
*pse = parent_entity(*pse);
}
}
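cfs_rq is at offset 0x70 in sched_entity. The faulting instruction loads from %r12 plus offset 0x70, that is, it reads sched_entity->cfs_rq, so the source analysis above arrives at the same faulting location.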
crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi
crash> struct sched_entity -x -o | grep cfs_rq
[0x70] struct cfs_rq *cfs_rq;
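The walk through find_matching_se can be replayed in user space with the values read from the vmcore. The following is a simplified model, not kernel code: the ENTITIES table is a hand-built copy of the parent, cfs_rq and depth fields shown in the crash sessions in this article, and the function returns a status where the real kernel would dereference a NULL pointer.

```python
NULL = 0x0

# sched_entity address -> fields read with crash (depth, parent, cfs_rq).
ENTITIES = {
    0xffff96808d678068: {"depth": 3, "parent": 0xffff968095227900, "cfs_rq": 0xffff967f41400400},  # curr->se
    0xffff9680406f90d8: {"depth": 2, "parent": 0xffff968095227900, "cfs_rq": 0xffff967f41400400},  # p->se
    0xffff968095227900: {"depth": 2, "parent": 0xffff967f2e069300, "cfs_rq": 0xffff96804d2cfc00},
    0xffff967f2e069300: {"depth": 1, "parent": 0xffff96809025af00, "cfs_rq": 0xffff967f540ee600},
    0xffff96809025af00: {"depth": 0, "parent": NULL, "cfs_rq": 0xffff96809fc9ad00},
}

def find_matching_se(se, pse):
    """Model of find_matching_se(); returns (status, round, se, pse)."""
    se_depth = ENTITIES[se]["depth"]
    pse_depth = ENTITIES[pse]["depth"]
    # Bring both entities to the same depth first.
    while se_depth > pse_depth:
        se_depth -= 1
        se = ENTITIES[se]["parent"]
    while pse_depth > se_depth:
        pse_depth -= 1
        pse = ENTITIES[pse]["parent"]
    # Then climb in lockstep until both sit on the same cfs_rq.
    rounds = 0
    while True:
        rounds += 1
        if se == NULL or pse == NULL:
            return "fault", rounds, se, pse   # the kernel would oops here
        if ENTITIES[se]["cfs_rq"] == ENTITIES[pse]["cfs_rq"]:
            return "match", rounds, se, pse
        se = ENTITIES[se]["parent"]
        pse = ENTITIES[pse]["parent"]

status, rounds, se, pse = find_matching_se(0xffff96808d678068, 0xffff9680406f90d8)
print(status, rounds, hex(se), hex(pse))
```

With this data the model stops in round 4 with se equal to 0 and pse equal to 0xffff96809025af00, which matches R12 and RBX in the register dump at crash time.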
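1.5 Cause of the fault
The fault is caused by mismatched depth values between the parent and child tasks: as listed below, the child's sched_entity has depth 2 even though it hangs off the same parent entity (0xffff968095227900) as the parent task's entity, whose depth is 3. A related patch exists in mainline; applying it fixes the problem.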
[upstream eeb61e53ea19be0c4015b00b2e8b3b2185436f2b]

Parent task: depth = 0; parent = 0;
depth = 1; parent = 0xffff96809025af00;
depth = 2; parent = 0xffff967f2e069300;
depth = 3; parent = 0xffff968095227900;
Child task: depth = 0; parent = 0;
depth = 1; parent = 0xffff96809025af00;
depth = 2; parent = 0xffff967f2e069300;
depth = 2; parent = 0xffff968095227900;
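The inconsistency in the listing above can be expressed as a simple check: an entity without a parent should have depth 0, and every other entity should be one level deeper than its parent. The sketch below applies that rule to the values from this vmcore; the table and the helper name depth_mismatches are illustrative only.

```python
NULL = 0x0

# sched_entity address -> (depth, parent), as listed above.
ENTITIES = {
    0xffff96809025af00: {"depth": 0, "parent": NULL},
    0xffff967f2e069300: {"depth": 1, "parent": 0xffff96809025af00},
    0xffff968095227900: {"depth": 2, "parent": 0xffff967f2e069300},
    0xffff96808d678068: {"depth": 3, "parent": 0xffff968095227900},  # parent task (curr->se)
    0xffff9680406f90d8: {"depth": 2, "parent": 0xffff968095227900},  # child task (p->se)
}

def depth_mismatches(entities):
    """Return (address, actual depth, expected depth) for inconsistent entities."""
    bad = []
    for address, entity in entities.items():
        parent = entity["parent"]
        expected = 0 if parent == NULL else entities[parent]["depth"] + 1
        if entity["depth"] != expected:
            bad.append((address, entity["depth"], expected))
    return bad

for address, actual, expected in depth_mismatches(ENTITIES):
    print(f"{address:#x}: depth={actual}, expected {expected}")
```

Only the child's entity is flagged: its depth is 2 where 3 is expected. That is why find_matching_se shortens the parent task's chain by one level and the two chains never meet.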
2. By analyzing the assembly
crash> dis check_preempt_wakeup
0xffffffffba2e5a60 <check_preempt_wakeup>: nopl 0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffba2e5a65 <check_preempt_wakeup+5>: push %rbp
0xffffffffba2e5a66 <check_preempt_wakeup+6>: mov %rsp,%rbp
0xffffffffba2e5a69 <check_preempt_wakeup+9>: push %r15
0xffffffffba2e5a6b <check_preempt_wakeup+11>: push %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>: mov %rdi,%r14 #saves the first function argument into %r14
0xffffffffba2e5a70 <check_preempt_wakeup+16>: push %r13
0xffffffffba2e5a72 <check_preempt_wakeup+18>: push %r12
0xffffffffba2e5a74 <check_preempt_wakeup+20>: push %rbx
0xffffffffba2e5a75 <check_preempt_wakeup+21>: lea 0x68(%rsi),%rbx #stores the address %rsi+0x68 into %rbx
#%rsi is the second argument, a struct task_struct pointer, and se is at offset 0x68 in task_struct
#so this instruction corresponds to the source: pse = &p->se
#%rbx = %rsi + 0x68 = 0xffff9680406f9070 + 0x68 = 0xffff9680406f90d8;
0xffffffffba2e5a79 <check_preempt_wakeup+25>: sub $0x10,%rsp
0xffffffffba2e5a7d <check_preempt_wakeup+29>: mov 0x8a8(%rdi),%r13 #%rdi is the first argument, struct rq; offset 0x8a8 in it is the struct task_struct *curr member
#loads rq->curr into %r13
#%r13 = [0xffff96809fc9ac80 + 0x8a8] = 0xffff96808d678000
0xffffffffba2e5a84 <check_preempt_wakeup+36>: mov 0xd8(%r13),%rax #loads the contents at %r13+0xd8 into %rax
#%r13 holds rq->curr, so this reads offset 0xd8 in struct task_struct; the member is hard to identify from that offset alone
#sched_entity is at offset 0x68 in task_struct and cfs_rq is at offset 0x70 in sched_entity; 0xd8=0x68+0x70
#so this instruction reads the cfs_rq of the sched_entity embedded in task_struct; source: struct cfs_rq *cfs_rq = task_cfs_rq(curr);
#%rax therefore holds curr->se.cfs_rq
#%rax = [0xffff96808d678000 + 0xd8] = 0xffff967f41400400
0xffffffffba2e5a8b <check_preempt_wakeup+43>: lea 0x68(%r13),%r12 #%r13 holds rq->curr; the address %r13+0x68 is stored into %r12; source: se = &curr->se
#%r12=%r13+0x68=0xffff96808d678000+0x68=0xffff96808d678068
0xffffffffba2e5a8f <check_preempt_wakeup+47>: cmp %rbx,%r12 #%rbx=0xffff9680406f90d8 ---> pse = &p->se; %r12=0xffff96808d678068 ----> se = &curr->se; the two differ
#source: if (unlikely(se == pse))
0xffffffffba2e5a92 <check_preempt_wakeup+50>: mov 0x10(%rax),%ecx #%rax holds curr->se.cfs_rq; nr_running is at offset 0x10 in cfs_rq; loads nr_running into %ecx
#source: cfs_rq->nr_running=2
#%ecx=2
0xffffffffba2e5a95 <check_preempt_wakeup+53>: mov 0xb76b09(%rip),%eax #%rip is the address of the next instruction, 0xffffffffba2e5a9b; the 8 bytes at [0xffffffffba2e5a9b+0xb76b09] are 0x0032dcd500000002
#%eax takes the low 4 bytes (sched_nr_latency in the source): %eax=2
0xffffffffba2e5a9b <check_preempt_wakeup+59>: je 0xffffffffba2e5c00 <check_preempt_wakeup+416> #%rbx and %r12 differ (see the cmp above), so the jump is not taken
0xffffffffba2e5aa1 <check_preempt_wakeup+65>: mov 0xd8(%rsi),%rdi #%rdi=[%rsi+0xd8]=[%rsi+0x68+0x70]; %rsi is the task_struct base address; the sched_entity member is at offset 0x68
#cfs_rq is at offset 0x70 in sched_entity; this instruction loads the cfs_rq of the task_struct's sched_entity member into %rdi
#source: cfs_rq_of(pse) ----> pse->cfs_rq
#%rdi=0xffff967f41400400
0xffffffffba2e5aa8 <check_preempt_wakeup+72>: jmpq 0xffffffffba2e5c10 <check_preempt_wakeup+432> #jumps to 0xffffffffba2e5c10 ------------->@@@@@@@@@@ 1 @@@@@@@@@
0xffffffffba2e5aad <check_preempt_wakeup+77>: xor %r15d,%r15d #------->@@@@@@@@@@ 2 @@@@@@@@@
#%r15d=0
0xffffffffba2e5ab0 <check_preempt_wakeup+80>: cmp %eax,%ecx #compares %ecx with %eax
0xffffffffba2e5ab2 <check_preempt_wakeup+82>: setae %r15b #sets %r15b to 1 if %ecx >= %eax (unsigned); here %ecx=2 and %eax=2, so %r15b=1
#source: int scale = cfs_rq->nr_running >= sched_nr_latency;
0xffffffffba2e5ab6 <check_preempt_wakeup+86>: nopl 0x0(%rax,%rax,1)
0xffffffffba2e5abb <check_preempt_wakeup+91>: xor %ecx,%ecx #%ecx=0
0xffffffffba2e5abd <check_preempt_wakeup+93>: mov 0x8(%r13),%rax #%rax=[%r13+0x8]; %r13 is rq->curr and offset 0x8 is its stack member; curr->stack=%rax=ffff968091738000
0xffffffffba2e5ac1 <check_preempt_wakeup+97>: mov 0x10(%rax),%rax #task_struct->stack is also the address of thread_info; flags is at offset 0x10 in thread_info; loads flags
#source: test_tsk_need_resched(curr)
#%rax=[%rax+0x10]=[ffff968091738000+0x10]=0000000000000080, i.e. thread_info->flags=0x80
0xffffffffba2e5ac5 <check_preempt_wakeup+101>: test $0x8,%al #tests whether bit 3 of thread_info->flags is set, i.e. whether TIF_NEED_RESCHED is set
#source: test_tsk_need_resched(curr)
0xffffffffba2e5ac7 <check_preempt_wakeup+103>: jne 0xffffffffba2e5c00 <check_preempt_wakeup+416> #the flag is not set, so the jump is not taken
0xffffffffba2e5acd <check_preempt_wakeup+109>: cmpl $0x5,0x188(%r13) #%r13 is rq->curr; policy is at offset 0x188 in task_struct; compares rq->curr->policy with 5
#source: unlikely(curr->policy == SCHED_IDLE)
#the 8 bytes at [%r13+0x188]=[0xffff96808d678000+0x188] are 0000000200000000
#curr->policy is an int (4 bytes), so take the low 4 bytes: curr->policy=0
0xffffffffba2e5ad5 <check_preempt_wakeup+117>: je 0xffffffffba2e5c6b <check_preempt_wakeup+523> #curr->policy == SCHED_IDLE is false, so the jump is not taken
#source: if (unlikely(curr->policy == SCHED_IDLE) &&
0xffffffffba2e5adb <check_preempt_wakeup+123>: mov 0x188(%rsi),%r9d #%rsi is the second argument, struct task_struct *p; loads p->policy; the 8 bytes at [%rsi+0x188]=[0xffff9680406f9070+0x188] are 0x0000000200000000
#p->policy is 4 bytes, so p->policy=0
#%r9d=0
0xffffffffba2e5ae2 <check_preempt_wakeup+130>: test %r9d,%r9d #tests whether %r9d is 0; here it is
0xffffffffba2e5ae5 <check_preempt_wakeup+133>: jne 0xffffffffba2e5c00 <check_preempt_wakeup+416> #jumps if nonzero; not taken here
#source: if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
0xffffffffba2e5aeb <check_preempt_wakeup+139>: nopl 0x0(%rax,%rax,1)
0xffffffffba2e5af0 <check_preempt_wakeup+144>: mov 0x110(%r13),%edx #%r13 holds rq->curr; offset 0x110 is hard to identify directly; 0x110=0x68+0xa8, where 0x68 is the offset of sched_entity in task_struct
#and the depth member is at offset 0xa8 in sched_entity, so offset 0x110 from the task_struct base reads the depth member
#%edx=[%r13+0x110]=3 ----> se_depth=3
#source: se_depth = (*se)->depth;
0xffffffffba2e5af7 <check_preempt_wakeup+151>: mov 0x110(%rsi),%eax #%rsi holds the second argument, struct task_struct *p; loads p->se.depth
#%eax=[%rsi+0x110]=2 ----> pse_depth=2
#source: pse_depth = (*pse)->depth;
0xffffffffba2e5afd <check_preempt_wakeup+157>: cmp %eax,%edx #compares %edx with %eax
0xffffffffba2e5aff <check_preempt_wakeup+159>: jle 0xffffffffba2e5b16 <check_preempt_wakeup+182> #jumps if %edx (3) <= %eax (2); false here, so the jump is not taken
#source: while (se_depth > pse_depth)
0xffffffffba2e5b01 <check_preempt_wakeup+161>: nopl 0x0(%rax)
0xffffffffba2e5b08 <check_preempt_wakeup+168>: sub $0x1,%edx #%edx=%edx - 1 = 2;
#se_depth=2
#source: se_depth--;
0xffffffffba2e5b0b <check_preempt_wakeup+171>: mov 0x68(%r12),%r12 #%r12 holds the address of rq->curr->se; parent is at offset 0x68 in sched_entity; loads se->parent into %r12
#%r12=0xffff968095227900
#source: *se = parent_entity(*se);
0xffffffffba2e5b10 <check_preempt_wakeup+176>: cmp %eax,%edx #now %eax=2 and %edx=2; they are equal
0xffffffffba2e5b12 <check_preempt_wakeup+178>: jne 0xffffffffba2e5b08 <check_preempt_wakeup+168> #jumps if they are not equal; they are equal, so the jump is not taken
0xffffffffba2e5b14 <check_preempt_wakeup+180>: mov %eax,%edx #%edx=%eax=2
0xffffffffba2e5b16 <check_preempt_wakeup+182>: cmp %eax,%edx #compares %edx with %eax
0xffffffffba2e5b18 <check_preempt_wakeup+184>: jge 0xffffffffba2e5b49 <check_preempt_wakeup+233> #jumps if %edx >= %eax; they are equal, so control jumps to 0xffffffffba2e5b49------>@@@@@@@@@@ 3 @@@@@@@@@
0xffffffffba2e5b1a <check_preempt_wakeup+186>: nopw 0x0(%rax,%rax,1)
0xffffffffba2e5b20 <check_preempt_wakeup+192>: sub $0x1,%eax
0xffffffffba2e5b23 <check_preempt_wakeup+195>: mov 0x68(%rbx),%rbx
0xffffffffba2e5b27 <check_preempt_wakeup+199>: cmp %edx,%eax
0xffffffffba2e5b29 <check_preempt_wakeup+201>: jne 0xffffffffba2e5b20 <check_preempt_wakeup+192>
0xffffffffba2e5b2b <check_preempt_wakeup+203>: mov 0x70(%r12),%rdi
0xffffffffba2e5b30 <check_preempt_wakeup+208>: cmp 0x70(%rbx),%rdi
0xffffffffba2e5b34 <check_preempt_wakeup+212>: je 0xffffffffba2e5b54 <check_preempt_wakeup+244>
0xffffffffba2e5b36 <check_preempt_wakeup+214>: nopw %cs:0x0(%rax,%rax,1)
0xffffffffba2e5b40 <check_preempt_wakeup+224>: mov 0x68(%r12),%r12
0xffffffffba2e5b45 <check_preempt_wakeup+229>: mov 0x68(%rbx),%rbx
#------>@@@@@@@@@@ 3 @@@@@@@@@
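0xffffffffba2e5b49 <check_preempt_wakeup+233>: mov 0x70(%r12),%rdi #the fault is reported on this instruction; tracing once from top to bottom, %r12 is nonzero on first arrival here, yet %r12=0 at the time of the fault, which is misleading
#the while loop keeps reassigning %r12 until it finally becomes 0
#%r12 holds a sched_entity address, starting from rq->curr's entity and moving up through its parents; cfs_rq is at offset 0x70 in sched_entity, so this loads the entity's cfs_rq member
#source: if (se->cfs_rq == pse->cfs_rq)
0xffffffffba2e5b4e <check_preempt_wakeup+238>: cmp 0x70(%rbx),%rdi #compares it with pse->cfs_rq
0xffffffffba2e5b52 <check_preempt_wakeup+242>: jne 0xffffffffba2e5b40 <check_preempt_wakeup+224> #se->cfs_rq and pse->cfs_rq differ on every iteration, so control jumps back to 0xffffffffba2e5b40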
0xffffffffba2e5b54 <check_preempt_wakeup+244>: test %rdi,%rdi
0xffffffffba2e5b57 <check_preempt_wakeup+247>: je 0xffffffffba2e5b40 <check_preempt_wakeup+224>
0xffffffffba2e5b59 <check_preempt_wakeup+249>: mov %ecx,-0x30(%rbp)
0xffffffffba2e5b5c <check_preempt_wakeup+252>: callq 0xffffffffba2e4900 <update_curr>
0xffffffffba2e5b61 <check_preempt_wakeup+257>: test %rbx,%rbx
0xffffffffba2e5b64 <check_preempt_wakeup+260>: mov -0x30(%rbp),%ecx
0xffffffffba2e5b67 <check_preempt_wakeup+263>: je 0xffffffffba2e5c7d <check_preempt_wakeup+541>
0xffffffffba2e5b6d <check_preempt_wakeup+269>: mov 0x50(%r12),%rdx
0xffffffffba2e5b72 <check_preempt_wakeup+274>: sub 0x50(%rbx),%rdx
……
#---------->@@@@@@@@@@ 1 @@@@@@@@@
0xffffffffba2e5c10 <check_preempt_wakeup+432>: mov 0xfc(%rdi),%edi #%rdi=0xffff967f41400400 is pse->cfs_rq; throttle_count is at offset 0xfc in cfs_rq
#the 8 bytes at [%rdi + 0xfc]=[0xffff967f414004fc] are 91bf264000000000;
#%edi takes the low 4 bytes: %edi=0;
0xffffffffba2e5c16 <check_preempt_wakeup+438>: test %edi,%edi #tests whether %edi is 0
0xffffffffba2e5c18 <check_preempt_wakeup+440>: je 0xffffffffba2e5aad <check_preempt_wakeup+77> #it is 0, so control jumps to 0xffffffffba2e5aad------->@@@@@@@@@@ 2 @@@@@@@@@
#source: if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
……
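Several annotations in the listing above rest on small pieces of arithmetic: splitting a task_struct offset into the offset of se plus an offset inside sched_entity, taking the low 32 bits of an 8-byte memory dump, resolving a %rip-relative operand, and testing a flag bit. The sketch below collects them so they can be re-checked; the helper names are made up for illustration and the constants are the ones quoted in the listing.

```python
TASK_STRUCT_SE = 0x68   # offset of task_struct.se, from the crash output above

def offset_in_se(task_offset):
    """Offset inside sched_entity for an offset measured from task_struct."""
    return task_offset - TASK_STRUCT_SE

def low32(qword):
    """Value a 32-bit load sees when the 8 bytes at that address read `qword`."""
    return qword & 0xFFFFFFFF

def rip_relative(next_insn, displacement):
    """Target of disp(%rip); %rip is the address of the next instruction."""
    return (next_insn + displacement) & 0xFFFFFFFFFFFFFFFF

def bit_is_set(value, bit):
    return bool(value & (1 << bit))

print(hex(offset_in_se(0xd8)))                        # cfs_rq inside sched_entity
print(hex(offset_in_se(0x110)))                       # depth inside sched_entity
print(low32(0x0000000200000000))                      # curr->policy
print(low32(0x0032dcd500000002))                      # value loaded into %eax at +53
print(hex(rip_relative(0xffffffffba2e5a9b, 0xb76b09)))
print(bit_is_set(0x80, 3))                            # TIF_NEED_RESCHED in flags=0x80
```

Note that 0x110 minus 0x68 is 0xa8, so the load at +144 reads offset 0xa8 inside the sched_entity embedded in task_struct.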