
Hands-on crash analysis: the x86 kernel accessing an invalid pointer

 西北望msm66g9f 2020-10-04

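This article is a real case of a complex server crash that 羽千叶 ran into at work. Yu says the 奔跑吧 ("Run, Linux Kernel") special series on system crashes and black screens was inspiring and helped a great deal in solving crash problems.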

I. Environment

OS:centos7.7
kernel:3.10.0-1062.7.1

II. Crash log

[482661.362612] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070
[482661.367822] IP: [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.368337] PGD 0
[482661.368522] Oops: 0000 [#1] SMP
[482661.368806] Modules linked in: veth vxlan ip6_udp_tunnel udp_tunnel ip6table_nat 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6_tables nf_conntrack_netlink xt_conntrack
[482661.376311] CPU: 1 PID: 29670 Comm: runc:[2:INIT] Kdump: loaded Tainted: G------------ T 3.10.0-1062.7.1.el7.x86_64 #1
[482661.377199] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[482661.377637] task: ffff96808d678000 ti: ffff968091738000 task.ti: ffff968091738000
[482661.378196] RIP: 0010:[<ffffffffba2e5b49>]  [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.378863] RSP: 0018:ffff96809173be30  EFLAGS: 00010006
[482661.379270] RAX: 0000000000000002 RBX: ffff96809025af00 RCX: 0000000000000000
[482661.379806] RDX: 0000000000000002 RSI: ffff9680406f9070 RDI: ffff96809fc9ad00
[482661.380340] RBP: ffff96809173be68 R08: ffffffffbaa1e3c0 R09: 0000000000000000
[482661.380875] R10: 000000000000b8ff R11: f448000000000000 R12: 0000000000000000
[482661.381411] R13: ffff96808d678000 R14: ffff96809fc9ac80 R15: 0000000000000001
[482661.381946] FS:  00007f9ca6c4d740(0000) GS:ffff96809fc80000(0000) knlGS:0000000000000000
[482661.382550] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[482661.382983] CR2: 0000000000000070 CR3: 000000020f69c000 CR4: 00000000003606e0
[482661.383522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[482661.384054] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[482661.384594] Call Trace:
[482661.384806]  [<ffffffffba2d7782>] check_preempt_curr+0x92/0xa0
[482661.385250]  [<ffffffffba2dad24>] wake_up_new_task+0x104/0x1a0
[482661.385700]  [<ffffffffba29a9f1>] do_fork+0xf1/0x330
[482661.386095]  [<ffffffffba988a26>] ? trace_do_page_fault+0x56/0x150
[482661.386565]  [<ffffffffba29acb6>] SyS_clone+0x16/0x20
[482661.386950]  [<ffffffffba98e2b4>] stub_clone+0x44/0x70
[482661.387343]  [<ffffffffba98dede>] ? system_call_fastpath+0x25/0x2a
[482661.387791] Code: 00 00 83 e8 01 48 8b 5b 68 39 d0 75 f5 49 8b 7c 24 70 48 3b 7b 70 74 1e 66 2e 0f 1f 84 00 00 00 00 00 4d 8b 64 24 68 48 8b 5b 68 <49> 8b 7c 24 70 48 3b 7b 70 75 ec 48 85 ff 74 e7 89 4d d0 e8 9f
[482661.389736] RIP  [<ffffffffba2e5b49>] check_preempt_wakeup+0xe9/0x220
[482661.390208]  RSP <ffff96809173be30>
[482661.390475] CR2: 0000000000000070

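The second line of the log gives the faulting location: offset 0xe9 into check_preempt_wakeup.
In the Code: line, <49> marks the first byte of the faulting instruction's machine code.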

crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>:  mov    0x70(%r12),%rdi
crash> rd -8 0xffffffffba2e5b49 5
ffffffffba2e5b49:  49 8b 7c 24 70
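Finding the faulting bytes in the Code: line is mechanical enough to script. The sketch below is a hypothetical helper (the name code_bytes_from_marker and its interface are made up for illustration; it is not part of crash or the kernel). It assumes the input is the space-separated hex text after "Code:" and copies out the bytes starting at the one wrapped in <..>.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the hex bytes that follow "Code:" in an oops.
 * The byte wrapped in <..> is the first byte of the faulting instruction.
 * Copies up to max bytes, starting at that marker, into out[] and returns
 * the count, or -1 if there is no marker or a token is not hex.
 */
int code_bytes_from_marker(const char *code, unsigned char *out, int max)
{
    const char *p = code;
    int copied = 0;
    int seen_marker = 0;

    assert(code != NULL && out != NULL && max >= 0);
    while (*p != '\0' && copied < max) {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        const char *hex = p;
        if (*p == '<') {          /* start of the faulting instruction */
            seen_marker = 1;
            hex = p + 1;
        }
        char *end;
        unsigned long byte = strtoul(hex, &end, 16);
        if (end == hex)
            return -1;            /* not a hex byte: malformed input */
        if (*end == '>')
            end++;
        if (seen_marker)
            out[copied++] = (unsigned char)byte;
        p = end;
    }
    return seen_marker ? copied : -1;
}
```

Fed the tail of the Code: line above with max = 5, it yields 49 8b 7c 24 70, the five bytes that dis decodes as mov 0x70(%r12),%rdi.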

III. Debugging

Download the debuginfo packages kernel-debuginfo-3.10.0-1062.7.1.el7.x86_64.rpm and kernel-debuginfo-common-x86_64-3.10.0-1062.7.1.el7.x86_64.rpm,
then extract the kernel-debuginfo package and start crash with its vmlinux:

rpm2cpio kernel-debuginfo-3.10.0-1062.7.1.el7.x86_64.rpm |  cpio -div;
crash ./usr/lib/debug/lib/modules/3.10.0-1062.7.1.el7.x86_64/vmlinux vmcore

1. View the call stack

crash> bt
PID: 29670  TASK: ffff96808d678000  CPU: 1   COMMAND: 'runc:[2:INIT]'
 #0 [ffff96809173ba90] machine_kexec at ffffffffba265b24
 #1 [ffff96809173baf0] __crash_kexec at ffffffffba322422
 #2 [ffff96809173bbc0] crash_kexec at ffffffffba322510
 #3 [ffff96809173bbd8] oops_end at ffffffffba985798
 #4 [ffff96809173bc00] no_context at ffffffffba275bb4
 #5 [ffff96809173bc50] __bad_area_nosemaphore at ffffffffba275e82
 #6 [ffff96809173bca0] bad_area_nosemaphore at ffffffffba275fa4
 #7 [ffff96809173bcb0] __do_page_fault at ffffffffba988750
 #8 [ffff96809173bd20] trace_do_page_fault at ffffffffba988a26
 #9 [ffff96809173bd60] do_async_page_fault at ffffffffba987fa2
#10 [ffff96809173bd80] async_page_fault at ffffffffba9847a8
    [exception RIP: check_preempt_wakeup+233]
    RIP: ffffffffba2e5b49  RSP: ffff96809173be30  RFLAGS: 00010006
    RAX: 0000000000000002  RBX: ffff96809025af00  RCX: 0000000000000000
    RDX: 0000000000000002  RSI: ffff9680406f9070  RDI: ffff96809fc9ad00
    RBP: ffff96809173be68   R8: ffffffffbaa1e3c0   R9: 0000000000000000
    R10: 000000000000b8ff  R11: f448000000000000  R12: 0000000000000000
    R13: ffff96808d678000  R14: ffff96809fc9ac80  R15: 0000000000000001
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#11 [ffff96809173be70] check_preempt_curr at ffffffffba2d7782
#12 [ffff96809173be88] wake_up_new_task at ffffffffba2dad24
#13 [ffff96809173bec0] do_fork at ffffffffba29a9f1
#14 [ffff96809173bf38] sys_clone at ffffffffba29acb6
#15 [ffff96809173bf48] stub_clone at ffffffffba98e2b4
#16 [ffff96809173bf50] system_call_fastpath at ffffffffba98dede
    RIP: 00007f9ca630e851  RSP: 00007ffe97b8c2e8  RFLAGS: 00000202
    RAX: 0000000000000038  RBX: 00007f9ca2ffc700  RCX: ffffffffffffffff
    RDX: 00007f9ca2ffc9d0  RSI: 00007f9ca2ffbfb0  RDI: 00000000003d0f00
    RBP: 00007ffe97b8c410   R8: 00007f9ca2ffc700   R9: 00007f9ca2ffc700
    R10: 00007f9ca2ffc9d0  R11: 0000000000000202  R12: 0000000000000000
    R13: 0000000000801000  R14: 0000000000000000  R15: 00007f9ca2ffc700
    ORIG_RAX: 0000000000000038  CS: 0033  SS: 002b

2. Disassemble the faulting instruction

crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>:  mov    0x70(%r12),%rdi

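This instruction loads the value at address %r12 + 0x70 into %rdi. At the time of the crash %r12 was 0, so the load accessed address 0x70 (the same value reported in CR2). That address is invalid, which is why the kernel crashed.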
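The arithmetic in this step is just base + displacement, and the result should reproduce the CR2 value from the oops. The sketch below is a minimal C check of that using the register values from this crash; the helper names are hypothetical, and the 4 KiB threshold in looks_like_null_member_access() is a rule-of-thumb assumption, not something taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of an AT&T "disp(%base)" operand: base + sign-extended disp. */
uint64_t effective_address(uint64_t base, int32_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}

/*
 * Heuristic used when reading oopses (an assumption, not a kernel rule):
 * a fault address inside the first 4 KiB page usually means a NULL struct
 * pointer plus a member offset.
 */
int looks_like_null_member_access(uint64_t fault_addr)
{
    return fault_addr < 4096;
}

/* Cross-check: the address the instruction computes must equal CR2. */
void check_against_cr2(uint64_t base, int32_t disp, uint64_t cr2)
{
    assert(effective_address(base, disp) == cr2);
}
```

With %r12 = 0 and displacement 0x70 the computed address is 0x70, matching CR2, whereas with a valid sched_entity pointer the same load would land at a normal kernel address.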

3. Map the faulting instruction to source code

crash> dis check_preempt_wakeup+233 -l
/usr/src/debug/kernel-3.10.0-1062.7.1.el7/linux-3.10.0-1062.7.1.el7.x86_64/kernel/sched/fair.c: 343
0xffffffffba2e5b49 <check_preempt_wakeup+233>:  mov    0x70(%r12),%rdi

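The debug info maps this instruction to line 343 of fair.c, but the fault is not actually on that line, so a location obtained this way cannot be trusted as-is. The two methods below pin down the exact faulting statement.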

IV. Finding the exact faulting location

1. Locating it by analyzing the source code

The call chain is do_fork ----> wake_up_new_task ----> check_preempt_curr ----> check_preempt_wakeup.
wake_up_new_task calls check_preempt_curr as follows:

void wake_up_new_task(struct task_struct *p)
{
        ...
        check_preempt_curr(rq, p, WF_FORK);
        ...
}

void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
        ...
        rq->curr->sched_class->check_preempt_curr(rq, p, flags);
        ...
}

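For a CFS task, the sched_class->check_preempt_curr hook called here is check_preempt_wakeup.
check_preempt_wakeup takes three arguments, which check_preempt_curr passes in through rq->curr->sched_class->check_preempt_curr.
On x86-64, arguments are passed in registers: %rdi, %rsi, %rdx, %rcx, %r8 and %r9 carry the 1st through 6th arguments; any further arguments go on the stack.
From the call chain above, the third argument is wake_flags = WF_FORK.
Next we derive the first and second arguments from the assembly: rq arrives in %rdi and p arrives in %rsi.
If we can recover the values these two registers held at function entry, we know the argument values.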

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
        ...
}
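The register order above is the System V AMD64 calling convention, which x86-64 Linux also uses for ordinary C calls inside the kernel. As a small illustration (the table and arg_register() are hypothetical helpers, and only integer/pointer arguments are covered), argument n maps to a register like this:

```c
#include <assert.h>
#include <stddef.h>

/* System V AMD64 calling convention: integer and pointer arguments
 * 1..6 are passed in these registers, in this order. */
static const char *const sysv_arg_regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

/* Register carrying argument n (1-based), or NULL if it is passed on the stack. */
const char *arg_register(int n)
{
    const int nregs = (int)(sizeof(sysv_arg_regs) / sizeof(sysv_arg_regs[0]));

    assert(n >= 1);
    return n <= nregs ? sysv_arg_regs[n - 1] : NULL;
}
```

For check_preempt_wakeup(rq, p, wake_flags) this puts rq in %rdi, p in %rsi and wake_flags in %rdx. These are only the values at entry; the compiler is free to reuse the registers afterwards, which is why the next steps check whether they were overwritten.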

1.1 Deriving the function arguments from the assembly

The assembly of check_preempt_wakeup starts as follows:

0xffffffffba2e5a60 <check_preempt_wakeup>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffba2e5a65 <check_preempt_wakeup+5>:    push   %rbp
0xffffffffba2e5a66 <check_preempt_wakeup+6>:    mov    %rsp,%rbp
0xffffffffba2e5a69 <check_preempt_wakeup+9>:    push   %r15
0xffffffffba2e5a6b <check_preempt_wakeup+11>:   push   %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>:   mov    %rdi,%r14
0xffffffffba2e5a70 <check_preempt_wakeup+16>:   push   %r13
0xffffffffba2e5a72 <check_preempt_wakeup+18>:   push   %r12
0xffffffffba2e5a74 <check_preempt_wakeup+20>:   push   %rbx
0xffffffffba2e5a75 <check_preempt_wakeup+21>:   lea    0x68(%rsi),%rbx
0xffffffffba2e5a79 <check_preempt_wakeup+25>:   sub    $0x10,%rsp
...

1.2 Deriving the first argument

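The first argument arrives in %rdi. At offset +13 the assembly copies %rdi into %r14,
and %r14 is not modified from that point until the crash,
so %r14 still holds the value of the first argument from function entry.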

crash> dis check_preempt_wakeup | grep r14          //keep only the instructions that mention r14
0xffffffffba2e5a6b <check_preempt_wakeup+11>:   push   %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>:   mov    %rdi,%r14
0xffffffffba2e5b9e <check_preempt_wakeup+318>:  mov    %r14,%rdi    
0xffffffffba2e5bb0 <check_preempt_wakeup+336>:  cmp    0x8b0(%r14),%r13
0xffffffffba2e5c09 <check_preempt_wakeup+425>:  pop    %r14
0xffffffffba2e5c27 <check_preempt_wakeup+455>:  pop    %r14

The register dump printed at crash time gives %r14 = ffff96809fc9ac80, so the first argument at function entry is rq = ffff96809fc9ac80.

1.3 Deriving the second argument

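The second argument arrives in %rsi. Filtering the function's instructions that reference %rsi shows that %rsi is not modified between entry to check_preempt_wakeup and the crash,
so at crash time %rsi still holds the second argument. With %rsi = ffff9680406f9070, the second argument at function entry is
p = ffff9680406f9070.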

crash> dis check_preempt_wakeup | grep rsi
0xffffffffba2e5a75 <check_preempt_wakeup+21>:   lea    0x68(%rsi),%rbx
0xffffffffba2e5aa1 <check_preempt_wakeup+65>:   mov    0xd8(%rsi),%rdi
0xffffffffba2e5adb <check_preempt_wakeup+123>:  mov    0x188(%rsi),%r9d
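0xffffffffba2e5af7 <check_preempt_wakeup+151>:  mov    0x110(%rsi),%eax   ----->this and the preceding instructions run before the fault; none of them writes %rsi
0xffffffffba2e5c47 <check_preempt_wakeup+487>:  mov    %rsi,-0x30(%rbp)     ----->from here on the instructions lie past the fault point and are never reached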
0xffffffffba2e5c55 <check_preempt_wakeup+501>:  mov    -0x30(%rbp),%rsi
0xffffffffba2e5c6b <check_preempt_wakeup+523>:  cmpl   $0x5,0x188(%rsi)

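From the derivation above, the three arguments of check_preempt_wakeup are:
struct rq *rq = ffff96809fc9ac80;
struct task_struct *p = ffff9680406f9070;
int wake_flags = WF_FORK;
If the argument values cannot be derived inside check_preempt_wakeup itself, work upward and infer them from the caller's code.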
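The register-tracking argument in 1.2 and 1.3 boils down to one question: which instruction on the executed path last wrote the register before the fault? A small sketch, assuming the dis lines have been annotated by hand. The struct insn model, the before_fault flags and last_write_before_fault() are hypothetical, filled in from the grep output and the analysis above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a disassembly line, reduced to what the check needs.
 * writes_reg and before_fault are filled in by hand from `dis` output and the
 * manual analysis of which instructions run before the fault.
 */
struct insn {
    int offset;              /* +N within check_preempt_wakeup */
    const char *writes_reg;  /* register the instruction overwrites, or NULL */
    int before_fault;        /* 1 if it runs before the faulting instruction */
};

/* Offset of the last instruction on the pre-fault path that writes reg,
 * or -1 if none does (the register still holds its value from entry). */
int last_write_before_fault(const struct insn *code, size_t n, const char *reg)
{
    int last = -1;

    assert(code != NULL && reg != NULL);
    for (size_t i = 0; i < n; i++) {
        if (code[i].before_fault && code[i].writes_reg != NULL &&
            strcmp(code[i].writes_reg, reg) == 0)
            last = code[i].offset;
    }
    return last;
}

/* The `grep r14` and `grep rsi` results above, annotated by hand. */
static const struct insn r14_refs[] = {
    { 11,  NULL,  1 },   /* push %r14: writes only the stack */
    { 13,  "r14", 1 },   /* mov %rdi,%r14: copy of the first argument */
    { 318, "rdi", 0 },   /* mov %r14,%rdi */
    { 336, NULL,  0 },   /* cmp 0x8b0(%r14),%r13 */
    { 425, "r14", 0 },   /* pop %r14 (epilogue) */
    { 455, "r14", 0 },   /* pop %r14 (epilogue) */
};

static const struct insn rsi_refs[] = {
    { 21,  "rbx", 1 },   /* lea 0x68(%rsi),%rbx */
    { 65,  "rdi", 1 },   /* mov 0xd8(%rsi),%rdi */
    { 123, "r9",  1 },   /* mov 0x188(%rsi),%r9d */
    { 151, "rax", 1 },   /* mov 0x110(%rsi),%eax */
    { 487, NULL,  0 },   /* mov %rsi,-0x30(%rbp): writes memory */
    { 501, "rsi", 0 },   /* mov -0x30(%rbp),%rsi */
    { 523, NULL,  0 },   /* cmpl $0x5,0x188(%rsi) */
};
```

For %r14 the last pre-fault write is the copy at +13, so %r14 equals the entry value of %rdi; for %rsi there is none, so %rsi still equals its entry value. A disassembler-driven version would also have to follow jumps, which the hand-set before_fault flags stand in for here.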

1.4 Walking through check_preempt_wakeup

Arguments: struct rq *rq = ffff96809fc9ac80; struct task_struct *p = ffff9680406f9070; int wake_flags = WF_FORK;

static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_flags)
{
        struct task_struct *curr = rq->curr;                    //crash> struct rq.curr ffff96809fc9ac80
                                                                //------>curr = 0xffff96808d678000          

        struct sched_entity *se = &curr->se, *pse = &p->se;     //crash> struct task_struct -x -o | grep sched_entity
                                                                //[0x68] struct sched_entity se;
                                                                //------>se=0xffff96808d678068,pse=0xffff9680406f90d8

        struct cfs_rq *cfs_rq = task_cfs_rq(curr);              // curr->se.cfs_rq
                                                                //crash> struct task_struct.se.cfs_rq 0xffff96808d678000
                                                                //se.cfs_rq = 0xffff967f41400400
                                                                //------>cfs_rq=0xffff967f41400400                                                               

        int scale = cfs_rq->nr_running >= sched_nr_latency;     //crash> p sched_nr_latency         crash> struct cfs_rq.nr_running 0xffff967f41400400                
                                                                //sched_nr_latency = $1 = 2         nr_running = 2
                                                                //------>scale=(2>=2)=1;
        int next_buddy_marked = 0;

        if (unlikely(se == pse))                                //se=0xffff96808d678068, pse=0xffff9680406f90d8: not equal, so no return
                return;

        if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))      //cfs_rq_of(pse): pse->cfs_rq
                return;                                         //crash> struct sched_entity.cfs_rq 0xffff9680406f90d8          crash> struct cfs_rq.throttle_count 0xffff967f41400400
                                                                //cfs_rq = 0xffff967f41400400                                   throttle_count = 0
                                                                //so this condition is false as well; no return


        if (sched_feat(NEXT_BUDDY) && scale && !(wake_flags & WF_FORK)) {     //wake_flags=WF_FORK, so !(wake_flags & WF_FORK) is false and the body is skipped
                set_next_buddy(pse);
                next_buddy_marked = 1;
        }

        if (test_tsk_need_resched(curr))                        //crash> struct task_struct.stack 0xffff96808d678000            crash> struct thread_info.flags 0xffff968091738000
                return;                                         //stack = 0xffff968091738000                                    flags = 128
                                                                //TIF_NEED_RESCHED is not set in thread_info->flags (128 = 0x80), so the condition is false; no return

        if (unlikely(curr->policy == SCHED_IDLE) &&             //crash> struct task_struct.policy 0xffff96808d678000
            likely(p->policy != SCHED_IDLE))                    //policy = 0
                goto preempt;                                   //SCHED_IDLE=5 differs from curr->policy=0, so the condition is false; no jump

        if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))  //this condition is false too (p->policy = 0 = SCHED_NORMAL); no return
                return;

        find_matching_se(&se, &pse);                            //------------>execution continues into find_matching_se()
        update_curr(cfs_rq_of(se));
        BUG_ON(!pse);
        if (wakeup_preempt_entity(se, pse) == 1) {
                if (!next_buddy_marked)
                        set_next_buddy(pse);
                goto preempt;
        }

        return;

preempt:
        resched_curr(rq);
        if (unlikely(!se->on_rq || curr == rq->idle))
                return;

        if (sched_feat(LAST_BUDDY) && scale && entity_is_task(se))
                set_last_buddy(se);
}
static void find_matching_se(struct sched_entity **se, struct sched_entity **pse)
{
        int se_depth, pse_depth;
        se_depth = (*se)->depth;                                //crash> struct sched_entity.depth 0xffff96808d678068
                                                                //depth = 3
                                                                //------>se_depth=3

        pse_depth = (*pse)->depth;                              //crash> struct sched_entity.depth 0xffff9680406f90d8
                                                                //depth = 2
                                                                //------>pse_depth=2
        while (se_depth > pse_depth) {                          //se_depth=3, pse_depth=2: the loop body runs exactly once
                se_depth--;                                     //se_depth=3-1=2
                *se = parent_entity(*se);                       //crash> struct sched_entity.parent 0xffff96808d678068
        }                                                       //parent = 0xffff968095227900
                                                                //------->*se=0xffff968095227900

        while (pse_depth > se_depth) {                          //pse_depth=2, se_depth=2: equal, so the loop body never runs
                pse_depth--;
                *pse = parent_entity(*pse);
        }
/*
  1. Iteration 1: *se=0xffff968095227900, *pse=0xffff9680406f90d8
  crash> struct sched_entity.cfs_rq 0xffff968095227900          crash> struct sched_entity.cfs_rq 0xffff9680406f90d8
  cfs_rq = 0xffff96804d2cfc00                                   cfs_rq = 0xffff967f41400400
  (*se)->cfs_rq=0xffff96804d2cfc00, (*pse)->cfs_rq=0xffff967f41400400: not equal, so both move up to sched_entity->parent

  crash> struct sched_entity.parent 0xffff968095227900          crash> struct sched_entity.parent 0xffff9680406f90d8
  parent = 0xffff967f2e069300                                   parent = 0xffff968095227900
  *se=0xffff967f2e069300                                        *pse=0xffff968095227900

  2. Iteration 2: *se=0xffff967f2e069300, *pse=0xffff968095227900
  crash> struct sched_entity.cfs_rq 0xffff967f2e069300          crash> struct sched_entity.cfs_rq 0xffff968095227900
  cfs_rq = 0xffff967f540ee600                                   cfs_rq = 0xffff96804d2cfc00
  (*se)->cfs_rq=0xffff967f540ee600, (*pse)->cfs_rq=0xffff96804d2cfc00: not equal, so both move up to sched_entity->parent

  crash> struct sched_entity.parent 0xffff967f2e069300          crash> struct sched_entity.parent 0xffff968095227900
  parent = 0xffff96809025af00                                   parent = 0xffff967f2e069300
  *se=0xffff96809025af00                                        *pse=0xffff967f2e069300

  3. Iteration 3: *se=0xffff96809025af00, *pse=0xffff967f2e069300
  crash> struct sched_entity.cfs_rq 0xffff96809025af00          crash> struct sched_entity.cfs_rq 0xffff967f2e069300
  cfs_rq = 0xffff96809fc9ad00                                   cfs_rq = 0xffff967f540ee600
  (*se)->cfs_rq=0xffff96809fc9ad00, (*pse)->cfs_rq=0xffff967f540ee600: not equal, so both move up to sched_entity->parent

  crash> struct sched_entity.parent 0xffff96809025af00          crash> struct sched_entity.parent 0xffff967f2e069300
  parent = 0x0                                                  parent = 0xffff96809025af00
  *se=0x0                                                       *pse=0xffff96809025af00

  4. Iteration 4: *se=0x0, *pse=0xffff96809025af00
  Because *se is NULL, evaluating (*se)->cfs_rq faults, so the fault is inside is_same_group().
  This matches the register dump: %r12 = 0 is *se, %rbx = ffff96809025af00 is *pse,
  and %rdi = ffff96809fc9ad00 is the cfs_rq loaded in iteration 3.
*/

        while (!is_same_group(*se, *pse)) {                 //compares (*se)->cfs_rq with (*pse)->cfs_rq
                *se = parent_entity(*se);
                *pse = parent_entity(*pse);
        }
}
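The early-return checks walked through in check_preempt_wakeup() above can be replayed with the values read from the vmcore. This is a simplified model, not kernel code: replay_early_checks() is a hypothetical helper, sched_feat() switches are not modelled, and throttled_hierarchy() is reduced to a nonzero throttle_count test.

```c
#include <assert.h>

/* SCHED_IDLE = 5 as noted in the walkthrough; TIF_NEED_RESCHED is bit 3
 * on x86, which is why the assembly tests the mask 0x8. */
#define SCHED_NORMAL      0
#define SCHED_IDLE        5
#define TIF_NEED_RESCHED  3

struct early_checks {
    int scale;          /* cfs_rq->nr_running >= sched_nr_latency */
    int throttled;      /* pse's cfs_rq has a nonzero throttle_count */
    int need_resched;   /* TIF_NEED_RESCHED set in curr's thread_info->flags */
    int curr_is_idle;   /* curr->policy == SCHED_IDLE */
    int p_not_normal;   /* p->policy != SCHED_NORMAL */
};

/* Recompute the early-exit conditions of check_preempt_wakeup() from
 * values read out of the vmcore. */
struct early_checks replay_early_checks(unsigned int nr_running,
                                        unsigned int sched_nr_latency,
                                        int throttle_count,
                                        unsigned long ti_flags,
                                        int curr_policy, int p_policy)
{
    struct early_checks c;

    assert(sched_nr_latency > 0);
    c.scale = nr_running >= sched_nr_latency;
    c.throttled = throttle_count != 0;
    c.need_resched = (ti_flags & (1UL << TIF_NEED_RESCHED)) != 0;
    c.curr_is_idle = curr_policy == SCHED_IDLE;
    c.p_not_normal = p_policy != SCHED_NORMAL;
    return c;
}
```

With the dumped values (nr_running = 2, sched_nr_latency = 2, throttle_count = 0, flags = 0x80, both policies 0), scale is 1 and every early exit is false, which is why execution reaches find_matching_se().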

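In sched_entity, cfs_rq is at offset 0x70, and the faulting instruction loads the value at offset 0x70 from base %r12, i.e. it reads sched_entity->cfs_rq. The source analysis above therefore identifies the faulting statement: the (*se)->cfs_rq access in is_same_group(), reached with *se == NULL.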

crash> dis check_preempt_wakeup+233
0xffffffffba2e5b49 <check_preempt_wakeup+233>:  mov    0x70(%r12),%rdi
crash> struct sched_entity -x -o | grep cfs_rq
[0x70] struct cfs_rq *cfs_rq;
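To see the walk end in NULL without a vmcore at hand, the sketch below replays find_matching_se() in user space on the hierarchy read out with crash above. It is a hypothetical model: only parent, cfs_rq and depth are kept, as plain pointers rather than the kernel layout, and find_matching_se_checked() returns -1 where the kernel would dereference NULL + 0x70.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space model of the fields find_matching_se() touches.
 * Field names follow the kernel; the layout is NOT the kernel's. */
struct cfs_rq { int id; };

struct sched_entity {
    struct sched_entity *parent;
    struct cfs_rq *cfs_rq;
    int depth;
};

/*
 * Same walk as the kernel's find_matching_se(), except that it stops and
 * reports failure instead of dereferencing a NULL entity.
 * Returns the number of is_same_group() comparisons made, or -1 on NULL.
 */
int find_matching_se_checked(struct sched_entity **se, struct sched_entity **pse)
{
    assert(se != NULL && pse != NULL && *se != NULL && *pse != NULL);
    int se_depth = (*se)->depth;
    int pse_depth = (*pse)->depth;
    int rounds = 0;

    while (se_depth > pse_depth) {
        se_depth--;
        *se = (*se)->parent;
    }
    while (pse_depth > se_depth) {
        pse_depth--;
        *pse = (*pse)->parent;
    }
    for (;;) {
        if (*se == NULL || *pse == NULL)
            return -1;                 /* kernel: NULL pointer dereference */
        rounds++;
        if ((*se)->cfs_rq == (*pse)->cfs_rq)
            return rounds;             /* is_same_group() matched */
        *se = (*se)->parent;
        *pse = (*pse)->parent;
    }
}
```

Built from the dumped entities (the child's se with depth 2 and parent 0xffff968095227900), the walk compares three times and then hits *se == NULL, exactly as in the iteration log. With the child's depth set to 3 the same walk matches on the first comparison.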

1.5 Root cause

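The crash is caused by a depth mismatch between the scheduling-entity hierarchies of the parent and child tasks. Mainline has a related patch, and applying it resolves the problem.
[upstream eeb61e53ea19be0c4015b00b2e8b3b2185436f2b]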

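Parent task (curr): depth = 0; parent = 0;
                    depth = 1; parent = 0xffff96809025af00;
                    depth = 2; parent = 0xffff967f2e069300;
                    depth = 3; parent = 0xffff968095227900;
Child task (p):     depth = 0; parent = 0;
                    depth = 1; parent = 0xffff96809025af00;
                    depth = 2; parent = 0xffff967f2e069300;
                    depth = 2; parent = 0xffff968095227900;

The last entry of each chain is the task's own se. The child's se reports depth 2 even though its parent 0xffff968095227900 is itself at depth 2, so it should have depth 3 like the parent task's se.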
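The table can be checked mechanically against the invariant CFS maintains: an entity's depth is its parent's depth + 1, and 0 for an entity without a parent. A minimal sketch, assuming a stripped-down struct that keeps only parent and depth (first_bad_depth() is a hypothetical helper):

```c
#include <assert.h>
#include <stddef.h>

struct sched_entity {
    struct sched_entity *parent;
    int depth;
};

/*
 * Walk from an entity up to the root and check depth == parent depth + 1
 * (0 at the root). Returns the first entity that violates the invariant,
 * or NULL if the chain is consistent.
 */
const struct sched_entity *first_bad_depth(const struct sched_entity *se)
{
    assert(se != NULL);
    for (; se != NULL; se = se->parent) {
        int expected = se->parent ? se->parent->depth + 1 : 0;
        if (se->depth != expected)
            return se;
    }
    return NULL;
}
```

Run on the two chains above, the parent task's chain passes and the child's own se is reported as the first violation.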

2. Locating it by analyzing the assembly

crash> dis check_preempt_wakeup
0xffffffffba2e5a60 <check_preempt_wakeup>:      nopl   0x0(%rax,%rax,1) [FTRACE NOP]
0xffffffffba2e5a65 <check_preempt_wakeup+5>:    push   %rbp
0xffffffffba2e5a66 <check_preempt_wakeup+6>:    mov    %rsp,%rbp
0xffffffffba2e5a69 <check_preempt_wakeup+9>:    push   %r15
0xffffffffba2e5a6b <check_preempt_wakeup+11>:   push   %r14
0xffffffffba2e5a6d <check_preempt_wakeup+13>:   mov    %rdi,%r14   # save the first argument (rq) in %r14
0xffffffffba2e5a70 <check_preempt_wakeup+16>:   push   %r13
0xffffffffba2e5a72 <check_preempt_wakeup+18>:   push   %r12
0xffffffffba2e5a74 <check_preempt_wakeup+20>:   push   %rbx
0xffffffffba2e5a75 <check_preempt_wakeup+21>:   lea    0x68(%rsi),%rbx   # %rbx = %rsi + 0x68 (lea computes an address, it does not read memory)
        # %rsi is the second argument, struct task_struct *p, and offset 0x68 in task_struct is se
        # so this instruction corresponds to the source: pse = &p->se
        # %rbx = 0xffff9680406f9070 + 0x68 = 0xffff9680406f90d8
0xffffffffba2e5a79 <check_preempt_wakeup+25>:   sub    $0x10,%rsp
0xffffffffba2e5a7d <check_preempt_wakeup+29>:   mov    0x8a8(%rdi),%r13   # %rdi is the first argument, struct rq *; offset 0x8a8 in struct rq is curr (struct task_struct *)
        # so this loads rq->curr into %r13
        # %r13 = [0xffff96809fc9ac80 + 0x8a8] = 0xffff96808d678000
0xffffffffba2e5a84 <check_preempt_wakeup+36>:   mov    0xd8(%r13),%rax   # load the 8 bytes at %r13 + 0xd8 into %rax
        # %r13 is rq->curr; offset 0xd8 in task_struct is hard to map directly to a member,
        # but se sits at offset 0x68 in task_struct and cfs_rq at offset 0x70 in sched_entity, and 0xd8 = 0x68 + 0x70
        # so this reads curr->se.cfs_rq, i.e. the source line struct cfs_rq *cfs_rq = task_cfs_rq(curr);
        # %rax = [0xffff96808d678000 + 0xd8] = 0xffff967f41400400

0xffffffffba2e5a8b <check_preempt_wakeup+43>:   lea    0x68(%r13),%r12   # %r13 holds rq->curr; %r12 = %r13 + 0x68, i.e. the source se = &curr->se
        # %r12 = 0xffff96808d678000 + 0x68 = 0xffff96808d678068
0xffffffffba2e5a8f <check_preempt_wakeup+47>:   cmp    %rbx,%r12   # %rbx = 0xffff9680406f90d8 (pse = &p->se), %r12 = 0xffff96808d678068 (se = &curr->se): not equal
        # corresponds to: if (unlikely(se == pse))
0xffffffffba2e5a92 <check_preempt_wakeup+50>:   mov    0x10(%rax),%ecx   # %rax holds curr->se.cfs_rq; offset 0x10 in cfs_rq is nr_running
        # corresponds to cfs_rq->nr_running, which is 2
        # %ecx = 2

0xffffffffba2e5a95 <check_preempt_wakeup+53>:   mov    0xb76b09(%rip),%eax   # RIP-relative load: %rip is the address of the next instruction, 0xffffffffba2e5a9b
        # the 8-byte word at 0xffffffffba2e5a9b + 0xb76b09 reads 0x0032dcd500000002; only its low 4 bytes are loaded (sched_nr_latency)
        # %eax = 2
0xffffffffba2e5a9b <check_preempt_wakeup+59>:   je     0xffffffffba2e5c00 <check_preempt_wakeup+416>   # uses the flags from cmp %rbx,%r12 (mov does not change flags); %rbx != %r12, so no jump
0xffffffffba2e5aa1 <check_preempt_wakeup+65>:   mov    0xd8(%rsi),%rdi   # %rdi = [%rsi + 0xd8] = [%rsi + 0x68 + 0x70]; %rsi is the task_struct base and offset 0x68 is its se member
        # offset 0x70 in sched_entity is cfs_rq, so this loads p->se.cfs_rq into %rdi
        # corresponds to cfs_rq_of(pse) ----> pse->cfs_rq
        # %rdi = 0xffff967f41400400
0xffffffffba2e5aa8 <check_preempt_wakeup+72>:   jmpq   0xffffffffba2e5c10 <check_preempt_wakeup+432>   # jump to 0xffffffffba2e5c10 ------------->@@@@@@@@@@ 1 @@@@@@@@@
0xffffffffba2e5aad <check_preempt_wakeup+77>:   xor    %r15d,%r15d   # ------->@@@@@@@@@@ 2 @@@@@@@@@
        # %r15d = 0
0xffffffffba2e5ab0 <check_preempt_wakeup+80>:   cmp    %eax,%ecx   # compare %ecx with %eax (sets flags from %ecx - %eax)
0xffffffffba2e5ab2 <check_preempt_wakeup+82>:   setae  %r15b   # %r15b = 1 if %ecx >= %eax (unsigned); here %ecx = 2 and %eax = 2, so %r15b = 1
        # corresponds to: int scale = cfs_rq->nr_running >= sched_nr_latency;
0xffffffffba2e5ab6 <check_preempt_wakeup+86>:   nopl   0x0(%rax,%rax,1)
0xffffffffba2e5abb <check_preempt_wakeup+91>:   xor    %ecx,%ecx   # %ecx = 0
0xffffffffba2e5abd <check_preempt_wakeup+93>:   mov    0x8(%r13),%rax   # %rax = [%r13 + 0x8]; %r13 is rq->curr and offset 0x8 is its stack member: curr->stack = %rax = ffff968091738000
0xffffffffba2e5ac1 <check_preempt_wakeup+97>:   mov    0x10(%rax),%rax   # task_struct->stack is also the thread_info address; offset 0x10 in thread_info is flags
        # corresponds to the source test_tsk_need_resched(curr)
        # %rax = [ffff968091738000 + 0x10] = 0000000000000080, i.e. thread_info->flags = 0x80

0xffffffffba2e5ac5 <check_preempt_wakeup+101>:  test   $0x8,%al   # test bit 3 of thread_info->flags, i.e. whether TIF_NEED_RESCHED is set
        # corresponds to test_tsk_need_resched(curr)

0xffffffffba2e5ac7 <check_preempt_wakeup+103>:  jne    0xffffffffba2e5c00 <check_preempt_wakeup+416>   # 0x80 & 0x8 = 0, so no jump
0xffffffffba2e5acd <check_preempt_wakeup+109>:  cmpl   $0x5,0x188(%r13)   # %r13 is rq->curr; offset 0x188 in task_struct is policy; compare curr->policy with 5 (SCHED_IDLE)
        # corresponds to unlikely(curr->policy == SCHED_IDLE)
        # the 8-byte word at %r13 + 0x188 = 0xffff96808d678000 + 0x188 reads 0000000200000000
        # policy is an int (4 bytes), so only the low 4 bytes count: curr->policy = 0

0xffffffffba2e5ad5 <check_preempt_wakeup+117>:  je     0xffffffffba2e5c6b <check_preempt_wakeup+523>   # curr->policy == SCHED_IDLE is false, so no jump
        # corresponds to: if (unlikely(curr->policy == SCHED_IDLE) &&

0xffffffffba2e5adb <check_preempt_wakeup+123>:  mov    0x188(%rsi),%r9d   # %rsi is the second argument, struct task_struct *p; this loads p->policy
        # the 8-byte word at 0xffff9680406f9070 + 0x188 reads 0x0000000200000000; p->policy is 4 bytes, so p->policy = 0
        # %r9d = 0

0xffffffffba2e5ae2 <check_preempt_wakeup+130>:  test   %r9d,%r9d   # test whether %r9d is 0; here it is 0
0xffffffffba2e5ae5 <check_preempt_wakeup+133>:  jne    0xffffffffba2e5c00 <check_preempt_wakeup+416>   # jump if nonzero; no jump here
        # corresponds to: if (unlikely(p->policy != SCHED_NORMAL) || !sched_feat(WAKEUP_PREEMPTION))
0xffffffffba2e5aeb <check_preempt_wakeup+139>:  nopl   0x0(%rax,%rax,1)
0xffffffffba2e5af0 <check_preempt_wakeup+144>:  mov    0x110(%r13),%edx   # %r13 holds rq->curr; offset 0x110 is hard to map directly, but 0x110 = 0x68 + 0xa8
        # se is at offset 0x68 in task_struct and depth at offset 0xa8 in sched_entity, so this reads curr->se.depth
        # %edx = [%r13 + 0x110] = 3 ----> se_depth = 3
        # corresponds to: se_depth = (*se)->depth;

0xffffffffba2e5af7 <check_preempt_wakeup+151>:  mov    0x110(%rsi),%eax   # %rsi holds the second argument, struct task_struct *p; this reads p->se.depth
        # %eax = [%rsi + 0x110] = 2 ----> pse_depth = 2
        # corresponds to: pse_depth = (*pse)->depth;

0xffffffffba2e5afd <check_preempt_wakeup+157>:  cmp    %eax,%edx   # compare %edx with %eax
0xffffffffba2e5aff <check_preempt_wakeup+159>:  jle    0xffffffffba2e5b16 <check_preempt_wakeup+182>   # jump if %edx (3) <= %eax (2); false, so no jump
        # corresponds to: while (se_depth > pse_depth)
0xffffffffba2e5b01 <check_preempt_wakeup+161>:  nopl   0x0(%rax)
0xffffffffba2e5b08 <check_preempt_wakeup+168>:  sub    $0x1,%edx   # %edx = %edx - 1 = 2
        # se_depth = 2
        # corresponds to: se_depth--;

0xffffffffba2e5b0b <check_preempt_wakeup+171>:  mov    0x68(%r12),%r12   # %r12 holds &rq->curr->se; offset 0x68 in sched_entity is parent, so %r12 = se->parent
        # %r12 = 0xffff968095227900
        # corresponds to: *se = parent_entity(*se);

0xffffffffba2e5b10 <check_preempt_wakeup+176>:  cmp    %eax,%edx   # now %eax = 2 and %edx = 2: equal
0xffffffffba2e5b12 <check_preempt_wakeup+178>:  jne    0xffffffffba2e5b08 <check_preempt_wakeup+168>   # jump back if not equal; they are equal, so no jump
0xffffffffba2e5b14 <check_preempt_wakeup+180>:  mov    %eax,%edx   # %edx = %eax = 2
0xffffffffba2e5b16 <check_preempt_wakeup+182>:  cmp    %eax,%edx   # compare %edx with %eax
0xffffffffba2e5b18 <check_preempt_wakeup+184>:  jge    0xffffffffba2e5b49 <check_preempt_wakeup+233>   # jump if %edx >= %eax; they are equal, so jump to 0xffffffffba2e5b49 ------>@@@@@@@@@@ 3 @@@@@@@@@
0xffffffffba2e5b1a <check_preempt_wakeup+186>:  nopw   0x0(%rax,%rax,1)
0xffffffffba2e5b20 <check_preempt_wakeup+192>:  sub    $0x1,%eax
0xffffffffba2e5b23 <check_preempt_wakeup+195>:  mov    0x68(%rbx),%rbx
0xffffffffba2e5b27 <check_preempt_wakeup+199>:  cmp    %edx,%eax
0xffffffffba2e5b29 <check_preempt_wakeup+201>:  jne    0xffffffffba2e5b20 <check_preempt_wakeup+192>
0xffffffffba2e5b2b <check_preempt_wakeup+203>:  mov    0x70(%r12),%rdi
0xffffffffba2e5b30 <check_preempt_wakeup+208>:  cmp    0x70(%rbx),%rdi
0xffffffffba2e5b34 <check_preempt_wakeup+212>:  je     0xffffffffba2e5b54 <check_preempt_wakeup+244>
0xffffffffba2e5b36 <check_preempt_wakeup+214>:  nopw   %cs:0x0(%rax,%rax,1)
0xffffffffba2e5b40 <check_preempt_wakeup+224>:  mov    0x68(%r12),%r12
0xffffffffba2e5b45 <check_preempt_wakeup+229>:  mov    0x68(%rbx),%rbx
                                                                                                                                                                        #------>@@@@@@@@@@ 3 @@@@@@@@@
0xffffffffba2e5b49 <check_preempt_wakeup+233>:  mov    0x70(%r12),%rdi   # The fault actually happens here. Reading the listing top to bottom once, %r12 would not be 0 here, which is misleading:
        # the while loop keeps reassigning %r12 (at +224), and it eventually becomes 0
        # %r12 holds a sched_entity address (initially &rq->curr->se); offset 0x70 in sched_entity is cfs_rq, so this reads se->cfs_rq
        # corresponds to is_same_group(): if (se->cfs_rq == pse->cfs_rq)

0xffffffffba2e5b4e <check_preempt_wakeup+238>:  cmp    0x70(%rbx),%rdi   # compare with pse->cfs_rq
0xffffffffba2e5b52 <check_preempt_wakeup+242>:  jne    0xffffffffba2e5b40 <check_preempt_wakeup+224>   # se->cfs_rq and pse->cfs_rq differ on every iteration, so jump to 0xffffffffba2e5b40
0xffffffffba2e5b54 <check_preempt_wakeup+244>:  test   %rdi,%rdi
0xffffffffba2e5b57 <check_preempt_wakeup+247>:  je     0xffffffffba2e5b40 <check_preempt_wakeup+224>
0xffffffffba2e5b59 <check_preempt_wakeup+249>:  mov    %ecx,-0x30(%rbp)
0xffffffffba2e5b5c <check_preempt_wakeup+252>:  callq  0xffffffffba2e4900 <update_curr>
0xffffffffba2e5b61 <check_preempt_wakeup+257>:  test   %rbx,%rbx
0xffffffffba2e5b64 <check_preempt_wakeup+260>:  mov    -0x30(%rbp),%ecx
0xffffffffba2e5b67 <check_preempt_wakeup+263>:  je     0xffffffffba2e5c7d <check_preempt_wakeup+541>
0xffffffffba2e5b6d <check_preempt_wakeup+269>:  mov    0x50(%r12),%rdx
0xffffffffba2e5b72 <check_preempt_wakeup+274>:  sub    0x50(%rbx),%rdx
...
                                                                                                                                                                                                                #---------->@@@@@@@@@@ 1 @@@@@@@@@
0xffffffffba2e5c10 <check_preempt_wakeup+432>:  mov    0xfc(%rdi),%edi   # %rdi = 0xffff967f41400400 is pse->cfs_rq; offset 0xfc in cfs_rq is throttle_count
        # the 8-byte word at %rdi + 0xfc = 0xffff967f414004fc reads 91bf264000000000; only the low 4 bytes are loaded
        # %edi = 0

0xffffffffba2e5c16 <check_preempt_wakeup+438>:  test   %edi,%edi   # test whether %edi is 0
0xffffffffba2e5c18 <check_preempt_wakeup+440>:  je     0xffffffffba2e5aad <check_preempt_wakeup+77>   # it is 0, so jump to 0xffffffffba2e5aad ------->@@@@@@@@@@ 2 @@@@@@@@@
        # corresponds to: if (unlikely(throttled_hierarchy(cfs_rq_of(pse))))
...
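Two pieces of arithmetic recur in the annotations above: resolving a RIP-relative operand, and reading a 4-byte field out of an 8-byte word printed by rd. A minimal sketch of both, assuming a little-endian x86 dump; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a RIP-relative operand such as 0xb76b09(%rip): the signed 32-bit
 * displacement is added to the address of the NEXT instruction. */
uint64_t rip_relative_target(uint64_t next_insn, int32_t disp)
{
    return next_insn + (uint64_t)(int64_t)disp;
}

/* crash's `rd` prints 8-byte words. A 4-byte load (%eax, %edi, %r9d, cmpl)
 * from the same address reads the first 4 bytes in memory, which on
 * little-endian x86 are the low 32 bits of the printed word. */
uint32_t low32_of_rd_word(uint64_t word)
{
    return (uint32_t)(word & 0xffffffffu);
}

/* The 8-byte words quoted in the annotated listing. */
void check_listing_values(void)
{
    assert(low32_of_rd_word(0x0032dcd500000002ULL) == 2);   /* sched_nr_latency */
    assert(low32_of_rd_word(0x0000000200000000ULL) == 0);   /* curr->policy, p->policy */
    assert(low32_of_rd_word(0x91bf264000000000ULL) == 0);   /* throttle_count */
}
```

rip_relative_target(0xffffffffba2e5a9b, 0xb76b09) gives 0xffffffffbae5c5a4, the address behind the sched_nr_latency load at +53; running crash's sym command on such an address is one way to confirm which symbol it belongs to.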
