I wrote this blog post because I recently hit a kernel crash while debugging module code. The stack is as follows:

PID: 0

The tail of the log is as follows:

------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:191!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Modules linked in: ext3 jbd binlogdev(U) ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables bnx2fc fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp sunrpc bridge stp llc vhost_net macvtap macvlan tun kvm_intel kvm sg serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support shpchp memdisk(U) memcon(U) ext4 mbcache jbd2 sd_mod crc_t10dif megaraid_sas e1000e video output ahci dm_mirror dm_region_hash dm_log dm_mod be2iscsi bnx2i cnic uio ipv6 cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio libiscsi_tcp qla4xxx iscsi_boot_sysfs libiscsi scsi_transport_iscsi [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.32-358.6.1.el6.x86_64 #1 Supermicro X9SCI/X9SCA/X9SCI/X9SCA
RIP: 0010:[]
RSP: 0018:ffff880028203be0
RAX: ffffc9000578d068 RBX: 0000000000000286 RCX: 0000000000000000
RDX: ffffc9000578d060 RSI: ffff88021acddc00 RDI: 0000000000000000
RBP: ffff880028203be0 R08: ffffc9000578d038 R09: 0000000000000001
R10: ffff8802159bee80 R11: 0000000000000001 R12: 0000000000000000
R13: ffff8800c4702558 R14: 0000000000002400 R15: 0000000000000000
FS: CS: CR2: 000000000246d000 CR3: 00000001f1db3000 CR4: 00000000000427e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper (pid: 0, threadinfo ffffffff81a00000, task ffffffff81a8d020)
Stack:
 ffff8800c4702558 ffff8800c4702558 ffff880028203c30 ffffffffa0342b3c
 ffffffffa0342b00 ffffffffa0342b00 ffff880028203c60 ffffffffa01b5d7f
Call Trace:

The part of the trace that leads up to the queue_work call is omitted here.

The log shows that the kernel died at line 191 of workqueue.c. The corresponding source is shown below. (All source quoted in this post is from 2.6.32-358.2.1.el6; I state this once here and will not repeat it for every snippet.)

function: queue_work
function: queue_work_on

So the module crashed at line 191, that is, at BUG_ON(!list_empty(&work->entry)).

Next, let's walk through how a work item gets executed on a workqueue.

1.
First, look at how a workqueue_struct is created. We follow the single-threaded creation path down the call chain:

wq = create_singlethread_workqueue("testd");
-> __create_workqueue((name), 1, 0, 0)
-> struct workqueue_struct *__create_workqueue_key(const char *name, int singlethread, int freezeable, int rt, struct lock_class_key *key, const char *lock_name)
-> static int create_workqueue_thread(struct cpu_workqueue_struct *cwq, int cpu)

At this point the workqueue has been created. The key step is the final call, create_workqueue_thread, which creates a thread named wq->name. That thread starts running in static int worker_thread(void *__cwq).

Examining the thread's execution

The thread body is essentially an infinite for loop, as follows:
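As a rough mental model of such a loop, and of the check that fired in the crash, here is a minimal user-space sketch in C. It is not the kernel source. The names work_model, queue_model, queue_work_model and run_queue_model are hypothetical and exist only for illustration; locking, sleeping and per-CPU handling are left out; a plain int stands in for the work item's pending flag; and assert stands in for BUG_ON. The ordering inside the drain loop (unlink, clear pending, then call the function) is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Tiny circular doubly linked list, in the spirit of the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_to_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink an entry and point it back at itself so it reads as "empty" again. */
static void list_del_and_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    init_list_head(e);
}

struct work_model;
typedef void (*work_fn)(struct work_model *);

/* Hypothetical stand-in for a work item. */
struct work_model {
    int pending;             /* models the "already queued" flag */
    struct list_head entry;  /* link into the queue's worklist */
    work_fn func;            /* callback to run */
    int runs;                /* how many times func has run (for the demo) */
};

/* Hypothetical stand-in for a single-threaded workqueue. */
struct queue_model { struct list_head worklist; };

/* Recover the work item from its embedded list entry. */
#define work_of(ptr) \
    ((struct work_model *)((char *)(ptr) - offsetof(struct work_model, entry)))

static void init_work_model(struct work_model *w, work_fn fn)
{
    w->pending = 0;
    init_list_head(&w->entry);
    w->func = fn;
    w->runs = 0;
}

/* Returns 1 if the work was queued, 0 if it was already pending. */
static int queue_work_model(struct queue_model *q, struct work_model *w)
{
    if (w->pending)
        return 0;
    w->pending = 1;
    /* The invariant behind the BUG_ON: a work item that was not pending
     * must not still be linked into any list. */
    assert(list_is_empty(&w->entry));
    list_add_to_tail(&w->entry, &q->worklist);
    return 1;
}

/* One pass of the worker's loop body: drain everything currently queued.
 * Returns the number of work items that were run. */
static int run_queue_model(struct queue_model *q)
{
    int n = 0;
    while (!list_is_empty(&q->worklist)) {
        struct work_model *w = work_of(q->worklist.next);
        list_del_and_init(&w->entry);  /* unlink first */
        w->pending = 0;                /* then allow requeueing */
        w->func(w);                    /* finally run the callback */
        n++;
    }
    return n;
}

/* Demo callback: just count invocations. */
static void count_run(struct work_model *w) { w->runs++; }
```

In this model, queueing the same item twice before it runs is harmless: the second call returns 0. The assertion can only fail when the pending flag and the list entry disagree, for example when something resets the flag while the entry is still linked.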