
Configuring a WSGI runtime environment

 老匹夫 2015-01-03








I want to write web applications in Python.
I used to be best at ASP, and later I learned PHP. Somehow, building websites with those scripting languages never felt very geeky, so I am feeling my way into doing it with Python.

Yesterday I signed up for SAE with Python support; the example in its installation guide uses WSGI to run a hello world. I don't really understand any of this yet, a pure newbie. SAE uses SVN for version control, which is a pain: every svn ci feels awkward, nothing like the comfort of git.

So I want to learn Python web development locally. There are many ways to write web apps in Python; since SAE supports WSGI, I'll start by playing with WSGI. The trouble is that many articles online about configuring WSGI are rubbish: they take something very simple and insist on dragging in Django, when all I want to run is a hello world, which needs nothing that heavy.

Below is a summary of the configuration; the whole process is quite simple:

Install apache2 and then libapache2-mod-wsgi; I also installed libapache2-mod-wsgi-py3.

After that, you should see two files under /etc/apache2/mods-enabled/:
wsgi.conf  wsgi.load

Then append the following at the end of /etc/apache2/apache2.conf:

<Directory /var/www/>
order deny,allow
Allow from all
</Directory>
WSGIScriptAlias / /var/www/index.wsgi


And that's it. How do we check that it actually works? First restart apache2, then write a simple hello world program.
Restart apache2:

	$sudo /etc/init.d/apache2 restart


The hello world program, placed under /var/www/:

 

$touch index.wsgi
$vim index.wsgi



def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    yield 'Hello World\n'  # with mod_wsgi under Python 3, yield bytes instead: b'Hello World\n'


P.S. This hello world test program comes from: http://en.wikipedia.org/wiki/Web_Server_Gateway_Interface
Then open http://127.0.0.1 in a browser and you should see Hello World.








November 23rd, 2011





A brief look at SD card read/write operations







When a read/write request actually gets serviced, and how, is entirely up to the request queue. Taking the MMC card on the Goldfish platform as an example, let's see how its request queue is set up:

mmc_blk_probe()
597     struct mmc_blk_data *md;
598     int err;
599
600     char cap_str[10];
601
602     /*
603      * Check that the card supports the command class(es) we need.
604      */
605     if (!(card->csd.cmdclass & CCC_BLOCK_READ))
606         return -ENODEV;
607      
608     md = mmc_blk_alloc(card);


mmc_blk_probe->mmc_blk_alloc()

510     struct mmc_blk_data *md;
511     int devidx, ret;
512
513     devidx = find_first_zero_bit(dev_use, MMC_NUM_MINORS);//find an unused slot in dev_use
514     if (devidx >= MMC_NUM_MINORS)
515         return ERR_PTR(-ENOSPC);
516     __set_bit(devidx, dev_use);
517
518     md = kzalloc(sizeof(struct mmc_blk_data), GFP_KERNEL);
519     if (!md) {
520         ret = -ENOMEM;
521         goto out;
522     }
523
524
525     /*
526      * Set the read-only status based on the supported commands
527      * and the write protect switch.
528      */
529     md->read_only = mmc_blk_readonly(card);
530
531     md->disk = alloc_disk(1 << MMC_SHIFT);//this structure is extremely important
532     if (md->disk == NULL) {
533         ret = -ENOMEM;
534         goto err_kfree;
535     }
536
537     spin_lock_init(&md->lock);
538     md->usage = 1;//incremented on every get
539
540     ret = mmc_init_queue(&md->queue, card, &md->lock);
541     if (ret)
542         goto err_putdisk;
543
544     md->queue.issue_fn = mmc_blk_issue_rq;
545     md->queue.data = md;
546
547     md->disk->major = MMC_BLOCK_MAJOR;//incoming requests use this to locate the disk and are then hung on disk->queue
548     md->disk->first_minor = devidx << MMC_SHIFT;
549     md->disk->fops = &mmc_bdops;
550     md->disk->private_data = md;
551     md->disk->queue = md->queue.queue;//ah, so that is how it fits together~
552     md->disk->driverfs_dev = &card->dev;
Line 531 allocates a disk via alloc_disk; this is the generic block device (gendisk) structure. Every request looks up the corresponding device node under /dev/, finds the matching disk through its major and minor numbers, and is then hung on disk->queue. When that request actually gets executed is decided entirely by this request queue. In which respects? Let's keep reading; once the whole flow has been traced I'll give a summary.
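As an aside, the devidx bookkeeping in lines 513~516 and the first_minor computation at line 548 can be mimicked in a few lines of user-space C. This is only an illustrative sketch: the bitmap helpers and the MMC_SHIFT value below are simplified stand-ins, not the kernel implementations.

#include <stdio.h>

#define MMC_SHIFT      3                    /* assumed: 8 minors per card, as in mmc_blk */
#define MMC_NUM_MINORS (256 >> MMC_SHIFT)

static unsigned long dev_use[MMC_NUM_MINORS / (8 * sizeof(unsigned long)) + 1];

/* simplified stand-ins for the kernel's find_first_zero_bit()/__set_bit() */
static int find_first_zero_bit_simple(unsigned long *map, int bits)
{
    for (int i = 0; i < bits; i++)
        if (!(map[i / (8 * sizeof(unsigned long))] & (1UL << (i % (8 * sizeof(unsigned long))))))
            return i;
    return bits;
}

static void set_bit_simple(int nr, unsigned long *map)
{
    map[nr / (8 * sizeof(unsigned long))] |= 1UL << (nr % (8 * sizeof(unsigned long)));
}

int main(void)
{
    /* allocate three "cards" the way mmc_blk_alloc() picks devidx */
    for (int n = 0; n < 3; n++) {
        int devidx = find_first_zero_bit_simple(dev_use, MMC_NUM_MINORS);
        if (devidx >= MMC_NUM_MINORS)
            return 1;                        /* -ENOSPC in the kernel */
        set_bit_simple(devidx, dev_use);
        printf("card %d: devidx=%d first_minor=%d\n", n, devidx, devidx << MMC_SHIFT);
    }
    return 0;
}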


mmc_blk_probe->mmc_blk_alloc()->mmc_init_queue()

125     mq->card = card;
126     mq->queue = blk_init_queue(mmc_request, lock);                                                                                    
127     if (!mq->queue)    
128         return -ENOMEM;
129
130     mq->queue->queuedata = mq;
131     mq->req = NULL;    
132  
133     blk_queue_prep_rq(mq->queue, mmc_prep_request);
134     blk_queue_ordered(mq->queue, QUEUE_ORDERED_DRAIN, NULL);
135     queue_flag_set_unlocked(QUEUE_FLAG_NONROT, mq->queue);
mmc_init_queue calls the generic blk_init_queue to create the request queue, and then sets its MMC-specific parameters.
Those specific parameters are:
queue->request_fn = mmc_request
queue->prep_rq_fn = mmc_prep_request
queue->ordered = QUEUE_ORDERED_DRAIN
queue->next_ordered = QUEUE_ORDERED_DRAIN
queue->prepare_flush_fn = NULL
Besides these device-specific parameters, there are some generic ones that are just as essential. Let's keep reading:
mmc_blk_probe->mmc_blk_alloc()->mmc_init_queue()->blk_init_queue
 540 /**
 541  * blk_init_queue  - prepare a request queue for use with a block device
 542  * @rfn:  The function to be called to process requests that have been
 543  *        placed on the queue.
 544  * @lock: Request queue spin lock
 545  *
 546  * Description:
 547  *    If a block device wishes to use the standard request handling procedures,
 548  *    which sorts requests and coalesces adjacent requests, then it must
 549  *    call blk_init_queue().  The function @rfn will be called when there
 550  *    are requests on the queue that need to be processed.  If the device
 551  *    supports plugging, then @rfn may not be called immediately when requests
 552  *    are available on the queue, but may be called at some time later instead.
 553  *    Plugged queues are generally unplugged when a buffer belonging to one
 554  *    of the requests on the queue is needed, or due to memory pressure.
 555  *
 556  *    @rfn is not required, or even expected, to remove all requests off the
 557  *    queue, but only as many as it can handle at a time.  If it does leave
 558  *    requests on the queue, it is responsible for arranging that the requests
 559  *    get dealt with eventually.
 560  *
 561  *    The queue spin lock must be held while manipulating the requests on the
 562  *    request queue; this lock will be taken also from interrupt context, so irq
 563  *    disabling is needed for it.
 564  *
 565  *    Function returns a pointer to the initialized request queue, or %NULL if
 566  *    it didn't succeed.
 567  *  
 568  * Note:
 569  *    blk_init_queue() must be paired with a blk_cleanup_queue() call
 570  *    when the block device is deactivated (such as at module unload).
 571  **/
 572      
 573 struct request_queue *blk_init_queue(request_fn_proc *rfn, spinlock_t *lock)
 574 {
 575     return blk_init_queue_node(rfn, lock, -1);
 576 }


mmc_blk_probe->mmc_blk_alloc()->mmc_init_queue()->blk_init_queue->blk_init_queue_node

 579 struct request_queue *
 580 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 581 {
 582     struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
 583
 584     if (!q)
 585         return NULL;
 586
 587     q->node = node_id;
 588     if (blk_init_free_list(q)) {
 589         kmem_cache_free(blk_requestq_cachep, q);
 590         return NULL;
 591     }
 592
 593     /*
 594      * if caller didn't supply a lock, they get per-queue locking with
 595      * our embedded lock
 596      */
 597     if (!lock)
 598         lock = &q->__queue_lock;
 599
 600     q->request_fn       = rfn;
 601     q->prep_rq_fn       = NULL;
 602     q->unplug_fn        = generic_unplug_device;
 603     q->queue_flags      = QUEUE_FLAG_DEFAULT;
 604     q->queue_lock       = lock;
 605
 606     blk_queue_segment_boundary(q, BLK_SEG_BOUNDARY_MASK);
 607
 608     blk_queue_make_request(q, __make_request);
 609     blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
 610
 611     blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
 612     blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
 613
 614     q->sg_reserved_size = INT_MAX;
 615
 616     blk_set_cmd_filter_defaults(&q->cmd_filter);
 617
 618     /*
 619      * all done
 620      */
 621     if (!elevator_init(q, NULL)) {
 622         blk_queue_congestion_threshold(q);
 623         return q;
 624     }


Several generic parameters are also set up here:

queue->unplug_fn = generic_unplug_device
queue->make_request_fn = __make_request, which is about as generic as it gets
queue->seg_boundary_mask, the segment-merging boundary rule, defaults to 0xFFFFFFFF
queue->max_segment_size, the largest segment size, 2^16 B (64 KB)
queue->max_hw_segments, the maximum number of hardware segments (128)
queue->max_phys_segments, the maximum number of physical segments (128)


Line 608, by calling blk_queue_make_request, sets up a key piece of data, queue->unplug_timer, which determines when requests actually get executed.

Line 616 sets some flag bits that act as a filter; they come into play while a request is being executed.
Line 621 sets up the I/O scheduler (elevator); the default here is "anticipatory", though switching it to none also works. So how do the requests of each block device relate to the queues inside the elevator?


mmc_blk_probe->mmc_blk_alloc()->mmc_init_queue()->blk_init_queue->blk_init_queue_node->blk_queue_make_request


 98 /**
 99  * blk_queue_make_request - define an alternate make_request function for a device
100  * @q:  the request queue for the device to be affected
101  * @mfn: the alternate make_request function
102  *
103  * Description:
104  *    The normal way for &struct bios to be passed to a device
105  *    driver is for them to be collected into requests on a request
106  *    queue, and then to allow the device driver to select requests
107  *    off that queue when it is ready.  This works well for many block
108  *    devices. However some block devices (typically virtual devices
109  *    such as md or lvm) do not benefit from the processing on the
110  *    request queue, and are served best by having the requests passed
111  *    directly to them.  This can be achieved by providing a function
112  *    to blk_queue_make_request().
113  *
114  * Caveat:
115  *    The driver that does this *must* be able to deal appropriately
116  *    with buffers in "highmemory". This can be accomplished by either calling
117  *    __bio_kmap_atomic() to get a temporary kernel mapping, or by calling
118  *    blk_queue_bounce() to create a buffer in normal memory.
119  **/
120 void blk_queue_make_request(struct request_queue *q, make_request_fn *mfn)
121 {
122     /*
123      * set defaults
124      */
125     q->nr_requests = BLKDEV_MAX_RQ;
126     blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);
127     blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
128     blk_queue_segment_boundary(q, BLK_SEG_BOUNDARY_MASK);
129     blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);
130  
131     q->make_request_fn = mfn;
132     q->backing_dev_info.ra_pages =
133             (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
134     q->backing_dev_info.state = 0;
135     q->backing_dev_info.capabilities = BDI_CAP_MAP_COPY;
136     blk_queue_max_sectors(q, SAFE_MAX_SECTORS);
137     blk_queue_hardsect_size(q, 512);
138     blk_queue_dma_alignment(q, 511);
139     blk_queue_congestion_threshold(q);
140     q->nr_batching = BLK_BATCH_REQ;
141  
142     q->unplug_thresh = 4;       /* hmm */
143     q->unplug_delay = (3 * HZ) / 1000;  /* 3 milliseconds */
144     if (q->unplug_delay == 0)
145         q->unplug_delay = 1;
146  
147     q->unplug_timer.function = blk_unplug_timeout;
148     q->unplug_timer.data = (unsigned long)q;
149  
150     /*
151      * by default assume old behaviour and bounce for any highmem page
152      */
153     blk_queue_bounce_limit(q, BLK_BOUNCE_HIGH);
154 }
Whenever a request is inserted while the request queue is empty, blk_plug_device gets called:


__make_request


1248     if (!blk_queue_nonrot(q) && elv_queue_empty(q))
1249         blk_plug_device(q);
1250     add_request(q, req);
__make_request->blk_plug_device
 205 /*
 206  * "plug" the device if there are no outstanding requests: this will
 207  * force the transfer to start only after we have put all the requests
 208  * on the list.
 209  *
 210  * This is called with interrupts off and no requests on the queue and
 211  * with the queue lock held.
 212  */
 213 void blk_plug_device(struct request_queue *q)
 214 {
 215     WARN_ON(!irqs_disabled());
 216
 217     /*
 218      * don't plug a stopped queue, it must be paired with blk_start_queue()
 219      * which will restart the queueing
 220      */
 221     if (blk_queue_stopped(q))
 222         return;
 223
 224     if (!queue_flag_test_and_set(QUEUE_FLAG_PLUGGED, q)) {
 225         mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);
 226         trace_block_plug(q);
 227     }
 228 }
Line 225 arms a timer via mod_timer. When the timer expires, its timer.function runs, i.e. blk_unplug_timeout.
 316 void blk_unplug_timeout(unsigned long data)
 317 {        
 318     struct request_queue *q = (struct request_queue *)data;
 319  
 320     trace_block_unplug_timer(q);
 321     kblockd_schedule_work(q, &q->unplug_work);
 322 }


At line 321 the timer handler in turn schedules q->unplug_work. And where was q->unplug_work defined?

It was defined earlier, in blk_init_queue_node.


mmc_blk_probe->mmc_blk_alloc()->mmc_init_queue()->blk_init_queue->blk_init_queue_node


 579 struct request_queue *
 580 blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
 581 {
 582     struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);
 mmc_blk_probe->mmc_blk_alloc()->mmc_init_queue()->blk_init_queue->blk_init_queue_node->blk_alloc_queue_node
 508 struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 509 {
 510     struct request_queue *q;
 511     int err;
 512
 513     q = kmem_cache_alloc_node(blk_requestq_cachep,
 514                 gfp_mask | __GFP_ZERO, node_id);
 515     if (!q)
 516         return NULL;
 517
 518     q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug;
 519     q->backing_dev_info.unplug_io_data = q;
 520     err = bdi_init(&q->backing_dev_info);
 521     if (err) {
 522         kmem_cache_free(blk_requestq_cachep, q);
 523         return NULL;
 524     }
 525
 526     init_timer(&q->unplug_timer);
 527     setup_timer(&q->timeout, blk_rq_timed_out_timer, (unsigned long) q);
 528     INIT_LIST_HEAD(&q->timeout_list);
 529     INIT_WORK(&q->unplug_work, blk_unplug_work);
 530
 531     kobject_init(&q->kobj, &blk_queue_ktype);
 532
 533     mutex_init(&q->sysfs_lock);
 534     spin_lock_init(&q->__queue_lock);
 535
 536     return q;
 537 }


Line 529 sets up queue->unplug_work. (And at line 527 there is yet another timer, q->timeout; when does that one get used?)


The work function of queue->unplug_work is blk_unplug_work.


blk_unplug_timeout->blk_unplug_work


 307 void blk_unplug_work(struct work_struct *work)
 308 {
 309     struct request_queue *q =
 310         container_of(work, struct request_queue, unplug_work);
 311
 312     trace_block_unplug_io(q);
 313     q->unplug_fn(q);
 314 }


Line 313 then calls q->unplug_fn, i.e. generic_unplug_device (which was set in blk_init_queue_node).


blk_unplug_timeout->blk_unplug_work->generic_unplug_device


 278 /**
 279  * generic_unplug_device - fire a request queue
 280  * @q:    The &struct request_queue in question
 281  *
 282  * Description:
 283  *   Linux uses plugging to build bigger requests queues before letting
 284  *   the device have at them. If a queue is plugged, the I/O scheduler
 285  *   is still adding and merging requests on the queue. Once the queue
 286  *   gets unplugged, the request_fn defined for the queue is invoked and
 287  *   transfers started.
 288  **/
 289 void generic_unplug_device(struct request_queue *q)
 290 {
 291     if (blk_queue_plugged(q)) {
 292         spin_lock_irq(q->queue_lock);
 293         __generic_unplug_device(q);
 294         spin_unlock_irq(q->queue_lock);
 295     }
 296 }


blk_unplug_timeout->blk_unplug_work->generic_unplug_device->__generic_unplug_device


 268 void __generic_unplug_device(struct request_queue *q)
 269 {
 270     if (unlikely(blk_queue_stopped(q)))
 271         return;
 272     if (!blk_remove_plug(q) && !blk_queue_nonrot(q))
 273         return;
 274
 275     q->request_fn(q);
 276 }


The request_fn invoked at line 275 is, for MMC, mmc_request.
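To tie the whole plug/unplug chain together, here is a small single-threaded user-space sketch of the idea: inserting a request into an empty queue "plugs" it and arms a deadline, and only when that deadline fires does the queue's request_fn get to drain whatever has accumulated. The timer and work-queue mechanics are only simulated; none of this is the real block-layer API.

#include <stdio.h>

#define MAX_REQ 16

struct queue {
    int reqs[MAX_REQ];
    int nr;
    int plugged;
    int unplug_deadline;                 /* "time" at which the plug timer fires */
    void (*request_fn)(struct queue *q); /* plays the role of q->request_fn (mmc_request) */
};

static void drain(struct queue *q)       /* stand-in for the driver's request_fn */
{
    printf("request_fn: draining %d request(s)\n", q->nr);
    q->nr = 0;
}

static void submit(struct queue *q, int now, int req)
{
    if (q->nr == 0 && !q->plugged) {     /* empty queue: plug it and arm the timer */
        q->plugged = 1;
        q->unplug_deadline = now + 3;    /* like the ~3 ms unplug_delay */
    }
    if (q->nr < MAX_REQ)
        q->reqs[q->nr++] = req;          /* meanwhile the elevator could merge/sort */
}

static void tick(struct queue *q, int now)
{
    if (q->plugged && now >= q->unplug_deadline) {   /* blk_unplug_timeout "fires"... */
        q->plugged = 0;                               /* ...schedules the unplug work... */
        q->request_fn(q);                             /* ...which ends up in request_fn */
    }
}

int main(void)
{
    struct queue q = { .request_fn = drain };
    for (int now = 0; now < 10; now++) {
        if (now < 5)
            submit(&q, now, now);        /* requests trickle in while the queue is plugged */
        tick(&q, now);
    }
    return 0;
}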


blk_unplug_timeout->blk_unplug_work->generic_unplug_device->__generic_unplug_device->mmc_request

 81 /*
 82  * Generic MMC request handler.  This is called for any queue on a
 83  * particular host.  When the host is not busy, we look for a request
 84  * on any queue on this host, and attempt to issue it.  This may
 85  * not be the queue we were asked to process.
 86  */
 87 static void mmc_request(struct request_queue *q)
 88 {
 89     struct mmc_queue *mq = q->queuedata;
 90     struct request *req;
 91     int ret;
 92
 93     if (!mq) {
 94         printk(KERN_ERR "MMC: killing requests for dead queue\n");
 95         while ((req = elv_next_request(q)) != NULL) {
 96             do {
 97                 ret = __blk_end_request(req, -EIO,
 98                             blk_rq_cur_bytes(req));
 99             } while (ret);
100         }
101         return;
102     }
103
104     if (!mq->req)
105         wake_up_process(mq->thread);//the actual I/O is done by a thread
106 }

At line 105, mmc wakes up mq->thread, which is:

mmc_init_queue

202     init_MUTEX(&mq->thread_sem);
203
204     mq->thread = kthread_run(mmc_queue_thread, mq, "mmcqd");
205     if (IS_ERR(mq->thread)) {
206         ret = PTR_ERR(mq->thread);
207         goto free_bounce_sg;
208     }

blk_unplug_timeout->blk_unplug_work->generic_unplug_device->__generic_unplug_device->mmc_request->mmc_queue_thread

 44 static int mmc_queue_thread(void *d)
 45 {
 46     struct mmc_queue *mq = d;
 47     struct request_queue *q = mq->queue;
 48
 49     current->flags |= PF_MEMALLOC;
 50
 51     down(&mq->thread_sem);
 52     do {
 53         struct request *req = NULL;
 54
 55         spin_lock_irq(q->queue_lock);
 56         set_current_state(TASK_INTERRUPTIBLE);
 57         if (!blk_queue_plugged(q))
 58             req = elv_next_request(q);
 59         mq->req = req;
 60         spin_unlock_irq(q->queue_lock);
 61
 62         if (!req) {
 63             if (kthread_should_stop()) {
 64                 set_current_state(TASK_RUNNING);
 65                 break;
 66             }
 67             up(&mq->thread_sem);
 68             schedule();
 69             down(&mq->thread_sem);
 70             continue;
 71         }
 72         set_current_state(TASK_RUNNING);
 73
 74         mq->issue_fn(mq, req);
 75     } while (1);
 76     up(&mq->thread_sem);
 77
 78     return 0;
 79 }
 80

What this thread does is take one request at a time off the request queue and hand it to mq->issue_fn. mq->issue_fn was set in mmc_blk_probe->mmc_blk_alloc: it is mmc_blk_issue_rq.
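The wake_up_process()/schedule() pairing above is essentially a single-consumer worker thread that sleeps whenever the queue is empty. A rough user-space analogue using POSIX threads, with a condition variable standing in for the scheduler and the queue layout and issue_fn invented for the example, might look like this:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static int queue[16], head, tail, stop;

static void issue_fn(int req)             /* stands in for mq->issue_fn (mmc_blk_issue_rq) */
{
    printf("issuing request %d\n", req);
}

static void *queue_thread(void *arg)      /* stands in for mmc_queue_thread */
{
    (void)arg;
    pthread_mutex_lock(&lock);
    for (;;) {
        while (head == tail && !stop)     /* nothing queued: go to sleep, like schedule() */
            pthread_cond_wait(&wake, &lock);
        if (head == tail && stop)
            break;
        int req = queue[head++ % 16];
        pthread_mutex_unlock(&lock);
        issue_fn(req);                    /* do the work without holding the lock */
        pthread_mutex_lock(&lock);
    }
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, queue_thread, NULL);
    for (int i = 0; i < 4; i++) {         /* submitters enqueue and wake the thread */
        pthread_mutex_lock(&lock);
        queue[tail++ % 16] = i;
        pthread_cond_signal(&wake);       /* like wake_up_process(mq->thread) */
        pthread_mutex_unlock(&lock);
    }
    sleep(1);
    pthread_mutex_lock(&lock);
    stop = 1;
    pthread_cond_signal(&wake);
    pthread_mutex_unlock(&lock);
    pthread_join(t, NULL);
    return 0;
}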

mmc_queue_thread->mmc_blk_issue_rq:

264 static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
265 {
266     struct mmc_blk_data *md = mq->data;
267     struct mmc_card *card = md->queue.card;
268     struct mmc_blk_request brq;
269     int ret = 1, disable_multi = 0;
270
271 #ifdef CONFIG_MMC_BLOCK_DEFERRED_RESUME
272     if (mmc_bus_needs_resume(card->host)) {
273         mmc_resume_bus(card->host);
274         mmc_blk_set_blksize(md, card);
275     }
276 #endif
277
278     mmc_claim_host(card->host);
279
280     do {
281         struct mmc_command cmd;
282         u32 readcmd, writecmd, status = 0;
283
284         memset(&brq, 0, sizeof(struct mmc_blk_request));
285         brq.mrq.cmd = &brq.cmd;
286         brq.mrq.data = &brq.data;
287
288         brq.cmd.arg = req->sector;
289         if (!mmc_card_blockaddr(card))
290             brq.cmd.arg <<= 9;
291         brq.cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
292         brq.data.blksz = 512;
293         brq.stop.opcode = MMC_STOP_TRANSMISSION;
294         brq.stop.arg = 0;
295         brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
296         brq.data.blocks = req->nr_sectors;
297
298         /*
299          * The block layer doesn't support all sector count
300          * restrictions, so we need to be prepared for too big
301          * requests.
302          */
303         if (brq.data.blocks > card->host->max_blk_count)
304             brq.data.blocks = card->host->max_blk_count;
305
306         /*
307          * After a read error, we redo the request one sector at a time
308          * in order to accurately determine which sectors can be read
309          * successfully.
310          */
311         if (disable_multi && brq.data.blocks > 1)
312             brq.data.blocks = 1;
313
314         if (brq.data.blocks > 1) {
315             /* SPI multiblock writes terminate using a special
316              * token, not a STOP_TRANSMISSION request.
317              */
318             if (!mmc_host_is_spi(card->host)
319                     || rq_data_dir(req) == READ)
320                 brq.mrq.stop = &brq.stop;
321             readcmd = MMC_READ_MULTIPLE_BLOCK;
322             writecmd = MMC_WRITE_MULTIPLE_BLOCK;
323         } else {
324             brq.mrq.stop = NULL;
325             readcmd = MMC_READ_SINGLE_BLOCK;
326             writecmd = MMC_WRITE_BLOCK;
327         }
328
329         if (rq_data_dir(req) == READ) {
330             brq.cmd.opcode = readcmd;
331             brq.data.flags |= MMC_DATA_READ;
332         } else {
333             brq.cmd.opcode = writecmd;
334             brq.data.flags |= MMC_DATA_WRITE;
335         }
336
337         mmc_set_data_timeout(&brq.data, card);
338
339         brq.data.sg = mq->sg;
340         brq.data.sg_len = mmc_queue_map_sg(mq);
341
342         /*
343          * Adjust the sg list so it is the same size as the
344          * request.
345          */
346         if (brq.data.blocks != req->nr_sectors) {
347             int i, data_size = brq.data.blocks << 9;
348             struct scatterlist *sg;
349
350             for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) {
351                 data_size -= sg->length;
352                 if (data_size <= 0) {
353                     sg->length += data_size;
354                     i++;
355                     break;
356                 }
357             }
358             brq.data.sg_len = i;
359         }
360
361         mmc_queue_bounce_pre(mq);
362
363         mmc_wait_for_req(card->host, &brq.mrq);
364
365         mmc_queue_bounce_post(mq);
366
367         /*
368          * Check for errors here, but don't jump to cmd_err
369          * until later as we need to wait for the card to leave
370          * programming mode even when things go wrong.
371          */
372         if (brq.cmd.error || brq.data.error || brq.stop.error) {
373             if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
374                 /* Redo read one sector at a time */
375                 printk(KERN_WARNING "%s: retrying using single "
376                        "block read\n", req->rq_disk->disk_name);
377                 disable_multi = 1;
378                 continue;
379             }
380             status = get_card_status(card, req);
381         } else if (disable_multi == 1) {
382             disable_multi = 0;
383         }
384
385         if (brq.cmd.error) {
386             printk(KERN_ERR "%s: error %d sending read/write "
387                    "command, response %#x, card status %#x\n",
388                    req->rq_disk->disk_name, brq.cmd.error,
389                    brq.cmd.resp[0], status);
390         }
391
392         if (brq.data.error) {
393             if (brq.data.error == -ETIMEDOUT && brq.mrq.stop)
394                 /* 'Stop' response contains card status */
395                 status = brq.mrq.stop->resp[0];
396             printk(KERN_ERR "%s: error %d transferring data,"
397                    " sector %u, nr %u, card status %#x\n",
398                    req->rq_disk->disk_name, brq.data.error,
399                    (unsigned)req->sector,
400                    (unsigned)req->nr_sectors, status);
401         }
402
403         if (brq.stop.error) {
404             printk(KERN_ERR "%s: error %d sending stop command, "
405                    "response %#x, card status %#x\n",
406                    req->rq_disk->disk_name, brq.stop.error,
407                    brq.stop.resp[0], status);
408         }
409
410         if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
411             do {
412                 int err;
413
414                 cmd.opcode = MMC_SEND_STATUS;
415                 cmd.arg = card->rca << 16;
416                 cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
417                 err = mmc_wait_for_cmd(card->host, &cmd, 5);
418                 if (err) {
419                     printk(KERN_ERR "%s: error %d requesting status\n",
420                            req->rq_disk->disk_name, err);
421                     goto cmd_err;
422                 }
423                 /*
424                  * Some cards mishandle the status bits,
425                  * so make sure to check both the busy
426                  * indication and the card state.
427                  */
428             } while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
429                 (R1_CURRENT_STATE(cmd.resp[0]) == 7));
430
431 #if 0
432             if (cmd.resp[0] & ~0x00000900)
433                 printk(KERN_ERR "%s: status = %08x\n",
434                        req->rq_disk->disk_name, cmd.resp[0]);
435             if (mmc_decode_status(cmd.resp))
436                 goto cmd_err;
437 #endif
438         }
439
440         if (brq.cmd.error || brq.stop.error || brq.data.error) {
441             if (rq_data_dir(req) == READ) {
442                 /*
443                  * After an error, we redo I/O one sector at a
444                  * time, so we only reach here after trying to
445                  * read a single sector.
446                  */
447                 spin_lock_irq(&md->lock);
448                 ret = __blk_end_request(req, -EIO, brq.data.blksz);
449                 spin_unlock_irq(&md->lock);
450                 continue;
451             }
452             goto cmd_err;
453         }
454
455         /*
456          * A block was successfully transferred.
457          */
458         spin_lock_irq(&md->lock);
459         ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
460         spin_unlock_irq(&md->lock);
461     } while (ret);
462
463     mmc_release_host(card->host);
464
465     return 1;
466
467  cmd_err:
468     /*
469      * If this is an SD card and we're writing, we can first
470      * mark the known good sectors as ok.
471      *
472      * If the card is not SD, we can still ok written sectors
473      * as reported by the controller (which might be less than
474      * the real number of written sectors, but never more).
475      */
476     if (mmc_card_sd(card)) {
477         u32 blocks;
478
479         blocks = mmc_sd_num_wr_blocks(card);
480         if (blocks != (u32)-1) {
481             spin_lock_irq(&md->lock);
482             ret = __blk_end_request(req, 0, blocks << 9);
483             spin_unlock_irq(&md->lock);
484         }
485     } else {
486         spin_lock_irq(&md->lock);
487         ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
488         spin_unlock_irq(&md->lock);
489     }
490
491     mmc_release_host(card->host);
492
493     spin_lock_irq(&md->lock);
494     while (ret)
495         ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req));
496     spin_unlock_irq(&md->lock);
497
498     return 0;
499 }
500

What this function does is claim (lock) the MMC host and then let the host carry out the operation. The most important step is line 363: mmc_wait_for_req(card->host, &brq.mrq).

mmc_queue_thread->mmc_blk_issue_rq->mmc_wait_for_req

 186 /**
 187  *  mmc_wait_for_req - start a request and wait for completion
 188  *  @host: MMC host to start command
 189  *  @mrq: MMC request to start
 190  *
 191  *  Start a new MMC custom command request for a host, and wait
 192  *  for the command to complete. Does not attempt to parse the
 193  *  response.
 194  */
 195 void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq)
 196 {
 197     DECLARE_COMPLETION_ONSTACK(complete);
 198
 199     mrq->done_data = &complete;
 200     mrq->done = mmc_wait_done;
 201
 202     mmc_start_request(host, mrq);//may take a long time
 203
 204     wait_for_completion(&complete);//wait until the data transfer is done; the completion is also signalled from interrupt context
 205 }
 206
 207 EXPORT_SYMBOL(mmc_wait_for_req);


mmc_start_request kicks off the operation. When does it finish? That depends on the completion: when the data transfer completes, an interrupt increments the done counter inside the completion, and once this thread sees that, it can return.
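The same start-then-wait pattern can be sketched in user space, with a mutex/condition pair standing in for the kernel's struct completion and a second thread playing the part of the data-done interrupt. This is only an analogy, not the kernel implementation:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

struct completion {                        /* user-space stand-in for struct completion */
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int done;
};

static void complete(struct completion *c) /* what the "interrupt" side calls */
{
    pthread_mutex_lock(&c->lock);
    c->done++;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->lock);
    while (!c->done)
        pthread_cond_wait(&c->cond, &c->lock);
    c->done--;
    pthread_mutex_unlock(&c->lock);
}

static struct completion xfer = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

static void *irq_handler(void *arg)        /* pretend data-done interrupt */
{
    (void)arg;
    sleep(1);                              /* the hardware takes a while */
    complete(&xfer);
    return NULL;
}

int main(void)
{
    pthread_t irq;
    pthread_create(&irq, NULL, irq_handler, NULL);
    printf("request started, waiting...\n");
    wait_for_completion(&xfer);            /* blocks just like mmc_wait_for_req() */
    printf("transfer complete\n");
    pthread_join(irq, NULL);
    return 0;
}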


Note: on 2012-08-24 I re-did the formatting and removed the extra blank lines between code lines. This write-up dates from my student days; the analysis is fairly shallow and I did not read the code very carefully. Please forgive the places where it falls short :).








October 8th, 2011





From the Nand driver to file reads and writes








Nand is a block device. Does the nand block device, like other block devices, go through a "client/server"-style round trip for every read and write?
Starting from the registration of the nand driver on the Goldfish platform, let's see what a yaffs2 file read/write on top of nand really looks like.





This post is mainly a record of the questions I ran into while studying this. Like my earlier posts it is basically just the flow; for the underlying theory, please ask Google. Some of the code below may be repeated, simply to spare you from jumping back and forth, which is hard on the eyes.





The code is from the Android kernel 2.6.29. The whole write-up was done in a hurry, so mistakes in my understanding are inevitable; corrections are welcome.





Here is the execution flow of Android on the Goldfish platform:
<1>
377 static int __init init_mtdblock(void)        
378 { 
379     return register_mtd_blktrans(&mtdblock_tr);
380 } 
Code fragment <1> registers mtdblock_tr, a struct mtd_blktrans_ops; this module is loaded during system boot, and from the name of its init function you can tell it targets MTD block devices. In Linux, Nand is classified as an MTD device: MTD is just a wrapper around nand so that the upper layers see MTD rather than the raw nand, and operating nand through MTD still ends up in the nand driver's own functions. Don't treat MTD as anything mysterious. If you need more background, Google it; I used to take it far too seriously and never quite got it, and it only clicked after I read the fucking code.





The mtd_blktrans_ops structure looks like this:
<2>
 32 struct mtd_blktrans_ops {
 33     char *name;
 34     int major;
 35     int part_bits;
 36     int blksize;
 37     int blkshift;
 38    
 39     /* Access functions */
 40     int (*readsect)(struct mtd_blktrans_dev *dev,
 41             unsigned long block, char *buffer);
 42     int (*writesect)(struct mtd_blktrans_dev *dev,
 43              unsigned long block, char *buffer);
 44     int (*discard)(struct mtd_blktrans_dev *dev,
 45                unsigned long block, unsigned nr_blocks);
 46    
 47     /* Block layer ioctls */
 48     int (*getgeo)(struct mtd_blktrans_dev *dev, struct hd_geometry *geo);
 49     int (*flush)(struct mtd_blktrans_dev *dev);
 50
 51     /* Called with mtd_table_mutex held; no race with add/remove */
 52     int (*open)(struct mtd_blktrans_dev *dev);
 53     int (*release)(struct mtd_blktrans_dev *dev);
 54   
 55     /* Called on {de,}registration and on subsequent addition/removal
 56        of devices, with mtd_table_mutex held. */
 57     void (*add_mtd)(struct mtd_blktrans_ops *tr, struct mtd_info *mtd);
 58     void (*remove_dev)(struct mtd_blktrans_dev *dev);
 59   
 60     struct list_head devs;
 61     struct list_head list;
 62     struct module *owner;
 63   
 64     struct mtd_blkcore_priv *blkcore_priv;
 65 };
The struct mtd_blkcore_priv at line 64 contains a read/write request queue; the read/write requests of all MTD devices share this single queue.
init_mtdblock->register_mtd_blktrans
340 int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
341 {
342     int ret, i;
343
344     /* Register the notifier if/when the first device type is
345        registered, to prevent the link/init ordering from fucking
346        us over. */
347     if (!blktrans_notifier.list.next)
348         register_mtd_user(&blktrans_notifier);
349
350     tr->blkcore_priv = kzalloc(sizeof(*tr->blkcore_priv), GFP_KERNEL);//this is pretty much a queue already
351     if (!tr->blkcore_priv)
352         return -ENOMEM;
353
354     mutex_lock(&mtd_table_mutex);
355
356     ret = register_blkdev(tr->major, tr->name);//"mtdblk": register a generic block device
357     if (ret) {
358         printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
359                tr->name, tr->major, ret);
360         kfree(tr->blkcore_priv);
361         mutex_unlock(&mtd_table_mutex);
362         return ret;
363     }
364     spin_lock_init(&tr->blkcore_priv->queue_lock);
365
366     tr->blkcore_priv->rq = blk_init_queue(mtd_blktrans_request, &tr->blkcore_priv->queue_lock);
367     if (!tr->blkcore_priv->rq) {
368         unregister_blkdev(tr->major, tr->name);
369         kfree(tr->blkcore_priv);
370         mutex_unlock(&mtd_table_mutex);
371         return -ENOMEM;
372     }
373
374     tr->blkcore_priv->rq->queuedata = tr;
375     blk_queue_hardsect_size(tr->blkcore_priv->rq, tr->blksize);
376     if (tr->discard)
377         blk_queue_set_discard(tr->blkcore_priv->rq,
378                       blktrans_discard_request);
379
380     tr->blkshift = ffs(tr->blksize) - 1;
381
382     tr->blkcore_priv->thread = kthread_run(mtd_blktrans_thread, tr,
383             "%sd", tr->name);
384     if (IS_ERR(tr->blkcore_priv->thread)) {
385         blk_cleanup_queue(tr->blkcore_priv->rq);
386         unregister_blkdev(tr->major, tr->name);
387         kfree(tr->blkcore_priv);
388         mutex_unlock(&mtd_table_mutex);
389         return PTR_ERR(tr->blkcore_priv->thread);
390     }
391
392     INIT_LIST_HEAD(&tr->devs);
393     list_add(&tr->list, &blktrans_majors);
394
395     for (i=0; i<MAX_MTD_DEVICES; i++) {
396         if (mtd_table[i] && mtd_table[i]->type != MTD_ABSENT)
397             tr->add_mtd(tr, mtd_table[i]);//alloc_disk for each mtd device
398     }
399
400     mutex_unlock(&mtd_table_mutex);
401
402     return 0;
403 }
Line 356 matters: an mtdblk node will appear under /dev/. Why is it called mtdblk? The second argument decides that. ^_^
Line 366, as mentioned above, sets up the read/write request queue.
Line 382 starts a kernel thread; every time a request is submitted, this thread gets run once. (Which raises a question: when the thread eventually terminates, how do the resources it holds get released?)
In lines 395~398, none of the nand devices are actually added; apparently at this point the goldfish_nand device driver has not been loaded yet. So the work of register_mtd_blktrans ends here: its only contribution is registering an mtd_blktrans_ops.
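One small detail worth spelling out is line 380 above: blkshift = ffs(blksize) - 1 turns the 512-byte block size into a shift count, so that later sector/offset translations can be done with plain shifts. A tiny user-space illustration (the sector number is made up):

#include <stdio.h>
#include <strings.h>   /* ffs() */

int main(void)
{
    unsigned int blksize  = 512;              /* mtdblock_tr.blksize */
    unsigned int blkshift = ffs(blksize) - 1; /* 512 -> 9, same trick as line 380 */

    /* a made-up sector number mapped to a byte offset inside the MTD partition */
    unsigned long sector = 1000;
    unsigned long offset = sector << blkshift;

    printf("blkshift=%u sector=%lu byte offset=%lu\n", blkshift, sector, offset);
    return 0;
}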
The MTD devices only get added once module_init(goldfish_nand_init) runs.
405 static int __init goldfish_nand_init(void)
406 {
407     return platform_driver_register(&goldfish_nand_driver);
408 }
After this, all devices on the bus are walked until one matches goldfish_nand. Some readers may wonder: why doesn't the device go looking for a driver when the device itself is registered? It does try, but when the driver's code has not been loaded yet, where exactly would it find the driver?
 58 static void goldfish_pdev_worker(struct work_struct *work)
 59 {
 60     int ret;
 61     struct pdev_bus_dev *pos, *n;
 62
 63     list_for_each_entry_safe(pos, n, &pdev_bus_removed_devices, list) {
 64         list_del(&pos->list);
 65         platform_device_unregister(&pos->pdev);
 66         kfree(pos);
 67     }
 68     list_for_each_entry_safe(pos, n, &pdev_bus_new_devices, list) {
 69         list_del(&pos->list);
 70         ret = platform_device_register(&pos->pdev);
 71         if(ret) {
 72             printk("goldfish_pdev_worker failed to register device, %s\n", pos->pdev.name);
 73         }      
 74         else {
 75             printk("goldfish_pdev_worker registered %s\n", pos->pdev.name);
 76         }
 77         list_add_tail(&pos->list, &pdev_bus_registered_devices);
 78     }
 79 }
Look at line 70: the device is indeed registered, and platform_device_register does go looking for a matching driver; it just fails to find one and retreats empty-handed. Then, every time a driver is registered, the kernel looks for matching devices, and once a match is found the driver's probe function is called. For goldfish_nand_driver the probe function is:
goldfish_nand_probe
315 static int goldfish_nand_probe(struct platform_device *pdev)
316 {
317     uint32_t num_dev;
318     int i;
319     int err;
320     uint32_t num_dev_working;
321     uint32_t version;
322     struct resource *r;
323     struct goldfish_nand *nand;
324     unsigned char __iomem  *base;
325
326     r = platform_get_resource(pdev, IORESOURCE_MEM, 0);
327     if(r == NULL) {
328         err = -ENODEV;
329         goto err_no_io_base;
330     }
331
332     base = ioremap(r->start, PAGE_SIZE);
333     if(base == NULL) {
334         err = -ENOMEM;
335         goto err_ioremap;
336     }
337     version = readl(base + NAND_VERSION);
338     if(version != NAND_VERSION_CURRENT) {
339         printk("goldfish_nand_init: version mismatch, got %d, expected %d\n",
340                version, NAND_VERSION_CURRENT);
341         err = -ENODEV;
342         goto err_no_dev;
343     }
344     num_dev = readl(base + NAND_NUM_DEV);
345     if(num_dev == 0) {
346         err = -ENODEV;
347         goto err_no_dev;
348     }
349
350     nand = kzalloc(sizeof(*nand) + sizeof(struct mtd_info) * num_dev, GFP_KERNEL);
351     if(nand == NULL) {
352         err = -ENOMEM;
353         goto err_nand_alloc_failed;
354     }
355     spin_lock_init(&nand->lock);
356     nand->base = base;
357     nand->mtd_count = num_dev;
358     platform_set_drvdata(pdev, nand);
359
360     num_dev_working = 0;
361     for(i = 0; i < num_dev; i++) {
362         err = goldfish_nand_init_device(nand, i);
363         if(err == 0)
364             num_dev_working++;
365     }
366     if(num_dev_working == 0) {
367         err = -ENODEV;
368         goto err_no_working_dev;
369     }
370     return 0;
371
372 err_no_working_dev:
373     kfree(nand);
374 err_nand_alloc_failed:
375 err_no_dev:
376     iounmap(base);
377 err_ioremap:
378 err_no_io_base:
379     return err;
380 }
Line 362 initializes each enumerated nand device through goldfish_nand_init_device.
goldfish_nand_probe->goldfish_nand_init_device
248 static int goldfish_nand_init_device(struct goldfish_nand *nand, int id)
249 {
250     uint32_t name_len;
251     uint32_t result;
252     uint32_t flags;
253     unsigned long irq_flags;
254     unsigned char __iomem  *base = nand->base;
255     struct mtd_info *mtd = &nand->mtd[id];
256     char *name;
257
258     spin_lock_irqsave(&nand->lock, irq_flags);
259     writel(id, base + NAND_DEV);
260     flags = readl(base + NAND_DEV_FLAGS);
261     name_len = readl(base + NAND_DEV_NAME_LEN);
262     mtd->writesize = readl(base + NAND_DEV_PAGE_SIZE);
263     mtd->size = readl(base + NAND_DEV_SIZE_LOW);
264     mtd->size |= (uint64_t)readl(base + NAND_DEV_SIZE_HIGH) << 32;
265     mtd->oobsize = readl(base + NAND_DEV_EXTRA_SIZE);
266     mtd->oobavail = mtd->oobsize;
267     mtd->erasesize = readl(base + NAND_DEV_ERASE_SIZE) /
268                      (mtd->writesize + mtd->oobsize) * mtd->writesize;
269     do_div(mtd->size, mtd->writesize + mtd->oobsize);
270     mtd->size *= mtd->writesize;
271     printk("goldfish nand dev%d: size %llx, page %d, extra %d, erase %d\n",
272            id, mtd->size, mtd->writesize, mtd->oobsize, mtd->erasesize);
273     spin_unlock_irqrestore(&nand->lock, irq_flags);
274
275     mtd->priv = nand;
276
277     mtd->name = name = kmalloc(name_len + 1, GFP_KERNEL);
278     if(name == NULL)
279         return -ENOMEM;
280
281     result = goldfish_nand_cmd(mtd, NAND_CMD_GET_DEV_NAME, 0, name_len, name);
282     if(result != name_len) {
283         kfree(mtd->name);
284         mtd->name = NULL;
285         printk("goldfish_nand_init_device failed to get dev name %d != %d\n",
286                result, name_len);
287         return -ENODEV;
288     }
289     ((char *) mtd->name)[name_len] = '\0';
290
291     /* Setup the MTD structure */
292     mtd->type = MTD_NANDFLASH;
293     mtd->flags = MTD_CAP_NANDFLASH;
294     if(flags & NAND_DEV_FLAG_READ_ONLY)
295         mtd->flags &= ~MTD_WRITEABLE;
296
297     mtd->owner = THIS_MODULE;
298     mtd->erase = goldfish_nand_erase;
299     mtd->read = goldfish_nand_read;
300     mtd->write = goldfish_nand_write;
301     mtd->read_oob = goldfish_nand_read_oob;
302     mtd->write_oob = goldfish_nand_write_oob;
303     mtd->block_isbad = goldfish_nand_block_isbad;
304     mtd->block_markbad = goldfish_nand_block_markbad;
305
306     if (add_mtd_device(mtd)) {
307         kfree(mtd->name);
308         mtd->name = NULL;
309         return -EIO;
310     }
311
312     return 0;
313 }
Line 306 calls add_mtd_device.
goldfish_nand_probe->goldfish_nand_init_device->add_mtd_device
 35 /**
 36  *  add_mtd_device – register an MTD device
 37  *  @mtd: pointer to new MTD device info structure
 38  *
 39  *  Add a device to the list of MTD devices present in the system, and
 40  *  notify each currently active MTD 'user' of its arrival. Returns
 41  *  zero on success or 1 on failure, which currently will only happen
 42  *  if the number of present devices exceeds MAX_MTD_DEVICES (i.e. 16)
 43  */
 44
 45 int add_mtd_device(struct mtd_info *mtd)
 46 {
 47     int i;
 48
 49     BUG_ON(mtd->writesize == 0);
 50     mutex_lock(&mtd_table_mutex);
 51
 52     for (i=0; i < MAX_MTD_DEVICES; i++)
 53         if (!mtd_table[i]) {
 54             struct mtd_notifier *not;
 55
 56             mtd_table[i] = mtd;
 57             mtd->index = i;
 58             mtd->usecount = 0;
 59
 60             if (is_power_of_2(mtd->erasesize))
 61                 mtd->erasesize_shift = ffs(mtd->erasesize) - 1;
 62             else
 63                 mtd->erasesize_shift = 0;
 64
 65             if (is_power_of_2(mtd->writesize))
 66                 mtd->writesize_shift = ffs(mtd->writesize) - 1;
 67             else
 68                 mtd->writesize_shift = 0;
 69
 70             mtd->erasesize_mask = (1 << mtd->erasesize_shift) - 1;
 71             mtd->writesize_mask = (1 << mtd->writesize_shift) - 1;
 72
 73             /* Some chips always power up locked. Unlock them now */
 74             if ((mtd->flags & MTD_WRITEABLE)
 75                 && (mtd->flags & MTD_POWERUP_LOCK) && mtd->unlock) {
 76                 if (mtd->unlock(mtd, 0, mtd->size))
 77                     printk(KERN_WARNING
 78                            "%s: unlock failed, "
 79                            "writes may not workn",
 80                            mtd->name);
 81             }
 82
 83             DEBUG(0, "mtd: Giving out device %d to %s\n",i, mtd->name);
 84             /* No need to get a refcount on the module containing
 85                the notifier, since we hold the mtd_table_mutex */
 86             list_for_each_entry(not, &mtd_notifiers, list)
 87             {
 88                 not->add(mtd);
 89             }  
 90            
 91             mutex_unlock(&mtd_table_mutex);
 92             /* We _know_ we aren't being removed, because
 93                our caller is still holding us here. So none
 94                of this try_ nonsense, and no bitching about it
 95                either. :) */
 96             __module_get(THIS_MODULE);
 97             return 0;
 98         }  
 99        
100     mutex_unlock(&mtd_table_mutex);
101     return 1;
102 }  
103
Note line 88: what it ends up calling is blktrans_notify_add. Why? During initialization, the init_mtdblock module called register_mtd_blktrans(&mtdblock_tr), and the mtdblock_tr structure defines the operations for mtdblock, such as add_mtd.
362 static struct mtd_blktrans_ops mtdblock_tr = {
363     .name       = "mtdblock",
364     .major      = 31,
365     .part_bits  = 0,
366     .blksize    = 512,
367     .open       = mtdblock_open,
368     .flush      = mtdblock_flush,
369     .release    = mtdblock_release,
370     .readsect   = mtdblock_readsect,
371     .writesect  = mtdblock_writesect,
372     .add_mtd    = mtdblock_add_mtd,
373     .remove_dev = mtdblock_remove_dev,
374     .owner      = THIS_MODULE,
375 };





During that registration it also set up blktrans_notifier, an mtd_notifier structure. Why make it this convoluted? Mostly for the sake of extensibility in Linux.
335 static struct mtd_notifier blktrans_notifier = {
336     .add = blktrans_notify_add,
337     .remove = blktrans_notify_remove,
338 };
324 static void blktrans_notify_add(struct mtd_info *mtd)
325 {
326     struct mtd_blktrans_ops *tr;
327
328     if (mtd->type == MTD_ABSENT)
329         return;
330
331     list_for_each_entry(tr, &blktrans_majors, list)
332         tr->add_mtd(tr, mtd);
333 }  
Back to the code above: the mtd device is added through not->add(mtd), and not->add in turn calls tr->add_mtd, which is the mtdblock_add_mtd defined in mtdblock_tr above.
blktrans_notify_add->mtdblock_add_mtd
337 static void mtdblock_add_mtd(struct mtd_blktrans_ops *tr, struct mtd_info *mtd)
338 {
339     struct mtd_blktrans_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL);
340
341     if (!dev)
342         return;
343
344     dev->mtd = mtd;
345     dev->devnum = mtd->index;
346
347     dev->size = mtd->size >> 9;
348     dev->tr = tr;
349
350     if (!(mtd->flags & MTD_WRITEABLE))
351         dev->readonly = 1;
352
353     add_mtd_blktrans_dev(dev);
354 }
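Stepping back, the add_mtd_device -> not->add(mtd) -> tr->add_mtd(...) chain we just walked is the classic notifier (observer) pattern: MTD keeps a list of interested parties, and every registered notifier is called back for each device that appears. Below is a stripped-down user-space sketch of that pattern, with all names invented for the illustration:

#include <stdio.h>

/* one "user" interested in hearing about new devices, like struct mtd_notifier */
struct notifier {
    void (*add)(const char *dev_name);
    struct notifier *next;
};

static struct notifier *notifier_list;           /* plays the role of the mtd_notifiers list */

static void register_user(struct notifier *n)    /* like register_mtd_user() */
{
    n->next = notifier_list;
    notifier_list = n;
}

static void add_device(const char *dev_name)     /* like add_mtd_device() */
{
    for (struct notifier *n = notifier_list; n; n = n->next)
        n->add(dev_name);                        /* not->add(mtd) */
}

static void blktrans_add(const char *dev_name)   /* like blktrans_notify_add -> tr->add_mtd */
{
    printf("mtdblock layer: creating a block translation for %s\n", dev_name);
}

int main(void)
{
    static struct notifier blktrans = { .add = blktrans_add };

    register_user(&blktrans);     /* done once by init_mtdblock/register_mtd_blktrans */
    add_device("goldfish-nand0"); /* later, the nand probe announces its devices */
    return 0;
}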





blktrans_notify_add->mtdblock_add_mtd->add_mtd_blktrans_dev





216 int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
217 {
218     struct mtd_blktrans_ops *tr = new->tr;
219     struct mtd_blktrans_dev *d;
220     int last_devnum = -1;
221     struct gendisk *gd;
222
223     if (mutex_trylock(&mtd_table_mutex)) {
224         mutex_unlock(&mtd_table_mutex);
225         BUG();
226     }
227
228     list_for_each_entry(d, &tr->devs, list) {
229         if (new->devnum == -1) {
230             /* Use first free number */
231             if (d->devnum != last_devnum+1) {
232                 /* Found a free devnum. Plug it in here */
233                 new->devnum = last_devnum+1;
234                 list_add_tail(&new->list, &d->list);
235                 goto added;
236             }
237         } else if (d->devnum == new->devnum) {
238             /* Required number taken */
239             return -EBUSY;
240         } else if (d->devnum > new->devnum) {
241             /* Required number was free */
242             list_add_tail(&new->list, &d->list);
243             goto added;
244         }
245         last_devnum = d->devnum;
246     }
247     if (new->devnum == -1)
248         new->devnum = last_devnum+1;
249
250     if ((new->devnum << tr->part_bits) > 256) {
251         return -EBUSY;
252     }
253
254     list_add_tail(&new->list, &tr->devs);
255  added:
256     mutex_init(&new->lock);
257     if (!tr->writesect)
258         new->readonly = 1;
259
260     gd = alloc_disk(1 << tr->part_bits);//ah, this is where alloc_disk happens
261     if (!gd) {
262         list_del(&new->list);
263         return -ENOMEM;
264     }
265     gd->major = tr->major;
266     gd->first_minor = (new->devnum) << tr->part_bits;
267     gd->fops = &mtd_blktrans_ops;
268
269     if (tr->part_bits)
270         if (new->devnum < 26)
271             snprintf(gd->disk_name, sizeof(gd->disk_name),
272                  "%s%c", tr->name, 'a' + new->devnum);
273         else
274             snprintf(gd->disk_name, sizeof(gd->disk_name),
275                  "%s%c%c", tr->name,
276                  'a' - 1 + new->devnum / 26,
277                  'a' + new->devnum % 26);
278     else
279         snprintf(gd->disk_name, sizeof(gd->disk_name),
280              "%s%d", tr->name, new->devnum);
281
282     /* 2.5 has capacity in units of 512 bytes while still
283        having BLOCK_SIZE_BITS set to 10. Just to keep us amused. */
284     set_capacity(gd, (new->size * tr->blksize) >> 9);
285
286     gd->private_data = new;
287     new->blkcore_priv = gd;
288     gd->queue = tr->blkcore_priv->rq;//the queue used is tr's queue
289
290     if (new->readonly)
291         set_disk_ro(gd, 1);
292
293     add_disk(gd);//add it to the system
294
295     return 0;
296 }
Line 288 uses tr's queue, the one set up when register_mtd_blktrans(&mtdblock_tr) was called at init time.
340 int register_mtd_blktrans(struct mtd_blktrans_ops *tr)
341 {
342     int ret, i;
343    
344     /* Register the notifier if/when the first device type is
345        registered, to prevent the link/init ordering from fucking
346        us over. */
347     if (!blktrans_notifier.list.next)
348         register_mtd_user(&blktrans_notifier);
349    
350     tr->blkcore_priv = kzalloc(sizeof(*tr->blkcore_priv), GFP_KERNEL);//this is pretty much a queue already
351     if (!tr->blkcore_priv)
352         return -ENOMEM;
353
354     mutex_lock(&mtd_table_mutex);
355
356     ret = register_blkdev(tr->major, tr->name);//"mtdblk": register a generic block device
357     if (ret) {
358         printk(KERN_WARNING "Unable to register %s block device on major %d: %d\n",
359                tr->name, tr->major, ret);
360         kfree(tr->blkcore_priv);
361         mutex_unlock(&mtd_table_mutex);
362         return ret;
363     }
364     spin_lock_init(&tr->blkcore_priv->queue_lock);
365     tr->blkcore_priv->rq = blk_init_queue(mtd_blktrans_request, &tr->blkcore_priv->queue_lock);





The tr queue is fairly generic; only its request_fn differs. Being set up this simply is a bit unexpected. What is even more surprising is that MTD reads and writes do not actually go through the request machinery. So what path do they take?
We know that in Linux a file read/write passes through several layers: VFS on top, then the concrete filesystem. The concrete filesystem decides whether the request machinery is involved. So let's go straight to yaffs2's file_operations and see whether a request is ever issued.
Since the VFS in Linux has a page cache, and the page cache hangs off the address_space structure whose host is the inode, file reads and writes ultimately go through the address_space operations.
This is yaffs2's address_space_operations structure; to see whether it ultimately issues requests, just look at readpage:
 270 static struct address_space_operations yaffs_file_address_operations = {                                                               
 271     .readpage = yaffs_readpage,   
 272     .writepage = yaffs_writepage,
 273 #if (YAFFS_USE_WRITE_BEGIN_END > 0)
 274     .write_begin = yaffs_write_begin,
 275     .write_end = yaffs_write_end,
 276 #else
 277     .prepare_write = yaffs_prepare_write,
 278     .commit_write = yaffs_commit_write,                                                                                                
 279 #endif
 280 };
Since yaffs_read is mostly internal yaffs2 flow, I won't go into it. What is distinctive is that yaffs2 does not use the traditional page-cache concept. The yaffs_device structure has a member "yaffs_ChunkCache *srCache;" whose type is:
111 /* Special sequence number for bad block that failed to be marked bad */
112 #define YAFFS_SEQUENCE_BAD_BLOCK    0xFFFF0000
113   
114 /* ChunkCache is used for short read/write operations.*/
115 typedef struct {         
116     struct yaffs_ObjectStruct *object;
117     int chunkId;
118     int lastUse;
119     int dirty;
120     int nBytes;     /* Only valid if the cache is dirty */
121     int locked;     /* Can't push out or flush while locked. */
122 #ifdef CONFIG_YAFFS_YAFFS2
123     __u8 *data;
124 #else
125     __u8 data[YAFFS_BYTES_PER_CHUNK];
126 #endif
127 } yaffs_ChunkCache;  
Each time it checks whether a matching entry exists in this cache, the lookup function looks like this:
4015 /* Find a cached chunk */
4016 static yaffs_ChunkCache *yaffs_FindChunkCache(const yaffs_Object *obj,
4017                           int chunkId)
4018 { 
4019     yaffs_Device *dev = obj->myDev;
4020     int i;
4021     if (dev->nShortOpCaches > 0) {
4022         for (i = 0; i < dev->nShortOpCaches; i++) {
4023             if (dev->srCache[i].object == obj &&
4024                 dev->srCache[i].chunkId == chunkId) {
4025                 dev->cacheHits++;
4026   
4027                 return &dev->srCache[i];
4028             }
4029         }
4030     }
4031     return NULL;
4032 }
From this lookup you can see that, unlike ext2, it does not organize its cache with a hash or a radix tree; the cache is organized the same way the nand device itself is. Perhaps that is one reason yaffs2 is so portable: it is not tied to Linux, nor to whether MTD is available; it can be ported into any kind of OS.
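To underline the portability point, the lookup above can be rewritten as plain C with no kernel dependencies at all. This sketch only mirrors the shape of yaffs_FindChunkCache (a linear scan over a small fixed array keyed by (object, chunkId)), with the surrounding types simplified:

#include <stdio.h>

struct chunk_cache {                 /* portable analogue of yaffs_ChunkCache */
    const void *object;              /* which file object the chunk belongs to */
    int chunk_id;
    unsigned char data[512];
};

/* linear scan over a small fixed array, same shape as yaffs_FindChunkCache() */
static struct chunk_cache *find_chunk(struct chunk_cache *cache, int n,
                                      const void *obj, int chunk_id)
{
    for (int i = 0; i < n; i++)
        if (cache[i].object == obj && cache[i].chunk_id == chunk_id)
            return &cache[i];
    return NULL;
}

int main(void)
{
    static struct chunk_cache cache[8];
    int file;                                     /* any address works as an object key */

    cache[3].object = &file;
    cache[3].chunk_id = 42;

    struct chunk_cache *hit = find_chunk(cache, 8, &file, 42);
    printf("hit at slot %ld\n", hit ? (long)(hit - cache) : -1L);
    return 0;
}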





So, coming back to the earlier question: the gendisk and request_queue allocated before turn out to be largely wasted effort here. Reading and writing a Nand device is nowhere near as involved as for other block devices; that request_fn set up earlier never even gets used.









October 7th, 2011





SD card read/write flow








This flow analysis targets the 2.6.29 kernel on the Goldfish platform.
Reading and writing an SD card, like any other block device, is an asynchronous process. After a process puts a request on the block device's request queue, the mq->thread process is woken up to do the actual read/write. Strictly speaking this is a kernel thread, and its body is the following:
44 static int mmc_queue_thread(void *d)
45 {
46     struct mmc_queue *mq = d;
47     struct request_queue *q = mq->queue;
48
49     current->flags |= PF_MEMALLOC;
50
51     down(&mq->thread_sem);
52     do {
53         struct request *req = NULL;
54
55         spin_lock_irq(q->queue_lock);
56         set_current_state(TASK_INTERRUPTIBLE);
57         if (!blk_queue_plugged(q))
58             req = elv_next_request(q);
59         mq->req = req;
60         spin_unlock_irq(q->queue_lock);
61
62         if (!req) {
63             if (kthread_should_stop()) {
64                 set_current_state(TASK_RUNNING);
65                 break;
66             }
67             up(&mq->thread_sem);
68             schedule();
69             down(&mq->thread_sem);
70             continue;
71         }
72         set_current_state(TASK_RUNNING);
73
74         mq->issue_fn(mq, req);
75     } while (1);
76     up(&mq->thread_sem);
77
78     return 0;
79 }
Lines 51 and 76 ensure that only one thread operates on mq at a time.
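The down(&mq->thread_sem)/up(&mq->thread_sem) pair above is a plain semaphore exclusion. The same guarantee can be sketched in user space with POSIX semaphores; the worker function here is invented for the example:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t thread_sem;                 /* plays the role of mq->thread_sem */

static void *worker(void *arg)
{
    long id = (long)arg;
    sem_wait(&thread_sem);               /* down(): only one worker past this point */
    printf("worker %ld owns the queue\n", id);
    /* ... handle requests ... */
    sem_post(&thread_sem);               /* up(): let the next worker in */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    sem_init(&thread_sem, 0, 1);         /* initial count 1, like init_MUTEX() */
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&thread_sem);
    return 0;
}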
Next, line 74 calls mq->issue_fn, i.e. mmc_blk_issue_rq.
264 static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
265 {
266     struct mmc_blk_data *md = mq->data;
267     struct mmc_card *card = md->queue.card;
268     struct mmc_blk_request brq;
269     int ret = 1, disable_multi = 0;
270
271 #ifdef CONFIG_MMC_BLOCK_DEFERRED_RESUME
272     if (mmc_bus_needs_resume(card->host)) {
273         mmc_resume_bus(card->host);
274         mmc_blk_set_blksize(md, card);
275     }
276 #endif
277
278     mmc_claim_host(card->host);
279
280     do {
281         struct mmc_command cmd;
282         u32 readcmd, writecmd, status = 0;
283
284         memset(&brq, 0, sizeof(struct mmc_blk_request));
285         brq.mrq.cmd = &brq.cmd;
286         brq.mrq.data = &brq.data;
287
288         brq.cmd.arg = req->sector;
289         if (!mmc_card_blockaddr(card))
290             brq.cmd.arg <<= 9;
291         brq.cmd.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_ADTC;
292         brq.data.blksz = 512;
293         brq.stop.opcode = MMC_STOP_TRANSMISSION;
294         brq.stop.arg = 0;
295         brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
296         brq.data.blocks = req->nr_sectors;
297
298         /*
299          * The block layer doesn't support all sector count
300          * restrictions, so we need to be prepared for too big
301          * requests.
302          */
303         if (brq.data.blocks > card->host->max_blk_count)
304             brq.data.blocks = card->host->max_blk_count;
305
306         /*
307          * After a read error, we redo the request one sector at a time
308          * in order to accurately determine which sectors can be read
309          * successfully.
310          */
311         if (disable_multi && brq.data.blocks > 1)
312             brq.data.blocks = 1;
313
314         if (brq.data.blocks > 1) {
315             /* SPI multiblock writes terminate using a special
316              * token, not a STOP_TRANSMISSION request.
317              */
318             if (!mmc_host_is_spi(card->host)
319                     || rq_data_dir(req) == READ)
320                 brq.mrq.stop = &brq.stop;
321             readcmd = MMC_READ_MULTIPLE_BLOCK;
322             writecmd = MMC_WRITE_MULTIPLE_BLOCK;
323         } else {
324             brq.mrq.stop = NULL;
325             readcmd = MMC_READ_SINGLE_BLOCK;
326             writecmd = MMC_WRITE_BLOCK;
327         }
328
329         if (rq_data_dir(req) == READ) {
330             brq.cmd.opcode = readcmd;
331             brq.data.flags |= MMC_DATA_READ;
332         } else {
333             brq.cmd.opcode = writecmd;
334             brq.data.flags |= MMC_DATA_WRITE;
335         }
336
337         mmc_set_data_timeout(&brq.data, card);
338
339         brq.data.sg = mq->sg;
340         brq.data.sg_len = mmc_queue_map_sg(mq);
341
342         /*
343          * Adjust the sg list so it is the same size as the
344          * request.
345          */
346         if (brq.data.blocks != req->nr_sectors) {
347             int i, data_size = brq.data.blocks << 9;
348             struct scatterlist *sg;
349
350             for_each_sg(brq.data.sg, sg, brq.data.sg_len, i) {
351                 data_size -= sg->length;
352                 if (data_size <= 0) {
353                     sg->length += data_size;
354                     i++;
355                     break;
356                 }
357             }
358             brq.data.sg_len = i;
359         }
360
361         mmc_queue_bounce_pre(mq);
362
363         mmc_wait_for_req(card->host, &brq.mrq);
364
365         mmc_queue_bounce_post(mq);
366
367         /*
368          * Check for errors here, but don't jump to cmd_err
369          * until later as we need to wait for the card to leave
370          * programming mode even when things go wrong.
371          */
372         if (brq.cmd.error || brq.data.error || brq.stop.error) {
373             if (brq.data.blocks > 1 && rq_data_dir(req) == READ) {
374                 /* Redo read one sector at a time */
375                 printk(KERN_WARNING "%s: retrying using single "
376                        "block read\n", req->rq_disk->disk_name);
377                 disable_multi = 1;
378                 continue;
379             }
380             status = get_card_status(card, req);
381         } else if (disable_multi == 1) {
382             disable_multi = 0;
383         }
384
385         if (brq.cmd.error) {
386             printk(KERN_ERR "%s: error %d sending read/write "
387                    "command, response %#x, card status %#x\n",
388                    req->rq_disk->disk_name, brq.cmd.error,
389                    brq.cmd.resp[0], status);
390         }
391
392         if (brq.data.error) {
393             if (brq.data.error == -ETIMEDOUT && brq.mrq.stop)
394                 /* 'Stop' response contains card status */
395                 status = brq.mrq.stop->resp[0];
396             printk(KERN_ERR "%s: error %d transferring data,"
397                    " sector %u, nr %u, card status %#x\n",
398                    req->rq_disk->disk_name, brq.data.error,
399                    (unsigned)req->sector,
400                    (unsigned)req->nr_sectors, status);
401         }
402
403         if (brq.stop.error) {
404             printk(KERN_ERR "%s: error %d sending stop command, "
405                    "response %#x, card status %#x\n",
406                    req->rq_disk->disk_name, brq.stop.error,
407                    brq.stop.resp[0], status);
408         }
409
410         if (!mmc_host_is_spi(card->host) && rq_data_dir(req) != READ) {
411             do {
412                 int err;
413
414                 cmd.opcode = MMC_SEND_STATUS;
415                 cmd.arg = card->rca << 16;
416                 cmd.flags = MMC_RSP_R1 | MMC_CMD_AC;
417                 err = mmc_wait_for_cmd(card->host, &cmd, 5);
418                 if (err) {
419                     printk(KERN_ERR "%s: error %d requesting status\n",
420                            req->rq_disk->disk_name, err);
421                     goto cmd_err;
422                 }
423                 /*
424                  * Some cards mishandle the status bits,
425                  * so make sure to check both the busy
426                  * indication and the card state.
427                  */
428             } while (!(cmd.resp[0] & R1_READY_FOR_DATA) ||
429                 (R1_CURRENT_STATE(cmd.resp[0]) == 7));
430
431 #if 0
432             if (cmd.resp[0] & ~0x00000900)
433                 printk(KERN_ERR "%s: status = %08x\n",
434                        req->rq_disk->disk_name, cmd.resp[0]);
435             if (mmc_decode_status(cmd.resp))
436                 goto cmd_err;
437 #endif
438         }
439
440         if (brq.cmd.error || brq.stop.error || brq.data.error) {
441             if (rq_data_dir(req) == READ) {
442                 /*
443                  * After an error, we redo I/O one sector at a
444                  * time, so we only reach here after trying to
445                  * read a single sector.
446                  */
447                 spin_lock_irq(&md->lock);
448                 ret = __blk_end_request(req, -EIO, brq.data.blksz);
449                 spin_unlock_irq(&md->lock);
450                 continue;
451             }
452             goto cmd_err;
453         }
454
455         /*
456          * A block was successfully transferred.
457          */
458         spin_lock_irq(&md->lock);
459         ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
460         spin_unlock_irq(&md->lock);
461     } while (ret);
462
463     mmc_release_host(card->host);
464
465     return 1;
466
467  cmd_err:
468     /*
469      * If this is an SD card and we're writing, we can first
470      * mark the known good sectors as ok.
471      *
472      * If the card is not SD, we can still ok written sectors
473      * as reported by the controller (which might be less than
474      * the real number of written sectors, but never more).
475      */
476     if (mmc_card_sd(card)) {
477         u32 blocks;
478
479         blocks = mmc_sd_num_wr_blocks(card);
480         if (blocks != (u32)-1) {
481             spin_lock_irq(&md->lock);
482             ret = __blk_end_request(req, 0, blocks << 9);
483             spin_unlock_irq(&md->lock);
484         }
485     } else {
486         spin_lock_irq(&md->lock);
487         ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
488         spin_unlock_irq(&md->lock);
489     }
490
491     mmc_release_host(card->host);
492
493     spin_lock_irq(&md->lock);
494     while (ret)
495         ret = __blk_end_request(req, -EIO, blk_rq_cur_bytes(req));
496     spin_unlock_irq(&md->lock);
497
498     return 0;
499 }
Lines 278 and 491 guarantee that we hold card->host exclusively while the request is being processed.
Lines 280~360 build a new block request (brq) from the current request, and line 363 performs the actual read/write.
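For orientation, the shape of mmc_blk_issue_rq boils down to roughly the following skeleton (a paraphrase of the code above, not the literal source):

	mmc_claim_host(card->host);                 /* "line 278": take exclusive use of the controller */
	do {
		/* lines 280~360: translate the block-layer request into an mmc_blk_request (brq) */
		mmc_wait_for_req(card->host, &brq.mrq); /* line 363: perform the transfer, blocking */
		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);  /* line 459: complete what was transferred */
	} while (ret);
	mmc_release_host(card->host);               /* lines 463/491: give the controller back */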
186 /**
187  *  mmc_wait_for_req - start a request and wait for completion
188  *  @host: MMC host to start command
189  *  @mrq: MMC request to start
190  *
191  *  Start a new MMC custom command request for a host, and wait
192  *  for the command to complete. Does not attempt to parse the
193  *  response.
194  */
195 void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq)
196 {
197     DECLARE_COMPLETION_ONSTACK(complete);
198
199     mrq->done_data = &complete;
200     mrq->done = mmc_wait_done;
201
202     mmc_start_request(host, mrq); // may take a long time
203
204     wait_for_completion(&complete); // wait until the data transfer has completed; the completion is signalled from interrupt context
205 }
Lines 186~205 are the real read/write. The completion guarantees that this function returns only after the data transfer has finished; otherwise it keeps waiting (the wait happens at line 204).
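Stripped of the MMC details, this is just the standard kernel completion pattern. A minimal sketch (start_async_transfer() is a hypothetical stand-in for whatever kicks off the hardware and later calls complete() from its interrupt handler):

	struct completion done;

	init_completion(&done);
	start_async_transfer(&done);   /* hypothetical: arranges for complete(&done) to run from IRQ context */
	wait_for_completion(&done);    /* sleeps (uninterruptibly) until complete(&done) has been called */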
123 static void
124 mmc_start_request(struct mmc_host *host, struct mmc_request *mrq)
125 {
126 #ifdef CONFIG_MMC_DEBUG
127     unsigned int i, sz;
128     struct scatterlist *sg;
129 #endif
130
131     pr_debug("%s: starting CMD%u arg %08x flags %08x\n",
132          mmc_hostname(host), mrq->cmd->opcode,
133          mrq->cmd->arg, mrq->cmd->flags);
134
135     if (mrq->data) {
136         pr_debug("%s:     blksz %d blocks %d flags %08x "
137             "tsac %d ms nsac %d\n",
138             mmc_hostname(host), mrq->data->blksz,
139             mrq->data->blocks, mrq->data->flags,
140             mrq->data->timeout_ns / 1000000,
141             mrq->data->timeout_clks);
142     }
143
144     if (mrq->stop) {
145         pr_debug("%s:     CMD%u arg %08x flags %08x\n",
146              mmc_hostname(host), mrq->stop->opcode,
147              mrq->stop->arg, mrq->stop->flags);
148     }
149
150     WARN_ON(!host->claimed);
151
152     led_trigger_event(host->led, LED_FULL);
153
154     mrq->cmd->error = 0;
155     mrq->cmd->mrq = mrq;
156     if (mrq->data) {
157         BUG_ON(mrq->data->blksz > host->max_blk_size);
158         BUG_ON(mrq->data->blocks > host->max_blk_count);
159         BUG_ON(mrq->data->blocks * mrq->data->blksz >
160             host->max_req_size);
161
162 #ifdef CONFIG_MMC_DEBUG
163         sz = 0;
164         for_each_sg(mrq->data->sg, sg, mrq->data->sg_len, i)
165             sz += sg->length;
166         BUG_ON(sz != mrq->data->blocks * mrq->data->blksz);
167 #endif
168
169         mrq->cmd->data = mrq->data;
170         mrq->data->error = 0;
171         mrq->data->mrq = mrq;
172         if (mrq->stop) {
173             mrq->data->stop = mrq->stop;
174             mrq->stop->error = 0;
175             mrq->stop->mrq = mrq;
176         }
177     }
178     host->ops->request(host, mrq); // on the Goldfish platform this calls goldfish_mmc_request
179 }
As the comment at line 178 says, on the Goldfish platform host->ops->request() is goldfish_mmc_request(), which performs the real read/write.
400 static void goldfish_mmc_request(struct mmc_host *mmc, struct mmc_request *req)
401 {
402     struct goldfish_mmc_host *host = mmc_priv(mmc);
403
404     WARN_ON(host->mrq != NULL);
405
406     host->mrq = req;
407     goldfish_mmc_prepare_data(host, req); // data registers/buffers are set up and made ready
408     goldfish_mmc_start_command(host, req->cmd); // the command is issued and the data transfer starts
409
410     /* this is to avoid accidentally being detected as an SDIO card in mmc_attach_sdio() */
411     if (req->cmd->opcode == SD_IO_SEND_OP_COND &&
412         req->cmd->flags == (MMC_RSP_SPI_R4 | MMC_RSP_R4 | MMC_CMD_BCR)) {
413         req->cmd->error = -EINVAL;
414     }
415 }
Line 407 prepares the data, and line 408 calls goldfish_mmc_start_command() to actually issue the command:
156 static void
157 goldfish_mmc_start_command(struct goldfish_mmc_host *host, struct mmc_command *cmd)
158 {
159     u32 cmdreg;
160     u32 resptype;
161     u32 cmdtype;
162
163     host->cmd = cmd;
164
165     resptype = 0;
166     cmdtype = 0;
167
168     /* Our hardware needs to know exact type */
169     switch (mmc_resp_type(cmd)) {
170     case MMC_RSP_NONE:
171         break;
172     case MMC_RSP_R1:
173     case MMC_RSP_R1B:
174         /* resp 1, 1b, 6, 7 */
175         resptype = 1;
176         break;
177     case MMC_RSP_R2:
178         resptype = 2;
179         break;
180     case MMC_RSP_R3:
181         resptype = 3;
182         break;
183     default:
184         dev_err(mmc_dev(host->mmc), "Invalid response type: %04x\n", mmc_resp_type(cmd));
185         break;
186     }
187
188     if (mmc_cmd_type(cmd) == MMC_CMD_ADTC) {
189         cmdtype = OMAP_MMC_CMDTYPE_ADTC;
190     } else if (mmc_cmd_type(cmd) == MMC_CMD_BC) {
191         cmdtype = OMAP_MMC_CMDTYPE_BC;
192     } else if (mmc_cmd_type(cmd) == MMC_CMD_BCR) {
193         cmdtype = OMAP_MMC_CMDTYPE_BCR;
194     } else {
195         cmdtype = OMAP_MMC_CMDTYPE_AC;
196     }
197
198     cmdreg = cmd->opcode | (resptype <<  8) | (cmdtype << 12);
199
200     if (host->bus_mode == MMC_BUSMODE_OPENDRAIN)
201         cmdreg |= 1 << 6;
202
203     if (cmd->flags & MMC_RSP_BUSY)
204         cmdreg |= 1 << 11;
205
206     if (host->data && !(host->data->flags & MMC_DATA_WRITE))
207         cmdreg |= 1 << 15;
208
209     GOLDFISH_MMC_WRITE(host, MMC_ARG, cmd->arg);
210     GOLDFISH_MMC_WRITE(host, MMC_CMD, cmdreg);
211 }
This step may take a while to finish.
How do we know when the data transfer is done? Through an interrupt. When the transfer completes, the host raises an interrupt, and the interrupt path eventually invokes the mmc_wait_done callback installed at line 200. The interrupt handler looks like this:
291 static irqreturn_t goldfish_mmc_irq(int irq, void *dev_id)
292 {
293     struct goldfish_mmc_host * host = (struct goldfish_mmc_host *)dev_id;
294     u16 status;
295     int end_command = 0;
296     int end_transfer = 0;
297     int transfer_error = 0;
298     int state_changed = 0;
299     int cmd_timeout = 0;
300
301     while ((status = GOLDFISH_MMC_READ(host, MMC_INT_STATUS)) != 0) {
302         GOLDFISH_MMC_WRITE(host, MMC_INT_STATUS, status);
303
304         if (status & MMC_STAT_END_OF_CMD) {
305             end_command = 1;
306         }
307
308         if (status & MMC_STAT_END_OF_DATA) {
309             end_transfer = 1;
310         }
311         if (status & MMC_STAT_STATE_CHANGE) {
312             state_changed = 1;
313         }
314
315         if (status & MMC_STAT_CMD_TIMEOUT) {
316             end_command = 0;
317             cmd_timeout = 1;
318         }
319     }
320
321     if (cmd_timeout) {
322         struct mmc_request *mrq = host->mrq;
323         mrq->cmd->error = -ETIMEDOUT;
324         host->mrq = NULL;
325         mmc_request_done(host->mmc, mrq);
326     }
327
328     if (end_command) {
329         goldfish_mmc_cmd_done(host, host->cmd);
330     }
331     if (transfer_error)
332         goldfish_mmc_xfer_done(host, host->data);
333     else if (end_transfer) {
334         host->dma_done = 1;
335         goldfish_mmc_end_of_data(host, host->data);
336     }
337     if (state_changed) {
338         u32 state = GOLDFISH_MMC_READ(host, MMC_STATE);
339         pr_info("%s: Card detect now %d\n", __func__,
340             (state & MMC_STATE_INSERTED));
341         mmc_detect_change(host->mmc, 0);
342     }
343
344     if (!end_command && !end_transfer &&
345         !transfer_error && !state_changed && !cmd_timeout) {
346         status = GOLDFISH_MMC_READ(host, MMC_INT_STATUS);
347         dev_info(mmc_dev(host->mmc), "spurious irq 0x%04x\n", status);
348         if (status != 0) {
349             GOLDFISH_MMC_WRITE(host, MMC_INT_STATUS, status);
350             GOLDFISH_MMC_WRITE(host, MMC_INT_ENABLE, 0);
351         }
352     }
353
354     return IRQ_HANDLED;
355 }
At lines 333~336, once the data transfer has finished, line 335 calls goldfish_mmc_end_of_data(). Note that host->dma_done has been set to 1; the code below relies on it.
252 static void
253 goldfish_mmc_end_of_data(struct goldfish_mmc_host *host, struct mmc_data *data)
254 {
255     if (!host->dma_in_use) {
256         goldfish_mmc_xfer_done(host, data);
257         return;
258     }
259     if (host->dma_done)
260         goldfish_mmc_xfer_done(host, data);
261 }
Since host->dma_done was set to 1 earlier, lines 259~260 execute, i.e. goldfish_mmc_xfer_done() is called:
213 static void
214 goldfish_mmc_xfer_done(struct goldfish_mmc_host *host, struct mmc_data *data)
215 {
216     if (host->dma_in_use) {
217         enum dma_data_direction dma_data_dir;
218
219         if (data->flags & MMC_DATA_WRITE)
220             dma_data_dir = DMA_TO_DEVICE;
221         else
222             dma_data_dir = DMA_FROM_DEVICE;
223
224         if (dma_data_dir == DMA_FROM_DEVICE) {
225             // we don't really have DMA, so we need to copy from our platform driver buffer
226             uint8_t* dest = (uint8_t *)sg_virt(data->sg);
227             memcpy(dest, host->virt_base, data->sg->length);
228         }
229
230         host->data->bytes_xfered += data->sg->length;
231
232         dma_unmap_sg(mmc_dev(host->mmc), data->sg, host->sg_len, dma_data_dir);
233     }
234
235     host->data = NULL;
236     host->sg_len = 0;
237
238     /* NOTE:  MMC layer will sometimes poll-wait CMD13 next, issuing
239      * dozens of requests until the card finishes writing data.
240      * It'd be cheaper to just wait till an EOFB interrupt arrives…
241      */
242
243     if (!data->stop) {
244         host->mrq = NULL;
245         mmc_request_done(host->mmc, data->mrq);
246         return;
247     }
248
249     goldfish_mmc_start_command(host, data->stop);
250 }
Line 245 calls mmc_request_done():
69 /**
70  *  mmc_request_done - finish processing an MMC request
71  *  @host: MMC host which completed request
72  *  @mrq: MMC request which request
73  *
74  *  MMC drivers should call this function when they have completed
75  *  their processing of a request.
76  */
77 void mmc_request_done(struct mmc_host *host, struct mmc_request *mrq)
78 {
79     struct mmc_command *cmd = mrq->cmd;
80     int err = cmd->error;
81
82     if (err && cmd->retries && mmc_host_is_spi(host)) {
83         if (cmd->resp[0] & R1_SPI_ILLEGAL_COMMAND)
84             cmd->retries = 0;
85     }
86
87     if (err && cmd->retries) {
88         pr_debug("%s: req failed (CMD%u): %d, retrying...\n",
89             mmc_hostname(host), cmd->opcode, err);
90
91         cmd->retries--;
92         cmd->error = 0;
93         host->ops->request(host, mrq);
94     } else {
95         led_trigger_event(host->led, LED_OFF);
96
97         pr_debug("%s: req done (CMD%u): %d: %08x %08x %08x %08x\n",
98             mmc_hostname(host), cmd->opcode, err,
99             cmd->resp[0], cmd->resp[1],
100             cmd->resp[2], cmd->resp[3]);
101
102         if (mrq->data) {
103             pr_debug("%s:     %d bytes transferred: %d\n",
104                 mmc_hostname(host),
105                 mrq->data->bytes_xfered, mrq->data->error);
106         }
107
108         if (mrq->stop) {
109             pr_debug("%s:     (CMD%u): %d: %08x %08x %08x %08x\n",
110                 mmc_hostname(host), mrq->stop->opcode,
111                 mrq->stop->error,
112                 mrq->stop->resp[0], mrq->stop->resp[1],
113                 mrq->stop->resp[2], mrq->stop->resp[3]);
114         }
115
116         if (mrq->done)
117             mrq->done(mrq);
118     }
119 }
Finally, line 117 invokes mrq->done, which is mmc_wait_done:
181 static void mmc_wait_done(struct mmc_request *mrq)
182 {
183     complete(mrq->done_data);
184 }
The mrq->done_data used at line 183 was set to &complete earlier (see mmc_wait_for_req), so this wakes the waiting process.
4824 /**
4825  * complete: - signals a single thread waiting on this completion
4826  * @x:  holds the state of this particular completion
4827  *
4828  * This will wake up a single thread waiting on this completion. Threads will be
4829  * awakened in the same order in which they were queued.
4830  *
4831  * See also complete_all(), wait_for_completion() and related routines.
4832  */
4833 void complete(struct completion *x)
4834 {
4835     unsigned long flags;
4836
4837     spin_lock_irqsave(&x->wait.lock, flags);
4838     x->done++;
4839     __wake_up_common(&x->wait, TASK_NORMAL, 1, 0, NULL);
4840     spin_unlock_irqrestore(&x->wait.lock, flags);
4841 }
4842 EXPORT_SYMBOL(complete);
Look at line 4838: after done++, __wake_up_common() wakes the process sleeping on this completion, and that process can then return from wait_for_completion(&complete). The code of wait_for_completion is as follows:
4898 /**
4899  * wait_for_completion: - waits for completion of a task
4900  * @x:  holds the state of this particular completion
4901  *
4902  * This waits to be signaled for completion of a specific task. It is NOT
4903  * interruptible and there is no timeout.
4904  *
4905  * See also similar routines (i.e. wait_for_completion_timeout()) with timeout
4906  * and interrupt capability. Also see complete().
4907  */
4908 void __sched wait_for_completion(struct completion *x)
4909 {
4910     wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_UNINTERRUPTIBLE);
4911 }
4912 EXPORT_SYMBOL(wait_for_completion);
4887 static long __sched
4888 wait_for_common(struct completion *x, long timeout, int state)
4889 {
4890     might_sleep();
4891
4892     spin_lock_irq(&x->wait.lock);
4893     timeout = do_wait_for_common(x, timeout, state);
4894     spin_unlock_irq(&x->wait.lock);
4895     return timeout;
4896 }
4861 static inline long __sched
4862 do_wait_for_common(struct completion *x, long timeout, int state)
4863 {
4864     if (!x->done) {
4865         DECLARE_WAITQUEUE(wait, current);
4866
4867         wait.flags |= WQ_FLAG_EXCLUSIVE;
4868         __add_wait_queue_tail(&x->wait, &wait);
4869         do {
4870             if (signal_pending_state(state, current)) {
4871                 timeout = -ERESTARTSYS;
4872                 break;
4873             }
4874             __set_current_state(state);
4875             spin_unlock_irq(&x->wait.lock);
4876             timeout = schedule_timeout(timeout);
4877             spin_lock_irq(&x->wait.lock);
4878         } while (!x->done && timeout);
4879         __remove_wait_queue(&x->wait, &wait);
4880         if (!x->done)
4881             return timeout;
4882     }
4883     x->done--;
4884     return timeout ?: 1;
4885 }
Pay particular attention to line 4878: the waiter loops in schedule_timeout() until x->done becomes non-zero.
With that, the whole path of an MMC read/write request has essentially been traced.
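Putting it all together, the call chain traced above is:

mmc_blk_issue_rq -> mmc_wait_for_req -> mmc_start_request -> host->ops->request (goldfish_mmc_request)
    -> goldfish_mmc_prepare_data / goldfish_mmc_start_command
    ... the process sleeps in wait_for_completion() while the hardware does the transfer ...
goldfish_mmc_irq -> goldfish_mmc_end_of_data -> goldfish_mmc_xfer_done -> mmc_request_done
    -> mrq->done (mmc_wait_done) -> complete -> the sleeping process wakes up and mmc_blk_issue_rq ends the block request.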














File read/write: from request to I/O start




Original article; please credit the source when reposting. Reposted from: Li Haifeng's Blog
Permalink: File read/write: from request to I/O start




In Linux, reading and writing a block device is a fairly involved process, and adding the VFS on top introduces even more layers. In practice, though, the VFS is not tightly coupled to the block device drivers: when a read/write request is issued from the VFS, it ends up calling ll_rw_block() or submit_bh(), where ll_rw_block() is just a wrapper around submit_bh(). This function is the mandatory gateway from the VFS to the actual device I/O, which is why many SystemTap scripts that observe I/O simply install a probe on submit_bh() (for articles on I/O tracing, see the blog of Taobao's kernel expert 诸霸).
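As a quick illustration of how a caller reaches this gateway, reading a single block with the 2.4-era buffer-head API usually looks like the sketch below (not taken from any particular filesystem):

	if (!buffer_uptodate(bh)) {
		ll_rw_block(READ, 1, &bh);   /* thin wrapper around submit_bh() */
		wait_on_buffer(bh);          /* sleep until the I/O completes (see code fragment 3) */
	}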





Here we will not discuss the VFS any further; instead we start from submit_bh() and follow the flow until the data has been read in and the process resumes execution.
submit_bh() does exactly what its name says: it submits a buffer head. Submits it to whom, and who does the submitting? Block device I/O looks a bit like a client/server setup: the clients are the processes performing I/O, and the server is the device's request queue. A process packs its request information into a request structure and hangs it on the server side, i.e. the block device's request queue. I say it only looks like client/server, because a real C/S architecture comes from network programming: a client sends data to a listening server, and the server process peels the data out of the network buffers layer by layer through the protocol stack. File I/O is not truly C/S, because both the "client" submission and the "server" handling of the request are carried out by the same process, the user process itself. This is easy to understand: in the OS, everything except interrupt and exception handling runs in some process context, so everything from submitting a read/write request to accepting it is naturally the process's own business. Of course, if you insist that the kernel could perfectly well dedicate a thread to the "server" side, I have no counter-argument; in principle that works, and it reminds me of the microkernel MINIX. But let's stop here and return to the processing flow.
While a user process submits a request and hangs it on the block device's queue, I/O scheduling also comes into play: the request passes through the scheduling (elevator) algorithm, which does something very simple; it checks whether this request can be merged with a request already waiting in the queue. If it cannot be merged, the request is simply appended. At that point the process's part of the work is basically done. But when is the request actually serviced? When will it be satisfied?
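A rough sketch of that merge test (greatly simplified; the real 2.4 elevator also checks segment limits, request size and starvation counters):

	/* Simplified: can the new buffer head be appended to the tail of a queued request? */
	static int can_back_merge(struct request *req, struct buffer_head *bh, int rw)
	{
		return req->cmd == rw &&                                /* same direction: READ or WRITE */
		       req->rq_dev == bh->b_rdev &&                     /* same device */
		       req->sector + req->nr_sectors == bh->b_rsector;  /* new I/O starts right where the request ends */
	}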
We know that having a hardware device, especially a block device, read data and then having the kernel pull it out of I/O ports, or fetch it straight from the disk via DMA, takes a long time; long relative to the CPU, even though the absolute time is short. Normally, when a user process hangs a request on a block device's request queue and finds the queue empty, it inserts that queue into a global queue (tq_disk, which as the name suggests is the "task queue for disk"). If the queue is not empty, it has already been added to tq_disk, and since requests are already waiting on that device, the scheduling policy is used to see whether the new request can be merged with one of them. A fitting analogy: when you order food at a restaurant and your dish is the same as the one the cook is making or about to make, the cook prepares both orders together; in a busy school canteen at peak hours, students often ask the server what dish is coming up next and order that one to save time. Disk scheduling works on the same principle, which looks simple enough. In fact, sometimes a request queue may not be needed at all: if one day all disks are SSDs, with no mechanical parts and purely electronic reads and writes, this kind of I/O scheduling may well be abolished, and request queues with it; a request would arrive and be handed straight to the driver, which would send the command to the device and read the data. The Linux kernel already takes a step in this direction: when a process performs synchronous I/O, it starts the driver's I/O directly (see code fragment 1).
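The "insert the queue into tq_disk" step mentioned above is the classic 2.4 plugging mechanism; roughly (paraphrasing generic_plug_device):

	/* If the queue just went from empty to non-empty, "plug" it: do not start
	 * the disk yet, only register the queue on tq_disk so that it can be
	 * kicked (unplugged) later. */
	if (list_empty(&q->queue_head) && !q->plugged) {
		q->plugged = 1;
		queue_task(&q->plug_tq, &tq_disk);
	}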
As mentioned above, when a read/write request is hung on the block device's request queue and the queue is not empty, the kernel first tries to merge it via I/O scheduling; if it cannot be merged, a new request is inserted into the queue, and the process's task is then done. But what if, when inserting the request, the process finds that the queue already holds a great many requests? In that case the process actively starts the disk I/O so the queued requests get executed promptly (see code fragment 2).
The cases above, where the process itself starts the disk I/O, are all "active". Besides the active cases there is also a passive one: the process hung the request on the queue because it wants to use the data. When does it use it? Whenever it needs to; at that point it checks whether the data has been read in, and if not, the process blocks and starts the disk I/O (see code fragment 3). Linux 2.6 refines this slightly by setting a threshold on the number of requests: once the queue exceeds the threshold, the disk I/O is started.
The 2.6 kernel also adds another occasion for starting disk I/O: when a read/write request is inserted into a block device's request queue, a timer is armed to guarantee that the disk I/O will be started within a bounded time.
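In 2.6 this looks roughly like the following (paraphrasing blk_plug_device; the exact flag and field names may differ between versions):

	/* Plugging the queue also arms a timer, so the queue is guaranteed to be
	 * unplugged (and the disk started) within q->unplug_delay even if nobody
	 * waits for the data right away. */
	if (!test_and_set_bit(QUEUE_FLAG_PLUGGED, &q->queue_flags))
		mod_timer(&q->unplug_timer, jiffies + q->unplug_delay);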
Once the disk I/O has been started, how the data gets read in is no longer the process's concern. When the process wants to use the data, it checks whether the buffer it needs is still locked; if not, the data is ready, and if it is, the process kicks the disk again and waits (code fragment 3).
That is essentially the whole analysis. In the 2.6 kernel a bio layer was added between the request and the buffer head, but that addition does not change the overall picture. Also, looking at the block device driver alone does not explain how a read/write request reaches the block device's request queue, nor how the block device then gets the data into the buffer. On that topic, Zhao Lei of Nanjing Fujitsu wrote a 120-page series, "写一个块设备驱动" (Writing a Block Device Driver), which helped me a lot; I read it almost in one sitting, from late night until 2:30 a.m. plus a whole morning. It is well written, and anyone interested in block device drivers should google it and take a look.





Note: this article is still quite naive and surely contains mistakes; corrections are welcome.





Appendix: the code below is from the 2.4.31 kernel.
Code fragment 1:
         submit_bh->__make_request
1000 static int __make_request(request_queue_t * q, int rw,
1001                   struct buffer_head * bh)
1002 {
1003     unsigned int sector, count, sync;
1004     int max_segments = MAX_SEGMENTS;
1005     struct request * req, *freereq = NULL;
1006     int rw_ahead, max_sectors, el_ret;
1007     struct list_head *head, *insert_here;
1008     int latency;
1009     elevator_t *elevator = &q->elevator;
1010     int should_wake = 0;
1011
1012     count = bh->b_size >> 9;
1013     sector = bh->b_rsector;
1014     sync = test_and_clear_bit(BH_Sync, &bh->b_state);
1015
         ...... (lines 1015~1175 omitted) ......
1176 out:
1177     if (freereq)
1178         blkdev_release_request(freereq);
1179     if (should_wake)
1180         get_request_wait_wakeup(q, rw);
1181     if (sync)
1182         __generic_unplug_device(q); // the process itself kicks off the disk I/O
1183     spin_unlock_irq(&io_request_lock);
1184     return 0;
1185 end_io:
1186     bh->b_end_io(bh, test_bit(BH_Uptodate, &bh->b_state));
1187     return 0;
1188 }
Code fragment 2:
__make_request->__get_request_wait
 643 static struct request *__get_request_wait(request_queue_t *q, int rw)
 644 {             
 645     register struct request *rq;
 646     DECLARE_WAITQUEUE(wait, current);
 647
 648     add_wait_queue_exclusive(&q->wait_for_requests, &wait);
 649
 650     do {
 651         set_current_state(TASK_UNINTERRUPTIBLE);
 652         spin_lock_irq(&io_request_lock);
 653         if (blk_oversized_queue(q) || q->rq.count == 0) {
 654             __generic_unplug_device(q); // the process itself kicks off the disk I/O
 655             spin_unlock_irq(&io_request_lock);
 656             schedule();
 657             spin_lock_irq(&io_request_lock);
 658         }
 659         rq = get_request(q, rw);
 660         spin_unlock_irq(&io_request_lock);
 661     } while (rq == NULL);
 662     remove_wait_queue(&q->wait_for_requests, &wait);
 663     current->state = TASK_RUNNING;
 664
 665     return rq;
 666 }
 667
Code fragment 3:
 180 /* 
 181  * Note that the real wait_on_buffer() is an inline function that checks
 182  * that the buffer is locked before calling this, so that unnecessary disk
 183  * unplugging does not occur.
 184  */
 185 void __wait_on_buffer(struct buffer_head * bh)
 186 {
 187     struct task_struct *tsk = current;
 188     DECLARE_WAITQUEUE(wait, tsk);
 189
 190     get_bh(bh);
 191     add_wait_queue(&bh->b_wait, &wait);
 192     do {
 193         set_task_state(tsk, TASK_UNINTERRUPTIBLE);
 194         if (!buffer_locked(bh))
 195             break;
 196         /*
 197          * We must read tq_disk in TQ_ACTIVE after the
 198          * add_wait_queue effect is visible to other cpus.
 199          * We could unplug some line above it wouldn't matter
 200          * but we can't do that right after add_wait_queue
 201          * without an smp_mb() in between because spin_unlock
 202          * has inclusive semantics.
 203          * Doing it here is the most efficient place so we
 204          * don't do a suprious unplug if we get a racy
 205          * wakeup that make buffer_locked to return 0, and
 206          * doing it here avoids an explicit smp_mb() we
 207          * rely on the implicit one in set_task_state.
 208          */
 209         run_task_queue(&tq_disk);
 210         schedule();
 211     } while (buffer_locked(bh));
 212     tsk->state = TASK_RUNNING;
 213     remove_wait_queue(&bh->b_wait, &wait);
 214     put_bh(bh);
 215 }
__wait_on_buffer->run_task_queue
119 static inline void run_task_queue(task_queue *list)
120 {   
121     if (TQ_ACTIVE(*list))
122         __run_task_queue(list);
123 }
__wait_on_buffer->run_task_queue->__run_task_queue
334 void __run_task_queue(task_queue *list)
335 {
336     struct list_head head, *next;
337     unsigned long flags;
338
339     spin_lock_irqsave(&tqueue_lock, flags);
340     list_add(&head, list);
341     list_del_init(list);
342     spin_unlock_irqrestore(&tqueue_lock, flags);
343
344     next = head.next;
345     while (next != &head) {
346         void (*f) (void *);
347         struct tq_struct *p;
348         void *data;
349
350         p = list_entry(next, struct tq_struct, list);
351         next = next->next;
352         f = p->routine;
353         data = p->data;
354         wmb();
355         p->sync = 0;
356         if (f)
357             f(data); // for an ordinary disk this is generic_unplug_device, the same "start the I/O" operation as in fragments 1 and 2;
358         // it is itself only a wrapper; the real entry point is q->request_fn
359     }
360 }  





References:
1. 《Linux内核源码情景分析(下册)》 (Scenario Analysis of the Linux Kernel Source, Vol. 2), Chapter 8, Device Drivers
2. 《深入Linux内核架构》 (Professional Linux Kernel Architecture), Chapter 6, Device Drivers
3. 《写一个块设备驱动》 (Writing a Block Device Driver), by Zhao Lei

