ECC problems when booting U-Boot from NAND flash (STi7105)

guitarhua 2015-11-11

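The project had reached its final stage, and the software had to go to the factory to be programmed onto the boards and tested.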

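It turned out that the factory's programming software does not support the stm_nand_calculate_ecc ECC scheme built into ST's U-Boot. The only option was to build uboot.bin the same way as the root filesystem image, as a bin file that already carries its own ECC, with 2048+64 bytes per page.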
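
A minimal sketch of how such an image is laid out, assuming each page is stored as 2048 data bytes immediately followed by its 64 OOB bytes. The macro and function names are made up for illustration and come from neither U-Boot nor the factory tool:

```c
#include <assert.h>
#include <stddef.h>

#define IMG_PAGE_DATA 2048u                          /* main area of one page */
#define IMG_PAGE_OOB    64u                          /* spare (OOB) area of one page */
#define IMG_PAGE_RAW  (IMG_PAGE_DATA + IMG_PAGE_OOB) /* 2112 bytes per page in the file */

/* Offset in the image file where the data of page `page` starts. */
static size_t image_data_offset(size_t page)
{
    return page * IMG_PAGE_RAW;
}

/* Offset in the image file where the OOB bytes of page `page` start. */
static size_t image_oob_offset(size_t page)
{
    return image_data_offset(page) + IMG_PAGE_DATA;
}

/* Size of the raw image needed to hold `payload` bytes of U-Boot. */
static size_t image_size_for_payload(size_t payload)
{
    size_t pages = (payload + IMG_PAGE_DATA - 1) / IMG_PAGE_DATA; /* round up */
    return pages * IMG_PAGE_RAW;
}
```

For example, the OOB bytes of page 0 start at offset 2048 in the file, and the data of page 1 starts at offset 2112.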

The boards that came back booted normally, but this message appeared:

nand_read_bbt: Bad block at 0x00000000

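The first block was reported as bad!

Yet the chip is guaranteed to leave the factory with a good first block.

After careful analysis, I finally found the problem.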

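When U-Boot boots from NAND flash, the boot region uses its own ECC scheme, which is very different from the Linux default ECC. If the boot region is switched to the Linux default ECC that U-Boot also provides, U-Boot cannot boot at all: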

/*
* Do we want to read/write NAND Flash compatible with the ST40's
* NAND Controller H/W IP block for "boot-mode"? If we want
* to read/write NAND flash that is meant to support booting
* from NAND, then we need to use 3 bytes of ECC per 128 byte
* record. If so, then define the "CFG_NAND_ECC_HW3_128" macro.
*/
# define CFG_NAND_ECC_HW3_128 /* define for "boot-from-NAND" compatibility */
/*
* If using CFG_NAND_ECC_HW3_128, then we must also define
* where the (high watermark) boundary is. That is, the
* NAND offset, below which we are in "boot-mode", and
* must use 3 bytes of ECC for each 128 byte record.
* For this offset (and above) we can use any supported
* ECC configuration (e.g 3/256 S/W, or 3/512 H/W).
*/
# define CFG_NAND_STM_BOOT_MODE_BOUNDARY (1ul << 17) /* 128 Kb */

The comments make it clear that booting from NAND requires 3 bytes of ECC for each 128-byte record. In other words, every 128 bytes of data are protected by their own 3 bytes of ECC.
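
The arithmetic behind this can be sketched as follows. The boundary value is the one from the `CFG_NAND_STM_BOOT_MODE_BOUNDARY` macro above. The 2048-byte page size is an assumption for a large-page device, and the helper names are hypothetical, not U-Boot functions:

```c
#include <assert.h>

#define BOOT_BOUNDARY     (1ul << 17) /* same value as CFG_NAND_STM_BOOT_MODE_BOUNDARY */
#define BOOT_PAGE_BYTES   2048u       /* assumed large-page device */
#define BOOT_RECORD_BYTES  128u       /* one ECC record in boot mode */
#define BOOT_ECC_BYTES       3u       /* ECC bytes per record in boot mode */

/* Non-zero if `nand_offset` lies below the boundary, i.e. in the boot-mode region. */
static int in_boot_mode_region(unsigned long nand_offset)
{
    return nand_offset < BOOT_BOUNDARY;
}

/* Number of 128-byte records in one page. */
static unsigned boot_records_per_page(void)
{
    return BOOT_PAGE_BYTES / BOOT_RECORD_BYTES;
}

/* Total ECC bytes needed per page in boot mode. */
static unsigned boot_ecc_bytes_per_page(void)
{
    return boot_records_per_page() * BOOT_ECC_BYTES;
}
```

With these numbers a page holds 16 records and needs 48 ECC bytes, and every offset below 0x20000 (128 KiB) falls in the boot-mode region.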

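The NAND flash we use is the K9F1G08U0B: its memory cell array is (128M + 4M) x 8 bits and its data register is (2K + 64) bytes. Each 2048-byte page therefore has a 64-byte OOB area, which U-Boot defines as follows: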

/* for LARGE-page devices */
static struct nand_oobinfo stm_nand_oobinfo_64 = {
    .useecc = MTD_NANDECC_AUTOPLACE,
    .eccbytes = 48,
    .eccpos = {
         0,  1,  2,     /* ECC for 1st 128-byte record */
         4,  5,  6,     /* ECC for 2nd 128-byte record */
         8,  9, 10,     /* ECC for 3rd 128-byte record */
        12, 13, 14,     /* ECC for 4th 128-byte record */
        16, 17, 18,     /* ECC for 5th 128-byte record */
        20, 21, 22,     /* ECC for 6th 128-byte record */
        24, 25, 26,     /* ECC for 7th 128-byte record */
        28, 29, 30,     /* ECC for 8th 128-byte record */
        32, 33, 34,     /* ECC for 9th 128-byte record */
        36, 37, 38,     /* ECC for 10th 128-byte record */
        40, 41, 42,     /* ECC for 11th 128-byte record */
        44, 45, 46,     /* ECC for 12th 128-byte record */
        48, 49, 50,     /* ECC for 13th 128-byte record */
        52, 53, 54,     /* ECC for 14th 128-byte record */
        56, 57, 58,     /* ECC for 15th 128-byte record */
        60, 61, 62},    /* ECC for 16th 128-byte record */
    .oobfree = {
        { 3, 1}, { 7, 1}, {11, 1}, {15, 1},
        {19, 1}, {23, 1}, {27, 1}, {31, 1},
        {35, 1}, {39, 1}, {43, 1}, {47, 1},
        {51, 1}, {55, 1}, {59, 1}, {63, 1} },
};
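
The table follows a regular pattern: record i keeps its 3 ECC bytes at OOB offsets 4i, 4i+1 and 4i+2, and offset 4i+3 is left free. A minimal sketch of that rule, with function names made up for illustration (they are not part of U-Boot):

```c
#include <assert.h>

#define LAYOUT_RECORDS 16 /* 128-byte records per 2048-byte page */

/* OOB offset of ECC byte `byte` (0..2) of record `record` (0..15). */
static int boot_ecc_pos(int record, int byte)
{
    assert(record >= 0 && record < LAYOUT_RECORDS);
    assert(byte >= 0 && byte < 3);
    return record * 4 + byte;
}

/* OOB offset of the single free byte that follows the ECC of `record`. */
static int boot_free_pos(int record)
{
    assert(record >= 0 && record < LAYOUT_RECORDS);
    return record * 4 + 3;
}
```

Note that the very first ECC byte lands at OOB offset 0, so in boot mode the first byte of the spare area holds ECC data for any page that has been written.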

Take the first page (2048 bytes) of the first block as an example:

OOB:
        cf 00 fc ff 0f 00 3c ff
        00 00 00 ff 95 25 64 ff
        65 26 a8 ff 3c 00 cc ff
        95 2a 68 ff 30 33 0c ff
        cc 30 c0 ff 59 16 54 ff
        aa 16 64 ff 3f 0f 30 ff
        6a 2a 58 ff f0 0f f0 ff
        55 25 a4 ff 59 15 54 ff

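For comparison, here is the OOB area written with the default Linux ECC, taking the second page of the second block as an example: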

OOB:
        ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff
        ff 3f 3f 5a 6a 9b 99 a6
        9b a6 5a 9b 59 6a 6b 00
        0f 33 9a 65 57 0f f3 0f

The difference is obvious.

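Next, when U-Boot starts it first looks for the BBT (bad block table). If it cannot find one, it creates a BBT at a designated location, and ours is stored in the last 3 blocks of the NAND flash.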
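
As a rough sketch of where those last 3 blocks sit, assuming a 128 MiB main area and 128 KiB erase blocks (64 pages of 2 KiB) for the K9F1G08U0B. The names are hypothetical:

```c
#include <assert.h>

#define NAND_TOTAL_BYTES (128ul << 20) /* 128 MiB main area */
#define NAND_BLOCK_BYTES (128ul << 10) /* assumed: 64 pages x 2 KiB per erase block */

/* Start address of the n-th block counted from the end (0 = last block). */
static unsigned long block_from_end(unsigned long n)
{
    assert(n < NAND_TOTAL_BYTES / NAND_BLOCK_BYTES);
    return NAND_TOTAL_BYTES - (n + 1ul) * NAND_BLOCK_BYTES;
}
```

Under these assumptions the last three blocks start at 0x07fe0000, 0x07fc0000 and 0x07fa0000.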

Dumping the BBT shows:

Page 07fc0000 dump:
        fc ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

The first byte is fc, i.e. 11111100. According to the comments in the code, the BBT uses 2 bits to describe each block:

* The table uses 2 bits per block
* 11b: block is good
* 00b: block is factory marked bad
* 01b, 10b: block is marked bad due to wear

In other words, when U-Boot built the BBT it found that the first block was bad, and bad before the BBT existed, so the first block was classified as "factory marked bad".
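
The decoding can be sketched as below. It assumes that the lowest two bits of each byte describe the lowest-numbered block, which matches the "Bad block at 0x00000000" message. The enum and function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2-bit states from the BBT comment; 01b and 10b both mean "bad due to wear". */
enum bbt_state {
    BBT_FACTORY_BAD = 0x0,
    BBT_WORN_01     = 0x1,
    BBT_WORN_10     = 0x2,
    BBT_GOOD        = 0x3
};

/* First bytes of the dumped BBT page. */
static const uint8_t bbt_dump[4] = { 0xfc, 0xff, 0xff, 0xff };

/* Return the 2-bit entry for `block` (assumed order: block 0 in bits 1..0). */
static unsigned bbt_entry(const uint8_t *table, size_t block)
{
    unsigned shift = (unsigned)(block % 4) * 2; /* 4 blocks per byte */
    return (table[block / 4] >> shift) & 0x3u;
}
```

With the dumped byte fc, block 0 decodes to 00b (factory marked bad) and blocks 1 to 3 decode to 11b (good).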

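So the problem seems to arise when the BBT is built. After going through a lot of material online, I finally found this passage on Identifying Invalid Block(s) in the K9F1G08U0B datasheet: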

All device locations are erased(FFh for X8, FFFFh for X16) except locations where the invalid block(s) information is written prior to shipping. The invalid block(s) status is defined by the 1st byte(X8 device) or 1st word(X16 device) in the spare area. Samsung makes sure that either the 1st or 2nd page of every invalid block has non-FFh(X8) or non-FFFFh(X16) data at the column address of 2048(X8 device) or 1024(X16 device). Since the invalid block information is also erasable in most cases, it is impossible to recover the information once it has been erased. Therefore, the system must be able to recognize the invalid block(s) based on the original invalid block information and create the invalid block table via the following suggested flow chart(Figure 3). Any intentional erasure of the original invalid block information is prohibited.

(Figure 3. Flow chart to create invalid block table.)

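My English is not great, but the gist is that Samsung marks every invalid block with non-FFh (x8) or non-FFFFh (x16) data at column address 2048 (x8 device) or 1024 (x16 device), which is the first byte of the spare area.

That area is exactly our OOB area.

With the STM boot-mode ECC, however, the ECC of the first 128-byte record is stored at OOB offsets 0 to 2, so the first OOB byte is certainly no longer 0xff (it is cf in the first dump above). The block therefore looks like a factory-marked bad block. Problem found!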
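
The datasheet rule for an x8 device can be sketched as below. This follows only the quoted datasheet text and is not U-Boot's actual scan code. The function name is hypothetical, and the sample bytes are the first 8 OOB bytes of the boot-mode dump shown earlier, next to an erased OOB:

```c
#include <assert.h>
#include <stdint.h>

/* First 8 OOB bytes of block 0, page 0 as written with the boot-mode ECC. */
static const uint8_t oob_boot_mode[8] = { 0xcf, 0x00, 0xfc, 0xff, 0x0f, 0x00, 0x3c, 0xff };

/* First 8 OOB bytes of an erased page. */
static const uint8_t oob_erased[8] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

/*
 * Datasheet rule (x8): a block is invalid if the first spare byte of its
 * 1st or 2nd page is not FFh. Returns non-zero if the block looks bad.
 */
static int looks_factory_bad(const uint8_t *oob_page0, const uint8_t *oob_page1)
{
    return oob_page0[0] != 0xFF || oob_page1[0] != 0xFF;
}
```

Applied to the boot-mode OOB, the check fires on the first byte (cf), although the block itself is fine. An erased block passes.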

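Until now, of course, flashing had always been done differently. The first time, with no program on the board at all, U-Boot was loaded into RAM through the emulator (debugger), the bin file was downloaded into RAM over the network, and the U-Boot already running in RAM wrote the bin to flash. Since U-Boot was running from RAM and the flash held no data when the BBT was built, the first block was naturally not marked bad.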

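I was worried that running U-Boot in this state might cause problems. The early part of U-Boot's loading is done in assembly, so I looked at start.S in the U-Boot tree and found these lines: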

#ifdef CFG_BOOT_FROM_NAND
skip_signature:
    bra skipped_signature /* skip over the "block 0 signature" */
    nop

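In other words, as far as I can tell, the check of the first block is skipped when booting from flash, because the manufacturer guarantees that the first block is good.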

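So the bad-block mark on the first block does not affect U-Boot's normal operation. The only consequence is that the first block cannot be erased with U-Boot's own nand erase command, so it cannot be upgraded that way (changing it requires nand scrub). We never planned to let users upgrade U-Boot anyway, so I will leave it as it is.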

The end.