Building a genome index with bowtie2

 CharlesNice 2020-12-11

Prepare the genome file, hg19.fa, then run bowtie2-build from the bowtie2 package.

# hg19.fa: genome FASTA file
# hg19: index prefix

# bowtie2-build builds the reference genome index files

$ bowtie2-build hg19/hg19.fa hg19/hg19

Settings:

  Output files: 'hg19/hg19.*.bt2'

  Line rate: 6 (line is 64 bytes)

  Lines per side: 1 (side is 64 bytes)

  Offset rate: 4 (one in 16)

  FTable chars: 10

  Strings: unpacked

  Max bucket size: default

  Max bucket size, sqrt multiplier: default

  Max bucket size, len divisor: 4

  Difference-cover sample period: 1024

  Endianness: little

  Actual local endianness: little

  Sanity checking: disabled

  Assertions: disabled

  Random seed: 0

  Sizeofs: void*:8, int:4, long:8, size_t:8

Input files DNA, FASTA:

  hg19/hg19.fa

Building a SMALL index

Reading reference sizes

  Time reading reference sizes: 00:00:17

Calculating joined length

Writing header

Reserving space for joined string

Joining reference sequences

  Time to join reference sequences: 00:00:12

bmax according to bmaxDivN setting: 715331782

Using parameters --bmax 536498837 --dcv 1024

  Doing ahead-of-time memory usage test

  Passed!  Constructing with these parameters: --bmax 536498837 --dcv 1024

Constructing suffix-array element generator

Building DifferenceCoverSample

  Building sPrime

  Building sPrimeOrder

  V-Sorting samples

  V-Sorting samples time: 00:01:22

  Allocating rank array

  Ranking v-sort output

  Ranking v-sort output time: 00:00:24

  Invoking Larsson-Sadakane on ranks

  Invoking Larsson-Sadakane on ranks time: 00:00:43

  Sanity-checking and returning

Building samples

Reserving space for 12 sample suffixes

Generating random suffixes

QSorting 12 sample offsets, eliminating duplicates

QSorting sample offsets, eliminating duplicates time: 00:00:00

Multikey QSorting 12 samples

  (Using difference cover)

  Multikey QSorting samples time: 00:00:00

Calculating bucket sizes

Splitting and merging

  Splitting and merging time: 00:00:00

Avg bucket size: 2.86133e+09 (target: 536498836)

Converting suffix-array elements to index image

Allocating ftab, absorbFtab

Entering Ebwt loop

Getting block 1 of 1

  No samples; assembling all-inclusive block

  Sorting block of length 2861327131 for bucket 1

  (Using difference cover)

  Sorting block time: 00:44:25

Returning block of 2861327132 for bucket 1

Exited Ebwt loop

fchr[A]: 0

fchr[C]: 844862932

fchr[G]: 1429875684

fchr[T]: 2015233940

fchr[$]: 2861327131

Exiting Ebwt::buildToDisk()

Returning from initFromVector

Wrote 957974502 bytes to primary EBWT file: hg19/hg19.1.bt2

Wrote 715331788 bytes to secondary EBWT file: hg19/hg19.2.bt2

Re-opening _in1 and _in2 as input streams

Returning from Ebwt constructor

Headers:

    len: 2861327131

    bwtLen: 2861327132

    sz: 715331783

    bwtSz: 715331783

    lineRate: 6

    offRate: 4

    offMask: 0xfffffff0

    ftabChars: 10

    eftabLen: 20

    eftabSz: 80

    ftabLen: 1048577

    ftabSz: 4194308

    offsLen: 178832946

    offsSz: 715331784

    lineSz: 64

    sideSz: 64

    sideBwtSz: 48

    sideBwtLen: 192

    numSides: 14902746

    numLines: 14902746

    ebwtTotLen: 953775744

    ebwtTotSz: 953775744

    color: 0

    reverse: 0

Total time for call to driver() for forward index: 00:56:27

Reading reference sizes

  Time reading reference sizes: 00:00:13

Calculating joined length

Writing header

Reserving space for joined string

Joining reference sequences

  Time to join reference sequences: 00:00:11

  Time to reverse reference sequence: 00:00:02

bmax according to bmaxDivN setting: 715331782

Using parameters --bmax 536498837 --dcv 1024

  Doing ahead-of-time memory usage test

  Passed!  Constructing with these parameters: --bmax 536498837 --dcv 1024

Constructing suffix-array element generator

Building DifferenceCoverSample

  Building sPrime

  Building sPrimeOrder

  V-Sorting samples

  V-Sorting samples time: 00:01:29

  Allocating rank array

  Ranking v-sort output

  Ranking v-sort output time: 00:00:28

  Invoking Larsson-Sadakane on ranks

  Invoking Larsson-Sadakane on ranks time: 00:00:42

  Sanity-checking and returning

Building samples

Reserving space for 12 sample suffixes

Generating random suffixes

QSorting 12 sample offsets, eliminating duplicates

QSorting sample offsets, eliminating duplicates time: 00:00:00

Multikey QSorting 12 samples

  (Using difference cover)

  Multikey QSorting samples time: 00:00:00

Calculating bucket sizes

Splitting and merging

  Splitting and merging time: 00:00:00

Avg bucket size: 2.86133e+09 (target: 536498836)

Converting suffix-array elements to index image

Allocating ftab, absorbFtab

Entering Ebwt loop

Getting block 1 of 1

  No samples; assembling all-inclusive block

  Sorting block of length 2861327131 for bucket 1

  (Using difference cover)

  Sorting block time: 00:44:02

Returning block of 2861327132 for bucket 1

Exited Ebwt loop

fchr[A]: 0

fchr[C]: 844862932

fchr[G]: 1429875684

fchr[T]: 2015233940

fchr[$]: 2861327131

Exiting Ebwt::buildToDisk()

Returning from initFromVector

Wrote 957974502 bytes to primary EBWT file: hg19/hg19.rev.1.bt2

Wrote 715331788 bytes to secondary EBWT file: hg19/hg19.rev.2.bt2

Re-opening _in1 and _in2 as input streams

Returning from Ebwt constructor

Headers:

    len: 2861327131

    bwtLen: 2861327132

    sz: 715331783

    bwtSz: 715331783

    lineRate: 6

    offRate: 4

    offMask: 0xfffffff0

    ftabChars: 10

    eftabLen: 20

    eftabSz: 80

    ftabLen: 1048577

    ftabSz: 4194308

    offsLen: 178832946

    offsSz: 715331784

    lineSz: 64

    sideSz: 64

    sideBwtSz: 48

    sideBwtLen: 192

    numSides: 14902746

    numLines: 14902746

    ebwtTotLen: 953775744

    ebwtTotSz: 953775744

    color: 0

    reverse: 1

Total time for backward call to driver() for mirror index: 00:56:19
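
The two "Total time" lines in the log can be added up to get the overall build time. A minimal sketch, assuming the log was saved to a file; the function name total_build_time and the file name build.log are hypothetical and do not appear in the original post:

```shell
# Sum every "Total time for ... driver()" entry in a bowtie2-build log.
# Reads the log on stdin and prints the total as HH:MM:SS.
total_build_time() {
    grep 'Total time for' | awk '
        {
            # the last field of each matching line is HH:MM:SS
            split($NF, t, ":")
            secs += t[1] * 3600 + t[2] * 60 + t[3]
        }
        END {
            printf "%02d:%02d:%02d\n", int(secs / 3600), int((secs % 3600) / 60), secs % 60
        }'
}
```

Called as `total_build_time < build.log`, the two entries above (00:56:27 and 00:56:19) add up to 01:52:46.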

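Each of the two passes takes about an hour: per the log above, 00:56:27 for the forward index and 00:56:19 for the mirror index.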
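
Because the build is slow, one option is to run it with several threads. The sketch below assumes a bowtie2 release whose bowtie2-build accepts --threads; the wrapper name build_index and the default of 4 threads are illustrative choices, not part of the original post:

```shell
# Build a bowtie2 index with multiple threads.
# usage: build_index <reference.fa> <index_prefix> [threads]
build_index() {
    fasta=$1
    prefix=$2
    threads=${3:-4}    # 4 is an arbitrary default (assumption)
    if [ ! -f "$fasta" ]; then
        echo "reference not found: $fasta" >&2
        return 1
    fi
    # Same command as in the post, plus --threads
    bowtie2-build --threads "$threads" "$fasta" "$prefix"
}
```

For the paths used here the call would be `build_index hg19/hg19.fa hg19/hg19 8`. How much time this saves depends on the machine, so no timing is claimed.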

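Result:

Once the build finishes, keep the index files; later runs can use them directly without rebuilding. Use pwd to confirm the path of the directory that holds them.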
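
Before reusing a saved index, it can help to confirm that it is complete. A small bowtie2 index (the log reports "Building a SMALL index") consists of six files: .1.bt2, .2.bt2, .3.bt2, .4.bt2, .rev.1.bt2 and .rev.2.bt2. A minimal sketch, where the function name check_index is hypothetical:

```shell
# Check that all six files of a small bowtie2 index exist and are non-empty.
# usage: check_index <index_prefix>
check_index() {
    prefix=$1
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
        if [ ! -s "$prefix.$suffix.bt2" ]; then
            echo "missing or empty: $prefix.$suffix.bt2" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For the index built above, `check_index hg19/hg19` returns 0 only when every file is present.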

# Copy the prebuilt bowtie2 index directly

cp /home/training58/hicpro/hg19.*bt2 ./hg19
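
The cp command above only works if ./hg19 already exists as a directory, since cp with several source files needs a directory target. A hedged sketch that creates the directory first; the function name copy_index is hypothetical:

```shell
# Copy a prebuilt bowtie2 index into a destination directory.
# usage: copy_index <source_prefix> <dest_dir>
copy_index() {
    src_prefix=$1
    dest_dir=$2
    # create the target directory if it is not there yet
    mkdir -p "$dest_dir" || return 1
    # same glob as in the post: every file named <prefix>.*bt2
    cp "$src_prefix".*bt2 "$dest_dir"/
}
```

With the paths from the post this would be `copy_index /home/training58/hicpro/hg19 ./hg19`, after which the index prefix is ./hg19/hg19, the value that bowtie2 takes through its -x option.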
