
Efficient Methods for Counting K-mers

 勤悦轩 2015-10-23

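Once we have transcriptome or genome data, we usually run a K-mer frequency distribution analysis before assembly and the other downstream analyses. Below are several algorithms and software tools commonly used to count K-mers.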
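To make "K-mer frequency distribution" concrete, here is a minimal in-memory sketch: count every length-k substring, then tabulate how many distinct k-mers occur once, twice, and so on. The helper names `count_kmers` and `kmer_histogram` are hypothetical, and for simplicity the sketch does not merge a k-mer with its reverse complement (many real tools count such "canonical" k-mers).

```python
from collections import Counter


def count_kmers(seqs, k):
    """Count every length-k substring of each sequence (no reverse-complement merging)."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if "N" not in kmer:  # skip k-mers containing ambiguous bases
                counts[kmer] += 1
    return counts


def kmer_histogram(counts):
    """Map multiplicity -> number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    reads = ["ACGTACGT", "CGTACGTT"]
    print(kmer_histogram(count_kmers(reads, 3)))  # {1: 1, 2: 2, 3: 1, 4: 1}
```

In real sequencing data the multiplicity-1 bin of this histogram is typically very large, which is the observation the Bloom filter approach below exploits.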

A. Bloom Filter-based Approach

This method exploits the fact that, in real data, a large number of k-mers are singletons that appear only because of sequencing errors. The Bloom filter-based approach uses the least memory, but is slightly slower than the JELLYFISH hashing approach.

i) Efficient counting of k-mers in DNA sequences using a bloom filter, Páll Melsted and Jonathan K Pritchard, BMC Bioinformatics 2011, 12:333 doi:10.1186/1471-2105-12-333.

The accompanying software is called BFcounter.

ii) khmer code
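The singleton-filtering idea can be sketched in two passes: the first sighting of a k-mer only sets bits in a Bloom filter, and a k-mer that the filter reports as already seen is promoted to a hash table; a second pass then counts only the promoted k-mers exactly, which also discards singletons that were promoted by a filter false positive. This is a simplified illustration in the spirit of the description above, not BFcounter's actual code; the filter size, the number of hash functions, and the double-hashing scheme over `hashlib.blake2b` are assumptions chosen for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings, using a bytearray as the bit array."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive num_hashes bit positions from two 64-bit halves.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def iter_kmers(seqs, k):
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]


def count_non_singletons(seqs, k, num_bits=1 << 16, num_hashes=3):
    """Exact counts of k-mers seen at least twice. seqs must be re-iterable (e.g. a list)."""
    bloom = BloomFilter(num_bits, num_hashes)
    table = {}
    # Pass 1: first sighting goes only into the filter; a k-mer the filter
    # (probably) saw before is promoted to the hash table.
    for kmer in iter_kmers(seqs, k):
        if kmer in bloom:
            table.setdefault(kmer, 0)
        else:
            bloom.add(kmer)
    # Pass 2: exact counts, but only for the promoted k-mers.
    for kmer in iter_kmers(seqs, k):
        if kmer in table:
            table[kmer] += 1
    # Drop true singletons promoted by a Bloom filter false positive.
    return {kmer: c for kmer, c in table.items() if c >= 2}
```

The memory saving comes from the fact that singletons, usually the majority of distinct k-mers in error-containing reads, are stored only as a few bits in the filter rather than as hash-table entries; the price is reading the input twice.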

B. Hashing-based Approach as in JELLYFISH

i) Guillaume Marcais and Carl Kingsford, A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics (2011) 27(6): 764-770 doi:10.1093/bioinformatics/btr011.

Jellyfish is based on a multithreaded, lock-free hash table optimized for counting k-mers up to 31 bases in length. Because of their flexibility, suffix arrays have been the data structure of choice for many string problems; for k-mer counting, which is important in many biological applications, Jellyfish offers a much faster and more memory-efficient solution.
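One reason a k-mer length limit like 31 is natural is that with 2 bits per base, a 31-mer fits in 62 bits of a single 64-bit machine word. The sketch below shows that packing, a rolling update that shifts in one base at a time, and a fixed-size open-addressing hash table keyed by the packed integers. It is single-threaded: Jellyfish's lock-free concurrent updates (done with atomic compare-and-swap operations) and its other storage optimizations are not reproduced here. All names (`pack`, `OpenAddressingCounter`, `count_packed`) and the table capacity are hypothetical, and the input is assumed to contain only A/C/G/T.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
MAX_K = 31  # 2 bits * 31 bases = 62 bits, fits in one 64-bit word


def pack(kmer):
    """2-bit encode a k-mer into an int (A=0, C=1, G=2, T=3)."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


class OpenAddressingCounter:
    """Fixed-size hash table with linear probing, keyed by packed k-mers."""

    EMPTY = -1  # packed k-mers are never negative

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = [self.EMPTY] * capacity
        self.values = [0] * capacity
        self.size = 0

    def increment(self, key):
        slot = hash(key) % self.capacity
        for _ in range(self.capacity):
            if self.keys[slot] == key:
                self.values[slot] += 1
                return
            if self.keys[slot] == self.EMPTY:
                self.keys[slot] = key
                self.values[slot] = 1
                self.size += 1
                return
            slot = (slot + 1) % self.capacity  # linear probing
        raise RuntimeError("hash table full")

    def get(self, key):
        slot = hash(key) % self.capacity
        for _ in range(self.capacity):
            if self.keys[slot] == key:
                return self.values[slot]
            if self.keys[slot] == self.EMPTY:
                return 0
            slot = (slot + 1) % self.capacity
        return 0


def count_packed(seqs, k, capacity=1 << 12):
    if not 0 < k <= MAX_K:
        raise ValueError("k must be between 1 and 31")
    table = OpenAddressingCounter(capacity)
    mask = (1 << (2 * k)) - 1  # keep only the last k bases
    for seq in seqs:
        value = 0
        for i, base in enumerate(seq):
            value = ((value << 2) | ENCODE[base]) & mask  # roll in the next base
            if i >= k - 1:
                table.increment(value)
    return table
```

The rolling update avoids re-encoding all k bases at every position: each step is one shift, one OR and one mask.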

The Jellyfish manual is available online.

C. Meryl

We are not sure how efficient the algorithm is. Its website says:

An out-of-core k-mer counter. The amount of sequence that can be processed for any size k depends only on the amount of free disk space.
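To illustrate what "out-of-core" means here, the following sketch uses a generic partition-then-count scheme: k-mers are streamed into bucket files on disk, chosen by a stable hash so that every copy of a k-mer lands in the same bucket, and each bucket is then counted in memory on its own. This is an assumed, textbook-style technique for illustration, not necessarily how Meryl works internally; the bucket count, the use of `zlib.crc32`, and the plain-text bucket format are all choices made for the example (real tools typically write compact binary k-mers).

```python
import os
import tempfile
import zlib
from collections import Counter


def count_out_of_core(seqs, k, num_buckets=4, workdir=None):
    """Yield (kmer, count) pairs; peak memory is bounded by the largest bucket."""
    created = workdir is None
    workdir = workdir or tempfile.mkdtemp(prefix="kmer_buckets_")
    paths = [os.path.join(workdir, f"bucket_{b}.txt") for b in range(num_buckets)]
    handles = [open(p, "w") for p in paths]
    try:
        # Phase 1: stream k-mers to disk; a stable hash sends every copy
        # of the same k-mer to the same bucket file.
        for seq in seqs:
            for i in range(len(seq) - k + 1):
                kmer = seq[i:i + k]
                bucket = zlib.crc32(kmer.encode()) % num_buckets
                handles[bucket].write(kmer + "\n")
    finally:
        for h in handles:
            h.close()
    # Phase 2: count one bucket at a time, then delete it.
    for path in paths:
        with open(path) as fh:
            counts = Counter(line.rstrip("\n") for line in fh)
        os.remove(path)
        yield from counts.items()
    if created:
        os.rmdir(workdir)
```

Raising `num_buckets` shrinks the memory needed per bucket, so the amount of input that can be handled is limited mainly by disk space, which matches the property quoted above.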

More information is available on the Meryl website.

D. Tallymer – Suffix array based approach

A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes

Stefan Kurtz, Apurva Narechania, Joshua C Stein and Doreen Ware, BMC Genomics, 2008, 9:517 doi:10.1186/1471-2164-9-517.
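The key property behind suffix-array-based counting is that all suffixes starting with the same k-mer are contiguous in the sorted suffix array, so each k-mer's count is simply the length of its block. The sketch below shows that property with a naive suffix array over the reads joined by a separator; it groups blocks by comparing explicit k-length prefixes rather than using the LCP-based enhanced suffix array machinery a tool like Tallymer relies on, and the helper names and the `$` separator are assumptions.

```python
from itertools import groupby


def suffix_array(text):
    """Naive suffix array: suffix start positions in lexicographic order.
    O(n^2 log n); real tools use far faster construction algorithms."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_kmers_sa(seqs, k, sep="$"):
    text = sep.join(seqs) + sep
    sa = suffix_array(text)
    # Suffixes that begin with the same k-mer form one contiguous block of
    # the suffix array, so consecutive equal prefixes give the count.
    prefixes = (text[i:i + k] for i in sa)
    counts = {}
    for kmer, group in groupby(prefixes):
        # Skip prefixes that run past a read boundary (they contain sep).
        if len(kmer) == k and sep not in kmer:
            counts[kmer] = sum(1 for _ in group)
    return counts
```

A practical advantage of this approach is that one index serves every k: once the suffix array (plus an LCP table in real implementations) is built, counts for different k-mer lengths can be read off without re-scanning the sequences.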
