# 工作目录,根据个人情况修改wd=~/test/metagenome17cd $wdsudo apt-get install bwa samtools 下载数据
序列比对 # 建索引bwa index subset_assembly.fa# 双端合并序列,使用-p参数比对for i in *fqdo# unrecognized command 'mem' 版本过旧,qiime索引了旧版本,bwa改为绝对路/usr/bin/bwa mem -p subset_assembly.fa $i > ${i}.aln.sam done samtools操作比对结果samtools可视化比对结果
在两个终端(Xshell)中用samtools查看不同样品的比对结果序列分析情况 bedtools丰度估计我们将会用比对结果估计组装序列的覆盖度 使用bedtools比对,常用 http://bedtools./en/latest/content/tools/genomecov.html # 安装程序sudo apt-get install bedtools# 使用genomeCoverageBed定量基因for i in *sorted.bam do genomeCoverageBed -ibam $i > ${i/.pe*/}.histogram.tabdone 我们看一下结果:
To get an esimate of mean coverage for a contig we sum (Depth of coverage) * (Number of bases on contig) / (Length of the contig). We have a quick script that will do this calculation. 计算平均深度 wget https://raw./ngs-docs/2017-cicese-metagenomics/master/files/calculate-contig-coverage.py# 安装过可跳过sudo pip install pandasfor hist in *histogram.tabdo python calculate-contig-coverage.py $histdone 产生结果文件
可选分析:末质控数据结果如何?# 下载原始数据curl -O https://s3-us-west-1./dib-training.ucdavis.edu/metagenomics-scripps-2016-10-12/SRR1976948_1.fastq.gzcurl -O https://s3-us-west-1./dib-training.ucdavis.edu/metagenomics-scripps-2016-10-12/SRR1976948_2.fastq.gz# 如果有,可复制过来cp ../data/SRR1976948_?.fastq.gz .# 解压变只提取前20万行gunzip -c SRR1976948_1.fastq.gz | head -800000 > SRR1976948.1gunzip -c SRR1976948_2.fastq.gz | head -800000 > SRR1976948.2# 比对bwa aln subset_assembly.fa SRR1976948.1 > SRR1976948_1.untrimmed.saibwa aln subset_assembly.fa SRR1976948.2 > SRR1976948_2.untrimmed.saibwa sampe subset_assembly.fa SRR1976948_1.untrimmed.sai SRR1976948_2.untrimmed.sai SRR1976948.1 SRR1976948.2 > SRR1976948.untrimmed.sam# 压缩、排序、索引i=SRR1976948.untrimmed.samsamtools import subset_assembly.fa $i $i.bamsamtools sort $i.bam -o $i.bam.sorted.bamsamtools index $i.bam.sorted.bam# 查看samtools tview SRR1976948.untrimmed.sam.bam.sorted.bam subset_assembly.fa -p k99_13588:500 比较这些没有trim质控的数据,看看和原来的结果有什么不同? |
|