分享

MERCYRI

 爱在日月明 2018-09-21
MERCYRI hff92474

64位https://the./~sgtatham/putty/latest/w64/putty.exe
Winscp:
https://jaist.dl./project/winscp/WinSCP/5.13.4/WinSCP-5.13.4-Setup.exe

预处理原始数据

解压:
gunzip newBGIseq500_*.fq.gz
gunzip chr17.vcf.gz    

fastq转fasta
命令如下:sh fq-fa.sh

fq-fa.sh
awk '{if(NR%4 == 1){print ">" substr($0, 2)}}{if(NR%4 == 2){print}}' newBGIseq500_1.fq > newBGIseq500_1.fa

awk '{if(NR%4 == 1){print ">" substr($0, 2)}}{if(NR%4 == 2){print}}' newBGIseq500_2.fq > newBGIseq500_2.fa

一、

1.成对reads数是1599999,能覆盖。

reads计算方法:CL100024405L1C001为newBGIseq500_1.fq和newBGIseq500_2.fq中reads特有的特征,故通过grep -c 'CL100024405L1C001' *.fq 来计算PE reads数。

reads命令如下:
sh reads.sh

reads.sh
grep -c 'CL100024405L1C001' newBGIseq500_1.fq

grep -c 'CL100024405L1C001' newBGIseq500_2.fq



覆盖方法及命令如下:
Rscript coverage.R

coverage.R

实际测序碱基数目<-1599999*100*2
需要覆盖碱基数目<-6*1e6*40*0.99
实际测序碱基数目>需要覆盖碱基数目

2.下载安装soapdenovo软件,
网址:https://github.com/aquaskyline/SOAPdenovo2


unzip SOAPdenovo2-master.zip

cd
SOAPdenovo2-master

pwd
 /home/stu38/SOAPdenovo2-master

vi ~/.bashrc

source ~/.bashrc

make

建立配置文件lib.cfg
vi lib.cfg
max_rd_len=100
[LIB]
avg_ins=450
reverse_seq=0
asm_flags=3
rank=1
pair_num_cutoff=3
map_len=32
q1=/home/data/newBGIseq500_1.fq
q2=/home/data/newBGIseq500_2.fq
f1=/home/data/newBGIseq500_1.fa
f2=/home/data/newBGIseq500_2.fa

运行soapdenovo,得到组装结果ant.scafseq和ant.scafStatistics等
./SOAPdenovo-127mer all -s lib.cfg -K 35 -D 1 -o ant >>2ant.log

计算组装结果序列长度
命令如下:perl changdu.pl ant.scafseq > length.txt
changdu.pl
#!usr/bin/perl

use warnings;

use strict;

my (@arr, $changdu, %hash, $key);

open (FH, "$ARGV[0]");  #$ARGV[0]指的是脚本运行后跟的第一个参数





while(<FH>){

                 chomp;

                if (/^>/){

                   $key=$_;

                }



                $hash{$key}.=$_ unless /^>/;

}



foreach (keys %hash){

                  $changdu=length($hash{$_});

                  print "$_\t$changdu\t$hash{$_}\n";



}
或者
awk '/^>/&&NR>1{print "";}{ printf "%s",/^>/ ? $0" ":$0 }'  ant.scafseq |awk '{print $1"\t"length($3)}' > length.txt

长度序列挑取及排序
命令如下:sh sort-length.sh
sort-length.sh
cut -f2 length.txt |sort -nr >sort-length.txt

长度累积曲线图,详见length.pdf
计算方法及命令如下:Rscript length.R
pdf("length.pdf")
dat<-read.table("/home/exam01/sort-length.txt")
d<-cumsum(dat)/sum(dat)
plot(d$V1,type="l",main="length distribution",xlab="length",ylab="frequency",col="red")
dev.off()
q()


3.组装结果的n50是687
计算方法及命令如下:
perl n50.pl ant.scafSeq


n50.pl
#/usr/bin/perl -w
use strict;
my ($len,$total)=(0,0);
my @x;
while(<>){
    if(/^[\>\@]/){
        if($len>0){
            $total+=$len;
            push @x,$len;
        }
        $len=0;
    }
    else{
        s/\s//g;
        $len+=length($_);
    }
}
if ($len>0){
    $total+=$len;
    push @x,$len;
}
@x=sort{$b<=>$a} @x;
my ($count,$half)=(0,0);
for (my $j=0;$j<@x;$j++){
    $count+=$x[$j];
    if (($count>=$total/2)&&($half==0)){
        print "N50: $x[$j]\n";
        $half=$x[$j]
    }elsif ($count>=$total*0.9){
        print "N90: $x[$j]\n";
        exit;
    }
}

二、

1.vcf中变异位点的qual值分布请查看文件qual.txt,qual分布图详见qual.pdf

计算方法及命令如下:
下载安装vcftools
网址:https://jaist.dl./project/vcftools/vcftools_0.1.13.tar.gz
tar zxvf vcftools_0.1.13.tar.gz
pwd
 /home/soft/vcftools_0.1.13
vi ~/.bashrc

source ~/.bashrccd vcftools_0.1.13
make

统计qual值分布,命令如下:
vcftools --vcf chr17.vcf --site-quality --out Q


图计算方法及命令如下:Rscript qual.R
pdf("qual.pdf")
dat<-read.table("/home/exam01/Q.1qual")
hist(dat[,2],main="qual distribution",xlab="qual value",ylab="frequency")
dev.off()
q()



2.TP53基因变异信息请查看文件tp53.vcf


提取tp53基因变异情况,命令如下:
vcftools --vcf chr17.vcf --chr chr17 --from-bp 7668402 --to-bp 7687550 --recode -c > tp53.vcf

计算变异位点总数目是41,命令如下:sh var.sh
var.sh
grep -v '^#' tp53.vcf |wc -l

计算各样本tp53变异位点情况,未变异数目=变异总数目-纯合和杂合变异数目,结果是27DMBDM4YT纯合9个,杂合8个,未变异24个,7XKZJA3JWX纯合6个,杂合33个,未变异2个,APRDKV0LDS纯合8个,杂合8个,未变异25个。
纯合计算方法及命令如下:sh hom-var1.sh
hom-var1.sh
less -S tp53.vcf | cut -f 10 | grep  -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l
hom-var2.sh
less -S tp53.vcf | cut -f 11 | grep  -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l
hom-var3.sh
less -S tp53.vcf | cut -f 12 | grep  -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1==$2) print $0}' | wc –l
杂合计算方法及命令如下:sh het-var1.sh
hom-var1.sh
less -S tp53.vcf | cut -f 10 | grep  -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l
hom-var2.sh
less -S tp53.vcf | cut -f 11 | grep  -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l
hom-var3.sh
less -S tp53.vcf | cut -f 12 | grep  -v "0/0" | awk –F ":" '{print $1}' | awk -F "\/" '{print $1,$2}' | awk '{if($1!=$2) print $0}' | wc –l

三、
1.kmer图详见17mer.pdf,方法如下:
下载安装gce
网址:ftp://ftp.genomics.org.cn/pub/gce
tar -zxvf  gce-1.0.0.tar.gz
cd gce-1.0.0/kmerfreq/kmer_freq_hash
pwd
 /home/soft/gce-1.0.0/
kmerfreq/kmer_freq_hash
vi ~/.bashrc

source ~/.bashrc


建立fq.list,命令如下:ls /home/data/*.fq > fq.list

运行kmer分析,得到output.freq.stat,进行后续R画图
命令如下:kmer_freq_hash -k 17 -i 450000000 -l fq.list  2>kmerfreq.log

图计算方法及命令如下:Rscript 17mer.R
pdf("17mer.pdf")
dat<-read.table("/home/exam01/output.freq.stat")
plot(dat[,2],type="l",main="17mer distribution",xlab="17mer depth",ylab="frequency",xlim=c(0,200),ylim=c(0,1e6))
dev.off()
q()


2.基因组大小是6.4344e+06bp
计算方法及命令如下:less -S output.freq.stat | awk '{sum += $1*$2} END {print sum/41}'

四、
1.下载安装bzip、bwa、samtools、bcftools
bzip网址:https:///download/bzip2
bwa网址:https://jaist.dl./project/bio-bwa/bwa-0.7.17.tar.bz2
samtools网址:https://github.com/samtools/samtools/releases/download/1.9/samtools-1.9.tar.bz2
bcftools网址:https://github.com/samtools/bcftools/releases/download/1.9/bcftools-1.9.tar.bz2
bzip:
tar -zxvf  bzip2-1.0.6.tar.gz
cd bzip2-1.0.6
pwd
 /home/soft/bzip2-1.0.6
vi ~/.bashrc

source ~/.bashrc

make

bwa:
tar jxvf bwa-0.7.17.tar.bz2
cd bwa
pwd
 /home/soft/bwa-0.7.17
vi ~/.bashrc

source ~/.bashrc

make

bcftools:
tar jxvf bcftools-1.9.tar.bz2
cd bwa
pwd
 /home/soft/bcftools-1.9
vi ~/.bashrc

source ~/.bashrc

make

samtools:
tar jxvf samtools-1.9.tar.bz2
cd bwa
pwd
 /home/soft/samtools-1.9
vi ~/.bashrc

source ~/.bashrc

./congfigure --without-curses
make

2.bwa mem比对
bwa index *.fa
bwa mem *.fa newBGIseq500_1.fq >mem.sam
bwa aln ref.fa newBGIseq500_1.fq >read1.sai
bwa aln ref.fa newBGIseq500_2.fq >read2.sai
bwa sampe ref.fa read1.sai read2.sai newBGIseq500_1.fq newBGIseq500_2.fq >aln.sam

3.sam转bam,sort
samtools view -Sb mem.sam > mem.bam
samtools sort mem.bam >mem.sort

4.call变异
Samtools mpileup -ugf ref.fa mem.sort |bcftools call –vm –O z –o test.vcf

    本站是提供个人知识管理的网络存储空间,所有内容均由用户发布,不代表本站观点。请注意甄别内容中的联系方式、诱导购买等信息,谨防诈骗。如发现有害或侵权内容,请点击一键举报。
    转藏 分享 献花(0

    0条评论

    发表

    请遵守用户 评论公约

    类似文章 更多