任务
<font color=orange>reads计数的原理</font>
<font color=orange>标准化的原理RPKM/FPKM/TPM</font>
image
<font color=orange>HTSeq-count定量</font>
htseq-count [options] <alignment_file> <gff_file>
### samtools重新排序 for i in `seq 56 58`; do nohup samtools sort -@ 5 -n ../5_samtools/SRR35899${i}.bam -o SRR35899${i}_nsort.bam & done
for i in `seq 56 58` do htseq-count -s no -r name -f bam SRR35899${i}_nsort.bam ../../Database/human/Homo_sapiens.GRCh38.87.chr.gtf > SRR35899${i}_matrix.count 2> SRR35899${i}.log done
<font color=orange>featureCounts定量</font>
featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ... ### 参考其他命令行 featureCounts -T 4 -t CDS -s -g Name -C -a gtf -o readscount_cdf_out 1.sam 3.sam 5.sam ... $featureCounts -T 5 -p -t exon -g gene_id -a $mm10_gtf -o counts.txt *.bam <font color=orange>脚本合并各个表达量的文件生成为表达矩阵</font>
#!/usr/bin/perl -w $header= "Gene"; foreach $i ( 0..@ARGV-1){ #### $ARGV[$i]=~/(.*?)_(.*?).txt/; $ARGV[$i]=~/(.*?)_matrix\.count/; $header.="\t$1"; open IN,"$ARGV[$i]" or die "$!"; while(<IN>){ chomp; @line=split; $hash{$line[0]}[$i]=$line[1]; } } print"$header\n"; foreach $gene_id(sort keys %hash){ foreach $i(0..@ARGV-1){ if(!$hash{$gene_id}[$i]){ $hash{$gene_id}[$i]=0; } } print $gene_id,"\t",join("\t",@{$hash{$gene_id}}),"\n"; }
<font color=orange>重新对小鼠的4个数据进行比对-计数</font>### 下载小鼠的hisat2 参考基因组index数据 wget -b ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat2/data/grcm38.tar.gz ### 下载小鼠的gtf注释数据: wget -b wget ftp://ftp.ensembl.org/pub/release-90/gtf/mus_musculus/Mus_musculus.GRCm38.90.gtf.gz ### 对小鼠的数据进行比对,直接输出为bam文件,跳过输出的sam文件。 hisat2 -p 5 -x /sas/supercloud-kong/wangtianpeng/Database/mouse/grcm38/genome -1 /sas/supercloud-kong/wangtianpeng/data/biotrainee/1_raw_data/SRR3589959_1.fastq.gz -2 /sas/supercloud-kong/wangtianpeng/data/biotrainee/1_raw_data/SRR3589959_2.fastq.gz 2>/sas/supercloud-kong/wangtianpeng/data/biotrainee/5_samtools/SRR3589959.log | samtools view -Sb - > SRR3589959.bam ### HISAT2比对的循环脚本 for i in `seq 60 62` do nohup hisat2 -x /sas/supercloud-kong/wangtianpeng/Database/mouse/grcm38/genome -1 /sas/supercloud-kong/wangtianpeng/data/biotrainee/1_raw_data/SRR35899${i}_1.fastq.gz -2 /sas/supercloud-kong/wangtianpeng/data/biotrainee/1_raw_data/SRR35899${i}_2.fastq.gz 2>/sas/supercloud-kong/wangtianpeng/data/biotrainee/5_samtools/SRR35899${i}.log | samtools view -Sb - > /sas/supercloud-kong/wangtianpeng/data/biotrainee/5_samtools/SRR35899${i}.bam & done ### HTSeq-count计数 for i in `seq 59 62` do nohup htseq-count -s no -r name -f bam -i gene_name ../5_samtools/SRR35899${i}.bam /sas/supercloud-kong/wangtianpeng/Database/mouse/Mus_musculus.GRCm38.90.gtf >SRR35899${i}_matrix.count 2> SRR35899${i}.log & done
参考文档
|
|
来自: 新用户3677sdB0 > 《转录组学习》