看到一个成熟的R包,所以推荐一下
最重要的使用方法,当然是读入vcf文件啦,如下: library(vcfR) vcf_file='/Users/jmzeng/germline/merge.dbsnp.vcf' vcf <- read.vcfR( vcf_file, verbose = FALSE )
十几秒钟就轻轻松松读入一个300多M的vcf文件啦,成为一个S4对象: > vcf ***** Object of Class vcfR ***** 39 samples 24 CHROMs 212,875 variants Object size: 356.6 Mb 0 percent missing data ***** ***** ***** >
懂R的朋友就应该是知道如何去查看帮助文档来学习它啦,最重要的两句话是: Objects of class vcfR can be manipulated with vcfR-method and extract.gt. Contents of the vcfR object can be visualized with the plot method.
最基本的3个元素是: meta character vector for the meta information fix matrix for the fixed information gt matrix for the genotype information
其中meta存储着vcf的头文件,而fix存储在vcf的固定列,gt存储在样本基因型信息。 最基本的操作函数如下: show(object) colnames(vcf@fix) vcf@fix[1:4,1:4] colnames(vcf@gt) vcf@gt[1:4,1:4]
head(x, n = 6, maxchar = 80) plot(x, y, ...) ## 主要查看突变位点的质量值的分布情况 x[i, j, samples = NULL, ..., drop] dim(x) nrow(x)
解析出来了读入的文件所变成的对象,后续分析就简单多了。 包自带的测试数据pkg <- "pinfsc50" vcf_file <- system.file("extdata", "pinf_sc50.vcf.gz", package = pkg) dna_file <- system.file("extdata", "pinf_sc50.fasta", package = pkg) gff_file <- system.file("extdata", "pinf_sc50.gff", package = pkg) library(vcfR) vcf <- read.vcfR( vcf_file, verbose = FALSE ) dna <- ape::read.dna(dna_file, format = "fasta") gff <- read.table(gff_file, sep="\t", quote="") library(vcfR) chrom <- create.chromR(name='Supercontig', vcf=vcf, seq=dna, ann=gff) plot(chrom) chromoqc(chrom, dp.alpha=20) chromoqc(chrom, xlim=c(5e+05, 6e+05))
可以出一下还不错的图,我觉得颜值还行!
|