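A slightly more complex example showing how snakemake is used.

Walkthrough

1, Install snakemake

Python 3 is required; Python 2 is not supported.

    pip3 install --user snakemake pyaml

2, Create a few FASTQ files

Here we create paired-end RNA-seq data for two samples as FASTQ files, which the steps below then process.

Create the files: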
    touch genome.fa
    mkdir fastq
    touch fastq/Sample1.R1.fastq.gz fastq/Sample1.R2.fastq.gz
    touch fastq/Sample2.R1.fastq.gz fastq/Sample2.R2.fastq.gz

Check the result with tree:

    (base) [dengfei@localhost test]$ tree
    .
    ├── fastq
    │   ├── Sample1.R1.fastq.gz
    │   ├── Sample1.R2.fastq.gz
    │   ├── Sample2.R1.fastq.gz
    │   └── Sample2.R2.fastq.gz
    └── genome.fa

    1 directory, 5 files

3, Create the snakemake workflow file

Save the following code in a file named Snakefile:

    SAMPLES = ['Sample1', 'Sample2']

    rule all:
        input:
            expand('{sample}.txt', sample=SAMPLES)

    rule quantify_genes:
        input:
            genome = 'genome.fa',
            r1 = 'fastq/{sample}.R1.fastq.gz',
            r2 = 'fastq/{sample}.R2.fastq.gz'
        output:
            '{sample}.txt'
        shell:
            'echo {input.genome} {input.r1} {input.r2} > {output}'

4, Explanation of the code

The code is explained piece by piece below.

First, a list named SAMPLES is defined. It contains two elements, Sample1 and Sample2:

    SAMPLES = ['Sample1', 'Sample2']

Next, a rule named all is defined. Its input uses the expand function, which substitutes each element of the list for {sample}:

    rule all:
        input:
            expand('{sample}.txt', sample=SAMPLES)

Finally, a rule named quantify_genes is defined, with input, output, and shell sections. Here {sample} is a wildcard: snakemake fills it in by matching the files requested by rule all (Sample1.txt and Sample2.txt) against the output pattern '{sample}.txt':

    rule quantify_genes:
        input:
            genome = 'genome.fa',
            r1 = 'fastq/{sample}.R1.fastq.gz',
            r2 = 'fastq/{sample}.R2.fastq.gz'
        output:
            '{sample}.txt'
        shell:
            'echo {input.genome} {input.r1} {input.r2} > {output}'

5, Run

To preview the commands, use:

    snakemake -np

About the flags: -n performs a dry run, so nothing is executed; -p prints the shell commands that would be run.

Example:

    (snake_test) [dengfei@localhost ex2]$ snakemake -np
    Building DAG of jobs...
    Job counts:
        count   jobs
        1       all
        2       quantify_genes
        3

    [Tue Apr 2 13:49:34 2019]
    rule quantify_genes:
        input: genome.fa, fastq/Sample1.R1.fastq.gz, fastq/Sample1.R2.fastq.gz
        output: Sample1.txt
        jobid: 1
        wildcards: sample=Sample1

    echo genome.fa fastq/Sample1.R1.fastq.gz fastq/Sample1.R2.fastq.gz > Sample1.txt

    [Tue Apr 2 13:49:34 2019]
    rule quantify_genes:
        input: genome.fa, fastq/Sample2.R1.fastq.gz, fastq/Sample2.R2.fastq.gz
        output: Sample2.txt
        jobid: 2
        wildcards: sample=Sample2

    echo genome.fa fastq/Sample2.R1.fastq.gz fastq/Sample2.R2.fastq.gz > Sample2.txt

    [Tue Apr 2 13:49:34 2019]
    localrule all:
        input: Sample1.txt, Sample2.txt
        jobid: 0

    Job counts:
        count   jobs
        1       all
        2       quantify_genes
        3
    This was a dry-run (flag -n). The order of jobs does not reflect the order of execution
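
The expand call in rule all (step 4) can be pictured with plain Python. The sketch below is not snakemake's own implementation: expand_pattern is a hypothetical helper written for illustration, and it assumes that every combination of the supplied wildcard values is generated. It only shows why rule all ends up requesting Sample1.txt and Sample2.txt.

```python
from itertools import product

SAMPLES = ['Sample1', 'Sample2']


def expand_pattern(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill `pattern` with every
    combination of the given wildcard values."""
    names = list(wildcards)
    # Cartesian product over all value lists, in the order they were passed
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


# Same arguments as in the Snakefile's rule all
targets = expand_pattern('{sample}.txt', sample=SAMPLES)
print(targets)  # ['Sample1.txt', 'Sample2.txt']
```

Adding a third name to SAMPLES would add a third target, and with it a third quantify_genes job, without touching either rule.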
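
The "wildcards: sample=Sample1" lines in the dry-run output come from matching each requested file against the output pattern of quantify_genes. The following is a minimal sketch of that idea, not snakemake's actual code: match_wildcards and resolve_job are hypothetical helpers, and the sketch assumes a wildcard matches one or more arbitrary characters.

```python
import re

# Patterns copied from rule quantify_genes in the Snakefile
OUTPUT_PATTERN = '{sample}.txt'
INPUT_PATTERNS = {
    'genome': 'genome.fa',
    'r1': 'fastq/{sample}.R1.fastq.gz',
    'r2': 'fastq/{sample}.R2.fastq.gz',
}


def match_wildcards(pattern, target):
    """Hypothetical helper: return the wildcard values if `target` fits
    `pattern`, otherwise None."""
    # Splitting on {name} leaves literal text at even indices
    # and wildcard names at odd indices.
    pieces = re.split(r'\{(\w+)\}', pattern)
    regex = ''
    for i, piece in enumerate(pieces):
        if i % 2 == 0:
            regex += re.escape(piece)
        else:
            # Assumption: a wildcard matches one or more of any character
            regex += '(?P<%s>.+)' % piece
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


def resolve_job(target):
    """Hypothetical helper: derive wildcards, inputs and the shell command
    needed to produce `target`."""
    wildcards = match_wildcards(OUTPUT_PATTERN, target)
    if wildcards is None:
        raise ValueError('no rule produces %s' % target)
    inputs = {key: p.format(**wildcards) for key, p in INPUT_PATTERNS.items()}
    command = 'echo {genome} {r1} {r2} > {out}'.format(out=target, **inputs)
    return wildcards, inputs, command


for target in ['Sample1.txt', 'Sample2.txt']:
    wildcards, inputs, command = resolve_job(target)
    print(wildcards, command)
```

The direction matters: the requested output file determines the value of {sample}, and that value is then substituted into the input paths and the shell command.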