
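snakemake Study Notes 3

 育种数据分析 (Breeding Data Analysis) 2021-11-18

This is a blog post I wrote a while ago, kept here as a record of my learning path.

Goal

This time, I want to implement the workflow shown in this flowchart.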

Goal details

  • First: generate 1.txt, 2.txt and 3.txt

  • Second: add the string "add a" to each file, saving the results as 1_add_a.txt, 2_add_a.txt, 3_add_a.txt

  • Third: add "add b" to those files, saving the results as 1_add_a_add_b.txt, 2_add_a_add_b.txt, 3_add_a_add_b.txt

  • Fourth: add "add c" to those files, saving the results as 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt

  • Fifth: merge 1_add_a_add_b.txt, 2_add_a_add_b.txt, 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt and 3_add_a_add_b_add_c.txt into a single file, hebing.txt

1. Generate three files

    (snake_test) [dengfei@localhost ex4]$ ls *txt
    1.txt  2.txt  3.txt
    (snake_test) [dengfei@localhost ex4]$ cat *txt
    this is 1.txt
    this is 2.txt
    this is 3.txt
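The post does not show how the three input files were created. One possible way, as a minimal sketch: the helper name make_inputs is made up for illustration, and the file contents are copied from the cat output above.

```python
from pathlib import Path


def make_inputs(directory=".", names=("1", "2", "3")):
    """Create N.txt files whose single line reads 'this is N.txt'."""
    created = []
    for name in names:
        path = Path(directory) / f"{name}.txt"
        path.write_text(f"this is {name}.txt\n")
        created.append(path)
    return created
```

Calling make_inputs() in the working directory would write 1.txt, 2.txt and 3.txt there.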

2. Add "add a" to each file

The corresponding Snakefile is:

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"
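To see how the {file} wildcard ties a requested output to its input, here is a simplified pure-Python sketch. It is not Snakemake's own code: match_wildcards is a hypothetical helper, it assumes each wildcard name appears only once in the pattern, and it uses ".+" for every wildcard, which is Snakemake's default.

```python
import re


def match_wildcards(pattern, target):
    """Match target against a pattern such as '{file}_add_a.txt'.

    Each {name} becomes a named group matching '.+'; all other text is
    literal. Returns a dict of wildcard values, or None if nothing matches.
    """
    regex = ""
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


wildcards = match_wildcards("{file}_add_a.txt", "3_add_a.txt")
print(wildcards)                         # {'file': '3'}
print("{file}.txt".format(**wildcards))  # 3.txt
```

Requesting 3_add_a.txt therefore fixes file=3, and the rule's input becomes 3.txt.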

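Preview the commands with a dry run: snakemake -np {1,2,3}_add_a.txt

Note: the target files {1,2,3}_add_a.txt have to be named on the command line here. The rule's output contains a wildcard, so Snakemake cannot work out on its own which files to build.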
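The {1,2,3} part is expanded by the shell before snakemake starts (assuming a bash-like shell with brace expansion), so Snakemake receives three ordinary file names. A Python rendering of that expansion, purely for illustration: expand_braces is a hypothetical helper that handles only one non-nested brace group.

```python
import re


def expand_braces(word):
    """Expand a single non-nested {a,b,c} group, like shell brace expansion."""
    m = re.search(r"\{([^{}]*,[^{}]*)\}", word)
    if m is None:
        return [word]
    head, tail = word[:m.start()], word[m.end():]
    return [head + item + tail for item in m.group(1).split(",")]


print(expand_braces("{1,2,3}_add_a.txt"))
# ['1_add_a.txt', '2_add_a.txt', '3_add_a.txt']
```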

    (snake_test) [dengfei@localhost ex4]$ snakemake -np {1,2,3}_add_a.txt
    Building DAG of jobs...
    Job counts:
        count   jobs
        3       adda
        3

    [Tue Apr  2 21:09:19 2019]
    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 2
        wildcards: file=3

    cat 3.txt |xargs echo add a >3_add_a.txt

    [Tue Apr  2 21:09:19 2019]
    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 0
        wildcards: file=2

    cat 2.txt |xargs echo add a >2_add_a.txt

    [Tue Apr  2 21:09:19 2019]
    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 1
        wildcards: file=1

    cat 1.txt |xargs echo add a >1_add_a.txt
    Job counts:
        count   jobs
        3       adda
        3
    This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Run the command:

    snakemake {1,2,3}_add_a.txt
    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       adda
        3

    [Tue Apr  2 21:11:09 2019]
    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 0
        wildcards: file=3

    [Tue Apr  2 21:11:09 2019]
    Finished job 0.
    1 of 3 steps (33%) done

    [Tue Apr  2 21:11:09 2019]
    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 1
        wildcards: file=1

    [Tue Apr  2 21:11:09 2019]
    Finished job 1.
    2 of 3 steps (67%) done

    [Tue Apr  2 21:11:09 2019]
    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 2
        wildcards: file=2

    [Tue Apr  2 21:11:09 2019]
    Finished job 2.
    3 of 3 steps (100%) done
    Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T211109.153566.snakemake.log

Check the *add_a.txt files:

    (snake_test) [dengfei@localhost ex4]$ ls *add_a.txt
    1_add_a.txt  2_add_a.txt  3_add_a.txt
    (snake_test) [dengfei@localhost ex4]$ cat *add_a.txt
    add a this is 1.txt
    add a this is 2.txt
    add a this is 3.txt

Done.
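Why does "add a" end up in front of the original line? In the shell command, xargs splits the file's text on whitespace and appends the words to "echo add a", and echo joins its arguments with single spaces. A rough Python equivalent for these small inputs follows; prepend_words is a made-up name, and the sketch ignores the special handling xargs gives to quotes and backslashes.

```python
def prepend_words(text, prefix):
    """Mimic `cat FILE | xargs echo PREFIX` for simple, short input."""
    words = prefix.split() + text.split()
    return " ".join(words) + "\n"


print(prepend_words("this is 1.txt\n", "add a"), end="")
# add a this is 1.txt
```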

3. Add "add b" to each file

The corresponding Snakefile is:

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

    rule addb:
        input: "{file}_add_a.txt"
        output: "{file}_add_a_add_b.txt"
        shell: "cat {input} | xargs echo add b >{output}"

Preview the commands with a dry run: snakemake -np {1,2,3}_add_a_add_b.txt

    (snake_test) [dengfei@localhost ex4]$ snakemake {1,2,3}_add_a_add_b.txt
    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       addb
        3

    [Tue Apr  2 21:13:57 2019]
    rule addb:
        input: 2_add_a.txt
        output: 2_add_a_add_b.txt
        jobid: 0
        wildcards: file=2

    [Tue Apr  2 21:13:57 2019]
    Finished job 0.
    1 of 3 steps (33%) done

    [Tue Apr  2 21:13:57 2019]
    rule addb:
        input: 1_add_a.txt
        output: 1_add_a_add_b.txt
        jobid: 1
        wildcards: file=1

    [Tue Apr  2 21:13:57 2019]
    Finished job 1.
    2 of 3 steps (67%) done

    [Tue Apr  2 21:13:57 2019]
    rule addb:
        input: 3_add_a.txt
        output: 3_add_a_add_b.txt
        jobid: 2
        wildcards: file=3

    [Tue Apr  2 21:13:57 2019]
    Finished job 2.
    3 of 3 steps (100%) done
    Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T211357.666661.snakemake.log

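Run the command (this is the invocation that produced the log above, which comes from a real run rather than a dry run):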

snakemake {1,2,3}_add_a_add_b.txt

View the flowchart

Command:

snakemake --dag {1,2,3}_add_a_add_b.txt |dot -Tpdf >a.pdf

The generated a.pdf looks like this:

4. Add "add c" to each file

Snakefile contents:

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

    rule addb:
        input: "{file}_add_a.txt"
        output: "{file}_add_a_add_b.txt"
        shell: "cat {input} | xargs echo add b >{output}"

    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

Flowchart:

Command:

snakemake --dag {1,2,3}_add_a_add_b_add_c.txt |dot -Tpdf >a1.pdf

5. Merge the files

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

    rule addb:
        input: "{file}_add_a.txt"
        output: "{file}_add_a_add_b.txt"
        shell: "cat {input} | xargs echo add b >{output}"

    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

    rule hebing:
        input:
            a=expand("{file}_add_a_add_b_add_c.txt", file=["1","2","3"]),
            b=expand("{file}_add_a_add_b.txt", file=["1","2"])
        output: "hebing.txt"
        shell: "cat {input.a} {input.b} >{output}"
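The hebing rule relies on expand() to turn one pattern plus a list of values into a list of file names. The sketch below is a pure-Python stand-in, not Snakemake's implementation (the real function has more options); expand_sketch is a hypothetical name. It shows which five inputs the rule ends up with.

```python
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill a pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


a = expand_sketch("{file}_add_a_add_b_add_c.txt", file=["1", "2", "3"])
b = expand_sketch("{file}_add_a_add_b.txt", file=["1", "2"])
print(a + b)
```

The shell command uses {input.a} before {input.b}, so the three add_c files come first in hebing.txt, followed by the two add_b files.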

Run the command:

snakemake hebing.txt

Output:

    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       addc
        1       hebing
        4

    [Tue Apr  2 21:21:04 2019]
    rule addc:
        input: 1_add_a_add_b.txt
        output: 1_add_a_add_b_add_c.txt
        jobid: 1
        wildcards: file=1

    [Tue Apr  2 21:21:04 2019]
    Finished job 1.
    1 of 4 steps (25%) done

    [Tue Apr  2 21:21:04 2019]
    rule addc:
        input: 3_add_a_add_b.txt
        output: 3_add_a_add_b_add_c.txt
        jobid: 3
        wildcards: file=3

    [Tue Apr  2 21:21:04 2019]
    Finished job 3.
    2 of 4 steps (50%) done

    [Tue Apr  2 21:21:04 2019]
    rule addc:
        input: 2_add_a_add_b.txt
        output: 2_add_a_add_b_add_c.txt
        jobid: 2
        wildcards: file=2

    [Tue Apr  2 21:21:04 2019]
    Finished job 2.
    3 of 4 steps (75%) done

    [Tue Apr  2 21:21:04 2019]
    rule hebing:
        input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
        output: hebing.txt
        jobid: 0

    [Tue Apr  2 21:21:04 2019]
    Finished job 0.
    4 of 4 steps (100%) done
    Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T212104.719887.snakemake.log
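Only 4 jobs run here because the add_a and add_b files already exist from the earlier steps, so Snakemake schedules just the three addc jobs and hebing. The toy planner below reproduces that count under a strong simplification: a job is needed only when its output file is missing, whereas real Snakemake also looks at things like file modification times. All names in it are illustrative, and the rules are hard-coded as (output suffix, input suffix) pairs.

```python
from collections import Counter

# (output suffix, input suffix) for the three wildcard rules in the Snakefile.
WILDCARD_RULES = {
    "adda": ("_add_a.txt", ".txt"),
    "addb": ("_add_a_add_b.txt", "_add_a.txt"),
    "addc": ("_add_a_add_b_add_c.txt", "_add_a_add_b.txt"),
}
HEBING_INPUTS = (
    [f"{n}_add_a_add_b_add_c.txt" for n in "123"]
    + [f"{n}_add_a_add_b.txt" for n in "12"]
)


def inputs_for(target):
    """Return (rule, inputs) for the rule that makes target, or None."""
    if target == "hebing.txt":
        return "hebing", HEBING_INPUTS
    for rule, (out_suffix, in_suffix) in WILDCARD_RULES.items():
        stem = target[: -len(out_suffix)]
        if target.endswith(out_suffix) and stem:
            return rule, [stem + in_suffix]
    return None


def count_jobs(target, existing):
    """Count jobs per rule needed to build target from the existing files."""
    done = set(existing)
    counts = Counter()

    def build(path):
        if path in done:
            return
        made_by = inputs_for(path)
        if made_by is None:
            raise FileNotFoundError(path)
        rule, inputs = made_by
        for dep in inputs:
            build(dep)
        counts[rule] += 1
        done.add(path)

    build(target)
    return dict(counts)


existing = {
    f"{n}{suffix}"
    for n in "123"
    for suffix in (".txt", "_add_a.txt", "_add_a_add_b.txt")
}
print(count_jobs("hebing.txt", existing))  # {'addc': 3, 'hebing': 1}
```

Starting from only 1.txt, 2.txt and 3.txt, the same function counts three jobs each for adda, addb and addc plus one hebing job.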

Flowchart:

Done.


Related reading

snakemake Study Notes 1
snakemake Study Notes 2

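Postscript 1

Today I tested rule all. It declares the final target files by listing them as its input. Without it, the targets have to be written on the command line.

Since the final output file is hebing.txt, we declare it in the Snakefile, with rule all placed first.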

    rule all:
        input: "hebing.txt"

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

    rule addb:
        input: "{file}_add_a.txt"
        output: "{file}_add_a_add_b.txt"
        shell: "cat {input} | xargs echo add b >{output}"

    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

    rule hebing:
        input:
            a=expand("{file}_add_a_add_b_add_c.txt", file=["1","2","3"]),
            b=expand("{file}_add_a_add_b.txt", file=["1","2"])
        output: "hebing.txt"
        shell: "cat {input.a} {input.b} >{output}"
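The reason a bare snakemake call works with this Snakefile is that, when no target is given on the command line, Snakemake falls back to the first rule in the file, here rule all. A tiny sketch of that selection logic; pick_targets is a hypothetical helper, not part of Snakemake.

```python
def pick_targets(rule_names, cli_targets):
    """Choose what to build: explicit targets win, else the first rule."""
    if cli_targets:
        return list(cli_targets)
    return [rule_names[0]]


rules = ["all", "adda", "addb", "addc", "hebing"]
print(pick_targets(rules, []))              # ['all']
print(pick_targets(rules, ["hebing.txt"]))  # ['hebing.txt']
```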

Run the command:

snakemake

The result:

    (base) [dengfei@localhost ex4]$ snakemake
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       adda
        3       addb
        3       addc
        1       all
        1       hebing
        11

    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 7
        wildcards: file=1

    Finished job 7.
    1 of 11 steps (9%) done

    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 9
        wildcards: file=2

    Finished job 9.
    2 of 11 steps (18%) done

    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 10
        wildcards: file=3

    Finished job 10.
    3 of 11 steps (27%) done

    rule addb:
        input: 3_add_a.txt
        output: 3_add_a_add_b.txt
        jobid: 8
        wildcards: file=3

    Finished job 8.
    4 of 11 steps (36%) done

    rule addb:
        input: 1_add_a.txt
        output: 1_add_a_add_b.txt
        jobid: 3
        wildcards: file=1

    Finished job 3.
    5 of 11 steps (45%) done

    rule addb:
        input: 2_add_a.txt
        output: 2_add_a_add_b.txt
        jobid: 6
        wildcards: file=2

    Finished job 6.
    6 of 11 steps (55%) done

    rule addc:
        input: 3_add_a_add_b.txt
        output: 3_add_a_add_b_add_c.txt
        jobid: 5
        wildcards: file=3

    Finished job 5.
    7 of 11 steps (64%) done

    rule addc:
        input: 2_add_a_add_b.txt
        output: 2_add_a_add_b_add_c.txt
        jobid: 2
        wildcards: file=2

    Finished job 2.
    8 of 11 steps (73%) done

    rule addc:
        input: 1_add_a_add_b.txt
        output: 1_add_a_add_b_add_c.txt
        jobid: 4
        wildcards: file=1

    Finished job 4.
    9 of 11 steps (82%) done

    rule hebing:
        input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
        output: hebing.txt
        jobid: 1

    Finished job 1.
    10 of 11 steps (91%) done

    localrule all:
        input: hebing.txt
        jobid: 0

    Finished job 0.
    11 of 11 steps (100%) done

Check the result:

    (base) [dengfei@localhost ex4]$ cat hebing.txt
    add c add b add a this is 1.txt
    add c add b add a this is 2.txt
    add c add b add a this is 3.txt
    add b add a this is 1.txt
    add b add a this is 2.txt
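As a cross-check of this output, the whole chain can be traced in memory without Snakemake. This is only a sketch: add_step stands in for the "cat {input} | xargs echo add X" shell commands, and the dictionaries stand in for the files on disk.

```python
def add_step(line, tag):
    """Stand-in for `cat {input} | xargs echo TAG`: put the tag words first."""
    return " ".join(tag.split() + line.split())


sources = {n: f"this is {n}.txt" for n in ("1", "2", "3")}
add_a = {n: add_step(text, "add a") for n, text in sources.items()}
add_b = {n: add_step(text, "add b") for n, text in add_a.items()}
add_c = {n: add_step(text, "add c") for n, text in add_b.items()}

# hebing: the three add_c results first, then the add_b results for 1 and 2.
hebing = [add_c[n] for n in ("1", "2", "3")] + [add_b[n] for n in ("1", "2")]
print("\n".join(hebing))
```

The five printed lines match the contents of hebing.txt shown above.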

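Postscript 2

By default Snakemake looks for a workflow file named Snakefile, but my editor does not syntax-highlight a file with that name. You can save the workflow as a.py instead and run it with snakemake -s a.py.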

    rule all:
        input: "hebing.txt"

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

    rule addb:
        input: "{file}_add_a.txt"
        output: "{file}_add_a_add_b.txt"
        shell: "cat {input} | xargs echo add b >{output}"

    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

    rule hebing:
        input:
            a=expand("{file}_add_a_add_b_add_c.txt", file=["1","2","3"]),
            b=expand("{file}_add_a_add_b.txt", file=["1","2"])
        output: "hebing.txt"
        shell: "cat {input.a} {input.b} >{output}"

Output:

    (base) [dengfei@localhost ex4]$ snakemake -s a.py
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       adda
        3       addb
        3       addc
        1       all
        1       hebing
        11

    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 8
        wildcards: file=1

    Finished job 8.
    1 of 11 steps (9%) done

    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 10
        wildcards: file=3

    Finished job 10.
    2 of 11 steps (18%) done

    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 9
        wildcards: file=2

    Finished job 9.
    3 of 11 steps (27%) done

    rule addb:
        input: 3_add_a.txt
        output: 3_add_a_add_b.txt
        jobid: 7
        wildcards: file=3

    Finished job 7.
    4 of 11 steps (36%) done

    rule addb:
        input: 2_add_a.txt
        output: 2_add_a_add_b.txt
        jobid: 4
        wildcards: file=2

    Finished job 4.
    5 of 11 steps (45%) done

    rule addb:
        input: 1_add_a.txt
        output: 1_add_a_add_b.txt
        jobid: 3
        wildcards: file=1

    Finished job 3.
    6 of 11 steps (55%) done

    rule addc:
        input: 3_add_a_add_b.txt
        output: 3_add_a_add_b_add_c.txt
        jobid: 2
        wildcards: file=3

    Finished job 2.
    7 of 11 steps (64%) done

    rule addc:
        input: 2_add_a_add_b.txt
        output: 2_add_a_add_b_add_c.txt
        jobid: 5
        wildcards: file=2

    Finished job 5.
    8 of 11 steps (73%) done

    rule addc:
        input: 1_add_a_add_b.txt
        output: 1_add_a_add_b_add_c.txt
        jobid: 6
        wildcards: file=1

    Finished job 6.
    9 of 11 steps (82%) done

    rule hebing:
        input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
        output: hebing.txt
        jobid: 1

    Finished job 1.
    10 of 11 steps (91%) done

    localrule all:
        input: hebing.txt
        jobid: 0

    Finished job 0.
    11 of 11 steps (100%) done
