[Original] snakemake Study Notes 3

育种数据分析 (Breeding Data Analysis), 2021-11-18


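This is a blog post I wrote a while ago, kept here as a record of my learning path.

Goal

This time, I want to implement the workflow described below.

Goal description

First: generate 1.txt, 2.txt, 3.txt
Second: add the string "add a" to each file, naming the results 1_add_a.txt, 2_add_a.txt, 3_add_a.txt
Third: add "add b" to each file, naming the results 1_add_a_add_b.txt, 2_add_a_add_b.txt, 3_add_a_add_b.txt
Fourth: add "add c" to each file, naming the results 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt
Fifth: merge 1_add_a_add_b.txt, 2_add_a_add_b.txt, 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt and 3_add_a_add_b_add_c.txt into hebing.txt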

1. Generate three files

    (snake_test) [dengfei@localhost ex4]$ ls *txt
    1.txt  2.txt  3.txt
    (snake_test) [dengfei@localhost ex4]$ cat *txt
    this is 1.txt
    this is 2.txt
    this is 3.txt
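The post does not show how the three input files were created. One possible way, sketched in Python, is given here; the helper name `make_inputs` and its parameters are assumptions for illustration, and only the file names and contents come from the listing above.

```python
from pathlib import Path

def make_inputs(directory=".", names=("1", "2", "3")):
    """Write <name>.txt containing 'this is <name>.txt' for each name."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    for name in names:
        path = out / f"{name}.txt"
        # One line per file, matching the `cat *txt` output shown above.
        path.write_text(f"this is {name}.txt\n")
        created.append(path)
    return created
```

Calling `make_inputs()` in the working directory (ex4 in the session above) would produce 1.txt, 2.txt and 3.txt.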

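2. Add "add a" to each file

The corresponding Snakefile is:

    # Prepend "add a" to the content of {file}.txt
    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

Preview the commands with a dry run:

    snakemake -np {1,2,3}_add_a.txt

Note: the target files {1,2,3}_add_a.txt must be written out on the command line for the command to run. The output of rule adda contains the wildcard {file}, so snakemake cannot tell which files to build unless concrete targets are given. The shell expands {1,2,3}_add_a.txt into the three file names before snakemake sees them.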
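To illustrate why a concrete target is needed, here is a rough sketch of how a requested file name can be matched against an output pattern to recover the wildcard value, which then determines the input file. This is a simplified model written for this note, not snakemake's actual implementation; `match_output` is a hypothetical helper, and it assumes each wildcard appears only once in the pattern.

```python
import re

def match_output(pattern, target):
    """Return the wildcard values if target fits pattern, else None."""
    # Split "{file}_add_a.txt" into ['', 'file', '_add_a.txt'];
    # odd positions hold wildcard names, even positions hold literal text.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

wildcards = match_output("{file}_add_a.txt", "3_add_a.txt")
print(wildcards)                          # {'file': '3'}
print("{file}.txt".format(**wildcards))   # 3.txt
```

With the target 3_add_a.txt the sketch recovers file=3 and therefore the input 3.txt, which corresponds to the "wildcards: file=3" line in the dry-run output.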

    (snake_test) [dengfei@localhost ex4]$ snakemake -np {1,2,3}_add_a.txt
    Building DAG of jobs...
    Job counts:
        count   jobs
        3       adda
        3

    [Tue Apr 2 21:09:19 2019]
    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 2
        wildcards: file=3

    cat 3.txt |xargs echo add a >3_add_a.txt

    [Tue Apr 2 21:09:19 2019]
    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 0
        wildcards: file=2

    cat 2.txt |xargs echo add a >2_add_a.txt

    [Tue Apr 2 21:09:19 2019]
    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 1
        wildcards: file=1

    cat 1.txt |xargs echo add a >1_add_a.txt
    Job counts:
        count   jobs
        3       adda
        3
    This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

Run the command:

    snakemake {1,2,3}_add_a.txt
    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       adda
        3

    [Tue Apr 2 21:11:09 2019]
    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 0
        wildcards: file=3

    [Tue Apr 2 21:11:09 2019]
    Finished job 0.
    1 of 3 steps (33%) done

    [Tue Apr 2 21:11:09 2019]
    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 1
        wildcards: file=1

    [Tue Apr 2 21:11:09 2019]
    Finished job 1.
    2 of 3 steps (67%) done

    [Tue Apr 2 21:11:09 2019]
    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 2
        wildcards: file=2

    [Tue Apr 2 21:11:09 2019]
    Finished job 2.
    3 of 3 steps (100%) done
    Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T211109.153566.snakemake.log

Check the *add_a.txt files:

    (snake_test) [dengfei@localhost ex4]$ ls *add_a.txt
    1_add_a.txt  2_add_a.txt  3_add_a.txt
    (snake_test) [dengfei@localhost ex4]$ cat *add_a.txt
    add a this is 1.txt
    add a this is 2.txt
    add a this is 3.txt

Done.
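The shell command in rule adda pipes the file content to xargs, which passes the whitespace-separated words as extra arguments to `echo add a`. A rough Python equivalent is sketched here to show what each step does to the text; `prepend_tag` is a hypothetical helper, and it ignores xargs details such as quote handling and splitting very long inputs across several invocations.

```python
def prepend_tag(text, tag):
    """Mimic `cat file | xargs echo <tag>` for short, plain text."""
    # xargs splits its input on whitespace; echo joins its arguments
    # with single spaces and ends the line with a newline.
    words = tag.split() + text.split()
    return " ".join(words) + "\n"

step_a = prepend_tag("this is 1.txt\n", "add a")
step_b = prepend_tag(step_a, "add b")
print(step_a, end="")   # add a this is 1.txt
print(step_b, end="")   # add b add a this is 1.txt
```

Because each later step reads the previous step's output, the tags accumulate from right to left.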

3. Add "add b" to each file

The corresponding Snakefile is:

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

    # Prepend "add b" to the output of rule adda
    rule addb:
        input: "{file}_add_a.txt"
        output: "{file}_add_a_add_b.txt"
        shell: "cat {input} | xargs echo add b >{output}"

Preview the commands:

    snakemake -np {1,2,3}_add_a_add_b.txt

    (snake_test) [dengfei@localhost ex4]$ snakemake {1,2,3}_add_a_add_b.txt
    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       addb
        3

    [Tue Apr 2 21:13:57 2019]
    rule addb:
        input: 2_add_a.txt
        output: 2_add_a_add_b.txt
        jobid: 0
        wildcards: file=2

    [Tue Apr 2 21:13:57 2019]
    Finished job 0.
    1 of 3 steps (33%) done

    [Tue Apr 2 21:13:57 2019]
    rule addb:
        input: 1_add_a.txt
        output: 1_add_a_add_b.txt
        jobid: 1
        wildcards: file=1

    [Tue Apr 2 21:13:57 2019]
    Finished job 1.
    2 of 3 steps (67%) done

    [Tue Apr 2 21:13:57 2019]
    rule addb:
        input: 3_add_a.txt
        output: 3_add_a_add_b.txt
        jobid: 2
        wildcards: file=3

    [Tue Apr 2 21:13:57 2019]
    Finished job 2.
    3 of 3 steps (100%) done
    Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T211357.666661.snakemake.log

Run the command:

    snakemake {1,2,3}_add_a_add_b.txt

Viewing the workflow graph

Command:

    snakemake --dag {1,2,3}_add_a_add_b.txt |dot -Tpdf >a.pdf

This writes the workflow graph to a.pdf.
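The --dag option prints the job graph as text in the Graphviz DOT language, which dot then renders to PDF. To give a feel for what such text looks like, here is a hand-written sketch that emits a small DOT graph for the adda to addb chain. The node labels and the helper `chain_to_dot` are assumptions made for illustration; the real output of snakemake --dag is formatted differently.

```python
def chain_to_dot(samples, rules):
    """Build DOT text with one rule chain per sample."""
    lines = ["digraph workflow {"]
    for sample in samples:
        # Connect each rule to the next one in the chain for this sample.
        for upstream, downstream in zip(rules, rules[1:]):
            lines.append(
                f'    "{upstream} (file={sample})" -> "{downstream} (file={sample})";'
            )
    lines.append("}")
    return "\n".join(lines)

dot_text = chain_to_dot(["1", "2", "3"], ["adda", "addb"])
print(dot_text)
```

Saving the printed text to a file and running it through `dot -Tpdf` would render three independent two-node chains.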

4. Add "add c" to each file

Snakemake rule:

    # Prepend "add c" to the output of rule addb
    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

Workflow graph:

Command:

    snakemake --dag {1,2,3}_add_a_add_b_add_c.txt |dot -Tpdf >a1.pdf

5. Merge the files

    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

    # Concatenate all three add_c files and two of the add_b files
    rule hebing:
        input:
            a=expand("{file}_add_a_add_b_add_c.txt",file=["1","2","3"]),
            b=expand("{file}_add_a_add_b.txt",file=["1","2"])
        output:"hebing.txt"
        shell:"cat {input.a} {input.b} >{output}"
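The expand() calls in rule hebing turn a pattern plus a list of values into a list of concrete file names. A minimal Python sketch of that behaviour is given here; `expand_sketch` is a stand-in written for this note, not snakemake's own function, and it only covers the simple keyword form used above.

```python
from itertools import product

def expand_sketch(pattern, **values):
    """Fill the pattern with every combination of the given wildcard values."""
    names = list(values)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(values[name] for name in names))
    ]

a = expand_sketch("{file}_add_a_add_b_add_c.txt", file=["1", "2", "3"])
b = expand_sketch("{file}_add_a_add_b.txt", file=["1", "2"])
print(a)  # ['1_add_a_add_b_add_c.txt', '2_add_a_add_b_add_c.txt', '3_add_a_add_b_add_c.txt']
print(b)  # ['1_add_a_add_b.txt', '2_add_a_add_b.txt']
```

The five names in a and b are the five inputs of rule hebing, in the order they are passed to cat.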

Run the command:

    snakemake hebing.txt

Output:

    Building DAG of jobs...
    Using shell: /usr/bin/bash
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       addc
        1       hebing
        4

    [Tue Apr 2 21:21:04 2019]
    rule addc:
        input: 1_add_a_add_b.txt
        output: 1_add_a_add_b_add_c.txt
        jobid: 1
        wildcards: file=1

    [Tue Apr 2 21:21:04 2019]
    Finished job 1.
    1 of 4 steps (25%) done

    [Tue Apr 2 21:21:04 2019]
    rule addc:
        input: 3_add_a_add_b.txt
        output: 3_add_a_add_b_add_c.txt
        jobid: 3
        wildcards: file=3

    [Tue Apr 2 21:21:04 2019]
    Finished job 3.
    2 of 4 steps (50%) done

    [Tue Apr 2 21:21:04 2019]
    rule addc:
        input: 2_add_a_add_b.txt
        output: 2_add_a_add_b_add_c.txt
        jobid: 2
        wildcards: file=2

    [Tue Apr 2 21:21:04 2019]
    Finished job 2.
    3 of 4 steps (75%) done

    [Tue Apr 2 21:21:04 2019]
    rule hebing:
        input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
        output: hebing.txt
        jobid: 0

    [Tue Apr 2 21:21:04 2019]
    Finished job 0.
    4 of 4 steps (100%) done
    Complete log: /home/dengfei/test/snakemake/ex4/.snakemake/log/2019-04-02T212104.719887.snakemake.log

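[Figure: workflow graph]

Done.

You are welcome to follow my WeChat official account: R-breeding

Postscript 1

Today I tested rule all. It declares the final output files of the workflow; without it, the targets have to be written on the command line. Because it is the first rule in the Snakefile, snakemake uses it as the default target.

Since the final output file is hebing.txt, we declare it as the output in the Snakefile.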

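    # First rule: names the final target, so a bare `snakemake` builds hebing.txt
    rule all:
        input:"hebing.txt"

    rule adda:
        input: "{file}.txt"
        output: "{file}_add_a.txt"
        shell: "cat {input} |xargs echo add a >{output}"

    rule addb:
        input: "{file}_add_a.txt"
        output: "{file}_add_a_add_b.txt"
        shell: "cat {input} | xargs echo add b >{output}"

    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

    # Same hebing rule as in section 5; the run log shows it being executed
    rule hebing:
        input:
            a=expand("{file}_add_a_add_b_add_c.txt",file=["1","2","3"]),
            b=expand("{file}_add_a_add_b.txt",file=["1","2"])
        output:"hebing.txt"
        shell:"cat {input.a} {input.b} >{output}"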

Run the command:

    snakemake

The result is:

    (base) [dengfei@localhost ex4]$ snakemake
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       adda
        3       addb
        3       addc
        1       all
        1       hebing
        11

    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 7
        wildcards: file=1

    Finished job 7.
    1 of 11 steps (9%) done

    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 9
        wildcards: file=2

    Finished job 9.
    2 of 11 steps (18%) done

    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 10
        wildcards: file=3

    Finished job 10.
    3 of 11 steps (27%) done

    rule addb:
        input: 3_add_a.txt
        output: 3_add_a_add_b.txt
        jobid: 8
        wildcards: file=3

    Finished job 8.
    4 of 11 steps (36%) done

    rule addb:
        input: 1_add_a.txt
        output: 1_add_a_add_b.txt
        jobid: 3
        wildcards: file=1

    Finished job 3.
    5 of 11 steps (45%) done

    rule addb:
        input: 2_add_a.txt
        output: 2_add_a_add_b.txt
        jobid: 6
        wildcards: file=2

    Finished job 6.
    6 of 11 steps (55%) done

    rule addc:
        input: 3_add_a_add_b.txt
        output: 3_add_a_add_b_add_c.txt
        jobid: 5
        wildcards: file=3

    Finished job 5.
    7 of 11 steps (64%) done

    rule addc:
        input: 2_add_a_add_b.txt
        output: 2_add_a_add_b_add_c.txt
        jobid: 2
        wildcards: file=2

    Finished job 2.
    8 of 11 steps (73%) done

    rule addc:
        input: 1_add_a_add_b.txt
        output: 1_add_a_add_b_add_c.txt
        jobid: 4
        wildcards: file=1

    Finished job 4.
    9 of 11 steps (82%) done

    rule hebing:
        input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
        output: hebing.txt
        jobid: 1

    Finished job 1.
    10 of 11 steps (91%) done

    localrule all:
        input: hebing.txt
        jobid: 0

    Finished job 0.
    11 of 11 steps (100%) done

Check the result:

    (base) [dengfei@localhost ex4]$ cat hebing.txt
    add c add b add a this is 1.txt
    add c add b add a this is 2.txt
    add c add b add a this is 3.txt
    add b add a this is 1.txt
    add b add a this is 2.txt
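The job counts in the run above (3 adda, 3 addb, 3 addc, 1 hebing, 1 all, 11 in total) follow from working backwards from hebing.txt through the rules. The sketch here reproduces that count with a simplified model: the rule table and the `plan` function are written for this note, they assume a fresh directory where only 1.txt, 2.txt and 3.txt exist, and they ignore the timestamp checks that real snakemake performs.

```python
import re
from collections import Counter

# Simplified model of the Snakefile: (rule name, input pattern, output pattern).
CHAIN_RULES = [
    ("adda", "{file}.txt", "{file}_add_a.txt"),
    ("addb", "{file}_add_a.txt", "{file}_add_a_add_b.txt"),
    ("addc", "{file}_add_a_add_b.txt", "{file}_add_a_add_b_add_c.txt"),
]
HEBING_INPUTS = (
    [f"{n}_add_a_add_b_add_c.txt" for n in ("1", "2", "3")]
    + [f"{n}_add_a_add_b.txt" for n in ("1", "2")]
)

def plan(target, jobs):
    """Record every job needed to build target, walking back through the rules."""
    if target == "hebing.txt":
        if ("hebing", "") not in jobs:
            jobs.add(("hebing", ""))
            for dep in HEBING_INPUTS:
                plan(dep, jobs)
        return
    for name, inp, out in CHAIN_RULES:
        regex = re.escape(out).replace(re.escape("{file}"), "(.+)")
        m = re.fullmatch(regex, target)
        if m:
            job = (name, m.group(1))
            if job not in jobs:          # each job is scheduled only once
                jobs.add(job)
                plan(inp.format(file=m.group(1)), jobs)
            return
    # No rule produces this file, so it is treated as an existing source file.

jobs = {("all", "")}                     # rule all asks for hebing.txt
plan("hebing.txt", jobs)
counts = Counter(name for name, _ in jobs)
print(sorted(counts.items()), sum(counts.values()))
# [('adda', 3), ('addb', 3), ('addc', 3), ('all', 1), ('hebing', 1)] 11
```

The add_b files for samples 1 and 2 are requested twice (once by addc, once by hebing), but the set keeps each job only once, which is why addb still counts 3.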

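Postscript 2

By default, snakemake looks for a workflow file named Snakefile, but editors do not apply syntax highlighting to a file with that name. You can instead name the file a.py and run it with snakemake -s a.py.

    rule addc:
        input: "{file}_add_a_add_b.txt"
        output: "{file}_add_a_add_b_add_c.txt"
        shell: "cat {input} | xargs echo add c >{output}"

Output: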

    (base) [dengfei@localhost ex4]$ snakemake -s a.py
    Provided cores: 1
    Rules claiming more threads will be scaled down.
    Job counts:
        count   jobs
        3       adda
        3       addb
        3       addc
        1       all
        1       hebing
        11

    rule adda:
        input: 1.txt
        output: 1_add_a.txt
        jobid: 8
        wildcards: file=1

    Finished job 8.
    1 of 11 steps (9%) done

    rule adda:
        input: 3.txt
        output: 3_add_a.txt
        jobid: 10
        wildcards: file=3

    Finished job 10.
    2 of 11 steps (18%) done

    rule adda:
        input: 2.txt
        output: 2_add_a.txt
        jobid: 9
        wildcards: file=2

    Finished job 9.
    3 of 11 steps (27%) done

    rule addb:
        input: 3_add_a.txt
        output: 3_add_a_add_b.txt
        jobid: 7
        wildcards: file=3

    Finished job 7.
    4 of 11 steps (36%) done

    rule addb:
        input: 2_add_a.txt
        output: 2_add_a_add_b.txt
        jobid: 4
        wildcards: file=2

    Finished job 4.
    5 of 11 steps (45%) done

    rule addb:
        input: 1_add_a.txt
        output: 1_add_a_add_b.txt
        jobid: 3
        wildcards: file=1

    Finished job 3.
    6 of 11 steps (55%) done

    rule addc:
        input: 3_add_a_add_b.txt
        output: 3_add_a_add_b_add_c.txt
        jobid: 2
        wildcards: file=3

    Finished job 2.
    7 of 11 steps (64%) done

    rule addc:
        input: 2_add_a_add_b.txt
        output: 2_add_a_add_b_add_c.txt
        jobid: 5
        wildcards: file=2

    Finished job 5.
    8 of 11 steps (73%) done

    rule addc:
        input: 1_add_a_add_b.txt
        output: 1_add_a_add_b_add_c.txt
        jobid: 6
        wildcards: file=1

    Finished job 6.
    9 of 11 steps (82%) done

    rule hebing:
        input: 1_add_a_add_b_add_c.txt, 2_add_a_add_b_add_c.txt, 3_add_a_add_b_add_c.txt, 1_add_a_add_b.txt, 2_add_a_add_b.txt
        output: hebing.txt
        jobid: 1

    Finished job 1.
    10 of 11 steps (91%) done

    localrule all:
        input: hebing.txt
        jobid: 0

    Finished job 0.
    11 of 11 steps (100%) done
