
QIIME 2 Tutorial 19: Utilities (2024.2)

Metagenome (宏基因组), published 2024-04-25 in Beijing

Utilities in QIIME 2

https://docs./2024.2/tutorials/utilities/

https://www.bilibili.com/video/BV1Sm421s7xh/

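QIIME 2 provides many utilities that are not plugin-based, and this document demonstrates a good number of them. It is organized by interface and cross-references similar functionality available in other interfaces where possible.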

conda activate qiime2-amplicon-2024.2

Command line: q2cli

Most of the interesting utilities can be found in the tools subcommand of q2cli:

qiime tools --help

This prints the following:

Usage: qiime tools [OPTIONS] COMMAND [ARGS]...

Tools for working with QIIME 2 files.

Options:
--help  Show this message and exit.

Commands:
cache-create              Create an empty cache at the given location.
cache-fetch               Fetches an artifact out of a cache into a .qza.
cache-garbage-collection  Runs garbage collection on the cache at the specified location.
cache-remove              Removes a given key from a cache.
cache-import              Imports data into an Artifact in the cache under a key.
cache-status              Checks the status of the cache.
cache-store               Stores a .qza in the cache under a key.
cast-metadata             Designate metadata column types.
citations                 Print citations for a QIIME 2 result.
export                    Export data from a QIIME 2 Artifact or a Visualization.
extract                   Extract a QIIME 2 Artifact or Visualization archive.
import                    Import data into a new QIIME 2 Artifact.
inspect-metadata          Inspect columns available in metadata.
list-formats              List the available formats.
list-types                List the available semantic types.
peek                      Take a peek at a QIIME 2 Artifact or Visualization.
validate                  Validate data in a QIIME 2 Artifact.
view                      View a QIIME 2 Visualization.

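Let's get our hands on some data so that we can learn more about this functionality. First, we will look at the taxonomic bar plot from the PD Mice tutorial: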

mkdir -p utilites && cd utilites
wget -c https://data./2024.2/tutorials/utilities/taxa-barplot.qzv

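Retrieving Citations

Now that we have some results, let's learn more about the citations relevant to the creation of this visualization. First, we can check the help text for the qiime tools citations command: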

qiime tools citations --help

Output:

Usage: qiime tools citations [OPTIONS] ARTIFACT/VISUALIZATION

Print citations as a BibTex file (.bib) for a QIIME 2 result.

Options:
--help Show this message and exit.

Now that we know how to use the command, we will run the following:

qiime tools citations taxa-barplot.qzv

The output is as follows:

@article{framework|qiime2:2019.10.0|0,
author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R. and Bokulich, Nicholas A. and Abnet, Christian C. and Al-Ghalith, Gabriel A. and Alexander, Harriet and Alm, Eric J. and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E. and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J. and Brown, C. Titus and Callahan, Benjamin J. and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily K. and Da Silva, Ricardo and Diener, Christian and Dorrestein, Pieter C. and Douglas, Gavin M. and Durall, Daniel M. and Duvallet, Claire and Edwardson, Christian F. and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M. and Gibbons, Sean M. and Gibson, Deanna L. and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin A. and Janssen, Stefan and Jarmusch, Alan K. and Jiang, Lingjing and Kaehler, Benjamin D. and Kang, Kyo Bin and Keefe, Christopher R. and Keim, Paul and Kelley, Scott T. and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan G. I. and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan D. and McDonald, Daniel and McIver, Lauren J. and Melnik, Alexey V. and Metcalf, Jessica L. and Morgan, Sydney C. and Morton, Jamie T. and Naimey, Ahmad Turan and Navas-Molina, Jose A. and Nothias, Louis Felix and Orchanian, Stephanie B. and Pearson, Talima and Peoples, Samuel L. and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, Michael S. and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R. and Swafford, Austin D. and Thompson, Luke R. and Torres, Pedro J. 
and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J. and Ul-Hasan, Sabah and van der Hooft, Justin J. J. and Vargas, Fernando and Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C. and Williamson, Charles H. D. and Willis, Amy D. and Xu, Zhenjiang Zech and Zaneveld, Jesse R. and Zhang, Yilong and Zhu, Qiyun and Knight, Rob and Caporaso, J. Gregory},
doi = {10.1038/s41587-019-0209-9},
issn = {1546-1696},
journal = {Nature Biotechnology},
number = {8},
pages = {852-857},
title = {Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2},
url = {https:///10.1038/s41587-019-0209-9},
volume = {37},
year = {2019}
}

@article{view|types:2019.10.0|BIOMV210DirFmt|0,
author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
doi = {10.1186/2047-217X-1-7},
journal = {GigaScience},
number = {1},
pages = {7},
publisher = {BioMed Central},
title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
volume = {1},
year = {2012}
}

@inproceedings{view|types:2019.10.0|pandas.core.frame:DataFrame|0,
author = { Wes McKinney },
booktitle = { Proceedings of the 9th Python in Science Conference },
editor = { Stéfan van der Walt and Jarrod Millman },
pages = { 51 -- 56 },
title = { Data Structures for Statistical Computing in Python },
year = { 2010 }
}

@inproceedings{view|types:2019.10.0|pandas.core.series:Series|0,
author = { Wes McKinney },
booktitle = { Proceedings of the 9th Python in Science Conference },
editor = { Stéfan van der Walt and Jarrod Millman },
pages = { 51 -- 56 },
title = { Data Structures for Statistical Computing in Python },
year = { 2010 }
}

@article{view|types:2019.10.0|biom.table:Table|0,
author = {McDonald, Daniel and Clemente, Jose C and Kuczynski, Justin and Rideout, Jai Ram and Stombaugh, Jesse and Wendel, Doug and Wilke, Andreas and Huse, Susan and Hufnagle, John and Meyer, Folker and Knight, Rob and Caporaso, J Gregory},
doi = {10.1186/2047-217X-1-7},
journal = {GigaScience},
number = {1},
pages = {7},
publisher = {BioMed Central},
title = {The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome},
volume = {1},
year = {2012}
}

@article{framework|qiime2:2019.4.0|0,
author = {Bolyen, Evan and Rideout, Jai Ram and Dillon, Matthew R and Bokulich, Nicholas A and Abnet, Christian and Al-Ghalith, Gabriel A and Alexander, Harriet and Alm, Eric J and Arumugam, Manimozhiyan and Asnicar, Francesco and Bai, Yang and Bisanz, Jordan E and Bittinger, Kyle and Brejnrod, Asker and Brislawn, Colin J and Brown, C Titus and Callahan, Benjamin J and Caraballo-Rodríguez, Andrés Mauricio and Chase, John and Cope, Emily and Da Silva, Ricardo and Dorrestein, Pieter C and Douglas, Gavin M and Durall, Daniel M and Duvallet, Claire and Edwardson, Christian F and Ernst, Madeleine and Estaki, Mehrbod and Fouquier, Jennifer and Gauglitz, Julia M and Gibson, Deanna L and Gonzalez, Antonio and Gorlick, Kestrel and Guo, Jiarong and Hillmann, Benjamin and Holmes, Susan and Holste, Hannes and Huttenhower, Curtis and Huttley, Gavin and Janssen, Stefan and Jarmusch, Alan K and Jiang, Lingjing and Kaehler, Benjamin and Kang, Kyo Bin and Keefe, Christopher R and Keim, Paul and Kelley, Scott T and Knights, Dan and Koester, Irina and Kosciolek, Tomasz and Kreps, Jorden and Langille, Morgan GI and Lee, Joslynn and Ley, Ruth and Liu, Yong-Xin and Loftfield, Erikka and Lozupone, Catherine and Maher, Massoud and Marotz, Clarisse and Martin, Bryan and McDonald, Daniel and McIver, Lauren J and Melnik, Alexey V and Metcalf, Jessica L and Morgan, Sydney C and Morton, Jamie and Naimey, Ahmad Turan and Navas-Molina, Jose A and Nothias, Louis Felix and Orchanian, Stephanie B and Pearson, Talima and Peoples, Samuel L and Petras, Daniel and Preuss, Mary Lai and Pruesse, Elmar and Rasmussen, Lasse Buur and Rivers, Adam and Robeson, II, Michael S and Rosenthal, Patrick and Segata, Nicola and Shaffer, Michael and Shiffer, Arron and Sinha, Rashmi and Song, Se Jin and Spear, John R and Swafford, Austin D and Thompson, Luke R and Torres, Pedro J and Trinh, Pauline and Tripathi, Anupriya and Turnbaugh, Peter J and Ul-Hasan, Sabah and van der Hooft, Justin JJ and Vargas, Fernando and 
Vázquez-Baeza, Yoshiki and Vogtmann, Emily and von Hippel, Max and Walters, William and Wan, Yunhu and Wang, Mingxun and Warren, Jonathan and Weber, Kyle C and Williamson, Chase HD and Willis, Amy D and Xu, Zhenjiang Zech and Zaneveld, Jesse R and Zhang, Yilong and Knight, Rob and Caporaso, J Gregory},
doi = {10.7287/peerj.preprints.27295v1},
issn = {2167-9843},
journal = {PeerJ Preprints},
month = {oct},
pages = {e27295v1},
title = {QIIME 2: Reproducible, interactive, scalable, and extensible microbiome data science},
url = {https:///10.7287/peerj.preprints.27295v1},
volume = {6},
year = {2018}
}

@article{action|feature-classifier:2019.4.0|method:fit_classifier_naive_bayes|0,
author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
journal = {Journal of machine learning research},
number = {Oct},
pages = {2825--2830},
title = {Scikit-learn: Machine learning in Python},
volume = {12},
year = {2011}
}

@inproceedings{view|types:2019.4.1|pandas.core.series:Series|0,
author = { Wes McKinney },
booktitle = { Proceedings of the 9th Python in Science Conference },
editor = { Stéfan van der Walt and Jarrod Millman },
pages = { 51 -- 56 },
title = { Data Structures for Statistical Computing in Python },
year = { 2010 }
}

@article{plugin|feature-classifier:2019.4.0|0,
author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
doi = {10.1186/s40168-018-0470-z},
journal = {Microbiome},
number = {1},
pages = {90},
title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
url = {https:///10.1186/s40168-018-0470-z},
volume = {6},
year = {2018}
}

@article{plugin|dada2:2019.10.0|0,
author = {Callahan, Benjamin J and McMurdie, Paul J and Rosen, Michael J and Han, Andrew W and Johnson, Amy Jo A and Holmes, Susan P},
doi = {10.1038/nmeth.3869},
journal = {Nature methods},
number = {7},
pages = {581},
publisher = {Nature Publishing Group},
title = {DADA2: high-resolution sample inference from Illumina amplicon data},
volume = {13},
year = {2016}
}

@article{action|feature-classifier:2019.10.0|method:classify_sklearn|0,
author = {Pedregosa, Fabian and Varoquaux, Gaël and Gramfort, Alexandre and Michel, Vincent and Thirion, Bertrand and Grisel, Olivier and Blondel, Mathieu and Prettenhofer, Peter and Weiss, Ron and Dubourg, Vincent and Vanderplas, Jake and Passos, Alexandre and Cournapeau, David and Brucher, Matthieu and Perrot, Matthieu and Duchesnay, Édouard},
journal = {Journal of machine learning research},
number = {Oct},
pages = {2825--2830},
title = {Scikit-learn: Machine learning in Python},
volume = {12},
year = {2011}
}

@article{plugin|feature-classifier:2019.10.0|0,
author = {Bokulich, Nicholas A. and Kaehler, Benjamin D. and Rideout, Jai Ram and Dillon, Matthew and Bolyen, Evan and Knight, Rob and Huttley, Gavin A. and Caporaso, J. Gregory},
doi = {10.1186/s40168-018-0470-z},
journal = {Microbiome},
number = {1},
pages = {90},
title = {Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin},
url = {https:///10.1186/s40168-018-0470-z},
volume = {6},
year = {2018}
}

As you can see, the citations for this particular visualization are presented above in BibTeX format.
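The output above repeats some papers under several provenance keys (the BIOM paper, for example, appears once per view). The following is a minimal sketch, not part of QIIME 2, that groups entries by DOI so each paper is counted once. The helper name unique_citations is hypothetical, the regular expressions assume the simple one-field-per-line layout shown above, and entries without a DOI fall back to their title.

```python
import re

def unique_citations(bibtex):
    """Map each DOI (or title, when there is no DOI) to the entry keys that cite it."""
    groups = {}
    # An entry starts with "@type{key," and ends with a "}" at the start of a line.
    for match in re.finditer(r"^@(\w+)\{([^,]+),(.*?)^\}", bibtex, re.S | re.M):
        key, body = match.group(2), match.group(3)
        doi = re.search(r"^\s*doi\s*=\s*\{(.*?)\}", body, re.M)
        title = re.search(r"^\s*title\s*=\s*\{(.*?)\}", body, re.M)
        if doi:
            ident = doi.group(1).strip()
        elif title:
            ident = title.group(1).strip()
        else:
            ident = key
        groups.setdefault(ident, []).append(key)
    return groups

# Abbreviated entries copied from the output above (most fields omitted).
sample = """@article{plugin|dada2:2019.10.0|0,
doi = {10.1038/nmeth.3869},
year = {2016}
}

@article{framework|qiime2:2019.10.0|0,
doi = {10.1038/s41587-019-0209-9},
year = {2019}
}

@article{view|types:2019.10.0|BIOMV210DirFmt|0,
doi = {10.1186/2047-217X-1-7},
year = {2012}
}

@article{view|types:2019.10.0|biom.table:Table|0,
doi = {10.1186/2047-217X-1-7},
year = {2012}
}
"""

groups = unique_citations(sample)
for ident, keys in groups.items():
    print(ident, "cited by", len(keys), "provenance key(s)")
```

In practice the input would be the text printed by qiime tools citations, for instance after redirecting it to a .bib file.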

We can also see the citations for a specific plugin:

qiime vsearch --citations

This prints:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
author = {Rognes, Torbjørn and Flouri, Tomáš and Nichols, Ben and Quince, Christopher and Mahé, Frédéric},
doi = {10.7717/peerj.2584},
journal = {PeerJ},
pages = {e2584},
publisher = {PeerJ Inc.},
title = {VSEARCH: a versatile open source tool for metagenomics},
volume = {4},
year = {2016}
}

And also for a specific action of a plugin:

qiime vsearch cluster-features-open-reference --citations

This prints:

% use `qiime tools citations` on a QIIME 2 result for complete list

@article{key0,
author = {Rideout, Jai Ram and He, Yan and Navas-Molina, Jose A. and Walters, William A. and Ursell, Luke K. and Gibbons, Sean M. and Chase, John and McDonald, Daniel and Gonzalez, Antonio and Robbins-Pianka, Adam and Clemente, Jose C. and Gilbert, Jack A. and Huse, Susan M. and Zhou, Hong-Wei and Knight, Rob and Caporaso, J. Gregory},
doi = {10.7717/peerj.545},
journal = {PeerJ},
pages = {e545},
publisher = {PeerJ Inc.},
title = {Subsampled open-reference clustering creates consistent, comprehensive OTU definitions and scales to billions of sequences},
volume = {2},
year = {2014}
}

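Viewing Visualizations

What if we want to look at the taxonomic bar plot itself? One option is to load the visualization file at https://view. Another option is to use qiime tools view to accomplish the job.

Note: provenance can only be viewed at https://view.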

qiime tools view taxa-barplot.qzv

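This step requires a graphical environment, such as a Linux or Mac desktop session. Windows users can connect to a Linux remote desktop (see "Windows 10 remote desktop to Ubuntu"), or configure a terminal with X11 forwarding (for example XShell + Xmanager or PuTTY + Xming; this is not recommended because it responds very slowly).

The command opens a browser window containing your visualization. When you are done, close the browser window and press ctrl-c on the keyboard to terminate the command.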

Peeking at Results

Often we need to verify the type and UUID of an artifact. The qiime tools peek command prints a brief summary report of these facts. First, let's get some data to look at:

Download the file:

wget -c https://data./2024.2/tutorials/utilities/faith-pd-vector.qza

Now that we have the data, we can learn more about the file:

qiime tools peek faith-pd-vector.qza

The result is:

UUID:        d5186dce-438d-44bb-903c-cb51a7ad4abe
Type: SampleData[AlphaDiversity] % Properties('phylogenetic')
Data format: AlphaDiversityDirectoryFormat

Here we can see that the type of the artifact is SampleData[AlphaDiversity] % Properties('phylogenetic'), along with the artifact's UUID and data format.
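For readers curious where this summary comes from: a .qza or .qzv file is a zip archive, and to the best of our understanding the UUID, type and format are recorded in a metadata.yaml file inside a top-level directory named after the UUID. The sketch below reads those fields with the Python standard library only. It builds a tiny stand-in archive in memory so that it runs without QIIME 2; peek_archive is a hypothetical helper, and the parsing is a naive "key: value" split rather than a real YAML parser.

```python
import io
import zipfile

def peek_archive(source):
    """Return the fields of <root>/metadata.yaml from a QIIME 2 style zip archive."""
    with zipfile.ZipFile(source) as archive:
        # Assumed layout: every member lives under one "<uuid>/" directory.
        root = archive.namelist()[0].split("/")[0]
        text = archive.read(root + "/metadata.yaml").decode("utf-8")
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

# Stand-in archive that mimics the assumed layout (not a real artifact).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as fake:
    fake.writestr(
        "d5186dce-438d-44bb-903c-cb51a7ad4abe/metadata.yaml",
        "uuid: d5186dce-438d-44bb-903c-cb51a7ad4abe\n"
        "type: SampleData[AlphaDiversity] % Properties('phylogenetic')\n"
        "format: AlphaDiversityDirectoryFormat\n",
    )
buffer.seek(0)

info = peek_archive(buffer)
for name in ("uuid", "type", "format"):
    print(name + ":", info[name])
```

For real files, qiime tools peek remains the supported route; passing a path such as faith-pd-vector.qza to this helper only works if the archive follows the layout assumed here.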

Validating Results

We can also check the integrity of the file by running qiime tools validate:

qiime tools validate faith-pd-vector.qza

This prints:

Result faith-pd-vector.qza appears to be valid at level=max.

If there is a problem with the file, this command will usually do a good job of reporting what the problem is (within reason).
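One ingredient of such an integrity check can be illustrated with checksums. The sketch below assumes, as we understand recent archive versions to work, that the archive root holds a checksums.md5 file in md5sum format (one "hash  relative/path" line per file). It recomputes each hash and lists the files that differ. This is only an illustration on a stand-in archive: changed_files and build_fake are hypothetical names, and the real command additionally validates the data against its format.

```python
import hashlib
import io
import zipfile

def changed_files(source):
    """List archive members whose MD5 differs from the one recorded in checksums.md5."""
    mismatches = []
    with zipfile.ZipFile(source) as archive:
        root = archive.namelist()[0].split("/")[0]
        manifest = archive.read(root + "/checksums.md5").decode("utf-8")
        for line in manifest.splitlines():
            if not line.strip():
                continue
            expected, relpath = line.split(None, 1)
            actual = hashlib.md5(archive.read(root + "/" + relpath)).hexdigest()
            if actual != expected:
                mismatches.append(relpath)
    return mismatches

def build_fake(payload, recorded):
    """Stand-in archive: stores `payload` but records the checksum of `recorded`."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as fake:
        fake.writestr("abc/data/alpha-diversity.tsv", payload)
        digest = hashlib.md5(recorded).hexdigest()
        fake.writestr("abc/checksums.md5", digest + "  data/alpha-diversity.tsv\n")
    buffer.seek(0)
    return buffer

intact = build_fake(b"id\tfaith_pd\n", b"id\tfaith_pd\n")
tampered = build_fake(b"id\tedited\n", b"id\tfaith_pd\n")

print("intact:", changed_files(intact))
print("tampered:", changed_files(tampered))
```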

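Inspecting Metadata

In the Metadata tutorial we learned about the metadata tabulate command and the visualization it creates. Often, though, we care less about the values in the metadata than about its shape: how many columns are there, what are their names, what are their types, and how many rows (or IDs) does the file contain?

We can demonstrate this by first downloading some example metadata: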

http://www.ience/github/QIIME2ChineseManual/2024.2/utilites/sample-metadata.tsv

Then run the qiime tools inspect-metadata command:

qiime tools inspect-metadata sample-metadata.tsv

This prints:

              COLUMN NAME  TYPE
=========================  ===========
                  barcode  categorical
                 mouse_id  categorical
                 genotype  categorical
                  cage_id  categorical
                    donor  categorical
             donor_status  categorical
     days_post_transplant  numeric
genotype_and_donor_status  categorical
=========================  ===========
                     IDS:  48
                 COLUMNS:  8

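Question: How many metadata columns are there in sample-metadata.tsv? How many IDs? Identify how many categorical columns are present.

This tool is very helpful for learning the metadata column names of a file that is viewable as metadata.

Translator's note: the output tells us the number of rows and columns (48 rows/IDS means 48 samples; 8 columns/COLUMNS means 8 sample attributes), and whether each column is categorical or numeric.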
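The kind of summary shown above can be approximated in a few lines of Python. This is a simplified sketch, not the logic QIIME 2 itself uses: it honours a "#q2:types" row when one is present (such a row appears in the cast-metadata output of this tutorial), otherwise it calls a column numeric when every non-empty value parses as a float. The function name inspect_columns and the two-row sample are illustrative only, and edge cases such as entirely empty columns are not handled the way QIIME 2 handles them.

```python
import csv
import io

def _is_number(text):
    try:
        float(text)
    except ValueError:
        return False
    return True

def inspect_columns(tsv_text):
    """Return (number_of_ids, {column_name: 'numeric' or 'categorical'})."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    header = rows[0]
    declared, body = {}, []
    for row in rows[1:]:
        if row[0] == "#q2:types":
            # Explicit declarations win; empty cells fall back to inference.
            declared = {name: kind for name, kind in zip(header[1:], row[1:]) if kind}
        elif not row[0].startswith("#"):
            body.append(row)
    types = {}
    for index, name in enumerate(header[1:], start=1):
        values = [row[index] for row in body if index < len(row) and row[index]]
        numeric = bool(values) and all(_is_number(value) for value in values)
        types[name] = declared.get(name, "numeric" if numeric else "categorical")
    return len(body), types

sample = (
    "sample_name\tbarcode\tmouse_id\tdays_post_transplant\n"
    "#q2:types\tcategorical\tcategorical\t\n"
    "recip.220.WT.OB1.D7\tCCTCCGTCATGG\t457\t49\n"
    "recip.290.ASO.OB2.D1\tAACAGTAAACAA\t456\t49\n"
)

id_count, column_types = inspect_columns(sample)
print("IDS:", id_count)
print("COLUMNS:", len(column_types))
for name, kind in column_types.items():
    print(name, kind)
```

Note how mouse_id stays categorical even though its values look like numbers, because the declaration row takes precedence over inference in this sketch.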

wget -c https://data./2024.2/tutorials/utilities/jaccard-pcoa.qza

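The file we just downloaded is a Jaccard PCoA (from the PD Mice tutorial), which can be used in place of a "typical" TSV-formatted metadata file. We may need to know the column names for a command we wish to run, and inspect-metadata tells us all about them: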

qiime tools inspect-metadata jaccard-pcoa.qza

The result is:

COLUMN NAME  TYPE
===========  =======
     Axis 1  numeric
     Axis 2  numeric
        ...  numeric
    Axis 47  numeric
===========  =======
       IDS:  47
   COLUMNS:  47

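Question: How many IDs are there? How many columns? Are there any categorical columns? Why?

Translator's note: there are 47 IDs and 47 columns, none of them categorical, because PCoA results are coordinate values and therefore numeric.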

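Casting Metadata Column Types

In the Metadata tutorial we learned about column types and about using the qiime tools cast-metadata tool to specify column types within a provided metadata file. Below we go through a few scenarios showing how this tool can be used, along with some common mistakes that may come up.

We start by downloading some example metadata. Note: this is the same sample metadata used in the Inspecting Metadata section, so you can skip this step if you have already downloaded the sample_metadata.tsv file from above.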

wget -O "sample_metadata.tsv" "https://data./2024.2/tutorials/pd-mice/sample_metadata.tsv"

In this example we will cast the days_post_transplant column from numeric to categorical, and the mouse_id column from categorical to numeric. The remaining columns in the metadata are left as they are.

qiime tools cast-metadata sample_metadata.tsv \
--cast days_post_transplant:categorical \
--cast mouse_id:numeric

stdout

sample_name    barcode    mouse_id    genotype    cage_id    donor    donor_status    days_post_transplant    genotype_and_donor_status
#q2:types categorical numeric categorical categorical categorical categorical categorical categorical
recip.220.WT.OB1.D7 CCTCCGTCATGG 457 wild type C35 hc_1 Healthy 49 wild type and Healthy
recip.290.ASO.OB2.D1 AACAGTAAACAA 456 susceptible C35 hc_1 Healthy 49 susceptible and Healthy
recip.389.WT.HC2.D21 ATGTATCAATTA 435 susceptible C31 hc_1 Healthy 21 susceptible and Healthy
recip.391.ASO.PD2.D14 GTCAGTATGGCT 435 susceptible C31 hc_1 Healthy 14 susceptible and Healthy
recip.391.ASO.PD2.D21 AGACAGTAGGAG 437 susceptible C31 hc_1 Healthy 21 susceptible and Healthy
recip.391.ASO.PD2.D7 GGTCTTAGCACC 435 susceptible C31 hc_1 Healthy 7 susceptible and Healthy
recip.400.ASO.HC2.D14 CGTTCGCTAGCC 437 susceptible C31 hc_1 Healthy 14 susceptible and Healthy
recip.401.ASO.HC2.D7 ATTTACAATTGA 437 susceptible C31 hc_1 Healthy 7 susceptible and Healthy
recip.403.ASO.PD2.D21 CGCAGATTAGTA 456 susceptible C35 hc_1 Healthy 21 susceptible and Healthy
recip.411.ASO.HC2.D14 ATGTTAGGGAAT 456 susceptible C35 hc_1 Healthy 14 susceptible and Healthy
recip.411.ASO.HC2.D21 CTCATATGCTAT 457 wild type C35 hc_1 Healthy 21 wild type and Healthy
recip.411.ASO.HC2.D49 GCAACGAACGAG 435 susceptible C31 hc_1 Healthy 49 susceptible and Healthy
recip.412.ASO.HC2.D14 AAGTGGCTATCC 457 wild type C35 hc_1 Healthy 14 wild type and Healthy
recip.412.ASO.HC2.D7 GCATTCGGCGTT 456 susceptible C35 hc_1 Healthy 7 susceptible and Healthy
recip.413.WT.HC2.D7 ACCAGTGACTCA 457 wild type C35 hc_1 Healthy 7 wild type and Healthy
recip.456.ASO.HC3.D49 ACGGCGTTATGT 468 wild type C42 hc_1 Healthy 49 wild type and Healthy
recip.458.ASO.HC3.D21 ACGGCCCTGGAG 468 wild type C42 hc_1 Healthy 21 wild type and Healthy
recip.458.ASO.HC3.D49 CATTTGACGACG 469 wild type C42 hc_1 Healthy 49 wild type and Healthy
recip.459.WT.HC3.D14 ACATGGGCGGAA 468 wild type C42 hc_1 Healthy 14 wild type and Healthy
recip.459.WT.HC3.D21 CATAAATTCTTG 469 wild type C42 hc_1 Healthy 21 wild type and Healthy
recip.459.WT.HC3.D49 GCTGCGTATACC 536 susceptible C43 pd_1 PD 49 susceptible and PD
recip.460.WT.HC3.D14 CTGCGGATATAC 469 wild type C42 hc_1 Healthy 14 wild type and Healthy
recip.460.WT.HC3.D21 GTCAATTAGTGG 536 susceptible C43 pd_1 PD 21 susceptible and PD
recip.460.WT.HC3.D49 GAGAAGCTTATA 537 wild type C43 pd_1 PD 49 wild type and PD
recip.460.WT.HC3.D7 GACCCGTTTCGC 468 wild type C42 hc_1 Healthy 7 wild type and Healthy
recip.461.ASO.HC3.D21 AGCCCGCAAAGG 537 wild type C43 pd_1 PD 21 wild type and PD
recip.461.ASO.HC3.D49 GGCGTAACGGCA 538 wild type C44 pd_1 PD 49 wild type and PD
recip.461.ASO.HC3.D7 ATTGCCTTGATT 469 wild type C42 hc_1 Healthy 7 wild type and Healthy
recip.462.WT.PD3.D14 GTGAGGGCAAGT 536 susceptible C43 pd_1 PD 14 susceptible and PD
recip.462.WT.PD3.D21 GGCCTATAAGTC 538 wild type C44 pd_1 PD 21 wild type and PD
recip.462.WT.PD3.D49 AATACAGACCTG 539 susceptible C44 pd_1 PD 49 susceptible and PD
recip.462.WT.PD3.D7 TTAGGATTCTAT 536 susceptible C43 pd_1 PD 7 susceptible and PD
recip.463.WT.PD3.D14 ATATTGGCAGCC 537 wild type C43 pd_1 PD 14 wild type and PD
recip.463.WT.PD3.D21 CGCGGCGCAGCT 539 susceptible C44 pd_1 PD 21 susceptible and PD
recip.463.WT.PD3.D7 GTTTATCTTAAG 537 wild type C43 pd_1 PD 7 wild type and PD
recip.464.WT.PD3.D14 TCATCCGTCGGC 538 wild type C44 pd_1 PD 14 wild type and PD
recip.465.ASO.PD3.D14 GGCTTCGGAGCG 539 susceptible C44 pd_1 PD 14 susceptible and PD
recip.465.ASO.PD3.D7 CAGTCTAGTACG 538 wild type C44 pd_1 PD 7 wild type and PD
recip.466.ASO.PD3.D7 GTGGGACTGCGC 539 susceptible C44 pd_1 PD 7 susceptible and PD
recip.467.WT.HC3.D49.a GTCAGGTGCGGC 437 susceptible C31 hc_1 Healthy 49 susceptible and Healthy
recip.467.WT.HC3.D49.b GTTAACTTACTA 546 susceptible C49 pd_1 PD 49 susceptible and PD
recip.536.ASO.PD4.D49 CAAATTCGGGAT 547 wild type C49 pd_1 PD 49 wild type and PD
recip.537.WT.PD4.D21 CTCTATTCCACC 546 susceptible C49 pd_1 PD 21 susceptible and PD
recip.538.WT.PD4.D21 ATGGATAGCTAA 547 wild type C49 pd_1 PD 21 wild type and PD
recip.539.ASO.PD4.D14 GATCCGGCAGGA 546 susceptible C49 pd_1 PD 14 susceptible and PD
recip.539.ASO.PD4.D7 GTTCGAGTGAAT 546 susceptible C49 pd_1 PD 7 susceptible and PD
recip.540.ASO.HC4.D14 CTTCCAACTCAT 547 wild type C49 pd_1 PD 14 wild type and PD
recip.540.ASO.HC4.D7 CGGCCTAAGTTC 547 wild type C49 pd_1 PD 7 wild type and PD

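If the --output-file flag is given, the specified output file will contain the modified column types that we cast above, along with the rest of the columns and associated data contained in sample_metadata.tsv.

If you do not wish to save the cast metadata to an output file, you can omit the --output-file parameter and the results are written to stdout (as in the example above).

The --ignore-extra and --error-on-missing flags handle, respectively, cast columns that are not contained in the original metadata file and columns in the metadata file that are not included in the cast call. We can see how these flags are used below.

In the first example we look at the --ignore-extra flag, which applies when a cast names a column that is not in the original metadata file. Let's first see what happens if an extra column is included and this flag is not enabled.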

qiime tools cast-metadata sample_metadata.tsv \
--cast spleen:numeric

stderr:
Usage: qiime tools cast-metadata [OPTIONS] METADATA...
Try 'qiime tools cast-metadata --help' for help.
Error: Invalid value for cast: The following cast columns were not found within the metadata: spleen

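Notice that the spleen column included in the cast call causes an error to be raised. If we want to ignore any extra columns that are not present in the original metadata file, we can enable the --ignore-extra flag.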

qiime tools cast-metadata sample_metadata.tsv \
--cast spleen:numeric \
--ignore-extra

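With this flag enabled, all columns included in the cast that are not present in the original metadata file are ignored. Note that stdout has been omitted for this example, since no error is raised when the flag is enabled.

In the second example we look at the --error-on-missing flag, which handles columns that are present in the metadata but not included in the cast call.

The default behavior permits the cast call to name only a subset of the metadata columns. If the --error-on-missing flag is enabled, every metadata column must be included in the cast call; otherwise an error is raised.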

qiime tools cast-metadata sample_metadata.tsv \
--cast mouse_id:numeric \
--error-on-missing

stderr:
Usage: qiime tools cast-metadata [OPTIONS] METADATA...
Try 'qiime tools cast-metadata --help' for help.

Error: Invalid value for cast: The following columns within the metadata were not provided in the cast: days_post_transplant, genotype_and_donor_status, genotype, donor_status, donor, cage_id, barcode
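To make the interplay of the two flags concrete, here is a small standard-library sketch that mimics the behavior described above on a TSV string. It is not how q2cli implements the command: cast_metadata, ignore_extra and error_on_missing are illustrative Python names mirroring the CLI options, the error messages are paraphrased, and no check is made that the values can actually be represented in the requested type. Uncast columns get an empty cell in the "#q2:types" row, which, as far as we understand the metadata format, leaves their type to be inferred; the real tool writes every type explicitly.

```python
import csv
import io

VALID_TYPES = {"categorical", "numeric"}

def cast_metadata(tsv_text, casts, ignore_extra=False, error_on_missing=False):
    """Return tsv_text with a '#q2:types' row reflecting the requested casts."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    header, columns = rows[0], rows[0][1:]

    bad = sorted(set(casts.values()) - VALID_TYPES)
    if bad:
        raise ValueError("unknown column type(s): " + ", ".join(bad))
    extra = [name for name in casts if name not in columns]
    if extra and not ignore_extra:
        raise ValueError("cast columns not found within the metadata: " + ", ".join(extra))
    missing = [name for name in columns if name not in casts]
    if missing and error_on_missing:
        raise ValueError("metadata columns not provided in the cast: " + ", ".join(missing))

    # Start from any types already declared in the file, then apply the casts.
    existing = next((row for row in rows[1:] if row[0] == "#q2:types"), None)
    declared = dict(zip(columns, existing[1:])) if existing else {}
    declared.update({name: kind for name, kind in casts.items() if name in columns})

    directive = ["#q2:types"] + [declared.get(name, "") for name in columns]
    data = [row for row in rows[1:] if row[0] != "#q2:types"]
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows([header, directive] + data)
    return out.getvalue()

sample = (
    "sample_name\tmouse_id\tdays_post_transplant\n"
    "recip.220.WT.OB1.D7\t457\t49\n"
)

result = cast_metadata(sample, {"days_post_transplant": "categorical", "mouse_id": "numeric"})
print(result)
```

Calling cast_metadata(sample, {"spleen": "numeric"}) raises an error, adding ignore_extra=True suppresses it, and cast_metadata(sample, {"mouse_id": "numeric"}, error_on_missing=True) raises because days_post_transplant is not named in the cast.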

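Artifact API

Unlike q2cli, the Artifact API (using QIIME 2 with Python) does not have a single central location for utility functions. Instead, utilities are usually bound as methods to the objects they operate on.

Discovering Actions registered to a plugin

When working with a new plugin, it can be useful to check which actions are available. We first import the plugin and then query its actions attribute. This gives us a list of the public actions, along with whether each one is a method, a visualizer, or a pipeline.

>>> from qiime2.plugins import feature_table
>>> help(feature_table.actions)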
Help on module qiime2.plugins.feature_table.actions in qiime2.plugins.feature_table:

NAME
qiime2.plugins.feature_table.actions

DATA
__plugin__ = <qiime2.plugin.plugin.Plugin object>
core_features = <visualizer qiime2.plugins.feature_table.visualizers.c...
filter_features = <method qiime2.plugins.feature_table.methods.filter_...
...

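If you already know whether you are looking for a method, pipeline, or visualizer, you can get that subgroup of actions directly:

>>> help(feature_table.methods)

If you are working in a Jupyter Notebook or in IPython, you may prefer tab-completion to running help():

>>> feature_table.visualizers.  # press tab after the . for tab-complete...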

Getting help with an Action

Once a plugin has been imported, the help text for an action can be displayed in an interactive session with the IPython ? operator:

>>> feature_table.methods.merge?
Call signature:
feature_table.methods.merge(
tables: List[FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²],
overlap_method: Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample', 'sum')¹ | Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample')² = 'error_on_overlapping_sample',
) -> (FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²,)
Type: Method
String form: <method qiime2.plugins.feature_table.methods.merge>
File: ~/miniconda/envs/q2-dev/lib/python3.8/site-packages/qiime2/sdk/action.py
Docstring: QIIME 2 Method
Call docstring:
Combine multiple tables

Combines feature tables using the `overlap_method` provided.

Parameters
----------
tables : List[FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²]
overlap_method : Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample', 'sum')¹ | Str % Choices('average', 'error_on_overlapping_feature', 'error_on_overlapping_sample')², optional
Method for handling overlapping ids.

Returns
-------
merged_table : FeatureTable[Frequency]¹ | FeatureTable[RelativeFrequency]²
The resulting merged feature table.

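Retrieving Citations

The Artifact API does not provide a utility for getting all citations from a plugin. The citations for each action are accessible, in BibTeX format, through that action's citations attribute.

>>> feature_table.actions.rarefy.citations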
(CitationRecord(type='article', fields={'doi': '10.1186/s40168-017-0237-y', 'issn': '2049-2618', 'pages': '27', 'number': '1', 'volume': '5', 'month': 'Mar', 'year': '2017', 'journal': 'Microbiome', 'title': 'Normalization and microbial differential abundance strategies depend upon data characteristics', 'author': 'Weiss, Sophie and Xu, Zhenjiang Zech and Peddada, Shyamal and Amir, Amnon and Bittinger, Kyle and Gonzalez, Antonio and Lozupone, Catherine and Zaneveld, Jesse R. and Vázquez-Baeza, Yoshiki and Birmingham, Amanda and Hyde, Embriette R. and Knight, Rob'}),)

Peeking at Results

The Artifact API provides a .peek method that displays the UUID, semantic type, and data format of any QIIME 2 archive.

>>> from qiime2 import Artifact
>>> Artifact.peek('observed_features_vector.qza')
ResultMetadata(uuid='2e96b8f3-8f0a-4f6e-b07e-fbf8326232e9', type='SampleData[AlphaDiversity]', format='AlphaDiversityDirectoryFormat')

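If you have already loaded an artifact into memory and do not care about the data format, the artifact's string representation gives its UUID and semantic type.

>>> from qiime2 import Artifact
>>> table = Artifact.load('table.qza')
>>> table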
<artifact: FeatureTable[Frequency] uuid: 2e96b8f3-8f0a-4f6e-b07e-fbf8326232e9>

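Validating Results

Artifacts can be validated by loading them and then running the validate method. validate takes one parameter, level, which may be set to max or min and defaults to max. Min validation is useful for quick checks, while max validation is more comprehensive at the cost of a longer runtime.

The validate method returns None if validation succeeds, so simply running x.validate() in the interpreter prints a blank line. If the artifact is invalid, a ValidationError or NotImplementedError is raised.

>>> from qiime2 import Artifact
>>> table = Artifact.load('table.qza')
>>> table.validate(level='min')

>>> print(table.validate()) # equivalent to print(table.validate(level='max'))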
None

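Viewing Data

The view API allows us to look at many types of data without having to save it as a .qza.

>>> import pandas as pd
>>> from qiime2 import Artifact
>>> art = Artifact.load('some.qza')

... # perform some analysis, producing a result

>>> myresult.view(pd.Series)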
s00000001 74
s00000002 48
s00000003 79
s00000004 113
s00000005 111
Name: observed_otus, Length: 471, dtype: int64

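Data can be viewed in a given format only if a transformer is registered from the current view type to the desired type. If there is no such transformer, we get an error. For example, here we try to view this SampleData[AlphaDiversity] as a DataFrame.

>>> art.view(pd.DataFrame)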
---------------------------------------------------------------------------
Exception Traceback (most recent call last)
/tmp/ipykernel_18201/824837086.py in <module>
12 # Note: Views are only possible if there are transformers registered from the default
13 # view type to the type you want. We get an error if there's no tranformer
---> 14 art.view(pd.DataFrame)

... # traceback Here

Exception: No transformation from <class 'q2_types.sample_data._format.AlphaDiversityDirectoryFormat'> to <class 'pandas.core.frame.DataFrame'>
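The idea of "a transformer registered from one type to another" can be sketched with a plain dictionary keyed by (source type, target type). Everything below is a toy model for illustration: register_transformer, view, AlphaDiversityFile and NoTransformationError are hypothetical names, not the QIIME 2 plugin API, but the lookup-or-fail behavior mirrors the error shown above.

```python
class NoTransformationError(Exception):
    """Raised when no transformer connects the two types."""

_TRANSFORMERS = {}

def register_transformer(source_type, target_type):
    """Decorator that records a function converting source_type to target_type."""
    def decorator(func):
        _TRANSFORMERS[(source_type, target_type)] = func
        return func
    return decorator

def view(value, target_type):
    """Return `value` as `target_type`, using a registered transformer if needed."""
    source_type = type(value)
    if source_type is target_type:
        return value
    try:
        transformer = _TRANSFORMERS[(source_type, target_type)]
    except KeyError:
        raise NoTransformationError(
            "No transformation from %r to %r" % (source_type, target_type)
        ) from None
    return transformer(value)

class AlphaDiversityFile:
    """Toy stand-in for a stored format: tab-separated 'id<TAB>value' lines."""
    def __init__(self, text):
        self.text = text

@register_transformer(AlphaDiversityFile, dict)
def _file_to_dict(fmt):
    pairs = (line.split("\t") for line in fmt.text.splitlines())
    return {sample_id: float(value) for sample_id, value in pairs}

stored = AlphaDiversityFile("s00000001\t74\ns00000002\t48")
print(view(stored, dict))      # a transformer to dict is registered
try:
    view(stored, list)         # nothing registered for list
except NoTransformationError as error:
    print("error:", error)
```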

Some artifacts are viewable as metadata. If you would like to check, try:

>>> from qiime2 import Metadata
>>> art.has_metadata()
True

>>> art_as_md = art.view(Metadata)
>>> art_as_md
Metadata
--------
471 IDs x 1 column
observed_otus: ColumnProperties(type='numeric')

Call to_dataframe() for a tabular representation.

Viewing Visualizations

The Artifact API does not provide utilities for viewing QIIME 2 visualizations. Users generally save visualizations and explore them with QIIME 2 View.

art.save('obs_features.qza')

Inspecting Metadata

Once loaded, a metadata table can be viewed in summary form or displayed nicely as a DataFrame.

>>> from qiime2 import Metadata
>>> metadata = Metadata.load('simple-metadata.tsv')
>>> print(metadata)
Metadata
--------
516 IDs x 3 columns
barcode: ColumnProperties(type='categorical')
days: ColumnProperties(type='numeric')
extraction: ColumnProperties(type='categorical')

>>> metadata.to_dataframe()
barcode days extraction
sampleid
s00000001 806rcbc0 1 1
s00000002 806rcbc1 3 1
s00000003 806rcbc2 7 1
s00000004 806rcbc3 1 1
s00000005 806rcbc4 11 1
... ... ... ...

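Casting Metadata Column Types

The Artifact API does not provide a dedicated utility for casting metadata column types, and Metadata.columns is a read-only property. It is possible, however, to edit your .tsv and re-load it with Metadata.load, or to cast your Metadata to a Pandas DataFrame, cast the columns whose properties need to change, and reload it as Metadata with the corrected types. Below is a walkthrough of the latter approach.

Load some metadata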

# Imagine you have loaded a tsv as metadata
>>> md = Metadata.load('md.tsv')
>>> print(md)

Metadata
--------
3 IDs x 5 columns
strCatOnly: ColumnProperties(type='categorical')
intNum: ColumnProperties(type='numeric')
intCat: ColumnProperties(type='categorical')
floatNum: ColumnProperties(type='numeric')
floatCat: ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation.

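We have defined three columns of categorical data and two numeric columns in the tsv. The column IDs describe the data values (e.g. int) and the declared column type (e.g. Num for numeric).

Limitations of casting

Sequences are read in as Python strings and are represented as "objects" in the Numpy/Pandas stack. If we typed the strCatOnly column as numeric, loading the Metadata would fail with an error, because there is no good way to represent strings as numbers. Similarly, you will not have much luck casting string data to int or float in Pandas.

Converting to a DataFrame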

>>> md = md.to_dataframe()

>>> print(md)
>>> print()
>>> print("intCat should be an object (because categorical): ", str(md['intCat'].dtype))
>>> print("floatNum should be a float (because numerical): ", str(md['floatNum'].dtype))
>>> print("intNum should be a float, not an int (because numeric): ", str(md['intNum'].dtype))

            strCatOnly  intNum intCat  floatNum floatCat
sampleid
S1        TCCCTTGTCTCC     1.0      1      1.01     1.01
S2        ACGAGACTGATT     3.0      3      3.01     3.01
S3        GCTGTACGGATT     7.0      7      7.01     7.01

intCat should be an object (because categorical): object
floatNum should be a float (because numerical): float64
intNum should be a float, not an int (because numeric): float64

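The intNum and intCat columns of the original .tsv contained integer data. Metadata columns typed as categorical are represented as object in Pandas, and numeric columns as float. As a result, intNum is rendered as floating-point data when to_dataframe is called, and intCat is represented as object in the DataFrame.

These behaviors round-trip cleanly. If we convert the DataFrame back to Metadata without making any changes, the new Metadata is identical to the Metadata loaded from the tsv. We are here to see how DataFrames let us cast metadata column types, though, so let's give it a try.

Casting columns

>>> md['intCat'] = md['intCat'].astype("int")
>>> md['floatNum'] = md['floatNum'].astype('str')

>>> print(md)
>>> print()
>>> print("intCat should be an int now: ", str(md['intCat'].dtype))
>>> print("floatNum should be an object now: ", str(md['floatNum'].dtype))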

strCatOnly intNum intCat floatNum floatCat
sampleid
S1 TCCCTTGTCTCC 1.0 1 1.01 1.01
S2 ACGAGACTGATT 3.0 3 3.01 3.01
S3 GCTGTACGGATT 7.0 7 7.01 7.01

intCat should be an int now: int64
floatNum should be an object now: object

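The DataFrame looks the same, but the column dtypes have changed as expected. When we turn this DataFrame into Metadata, the ColumnProperties change accordingly: columns represented in Pandas as objects (including strs) become categorical, and columns represented as ints or floats become numeric.

Converting the DataFrame back to Metadata

>>> md = Metadata(md)
>>> md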

Metadata
--------
3 IDs x 5 columns
strCatOnly: ColumnProperties(type='categorical')
intNum: ColumnProperties(type='numeric')
intCat: ColumnProperties(type='numeric')
floatNum: ColumnProperties(type='categorical')
floatCat: ColumnProperties(type='categorical')

Call to_dataframe() for a tabular representation.

Note that intCat, which was formerly categorical, is now numeric, while floatNum has changed from numeric to categorical.
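The rule applied in this walkthrough (strings become categorical, ints and floats become numeric) can be summarized without Pandas. The snippet below is a plain-Python stand-in for that rule, using lists in place of DataFrame columns; column_type is a hypothetical helper, and the real decision is made by the Metadata constructor from the DataFrame dtypes.

```python
def column_type(values):
    """'numeric' when every value is an int or float (bools excluded), else 'categorical'."""
    numeric = all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in values
    )
    return "numeric" if numeric else "categorical"

# Column contents as they stand after the two astype() calls above.
columns = {
    "strCatOnly": ["TCCCTTGTCTCC", "ACGAGACTGATT", "GCTGTACGGATT"],
    "intNum": [1.0, 3.0, 7.0],
    "intCat": [1, 3, 7],                   # cast from str to int
    "floatNum": ["1.01", "3.01", "7.01"],  # cast from float to str
    "floatCat": ["1.01", "3.01", "7.01"],
}

types = {name: column_type(values) for name, values in columns.items()}
for name, kind in types.items():
    print(name + ":", kind)
```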

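About the translators

Yong-Xin Liu (刘永鑫), professor and doctoral supervisor. He received his PhD in bioinformatics from the University of Chinese Academy of Sciences in 2014, then worked at the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, successively as postdoctoral fellow, engineer, and senior engineer. In October 2022 he joined the Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, as a principal investigator. His research covers metagenomic method development, functional mining, and science communication. He takes part in the QIIME 2 project and has led the development of the EasyAmplicon, EasyMetagenome, and Culturome analysis pipelines, the data analysis websites EVenn and ImageGP, and the R packages amplicon and ggClusterNet, with the aim of building methodological infrastructure for metagenomics and advancing microbiome research. He has published more than 20 papers as (co-)first or corresponding author in journals including Nature Biotechnology, Nature Microbiology, and iMeta, and more than 20 collaborative papers in journals including Science, Cell Host & Microbe, and Microbiome, for a total of over 50 papers cited more than 14,000 times. He is chief editor of the monograph 《微生物组实验手册》 (a handbook of microbiome experiments), written with more than 300 colleagues as a continuously updated Chinese-language reference for the field. He founded the Metagenome (宏基因组) WeChat public account, which has more than 150,000 followers, has shared over 3,000 original articles, and has accumulated more than 40 million reads. He also initiated the journal iMeta, working with about a thousand experts worldwide to build a leading journal in metagenomics, microbiome science, and bioinformatics. The group recruits postdocs and visiting graduate students on an ongoing basis; those interested can get in touch on WeChat (yongxinliu).

Huiling Wang (王惠铃), undergraduate student in bioinformatics at Hunan Agricultural University, doing a graduation internship in Yong-Xin Liu's group. Responsible for updating and testing this version.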

Reference

https://docs./2024.2

Evan Bolyen*, Jai Ram Rideout*, Matthew R. Dillon*, Nicholas A. Bokulich*, Christian C. Abnet, Gabriel A. Al-Ghalith, Harriet Alexander, Eric J. Alm, Manimozhiyan Arumugam, Francesco Asnicar, Yang Bai, Jordan E. Bisanz, Kyle Bittinger, Asker Brejnrod, Colin J. Brislawn, C. Titus Brown, Benjamin J. Callahan, Andrés Mauricio Caraballo-Rodríguez, John Chase, Emily K. Cope, Ricardo Da Silva, Christian Diener, Pieter C. Dorrestein, Gavin M. Douglas, Daniel M. Durall, Claire Duvallet, Christian F. Edwardson, Madeleine Ernst, Mehrbod Estaki, Jennifer Fouquier, Julia M. Gauglitz, Sean M. Gibbons, Deanna L. Gibson, Antonio Gonzalez, Kestrel Gorlick, Jiarong Guo, Benjamin Hillmann, Susan Holmes, Hannes Holste, Curtis Huttenhower, Gavin A. Huttley, Stefan Janssen, Alan K. Jarmusch, Lingjing Jiang, Benjamin D. Kaehler, Kyo Bin Kang, Christopher R. Keefe, Paul Keim, Scott T. Kelley, Dan Knights, Irina Koester, Tomasz Kosciolek, Jorden Kreps, Morgan G. I. Langille, Joslynn Lee, Ruth Ley, Yong-Xin Liu, Erikka Loftfield, Catherine Lozupone, Massoud Maher, Clarisse Marotz, Bryan D. Martin, Daniel McDonald, Lauren J. McIver, Alexey V. Melnik, Jessica L. Metcalf, Sydney C. Morgan, Jamie T. Morton, Ahmad Turan Naimey, Jose A. Navas-Molina, Louis Felix Nothias, Stephanie B. Orchanian, Talima Pearson, Samuel L. Peoples, Daniel Petras, Mary Lai Preuss, Elmar Pruesse, Lasse Buur Rasmussen, Adam Rivers, Michael S. Robeson, Patrick Rosenthal, Nicola Segata, Michael Shaffer, Arron Shiffer, Rashmi Sinha, Se Jin Song, John R. Spear, Austin D. Swafford, Luke R. Thompson, Pedro J. Torres, Pauline Trinh, Anupriya Tripathi, Peter J. Turnbaugh, Sabah Ul-Hasan, Justin J. J. van der Hooft, Fernando Vargas, Yoshiki Vázquez-Baeza, Emily Vogtmann, Max von Hippel, William Walters, Yunhu Wan, Mingxun Wang, Jonathan Warren, Kyle C. Weber, Charles H. D. Williamson, Amy D. Willis, Zhenjiang Zech Xu, Jesse R. Zaneveld, Yilong Zhang, Qiyun Zhu, Rob Knight & J. Gregory Caporaso#. 
Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology. 2019, 37: 852-857. https:///10.1038/s41587-019-0209-9
