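monocle offers an instructive set of trajectory methods that try, in a simple and blunt way, to bridge the canyon between the ideal and reality. In monocle's world, the trajectory and the atlas are separate: the atlas lives in a tSNE/UMAP embedding, while the trajectory lives in a different reduced-dimensional space. Is there a dimensionality-reduction technique that goes one step further? PAGA in scanpy, which we introduce today (graph abstraction reconciles clustering with trajectory inference through a topology preserving map of single cells (https://genomebiology./articles/10.1186/s13059-019-1663-x)), is one attempt in this direction: it infers cell trajectories while preserving the cell atlas: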
.. code:: python

    sc.pp.filter_genes(adata, min_counts=1)  # only consider genes with more than 1 count
    sc.pp.normalize_per_cell(                # normalize with total UMI count per cell
        adata, key_n_counts='n_counts_all'
    )
    filter_result = sc.pp.filter_genes_dispersion(  # select highly-variable genes
        adata.X, flavor='cell_ranger', n_top_genes=n_top_genes, log=False
    )
    adata = adata[:, filter_result.gene_subset]  # subset the genes
    sc.pp.normalize_per_cell(adata)          # renormalize after filtering
    if log:
        sc.pp.log1p(adata)                   # log transform: adata.X = log(adata.X + 1)
    sc.pp.scale(adata)                       # scale to unit variance and shift to zero mean
Parameters
----------
adata
    Annotated data matrix.
n_top_genes
    Number of genes to keep.
log
    Take logarithm.
plot
    Show a plot of the gene dispersion vs. mean relation.
copy
    Return a copy of `adata` instead of updating it.
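To make the arithmetic of three of the recipe steps concrete (per-cell normalization, log transform, scaling), here is a minimal pure-Python sketch on a toy count matrix. It is not scanpy's implementation. The helper names are hypothetical, normalizing to the median total count and using the sample standard deviation are assumptions of this sketch, and every cell is assumed to have at least one count.

```python
import math
import statistics


def normalize_per_cell(counts):
    """Rescale each cell (row) so its total equals the median total over cells."""
    totals = [sum(row) for row in counts]
    target = statistics.median(totals)  # assumption: normalize to the median total
    return [[v * target / total for v in row] for row, total in zip(counts, totals)]


def log1p(matrix):
    """Elementwise log(x + 1)."""
    return [[math.log1p(v) for v in row] for row in matrix]


def scale(matrix):
    """Shift each gene (column) to zero mean and scale it to unit variance."""
    columns = list(zip(*matrix))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # assumption: sample std
    return [
        # A constant gene has std 0; this sketch leaves it at 0 after centering.
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in matrix
    ]


# Toy matrix: 3 cells (rows) x 2 genes (columns).
counts = [[1, 3], [2, 6], [4, 4]]
normalized = normalize_per_cell(counts)
scaled = scale(log1p(normalized))
```

After `normalize_per_cell`, every row of the toy matrix sums to the same total, so the later steps compare expression profiles rather than sequencing depth.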
An alternative to tSNE that often preserves the topology of the data better. This requires running :func:`~scanpy.pp.neighbors` first.
The default layout ('fa', `ForceAtlas2`) [Jacomy14]_ uses the package |fa2|_ [Chippada18]_, which can be installed via `pip install fa2`.
`Force-directed graph drawing`_ describes a class of long-established algorithms for visualizing graphs. It has been suggested for visualizing single-cell data by [Islam11]_. Many other layouts as implemented in igraph [Csardi06]_ are available. Similar approaches have been used by [Zunder15]_ or [Weinreb17]_.
Parameters
----------
adata
    Annotated data matrix.
layout
    'fa' (`ForceAtlas2`) or any valid `igraph layout <http:///c/doc/igraph-Layout.html>`__. Of particular interest are 'fr' (Fruchterman Reingold), 'grid_fr' (Grid Fruchterman Reingold, faster than 'fr'), 'kk' (Kamada Kawai, slower than 'fr'), 'lgl' (Large Graph, very fast), 'drl' (Distributed Recursive Layout, pretty fast) and 'rt' (Reingold Tilford tree layout).
root
    Root for tree layouts.
random_state
    For layouts with random initialization like 'fr', change this to use different initial states for the optimization. If `None`, no seed is set.
adjacency
    Sparse adjacency matrix of the graph, defaults to `adata.uns['neighbors']['connectivities']`.
key_added_ext
    By default, append `layout`.
proceed
    Continue computation, starting off with 'X_draw_graph_`layout`'.
init_pos
    `'paga'`/`True`, `None`/`False`, or any valid 2d-`.obsm` key. Use precomputed coordinates for initialization. If `False`/`None` (the default), initialize randomly.
copy
    Return a copy instead of writing to adata.
**kwds
    Parameters of chosen igraph layout. See e.g. `fruchterman-reingold`_ [Fruchterman91]_. One of the most important ones is `maxiter`.
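To give a feel for what a force-directed layout does, the following toy sketch follows the Fruchterman-Reingold idea: all node pairs repel, connected nodes attract, and the step size shrinks over time. It is a simplified illustration only, not the ForceAtlas2 or igraph code that `draw_graph` calls. The function name, the cooling schedule and the constants are assumptions of this sketch.

```python
import math
import random


def fruchterman_reingold(n_nodes, edges, n_iter=200, seed=0):
    """Return 2D positions for an undirected graph (toy force-directed layout)."""
    rng = random.Random(seed)  # fixed seed, playing the role of `random_state`
    pos = [[rng.random(), rng.random()] for _ in range(n_nodes)]
    k = math.sqrt(1.0 / n_nodes)  # preferred distance between neighbours
    temperature = 0.1             # maximum step length per iteration

    def apply(disp, i, j, force_of):
        dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
        dist = max(math.hypot(dx, dy), 1e-9)
        f = force_of(dist)  # > 0 pushes i and j apart, < 0 pulls them together
        disp[i][0] += dx / dist * f
        disp[i][1] += dy / dist * f
        disp[j][0] -= dx / dist * f
        disp[j][1] -= dy / dist * f

    for _ in range(n_iter):
        disp = [[0.0, 0.0] for _ in range(n_nodes)]
        for i in range(n_nodes):
            for j in range(i + 1, n_nodes):
                apply(disp, i, j, lambda d: k * k / d)  # repulsion, all pairs
        for i, j in edges:
            apply(disp, i, j, lambda d: -d * d / k)     # attraction along edges
        for i in range(n_nodes):
            length = max(math.hypot(disp[i][0], disp[i][1]), 1e-9)
            capped = min(length, temperature)           # limit the step length
            pos[i][0] += disp[i][0] / length * capped
            pos[i][1] += disp[i][1] / length * capped
        temperature *= 0.98                             # cool down gradually
    return pos


# Two triangles joined by a single edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
layout = fruchterman_reingold(6, edges)
```

The all-pairs repulsion loop is quadratic in the number of nodes, which is one reason faster layouts exist for large single-cell graphs.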
By quantifying the connectivity of partitions (groups, clusters) of the single-cell graph, partition-based graph abstraction (PAGA) generates a much simpler abstracted graph (*PAGA graph*) of partitions, in which edge weights represent confidence in the presence of connections. By thresholding this confidence in :func:`~scanpy.pl.paga`, a much simpler representation of the manifold data is obtained, which is nonetheless faithful to the topology of the manifold.
The confidence should be interpreted as the ratio of the actual versus the expected value of connections under the null model of randomly connecting partitions. We do not provide a p-value as this null model does not precisely capture what one would consider "connected" in real data, hence it strongly overestimates the expected value. See an extensive discussion of this in [Wolf19]_.
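The ratio described above can be illustrated on a toy graph. This sketch counts the observed edges between each pair of partitions and divides by the number expected if edge endpoints were paired at random (a configuration-model null, chosen here as an assumption). The exact connectivity model used by scanpy differs in detail, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def partition_connectivity(edges, labels):
    """Observed / expected number of edges for every pair of partitions."""
    m = len(edges)
    degree_sum = Counter()  # summed node degrees per partition
    observed = Counter()    # edge counts between distinct partitions
    for u, v in edges:
        a, b = labels[u], labels[v]
        degree_sum[a] += 1
        degree_sum[b] += 1
        if a != b:
            observed[frozenset((a, b))] += 1
    ratios = {}
    for a, b in combinations(sorted(degree_sum), 2):
        # Assumed null model: endpoints paired at random given the degrees.
        expected = degree_sum[a] * degree_sum[b] / (2 * m)
        ratios[(a, b)] = observed[frozenset((a, b))] / expected
    return ratios


# Three triangles (partitions A, B, C) with 2 A-B edges and 1 B-C edge.
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C", 7: "C", 8: "C"}
edges = [
    (0, 1), (1, 2), (0, 2),  # inside A
    (3, 4), (4, 5), (3, 5),  # inside B
    (6, 7), (7, 8), (6, 8),  # inside C
    (2, 3), (1, 4),          # A-B
    (5, 6),                  # B-C
]
ratios = partition_connectivity(edges, labels)

# Thresholding keeps only the partition pairs with enough support.
threshold = 0.5
kept = [pair for pair, ratio in ratios.items() if ratio >= threshold]
```

With this toy input the A-B pair has the highest ratio, A-C has none, and a threshold of 0.5 leaves only the A-B edge in the abstracted graph.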
.. note:: Note that you can use the result of :func:`~scanpy.pl.paga` in :func:`~scanpy.tl.umap` and :func:`~scanpy.tl.draw_graph` via `init_pos='paga'` to get single-cell embeddings that are typically more faithful to the global topology.
Parameters
----------
adata
    An annotated data matrix.
groups
    Key for categorical in `adata.obs`. You can pass your predefined groups by choosing any categorical annotation of observations. Default: The first present key of `'leiden'` or `'louvain'`.
use_rna_velocity
    Use RNA velocity to orient edges in the abstracted graph and estimate transitions. Requires that `adata.uns` contains a directed single-cell graph with key `['velocity_graph']`. This feature might be subject to change in the future.
model
    The PAGA connectivity model.
copy
    Copy `adata` before computation and return a copy. Otherwise, perform computation inplace and return `None`.
Returns
-------
**connectivities** : :class:`numpy.ndarray` (adata.uns['connectivities'])
    The full adjacency matrix of the abstracted graph, weights correspond to confidence in the connectivities of partitions.
**connectivities_tree** : :class:`scipy.sparse.csr_matrix` (adata.uns['connectivities_tree'])
    The adjacency matrix of the tree-like subgraph that best explains the topology.
Notes
-----
Together with a random walk-based distance measure (e.g. :func:`scanpy.tl.dpt`) this generates a partial coordinatization of data useful for exploring and explaining its variation.
.. currentmodule:: scanpy
See Also
--------
pl.paga
pl.paga_path
pl.paga_compare
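The note on `init_pos='paga'` suggests a workflow along the following lines. This is a hedged sketch, not a prescribed pipeline: the wrapper function and its name are hypothetical, `adata` is assumed to be an already preprocessed AnnData object, and scanpy (plus `leidenalg` for Leiden clustering) must be installed for the function to run.

```python
def embed_with_paga_init(adata, groups="leiden", layout="fa"):
    """Cluster, run PAGA, then draw the single-cell graph initialized from PAGA."""
    # Imported lazily so that defining this helper does not require scanpy.
    import scanpy as sc

    sc.pp.neighbors(adata)                       # build the single-cell graph
    if groups not in adata.obs:
        sc.tl.leiden(adata, key_added=groups)    # partitions used by PAGA
    sc.tl.paga(adata, groups=groups)             # abstracted graph of partitions
    sc.pl.paga(adata, plot=False)                # compute node positions, no figure
    sc.tl.draw_graph(adata, layout=layout, init_pos="paga")
    return adata
```

Running `sc.pl.paga` before `sc.tl.draw_graph` matters here, because the PAGA node positions it stores are what `init_pos='paga'` reads.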
drawing single-cell graph using layout 'fa'
WARNING: Package 'fa2' is not installed, falling back to layout 'fr'.To use the faster and better ForceAtlas2 layout, install package 'fa2' (`pip install fa2`).