
Introduction to Single-cell RNA-seq

Posted by 生物_医药_科研 (Biology_Medicine_Research), 2018-12-16

PPT link: http://www./jmzeng/ppt/singlecellrnaseq-170131050320.pdf


The slide text follows:

1. Introduction to Single-cell RNA-Seq Wally the Welsh Corgi

2. Connecting & Computer Preliminaries · Make sure your workshop-provided computer is connected to the “Broad” or “Broad Internal” wireless network. · Please do not connect your personal devices (laptop, phone, etc.) to these wireless networks; it will tax the wireless system and make the workshop less effective. · The password for the computers is “password”.

3. Introduction to single-cell RNA-Seq · Timothy Tickle, Brian Haas, Asma Bankapur · Center for Cell Circuits · Computational Genomics Workshop 2017

4. We Know Tissues are Heterogeneous

5. Cell Identity is More Than Histopathology · A cell participates in multiple cell contexts. · Multiple factors shape a cell’s identity: – Membership in a taxonomy of cell types – Simultaneous time-dependent processes – Response to the environment – Spatial positioning

6. Before We Get Started · Single-cell RNA-Seq (scRNA-Seq) analysis methodology is still developing. – Give you a feel for the data. – Perform some analysis together. · There is a vivid diversity of methodology. – These techniques will grow as the field does. – Why were these specific tools chosen? · This is a guided conversation through scRNA-Seq analysis. – Breadth and targeted depth. – There may be other opinions; if you have one, please speak up so we can all learn from it.

7. Before We Get Started · Sections will be hands-on. – Much can be applied to other analyses. – Strengthen those R ninja skills! – If needed, code to cut and paste is available. · cut_and_paste.txt · There will be many cute corgi pictures.

8. We Will Attempt to Cover · Describe scRNA-Seq assays. · Comparing assays. · Sequence pipelines. · How do measured counts behave? · Concerns over study design. · Initial data exploration. · Gene and cell filtering. · Plotting genes. · Dimensional Reduction and plotting cells. · Differential expression. · Communicating your study.

9. Section: scRNA-Seq Assays · There are many scRNA-Seq Assays, each differs: – Some commercialized – Full transcriptome vs 3’ – Less or more automated – Different levels of throughput – Differences in cost

10. Smart-Seq2

11. Smart-Seq2: Description (Full transcript scRNA-Seq) · Developed for single cells but can be performed using total RNA. · Selects for the poly-A tail. · Full transcript assay. – Uses template switching for 5' end capture. · Standard Illumina sequencing. – Off-the-shelf products. · Hundreds of samples. · UMIs are typically not used.

12. Smart-Seq2: Assay Overview · Poly-A capture with a 30 nt poly-T and a 25 nt 5' anchor sequence. · RT adds untemplated C's. · Template switching: – Locked Nucleic Acid binds to the untemplated C's. – RT switches template. · Preamplification / cleanup. · DNA fragmentation and adapter ligation together. · Gap repair, enrich, purify.

13. Smart-Seq2: Equipment

14. Drop-seq

15. Drop-seq: Description · Moved throughput from hundreds to thousands of cells. · Droplet-based processing using microfluidics. – Nanoliter-scale aqueous drops in oil. · 3' end. · Bead based (STAMPs: single-cell transcriptomes attached to microparticles). · Cell barcodes use split-pool synthesis. · Uses UMIs (Unique Molecular Identifiers). – Also called RMTs (Random Molecular Tags). – Degenerate synthesis.
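Split-pool barcoding and degenerate UMI synthesis can be illustrated with a toy simulation. This Python sketch makes simplifying assumptions and is not the Drop-seq chemistry: each round randomly splits the beads into four pools, appends that pool's base, and re-pools. The 12-base cell barcode and 8-base UMI lengths follow the published Drop-seq bead design; the function names are hypothetical.

```python
import random

BASES = "ACGT"

def split_pool_barcodes(n_beads, n_rounds, seed=0):
    """Toy split-pool synthesis: beads sharing a pool receive the same base in that round."""
    rng = random.Random(seed)
    barcodes = [""] * n_beads
    for _ in range(n_rounds):
        # Split: each bead lands in one of four pools, one pool per base.
        pools = [rng.randrange(len(BASES)) for _ in range(n_beads)]
        # Extend and re-pool: append the pool's base to every bead in it.
        barcodes = [bc + BASES[p] for bc, p in zip(barcodes, pools)]
    return barcodes

def bead_primers(barcode, n_primers, umi_len, rng):
    """Degenerate synthesis: primers on one bead share its barcode but each gets a random UMI."""
    return [barcode + "".join(rng.choice(BASES) for _ in range(umi_len))
            for _ in range(n_primers)]

barcodes = split_pool_barcodes(n_beads=1000, n_rounds=12)
primers = bead_primers(barcodes[0], n_primers=5, umi_len=8, rng=random.Random(1))
print(len(set(barcodes)), "distinct barcodes among", len(barcodes), "beads;", 4 ** 12, "possible")
```

With 4^12 (about 16.8 million) possible barcodes, collisions among a thousand simulated beads are rare. Real beads also carry synthesis errors, which is why the Drop-seq pipeline includes bead error detection.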

16. Drop-seq: Overview · Drop-seq Video Abstract

17. Drop-seq: Assay Overview

18. Drop-seq: Assay Overview

19. Drop-seq: Assay Overview

20. Drop-seq: Equipment

21. Drop-seq: Pointers · Droplet-based assays can have leaky RNA. · Before library generation, wash off any medium (it inhibits library generation). · Adding PBS and BSA (0.01-0.05%) can protect the cells. – Too much BSA leaves a residue that makes harvesting the beads difficult. · Filter all reagents through an 80 micron strainer before microfluidics. · Some purchased devices add a hydrophobic coating. – The coating can deteriorate (2 months at best). – Recoating in-house does work.

22. 10X: Massively Parallel Sequencing

23. 10X: Description · Droplet-based, 3' mRNA. – GEM (Gel Bead in Emulsion). · Standardized instrumentation and reagents. · Higher throughput, scaling to tens of thousands of cells. · Less processing time. · Cell Ranger software is available for install.

24. 10X: Assay Overview

25. 10X: Assay Overview

26. 10X: Equipment

27. A Word on Sorting · After dissociation, cells can be sorted. · Know your cells: are they sticky? Are they big? – Select an appropriately sized nozzle. · Don't sort too quickly (1-2k cells per second or lower). – But the slower the sort, the longer cells sit in lysis buffer after sorting. – 10 minutes max in lysis (some say 30 minutes). · Calibrate the speed of the instrument with beads. – Check alignment every 5-6 plates. · Afterwards, spin down to make sure cells are in the lysis buffer. – Flash freeze. · Chloe Villani on sorting

28. Section: Comparing scRNA-Seq Assays

29. scRNA-Seq Assay Performance

30. ERCC-based Benchmarking · Based on ERCC spike-ins. – Exogenous RNA spike-ins. – No secondary structure. – 25 nt poly-A tail. · May be a conservative measurement, given that endogenous mRNA has a ~250 nt poly-A tail. · Accuracy – How well the measured abundance levels correlate with the known spiked-in amounts. · Sensitivity – Minimum number of input RNA molecules required to detect a spike-in.
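Under the slide's definitions, both metrics can be computed from known spike-in inputs and observed counts. This Python sketch is illustrative only: it takes accuracy as the Pearson correlation of log10 expected vs. log10 observed counts over detected spike-ins, and sensitivity literally as the smallest input amount that was detected. A published benchmark may model detection more carefully, and the function names and toy values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ercc_accuracy(expected, observed):
    """Correlation of log10 expected molecules with log10 observed counts (detected spike-ins only)."""
    pairs = [(e, o) for e, o in zip(expected, observed) if o > 0]
    return pearson([math.log10(e) for e, _ in pairs], [math.log10(o) for _, o in pairs])

def ercc_sensitivity(expected, observed):
    """Smallest number of input molecules whose spike-in was detected (None if none were)."""
    detected = [e for e, o in zip(expected, observed) if o > 0]
    return min(detected) if detected else None

expected = [1, 10, 100, 1000]   # molecules spiked in (toy values)
observed = [0, 5, 50, 500]      # counts measured for each spike-in (toy values)
print(ercc_accuracy(expected, observed), ercc_sensitivity(expected, observed))
```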

31. Sensitivity and Specificity (figure: accuracy and sensitivity of Bulk, CEL-Seq2, Drop-Seq, 10X and Smart-Seq2; axes run from poor to great, with sensitivity marked from 10 molecules to 1 molecule)

32. Final Thoughts · Different assays have different throughput. – SmartSeq2 < Drop-seq < 10X · SmartSeq2 is full transcript. · Plate-based methods get lysed in wells and so do not leak. – Droplet-based methods can have leaky RNA. · In Drop-seq assays, RT happens outside the droplets. – Can use harsher lysis buffers. – 10X needs lysis buffers compatible with the RT enzyme. · 10X is more standardized and comes with a pipeline. – Drop-seq is more customizable but more hands-on. · Cost per library varies.

33. Section: scRNA-Seq Pipelines

34. Sequences Differ So Pipelines Differ · scRNA-Seq assays are different and produce different sequences – The sequence pipelines must be tailored to the sequence of interest. – Many pipelines are NOT compatible but many show similarities.

35. Start with FASTQ Sequences · FASTQ file format: 4 lines make up 1 sequence record. – Sequence header – cDNA sequence – Separator line (“+”) – Base qualities
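A minimal FASTQ reader shows how the four lines of each record fit together. This Python sketch assumes well-formed, unwrapped records and Phred+33 quality encoding (the usual Illumina 1.8+ convention); real pipelines use dedicated parsers.

```python
def parse_fastq(text):
    """Parse FASTQ text into (read_id, sequence, phred_scores) tuples, 4 lines per record."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain a multiple of 4 lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence and quality lengths differ at line {i + 1}")
        phred = [ord(ch) - 33 for ch in qual]  # Phred+33: '!' = 0, 'I' = 40
        records.append((header[1:], seq, phred))
    return records

example = "@read1\nACGT\n+\nII#5\n"
print(parse_fastq(example))  # [('read1', 'ACGT', [40, 40, 2, 20])]
```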

36. Assays Differ in FASTQ Contents

37. SmartSeq2: Pipeline Overview

38. Drop-seq: Pipeline Overview · Common functionality: trimming, alignment, generating a count matrix. · Adds bookkeeping for cell barcodes and UMIs, bead error detection, cell barcode collapsing, UMI collapsing.
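The cell-barcode and UMI bookkeeping can be sketched in Python, assuming the Drop-seq read layout in which read 1 carries a 12-base cell barcode followed by an 8-base UMI, and read 2 (the cDNA) has already been aligned and assigned to a gene. The input format and function names are hypothetical, and only identical barcodes and UMIs are collapsed; bead error detection and barcode/UMI error correction are left out.

```python
from collections import defaultdict

CELL_BC_LEN, UMI_LEN = 12, 8  # assumed Drop-seq read-1 layout

def split_read1(read1):
    """Split read 1 into (cell barcode, UMI)."""
    return read1[:CELL_BC_LEN], read1[CELL_BC_LEN:CELL_BC_LEN + UMI_LEN]

def umi_count_matrix(aligned_reads):
    """aligned_reads: iterable of (read1_sequence, gene) pairs.
    Returns {cell_barcode: {gene: number of distinct UMIs}}."""
    umis = defaultdict(set)
    for read1, gene in aligned_reads:
        cell, umi = split_read1(read1)
        umis[(cell, gene)].add(umi)  # PCR duplicates share a UMI and collapse here
    matrix = defaultdict(dict)
    for (cell, gene), umi_set in umis.items():
        matrix[cell][gene] = len(umi_set)
    return dict(matrix)

reads = [
    ("AAAAAAAAAAAA" + "CCCCCCCC", "Actb"),
    ("AAAAAAAAAAAA" + "CCCCCCCC", "Actb"),  # PCR duplicate of the read above
    ("AAAAAAAAAAAA" + "GGGGGGGG", "Actb"),
    ("TTTTTTTTTTTT" + "CCCCCCCC", "Gapdh"),
]
print(umi_count_matrix(reads))  # {'AAAAAAAAAAAA': {'Actb': 2}, 'TTTTTTTTTTTT': {'Gapdh': 1}}
```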

39. Drop-seq: Further Help

40. 10X: Pipeline Overview · Steps are conceptually similar to the Drop-seq pipeline.

41. 10X: Further Help

42. Sequence Level Quality Control · Much of the QC is performed with traditional tools.

43. Pipeline Section Summary · Single-cell RNA-Seq is a diverse ecosystem of assays. – Each assay has pros and cons. · Sequences derived from these assays are complex and vary. · Different pipelines are needed to address different sequence formats. – Common steps include: · Aligning · QC · Read counting.

44. Section: scRNA-Seq Count Data

45. This is an Expression Matrix

46. Genes Have Different Distributions

47. Genes Have Different Distributions

48. Genes Have Different Distributions

49. Genes Have Different Distributions

50. Genes Have Different Distributions

51. Underlying Biology · Zero inflation. – Drop-out events during reverse transcription. – Genes with more expression have fewer zeros. – Complexity varies. · Transcription stochasticity. – Transcriptional bursting. – Coordinated transcription of multigene networks. – Over-dispersed counts. · Higher resolution. – More sources of signal.
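One quick way to see zero inflation and over-dispersion for a single gene is to compare its observed zero fraction with the fraction expected under a Poisson model with the same mean, exp(-mean), and its variance with its mean. This Python sketch is a descriptive check against that assumed Poisson baseline, not a formal test; the counts are toy values.

```python
import math

def zero_inflation_summary(counts):
    """Compare one gene's counts across cells with a Poisson baseline of the same mean."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return {
        "mean": mean,
        "variance": variance,                                # Poisson would give variance == mean
        "observed_zero_fraction": sum(c == 0 for c in counts) / n,
        "poisson_zero_fraction": math.exp(-mean),            # P(X = 0) for X ~ Poisson(mean)
    }

# Toy gene: silent in most cells, bursting in a few.
summary = zero_inflation_summary([0, 0, 0, 0, 0, 0, 10, 10, 0, 0])
print(summary)
```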

52. Expression has Many Sources per Cell

53. Data Analysis with UMIs (figure: read counts vs. counts by UMI, labeled “Collapsed but Not Linear”)

54. Summary of the Data · We are still understanding scRNA-Seq data and how to apply it. – Data can be non-normal. – Data can be zero-inflated. – Data can be very noisy. – Cells vary in library complexity. – Data can represent many “basis vectors” or sources of expression simultaneously. · Keep these characteristics in mind when choosing analysis assumptions. · We tend to filter more conservatively with UMIs.

55. Section: Study Design and scRNA-Seq

56. scRNA-Seq Study Design · How many cells? – This can change depending on the variability of the biology and the expectation of finding rare populations. · How to design cell capture? – Single-cell RNA-Seq is especially prone to technical batch effects (due to processing). · Use of UMIs. · Use of ERCC spike-ins.

57. How Many Cells? · Satija lab online tool – satijalab.org/howmanycells
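The question behind such a tool can be approximated with a binomial model: if a population makes up a fraction f of the cells, how many cells must be captured so that at least k of them are sampled with a given probability? This Python sketch uses that simple binomial assumption and is not necessarily the model behind satijalab.org/howmanycells; the function names are hypothetical.

```python
import math

def prob_at_least(k, n_cells, freq):
    """P(X >= k) for X ~ Binomial(n_cells, freq), with 0 < freq < 1, summed in log space."""
    log_p, log_q = math.log(freq), math.log1p(-freq)
    p_below = 0.0
    for i in range(min(k, n_cells + 1)):
        log_term = (math.lgamma(n_cells + 1) - math.lgamma(i + 1) - math.lgamma(n_cells - i + 1)
                    + i * log_p + (n_cells - i) * log_q)
        p_below += math.exp(log_term)
    return max(0.0, 1.0 - p_below)

def cells_needed(k, freq, target=0.95, max_cells=1_000_000):
    """Smallest number of captured cells giving P(at least k cells of the population) >= target."""
    for n_cells in range(k, max_cells + 1):
        if prob_at_least(k, n_cells, freq) >= target:
            return n_cells
    return None

# A population at 1% frequency, wanting at least 10 of its cells with 95% probability.
print(cells_needed(k=10, freq=0.01, target=0.95))
```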

58. Single Cell RNA-Seq and Batch Effects

59. What is Study Confounding?

60. Confounding by Design

61. Section: Initial Data Analysis

62. Motivation: Why Am I Using R? · A lot of method development is happening in R. · Free / open source / open science. · Many supplemental computational biology packages. · Data science is an art. – Data often require one to create and adapt analyses. · This will allow you to experience key concepts in analysis.

63. RStudio (IDE)

64. Initial Data Exploration

65. Today’s Data · Goal: generate a comprehensive, validated classification scheme for the bipolar cells (BCs) of the mouse retina. – Cone or rod type, ON or OFF, 9-12 subtypes (morphological). · ~44k cells from a transgenic mouse line marking BCs. – 27k after filtering (we use 5k).

66. Logistics: Importing Code Libraries · R Exercise

67. Representing Sparse Matrices · R Exercise

68. What is a Sparse Matrix? · Sparse Matrix – A matrix where most of the elements are 0. · Dense Matrix – A matrix where most elements are not 0. · Many ways to efficiently represent a sparse matrix in memory. – Here, the underlying data structure is a coordinate list.

69. 2D Arrays vs Coordinate Lists · 2D arrays can be optimal for dense matrices. · Coordinate lists are more optimal for sparse matrices.
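The difference between the two layouts is easy to see in a few lines of Python. This sketch stores a coordinate list as parallel row, column and value lists (a common convention, assumed here) and converts back to a dense 2D array; it is illustrative and does not describe how R or Seurat store matrices internally.

```python
def to_coo(dense):
    """Convert a dense 2D list into a coordinate list: (shape, rows, cols, values) of non-zeros."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:          # only non-zero entries are stored
                rows.append(i)
                cols.append(j)
                vals.append(value)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return shape, rows, cols, vals

def to_dense(shape, rows, cols, vals):
    """Rebuild the dense 2D list from a coordinate list."""
    dense = [[0] * shape[1] for _ in range(shape[0])]
    for i, j, value in zip(rows, cols, vals):
        dense[i][j] = value
    return dense

counts = [[0, 3, 0],
          [0, 0, 0],
          [1, 0, 2]]
shape, rows, cols, vals = to_coo(counts)
print(rows, cols, vals)  # [0, 2, 2] [1, 0, 2] [3, 1, 2]
```

For a mostly-zero count matrix, three stored numbers per non-zero entry take far less memory than one number per cell-gene pair; for dense data the extra row and column indices make the coordinate list larger.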

70. Seurat https://github.com/satijalab/seurat

71. Create a Seurat Object · R exercise

72. Expression: Bulk RNA-Seq Definition · In bulk RNA-Seq we learned that counts are not expression. – Some counts belong to sequences that could map to many genes. – Some transcripts are longer than others, so they get sequenced more. – Some samples are more deeply sequenced. – The data are not normally distributed. · Depending on the scRNA-Seq assay, these may be important. · Seurat makes assumptions with its defaults. – More appropriate for 3' assays.

73. Count Preparation is Different Depending on the Source (flowchart): · RSEM / KALLISTO → TPM → log2(x + 1) → Seurat. · RSEM / KALLISTO → correct for sequencing depth (x / column total × 1E5 or 1E6) → log2(x + 1) → Seurat; no transcript length correction.

74. Prepping Counts For Seurat · 3' assays: – Expected by Seurat. – Counts collapsed with UMIs. – Log2 transform (in Seurat). – Account for sequencing depth (in Seurat). · Full transcript sequencing: – Can be used in Seurat. – TPM + 1 transformed counts. – Log2 transform (in Seurat). – Sequencing depth is already accounted for.
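The arithmetic of the two preparation routes can be sketched in Python: depth correction as x / column total × scale followed by log2(x + 1), and TPM from counts and transcript lengths. This follows the formulas on the slides and is not Seurat's own normalization code, whose defaults (log base, scale factor) may differ; a matrix is represented here as one list of gene values per cell (column).

```python
import math

def depth_normalize_log2(cells, scale=1e6):
    """cells: one list of gene counts per cell (a matrix column).
    Returns log2(x / column total * scale + 1) for every entry."""
    out = []
    for column in cells:
        total = sum(column)
        out.append([math.log2(x / total * scale + 1) if total else 0.0 for x in column])
    return out

def tpm(counts, lengths_kb):
    """Transcripts per million for one sample: length-normalize, then rescale to sum to 1e6."""
    rpk = [c / length for c, length in zip(counts, lengths_kb)]
    total_rpk = sum(rpk)
    return [r / total_rpk * 1e6 for r in rpk]

print(depth_normalize_log2([[1, 3]], scale=4))  # [[1.0, 2.0]]
print(tpm([10, 10], [1.0, 2.0]))
```

TPM corrects for both sequencing depth and transcript length, which is why the full-transcript route needs no further depth correction; UMI-based 3' counts skip the length step.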

75. Sometimes Averages are Not Useful Say you were standing with one foot in the oven and one foot in an ice bucket. According to the percentage people, you should be perfectly comfortable. –Bobby Bragan

76. Filtering Genes: Averages are Less Useful

77. Filtering Genes: Using Prevalence

78. Filtering Genes: Using Prevalence
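Why prevalence is more useful than the average can be shown with two toy genes in Python: one expressed highly in a single cell, one expressed modestly in most cells. The threshold (detected in at least 3 cells) and the function name are illustrative assumptions.

```python
def filter_genes_by_prevalence(genes, min_cells):
    """genes: {gene_name: counts across cells}. Keep genes detected (count > 0) in >= min_cells cells."""
    return [name for name, counts in genes.items()
            if sum(c > 0 for c in counts) >= min_cells]

genes = {
    "one_cell_outlier": [0, 0, 0, 0, 100],   # mean 20, but seen in 1 of 5 cells
    "broadly_expressed": [1, 2, 1, 1, 0],    # mean 1, seen in 4 of 5 cells
}
means = {name: sum(c) / len(c) for name, c in genes.items()}
print(means)                                           # the outlier has the larger mean
print(filter_genes_by_prevalence(genes, min_cells=3))  # ['broadly_expressed']
```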

79. Filtering Using Metadata

80. What is Metadata? · Other information that describes your measurements. – Patient information. · Lifestyle (smoking), patient biology (age), comorbidity. – Study information. · Treatment, cage, sequencing site, sequencing date. – Sequence QC on cells. · Useful in filtering.

81. Filtering Cells: Removing Outlier Cells · Bulk RNA-Seq studies often do not remove outlier samples. – scRNA-Seq often removes “failed libraries”. · Outlier cells are not identified by complexity alone; also consider: – Percent reads mapping – Percent mitochondrial reads – Presence of marker genes – Intergenic/exonic rate – 5' or 3' bias – Other metadata … · Useful tools – Picard Tools and RNA-SeQC.
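Two of the outlier metrics above, complexity (genes detected) and percent mitochondrial reads, can be computed per cell as in this Python sketch. It assumes mitochondrial genes are recognized by an “mt-” name prefix (case-insensitive, matching common mouse and human annotations); the cutoffs in passes_qc are hypothetical placeholders to be chosen per study.

```python
def cell_qc(cell_counts, gene_names, mito_prefix="mt-"):
    """Per-cell QC metrics: total counts, genes detected and percent mitochondrial counts."""
    total = sum(cell_counts)
    mito = sum(c for c, g in zip(cell_counts, gene_names) if g.lower().startswith(mito_prefix))
    return {
        "total_counts": total,
        "genes_detected": sum(c > 0 for c in cell_counts),
        "percent_mito": 100.0 * mito / total if total else 0.0,
    }

def passes_qc(qc, min_genes=200, max_genes=2500, max_percent_mito=5.0):
    """Hypothetical cutoffs: too few genes suggests a failed library, too many a possible
    doublet, and a high mitochondrial fraction a possibly damaged cell."""
    return (min_genes <= qc["genes_detected"] <= max_genes
            and qc["percent_mito"] <= max_percent_mito)

gene_names = ["Actb", "Gapdh", "mt-Co1", "mt-Nd1"]
qc = cell_qc([50, 30, 10, 10], gene_names)
print(qc, passes_qc(qc, min_genes=2, max_genes=10))
```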

82. Seurat: Filtering on Metadata · R Exercise

83. Section: Plot Genes

84. Seurat: Viewing Specific Genes · R Exercise

85. Section: Working with Batch Effects

86. Normalization and Batch Effect Correction · The nature of scRNA-Seq assays can make them prone to confounding with batch effects. – Normalization and batch effect correction can help. · Some are moving away from relying on a single method. – Exploring the idea of combining, or selecting from, a collection of normalization or correction methods best suited to a specific study. · Some argue that UMI-based data need not be normalized between samples, given that absolute molecule counts are reported. – Be careful not to remove biological signal; rely on good experimental design (avoid confounding by design).

87. Seurat and Batch Effect Correction · Using linear models, one can regress out covariates. – scale.data holds the (z-scored) residuals after regression. · Dimensionality reduction and clustering. · We use the metadata we have. – One could imagine creating a cell-cycle metadata column.
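Regressing out covariates and z-scoring the residuals, as the slide describes for scale.data, can be sketched with NumPy least squares. This illustrates the idea under the assumption of an ordinary linear model with an intercept per gene; it is not Seurat's implementation, and the covariate here is a toy stand-in for whatever metadata you choose (for example, a cell-cycle score).

```python
import numpy as np

def regress_and_scale(expr, covariates):
    """expr: genes x cells matrix (e.g. log-normalized expression).
    covariates: cells x k matrix of metadata to regress out.
    Returns the residuals of a per-gene linear model, z-scored per gene."""
    n_cells = expr.shape[1]
    design = np.column_stack([np.ones(n_cells), covariates])  # intercept + covariates
    # One least-squares fit per gene, solved jointly: design @ beta ~= expr.T
    beta, *_ = np.linalg.lstsq(design, expr.T, rcond=None)
    residuals = (expr.T - design @ beta).T                     # back to genes x cells
    mean = residuals.mean(axis=1, keepdims=True)
    sd = residuals.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0                                          # avoid dividing constant genes by 0
    return (residuals - mean) / sd

rng = np.random.default_rng(0)
covariate = rng.normal(size=50)                                # toy covariate, one value per cell
expr = np.vstack([3 * covariate + rng.normal(size=50),        # gene driven by the covariate
                  rng.normal(size=50)])                        # gene unrelated to it
scaled = regress_and_scale(expr, covariate.reshape(-1, 1))
print(np.corrcoef(scaled[0], covariate)[0, 1])                 # near 0 after regression
```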

88. Seurat and Batch Effect Correction · R exercise

89. Section: Dimensionality Reduction and Plotting Samples

90. Dimensionality Reduction · Start with many measurements (high dimensional). – Want to reduce to a few features (lower-dimensional space). · One way is to extract features that capture groups of variance. · Another is to preferentially select some of the current features. – We have already done this. · We need this to plot the cells in 2D (or ordinate them). · In scRNA-Seq, PC1 may reflect cell complexity.

91. PCA: Quick Theory · Eigenvectors of the covariance matrix. · Find orthogonal groups of variance. · Given from most to least variance. – Components of variation. – Linear combinations explaining the variance.

92. PCA: an Interactive Example · PCA Explained Visually

93. PCA: in Practice · Things to be aware of: · Data with larger magnitudes will dominate. – Zero-center and divide by SD (standardize). · Can be affected by outliers. · Data are often first filtered to remove noise.
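The theory and practical notes above fit in a short NumPy sketch: standardize each feature, take the eigenvectors of the covariance matrix, and order them from most to least variance. It assumes a cells x genes array and is meant to show the mechanics; analysis packages typically use faster solvers (for example, truncated SVD).

```python
import numpy as np

def pca(data, n_components=2):
    """PCA of a cells x genes array via eigendecomposition of the covariance matrix."""
    X = np.asarray(data, dtype=float)
    X = X - X.mean(axis=0)                       # zero-center each gene
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0                            # leave constant genes unscaled
    X = X / sd                                   # standardize so no gene dominates by magnitude
    cov = np.cov(X, rowvar=False)                # genes x genes covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # reorder: most to least variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components]         # orthogonal directions (linear combinations of genes)
    scores = X @ loadings                        # cell coordinates on each component
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, loadings, explained

rng = np.random.default_rng(0)
signal = rng.normal(size=200)
data = np.column_stack([signal, signal + 0.01 * rng.normal(size=200), rng.normal(size=200)])
scores, loadings, explained = pca(data)
print(explained)
```

Here two of the three toy genes carry the same signal, so PC1 should capture about two thirds of the standardized variance.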

94. t-SNE: Nonlinear Dimensional Reduction

95. t-SNE: Collapsing the Visualization to 2D

96. t-SNE: How it works.
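For reference, these are the quantities t-SNE optimizes, as defined by van der Maaten and Hinton (JMLR 2008). Here x_i are the N cells in the input space (for example, their PCA coordinates) and y_i their positions in the 2D map: Gaussian affinities in the input space, with each bandwidth σ_i tuned so the distribution P_i matches a user-chosen perplexity; heavy-tailed Student-t affinities in the map; and a Kullback-Leibler cost minimized by gradient descent over the y_i.

```latex
\begin{aligned}
p_{j\mid i} &= \frac{\exp\!\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
                    {\sum_{k \neq i} \exp\!\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)},
\qquad
p_{ij} = \frac{p_{j\mid i} + p_{i\mid j}}{2N} \\
\mathrm{Perp}(P_i) &= 2^{H(P_i)}, \qquad H(P_i) = -\sum_{j} p_{j\mid i} \log_2 p_{j\mid i} \\
q_{ij} &= \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
               {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}} \\
C &= \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
\end{aligned}
```

Because q_ij has heavy tails, moderately distant cells can be placed far apart in the map, so cluster sizes and between-cluster distances in a t-SNE plot should be read with caution.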

97. PCA and t-SNE Together · Often t-SNE is performed on PCA components. – A liberal number of components. – Removes mild signal (assumed to be noise). – Faster, on less data, but hopefully with the same signal.

98. Learn More About t-SNE · Awesome blog on t-SNE parameterization – http:///2016/misread-tsne · Publication – https://lvdmaaten./publications/papers/JMLR_2008.pdf · Nice YouTube video – https://www./watch?v=RJVL80Gg3lA · Code – https://lvdmaaten./tsne/ · Interactive TensorFlow projector – http://projector./

99. Plotting Cells

100. Plotting Cells and Gene Expression · R exercise.

101. Defining Clusters through Graphs · Smart Local Moving (SLM) algorithm for community (cluster) detection in large networks. – Can be applied to tens of millions of cells and hundreds of millions of relationships. – Evolved from the Louvain algorithm. · http://www./slm/

102. Local Moving Heuristic (figure: example network with nodes 1-7)
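The local moving idea can be sketched in Python on a tiny unweighted graph: start with every node in its own cluster, then repeatedly move each node to the neighboring cluster that most increases modularity, until no move helps. This is a simplified illustration of the local moving step shared by Louvain-style methods; it recomputes modularity from scratch, skips the aggregation and refinement phases, and is neither the SLM implementation nor Seurat's.

```python
from collections import defaultdict

def modularity(adj, labels):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())   # 2m = sum of all degrees
    internal = defaultdict(int)                        # within-cluster edge ends (each edge counted twice)
    degree_sum = defaultdict(int)                      # total degree per cluster
    for u, nbrs in adj.items():
        degree_sum[labels[u]] += len(nbrs)
        for v in nbrs:
            if labels[u] == labels[v]:
                internal[labels[u]] += 1
    return sum(internal[c] / two_m - (degree_sum[c] / two_m) ** 2 for c in degree_sum)

def local_moving(adj):
    """Greedily move single nodes between neighboring clusters while modularity improves."""
    labels = {node: node for node in adj}              # every node starts in its own cluster
    improved = True
    while improved:
        improved = False
        for node in sorted(adj):
            current = labels[node]
            best_label, best_q = current, modularity(adj, labels)
            for cand in sorted({labels[nbr] for nbr in adj[node]} - {current}):
                labels[node] = cand                    # try the move...
                q = modularity(adj, labels)
                labels[node] = current                 # ...and undo it
                if q > best_q + 1e-12:
                    best_label, best_q = cand, q
            if best_label != current:
                labels[node] = best_label
                improved = True
    return labels

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]  # two triangles joined by one edge
adj = defaultdict(set)
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
labels = local_moving(adj)
print(labels, modularity(adj, labels))
```

Each accepted move strictly increases modularity, so the loop terminates; on this toy graph it separates the two triangles.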

103. Section Summary · Dimensionality reduction helps reduce data while hopefully keeping important signal. – t-SNE on PCA is often used in analysis. · Created several types of plots often seen in publications. – Plotting genes (through subgroups). – Ordinating cells in t-SNE space. – Heat maps of genes associated with principal components. – Plotting metadata on projections of the data is an important QC tool. · Clusters of cells are currently defined through a graph, separately from the ordination (t-SNE / PCA).

104. Section: Differential Expression

105. Seurat: Differential Expression · By default, one cluster is tested against all other cells (many tests). – Specify ident.2 to test between two clusters. · Gain speed by excluding tests. – min.pct: controls for sparsity (minimum percentage of cells expressing the gene in a group). – thresh.test: genes must show at least this difference in averages.

106. Seurat: Many Choices for DE · bimod – Tests differences in mean and proportions. · roc – Uses an AUC-like definition of separation. · t – Student's t-test. · tobit – Tobit regression on smoothed data.
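The roc option and the min.pct-style prefilter from the previous slide can be illustrated together in Python. This sketch computes a plain AUC (the probability that a random in-cluster cell has higher expression than a random out-of-cluster cell, ties counting one half) for one cluster versus all other cells. It is not Seurat's implementation, and the input format and function names are hypothetical.

```python
def detection_fraction(values):
    """Fraction of cells with non-zero expression."""
    return sum(v > 0 for v in values) / len(values)

def auc(group1, group2):
    """P(random group1 value > random group2 value), counting ties as 1/2."""
    wins = 0.0
    for a in group1:
        for b in group2:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group1) * len(group2))

def roc_markers(expr_by_gene, in_cluster, min_pct=0.1):
    """expr_by_gene: {gene: expression per cell}; in_cluster: one bool per cell.
    Returns {gene: AUC} for genes passing a min.pct-style prevalence prefilter."""
    results = {}
    for gene, values in expr_by_gene.items():
        inside = [v for v, flag in zip(values, in_cluster) if flag]
        outside = [v for v, flag in zip(values, in_cluster) if not flag]
        # Skip genes detected in fewer than min_pct of the cells of both groups (saves tests).
        if max(detection_fraction(inside), detection_fraction(outside)) < min_pct:
            continue
        results[gene] = auc(inside, outside)
    return results

expr = {
    "marker": [5, 6, 7, 0, 0, 0],
    "never_detected": [0, 0, 0, 0, 0, 0],
    "flat": [1, 1, 1, 1, 1, 1],
}
in_cluster = [True, True, True, False, False, False]
print(roc_markers(expr, in_cluster))  # {'marker': 1.0, 'flat': 0.5}
```

An AUC near 1 marks a gene higher in the cluster, near 0 a gene lower in it, and 0.5 no separation.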

107. Seurat: DE and Plotting DE Genes · R Exercise.

108. Dot Plots · Size of circle: gene prevalence in the cluster. · Color of circle: more red, more expressed in the cluster. · Scales well with many cells. (figure legend: sparse vs. prevalent genes; lowly vs. highly expressed; very specific)
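The two quantities a dot plot encodes can be computed per gene and cluster as below. This Python sketch assumes that circle size is the fraction of cells in the cluster with non-zero expression and circle color is the mean expression over all cells in the cluster; plotting packages may scale or average differently, and the gene name is a placeholder.

```python
def dot_plot_stats(expr_by_gene, clusters):
    """Returns {(gene, cluster): (fraction_expressing, mean_expression)} for every gene and cluster.
    fraction_expressing -> circle size; mean_expression -> circle color."""
    stats = {}
    for gene, values in expr_by_gene.items():
        for label in sorted(set(clusters)):
            in_cluster = [v for v, c in zip(values, clusters) if c == label]
            fraction = sum(v > 0 for v in in_cluster) / len(in_cluster)
            mean = sum(in_cluster) / len(in_cluster)
            stats[(gene, label)] = (fraction, mean)
    return stats

stats = dot_plot_stats({"Gene1": [0, 2, 4, 0]}, ["a", "a", "b", "b"])
print(stats)  # {('Gene1', 'a'): (0.5, 1.0), ('Gene1', 'b'): (0.5, 2.0)}
```

Each circle summarizes a whole cluster, so the plot grows with genes × clusters rather than with the number of cells, which is why it scales well.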

109. MAST · Uses a hurdle model. – A two-part generalized linear model addressing both the rate of expression (prevalence) and the level of expression. – Because it is a GLM, covariates can be used to control for unwanted signal. · CDR: cellular detection rate. – Cellular complexity. – Values below a threshold are treated as 0. · Additionally introduces a GSEA method. · https://github.com/RGLab/MAST

110. MAST: Hurdle Models · Logistic regression (prevalence) · Gaussian linear model (expression level)
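The two parts of a hurdle model, plus the cellular detection rate from the previous slide, can be sketched for a two-group comparison without covariates. In that special case both parts have closed-form maximum-likelihood fits: the logistic part reduces to comparing detection proportions, and the Gaussian part to comparing means of the non-zero values. The Python sketch sums the two likelihood-ratio statistics and compares the total with a chi-square distribution on 2 degrees of freedom. It assumes no covariates (not even CDR) and treats non-zero values as already log-scale; it is an illustration, not MAST's implementation.

```python
import math

def cellular_detection_rate(cell_values):
    """CDR for one cell: fraction of genes with non-zero expression."""
    return sum(v > 0 for v in cell_values) / len(cell_values)

def _bernoulli_loglik(k, n):
    """Maximized log-likelihood of k detections in n cells (0 * log 0 taken as 0)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / n)
    if n - k > 0:
        ll += (n - k) * math.log((n - k) / n)
    return ll

def _rss(values):
    """Residual sum of squares around the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def hurdle_lrt(group_a, group_b):
    """Two-part likelihood-ratio test for one gene between two groups of cells.
    Returns (statistic, p_value)."""
    pos_a = [v for v in group_a if v > 0]
    pos_b = [v for v in group_b if v > 0]
    if not pos_a or not pos_b:
        raise ValueError("both groups need non-zero values")
    # Discrete part: is the gene detected? (separate vs. shared detection probability)
    k_a, k_b, n_a, n_b = len(pos_a), len(pos_b), len(group_a), len(group_b)
    lrt_discrete = 2 * (_bernoulli_loglik(k_a, n_a) + _bernoulli_loglik(k_b, n_b)
                        - _bernoulli_loglik(k_a + k_b, n_a + n_b))
    # Continuous part: Gaussian model on the non-zero values (separate vs. shared mean)
    rss_separate = _rss(pos_a) + _rss(pos_b)
    if rss_separate == 0:
        raise ValueError("non-zero values have no spread")
    lrt_continuous = (k_a + k_b) * math.log(_rss(pos_a + pos_b) / rss_separate)
    statistic = lrt_discrete + lrt_continuous
    p_value = min(1.0, math.exp(-statistic / 2))  # chi-square survival function, 2 df
    return statistic, p_value

stat, p = hurdle_lrt([0, 5, 6, 7, 8], [0, 0, 0, 1, 2])
print(stat, p)
print(cellular_detection_rate([0, 3, 0, 1]))  # 0.5
```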

111. MAST: DE and Plotting DE Genes · R Exercise.

112. Section: Communicating Results to Collaborators · Designing a study. · Writing a grant. · Performing experiments. · Collecting data. · Running sequencing pipelines. · Performing some preliminary analysis. · Sharing ideas with private collaborators. · Refining analysis. · Completing a paper. · Sharing analysis publicly.

113. The Single Cell Portal https://portals./single_cell

114. The Single Cell Portal Study Descriptions Can Be Created

115. The Single Cell Portal Data Can Be Shared

116. The Single Cell Portal One Can Interact with Cell Clusters

117. The Single Cell Portal Gene Expression Can be Viewed Across Clusters

118. The Single Cell Portal Gene Expression Can be Viewed Across Clusters

119. The Single Cell Portal Multiple Clusterings Can Be Used

120. The Single Cell Portal Genes Can Be Viewed in Many Clusters

121. The Single Cell Portal Expression Can Be Shown in Many Clusterings

122. The Single Cell Portal Expression in Clusters Can Also Be Shown as Heatmaps

123. The Single Cell Portal · Studies can be … – Private – Private but shared privately – Public but with data inaccessible – Public

124. Section: Wrapping Up · What Did We Miss (So Much)? (figure: “We covered this” vs. “So much more to learn!”)

125. Awesome List https://github.com/seandavi/awesome-single-cell

126. Single Cell Network www.singlecellnetwork.org

127. Thank You · Aviv Regev, Brian Haas, Adam Haber, Anindita Basu, Asma Bankapur, Chloe Villani, Karthik Shekhar, Kristine Schwenck, Matan Hofree, Michel Cole, Monika Kowalczyk, Nir Yosef, Sean Simmons, Regev Single Cell Working Group, Today's Attendees
