START: Shiny Transcriptome Analysis Resource Tool

Getting Started with START

Features
Data Formats
Save Data for Future Upload
More Help

The START app allows users to visualize RNA-seq data starting with count data.

Explore the app's features with the pre-loaded example data set by clicking on the tabs above.
Upload your data in the “Input Data” tab.

Features

[Example plots: PCA plot, box plot, heatmap, volcano plot]

Visualize your data:

clustering (PCA plots, heatmaps)
group comparisons (scatterplots, volcano plots)
gene-level boxplots of expression values

Data Format

Must be a comma-separated values (.csv) file (you may export one from Excel).
File must have a header row.
First/Left-hand column(s) must be gene identifiers.
Format expression column names as GROUPNAME_REPLICATE#, e.g. Treat_1, Treat_2, Treat_3, Control_1, Control_2, High_1, High_2

Count or Expression Data

Each row denotes a gene, each column denotes a sample.

Analyzed Data

Each row denotes a gene, each column denotes a sample.
Additional columns provide fold changes and p-values.

TIP: Save Data for Future Upload

After uploading your data to START, click the red button to download an .RData file that lets you reload your data into START with one click.

Next time, use the “Input Data” tab –> “START RData file” option.

More Help and Info

Additional help information and more detailed instructions are provided under the “Instructions” tab.

App Info

The START app has been developed by Jessica Minnier, Jiri Sklenar, Anthony Paul Barnes, and Jonathan Nelson of Oregon Health & Science University, Knight Cardiovascular Institute and School of Public Health.

Please cite our app:

Nelson JW, Sklenar J, Barnes AP, Minnier J. (2016) “The START App: A Web-Based RNAseq Analysis and Visualization Resource.” Bioinformatics. doi: 10.1093/bioinformatics/btw624.

The source code of START is available on GitHub.

We would appreciate reports of any issues with the app, either via the issues option on GitHub or by emailing start.app.help-at-gmail.com.

Instructions

Input Data
Data Formats
Save Data For Future START Sessions
Visualizations
PCA Plots
Analysis Plots
Volcano Plots
Scatterplots
Gene Expression Boxplots
Heatmaps

Instructions

The app is hosted on the website: https://kcvi./START/

Code can be found on GitHub: https://github.com/jminnier/STARTapp

To run this app locally on your machine, download R or RStudio and run the following commands once to set up the environment:

install.packages(c("reshape2","ggplot2","ggthemes","gplots","ggvis","dplyr","tidyr","DT",
                   "RColorBrewer","pheatmap","shinyBS","plotly","markdown","NMF","scales","heatmaply"))
## try http:// if https:// URLs are not supported
source("https:///biocLite.R")
biocLite(c("limma","edgeR"))
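Bioconductor has since retired the `biocLite()` installer in favor of the BiocManager package. The following is a minimal sketch of an equivalent setup step for a recent R installation. The wrapper name `install_start_dependencies` is hypothetical. The `shiny` line is an assumption, added because `shiny::runGitHub()` needs that package. The CRAN packages from the `install.packages()` call above still need to be installed separately.

```r
# Hypothetical helper: installs the Bioconductor packages used by START
# on R versions where biocLite() is no longer available.
install_start_dependencies <- function() {
  # BiocManager is the current Bioconductor installer, distributed via CRAN
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(c("limma", "edgeR"))

  # Assumption: shiny is not yet installed; runGitHub() requires it
  if (!requireNamespace("shiny", quietly = TRUE)) {
    install.packages("shiny")
  }
  invisible(TRUE)
}
```

Call `install_start_dependencies()` once, in the same way as the setup commands above.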

You may now run the shiny app with just one command in R:

shiny::runGitHub("STARTapp", "jminnier")

Input Data

You may use this app by:

Exploring the pre-loaded example data set, a mouse RNA-seq data set provided for exploring the app's features.
Uploading your own data, which may be either (i) count data (or log2-expression data) or (ii) analyzed data, i.e. expression data plus p-values and fold changes.
Uploading an .RData file containing your data that was previously downloaded from a START app session.

Data Format

Must be a comma-separated values (.csv) file (you may export one from Excel).
File must have a header row.
First/Left-hand column(s) must be gene identifiers.
Format expression column names as GROUPNAME_REPLICATE#, e.g. Treat_1, Treat_2, Treat_3, Control_1, Control_2, High_1, High_2
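As a rough illustration of the naming convention above, the following base R sketch builds a small table in this layout and splits the sample column names into group and replicate. The toy values, the column name `gene_id`, and the helper `parse_sample_names` are hypothetical and are not taken from the START source code.

```r
# Toy count table: the first column holds gene identifiers,
# the remaining columns are samples named GROUPNAME_REPLICATE#
counts <- data.frame(
  gene_id   = c("GeneA", "GeneB", "GeneC"),
  Treat_1   = c(10, 0, 250),
  Treat_2   = c(12, 1, 240),
  Control_1 = c(30, 0, 100),
  Control_2 = c(28, 2, 110)
)

# Hypothetical helper: split each sample name at its last underscore
parse_sample_names <- function(sample_names) {
  data.frame(
    sample    = sample_names,
    group     = sub("_[^_]*$", "", sample_names),  # text before the last "_"
    replicate = sub("^.*_", "", sample_names),     # text after the last "_"
    stringsAsFactors = FALSE
  )
}

sample_info <- parse_sample_names(names(counts)[-1])
print(sample_info)
```

A table like `counts` could then be written with `write.csv(counts, "my_counts.csv", row.names = FALSE)` to obtain a file with a header row (the file name is arbitrary).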

Count or Expression Data

Each row denotes a gene, each column denotes a sample.

Count data contains read counts for each gene for each sample, along with gene identifiers.

Analysis: When raw counts are uploaded, the app analyzes the data. It uses the voom method from the ‘limma’ Bioconductor package to transform the raw counts into log-transformed, normalized intensity values. These values are then analyzed via linear regression, where gene intensity is regressed on the group factor. P-values are computed from all pairwise regression tests for a group effect, and Benjamini-Hochberg false discovery rate adjusted p-values are computed for each pairwise comparison. The “log2cpm” values are the log2 counts-per-million values. The “log2cpm_voom” values are the normalized log2-CPM values from the voom method. Both use an offset of 0.5: 0.5 is added to all count values before normalizing (in the case of voom) and log transforming, so that zero counts yield finite values.
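To make the effect of the 0.5 offset concrete, here is a minimal base R sketch of a log2 counts-per-million calculation. It assumes the voom-style formula log2((count + 0.5) / (library size + 1) * 1e6). The exact code the app uses may differ, and the toy counts are invented for illustration.

```r
# Toy raw counts: rows are genes, columns are samples (filled column by column)
counts <- matrix(
  c(0, 5, 100,
    3, 0, 120,
    1, 8,  90),
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"),
                  c("Treat_1", "Treat_2", "Control_1"))
)

# Library size = total number of reads per sample
lib_size <- colSums(counts)

# Assumed voom-style formula; the 0.5 offset keeps zero counts finite
log2cpm <- log2(sweep(counts + 0.5, 2, lib_size + 1, "/") * 1e6)

print(round(log2cpm, 2))
```

Without the offset, the zero counts in this table would map to `-Inf` after the log transform.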

Example file: https://github.com/jminnier/STARTapp/blob/master/data/examplecounts_short.csv

Analyzed Data

Each row denotes a gene, each column denotes a sample.
Additional columns provide fold changes and p-values.

Analyzed data must contain some kind of expression measure for each sample (e.g. counts, normalized intensities, or CPMs), and a set of p-values with corresponding fold changes. For instance, if you have a p-value for the comparison of group1 vs group2, you can upload the observed fold change or log2(fold change) between group1 and group2. If you have a more complex design and do not have fold changes readily available, you may upload the test statistics or other similar measures of effect size as placeholders. The fold changes are mainly used in the volcano plots. We recommend uploading p-values that are adjusted for multiple comparisons (such as q-values from the qvalue package, or adjusted p-values from the p.adjust() function in R).
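As a small illustration of the recommendation above, this sketch adjusts a set of p-values with base R's `p.adjust()` using the Benjamini-Hochberg method. The data frame and its column names (`logFC`, `pval`, `adj_pval`) are hypothetical. They are not the names the app requires.

```r
# Hypothetical results for one comparison (e.g. Treat vs Control)
results <- data.frame(
  gene_id = c("GeneA", "GeneB", "GeneC", "GeneD"),
  logFC   = c(2.1, -0.3, 1.5, -2.4),
  pval    = c(0.001, 0.60, 0.02, 0.004)
)

# Benjamini-Hochberg false discovery rate adjustment
results$adj_pval <- p.adjust(results$pval, method = "BH")

print(results)
```

The adjusted values are never smaller than the raw p-values, so filtering on them is the more conservative choice.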

Example file: https://github.com/jminnier/STARTapp/blob/master/data/exampleanalysisres_short.csv

TIP: Save Data for Future Upload

After submitting a raw data or analyzed data file, you may download the .csv file with the analysis results for your own use (or to upload later as “analyzed data”). More conveniently, click the button “Save Results as RData File for Future Upload”. You can then quickly upload your data to the START app in the future with one click, under the “RData from previous START upload” option.

After uploading your data to START, click the red button to download an .RData file that lets you reload your data into START with one click.

Next time, use the “Input Data” tab –> “START RData file” option.

Visualizations

Group Plots

PCA Plot

This plot uses Principal Component Analysis (PCA) to calculate the principal components of the expression data using data from all genes. Euclidean distances between expression values are used. Samples are projected onto the first two principal components (PCs), and the percent variance explained by each of those PCs is displayed along the x and y axes. Ideally, your samples will cluster by group identifier.
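The projection described above can be sketched with base R's `prcomp()`. This is an illustration on simulated data, not the app's own code. The centering and scaling choices here are assumptions, and the toy matrix is invented.

```r
set.seed(1)

# Toy log-expression matrix: 50 genes (rows) x 6 samples (columns)
expr <- matrix(
  rnorm(50 * 6), nrow = 50,
  dimnames = list(paste0("Gene", 1:50),
                  c("Treat_1", "Treat_2", "Treat_3",
                    "Control_1", "Control_2", "Control_3"))
)
# Shift the Treat samples in the first 10 genes so the groups differ
expr[1:10, 1:3] <- expr[1:10, 1:3] + 3

# prcomp() expects observations in rows, so transpose to samples x genes
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Percent of total variance explained by each principal component
pct_var <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first two PCs, with group taken from the name
pc_scores <- data.frame(
  sample = rownames(pca$x),
  group  = sub("_[^_]*$", "", rownames(pca$x)),
  PC1    = pca$x[, 1],
  PC2    = pca$x[, 2]
)

print(pc_scores)
print(round(pct_var[1:2], 1))
```

Plotting `PC1` against `PC2` and coloring by `group` gives a plot of the same kind, with `pct_var[1]` and `pct_var[2]` as the axis annotations.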

Sample Distance Heatmap

This plot displays unsupervised clustering of the Euclidean distances between samples using data from all genes. As with the PCA plot, your samples should ideally cluster by group.
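A minimal sketch of the underlying calculation, using base R's `dist()` and `hclust()` on invented values. The linkage method (R's default, complete linkage) is an assumption and may not match the app.

```r
# Toy log-expression values: rows are genes, columns are samples
expr <- cbind(
  Treat_1   = c(8, 8, 2, 2),
  Treat_2   = c(9, 7, 2, 3),
  Control_1 = c(2, 3, 8, 8),
  Control_2 = c(3, 2, 9, 7)
)
rownames(expr) <- paste0("Gene", 1:4)

# dist() works on rows, so transpose to get sample-to-sample distances
sample_dist <- dist(t(expr), method = "euclidean")

# Unsupervised hierarchical clustering of the samples
sample_tree <- hclust(sample_dist)

# Cut the tree into two clusters and compare with the group labels
clusters <- cutree(sample_tree, k = 2)

print(round(as.matrix(sample_dist), 2))
print(clusters)
```

In this toy example the two Treat samples fall in one cluster and the two Control samples in the other, which is the pattern to look for in the heatmap.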

Analysis Plots

These plots use the p-values and fold changes to visualize your data.

Volcano Plot

This is a scatter plot of log fold changes against -log10(p-values), so that genes with the largest fold changes and smallest p-values are shown on the extreme top left and top right of the plot. Hover over points to see which gene is represented by each point. (https://en./wiki/Volcano_plot_(statistics))
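The coordinates of such a plot can be sketched in a few lines of base R. The results table and the highlighting cutoffs (absolute log fold change of at least 1, p-value of at most 0.05) are hypothetical choices for illustration, not the app's defaults.

```r
# Hypothetical analysis results for one comparison
results <- data.frame(
  gene_id = paste0("Gene", 1:6),
  logFC   = c(-3.0, -0.2, 0.1, 2.5, 1.2, -1.8),
  pval    = c(1e-6, 0.40, 0.90, 1e-4, 0.03, 0.002)
)

# y-axis of the volcano plot: small p-values become large values
results$neg_log10_p <- -log10(results$pval)

# Hypothetical cutoffs used only to color the points
results$highlight <- abs(results$logFC) >= 1 & results$pval <= 0.05

# Draw the plot only in an interactive session
if (interactive()) {
  plot(results$logFC, results$neg_log10_p,
       col = ifelse(results$highlight, "red", "grey40"), pch = 19,
       xlab = "log2 fold change", ylab = "-log10(p-value)")
}

print(results)
```

Genes with large negative fold changes and small p-values land in the top left, and those with large positive fold changes land in the top right.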

Scatter Plot

This is a scatter plot of average gene expression in one group against another group. This allows the viewer to observe which genes have the largest differences between two groups. Genes with the smallest differences lie along the diagonal line, and points far from the diagonal show the largest differences. Hover over points to see which gene is represented by each point.
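As an illustration, the per-group averages behind such a plot could be computed as follows in base R. The expression values are invented, and the group labels are derived from the GROUPNAME_REPLICATE# column names.

```r
# Toy log-expression values: rows are genes, columns are samples
expr <- cbind(
  Treat_1   = c(5, 2, 9),
  Treat_2   = c(7, 2, 11),
  Control_1 = c(6, 8, 3),
  Control_2 = c(6, 6, 5)
)
rownames(expr) <- c("GeneA", "GeneB", "GeneC")

# Group label = column name without the trailing replicate number
groups <- sub("_[^_]*$", "", colnames(expr))

# Average expression of each gene within each group
group_means <- sapply(unique(groups), function(g) {
  rowMeans(expr[, groups == g, drop = FALSE])
})

# Vertical distance from the diagonal (x = Control mean, y = Treat mean)
diff_from_diag <- group_means[, "Treat"] - group_means[, "Control"]

print(group_means)
print(diff_from_diag)
```

Here GeneA sits on the diagonal (difference 0), while GeneB and GeneC fall below and above it.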

Gene Expression Boxplot

Use the search bar to look up genes in your data set. For the selected gene(s), a stripchart (dotplot) and boxplots of the expression values are presented for each group. You may plot one or multiple genes alongside each other. Hover over points for more information about the data.

Heatmap

A heatmap of expression values is shown, with genes and samples arranged by unsupervised clustering. You may filter on test results as well as p-value cutoffs. By default, the top 100 genes (those with the lowest p-values) are shown.
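The gene selection step can be sketched as follows. This is a base R illustration with invented values and `top_n` set to 3 so that the toy example stays small. The app's own default is 100, and its heatmap is not necessarily drawn with `heatmap()`.

```r
# Toy log-expression values: rows are genes, columns are samples
expr <- matrix(
  c(2, 3, 8, 9,
    9, 8, 2, 1,
    5, 6, 5, 4,
    1, 2, 6, 7,
    7, 9, 3, 2,
    4, 4, 5, 6),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("Gene", LETTERS[1:6]),
                  c("Treat_1", "Treat_2", "Control_1", "Control_2"))
)

# Hypothetical p-values, one per gene
pvals <- c(GeneA = 0.20, GeneB = 0.001, GeneC = 0.04,
           GeneD = 0.50, GeneE = 0.0005, GeneF = 0.03)

# Keep the top_n genes with the lowest p-values
top_n <- 3
top_genes <- names(sort(pvals))[seq_len(min(top_n, length(pvals)))]
top_expr <- expr[top_genes, , drop = FALSE]

# heatmap() clusters rows and columns; draw only in an interactive session
if (interactive()) {
  heatmap(top_expr, scale = "row", margins = c(8, 6))
}

print(top_expr)
```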

Shinyapps.io Terms & Conditions

Terms of Use

If you have concerns about the terms of use for this web hosted application, please run the app locally on your computer. See the “Instructions” tab for more information on this.

Shinyapps.io Terms of Use

This application is hosted on a Shinyapps.io server https://www./.

By using this app you are agreeing to the terms of use as described by Shinyapps.io: https://www./about/shinyapps-terms-use/

We (the authors and maintainers of this app) will not save your data on our servers. However, as the Shinyapps server is not HIPAA compliant, you must refrain from uploading protected health information or confidential data with this app. You may instead download the code and run the app locally on your private computer and network (see above). We are not responsible for the confidentiality, availability, security, loss, misuse or misappropriation of any data you submit to this application.

From the Shinyapps terms of use (https://www./about/shinyapps-terms-use/): “If you choose to upload data to an application you are using via the RStudio Service, you acknowledge and agree you are giving certain legal rights to the licensor of the application to process and otherwise use your data. Please carefully review any license terms accompanying any application to which you submit your data for the legal rights which you are giving the application licensor. Further, RStudio does not claim ownership of your data; however, you hereby grant RStudio a worldwide, perpetual, irrevocable, royalty-free, fully paid up, transferable and non-exclusive license, as applicable, to use and copy your data in connection with making your data available to the application (and the licensor of such application) to which you have submitted your data. You acknowledge and agree that even if you remove your data from the RStudio Service, your data may have been downloaded by, and remain accessible to, the licensors of those applications to which you submitted your data. Accordingly, do not submit data which you desire to remain confidential or which you wish to limit the right to access or use. You should never submit to the RStudio Service any data which consists of personally identifiable information, credit card information, or protected health information, as such terms are defined by relevant laws, rules, and regulations. RSTUDIO IS NOT RESPONSIBLE FOR THE CONFIDENTIALITY, AVAILABILITY, SECURITY, LOSS, MISUSE OR MISAPPROPRIATION OF ANY DATA YOU SUBMIT TO THE RSTUDIO SERVICE OR ANY APPLICATION MADE AVAILABLE VIA THE RSTUDIO SERVICE.”
