In addition, HTML files are provided that link each microRNA declared significant to its entry in the Sanger miRBase (http://microrna./).
1.1 Characteristics of the example data
The data come from human mesenchymal stem cells obtained from bone marrow. 100 ng of each RNA sample was hybridized onto the Agilent Human microRNA Microarray v2.0 (G4470B, Agilent Technologies).
The Human microRNA Microarray v2.0 contains 723 human and 76 human viral microRNAs, each of them replicated 16 times. There are 362 microRNAs interrogated by 2 different oligonucleotides, 45 microRNAs by 3 and 390 microRNAs by 4 different oligonucleotides. Only 2 microRNAs are interrogated by a single oligonucleotide. The array also contains a set of positive and negative controls, which are replicated a varying number of times.
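microRNA from human bone marrow mesenchymal stem cells;
three treatment conditions (A, B, C), with 2 replicates per experimental condition;
the two treatments (MSC_B and MSC_C) are compared with the control, MSC_A.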
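The two comparisons against the control can be sketched as differences of mean log2 signal. This is only an illustration in plain Python, not AgiMicroRna or limma code; the signal values and the helper name `log2_fold_change` are hypothetical.

```python
from statistics import mean

# Hypothetical log2 signals for one microRNA: 2 replicates per condition.
samples = {
    "MSC_A": [6.0, 6.2],  # control
    "MSC_B": [7.1, 6.9],
    "MSC_C": [5.9, 6.1],
}


def log2_fold_change(signals, treatment, control="MSC_A"):
    """Difference of mean log2 signal: treatment minus control."""
    return mean(signals[treatment]) - mean(signals[control])


# One contrast per treatment, each against the control MSC_A.
contrasts = {
    f"{t}-MSC_A": log2_fold_change(samples, t) for t in ("MSC_B", "MSC_C")
}
```

A positive value means the microRNA is higher in the treatment than in MSC_A, a negative value means it is lower.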
1.2 Processing workflow for the example data
Prepare the data and visualize it (data inspection)
The signal is normalized between arrays;
The signal is background corrected using the exponential + normal convolution model;
the background-corrected signal is normalized between arrays;
the total gene signal is estimated from a linear model that takes the probe effect into account. The model estimates are obtained using a robust method such as median polish.
Obtaining the Total microRNA Gene Signal processed by AFE;
normalization between arrays.
Compute the TGS (Total Gene Signal) with the AFE algorithm
Estimate the gene signal with the RMA algorithm
Quality control and export to an ExpressionSet
After the normalized total gene signal has been obtained, some genes are removed from the analysis based on the quality flags that AFE attaches to each feature.
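As a rough illustration of flag-based filtering, the sketch below keeps a gene only if its detection flag is set on a minimum fraction of arrays. The threshold rule, the function name `filter_detected` and the gene names are assumptions made for this example; they are not the package's actual filtering criteria.

```python
def filter_detected(flags_by_gene, min_fraction=0.5):
    """Keep genes flagged as detected on at least min_fraction of arrays.

    flags_by_gene maps a gene name to a list of 0/1 detection flags,
    one per array. The threshold rule is an assumption for illustration.
    """
    kept = []
    for gene, flags in flags_by_gene.items():
        if sum(flags) / len(flags) >= min_fraction:
            kept.append(gene)
    return kept


# Hypothetical detection flags for 3 genes on 6 arrays.
flags = {
    "gene_1": [1, 1, 1, 1, 1, 1],
    "gene_2": [0, 0, 1, 0, 0, 0],
    "gene_3": [1, 1, 1, 0, 0, 0],
}
kept = filter_detected(flags)
```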
Differential analysis: differential expression is assessed using the linear model features of limma.
The results include the M value, moderated t and F statistics, p values, FDR, etc.
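To make the FDR column more concrete, here is a minimal sketch of the Benjamini-Hochberg adjustment. The notes do not say which FDR method is used, so Benjamini-Hochberg is an assumption here, and the function is a plain-Python illustration rather than limma's implementation.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    n = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

Each raw p value is scaled by n / rank, and the running minimum ensures that a smaller raw p value never receives a larger adjusted value.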
Columns required for data preprocessing and differential expression analysis: gTotalGeneSignal, gMeanSignal, gIsGeneDetected, ControlType, ProbeName, and GeneName.
The full AFE algorithm includes the following. Typically it contains the Multiplicatively Detrended BackgroundSubtracted Signal or the BackgroundSubtractedSignal.
Meaning: the TotalProbeSignal times the number of probes per gene. The gTotalProbeSignal is the robust average of all the processed signals for each replicated probe, multiplied by the total number of probe replicates. These signals are used by the AFE algorithms to estimate the gTotalGeneSignal.
The ProbeName is an Agilent-assigned identifier for the probe synthesized on the microarray.
The GeneName is an identifier for the gene for which the probe provides expression information.
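4. Plotting: data visualization
AgiMicroRna can draw boxplots, density plots, MA plots, Relative Log Expression (RLE) plots and a hierarchical cluster of samples directly from gMeanSignal. These plots help assess data quality and check the performance of the processing steps. The package also provides a one-step plotting function: qcPlots
4.1 Boxplot and density plot
Both are drawn from log2(dd.micro$meanS); the density is highest, i.e. the features are most numerous, at values of 5-6.
[ps] The density plot can be viewed as another representation of the boxplot;
4.2 MA plot
The MA plots show the fold change (M) on the y-axis against the average log expression (A) on the x-axis for two given arrays.
M is the y-axis, M = x - y; A is the x-axis, A = (x + y)/2;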
y = apply(uRNAList$meanS, 1, median)
x = uRNAList$meanS[, i]
Variables used for this plot:
dd.micro$meanS;
dd.microGeneName (miRNA ID)
dd.microControlType (feature class)
The signal values of the reference array are computed as the median of each spot taken over the whole set of arrays. Every kind of feature is identified with a different color. The package defines this reference array itself (the y values) and compares each of the other arrays (the x values) against it.
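The M and A values against the median reference can be computed as in the sketch below. It restates the x, y, M and A definitions of this section in plain Python and assumes the signals are already log2 transformed; the function name `ma_values` and the nested-list layout are choices made for this example, not part of AgiMicroRna.

```python
from statistics import median


def ma_values(signal, i):
    """M and A values for array i against the per-feature median reference.

    signal is a list of rows (features); each row holds one log2 signal
    per array.
    """
    m_values, a_values = [], []
    for row in signal:
        y = median(row)  # reference: median of this feature across arrays
        x = row[i]       # signal of this feature on array i
        m_values.append(x - y)
        a_values.append((x + y) / 2)
    return m_values, a_values
```

Plotting the M values against the A values gives the MA plot for array i; points far from M = 0 are features on which this array departs from the reference.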
4.3 RLE plot
The RLE plot shows, for each sample, a boxplot of the Relative Log Expression (RLE).
The RLE is computed for every spot in the array as the difference between the spot and the median of the same spot across all the arrays.
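The RLE computation described here can be sketched in a few lines. This is a plain-Python illustration that assumes log2 signals stored as rows of features; the function names `rle` and `rle_medians` are hypothetical.

```python
from statistics import median


def rle(signal):
    """Subtract each feature's across-array median from its log2 signals."""
    result = []
    for row in signal:
        reference = median(row)
        result.append([x - reference for x in row])
    return result


def rle_medians(signal):
    """Median RLE per array, i.e. the centre line of each array's boxplot."""
    values = rle(signal)
    n_arrays = len(signal[0])
    return [median(row[j] for row in values) for j in range(n_arrays)]
```

For well-behaved arrays the per-array medians should sit close to 0; an array whose median is clearly shifted stands out in the plot.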
4.4 Hierarchical clustering plot
qcPlots makes a hierarchical cluster of the samples using the hclust function of the stats package.
The options for the distance measures are euclidean and pearson.
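The two distance options can behave quite differently, as the sketch below illustrates. The Pearson distance is taken here as 1 minus the Pearson correlation, which is a common definition but an assumption about what the package computes; both functions are plain-Python illustrations.

```python
from math import sqrt


def euclidean(u, v):
    """Euclidean distance between two sample profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pearson_distance(u, v):
    """1 - Pearson correlation (assumed definition of the pearson option)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    return 1 - cov / (sd_u * sd_v)
```

Two profiles with the same shape but different scale, such as [1, 2, 3] and [2, 4, 6], are far apart in Euclidean terms yet have a Pearson distance of essentially 0, so the choice of measure can change the dendrogram.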
Why the dendrogram may not cluster the samples by experimental group:
Possibly too few of the genes used in the hierarchical clustering are informative;
The variables that distinguish the experimental conditions from one another are the differentially expressed genes, and these may be few relative to the full set of genes in the data set, so the cluster analysis will often not reflect their influence. Therefore, if the percentage of differentially expressed genes is low, the samples might not be grouped according to their experimental group: the whole set of genes carries very little information about the experimental grouping, and the plot will mainly show other grouping features or simply random noise.
4.5 CV plot
Identify replicated features at the probe and gene level, and compute the coefficient of variation (CV) of the array.
The cvArray function identifies the replicated non-control probes for each feature in the array and computes the CV for every microRNA probe set. Then, the median of the CVs over the probe sets is reported as the array reproducibility.
To obtain the CV using the cvArray function, we can choose either the MeanSignal or the ProcessedSignal.
A lower CV median indicates a better array reproducibility.
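CV: standard deviation / mean.
5. Data normalization (2 methods)
Purpose: to compensate for systematic technical differences between arrays so that the biological differences between samples can be seen more clearly.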
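The median-of-CVs summary can be sketched as follows. This is a plain-Python illustration, not the cvArray implementation: it assumes the sample standard deviation (n - 1 denominator) and signals on the raw, non-log scale, and the probe names and values in the example are hypothetical.

```python
from statistics import mean, median, stdev


def array_cv(probe_sets):
    """Median coefficient of variation over the probe sets of one array.

    probe_sets maps a probe name to the signals of its replicated spots.
    Uses the sample standard deviation, which is an assumption here.
    """
    cvs = [stdev(values) / mean(values) for values in probe_sets.values()]
    return median(cvs)


# Hypothetical replicated signals for three probes on one array.
example = {
    "probe_1": [10.0, 10.0, 10.0, 10.0],
    "probe_2": [8.0, 12.0],
    "probe_3": [9.0, 11.0],
}
reproducibility = array_cv(example)
```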
AgiMicroRna uses two strategies to obtain a gene signal estimate normalized between arrays.
5.1 AFE algorithm
The first strategy uses the TGS processed by the AFE algorithms.
This TGS can be normalized between arrays using either the quantile (default) or the scale methods.
With the "none" option, the TGS is only log2 transformed. That is, the function tgsNormalization accepts three normalization options: "none" (log2 transformation only), "quantile" and "scale".
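The difference between the "none" and "quantile" options can be illustrated with the sketch below. It is not the tgsNormalization code: the helper names are hypothetical, the order of the log2 step and the normalization is an assumption, all signals are assumed positive, ties are broken by position, and the "scale" method is left out.

```python
from math import log2


def quantile_normalize(columns):
    """Give every array (column) the same empirical distribution."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]
    normalized = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]  # replace each value by its rank's mean
        normalized.append(out)
    return normalized


def normalize_tgs(columns, method="quantile"):
    """log2 transform, then optionally quantile normalize (assumed order)."""
    logged = [[log2(v) for v in col] for col in columns]
    if method == "none":
        return logged
    if method == "quantile":
        return quantile_normalize(logged)
    raise ValueError(f"method not covered by this sketch: {method}")
```

After quantile normalization every array contains exactly the same set of values, only assigned to different features according to each feature's rank within its array.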
[github.com/borisvish/Median-Polish](https://github.com/borisvish/Median-Polish) (Python implementation of the Median Polish algorithm)
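For reference, the median polish used in the RMA summarization step can be sketched in plain Python as below. This is a minimal illustration and not the code from the linked repository or from AgiMicroRna; it assumes a table with probes as rows and arrays as columns, and the function name and parameters are choices made for this example.

```python
from statistics import median


def median_polish(table, max_iter=10, tol=1e-6):
    """Fit table[i][j] ~ overall + row[i] + col[j] by alternating median sweeps."""
    resid = [list(r) for r in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row = [0.0] * n_rows
    col = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep the row medians out of the residuals.
        row_delta = [median(r) for r in resid]
        for i in range(n_rows):
            row[i] += row_delta[i]
            resid[i] = [v - row_delta[i] for v in resid[i]]
        # Move the median of the column effects into the overall term.
        shift = median(col)
        col = [c - shift for c in col]
        overall += shift
        # Sweep the column medians out of the residuals.
        col_delta = [
            median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)
        ]
        for j in range(n_cols):
            col[j] += col_delta[j]
            for i in range(n_rows):
                resid[i][j] -= col_delta[j]
        # Move the median of the row effects into the overall term.
        shift = median(row)
        row = [r - shift for r in row]
        overall += shift
        if max(abs(d) for d in row_delta + col_delta) < tol:
            break
    return overall, row, col, resid
```

With probes as rows and arrays as columns, overall + col[j] serves as the summarized gene signal on array j, while the row effects absorb the probe effect.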