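(mRNA, lncRNA, miRNA, methylation, protein) can all go through the workflow above. In other words, 33 cancer types times 5 subtypes times 5 molecule types times 15 strategies (33 × 5 × 5 × 15 = 12,375) already adds up to more than ten thousand data-mining projects, and if you search carefully you will find that there really are more than ten thousand data-mining papers out there!

People's ability to imitate goes far beyond what I imagined, so here is one more strategy to share. I recently read a paper that does a joint analysis of methylation and expression; it presents its core datasets and its complete analysis roadmap.

Paper link: https://www./articles/10.3389/fgene.2020.00294/full

I have a feeling that this strategy will soon be applied to 33 cancer types times 5 subtypes as well, which means a hundred-plus papers (33 × 5 = 165). To imitate it, though, the first roadblock is R! It cannot be stressed enough how much the computer fundamentals matter when you learn bioinformatics data analysis. I roughly split those fundamentals into statistics and visualization in R, and NGS data processing on Linux. Start by working through the R knowledge roadmap.

Once you have learned R, the MethylMix package is easy to follow. Go explore the cancer you are interested in. Of course, it does not have to be cancer data: any project with both transcriptome and methylation data can use it. It is not too late to start learning now. Even if you missed this particular data-mining strategy, the next ten years will still bring dozens or even hundreds of useful strategies, and that is when your R skills will be needed! You can consider the official 生信技能树 training course (the next session starts on August 31).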
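
MethylMix manual

Overview
Purpose: identify methylation-driven cancer genes. The package can detect both hypermethylated and hypomethylated genes in cancer.

Datasets
The following datasets ship with the R package:
GEcancer
METcancer
METnormal
ProbeAnnotation
SNPprobes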
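
Functions

ClusterProbes
Purpose: annotate each probe to a gene. A gene has several CpG sites, and these CpG sites are clustered. If normal tissue samples are also supplied, only probes present in both the normal and the cancer samples are used. SNP probes are removed.
Usage:
    ClusterProbes(MET_Cancer, MET_Normal, CorThreshold = 0.4)
    ### MET_Cancer: cancer tissue samples
    ### MET_Normal: normal tissue samples
    ### CorThreshold: correlation threshold used to split the clusters
Output: a list. Judging from the code examples later in this post, its first two elements are the clustered cancer and normal matrices, and it also carries a ProbeMapping element.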
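
To make the role of CorThreshold more concrete, here is a minimal base-R sketch of correlation-based probe clustering for a single gene. It is an illustration under stated assumptions, not necessarily what ClusterProbes does internally: the probe names, the toy data, the complete-linkage choice and the averaging step are all made up for this example.

```r
# Toy beta-value matrix: 6 CpG probes (rows) of one hypothetical gene, 40 samples (columns).
# Probes cg_a* share one methylation pattern, probes cg_b* share another, uncorrelated one.
set.seed(1)
n_samples <- 40
pattern_a <- rep(c(0.2, 0.8), each = 20)    # low in the first 20 samples, high in the rest
pattern_b <- rep(c(0.2, 0.8), times = 20)   # alternating low/high
noise <- function() rnorm(n_samples, sd = 0.02)
met <- rbind(
  cg_a1 = pattern_a + noise(), cg_a2 = pattern_a + noise(), cg_a3 = pattern_a + noise(),
  cg_b1 = pattern_b + noise(), cg_b2 = pattern_b + noise(), cg_b3 = pattern_b + noise()
)

cor_threshold <- 0.4   # plays the role of CorThreshold (assumed interpretation)

# Distance between two probes = 1 - Pearson correlation across samples.
probe_cor <- cor(t(met))
tree <- hclust(as.dist(1 - probe_cor), method = "complete")

# With complete linkage, cutting at height 1 - threshold means every pair of
# probes inside a cluster has correlation >= threshold.
clusters <- cutree(tree, h = 1 - cor_threshold)

# Collapse each cluster into one row by averaging its probes.
clustered <- rowsum(met, group = clusters) / as.vector(table(clusters))

print(clusters)
print(dim(clustered))
```

A higher threshold splits a gene into more, tighter clusters; a lower one merges loosely related probes.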
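
Download_DNAmethylation
Purpose: download DNA methylation data from TCGA.
Usage:
    Download_DNAmethylation(CancerSite, TargetDirectory, downloadData = TRUE)
    ### CancerSite: abbreviated cancer name, e.g. OV, THCA
    ### TargetDirectory: where the downloaded files are stored
    ### downloadData: default TRUE. If FALSE, the download link is returned
Output: the paths of the downloaded data (passed on to Preprocess_DNAmethylation as METdirectories).

Download_GeneExpression
Purpose: download RNAseq data from TCGA.
Usage:
    Download_GeneExpression(CancerSite, TargetDirectory, downloadData = TRUE)
    ### CancerSite: same as above
    ### TargetDirectory: where the downloaded files are stored
    ### downloadData: default TRUE. If FALSE, the download link is returned
Output: the paths of the downloaded data (passed on to Preprocess_GeneExpression as MAdirectories).

Preprocess_DNAmethylation
Purpose: preprocess the DNA methylation data downloaded from TCGA.
Usage:
    Preprocess_DNAmethylation(CancerSite, METdirectories, MissingValueThreshold = 0.2)
    ### CancerSite: same as above
    ### METdirectories: the object returned by Download_DNAmethylation, a vector of paths to the downloaded data
    ### MissingValueThreshold: threshold for removing samples or genes that have missing values
Output: the preprocessed cancer tissue and normal tissue matrices.

Preprocess_GeneExpression
Purpose: preprocess the RNAseq data downloaded from TCGA.
Usage:
    Preprocess_GeneExpression(CancerSite, MAdirectories, MissingValueThresholdGene = 0.3, MissingValueThresholdSample = 0.1)
    ### CancerSite: same as above
    ### MAdirectories: the object returned by Download_GeneExpression, a vector of paths to the downloaded data
    ### MissingValueThresholdGene: default 0.3, threshold for removing genes that have missing values
    ### MissingValueThresholdSample: default 0.1, threshold for removing samples that have missing values
Output: the preprocessed cancer tissue and normal tissue matrices.

GetData
Purpose: a wrapper around the steps above. According to the code examples later in this post, it can replace the separate download, preprocessing and clustering calls.
Usage:
    GetData(cancerSite, targetDirectory)
    ### cancerSite: same as above
    ### targetDirectory: where the output files are stored
Output: the output files are written to targetDirectory.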
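
The missing-value thresholds are easier to picture with a small example. The sketch below, in base R, drops genes and then samples whose fraction of missing values exceeds a threshold. The function name filter_missing, the toy matrix, and the gene-first order are assumptions made for illustration; the package's preprocessing functions may differ in detail.

```r
# Hypothetical helper: drop genes (rows) and then samples (columns) with too many NAs.
filter_missing <- function(mat, gene_threshold = 0.3, sample_threshold = 0.1) {
  # Fraction of missing values per gene, computed on the full matrix.
  gene_na <- rowMeans(is.na(mat))
  mat <- mat[gene_na <= gene_threshold, , drop = FALSE]
  # Fraction of missing values per sample, computed after the gene filter.
  sample_na <- colMeans(is.na(mat))
  mat[, sample_na <= sample_threshold, drop = FALSE]
}

# Toy expression matrix: 10 genes x 8 samples.
set.seed(42)
expr <- matrix(runif(10 * 8), nrow = 10,
               dimnames = list(paste0("gene", 1:10), paste0("sample", 1:8)))
expr["gene3", 1:4] <- NA                    # gene3 is missing in half of the samples
expr[c("gene1", "gene2"), "sample8"] <- NA  # sample8 is missing two genes

filtered <- filter_missing(expr, gene_threshold = 0.3, sample_threshold = 0.1)
print(dim(filtered))
```

Here gene3 is removed first (50% missing), and sample8 is removed afterwards because 2 of its 9 remaining values are missing.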
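
MethylMix
Purpose: fit a mixture model to the DNA methylation data.
Usage:
    MethylMix(METcancer, GEcancer, METnormal = NULL, listOfGenes = NULL, filter = TRUE, NoNormalMode = FALSE, OutputRoot = "")
    ### Three datasets: METcancer, GEcancer, METnormal
    ### listOfGenes: genes to evaluate; the names must match rownames(METcancer)
    ### filter: default TRUE, keep only genes whose methylation and expression are negatively correlated
    ### NoNormalMode: default FALSE; when TRUE, the methylation states in cancer tissue are not compared with normal tissue
    ### OutputRoot: path where the MethylMix results object is stored
Output: a list of results, passed to MethylMix_PlotModel as MixtureModelResults. In the code examples later in this post, its MethylationDrivers element holds the driver genes.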
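
Because the filter step correlates methylation with expression gene by gene, both matrices need matching gene names and sample names. The base-R sketch below aligns two toy matrices before any modelling. The matrices and most gene names are invented for illustration, and whether MethylMix performs an equivalent alignment internally is not something this post covers, so treat it as a defensive pre-check only.

```r
# Toy methylation (beta values) and expression matrices with partly overlapping
# genes (rows) and samples (columns). Names are illustrative only.
set.seed(7)
met <- matrix(runif(4 * 5), nrow = 4,
              dimnames = list(c("MGMT", "ZNF217", "GENE_A", "GENE_B"),
                              paste0("S", 1:5)))
ge <- matrix(rnorm(3 * 4), nrow = 3,
             dimnames = list(c("ZNF217", "MGMT", "GENE_C"),
                             paste0("S", c(2, 3, 4, 6))))

# Keep only genes and samples present in both matrices, in the same order.
shared_genes <- intersect(rownames(met), rownames(ge))
shared_samples <- intersect(colnames(met), colnames(ge))
met_aligned <- met[shared_genes, shared_samples, drop = FALSE]
ge_aligned <- ge[shared_genes, shared_samples, drop = FALSE]

print(shared_genes)
print(shared_samples)
```

The aligned row names could then serve as a candidate value for listOfGenes.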
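
MethylMix_ModelGeneExpression
Purpose: model gene expression against DNA methylation to find genes where the two are significantly negatively correlated.
Usage:
    MethylMix_ModelGeneExpression(METcancer, GEcancer, CovariateData = NULL)
    ### Two datasets: METcancer and GEcancer
    ### CovariateData: usually not needed; can be set when the samples come from different tissue types
Output: the names of the genes whose methylation and expression are significantly negatively correlated.

MethylMix_PlotModel
Purpose: plot the results of the MethylMix function.
Usage:
    MethylMix_PlotModel(GeneName, MixtureModelResults, METcancer, GEcancer = NULL, METnormal = NULL)
    ### GeneName: name of the gene for which to create a MethylMix plot
    ### MixtureModelResults: the list returned by the MethylMix function
    ### Three datasets: METcancer, GEcancer, METnormal
Output: two plots, the mixture model and, when GEcancer is supplied, the inverse correlation with gene expression.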
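
The idea behind the negative-correlation filter in MethylMix_ModelGeneExpression can be sketched with a per-gene linear regression in base R. This is a simplified illustration: the toy data, the helper name test_gene and the 0.001 p-value cutoff are assumptions for this example, not the package's documented criteria.

```r
# Toy data: 2 genes x 60 samples. "driver" expression drops as methylation rises;
# "unrelated" expression is independent of methylation.
set.seed(123)
n <- 60
met <- rbind(
  driver = runif(n, 0.1, 0.9),
  unrelated = runif(n, 0.1, 0.9)
)
ge <- rbind(
  driver = 10 - 6 * met["driver", ] + rnorm(n, sd = 0.5),
  unrelated = rnorm(n, mean = 7)
)

# Hypothetical helper: regress expression on methylation for one gene and
# return the slope, its p-value and the R-squared of the fit.
test_gene <- function(gene) {
  fit <- summary(lm(ge[gene, ] ~ met[gene, ]))
  c(slope = fit$coefficients[2, "Estimate"],
    p_value = fit$coefficients[2, "Pr(>|t|)"],
    r_squared = fit$r.squared)
}

stats <- t(sapply(rownames(met), test_gene))

# Keep genes with a significantly negative slope (illustrative cutoff).
functional <- rownames(stats)[stats[, "slope"] < 0 & stats[, "p_value"] < 0.001]

print(round(stats, 4))
print(functional)
```

Genes that pass such a filter are the ones where higher methylation goes with lower expression, which is the pattern expected for methylation-driven silencing.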

MethylMix_Predict
Purpose: given a new data set with methylation data, this function predicts the mixture component for each new sample and driver gene. Predictions are based on posterior probabilities calculated with MethylMix's fitted mixture model.
Usage:
    MethylMix_Predict(newBetaValuesMatrix, MethylMixResult)
    ### newBetaValuesMatrix: the new matrix, with genes/CpG sites in rows and samples in columns. The names of the genes/CpG sites must match those of the matrix used in the MethylMix function, but their number may differ
    ### MethylMixResult: the output of the MethylMix function
Output: a prediction matrix, with driver genes in rows and new samples in columns.

predictOneGene
Purpose: given a new vector of beta values, this function calculates a matrix with the posterior probability of belonging to each mixture component (columns) for each new beta value (rows), and returns the number of the mixture component with the highest posterior probability.
Usage:
    predictOneGene(newVector, mixtureModel)
    ### newVector: vector with new beta values
    ### mixtureModel: beta mixture model object for the gene being evaluated
Output: for each new beta value, the number of the mixture component with the highest posterior probability.

Code examples

    library(MethylMix)

    ### Optional register cluster to run in parallel
    library(doParallel)
    cl <- makeCluster(5)
    registerDoParallel(cl)

    ### Methylation data for ovarian cancer
    cancerSite <- "OV"
    targetDirectory <- paste0(getwd(), "/")

    ### Option 1: GetData can replace all of the step-by-step code below
    GetData(cancerSite, targetDirectory)

    ### Option 2: run the steps one by one
    ### Downloading methylation data
    METdirectories <- Download_DNAmethylation(cancerSite, targetDirectory, TRUE)
    ### Processing methylation data
    METProcessedData <- Preprocess_DNAmethylation(cancerSite, METdirectories)
    ### Saving methylation processed data
    saveRDS(METProcessedData, file = paste0(targetDirectory, "MET_", cancerSite, "_Processed.rds"))
    ### Clustering methylation data
    res <- ClusterProbes(METProcessedData[[1]], METProcessedData[[2]])
    ### Saving methylation clustered data
    toSave <- list(METcancer = res[[1]], METnormal = res[[2]], ProbeMapping = res$ProbeMapping)
    saveRDS(toSave, file = paste0(targetDirectory, "MET_", cancerSite, "_Clustered.rds"))

    stopCluster(cl)

    ### load the three data sets needed for MethylMix
    data(METcancer)
    data(METnormal)
    data(GEcancer)
    ### run MethylMix on a small set of example data
    MethylMixResults <- MethylMix(METcancer, GEcancer, METnormal)

    ### run in parallel
    library(doParallel)
    cl <- makeCluster(5)
    registerDoParallel(cl)
    MethylMixResults <- MethylMix(METcancer, GEcancer, METnormal)
    stopCluster(cl)

    ### load data sets
    data(METcancer)
    data(GEcancer)
    ### model gene expression
    MethylMixResults <- MethylMix_ModelGeneExpression(METcancer, GEcancer)

    ### Load the three data sets needed for MethylMix
    data(METcancer)
    data(METnormal)
    data(GEcancer)
    ### Run methylmix on a small set of example data
    MethylMixResults <- MethylMix(METcancer, GEcancer, METnormal)
    ### Plot the most famous methylated gene for glioblastoma
    MethylMix_PlotModel("MGMT", MethylMixResults, METcancer)
    ### Plot MGMT also with its normal methylation variation
    MethylMix_PlotModel("MGMT", MethylMixResults, METcancer, METnormal = METnormal)
    ### Plot a MethylMix model for another gene
    MethylMix_PlotModel("ZNF217", MethylMixResults, METcancer, METnormal = METnormal)
    ### Also plot the inverse correlation with gene expression (creates two separate plots)
    MethylMix_PlotModel("MGMT", MethylMixResults, METcancer, GEcancer, METnormal)
    ### Plot all functional and differential genes
    for (gene in MethylMixResults$MethylationDrivers) {
        MethylMix_PlotModel(gene, MethylMixResults, METcancer, METnormal = METnormal)
    }

    ### load the three data sets needed for MethylMix
    data(METcancer)
    data(METnormal)
    data(GEcancer)
    ### run MethylMix on a small set of example data
    MethylMixResults <- MethylMix(METcancer, GEcancer, METnormal)
    ### toy example new data, of same dimension of original METcancer data
    newMETData <- matrix(runif(length(METcancer)), nrow = nrow(METcancer))
    rownames(newMETData) <- rownames(METcancer)
    colnames(newMETData) <- paste0("sample", 1:ncol(METcancer))
    predictions <- MethylMix_Predict(newMETData, MethylMixResults)
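
To show what "predicting the mixture component from posterior probabilities" means, here is a minimal base-R sketch for a single gene with a two-component beta mixture. The mixture parameters, the list layout and the function name predict_component are hypothetical and chosen only for illustration; the structure of the real MethylMix model object may be different.

```r
# Hypothetical fitted mixture for one gene: mixing weights plus the two shape
# parameters of each beta component (component 1 = low, component 2 = high methylation).
mixture <- list(
  weights = c(0.7, 0.3),
  shape1 = c(2, 20),
  shape2 = c(20, 5)
)

# For each new beta value, compute the posterior probability of every component
# (weight * density, normalised per value) and pick the most probable one.
predict_component <- function(new_beta, mixture) {
  lik <- sapply(seq_along(mixture$weights), function(k) {
    mixture$weights[k] * dbeta(new_beta, mixture$shape1[k], mixture$shape2[k])
  })
  # Force a matrix: rows = new beta values, columns = mixture components.
  lik <- matrix(lik, nrow = length(new_beta))
  posterior <- lik / rowSums(lik)
  list(posterior = posterior,
       component = max.col(posterior, ties.method = "first"))
}

res <- predict_component(c(0.05, 0.10, 0.80, 0.90), mixture)
print(round(res$posterior, 4))
print(res$component)
```

In this toy setup the two low beta values fall into component 1 and the two high ones into component 2. Applying the same logic gene by gene and sample by sample gives a genes-by-samples matrix like the one MethylMix_Predict is described as returning.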