孟德尔随机化MR：全身炎症调节因子与急性慢性肝功能衰竭风险（后附分析代码）

国熙台 2024-01-02 发布于山东

展开全文

孟德尔随机化研究是一种近年来主要应用于流行病学病因推断上的一种数据分析方式。Katan1986 提出：不同基因型决定不同的中间表型，若该表型代表个体的某暴露特征，用基因型和疾病的关联效应能够代表暴露因素对疾病的作用，由于等位基因遵循随机分配原则，该作用不受传统流行病学研究中的混杂因素和反向因果关联所影响 (反向因果关联，指的是暴露和结局的时间顺序颠倒)。

目前的问题是在观察性研究中得到的结论往往是association而不是causation，而基于孟德尔独立分配定律（配子形成时等位基因随机分配到子代配子中），所以基因和疾病之间的关联不会受到出生后的环境、社会经济地位、行为因素等常见混杂因素的干扰，且因果时序合理，使效应估计值更接近真实情况。

2023.4发表了一项有关双向孟德尔随机化研究：全身炎症调节因子与急性慢性肝功能衰竭风险《Systemic inflammatory regulators and risk of acute-on-chronic liver failure: A bidirectional mendelian-randomization study》

摘要：

炎症在慢加急性肝功能衰竭（ACLF）的发病机制中起一定作用，但炎症与ACLF之间是否存在因果关系尚不清楚。采用两样本孟德尔随机化（MR）方法研究全身炎症调节因子与ACLF之间的因果关系。该研究分析了从全基因组关联研究（GWAS）荟萃分析数据库中提取的8，293名个体的41种细胞因子和生长因子，涉及253例ACLF病例和456，095例对照。我们的研究结果显示，干细胞因子（SCF）水平降低、碱性成纤维细胞生长因子（bFGF）水平降低和白细胞介素-13（IL-13）水平升高与ACLF的风险增加相关（OR = 0.486，95% CI = 0.264-0.892，p = 0.020; OR = 0.323，95% CI = 0.107-0.972，p = 0.044; OR = 1.492，95%CI = 1.111-2.004，p = 0.008）。此外，遗传预测ACLF不影响全身炎症调节因子的表达。我们的研究结果表明，细胞因子在ACLF的发病机制中起着至关重要的作用。需要进一步的研究来确定这些生物标志物是否可以用于预防和治疗ACLF。

结果：

结果表明，遗传预测的全身性炎症调节因子与ACLF有关。干细胞因子和碱性成纤维细胞生长因子水平的升高与急性肝功能衰竭发病风险的降低呈负相关(OR=0.486，95%CI=0.264~0.892，p=0.020；OR=0.323，95%CI=0.107~0.972，p=0.044)。MR-Egger截取未检测到干细胞因子和碱性成纤维细胞生长因子的潜在水平多效性(分别为p=0.790；p=0.898)。

此外，基于MR-Egger和IVW测试的Q值显示，SCF和bFGF没有明显的异质性(均p>0.05)。静脉注射白介素13(IL-13)水平越高，患急性肝功能衰竭的风险越高(OR=1.492，95%CI=1.111~2.004，P=0.008)。MR-Egger截取未检测到IL-13的潜在水平多效性(p=0.287，ReSpectiveLy)。

利用IVW方法确定了基因预测的巨噬细胞炎性蛋白1a水平与急性肝功能衰竭风险之间的关联(OR=1.870，95%CI=1.077-3.247，P=0.026)。使用MR-PRESSO全球检验仍然观察到了多效性的证据(p<0.01)。水平多效性可能违反MR假设，因此这一分析可能不可靠。

用同样的方法预测了遗传预测的ACLF和细胞因子水平之间的关联。然而，在任何MR方法中，基因预测的ACLF与任何细胞因子水平都没有关联。

孟德尔随机化分析流程

1、软件安装

2、暴露数据下载

3、结局数据下载

（vcf文件！！！！）

4、关联性分析

5、绘制曼哈顿图

#install.packages('devtools')#devtools::install_github('mrcieu/gwasglue', force = TRUE)

#if (!require('BiocManager', quietly = TRUE))# install.packages('BiocManager')#BiocManager::install('VariantAnnotation')

#install.packages('dplyr')#install.packages('tidyr')#install.packages('CMplot')

#引用包library(VariantAnnotation)library(gwasglue)library(dplyr)library(tidyr)library(CMplot)

inputFile='***.vcf.gz' #输入文件(根据下载暴露数据的文件名称进行修改)setwd('C:\\Users\\***\\Desktop\\Mendelian\\04.exposure') #设置工作目录

#读取输入文件, 并对输入文件进行格式转换vcfRT <- readVcf(inputFile)data=gwasvcf_to_TwoSampleMR(vcf=vcfRT, type='exposure')

#根据pvalue<5e-08对结果进行过滤outTab<-subset(data, pval.exposure<5e-08)write.csv(outTab, file='exposure.pvalue.csv', row.names=F)

#准备绘制曼哈顿图的数据data=data[,c('SNP', 'chr.exposure', 'pos.exposure', 'pval.exposure')]colnames(data)=c('SNP','CHR','BP','pvalue')

#绘制线性的曼哈顿图CMplot(data, plot.type='m', LOG10=TRUE, threshold=5e-08, threshold.lwd=3, threshold.lty=1, signal.cex=0.2, chr.den.col=NULL, cex=0.2, bin.size=1e5, ylim=c(0,50), file='pdf', file.output=TRUE, width=15, height=9, verbose=TRUE)

#绘制圈图CMplot(data, plot.type='c', LOG10=TRUE, threshold=5e-08, threshold.lwd=3, threshold.lty=1, signal.cex=0.2, chr.den.col=NULL, cex=0.2, bin.size=1e5, ylim=c(0,100), file='pdf', file.output=TRUE, width=7, height=7, verbose=TRUE)

6、连锁不平衡

#install.packages('remotes')#remotes::install_github('MRCIEU/TwoSampleMR')

#引用包library(TwoSampleMR)
exposureFile='exposure.pvalue.csv'     #输入文件setwd('C:\\Users\\***\\Desktop\\Mendelian\\05.LD')      #设置工作目录
#读取输入文件exposure_dat<-read_exposure_data(filename=exposureFile,                       sep = ',',                       snp_col = 'SNP',                       beta_col = 'beta.exposure',                       se_col = 'se.exposure',                       effect_allele_col = 'effect_allele.exposure',                       other_allele_col = 'other_allele.exposure',                       eaf_col = 'eaf.exposure',                       samplesize_col = 'samplesize.exposure',                       clump = F)
#去除连锁不平衡的SNPexposure_dat_clumped <- clump_data(exposure_dat, clump_kb=10000, clump_r2=0.001)write.csv(exposure_dat_clumped, file='exposure.LD.csv', row.names=F)

7、去除弱工具变量（F检验值）

Ffilter=10 #F值过滤条件inputFile='exposure.LD.csv' #输入文件setwd('C:\\Users\\***\\Desktop\\Mendelian\\06.F') #设置工作目录

#读取输入文件dat<-read.csv(inputFile, header=T, sep=',', check.names=F)

#计算F检验值N=dat[1,'samplesize.exposure'] #获取样品的数目dat=transform(dat,R2=2*((beta.exposure)^2)*eaf.exposure*(1-eaf.exposure)) #计算R2dat=transform(dat,F=(N-2)*R2/(1-R2)) #计算F检验值

#根据F值>10进行过滤, 删除弱工具变量outTab=dat[dat$F>Ffilter,]write.csv(dat, 'exposure.F.csv', row.names=F)

8、去除混杂因素（PhenoScanner）

library(MendelianRandomization)     #引用包inputFile='exposure.F.csv'          #输入文件setwd('C:\\Users\\***\\Desktop\\Mendelian\\07.phenoscanner')     #设置工作目录
#读取输入文件dat=read.csv(inputFile, header=T, sep=',', check.names=F)
#对SNP分组snpId=dat$SNPy=seq_along(snpId)chunks <- split(snpId, ceiling(y/100))
#对分组进行循环,每次得到一个分组outTab=data.frame()for(i in names(chunks)){  #混杂因素分析  confounder=phenoscanner(    snpquery = chunks[[i]],    catalogue = 'GWAS',    pvalue = 1e-05,    proxies = 'None',    r2 = 0.8,    build = 37)  outTab=rbind(outTab, confounder$results)}#输出SNP相关性状的表格write.csv(outTab, 'confounder.result.csv', row.names=F)
#输出去除混杂因素的结果delSnp=c('rs13078960', 'rs2030323')     #混杂SNP的名称(需修改)dat=dat[!dat$SNP %in% delSnp,]write.csv(dat, 'exposure.confounder.csv', row.names=F)

9、孟德尔随机化分析

10、敏感性分析

#devtools::install_github('mrcieu/gwasglue',force = TRUE)#BiocManager::install('VariantAnnotation')

#install.packages('remotes')#remotes::install_github('MRCIEU/TwoSampleMR')

#引用包library(VariantAnnotation)library(gwasglue)library(TwoSampleMR)

exposureName='BMI' #图形中展示暴露数据的名称outcomeName='Coronary heart disease' #图形中展示结局数据的名称

exposureFile='exposure.confounder.csv' #暴露数据输入文件outcomeFile='***.vcf.gz' #结局数据输入文件(改成我们下载的结局数据文件)setwd('C:\\Users\\***\\Desktop\\Mendelian\\08.TwoSampleMR') #设置工作目录

#读取暴露数据exposure_dat<-read_exposure_data(filename=exposureFile, sep = ',', snp_col = 'SNP', beta_col = 'beta.exposure', se_col = 'se.exposure', effect_allele_col = 'effect_allele.exposure', other_allele_col = 'other_allele.exposure', eaf_col = 'eaf.exposure', clump = F)

#读取结局数据的vcf文件,并对数据进行格式转换vcfRT <- readVcf(outcomeFile)outcomeData=gwasvcf_to_TwoSampleMR(vcf=vcfRT, type='outcome')

#从结局数据中提取工具变量outcomeTab<-merge(exposure_dat, outcomeData, by.x='SNP', by.y='SNP')write.csv(outcomeTab[,-(2:ncol(exposure_dat))], file='outcome.csv')

#读取整理好的结局数据outcome_dat<-read_outcome_data(snps=exposure_dat$SNP, filename='outcome.csv', sep = ',', snp_col = 'SNP', beta_col = 'beta.outcome', se_col = 'se.outcome', effect_allele_col = 'effect_allele.outcome', other_allele_col = 'other_allele.outcome', pval_col = 'pval.outcome', eaf_col = 'eaf.outcome')

#将暴露数据和结局数据合并exposure_dat$exposure=exposureNameoutcome_dat$outcome=outcomeNamedat<-harmonise_data(exposure_dat=exposure_dat, outcome_dat=outcome_dat)

#输出用于孟德尔随机化的工具变量outTab=dat[dat$mr_keep=='TRUE',]write.csv(outTab, file='table.SNP.csv', row.names=F)

#MR-PRESSO异常值检测(偏倚的SNP)presso=run_mr_presso(dat)write.csv(presso[[1]]$`MR-PRESSO results`$`Outlier Test`, file='table.MR-PRESSO.csv')

#孟德尔随机化分析mrResult=mr(dat)#选择孟德尔随机化的方法#mr_method_list()$obj

#mrResult=mr(dat, method_list=c('mr_ivw', 'mr_egger_regression', 'mr_weighted_median', 'mr_simple_mode', 'mr_weighted_mode'))

#对结果进行OR值的计算mrTab=generate_odds_ratios(mrResult)write.csv(mrTab, file='table.MRresult.csv', row.names=F)

#异质性分析heterTab=mr_heterogeneity(dat)write.csv(heterTab, file='table.heterogeneity.csv', row.names=F)

#多效性检验pleioTab=mr_pleiotropy_test(dat)write.csv(pleioTab, file='table.pleiotropy.csv', row.names=F)

#绘制散点图pdf(file='pic.scatter_plot.pdf', width=7.5, height=7)mr_scatter_plot(mrResult, dat)dev.off()

#森林图res_single=mr_singlesnp(dat) #得到每个工具变量对结局的影响pdf(file='pic.forest.pdf', width=7, height=6.5)mr_forest_plot(res_single)dev.off()

#漏斗图pdf(file='pic.funnel_plot.pdf', width=7, height=6.5)mr_funnel_plot(singlesnp_results = res_single)dev.off()

#留一法敏感性分析pdf(file='pic.leaveoneout.pdf', width=7, height=6.5)mr_leaveoneout_plot(leaveoneout_results = mr_leaveoneout(dat))dev.off()

pFilter=1       #pvalue过滤条件setwd('C:\\Users\\***\\Desktop\\Mendelian\\09.forest')     #设置工作目录
############定义森林图函数############bioForest=function(inputFile=null, forestFile=null, forestCol=null){    #读取输入文件  rt=read.csv(inputFile, header=T, sep=',', check.names=F)  row.names(rt)=rt$method  rt=rt[rt$pval<pFilter,]  method <- rownames(rt)  or <- sprintf('%.3f',rt$'or')  orLow  <- sprintf('%.3f',rt$'or_lci95')  orHigh <- sprintf('%.3f',rt$'or_uci95')  OR <- paste0(or,'(',orLow,'-',orHigh,')')  pVal <- ifelse(rt$pval<0.001, '<0.001', sprintf('%.3f', rt$pval))      #输出图形  pdf(file=forestFile, width=7, height=4.6)  n <- nrow(rt)  nRow <- n 1  ylim <- c(1,nRow)  layout(matrix(c(1,2),nc=2),width=c(3.5,2))      #绘制森林图左边的信息  xlim = c(0,3)  par(mar=c(4,2.5,2,1))  plot(1,xlim=xlim,ylim=ylim,type='n',axes=F,xlab='',ylab='')  text.cex=0.8  text(0,n:1,method,adj=0,cex=text.cex)  text(1.9,n:1,pVal,adj=1,cex=text.cex);text(1.9,n 1,'pvalue',cex=1,font=2,adj=1)  text(3.1,n:1,OR,adj=1,cex=text.cex);text(2.7,n 1,'OR',cex=1,font=2,adj=1)      #绘制右边的森林图  par(mar=c(4,1,2,1),mgp=c(2,0.5,0))  xlim = c(min(as.numeric(orLow)*0.975,as.numeric(orHigh)*0.975,0.9),max(as.numeric(orLow),as.numeric(orHigh))*1.025)  plot(1,xlim=xlim,ylim=ylim,type='n',axes=F,ylab='',xaxs='i',xlab='OR')  arrows(as.numeric(orLow),n:1,as.numeric(orHigh),n:1,angle=90,code=3,length=0.05,col='darkblue',lwd=3)  abline(v=1, col='black', lty=2, lwd=2)  boxcolor = ifelse(as.numeric(or)>1, forestCol, forestCol)  points(as.numeric(or), n:1, pch = 15, col = boxcolor, cex=2)  axis(1)  dev.off()}
#调用函数,绘制森林图bioForest(inputFile='table.MRresult.csv', forestFile='forest.pdf', forestCol='red')