【原】R语言实操：使用TwoSampleMR包进行孟德尔随机化分析

育种数据分析 2024-09-11 发布于河南

展开全文

大家好，我是邓飞。

前几天一直学习孟德尔随机化的理论知识，写了几篇博客，迷迷糊糊感觉入门了，今天跑代码试了试，看着结果和图表，感觉理解更深入了。果然，看书百遍，不如一练，今天分享一下实操代码。

前几天的博客：

孟德尔随机化的术语理解

孟德尔随机化：工具变量三大假设

从一篇孟德尔随机化文章看MR常见结果形式

示例数据使用官网的数据，进行了一点补充，对结果进行了可视化。（https://mrcieu./TwoSampleMR/articles/introduction.html）

整个步骤：

步骤1：提取暴露数据的GWAS

> ## 1, 安装TwoSampleMR，如果已安装，可以忽略> > # library(remotes)> # install_github("MRCIEU/TwoSampleMR")> > ## 2， 载入TwoSampleMR包> library(TwoSampleMR)> > ## 3，从数据库中提取暴露的GWAS summary数据> exposure_dat = extract_instruments("ieu-a-2")> dim(exposure_dat)[1] 79 15

共有79行15列的暴露数据结果。

步骤2：提取结局数据的GWAS

> ## 4，从数据库中提取结局变量的的GWAS summary数据，SNP用暴露数据的结果> # Get effects of instruments on outcome> outcome_dat = extract_outcome_data(snps=exposure_dat$SNP, outcomes = "ieu-a-7")Extracting data for 79 SNP(s) from 1 GWAS(s)> dim(outcome_dat)[1] 79 16

共79行15列的结局数据，注意，这里直接使用暴露数据质控后的SNP，提取结局数据得到的结果，所以位点数是一样的。

步骤3：合并暴露数据和结局数据

> ## 5，将暴露数据和结局数据合并> # Harmonise the exposure and outcome data> dat = harmonise_data(exposure_dat, outcome_dat)Harmonising Body mass index || id:ieu-a-2 (ieu-a-2) and Coronary heart disease || id:ieu-a-7 (ieu-a-7)> dim(dat)[1] 79 36

合并的数据，共79行，36列，这些数据可以用于孟德尔随机化的分析。

步骤4：孟德尔随机化分析及结果可视化

> ## 6，进行孟德尔随机化分析> res = mr(dat)Analysing 'ieu-a-2' on 'ieu-a-7'> ## 7，异质化分析> mr_heterogeneity(dat)  id.exposure id.outcome                              outcome                      exposure                    method        Q1     ieu-a-2    ieu-a-7 Coronary heart disease || id:ieu-a-7 Body mass index || id:ieu-a-2                  MR Egger 143.30462     ieu-a-2    ieu-a-7 Coronary heart disease || id:ieu-a-7 Body mass index || id:ieu-a-2 Inverse variance weighted 143.6508  Q_df       Q_pval1   77 6.841585e-062   78 8.728420e-06> ## 8，水平多效性分析> mr_pleiotropy_test(dat)  id.exposure id.outcome                              outcome                      exposure egger_intercept          se      pval1     ieu-a-2    ieu-a-7 Coronary heart disease || id:ieu-a-7 Body mass index || id:ieu-a-2    -0.001719304 0.003985962 0.6674266> ## 9，留一法分析> res_loo = mr_leaveoneout(dat)> mr_leaveoneout_plot(res_loo)$`ieu-a-2.ieu-a-7`
attr(,"split_type")[1] "data.frame"attr(,"split_labels")  id.exposure id.outcome1     ieu-a-2    ieu-a-7> ## 10，散点图> > p1 = mr_scatter_plot(res, dat)> p1$`ieu-a-2.ieu-a-7`
attr(,"split_type")[1] "data.frame"attr(,"split_labels")  id.exposure id.outcome1     ieu-a-2    ieu-a-7> > ## 11，森林图> res_single = mr_singlesnp(dat)> mr_forest_plot(res_single)$`ieu-a-2.ieu-a-7`
attr(,"split_type")[1] "data.frame"attr(,"split_labels")  id.exposure id.outcome1     ieu-a-2    ieu-a-7Warning messages:1: Removed 1 row containing missing values or values outside the scale range (`geom_errorbarh()`). 2: Removed 1 row containing missing values or values outside the scale range (`geom_point()`). > > ## 12，漏斗图> mr_funnel_plot(res_single)$`ieu-a-2.ieu-a-7`
attr(,"split_type")[1] "data.frame"attr(,"split_labels")  id.exposure id.outcome1     ieu-a-2    ieu-a-7

下图是留一法的森林图：

下图是孟德尔随机化的森林图：

下图是孟德尔随机化的散点图：

下图是孟德尔随机化的漏斗图：

还有哪些需要研究的？

如何读取自己的GWAS summary结果，并将格式整理为TwoSampleMR的格式？

如何对暴露数据GWAS结果进行质控，包括LD质控，F值质控，R2质控等？

怎么对已发表的文章进行结果图标的复现？

这些都是细枝末节，等我后续一一完成博客的文章，欢迎继续关注。

上面分析完整的代码汇总：

## 1, 安装TwoSampleMR，如果已安装，可以忽略
# library(remotes)# install_github("MRCIEU/TwoSampleMR")
## 2， 载入TwoSampleMR包library(TwoSampleMR)
## 3，从数据库中提取暴露的GWAS summary数据exposure_dat = extract_instruments("ieu-a-2")dim(exposure_dat)
## 4，从数据库中提取结局变量的的GWAS summary数据，SNP用暴露数据的结果# Get effects of instruments on outcomeoutcome_dat = extract_outcome_data(snps=exposure_dat$SNP, outcomes = "ieu-a-7")dim(outcome_dat)
## 5，将暴露数据和结局数据合并# Harmonise the exposure and outcome datadat = harmonise_data(exposure_dat, outcome_dat)dim(dat)
## 6，进行孟德尔随机化分析res = mr(dat)
## 7，异质化分析mr_heterogeneity(dat)

## 8，水平多效性分析mr_pleiotropy_test(dat)

## 9，留一法分析res_loo = mr_leaveoneout(dat)mr_leaveoneout_plot(res_loo)

## 10，散点图
p1 = mr_scatter_plot(res, dat)p1
## 11，森林图res_single = mr_singlesnp(dat)mr_forest_plot(res_single)
## 12，漏斗图mr_funnel_plot(res_single)