[Original] Mapping microarray probe IDs to gene names in R: everything you need in one post

健明 2021-07-14

Using Bioconductor annotation packages

If the array platform has a matching Bioconductor annotation package, use it. Only about 90 commonly used arrays have one!

For example:

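library(hgu133a.db)               # annotation package for the hgu133a platform
ids <- toTable(hgu133aSYMBOL)     # data frame of probe IDs and their gene symbols
head(ids)
## Alternatively, load the annotation package by its name stored in a string
platformDB <- 'hugene10sttranscriptcluster.db'
library(platformDB, character.only = TRUE)
## Assumes GSE62832 is a list of ExpressionSet objects obtained earlier, e.g. from getGEO('GSE62832')
library(Biobase)                  # provides featureNames()
probeset <- featureNames(GSE62832[[1]])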

These Bioconductor annotation packages all follow the same pattern: each one simply stores the mapping between probe IDs and gene names.
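Once such a mapping table is in hand, applying it is a simple lookup. The following is a minimal sketch in base R, not the author's own workflow: the `ids` data frame is a toy stand-in for the output of `toTable()`, and the probe IDs, gene names, and expression values are all hypothetical.

```r
# Toy stand-in for a probe-to-symbol table such as toTable(hgu133aSYMBOL).
# All IDs and symbols below are hypothetical.
ids <- data.frame(
  probe_id = c("probe_1", "probe_2", "probe_3"),
  symbol   = c("GENE_A", "GENE_B", "GENE_C"),
  stringsAsFactors = FALSE
)

# Toy expression matrix: rows are probes, columns are samples.
expr <- matrix(
  c(7.1, 7.3,
    5.2, 5.0,
    6.4, 6.6,
    3.3, 3.1),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("probe_1", "probe_2", "probe_3", "probe_unannotated"),
                  c("sample1", "sample2"))
)

# Look up the gene symbol of every probe in the matrix.
# Probes missing from the mapping table get NA.
symbol <- ids$symbol[match(rownames(expr), ids$probe_id)]

# Keep only the probes that have an annotation.
keep <- !is.na(symbol)
annotated <- expr[keep, , drop = FALSE]
annotated_symbols <- symbol[keep]
```

Using `match()` keeps the row order of the expression matrix, so `annotated_symbols` lines up with the rows of `annotated`.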

For a list of other packages, see the collection I put together earlier on my 菜鸟团 blog: http://www./1399.html

Using GPL platform information

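Even if the Bioconductor maintainers have not built an R package for the array platform, it is still easy to get a file that maps probe IDs to gene names. You only need to understand the GPL platform information in the GEO database, as follows: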

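library(Biobase)
library(GEOquery)
# Download the GPL file, put it in the current directory, and load it:
gpl <- getGEO('GPL10558', destdir = ".")
gpl_table <- Table(gpl)   # platform annotation as a data frame
dim(gpl_table)            ## [1] 41108    17
colnames(gpl_table)       # list the column names to find the probe ID and gene symbol columns
## The key is to spend time exploring this return value
head(gpl_table[, c(1, 10, 13)])  ## you need to check which columns you need
probe2symbol <- gpl_table[, c(1, 13)]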
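Column positions differ from platform to platform, so selecting by column name can be less fragile than selecting by index. The sketch below uses a toy data frame in place of `Table(gpl)`. The column names `ID`, `Entrez_Gene_ID`, and `Symbol` and all values are assumptions for illustration, so check `colnames()` on the real table first.

```r
# Toy stand-in for Table(gpl); column names and values are hypothetical.
gpl_table <- data.frame(
  ID             = c("probe_1", "probe_2", "probe_3", "probe_4"),
  Entrez_Gene_ID = c(101L, NA, 103L, 103L),
  Symbol         = c("GENE_A", "", "GENE_C", "GENE_C"),
  stringsAsFactors = FALSE
)

# Select the mapping columns by name instead of by position.
probe2symbol <- gpl_table[, c("ID", "Symbol")]

# Drop probes whose gene symbol is missing or empty.
has_symbol <- !is.na(probe2symbol$Symbol) & nzchar(probe2symbol$Symbol)
probe2symbol <- probe2symbol[has_symbol, ]

# Count probes per gene to see which genes are covered by several probes.
probes_per_gene <- table(probe2symbol$Symbol)
```

Genes with more than one probe in `probes_per_gene` need a decision later on (for example, which probe to keep), and that decision depends on the analysis.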