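I recently came across a TCGA data-mining paper by a Korean group, titled "Classification of Genes Based on Age-Related Differential Expression in Breast Cancer" and published in 2017. The authors took the breast cancer patient data in the TCGA database and split the patients into 3 groups by age: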
Young patients were defined as ≤45 years of age
Elderly patients were defined as those ≥60 years of age
The rest of the patients were defined as “intermediate.”
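The three age definitions above can be sketched as a small helper. This is only an illustration of the cutoffs quoted from the paper; the function name `age_group` and the label strings are my own and do not come from the paper or from any TCGA tool.

```python
def age_group(age_years: float) -> str:
    """Assign a patient to an age group using the cutoffs quoted above.

    The name `age_group` and the returned labels are hypothetical,
    chosen only for this illustration.
    """
    if age_years <= 45:
        return "young"
    if age_years >= 60:
        return "elderly"
    # Everyone strictly between 45 and 60 years of age.
    return "intermediate"


# Example: label a handful of made-up ages (not real TCGA data).
example_ages = [38, 45, 52, 60, 71]
labels = [age_group(a) for a in example_ages]
```

Note that both boundaries are inclusive (45 counts as young, 60 as elderly), so the intermediate group covers ages strictly between the two cutoffs.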
The overall analysis workflow is as follows:
A total of 5,962 genes in class A were defined as significant DEGs in breast cancer, and 13,684 in class B were nonsignificant.
Ones who want to find biomarkers or driver genes are likely to investigate only genes in class A.
However, we classified the genes of each class once again into eight groups, based on the pattern of p-values, which were calculated separately for every age group (secondary classification in Fig. 2).
After a second round of classification, the genes were eventually divided into 16 classes (A1–B8) (Supplementary Table 1).
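To make the two-step classification more concrete, here is a minimal sketch of how 2 primary classes times 8 p-value patterns gives 16 classes. It rests on several assumptions of mine that the quoted text does not confirm: that the eight patterns are simply the 2^3 combinations of "significant or not" across the three age groups, that the cutoff is 0.05, and that significance is decided by the p-value alone. The order in which patterns are numbered 1 to 8 is arbitrary here and may not match the paper's Fig. 2; `classify_gene` and `PATTERNS` are hypothetical names.

```python
from itertools import product

ALPHA = 0.05  # assumed significance cutoff, not taken from the paper
AGE_GROUPS = ("young", "intermediate", "elderly")

# Enumerate the 2**3 = 8 significance patterns across the three age groups.
# The numbering 1..8 is arbitrary and may differ from the paper's figure.
PATTERNS = {
    pattern: index + 1
    for index, pattern in enumerate(product((True, False), repeat=len(AGE_GROUPS)))
}


def classify_gene(p_overall: float, p_by_age: dict, alpha: float = ALPHA) -> str:
    """Return a label such as 'A1' or 'B8' for one gene.

    p_overall: p-value from the comparison using all patients (primary step).
    p_by_age:  p-values computed separately per age group (secondary step).
    """
    primary = "A" if p_overall < alpha else "B"
    pattern = tuple(p_by_age[group] < alpha for group in AGE_GROUPS)
    return f"{primary}{PATTERNS[pattern]}"


# Made-up p-values for illustration only.
gene_x = classify_gene(0.001, {"young": 0.01, "intermediate": 0.02, "elderly": 0.03})
gene_y = classify_gene(0.50, {"young": 0.90, "intermediate": 0.80, "elderly": 0.70})
```

With this scheme a gene can land in class B (nonsignificant overall) and still be significant in one age group, which is exactly the kind of gene that would be missed by looking at class A alone.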