
Calculating a series of metrics from tumor mutation files

 Jianming (健明), 2021-09-13

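In the paper "Multi-Omics Profiling Reveals Distinct Microenvironment Characterization and Suggests Immune Escape Mechanisms of Triple-Negative Breast Cancer", the authors divided TNBC into three immune-based subgroups and then searched for potential intrinsic immune escape mechanisms of TNBC. This analysis relied on several quantitative metrics derived from the mutation data, including: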

  • neoantigens
  • cancer testis antigens (CTAs)
  • homologous recombination deficiency (HRD) scores
  • intratumoral heterogeneity (ITH)
  • tumor mutational burden (TMB)

The results are shown below:

 

Comparison of mutation loads (A), neoantigen load (B), HRD scores (C), CTA numbers (D), necrosis (E), and ITH scores (F) among the three clusters. In the violin plots, the mean values are plotted as red dots, and the boxplot was drawn inside the violin plot.

All of the calculation methods are described in the supplementary material: https://clincancerres./content/suppl/2019/03/05/1078-0432.CCR-18-3524.DC1

I have excerpted the English descriptions below, and I suspect most readers will find them completely baffling:

Calculation of neoantigens

With the WES data (.bam) of paired normal samples from TNBC patients, we first used the POLYSOLVER tool (8) to infer the 4-digit HLA genotype for each sample (arguments: Asian 1 hg19 STDFQ 0). Then, neoantigens were predicted using NetMHCpan (v4.0) (9), with the somatic mutation data (.maf) and HLA genotype data as the inputs. Neoantigens derived from protein-coding single nucleotide variants (SNVs) (Variant_Classification = "Missense_Mutation" and Variant_Type = "SNP") and from small insertions and deletions (indels) (Variant_Classification = "Frame_Shift_Ins", "Frame_Shift_Del", "In_Frame_Ins", or "In_Frame_Del", and Variant_Type = "INS" or "DEL") were predicted separately. Mutations predicted to produce peptides with affinity < 500 nM, and whose corresponding gene had a ComBat-adjusted expression value greater than 1 (evaluated on the median expression rather than on the specific sample), were chosen as neoantigens. We referred to pVAC-seq (10) and made some modifications based on the features of our dataset to construct this algorithm.
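The two filtering steps in this paragraph (splitting MAF records into SNVs and indels, then keeping peptides by binding affinity and gene expression) can be sketched in Python as below. This is only a minimal sketch, not the authors' pipeline: it assumes the MAF rows and the NetMHCpan predictions have already been parsed into dicts, and the prediction keys (`gene`, `peptide`, `affinity_nM`) and function names are hypothetical. Running POLYSOLVER and NetMHCpan themselves is not shown.

```python
from statistics import median

# MAF categories taken from the description above
SNV_CLASSES = {"Missense_Mutation"}
INDEL_CLASSES = {"Frame_Shift_Ins", "Frame_Shift_Del", "In_Frame_Ins", "In_Frame_Del"}


def split_mutations(maf_rows):
    """Split parsed MAF rows into the SNV and indel sets that are predicted separately."""
    snvs = [r for r in maf_rows
            if r["Variant_Classification"] in SNV_CLASSES and r["Variant_Type"] == "SNP"]
    indels = [r for r in maf_rows
              if r["Variant_Classification"] in INDEL_CLASSES
              and r["Variant_Type"] in {"INS", "DEL"}]
    return snvs, indels


def select_neoantigens(predictions, expression, affinity_cutoff=500.0, expr_cutoff=1.0):
    """Keep peptides with affinity < 500 nM whose gene has median expression > 1.

    predictions: list of dicts with hypothetical keys 'gene', 'peptide', 'affinity_nM'
    expression:  dict gene -> list of batch-corrected expression values across samples
    """
    selected = []
    for p in predictions:
        values = expression.get(p["gene"])
        if not values:
            continue  # gene not quantified, so it cannot pass the expression filter
        # median across samples, not the value in the individual sample
        if p["affinity_nM"] < affinity_cutoff and median(values) > expr_cutoff:
            selected.append(p)
    return selected
```

Using the median across the cohort, as the paper describes, means a gene's expression filter does not depend on which patient the mutation came from.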

Calculation of cancer testis antigens (CTA)

The CTDatabase (http://www.cta./) was first queried for CTAs. We then calculated the difference in each candidate CTA between the tumor site and the paired normal site; genes whose expression was at least four times higher in the tumor site than in the paired normal tissue in at least one patient were selected as TNBC-specific CTAs. In all, 177 CTAs were included in our study. The CTA landscape of TNBC is described in Supplementary Figure 8.
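The selection rule (at least a four-fold tumor/normal difference in at least one patient) can be sketched as follows. This assumes expression values are on a linear (not log) scale and stored as nested dicts keyed by gene and then patient; the function name and data layout are hypothetical. If the values were log2-transformed, a four-fold change would instead be a difference of 2.

```python
def select_tnbc_ctas(candidates, tumor_expr, normal_expr, min_fold=4.0):
    """Return candidate CTAs with tumor >= min_fold * paired normal in at least one patient.

    tumor_expr / normal_expr: dict gene -> dict patient -> expression (linear scale, assumed)
    """
    selected = []
    for gene in candidates:
        tumor_by_patient = tumor_expr.get(gene, {})
        normal_by_patient = normal_expr.get(gene, {})
        for patient, t in tumor_by_patient.items():
            if patient not in normal_by_patient:
                continue  # a paired normal sample is required
            n = normal_by_patient[patient]
            # compare by multiplication to avoid dividing by a zero normal value;
            # require t > 0 so that 0 vs 0 does not count as a four-fold increase
            if t > 0 and t >= min_fold * n:
                selected.append(gene)
                break  # one qualifying patient is enough
    return selected
```

For example, a gene with tumor 8.0 and paired normal 2.0 in one patient qualifies, while a gene that never exceeds 3.0 against a normal of 1.0 does not.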

Calculation of homologous recombination deficiency (HRD) scores

The HRD score was calculated as the sum of three scores: the allelic imbalance extending to the telomere (NtAI) score, the loss of heterozygosity (LOH) score and the modified large-scale state transition (LSTm) score. The calculation of these scores was previously described (11). Briefly, the NtAI score was defined as the number of regions with allelic imbalance that are longer than 11 Mb and extend to one of the subtelomeres but do not cross the centromere. The LOH score was defined as the number of LOH regions longer than 15 Mb but shorter than the whole chromosome. The LST score was defined as the number of break points between regions longer than 10 Mb after filtering out regions shorter than 3 Mb. To reduce the effect of ploidy, the LST score was modified using the formula LSTm = LST - kP, where P is the ploidy and k is a constant equal to 15.5.
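To make the HRD arithmetic concrete, here is a simplified sketch. It computes only the LOH component from a list of copy-number segments and takes the NtAI and LST counts as precomputed inputs. The `Segment` layout is hypothetical, adjacent LOH segments are not merged, and "shorter than the whole chromosome" is approximated by comparing segment length with chromosome length; dedicated HRD tools handle these details more carefully.

```python
from dataclasses import dataclass

MB = 1_000_000


@dataclass
class Segment:
    chrom: str
    start: int  # bp
    end: int    # bp
    loh: bool   # True if one parental allele is lost (hypothetical flag)


def loh_score(segments, chrom_lengths):
    """Count LOH segments longer than 15 Mb that are shorter than their whole chromosome."""
    count = 0
    for s in segments:
        length = s.end - s.start
        if s.loh and length > 15 * MB and length < chrom_lengths[s.chrom]:
            count += 1
    return count


def lst_modified(lst, ploidy, k=15.5):
    """Ploidy-corrected LST: LSTm = LST - k * P."""
    return lst - k * ploidy


def hrd_score(ntai, loh, lst, ploidy, k=15.5):
    """HRD = NtAI + LOH + LSTm."""
    return ntai + loh + lst_modified(lst, ploidy, k)
```

For example, with NtAI = 10, LOH = 5, LST = 40 and ploidy 2.0, LSTm = 40 - 15.5 × 2 = 9, so HRD = 10 + 5 + 9 = 24.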

Estimation of intratumoral heterogeneity (ITH)

ASCAT (12) was used with default parameters to integrate the copy number data with the somatic mutation data and estimate the purity and ploidy of each tumor. A modified PyClone workflow (13) was then used to estimate the cancer cell fractions of each sample. The fraction of subclonal cancer cells was used as the indicator of ITH.
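PyClone fits a Bayesian clustering model, which is beyond a short example. The sketch below swaps in a much simpler technique just to illustrate the idea: a per-mutation point estimate of the cancer cell fraction (CCF) from the variant allele frequency (VAF), the tumor purity (as ASCAT would estimate) and the local tumor copy number, assuming a diploid normal and that each mutation sits on a single copy (multiplicity 1). The proportion of mutations with CCF below a hypothetical cutoff of 0.8 is then used as an ITH proxy. The cutoff, the dict keys and the function names are assumptions, not values from the paper.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Point estimate CCF = VAF * (p * CN_t + CN_n * (1 - p)) / (p * m), capped at 1.

    Assumes purity p > 0; multiplicity m = 1 by default (simplifying assumption).
    """
    ccf = vaf * (purity * tumor_cn + normal_cn * (1 - purity)) / (purity * multiplicity)
    return min(ccf, 1.0)


def subclonal_fraction(mutations, purity, clonal_threshold=0.8):
    """Fraction of mutations whose CCF is below clonal_threshold (hypothetical cutoff).

    mutations: list of dicts with hypothetical keys 'vaf' and 'tumor_cn'
    """
    if not mutations:
        return 0.0
    ccfs = [cancer_cell_fraction(m["vaf"], purity, m["tumor_cn"]) for m in mutations]
    return sum(c < clonal_threshold for c in ccfs) / len(ccfs)
```

With purity 0.5 and a tumor copy number of 2, a VAF of 0.25 gives CCF 1.0 (clonal) and a VAF of 0.1 gives CCF 0.4 (subclonal). A clustering model such as PyClone additionally pools information across mutations and accounts for sampling noise, which this point estimate ignores.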

So, is there a shortcut to learning these methods?

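Of course there is! In a small class, an instructor who has worked at both BGI and Novogene and has more than ten years of bioinformatics project experience will teach you hands-on, and it is well worth it. Seats for the "Cancer Genome Bioinformatics Training Course (the only session in 2021)" are likewise limited, and it should fill up quickly!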

How to register

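Because this course is dedicated to cancer-informatics data analysis, it will not spend two months laying the groundwork in Linux and Python, as our earlier "Genome Assembly" course did. Nor will it concentrate on polishing basic computing skills, such as R-based statistical visualization and Linux-based NGS data processing, as the 生信技能树 "Introduction to Bioinformatics" course does.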
