
How Many Tools Are There for Predicting Variant Pathogenicity?

 昵称66850938 2019-10-23
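Many bioinformatics tools can predict whether a variant is pathogenic, and they cover non-coding variants as well as coding variants. They draw on factors such as evolutionary conservation, protein structure and function, and sequence homology. The algorithms involved include Bayesian methods, maximum likelihood, hidden Markov models, SVM, random forests, iterative greedy algorithms, logistic regression, neural networks, and GBDT (Gradient Boosting Decision Tree).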

1. Pathogenicity prediction

1.1 function prediction scores

SIFT

http://sift.

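SIFT: the SIFT score describes the effect of a variant on the protein sequence and is reported as three values: the raw SIFT score, the converted score (1-SIFT), and a label, T or D. When a variant affects several protein sequences, each sequence gets its own SIFT score and the minimum is taken. The lower the SIFT score, the more "deleterious" the variant, meaning the SNP is more likely to alter protein structure or function. D: deleterious (sift<=0.05); T: tolerated (sift>0.05).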
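The three reported values can be derived from the per-protein scores as in the minimal sketch below. The function name `summarize_sift` and its return shape are illustrative assumptions, not part of SIFT itself; only the minimum rule, the 1-SIFT conversion, and the 0.05 cutoff come from the description above.

```python
def summarize_sift(scores_per_protein, cutoff=0.05):
    """Return (raw score, converted score, label) for one variant.

    scores_per_protein: one SIFT score per affected protein sequence.
    The function name and return shape are hypothetical, for illustration.
    """
    if not scores_per_protein:
        raise ValueError("at least one SIFT score is required")
    raw = min(scores_per_protein)           # most damaging score across proteins
    converted = 1.0 - raw                   # converted score: higher = more deleterious
    label = "D" if raw <= cutoff else "T"   # D: deleterious, T: tolerated
    return raw, converted, label


# A variant scored against three protein sequences.
print(summarize_sift([0.30, 0.02, 0.11]))
```

Taking the minimum means one strongly affected protein sequence is enough to flag the variant as D.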

PolyPhen2 HDIV&HVAR

http://genetics.bwh./pph2

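PolyPhen2 HVAR: predicts the effect of the variant on the protein sequence using PolyPhen2 with the HumanVar dataset; intended for Mendelian (monogenic) diseases. The column holds two values. The first is the PolyPhen2 score: the higher the value, the more "deleterious" the variant, meaning the SNP is more likely to alter protein structure or function. The second is D, P, or B (D: probably damaging (>=0.909); P: possibly damaging (0.447<=pp2_hvar<=0.908); B: benign (pp2_hvar<=0.446)).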

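PolyPhen2 HDIV: predicts the effect of the variant on the protein sequence using PolyPhen2 with the HumanDiv dataset; intended for complex diseases. The column holds two values. The first is the PolyPhen2 score: the higher the value, the more "deleterious" the variant, meaning the SNP is more likely to alter protein structure or function. The second is D, P, or B (D: probably damaging (>=0.957); P: possibly damaging (0.453<=pp2_hdiv<=0.956); B: benign (pp2_hdiv<=0.452)).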
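The two sets of cutoffs can be applied with one small lookup, sketched below. The names `THRESHOLDS` and `classify_polyphen2` are hypothetical. The sketch also assumes that a score falling between two listed bounds (for example 0.4465 for HVAR) belongs to the lower category, which the listed three-decimal ranges do not specify.

```python
# Lower bounds for "probably damaging" and "possibly damaging" per model,
# taken from the cutoffs listed above.
THRESHOLDS = {
    "HDIV": (0.957, 0.453),
    "HVAR": (0.909, 0.447),
}


def classify_polyphen2(score, model):
    """Map a PolyPhen2 score to D, P, or B for the given model (HDIV or HVAR)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen2 scores lie between 0 and 1")
    probably_min, possibly_min = THRESHOLDS[model]
    if score >= probably_min:
        return "D"   # probably damaging
    if score >= possibly_min:
        return "P"   # possibly damaging
    return "B"       # benign


# The same score can receive different labels under the two models.
print(classify_polyphen2(0.93, "HDIV"), classify_polyphen2(0.93, "HVAR"))
```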

MutationTaster

http://www.

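The MutationTaster score describes the effect of a variant on the protein sequence and is reported as three values: the raw MutationTaster score, the converted score, and a label, A, D, N, or P. The higher the second value, the more "deleterious" the variant, meaning the SNP is more likely to alter protein structure or function. The labels are "A" ("disease causing automatic"), "D" ("disease causing"), "N" ("polymorphism"), and "P" ("polymorphism automatic").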
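When filtering annotated variants, the four labels are often collapsed into a yes/no flag. The sketch below shows one way to do that; grouping A with D as "disease causing" follows from the label names, while the names `MUTATIONTASTER_LABELS` and `is_disease_causing` are assumptions made for illustration.

```python
MUTATIONTASTER_LABELS = {
    "A": "disease causing automatic",
    "D": "disease causing",
    "N": "polymorphism",
    "P": "polymorphism automatic",
}


def is_disease_causing(code):
    """Return True for the two 'disease causing' labels (A and D)."""
    if code not in MUTATIONTASTER_LABELS:
        raise ValueError(f"unknown MutationTaster label: {code!r}")
    return MUTATIONTASTER_LABELS[code].startswith("disease causing")


for code in "ADNP":
    print(code, MUTATIONTASTER_LABELS[code], is_disease_causing(code))
```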

MutationAssessor

http:///r3/

FATHMM

http://fathmm./

ConSurf

http://consurf./2016/

PANTHER

http://www./tools/csnpScoreForm.jsp

PhD-SNP

http://snps./phd-snp/phd-snp.html

SNPs&GO

http://snps-and-go.biocomp./snps-and-go

Align GVGD

http://agvgd./agvgd_input.php

MAPP

http://mendel./SidowLab/downloads/MAPP/index.html

MutPred

http://mutpred./

PROVEAN

http://provean./index.php

nsSNPAnalyzer

http://snpanalyzer.

SNAP

http://www./services/SNAP

Pmut

http://mmb2.pcb.:8080/PMut/

LRT

http://www.genetics./jflab/

VEST

http://wiki.

GWAVA(Genome Wide Annotation of VAriants)

https://www./sanger/StatGen_Gwava

GWASCatalog (Genome Wide Association Study catalog)

http://www./gwas/

Haploreg (annotation on non-coding variants from HaploReg)

http://archive./mammals/haploreg/haploreg.php

RegulomeDB (annotations for SNPs with known and predicted regulatory elements)

http:///

CoVEC40

https:///projects/covec/files

M-CAP (Mendelian Clinically Applicable Pathogenicity) (Nat Genet. 2016)

http://bejerano./MCAP

M-CAP uses gradient boosting trees, a supervised learning classifier that excels at analyzing nonlinear interactions between features, and has state-of-the-art performance in a variety of classification tasks. The features M-CAP uses for classification are based on both existing pathogenicity likelihood scores and direct measures of evolutionary conservation, the cross-species analog to frequency within the human population. We provide both 

(i) a novel method that combines amino acid conservation features with gradient boosting trees that can be applied to any variant training set 

(ii) computed scores trained on mutations linked to Mendelian diseases that can be directly used by clinicians to interpret variants of uncertain consequences.
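To make the idea of gradient boosting trees concrete, here is a minimal pure-Python sketch. It is not M-CAP's implementation: it uses one-split trees (stumps), fits each stump to the residuals of the logistic loss, and the two toy features (an existing pathogenicity score and a conservation score) and all function names are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, targets):
    """Pick the one-feature threshold split that minimizes squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [t for row, t in zip(X, targets) if row[j] <= threshold]
            right = [t for row, t in zip(X, targets) if row[j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((t - left_mean) ** 2 for t in left)
                     + sum((t - right_mean) ** 2 for t in right))
            if best is None or error < best[0]:
                best = (error, j, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def predict_proba(model, row):
    """Sum the base score and the shrunken stump outputs, then squash to (0, 1)."""
    score = model["base"]
    for j, threshold, left_mean, right_mean in model["stumps"]:
        leaf = left_mean if row[j] <= threshold else right_mean
        score += model["learning_rate"] * leaf
    return sigmoid(score)


def fit(X, y, n_rounds=20, learning_rate=0.5):
    """Boost stumps on the residuals y - p (negative gradient of the log loss).

    Requires both classes to be present in y.
    """
    positive_rate = sum(y) / len(y)
    model = {"base": math.log(positive_rate / (1 - positive_rate)),
             "learning_rate": learning_rate, "stumps": []}
    for _ in range(n_rounds):
        residuals = [label - predict_proba(model, row) for row, label in zip(X, y)]
        model["stumps"].append(fit_stump(X, residuals))
    return model


# Toy rows: [existing pathogenicity score, conservation score]; 1 = pathogenic.
X = [[0.1, 0.9], [0.2, 0.1], [0.8, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
model = fit(X, y)
print([round(predict_proba(model, row), 3) for row in X])
```

Each round adds a small correction aimed at the examples the current model still gets wrong, which is what lets the ensemble capture patterns a single shallow tree would miss.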

InterVar

http://wintervar./

1.2 ensemble scores

PredictSNP

http://loschmidt.chemi./predictsnp/ 

CADD

http://cadd.gs. 

MetaLR

http://genomics./members/15-member-detail/36-coco-dong

MetaSVM

http://genomics./members/15-member-detail/36-coco-dong

CONDEL

http://bg./fannsdb/ 

dbNSFP v3.0

http://sites.google.com/site/jpopgen/dbNSFP 

KGGSeq

http://statgenpro.psychiatry./kggseq

DANN

https://cbcl.ics./public_data/DANN/

Eigen(Nat Genet. 2016)

https://xioniti01.u.hpc./TrainingTestingDatasets/TestingDatasets/Mendelian

REVEL (AJHG, 2016): designed for rare missense variants

https://sites.google.com/site/revelgenomics/

2. Conservation prediction

PhyloP

http://compgen.bscb./phast/ 

PhastCons

http://compgen.bscb./phast/ 

GERP++

http://mendel./SidowLab/downloads/gerp/index.html 

SiPhy

http://portals./genome_bio/siphy/

3. Alternative splicing prediction

SPIDEX

http://www./spidex/ 

SPANR(splicing-based analysis of variants)

For each exon, the tool extracts 1393 features from proximal DNA sequence and uses a computational model to predict the percent of transcripts with the exon spliced in (PSI) for each of 16 human tissues, using both the wildtype (reference genome) and mutated sequences.

http://tools.genes.
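Since SPANR predicts PSI for both the wildtype and the mutated sequence in each tissue, a natural way to summarize a variant is the per-tissue difference between the two. The sketch below assumes that summary; the function name `max_delta_psi`, the dictionary inputs, and the tissue names are hypothetical and not SPANR's actual output format.

```python
def max_delta_psi(psi_wildtype, psi_mutant):
    """Return (tissue, delta) for the largest absolute PSI change.

    Both arguments map tissue name -> predicted PSI in percent.
    delta = mutant PSI - wildtype PSI, so a negative value means
    the exon is predicted to be included less often.
    """
    shared = psi_wildtype.keys() & psi_mutant.keys()
    if not shared:
        raise ValueError("no tissue is present in both predictions")
    deltas = {tissue: psi_mutant[tissue] - psi_wildtype[tissue] for tissue in shared}
    # Sort tissue names first so ties are broken deterministically.
    tissue = max(sorted(deltas), key=lambda name: abs(deltas[name]))
    return tissue, deltas[tissue]


wildtype = {"brain": 80.0, "liver": 75.0}
mutant = {"brain": 50.0, "liver": 70.0}
print(max_delta_psi(wildtype, mutant))
```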

GeneSplicer

http://www.cbcb./software/GeneSplicer/gene_spl.shtml 

HumanSplicingFinder 

http://www./HSF/ 

MaxEntScan

http://genes./burgelab/maxent/Xmaxentscan_scoreseq.html

NetGene2

http://www.cbs./services/NetGene2 

NNSplice

http://www./seq_tools/splice.html 

FSPLICE

http://www./berry.phtml?topic=fsplice&group=programs&subgroup=gfind

dbscSNV: predicts the deleteriousness of variants near splice sites

https://www./data/dbscSNV

dbscSNV includes all potential human SNVs within splicing consensus regions (-3 to +8 at the 5' splice site and -12 to +2 at the 3' splice site), i.e. scSNVs, related functional annotations and two ensemble prediction scores for predicting their potential of altering splicing
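The consensus windows quoted above translate into a simple range check, sketched below. The function name `in_splicing_consensus`, the site labels `"5ss"` and `"3ss"`, and the convention that offsets are signed positions relative to the splice site with no position 0 are assumptions made for this example.

```python
# Inclusive offset windows from the dbscSNV description above.
CONSENSUS_WINDOWS = {
    "5ss": (-3, 8),    # 5' splice site: -3 to +8
    "3ss": (-12, 2),   # 3' splice site: -12 to +2
}


def in_splicing_consensus(site, offset):
    """Return True if a signed offset falls inside the consensus window."""
    if offset == 0:
        raise ValueError("offsets are numbered without a position 0")
    low, high = CONSENSUS_WINDOWS[site]
    return low <= offset <= high


print(in_splicing_consensus("5ss", 5), in_splicing_consensus("3ss", 5))
```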


Schematic illustration of pre-mRNA splicing. 5′ Splice site and 3′ splice site are recognized by the spliceosome, and the intron is excised, and exons are spliced. The whole process is regulated by trans-acting elements such as SR proteins, heterogeneous nuclear ribonucleoproteins, and the regulatory complex. ESE, exonic splicing enhancer; ESS, exonic splicing silencer; ISE, intronic splicing enhancer; ISS, intronic splicing silencer; ss, splice site.

In silico tools for splicing defect prediction: a survey from the viewpoint of end users. Genetics in Medicine (2013) 16, 497–503 doi:10.1038/gim.2013.176


4. Population frequency

ExAC

http://exac./ 

Exome Sequencing Project (ESP)

https://esp.gs./drupal/ 

1000 Genomes 

http://www./ 

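Since 2016-02-23, 1000 Genomes has been officially maintained and managed by the International Genome Sample Resource (IGSR). IGSR was set up by EMBL-EBI to keep maintaining the data generated by the 1000 Genomes Project and to provide new data and new analysis methods. IGSR is funded by the Wellcome Trust (grant number WT104947/Z/14/Z). Its three main aims are: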

1. Support access to and use of the 1000 Genomes data

2. Integrate other published data on the 1000 Genomes samples

3. Collect data from populations outside 1000 Genomes

MyGene (100 Genomes Project, 2014)

https://www./ 

69 Genomes Data

http://www./public-data/69-genomes/ 

COSMIC

http://cancer./cosmic 

TCGA

https://cancergenome./ 

GDC

https://gdc./

ICGC

https:///

gnomAD (contains 126,216 WES + 15,136 WGS)

http://gnomad./

Compared to ExAC, there is the entirely new population category of Ashkenazi Jewish and greatly expanded numbers in all population sub-groups.


Wellderly (Allele frequencies from "healthy elderly" patients enrolled in the Scripps Wellderly study)

https://www./research__areas-of-research__genome-and-genomic-medicine-research__wellderly-study

UK10K (Annotates variant frequency in the UK10K low-frequency variants project)

http://www./

Human Longevity, Inc. (HLI)

Recently released 10,545 human genomes, searchable through HLI Open Search. The 10,545 human genomes are part of the HLI's database, which currently contains more than 30,000 high-quality genomic and phenotypic integrated health records. HLI's goal is to have one million integrated health records in the database by 2020.

http://www./

AACR-GENIE cancer database

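The American Association for Cancer Research (AACR) has publicly released one of the largest cancer genomics datasets to date. It contains high-quality gene sequencing data from 19,000 patients covering 59 cancer types. It also includes some clinical data, with information on nearly 3,000 lung cancer patients, more than 2,000 breast cancer patients, and more than 2,000 colorectal cancer patients. The genomic data and a limited amount of clinical data are available through the AACR website or can be downloaded directly from Sage Bionetworks.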

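Beyond the tools and databases above, disease databases such as ClinVar, OMIM, MedGen, Orphanet, DisGeNET, HGMD, DoCM, SwissVar, LOVD, and GRASP can also be consulted when judging the pathogenicity of a variant. Variant interpretation standards and clinical practice guidelines from ACMG, NCCN, ESMO, ASCO, CAP, and AMP should be followed as well. UniProt is important for predicting the domain and function at a given site. The links to signaling pathways in ICGC are also worth consulting.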

5. Protein-protein interaction databases

STRING 

http:///

I2D

http://ophid./ophidv2.204/

Mentha

http://mentha./about.php

iRefIndex

ftp://ftp.no.embnet.org/irefindex

PINA

http://cbg.garvan./pina/

HINT

http://hint./

InWeb_IM

https://www./inbio/map/

6. Protein tertiary structure prediction

MODELLER 

http://msg./local/programs/modeller/manual.html

ModWeb

https://modbase.compbio./modweb/

SWISS-MODEL

https://swissmodel./

7. Structural variation databases

DGV

http://dgv./dgv/app/home

dbVar

https://www.ncbi.nlm./dbvar

NA12878 (an as-yet unpublished genome reference sequence whose value should not be overlooked) and 华夏一号 (HX1)

8. Other databases

Patent database: Espacenet

Clinical trials: ClinicalTrials, CTR (covers some clinical trials in China; results are mediocre)

Drug databases: DrugBank, PharmGKB, FDA, CFDA, DGIdb, canSAR, Selleck (specialized for inhibitor lookups), Drug Information Portal (a drug information database)

GTEx eQTL: eQTL information based on the GTEx project

InterPro: protein domains based on the InterPro database

Chemical-protein interaction network database: STITCH

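The canSAR cancer drug discovery database has been running for five years since its launch in 2011 and aims to use "big data" to describe in detail how human molecules behave. It already holds billions of experimental data points reflecting the effects of around a million drugs or proteins on humans, along with genomic data from clinical trials. An update to canSAR described in Nucleic Acids Research most notably adds 3D structures of faulty proteins and maps of cancer communication networks. Other improvements include better browsing and search tools, new disease summaries and cancer lists, and batch analysis tools. The database currently holds 3D structures of about 3 million cavities on nearly 110,000 cancer-causing molecules. Another new feature lets scientists identify communication pathways within tumors.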

9. Clinical reference terminologies

SNOMED CT (Systematized Nomenclature of Medicine, Clinical Terms)

LOINC (Logical Observation Identifiers Names and Codes)

CHPO (The Chinese Human Phenotype Ontology)

UMLS (Unified Medical Language System)

MeSH (Medical Subject Headings)

ICD-10 (International Classification of Diseases, ICD)

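SNOMED CT is an internationally used system of clinical terminology and medical ontology. It matters for semantic-level sharing of clinical data, data structuring, data analysis, and clinical decision support.

LOINC aims to facilitate the exchange and sharing of clinical observation results. LOINC terms cover clinical observations used for clinical care, outcomes management, and clinical research. Dr. Zhang Lin (张林), through many years of hard work, completed the Chinese translation of LOINC, which is now widely used in China as the terminology standard for laboratory data within hospitals, testing laboratories, and private healthcare groups.

HPO helps clinicians describe the phenotypes of rare-disease patients with standardized medical terms. This aids diagnosis and the identification of causal genes, and it helps researchers find relationships between diseases and specific phenotypes. A team led by Professor Gu Weihong (顾卫红) completed the Chinese translation of HPO, providing a core foundation for organizing clinical phenotypes in Chinese.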

ps:

NIGMS

NINDS

NHGRI

NIA

CIRM

CHDI

Stem Cells

HapMap

1000 Genomes

CEPH Resources

NEI-AREDS
