VarSome ACMG Implementation

Posted by 生物_医药_科研 on 2022-06-08, from Jiangsu


version: 11.2.16, dated: Tue May 31 07:34:12 CEST 2022

Introduction

The “Standards and guidelines for the interpretation of sequence variants” were published in 2015 by Sue Richards et al. in their seminal paper (ACMG Guidelines), from which our implementation is derived.

The standards were very much written for interpretation by humans, not machines: they assume the clinician has a deep knowledge of the domain and of the relevant papers and conditions. Automating these standards is a matter of interpretation. We have opted to statistically quantify terms such as “hot-spot” or “well known”, resulting in many thresholds that are tuned via our calibration process.

Our guiding principle throughout has been to implement the best algorithms we could, following the advice from our clinical advisors, feedback from the VarSome user community, and using statistically justified thresholds. All the rules provide clear natural language explanations of why they were triggered and which evidence was used, or indeed, a full explanation of why the criteria were not met (this is currently only visible in VarSome).

We also strive to continuously improve our implementation, adjusting rules or thresholds, incorporating new data sources, and adding refinements as new publications and methodology changes are suggested.

Databases

The VarSome automated classification processes rely on vast quantities of accurate curated data from the following databases (in no particular order).

Important: depending on licensing agreements and, in some cases, the fees charged by source organisations, not all databases are visible to all users, and this may directly impact the completeness or quality of automated classifications.

ACMG classifier

UniProt Variants, provided by UNIPROT, version 15-Mar-2022 (82.7k records)
UniProt Regions, provided by UNIPROT, version 15-Mar-2022 (206k records)
RefSeq, provided by NCBI, version 210
phyloP100way, provided by CSH, version 13-Apr-2021 (3.14G records)
dbNSFP-p, provided by dbNSFP, version 4.2
dbscSNV, provided by dbNSFP, version v1.1 (15.0M records)
dbNSFP genes, provided by dbNSFP, version 4.2 (21.3k records)
dbNSFP-c, provided by dbNSFP, version 4.2 (82.8M records)
DANN SNVs, provided by UCI, using version 2014 (9.41G records) for hg19, unavailable for hg38
ClinVar, provided by NCBI, version 13-May-2022 (1.45M records)
CADD, provided by UW, version 1.6
CGD, provided by NHGRI, version 15-Mar-2022 (4.38k records)
Domino, provided by UNIL, version 04-Sep-2019 (17.9k records)
Ensembl, provided by EMBL, version 105
gnomAD exomes, provided by Broad, using version 2.1.1 (17.2M records) for hg19, and using version 2.1.1 (17.2M records) for hg38
gnomAD exomes coverage, provided by Broad, using version 2.1 (59.6M records) for hg19, unavailable for hg38
gnomAD gene constraints, provided by Broad, version 2.1 (19.6k records)
gnomAD genomes, provided by Broad, using version 2.1.1 (262M records) for hg19, and using version 3.1.1 (759M records) for hg38
gnomAD genomes coverage, provided by Broad, using version 2.1 (3.14G records) for hg19, and using version 3.0 (3.21G records) for hg38
gnomAD Mitochondrial, provided by Broad, unavailable for hg19, and using version 3.1 (18.2k records) for hg38
HGNC, provided by HUGO, version 27-Feb-2022
HPO, version 17-Apr-2022
Mitomap, provided by CHOP, version 15-Feb-2022 (38.4k records)
Papers & classifications contributed by the VarSome community.

Other Databases

VarSome also annotates variants using the following databases, although these are not currently leveraged by the automated classifications:

Semantic Scholar, provided by Allen Institute
TP53 Somatic, provided by IARC, version release 20 (2.45k records)
TP53 Germline, provided by IARC, version release 20 (436 records)
DGV, provided by TCAG, version 30-Jun-2021 (792k records)
Cancer Gene Census, provided by Sanger, version v95
PubMed, provided by NCBI
The Human Protein Atlas, provided by KAW, version 07-Sep-2021 (19.3k records)
PMKB, provided by Weill Cornell Medicine, version 19-Aug-2021 (161 records)
phastCons100way, provided by CSH, version 14-Apr-2021 (3.14G records)
DECIPHER, provided by Sanger, version 17-Apr-2022 (26.8k records)
dbVar, provided by NCBI, version 23-Feb-2022 (1.86M records)
dbSNP, provided by NCBI, version build 154 (774M records)
DailyMed, provided by NIH, version 03-Sep-2021
CPIC Genes-Drugs, provided by CPIC, version 17-May-2022
Cosmic Licensed, provided by Sanger, version v95
ClinVar CNVs, provided by NCBI, version 13-May-2022 (64.4k records)
ClinGen Regions, provided by NIH, version 23-Feb-2022 (504 records)
Bravo, provided by UMICH, using version Freeze5 (25.5M records) for hg19, and using version Freeze8 (75.5M records) for hg38
AACT, provided by CTTI, version 17-May-2022
CancerHotspots, provided by MSK, version 10-Sep-2021 (2.25M records)
cBioPortal, provided by MSK, version 23-Mar-2022 (11.1M records)
CIViC, provided by WUSTL, version 17-May-2022 (843 records)
CKB, provided by JAX, version 17-May-2022
ClinGen, provided by NIH, version 23-Feb-2022 (1.46k records)
ClinGen CNVs, provided by NIH, version 23-Feb-2022 (156 records)
ClinGen Disease Validity, provided by NIH, version 23-Feb-2022 (1.25k records)
DoCM, provided by WUSTL
DGI, provided by WUSTL, version 09-Sep-2021
EMA Approved Drugs, provided by EMA, version 03-Sep-2021
ExacCNV, provided by Broad, using version 01-Jul-2021 (49.3k records) for hg19, and using version 20180227 (48.6k records) for hg38
ExAC genes, provided by Broad, version 18-Sep-2018 (18.3k records)
FDA Approved Drugs, provided by FDA, version 03-Sep-2021
Pharmacogenomic Biomarkers, provided by FDA, version 16-Sep-2021
FusionGDB, provided by UTexas, version 19-Nov-2021 (15.6k records)
GDC, provided by NIH, version 13-Dec-2021 (2.70M records)
GenCC, version 23-Feb-2022 (4.57k records)
gene2phenotype, provided by EBI, version 23-Feb-2022 (2.65k records)
GERP, using version 2010 (2.60G records) for hg19, unavailable for hg38
GHR Genes, provided by NLM, version 17-May-2022 (1.49k records)
gnomAD structural variants, provided by Broad, version 30-Jun-2021 (334k records)
GTEx, provided by NIH, version v8 (313k records)
GWAS Catalog, provided by EBI, version 17-Apr-2022 (359k records)
ICGC somatic, provided by ICGC, using version release 28 (81.7M records) for hg19, unavailable for hg38
kaviar3, provided by ISB, version 4-Feb-2016 (83.3M records)
Mastermind, provided by Genomenon, version 2022.1.22 (14.8M records)
Mondo, provided by Monarch, version 17-Apr-2022
OncoTree, provided by MSK, version 01-Sep-2021
PanelApp, provided by Genomics England, version 17-May-2022
PharmGKB, version 15-Mar-2022

(Version information is subject to change at any time; some databases may require a license and may not be displayed.)

dbNSFP Sources (non-synonymous coding SNVs)

Additional sources annotated using the dbNSFP database:

Functional predictions:

ALoFT
BayesDel
DEOGEN2
Eigen
Eigen-PC
FATHMM
FATHMM-XF
FATHMM-MKL
fitCons
LIST-S2
LRT
M-CAP
MetaLR
MetaRNN
MetaSVM
MPC
MutationAssessor
MutationTaster
MutPred
MVP
Polyphen-2
PrimateAI
PROVEAN
REVEL
SIFT
SIFT4G

Conservation scores:

bStatistic
phastCons100way Vertebrate
phastCons30way Mammalian
phastCons17way Primate
phyloP100way Vertebrate
phyloP30way Mammalian
phyloP17way Primate
SiPhy

Gene annotation sources:

BioCarta
Consensus
egenetics
Essential Genes
GDI
Gene Ontology
GHIS
GNF/Atlas
HIPred
KEGG
LoFTool
P(HI) Score
P(rec) Score
RVIS
UniProt Genes

Clinical Evidence

Clinical Evidence is the foundation stone of our ACMG evaluation. We currently source this from:

ClinVar
UniProt
MitoMap
Publications linked by VarSome users
VarSome user classifications

The VarSome options allow the user to specify a minimum number of stars to filter ClinVar, so that entries with fewer stars are ignored, and similarly to disable clinical classifications from UniProt.

Clinically Reported Variants

On a daily basis, we re-annotate all the variants from the sources listed above. This data is then used for all the rules that require clinical evidence, or statistics derived from it.

The current version of this database is 30-May-2022 (1.56M records).

For each variant we record its original “source” classification, allele frequency and coding impact. We also re-classify the variants using our implementation of the ACMG rules, with the clinical evidence rules (PS3, BS3, PP5 & BP6) disabled. This is useful in establishing how reliable the evidence might be. The strengths of rules such as PS1 and PM5 will be downgraded if a variant has been reported pathogenic but this is not confirmed by the independent ACMG re-classification.

This database is displayed in VarSome as a “lollipop graph” in the genome browser.

The graph can be filtered by coding impact, or various types of null variants.

Gene Statistics

This database is derived from the clinical variants database and is also updated daily: it keeps track of how many variants are benign/pathogenic for each gene, along with their coding impacts and exon location - these are used in rules PP2 and BP1 for example.

The gene statistics are displayed in the VarSome “gene” page.

We derive a “benign cut-off frequency” from these variant classifications & their allele frequencies for use in rule BS1.

Mode Of Inheritance

A number of rules (PM2, BS2, BP1) depend on the mode of inheritance for a given gene.

The following sources are used:

CGD: this is the primary source of inheritance data, and covers modes such as X-Linked or Y-Linked.
ClinGen, gene2Phenotype, GenCC: these are used as supplementary sources to CGD depending on the provided expert panel verdict.
Domino: this source is only used for genes that do not have any other entry. This is a high quality in-silico prediction tool, as detailed in PMID:5630195.

We compare Domino's 'Probability of Autosomal Dominant' to the following thresholds:

Dominant: if more than 0.5934,
Recessive: if less than 0.3422,
AD/AR: if in between 0.3422 and 0.5934.

Note that we do not currently use the input phenotype for genes that are AD or AR for different diseases; we plan to enhance this in future.
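
The Domino threshold comparison above can be sketched as follows. This is only an illustration under stated assumptions, not VarSome's code: the function name is hypothetical, and a score that falls exactly on a threshold is treated as AD/AR because the text does not say how ties are handled.

```python
# Thresholds quoted in the text above.
DOMINANT_ABOVE = 0.5934
RECESSIVE_BELOW = 0.3422


def domino_inheritance(p_ad: float) -> str:
    """Map Domino's 'Probability of Autosomal Dominant' to an inheritance label.

    Hypothetical helper. Scores exactly equal to a threshold fall into
    the AD/AR band (an assumption).
    """
    if p_ad > DOMINANT_ABOVE:
        return "Dominant"
    if p_ad < RECESSIVE_BELOW:
        return "Recessive"
    return "AD/AR"
```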

Splice-Site Prediction

We use the scSNV database for splice-site prediction. This is only available for single-nucleotide variants. We use both the 'ADA Boost Splicing' threshold (0.708) and 'Random Forest Splicing' threshold (0.515) to identify potentially splicing variants for rules BP7 and PP3.
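
As a rough sketch of the check described above: the text gives the two thresholds but does not say whether both scores must exceed them or whether one is enough. The example below assumes that either score exceeding its threshold is sufficient, and that a missing score (for example for a non-SNV) simply does not count. The function name is hypothetical.

```python
from typing import Optional

# Thresholds quoted in the text above.
ADA_THRESHOLD = 0.708  # 'ADA Boost Splicing'
RF_THRESHOLD = 0.515   # 'Random Forest Splicing'


def predicted_splicing(ada: Optional[float], rf: Optional[float]) -> bool:
    """Return True if the variant is predicted to affect splicing.

    Assumption: one score above its threshold is enough. The source
    text does not state how the two scores are combined.
    """
    ada_hit = ada is not None and ada > ADA_THRESHOLD
    rf_hit = rf is not None and rf > RF_THRESHOLD
    return ada_hit or rf_hit
```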

Conservation

We use PhyloP100Way for conservation tests. It is available for nearly all positions in the genome, and proves to be a very useful indication of whether a variant may be benign or pathogenic. Using conservation contributes greatly to reducing the over-calling of pathogenic variants: for example, the strength of rule PM2 is reduced to supporting if the conservation score is very low.

Conservation is used to:

Adjust the strength of rules PM2 and PM4.
Exclude highly-conserved variants from rules BP4 and BP7.
As a last-resort fallback for rules PP3 and BP4 if no in-silico predictions are available.

Multiple thresholds have been carefully calibrated to maximise accuracy whilst not over-calling:

Likely Benign: if the score is less than 1.4, this is used for in-silico predictions BP4, and rules PM4 and BS1.
Likely Pathogenic: if the score is greater than 3.81, this is used for in-silico predictions PP3, and rule PM4.
Conserved: if the score is greater than 6.8, used by rules PM2 and PM4.
Highly Conserved: if the score is greater than 7.2, used by rules BP4, BP7, PM2 and PM4.
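
Because the four thresholds above nest (a score above 7.2 is also above 6.8 and 3.81), one way to picture them is as a function returning every label a score satisfies. This is a minimal sketch with a hypothetical function name; how each rule consumes these labels is described in the individual rule sections.

```python
from typing import List

# PhyloP100Way thresholds quoted in the list above.
LIKELY_BENIGN_BELOW = 1.4
LIKELY_PATHOGENIC_ABOVE = 3.81
CONSERVED_ABOVE = 6.8
HIGHLY_CONSERVED_ABOVE = 7.2


def conservation_labels(phylop: float) -> List[str]:
    """Return every conservation label that the score satisfies."""
    labels = []
    if phylop < LIKELY_BENIGN_BELOW:
        labels.append("Likely Benign")
    if phylop > LIKELY_PATHOGENIC_ABOVE:
        labels.append("Likely Pathogenic")
    if phylop > CONSERVED_ABOVE:
        labels.append("Conserved")
    if phylop > HIGHLY_CONSERVED_ABOVE:
        labels.append("Highly Conserved")
    return labels
```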

Transcript Selection

All the ACMG rules are evaluated against a single transcript. Selecting this transcript is clearly of critical importance and can modify the outcome of the classification. Transcripts are prioritized according to the following criteria:

Most severe coding impact, or within +/- 2 bases of the splicing site,
Canonical,
Longest transcript.
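
The prioritisation above behaves like a sort on a three-part key. The sketch below is illustrative only: the `Transcript` fields and the `SEVERITY` ranking are hypothetical (the text does not list the actual severity ordering), and a variant within 2 bases of a splice site is assumed to rank alongside the most severe coding impact.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical severity ranking, higher means more severe.
SEVERITY = {"nonsense": 4, "frameshift": 4, "missense": 3,
            "synonymous": 2, "intronic": 1}


@dataclass
class Transcript:
    name: str
    coding_impact: str
    splice_distance: int  # bases to the nearest splice site
    canonical: bool
    length: int


def priority_key(t: Transcript):
    """Key ordered as: severity (or near splice site), canonical, length."""
    severity = SEVERITY.get(t.coding_impact, 0)
    if abs(t.splice_distance) <= 2:
        # Assumption: near-splice variants rank with the most severe impact.
        severity = max(SEVERITY.values())
    return (severity, t.canonical, t.length)


def select_transcript(transcripts: List[Transcript]) -> Transcript:
    """Pick the highest-priority transcript."""
    return max(transcripts, key=priority_key)
```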

The above criteria can be overridden by users as follows:

Selecting a different transcript in the VarSome UI.
Configuring transcripts to be used for specific genes in VarSome Clinical.

The Ensembl Transcript Support Level (TSL) is a method to highlight the well-supported and poorly-supported transcript models for users, based on the type and quality of the alignments used to annotate the transcript. We disqualify Ensembl transcripts that have a TSL with a value different from 1.

Note: some variants can be in multiple transcripts associated with multiple genes, although it is rare for a variant to be coding in multiple genes. The rules above will first determine the transcript to use, from which the gene is then derived.

Allele Frequency

VarSome currently uses gnomAD exomes & genomes to evaluate allele counts and frequencies. It uses both the frequency data and the coverage data reported for these two databases.

Frequencies will not be considered valid if:

Coverage is less than 20,
the Allele Number is less than 1000,
the GnomAD quality filter is suspect (ie: not PASS).
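
The three validity checks above can be combined into a single predicate. This is a minimal sketch; the function and parameter names are hypothetical, and the quality filter is assumed to be delivered as a string where only "PASS" is acceptable.

```python
MIN_COVERAGE = 20
MIN_ALLELE_NUMBER = 1000


def frequency_is_valid(coverage: float, allele_number: int,
                       quality_filter: str) -> bool:
    """Return True if a gnomAD frequency passes all three checks."""
    return (coverage >= MIN_COVERAGE
            and allele_number >= MIN_ALLELE_NUMBER
            and quality_filter == "PASS")
```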

Rules BA1 and BS1 will iterate through the various ethnicities to see whether the variant is common in a sub-population.

Further databases such as BRAVO and TwinsUK will be incorporated in future.

Rule Strengths

Each rule has a default strength recommended by ACMG; however, the guidelines also allow the clinician to change the strength of a rule based on the evidence they have at their disposal. We use this option in VarSome to boost or reduce the strength of rules based on the data from the annotation. The user is of course completely free to modify this manually in the VarSome UI if they disagree. Our own regression testing shows these 'variable strengths' are very useful in improving the overall accuracy of the automated classifier.

More detail is provided in the documentation for the individual rules, but here are some key examples:

PVS1: the strength is reduced to 'Strong' for variants in the 3' UTR or close to the end of the protein.
Clinical evidence rules PP5 and BP6: here we may significantly boost the default strength from 'Supporting' to 'Very Strong' if the evidence justifies it. We do this to ensure that 'Expert Panel' or 'Practice Guideline' variants from ClinVar are correctly highlighted and classified, and similarly to highlight publications linked by VarSome users.
PS1 and PM5: we reduce the strength to 'Moderate' or 'Supporting' respectively if the alternative amino-acid variant reported at the same position has not been independently confirmed as pathogenic using the ACMG rules (with clinical evidence disabled).
PM1: we boost the strength to 'Strong' if the variant is located in a particularly dense mutation hot-spot.
PP3: we exceptionally boost the strength to 'Strong' if the variant is predicted splicing and rule PVS1 was not triggered.

Modifying the rule strengths is a conscious decision we have made in order to ensure we provide the most accurate automated classification possible, but the user can very easily override the strengths provided, or even disable a rule completely using the VarSome UI.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome front-end.

ACMG Verdict

Rules are combined exactly as described in the ACMG Guidelines paper, with the following changes that have been implemented following advice from our clinical advisors:

In the absence of any pathogenic evidence, a single strong benign rule (for example BS1, BS2) is sufficient to trigger a 'Likely Benign' verdict. This is justified by PMID:29300386 and we have been advised that it will be included in the ACMG guidelines in future.
Enabling the use of 'Moderate' strength evidence in benign rules, triggering a 'Likely Benign' verdict: this has been implemented in a manner that is fully consistent with the original guidelines.
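
To illustrate the first change, here is a sketch of the benign side of the verdict only. It combines the published ACMG table (Benign: one stand-alone rule or at least two strong rules; Likely Benign: one strong plus one supporting, or at least two supporting) with the single-strong-rule relaxation described above. The 'Moderate' benign extension is not modelled because the text does not specify how it combines, and any pathogenic evidence is treated as out of scope. The function name is hypothetical.

```python
from typing import Optional


def benign_verdict(stand_alone: int, strong: int, supporting: int,
                   pathogenic_rules: int) -> Optional[str]:
    """Return 'Benign', 'Likely Benign', or None for the benign rules.

    Arguments are counts of triggered rules at each strength.
    Conflicting pathogenic evidence is not handled here (returns None).
    """
    if pathogenic_rules > 0:
        return None  # out of scope for this sketch
    if stand_alone >= 1 or strong >= 2:
        return "Benign"
    if strong == 1:
        # Relaxation described above: one strong benign rule suffices
        # when there is no pathogenic evidence.
        return "Likely Benign"
    if supporting >= 2:
        return "Likely Benign"
    return None
```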

Calibration

Many of the rules implemented here rely on thresholds. PM1 is a good example, where defining a “hot-spot” is clearly a fuzzy measure. In practice we carefully adjust these thresholds through statistical regression against a large population of reliably curated variants. When calibrating, we disable the clinical evidence rules (PP5 and BP6) in order to ensure that the classifier works well in the absence of variant-specific evidence, and thus can be extrapolated reliably beyond the test population. The calibrations are 'fair' in that they do not over-emphasise pathogenic vs benign or uncertain variants: we simply seek to maximise overall accuracy.

Saphetor reserves the right to adjust the implementation of the rules and the calibrated thresholds at any time. In practice this has allowed us to deliver continual improvements in the overall quality of our automated classification - but it also entails that results may change when re-annotating a variant several months later: methodologies, thresholds, and especially the clinical data used to calibrate them, may all have changed.

Although we use machine-learning techniques to adjust the thresholds used, we do not use neural-networks in the actual classification itself. We believe it is important to have fully transparent, justifiable and explainable rules, as opposed to inscrutable black-boxes. The 'AI' aspect is also well captured in the computational evidence, DANN being a prime example of how powerful such an approach can be.

Implemented Rules

PVS1

Null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multiexon deletion) in a gene where LOF is a known mechanism of disease. (Pathogenic, Very Strong)

The rule first establishes whether this is a null variant by checking its coding impact on the transcript:

nonsense variant
frameshift variant
exon deletion variant
intronic variant within ±2 bases of the transcript splice site
start loss variant.

We determine that LOF is a “Known Mechanism of Disease” from either:

The gene statistics: if at least 4 LOF variants in this gene have been reported as pathogenic.
GnomAD gene constraints LOF Observed/Expected is less than 0.7555.

We reduce the strength to Strong if the variant is located:

in the 3' UTR
in the last exon, and would remove less than 1.93% of the protein length.
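
Putting the three steps above together, the PVS1 decision might look like the sketch below. The impact labels, function name and parameter names are hypothetical; only the numeric thresholds come from the text.

```python
from typing import Optional

# Coding impacts treated as null variants (labels are hypothetical).
NULL_IMPACTS = {"nonsense", "frameshift", "exon_deletion",
                "splice_site", "start_loss"}

MIN_PATHOGENIC_LOF = 4         # reported pathogenic LOF variants in the gene
MAX_LOF_OBS_EXP = 0.7555       # gnomAD LOF Observed/Expected
MIN_FRACTION_REMOVED = 0.0193  # 1.93% of the protein length


def pvs1_strength(impact: str,
                  pathogenic_lof_count: int,
                  lof_obs_exp: Optional[float],
                  in_3prime_utr: bool = False,
                  in_last_exon: bool = False,
                  fraction_removed: float = 1.0) -> Optional[str]:
    """Return the PVS1 strength, or None if the rule does not trigger."""
    if impact not in NULL_IMPACTS:
        return None
    lof_is_known_mechanism = (
        pathogenic_lof_count >= MIN_PATHOGENIC_LOF
        or (lof_obs_exp is not None and lof_obs_exp < MAX_LOF_OBS_EXP)
    )
    if not lof_is_known_mechanism:
        return None
    if in_3prime_utr or (in_last_exon
                         and fraction_removed < MIN_FRACTION_REMOVED):
        return "Strong"
    return "Very Strong"
```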

Purely for information, a list of possible associated diseases is sourced from CGD and reported in the rule explanation.

Note: rule PVS1 disables rule PM4 in order to avoid double-counting the same evidence.

PS1

Same amino acid change as a previously established pathogenic variant regardless of nucleotide change. (Pathogenic, Strong)

This rule only applies to missense variants. It considers all possible equivalent amino acid missense variants (ie: resulting in the same amino-acid). The rule will trigger if any pathogenic variants are identified in the clinical variants database. We then check whether they are independently confirmed pathogenic using the ACMG rules, and if not, reduce the rule strength accordingly.

PS3

Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product. (Pathogenic, Strong)

BS3

Well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing. (Benign, Strong)

These two rules leverage the clinical variants database, looking for papers that refer to in-vitro or functional studies. VarSome user contributions are particularly helpful as users are asked to manually confirm the studies referred to in the paper. For papers linked by ClinVar, UniProt & MitoMap, we automatically scan the title & abstract and look for potential studies.

Ultimately the papers highlighted by this rule must be reviewed by an experienced clinician.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome front-end.

PM1

Located in a mutational hot spot and/or critical and well-established functional domain (e.g., active site of an enzyme) without benign variation. (Pathogenic, Moderate)

This rule leverages the clinical variants database to evaluate how many missense/in-frame pathogenic variants are found in the region of the variant being classified:

Hot-Spot: using a region of 25 base-pairs on either side of the variant, the rule checks that there are at least 5 pathogenic variants (only using missense and inframe-indel variants), then weighs them by distance to compute a “proximity score”. The rule triggers with strength supporting, moderate or strong depending on the proximity and density of pathogenic and benign variants located within the hot-spot.
Protein Domains: if the variant is within a functional domain reported by UniProt, the rule tallies all the clinically reported missense/in-frame variants within the domain. It checks that the domain contains at least 2 pathogenic variants, and then triggers with strength supporting or strong based on the number of pathogenic, uncertain & benign variants reported within the domain.

The thresholds used by rule PM1 have been established through a careful calibration process and may change over time as further clinical evidence becomes available, or we refine the methodology.

Note: benign variants with a frequency greater than 0.015 are excluded when counting the clinical variants database within a given domain or hot-spot.

BP3

In-frame deletions/insertions in a repetitive region without a known function. (Benign, Supporting)

This rule is the benign counterpart to rule PM1:

it uses UniProt to ensure the variant isn't in a known functional domain
it checks whether the variant is in a repeat region,
it further checks whether the variant is in a region of low-conservation (PhyloP100Way less than 1.4),
lastly it checks whether there are any known pathogenic variants in the region considered.

Rule BP3 will be disabled if rule PM1 triggered.

PM2

Absent from controls (or at extremely low frequency if recessive) in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium. (Pathogenic, Moderate)

We first establish the gene's mode of inheritance.

The rule will trigger if the allele frequency is not found in GnomAD, with valid coverage, or:

For dominant genes (including X-Linked and AD/AR) we check that the allele count is less than 5.
For recessive genes (AR): the rule will trigger if the homozygous allele count is less than 3. Alternatively we use the ACMG standard rule which fires if the allele frequency is less than 0.0001 (see ACMG Guidelines), however this is a more conservative threshold and our tests show it results in too many false negatives.

In addition, the strength of rule PM2 is adjusted using the conservation score from PhyloP100Way:

'supporting' if the variant does not alter the protein length and the position is not conserved (PhyloP < 1.4),
'strong' if the position is strongly conserved (PhyloP > 7.2).
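
The PM2 logic above can be sketched as below. This is an illustration under assumptions: the caller is assumed to have already checked coverage validity, the alternative allele-frequency test (0.0001) for recessive genes is not modelled, and all names are hypothetical.

```python
from typing import Optional

MAX_DOMINANT_ALLELE_COUNT = 5
MAX_RECESSIVE_HOM_COUNT = 3
PHYLOP_NOT_CONSERVED = 1.4
PHYLOP_HIGHLY_CONSERVED = 7.2


def pm2_strength(inheritance: str, found_in_gnomad: bool,
                 allele_count: int, hom_count: int,
                 phylop: float, alters_protein_length: bool) -> Optional[str]:
    """Return the PM2 strength, or None if the rule does not trigger.

    inheritance: 'AR' for recessive genes; anything else (dominant,
    X-Linked, AD/AR) uses the dominant threshold, as in the text.
    """
    if found_in_gnomad:
        if inheritance == "AR":
            triggered = hom_count < MAX_RECESSIVE_HOM_COUNT
        else:
            triggered = allele_count < MAX_DOMINANT_ALLELE_COUNT
        if not triggered:
            return None
    # Adjust the default 'Moderate' strength using conservation.
    if phylop > PHYLOP_HIGHLY_CONSERVED:
        return "Strong"
    if phylop < PHYLOP_NOT_CONSERVED and not alters_protein_length:
        return "Supporting"
    return "Moderate"
```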

PM4

Protein length changes as a result of in-frame deletions/insertions in a non-repeat region or stop-loss variants. (Pathogenic, Moderate)

This rule applies to in-frame indels or stop-loss variants that cause the length of the protein to change, but it has also been extended to cover non-coding variants that are close to a canonical splice-site. The rule will not fire if the variant is in a repeat region, as reported by UniProt or as detected by checking for short repetitive regions in the DNA itself.

In order to avoid double-counting the same evidence, rule PM4 will not be applied if rule PVS1 was triggered.

PM5

Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before. (Pathogenic, Moderate)

This rule is a weaker version of PS1: it similarly only applies to missense variants, but considers all possible amino acid missense variants in the same codon. The rule will trigger if any pathogenic variants are identified in the clinical variants database. We then check whether they are independently confirmed pathogenic using the ACMG rules, and if not, reduce the rule strength accordingly.

PP2

Missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease. (Pathogenic, Supporting)

BP1

Missense variant in a gene for which primarily truncating variants are known to cause disease. (Benign, Supporting)

These two “variant spectrum” rules are very similar: they only apply to missense variants and leverage the gene statistics for the relevant gene:

PP2 checks that the ratio of pathogenic missense variants over all non-VUS missense variants is greater than 0.4017, with a secondary requirement that the ratio of pathogenic variants over all clinically reported variants is greater than 0.3519,
BP1 conversely checks that the ratio of benign missense variants over all non-VUS missense variants is greater than 0.825, with a secondary requirement that the ratio of benign variants over all clinically reported variants is greater than 0.7431.
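
Both checks above share the same shape: a primary ratio among classified missense variants and a secondary ratio among all clinically reported variants. The sketch below assumes that "non-VUS missense variants" means pathogenic plus benign missense variants; the function and parameter names are hypothetical.

```python
def ratio_rule(primary_count: int, non_vus_missense: int,
               secondary_count: int, all_reported: int,
               primary_cutoff: float, secondary_cutoff: float) -> bool:
    """Return True if both ratios exceed their cut-offs."""
    if non_vus_missense == 0 or all_reported == 0:
        return False  # no statistics available for this gene
    return (primary_count / non_vus_missense > primary_cutoff
            and secondary_count / all_reported > secondary_cutoff)


def pp2(path_missense: int, benign_missense: int,
        path_all: int, all_reported: int) -> bool:
    """PP2: pathogenic missense variants dominate in this gene."""
    return ratio_rule(path_missense, path_missense + benign_missense,
                      path_all, all_reported, 0.4017, 0.3519)


def bp1(path_missense: int, benign_missense: int,
        benign_all: int, all_reported: int) -> bool:
    """BP1: benign missense variants dominate in this gene."""
    return ratio_rule(benign_missense, path_missense + benign_missense,
                      benign_all, all_reported, 0.825, 0.7431)
```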

The calibration section explains how these thresholds are established.

PP3

Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.) (Pathogenic, Supporting)

BP4

Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc.) (Benign, Supporting)

The two rules PP3 and BP4 use a very similar implementation. The in-silico prediction data-sets used are static: many are sourced from dbNSFP, which covers all non-synonymous coding single-nucleotide variants, but some (DANN & CADD for example) are available for a much wider range of non-coding SNVs, and are sourced directly from the provider.

Based on combined accuracy tests, we have selected the following sub-set of in-silico prediction tools:

BayesDel with Max Allele frequency
DANN (VarSome Free only, Premium & Clinical customers benefit from CADD)
DEOGEN2
FATHMM-MKL
LIST-S2
M-CAP
Mutation Assessor
Mutation Taster
Primate AI
MVP
EIGEN
SIFT
scSNV Splice-site prediction
phyloP: a simple conservation test is used if no other data is available.

In addition, users of VarSome Clinical and VarSome Premium will also benefit from high quality predictions from the following licensed in-silico predictors:

CADD (instead of DANN)
Cosmic.FATHMM
Polyphen 2

As more tools become available these lists will change. Some tools have far greater coverage than others, for example DANN & CADD are available for all SNVs, where most other tools are only available for non-synonymous coding SNVs. Similarly, the phyloP score is available for nearly all positions and is used to establish whether the position is conserved.

Wherever possible we use the default pathogenic/benign predictions from each tool, however for some tools (DANN, SIFT, CADD and phyloP) we use internally calibrated thresholds.

The algorithm considers the ratio of pathogenic predictions to the total number of in-silico predictions available for this variant, and triggers PP3 if this ratio is over 0.6. Alternatively, it considers the ratio of benign predictions to the total available, and triggers BP4 if this is above 0.4615. This is demonstrably more accurate than the unanimous verdict strictly required by the ACMG Guidelines.
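
The ratio test can be sketched as follows. This covers only the vote-counting step, not the splicing or conservation overrides described in the next paragraphs. The function name and the string labels for individual predictions are hypothetical.

```python
from typing import List, Optional

PATHOGENIC_RATIO = 0.6
BENIGN_RATIO = 0.4615


def in_silico_rule(predictions: List[str]) -> Optional[str]:
    """Return 'PP3', 'BP4', or None from a list of tool predictions.

    Each prediction is 'pathogenic', 'benign', or any other label
    (for example 'uncertain'), which counts towards the total only.
    """
    total = len(predictions)
    if total == 0:
        return None
    if predictions.count("pathogenic") / total > PATHOGENIC_RATIO:
        return "PP3"
    if predictions.count("benign") / total > BENIGN_RATIO:
        return "BP4"
    return None
```

Since the two cut-offs sum to more than 1, both conditions cannot hold for the same variant, so the order of the two checks does not affect the result.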

Rule PP3 is disabled if rule PVS1 was triggered in order to avoid double-counting the same evidence.

Rule BP4 may trigger in conjunction with rule BP7 which allows many non-truncating synonymous variants to be classified Likely Benign. Rule BP4 explicitly checks for conservation itself rather than relying solely on the in-silico tools themselves.

Splice Prediction and PVS1

If rule PVS1 did not fire, rule PP3 will be triggered with strength strong if the variant is predicted splicing (see splice-site prediction; this could be a cryptic splice-site, for example).
This will override any benign predictions from other sources (we have observed that most in-silico tools are poor at identifying potential splice-sites).
Rule BP4 is disabled if the variant is both PVS1 & predicted-splicing, this refinement allows us to correctly classify a small number of edge-case splicing variants.

Conservation

Statistically, if a variant is not found in any static in-silico database, it is most likely to be pathogenic. To refine this rather aggressive prediction, we use phyloP as a simple fall-back in the absence of any other prediction, returning a pathogenic prediction if phyloP is greater than 3.81, or benign if the variant is non-truncating and phyloP is less than 1.4 (we use a different threshold of 7.2 for strongly conserved - see conservation).

PP5

Reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation. (Pathogenic, Supporting)

BP6

Reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation. (Benign, Supporting)

Similarly to rules PS3 and BS3, these two rules leverage the clinical variants database to report whether the variant has been clinically reported (see clinical evidence, but without any reference to in-vitro or functional studies).

The default strength for these rules is Supporting, per the ACMG Guidelines; however, our implementation will use stronger rule strengths if borne out by the available evidence (see rule strengths). Whilst this may be considered not strictly in line with the guidelines, it does allow us to ensure that critical clinical evidence is not missed. Users remain free to manually change the strength used when reviewing the classification.

In practice, we may boost the strength of the rule all the way up to 'Very Strong' if the evidence justifies it:

ClinVar

Very Strong if 'practice guideline' (4 stars) or 'reviewed by expert panel' (3 stars),
Moderate if there are consistent submissions from multiple sources (2 stars).

VarSome user-linked publications: we consider the number of entries & linked publications.
Multiple sources consistently confirming the classification: here we assume these sources are truly independent.
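
The ClinVar part of this strength boost maps review stars to a rule strength. The sketch below assumes that entries with fewer than 2 stars keep the default 'Supporting' strength, which the text implies but does not state explicitly; the contributions from VarSome user publications and multiple independent sources are not modelled. The function name is hypothetical.

```python
def clinvar_strength(stars: int) -> str:
    """Map a ClinVar review-star count to a PP5/BP6 strength."""
    if stars >= 3:
        # 4 stars: practice guideline; 3 stars: reviewed by expert panel.
        return "Very Strong"
    if stars == 2:
        # Consistent submissions from multiple sources.
        return "Moderate"
    return "Supporting"  # assumed default for 0 or 1 stars
```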

Note: rules PS3 or BS3 may trigger too, but the same evidence will not be counted twice.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome front-end.

BA1

Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium. (Benign, Stand Alone)

Rule BA1 is applied if the allele frequency is greater than the threshold 0.05. This is in strict concordance with the ACMG Guidelines and determines a variant to be stand-alone benign for Mendelian disease.

The BA1 Exceptions have also been implemented, as recommended by ClinGen.

Note that rules BS1 and BS2 may trigger at much lower frequency thresholds.

BS1

Allele frequency is greater than expected for disorder. (Benign, Strong)

Here we find the highest GnomAD allele frequency for the variant across the main population ethnicities and compare this to the benign cut-off frequency derived from the gene statistics. If there are too few known variants (fewer than 5), we use a much higher default threshold, 0.015, for rare diseases.
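
A minimal sketch of this comparison is shown below. The names are hypothetical, the per-population frequencies are assumed to have passed the validity checks already, and the interaction with BA1 and PM2 is left to the caller.

```python
from typing import Dict, Optional

DEFAULT_CUTOFF = 0.015   # used when the gene has too few known variants
MIN_KNOWN_VARIANTS = 5


def bs1(population_frequencies: Dict[str, float],
        gene_cutoff: Optional[float], known_variants: int) -> bool:
    """Return True if the highest population frequency exceeds the cut-off."""
    if not population_frequencies:
        return False
    highest = max(population_frequencies.values())
    if gene_cutoff is not None and known_variants >= MIN_KNOWN_VARIANTS:
        cutoff = gene_cutoff
    else:
        cutoff = DEFAULT_CUTOFF
    return highest > cutoff
```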

In order to avoid double-counting, rule BS1 is not evaluated if either rules BA1 or PM2 were triggered first.

Rule BS1 has been extended to trigger for non-coding variants that are far from the canonical splice-site, not predicted splicing, and the position is not conserved.

BS2

Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age. (Benign, Strong)

We first determine the mode of inheritance of the gene, then compare the allele count (see allele frequency for quality checks) to the corresponding threshold:

recessive or X-linked genes: allele count greater than 3,
dominant genes: allele count greater than 5.
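
These two thresholds translate directly into a small lookup. The function name and inheritance labels are hypothetical, and the exclusions relating to BA1 and PM2 are assumed to be applied by the caller.

```python
BS2_THRESHOLDS = {
    "recessive": 3,
    "x-linked": 3,
    "dominant": 5,
}


def bs2(inheritance: str, allele_count: int) -> bool:
    """Return True if the allele count exceeds the threshold for this mode.

    Unknown inheritance modes never trigger the rule (an assumption).
    """
    threshold = BS2_THRESHOLDS.get(inheritance)
    if threshold is None:
        return False
    return allele_count > threshold
```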

Rule BS2 is not evaluated if rule BA1 was triggered, to avoid double-counting the same evidence, and for performance we disable BS2 if rule PM2 triggered.

BP7

A synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence nor the creation of a new splice site AND the nucleotide is not highly conserved. (Benign, Supporting)

This rule applies to synonymous variants that are not deemed highly conserved using phyloP (see conservation).

Splicing is checked as follows:

the variant is found more than 2 bases away from the next splice site,
it isn't predicted splicing using splice-site prediction.

Rule BP7 will be disabled if there is strong clinical evidence to the contrary (ie: possibly a cryptic splice-site).

PS2

De novo (both maternity and paternity confirmed) in a patient with the disease and no family history. (Pathogenic, Strong)

PM6

Assumed de novo, but without confirmation of paternity and maternity. (Pathogenic, Moderate)

PP1

Cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease. (Pathogenic, Supporting)

BS4

Lack of segregation in affected members of a family. (Benign, Strong)

Unimplemented Rules

The following rules are not implemented or not currently available to VarSome users - in most cases this is because the necessary data required to evaluate the rules is not in the public-domain, or the rules require patient-specific information, sometimes on a per-variant basis. Should they have more evidence, users can manually toggle rules on or off in VarSome, or adjust the strength used, and the resulting classification will be re-evaluated immediately.

PS4

The prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls. (Pathogenic, Strong)

This rule has not been implemented.

PM3

For recessive disorders, detected in trans with a pathogenic variant (Pathogenic, Moderate)

This rule has not been implemented.

PP4

Patient’s phenotype or family history is highly specific for a disease with a single genetic etiology. (Pathogenic, Supporting)

This rule has not been implemented.

BP2

Observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder or observed in cis with a pathogenic variant in any inheritance pattern. (Benign, Supporting)

This rule has not been implemented.