VarSome ACMG Implementation

(c) Copyright Saphetor SA. All rights reserved.
Version: 11.2.16, dated: Tue May 31 07:34:12 CEST 2022

Introduction

The “Standards and guidelines for the interpretation of sequence variants” were published in 2015 by Sue Richards et al. in their seminal paper (ACMG Guidelines), from which our implementation is derived. The standards were very much written for interpretation by humans, not machines: they assume the clinician has a deep knowledge of the domain and of the relevant papers and conditions. Automating these standards is a matter of interpretation. We have opted to statistically quantify terms such as “hot-spot” or “well known”, resulting in many thresholds that are tuned via our calibration process.

Our guiding principle throughout has been to implement the best algorithms we could, following the advice from our clinical advisors, feedback from the VarSome user community, and using statistically justified thresholds. All the rules provide clear natural language explanations of why they were triggered and which evidence was used, or indeed, a full explanation of why the criteria were not met (this is currently only visible in VarSome). We also strive to continuously improve our implementation, adjusting rules or thresholds, incorporating new data sources, and adding refinements as new publications and methodology changes are suggested.

Databases

The VarSome automated classification processes rely on vast quantities of accurate curated data from the following databases (in no particular order).

Important: depending on licensing agreements and in some cases the fees charged by source organisations, not all databases are visible to all users, and this may directly impact the completeness or quality of automated classifications.

ACMG classifier
Other Databases

VarSome also annotates variants using the following databases, although these are not currently leveraged by the automated classifications:
(Version information is subject to change at any time; some databases may require a license and may not be displayed.)

dbNSFP Sources (non-synonymous coding SNVs)

Additional sources annotated using the dbNSFP database:

Functional predictions:
Conservation scores:
Gene annotation sources:
Clinical Evidence

Clinical evidence is the foundation stone of our ACMG evaluation. We currently source this from:
The VarSome options allow the user to specify a minimum number of stars to filter ClinVar, so that entries with fewer stars are ignored, or similarly to disable clinical classifications from UniProt.

Clinically Reported Variants

On a daily basis, we re-annotate all the variants from the sources listed above. This data is then used for all the rules that require clinical evidence, or statistics derived from it. The current database version is 30-May-2022 (1.56M records). For each variant we record its original “source” classification, allele frequency and coding impact. We also re-classify the variants using our implementation of the ACMG rules, with the clinical evidence rules (PS3, BS3, PP5 & BP6) disabled - this is useful in establishing how reliable the evidence might be. The strengths of rules such as PS1 and PM5 will be downgraded if a variant has been reported pathogenic but this is not confirmed through the independent ACMG re-classification.

This database is displayed in VarSome as a “lollipop graph” in the genome browser. The graph can be filtered by coding impact, or various types of null variants.

Gene Statistics

This database is derived from the clinical variants database and is also updated daily: it keeps track of how many variants are benign/pathogenic for each gene, along with their coding impacts and exon location - these are used in rules PP2 and BP1 for example. The gene statistics are displayed in the VarSome “gene” page. We derive a “benign cut-off frequency” from these variant classifications & their allele frequencies for use in rule BS1.

Mode Of Inheritance

A number of rules (PM2, BS2, BP1) depend on the mode of inheritance for a given gene. The following sources are used:
We compare Domino's 'Probability of Autosomal Dominant' to the following thresholds:
Note that we do not currently use the input phenotype for genes that are AD or AR for different diseases; we plan to enhance this in future.

Splice-Site Prediction

We use the scSNV database for splice-site prediction. This is only available for single-nucleotide variants. We use both the 'ADA Boost Splicing' threshold (0.708) and 'Random Forest Splicing' threshold (0.515) to identify potentially splicing variants for rules BP7 and PP3.

Conservation

We use PhyloP100Way for conservation tests. It is available for nearly all positions in the genome, and proves to be a very useful indication of whether a variant may be benign or pathogenic. Using conservation contributes greatly to reducing over-calling of pathogenic variants: for example, the strength of rule PM2 is reduced to Supporting if the conservation score is very low.

Conservation is used to:

Multiple thresholds have been carefully calibrated to maximise accuracy whilst not over-calling:
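As an illustration of the splice-site test described in the Splice-Site Prediction section above, the following is a minimal sketch, not the VarSome implementation. The function name is hypothetical, and two points are assumptions because the text does not state them: that a score strictly above its threshold counts as a hit, and that exceeding either threshold is enough to flag the variant.

```python
from typing import Optional

# Thresholds quoted in the Splice-Site Prediction section.
ADA_THRESHOLD = 0.708  # 'ADA Boost Splicing'
RF_THRESHOLD = 0.515   # 'Random Forest Splicing'


def is_predicted_splicing(ada_score: Optional[float],
                          rf_score: Optional[float]) -> bool:
    """Return True if either scSNV score exceeds its threshold.

    Assumption: 'either score' is sufficient; the source only says that
    both thresholds are used. A missing score (None) never counts as a
    hit, which mirrors the fact that scSNV only covers SNVs.
    """
    ada_hit = ada_score is not None and ada_score > ADA_THRESHOLD
    rf_hit = rf_score is not None and rf_score > RF_THRESHOLD
    return ada_hit or rf_hit
```

If both scores were required to agree, the final line would use `and` instead of `or`; the source does not say which combination is applied.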
Transcript Selection

All the ACMG rules are evaluated against a single transcript. Selecting this transcript is clearly of critical importance and can modify the outcome of the classification. Transcripts are prioritized according to the following criteria:
The above criteria can be overridden by users as follows:
The Ensembl Transcript Support Level (TSL) is a method to highlight the well-supported and poorly-supported transcript models for users, based on the type and quality of the alignments used to annotate the transcript. We disqualify Ensembl transcripts that have a TSL with a value different from 1.

Note: some variants can be in multiple transcripts associated with multiple genes, although it is rare for a variant to be coding in multiple genes. The rules above will first determine the transcript to use, from which the gene is then derived.

Allele Frequency

VarSome currently uses GnomAD exomes & genomes to evaluate allele counts and frequencies, using both the frequency data and the coverage data reported for these two databases. Frequencies will not be considered valid if:
Rules BA1 and BS1 will iterate through the various ethnicities to see whether the variant is common in a sub-population. Further databases such as BRAVO and TwinsUK will be incorporated in future.

Rule Strengths

Each rule has a default strength recommended by ACMG; however, the guidelines also allow the clinician to change the strength of a rule based on the evidence they have at their disposal. We use this option in VarSome to boost or reduce the strength of rules based on the data from the annotation. The user is of course completely free to modify this manually using the VarSome UI if they disagree. Our own regression testing shows these 'variable strengths' are very useful in improving the overall accuracy of the automated classifier. More detail is provided in the documentation for the individual rules, but here are some key examples:
Modifying the rule strengths is a conscious decision we have made in order to ensure we provide the most accurate automated classification possible, but the user can very easily override the strengths provided, or even disable a rule completely, using the VarSome UI.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome front-end.

ACMG Verdict

Rules are combined exactly as described in the ACMG Guidelines paper, with the following changes that have been implemented following advice from our clinical advisors:
Calibration

Many of the rules implemented here rely on thresholds: PM1 is a good example, where defining a “hot-spot” is clearly a fuzzy measure. In practice we carefully adjust these thresholds through statistical regression against a large population of reliably curated variants. When calibrating, we disable the clinical evidence rules (PP5 and BP6) in order to ensure that the classifier works well in the absence of variant-specific evidence, and thus can be extrapolated reliably beyond the test population. The calibrations are 'fair' in that they do not over-emphasise pathogenic vs benign or uncertain variants: we simply seek to maximise overall accuracy.

Saphetor reserves the right to adjust the implementation of the rules and the calibrated thresholds at any time. In practice this has allowed us to deliver continual improvements in the overall quality of our automated classification - but it also means that results may change when re-annotating a variant several months later: methodologies, thresholds, and especially the clinical data used to calibrate them, may all have changed.

Although we use machine-learning techniques to adjust the thresholds used, we do not use neural networks in the actual classification itself. We believe it is important to have fully transparent, justifiable and explainable rules, as opposed to inscrutable black-boxes. The 'AI' aspect is also well captured in the computational evidence, DANN being a prime example of how powerful such an approach can be.

Implemented Rules

PVS1

Null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multiexon deletion) in a gene where LOF is a known mechanism of disease. (Pathogenic, Very Strong)

The rule first establishes whether this is a null variant by checking its coding impact on the transcript:
We determine that LOF is a “Known Mechanism of Disease” from either:
We reduce the strength to Strong if the variant is located:
Purely for information, a list of possible associated diseases is sourced from CGD and reported in the rule explanation.

Note: rule PVS1 disables rule PM4 in order to avoid double-counting the same evidence.

PS1

Same amino acid change as a previously established pathogenic variant regardless of nucleotide change. (Pathogenic, Strong)

This rule only applies to missense variants. It considers all possible equivalent amino acid missense variants (i.e. resulting in the same amino acid). The rule will trigger if any pathogenic variants are identified in the clinical variants database. We then check whether they are independently confirmed pathogenic using the ACMG rules, and if not, reduce the rule strength accordingly.

PS3

Well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product. (Pathogenic, Strong)

BS3

Well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing. (Benign, Strong)

These two rules leverage the clinical variants database, looking for papers that refer to in-vitro or functional studies. VarSome user contributions are particularly helpful as users are asked to manually confirm the studies referred to in the paper. For papers linked by ClinVar, UniProt & MitoMap, we automatically scan the title & abstract and look for potential studies. Ultimately the papers highlighted by this rule must be reviewed by an experienced clinician.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome front-end.

PM1

Located in a mutational hot spot and/or critical and well-established functional domain (e.g., active site of an enzyme) without benign variation. (Pathogenic, Moderate)

This rule leverages the clinical variants database to evaluate how many missense/in-frame pathogenic variants are found in the region of the variant being classified:
The thresholds used by rule PM1 have been established through a careful calibration process and may change over time as further clinical evidence becomes available, or as we refine the methodology.

Note: benign variants with a frequency greater than 0.015 are excluded when counting variants from the clinical variants database within a given domain or hot-spot.

BP3

In-frame deletions/insertions in a repetitive region without a known function. (Benign, Supporting)

This rule is the benign counterpart to rule PM1:
Rule BP3 will be disabled if rule PM1 triggered.

PM2

Absent from controls (or at extremely low frequency if recessive) in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium. (Pathogenic, Moderate)

We first establish the gene's mode of inheritance. The rule will trigger if the allele frequency is not found in GnomAD (with valid coverage), or:
In addition, the strength of rule PM2 is adjusted using the conservation score from PhyloP100Way:
PM4

Protein length changes as a result of in-frame deletions/insertions in a non-repeat region or stop-loss variants. (Pathogenic, Moderate)

This rule applies to in-frame indels or stop-loss variants that cause the length of the protein to change, but it has also been extended to cover non-coding variants that are close to a canonical splice-site. The rule will not fire if the variant is in a repeat region, as reported by UniProt or as detected by checking for short repetitive regions in the DNA itself. In order to avoid double-counting the same evidence, rule PM4 will not be applied if rule PVS1 was triggered.

PM5

Novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before. (Pathogenic, Moderate)

This rule is a weaker version of PS1: it similarly only applies to missense variants, but considers all possible amino acid missense variants in the same codon. The rule will trigger if any pathogenic variants are identified in the clinical variants database. We then check whether they are independently confirmed pathogenic using the ACMG rules, and if not, reduce the rule strength accordingly.

PP2

Missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease. (Pathogenic, Supporting)

BP1

Missense variant in a gene for which primarily truncating variants are known to cause disease. (Benign, Supporting)

These two “variant spectrum” rules are very similar: they only apply to missense variants and leverage the gene statistics for the relevant gene:
The calibration section explains how these thresholds are established.

PP3

Multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact, etc.) (Pathogenic, Supporting)

BP4

Multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact, etc.) (Benign, Supporting)

The two rules PP3 and BP4 use a very similar implementation. The in-silico prediction data-sets used are static. Many are sourced from dbNSFP, which covers all non-synonymous coding single-nucleotide variants, but some (DANN & CADD for example) are available for a much wider range of SNVs, including non-coding ones, and are sourced directly from the provider. Based on combined accuracy tests, we have selected the following sub-set of in-silico prediction tools:
In addition, users of VarSome Clinical and VarSome Premium will also benefit from high quality predictions from the following licensed in-silico predictors:
As more tools become available these lists will change. Some tools have far greater coverage than others: for example, DANN & CADD are available for all SNVs, whereas most other tools are only available for non-synonymous coding SNVs. Similarly, the phyloP score is available for nearly all positions and is used to establish whether the position is conserved.

Wherever possible we use the default pathogenic/benign predictions from each tool; however, for some tools (DANN, SIFT, CADD and phyloP) we use internally calibrated thresholds.

The algorithm considers the ratio of pathogenic predictions to the total number of in-silico predictions available for this variant, resulting in a pathogenic outcome (PP3) if over 0.6. Alternatively it considers the ratio of benign predictions to the total available, and will trigger BP4 if above 0.4615. This is demonstrably more accurate than the unanimous verdict strictly required by the ACMG Guidelines.

Rule PP3 is disabled if rule PVS1 was triggered, in order to avoid double-counting the same evidence. Rule BP4 may trigger in conjunction with rule BP7, which allows many non-truncating synonymous variants to be classified Likely Benign. Rule BP4 explicitly checks for conservation itself rather than relying solely on the in-silico tools.

Splice Prediction and PVS1
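Returning to the PP3/BP4 ratio test described earlier in this section, the following is a minimal sketch of that vote, not the VarSome implementation. The function name and the string labels for individual predictions are hypothetical. Only the two ratio thresholds come from the text, and the sketch leaves out the PVS1 exclusion, the explicit conservation check and any per-tool calibration.

```python
from typing import Optional, Sequence

# Ratio thresholds quoted in the text.
PATHOGENIC_RATIO = 0.6
BENIGN_RATIO = 0.4615


def computational_verdict(predictions: Sequence[str]) -> Optional[str]:
    """Return 'PP3', 'BP4' or None from a list of per-tool predictions.

    Each entry is assumed to be 'pathogenic', 'benign' or any other
    label (for example 'uncertain'); every entry counts towards the
    total number of available predictions.
    """
    total = len(predictions)
    if total == 0:
        return None  # no in-silico prediction available for this variant
    pathogenic = sum(1 for p in predictions if p == "pathogenic")
    benign = sum(1 for p in predictions if p == "benign")
    if pathogenic / total > PATHOGENIC_RATIO:
        return "PP3"
    if benign / total > BENIGN_RATIO:
        return "BP4"
    return None
```

Because the two thresholds add up to more than 1, both ratios can never be exceeded for the same variant, so the order of the two checks does not affect the result.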
Conservation

Statistically, if a variant is not found in any static in-silico database, it is most likely to be pathogenic. To refine this somewhat aggressive prediction, we use phyloP as a simple fall-back in the absence of any other prediction, returning a pathogenic prediction if phyloP is greater than 3.81, or benign if the variant is non-truncating and phyloP is less than 1.4 (we use a different threshold of 7.2 for strongly conserved - see conservation).

PP5

Reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation. (Pathogenic, Supporting)

BP6

Reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation. (Benign, Supporting)

Similarly to rules PS3 and BS3, these two rules leverage the clinical variants database to report whether the variant has been clinically reported (see clinical evidence), but without any reference to in-vitro or functional studies. The default strength for these rules is Supporting, per the ACMG Guidelines; however, our implementation will use stronger rule strengths if borne out by the available evidence (see rule strengths). Whilst this may be considered not strictly in line with the guidelines, it does allow us to ensure that critical clinical evidence is not missed. Users remain free to manually change the strength used when reviewing the classification. In practice, we may boost the strength of the rule all the way up to 'Very Strong' if the evidence justifies it:
Note: rules PS3 or BS3 may trigger too, but the same evidence will not be counted twice.

Important: we provide an option to disable all the clinical evidence rules (PS3, BS3, PP5 & BP6) in the VarSome front-end.

BA1

Allele frequency is >5% in Exome Sequencing Project, 1000 Genomes Project, or Exome Aggregation Consortium. (Benign, Stand Alone)

Rule BA1 is applied if the allele frequency is greater than the threshold 0.05. This is in strict concordance with the ACMG Guidelines and determines a variant to be stand-alone benign for Mendelian disease. The BA1 Exceptions have also been implemented, as recommended by ClinGen. Note that rules BS1 and BS2 may trigger at much lower frequency thresholds.

BS1

Allele frequency is greater than expected for disorder. (Benign, Strong)

Here we find the highest GnomAD allele frequency for the variant across the main population ethnicities and compare this to the benign cut-off frequency derived from the gene statistics. If there are too few known variants (fewer than 5), we use a much higher default threshold, 0.015, for rare diseases. In order to avoid double-counting, rule BS1 is not evaluated if either rule BA1 or rule PM2 was triggered first. Rule BS1 has been extended to trigger for non-coding variants that are far from the canonical splice-site, are not predicted to affect splicing, and whose position is not conserved.

BS2

Observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous), or X-linked (hemizygous) disorder, with full penetrance expected at an early age. (Benign, Strong)

We first determine the mode of inheritance of the gene, then compare the allele count (see allele frequency for quality checks) to the corresponding threshold:
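The BA1 and BS1 frequency checks described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the VarSome implementation: all function and parameter names are hypothetical, BS1 is assumed to trigger when the highest population frequency is strictly greater than the cut-off, and the BA1 exceptions, coverage checks and the non-coding extension of BS1 are not modelled.

```python
from typing import Mapping, Optional

BA1_THRESHOLD = 0.05        # stand-alone benign threshold from the text
BS1_DEFAULT_CUTOFF = 0.015  # default when too few known variants
MIN_KNOWN_VARIANTS = 5      # 'fewer than 5' means too few


def bs1_cutoff(gene_cutoff: Optional[float], known_variants: int) -> float:
    """Pick the gene-derived benign cut-off, or the default if data is sparse."""
    if gene_cutoff is None or known_variants < MIN_KNOWN_VARIANTS:
        return BS1_DEFAULT_CUTOFF
    return gene_cutoff


def frequency_rule(population_freqs: Mapping[str, float],
                   gene_cutoff: Optional[float],
                   known_variants: int,
                   pm2_triggered: bool = False) -> Optional[str]:
    """Return 'BA1', 'BS1' or None for a variant.

    population_freqs maps a (hypothetical) population label to its
    allele frequency; the highest value across populations is used.
    """
    top = max(population_freqs.values(), default=0.0)
    if top > BA1_THRESHOLD:
        return "BA1"
    if pm2_triggered:
        return None  # BS1 is not evaluated once PM2 has triggered
    if top > bs1_cutoff(gene_cutoff, known_variants):
        return "BS1"  # assumption: strictly greater than the cut-off
    return None
```

Checking BA1 first reflects the statement that BS1 is not evaluated if BA1 was triggered.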
Rule BS2 is not evaluated if rule BA1 was triggered, to avoid double-counting the same evidence, and for performance we disable BS2 if rule PM2 triggered.

BP7

A synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence nor the creation of a new splice site AND the nucleotide is not highly conserved. (Benign, Supporting)

This rule applies to synonymous variants that are not deemed highly conserved using phyloP (see conservation). Splicing is checked as follows:
Rule BP7 will be disabled if there is strong clinical evidence to the contrary (i.e. possibly a cryptic splice-site).

PS2

De novo (both maternity and paternity confirmed) in a patient with the disease and no family history. (Pathogenic, Strong)

PM6

Assumed de novo, but without confirmation of paternity and maternity. (Pathogenic, Moderate)

PP1

Cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease. (Pathogenic, Supporting)

BS4

Lack of segregation in affected members of a family. (Benign, Strong)

Unimplemented Rules

The following rules are not implemented or not currently available to VarSome users - in most cases this is because the data required to evaluate the rules is not in the public domain, or the rules require patient-specific information, sometimes on a per-variant basis. Should they have more evidence, users can manually toggle rules on or off in VarSome, or adjust the strength used, and the resulting classification will be re-evaluated immediately.

PS4

The prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls. (Pathogenic, Strong)

This rule has not been implemented.

PM3

For recessive disorders, detected in trans with a pathogenic variant. (Pathogenic, Moderate)

This rule has not been implemented.

PP4

Patient’s phenotype or family history is highly specific for a disease with a single genetic etiology. (Pathogenic, Supporting)

This rule has not been implemented.

BP2

Observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder or observed in cis with a pathogenic variant in any inheritance pattern. (Benign, Supporting)

This rule has not been implemented.

BP5

Variant found in a case with an alternate molecular basis for disease. (Benign, Supporting)

This rule has not been implemented.