With the advancements in high-throughput next-generation sequencing (NGS), sequencing has become faster and more affordable. Because NGS can be applied at the gene panel, exome and genome levels, it is a versatile and robust option for clinical testing. However, the method comes with some challenges. The large amount of data produced by NGS is not always straightforward to analyze, and proper analysis and interpretation of clinically significant variants is key to an accurate clinical diagnosis. In this blog letter, we explain the standards and guidelines adopted by genomize-Seq for sequence variant interpretation.
genomize-Seq employs a six-tier variant classification system using the terms ‘pathogenic’, ‘likely pathogenic’, ‘uncertain significance’, ‘likely benign’, ‘benign’ and ‘pathogenic with family segregation data’. While the first five terms are already used by the majority of clinics and databases, the last term was coined by genomize and is explained further in this letter. A variant should be classified based on how it affects the normal function of the gene or the protein. The explanations for the variant classifications are listed in Table 1.
Table 1 – The explanations for variant classification
While assigning a pathogenicity category to a sequence variant, genomize-Seq uses the standards recommended by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (1). These standards integrate all available information on a variant, weigh it, and produce a consensus pathogenicity value. The evidence types used for variant classification include the frequency of the variant in population databases such as 1000Genomes, ESP6500 and ExAC; functional and experimental data on the variant from the literature; computational predictions from tools such as SIFT, PolyPhen and MutationTaster; the presence of the variant in clinically relevant databases such as ClinVar; and segregation data. However, these evidence types do not carry the same weight. For example, functional data is more informative about the pathogenicity of a variant than computational data. Also, while a variant observed at a high frequency in healthy populations can be directly classified as benign, a low frequency cannot be taken directly as an indicator of a pathogenic variant. Therefore, the classification also weighs the evidence types. We assign an evidence code to each piece of evidence and derive the final pathogenicity by combining these evidence codes. The evidence codes are listed in Table 2.
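The asymmetry between high and low frequency described above can be sketched in a few lines of Python. This is only an illustration, not genomize-Seq's implementation: the function name `frequency_evidence` is hypothetical, the 5% cut-off is the BA1 threshold published in the ACMG guideline (the value genomize-Seq applies is not stated here), and the frequencies in the usage lines are made-up placeholders.

```python
from typing import Dict, Optional

# Assumption: 5% is the BA1 cut-off from the ACMG guideline; the threshold
# actually used by genomize-Seq is not given in this post.
BA1_THRESHOLD = 0.05


def frequency_evidence(allele_freqs: Dict[str, float]) -> Optional[str]:
    """Return 'BA1' if any population frequency exceeds the threshold.

    A low frequency returns None: rarity alone is not treated as
    stand-alone evidence that a variant is pathogenic.
    """
    if any(af > BA1_THRESHOLD for af in allele_freqs.values()):
        return "BA1"
    return None


# Placeholder frequencies, for illustration only.
common = frequency_evidence({"1000Genomes": 0.21, "ESP6500": 0.18, "ExAC": 0.24})
rare = frequency_evidence({"1000Genomes": 0.0002, "ExAC": 0.0001})
print(common, rare)
```

A common variant gets a benign code straight away, whereas a rare variant gets nothing from this check and has to be judged on the other evidence types.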
Table 2 – Criteria and evidence codes for classifying variants
While evidence codes starting with PVS, PS, PM and PP indicate a pathogenic variant, evidence codes starting with BA, BS and BP indicate that the variant is benign. The weighting is already built into the way ACMG assigns evidence codes to each kind of data: the codes PVS and BA represent very strong evidence, the codes PS and BS mean strong evidence, PM is moderate evidence, and the codes PP and BP signify supporting evidence. The weights of the codes descend in the order very strong, strong, moderate and supporting. The rules for combining evidence codes are also taken from the ACMG paper (1) and are shown in Table 3.
At genomize, we added an extra pathogenicity category, PF (the last category in Table 3). It applies when a missense variant has a low frequency and is predicted to be pathogenic by computational evidence (SIFT, PolyPhen, MutationTaster), or when the variant is an inframe insertion/deletion with a low frequency, and no other kind of evidence (functional data, literature support) is available. Such cases support pathogenicity but are not enough to classify the variant as pathogenic. We therefore categorize the variant as PF (Pathogenic with Family Segregation Data) and recommend that the user check family segregation data. This category prevents missense and inframe variants without any prior functional or clinical data from being directly categorized as variants of unknown significance (VUS), and marks them as variants in need of deeper analysis. If family segregation analysis shows that the variant is absent from healthy family members and present in affected family members (segregation: ACMG evidence code PP1), or that the variant is de novo in the affected individual (ACMG evidence codes PS2 or PM6: the variant is absent from the parents and present in the affected child), PF variants can be considered pathogenic.
While ACMG does not suggest a concrete metric for segregation, it encourages users to further question pathogenicity of the variant depending on the extent of segregation observed in the family.
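To make the combination step concrete, here is a minimal Python sketch. It assumes that Table 3 follows the combining rules published in the ACMG guideline, and it adds PF as a fallback for variants that would otherwise end up as uncertain. The function names and the `pf_candidate` flag are hypothetical. Deciding whether a variant is a PF candidate (missense or inframe, low frequency, no other evidence) is left to the caller, because the post does not state the exact thresholds.

```python
from collections import Counter
from typing import Iterable


def count_strengths(codes: Iterable[str]) -> Counter:
    """Count codes per strength prefix, e.g. 'PM2' -> 'PM', 'PVS1' -> 'PVS'."""
    return Counter(code.rstrip("0123456789") for code in codes)


def pathogenic_side(c: Counter) -> str:
    """Return 'pathogenic', 'likely pathogenic' or '' from the P* counts."""
    pvs, ps, pm, pp = c["PVS"], c["PS"], c["PM"], c["PP"]
    if (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    ):
        return "pathogenic"
    if (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    ):
        return "likely pathogenic"
    return ""


def benign_side(c: Counter) -> str:
    """Return 'benign', 'likely benign' or '' from the B* counts."""
    ba, bs, bp = c["BA"], c["BS"], c["BP"]
    if ba >= 1 or bs >= 2:
        return "benign"
    if (bs >= 1 and bp >= 1) or bp >= 2:
        return "likely benign"
    return ""


def classify(codes: Iterable[str], pf_candidate: bool = False) -> str:
    """Combine evidence codes into one of the six categories.

    pf_candidate is a hypothetical flag set by the caller for variants that
    match the PF description (assumption: it only matters when no other
    category is reached).
    """
    c = count_strengths(codes)
    p, b = pathogenic_side(c), benign_side(c)
    if p and b:
        return "uncertain significance"  # contradictory evidence
    if p or b:
        return p or b
    if pf_candidate:
        return "pathogenic with family segregation data"
    return "uncertain significance"


print(classify(["PVS1", "PS2"]))
print(classify(["PM2", "PP3"], pf_candidate=True))
```

In the second call, PM2 (rare in controls) and PP3 (computational support) are the ACMG codes a typical PF candidate would carry. Mapping PF candidates to those two codes is an assumption of this sketch, not something the post states.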
Table 3 – Rules for combining evidence codes
Two Example Cases
Let’s find out the pathogenicity class of the MEFV variant R202Q (rsID rs224222) shown in Figure 1. The variant is observed at very high allele frequencies in almost all populations (1000Genomes, ESP6500 and ExAC); this is an indicator of a benign variant and corresponds to the evidence code BA1. All computational methods (SIFT, PolyPhen and MutationTaster) predict the variant to be harmless, which yields the evidence code BP4. The ClinVar entries report benign and likely benign, which gives the evidence code BP6. At this point, the variant rs224222 has three evidence codes: BA1, BP4 and BP6. When we apply the rules defined in Table 3, we can conclude that this variant is benign.
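The walk-through above can be replayed as a short Python sketch. The helper names are hypothetical, the benign rules are assumed to be the ones published in the ACMG guideline, and the allele frequency is a placeholder rather than the database value for rs224222, which is not quoted in this post.

```python
from typing import Dict, List

BA1_THRESHOLD = 0.05  # assumption: BA1 cut-off from the ACMG guideline


def collect_benign_codes(
    max_af: float, predictions: Dict[str, str], clinvar: List[str]
) -> List[str]:
    """Translate the three observations of the example into evidence codes."""
    codes = []
    if max_af > BA1_THRESHOLD:
        codes.append("BA1")  # common in population databases
    if predictions and all(call == "benign" for call in predictions.values()):
        codes.append("BP4")  # all computational tools agree on no impact
    if clinvar and all(s in ("benign", "likely benign") for s in clinvar):
        codes.append("BP6")  # clinical database reports benign
    return codes


def benign_verdict(codes: List[str]) -> str:
    """Apply the benign-side combining rules only."""
    bs = sum(code.startswith("BS") for code in codes)
    bp = sum(code.startswith("BP") for code in codes)
    if "BA1" in codes or bs >= 2:
        return "benign"
    if (bs >= 1 and bp >= 1) or bp >= 2:
        return "likely benign"
    return "no benign conclusion"


codes = collect_benign_codes(
    max_af=0.20,  # placeholder value, not the real frequency of rs224222
    predictions={"SIFT": "benign", "PolyPhen": "benign", "MutationTaster": "benign"},
    clinvar=["benign", "likely benign"],
)
print(codes, benign_verdict(codes))  # ['BA1', 'BP4', 'BP6'] benign
```

Under these assumed rules, BA1 alone is enough for the benign call. Without it, BP4 and BP6 together would only reach likely benign.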
Figure 1 – An example case for pathogenicity classification
Figure 2 – Another example case for pathogenicity classification
About The Author: Erşen Kavak