Disease-associated and/or functional polymorphisms in HGMD®

HGMD seeks to include DNA sequence variants that are either (i) disease-associated and of likely functional significance, or (ii) of clear functional significance even though no associated clinical phenotype may have been identified to date. The difficulty inherent in assessing published polymorphism reports describing potential disease associations has led to the adoption of a set of inclusion criteria that together describe what we consider to be a methodical and uniform approach to dealing with such variants as they appear in the literature.

At present, ~55% of the polymorphic variants recorded in HGMD are ‘disease-associated’. However, even where no disease association has yet been demonstrated, functional polymorphisms that alter the expression of a gene or the structure/function of the gene product are potentially very important. Although a functional polymorphism with no disease association may not have any direct and/or immediate clinical relevance, such data are potentially very valuable for understanding inter-individual differences in disease susceptibility. The vast majority of polymorphic variants in HGMD are single nucleotide polymorphisms (SNPs), but a small number are of the insertion/deletion type. The polymorphic variants logged in HGMD are generally located in either gene regulatory or coding regions, although it should be noted that SNPs occurring outside these regions may nevertheless have consequences for gene expression, splicing, transcription factor binding etc.

The distinction between a disease-associated polymorphism and a pathological mutation is in practice often fairly arbitrary and is generally made in the context of the prevalence of the variant in the population as well as its penetrance (the frequency with which a specific genotype manifests itself as a given clinical phenotype). Variants with a minor allele frequency of >1% in the population being studied are, by convention, termed polymorphisms. These polymorphisms are identified in the database by the addition of terms to the clinical/laboratory phenotypic description. These additions are limited to ‘association with’ and ‘association with?’ (question marks being included to indicate that the association is judged by the HGMD curators to be somewhat tenuous).
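The frequency convention and the two phenotype descriptors described above can be illustrated with a minimal Python sketch. The function names are hypothetical and are not part of HGMD. The "phenotype, descriptor" layout is an assumption based on the example ‘Reduced gene expression, association with’ given in the inclusion criteria.

```python
# Hypothetical illustration only; HGMD does not publish this code.
POLYMORPHISM_MAF_THRESHOLD = 0.01  # minor allele frequency of >1%


def is_polymorphism(minor_allele_frequency: float) -> bool:
    """Return True if the variant is termed a polymorphism by the >1% convention."""
    # A minor allele frequency cannot exceed 0.5 by definition.
    if not 0.0 <= minor_allele_frequency <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    return minor_allele_frequency > POLYMORPHISM_MAF_THRESHOLD


def phenotype_label(phenotype: str, tenuous: bool) -> str:
    """Append the descriptor; the question mark flags a tenuous association."""
    descriptor = "association with?" if tenuous else "association with"
    # Assumed layout: "<phenotype>, <descriptor>".
    return f"{phenotype}, {descriptor}"
```

Note that the threshold is strict: in this sketch a variant at exactly 1% is not termed a polymorphism, which follows the ">1%" wording literally.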

Inclusion Criteria for Disease-Associated/Functional Polymorphisms

Polymorphic variants logged in HGMD usually fall into two discrete categories:

Disease-associated polymorphisms of functional significance
To be included as disease-associated, a statistically significant (p<0.05) association between the polymorphism and a clinical phenotype must have been reported. In addition, other information (e.g. in vitro or in vivo expression/functional data, replicated association studies, epidemiological studies, evolutionary conservation data etc) should have been made available to support the contention that the polymorphism in question is itself of bona fide functional significance. Such a polymorphism could have consequences for gene expression, protein structure/function, gene splicing, etc. These supporting experimental data are required to ensure that non-causative variants (i.e. those merely in linkage disequilibrium with the actual causative variants) are not included. If the functional data required to support the inclusion of a disease-associated variant are contained within a subsequent article, the reference logged in HGMD will still be that which originally reported the disease association. Since HGMD does not currently support multiple referencing, those additional reports describing functional studies are given in the comment field. NCBI dbSNP numbers (where identified) are also included in the comment field.

Polymorphisms of functional significance with no reported disease association
If no clinical phenotype is known to be associated with a polymorphic variant, but sufficient in vitro or in vivo expression/functional data [1] have nevertheless been presented to indicate functional significance, then the variant will be included in HGMD. Typically, such data provide evidence for a direct effect on gene expression, protein structure and/or function, gene splicing etc. These variants can thus, in a very real sense, be considered as giving rise to a ‘deficiency’ (or occasionally a surfeit) of a given gene transcript or protein product. Hence, the phenotype recorded in HGMD would entail a brief description of the functional effect e.g. ‘Reduced gene expression, association with’. If, at a later date, evidence becomes available to indicate that a disease/clinical phenotype is associated with such a polymorphism, the disease/clinical phenotype and the corresponding reference are entered into the comment field, reflecting an additional reference and phenotype.

Polymorphic variants affecting individual drug responses, patient survival times after diagnosis, and responses to surgical intervention are not included in HGMD. Studies which simply report dbSNP numbers in association with disease (e.g. from large scale genome-wide association studies), with no additional evidence of direct functional involvement, are also not included in HGMD. Users interested in this particular category of variation should try other databases such as the Catalogue of Published Genome-Wide Association Studies (http://www.genome.gov/26525384/) or the Genetic Association Database (http://geneticassociationdb.nih.gov/).

In some instances, the above criteria may be only partially satisfied, such that the HGMD curators remain unconvinced as to the functional/phenotypic relevance of the variant reported. In such cases, the polymorphism may nevertheless be included either (i) because supporting information became available subsequent to publication of the original (first) report, or (ii) because the associated gene/disease state was deemed to be of sufficient importance for the variant to warrant further study. Such variants have been ascribed the descriptor ‘association with?’ (as opposed to ‘association with’ without a question mark) to indicate that some degree of uncertainty is involved.
[1] One caveat to bear in mind is that in vitro studies are not invariably accurate indicators of the in vivo situation [see for example Cirulli ET & Goldstein DB. (2007) In vitro assays fail to predict in vivo effects of regulatory polymorphisms. Hum. Mol. Genet. 16: 1931-1939, and Dimas AS et al. (2009) Common regulatory variation impacts gene expression in a cell type-dependent manner. Science 325: 1246-1250].

Sub-categorisation of HGMD polymorphism data
Recently, HGMD has adopted a policy of sub-categorising polymorphism entries. Thus, polymorphisms may now be allocated to one of four possible categories reflecting the aforementioned criteria:

Disease-associated polymorphism (DP)

A polymorphism reported to be in significant association with disease (p<0.05) that is assumed to be functional (e.g. as a consequence of location, evolutionary conservation, replication studies etc), although there may as yet be no direct evidence (e.g. from an expression study) of function.

Disease-associated polymorphism with additional supporting functional evidence (DFP)

A polymorphism reported to be in significant association with disease (p<0.05) that has evidence of being of direct functional importance (e.g. as a consequence of altered expression, mRNA studies etc).

In vitro/laboratory or in vivo functional polymorphism (FP)

A polymorphism reported to affect the structure, function or expression of the gene (or gene product), but with no disease association reported as yet.

Frameshift or truncating variant (FTV)

A polymorphic or rare variant reported in the literature (e.g. detected in the process of whole genome/exome screening) that is predicted to truncate or otherwise alter the gene product (i.e. a nonsense or frameshift variant) but with no disease association reported as yet. Please note that any variant affecting the obligate donor/acceptor splice site of a gene will not be included in this category unless there is evidence for an effect on the splicing phenotype. Variants occurring in pseudogenes will also be excluded unless evidence for a functional effect is present for both the pseudogene itself [2] and the variant in question.
[2] Balakirev ES & Ayala FJ. (2003) Pseudogenes: are they "junk" or functional DNA? Annu. Rev. Genet. 37: 123-151.
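
The sub-categories above (DP, DFP, FP and FTV) can be read as a small decision procedure. The following Python sketch is one possible reading and not HGMD's actual curation logic, which relies on curator judgement. The `VariantEvidence` fields and the `subcategory` function are hypothetical names. The order in which FP is checked before FTV is an assumption, and the splice-site and pseudogene exclusions are not modelled.

```python
from dataclasses import dataclass
from typing import Optional

P_VALUE_CUTOFF = 0.05  # significance threshold stated in the category definitions


@dataclass
class VariantEvidence:
    """Hypothetical summary of the published evidence for one variant."""
    disease_p_value: Optional[float] = None   # None if no association was reported
    indirect_support: bool = False            # e.g. location, conservation, replication
    direct_functional_evidence: bool = False  # e.g. altered expression, mRNA studies
    truncating: bool = False                  # nonsense or frameshift variant


def subcategory(ev: VariantEvidence) -> Optional[str]:
    """Return 'DP', 'DFP', 'FP', 'FTV', or None if no category applies."""
    associated = (
        ev.disease_p_value is not None and ev.disease_p_value < P_VALUE_CUTOFF
    )
    if associated:
        if ev.direct_functional_evidence:
            return "DFP"
        if ev.indirect_support:
            return "DP"
        # Assumption: an association with no supporting information is not logged.
        return None
    if ev.direct_functional_evidence:
        return "FP"
    if ev.truncating:
        return "FTV"
    return None
```

For example, `subcategory(VariantEvidence(disease_p_value=0.01, indirect_support=True))` yields `"DP"` under these assumptions, while the same variant with direct functional evidence would yield `"DFP"`.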


Replication studies for disease-associated polymorphic variants
The replication of disease-association studies can be a source of additional information to satisfy the inclusion criteria. If a replication study serves to support a previously tenuous genotype-phenotype correlation, then the phenotype can be ‘promoted’ from ‘association with?’ to ‘association with’ and the details of the replication study may be recorded in the comment field.
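
The ‘promotion’ step described above amounts to dropping the question mark from the descriptor and noting the replication study in the comment field. A minimal sketch follows, assuming that the phenotype is stored as a string ending in the descriptor and that comment entries are joined with a semicolon. Both are illustrative assumptions, and `promote_on_replication` is a hypothetical name.

```python
from typing import Tuple

TENUOUS_DESCRIPTOR = "association with?"


def promote_on_replication(
    phenotype: str, comment: str, replication_note: str
) -> Tuple[str, str]:
    """Promote a tenuous association and record the replication study.

    Returns the (phenotype, comment) pair unchanged if the phenotype does
    not end in the tenuous descriptor.
    """
    if not phenotype.endswith(TENUOUS_DESCRIPTOR):
        return phenotype, comment
    promoted = phenotype[:-1]  # drop the trailing question mark
    # Assumed convention: comment entries separated by "; ".
    new_comment = f"{comment}; {replication_note}" if comment else replication_note
    return promoted, new_comment
```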


Other Categories of Variation

Copy number variations
Copy number variations (CNVs) are DNA segments >1 kb in length that present with variable numbers of copies in a given population. These variants are being reported in the literature with an ever increasing frequency. CNVs are potentially functionally significant and should therefore in principle be treated by HGMD in a similar manner to any other polymorphism. Human CNVs are however already being collected by other databases such as the Database of Genomic Variants (http://projects.tcag.ca/variation/) and the Human Genome Structural Variation Project (http://humanparalogy.gs.washington.edu/structuralvariation/). CNVs that are disease-associated are also being collated in databases such as DECIPHER (http://www.sanger.ac.uk/PostGenomics/decipher/), the European Cytogeneticist's Association Register of Unbalanced Chromosome Aberrations (http://www.ecaruca.net) and the Chromosome Abnormality Database (http://www.ukcad.org.uk/cocoon/ukcad/). Whilst HGMD does not wish to replicate the excellent curatorial work of other organisations, HGMD is still interested in such variants if they meet certain criteria. HGMD will therefore include these variations if they are shown to be both of functional significance and associated with disease, and if they involve a single characterised gene that is itself clearly involved in the disease association.

Risk haplotypes
Reports of haplotypes associated with an increased risk of disease are not included in cases where there is no indication as to precisely which variant (or variants) within the haplotype is responsible for the disease association/functional effect. If, however, evidence is presented to support the contention that a single variant within the risk haplotype is causative and/or of functional significance to a degree which satisfies the inclusion criteria, then that variant will be included in HGMD.

The main limitation with recording disease-associated polymorphic variants of functional significance is the inclusion of only a single reference for each sequence change in HGMD. A large proportion of the papers reporting an association between a disease and a polymorphic variant do not include functional data on that variant. This problem will in the future be addressed by the introduction of multiple referencing.


Copyright 2009 HGMD