Aim
HGMD seeks to include DNA sequence variants that are either (i) disease-associated
and of likely functional significance, or (ii) of clear functional significance
even though no associated clinical phenotype may have been identified to date.
The difficulty inherent in assessing published polymorphism reports describing
potential disease associations has led to the adoption of a set of inclusion
criteria that together describe what we consider to be a methodical and uniform
approach to dealing with such variants as they appear in the literature.
Background
At present, ~55% of the polymorphic variants recorded in HGMD are ‘disease-associated’.
However, even in cases where no disease association has yet been demonstrated,
functional polymorphisms that alter the expression of a gene or the structure/function
of the gene product are potentially very important. Although a functional polymorphism
with no disease association may not have any direct and/or immediate clinical
relevance, these data are potentially very valuable in terms of understanding
inter-individual differences in disease susceptibility. The vast majority of
polymorphic variants in HGMD are single nucleotide polymorphisms (SNPs) but
a small number are of the insertion/deletion type. The polymorphic variants
logged in HGMD are generally located in either gene regulatory or coding regions
although it should be noted that SNPs occurring outside of these regions may
nevertheless still have consequences for gene expression, splicing, transcription
factor binding etc.
Definitions
The distinction between a disease-associated polymorphism and a pathological
mutation is in practice often fairly arbitrary and is generally made in the
context of the prevalence of the variant in the population as well as its penetrance
(the frequency with which a specific genotype manifests itself as a given clinical
phenotype). Variants with a minor allele frequency of >1% in the population
being studied are, by convention, termed polymorphisms. These polymorphisms
are identified in the database by the addition of terms to the clinical/laboratory
phenotypic description. These additions are limited to ‘association with’
and ‘association with?’ (question marks being included to indicate
that the association is judged by the HGMD curators to be somewhat tenuous).
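The >1% convention can be illustrated with a short calculation. This is a minimal sketch only, not code used by HGMD; the function names and the allele-count inputs are hypothetical and assume a biallelic site in the population being studied.

```python
def minor_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Return the frequency of the less common allele at a biallelic site."""
    if ref_count < 0 or alt_count < 0:
        raise ValueError("allele counts cannot be negative")
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("at least one allele must have been observed")
    return min(ref_count, alt_count) / total


def is_polymorphism(ref_count: int, alt_count: int, threshold: float = 0.01) -> bool:
    """Apply the convention that a minor allele frequency above 1% denotes a polymorphism."""
    return minor_allele_frequency(ref_count, alt_count) > threshold
```

For example, 50 minor alleles among 2000 sampled chromosomes gives a frequency of 2.5%, so is_polymorphism(1950, 50) is true, whereas 5 among 2000 (0.25%) is not. Because the frequency depends on the population sampled, the same variant may fall on either side of the threshold in different studies.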
Disease-associated polymorphisms of functional significance
To be included as disease-associated, a statistically significant (p<0.05)
association between the polymorphism and a clinical phenotype must have been
reported. In addition, other information (e.g. in vitro or in vivo
expression/functional data, replicated association studies, epidemiological
studies, evolutionary conservation data etc.) should have been made available to
support the contention that the polymorphism in question is itself of bona
fide functional significance. Such a polymorphism could have consequences
for gene expression, protein structure/function, gene splicing, etc. These supporting
experimental data are required to ensure that non-causative variants (i.e. those
merely in linkage disequilibrium with the actual causative variants) are not
included. If the functional data required to support the inclusion of a disease-associated
variant are contained within a subsequent article, the reference logged in HGMD
will still be that which originally reported the disease association. Since
HGMD does not currently support multiple referencing, those additional reports
describing functional studies are given in the comment field. NCBI dbSNP numbers
(where identified) are also included in the comment field.
Polymorphisms of functional significance with no reported disease association
If no clinical phenotype is known to be associated with a polymorphic variant,
but sufficient in vitro or in vivo expression/functional data (footnote 1)
have nevertheless been presented to indicate functional significance, then the
variant will be included in HGMD. Typically, such data provide evidence for
a direct effect on gene expression, protein structure and/or function, gene
splicing etc. These variants can thus, in a very real sense, be considered as
giving rise to a ‘deficiency’ (or occasionally a surfeit) of a given
gene transcript or protein product. Hence, the phenotype recorded in HGMD would
entail a brief description of the functional effect e.g. ‘Reduced gene
expression, association with’. If, at a later date, evidence becomes available
to indicate that a disease/clinical phenotype is associated with such a polymorphism,
the disease/clinical phenotype and the corresponding reference are entered into
the comment field as an additional phenotype and reference. Polymorphic
variants affecting individual drug responses, patient survival times after diagnosis,
and responses to surgical intervention are not included in HGMD. Studies that
simply report dbSNP numbers in association with disease (e.g. from large-scale
genome-wide association studies), with no additional evidence of direct functional
involvement, are also not included in HGMD. Users interested in this particular category of variation
should consult other databases such as the Catalogue of Published Genome-Wide Association Studies
(http://www.genome.gov/26525384/) or the Genetic Association Database (http://geneticassociationdb.nih.gov/).
In some instances, the above criteria may be only partially
satisfied, such that the HGMD curators remain unconvinced as to the functional/phenotypic
relevance of the variant reported. In such cases, the polymorphism may nevertheless
be included either (i) because supporting information became available subsequent
to publication of the original (first) report, or (ii) because the associated
gene/disease state was deemed to be of sufficient importance for the variant
to warrant further study. Such variants have been ascribed the descriptor ‘association
with?’ (as opposed to ‘association with’ without a question
mark) to indicate that some degree of uncertainty is involved.
Footnote 1: One caveat to bear in mind is that in vitro studies are not invariably accurate
indicators of the in vivo situation [see, for example, Cirulli ET &
Goldstein DB. (2007) In vitro assays fail to predict in vivo
effects of regulatory polymorphisms. Hum. Mol. Genet. 16: 1931-1939;
and Dimas AS et al. (2009) Common regulatory variation impacts gene expression in a cell type-dependent manner.
Science 325: 1246-1250].
Sub-categorisation of HGMD polymorphism data
Recently, HGMD has adopted a policy of sub-categorising polymorphism entries.
Thus, polymorphisms may now be allocated to one of four possible categories
reflecting the aforementioned criteria:
Disease-associated polymorphism (DP)
A polymorphism reported to be in significant association with disease (p<0.05) that is assumed to be functional (e.g. as a consequence of location, evolutionary conservation, replication studies etc.), although there may as yet be no direct evidence (e.g. from an expression study) of function.
Disease-associated polymorphism with additional supporting functional evidence (DFP)
A polymorphism reported to be in significant association with disease (p<0.05) that has evidence of being of direct functional importance (e.g. as a consequence of altered expression, mRNA studies etc.).
In vitro/laboratory or in vivo functional polymorphism (FP)
A polymorphism reported to affect the structure, function or expression of the gene (or gene product), but with no disease association reported as yet.
Frameshift or truncating variant (FTV)
A polymorphic or rare variant reported in the literature (e.g. detected in the process of whole
genome/exome screening) that is predicted to truncate or otherwise alter the gene product (i.e. a nonsense or frameshift variant) but with no disease
association reported as yet. Please note that any variant affecting the obligate donor/acceptor splice site of a gene will not be included in this
category unless there is evidence for an effect on the splicing phenotype. Variants occurring in pseudogenes will also be excluded unless evidence for
a functional effect is present for both the pseudogene itself (footnote 2) and the variant in question.
Footnote 2: Balakirev ES & Ayala FJ. (2003) Pseudogenes: are they "junk" or functional DNA?
Ann Rev Genet 37: 123-51.
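The category definitions above can be read as a small decision procedure. The following is a hedged sketch of one possible reading, not HGMD's curation software: the VariantEvidence record, its field names and the subcategory function are hypothetical, and the precedence given to FP over FTV when both could apply is an assumption. The splice-site and pseudogene exclusions are not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantEvidence:
    """Hypothetical summary of the published evidence for one variant."""
    p_value: Optional[float] = None   # disease-association p-value; None if none reported
    indirect_support: bool = False    # e.g. location, conservation, replication studies
    direct_functional: bool = False   # e.g. altered expression, mRNA studies
    truncating: bool = False          # nonsense or frameshift variant


def subcategory(ev: VariantEvidence) -> Optional[str]:
    """Return 'DP', 'DFP', 'FP' or 'FTV', or None if no category applies."""
    associated = ev.p_value is not None and ev.p_value < 0.05
    if associated:
        if ev.direct_functional:
            return "DFP"  # association plus direct functional evidence
        if ev.indirect_support:
            return "DP"   # association, function assumed from indirect evidence
        return None       # association alone fits no category in this sketch
    if ev.direct_functional:
        return "FP"       # functional effect, no disease association as yet
    if ev.truncating:
        return "FTV"      # predicted truncation, no disease association as yet
    return None
```

Under these assumptions, a variant with a reported p-value of 0.01 and an expression study would be returned as 'DFP', while the same variant supported only by evolutionary conservation would be returned as 'DP'.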
Replication studies for disease-associated polymorphic variants
The replication of disease-association studies can be a source of additional
information to satisfy the inclusion criteria. If a replication study serves
to support a previously tenuous genotype-phenotype correlation, then the phenotype
can be ‘promoted’ from ‘association with?’ to ‘association
with’ and the details of the replication study may be recorded in the
comment field.
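The promotion step can be sketched as a simple string operation. This is an illustration under stated assumptions rather than a description of the HGMD schema: it assumes the descriptor is the final part of the phenotype text and that the comment field is free text, and the function name and the example values are hypothetical.

```python
from typing import Tuple

TENUOUS = "association with?"


def promote_phenotype(phenotype: str, comment: str, replication_ref: str) -> Tuple[str, str]:
    """Promote a tenuous association and note the replication study in the comment."""
    if not phenotype.endswith(TENUOUS):
        # Nothing to promote; leave the record unchanged.
        return phenotype, comment
    promoted = phenotype[:-1]  # drop the trailing question mark
    new_comment = f"{comment}; {replication_ref}" if comment else replication_ref
    return promoted, new_comment
```

For instance, calling promote_phenotype("Example phenotype, association with?", "", "Hypothetical replication study") yields the phenotype "Example phenotype, association with" and a comment naming the replication study, while a record that already carries the plain 'association with' descriptor is returned unchanged.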
Copy number variations
Copy number variations (CNVs) are DNA segments >1 kb in length that present
with variable numbers of copies in a given population. These variants are being
reported in the literature with an ever increasing frequency. CNVs are potentially
functionally significant and should therefore in principle be treated by HGMD
in a similar manner to any other polymorphism. Human CNVs are however already
being collected by other databases such as the Database of Genomic Variants
(http://projects.tcag.ca/variation/) and the Human Genome Structural Variation
Project (http://humanparalogy.gs.washington.edu/structuralvariation/). CNVs
that are disease-associated are also being collated in databases such as DECIPHER
(http://www.sanger.ac.uk/PostGenomics/decipher/), the European Cytogeneticists
Association Register of Unbalanced Chromosome Aberrations (http://www.ecaruca.net)
and the Chromosome Abnormality Database (http://www.ukcad.org.uk/cocoon/ukcad/).
Whilst HGMD does not wish to replicate the excellent curatorial work of other
organisations, HGMD is still interested in such variants if they meet certain
criteria. HGMD will therefore include these variations if they are shown to
be both of functional significance and associated with disease, and if they
involve a single characterised gene that is itself clearly involved in the disease
association.
Risk haplotypes
Reports of haplotypes associated with an increased risk of disease are not included
in cases where there is no indication as to precisely which variant (or variants)
within the haplotype is responsible for the disease association/functional effect.
If, however, evidence is presented to support the contention that a single variant
within the risk haplotype is causative and/or of functional significance to
a degree which satisfies the inclusion criteria, then it would certainly be
included in HGMD.
Limitations
The main limitation with recording disease-associated polymorphic variants of
functional significance is the inclusion of only a single reference for each
sequence change in HGMD. A large proportion of the papers reporting an association
between a disease and a polymorphic variant do not include functional data on
that variant. This problem will in the future be addressed by the introduction
of multiple referencing.
Copyright © 2009 HGMD®