Human Gene Mutation Database
If you refer to HGMD in any publication, please cite Stenson et al. (2017), The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Hum Genet epub. doi: 10.1007/s00439-017-1779-6 [PubMed]
Articles describing the HGMD facility have also appeared in Trends Genet (1997) 13:121-122, Nucleic Acids Res (1998) 26:285-287, Hum Mutat (2000) 15: 45-51, Hum Mutat (2003) 21:577-581, Genome Med (2009) 1:13, and Hum Genet (2014) 133:1-9. A list of recent journal articles written utilising HGMD data and published by authors within HGMD can be found here.
Human gene mutation is a highly specific process, and this specificity has important implications for the nature, prevalence and therefore diagnosis of genetic disease. Indeed, the recognition that certain DNA sequences are hypermutable has yielded clues as to the endogenous mutational mechanisms involved and provided insights into the intricacies of the processes of DNA replication and repair (Cooper and Krawczak 1993). In practical terms, a fuller understanding of the mutational process may prove important in molecular diagnostic medicine by contributing to improvements in the design and efficacy of mutation search procedures and strategies in different genetic disorders.
The Human Gene Mutation Database (HGMD) represents an attempt to collate known (published) gene lesions responsible for human inherited disease. This database, whilst originally established for the study of mutational mechanisms in human genes (Cooper and Krawczak 1993), has now acquired a much broader utility in that it embodies an up-to-date and comprehensive reference source to the spectrum of inherited human gene lesions. Thus, HGMD provides information of practical diagnostic importance to (i) researchers and diagnosticians in human molecular genetics, (ii) physicians interested in a particular inherited condition in a given patient or family, and (iii) genetic counsellors.
The Human Gene Mutation Database includes the first example of all mutations causing or associated with human inherited disease, plus disease-associated/functional polymorphisms reported in the literature. HGMD may also include additional reports for certain mutations if these reports serve to enhance the original entry (e.g. functional studies).
These data comprise various types of mutation within the coding regions, splicing and regulatory regions of human nuclear genes. Somatic mutations and mutations in the mitochondrial genome are thus not included, although in the latter case, links to Mitomap are now provided. Each mutation is entered only once in order to avoid confusion between recurrent and identical-by-descent lesions. Mutations inferred from amino acid sequencing have been excluded since, in the absence of direct DNA analysis, some ambiguity may exist as to the DNA sequence changes involved. Silent mutations within the coding region which do not alter the encoded amino acid are also not recorded. If such mutations are known to adversely affect mRNA splicing or gene expression, or have been reported in significant association with disease, they may be included.
HGMD does not usually include mutations lacking obvious phenotypic consequences although a few such variants have been included where they could conceivably have some clinical effect (e.g. albumins, butyrylcholinesterases). Many published mutation searches identify more than one genetic change in a single patient. In such cases, the relationship between a given lesion and the clinical phenotype has not always been immediately clear, and the curators of HGMD have had to rely exclusively upon the judgements of authors, peer reviewers and journal editors. The possibility of unintentional inclusion of some lesions with little or no pathological significance can therefore not be ruled out.
In March 1999, HGMD began to include disease-associated polymorphisms. These are extracted from the same journals that are scanned for mutations (>250). To be included, there must be a convincing association of the polymorphism with the phenotype. These polymorphisms are currently identified in the database by an addition to the phenotypic description. These additions are limited to "association", "association with" and "increased" or "lower" "risk", depending on how the polymorphism was reported. Question marks are sometimes included to indicate that the association is a tenuous one. Some of the polymorphisms are included as "variants" only. This will occur for any polymorphism reported as possibly clinically significant, but without an associated clinical phenotype. For a more complete explanation of how we choose to include such polymorphisms, please read our polymorphism inclusion criteria.
HGMD also includes some mutation data from those locus-specific mutation databases (LSDBs) that are in the public domain+. Data from password-protected databases are therefore not included. Data obtained from publicly available LSDBs are placed in the freely available public version of HGMD immediately after inclusion.
+den Dunnen et al. (2009) Sharing data between LSDBs and central repositories. Hum Mutat 30(4):493-495.
Users interested in any of the categories of lesion which have not yet been included are referred to Online Mendelian Inheritance in Man and Mitomap.
Pathological mutations that dramatically disrupt the structure of a given gene are self-evidently very likely to be responsible for the associated clinical phenotype. However, for other categories of lesion, pathological mutations are often difficult to distinguish from polymorphisms with little or no clinical significance, particularly if their structural or functional consequences are subtle (Cotton and Scriver, 1998)*. Evidence for their authenticity in a pathological context therefore usually comes from one or more different lines of evidence:
Despite the best efforts of the HGMD curators, it may be assumed that some categories of gene lesion listed in HGMD (e.g. missense mutations, regulatory mutations, splicing [intronic] mutations) are likely to include some mutations that are not actually causative even though they have been reported as being so. Users of HGMD should be aware that, at present, we have no wholly objective means of knowing how accurate each mutation category actually is in terms of the pathological authenticity of the lesions listed therein. Recent studies have, however, provided evidence that the majority of rare (missense) alleles are likely to be deleterious (Kryukov et al., 2007)+.
*Cotton RG, Scriver CR. (1998) Proof of "disease causing" mutation. Hum Mutat 12: 1-3. PubMed
+Kryukov GV, Pennacchio LA, Sunyaev SR. (2007) Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet 80(4):727-39. PubMed
The HGMD web interface is designed to be as easy to use as possible. Each gene has been assigned its own "home page", where all other links are located. Each home page contains a table detailing the type and number of mutations logged in HGMD, with hypertext links to the mutation pages. The second table contains details on entries in HGMD categorised by phenotype, with links to Online Mendelian Inheritance in Man (OMIM) entries matching these phenotypes.
Meaningful integration of the data with phenotypic, structural and mapping information on human genes has been accomplished through bi-directional links between HGMD and both the Genome DataBase (GDB) and Online Mendelian Inheritance in Man (OMIM), Baltimore, USA. These links can be found at the bottom of each page. We have recently also added links to the Nomenclature Database, Genatlas and Geneclinics, where corresponding pages are available. In addition, hypertext links have been established from HGMD references to Medline abstracts through Entrez, on each mutation page. Links are also available to other facilities provided by HGMD, such as mutation maps and cDNA sequence information.
Links to splice junction data are also available for certain genes. These sequences consist of approximately 25 bp of exonic sequence (uppercase) along with up to 25 bp of intronic sequence (lowercase). The amount of intronic sequence will vary due to the currently limited availability of such sequences in the literature and GenBank. The initiating methionine and termination signals are marked in each sequence by square brackets, e.g. [ATG], [TAA]. Square brackets are also used to define split codons where they occur at the splice junction. Round brackets ( ) are used when a series of identical bases is encountered in the intronic sequence. Hence, t(6) would indicate a run of 6 t's at that position.
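The splice junction notation described above can be illustrated with a minimal Python sketch. It is not part of HGMD itself: the function names expand_runs and split_exon_intron, and the example sequence, are hypothetical and were invented purely for illustration. The sketch assumes that runs are written as a single lowercase base followed by a count in round brackets, and it simply ignores square brackets when separating exonic from intronic bases.

```python
import re
from typing import Tuple

# A run such as t(6): one lowercase base followed by a repeat count.
# (Assumed form of the notation; only the t(6) example is given in the text.)
RUN = re.compile(r"([acgtn])\((\d+)\)")


def expand_runs(seq: str) -> str:
    """Expand run notation, e.g. 't(6)' becomes 'tttttt'."""
    return RUN.sub(lambda m: m.group(1) * int(m.group(2)), seq)


def split_exon_intron(seq: str) -> Tuple[str, str]:
    """Return (exonic, intronic) bases from a splice junction string.

    Uppercase letters are treated as exonic and lowercase letters as
    intronic. Square brackets (start/stop signals, split codons) are dropped.
    """
    expanded = expand_runs(seq)
    exonic = "".join(c for c in expanded if c.isupper())
    intronic = "".join(c for c in expanded if c.islower())
    return exonic, intronic


# Hypothetical junction: exonic bases, then a donor site containing a run of 6 t's.
example = "CC[ATG]GAGgtaagt(6)cc"
print(expand_runs(example))        # CC[ATG]GAGgtaagttttttcc
print(split_exon_intron(example))  # ('CCATGGAG', 'gtaagttttttcc')
```

Expanding runs before splitting keeps the two steps independent, so the expanded string can still be inspected with its square-bracket markers intact.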
Searching HGMD is done through our search facility, which is available at the top of every gene and mutation page. More detailed information regarding searching can be found on our Help page.
All HGMD entries comprise a reference to the first literature report of a mutation, the associated disease state as specified in that report, the gene name, symbol (as recommended by the Genome DataBase Nomenclature Committee) and chromosomal location. In cases where a gene symbol has not yet been made available owing to the recency of the cloning report, a provisional symbol has been adopted which is denoted by lower-case letters.
Single base-pair substitutions in coding regions are presented in terms of a triplet change with an additional flanking base included if the mutated base lies in either the first or third position in the triplet. Substitutions causing regulatory abnormalities are logged in with thirty nucleotides flanking the site of mutation on both sides; the location of the mutation relative to the transcriptional initiation site, initiator ATG or polyadenylation site is given. Mutations with consequences for mRNA splicing are presented in brief with information specifying the relative position of the lesion with respect to a numbered intron donor or acceptor splice site. Positions given as positive integers refer to a 3' (downstream) location, negative integers refer to a 5' (upstream) location. Micro-deletions (of 20 bp or less) are presented in terms of the deleted bases in lower case plus, in upper case, 10 bp DNA sequence flanking both sides of the lesion. The numbered codon is preceded in the given sequence by the caret character ("^"). In cases where any location parameter is listed as '?', either the location is unknown or a consistent nucleotide/codon numbering system is lacking. Where deletions extend outwith the coding region of the gene in question, other positional information is occasionally provided e.g. 5' UTR (5' untranslated region) or E6I6 (denotes exon 6/intron 6 boundary). It should be noted that codon numbering may in some cases display inconsistencies owing to the common use of different numbering systems for the same protein. For some genes (where there is no risk of error or ambiguity), residue numbering has been standardized with respect to a generally accepted numbering system.
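As a rough illustration of the micro-deletion layout described above (deleted bases in lower case, flanking sequence in upper case, and a caret preceding the numbered codon), the following Python sketch parses such a string. The names MicroDeletion, parse_microdeletion, wild_type and mutant are hypothetical, as is the example entry; the sketch assumes at most one caret and a single contiguous lowercase block, and does not reflect any actual HGMD software.

```python
import re
from typing import NamedTuple, Optional


class MicroDeletion(NamedTuple):
    upstream: str                # uppercase 5' flanking sequence
    deleted: str                 # deleted bases, kept in lower case
    downstream: str              # uppercase 3' flanking sequence
    caret_offset: Optional[int]  # index (caret removed) of the base after '^'


# Uppercase flank, lowercase deleted block, uppercase flank.
LAYOUT = re.compile(r"^([ACGT]*)([acgt]+)([ACGT]*)$")


def parse_microdeletion(entry: str) -> MicroDeletion:
    """Split a micro-deletion string into flanks and deleted bases."""
    if entry.count("^") > 1:
        raise ValueError("expected at most one caret")
    caret = entry.find("^")
    bare = entry.replace("^", "")
    match = LAYOUT.match(bare)
    if match is None:
        raise ValueError("not in UPPER-lower-UPPER layout: %r" % entry)
    offset = caret if caret >= 0 else None
    return MicroDeletion(match.group(1), match.group(2), match.group(3), offset)


def wild_type(d: MicroDeletion) -> str:
    """Reference sequence, with the deleted bases restored in upper case."""
    return d.upstream + d.deleted.upper() + d.downstream


def mutant(d: MicroDeletion) -> str:
    """Sequence after the deletion: the two flanks joined directly."""
    return d.upstream + d.downstream


# Hypothetical entry: 10 bp flanks, a 4 bp deletion, caret before the numbered codon.
d = parse_microdeletion("ATGGAGC^CTGgaagAAGCTGACCT")
print(d.deleted, d.caret_offset)  # gaag 7
print(wild_type(d))               # ATGGAGCCTGGAAGAAGCTGACCT
print(mutant(d))                  # ATGGAGCCTGAAGCTGACCT
```

Storing the caret as an offset into the caret-free sequence lets the numbered codon be located regardless of whether it falls in a flank or within the deleted bases.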
For gross deletions, gross insertions, repeat variations and complex rearrangements, information regarding the nature and location of a lesion is logged in narrative form because of the extremely variable quality of the original data reported.
Data are collected weekly by a combination of manual and computerised search procedures. In excess of 250 journals are scanned for articles describing germline mutations causing human genetic disease. The required data are extracted from the original articles and augmented with the necessary supporting data. Data included are mainly from the original published reports, although some data have been taken from 'Mutation Updates' and review articles. Unpublished mutations and mutations reported only in abstract form are not included. Reports of such lesions can, however, be accessed for some genes via the Locus-Specific Mutation Databases. Please note that, in order to keep HGMD reliable, the curators have adopted a policy of excluding mutations which have not been adequately described in the corresponding report.
The database curators would be most grateful to be informed of any errors as this will help to keep the database as complete and accurate as possible. Corrections and/or additions can be sent either by using our comment form or by contacting the curators directly.
David N Cooper
Tel : (+44) (0)2920 744057
Fax : (+44) (0)2920 747603
Institute of Medical Genetics
Cardiff CF14 4XN
HGMD is currently in the process of expansion and up-dating designed to make it a quick, easy, accessible and authoritative reference source to all inherited gene lesions in human. The availability of such a resource will allow
Cooper DN, Krawczak M. (1993) Human Gene Mutation. BIOS Scientific Publishers, Oxford [Reprinted with revisions 1994, reprinted in paperback 1995].
We are most grateful to the following locus-specific database curators and specialists for allowing us to check their compilations to pinpoint errors in HGMD.
|Raymond Dalgleish||(Leicester)||COL1A1, COL1A2|
|Chris Jones||(Cardiff)||XPA, XPC|
|Paul Bowden||(Cardiff)||KRT1, KRT5, KRT2A, KRT10, KRT14, KRT9, KRT17, KRT13, KRT16|
|Andrew Hattersley||(Exeter)||GCK, TCF1, TCF14|
|Michael Baser||(Los Angeles)||NF2|
HGMD is very grateful to the following for their past financial support :
Copyright © 2007 HGMD®