HGMD

Human Gene Mutation Database


If you refer to HGMD in any publication, please cite Stenson et al. (2014), The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet 133:1-9.

Articles describing the HGMD facility have also appeared in Trends Genet (1997) 13:121-122, Nucleic Acids Res (1998) 26:285-287, Hum Mutat (2000) 15:45-51, Hum Mutat (2003) 21:577-581, and Genome Med (2009) 1:13. A list of recent journal articles that use HGMD data and were published by HGMD authors can be found here.


Rationale

Human gene mutation is a highly specific process, and this specificity has important implications for the nature, prevalence and therefore diagnosis of genetic disease. Indeed, the recognition that certain DNA sequences are hypermutable has yielded clues as to the endogenous mutational mechanisms involved and provided insights into the intricacies of the processes of DNA replication and repair (Cooper and Krawczak 1993). In practical terms, a fuller understanding of the mutational process may prove important in molecular diagnostic medicine by contributing to improvements in the design and efficacy of mutation search procedures and strategies in different genetic disorders.

The Human Gene Mutation Database (HGMD) represents an attempt to collate known (published) gene lesions responsible for human inherited disease. This database, whilst originally established for the study of mutational mechanisms in human genes (Cooper and Krawczak 1993), has now acquired a much broader utility in that it embodies an up-to-date and comprehensive reference source to the spectrum of inherited human gene lesions. Thus, HGMD provides information of practical diagnostic importance to (i) researchers and diagnosticians in human molecular genetics, (ii) physicians interested in a particular inherited condition in a given patient or family, and (iii) genetic counsellors.

Data coverage

The Human Gene Mutation Database includes the first report of every mutation causing or associated with human inherited disease, plus disease-associated/functional polymorphisms reported in the literature. HGMD may also include additional reports for certain mutations if these reports serve to enhance the original entry (e.g. functional studies).

These data comprise various types of mutation within the coding regions, splicing and regulatory regions of human nuclear genes. Somatic mutations and mutations in the mitochondrial genome are thus not included, although in the latter case, links to Mitomap are now provided. Each mutation is entered only once in order to avoid confusion between recurrent and identical-by-descent lesions. Mutations inferred from amino acid sequencing have been excluded since, in the absence of direct DNA analysis, some ambiguity may exist as to the DNA sequence changes involved. Silent mutations within the coding region, which do not alter the encoded amino acid, are not recorded unless they are known to adversely affect mRNA splicing or gene expression, or have been reported in significant association with disease, in which case they may be included.

HGMD does not usually include mutations lacking obvious phenotypic consequences although a few such variants have been included where they could conceivably have some clinical effect (e.g. albumins, butyrylcholinesterases). Many published mutation searches identify more than one genetic change in a single patient. In such cases, the relationship between a given lesion and the clinical phenotype has not always been immediately clear, and the curators of HGMD have had to rely exclusively upon the judgements of authors, peer reviewers and journal editors. The possibility of unintentional inclusion of some lesions with little or no pathological significance can therefore not be ruled out.

In March 1999, HGMD began to include disease-associated polymorphisms. These are extracted from the same journals that are scanned for mutations (>250). To be included, there must be a convincing association of the polymorphism with the phenotype. These polymorphisms are currently identified in the database by an addition to the phenotypic description. These additions are limited to "association", "association with" and "increased" or "lower" "risk", depending on how the polymorphism was reported. Question marks are sometimes included to indicate that the association is a tenuous one. Some of the polymorphisms are included as "variants" only. This will occur for any polymorphism reported as possibly clinically significant, but without an associated clinical phenotype. For a more complete explanation of how we choose to include such polymorphisms, please read our polymorphism inclusion criteria.
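The labelling convention described above can be sketched in a few lines of Python. This is only an illustration: the function name, the returned dictionary and the example phenotype strings are hypothetical, and the exact wording and punctuation of real HGMD phenotype descriptions are assumed rather than known.

```python
def classify_annotation(phenotype: str) -> dict:
    """Classify a phenotype description by the polymorphism label it carries.

    Assumes the labels appear literally in the description, as the text
    above suggests; real entries may be worded differently.
    """
    text = phenotype.lower()
    if "increased risk" in text:
        kind = "increased risk"
    elif "lower risk" in text:
        kind = "lower risk"
    elif "association" in text:  # also covers "association with"
        kind = "association"
    else:
        kind = None  # no polymorphism label found
    # A question mark flags an association regarded as tenuous.
    return {"kind": kind, "tenuous": "?" in phenotype}


# Hypothetical example descriptions, not taken from HGMD.
print(classify_annotation("Hypertension, association with ?"))
print(classify_annotation("Myocardial infarction, increased risk"))
```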

HGMD also includes some mutation data from those locus-specific mutation databases (LSDBs) that are in the public domain+. Data from password-protected databases are therefore not included. Data obtained from publicly available LSDBs are placed in the freely available public version of HGMD immediately after inclusion.
+den Dunnen et al. (2009) Sharing data between LSDBs and central repositories. Hum Mutat 30(4):493-495.

Users interested in any of the categories of lesion which have not yet been included are referred to Online Mendelian Inheritance in Man and Mitomap.

Pathological authenticity

Pathological mutations that dramatically disrupt the structure of a given gene are self-evidently very likely to be responsible for the associated clinical phenotype. However, for other categories of lesion, pathological mutations are often difficult to distinguish from polymorphisms with little or no clinical significance, particularly if their structural or functional consequences are subtle (Cotton and Scriver, 1998)*. Support for their authenticity in a pathological context therefore usually comes from one or more independent lines of evidence.

Despite the best efforts of the HGMD curators, it may be assumed that some categories of gene lesion listed in HGMD (e.g. missense mutations, regulatory mutations, splicing [intronic] mutations) are likely to include some mutations that are not actually causative even though they have been reported as being so. Users of HGMD should be aware that, at present, we have no wholly objective means of knowing how accurate each mutation category actually is in terms of the pathological authenticity of the lesions listed therein. Recent studies have, however, provided evidence that the majority of rare (missense) alleles are likely to be deleterious (Kryukov et al., 2007)+.

*Cotton RG, Scriver CR. (1998) Proof of "disease causing" mutation. Hum Mutat 12:1-3.
+Kryukov GV, Pennacchio LA, Sunyaev SR. (2007) Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. Am J Hum Genet 80(4):727-39.

Web-Page Layout

The HGMD web interface is designed to be as easy to use as possible. Each gene has been assigned its own "home page", where all other links are located. Each home page contains a table detailing the type and number of mutations logged in HGMD, with hypertext links to the mutation pages. A second table contains details on entries in HGMD categorised by phenotype, with links to Online Mendelian Inheritance in Man (OMIM) entries matching these phenotypes.

Meaningful integration of the data with phenotypic, structural and mapping information on human genes has been accomplished through bi-directional links between HGMD and both the Genome DataBase (GDB) and Online Mendelian Inheritance in Man (OMIM), Baltimore, USA. These links can be found at the bottom of each page. We have recently also added links to the Nomenclature Database, Genatlas and Geneclinics, where corresponding pages are available. In addition, hypertext links have been established from HGMD references to Medline abstracts through Entrez, on each mutation page. Links are also available to other facilities provided by HGMD, such as mutation maps and cDNA sequence information.

Links to splice junction data are also available for certain genes. These sequences consist of approximately 25 bp of exonic sequence (uppercase) along with up to 25 bp of intronic sequence (lowercase). The amount of intronic sequence varies owing to the currently limited availability of such sequences in the literature and GenBank. The initiating methionine and termination signals are marked in each sequence by square brackets, e.g. [ATG], [TAA]. Square brackets are also used to define split codons where they occur at the splice junction. Round brackets ( ) are used when a series of identical bases is encountered in the intronic sequence. Hence, t(6) would indicate a run of 6 t's at that position.
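As a rough illustration of the splice junction notation, the following Python sketch expands the run shorthand and separates exonic from intronic bases by case. The function names and the example string are hypothetical, and the code assumes that only the characters described above occur in a sequence; it is not HGMD's own software.

```python
import re

# A lowercase base followed by a count in round brackets, e.g. "t(6)".
RUN = re.compile(r"([acgtn])\((\d+)\)")


def expand_runs(seq: str) -> str:
    """Expand run shorthand such as 't(6)' into 'tttttt'."""
    return RUN.sub(lambda m: m.group(1) * int(m.group(2)), seq)


def split_junction(seq: str) -> dict:
    """Separate a splice junction string into its annotated parts."""
    expanded = expand_runs(seq)
    # Square brackets mark start/stop signals or split codons.
    bracketed = re.findall(r"\[([A-Za-z]+)\]", expanded)
    bases = re.sub(r"[\[\]]", "", expanded)
    return {
        "exonic": "".join(b for b in bases if b.isupper()),
        "intronic": "".join(b for b in bases if b.islower()),
        "bracketed": bracketed,
    }


# Hypothetical donor-site string, invented for illustration.
print(split_junction("CC[ATG]GCTAGgtaagt(6)c"))
```

The sketch collects bases by case only, so it works for donor sites (exon then intron) and acceptor sites (intron then exon) alike, but it does not record which of the two a sequence represents.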

Searching HGMD is done through our search facility, which is available at the top of every gene and mutation page. More detailed information regarding searching can be found on our Help page.

Database structure

All HGMD entries comprise a reference to the first literature report of a mutation, the associated disease state as specified in that report, the gene name, symbol (as recommended by the Genome DataBase Nomenclature Committee) and chromosomal location. In cases where a gene symbol has not yet been made available owing to the recency of the cloning report, a provisional symbol has been adopted which is denoted by lower-case letters.
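The fields listed above map naturally onto a simple record type. The Python sketch below is one possible representation; the class name, field names and placeholder values are hypothetical and do not reflect HGMD's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HgmdEntry:
    """One entry, holding the fields named in the description above."""
    reference: str             # first literature report of the mutation
    disease: str               # disease state as specified in that report
    gene_name: str
    symbol: str
    chromosomal_location: str

    @property
    def is_provisional(self) -> bool:
        # Provisional symbols are denoted by lower-case letters.
        return self.symbol.islower()


# Placeholder values for illustration only.
entry = HgmdEntry(
    reference="(placeholder citation)",
    disease="Cystic fibrosis",
    gene_name="cystic fibrosis transmembrane conductance regulator",
    symbol="CFTR",
    chromosomal_location="7q31.2",
)
print(entry.symbol, entry.is_provisional)
```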

Single base-pair substitutions in coding regions are presented in terms of a triplet change with an additional flanking base included if the mutated base lies in either the first or third position in the triplet. Substitutions causing regulatory abnormalities are logged with 30 nucleotides flanking the site of mutation on both sides; the location of the mutation relative to the transcriptional initiation site, initiator ATG or polyadenylation site is given. Mutations with consequences for mRNA splicing are presented in brief with information specifying the position of the lesion relative to a numbered intron donor or acceptor splice site. Positions given as positive integers refer to a 3' (downstream) location, negative integers refer to a 5' (upstream) location. Micro-deletions (of 20 bp or less) are presented in terms of the deleted bases in lower case plus, in upper case, 10 bp of DNA sequence flanking both sides of the lesion. The numbered codon is preceded in the given sequence by the caret character ("^"). In cases where any location parameter is listed as '?', either the location is unknown or a consistent nucleotide/codon numbering system is lacking. Where deletions extend beyond the coding region of the gene in question, other positional information is occasionally provided, e.g. 5' UTR (5' untranslated region) or E6I6 (denotes exon 6/intron 6 boundary). It should be noted that codon numbering may in some cases display inconsistencies owing to the common use of different numbering systems for the same protein. For some genes (where there is no risk of error or ambiguity), residue numbering has been standardized with respect to a generally accepted numbering system.
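The micro-deletion convention (deleted bases in lower case, flanking sequence in upper case, caret before the numbered codon) can be read mechanically. The Python sketch below shows one way to do so; the names and the example string are hypothetical, and the code assumes a single deleted block and at most one caret per entry.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Uppercase flank, lowercase deleted bases, uppercase flank.
PATTERN = re.compile(r"^([ACGT]*)([acgt]+)([ACGT]*)$")


@dataclass
class MicroDeletion:
    left_flank: str
    deleted: str
    right_flank: str
    caret_offset: Optional[int]  # start of the numbered codon, if marked


def parse_microdeletion(entry: str) -> MicroDeletion:
    """Split a micro-deletion string into flanks and deleted bases."""
    if entry.count("^") > 1:
        raise ValueError("expected at most one caret")
    caret = entry.find("^")
    plain = entry.replace("^", "")
    match = PATTERN.match(plain)
    if match is None:
        raise ValueError(f"unrecognised micro-deletion format: {entry!r}")
    left, deleted, right = match.groups()
    # The caret's index in the raw string equals the codon's start index
    # in the caret-free sequence, since nothing before it was removed.
    return MicroDeletion(left, deleted, right, caret if caret >= 0 else None)


def reference_and_mutant(d: MicroDeletion) -> Tuple[str, str]:
    """Return the wild-type and deleted sequences over the listed window."""
    reference = d.left_flank + d.deleted.upper() + d.right_flank
    mutant = d.left_flank + d.right_flank
    return reference, mutant


# Hypothetical entry: 10 bp flanks around a 3 bp deletion.
d = parse_microdeletion("GTCAAC^GTGGatgCTGGTGCACC")
print(d, reference_and_mutant(d))
```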

For gross deletions, gross insertions, repeat variations and complex rearrangements, information regarding the nature and location of a lesion is logged in narrative form because of the extremely variable quality of the original data reported.

Data collection

Data are collected weekly by a combination of manual and computerised search procedures. In excess of 250 journals are scanned for articles describing germline mutations causing human genetic disease. The required data are extracted from the original articles and augmented with the necessary supporting data. Data included are mainly from the original published reports, although some data have been taken from 'Mutation Updates' and review articles. Unpublished mutations and mutations reported only in abstract form are not included. Reports of such lesions can, however, be accessed for some genes via the Locus-Specific Mutation Databases. Please note that, in order to keep HGMD reliable, the curators have adopted a policy of excluding mutations which have not been adequately described in the corresponding report.

Errors of omission or commission

The database curators would be most grateful to be informed of any errors as this will help to keep the database as complete and accurate as possible. Corrections and/or additions can be sent either by using our comment form or by contacting the curators directly.

David N Cooper
Tel: (+44) (0)2920 744057
Fax: (+44) (0)2920 747603

Institute of Medical Genetics
Cardiff University
Heath Park
Cardiff CF14 4XN
United Kingdom

Future plans

HGMD is currently undergoing expansion and updating designed to make it a quick, easy, accessible and authoritative reference source for all inherited gene lesions in humans. The availability of such a resource will allow

  1. the rapid assessment of current diagnostic possibilities both for the molecular genetic analysis of a particular disease and for the direct detection of specific mutations at a given locus.

  2. determination of the nature and distribution of mutations within a given gene so as to optimize mutation screening procedures.

  3. the cross-checking of newly found mutations against known lesions as a check for their authenticity and to avoid redundancy in reporting.

  4. the eventual electronic publication of data. This will speed up the process of information flow as well as reduce the burden on human molecular genetics journals which currently publish a large number of mutation reports.

  5. further study of the mutational mechanisms responsible for the observed mutational spectrum.

Reference

Cooper DN, Krawczak M. (1993) Human Gene Mutation. BIOS Scientific Publishers, Oxford [Reprinted with revisions 1994, reprinted in paperback 1995].

Acknowledgements

We are most grateful to the following Locus-specific Database curators and specialists for allowing us to check their compilations to pinpoint errors in HGMD.

Meena Upadhyaya (Cardiff) NF1
Raymond Dalgleish (Leicester) COL1A1, COL1A2
Ian Day (London) LDLR
Chris Jones (Cardiff) XPA, XPC
David Lane (London) AT3
Charles Scriver (Montreal) PAH
Lap-Chee Tsui (Toronto) CFTR
Ted Tuddenham (London) F8
Peter Collins (Stockholm) CDK4
Hiromu Nakajima (Osaka) PFKM
Chester Whitley (Minneapolis) AR
Cynthia Bartels (Omaha) BCHE
Paul Bowden (Cardiff) KRT1, KRT5, KRT2A, KRT10, KRT14, KRT9, KRT17, KRT13, KRT16
Francesco Giannelli (London) F9
Pieter Reitsma (Amsterdam) PROC
Sophie Gandrille (Paris) PROS
Betty Hosler (Charlestown) SOD1
Andrew Hattersley (Exeter) GCK, TCF1, TCF14
Dagmar Fuehrer (Cardiff) TSHR
Michael Baser (Los Angeles) NF2

HGMD is very grateful to the following for their past financial support:
Celera, Macmillan, Pfizer, Sun Life, GDB, DFG, GFH, Springer, Bios, Research Genetics, SKB, Hybaid, Scotlab, Boehringer



Copyright 2007 HGMD