Human Gene Nomenclature
The naming of human genes is guided by the Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) (https://www.genenames.org/), which assigns a unique symbol and name to each human gene. Gene naming is complex and follows principles intended to make scientific communication clearer.
Key principles:
Uniqueness: Each human gene is given a unique name and symbol (an abbreviation) to avoid confusion. Symbols are often derived from the gene's function or its original name, but some are named after a condition associated with the gene.
Format: Gene symbols are brief, usually only a few characters long, and are written in italics. Human gene symbols use uppercase letters and Arabic numerals (for example, BRCA1); in some other species, such as mouse and rat, only the first letter is capitalized (Brca1). Gene names, on the other hand, are written out in full in non-italic font.
Descriptiveness: Whenever possible, gene names and symbols should reflect the gene's character or function. For example, the symbol BRCA1 comes from "BReast CAncer 1", reflecting the gene's role in breast cancer susceptibility.
Avoiding Misleading Terms: Naming should avoid or limit the use of potentially misleading or offensive terms. For example, gene names should not imply a disease causality or severity that may not be universally true.
Orthology Considerations: When a new gene is found to be closely related to a known gene from another species (an ortholog), it is often given a similar name to reflect this relationship.
Revisions: If necessary, gene names and symbols can be revised. However, this is done sparingly to avoid confusion in the literature.
The HGNC provides a searchable database of approved human gene names and symbols. For new genes, scientists can propose names to the HGNC, which are then reviewed and either approved or sent back for revisions to ensure they meet the established guidelines.
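To make the uniqueness and format principles concrete, here is a minimal Python sketch of the kind of pre-check a researcher might run on a proposed symbol before submitting it. It is not an HGNC tool: the function name `precheck_symbol`, the assumed pattern (uppercase letters and digits, optionally hyphenated, as in HLA-A) and the small set of approved symbols are illustrative assumptions, and the real review weighs many more guidelines than these two checks.

```python
import re

# Assumed, simplified format rule: an uppercase letter followed by uppercase
# letters or digits, optionally joined by hyphens (e.g. "HLA-A"). Some real
# approved symbols break this rule, so a failure is a prompt to check, not a
# rejection.
SYMBOL_PATTERN = re.compile(r"[A-Z][A-Z0-9]*(?:-[A-Z0-9]+)*")


def precheck_symbol(proposed: str, approved: set[str]) -> list[str]:
    """Return the problems found with a proposed human gene symbol (hypothetical helper)."""
    problems = []
    # Uniqueness: compare case-insensitively so "tp53" clashes with "TP53".
    if proposed.upper() in {symbol.upper() for symbol in approved}:
        problems.append("symbol is already in use")
    # Format: human symbols are written in uppercase letters and digits.
    if not SYMBOL_PATTERN.fullmatch(proposed):
        problems.append("expected uppercase letters and digits")
    return problems


approved = {"BRCA1", "TP53", "CFTR", "APOE", "HBB"}
print(precheck_symbol("TP53", approved))   # ['symbol is already in use']
print(precheck_symbol("Abc1", approved))   # ['expected uppercase letters and digits']
print(precheck_symbol("NEWG1", approved))  # []
```

An empty list only means the proposal passed these two simplified checks; the decision itself still rests with the HGNC's review.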
Examples of Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) approved gene names and symbols:
BRCA1 - The full name is "BRCA1 DNA repair associated." This gene is involved in repairing damaged DNA and thus plays a role in maintaining the stability of a cell's genetic information. Mutations in this gene can lead to an increased risk of breast and ovarian cancer.
TP53 - The full name is "tumor protein p53." This gene encodes a protein that regulates the cell cycle and thus functions as a tumor suppressor. Mutations in this gene are associated with a variety of human cancers.
CFTR - The full name is "CF transmembrane conductance regulator." This gene encodes a protein that acts as a chloride channel and helps produce sweat, digestive fluids, and mucus. Mutations in this gene can cause cystic fibrosis.
APOE - The full name is "apolipoprotein E." This gene is part of a group of genes that produce apolipoproteins, proteins that bind lipids (fats) to form lipoproteins. Variants of APOE, particularly the ε4 allele, are associated with an increased risk of late-onset Alzheimer's disease.
HBB - The full name is "hemoglobin subunit beta." This gene encodes beta-globin, a subunit of hemoglobin, the molecule in red blood cells that carries oxygen from the lungs to the rest of the body. Mutations in this gene can cause disorders such as sickle cell anemia and beta-thalassemia.
Note that in these examples each symbol is a shortened form of the full gene name and generally reflects the gene's function or the protein it encodes. However, there are many exceptions.
Toll-like receptors (TLRs) - Toll-like receptors are a class of proteins that play a key role in the innate immune system. They are single-pass, membrane-spanning, non-catalytic receptors, usually expressed in sentinel cells such as macrophages and dendritic cells, that recognize structurally conserved molecules derived from microbes. Once microbes have breached physical barriers such as the skin or the intestinal mucosa, they are recognized by TLRs, which activate immune cell responses. The TLR family includes TLR1 through TLR13, although TLR11, TLR12 and TLR13 are not found as functional genes in humans. The TLRs were named for their similarity to the protein encoded by the Toll gene, identified in Drosophila in 1985 by Christiane Nüsslein-Volhard and colleagues. The researchers were reportedly so surprised that they spontaneously exclaimed in German, "Das ist ja toll!", which translates as "That's great!"
Alternative gene naming conventions
There are several alternative gene naming conventions, although they are not as universally recognized or as strictly regulated as the Human Genome Organisation (HUGO) Gene Nomenclature Committee (HGNC) system. Some of these alternative naming systems are used for different species or for specific types of genes, while others are older systems that have been mostly replaced by the HGNC system. For example:
Mouse Genome Informatics (MGI): This resource maintains the nomenclature of mouse genes. It is similar to the HGNC system, but differs in capitalization: mouse gene symbols are italicized, with only the first letter capitalized (for example, Brca1 rather than BRCA1).
FlyBase: This is a database for the naming of Drosophila (fruit fly) genes. It has its own set of conventions, which include the use of descriptive names and symbols, similar to the HGNC system.
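Zebrafish Information Network (ZFIN): This database is used for naming zebrafish genes. It also follows a system similar to the HGNC's, with some differences in format; for example, zebrafish gene symbols are written entirely in lowercase italics (tp53 rather than TP53).
Saccharomyces Genome Database (SGD): Genes of the budding yeast Saccharomyces cerevisiae have their own naming conventions: a symbol typically consists of a three-letter abbreviation of a gene function or phenotype, followed by a number (for example, CDC28).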
Older Naming Systems: Before the establishment of the HGNC, genes were often named based on the disease they were associated with or the sequence in which they were discovered. Some of these older names are still in use, although they are gradually being replaced by the HGNC names.
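The species conventions above differ largely in letter case. A minimal Python sketch, assuming these conventions (human symbols all uppercase, mouse and rat symbols with only the first letter capitalized, zebrafish symbols all lowercase), can re-case a symbol for a given species. The helper `format_symbol` is hypothetical; italics are a typesetting matter that a plain string cannot show, and the sketch only changes case. It does not look up the real symbol of an ortholog, which can differ between species.

```python
# Assumed capitalization conventions (case only; italics are not representable
# in a plain string):
#   human      BRCA1  (all uppercase)
#   mouse/rat  Brca1  (first letter uppercase, rest lowercase)
#   zebrafish  tp53   (all lowercase)
CASE_RULES = {
    "human": str.upper,
    "mouse": str.capitalize,
    "rat": str.capitalize,
    "zebrafish": str.lower,
}


def format_symbol(symbol: str, species: str) -> str:
    """Re-case a gene symbol for the given species (hypothetical helper)."""
    try:
        rule = CASE_RULES[species]
    except KeyError:
        raise ValueError(f"no capitalization rule for species {species!r}") from None
    return rule(symbol)


for species in ("human", "mouse", "zebrafish"):
    print(species, format_symbol("TP53", species))
```

Keeping the rules in a dictionary makes it easy to add another species' convention without touching the function.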
Naming genes after the traits they affected, the diseases they were associated with, or their order of discovery often produced several names for the same gene or set of genes, which could lead to confusion. For example:
Blood Group Systems: Before gene names were standardized, genes that governed blood types were named after the blood group system they were associated with. For example, the ABO blood group system is governed by the ABO gene, and the Rh blood group system by the RHD and RHCE genes. These long-established names are still widely used and have been retained as the HGNC-approved symbols.
Complement System: The components of the complement system, part of the immune system, were originally named in the order in which they were discovered. For example, C1q, a subunit of C1 (the first component of the classical complement pathway), is encoded by the C1QA, C1QB and C1QC genes. Because the numbering follows the history of discovery, it does not match the order in which the components act: in the classical pathway, C4 is activated before C2.
Cytochrome P450 genes encode enzymes that are involved in the metabolism of a wide variety of molecules including drugs, carcinogens, and endogenous substrates such as hormones. The naming of cytochrome P450 genes follows a specific system that allows for the organization of these genes into families and subfamilies based on their sequence similarity.
Families: Cytochrome P450 genes are grouped into families, designated by a number after the prefix "CYP". Members of the same family generally share more than 40% amino acid sequence identity.
Subfamilies: Within families, genes are further grouped into subfamilies, designated by a letter. Members of the same subfamily generally share more than 55% amino acid sequence identity.
Individual Genes: Within a subfamily, individual genes are designated by a second number.
For example, the gene CYP2D6 is in the CYP (cytochrome P450) family 2, subfamily D, and is gene number 6 within that subfamily.
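Because CYP symbols follow a fixed pattern of prefix, family number, subfamily letter and gene number, they can be split apart mechanically. The Python sketch below assumes the symbol is written in the human, all-uppercase form with a single run of subfamily letters; `parse_cyp` and `CypSymbol` are illustrative names, and variants such as pseudogene suffixes or lowercase mouse symbols are deliberately not handled.

```python
import re
from typing import NamedTuple


class CypSymbol(NamedTuple):
    family: int     # number after the "CYP" prefix
    subfamily: str  # letter(s) after the family number
    gene: int       # number of the gene within the subfamily


# Assumed form: "CYP" + digits + uppercase letters + digits, e.g. "CYP2D6".
CYP_PATTERN = re.compile(r"CYP(\d+)([A-Z]+)(\d+)")


def parse_cyp(symbol: str) -> CypSymbol:
    """Split a human CYP gene symbol into family, subfamily and gene number."""
    match = CYP_PATTERN.fullmatch(symbol)
    if match is None:
        raise ValueError(f"not a recognized CYP symbol: {symbol!r}")
    family, subfamily, gene = match.groups()
    return CypSymbol(int(family), subfamily, int(gene))


print(parse_cyp("CYP2D6"))   # CypSymbol(family=2, subfamily='D', gene=6)
print(parse_cyp("CYP11B1"))  # CypSymbol(family=11, subfamily='B', gene=1)
```

Two genes belong to the same family when their `family` fields match; CYP2D6 and CYP2C9, for instance, are both in family 2 but in different subfamilies.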
This system organizes the cytochrome P450 genes in a way that reflects their evolutionary relationships and functional similarities. Although the scheme was developed outside the HGNC, it is used for the approved symbols of human CYP genes, such as CYP2D6, and it is widely used in scientific research because it classifies these genes so usefully.
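The identity cut-offs for CYP families (more than 40%) and subfamilies (more than 55%) can be read as a rule of thumb for where one P450 sequence sits relative to another. The Python sketch below encodes only those two thresholds; `cyp_relationship` is a hypothetical name, and since the cut-offs are best treated as guidelines rather than hard rules, borderline cases should not be decided by this single number.

```python
FAMILY_THRESHOLD = 40.0     # more than 40% amino acid identity: same family
SUBFAMILY_THRESHOLD = 55.0  # more than 55% amino acid identity: same subfamily


def cyp_relationship(percent_identity: float) -> str:
    """Classify two P450 sequences by pairwise amino acid identity (hypothetical helper)."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent identity must be between 0 and 100")
    # Check the stricter subfamily threshold first.
    if percent_identity > SUBFAMILY_THRESHOLD:
        return "same subfamily"
    if percent_identity > FAMILY_THRESHOLD:
        return "same family only"
    return "different families"


for identity in (35.0, 48.0, 72.0):
    print(identity, cyp_relationship(identity))
```

Both comparisons are strict, matching the "more than" wording, so exactly 55% counts as the same family but not necessarily the same subfamily.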
Comparison of genes with other organisms
The HGNC Comparison of Orthology Predictions (HCOP) search is a tool that integrates and displays the orthology predictions for a specified human gene or set of human genes. HCOP was originally designed to show orthology predictions between human and mouse, but has been expanded to include data from chimpanzee, macaque, rat, dog, horse, cattle, pig, opossum, platypus, chicken, anole lizard, Xenopus, zebrafish, C. elegans, Drosophila, S. cerevisiae and S. pombe; including human, 19 genomes are currently available for comparison in HCOP.
The Vertebrate Gene Nomenclature Committee (VGNC) is an extension of the established HGNC project that names human genes. The VGNC assigns standardized names to genes in vertebrate species that currently lack a nomenclature committee, and it coordinates with the five existing vertebrate nomenclature committees, MGNC (mouse), RGNC (rat), CGNC (chicken), XNC (Xenopus frog) and ZNC (zebrafish), to ensure that genes are named in line with their human homologs.
The VGNC uses a software pipeline based on the HGNC Comparison of Orthology Predictions (HCOP) tool to transfer human nomenclature to other species automatically for genes where the same orthologs are predicted by four different resources (Ensembl, NCBI Gene, OMA and PANTHER). For genes without such a consensus ortholog prediction, naming is based on a combination of a semi-automated pipeline and manual curation, taking into account synteny, phylogeny and other data. The VGNC also relies on specialist advisors for naming within complex families. Two key complex families are the olfactory receptors (ORs), the largest gene family in mammalian genomes, and the cytochrome P450s (CYPs); the VGNC collaborates with Drs David Nelson and Jed Goldstone on the CYPs and with Drs Doron Lancet and Tsviya Olender on the ORs.
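The automatic step described above transfers a human symbol only when all four resources agree. A minimal Python sketch of that agreement test follows. It assumes each resource's prediction has already been reduced to the human gene symbol it pairs with one vertebrate gene (real resources use their own gene identifiers), and `consensus_symbol` is a hypothetical name; the sketch does not reproduce the actual VGNC pipeline.

```python
from typing import Optional

# The four resources named in the text.
RESOURCES = ("Ensembl", "NCBI Gene", "OMA", "PANTHER")


def consensus_symbol(predictions: dict[str, Optional[str]]) -> Optional[str]:
    """Return the human symbol to transfer, or None if manual curation is needed.

    `predictions` maps a resource name to the human gene symbol that resource
    predicts as the ortholog of one vertebrate gene (None = no prediction).
    """
    predicted = {predictions.get(resource) for resource in RESOURCES}
    # Transfer automatically only if every resource made the same prediction.
    if len(predicted) == 1 and None not in predicted:
        return next(iter(predicted))
    return None


agree = {"Ensembl": "BRCA1", "NCBI Gene": "BRCA1", "OMA": "BRCA1", "PANTHER": "BRCA1"}
split = {**agree, "PANTHER": "BRCA2"}
print(consensus_symbol(agree))  # BRCA1
print(consensus_symbol(split))  # None
```

A `None` result corresponds to the genes that, according to the VGNC's description, go to the semi-automated pipeline and manual curation instead.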
The VGNC began by naming chimpanzee genes, expanded to horse, cattle and dog, and then added cat and macaque. It also has gene nomenclature data for specific gene families in several other species; the "Species List" page on the VGNC website gives the complete list of species covered.
The VGNC naming process will be extended to other species in due course. The criteria for choosing further vertebrate species are the quality of the genome assembly and annotation, the perceived value as a research organism, and the level of support from the scientific community. Suggestions for the next species to name can be sent to the VGNC at vgnc@genenames.org.
The HGNC guidelines for human gene naming have recently been updated and now also include a section on naming across vertebrates.