Gene Death in the human lineage

This page presents details of kinases and other genes for which genomic sequence analysis indicates that the human gene is a pseudogene with functional orthologs in mouse and rat. These genes are neither processed pseudogenes nor duplications of functional genes. Rather, they represent loss of function of previously vital genes.

These data highlight a number of genes that were omitted from the 2004 'final report' of the public human genome sequencing consortium (1), which noted 37 such human pseudogenes. Criteria used for that assessment included having Ensembl predicted peptides in both rodents, functional sequences in both rat and mouse, and syntenic location of the gene in all three species, as noted by flanking gene sequences.
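The synteny criterion can be illustrated with a minimal sketch. It assumes that gene order along the chromosome is available for each species as a simple list, and that ortholog names have already been harmonized across species. The function names, the window size, and the threshold are illustrative assumptions, not the procedure used in the genome paper.

```python
def flanking(gene_order, gene, window=2):
    """Return the set of genes within `window` positions on either side of `gene`."""
    i = gene_order.index(gene)
    left = gene_order[max(0, i - window):i]
    right = gene_order[i + 1:i + 1 + window]
    return set(left) | set(right)


def shares_synteny(orders, names, window=2, min_shared=2):
    """Check that at least `min_shared` flanking genes are common to all species.

    orders: species -> list of gene names in chromosomal order
    names:  species -> name of the candidate gene or pseudogene in that species
    """
    neighbor_sets = [flanking(orders[sp], names[sp], window) for sp in orders]
    shared = set.intersection(*neighbor_sets)
    return len(shared) >= min_shared


# Hypothetical gene orders; "X" is the rodent gene and "Xps" the human pseudogene.
orders = {
    "human": ["A", "B", "Xps", "C", "D"],
    "mouse": ["A", "B", "X", "C", "D"],
    "rat": ["B", "X", "C", "E"],
}
names = {"human": "Xps", "mouse": "X", "rat": "X"}
print(shares_synteny(orders, names))
```

In this toy case the flanking genes B and C are shared by all three species, so the locus would count as syntenic under the assumed threshold.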

Recently decayed human kinase pseudogenes

By comparing the mouse and human genomes (see the mouse kinome paper), we detected 4 functional mouse kinases whose human orthologs have stops or frameshifts within their kinase domain sequence. Orthology was predicted based on location in syntenic regions and on much greater sequence similarity between orthologs than between paralogs.
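As a rough illustration of what a stop within a coding sequence means, the sketch below translates a nucleotide sequence in frame and reports any stop codon that occurs before the final codon. It assumes the input is already an in-frame coding sequence using the standard genetic code. The function name and the example sequences are hypothetical and are not taken from the kinase analysis.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def premature_stops(cds):
    """Return 0-based codon indices of in-frame stops before the final codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return [i for i, codon in enumerate(codons[:-1]) if CODON_TABLE.get(codon) == "*"]


# Toy sequences (hypothetical): the second carries a TGA stop at codon index 2.
intact = "ATGAAAGGGTAA"
disabled = "ATGAAATGAGGGTAA"
print(premature_stops(intact), premature_stops(disabled))
```

A real analysis would work from a genomic alignment to the functional ortholog, since a pseudogene has no reliable reading frame of its own.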

The first of these genes is PLK5, one of 5 homologs of the Drosophila cell cycle gene polo, with predicted functions in the cell cycle. The human homolog (SgK384ps) has a single stop within the coding region, indicating that it was functional until recently. The chimp homolog is also a pseudogene, though the disablements are at different locations, indicating that the gene may have been intact, even if not required, in their common ancestor.

TSSK5 is also one of a family of five kinases, in this case named after a Testis-Specific Serine/threonine Kinase. The family is mostly expressed during spermiogenesis and is not well understood functionally. The human gene (aka TSSKps1) has two frameshifts. The chimp ortholog also has two frameshifts, one of which is in the same location as in human, though oddly the chimp sequence has a 1nt insertion relative to mouse where the human has a 1nt deletion.
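The comparison of frameshift positions between species can be sketched as follows. The code assumes a gapped pairwise nucleotide alignment of a functional reference (such as mouse) against a query (such as human or chimp), with "-" marking gaps, and reports indels whose length is not a multiple of three. The function name and the toy alignments are hypothetical and are not the actual TSSK5 sequences.

```python
import re


def frameshift_indels(ref_aln, query_aln):
    """List indels in the query, relative to the reference, that shift the frame.

    Returns (kind, ref_position, length) tuples, where ref_position is the
    number of reference bases that precede the indel.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    # A gap run in the reference row is an insertion in the query;
    # a gap run in the query row is a deletion in the query.
    for kind, row in (("insertion", ref_aln), ("deletion", query_aln)):
        for match in re.finditer(r"-+", row):
            length = match.end() - match.start()
            if length % 3:
                ref_pos = len(ref_aln[:match.start()].replace("-", ""))
                events.append((kind, ref_pos, length))
    return sorted(events, key=lambda event: event[1])


# Toy alignments (hypothetical): a 1nt deletion and a 1nt insertion near the same site.
print(frameshift_indels("ATGGCCAAGTTT", "ATGGC-AAGTTT"))
print(frameshift_indels("ATGGCC-AAGTTT", "ATGGCCTAAGTTT"))
```

Reporting positions in reference coordinates makes it possible to see when two species are disabled at nearly the same site by different events, which suggests independent mutations rather than a shared ancestral one.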

The other two sequences are guanylate cyclases. These contain a kinase catalytic domain (ePK domain), though it is thought not to be catalytically active. The two human pseudogenes are more heavily disabled than the previous kinases: KSGCps has two stops and three frameshifts, while CYGXps has five frameshifts and one stop, and is missing the N terminus. The chimp orthologs share many of the disablements and have further disablements of their own. KSGC is also known as Gucy2g (guanylate cyclase 2g) or GC-G, and CYGX as GC-D (Guanylate Cyclase D), which is thought to function in olfaction, giving a functional correlate to its loss in primates.

Loss of specific exons within kinase genes has also occurred in the human lineage. For instance, the DCAMKL1 gene in mouse contains an alternatively spliced 16 AA module, whose protein sequence is perfectly conserved in rat. However, the human genome contains a degenerate sequence for that module, and no human ESTs contain the sequence.

Of these four genes, only TSSK5 (serine/threonine kinase 22B) is listed in the human genome paper. In all cases, at least partial EST/cDNA coverage is available for both mouse and rat orthologs. Mouse and human sequences are available through KinBase; the rat sequences (all full length) are as follows:

>PLK5_Rn
MEPRLRRRRSRQLVATFLRDPGSGRVYRRGKLIGKGAFSRCYKLTDMSTSAVFALKVVPRGGAGRLRLRGKVEREIALHSRLHHRNIVAFHAHFADRDHVYMVLEYCSRQVPLQSLAHVL
KVRRTLTEPEVRYYFRGLVSGLRYLHQQRIVHRDLKPSNFFLNKNMEVKIGDLGLAARVGPAGRCHRVLCGTPNFQAPEVVSRNGHSAKSDIWALGCIMYTVLTGTPPFAAAPLSEMYQN
IRDGHYLEPTQLSPSARSLIARLLAPDPDERPSLDHLLQDDFFSQGFTPERLPPHSCHSPPVFAFPPPLGRLFRKVGQLLLTQCRPPCPFTSKEASGPGEESTEPDHMEASNEEGAPLCT
ESRIHLLTLGTPRTDPADAKGTLALQLEVATRKLCLCLDAGPVAGQDPPGEQRSVLWAPKWVDYSLKYGFGYQLSDGGSGVLFRDGSHMALRPPGGHVSYQPDQGTLWIFALRDVPGPLR
AKLAVLRLFACYMQRRLREEGTAPMTATPASPDFCLLSFTADAQALAMLFSNGTVQVSSKTSPVQLVLSGEGEDFLLTIQEPGGPSMGTSYTLDVLRSHGISLAVRHHLRLGLHLLQSV

>sp|P51839|CYGX_RAT Olfactory guanylyl cyclase GC-D precursor
MAGLQQGCHPEGQDWTAPHWKTCRALPGPRGLTVRHLRTVSSISVFSVVFWGVLLWADSLSLPAWARETFTLGVLGPWDCDPIFAQALPSMATQLAVDRVNQDASLLLGSQLDFKILPTG
CDTPHALATFVAHRNTVAAFIGPVNPGYCPAAALLAQGWGKSLFSWACGAPEGGGALVPTLPSMADVLLSVMRHFGWARLAIVSSHQDIWVTTAQQLATAFRAHGLPIGLITSLGPGEKG
ATEVCKQLHSVHGLKIVVLCMHSALLGGLEQTVLLRCAREEGLTDGRLVFLPYDTLLFALPYRNRSYLVLDDDGPLQEAYDAVLTISLDTSPESHAFTATKMRGGAAANLGPEQVSPLFG
TIYDAVILLAHALNHSETHGTGLSGAHLGNHIRALDVAGFSQRIRIDGKGRRLPQYVILDTNGEGSQLVPTHILDVSTQQVQPLGTAVHFPGGSPPAHDASCWFDPNTLCIRGVQPLGSL
LTLTITCVLALVGGFLAYFIRLGLQQLRLLRGPHRILLTPQELTFLQRTPSRRRPHVDSGSESRSVVDGGSPQSVIQGSTRSVPAFLEHTNVALYQGEWVWLKKFEAGTAPDLRPSSLSL
LRKMREMRHENVTAFLGLFVGPEVSAMVLEHCARGSLEDLLRNEDLRLDWTFKASLLLDLIRGLRYLHHRHFPHGRLKSRNCVVDTRFVLKITDHGYAEFLESHCSFRPQPAPEELLWTA
PELLRGPRGPWGPGKATFKGDVFSLGIILQEVLTRDPPYCSWGLSAEEIIRKVASPPPLCRPLVSPDQGPLECIQLMQLCWEEAPDDRPSLDQIYTQFKSINQGKKTSVADSMLRMLEKY
SQSLEGLVQERTEELELERRKTERLLSQMLPPSVAHALKMGTTVEPEYFDQVTIYFSDIVGFTTISALSEPIEVVGFLNDLYTMFDAVLDSHDVYKVETIGDAYMVASGLPRRNGNRHAA
EIANMALEILSYAGNFRMRHAPDVPIRVRAGLHSGPCVAGVVGLTMPRYCLFGDTVNTASRMESTGLPYRIHVSRNTVQALLSLDEGYKIDVRGQTELKGKGLEETYWLTGKTGFCRSLP
TPLSIQPGDPWQDHINQEIRTGFAKLARVC

>KSGC aka GC-G from rat
MASRARSEPPLEHRFYGGAESHAGHSSLVLTLFVVMLMTCLEAAKLTVGFHAPWNISHPFSVQRLGAGLQIAVDKLNSEPVGPGNLSWEFTYTNATCNAKESLAAFIDQVQREHISVLIG
PACPEAAEVIGLLASEWDIPLFDFVGQMTALEDHFWCDTCVTLVPPKQEIGTVLRESLQYLGWEYIGVFGGSSAGSSWGEVNELWKAVEDELQLHFTITARVRYSSGHSDLLQEGLRSMS
SVARVIILICSSEDAKHILQAAEDLGLNSGEFVFLLLQQLEDSFWKEVLAEDKVTRFPKVYESVFLIAPSTYGGSAGDDDFRKQVYQRLRRPPFQSSISSEDQVSPYSAYLHDALLLYAQ
TVEEMMKAEKDFRDGRQLISTLRADQVTLQGITGPVLLDAQGKRHMDYSVYALQKSGNGSRFLPFLHYDSFQKVIRPWRDDLNASGPHGSHPEYKPDCGFHEDLCRTKPPTGAGMTASVT
AVIPTVTLLVVASAAAITGLMLWRLRGKVQNHPGDTWWQIHYDSITLLPQHKPSHRGTPMSRCNVSNASTVKISADCGSFAKTHQDEELFYAPVGLYQGNHVALCYIGEEAEARIKKPTV
LREVWLMCELKHENIVPFFGVCTEPPNICIVTQYCKKGSLKDVLRNSDHEMDWIFKLSFVYDIVNGMLFLHGSPLRSHGNLKPSNCLVDSHMQLKLAGFGLWEFKHGSTCRIYNQEATDH
SELYWTAPELLRLRELPWSGTPQGDVYSFAILLRDLIHQQAHGPFEDLEAAPEEIISCIKDPRAPVPLRPSLLEDKGDERIVALVRACWAESPEQRPAFPSIKKTLREASPRGRVSILDS
MMGKLEMYASHLEEVVEERTCQLVAEKRKVEKLLSTMLPSFVGEQLIAGKSVEPEHFESVTIFFSDIVGFTKLCSLSSPLQVVKLLNDLYSLFDHTIQTHDVYKVETIGDAYMVASGLPI
RNGAQHADEIATMSLHLLSVTTNFQIGHMPEERLKLRIGLHTGPVVAGVVGITMPRYCLFGDTVNMASRMESSSLPLRIHVSQSTARALLVAGGYHLQKRGTISVKGKGEQTTFWLTGKD
GFAVPLPEFTEEEAKVPEIL

>TSSK5_Rn
MRSNSRRKEDQRVFIEQVRECMNNGYLLSSKKIGSGAFSKVYLAYATRERMKHNPRLSSDLRGKHHSMVAIKIVSMAEAPAEYSRKFLPREILSLNATYKHMNIVQLYETYQNSQRSYLV
LELAARGDLLEYINAVSDLRCCPGLEEEEARRLFWQLVSAVAHCHSVGIVHRDLKCENILLDDQGFLKLTDFGFANWVGIKNSLLSTFCGSVAYTAPEILMSKKYNGEQADLWSLGIILH
AMVSGKLPFKEHQPHRMLHLIRRGPIFRPRLSPECRDLIRGLLQLHPGDRLDLQQVAAHCWMLPAEHMLSSVLGATREQKHSWSAIGPDNAEPDRDTRYSRSKGSSPSSGRTSPRRASLA
QLCSTWKPAPEE
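The rat sequences above are given in FASTA format, with each sequence wrapped over several lines. A minimal sketch of reading such text into full-length sequences is shown below. The function name is an assumption for illustration, and the toy input is not one of the sequences listed here.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)  # wrapped sequence lines are joined later
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input (hypothetical) with wrapped lines and a blank line between records.
example = ">a\nMEP\nRLR\n\n>b desc\nMAG\n"
for name, seq in parse_fasta(example).items():
    print(name, len(seq))
```

Applied to the text above, this would yield one entry per header line, for example PLK5_Rn and TSSK5_Rn, each holding the concatenated protein sequence.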

Non-kinase genes decayed in human

Many other genes have been reported in the literature to be functional in mouse and non-functional in human. Here is a selection that complements the list from the genome paper. They have not been analyzed in depth and may not meet the strict criteria used in the genome study.

OST-PTP

This tyrosine phosphatase is involved in osteoblast differentiation in rodents but is non-functional in human, where the ortholog is a pseudogene and a second pseudogene also exists (2).

Cytochrome P450 genes

This is a dynamic family, many of whose members cluster on the chromosome. There are two cases of Cyp450 orthologs in both mouse and rat whose closest syntenic gene in human, and closest sequence homolog, is a pseudogene (3). These are the cyp2g1 and cyp2t1 genes of rodents (the human pseudogenes are annotated as CYP2G2P and CYP2T2P). In both cases, a recent duplication has made a second copy of the human pseudogene, which is also non-functional, but sequence similarity and chromosomal position distinguish the ortholog from the paralog.

Neuropeptide Y receptor 6

The human form has a premature stop that causes loss of at least part of its function (see OMIM entry).

Various proteases

Many proteases are members of fast-evolving families, with extensive gene birth and death. Cases of proteases present in mouse but not human are covered in a review (4) and more extensively in a Degradome and Degradomics website from the same group. Genes lost in human include the aspartate protease chymosin, the apoptosis-related caspase 12, several ADAM family members (ADAM3, 4, 4B, 5, 6, 25), the testis-specific proteases TESS3 and 4 and TESP2 and 3, intestinal serine protease 1 (DISP), implantation serine protease 2 (ISP2), and mastin. Two of these, caspase 12 and ADAM25, are mentioned in the human genome paper.

Gene loss within primate lineages

Gene death has continued even within the primate lineage, including some genes with known functional correlates to their death, and several with little functional explanation. I'm grateful to Chris Ponting for pointing out a number of these additional published examples:

 

References

1. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931-45 (2004).

2. Cousin, W., Courseaux, A., Ladoux, A., Dani, C. and Peraldi, P. Cloning of hOST-PTP: the only example of a protein-tyrosine-phosphatase the function of which has been lost between rodent and human. Biochem Biophys Res Commun 321, 259-65 (2004).

3. Nelson, D. R. et al. Comparison of cytochrome P450 (CYP) genes from the mouse and human genomes, including nomenclature recommendations for genes, pseudogenes and alternative-splice variants. Pharmacogenetics 14, 1-18 (2004).

4. Puente, X. S., Sanchez, L. M., Overall, C. M. and Lopez-Otin, C. Human and mouse proteases: a comparative genomic approach. Nat Rev Genet 4, 544-58 (2003).

5. Chou, H. H., Takematsu, H., Diaz, S., Iber, J., Nickerson, E., Wright, K. L., Muchmore, E. A., Nelson, D. L., Warren, S. T. and Varki, A. A mutation in human CMP-sialic acid hydroxylase occurred after the Homo-Pan divergence. Proc Natl Acad Sci U S A 95, 11751-6 (1998).

6. Stedman, H. H., Kozyak, B. W., Nelson, A., Thesier, D. M., Su, L. T., Low, D. W., Bridges, C. R., Shrager, J. B., Minugh-Purvis, N. and Mitchell, M. A. Myosin gene mutation correlates with anatomical changes in the human lineage. Nature 428, 415-8 (2004).

7. Stacey, M. et al. EMR4, a novel epidermal growth factor (EGF)-TM7 molecule up-regulated in activated mouse macrophages, binds to a putative cellular ligand on B lymphoma cell line A20. J Biol Chem 277, 29283-93 (2002).

8. Hamann, J. et al. Inactivation of the EGF-TM7 receptor EMR4 after the Pan-Homo divergence. Eur J Immunol 33, 1365-71 (2003).

9. Oda, M., Satta, Y., Takenaka, O. and Takahata, N. Loss of urate oxidase activity in hominoids and its evolutionary implications. Mol Biol Evol 19, 640-53 (2002).

10. Szabo, Z. et al. Sequential loss of two neighboring exons of the tropoelastin gene during primate evolution. J Mol Evol 49, 664-71 (1999).

11. Gilad, Y., Man, O., Paabo, S. and Lancet, D. Human specific loss of olfactory receptor genes. Proc Natl Acad Sci U S A 100, 3324-7 (2003).

12. Gilad, Y., Bustamante, C. D., Lancet, D. and Paabo, S. Natural selection on the olfactory receptor gene family in humans and chimpanzees. Am J Hum Genet 73, 489-501 (2003).

13. Gilad, Y. and Lancet, D. Population differences in the human functional olfactory repertoire. Mol Biol Evol 20, 307-14 (2003).