The Primate TFome
We recently determined that the human genome contains more than 3000 gene regulatory factors (GRFs), including ~1500 DNA-binding transcription factors (TFs), co-factors, hormone receptors, and histone-modifying enzymes. The number of GRFs in most other sequenced genomes is still unknown. The main obstacle to determining the exact TF content of other genomes is the insufficient quality of their draft assemblies, which makes it difficult to identify TFs in complex genomic regions. Furthermore, the lack of transcript information (RNA-Seq data, mRNA, cDNA, or EST sequences) makes it difficult to determine the sequences of the transcribed genes, because promoters, open reading frames, and splice sites have to be predicted purely from genomic features and conservation with other species. We take advantage of improved genome assemblies and the increasing amount of transcript data provided by RNA-Seq to computationally identify all TFs in primate genomes and to manually curate TF gene models in a number of primate species. We are using these high-quality TF gene models to reveal lineage- and species-specific TFs, TFs with lineage- or species-specific changes in functional domains, and TFs under positive selection.
Comparative Functional Characterization of Transcription Factors
Only a small proportion of TFs have been functionally characterized. Very little is known about many gene families, and this is especially true of the largest TF family in mammalian genomes: the KRAB-ZNFs. The importance of TFs for phenotypic differences and speciation has been established in several cases (e.g. PRDM9, FOXP2, EGR1, BMP4). Several KRAB-ZNFs have been implicated in brain and cognitive development. We focus on human-specific TFs, TFs with human-specific domain changes, and TFs that are connected in gene regulatory networks in a human-specific way, in order to determine their evolutionary impact experimentally. For instance, we perform ChIP-Seq experiments in human and non-human primate cell lines to identify the binding sites of a TF in both species. Furthermore, we manipulate the expression levels of the TFs in cell lines of both species (knock-down and overexpression) and then use RNA-Seq to determine their downstream targets. These experiments will give us insight not only into the function of the selected TFs but, more importantly, into how their function has changed during evolution.
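The combination of binding data and perturbation data described above can be illustrated with a minimal sketch. It assumes that ChIP-Seq peaks have already been assigned to genes, that RNA-Seq has already yielded a list of genes responding to knock-down or overexpression, and that genes are identified by shared ortholog names. The function names and the gene identifiers (GENE_A, GENE_B, ...) are hypothetical placeholders, not results from our experiments.

```python
from typing import Dict, Set


def direct_targets(bound: Set[str], responsive: Set[str]) -> Set[str]:
    """Genes that are both bound by the TF (ChIP-Seq) and change
    expression after the TF is perturbed (RNA-Seq)."""
    return bound & responsive


def compare_targets(human: Set[str], other: Set[str]) -> Dict[str, Set[str]]:
    """Split two target sets into shared and species-specific genes.
    Assumes both sets use the same ortholog identifiers."""
    return {
        "shared": human & other,
        "human_only": human - other,
        "other_only": other - human,
    }


# Hypothetical toy input: placeholder gene names, not real data.
human_bound = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
human_responsive = {"GENE_A", "GENE_B", "GENE_D", "GENE_E"}
chimp_bound = {"GENE_A", "GENE_C", "GENE_F"}
chimp_responsive = {"GENE_A", "GENE_B", "GENE_F"}

human_targets = direct_targets(human_bound, human_responsive)
chimp_targets = direct_targets(chimp_bound, chimp_responsive)
result = compare_targets(human_targets, chimp_targets)

for category, genes in result.items():
    print(category, sorted(genes))
```

In this toy input, GENE_A is a target in both species, whereas GENE_B and GENE_D appear as targets only in human and GENE_F only in chimpanzee. A real analysis would additionally need peak calling, ortholog mapping, and statistical testing, none of which is shown here.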
Evolution of Transcription Factor Networks
TFs regulate their target genes in a concerted, combinatorial fashion, forming gene regulatory networks that are often large and complex. Little is known about how such networks evolve, how much noise or redundancy they contain, or how important the gain or loss of nodes (genes) and links (interactions) is. Based on transcriptome information, we previously identified a network of TFs that is active in the prefrontal cortex and is characterized by significant link changes between humans and chimpanzees. This network appears to have been involved in shaping some phenotypic differences, such as the larger human brain and its higher energy consumption. We are now investigating this TF network in other primates to reveal its evolutionary history. Furthermore, we are interested in the network differences that underlie cognitive disorders.
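To make the notion of link changes concrete, the following is a minimal sketch of one possible way to compare networks between two species. It assumes a simple co-expression approach in which two TFs are linked when the absolute Pearson correlation of their expression profiles reaches a threshold; this is an illustrative assumption and not necessarily the method used in our analysis. The function names, the threshold of 0.8, and the expression values for TF1 to TF3 are all hypothetical.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_links(expr: Dict[str, List[float]],
                       threshold: float = 0.8) -> Set[Link]:
    """Link two genes if |r| >= threshold (assumed, illustrative cutoff)."""
    links = set()
    for g1, g2 in combinations(sorted(expr), 2):
        if abs(pearson(expr[g1], expr[g2])) >= threshold:
            links.add((g1, g2))
    return links


def link_changes(a: Set[Link], b: Set[Link]) -> Dict[str, Set[Link]]:
    """Classify links as conserved or specific to one of the two networks."""
    return {"conserved": a & b, "only_in_a": a - b, "only_in_b": b - a}


# Hypothetical expression profiles across four samples per species.
human_expr = {"TF1": [1, 2, 3, 4], "TF2": [2, 4, 6, 8], "TF3": [4, 3, 2, 1]}
chimp_expr = {"TF1": [1, 2, 3, 4], "TF2": [2, 4, 6, 8], "TF3": [1, -1, -1, 1]}

human_links = coexpression_links(human_expr)
chimp_links = coexpression_links(chimp_expr)
changes = link_changes(human_links, chimp_links)

for category, links in changes.items():
    print(category, sorted(links))
```

In this toy example the TF1-TF2 link is conserved, while both links involving TF3 are present only in the human network. Real data would require many more samples, normalization, and a significance test for each link difference.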
Long Non-Coding RNAs in Primate Brain Evolution
Long non-coding RNAs (lncRNAs) are emerging as key players in the nervous system. Many of the approximately 15,000 human lncRNAs are expressed in the brain, and multiple lines of evidence have linked them to important brain functions, such as neurogenesis and behavior, or have associated them with neurodegenerative and psychiatric diseases. Although several lncRNA databases exist, there is still a large gap in the structural and functional annotation of lncRNAs, which hinders a full understanding of their role in the nervous system. Many characteristics of the brain are human-specific. Genes that evolve quickly, as lncRNAs do, are therefore prime candidates for having driven the evolution of these innovations. Since biological function has to be studied in the light of evolution, we aim to establish a full catalog of human lncRNAs, including an annotation of their sequence, structure, expression, network integration, and evolutionary changes, by collating and coherently re-analyzing the wealth of high-throughput data that is already available.