NAR Top Articles - Computational Biology

Computational Biology

January 2015

Long non-coding RNA identification over mouse brain development by integrative modeling of chromatin and genomic features
J. Lv, H. Liu, Z. Huang, J. Su, H. He, Y. Xiu, Y. Zhang and Q. Wu
Nucleic Acids Res. (2013) 41 (22): 10044-10061
Free Full Text
In silico prediction of genomic long non-coding RNAs (lncRNAs) is a prerequisite to the construction and elucidation of the non-coding regulatory network. Chromatin modifications marked by chromatin regulators are important epigenetic features, which can be captured by prevailing high-throughput approaches such as ChIP sequencing. We demonstrate that the accuracy of lncRNA predictions can be greatly improved when incorporating high-throughput chromatin modifications over mouse embryonic stem cell differentiation toward adult cerebellum by logistic regression with LASSO regularization. The discriminating features include H3K9me3, H3K27ac, H3K4me1, open reading frames and several repeat elements. Importantly, chromatin information is suggested to be complementary to genomic sequence information, highlighting the importance of an integrated model. Applying the integrated model, we obtain a list of putative lncRNAs based on uncharacterized fragments from transcriptome assembly. We demonstrate that the putative lncRNAs have regulatory roles in the vicinity of known gene loci by expression and Gene Ontology enrichment analysis...
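
As a rough illustration of why LASSO (L1) regularization suits feature selection in a logistic regression like the one the abstract mentions, the sketch below fits a two-feature toy model by proximal gradient descent. It is not the authors' pipeline: the data, the penalty `lam`, the learning rate and the function names are all invented for this example, and the two columns are generic stand-ins rather than real chromatin or sequence features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def fit_lasso_logistic(rows, labels, lam=0.15, lr=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent (toy scale).

    The bias term is not penalized; only the feature weights are shrunk.
    """
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi / n
            grad_b += err / n
        # Gradient step on the smooth loss, then shrinkage from the L1 penalty.
        weights = [soft_threshold(w - lr * g, lr * lam)
                   for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias

# Hypothetical toy data: column 0 agrees with the label in 8 of 10 rows,
# column 1 in only 6 of 10 rows.
rows = [(1, 1), (1, -1), (1, 1), (1, -1), (-1, 1),
        (-1, -1), (-1, 1), (-1, -1), (-1, 1), (1, -1)]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

weights, bias = fit_lasso_logistic(rows, labels)
print(weights, bias)
```

With this penalty the weakly informative second column is driven to a weight of exactly zero while the first keeps a positive weight, which is the behaviour that lets a LASSO model report a short list of discriminating features.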

A high-resolution network model for global gene regulation in Mycobacterium tuberculosis
E. J. Peterson, D. J. Reiss, S. Turkarslan, K. J. Minch, T. Rustad, C. L. Plaisier, W. J. Longabaugh, D. R. Sherman and N. S. Baliga
Nucleic Acids Res. (2014) 42 (18): 11291-11303
Free Full Text
The resilience of Mycobacterium tuberculosis (MTB) is largely due to its ability to effectively counteract and even take advantage of the hostile environments of a host. In order to accelerate the discovery and characterization of these adaptive mechanisms, we have mined a compendium of 2325 publicly available transcriptome profiles of MTB to decipher a predictive, systems-scale gene regulatory network model. The resulting modular organization of 98% of all MTB genes within this regulatory network was rigorously tested using two independently generated datasets: a genome-wide map of 7248 DNA-binding locations for 143 transcription factors (TFs) and global transcriptional consequences of overexpressing 206 TFs. This analysis has discovered specific TFs that mediate conditional co-regulation of genes within 240 modules across 14 distinct environmental contexts. In addition to recapitulating previously characterized regulons, we discovered 454 novel mechanisms for gene regulation during stress, cholesterol utilization and dormancy...

Predicting DNA methylation level across human tissues
B. Ma, E. H. Wilker, S. A. Willis-Owen, H. M. Byun, K. C. Wong, V. Motta, A. A. Baccarelli, J. Schwartz, W. O. Cookson, K. Khabbaz, M. A. Mittleman, M. F. Moffatt and L. Liang
Nucleic Acids Res. (2014) 42 (6): 3515-3528
Free Full Text
Differences in methylation across tissues are critical to cell differentiation and are key to understanding the role of epigenetics in complex diseases. In this investigation, we found that locus-specific methylation differences between tissues are highly consistent across individuals. We developed a novel statistical model to predict locus-specific methylation in target tissue based on methylation in surrogate tissue. The method was evaluated in publicly available data and in two studies using the latest Illumina BeadChips: a childhood asthma study with methylation measured in both peripheral blood leukocytes (PBL) and lymphoblastoid cell lines; and a study of postoperative atrial fibrillation with methylation in PBL, atrium and artery. We found that our method can greatly improve accuracy of cross-tissue prediction at CpG sites that are variable in the target tissue [R² increases from 0.38 (original R² between tissues) to 0.89 for PBL-to-artery prediction; from 0.39 to 0.95 for PBL-to-atrium; and from 0.81 to 0.98 for lymphoblastoid cell line-to-PBL based on cross-validation...
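
To make the "original R² between tissues" versus "R² after prediction" comparison concrete, here is a deliberately simple sketch. It is not the statistical model from the paper: it only exploits the observation that locus-specific differences between tissues are consistent across individuals, by learning a mean per-locus offset from training individuals and adding it to the surrogate-tissue values of a held-out individual. All methylation values are invented, and R² is computed across loci within one individual, which may differ from how the paper computes it.

```python
def pearson_r2(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)

# Hypothetical training individuals: (surrogate tissue, target tissue)
# methylation fractions at the same four CpG loci.
train = [
    ([0.25, 0.75, 0.45, 0.65], [0.57, 0.57, 0.55, 0.23]),
    ([0.15, 0.85, 0.55, 0.55], [0.43, 0.63, 0.65, 0.17]),
]
n_loci = 4

# Mean target-minus-surrogate difference at each locus across individuals.
offsets = [
    sum(target[i] - surrogate[i] for surrogate, target in train) / len(train)
    for i in range(n_loci)
]

# Hypothetical held-out individual.
test_surrogate = [0.20, 0.80, 0.50, 0.60]
test_target = [0.52, 0.58, 0.61, 0.21]

predicted = [s + d for s, d in zip(test_surrogate, offsets)]

r2_before = pearson_r2(test_surrogate, test_target)  # raw surrogate vs target
r2_after = pearson_r2(predicted, test_target)        # offset-corrected vs target
print(round(r2_before, 3), round(r2_after, 3))
```

In this toy data the raw surrogate values are almost uncorrelated with the target tissue, while the offset-corrected prediction tracks it closely.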

qDNAmod: a statistical model-based tool to reveal intercellular heterogeneity of DNA modification from SMRT sequencing data
Z. Feng, J. Li, J. R. Zhang and X. Zhang
Nucleic Acids Res. (2014) 42 (22): 13488-13499
Free Full Text
In an isogenic cell population, phenotypic heterogeneity among individual cells is common and critical for survival of the population under different environmental conditions. DNA modification is an important epigenetic factor that can regulate phenotypic heterogeneity. The single molecule real-time (SMRT) sequencing technology provides a unique platform for detecting a wide range of DNA modifications, including N6-methyladenine (6-mA), N4-methylcytosine (4-mC) and 5-methylcytosine (5-mC). Here we present qDNAmod, a novel bioinformatic tool for genome-wide quantitative profiling of intercellular heterogeneity of DNA modification from SMRT sequencing data. It is capable of estimating the proportion of isogenic haploid cells in which the same loci of the genome are differentially modified. We tested the reliability of qDNAmod with the SMRT sequencing data of Streptococcus pneumoniae strain ST556. qDNAmod detected extensive intercellular heterogeneity of DNA methylation (6-mA) in a clonal population of ST556...

Prediction of DNA binding motifs from 3D models of transcription factors; identifying TLX3 regulated genes
M. Pujato, F. Kieken, A. A. Skiles, N. Tapinos and A. Fiser
Nucleic Acids Res. (2014) 42 (22): 13500-13512
Free Full Text
Proper cell functioning depends on the precise spatio-temporal expression of its genetic material. Gene expression is controlled to a great extent by sequence-specific transcription factors (TFs). Our current knowledge of where and how TFs bind and associate to regulate gene expression is incomplete. A structure-based computational algorithm (TF2DNA) is developed to identify binding specificities of TFs. The method constructs homology models of TFs bound to DNA and assesses the relative binding affinity for all possible DNA sequences using a knowledge-based potential, after optimization in a molecular mechanics force field. TF2DNA predictions were benchmarked against experimentally determined binding motifs. Success rates range from 45% to 81% and primarily depend on the sequence identity of aligned target sequences and template structures. TF2DNA was used to predict 1321 motifs for 1825 putative human TF proteins, facilitating the reconstruction of most of the human gene regulatory network. As an illustration, the predicted DNA binding site for the poorly characterized T-cell leukemia homeobox 3 (TLX3) TF was confirmed with gel shift assay experiments.

Predicting enhancer transcription and activity from chromatin modifications
Y. Zhu, L. Sun, Z. Chen, J. W. Whitaker, T. Wang and W. Wang
Nucleic Acids Res. (2013) 41 (22): 10032-10043
Free Full Text
Enhancers play a pivotal role in regulating the transcription of distal genes. Although certain chromatin features, such as the histone acetyltransferase P300 and the histone modification H3K4me1, indicate the presence of enhancers, only a fraction of enhancers are functionally active. Individual chromatin marks, such as H3K27ac and H3K27me3, have been identified to distinguish active from inactive enhancers. However, the systematic identification of the most informative single modification, or combination thereof, is still lacking. Furthermore, the discovery of enhancer RNAs (eRNAs) provides an alternative approach to directly predicting enhancer activity. However, it remains challenging to link chromatin modifications to eRNA transcription. Herein, we develop a logistic regression model to unravel the relationship between chromatin modifications and eRNA synthesis. We perform a systematic assessment of 24 chromatin modifications in fetal lung fibroblasts and demonstrate that a combination of four modifications is sufficient to accurately predict eRNA transcription. Furthermore, we compare the ability of eRNAs and H3K27ac to discriminate enhancer activity...

Constrained transcription factor spacing is prevalent and important for transcriptional control of mouse blood cells
F. S. Ng, J. Schutte, D. Ruau, E. Diamanti, R. Hannah, S. J. Kinston and B. Gottgens
Nucleic Acids Res. (2014) 42 (22): 13513-13524
Free Full Text
Combinatorial transcription factor (TF) binding is essential for cell-type-specific gene regulation. However, much remains to be learned about the mechanisms of TF interactions, including to what extent constrained spacing and orientation of interacting TFs are critical for regulatory element activity. To examine the relative prevalence of the 'enhanceosome' versus the 'TF collective' model of combinatorial TF binding, a comprehensive analysis of TF binding site sequences in large scale datasets is necessary. We developed a motif-pair discovery pipeline to identify motif co-occurrences with preferential distance(s) between motifs in TF-bound regions. Utilizing a compendium of 289 mouse haematopoietic TF ChIP-seq datasets, we demonstrate that haematopoietic-related motif-pairs commonly occur with highly conserved constrained spacing and orientation between motifs. Furthermore, motif clustering revealed specific associations for both heterotypic and homotypic motif-pairs with particular haematopoietic cell types...
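
The idea of looking for a preferential distance between two motifs in TF-bound regions can be sketched as a simple spacing histogram. This is only a minimal illustration and not the authors' motif-pair discovery pipeline: the function name, the `max_dist` cutoff and the motif hit positions are all hypothetical, and a real analysis would also need a background model to judge whether a spacing is over-represented.

```python
from collections import Counter

def spacing_histogram(regions, max_dist=50):
    """Count signed distances (motif B start minus motif A start) per region.

    `regions` is a list of (hits_a, hits_b) pairs, each a list of start
    positions within one bound region. The sign of the distance records
    which motif comes first.
    """
    counts = Counter()
    for hits_a, hits_b in regions:
        for a in hits_a:
            for b in hits_b:
                distance = b - a
                if distance != 0 and abs(distance) <= max_dist:
                    counts[distance] += 1
    return counts

# Hypothetical motif hit positions in four bound regions.
regions = [
    ([10, 40], [18, 90]),
    ([5], [13]),
    ([100], [108, 120]),
    ([30], [25]),
]

hist = spacing_histogram(regions)
preferred, n_pairs = hist.most_common(1)[0]
print(preferred, n_pairs)  # prints "8 3"
```

Here a spacing of +8 bp recurs in three of the four regions, while every other distance appears once; a constrained-spacing motif pair would show up as this kind of sharp peak, whereas flexible co-binding would give a flatter distribution.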

Profiling the transcription factor regulatory networks of human cell types
S. Zhang, D. Tian, N. H. Tran, K. P. Choi and L. Zhang
Nucleic Acids Res. (2014) 42 (20): 12380-12387
Free Full Text
Neph et al. (2012) (Circuitry and dynamics of human transcription factor regulatory networks. Cell, 150: 1274-1286) reported the transcription factor (TF) regulatory networks of 41 human cell types using the DNaseI footprinting technique. This provides a valuable resource for uncovering regulation principles in different human cells. In this paper, the architectures of the 41 regulatory networks and the distributions of housekeeping and specific regulatory interactions are investigated. The TF regulatory networks of different human cell types demonstrate similar global three-layer (top, core and bottom) hierarchical architectures, which are greatly different from the yeast TF regulatory network. However, they have distinguishable local organizations, as suggested by the fact that wiring patterns of only a few TFs are enough to distinguish cell identities. The TF regulatory network of human embryonic stem cells (hESCs) is dense and enriched with interactions that are unseen in the networks of other cell types. The examination of specific regulatory interactions suggests that specific interactions play important roles in hESCs.

Modeling epigenome folding: formation and dynamics of topologically associated chromatin domains
D. Jost, P. Carrivain, G. Cavalli and C. Vaillant
Nucleic Acids Res. (2014) 42 (15): 9553-9561
Free Full Text
Genomes of eukaryotes are partitioned into domains of functionally distinct chromatin states. These domains are stably inherited across many cell generations and can be remodeled in response to developmental and external cues, hence contributing to the robustness and plasticity of expression patterns and cell phenotypes. Remarkably, recent studies indicate that these 1D epigenomic domains tend to fold into 3D topologically associated domains forming specialized nuclear chromatin compartments. However, the general mechanisms behind such compartmentalization, including the contribution of epigenetic regulation, remain unclear. Here, we address the question of the coupling between chromatin folding and the epigenome. Using polymer physics, we analyze the properties of a block copolymer model that accounts for local epigenomic information. Considering copolymers built from the epigenomic landscape of Drosophila, we observe a very good agreement with the folding patterns observed in chromosome conformation capture experiments. Moreover, this model provides a physical basis for the existence of multistability in epigenome folding at sub-chromosomal scale...

Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods
L. Varemo, J. Nielsen and I. Nookaew
Nucleic Acids Res. (2013) 41 (8): 4378-4391
Free Full Text
Gene set analysis (GSA) is used to elucidate genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding what methods should be preferred, and the variety of available GSA software and implementations poses a difficulty for the end-user who wants to try out different methods. To address this, we have developed the R package Piano that collects a range of GSA methods into the same system, for the benefit of the end-user. Furthermore, we refine the GSA workflow by using modifications of the gene-level statistics. This enables us to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at gene set level. We use our fully implemented workflow to investigate the impact of the individual components of GSA by using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and the major separation correlates well with our defined directionality classes...