
NAR Top Articles - Computational Biology



May 2015


Predicting enhancer transcription and activity from chromatin modifications
Zhu, Y; Sun, L; Chen, Z; Whitaker, JW; Wang, T; Wang, W
Nucleic Acids Res. 2013, 41, 10032-10043
Enhancers play a pivotal role in regulating the transcription of distal genes. Although certain chromatin features, such as the histone acetyltransferase P300 and the histone modification H3K4me1, indicate the presence of enhancers, only a fraction of enhancers are functionally active. Individual chromatin marks, such as H3K27ac and H3K27me3, have been shown to distinguish active from inactive enhancers. However, a systematic identification of the most informative single modification, or combination thereof, is still lacking. Furthermore, the discovery of enhancer RNAs (eRNAs) provides an alternative approach to directly predicting enhancer activity. However, it remains challenging to link chromatin modifications to eRNA transcription. Herein, we develop a logistic regression model to unravel the relationship between chromatin modifications and eRNA synthesis. We perform a systematic assessment of 24 chromatin modifications in fetal lung fibroblasts and demonstrate that a combination of four modifications is sufficient to accurately predict eRNA transcription. Furthermore, we compare the ability of eRNAs and H3K27ac to discriminate enhancer activity...
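The modelling step can be illustrated with a minimal logistic-regression sketch fitted by batch gradient descent. It assumes each enhancer is represented by a vector of chromatin-mark signals and labelled 1 when an eRNA is detected; the feature encoding, learning rate and epoch count are illustrative assumptions, and this is not the authors' implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic-regression weights by batch gradient descent on mean log-loss.

    X: list of feature vectors (hypothetical per-enhancer mark signals)
    y: list of 0/1 labels (1 = eRNA detected at the enhancer)
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the linear score is (p - y)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the enhancer is transcribed."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

With a handful of features (four, say), the sign and size of each fitted weight indicate how strongly a mark is associated with eRNA transcription, provided the inputs are on comparable scales.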

Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods
Varemo, L; Nielsen, J; Nookaew, I
Nucleic Acids Res. 2013, 41, 4378-4391
Gene set analysis (GSA) is used to interpret genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding which methods should be preferred, and the variety of available GSA software and implementations poses a difficulty for the end-user who wants to try out different methods. To address this, we have developed the R package Piano, which collects a range of GSA methods into the same system for the benefit of the end-user. Furthermore, we refine the GSA workflow by using modifications of the gene-level statistics. This enables us to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at the gene set level. We use our fully implemented workflow to investigate the impact of the individual components of GSA by using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and the major separation correlates well with our defined directionality classes...
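The excerpt does not spell out Piano's statistics. As a generic, hypothetical illustration of why directionality matters, the sketch below scores a gene set two ways from signed gene-level statistics (for example t-statistics): the mean of the signed values, which rewards coordinated change in one direction, and the mean of absolute values, which rewards change in either direction. Significance comes from a permutation test over random gene sets of equal size. The function names and the mean-based statistic are assumptions, not Piano's API.

```python
import random
from statistics import mean


def gene_set_scores(stats, gene_set):
    """Return (signed mean, absolute mean) of gene-level statistics in a set."""
    values = [stats[g] for g in gene_set if g in stats]
    return mean(values), mean(abs(v) for v in values)


def permutation_pvalues(stats, gene_set, n_perm=1000, seed=0):
    """One-sided permutation p-values for the signed (up-regulation) and
    absolute (change in any direction) scores, using random gene sets of
    the same size drawn from all genes with a statistic."""
    rng = random.Random(seed)
    members = [g for g in gene_set if g in stats]
    obs_signed, obs_abs = gene_set_scores(stats, members)
    genes = list(stats)
    hits_signed = hits_abs = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(members))
        s, a = gene_set_scores(stats, sample)
        hits_signed += s >= obs_signed
        hits_abs += a >= obs_abs
    # Add-one correction keeps permutation p-values strictly positive
    return (hits_signed + 1) / (n_perm + 1), (hits_abs + 1) / (n_perm + 1)
```

A set whose genes move strongly but in opposite directions gets a small p-value for the absolute score yet an unremarkable one for the signed score, which is the kind of distinction a directionality-aware workflow makes explicit.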

miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data
An, JY; Lai, J; Lehman, ML; Nelson, CC
Nucleic Acids Res. 2013, 41, 727-737
miRDeep and its variants are widely used to quantify known and novel microRNAs (miRNAs) from small RNA sequencing (RNAseq) data. This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of novel miRNA detection by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ, Sequence Alignment/Map (SAM) or binary SAM (BAM) format. Known and novel miRNA expression levels, measured as read counts, are displayed in an interface that shows each RNAseq read relative to the pre-miRNA hairpin. The pre-miRNA secondary structure and read locations for each predicted miRNA are shown and saved in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm and ranked by confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are implemented entirely in Java. It can run on a standard personal computer with 1.5 GB of memory...

Predicting DNA methylation level across human tissues
Ma, BS; Wilker, EH; Willis-Owen, SAG; Byun, HM; Wong, KCC; Motta, V; Baccarelli, AA; Schwartz, J; Cookson, WOCM; Khabbaz, K; Mittleman, MA; Moffatt, MF; Liang, LM
Nucleic Acids Res. 2014, 42, 3515-3528
Differences in methylation across tissues are critical to cell differentiation and are key to understanding the role of epigenetics in complex diseases. In this investigation, we found that locus-specific methylation differences between tissues are highly consistent across individuals. We developed a novel statistical model to predict locus-specific methylation in a target tissue based on methylation in a surrogate tissue. The method was evaluated in publicly available data and in two studies using the latest Illumina BeadChips: a childhood asthma study with methylation measured in both peripheral blood leukocytes (PBL) and lymphoblastoid cell lines, and a study of postoperative atrial fibrillation with methylation measured in PBL, atrium and artery. We found that our method can greatly improve the accuracy of cross-tissue prediction at CpG sites that are variable in the target tissue [R² increases from 0.38 (original R² between tissues) to 0.89 for PBL-to-artery prediction; from 0.39 to 0.95 for PBL-to-atrium; and from 0.81 to 0.98 for lymphoblastoid cell line-to-PBL based on cross-validation...
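The excerpt does not give the form of the model. As a hedged illustration of the observation that locus-specific tissue differences are consistent across individuals, the sketch below predicts target-tissue methylation as surrogate-tissue methylation plus a per-CpG offset learned on training individuals, and scores predictions with R² taken as the squared Pearson correlation. This offset model and all function names are simplified assumptions for illustration, not the authors' method.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def fit_offsets(surrogate, target):
    """Per-CpG mean difference (target - surrogate) over training individuals.

    Both arguments map a CpG id to a list of beta values, one per individual.
    """
    return {cpg: mean(target[cpg]) - mean(surrogate[cpg]) for cpg in surrogate}


def predict_target(surrogate, offsets):
    """Shift surrogate-tissue beta values by the learned offset, clamped to [0, 1]."""
    return {
        cpg: [min(1.0, max(0.0, v + offsets[cpg])) for v in values]
        for cpg, values in surrogate.items()
    }


def flatten(data, cpgs):
    """Concatenate per-CpG value lists in a fixed CpG order."""
    return [v for cpg in cpgs for v in data[cpg]]
```

Comparing `r_squared` on the raw surrogate values with `r_squared` on the offset-corrected predictions, both against held-out target values, mirrors the "original R² between tissues" versus "prediction R²" contrast reported above; a real analysis would also cross-validate the offsets.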

Interplay of microRNAs, transcription factors and target genes: linking dynamic expression changes to function
Nazarov, PV; Reinsbach, SE; Muller, A; Nicot, N; Philippidou, D; Vallar, L; Kreis, S
Nucleic Acids Res. 2013, 41, 2817-2831
MicroRNAs (miRNAs) are ubiquitously expressed small non-coding RNAs that, in most cases, negatively regulate gene expression at the post-transcriptional level. miRNAs are involved in fine-tuning fundamental cellular processes such as proliferation, cell death and cell cycle control and are believed to confer robustness to biological responses. Here, we simultaneously investigated changes in miRNA and mRNA expression levels over time after activation of the Janus kinase/Signal transducer and activator of transcription (Jak/STAT) pathway by interferon-gamma stimulation of melanoma cells. To examine global miRNA and mRNA expression patterns, we analysed time-series microarray data. We observed delayed responses of miRNAs (after 24-48 h) with respect to mRNAs (12-24 h) and identified the biological functions involved at each step of the cellular response. Inference of upstream regulators allowed us to identify transcriptional regulators involved in the cellular response to interferon-gamma stimulation...

Surprisingly extensive mixed phylogenetic and ecological signals among bacterial Operational Taxonomic Units
Koeppel, AF; Wu, M
Nucleic Acids Res. 2013, 41, 5175-5188
The lack of a consensus bacterial species concept greatly hampers our ability to understand and organize bacterial diversity. Operational taxonomic units (OTUs), which are clustered on the basis of DNA sequence identity alone, are the most commonly used microbial diversity unit. Although it is understood that OTUs can be phylogenetically incoherent, the degree and the extent of the phylogenetic inconsistency have not been explicitly studied. Here, we tested the phylogenetic signal of OTUs in a broad range of bacterial genera from various phyla. Strikingly, we found that very few OTUs were monophyletic, and many showed evidence of multiple independent origins. Using previously established bacterial habitats as benchmarks, we showed that OTUs frequently spanned multiple ecological habitats. We demonstrated that ecological heterogeneity within OTUs is caused by their phylogenetic inconsistency, and not merely due to 'lumping' of taxa resulting from using relaxed identity cut-offs. We argue that ecotypes, as described by the Stable Ecotype Model, are phylogenetically and ecologically more consistent than OTUs and therefore could serve as an alternative unit for bacterial diversity studies...
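To make "clustered on the basis of DNA sequence identity alone" concrete, here is a minimal greedy centroid-clustering sketch of the kind commonly used for OTU picking. It assumes pre-aligned sequences of equal length; the 97% default threshold is a widely used convention assumed here, not a value taken from the excerpt, and the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs, threshold=0.97):
    """Greedy centroid clustering by pairwise identity.

    Each sequence joins the first existing OTU whose centroid it matches at
    >= threshold identity; otherwise it becomes the centroid of a new OTU.
    Returns a list of (centroid, members) tuples.
    """
    otus = []
    for s in seqs:
        for centroid, members in otus:
            if identity(s, centroid) >= threshold:
                members.append(s)
                break
        else:
            # No centroid was close enough: start a new OTU
            otus.append((s, [s]))
    return otus
```

The result depends only on pairwise identity and input order; no step consults phylogeny, which is why such units can end up phylogenetically incoherent.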

A high-resolution network model for global gene regulation in Mycobacterium tuberculosis
Peterson, EJR; Reiss, DJ; Turkarslan, S; Minch, KJ; Rustad, T; Plaisier, CL; Longabaugh, WJR; Sherman, DR; Baliga, NS
Nucleic Acids Res. 2014, 42, 11291-11303
The resilience of Mycobacterium tuberculosis (MTB) is largely due to its ability to effectively counteract and even take advantage of the hostile environments of a host. To accelerate the discovery and characterization of these adaptive mechanisms, we have mined a compendium of 2325 publicly available transcriptome profiles of MTB to decipher a predictive, systems-scale gene regulatory network model. The resulting modular organization of 98% of all MTB genes within this regulatory network was rigorously tested using two independently generated datasets: a genome-wide map of 7248 DNA-binding locations for 143 transcription factors (TFs) and global transcriptional consequences of over-expressing 206 TFs. This analysis identified specific TFs that mediate conditional co-regulation of genes within 240 modules across 14 distinct environmental contexts. In addition to recapitulating previously characterized regulons, we discovered 454 novel mechanisms for gene regulation during stress, cholesterol utilization and dormancy...

Computational prediction of the localization of microRNAs within their pre-miRNA
Leclercq, M; Diallo, AB; Blanchette, M
Nucleic Acids Res. 2013, 41, 7200-7211
MicroRNAs (miRNAs) are short RNA species that are derived from hairpin-forming miRNA precursors (pre-miRNAs) and act as key post-transcriptional regulators. Most computational tools labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. The sequence and structural features that determine the location of the miRNA, and the extent to which these properties vary from species to species, are poorly understood. We have developed miRdup, a computational predictor for identifying the most likely miRNA location within a given pre-miRNA or for validating a candidate miRNA. MiRdup is based on a random forest classifier trained with experimentally validated miRNAs from miRBase, using features that characterize the miRNA-miRNA* duplex. Because we observed that miRNAs have sequence and structural properties that differ between species, mostly in terms of duplex stability, we trained various clade-specific miRdup models and obtained increased accuracy. MiRdup self-trains on the most recent version of miRBase and is easy to use...

Deciphering the rules by which dynamics of mRNA secondary structure affect translation efficiency in Saccharomyces cerevisiae
Mao, YH; Liu, HL; Liu, YL; Tao, SH
Nucleic Acids Res. 2014, 42, 4813-4822
Messenger RNA (mRNA) secondary structure decreases the elongation rate, as ribosomes must unwind every structure they encounter during translation. Therefore, the strength of mRNA secondary structure is assumed to be reduced in highly translated mRNAs. However, previous studies in vitro reported a positive correlation between mRNA folding strength and protein abundance. This counterintuitive finding suggests that mRNA secondary structure affects translation efficiency in an undetermined manner. Here, we analyzed the folding behavior of mRNA during translation and its effect on translation efficiency. We simulated the translation process with a novel computational model that takes into account the interactions among ribosomes, codon usage and mRNA secondary structures. We showed that mRNA secondary structure shortens the distance between ribosomes through the dynamics of folding strength. Notably, when adjacent ribosomes are close, the mRNA secondary structures between them disappear, and codon usage determines the elongation rate...
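Ribosome-traffic simulations of this kind are often built on the totally asymmetric simple exclusion process (TASEP). The sketch below is a simplified, hypothetical discrete-time TASEP variant with codon-specific hop probabilities and ribosome exclusion. It deliberately omits mRNA secondary structure, so it is not the authors' model; it only shows how codon usage and ribosome spacing enter such a simulation. The 10-codon footprint and the 3'-to-5' update order are assumptions.

```python
import random


def simulate_translation(codon_rates, init_rate, steps, footprint=10, seed=0):
    """Discrete-time TASEP-style sketch of ribosomes moving along an mRNA.

    codon_rates: per-codon probability (0..1) that a ribosome advances per step
    init_rate:   probability per step that a new ribosome initiates at codon 0
    footprint:   ribosome length in codons; ribosomes may not overlap
    Returns (number of completed proteins, final ribosome positions).
    """
    rng = random.Random(seed)
    length = len(codon_rates)
    ribosomes = []  # codon positions, kept sorted ascending (5' to 3')
    completed = 0
    for _ in range(steps):
        # Initiation requires the first `footprint` codons to be free
        if rng.random() < init_rate and (not ribosomes or ribosomes[0] >= footprint):
            ribosomes.insert(0, 0)
        # Update from the 3' end so a ribosome can move into space freed this step
        for i in range(len(ribosomes) - 1, -1, -1):
            pos = ribosomes[i]
            if rng.random() >= codon_rates[pos]:
                continue  # ribosome pauses at this codon
            if pos == length - 1:
                ribosomes.pop(i)  # termination at the last codon
                completed += 1
            elif i == len(ribosomes) - 1 or ribosomes[i + 1] - pos > footprint:
                ribosomes[i] = pos + 1  # advance only if not blocked
    return completed, ribosomes
```

Lowering the rates of a few codons in `codon_rates` creates local queues, which is one way such models connect codon usage to ribosome spacing and elongation.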

Translating mRNAs strongly correlate to proteins in a multivariate manner and their translation ratios are phenotype specific
Wang, T; Cui, YZ; Jin, JJ; Guo, JH; Wang, GB; Yin, XF; He, QY; Zhang, G
Nucleic Acids Res. 2013, 41, 4743-4754
It is well known that total mRNA abundances correlate poorly with protein abundances. Recent findings obtained with bivariate models suggested an even poorer correlation when focusing on the subset of translating mRNAs (ribosome nascent-chain complex-bound mRNAs, RNC-mRNAs). In this study, we analysed the relative abundances of mRNAs, RNC-mRNAs and proteins on a genome-wide scale, comparing human lung cancer A549 and H1299 cells, respectively, with normal human bronchial epithelial (HBE) cells. We found that a strong correlation between the relative abundances of RNC-mRNAs and proteins could be established with a multivariate linear model that integrates mRNA length as a key factor. R² reached 0.94 and 0.97 in the A549 versus HBE and H1299 versus HBE comparisons, respectively. This correlation indicates that mRNA length contributes significantly to translational modulation, especially to translational initiation, as further supported by its observed correlation with the mRNA translation ratio (TR)...
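The excerpt does not give the model's exact form. Assuming a linear model that regresses relative protein abundance on relative RNC-mRNA abundance and mRNA length, the sketch below fits ordinary least squares with an intercept via the normal equations and reports R², showing how a second predictor can raise R² above a single-predictor fit. The predictor choice, any transformations (such as logarithms) and the function names are assumptions for illustration.

```python
def ols_fit(X, y):
    """Ordinary least squares with an intercept, solved via the normal equations.

    X: list of predictor rows (e.g. hypothetical [RNC-mRNA ratio, mRNA length])
    y: list of responses (e.g. relative protein abundance)
    Returns [intercept, b1, b2, ...]. Assumes the design matrix has full rank.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def r_squared(X, y, coef):
    """Coefficient of determination, 1 - SS_res / SS_tot, for a fitted model."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1 - ss_res / ss_tot
```

For real data with many genes, a numerically robust solver (for example a QR-based least-squares routine) is preferable to forming the normal equations explicitly.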
