August 2015

Predicting enhancer transcription and activity from chromatin modifications
Zhu, Y; Sun, L; Chen, Z; Whitaker, JW; Wang, T; Wang, W
Nucleic Acids Res. 2013, 41, 10032-10043
Enhancers play a pivotal role in regulating the transcription of distal genes. Although certain chromatin features, such as the histone acetyltransferase P300 and the histone modification H3K4me1, indicate the presence of enhancers, only a fraction of enhancers are functionally active. Individual chromatin marks, such as H3K27ac and H3K27me3, have been identified to distinguish active from inactive enhancers. However, the systematic identification of the most informative single modification, or combination thereof, is still lacking. Furthermore, the discovery of enhancer RNAs (eRNAs) provides an alternative approach to directly predicting enhancer activity. However, it remains challenging to link chromatin modifications to eRNA transcription. Herein, we develop a logistic regression model to unravel the relationship between chromatin modifications and eRNA synthesis. We perform a systematic assessment of 24 chromatin modifications in fetal lung fibroblast and demonstrate that a combination of four modifications is sufficient to accurately predict eRNA transcription. Furthermore, we compare the ability of eRNAs and H3K27ac to discriminate enhancer activity...
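
As a rough illustration of the modeling idea in this abstract, the sketch below fits a logistic regression by batch gradient descent on toy data. It is not the authors' model: the two features (`mark_a`, `mark_b`), the binarized inputs, the labels and the training settings are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one enhancer: predicted probability minus label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that an enhancer with features x is transcribed."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data (assumed): columns are two hypothetical binarized marks
# (mark_a, mark_b); label 1 means "eRNA detected". Labels follow mark_a only.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
w, b = fit_logistic(X, y)
print(w, b, predict_proba(w, b, [1, 0]))
```

On real data the feature columns would be measured modification signals at each enhancer, and comparing fits over different feature subsets is one way to ask which combination is most informative.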

miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data
An, JY; Lai, J; Lehman, ML; Nelson, CC
Nucleic Acids Res. 2013, 41, 727-737
miRDeep and its variants are widely used to quantify known and novel microRNAs (miRNAs) from small RNA sequencing (RNAseq) data. This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory...

Surprisingly extensive mixed phylogenetic and ecological signals among bacterial Operational Taxonomic Units
Koeppel, AF; Wu, M
Nucleic Acids Res. 2013, 41, 5175-5188
The lack of a consensus bacterial species concept greatly hampers our ability to understand and organize bacterial diversity. Operational taxonomic units (OTUs), which are clustered on the basis of DNA sequence identity alone, are the most commonly used microbial diversity unit. Although it is understood that OTUs can be phylogenetically incoherent, the degree and the extent of the phylogenetic inconsistency have not been explicitly studied. Here, we tested the phylogenetic signal of OTUs in a broad range of bacterial genera from various phyla. Strikingly, we found that very few OTUs were monophyletic, and many showed evidence of multiple independent origins. Using previously established bacterial habitats as benchmarks, we showed that OTUs frequently spanned multiple ecological habitats. We demonstrated that ecological heterogeneity within OTUs is caused by their phylogenetic inconsistency, and not merely due to 'lumping' of taxa resulting from using relaxed identity cut-offs. We argue that ecotypes, as described by the Stable Ecotype Model, are phylogenetically and ecologically more consistent than OTUs and therefore could serve as an alternative unit for bacterial diversity studies...
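
The notion of a monophyletic OTU used in this abstract can be made concrete with a small check. The sketch below assumes a rooted tree written as nested tuples of strain names and asks whether some clade contains exactly the members of an OTU. The tree, the strain names and the helper functions are hypothetical and are not taken from the paper.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def is_monophyletic(tree, group):
    """True if some clade of `tree` contains exactly the taxa in `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

# Hypothetical rooted tree of four strains: ((A,B),(C,D))
tree = (("A", "B"), ("C", "D"))
print(is_monophyletic(tree, {"A", "B"}))  # OTU matching one clade
print(is_monophyletic(tree, {"A", "C"}))  # OTU drawn from two clades
```

An OTU that fails this check groups strains whose common ancestor also gave rise to strains outside the OTU, which is the kind of phylogenetic inconsistency the abstract describes.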

The genome-wide distribution of non-B DNA motifs is shaped by operon structure and suggests the transcriptional importance of non-B DNA structures in Escherichia coli
Du, XJ; Wojtowicz, D; Bowers, AA; Levens, D; Benham, CJ; Przytycka, TM
Nucleic Acids Res. 2013, 41, 5965-5977
Although the right-handed double helical B-form DNA is most common under physiological conditions, DNA is dynamic and can adopt a number of alternative structures, such as the four-stranded G-quadruplex, left-handed Z-DNA, cruciform and others. Active transcription necessitates strand separation and can induce such non-canonical forms at susceptible genomic sequences. Therefore, it has been speculated that these non-B DNA motifs can play regulatory roles in gene transcription. Such conjecture has been supported in higher eukaryotes by direct studies of several individual genes, as well as a number of large-scale analyses. However, the role of non-B DNA structures in many lower organisms, in particular proteobacteria, remains poorly understood and incompletely documented. In this study, we performed the first comprehensive study of the occurrence of B DNA-non-B DNA transition-susceptible sites (non-B DNA motifs) within the context of the operon structure of the Escherichia coli genome...
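
For one of the structures named above, the G-quadruplex, candidate sites are often located with a simple sequence pattern: four runs of at least three guanines separated by short loops. The sketch below uses that common heuristic on a single strand. It is an assumption for illustration only and is not the transition-susceptibility analysis used in the paper; the loop-length bounds and the function name are likewise illustrative.

```python
import re

# Common heuristic (assumed here): four runs of >= 3 G separated by loops of 1-7 nt.
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")

def find_g4_candidates(seq):
    """Return (start, end) spans of non-overlapping candidate motifs on the given strand."""
    return [m.span() for m in G4_PATTERN.finditer(seq.upper())]

print(find_g4_candidates("TTGGGAGGGAGGGAGGGTT"))
```

Scanning the opposite strand would require searching the reverse complement as well, and a pattern match only flags a candidate site, not a structure that actually forms in the cell.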

TEMP: a computational method for analyzing transposable element polymorphism in populations
Zhuang, JL; Wang, J; Theurkauf, W; Weng, ZP
Nucleic Acids Res. 2014, 42, 6826-6838
Insertions and excisions of transposable elements (TEs) affect both the stability and variability of the genome. Studying the dynamics of transposition at the population level can provide crucial insights into the processes and mechanisms of genome evolution. Pooling genomic materials from multiple individuals followed by high-throughput sequencing is an efficient way of characterizing genomic polymorphisms in a population. Here we describe a novel method named TEMP, specifically designed to detect TE movements present with a wide range of frequencies in a population. By combining the information provided by paired-end reads and split reads, TEMP is able to identify both the presence and absence of TE insertions in genomic DNA sequences derived from heterogeneous samples, accurately estimate the frequencies of transposition events in the population, and pinpoint junctions of high-frequency transposition events at nucleotide resolution. Simulation data indicate that TEMP outperforms other algorithms such as PoPoolationTE, RetroSeq, VariationHunter and GASVPro. TEMP also performs well on whole-genome human data derived from the 1000 Genomes Project...

Enriching the gene set analysis of genome-wide data by incorporating directionality of gene expression and combining statistical hypotheses and methods
Varemo, L; Nielsen, J; Nookaew, I
Nucleic Acids Res. 2013, 41, 4378-4391
Gene set analysis (GSA) is used to elucidate genome-wide data, in particular transcriptome data. A multitude of methods have been proposed for this step of the analysis, and many of them have been compared and evaluated. Unfortunately, there is no consolidated opinion regarding what methods should be preferred, and the variety of available GSA software and implementations poses a difficulty for the end-user who wants to try out different methods. To address this, we have developed the R package Piano that collects a range of GSA methods into the same system, for the benefit of the end-user. Furthermore, we refine the GSA workflow by using modifications of the gene-level statistics. This enables us to divide the resulting gene set P-values into three classes, describing different aspects of gene expression directionality at gene set level. We use our fully implemented workflow to investigate the impact of the individual components of GSA by using microarray and RNA-seq data. The results show that the evaluated methods are globally similar and the major separation correlates well with our defined directionality classes...

Molecular evolutionary and structural analysis of the cytosolic DNA sensor cGAS and STING
Wu, XM; Wu, FH; Wang, XQ; Wang, LL; Siedow, JN; Zhang, WG; Pei, ZM
Nucleic Acids Res. 2014, 42, 8243-8257
Cyclic GMP-AMP (cGAMP) synthase (cGAS) was recently identified as a cytosolic DNA sensor and generates a non-canonical cGAMP that contains G(2',5')pA and A(3',5')pG phosphodiester linkages. cGAMP activates STING, which triggers innate immune responses in mammals. However, the evolutionary functions and origins of cGAS and STING remain largely elusive. Here, we carried out comprehensive evolutionary analyses of the cGAS-STING pathway. Phylogenetic analysis of cGAS and STING families showed that their origins could be traced back to the choanoflagellate Monosiga brevicollis. Modern cGAS and STING may have acquired structural features, including a zinc-ribbon domain and critical amino acid residues for DNA binding in cGAS as well as a carboxy-terminal tail domain for transducing signals in STING, only recently in vertebrates. In invertebrates, cGAS homologs may not act as DNA sensors. Both proteins cooperate extensively, have similar evolutionary characteristics, and thus may have co-evolved during metazoan evolution. cGAS homologs and a prokaryotic dinucleotide cyclase for canonical cGAMP share conserved secondary structures and catalytic residues...

Interplay of microRNAs, transcription factors and target genes: linking dynamic expression changes to function
Nazarov, PV; Reinsbach, SE; Muller, A; Nicot, N; Philippidou, D; Vallar, L; Kreis, S
Nucleic Acids Res. 2013, 41, 2817-2831
MicroRNAs (miRNAs) are ubiquitously expressed small non-coding RNAs that, in most cases, negatively regulate gene expression at the post-transcriptional level. miRNAs are involved in fine-tuning fundamental cellular processes such as proliferation, cell death and cell cycle control and are believed to confer robustness to biological responses. Here, we investigated simultaneously the transcriptional changes of miRNA and mRNA expression levels over time after activation of the Janus kinase/Signal transducer and activator of transcription (Jak/STAT) pathway by interferon-gamma stimulation of melanoma cells. To examine global miRNA and mRNA expression patterns, time-series microarray data were analysed. We observed delayed responses of miRNAs (after 24-48 h) with respect to mRNAs (12-24 h) and identified biological functions involved at each step of the cellular response. Inference of the upstream regulators allowed for identification of transcriptional regulators involved in cellular reactions to interferon-gamma stimulation...

The elusive evidence for chromothripsis
Kinsella, M; Patel, A; Bafna, V
Nucleic Acids Res. 2014, 42, 8231-8242
The chromothripsis hypothesis suggests an extraordinary one-step catastrophic genomic event allowing a chromosome to 'shatter into many pieces' and reassemble into a functioning chromosome. Recent efforts have aimed to detect chromothripsis by looking for a genomic signature, characterized by a large number of breakpoints (50-250), but a limited number of oscillating copy number states (2-3) confined to a few chromosomes. The chromothripsis phenomenon has become widely reported in different cancers, but using inconsistent and sometimes relaxed criteria for determining whether rearrangements occur simultaneously rather than progressively. We revisit the original simulation approach and show that the signature is not clearly exceptional, and can be explained using only progressive rearrangements. For example, 3.9% of progressively simulated chromosomes with 50-55 breakpoints were dominated by two or three copy number states. In addition, by adjusting the parameters of the simulation, the proposed footprint appears more frequently. Lastly, we provide an algorithm to find a sequence of progressive rearrangements that explains all observed breakpoints from a proposed chromothripsis chromosome...
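
To make the idea of progressive rearrangement concrete, the sketch below applies deletions, tandem duplications and inversions one after another to a chromosome represented as a list of numbered segments, then reports which copy number states result. It is a toy model under stated assumptions, not the simulation from the paper: the three event types, the uniform choice of intervals, the unsigned segments and all function names are hypothetical.

```python
import random
from collections import Counter

def delete(chrom, i, j):
    """Remove segments chrom[i:j]."""
    return chrom[:i] + chrom[j:]

def tandem_duplicate(chrom, i, j):
    """Insert a second copy of chrom[i:j] right after the first."""
    return chrom[:j] + chrom[i:j] + chrom[j:]

def invert(chrom, i, j):
    """Reverse the order of segments chrom[i:j] (orientation is not tracked)."""
    return chrom[:i] + chrom[i:j][::-1] + chrom[j:]

def copy_number_states(chrom, n_segments):
    """Distinct copy numbers over the original segments 0..n_segments-1."""
    counts = Counter(chrom)
    return {counts.get(s, 0) for s in range(n_segments)}

def simulate(n_segments, n_events, rng):
    """Apply n_events random rearrangements one after another."""
    chrom = list(range(n_segments))
    for _ in range(n_events):
        if len(chrom) < 2:
            break
        # Pick a non-empty interval [i, j) that never covers the last segment,
        # so the chromosome cannot be deleted entirely.
        i = rng.randrange(len(chrom) - 1)
        j = rng.randrange(i + 1, len(chrom))
        op = rng.choice([delete, tandem_duplicate, invert])
        chrom = op(chrom, i, j)
    return chrom

rng = random.Random(0)  # fixed seed so the run is repeatable
final = simulate(n_segments=100, n_events=25, rng=rng)
print(sorted(copy_number_states(final, 100)))
```

Repeating such a simulation many times and counting how often few copy number states remain is the kind of comparison the abstract refers to, although the event model and parameters here are only placeholders.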

A high-resolution network model for global gene regulation in Mycobacterium tuberculosis
Peterson, EJR; Reiss, DJ; Turkarslan, S; Minch, KJ; Rustad, T; Plaisier, CL; Longabaugh, WJR; Sherman, DR; Baliga, NS
Nucleic Acids Res. 2014, 42, 11291-11303
The resilience of Mycobacterium tuberculosis (MTB) is largely due to its ability to effectively counteract and even take advantage of the hostile environments of a host. In order to accelerate the discovery and characterization of these adaptive mechanisms, we have mined a compendium of 2325 publicly available transcriptome profiles of MTB to decipher a predictive, systems-scale gene regulatory network model. The resulting modular organization of 98% of all MTB genes within this regulatory network was rigorously tested using two independently generated datasets: a genome-wide map of 7248 DNA-binding locations for 143 transcription factors (TFs) and global transcriptional consequences of over-expressing 206 TFs. This analysis has discovered specific TFs that mediate conditional co-regulation of genes within 240 modules across 14 distinct environmental contexts. In addition to recapitulating previously characterized regulons, we discovered 454 novel mechanisms for gene regulation during stress, cholesterol utilization and dormancy...

