NAR Top Articles - Computational Biology

December 2013


MiRNA-miRNA synergistic network: construction via co-regulating functional modules and disease miRNA topological features
Xu, JA; Li, CX; Li, YS; Lv, JY; Ma, Y; Shao, TT; Xu, LD; Wang, YY; Du, L; Zhang, YP; Jiang, W; Li, CQ; Xiao, Y; Li, X
Nucleic Acids Res. (2011) 39 (3): 825-836
Free Full Text
Synergistic regulations among multiple microRNAs (miRNAs) are important to understand the mechanisms of complex post-transcriptional regulations in humans. Complex diseases are affected by several miRNAs rather than a single miRNA. So, it is a challenge to identify miRNA synergism and thereby further determine miRNA functions at a system-wide level and investigate disease miRNA features in the miRNA-miRNA synergistic network from a new view. Here, we constructed a miRNA-miRNA functional synergistic network (MFSN) via co-regulating functional modules that have three features: common targets of corresponding miRNA pairs, enriched in the same gene ontology category and close proximity in the protein interaction network. Predicted miRNA synergism is validated by significantly high co-expression of functional modules and significantly negative regulation to functional modules. We found that the MFSN exhibits a scale free, small world and modular architecture. Furthermore, the topological features of disease miRNAs in the MFSN are distinct from non-disease miRNAs...
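
As a rough illustration of the first of the three module features named above (common targets of a miRNA pair), the sketch below scores target overlap for each pair with a hypergeometric test. It is a toy under stated assumptions, not the authors' pipeline: the abstract does not name a statistical test, and the function names `hypergeom_sf` and `shared_target_pairs`, the `alpha` threshold and the example miRNA and gene identifiers are all hypothetical. The gene ontology enrichment and protein interaction proximity filters are omitted.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are 'hits'."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def shared_target_pairs(targets, n_genes, alpha=0.05):
    """Return miRNA pairs whose target overlap is unlikely by chance.

    targets: dict mapping a miRNA name to a set of target gene names
    n_genes: size of the background gene universe (an assumption)
    """
    pairs = {}
    for a, b in combinations(sorted(targets), 2):
        common = targets[a] & targets[b]
        if not common:
            continue
        p = hypergeom_sf(len(common), n_genes, len(targets[a]), len(targets[b]))
        if p < alpha:
            pairs[(a, b)] = (sorted(common), p)
    return pairs


# Hypothetical toy input: three miRNAs and their predicted targets.
targets = {
    "miR-A": {"g1", "g2", "g3", "g4"},
    "miR-B": {"g2", "g3", "g4", "g5"},
    "miR-C": {"g9"},
}
pairs = shared_target_pairs(targets, n_genes=100)
for (a, b), (common, p) in pairs.items():
    print(a, b, common, f"{p:.2e}")
```

In a fuller version, each pair's common targets would presumably be filtered further by functional enrichment and network proximity before an edge is added to the network.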

A comprehensive comparison of RNA-Seq-based transcriptome analysis from reads to differential gene expression and cross-comparison with microarrays: a case study in Saccharomyces cerevisiae
Nookaew, I; Papini, M; Pornputtapong, N; Scalcinati, G; Fagerberg, L; Uhlen, M; Nielsen, J
Nucleic Acids Res. (2012) 40 (20): 10084-10097
Free Full Text
RNA-seq has recently become an attractive method of choice in the studies of transcriptomes, promising several advantages compared with microarrays. In this study, we sought to assess the contribution of the different analytical steps involved in the analysis of RNA-seq data generated with the Illumina platform, and to perform a cross-platform comparison based on the results obtained through Affymetrix microarray. As a case study for our work, we used the Saccharomyces cerevisiae strain CEN.PK 113-7D, grown under two different conditions (batch and chemostat). Here, we assess the influence of genetic variation on the estimation of gene expression level using three different aligners for read-mapping (Gsnap, Stampy and TopHat) on S288c genome, the capabilities of five different statistical methods to detect differential gene expression (baySeq, Cuffdiff, DESeq, edgeR and NOISeq) and we explored the consistency between RNA-seq analysis using reference genome and de novo assembly approach...

Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation
McCarthy, DJ; Chen, YS; Smyth, GK
Nucleic Acids Res. (2012) 40 (10): 4288-4297
Free Full Text
A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model methods (GLMs) to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability...

miRDeep2 accurately identifies known and hundreds of novel microRNA genes in seven animal clades
Friedlander, MR; Mackowiak, SD; Li, N; Chen, W; Rajewsky, N
Nucleic Acids Res. (2012) 40 (1): 37-52
Free Full Text
microRNAs (miRNAs) are a large class of small non-coding RNAs which post-transcriptionally regulate the expression of a large fraction of all animal genes and are important in a wide range of biological processes. Recent advances in high-throughput sequencing allow miRNA detection at unprecedented sensitivity, but the computational task of accurately identifying the miRNAs in the background of sequenced RNAs remains challenging. For this purpose, we have designed miRDeep2, a substantially improved algorithm which identifies canonical and non-canonical miRNAs such as those derived from transposable elements and informs on high-confidence candidates that are detected in multiple independent samples. Analyzing data from seven animal species representing the major animal clades, miRDeep2 identified miRNAs with an accuracy of 98.6-99.9% and reported hundreds of novel miRNAs. To test the accuracy of miRDeep2, we knocked down the miRNA biogenesis pathway in a human cell line and sequenced small RNAs before and after. The vast majority of the >100 novel miRNAs expressed in this cell line were indeed specifically downregulated, validating most miRDeep2 predictions...

Evolution and function of CAG/polyglutamine repeats in protein-protein interaction networks
Schaefer, MH; Wanker, EE; Andrade-Navarro, MA
Nucleic Acids Res. (2012) 40 (10): 4273-4287
Free Full Text
Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases such as Huntington's and several ataxias. Protein aggregation, which is a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed available information stored in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This happens most likely through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function due to polyQ expansion results in gain of abnormal interactions...

Quantitative prediction of 3D solution shape and flexibility of nucleic acid nanostructures
Kim, DN; Kilchherr, F; Dietz, H; Bathe, M
Nucleic Acids Res. (2012) 40 (7): 2862-2868
Free Full Text
DNA nanotechnology enables the programmed synthesis of intricate nanometer-scale structures for diverse applications in materials and biological science. Precise control over the 3D solution shape and mechanical flexibility of target designs is important to achieve desired functionality. Because experimental validation of designed nanostructures is time-consuming and cost-intensive, predictive physical models of nanostructure shape and flexibility have the capacity to enhance dramatically the design process. Here, we significantly extend and experimentally validate a computational modeling framework for DNA origami previously presented as CanDo [Castro,C.E., Kilchherr,F., Kim,D.-N., Shiao,E.L., Wauer,T., Wortmann,P., Bathe,M., Dietz,H. (2011) A primer to scaffolded DNA origami. Nat. Meth., 8, 221-229.]. 3D solution shape and flexibility are predicted from basepair connectivity maps now accounting for nicks in the DNA double helix, entropic elasticity of single-stranded DNA, and distant crossovers required to model wireframe structures, in addition to previous modeling (Castro,C.E., et al.) that accounted only for the canonical twist, bend and stretch stiffness of double-helical DNA domains.

Discovery of multi-dimensional modules by integrative analysis of cancer genomic data
Zhang, SH; Liu, CC; Li, WY; Shen, H; Laird, PW; Zhou, XJ
Nucleic Acids Res. (2012) 40 (19): 9379-9391
Free Full Text
Recent technology has made it possible to simultaneously perform multi-platform genomic profiling (e.g. DNA methylation (DM) and gene expression (GE)) of biological samples, resulting in so-called 'multi-dimensional genomic data'. Such data provide unique opportunities to study the coordination between regulatory mechanisms on multiple levels. However, integrative analysis of multi-dimensional genomics data for the discovery of combinatorial patterns is currently lacking. Here, we adopt a joint matrix factorization technique to address this challenge. This method projects multiple types of genomic data onto a common coordinate system, in which heterogeneous variables weighted highly in the same projected direction form a multi-dimensional module (md-module). Genomic variables in such modules are characterized by significant correlations and likely functional associations. We applied this method to the DM, GE, and microRNA expression data of 385 ovarian cancer samples from The Cancer Genome Atlas project. These md-modules revealed perturbed pathways that would have been overlooked with only a single type of data...
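
The idea of projecting several data types onto a common coordinate system can be sketched as a joint non-negative matrix factorization, in which every data matrix shares one sample-by-factor matrix `W` and gets its own factor-by-feature matrix `H`. This is only a minimal sketch under assumptions: the abstract says "joint matrix factorization" without giving details, so the non-negativity constraint, the multiplicative update rule, the function names `joint_nmf` and `reconstruction_error`, and the synthetic matrices `X_meth` and `X_expr` are all illustrative choices rather than the authors' implementation.

```python
import numpy as np


def joint_nmf(matrices, k, n_iter=200, eps=1e-9, seed=0):
    """Factor each non-negative X_i (samples x features_i) as W @ H_i.

    W is shared across all data types; each H_i is specific to one type.
    Uses Lee-Seung style multiplicative updates (an assumed choice).
    """
    rng = np.random.default_rng(seed)
    n_samples = matrices[0].shape[0]
    W = rng.random((n_samples, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in matrices]
    for _ in range(n_iter):
        # Update the shared basis using every data type at once.
        numer = sum(X @ H.T for X, H in zip(matrices, Hs))
        denom = W @ sum(H @ H.T for H in Hs) + eps
        W *= numer / denom
        # Update each type-specific coefficient matrix in place.
        WtW = W.T @ W
        for X, H in zip(matrices, Hs):
            H *= (W.T @ X) / (WtW @ H + eps)
    return W, Hs


def reconstruction_error(matrices, W, Hs):
    """Sum of squared Frobenius errors over all data types."""
    return sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(matrices, Hs))


# Synthetic stand-ins for two data types measured on the same 20 samples.
rng = np.random.default_rng(1)
W_true = rng.random((20, 2))
X_meth = W_true @ rng.random((2, 15))  # e.g. methylation features
X_expr = W_true @ rng.random((2, 10))  # e.g. expression features

W, Hs = joint_nmf([X_meth, X_expr], k=2)
print(reconstruction_error([X_meth, X_expr], W, Hs))
```

Under this reading, features from different data types that carry large weights in the same row of their respective `H` matrices would be candidates for one md-module; how the weights are thresholded is left open here.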

Gene network inference and visualization tools for biologists: application to new human transcriptome datasets
Hurley, D; Araki, H; Tamada, Y; Dunmore, B; Sanders, D; Humphreys, S; Affara, M; Imoto, S; Yasuda, K; Tomiyasu, Y; Tashiro, K; Savoie, C; Cho, VK; Smith, S; Kuhara, S; Miyano, S; Charnock-Jones, DS; Crampin, EJ; Print, CG
Nucleic Acids Res. (2012) 40 (6): 2377-2398
Free Full Text
Gene regulatory networks inferred from RNA abundance data have generated significant interest, but despite this, gene network approaches are used infrequently and often require input from bioinformaticians. We have assembled a suite of tools for analysing regulatory networks, and we illustrate their use with microarray datasets generated in human endothelial cells. We infer a range of regulatory networks, and based on this analysis discuss the strengths and limitations of network inference from RNA abundance data. We welcome contact from researchers interested in using our inference and visualization tools to answer biological questions.

Performance comparison and evaluation of software tools for microRNA deep-sequencing data analysis
Li, Y; Zhang, Z; Liu, F; Vongsangnak, W; Jing, Q; Shen, BR
Nucleic Acids Res. (2012) 40 (10): 4298-4305
Free Full Text
With the development of next-generation sequencing (NGS) techniques, many software tools have emerged for the discovery of novel microRNAs (miRNAs) and for analyzing miRNA expression profiles. An overall evaluation of these diverse software tools is lacking. In this study, we evaluated eight software tools based on their common features and key algorithms. Three deep-sequencing data sets were collected from different species and used to assess the computational time, sensitivity and accuracy of detecting known miRNAs as well as their capacity for predicting novel miRNAs. Our results provide useful information for researchers to facilitate their selection of the optimal software tools for miRNA analysis depending on their specific requirements, i.e. novel miRNA discovery or miRNA expression profile analysis of sequencing data sets.

Genome-wide analysis reveals distinct patterns of epigenetic features in long non-coding RNA loci
Sati, S; Ghosh, S; Jain, V; Scaria, V; Sengupta, S
Nucleic Acids Res. (2012) 40 (20): 10018-10031
Free Full Text
A major fraction of the transcriptome of higher organisms comprises an extensive repertoire of long non-coding RNA (lncRNA) which is expressed in a cell type and development stage-specific manner. While lncRNAs are a proven component of epigenetic gene expression modulation, epigenetic regulation of lncRNA itself remains poorly understood. Here we have analysed pan-genomic DNA methylation and histone modification marks (H3K4me3, H3K9me3, H3K27me3 and H3K36me3) associated with the transcription start site (TSS) of lncRNA in four different cell types and three different tissue types representing various cellular stages. We observe that histone marks associated with active transcription, H3K4me3 and H3K36me3, along with the repressive histone mark H3K27me3, have a similar distribution pattern around the TSS irrespective of cell type. Also, the density of these marks correlates well with expression of protein-coding and lncRNA genes. In contrast, the lncRNA genes harbour higher methylation density around the TSS than protein-coding genes regardless of their expression status...
