June 2015

SWISS-MODEL: modelling protein tertiary and quaternary structure using evolutionary information
Biasini, M; Bienert, S; Waterhouse, A; Arnold, K; Studer, G; Schmidt, T; Kiefer, F; Cassarino, TG; Bertoni, M; Bordoli, L; Schwede, T
Nucleic Acids Res. 2014, 42, W252-W258
Protein structure homology modelling has become a routine technique to generate 3D models for proteins when experimental structures are not available. Fully automated servers such as SWISS-MODEL with user-friendly web interfaces generate reliable models without the need for complex software packages or downloading large databases. Here, we describe the latest version of the SWISS-MODEL expert system for protein structure modelling. The SWISS-MODEL template library provides annotation of quaternary structure and essential ligands and co-factors to allow for building of complete structural models, including their oligomeric structure. The improved SWISS-MODEL pipeline makes extensive use of model quality estimation for selection of the most suitable templates and provides estimates of the expected accuracy of the resulting models. The accuracy of the models generated by SWISS-MODEL is continuously evaluated by the CAMEO system. The new web site allows users to interactively search for templates, cluster them by sequence similarity, structurally compare alternative templates...

CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing
Montague, TG; Cruz, JM; Gagnon, JA; Church, GM; Valen, E
Nucleic Acids Res. 2014, 42, W401-W407
Major advances in genome editing have recently been made possible with the development of the TALEN and CRISPR/Cas9 methods. The speed and ease of implementing these technologies has led to an explosion of mutant and transgenic organisms. A rate-limiting step in efficiently applying TALEN and CRISPR/Cas9 methods is the selection and design of targeting constructs. We have developed an online tool, CHOPCHOP, to expedite the design process. CHOPCHOP accepts a wide range of inputs (gene identifiers, genomic regions or pasted sequences) and provides an array of advanced options for target selection. It uses efficient sequence alignment algorithms to minimize search times, and rigorously predicts off-target binding of single-guide RNAs (sgRNAs) and TALENs. Each query produces an interactive visualization of the gene with candidate target sites displayed at their genomic positions and color-coded according to quality scores. In addition, for each possible target site, restriction sites and primer candidates are visualized, facilitating a streamlined pipeline of mutant generation and validation. The ease-of-use and speed of CHOPCHOP make it a valuable tool for genome engineering.

Deciphering key features in protein structures with the new ENDscript server
Robert, X; Gouet, P
Nucleic Acids Res. 2014, 42, W320-W324
ENDscript 2 is a friendly Web server for extracting and rendering a comprehensive analysis of primary to quaternary protein structure information in an automated way. This major upgrade has been fully re-engineered to enhance speed, accuracy and usability with interactive 3D visualization. It takes advantage of the new version 3 of ESPript, our well-known sequence alignment renderer, improved to handle a large amount of data with reduced computation time. From a single PDB entry or file, ENDscript produces high quality figures displaying multiple sequence alignment of proteins homologous to the query, colored according to residue conservation. Furthermore, the experimental secondary structure elements and a detailed set of relevant biophysical and structural data are depicted. All this information and more are now mapped on interactive 3D PyMOL representations. Thanks to its adaptive and rigorous algorithm, beginner to expert users can modify settings to fine-tune ENDscript to their needs. ENDscript has also been upgraded as an open platform for the visualization of multiple biochemical and structural data coming from external biotool Web servers, with both 2D and 3D representations.

RBPmap: a web server for mapping binding sites of RNA-binding proteins
Paz, I; Kosti, I; Ares, M; Cline, M; Mandel-Gutfreund, Y
Nucleic Acids Res. 2014, 42, W361-W367
Regulation of gene expression is executed in many cases by RNA-binding proteins (RBPs) that bind to mRNAs as well as to non-coding RNAs. RBPs recognize their RNA target via specific binding sites on the RNA. Predicting the binding sites of RBPs is known to be a major challenge. We present a new webserver, RBPmap, freely accessible through its website, for accurate prediction and mapping of RBP binding sites. RBPmap has been developed specifically for mapping RBPs in human, mouse and Drosophila melanogaster genomes, though it supports other organisms too. RBPmap enables the users to select motifs from a large database of experimentally defined motifs. In addition, users can provide any motif of interest, given as either a consensus or a PSSM. The algorithm for mapping the motifs is based on a Weighted-Rank approach, which considers the clustering propensity of the binding sites and the overall tendency of regulatory regions to be conserved. In addition, RBPmap incorporates a position-specific background model, designed uniquely for different genomic regions, such as splice sites, 5' and 3' UTRs, non-coding RNA and intergenic regions...

WEB-based GEne SeT AnaLysis Toolkit (WebGestalt): update 2013
Wang, J; Duncan, D; Shi, Z; Zhang, B
Nucleic Acids Res. 2013, 41, W77-W83
Functional enrichment analysis is an essential task for the interpretation of gene lists derived from large-scale genetic, transcriptomic and proteomic studies. WebGestalt (WEB-based GEne SeT AnaLysis Toolkit) has become one of the popular software tools in this field since its publication in 2005. For the last 7 years, WebGestalt data holdings have grown substantially to satisfy the requirements of users from different research areas. The current version of WebGestalt supports 8 organisms and 201 gene identifiers from various databases and different technology platforms, making it directly available to the fast growing omics community. Meanwhile, by integrating functional categories derived from centrally and publicly curated databases as well as computational analyses, WebGestalt has significantly increased the coverage of functional categories in various biological contexts including Gene Ontology, pathway, network module, gene-phenotype association, gene-disease association, gene-drug association and chromosomal location, leading to a total of 78 612 functional categories. Finally, new interactive features, such as pathway map, hierarchical network visualization and phenotype ontology visualization have been added to WebGestalt...

antiSMASH 2.0--a versatile platform for genome mining of secondary metabolite producers
Blin, K; Medema, MH; Kazempour, D; Fischbach, MA; Breitling, R; Takano, E; Weber, T
Nucleic Acids Res. 2013, 41, W204-W212
Microbial secondary metabolites are a potent source of antibiotics and other pharmaceuticals. Genome mining of their biosynthetic gene clusters has become a key method to accelerate their identification and characterization. In 2011, we developed antiSMASH, a web-based analysis platform that automates this process. Here, we present the highly improved antiSMASH 2.0 release, available online. For the new version, antiSMASH was entirely re-designed using a plug-and-play concept that allows easy integration of novel predictor or output modules. antiSMASH 2.0 now supports input of multiple related sequences simultaneously (multi-FASTA/GenBank/EMBL), which allows the analysis of draft genomes comprising multiple contigs. Moreover, direct analysis of protein sequences is now possible. antiSMASH 2.0 has also been equipped with the capacity to detect additional classes of secondary metabolites, including oligosaccharide antibiotics, phenazines, thiopeptides, homoserine lactones, phosphonates and furans. The algorithm for predicting the core structure of the cluster end product is now also covering lantipeptides...

IgBLAST: an immunoglobulin variable domain sequence analysis tool
Ye, J; Ma, N; Madden, TL; Ostell, JM
Nucleic Acids Res. 2013, 41, W34-W40
The variable domain of an immunoglobulin (IG) sequence is encoded by multiple genes, including the variable (V) gene, the diversity (D) gene and the joining (J) gene. Analysis of IG sequences typically requires identification of each gene, as well as a comparison of sequence variations in the context of defined regions. General purpose tools, such as the BLAST program, have only limited use for such tasks, as the rearranged nature of an IG sequence and the variable length of each gene requires multiple rounds of BLAST searches for a single IG sequence. Additionally, manual assembly of different genes is difficult and error-prone. To address these issues and to facilitate other common tasks in analysing IG sequences, we have developed the sequence analysis tool IgBLAST. With this tool, users can view the matches to the germline V, D and J genes, details at rearrangement junctions, the delineation of IG V domain framework regions and complementarity determining regions. IgBLAST has the capability to analyse nucleotide and protein sequences and can process sequences in batches. Furthermore, IgBLAST allows searches against the germline...

PredictProtein--an open resource for online prediction of protein structural and functional features
Yachdav, G; Kloppmann, E; Kajan, L; Hecht, M; Goldberg, T; Hamp, T; Honigschmid, P; Schafferhans, A; Roos, M; Bernhofer, M; Richter, L; Ashkenazy, H; Punta, M; Schlessinger, A; Bromberg, Y; Schneider, R; Vriend, G; Sander, C; Ben-Tal, N; Rost, B
Nucleic Acids Res. 2014, 42, W337-W343
PredictProtein is a meta-service for sequence analysis that has been predicting structural and functional features of proteins since 1992. Queried with a protein sequence it returns: multiple sequence alignments, predicted aspects of structure (secondary structure, solvent accessibility, transmembrane helices (TMSEG) and strands, coiled-coil regions, disulfide bonds and disordered regions) and function. The service incorporates analysis methods for the identification of functional regions (ConSurf), homology-based inference of Gene Ontology terms (metastudent), comprehensive subcellular localization prediction (LocTree3), protein-protein binding sites (ISIS2), protein-polynucleotide binding sites (SomeNA) and predictions of the effect of point mutations (non-synonymous SNPs) on protein function (SNAP2). Our goal has always been to develop a system optimized to meet the demands of experimentalists not highly experienced in bioinformatics...

LINCS Canvas Browser: interactive web app to query, browse and interrogate LINCS L1000 gene expression signatures
Duan, QN; Flynn, C; Niepel, M; Hafner, M; Muhlich, JL; Fernandez, NF; Rouillard, AD; Tan, CM; Chen, EY; Golub, TR; Sorger, PK; Subramanian, A; Ma'ayan, A
Nucleic Acids Res. 2014, 42, W449-W460
For the Library of Integrated Network-based Cellular Signatures (LINCS) project many gene expression signatures using the L1000 technology have been produced. The L1000 technology is a cost-effective method to profile gene expression at large scale. LINCS Canvas Browser (LCB) is an interactive HTML5 web-based software application that facilitates querying, browsing and interrogating much of the currently available LINCS L1000 data. LCB implements two compacted layered canvases, one to visualize clustered L1000 expression data, and the other to display enrichment analysis results using 30 different gene set libraries. Clicking on an experimental condition highlights gene-sets enriched for the differentially expressed genes from the selected experiment. A search interface allows users to input gene lists and query them against over 100 000 conditions to find the top matching experiments. The tool integrates many resources for an unprecedented potential for new discoveries in systems biology and systems pharmacology.

deepTools: a flexible platform for exploring deep-sequencing data
Ramirez, F; Dundar, F; Diehl, S; Gruning, BA; Manke, T
Nucleic Acids Res. 2014, 42, W187-W191
We present a Galaxy based web server for processing and visualizing deeply sequenced data. The web server's core functionality consists of a suite of newly developed tools, called deepTools, that enable users with little bioinformatic background to explore the results of their sequencing experiments in a standardized setting. Users can upload pre-processed files with continuous data in standard formats and generate heatmaps and summary plots in a straight-forward, yet highly customizable manner. In addition, we offer several tools for the analysis of files containing aligned reads and enable efficient and reproducible generation of normalized coverage files. As a modular and open-source platform, deepTools can easily be expanded and customized to future demands and developments. The deepTools webserver is freely available and is accompanied by extensive documentation and tutorials aimed at conveying the principles of deep-sequencing data analysis. The web server can be used without registration. deepTools can be installed locally either stand-alone or as part of Galaxy.

