NAR Top Articles - Database
miRBase: integrating microRNA annotation and deep-sequencing data
Kozomara, A; Griffiths-Jones, S
Nucleic Acids Res. (2011) 39 (suppl_1): D152-D157
miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15 000 microRNA gene loci in over 140 species, and over 17 000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online at: http://www.mirbase.org/.
The Pfam protein families database
Punta, M; Coggill, PC; Eberhardt, RY; Mistry, J; Tate, J; Boursnell, C; Pang, N; Forslund, K; Ceric, G; Clements, J; Heger, A; Holm, L; Sonnhammer, ELL; Eddy, SR; Bateman, A; Finn, RD
Nucleic Acids Res. (2012) 40 (D1): D290-D301
Pfam is a widely used database of protein families, containing more than 13 000 manually curated families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds...
COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer
Forbes, SA; Bindal, N; Bamford, S; Cole, C; Kok, CY; Beare, D; Jia, MM; Shepherd, R; Leung, K; Menzies, A; Teague, JW; Campbell, PJ; Stratton, MR; Futreal, PA
Nucleic Acids Res. (2011) 39 (suppl_1): D945-D950
COSMIC (http://www.sanger.ac.uk/cosmic) curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136 000 coding mutations in almost 542 000 tumour samples; of the 18 490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases...
starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data
Yang, JH; Li, JH; Shao, P; Zhou, H; Chen, YQ; Qu, LH
Nucleic Acids Res. (2011) 39 (suppl_1): D202-D209
MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (sRNAs) that regulate gene expression by targeting messenger RNAs. However, assigning miRNAs to their regulatory target genes remains technically challenging. Recently, high-throughput CLIP-Seq and degradome sequencing (Degradome-Seq) methods have been applied to identify the sites of Argonaute interaction and miRNA cleavage sites, respectively. In this study, we introduce a novel database, starBase (sRNA target Base), which we have developed to facilitate the comprehensive exploration of miRNA-target interaction maps from CLIP-Seq and Degradome-Seq data. The current version includes high-throughput sequencing data generated from 21 CLIP-Seq and 10 Degradome-Seq experiments from six organisms. By analyzing millions of mapped CLIP-Seq and Degradome-Seq reads, we identified ~1 million Ago-binding clusters and ~2 million cleaved target clusters in animals and plants, respectively...
Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration
Grazulis, S; Daskevic, A; Merkys, A; Chateigner, D; Lutterotti, L; Quiros, M; Serebryanaya, NR; Moeck, P; Downs, RT; Le Bail, A
Nucleic Acids Res. (2012) 40 (D1): D420-D427
Using an open-access distribution model, the Crystallography Open Database (COD, http://www.crystallography.net) collects all known 'small molecule / small to medium sized unit cell' crystal structures and makes them available freely on the Internet. As of today, the COD has aggregated ~150 000 structures, offering basic search capabilities and the possibility to download the whole database, or parts thereof, using a variety of standard open communication protocols. A newly developed website provides capabilities for all registered users to deposit published and so far unpublished structures as personal communications or pre-publication depositions. Such a setup enables extension of the COD database by many users simultaneously. This increases the possibilities for growth of the COD database, and is the first step towards establishing a world wide Internet-based collaborative platform dedicated to the collection and curation of structural knowledge.
InterPro in 2011: new developments in the family and domain prediction database
Hunter, S; Jones, P; Mitchell, A; Apweiler, R; Attwood, TK; Bateman, A; Bernard, T; Binns, D; Bork, P; Burge, S; de Castro, E; Coggill, P; Corbett, M; Das, U; Daugherty, L; Duquenne, L; Finn, RD; Fraser, M; Gough, J; Haft, D; Hulo, N; Kahn, D; Kelly, E; L
Nucleic Acids Res. (2012) 40 (D1): D306-D312
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
MEROPS: the database of proteolytic enzymes, their substrates and inhibitors
Rawlings, ND; Barrett, AJ; Bateman, A
Nucleic Acids Res. (2012) 40 (D1): D343-D350
Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database (http://merops.sanger.ac.uk) aims to fulfil the need for an integrated source of information about these. The database has hierarchical classifications in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The database has been expanded to include proteolytic enzymes other than peptidases. Special identifiers for peptidases from a variety of model organisms have been established so that orthologues can be detected in other species. A table of predicted active-site residue and metal ligand positions and the residue ranges of the peptidase domains in orthologues has been added to each peptidase summary. New displays of tertiary structures, which can be rotated or have the surfaces displayed, have been added to the structure pages. New indexes for gene names and peptidase substrates have been made available. Among the enhancements to existing features are the inclusion of small-molecule inhibitors in the tables of peptidase-inhibitor interactions...
VnD: a structure-centric database of disease-related SNPs and drugs
Yang, JO; Oh, S; Ko, G; Park, SJ; Kim, WY; Lee, B; Lee, S
Nucleic Acids Res. (2011) 39 (suppl_1): D939-D944
Numerous genetic variations have been found to be related to human diseases. A significant portion of these also affect drug response by changing protein structure and function. Therefore, it is crucial to understand the trilateral relationship among genomic variations, diseases and drugs. We present the variations and drugs (VnD), a consolidated database containing information on diseases, related genes and genetic variations, protein structures and drug information. VnD was built in three steps. First, we integrated various resources systematically to deduce catalogs of disease-related genes, single nucleotide polymorphisms (SNPs), protein mutations and relevant drugs. VnD contains 137 195 disease-related gene records (13 940 distinct genes) and 16 586 genetic variation records (1790 distinct variations). Next, we carried out structure modeling and docking simulation for wild-type and mutant proteins to examine the structural and functional consequences of non-synonymous SNPs in the drug-related genes. Conformational changes in 590 wild-type and 4437 mutant proteins from drug-related genes were included in our database...
KEGG for integration and interpretation of large-scale molecular data sets
Kanehisa, M; Goto, S; Sato, Y; Furumichi, M; Tanabe, M
Nucleic Acids Res. (2012) 40 (D1): D109-D114
Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/ or http://www.kegg.jp/) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem. Major efforts have been undertaken to manually create a knowledge base for such systemic functions by capturing and organizing experimental knowledge in computable forms; namely, in the forms of KEGG pathway maps, BRITE functional hierarchies and KEGG modules. Continuous efforts have also been made to develop and improve the cross-species annotation procedure for linking genomes to the molecular networks through the KEGG Orthology system. Here we report KEGG Mapper, a collection of tools for KEGG PATHWAY, BRITE and MODULE mapping, enabling integration and interpretation of large-scale data sets. We also report a variant of the KEGG mapping procedure to extend the knowledge base, where different types of data and knowledge, such as disease genes and drug targets, are integrated as part of the KEGG molecular networks...
The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored
Szklarczyk, D; Franceschini, A; Kuhn, M; Simonovic, M; Roth, A; Minguez, P; Doerks, T; Stark, M; Muller, J; Bork, P; Jensen, LJ; von Mering, C
Nucleic Acids Res. (2011) 39 (suppl_1): D561-D568
An essential prerequisite for any systems-level understanding of cellular functions is to correctly uncover and annotate all functional interactions among proteins in the cell. Toward this goal, remarkable progress has been made in recent years, both in terms of experimental measurements and computational prediction techniques. However, public efforts to collect and present protein interaction information have struggled to keep up with the pace of interaction discovery, partly because protein-protein interaction information can be error-prone and can require considerable effort to annotate. Here, we present an update on the online database resource Search Tool for the Retrieval of Interacting Genes (STRING); it provides uniquely comprehensive coverage of, and ease of access to, both experimental and predicted interaction information. Interactions in STRING are provided with a confidence score, and accessory information such as protein domains and 3D structures is made available, all within a stable and consistent identifier space...