NAR Top Articles - Database


December 2013

miRBase: integrating microRNA annotation and deep-sequencing data
Kozomara, A; Griffiths-Jones, S
Nucleic Acids Res. (2011) 39 (suppl_1): D152-D157
miRBase is the primary online repository for all microRNA sequences and annotation. The current release (miRBase 16) contains over 15 000 microRNA gene loci in over 140 species, and over 17 000 distinct mature microRNA sequences. Deep-sequencing technologies have delivered a sharp rise in the rate of novel microRNA discovery. We have mapped reads from short RNA deep-sequencing experiments to microRNAs in miRBase and developed web interfaces to view these mappings. The user can view all read data associated with a given microRNA annotation, filter reads by experiment and count, and search for microRNAs by tissue- and stage-specific expression. These data can be used as a proxy for relative expression levels of microRNA sequences, provide detailed evidence for microRNA annotations and alternative isoforms of mature microRNAs, and allow us to revisit previous annotations. miRBase is available online.

The Pfam protein families database
Punta, M; Coggill, PC; Eberhardt, RY; Mistry, J; Tate, J; Boursnell, C; Pang, N; Forslund, K; Ceric, G; Clements, J; Heger, A; Holm, L; Sonnhammer, ELL; Eddy, SR; Bateman, A; Finn, RD
Nucleic Acids Res. (2012) 40 (D1): D290-D301
Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK, the USA and Sweden. Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds...

starBase: a database for exploring microRNA-mRNA interaction maps from Argonaute CLIP-Seq and Degradome-Seq data
Yang, JH; Li, JH; Shao, P; Zhou, H; Chen, YQ; Qu, LH
Nucleic Acids Res. (2011) 39 (suppl_1): D202-D209
MicroRNAs (miRNAs) represent an important class of small non-coding RNAs (sRNAs) that regulate gene expression by targeting messenger RNAs. However, assigning miRNAs to their regulatory target genes remains technically challenging. Recently, high-throughput CLIP-Seq and degradome sequencing (Degradome-Seq) methods have been applied to identify the sites of Argonaute interaction and miRNA cleavage sites, respectively. In this study, we introduce a novel database, starBase (sRNA target Base), which we have developed to facilitate the comprehensive exploration of miRNA-target interaction maps from CLIP-Seq and Degradome-Seq data. The current version includes high-throughput sequencing data generated from 21 CLIP-Seq and 10 Degradome-Seq experiments from six organisms. By analyzing millions of mapped CLIP-Seq and Degradome-Seq reads, we identified ~1 million Ago-binding clusters and ~2 million cleaved target clusters in animals and plants, respectively...

COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer
Forbes, SA; Bindal, N; Bamford, S; Cole, C; Kok, CY; Beare, D; Jia, MM; Shepherd, R; Leung, K; Menzies, A; Teague, JW; Campbell, PJ; Stratton, MR; Futreal, PA
Nucleic Acids Res. (2011) 39 (suppl_1): D945-D950
COSMIC curates comprehensive information on somatic mutations in human cancer. Release v48 (July 2010) describes over 136 000 coding mutations in almost 542 000 tumour samples; of the 18 490 genes documented, 4803 (26%) have one or more mutations. Full scientific literature curations are available on 83 major cancer genes and 49 fusion gene pairs (19 new cancer genes and 30 new fusion pairs this year) and this number is continually increasing. Key amongst these is TP53, now available through a collaboration with the IARC p53 database. In addition to data from the Cancer Genome Project (CGP) at the Sanger Institute, UK, and The Cancer Genome Atlas project (TCGA), large systematic screens are also now curated. Major website upgrades now make these data much more mineable, with many new selection filters and graphics. A Biomart is now available allowing more automated data mining and integration with other biological databases...

mESAdb: microRNA Expression and Sequence Analysis Database
Kaya, KD; Karakulah, G; Yakicier, CM; Acar, AC; Konu, O
Nucleic Acids Res. (2011) 39 (suppl_1): D170-D180
The microRNA expression and sequence analysis database (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data.

The 2012 Nucleic Acids Research Database Issue and the online Molecular Biology Database Collection
Galperin, MY; Fernandez-Suarez, XM
Nucleic Acids Res. (2012) 40 (D1): D1-D8
The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection has been updated and now lists 1380 databases...

Crystallography Open Database (COD): an open-access collection of crystal structures and platform for world-wide collaboration
Grazulis, S; Daskevic, A; Merkys, A; Chateigner, D; Lutterotti, L; Quiros, M; Serebryanaya, NR; Moeck, P; Downs, RT; Le Bail, A
Nucleic Acids Res. (2012) 40 (D1): D420-D427
Using an open-access distribution model, the Crystallography Open Database (COD) collects all known 'small molecule / small to medium sized unit cell' crystal structures and makes them available freely on the Internet. As of today, the COD has aggregated ~150 000 structures, offering basic search capabilities and the possibility to download the whole database, or parts thereof, using a variety of standard open communication protocols. A newly developed website provides capabilities for all registered users to deposit published and so far unpublished structures as personal communications or pre-publication depositions. Such a setup enables extension of the COD database by many users simultaneously. This increases the possibilities for growth of the COD database, and is the first step towards establishing a world wide Internet-based collaborative platform dedicated to the collection and curation of structural knowledge.

MEROPS: the database of proteolytic enzymes, their substrates and inhibitors
Rawlings, ND; Barrett, AJ; Bateman, A
Nucleic Acids Res. (2012) 40 (D1): D343-D350
Peptidases, their substrates and inhibitors are of great relevance to biology, medicine and biotechnology. The MEROPS database aims to fulfil the need for an integrated source of information about these. The database has hierarchical classifications in which homologous sets of peptidases and protein inhibitors are grouped into protein species, which are grouped into families, which are in turn grouped into clans. The database has been expanded to include proteolytic enzymes other than peptidases. Special identifiers for peptidases from a variety of model organisms have been established so that orthologues can be detected in other species. A table of predicted active-site residue and metal ligand positions and the residue ranges of the peptidase domains in orthologues has been added to each peptidase summary. New displays of tertiary structures, which can be rotated or have the surfaces displayed, have been added to the structure pages. New indexes for gene names and peptidase substrates have been made available. Among the enhancements to existing features are the inclusion of small-molecule inhibitors in the tables of peptidase-inhibitor interactions...

InterPro in 2011: new developments in the family and domain prediction database
Hunter, S; Jones, P; Mitchell, A; Apweiler, R; Attwood, TK; Bateman, A; Bernard, T; Binns, D; Bork, P; Burge, S; de Castro, E; Coggill, P; Corbett, M; Das, U; Daugherty, L; Duquenne, L; Finn, RD; Fraser, M; Gough, J; Haft, D; Hulo, N; Kahn, D; Kelly, E; L
Nucleic Acids Res. (2012) 40 (D1): D306-D312
InterPro is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.

The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes
McCall, MN; Uppal, K; Jaffee, HA; Zilliox, MJ; Irizarry, RA
Nucleic Acids Res. (2011) 39 (suppl_1): D1011-D1015
Various databases have harnessed the wealth of publicly available microarray data to address biological questions ranging from across-tissue differential expression to homologous gene expression. Despite their practical value, these databases rely on relative measures of expression and are unable to address the most fundamental question: which genes are expressed in a given cell type. The Gene Expression Barcode is the first database to provide reliable absolute measures of expression for most annotated genes for 131 human and 89 mouse tissue types, including diseased tissue. This is made possible by a novel algorithm that leverages information from the GEO and ArrayExpress public repositories to build statistical models that permit converting data from a single microarray into expressed/unexpressed calls for each gene. For selected platforms, users may upload data and obtain results in a matter of seconds. The raw data, curated annotation, and code used to create our resource are also available online.
