NAR Top Articles - Genomics


April 2015

Key components of the eight classes of type IV secretion systems involved in bacterial conjugation or protein secretion
Guglielmini, J; Neron, B; Abby, SS; Garcillan-Barcia, MP; de la Cruz, F; Rocha, EPC
Nucleic Acids Res. 2014, 42, 5715-5727
Free Full Text
Conjugation of DNA through a type IV secretion system (T4SS) drives horizontal gene transfer. Yet little is known on the diversity of these nanomachines. We previously found that T4SS can be divided in eight classes based on the phylogeny of the only ubiquitous protein of T4SS (VirB4). Here, we use an ab initio approach to identify protein families systematically and specifically associated with VirB4 in each class. We built profiles for these proteins and used them to scan 2262 genomes for the presence of T4SS. Our analysis led to the identification of thousands of occurrences of 116 protein families for a total of 1623 T4SS. Importantly, we could identify almost always in our profiles the essential genes of well-studied T4SS. This allowed us to build a database with the largest number of T4SS described to date. Using profile-profile alignments, we reveal many new cases of homology between components of distant classes of T4SS. We mapped these similarities on the T4SS phylogenetic tree and thus obtained the patterns of acquisition and loss of these protein families in the history of T4SS...

Optimization of scarless human stem cell genome editing
Yang, LH; Guell, M; Byrne, S; Yang, JL; De Los Angeles, A; Mali, P; Aach, J; Kim-Kiselak, C; Briggs, AW; Rios, X; Huang, PY; Daley, G; Church, G
Nucleic Acids Res. 2013, 41, 9049-9061
Free Full Text
Efficient strategies for precise genome editing in human-induced pluripotent cells (hiPSCs) will enable sophisticated genome engineering for research and clinical purposes. The development of programmable sequence-specific nucleases such as Transcription Activator-Like Effectors Nucleases (TALENs) and Cas9-gRNA allows genetic modifications to be made more efficiently at targeted sites of interest. However, many opportunities remain to optimize these tools and to enlarge their spheres of application. We present several improvements: First, we developed functional re-coded TALEs (reTALEs), which not only enable simple one-pot TALE synthesis but also allow TALE-based applications to be performed using lentiviral vectors. We then compared genome-editing efficiencies in hiPSCs mediated by 15 pairs of reTALENs and Cas9-gRNA targeting CCR5 and optimized ssODN design in conjunction with both methods for introducing specific mutations. We found Cas9-gRNA achieved 7-8x higher non-homologous end joining efficiencies (3%) than reTALENs (0.4%) and moderately superior homology-directed repair efficiencies (1.0 versus 0.6%) when combined with ssODN...

Prospective identification of parasitic sequences in phage display screens
Matochko, WL; Li, SC; Tang, SKY; Derda, R
Nucleic Acids Res. 2014, 42, 1784-1798
Free Full Text
Phage display empowered the development of proteins with new function and ligands for clinically relevant targets. In this report, we use next-generation sequencing to analyze phage-displayed libraries and uncover a strong bias induced by amplification preferences of phage in bacteria. This bias favors fast-growing sequences that collectively constitute <0.01% of the available diversity. Specifically, a library of 10^9 random 7-mer peptides (Ph.D.-7) includes a few thousand sequences that grow quickly (the 'parasites'), which are the sequences that are typically identified in phage display screens published to date. A similar collapse was observed in other libraries. Using Illumina and Ion Torrent sequencing and multiple biological replicates of amplification of Ph.D.-7 library, we identified a focused population of 770 'parasites'. In all, 197 sequences from this population have been identified in literature reports that used Ph.D.-7 library. Many of these enriched sequences have confirmed function (e.g. target binding capacity). The bias in the literature, thus, can be viewed as a selection with two different selection pressures: (i) target-binding selection, and (ii) amplification-induced selection...

A comparison of dense transposon insertion libraries in the Salmonella serovars Typhi and Typhimurium
Barquist, L; Langridge, GC; Turner, DJ; Phan, MD; Turner, AK; Bateman, A; Parkhill, J; Wain, J; Gardner, PP
Nucleic Acids Res. 2013, 41, 4549-4564
Free Full Text
Salmonella Typhi and Typhimurium diverged only ~50 000 years ago, yet have very different host ranges and pathogenicity. Despite the availability of multiple whole-genome sequences, the genetic differences that have driven these changes in phenotype are only beginning to be understood. In this study, we use transposon-directed insertion-site sequencing to probe differences in gene requirements for competitive growth in rich media between these two closely related serovars. We identify a conserved core of 281 genes that are required for growth in both serovars, 228 of which are essential in Escherichia coli. We are able to identify active prophage elements through the requirement for their repressors. We also find distinct differences in requirements for genes involved in cell surface structure biogenesis and iron utilization. Finally, we demonstrate that transposon-directed insertion-site sequencing is not only applicable to the protein-coding content of the cell but also has sufficient resolution to generate hypotheses regarding the functions of non-coding RNAs (ncRNAs) as well...

A comprehensive survey of non-canonical splice sites in the human transcriptome
Parada, GE; Munita, R; Cerda, CA; Gysling, K
Nucleic Acids Res. 2014, 42, 10564-10578
Free Full Text
We uncovered the diversity of non-canonical splice sites at the human transcriptome using deep transcriptome profiling. We mapped a total of 3.7 billion human RNA-seq reads and developed a set of stringent filters to avoid false non-canonical splice site detections. We identified 184 splice sites with non-canonical dinucleotides and U2/U12-like consensus sequences. We selected 10 of the herein identified U2/U12-like non-canonical splice site events and successfully validated 9 of them via reverse transcriptase-polymerase chain reaction and Sanger sequencing. Analyses of the 184 U2/U12-like non-canonical splice sites indicate that 51% of them are not annotated in GENCODE. In addition, 28% of them are conserved in mouse and 76% are involved in alternative splicing events, some of them with tissue-specific alternative splicing patterns. Interestingly, our analysis identified some U2/U12-like non-canonical splice sites that are converted into canonical splice sites by RNA A-to-I editing. Moreover, the U2/U12-like non-canonical splice sites have a differential distribution of splicing regulatory sequences, which may contribute to their recognition and regulation...

metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA
Dale, RK; Matzat, LH; Lei, EP
Nucleic Acids Res. 2014, 42, 9158-9170
Free Full Text
Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data...

Mapping of six somatic linker histone H1 variants in human breast cancer cells uncovers specific features of H1.2
Millan-Arino, L; Islam, AMMK; Izquierdo-Bouldstridge, A; Mayor, R; Terme, JM; Luque, N; Sancho, M; Lopez-Bigas, N; Jordan, A
Nucleic Acids Res. 2014, 42, 4474-4493
Free Full Text
Seven linker histone H1 variants are present in human somatic cells with distinct prevalence across cell types. Despite being key structural components of chromatin, it is not known whether the different variants have specific roles in the regulation of nuclear processes or are differentially distributed throughout the genome. Using variant-specific antibodies to H1 and hemagglutinin (HA)-tagged recombinant H1 variants expressed in breast cancer cells, we have investigated the distribution of six H1 variants in promoters and genome-wide. H1 is depleted at promoters depending on its transcriptional status and differs between variants. Notably, H1.2 is less abundant than other variants at the transcription start sites of inactive genes, and promoters enriched in H1.2 are different from those enriched in other variants and tend to be repressed. Additionally, H1.2 is enriched at chromosomal domains characterized by low guanine-cytosine (GC) content and is associated with lamina-associated domains. Meanwhile, other variants are associated with higher GC content, CpG islands and gene-rich domains...

Single-cell paired-end genome sequencing reveals structural variation per cell cycle
Voet, T; Kumar, P; Van Loo, P; Cooke, SL; Marshall, J; Lin, ML; Esteki, MZ; Van der Aa, N; Mateiu, L; McBride, DJ; Bignell, GR; McLaren, S; Teague, J; Butler, A; Raine, K; Stebbings, LA; Quail, MA; D'Hooghe, T; Moreau, Y; Futreal, PA; Stratton, MR; Vermee
Nucleic Acids Res. 2013, 41, 6119-6138
Free Full Text
The nature and pace of genome mutation is largely unknown. Because standard methods sequence DNA from populations of cells, the genetic composition of individual cells is lost, de novo mutations in cells are concealed within the bulk signal and per cell cycle mutation rates and mechanisms remain elusive. Although single-cell genome analyses could resolve these problems, such analyses are error-prone because of whole-genome amplification (WGA) artefacts and are limited in the types of DNA mutation that can be discerned. We developed methods for paired-end sequence analysis of single-cell WGA products that enable (i) detecting multiple classes of DNA mutation, (ii) distinguishing DNA copy number changes from allelic WGA-amplification artefacts by the discovery of matching aberrantly mapping read pairs among the surfeit of paired-end WGA and mapping artefacts and (iii) delineating the break points and architecture of structural variants...

The Genome of Anopheles darlingi, the main neotropical malaria vector
Marinotti, O; Cerqueira, GC; de Almeida, LGP; Ferro, MIT; Loreto, ELD; Zaha, A; Teixeira, SMR; Wespiser, AR; Silva, AAE; Schlindwein, AD; Pacheco, ACL; da Silva, ALD; Graveley, BR; Walenz, BP; Lima, BD; Ribeiro, CAG; Nunes-Silva, CG; de Carvalho, CR; Soar
Nucleic Acids Res. 2013, 41, 7387-7400
Free Full Text
Anopheles darlingi is the principal neotropical malaria vector, responsible for more than a million cases of malaria per year on the American continent. Anopheles darlingi diverged from the African and Asian malaria vectors ~100 million years ago (mya) and successfully adapted to the New World environment. Here we present an annotated reference A. darlingi genome, sequenced from a wild population of males and females collected in the Brazilian Amazon. A total of 10 481 predicted protein-coding genes were annotated, 72% of which have their closest counterpart in Anopheles gambiae and 21% have highest similarity with other mosquito species. In spite of a long period of divergent evolution, conserved gene synteny was observed between A. darlingi and A. gambiae. More than 10 million single nucleotide polymorphisms and short indels with potential use as genetic markers were identified. Transposable elements correspond to 2.3% of the A. darlingi genome...

Stability, delivery and functions of human sperm RNAs at fertilization
Sendler, E; Johnson, GD; Mao, SH; Goodrich, RJ; Diamond, MP; Hauser, R; Krawetz, SA
Nucleic Acids Res. 2013, 41, 4104-4117
Free Full Text
Increasing attention has focused on the significance of RNA in sperm, in light of its contribution to the birth and long-term health of a child, role in sperm function and diagnostic potential. As the composition of sperm RNA is in flux, assigning specific roles to individual RNAs presents a significant challenge. For the first time RNA-seq was used to characterize the population of coding and non-coding transcripts in human sperm. Examining RNA representation as a function of multiple methods of library preparation revealed unique features indicative of very specific and stage-dependent maturation and regulation of sperm RNA, illuminating their various transitional roles. Correlation of sperm transcript abundance with epigenetic marks suggested roles for these elements in the pre- and post-fertilization genome. Several classes of non-coding RNAs including lncRNAs, CARs, pri-miRNAs, novel elements and mRNAs have been identified which, based on factors including relative abundance, integrity in sperm, available knockout data of embryonic effect and presence or absence in the unfertilized human oocyte, are likely to be essential male factors critical to early post-fertilization...