January 2015

Optimization of scarless human stem cell genome editing
L. Yang, M. Guell, S. Byrne, J. L. Yang, A. De Los Angeles, P. Mali, J. Aach, C. Kim-Kiselak, A. W. Briggs, X. Rios, P. Y. Huang, G. Daley and G. Church
Nucleic Acids Res. (2013) 41 (19): 9049-9061
Efficient strategies for precise genome editing in human-induced pluripotent cells (hiPSCs) will enable sophisticated genome engineering for research and clinical purposes. The development of programmable sequence-specific nucleases such as Transcription Activator-Like Effector Nucleases (TALENs) and Cas9-gRNA allows genetic modifications to be made more efficiently at targeted sites of interest. However, many opportunities remain to optimize these tools and to enlarge their spheres of application. We present several improvements: First, we developed functional re-coded TALEs (reTALEs), which not only enable simple one-pot TALE synthesis but also allow TALE-based applications to be performed using lentiviral vectors. We then compared genome-editing efficiencies in hiPSCs mediated by 15 pairs of reTALENs and Cas9-gRNA targeting CCR5 and optimized ssODN design in conjunction with both methods for introducing specific mutations. We found Cas9-gRNA achieved 7-8x higher non-homologous end joining efficiencies (3%) than reTALENs (0.4%) and moderately superior homology-directed repair efficiencies (1.0 versus 0.6%)...
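As a quick consistency check on the efficiencies quoted in this abstract, the fold differences can be recomputed from the reported percentages. This is only a sketch: the function name `fold_difference` is hypothetical, and the inputs are the rounded values given above, not the underlying per-site data.

```python
def fold_difference(numerator_pct: float, denominator_pct: float) -> float:
    """Ratio of two efficiencies that are both expressed in percent."""
    return numerator_pct / denominator_pct


# Rounded percentages as quoted in the abstract (Cas9-gRNA versus reTALENs).
nhej_fold = fold_difference(3.0, 0.4)  # non-homologous end joining
hdr_fold = fold_difference(1.0, 0.6)   # homology-directed repair

print(f"NHEJ fold difference: {nhej_fold:.1f}")
print(f"HDR fold difference:  {hdr_fold:.2f}")
```

The NHEJ ratio falls within the stated 7-8x range, and the HDR ratio of roughly 1.7 matches the description "moderately superior".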

Single-cell paired-end genome sequencing reveals structural variation per cell cycle
T. Voet, P. Kumar, P. Van Loo, S. L. Cooke, J. Marshall, M. L. Lin, M. Zamani Esteki, N. Van der Aa, L. Mateiu, D. J. McBride, G. R. Bignell, S. McLaren, J. Teague, A. Butler, K. Raine, L. A. Stebbings, M. A. Quail, T. D'Hooghe, Y. Moreau, P. A. Futreal,
Nucleic Acids Res. (2013) 41 (12): 6119-6138
The nature and pace of genome mutation are largely unknown. Because standard methods sequence DNA from populations of cells, the genetic composition of individual cells is lost, de novo mutations in cells are concealed within the bulk signal and per cell cycle mutation rates and mechanisms remain elusive. Although single-cell genome analyses could resolve these problems, such analyses are error-prone because of whole-genome amplification (WGA) artefacts and are limited in the types of DNA mutation that can be discerned. We developed methods for paired-end sequence analysis of single-cell WGA products that enable (i) detecting multiple classes of DNA mutation, (ii) distinguishing DNA copy number changes from allelic WGA-amplification artefacts by the discovery of matching aberrantly mapping read pairs among the surfeit of paired-end WGA and mapping artefacts and (iii) delineating the break points and architecture of structural variants...

Stability, delivery and functions of human sperm RNAs at fertilization
E. Sendler, G. D. Johnson, S. Mao, R. J. Goodrich, M. P. Diamond, R. Hauser and S. A. Krawetz
Nucleic Acids Res. (2013) 41 (7): 4104-4117
Increasing attention has focused on the significance of RNA in sperm, in light of its contribution to the birth and long-term health of a child, role in sperm function and diagnostic potential. As the composition of sperm RNA is in flux, assigning specific roles to individual RNAs presents a significant challenge. For the first time RNA-seq was used to characterize the population of coding and non-coding transcripts in human sperm. Examining RNA representation as a function of multiple methods of library preparation revealed unique features indicative of very specific and stage-dependent maturation and regulation of sperm RNA, illuminating their various transitional roles. Correlation of sperm transcript abundance with epigenetic marks suggested roles for these elements in the pre- and post-fertilization genome. Several classes of non-coding RNAs including lncRNAs, CARs, pri-miRNAs, novel elements and mRNAs have been identified which, based on factors including relative abundance, integrity in sperm, available knockout data of embryonic effect and presence or absence in the unfertilized human oocyte, are likely to be essential male factors...

The complex methylome of the human gastric pathogen Helicobacter pylori
J. Krebes, R. D. Morgan, B. Bunk, C. Sproer, K. Luong, R. Parusel, B. P. Anton, C. Konig, C. Josenhans, J. Overmann, R. J. Roberts, J. Korlach and S. Suerbaum
Nucleic Acids Res. (2014) 42 (4): 2415-2432
The genome of Helicobacter pylori is remarkable for its large number of restriction-modification (R-M) systems, and strain-specific diversity in R-M systems has been suggested to limit natural transformation, the major driving force of genetic diversification in H. pylori. We have determined the comprehensive methylomes of two H. pylori strains at single base resolution, using Single Molecule Real-Time (SMRT®) sequencing. For strains 26695 and J99-R3, 17 and 22 methylated sequence motifs were identified, respectively. For most motifs, almost all sites occurring in the genome were detected as methylated. Twelve novel methylation patterns corresponding to nine recognition sequences were detected (26695, 3; J99-R3, 6). Functional inactivation, correction of frameshifts as well as cloning and expression of candidate methyltransferases (MTases) permitted not only the functional characterization of multiple, yet undescribed, MTases, but also revealed novel features of both Type I and Type II R-M systems, including frameshift-mediated changes of sequence specificity...

The effect of tRNA levels on decoding times of mRNA codons
A. Dana and T. Tuller
Nucleic Acids Res. (2014) 42 (14): 9171-9181
The possible effect of transfer ribonucleic acid (tRNA) concentrations on codon decoding time is a fundamental biomedical research question; however, due to the large number of variables affecting this process and the indirect relation between them, a conclusive answer to this question has so far eluded researchers in the field. In this study, we perform a novel analysis of the ribosome profiling data of four organisms which enables ranking the decoding times of different codons while filtering translational phenomena such as experimental biases, extreme ribosomal pauses and ribosome traffic jams. Based on this filtering, we show for the first time that there is a significant correlation between tRNA concentrations and the estimated codon decoding times both in prokaryotes and in eukaryotes in natural conditions (-0.38 to -0.66, all P values <0.006); in addition, we show that when considering tRNA concentrations, codon decoding times are not correlated with aminoacyl-tRNA levels. The reported results support the conjecture that translation efficiency is directly influenced by the tRNA levels in the cell. Thus, they should help to understand the evolution of synonymous aspects of coding sequences via the adaptation of their codons...

Transcriptional landscape and essential genes of Neisseria gonorrhoeae
C. W. Remmele, Y. Xian, M. Albrecht, M. Faulstich, M. Fraunholz, E. Heinrichs, M. T. Dittrich, T. Muller, R. Reinhardt and T. Rudel
Nucleic Acids Res. (2014) 42 (16): 10579-10595
The WHO has recently classified Neisseria gonorrhoeae as a super-bacterium due to the rapid spread of antibiotic resistant derivatives and an overall dramatic increase in infection incidences. Genome sequencing has identified potential genes; however, little is known about the transcriptional organization and the presence of non-coding RNAs in gonococci. We performed RNA sequencing to define the transcriptome and the transcriptional start sites of all gonococcal genes and operons. Numerous new transcripts including 253 potentially non-coding RNAs transcribed from intergenic regions or antisense to coding genes were identified. Strikingly, strong antisense transcription that may have regulatory functions was detected for the phase-variable opa genes coding for a family of adhesins and invasins in pathogenic Neisseria. Based on the defined transcriptional start sites, promoter motifs were identified. We further generated and sequenced a high density Tn5 transposon library to predict a core of 827 gonococcal essential genes, 133 of which have no known function...

Genome-wide reorganization of histone H2AX toward particular fragile sites on cell activation
J. Seo, K. Kim, D. Y. Chang, H. B. Kang, E. C. Shin, J. Kwon and J. K. Choi
Nucleic Acids Res. (2014) 42 (2): 1016-1025
γH2AX formation by phosphorylation of the histone variant H2AX is the key process in the repair of DNA lesions including those arising at fragile sites under replication stress. Here we demonstrate that H2AX is dynamically reorganized to preoccupy γH2AX hotspots on increased replication stress by activated cell proliferation and that H2AX is enriched in aphidicolin-induced replisome stalling sites in cycling cells. Interestingly, H2AX enrichment was particularly found in genomic regions that replicate in early S phase. High transcription activity, a hallmark of early replicating fragile sites, was a determinant of H2AX localization. Subtelomeric H2AX enrichment was also attributable to early replication and high gene density. In contrast, late replicating and infrequently transcribed regions, including common fragile sites and heterochromatin, lacked H2AX enrichment. In particular, heterochromatin was inaccessible to H2AX incorporation, which may partly explain the accumulation of mutations in cancer heterochromatin. Meanwhile, H2AX in actively dividing cells was intimately colocalized with INO80. INO80 silencing reduced H2AX levels...

metaseq: a Python package for integrative genome-wide analysis reveals relationships between chromatin insulators and associated nuclear mRNA
R. K. Dale, L. H. Matzat and E. P. Lei
Nucleic Acids Res. (2014) 42 (14): 9158-9170
Here we introduce metaseq, a software library written in Python, which enables loading multiple genomic data formats into standard Python data structures and allows flexible, customized manipulation and visualization of data from high-throughput sequencing studies. We demonstrate its practical use by analyzing multiple datasets related to chromatin insulators, which are DNA-protein complexes proposed to organize the genome into distinct transcriptional domains. Recent studies in Drosophila and mammals have implicated RNA in the regulation of chromatin insulator activities. Moreover, the Drosophila RNA-binding protein Shep has been shown to antagonize gypsy insulator activity in a tissue-specific manner, but the precise role of RNA in this process remains unclear. Better understanding of chromatin insulator regulation requires integration of multiple datasets, including those from chromatin-binding, RNA-binding, and gene expression experiments. We use metaseq to integrate RIP- and ChIP-seq data for Shep and the core gypsy insulator protein Su(Hw) in two different cell types, along with publicly available ChIP-chip and RNA-seq data...

Prospective identification of parasitic sequences in phage display screens
W. L. Matochko, S. Cory Li, S. K. Tang and R. Derda
Nucleic Acids Res. (2014) 42 (3): 1784-1798
Phage display empowered the development of proteins with new function and ligands for clinically relevant targets. In this report, we use next-generation sequencing to analyze phage-displayed libraries and uncover a strong bias induced by amplification preferences of phage in bacteria. This bias favors fast-growing sequences that collectively constitute <0.01% of the available diversity. Specifically, a library of 10^9 random 7-mer peptides (Ph.D.-7) includes a few thousand sequences that grow quickly (the 'parasites'), which are the sequences that are typically identified in phage display screens published to date. A similar collapse was observed in other libraries. Using Illumina and Ion Torrent sequencing and multiple biological replicates of amplification of the Ph.D.-7 library, we identified a focused population of 770 'parasites'. In all, 197 sequences from this population have been identified in literature reports that used the Ph.D.-7 library. Many of these enriched sequences have confirmed function (e.g. target binding capacity). The bias in the literature, thus, can be viewed as a selection with two different selection pressures: (i) target-binding selection, and (ii) amplification-induced selection...
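The proportions quoted in this abstract can be put on a common scale with a few lines of arithmetic. The sketch below assumes the library size is 10^9 (ten to the ninth power) and uses only the counts stated above; the variable names are illustrative and do not come from the paper.

```python
LIBRARY_SIZE = 10**9   # random 7-mer peptides in the Ph.D.-7 library (assumed reading)
PARASITES = 770        # focused population of fast-growing sequences
IN_LITERATURE = 197    # parasites also reported in published Ph.D.-7 screens

# "<0.01% of the available diversity" corresponds to fewer than this many sequences.
max_fast_growers = LIBRARY_SIZE // 10_000

# Share of the library taken up by the 770 parasites, in percent.
parasite_pct = 100 * PARASITES / LIBRARY_SIZE

# Share of the parasites that already appear in the literature, in percent.
literature_pct = 100 * IN_LITERATURE / PARASITES

print(f"Upper bound implied by <0.01%: {max_fast_growers:,} sequences")
print(f"Parasites as share of library: {parasite_pct:.1e}%")
print(f"Parasites seen in literature:  {literature_pct:.1f}%")
```

On these numbers, both the "few thousand" fast growers and the 770 parasites sit well below the 0.01% bound, and about a quarter of the parasites have already been reported in published screens.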

Sensitive, multiplex and direct quantification of RNA sequences using a modified RASL assay
H. B. Larman, E. R. Scott, M. Wogan, G. Oliveira, A. Torkamani and P. G. Schultz
Nucleic Acids Res. (2014) 42 (14): 9146-9157
A sensitive and highly multiplex method to directly measure RNA sequence abundance without requiring reverse transcription would be of value for a number of biomedical applications, including high throughput small molecule screening, pathogen transcript detection and quantification of short/degraded RNAs. RNA Annealing, Selection and Ligation (RASL) assays, which are based on RNA template-dependent oligonucleotide probe ligation, have been developed to meet this need, but technical limitations have impeded their adoption. Whereas DNA ligase-based RASL assays suffer from extremely low and sequence-dependent ligation efficiencies that compromise assay robustness, Rnl2 can join a fully DNA donor probe to a 3'-diribonucleotide-terminated acceptor probe with high efficiency on an RNA template strand. Rnl2-based RASL exhibits sub-femtomolar transcript detection sensitivity, and permits the rational tuning of probe signals for optimal analysis by massively parallel DNA sequencing (RASL-seq). A streamlined Rnl2-based RASL-seq protocol was assessed in a small molecule screen using 77 probe sets designed to monitor complex human B cell phenotypes...

