MOLECULAR GENETICS OF RETROTRANSPOSONS
     

Henry L. Levin, Ph.D., Principal Investigator
Angela Atwood-Moore, B.A., Senior Research Assistant
Kie-Bang Nam, Ph.D., Research Fellow
Nathan Bowen, Ph.D., Postdoctoral Fellow
Min-Kyeong Kim, Ph.D., Postdoctoral Fellow
Maureen Khoo, B.A., Predoctoral Fellow
Laure Teysset, Ph.D., Postdoctoral Fellow
Erin Peters, B.A., Predoctoral Fellow
Le-Ben Wan, B.A., Predoctoral Fellow

Retroelements are a large class of genetic elements that multiply by the reverse transcription of an RNA intermediate. The resulting cDNA is incorporated into the genome of host cells. In eukaryotes, the widespread success of long terminal repeat (LTR)-containing retroelements has led to replication mechanisms that are conserved among diverse families of retrotransposons and retroviruses. The medical importance of retroviruses such as HIV has intensified the need to understand the molecular details of the mechanisms responsible for the propagation of LTR retroelements. Given that LTR-retrotransposons exist in yeast, the powerful techniques of yeast genetics can be applied to answer basic questions about the function of LTR-retroelements, perhaps yielding fundamental information that may identify new antiviral targets or strategies that can be used to combat the spread of retroviruses such as HIV. The Section on Eukaryotic Transposable Elements is studying the retrotransposon Tf1, which is found in the fission yeast Schizosaccharomyces pombe, with the aim of understanding the molecular mechanisms of reverse transcription, transport of Tf1 into the nucleus, and integration of Tf1 cDNA into the host genome.

Since the process of integration has the potential to compromise the fitness of the host, we wish to understand the balance between the ability of the transposon to insert into the host genome versus the efforts of the host to maintain its viability. The recently completed sequence of the S. pombe genome has allowed us to ask whether Tf1 integration results in the disruption of host genes. Much of what is known about the insertion sites of LTR-retrotransposons indicates that the process of integration is specifically controlled to avoid the disruption of host genes. In fact, all five transposons of Saccharomyces cerevisiae select sites for integration that lack coding sequences for host genes. Ty1, Ty2, Ty3, and Ty4 integrate into gene-poor regions associated with the 5' ends of tRNA genes. Ty5 avoids the disruption of host genes by selecting regions of silent chromatin for its integration. Much less is known about the interactions between the genome of S. pombe and its transposons. The recent completion of the genome sequence of S. pombe allowed us to study the full set of transposon sequences and their relationship to the host genes.

Transposon Content of S. pombe
Bowen, Levin in collaboration with Wood (a)
We conducted a comprehensive search of the genome sequence of S. pombe for transposon-related sequences. Surprisingly, the only transposons found in the genome were related to the Tf1/Tf2 family of LTR retrotransposons. No complete copies of Tf1 were found, and only 13 full-length copies of Tf2 were identified. These, together with 202 single LTRs, constitute the 1.1 percent of the S. pombe genome that was derived from transposons. Single LTRs result from the removal of full-length elements by homologous recombination and therefore provide important information about the transposon history of the host genome. A phylogenetic analysis of the LTRs identified 25 Tf1 and 59 Tf2 LTRs, indicating that these two families were probably the most recent to expand within the genome. We also identified 118 LTRs that did not associate in the phylogenetic tree with any other elements. They likely represent the LTRs of transposons that were active much earlier than Tf1 and Tf2. Additional examination of all the LTRs revealed that every element was located within an intergenic region. Since 60.2 percent of the genome of S. pombe is coding sequence, the exclusively intergenic positions of the LTRs reflect a strong bias.

On a larger scale, the LTRs were found to be widely distributed throughout each of the three chromosomes of S. pombe. However, it was particularly surprising that the concentration of LTRs on chromosome 3 was twice that of the other two chromosomes. The position of Tf LTRs in intergenic regions and their high density in chromosome 3 could be the result of either specific biases in the removal of LTRs or preferences in the selection of insertion sites. Our recently completed study of transposition events allowed us to distinguish between these two possibilities.

Association of Tf LTRs with Chromosome 3 and with Intergenic Segments Is the Result of Preferences in the Selection of Insertion Sites
Levin in collaboration with Singleton (b)
We conducted a genome-wide study of Tf1 transposition in S. pombe that was designed to avoid any biases associated with the transcription environment of the target sites. Accordingly, we modified the version of Tf1 that we typically use in transposition assays. Tf1 was expressed in diploid cells of S. pombe from an inducible promoter that was included in a high-copy plasmid. We also introduced into the transposon an origin of bacterial replication. We induced large numbers of cells for transposition and, without selecting for transposition events, extracted the DNA from the yeast cells; the DNA was cut with restriction enzymes, ligated, and introduced into bacteria. The plasmids encoding kanamycin resistance contained Tf1 as well as a sequence from the genome of S. pombe that served as an insertion site. We identified 51 independent insertion events. By comparing the location of the insertions with the annotations of the S. pombe genome sequence, we found that all but one of the events occurred within intergenic regions. The result indicates that the location of the preexisting LTRs was likely attributable to selection during integration. The systematic insertion of Tf1 into intergenic regions represents a novel method for protecting host genes from damage arising from integration.
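The strength of this intergenic bias can be gauged with a simple chance model. The sketch below is an illustration rather than the analysis used in the study: it assumes that each insertion lands independently at a position chosen uniformly along the genome, and it treats all sequence outside the 60.2 percent of the S. pombe genome that is coding as intergenic. The function name binomial_tail is hypothetical. Under those assumptions, the probability of finding at least 50 of 51 insertions in intergenic sequence by chance is very small.

```python
from math import comb

CODING_FRACTION = 0.602  # fraction of the S. pombe genome that is coding sequence (from the text)
N_EVENTS = 51            # independent insertion events recovered
N_INTERGENIC = 50        # events that fell within intergenic regions


def binomial_tail(n, k_min, p):
    """Probability of at least k_min successes in n independent trials with success rate p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Assumption: everything that is not coding sequence counts as intergenic.
p_intergenic = 1 - CODING_FRACTION
p_value = binomial_tail(N_EVENTS, N_INTERGENIC, p_intergenic)
print(f"P(at least {N_INTERGENIC} of {N_EVENTS} intergenic by chance) = {p_value:.2e}")
```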

An analysis of the intergenic regions disrupted by the insertions revealed a strong preference for sequences between gene pairs that were transcribed in either divergent or tandem directions. While 18 insertions occurred between divergent genes and 32 occurred between tandem genes, none of the insertions occurred between genes transcribed in convergent directions. Given that 1,299 gene pairs in the genome of S. pombe are divergent, 1,302 are convergent, and 2,289 are tandem, we predicted that 13.3 of our inserts should be in divergent regions, 13.8 in convergent regions, and 24 in tandem regions. These results are surprising in that we expected a greater lack of insertion between a convergent pair of genes. However, this calculation does not reflect the fact that the average regions between divergent and tandem genes are larger than the average space between convergent genes. If we take into consideration that the average size of intergenic regions between divergent genes is 1.34 kb and that the average sizes of regions between tandem and convergent genes are 0.97 and 0.56 kb, respectively, we would expect that unbiased insertion of the 51 events into intergenic regions would produce 18.9 insertions between divergent pairs, 24 between tandem pairs, and 7.6 between convergent pairs. Since no inserts were found between convergent genes, the insertion of Tf1-ori/neo demonstrated a strong bias against intergenic regions associated with convergent genes.
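The two expectations described above can be reproduced approximately with a short calculation. This is a minimal sketch, assuming that the size-weighted expectation is obtained by multiplying the number of gene pairs in each class by the mean intergenic size quoted in the text; the function name expected_insertions is hypothetical. Because the quoted sizes are rounded, the values it prints may differ slightly from the figures given in the paragraph.

```python
# Gene-pair counts and mean intergenic sizes (kb) as quoted in the text.
PAIR_COUNTS = {"divergent": 1299, "tandem": 2289, "convergent": 1302}
MEAN_SIZE_KB = {"divergent": 1.34, "tandem": 0.97, "convergent": 0.56}
N_INSERTIONS = 51  # independent insertion events recovered


def expected_insertions(weights, n_events):
    """Distribute n_events across classes in proportion to the given weights."""
    total = sum(weights.values())
    return {name: n_events * w / total for name, w in weights.items()}


# Model 1: every intergenic region is an equally likely target.
by_count = expected_insertions(PAIR_COUNTS, N_INSERTIONS)

# Model 2: targets are hit in proportion to total intergenic length,
# approximated here as (number of pairs) x (mean region size).
total_length_kb = {k: PAIR_COUNTS[k] * MEAN_SIZE_KB[k] for k in PAIR_COUNTS}
by_length = expected_insertions(total_length_kb, N_INSERTIONS)

for name in PAIR_COUNTS:
    print(f"{name:10s} by count: {by_count[name]:5.1f}   by length: {by_length[name]:5.1f}")
```

Under either model a nonzero number of insertions between convergent genes would be expected, which is the comparison that underlies the bias described above.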

To identify the sequences within the intergenic regions that are recognized by Tf1, we mapped the distance from each insertion to the 5' and 3' ends of the adjacent coding sequences. Seventy-four percent of the insertions were closer to the 5' end of genes than to the 3' ends. Even though the association with the 5' ends of genes occurred with distances of up to 1.54 kb, significant clustering occurred within 300 nucleotides of the start of translation.

To determine whether transposition required specific classes of intergenic spaces, we compared the sizes of the intergenic regions disrupted by Tf1-ori/neo with the average sizes in the genome. For the transposition events that occurred between tandem gene pairs, the average size of the intergenic space was 1.44 kb, which was larger than 0.97 kb, the average intergenic space between tandem genes within the genome. The average size of the divergent intergenic spaces that received inserts was 1.55 kb, which was somewhat larger than 1.3 kb, the genome average for divergent spaces. In summary, Tf1 integration occurred in intergenic regions that were larger than average.

Perhaps the most striking result was the observation that Tf1 integration had a significant preference for chromosome 3. Per unit length of DNA, chromosome 3 received approximately twice as many inserts as chromosome 1 or 2. This preference was found not to be attributable to differences in the distribution or composition of intergenic sequences within the three chromosomes. Our results indicate that chromosome 3 was, in some physiological aspect, distinct from the other chromosomes, perhaps because of a unique form of chromatin structure or the presence of chromosome-specific factors. One possible role for the chromosome 3 preference could be to equalize the probability of insertion among the chromosomes. Given that chromosome 3 is about half the size of chromosomes 1 and 2, an equal number of insertions per chromosome would cause chromosome 3 to have twice the density of events. Support for such a model comes from the observation that a similarly equalized process has been observed for the Ty elements of S. cerevisiae. Despite differences in size of as much as six-fold, the three smallest chromosomes have approximately the same number of Ty1 elements per tRNA as the three largest chromosomes.
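The arithmetic behind the equalization model can be written out in a few lines. The sketch below is illustrative only: the relative chromosome lengths are an assumption based solely on the statement that chromosome 3 is about half the size of chromosomes 1 and 2, and the number of inserts per chromosome is a hypothetical value, not a measurement.

```python
# Relative chromosome lengths: an assumption based only on the statement that
# chromosome 3 is about half the size of chromosomes 1 and 2.
RELATIVE_LENGTH = {"chr1": 1.0, "chr2": 1.0, "chr3": 0.5}
INSERTS_PER_CHROMOSOME = 10  # hypothetical equal count, for illustration only

# Density = inserts per unit length when every chromosome receives the same count.
density = {c: INSERTS_PER_CHROMOSOME / length for c, length in RELATIVE_LENGTH.items()}
ratio = density["chr3"] / density["chr1"]
print(f"chr3 insert density relative to chr1: {ratio:.1f}x")
```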

Functions of RT of Tf1 that Are Specifically Required after DNA Synthesis Is Initiated
Atwood-Moore, Levin
The synthesis of cDNA is a critical step in the propagation of LTR-retroelements. The complete process of reverse transcription consists of a complex set of reactions that requires two specialized primers and the transfer of DNA intermediates to specific segments of the template. Although the biochemical properties of reverse transcriptase (RT) have undergone extensive study, little is known about the domains of RT that recognize the primers or mediate the transfer events. To investigate further, we used yeast genetic assays to systematically identify the residues of RT that are required for late steps in the production of cDNA. The RT of Tf1 was first mutagenized by PCR, and we identified elements defective for transposition in S. pombe. The mutants were also subjected to a genetic assay, based on homologous recombination, that detects intermediates of reverse transcription. Of 6,000 transposons screened, we identified 60 that were defective for transposition but nevertheless produced large amounts of intermediates of reverse transcription. The results of DNA blotting revealed a surprising class of mutants that produce normal levels of full-length, double-stranded cDNA. Ten of the mutations clustered within a small region of RNase H, the domain of RT that degrades RNA in RNA:DNA duplexes. Interestingly, most of the mutations occur in residues that are highly conserved among a broad family of retroviruses and retrotransposons. One possibility is that the mutations inhibit transposition because they affect the precise cleavage of one of the RNA primers and, as a result, produce cDNA with altered sequences at their ends. Crystal structure data from the laboratories of E. Arnold and S. Hughes provided significant support for this possibility. The researchers cocrystallized HIV RT with an RNA:DNA hybrid whose sequence is the plus-strand primer of reverse transcription, or PPT. Most of the 10 mutations in Tf1 RT corresponded to residues of HIV RT in a subdomain termed the RNase H primer grip. The residues form direct contacts with critical nucleotides of the PPT, contacts that have been proposed to play a role in the recognition of the PPT by RNase H. We are currently testing the possibility that the mutations we isolated in the RNase H lead to improper cleavage of the PPT.

The Domains of Nup124p that Contribute to the Nuclear Import of Tf1
Levin in collaboration with Balasundaram (c)
Another primary objective is to identify host factors that are required for the propagation of retroelements. Although little is known about host functions required for retrovirus propagation, compelling arguments can be made for the contribution of host components in processes such as particle formation, the transport of integrase (IN) and cDNA into the nucleus, and the integration of cDNA into the host genome.

To investigate the contribution of host factors to the transposition of Tf1, we conducted large-scale screens for mutations in host genes that cause reduced Tf1 mobility. Interestingly, we identified one gene that encodes Nup124p, a nuclear pore factor that possesses a specific activity required for the nuclear import of Tf1 cDNA and protein. The protein contains an N-terminal domain that interacts with the Gag of Tf1. The C-terminal domain of Nup124p contains 11 copies of FXFG, a motif associated with the binding of transport receptors. The repeats are predicted to contribute to the nuclear import of Tf1 by binding to transport receptors associated with Gag and IN. To investigate which sequences of Nup124p contribute to Tf1 activity and to its association with the nuclear pore complexes, we generated an extensive series of deletions, one set of which removed individual groups of the FXFG repeats. All of these alleles of nup124 retained their ability to support Tf1 transposition. In contrast, when all FXFG repeats were removed, transposition activity was severely reduced. The results indicate that, although no specific set of FXFG repeats is required for transposition, a threshold number must be present. Interestingly, the removal of FXFG repeats did not reduce the association of Nup124p with the nuclear pores.

We tested whether the domain of Nup124p that interacts with Gag is required for transposition. By deleting different segments of the N-terminal domain, we identified a segment of 300 residues of Nup124p that is necessary for Tf1 transposition. The allele also retains its association with the nuclear pores. We are currently testing whether the 300 residues are necessary for the interaction with Gag.

The deletion of residues in the C-terminus of Nup124p revealed a domain that is required for its association with nuclear pores. The deletion of as few as 10 residues from the C-terminus of Nup124p caused a disruption of the association with the nuclear pores as indicated by immunofluorescence microscopy. As expected, the removal of the 10 residues from the C-terminus of Nup124p caused a severe defect in Tf1 transposition.

 

PUBLICATIONS

  1. Dang V, Levin H. Nuclear import of the retrotransposon Tf1 is governed by a nuclear localization signal that possesses a unique requirement for the FXFG nuclear pore factor Nup124p. Mol Cell Biol 2000;20:7798-7812.
  2. Haag AL, Lin JH, Levin HL. Evidence for the packaging of multiple copies of Tf1 mRNA into particles and the trans priming of reverse transcription. J Virol 2000;74:7164-7170.
  3. Levin H. Newly identified retrotransposons of the gypsy/Ty3 class in fungi, plants, and vertebrates. In: Mobile DNA II. Washington, D.C.: American Society for Microbiology, 2001, in press.
  4. Levin H. The retrotransposons of Schizosaccharomyces pombe. In: The molecular biology of Schizosaccharomyces pombe. Heidelberg: Springer, 2001, in press.
  5. Singleton T, Levin H. An LTR-retrotransposon of the fission yeast has a unique preference for a specific chromosome. Eukaryotic Cell 2002, in press.

(a) V. Wood, The Sanger Centre, Cambridge, U.K.
(b) T. Singleton, Delaware State University, Dover, DE.
(c) D. Balasundaram, The Institute of Molecular Agrobiology, Singapore.