Yeast intein-encoded LAGLIDADG homing endonucleases
In general, homing endonucleases are meganucleases (i.e. they recognise and cut DNA sequences longer than 12 bp) encoded in various intervening sequences. They introduce double-strand breaks in the allele lacking the intervening sequence and thus induce homologous recombination and propagation of the sequences that encode them. The best characterized family of homing endonucleases is the LAGLIDADG family. This family can be divided into two subfamilies, intron-encoded and intein-encoded. Intein-encoded homing endonucleases are translated as part of a host protein; right after translation the homing endonuclease splices itself out of this host, so that two functional proteins are generated: the mature host protein and the homing endonuclease. This process is called protein splicing. Most intein-encoded LAGLIDADG homing endonucleases consist of 4 domains: 2 LAGLIDADG domains that cleave the DNA, a protein-splicing domain and an additional DNA-binding domain involved in target site recognition. Here we investigate the diversity of yeast intein-encoded LAGLIDADG endonucleases and annotate their individual residues in terms of importance for DNA recognition, binding and cleavage.
PFAM family PF05204 contains the sequences of most of the yeast intein-encoded LAGLIDADG endonucleases. Since the PFAM database contains only those parts of the sequences that correspond to the LAGLIDADG domains, an alignment of the whole sequences was generated and used for further study (the parts of the sequences corresponding to the host proteins were cut off). This alignment was manually refined according to the structural information available for one representative of the yeast intein-encoded LAGLIDADG nucleases, PI-SceI. In addition, the sequences of three more intein-encoded nucleases were added to the alignment. Their host proteins are RGYR_METJA, P95484_PYRFU and DPOL_PYRKO, with structures available for the latter two (PDB codes 1DQ3 and 2CW8 respectively). Although these three proteins are also intein-encoded LAGLIDADG endonucleases, their sequences differ considerably from the rest; structurally, however, the LAGLIDADG domains and the core of the protein-splicing domain of P95484_PYRFU and DPOL_PYRKO align well with those of PI-SceI. The sequences of almost all yeast intein-encoded LAGLIDADG nucleases appear to be rather conserved, especially in the LAGLIDADG motifs, which are characteristic of all LAGLIDADG endonucleases. For the three additional sequences these motifs seem to be almost the only region of homology. The alignment also contains information about the distribution of domains along the sequence; the distribution of secondary structure elements for PI-SceI (PDB entry 1VDE was used), P95484_PYRFU and DPOL_PYRKO; and a functional annotation of the residues involved in DNA binding and cleavage (see below).
Download the alignment of intein-encoded LAGLIDADG endonucleases.
Protein-DNA contacts between PI-SceI, taken as a representative of the family, and its recognition site were analyzed using two structures, 1LWS and 1LWT, solved at resolutions of 3.5 and 3.2 Å respectively. Possible hydrogen bonds and hydrophobic interactions were taken into account. A possible hydrogen bond was assigned if a hydrogen-donor atom of the DNA was detected closer than 3.7 Å to a hydrogen-acceptor atom of the protein, or vice versa. Hydrophobic interactions were determined by detecting clusters of hydrophobic atoms with CluD, with all parameters set to default.
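The distance criterion for possible hydrogen bonds can be illustrated with a minimal Python sketch. It is not the procedure actually used for 1LWS and 1LWT: the `Atom` record, the donor/acceptor labels, the residue names and all coordinates below are made up for illustration, and only the 3.7 Å cutoff comes from the text. Bond angles are ignored, since only a distance threshold is described, and the CluD hydrophobic clustering is not modelled.

```python
from dataclasses import dataclass
from math import dist

CUTOFF = 3.7  # Å, the distance threshold given in the text


@dataclass(frozen=True)
class Atom:
    molecule: str  # "protein" or "dna"
    residue: str   # residue or nucleotide label (hypothetical here)
    name: str      # atom name
    role: str      # "donor", "acceptor" or "both"
    xyz: tuple     # Cartesian coordinates in Å


def can_donate(atom):
    return atom.role in ("donor", "both")


def can_accept(atom):
    return atom.role in ("acceptor", "both")


def possible_hbonds(atoms, cutoff=CUTOFF):
    """Return protein-DNA donor/acceptor pairs closer than the cutoff."""
    protein = [a for a in atoms if a.molecule == "protein"]
    dna = [a for a in atoms if a.molecule == "dna"]
    pairs = []
    for p in protein:
        for d in dna:
            # One partner must donate and the other accept, in either direction.
            complementary = (can_donate(p) and can_accept(d)) or (
                can_donate(d) and can_accept(p)
            )
            distance = dist(p.xyz, d.xyz)
            if complementary and distance < cutoff:
                pairs.append((p.residue, p.name, d.residue, d.name, round(distance, 2)))
    return pairs


# Toy input with invented labels and coordinates (not taken from any PDB entry).
toy_atoms = [
    Atom("protein", "LysX", "NZ", "donor", (0.0, 0.0, 0.0)),
    Atom("dna", "G1", "O6", "acceptor", (3.0, 0.0, 0.0)),      # 3.0 Å from LysX NZ
    Atom("protein", "HisY", "NE2", "both", (20.0, 0.0, 0.0)),
    Atom("dna", "G2", "O6", "acceptor", (24.5, 0.0, 0.0)),     # 4.5 Å from HisY NE2
    Atom("protein", "GluZ", "OE1", "acceptor", (40.0, 0.0, 0.0)),
    Atom("dna", "T3", "OP1", "acceptor", (42.0, 0.0, 0.0)),    # close, but both accept
]

for pair in possible_hbonds(toy_atoms):
    print(pair)
```

In this toy input only the LysX/G1 pair is reported: the HisY/G2 pair is beyond the cutoff, and the GluZ/T3 pair is close but consists of two acceptors.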
The results obtained were compared to the available experimental data. One paper describes PI-SceI target site mutagenesis and ethylation interference experiments, whose results are thought to reflect the importance of particular base pairs and phosphate groups for DNA binding and cleavage by PI-SceI. The contacts that PI-SceI makes, according to the structural information, with the nucleotides shown to be important in this paper were picked out and compared to all possible protein-DNA contacts. Of all mutated base pairs, only those whose mutation decreased the percentage of cleaved DNA to below 20% were chosen (68% for wild type, see the paper for details); for these base pairs, only contacts made in the major or minor groove of the DNA were considered. As for the ethylation interference experiments, contacts made to those backbone phosphates that had a peak height ratio higher than 2.0 (see the paper for details) were considered.
Four papers contain information on mutagenesis studies of the PI-SceI protein itself. For comparison with the results of the structural analysis, we chose those residues whose mutation decreases protein activity to less than 10% of the wild-type level.
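The three selection thresholds described above can be sketched in Python as simple filters. This is only an illustration under stated assumptions: the function names, the dictionary layout and every measurement in the toy input are hypothetical, and only the cutoffs (20% cleavage against 68% for wild type, peak height ratio 2.0, 10% residual activity) are taken from the text.

```python
WILD_TYPE_CLEAVAGE = 68.0       # % of DNA cleaved for the wild-type site (from the text)
BASE_PAIR_CUTOFF = 20.0         # % cleaved; a base pair is kept if its mutant falls below this
PHOSPHATE_RATIO_CUTOFF = 2.0    # ethylation interference peak height ratio
RESIDUE_ACTIVITY_CUTOFF = 10.0  # % of wild-type activity remaining after mutation


def important_base_pairs(cleavage_by_position):
    """Positions whose mutation lowers cleavage below the base-pair cutoff."""
    return sorted(
        pos for pos, percent in cleavage_by_position.items()
        if percent < BASE_PAIR_CUTOFF
    )


def important_phosphates(ratio_by_position):
    """Phosphates whose peak height ratio exceeds the cutoff."""
    return sorted(
        pos for pos, ratio in ratio_by_position.items()
        if ratio > PHOSPHATE_RATIO_CUTOFF
    )


def important_residues(activity_by_residue):
    """Residues whose mutation leaves less than the cutoff share of activity."""
    return sorted(
        res for res, activity in activity_by_residue.items()
        if activity < RESIDUE_ACTIVITY_CUTOFF
    )


# Invented numbers with placeholder keys, used only to exercise the filters.
toy_cleavage = {1: 65.0, 2: 12.0, 3: 19.9, 4: 20.0}
toy_ratios = {1: 1.5, 2: 2.0, 3: 3.1}
toy_activity = {"res_a": 4.0, "res_b": 10.0, "res_c": 55.0}

print(important_base_pairs(toy_cleavage))
print(important_phosphates(toy_ratios))
print(important_residues(toy_activity))
```

All three comparisons are strict, following the wording "lower than", "higher than" and "less than", so values exactly at a cutoff are not selected.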
Use the Jmol applet to explore the results of the structural analysis and the experimental data; PDB entry 1LWT is used.
Structural studies revealed 37 amino acid residues that make contacts with the DNA and are possibly important for DNA recognition and binding. 25 residues possibly form hydrogen bonds with the DNA backbone (53, 55, 57, 58, 65, 112, 124, 127, 129, 169, 227, 261, 274, 275, 277, 278, 280, 281, 282, 328, 362, 376, 378, 384); 8 residues possibly form hydrogen bonds in the major or minor groove of the DNA (55, 90, 94, 170, 223, 340, 366, 377); 12 residues are possibly involved in protein-DNA hydrophobic interactions (92, 94, 170, 173, 220, 328, 340, 364, 366, 375, 377, 384). Experimental mutagenesis of PI-SceI amino acid residues picks out only 18 residues (90, 94, 218, 229, 231, 232, 277, 279, 281, 282, 284, 301, 326, 328, 340, 341, 377, 403), 8 of which coincide with the structurally predicted ones (90, 94, 277, 281, 282, 328, 340, 377). Some of the residues shown to be important for activity but not involved in DNA binding are known to coordinate metal ions (218, 326) or water molecules (229, 301, 341, 403) needed for cleavage. Another 18 residues interact either with base pairs shown to be important by mutagenesis or with phosphates that interfere with PI-SceI binding when ethylated (53, 55, 57, 58, 65, 94, 112, 124, 127, 129, 169, 170, 340, 362, 366, 375, 377, 384). In total, 23 of the 37 structurally predicted residues coincide either with those confirmed by mutagenesis or with those that interact with nucleotides shown to be important by mutagenesis or ethylation interference experiments (53, 55, 57, 58, 65, 90, 94, 112, 124, 127, 129, 169, 170, 277, 281, 282, 328, 340, 362, 366, 375, 377, 384). Only 3 residues (94, 340, 377) seem to be important according to all data. Arg94 forms hydrogen bonds with major groove atoms of G31 from chain B and T7 from chain C. Lys340 binds to G17.O6 from chain B. His377 binds to G14.O6 from chain B. G17 and G14 are situated close to the cleavage site, while G31 and T7 lie in the region that interacts with the additional DNA-binding domain.
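The overlaps quoted above follow from plain set operations on the residue lists, as the following Python sketch shows. The residue numbers are copied from the lists as printed; the variable names are chosen here for illustration only.

```python
# Residue lists copied from the text as printed
# (the backbone list as printed has 24 entries).
backbone_hbonds = {53, 55, 57, 58, 65, 112, 124, 127, 129, 169, 227, 261,
                   274, 275, 277, 278, 280, 281, 282, 328, 362, 376, 378, 384}
groove_hbonds = {55, 90, 94, 170, 223, 340, 366, 377}
hydrophobic = {92, 94, 170, 173, 220, 328, 340, 364, 366, 375, 377, 384}

# Residues picked out by mutagenesis of the protein.
mutagenesis = {90, 94, 218, 229, 231, 232, 277, 279, 281, 282, 284, 301,
               326, 328, 340, 341, 377, 403}

# Residues contacting base pairs or phosphates shown to be important.
nucleotide_supported = {53, 55, 57, 58, 65, 94, 112, 124, 127, 129, 169, 170,
                        340, 362, 366, 375, 377, 384}

# Every residue with at least one structurally predicted contact.
structural = backbone_hbonds | groove_hbonds | hydrophobic

# Structural predictions confirmed by protein mutagenesis.
confirmed = structural & mutagenesis

# Structural predictions supported by either kind of experiment.
supported = confirmed | (structural & nucleotide_supported)

# Residues supported by all three lines of evidence.
all_data = confirmed & nucleotide_supported

# Important for activity, but without a predicted DNA contact.
activity_only = mutagenesis - structural

print(sorted(confirmed))   # [90, 94, 277, 281, 282, 328, 340, 377]
print(len(supported))      # 23
print(sorted(all_data))    # [94, 340, 377]
print(sorted(activity_only))
```

The last set contains the metal-coordinating residues (218, 326) and the water-coordinating residues (229, 301, 341, 403) mentioned above, together with the remaining mutagenesis hits that have no predicted DNA contact.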
Download RasMol/Jmol scripts.