Additional reference information
Distances between sequences

The space in which amino acids and sequences are analysed is treated as a metric space. That is, a special measure called "distance" is introduced, which shows how far from each other (or how similar) the proteins are. Structural similarity is considered here, and through it a kind of functional similarity is explained. Among the 20 amino acids, some are specific (e.g., tryptophan) and occur rather rarely. Others, such as the aliphatic amino acids, occur very often and are not very specific. Evidently, these two cases should be distinguished.
Pairwise distance is closely connected with substitution parameters, which measure the tendency of one amino acid to be replaced by another: the more similar two amino acids are, the more often they substitute for each other. All these quantities are determined experimentally.
The distance between two amino acids i and j is calculated as


where S(i,j) is the element of the substitution matrix in the i-th row and the j-th column. Since substitution matrices are symmetric, the resulting distance matrix is also symmetric: D(i,j) = D(j,i).
The default substitution matrix used in this work is BLOSUM62, which is considered to be one of the best. Other matrices (e.g., those of the PAM family) can also be used on the server math.belozersky.msu.ru.
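
As an illustration of how a symmetric substitution matrix gives a symmetric distance matrix, here is a minimal Python sketch. The formula used by the program is not reproduced in this text, so the sketch does not claim to be the program's method: it assumes one common transformation, D(i,j) = S(i,i) + S(j,j) - 2 S(i,j). The scores are a four-residue excerpt of BLOSUM62; the names `BLOSUM62_EXCERPT`, `score` and `distance` are introduced here for illustration only.

```python
# Excerpt of BLOSUM62 scores for alanine (A), isoleucine (I),
# leucine (L) and tryptophan (W). Only one triangle is stored,
# because the substitution matrix is symmetric.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("I", "I"): 4, ("L", "L"): 4, ("W", "W"): 11,
    ("A", "I"): -1, ("A", "L"): -1, ("A", "W"): -3,
    ("I", "L"): 2, ("I", "W"): -3, ("L", "W"): -2,
}


def score(a, b):
    """Return S(a, b); the sorted key makes the lookup symmetric."""
    return BLOSUM62_EXCERPT[tuple(sorted((a, b)))]


def distance(a, b):
    """ASSUMED transformation, not necessarily the one used by the server:
    D(a, b) = S(a, a) + S(b, b) - 2 * S(a, b)."""
    return score(a, a) + score(b, b) - 2 * score(a, b)


residues = "AILW"
distance_matrix = [[distance(a, b) for b in residues] for a in residues]
```

Under this assumed formula, the two aliphatic residues I and L end up close to each other, while tryptophan is far from all three other residues, which matches the qualitative picture described above.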
Once the distance between amino acids is introduced, it is easy to define the pairwise distance between two sequences at any position of the alignment (in fact, this is just the distance between the two amino acids at that position). Since distance is an additive quantity, the pairwise distance between sequences over all alignment positions is easily defined as the sum of the per-position distances. When all pairwise distances are calculated, the matrix of pairwise distances is filled.
Note that such a matrix can also be built for any single position; in that case it contains not the overall pairwise distances, but the distances between sequences at that position.
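
The two kinds of matrices can be sketched in Python as follows. This is only an illustration under stated assumptions: the amino acid distances in `TOY_DISTANCE` are hypothetical values rather than the server's, the function names are invented for this example, and gap characters are not handled.

```python
# Hypothetical toy distances between amino acids (illustration only).
TOY_DISTANCE = {("A", "L"): 10, ("A", "W"): 21, ("L", "W"): 19}


def aa_distance(a, b):
    """Distance between two amino acids; identical residues are at distance 0."""
    if a == b:
        return 0
    return TOY_DISTANCE[tuple(sorted((a, b)))]


def position_matrix(alignment, pos):
    """Matrix of pairwise distances between sequences at a single position."""
    n = len(alignment)
    return [
        [aa_distance(alignment[r][pos], alignment[c][pos]) for c in range(n)]
        for r in range(n)
    ]


def alignment_matrix(alignment):
    """Matrix over the whole alignment: the sum of the per-position
    matrices, since distance is treated as an additive quantity."""
    n = len(alignment)
    total = [[0] * n for _ in range(n)]
    for pos in range(len(alignment[0])):
        per_position = position_matrix(alignment, pos)
        for r in range(n):
            for c in range(n):
                total[r][c] += per_position[r][c]
    return total


ALIGNMENT = ["ALW", "ALA", "LLW"]  # three aligned sequences of length 3
first_position = position_matrix(ALIGNMENT, 0)
whole_alignment = alignment_matrix(ALIGNMENT)
```

Both matrices are symmetric with zeros on the diagonal, because the underlying amino acid distance is symmetric and is zero for identical residues.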

Using the quantities introduced above, it is easy to express some parameters of alignments and their positions. For example, the average distance at a position is a measure of the homogeneity of that position. Indeed, the smaller the distance between sequences, the more similar they are, and vice versa. Hence, the smaller the average distance at an alignment position, the more homogeneous that position is.
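
A small Python sketch of this homogeneity measure is given below. It assumes that "average" means the mean over all unordered pairs of sequences, which the text does not state explicitly; the distance values and all names are hypothetical and serve only as an illustration.

```python
from itertools import combinations

# Hypothetical toy distances between amino acids (illustration only).
TOY_DISTANCE = {("A", "L"): 10, ("A", "W"): 21, ("L", "W"): 19}


def aa_distance(a, b):
    """Distance between two amino acids; identical residues are at distance 0."""
    if a == b:
        return 0
    return TOY_DISTANCE[tuple(sorted((a, b)))]


def average_position_distance(alignment, pos):
    """Mean distance over all unordered pairs of sequences at one position
    (ASSUMED definition of the average)."""
    column = [seq[pos] for seq in alignment]
    pairs = list(combinations(column, 2))
    return sum(aa_distance(a, b) for a, b in pairs) / len(pairs)


ALIGNMENT = ["ALW", "ALA", "LLW"]
averages = [average_position_distance(ALIGNMENT, p) for p in range(3)]

# The position with the smallest average distance is the most homogeneous.
most_homogeneous = min(range(3), key=lambda p: averages[p])
```

In this toy alignment the second column contains only leucine, so its average distance is zero and it is selected as the most homogeneous position.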
Further, if the residue at some position of an amino acid sequence influences the general properties of the protein (we call such positions diagnostic), that position must be well correlated with the whole sequence. Two sequences that are similar to each other (and hence have a small pairwise distance) should also be similar at their diagnostic positions, and vice versa. In contrast, the pairwise distances over the whole alignment and at non-diagnostic positions (we call them variable) do not depend on each other. In mathematical terms, this means the following:
Consider two matrices of pairwise distances between the sequences of an alignment: the first is computed over the whole alignment, the second at the position under consideration. For diagnostic positions these two matrices are highly correlated; for variable positions they are weakly correlated.
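
The comparison of the two matrices can be sketched in Python as below. The text does not name the correlation coefficient, so this example assumes the Pearson coefficient over the upper-triangle entries of the matrices; the three matrices hold hypothetical numbers for four sequences, and all names are invented for the illustration.

```python
from math import sqrt


def upper_triangle(matrix):
    """Entries above the diagonal; the rest is redundant by symmetry."""
    n = len(matrix)
    return [matrix[r][c] for r in range(n) for c in range(r + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        # Undefined for a constant list (e.g. a fully conserved position);
        # returning 0.0 is a convention of this sketch only.
        return 0.0
    return sxy / sqrt(sxx * syy)


def matrix_correlation(m1, m2):
    """Correlation between two pairwise distance matrices (ASSUMED: Pearson)."""
    return pearson(upper_triangle(m1), upper_triangle(m2))


# Hypothetical distances: sequences 0, 1 and sequences 2, 3 form two groups.
WHOLE = [[0, 4, 7, 10], [4, 0, 10, 7], [7, 10, 0, 4], [10, 7, 4, 0]]
# Position whose residues follow the grouping (diagnostic-like).
DIAGNOSTIC = [[0, 0, 10, 10], [0, 0, 10, 10], [10, 10, 0, 0], [10, 10, 0, 0]]
# Position whose residues ignore the grouping (variable-like).
VARIABLE = [[0, 10, 0, 10], [10, 0, 10, 0], [0, 10, 0, 10], [10, 0, 10, 0]]

r_diagnostic = matrix_correlation(WHOLE, DIAGNOSTIC)
r_variable = matrix_correlation(WHOLE, VARIABLE)
```

With these numbers the diagnostic-like position correlates strongly with the whole-alignment matrix, while the variable-like position shows essentially no correlation. Only the upper triangle is used because the diagonal is always zero and the lower triangle repeats the same values.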

If you find mistakes or interesting facts, please let us know!