Lebanonian Cedar Some additional inquiry information
Criteria of branch quality

As has been said repeatedly, phylogenetic trees consist of branches. Every branch is a set of taxa, and all branches unite into one general tree.
Of course, we do not know the real course of evolution; we can only make suppositions about it (see also). That is why all these trees are, strictly speaking, not true. They present only a reconstruction: the most probable course of this evolution.

The cluster method of phylogeny reconstruction is a very popular one. Besides, the cluster approach can be used in modelling evolution. Every cluster is a branch that unites several taxa. The separation of a cluster from other groups of taxa corresponds to an evolutionary event: some mutations took place in this group. The more general the cluster is, the earlier it separated and the more common features it has, and vice versa.
But the majority of evolution reconstruction methods (including WPGMA, which is used here) know nothing about the real taxonomy of the alignment they work with. The whole reconstruction is based on the amino acid sequences presented in the alignment. The alignment may contain many different sequences (some of them may not even be orthologs - see, for example, COG0477). No program can reconstruct a correct tree for such odd sets. But, being a formal mathematical algorithm, any method still returns a whole tree, that is, some structure (as a rule, phylogenetic trees are binary). It is therefore necessary to mark the good branches in this structure.
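To illustrate why a clustering method always returns a complete binary tree, whatever the input, here is a minimal textbook sketch of WPGMA. It is not the implementation used on this site; the function name wpgma, the nested-tuple tree representation and the example distances are assumptions made for illustration only.

```python
from itertools import combinations


def wpgma(names, dist):
    """Cluster taxa by WPGMA and return a binary tree as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> distance between taxa a and b.
    """
    clusters = list(names)
    # Working copy of the distances, later extended with merged clusters.
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    while len(clusters) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        rest = [c for c in clusters if c != a and c != b]
        for c in rest:
            # WPGMA update: plain average of the two old distances,
            # regardless of how many taxa each cluster contains.
            d[frozenset((merged, c))] = (d[frozenset((a, c))] + d[frozenset((b, c))]) / 2
        clusters = rest + [merged]
    return clusters[0]


# Hypothetical distances between four taxa.
example = {
    frozenset("AB"): 2, frozenset("AC"): 6, frozenset("BC"): 6,
    frozenset("AD"): 10, frozenset("BD"): 10, frozenset("CD"): 10,
}
tree = wpgma(["A", "B", "C", "D"], example)
print(tree)
```

Note that the algorithm never refuses an input: even for unrelated sequences it keeps merging until a single tree remains, which is why the branches have to be evaluated afterwards.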

There are several ways of checking the quality of tree branches. We use three of them: the number of supporting positions, a measure of branch separation (garno) and a measure of kernel conservation (SavCons).

Quantity of supporting positions

First, one can count the positions that support a branch, i.e. those positions that can separate the sequences of the branch from the other sequences in the alignment. The number of such supporting positions is marked on every branch of the phylogenetic tree.
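The counting can be sketched as follows. The exact criterion used by the site is not stated here, so this sketch assumes that a position supports a branch when no residue at that column is shared between the sequences inside the branch and those outside it. The function name supporting_positions and the toy alignment are hypothetical.

```python
def supporting_positions(alignment, branch):
    """Return 0-based columns whose residues separate `branch` from the rest.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    branch:    set of sequence names that form the branch.
    """
    inside = [seq for name, seq in alignment.items() if name in branch]
    outside = [seq for name, seq in alignment.items() if name not in branch]
    if not inside or not outside:
        return []
    columns = []
    for pos in range(len(inside[0])):
        in_residues = {seq[pos] for seq in inside}
        out_residues = {seq[pos] for seq in outside}
        # Assumed criterion: the two sides share no residue at this column.
        if in_residues.isdisjoint(out_residues):
            columns.append(pos)
    return columns


# Hypothetical toy alignment of four sequences.
toy = {
    "s1": "MKVLA",
    "s2": "MKILA",
    "s3": "MRVFA",
    "s4": "MRIFG",
}
print(supporting_positions(toy, {"s1", "s2"}))  # prints [1, 3]
```

Under this assumption the branch {s1, s2} has two supporting positions: column 1 (K against R) and column 3 (L against F).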

Measure of branch separation

Next, every set of sequences can be estimated simply. Once a metric is introduced, we can analyse the distribution of sequences inside any set. Two measures are suggested for branch estimation:
  - diameter of the branch - the maximum distance between sequences inside this branch:

      diam(A) = max d(i, j) over all i, j in A

Here and further: A - branch of the tree; i, j - sequences of the alignment; d(i, j) - distance between sequences i and j

  - distance between this branch and the rest of the alignment - the minimal distance between two sequences, the former inside the branch and the latter outside it:

      dist(A) = min d(i, j) over all i in A, j not in A

Both measures vary between alignments, which is why their ratio seems more convenient (we call it garno):

      garno(A) = dist(A) / diam(A)
On the page where all branches of the tree are described, the garno values are not shown, but they are taken into account through the color of each branch:
  - green - well-separated branches, whose garno is considerably greater than 1 (we accepted the threshold 1.15);
  - brown - branches whose garno is close to 1 (more precisely, from 0.85 to 1.15);
  - red - the worst branches, whose garno is less than 0.85;
  - blue - very noticeable branches, whose garno is greater than 2;
  - black - branches for which garno cannot be calculated. These are branches consisting of a single taxon (usually not interesting, but listed nevertheless), whose diameter and thus garno are undefined, and branches that split off a single taxon, i.e. include all taxa of the tree except one.
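The measure and the coloring rule can be put together in a short sketch. It assumes that garno is the distance to the rest of the alignment divided by the diameter (the direction implied by "well-separated means greater than 1"), that the boundary values 0.85 and 1.15 belong to the brown range, and that a zero diameter is treated as undefined. All function names and the example distances are hypothetical.

```python
from itertools import combinations


def garno(branch, all_seqs, d):
    """Ratio of branch-to-rest distance to branch diameter, or None."""
    outside = set(all_seqs) - set(branch)
    # Undefined for single-taxon branches and for branches that leave
    # only one taxon outside (both are shown in black).
    if len(branch) < 2 or len(outside) < 2:
        return None
    diam = max(d(i, j) for i, j in combinations(branch, 2))
    dist = min(d(i, j) for i in branch for j in outside)
    if diam == 0:
        return None  # assumption: identical sequences give no usable ratio
    return dist / diam


def branch_color(g):
    """Map a garno value to the color used on the branch page."""
    if g is None:
        return "black"
    if g > 2:
        return "blue"
    if g > 1.15:
        return "green"
    if g >= 0.85:
        return "brown"
    return "red"


# Hypothetical pairwise distances between five sequences.
pairs = {
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 5, ("A", "D"): 5, ("B", "C"): 5, ("B", "D"): 5,
    ("A", "E"): 9, ("B", "E"): 9, ("C", "E"): 9, ("D", "E"): 9,
}


def d(i, j):
    return pairs[(i, j)] if (i, j) in pairs else pairs[(j, i)]


seqs = ["A", "B", "C", "D", "E"]
for branch in ({"A", "B"}, {"C", "D"}, {"A", "C"}, {"A", "B", "C", "D"}):
    g = garno(branch, seqs, d)
    print(sorted(branch), g, branch_color(g))
```

In this example {A, B} is blue (2.5), {C, D} is green (1.25), the artificial group {A, C} is red (0.4), and {A, B, C, D} is black because only one sequence remains outside it.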

Measure of kernel conservation

Conserved positions in any set of aligned sequences are assumed to carry the majority of the properties of this set. That is why the number and the fraction of conserved positions are the main characteristics of an alignment.
If a sequence set has some peculiarity (a domain, a site, a specific motif etc.), this peculiarity shows up as a group of conserved positions. The more conserved positions a branch has, the more specific it is. But in a highly conserved alignment (e.g. ribosomal proteins) almost every group will have many conserved positions.
That is why the number of conserved positions in a branch should be considered together with the distribution of conserved positions in the whole alignment. If a group loses almost all its conserved positions (i.e. all its peculiarities) after any "foreign" sequence is included, the group is very specific. Conversely, if nothing changes after new sequences are added to a group, the group shares its peculiarities with the general set and has nothing special of its own.
We decided to calculate the fraction of conserved positions that are distinctive, i.e. that are lost after the branch is extended:

      SavCons(A) = max over j not in A of (C(A) - C(A + j)) / C(A)

where C(X) is the number of conserved positions in the sequence set X, and A + j is the branch A extended by one sequence j.

Notice that adding different sequences may destroy different conserved positions. This formula, however, gives only the maximal fraction of positions that can be lost after adding a single sequence. That means SavCons gives a lower estimate of group specificity than the real one may be.
Measures of kernel conservation are calculated for every branch, independently of its size, and are shown on the page together with the other branch parameters.
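A minimal sketch of this calculation is given below. It assumes that a position is conserved when all sequences of the set carry the same non-gap residue there; the actual conservation criterion of the site may be softer. The names conserved and savcons and the toy alignment are hypothetical.

```python
def conserved(seqs):
    """Set of columns where all sequences share one non-gap residue."""
    first = seqs[0]
    return {
        pos
        for pos in range(len(first))
        if first[pos] != "-" and all(s[pos] == first[pos] for s in seqs)
    }


def savcons(alignment, branch):
    """Largest fraction of the branch's conserved positions that is lost
    when a single outside sequence is added; None if undefined."""
    inside = [seq for name, seq in alignment.items() if name in branch]
    outside = [seq for name, seq in alignment.items() if name not in branch]
    kernel = conserved(inside) if inside else set()
    if not kernel or not outside:
        return None
    # Try every outside sequence and keep the greatest loss.
    worst = max(len(kernel - conserved(inside + [seq])) for seq in outside)
    return worst / len(kernel)


# Hypothetical toy alignment of four sequences.
toy = {
    "s1": "MKVLA",
    "s2": "MKILA",
    "s3": "MRVFA",
    "s4": "MRIFG",
}
print(savcons(toy, {"s1", "s2"}))  # prints 0.75
```

Here the branch {s1, s2} has four conserved positions; adding s3 destroys two of them and adding s4 destroys three, so the value is 3/4.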

If you find mistakes or interesting facts, please tell us!