wLake 1.7 algrithm

3D structures of proteins or their complexes are considered as related
if their significant parts can be reliably superimposed. Two water
molecules from different structures are called equivalent if the
distance between the centers of their oxygen atoms is less than a
threshold.

A set of pairwise equivalent water molecules from different 
structures is called a cluster of water molecules.

An input of the algorithm is a set of superimposed related 
structures. The goal is to find all ConWMs in the input.

Lemma. A graph G of |V| vertices and |E| edges cannot contain any 
cliques of a size Min or greater if |V| is less than Min or |E| is 
less than Min * (Min - 1) / 2. G is a complete graph if |E| = |V| * 
(|V| - 1) / 2.  Lemma is used in algorithm to reject graphs that 
cannot contain cliques and to check the completeness of a graph.  

At the first stage, a special graph is constructed. Each water 
molecule corresponds to a vertex in the graph. Two vertices of the 
graph are joined by an edge if water molecules are equivalent. 
Thus, a ConWM corresponds to a clique in the graph. To find 
all cliques of a size exceeding a fixed minimum (Min) the following 
algorithm was developed.

ALGORITHM wLake (Graph G)
If G can't contain a clique of >= Min vertices then
   return NULL
If G is complete then
   return G
Vmin := a vertex from G with the minimal valence
Vs := {Vmin}  {vertices connected with Vmin}
Es := {edges connecting any pair of vertices  Vs}
G' := new Graph ()
For each V  Vs do
  Add copy of V into G'
For each E  Es do
  Add copy of E into G'
Delete Vmin from G
return wLake (G)  wLake (G')

This generally exponential algorithm works rather fast in cases of 
graphs of water molecules (a few seconds of CPU for about thirty 
superimposed  structures). 

The result of the program is a list of ConWMs. To estimate the 
reliability of each detected cluster, we developed a special 
program WLStat3. The input of the program is a list of clusters of size 2 
and more detected by wLake 1.7 and a numbers of total and unclusters water 
molecules in the every experimental structure. WLStat3 randomly put all 
the water molecules into the different hydration site (the hydration site 
is a cluster of water molecules or a simple unclustered molecule) 2000 
times and count the number of clusters of a size 2, 3, etc observed. Than 
E-value for each detected SWM cluster is determined.

wLake and WLStat programs can be downloaded for off-line usage 
 here.