| Current Activities |
|
Jarvis-Patrick Clustering [1] uses a nearest neighbor approach to
clustering objects. It is the method used by
Daylight Systems to cluster a chemical database since studies by
Dr. Peter Willett [2] found it to be the best method (see
Clustering Binary Objects for further
information).
To run this type of clustering, you need a measure of the distance between two objects and two integers, J and K. J is the size of the neighbor list, and K is the number of common neighbors. This method is as follows.
Several points need to be made about this procedure. The first is that it is not hierarchical. Only a single pass through all pairs of objects needs to be made. This produces clusters that are similar to those produced by a Simple Linkage Agglomerative Clustering, only it is much faster. In other words, if objects L and M pass the criteria listed above, as do objects L and N, all three will be placed in the same cluster whether or not objects M and N pass the criteria. The number of clusters produced is dependent upon the input values of J and K. Therefore, if a large number of objects needs to be clustered, several values may have to be tried before a reasonable number of clusters are formed and the number of singletons (or entropy clusters) is reasonably small. Daylight Systems suggests several methods to reduce the number of singletons. They also found that removing the first criterion "usually results in improved clustering" with a significant increase in the CPU-cost. This method has the advantage of identifying tight clusters within normal clusters by using two K-values, K and K' (K < K'). Two objects lie in the same cluster if the number of common neighbors is at least K, while they are in a tight cluster if this is at least K'.
References: |