Diversity Selection
Brian T. Luke

In Diversity Selection, the objective is to choose a small subset of a set of objects that spans the entire metric space of the full set. One way to do this is to perform a Hierarchical Clustering of the objects up to a given threshold distance and then take, from each resulting cluster, the object closest to that cluster's centroid.
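A minimal sketch of the clustering approach, assuming the objects are points in Euclidean space and assuming centroid linkage (the text does not specify a linkage rule). Clusters are merged until the two closest centroids are at least the threshold apart, and the member nearest each final centroid is kept. The function names are hypothetical, and the O(n³) pair search is only meant for small illustrative inputs.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def cluster_representatives(points, threshold):
    """Agglomerative clustering with centroid linkage (an assumption).

    Merging stops once the two closest cluster centroids are at least
    `threshold` apart. Returns the sorted indices of the point nearest
    to each final cluster's centroid.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        cents = [centroid([points[i] for i in c]) for c in clusters]
        # Find the pair of clusters whose centroids are closest.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = math.dist(cents[a], cents[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d >= threshold:
            break  # every remaining cluster is far enough from the others
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    reps = []
    for members in clusters:
        cen = centroid([points[i] for i in members])
        reps.append(min(members, key=lambda i: math.dist(points[i], cen)))
    return sorted(reps)
```

For example, with three tight groups such as `[(0, 0), (0.2, 0), (0.4, 0), (5, 5), (5.2, 5), (5.4, 5), (10, 0)]` and a threshold of `1.0`, the sketch keeps the middle point of each group plus the isolated point.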

A different method is a Greedy Procedure, originally used to produce the non-homologous set of peptide chains in the Protein Data Bank. The procedure is as follows:

  1. Select a threshold distance.
  2. Compute the distance between every pair of objects, and place the two objects in each other's neighbor list if their distance is less than the threshold. Each object is also a member of its own neighbor list.
  3. Find the object with the largest neighbor list. If two or more objects have this maximum-sized neighbor list, select the object with the smallest total distance to all of its neighbors.
  4. Remove the selected object from the set of objects, and remove it from the neighbor list of each of its neighbors.
  5. Repeat from Step 3 until every remaining object has only itself in its neighbor list.
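The five steps above can be sketched in Python as follows. This is a minimal illustration, not the original PDB implementation; the function name `greedy_diverse_subset` and the `distance` callable are hypothetical, and ties that survive the total-distance rule are broken by the lowest index, which is an added assumption.

```python
from itertools import combinations


def greedy_diverse_subset(objects, distance, threshold):
    """Return sorted indices of the objects that survive the greedy removal."""
    n = len(objects)
    # Steps 1-2: build neighbor lists; each object is its own neighbor.
    neighbors = {i: {i} for i in range(n)}
    dist = {}
    for i, j in combinations(range(n), 2):
        d = distance(objects[i], objects[j])
        dist[i, j] = dist[j, i] = d
        if d < threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)

    def priority(i):
        # Largest neighbor list first, then smallest total distance to
        # neighbors, then lowest index (assumed final tie-break).
        total = sum(dist[i, j] for j in neighbors[i] if j != i)
        return (-len(neighbors[i]), total, i)

    remaining = set(range(n))
    # Step 5: stop when every remaining object has only itself as a neighbor.
    while any(len(neighbors[i]) > 1 for i in remaining):
        selected = min(remaining, key=priority)  # Step 3
        # Step 4: drop the selected object and unlink it from its neighbors.
        remaining.remove(selected)
        for j in neighbors.pop(selected):
            if j != selected:
                neighbors[j].discard(selected)
    return sorted(remaining)
```

For instance, `greedy_diverse_subset([0, 1, 2, 10], lambda a, b: abs(a - b), 1.5)` removes the object at position 1 (it has the most neighbors) and returns `[0, 2, 3]`.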

The objects that remain form a subset in which every pair is separated by a distance of at least the threshold. The only requirement of the method is a procedure for computing the distance between each pair of objects.
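Because only pairwise distances are needed, any distance function can be supplied. As an assumed example (not taken from the text), the sketch below uses the Jaccard (Tanimoto) distance between sets of features and includes a hypothetical helper, `satisfies_threshold`, that checks whether a selected subset has every pair at least the threshold apart.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets; defined as 0.0 if both are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def satisfies_threshold(subset, distance, threshold):
    """True if every pair of objects in `subset` is at least `threshold` apart."""
    return all(distance(x, y) >= threshold for x, y in combinations(subset, 2))
```

Here `jaccard_distance({"C", "N", "O"}, {"C", "N", "S"})` is 0.5, so a subset containing both sets would fail a threshold of 0.6.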

Example of Diversity Selection.