| Current Activities |
|
The Wisconsin Breast Cancer Data Set used digitized images of stained
nuclei from 569 breast tumors. Each image was examined to produce a
set of 30 descriptors (metrics). Ten features (radius, perimeter, etc.)
were examined for each isolated nucleus, and the average, standard
deviation, and maximum value of each was determined. In addition, a
separate determination of malignant or benign for each tumor was made.
The creators of this data set used a machine learning procedure to make predictions. In particular, they examined every possible set of three descriptors, each of which defines a 3-dimensional descriptor space. A plane was passed through the data and optimized to maximize the separation of malignant and benign tumors. The best set of descriptors produced an incorrect diagnosis 3.5% of the time. This is a very good result, but it suffers from two drawbacks. The first is that the analysis takes a very long time, and the second is that, for cancer, being wrong 3.5% of the time may be too much. I used fuzzy clustering to examine this data set. I ran 10,000 cross-validation studies, each holding out 6 of the 569 tumors. For each 563-member training set, the 30 descriptors were ordered from best to worst in their ability to distinguish malignant from benign. A threshold value was used to determine the number of descriptors to use (15 for this data set). A linear combination of these descriptors was created so that the differentiation between malignant and benign was maximized. This single metric was then used in a fuzzy clustering. When a membership threshold of 80% was used, a "not sure" response was returned 4% of the time, but the procedure was wrong less than 1% of the time. A "not sure" verdict simply means that further tests should be done before proceeding. All 10,000 samples were run in less than one CPU-minute on a Power2, 591 node of an IBM SP2. Therefore, this procedure is able to quickly produce fuzzy sets that significantly reduce the number of incorrect diagnoses, at the expense of returning a "not sure" diagnosis 4% of the time. |
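
The pipeline described above (rank the descriptors, keep the best ones, collapse them into a single metric, then assign fuzzy memberships with a "not sure" band) can be sketched roughly as follows. This is only an illustration under several assumptions that the text does not confirm: the ranking uses a Fisher score, the linear combination is a Fisher linear discriminant, and the memberships follow the fuzzy c-means formula with fuzzifier m = 2 and cluster centers fixed at the training class means instead of being iterated. The data are synthetic stand-ins, not the Wisconsin measurements, and a single train/test split replaces the 10,000 cross-validation runs. All function names are hypothetical.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-descriptor score: squared gap between class means over summed class variances."""
    a, b = X[y == 0], X[y == 1]
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)

def fit(X, y, n_keep):
    """Rank descriptors, keep the best n_keep, and build one discriminant metric."""
    keep = np.argsort(fisher_scores(X, y))[::-1][:n_keep]
    a, b = X[y == 0][:, keep], X[y == 1][:, keep]
    # Assumed method: Fisher linear discriminant, w = Sw^-1 (mean_b - mean_a).
    Sw = np.cov(a, rowvar=False) * (len(a) - 1) + np.cov(b, rowvar=False) * (len(b) - 1)
    Sw = Sw + 1e-6 * np.eye(n_keep)  # small ridge keeps the solve well conditioned
    w = np.linalg.solve(Sw, b.mean(0) - a.mean(0))
    # Simplification: cluster centers are the class means of the training scores.
    centers = np.array([(a @ w).mean(), (b @ w).mean()])
    return keep, w, centers

def memberships(z, centers, m=2.0):
    """Fuzzy c-means style memberships of scalar scores z in each cluster."""
    d = np.abs(z[:, None] - centers[None, :]) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def diagnose(X, model, threshold=0.8):
    """Return 0 or 1 per sample, or -1 ("not sure") when no membership reaches threshold."""
    keep, w, centers = model
    u = memberships(X[:, keep] @ w, centers)
    label = u.argmax(axis=1)
    label[u.max(axis=1) < threshold] = -1
    return label

# Synthetic stand-in data: 30 descriptors, only the first 5 carry class signal.
rng = np.random.default_rng(0)
n, p = 300, 30
y = rng.integers(0, 2, n)
X = rng.normal(size=(n, p))
X[:, :5] += 2.0 * y[:, None]

model = fit(X[:240], y[:240], n_keep=15)
pred = diagnose(X[240:], model, threshold=0.8)

decided = pred != -1
print("not sure fraction:", 1.0 - decided.mean())
print("error rate among decided:", (pred[decided] != y[240:][decided]).mean())
```

Raising the threshold widens the "not sure" band and should lower the error rate among the cases that do receive a verdict, which is the trade-off the text describes.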