|
The concepts of Fuzzy Set Theory, when treated in a purely mathematical
form, can be quite confusing. This outline describes the basic concepts by
examining a sample set of data. This method has been used in a
Breast Cancer Diagnosis study to classify
tumors from the
Wisconsin Breast Cancer Data Set as either malignant or benign.
The objective of any Decision Support procedure is to use a training set of data to develop a procedure that can classify a new set of values. Machine Learning, Neural Networks, and several other procedures take the new values and return a definitive result; in other words, the new sample belongs either to one set or to the other. In this description, I'll label them Set-0 and Set-1, where Set-0 could be false, benign, cold or lean, and Set-1 could be true, malignant, hot or rich.

Fuzzy Set Theory differs from these in that all results, in principle, belong to both sets, only to different degrees. For example, one set of values could result in a 98% membership in Set-0 and a 2% membership in Set-1, meaning that it is very likely in Set-0. Another could yield a 20% membership in Set-0 and an 80% membership in Set-1, meaning that it is probably in Set-1, while a third could yield a 45% membership in Set-0 and a 55% membership in Set-1, meaning that the classification program is not sure which set it is in. This ability of the program to return a "not sure" result is what differentiates Fuzzy Set Theory from the other methods and, to me at least, is what gives it its strength. A "not sure" result could mean either that further testing is necessary or, in control theory, that nothing should be done at the present time.

With the background out of the way, Fuzzy Set Theory will now be applied to an example data set. Though a given data set probably contains values for many descriptors and a classification, I prefer to first reduce the values of the descriptors down to a single value. This is done by forming a linear combination of descriptors that yields an optimal separation between entries of Set-0 and Set-1. One way of doing this is described here. Assuming that this has been done, we have a set of values and a corresponding classification for each. Such a set is shown in Table 1, and again in Figure 1.
This set contains 20 data points in Set-0 and 20 in Set-1. An algorithm that returns a definitive result would try to find an optimum point in the range of values that produces two Crisp Sets. Such a point would be 0.045 (or any point between 0.4 and 0.5): a value less than 0.045 is placed in Set-0, and a value greater than 0.045 is placed in Set-1. Unfortunately, such an algorithm would be wrong 25% of the time (10 of the 40 points), since five of the Set-0 points are greater than this value and five of the Set-1 points are less.

We will now create two fuzzy sets that contain all of these points to some degree. This is done by creating two membership functions, one for each set. The only rule of Fuzzy Set Theory that we need to be concerned about here is that every data point must be completely defined. This means that the sum of the fractional memberships in all fuzzy sets must be 1.0 for every point. This is actually very easy to do, as will be shown below.

Though many different membership functions are possible, I will treat this example problem by defining two knot points, one for each set. The unnormalized membership in each set will be inversely proportional to a function of the distance between the value and its knot. If the knots for Set-0 and Set-1 are labeled C0 and C1, respectively, and the value for a given data point is labeled Xi, the unnormalized memberships in Set-0 (p0i) and Set-1 (p1i) are given by
D0i = ABS(Xi - C0)
D1i = ABS(Xi - C1)
p0i = 1/f(D0i)
p1i = 1/f(D1i)
In these equations, ABS is the absolute value and f() is some function of this distance. The normalized memberships in Set-0 (P0i) and Set-1 (P1i) are simply
P0i = p0i/(p0i + p1i)
P1i = p1i/(p0i + p1i)
which ensures that P0i + P1i = 1.0 for all i. The procedure now becomes finding values for the knots and a function that yield good membership functions. As a first attempt, I will place the knots at the average values for each set, which happen to be -1.0 for Set-0 and 1.0 for Set-1. I will now show results for various choices of the function. I want to make sure that the function never becomes 0.0, so that the unnormalized memberships remain finite.

If we try f(D0i) = SQRT(1.0 + D0i), the normalized membership functions for different values of Xi are plotted in Figure 2. In this figure, the red curve is the membership function for Set-0 and the blue is for Set-1. The green line simply shows a membership value of 1.0, and is the sum of the red and blue curves. It is hopefully clear from this figure that using the square root of one plus the distance from the knot is a very bad choice. No value of Xi has a membership greater than 0.634 (63.4%) in either set. The maximum occurs at the knot itself: at Xi = -1.0, D0i = 0 and D1i = 2, so P0i = 1/(1 + 1/SQRT(3)) = 0.634. This would result in a "not sure" response for all points.

If the function is changed to simply one plus the distance, f(D0i) = 1.0 + D0i, the membership functions are shown in Figure 3. This is slightly better, but still shows that no point is more than 75% in a set (at the knot, P0i = 1/(1 + 1/3) = 0.75). Increasing the function to f(D0i) = (1.0 + D0i)^2 yields the membership functions shown in Figure 4. This is even better, but the decrease in membership for the largest and smallest values of Xi is not good. Finally, for
f(D0i) = (1.0 + D0i)^4
and
f(D0i) = (1.0 + D0i)^6
the membership functions are shown in Figure 5 and Figure 6, respectively. Using one plus the distance from the knot raised to the sixth power yields reasonable membership functions. If this is used with our set of data, the fractional memberships in each set are shown in Table 2. These results show that a cutoff membership of 0.900 (actually any value between 0.888 and 0.955) can be used to state with certainty whether a data point belongs to a particular set. Those data points whose maximum membership in either set is below this value would have to return a "not sure" response, since points from both sets lie in this range.

The cutoff value can be changed either by raising the exponent of the function to a value larger than six, or by moving the knot points. For example, if the knots are moved to -1.5 and 1.5 for Set-0 and Set-1, respectively, the normalized membership functions are plotted in Figure 7 and the fractional memberships for each data point are shown in Table 3. Here, the cutoff can be reduced to any value between 0.839 and 0.919. Finally, if the knots are moved to -2.0 and 2.0 for Set-0 and Set-1, respectively, the normalized membership functions are plotted in Figure 8 and the fractional memberships for each data point are shown in Table 4. Here, the cutoff can be reduced to any value between 0.798 and 0.883.

One final point is that this procedure does nothing to change the rank ordering of the points in the data set. For any membership function, there will always be a cutoff where 15 of the 40 data points return a value of "not sure". The purpose of using fuzzy sets instead of crisp sets is that crisp sets will be wrong 25% of the time, while fuzzy sets will be correct 62.5% of the time and uncertain 37.5% of the time. This reduction in the percent correct (from 75% to 62.5%) is offset by an elimination of wrong classifications. If some level of error is tolerated, the cutoff in Table 4 can be reduced to about 0.728.
This reduces the uncertainty to 30% of the time, increases the correct predictions to 67.5% of the time, and makes the prediction wrong 2.5% of the time.

In conclusion, compared with crisp sets, using fuzzy sets will reduce the percentage of correct predictions and introduce a range of values where the response is "not sure", but will greatly reduce the probability that an incorrect prediction is made. For many real-world problems, "not sure" or "do nothing for now" is greatly preferred over making an incorrect prediction. In these cases, Fuzzy Set Theory may be a better way to handle the problem. |
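The knot-based membership functions and the cutoff rule described in this outline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program used in the study. The function names `memberships` and `classify` are made up for this sketch. The default knots (-1.0 and 1.0), the sixth power, and the 0.900 cutoff are the values quoted in the text. The sample values of Xi are hypothetical and are not taken from Table 1.

```python
def memberships(x, c0=-1.0, c1=1.0, power=6.0):
    """Return the normalized memberships (P0, P1) of value x in Set-0 and Set-1.

    Uses f(D) = (1.0 + D) ** power, where D is the distance from x to a knot.
    Because f(D) >= 1.0, the unnormalized memberships are always finite.
    """
    d0 = abs(x - c0)              # distance to the Set-0 knot
    d1 = abs(x - c1)              # distance to the Set-1 knot
    p0 = 1.0 / (1.0 + d0) ** power
    p1 = 1.0 / (1.0 + d1) ** power
    total = p0 + p1
    return p0 / total, p1 / total  # normalized so that P0 + P1 = 1.0


def classify(x, cutoff=0.900, **kwargs):
    """Return 'Set-0', 'Set-1', or 'not sure' for value x.

    A set is reported only when its membership reaches the cutoff.
    """
    p0, p1 = memberships(x, **kwargs)
    if p0 >= cutoff:
        return "Set-0"
    if p1 >= cutoff:
        return "Set-1"
    return "not sure"


# Largest membership reachable (at the Set-0 knot) for each choice of f().
# power=0.5 is SQRT(1 + D), power=1.0 is 1 + D, and so on.
for power in (0.5, 1.0, 2.0, 4.0, 6.0):
    peak, _ = memberships(-1.0, power=power)
    print(f"power {power}: peak membership {peak:.3f}")

# Hypothetical sample values, not the data of Table 1.
for x in (-1.2, -0.3, 0.0, 0.4, 1.1):
    p0, p1 = memberships(x)
    print(f"Xi = {x:5.2f}  P0 = {p0:.3f}  P1 = {p1:.3f}  ->  {classify(x)}")
```

The first loop should reproduce the peak memberships of 0.634 and 0.75 quoted for the square-root and linear functions. Moving the knots (for example `c0=-2.0, c1=2.0`) or changing `cutoff` corresponds to the adjustments discussed for Tables 3 and 4.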