Fuzzy K-Means is an extension of K-Means, the popular and simple clustering technique. The only difference is that, instead of assigning each point exclusively to one cluster, it allows a point to belong to two or more overlapping clusters, each to a certain degree. The key points describing Fuzzy K-Means are as follows:
Unlike K-Means, which seeks hard clusters in which each point belongs to exactly one cluster, Fuzzy K-Means seeks soft clusters that may overlap.
A single point can belong to more than one soft cluster, with a certain affinity value towards each of those clusters.
The affinity depends on the distance of that point from the cluster centroid: the closer the point is to a centroid, the higher its affinity for that cluster.
Similar to K-Means, Fuzzy K-Means works on objects that have a distance measure defined and can be represented in an n-dimensional vector space.
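To make the idea of affinity concrete, here is a minimal sketch of how membership values can be computed for one point. It uses the textbook fuzzy c-means formula and is not taken from Mahout's source; the class name FuzzyMembershipSketch and its methods are made up for illustration, and m is the fuzziness factor, which must be greater than 1.

```java
import java.util.Arrays;

// Illustrative only: textbook fuzzy c-means membership, not Mahout's implementation.
final class FuzzyMembershipSketch {

    // Euclidean distance between two vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Membership of one point in each cluster:
    // u_j = 1 / sum_l (d_j / d_l)^(2 / (m - 1)), with m > 1.
    static double[] memberships(double[] point, double[][] centroids, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("fuzziness m must be greater than 1");
        }
        int k = centroids.length;
        double[] dist = new double[k];
        int zeroCount = 0;
        for (int j = 0; j < k; j++) {
            dist[j] = distance(point, centroids[j]);
            if (dist[j] == 0.0) {
                zeroCount++;
            }
        }
        double[] u = new double[k];
        if (zeroCount > 0) {
            // The point sits exactly on one or more centroids:
            // share the full membership among those clusters.
            for (int j = 0; j < k; j++) {
                u[j] = dist[j] == 0.0 ? 1.0 / zeroCount : 0.0;
            }
            return u;
        }
        double exponent = 2.0 / (m - 1.0);
        for (int j = 0; j < k; j++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++) {
                sum += Math.pow(dist[j] / dist[l], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    public static void main(String[] args) {
        double[][] centroids = {{0.0, 0.0}, {4.0, 0.0}};
        double[] point = {1.0, 0.0};
        // The point is three times closer to the first centroid than to the second.
        System.out.println(Arrays.toString(memberships(point, centroids, 2.0)));
    }
}
```

With m = 2, the point at distance 1 from the first centroid and 3 from the second gets memberships of roughly 0.9 and 0.1. The values always sum to 1, and pushing m closer to 1 makes the assignment behave more like hard K-Means.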
Fuzzy K-Means MapReduce Flow
There’s not a lot of difference between the MapReduce flow of K-Means and Fuzzy K-Means. The implementation of both in Mahout is similar.
Following are the essential parameters for the implementation of Fuzzy K-Means:
You need a Vector data set for input.
A RandomSeedGenerator is needed to seed the initial k clusters.
For the distance measure, SquaredEuclideanDistanceMeasure is typically used.
A larger convergence threshold, such as -cd 1.0, if the squared value of the distance measure is used.
A value for maxIterations (-x); the default value is 10.
The coefficient of normalization, or fuzziness factor (-m), with a value greater than 1.0.
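The way these parameters interact can be sketched with a small, sequential version of the algorithm in plain Java. This is an assumption-laden illustration rather than Mahout's MapReduce code: the class FuzzyKMeansLoopSketch and its methods are hypothetical, the initial centroids are passed in by hand instead of coming from RandomSeedGenerator, and convergence is checked by comparing each centroid's squared movement against the threshold.

```java
import java.util.Arrays;

// Illustrative only: a sequential fuzzy k-means loop, not Mahout's implementation.
final class FuzzyKMeansLoopSketch {

    // Squared Euclidean distance, mirroring the distance measure named in the list above.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    // Membership weights of one point, computed from squared distances.
    static double[] memberships(double[] point, double[][] centroids, double m) {
        int k = centroids.length;
        double[] d2 = new double[k];
        for (int j = 0; j < k; j++) {
            // A tiny floor avoids division by zero when a point sits on a centroid.
            d2[j] = Math.max(squaredDistance(point, centroids[j]), 1e-12);
        }
        double exponent = 1.0 / (m - 1.0);
        double[] u = new double[k];
        for (int j = 0; j < k; j++) {
            double sum = 0.0;
            for (int l = 0; l < k; l++) {
                sum += Math.pow(d2[j] / d2[l], exponent);
            }
            u[j] = 1.0 / sum;
        }
        return u;
    }

    // Iterates until no centroid moves by more than convergenceDelta
    // (in squared distance) or maxIterations is reached.
    static double[][] cluster(double[][] points, double[][] initialCentroids,
                              double m, double convergenceDelta, int maxIterations) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("m must be greater than 1");
        }
        int k = initialCentroids.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int j = 0; j < k; j++) {
            centroids[j] = initialCentroids[j].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] weightedSum = new double[k][dim];
            double[] weightTotal = new double[k];
            // Every point contributes to every cluster, weighted by u^m.
            for (double[] p : points) {
                double[] u = memberships(p, centroids, m);
                for (int j = 0; j < k; j++) {
                    double w = Math.pow(u[j], m);
                    weightTotal[j] += w;
                    for (int d = 0; d < dim; d++) {
                        weightedSum[j][d] += w * p[d];
                    }
                }
            }
            // New centroid = weighted mean of all points.
            boolean converged = true;
            for (int j = 0; j < k; j++) {
                double[] updated = new double[dim];
                for (int d = 0; d < dim; d++) {
                    updated[d] = weightedSum[j][d] / weightTotal[j];
                }
                if (squaredDistance(updated, centroids[j]) > convergenceDelta) {
                    converged = false;
                }
                centroids[j] = updated;
            }
            if (converged) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {1.0, 2.0}, {2.0, 1.0}, {8.0, 8.0}, {9.0, 8.0}, {8.0, 9.0}};
        double[][] seeds = {{1.0, 1.0}, {9.0, 8.0}};
        double[][] result = cluster(points, seeds, 2.0, 0.001, 10);
        System.out.println(Arrays.deepToString(result));
    }
}
```

Because the movement of each centroid is measured as a squared distance here, the same threshold is reached at a different point than it would be with plain Euclidean distance, so the threshold has to be chosen with the distance measure in mind.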
This is the code for Fuzzy K-Means in Mahout; however, it does not work. Could you please help me with it?
https://github.com/tdunning/MiA/blob/master/src/main/java/mia/clustering/ch09/FuzzyKMeansExample.java
It seems that the problem is related to this part:
    List<List<SoftCluster>> finalClusters = FuzzyKMeansClusterer
        .clusterPoints(sampleData, clusters,
            new EuclideanDistanceMeasure(), 0.01, 3, 10);
Hi Amir, you are facing this error because of a mismatch in Mahout dependencies. You should use the Mahout 0.5 API, along with the Hadoop jars, to run this program.
Hope this helps!