Getting Xmeans software products programmatically in Weka

When using Kmeans in Weka, you can call getAssignments () on the resulting model output to get the cluster assignment for each given instance. Here is an example of (truncated) Jython:

>>>import weka.clusterers.SimpleKMeans as kmeans
>>>kmeans.buildClusterer(data)
>>>assignments = kmeans.getAssignments()
>>>assignments
>>>array('i',[14, 16, 0, 0, 0, 0, 16,...])

The index of each cluster number corresponds to the instance. So, instance 0 is in cluster 14, instance 1 is in cluster 16, etc.

My question is: is there something similar for Xmeans? I looked through the entire API here and don't see anything like it.

+5
source share
1 answer

Here is the answer to my question from the Weka newsletter:

 "Not as such. But all clusterers have a clusterInstance() method. You can 
 pass each training instance through the trained clustering model to 
 obtain the cluster index for each."

Here is my Jython implementation of this sentence:

 >>> import java.io.FileReader as FileReader
 >>> import weka.core.Instances as Instances
 >>> import weka.clusterers.XMeans as xmeans
 >>> import java.io.BufferedReader as read
 >>> import java.io.FileReader
 >>> import java.io.File
 >>> read = read(FileReader("some arff file"))
 >>> data = Instances(read)
 >>> file = FileReader("some arff file")
 >>> data = Instances(file)
 >>> xmeans = xmeans()
 >>> xmeans.setMaxNumClusters(100)  
 >>> xmeans.setMinNumClusters(2) 
 >>> xmeans.buildClusterer(data)# here our model 
 >>> enumerated_instances = data.enumerateInstances() #get the index of each instance 
 >>> for index, instance in enumerate(enumerated_instances):
         cluster_num = xmeans.clusterInstance(instance) #pass each instance through the model
         print "instance # ",index,"is in cluster ", cluster_num #pretty print results

 instance # 0 is in cluster  1
 instance # 1 is in cluster  1
 instance # 2 is in cluster  0
 instance # 3 is in cluster  0

, Weka.

+7

All Articles