Weka, SimpleKMeans cannot handle string attributes

I am using Weka from Scala (the syntax is almost identical to Java). I am trying to evaluate my data with a SimpleKMeans clusterer, but the clusterer will not accept string data. I do not want to cluster on the string data; I just want to use it to label the points.

Here is the data I am using:

@relation Locations
@attribute ID string
@attribute Latitude numeric
@attribute Longitude numeric
@data
'Carnegie Mellon University', 40.443064, -79.944163
'Stanford University', 37.427539, -122.170169
'Massachusetts Institute of Technology', 42.358866, -71.093823
'University of California Berkeley', 37.872166, -122.259444
'University of Washington', 47.65601, -122.30934
'University of Illinois Urbana Champaign', 40.091022, -88.229992
'University of Southern California', 34.019372, -118.28611
'University of California San Diego', 32.881494, -117.243079

As you can see, this is essentially a set of points on an x/y coordinate plane. Any patterns in it are of little value; this is just an exercise in working with Weka.

Here is the code that is causing me problems:

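import java.io.StringReader
import weka.clusterers.{ClusterEvaluation, SimpleKMeans}
import weka.core.Instances

// wekaHeader and wekaData hold the ARFF header and data rows shown above
val instance = new Instances(new StringReader(wekaHeader + wekaData))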

val simpleKMeans = new SimpleKMeans()
simpleKMeans.buildClusterer(instance)

val eval = new ClusterEvaluation()
eval.setClusterer(simpleKMeans)
eval.evaluateClusterer(new Instances(instance))

Logger.info(eval.clusterResultsToString)

I get the following error on simpleKMeans.buildClusterer(instance):

[UnsupportedAttributeTypeException: weka.clusterers.SimpleKMeans: cannot process string attributes!]

How can I get Weka to cluster this data while keeping the IDs?


Edit, for clarification: in the Weka Explorer I can load the same data as CSV:

ID, Latitude, Longitude
'Carnegie Mellon University', 40.443064, -79.944163
'Stanford University', 37.427539, -122.170169
'Massachusetts Institute of Technology', 42.358866, -71.093823
'University of California Berkeley', 37.872166, -122.259444
'University of Washington', 47.65601, -122.30934
'University of Illinois Urbana Champaign', 40.091022, -88.229992
'University of Southern California', 34.019372, -118.28611
'University of California San Diego', 32.881494, -117.243079

This works in the Weka Explorer. In the Weka Java API, however, Instances only reads ARFF from a java.io.Reader.

I also tried deleting the ID attribute before clustering:

val instance = new Instances(new StringReader(wekaHeader + wekaData))
// Drop the string ID attribute (index 0) so only the numeric attributes remain
instance.deleteAttributeAt(0)

val simpleKMeans = new SimpleKMeans()
simpleKMeans.buildClusterer(instance)

val eval = new ClusterEvaluation()
eval.setClusterer(simpleKMeans)
eval.evaluateClusterer(new Instances(instance))

Logger.info(eval.clusterResultsToString)

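This runs and produces results. However, Weka no longer has the ID attribute, so I cannot map the clustered points back to their IDs.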


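Answering my own question, there are a few points to cover:

  • Why the CSV works

As Sentry pointed out, when the data is loaded from CSV the ID column becomes a nominal attribute rather than a string attribute.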

If the data has to stay in ARFF (for example, when the Instances object is built from a StringReader), the StringToNominal filter can be applied:

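  import java.io.StringReader
  import weka.clusterers.SimpleKMeans
  import weka.core.Instances
  import weka.filters.Filter
  import weka.filters.unsupervised.attribute.StringToNominal

  val instances = new Instances(new StringReader(wekaHeader + wekaData))

  // Convert the first attribute (the string ID) to a nominal attribute
  val filter = new StringToNominal()
  filter.setAttributeRange("first")
  filter.setInputFormat(instances)

  val filteredInstance = Filter.useFilter(instances, filter)

  // Cluster the filtered data, not the original string-typed instances
  val simpleKMeans = new SimpleKMeans()
  simpleKMeans.buildClusterer(filteredInstance)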
  ...

"" , . ( ), , , .


Ultimately, what I wanted was a mapping such as cluster: Int -> Array[(ID, latitude, longitude)] or ID -> cluster: Int.

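For the assignments, simpleKMeans.getAssignments returns the cluster index of each instance. In Scala, you can zip the assignments with the instances, then use groupBy and map to build either mapping, so that every ID is paired with its cluster.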
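The zip, groupBy and map steps can be sketched with plain Scala collections. This is only an illustration: the `points` list and the `assignments` array are hypothetical stand-ins (the cluster indices are made up), where the real program would take the rows from the ARFF data and the indices from simpleKMeans.getAssignments. It also assumes both sequences are in the same row order.

```scala
// Hypothetical stand-ins for the real data and for simpleKMeans.getAssignments.
val points: Seq[(String, Double, Double)] = Seq(
  ("Carnegie Mellon University", 40.443064, -79.944163),
  ("Stanford University", 37.427539, -122.170169),
  ("Massachusetts Institute of Technology", 42.358866, -71.093823),
  ("University of California Berkeley", 37.872166, -122.259444)
)
val assignments: Array[Int] = Array(0, 1, 0, 1) // made-up cluster indices

// Pair each point with its cluster index (relies on identical ordering).
val paired = points.zip(assignments)

// cluster: Int -> Array[(ID, latitude, longitude)]
val byCluster: Map[Int, Array[(String, Double, Double)]] =
  paired.groupBy(_._2).map { case (cluster, members) =>
    cluster -> members.map(_._1).toArray
  }

// ID -> cluster: Int
val idToCluster: Map[String, Int] =
  paired.map { case ((id, _, _), cluster) => id -> cluster }.toMap
```

Because the ID travels with each point through the zip, it never has to be an attribute that the clusterer sees.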

The centroids are available from simpleKMeans.getClusterCentroids and are also reported by eval.clusterResultsToString().
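One way to keep the IDs while clustering only on the coordinates is to set them aside before deleting the attribute. The sketch below is a minimal example under some assumptions: Weka is on the classpath, the ARFF is shortened to four rows, and k = 2 is an arbitrary choice. As far as I know, getAssignments only works when setPreserveInstancesOrder(true) is called before buildClusterer.

```scala
import java.io.StringReader
import weka.clusterers.SimpleKMeans
import weka.core.Instances

// ARFF text from the question, shortened to four rows for the sketch.
val arff =
  """@relation Locations
    |@attribute ID string
    |@attribute Latitude numeric
    |@attribute Longitude numeric
    |@data
    |'Carnegie Mellon University', 40.443064, -79.944163
    |'Stanford University', 37.427539, -122.170169
    |'Massachusetts Institute of Technology', 42.358866, -71.093823
    |'University of California Berkeley', 37.872166, -122.259444
    |""".stripMargin

val data = new Instances(new StringReader(arff))

// Remember the IDs (attribute 0) in row order before dropping the column.
val ids = (0 until data.numInstances).map(i => data.instance(i).stringValue(0))

// Cluster on a copy that contains only Latitude and Longitude.
val numericOnly = new Instances(data)
numericOnly.deleteAttributeAt(0)

val kMeans = new SimpleKMeans()
kMeans.setNumClusters(2)               // arbitrary k for this sketch
kMeans.setPreserveInstancesOrder(true) // so assignments line up with input rows
kMeans.buildClusterer(numericOnly)

// ID -> cluster index
val clusterById: Map[String, Int] = ids.zip(kMeans.getAssignments).toMap
clusterById.foreach { case (id, cluster) => println(s"$cluster\t$id") }
```

The ID never reaches the clusterer here, so it cannot influence the distances or show up in the centroids.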


In my case the problem was a String value in the CSV data. The steps I followed:

  • Open the CSV in the Weka Explorer and save it as *.arff.
  • *.arff , .
  • , *.arff.
  • *.arff . - .

→ *.arff

@attribute total numeric
@attribute avgDailyMB numeric
@attribute mccMncCount numeric
@attribute operatorCount numeric
@attribute authSuccessRate numeric
@attribute totalMonthlyRequets numeric
@attribute tokenCount numeric
@attribute osVersionCount numeric
@attribute totalAuthUserIds numeric
@attribute makeCount numeric
@attribute modelCount numeric
@attribute maxDailyRequests numeric
@attribute avgDailyRequests numeric

java.io.IOException: number expected, read Token[value.total], line 1750464
    at weka.core.converters.ArffLoader$ArffReader.errorMessage(ArffLoader.java:354)
    at weka.core.converters.ArffLoader$ArffReader.getInstanceFull(ArffLoader.java:728)
    at weka.core.converters.ArffLoader$ArffReader.getInstance(ArffLoader.java:545)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:514)
    at weka.core.converters.ArffLoader$ArffReader.readInstance(ArffLoader.java:500)
    at weka.core.Instances.<init>(Instances.java:138)
    at com.lokendra.dissertation.ModelingUtils.kMeans(ModelingUtils.java:50)
    at com.lokendra.dissertation.ModelingUtils.main(ModelingUtils.java:28)
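
The "number expected" message suggests that a non-numeric token (here value.total) sits in a column declared numeric. A small helper like the following could locate such tokens before the data is handed to Weka. It is only a sketch: `nonNumericTokens` and the sample rows are hypothetical, fields are assumed to be comma-separated without quoting, and "?" is skipped because ARFF uses it for missing values.

```scala
import scala.util.Try

// Returns (1-based row number, offending token) for every field
// that does not parse as a number.
def nonNumericTokens(rows: Seq[String]): Seq[(Int, String)] =
  for {
    (row, idx) <- rows.zipWithIndex
    token <- row.split(",").map(_.trim).toSeq
    if token != "?" && Try(token.toDouble).isFailure
  } yield (idx + 1, token)

// Made-up sample rows: the second one looks like a stray header line.
val rows = Seq(
  "10, 2.5, 3",
  "value.total, value.avgDailyMB, 7",
  "4, ?, 6"
)
val bad = nonNumericTokens(rows)
bad.foreach { case (line, token) => println(s"row $line: '$token'") }
```

For a file with millions of rows, the same check could be run over scala.io.Source.fromFile(...).getLines() instead of an in-memory Seq.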