How to group by trend instead of distance in R?

The k-medoids algorithm in the clara() function clusters by distance, so I get this pattern:

library(cluster)  # provides clara()

a <- matrix(c(0,1,3,2,0,.32,1,.5,0,.35,1.2,.4,.5,.3,.2,.1,.5,.2,0,-.1), byrow=TRUE, nrow=5)
cl <- clara(a,2)
matplot(t(a),type="b", pch=20, col=cl$clustering)

[Image: clustering by clara()]

But I want to find a clustering method that assigns a cluster to each row according to its trend, so rows 1, 2 and 3 belong to one cluster, and rows 4 and 5 to another.

+6
3 answers

This question may be better suited to stats.stackexchange.com, but here is a solution anyway.

Your question is really "How do I choose the right distance measure?". Instead of the Euclidean distance between these vectors, you need a distance that measures similarity in trend.

Here is one option:

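# Standardize each row so that differences in level and scale do not matter
a1 <- t(apply(a,1,scale))
# Replace each row by its successive differences (the step-to-step change)
a2 <- t(apply(a1,1,diff))

# Cluster the transformed rows, but plot the original series
cl <- clara(a2,2)
matplot(t(a),type="b", pch=20, col=cl$clustering)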


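Rather than defining a new distance, this code transforms the data so that the ordinary distance reflects trend. Each row is first standardized with scale(), so that differences in level and magnitude between rows do not dominate, and is then replaced by its successive differences with diff().

Successive differences capture only one aspect of "trend", so this transformation may not suit every data set.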
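Another way to get a distance that measures similarity in trend is to use correlation between rows. The following is a minimal sketch in base R, not part of the original answer: it assumes that 1 minus the Pearson correlation is an acceptable dissimilarity for this data, and it swaps clara() for hierarchical clustering with hclust() and cutree(). The names d and groups are only illustrative.

```r
# Same example matrix as in the question: one series per row.
a <- matrix(c(0, 1, 3, 2,
              0, .32, 1, .5,
              0, .35, 1.2, .4,
              .5, .3, .2, .1,
              .5, .2, 0, -.1), byrow = TRUE, nrow = 5)

# cor() correlates columns, so transpose to correlate the rows.
# 1 - r is 0 for rows with the same shape and 2 for exactly opposite shapes.
d <- as.dist(1 - cor(t(a)))

# Average-linkage hierarchical clustering, cut into two groups.
groups <- cutree(hclust(d, method = "average"), k = 2)
print(groups)
```

On this matrix, rows 1, 2 and 3 land in one group and rows 4 and 5 in the other, because the first three rows rise and then fall while the last two fall steadily.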

+5


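You can also use k-means, run on the N*N dissimilarity matrix of the series rather than on the raw data.

The dissimilarity can be based on correlation, Euclidean distance, or dynamic time warping.

In R: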

a <- matrix(c(0,1,3,2,0,.32,1,.5,0,.35,1.2,.4,.5,.3,.2,.1,.5,.2,0,-.1),byrow=TRUE, nrow=5)

library(TSclust)  # provides diss()

# Pairwise dissimilarities between the rows of a
Tech1 <- diss(a,"COR")       # Correlation
Tech2 <- diss(a,"EUC")       # Euclidean Distance
Tech3 <- diss(a, "DTW")      # Dynamic Time Warping

# kmeans() starts from random centers, so cluster labels can differ between runs
clust1 <- kmeans(Tech1, 3)
clust2 <- kmeans(Tech2, 3)
clust3 <- kmeans(Tech3, 3)

clust1$cluster
# 1 2 3 4 5
# 1 2 2 3 3

clust2$cluster
# 1 2 3 4 5
# 1 2 2 3 3

clust3$cluster
# 1 2 3 4 5
# 3 2 2 1 1
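When kmeans() is given a dissimilarity object, it converts it to a full N*N matrix and treats each row as a point, so it clusters rows of dissimilarities and not the dissimilarities themselves. A possible alternative, sketched here as an assumption and not as part of the answer above, is k-medoids with pam() from the cluster package, which accepts a dist object directly. The dissimilarity below (Euclidean distance between row-standardized series) is only a stand-in that needs no extra packages; it is not the output of TSclust's diss().

```r
library(cluster)  # pam(); the cluster package ships with standard R installs

a <- matrix(c(0, 1, 3, 2,
              0, .32, 1, .5,
              0, .35, 1.2, .4,
              .5, .3, .2, .1,
              .5, .2, 0, -.1), byrow = TRUE, nrow = 5)

# Standardize each row, then take Euclidean distances between rows.
# Illustrative stand-in for a trend-aware dissimilarity.
z <- t(scale(t(a)))
d <- dist(z)

# pam() works on the "dist" object itself and picks k rows as medoids.
fit <- pam(d, k = 2)
cl <- fit$clustering
print(cl)
```

With two clusters, this groups rows 1, 2 and 3 together and rows 4 and 5 together, which is the split the question asks for.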
+1
