How to determine if two partitions (clusterings) of data points are identical?

Question

I have n data points in some arbitrary space, and I cluster them.
The result of my clustering algorithm is a partition represented by an int vector l of length n, assigning each point to a cluster. Values of l range from 0 to (possibly) n-1.

Example:

l_1 = [ 1 1 1 0 0 2 6 ]

This is a partition of n=7 points into 4 clusters: the first three points are grouped together, the fourth and fifth are grouped together, and the last two points form two separate singleton clusters.
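To make the label-vector representation concrete, here is a small Python sketch that groups point indices by label. The helper name `clusters_from_labels` is hypothetical, and point indices are 0-based here, whereas the prose above counts points from 1.

```python
from collections import defaultdict

def clusters_from_labels(labels):
    """Group point indices (0-based) by their cluster label (hypothetical helper)."""
    groups = defaultdict(list)
    for point, label in enumerate(labels):
        groups[label].append(point)
    return dict(groups)  # keys appear in order of first occurrence

l_1 = [1, 1, 1, 0, 0, 2, 6]
print(clusters_from_labels(l_1))       # {1: [0, 1, 2], 0: [3, 4], 2: [5], 6: [6]}
print(len(clusters_from_labels(l_1)))  # 4
```

Only the grouping matters, not the label values, which is exactly why relabeled vectors can describe the same partition.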

My question is:

Suppose I have two partitions l_1 and l_2. How can I efficiently determine whether they represent the same partition?

Example:

l_2 = [ 2 2 2 9 9 3 1 ]

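This is identical to l_1, since it partitions the points in the same way (even though the cluster "numbers"/"labels" are different).

On the other hand,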

l_3 = [ 2 2 2 9 9 3 3 ]

This is no longer identical, since it groups the last two points together.

I'm looking for a solution in C++, Python or Matlab.

A naive approach would be to compare the co-occurrence matrices:

c1 = bsxfun( @eq, l_1, l_1' );
c2 = bsxfun( @eq, l_2, l_2' );
l_1_l_2_are_identical = all( c1(:)==c2(:) );

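The co-occurrence matrix c1 is n x n, with true at (k, m) if points k and m are in the same cluster and false otherwise (regardless of the cluster "number"/"label").
Therefore, if the co-occurrence matrices c1 and c2 are identical, then l_1 and l_2 represent the same partition.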
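For reference, the same naive co-occurrence comparison can be sketched in plain Python. The helper names `cooccurrence` and `same_partition_naive` are hypothetical; the point is to make the O(n^2) time and memory cost explicit, since every one of the n*n pairs is materialized and compared.

```python
def cooccurrence(labels):
    """n x n matrix: entry [k][m] is True iff points k and m share a cluster."""
    return [[a == b for b in labels] for a in labels]

def same_partition_naive(l_1, l_2):
    # O(n^2) time and memory: builds and compares all n*n entries
    return len(l_1) == len(l_2) and cooccurrence(l_1) == cooccurrence(l_2)

print(same_partition_naive([1, 1, 1, 0, 0, 2, 6], [2, 2, 2, 9, 9, 3, 1]))  # True
print(same_partition_naive([1, 1, 1, 0, 0, 2, 6], [2, 2, 2, 9, 9, 3, 3]))  # False
```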

However, since the number of points n might be quite large, I would like to avoid O(n^2) solutions...

Any ideas?

Thanks!


Shai, 20 Mar '13 12:52


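Comparing the co-occurrence matrices means comparing n by n entries, which is O(n^2).

Instead, walk through both lists once, giving each label a sequential id (0, 1, 2, ...) in the order it first appears, and count how many points carry each id. The partitions are identical exactly when the ids agree at every position. This is O(n).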

Python:

l_1 = [ 1, 1, 1, 0, 0, 2, 6 ]

l_2 = [ 2, 2, 2, 9, 9, 3, 1 ]

l_3 = [ 2, 2, 2, 9, 9, 3, 3 ]

d1 = dict()
d2 = dict()
c1 = []
c2 = []

# assume both lists have the same length

match = True
for i in range(len(l_1)):
    if l_1[i] not in d1:
        x1 = len(c1)
        d1[l_1[i]] = x1
        c1.append(1)
    else:
        x1 = d1[l_1[i]]
        c1[x1] += 1

    if l_2[i] not in d2:
        x2 = len(c2)
        d2[l_2[i]] = x2
        c2.append(1)
    else:
        x2 = d2[l_2[i]]
        c2[x2] += 1

    if x1 != x2 or  c1[x1] != c2[x2]:
        match = False

print("match = {}".format(match))


Bull, 20 Mar '13 13:46

Matlab:

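function tf = isIdenticalClust( l_1, l_2 )
%
% checks if partitions l_1 and l_2 are identical or not
%
% relabel to consecutive integers 1..k: accumarray needs positive integer
% subscripts (l_1 may contain 0), and gaps would create empty groups
[~, ~, i1] = unique( l_1(:) );
[~, ~, i2] = unique( l_2(:) );
tf = numel( l_1 ) == numel( l_2 ) && ...
     all( accumarray( i1, i2, [], @(x) all( x == x(1) ) ) ) && ...
     all( accumarray( i2, i1, [], @(x) all( x == x(1) ) ) );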

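Explanation:
The first accumarray call checks that all points sharing a label in l_1 also share a single label in l_2, i.e. l_1 refines l_2. The second call checks the converse: l_2 refines l_1. Two partitions that refine each other are identical.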
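The same mutual-refinement idea as the accumarray answer can be sketched in Python with a dictionary, in O(n) expected time. The helpers `refines` and `same_partition` are hypothetical names; the sketch records the first l_2 label seen for each l_1 label and fails as soon as a cluster of l_1 maps to two different l_2 labels.

```python
def refines(a, b):
    """True if every cluster of `a` lies inside a single cluster of `b`."""
    seen = {}  # label in a -> first label in b seen alongside it
    for la, lb in zip(a, b):
        # setdefault stores lb on first sight of la and returns the stored value
        if seen.setdefault(la, lb) != lb:
            return False
    return True

def same_partition(l_1, l_2):
    # identical partitions refine each other
    return len(l_1) == len(l_2) and refines(l_1, l_2) and refines(l_2, l_1)

l_1 = [1, 1, 1, 0, 0, 2, 6]
l_3 = [2, 2, 2, 9, 9, 3, 3]
print(refines(l_1, l_3), refines(l_3, l_1))  # True False
print(same_partition(l_1, l_3))              # False
```

Here l_1 refines l_3 (it only splits l_3's last cluster), so just one direction holds, which is why both checks are needed.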


Shai, 20 Mar '13 13:29

Anony-Mousse · Accepted Answer · 2013-03-20T12:55:42+0000

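When are two partitions identical?

When they group exactly the same points together, regardless of the labels.

So if you just want to test for identity, you can do the following:

Replace each cluster label with the smallest object ID in that cluster.

Then the partitions are identical if and only if these representations are identical.

In your example above, let's assume the vector index 1 .. 7 is the object ID. Then the canonical form is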

[ 1 1 1 4 4 6 7 ]
  ^ first occurrence at pos 1 of 1 in l_1 / 2 in l_2
        ^ first occurrence at pos 4

for both l_1 and l_2, whereas l_3 canonicalizes to

[ 1 1 1 4 4 6 6 ]

To make it clearer, here is another example:

l_4 = [ A B 0 D 0 B A ]

canonicalizes to

      [ 1 2 3 4 3 2 1 ]

since cluster "A" first occurs at position 1, "B" at position 2, and so on.

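If you want to measure how similar two partitions are (rather than test for identity), look at pair-counting measures such as precision/recall/F1 over pairs (a, b), where a pair counts as positive if a and b are in the same cluster.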
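If the goal is a similarity score rather than a yes/no answer, pair-counting F1 is one common choice. The sketch below assumes that definition (a pair is positive when both points share a cluster) and avoids enumerating pairs by counting cluster sizes and (l_1, l_2) label co-occurrences, so it runs in O(n) expected time. The helper names are hypothetical, and returning 1.0 when neither labeling has any positive pair is a convention chosen here.

```python
from collections import Counter

def pairs(count):
    """Number of unordered pairs among `count` items."""
    return count * (count - 1) // 2

def pair_f1(l_1, l_2):
    """Pair-counting F1 between two labelings of the same points."""
    same_1 = sum(pairs(c) for c in Counter(l_1).values())          # together in l_1
    same_2 = sum(pairs(c) for c in Counter(l_2).values())          # together in l_2
    both = sum(pairs(c) for c in Counter(zip(l_1, l_2)).values())  # together in both
    if same_1 + same_2 == 0:
        return 1.0  # convention: all singletons in both labelings
    # F1 = 2TP / (2TP + FP + FN), and 2TP + FP + FN == same_1 + same_2
    return 2 * both / (same_1 + same_2)

print(pair_f1([1, 1, 1, 0, 0, 2, 6], [2, 2, 2, 9, 9, 3, 1]))  # 1.0
print(pair_f1([1, 1, 1, 0, 0, 2, 6], [2, 2, 2, 9, 9, 3, 3]))  # 0.888... (8/9)
```

A score of exactly 1.0 means every co-clustered pair agrees, i.e. the partitions are identical.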


Here is an implementation (in Python):

def canonical_form(li):
    """Replace each label with the (0-based) index of its first occurrence.

    Note: this implementation overwrites li in place (and also returns it).
    """
    first = dict()  # original label -> index of its first occurrence
    for i in range(len(li)):
        v = first.get(li[i])
        if v is None:
            # first time this label appears: its canonical id is the current index
            first[li[i]] = i
            v = i
        li[i] = v
    return li

print(canonical_form([1, 1, 1, 0, 0, 2, 6]))
# [0, 0, 0, 3, 3, 5, 6]
print(canonical_form([2, 2, 2, 9, 9, 3, 1]))
# [0, 0, 0, 3, 3, 5, 6]
print(canonical_form([2, 2, 2, 9, 9, 3, 3]))
# [0, 0, 0, 3, 3, 5, 5]
print(canonical_form(['A', 'B', 0, 'D', 0, 'B', 'A']))
# [0, 1, 2, 3, 2, 1, 0]
print(canonical_form([1, 1, 1, 0, 0, 2, 6]) == canonical_form([2, 2, 2, 9, 9, 3, 1]))
# True
print(canonical_form([1, 1, 1, 0, 0, 2, 6]) == canonical_form([2, 2, 2, 9, 9, 3, 3]))
# False
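A practical benefit of a canonical form is that it can serve as a hash key. The sketch below is a non-destructive variant (it returns a tuple instead of overwriting the input) and uses it to group many clusterings, for example from repeated runs, into classes of identical partitions in a single pass. Both helper names, `canonical_tuple` and `group_identical`, are hypothetical.

```python
def canonical_tuple(labels):
    """Non-destructive canonical form: each label -> index of its first occurrence."""
    first = {}
    # setdefault records i the first time a label is seen, then keeps returning it
    return tuple(first.setdefault(label, i) for i, label in enumerate(labels))

def group_identical(partitions):
    """Group the indices of partitions that are identical up to relabeling."""
    groups = {}
    for idx, labels in enumerate(partitions):
        groups.setdefault(canonical_tuple(labels), []).append(idx)
    return list(groups.values())

runs = [[1, 1, 1, 0, 0, 2, 6], [2, 2, 2, 9, 9, 3, 1], [2, 2, 2, 9, 9, 3, 3]]
print(group_identical(runs))  # [[0, 1], [2]]
```

Comparing k clusterings pairwise would need O(k^2) checks; hashing the canonical forms needs only one O(n) pass per clustering (expected).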
