When strings are equivalent before rotation

I have a large number of lines. For my purposes, two lines are equivalent if one is the rotation of the other (for example, "1234" is equivalent to "3412").

What is the efficient way to process each line exactly once (before the turn) in Python?

A naive implementation of what I want might look like this:

class DuplicateException(Exception): pass
seen = set()
for s in my_strings:
  try:
    s2 = s+s
    for t in seen:

      # Slick method I picked up here in SO
      # for checking whether one string is
      # a rotation of another
      if len(s) == len(t) and t in s2:
        raise DuplicateException()

    seen.add(s)
    process(s)
  except DuplicateException: pass
+5
source share
2 answers

Choose the canonical way of representing the class of rotating strings (for example, the lexicographically smallest rotation between all possible rotations of a string) and work only with canonical representations (canonization).

For instance:

def canonicalize(s):
    return min(s[i:]+s[:i] for i in xrange(len(s)))

canonical_strings = {canonicalize(s) for s in my_strings}
for cs in canonical_strings:
    process(cs)
+6
source

, string , , , , .

, , , "rotate_to_smallest".

my_strings = ['1234', '123', '2341', '4312', '312', '56', '65', '1236']

def rotate_to_smallest(x):
    smallest = x
    for i in xrange(1, len(x)):
        rotation = x[i :] + x[: i]
        if rotation < smallest:
            smallest = rotation
    return smallest

def unique_rotations(my_strings):
    uniques = set(())
    for s in my_strings:
        smallest_rotation = rotate_to_smallest(s)
        if smallest_rotation not in uniques:
            uniques.add(smallest_rotation)
    return uniques

:

>>> unique_rotations(my_strings)
set(['1234', '56', '1243', '123', '1236'])
+3

All Articles