Python Performance: Heaps

I am trying to write an optimal function that sets a list of elements, returns the original list with top_k elements at the beginning (they do not need to order themselves). I have no restrictions on the order of the remaining elements, but ideally I would like them to be in the original order.

I tried 3 approaches. Firstly, a trivial solution that runs in O (top_k * N) time. Secondly, using a bunch of priorities of the largest elements with O (log (top_k) * N) (the slowest) and, finally, brute force sorting the entire list of O (N * logN), which turned out to be the fastest)

def semi_sort_trivial(items, top_k=3):
    for i in range(top_k):
        maximum = items[i]
        pos = i
        for j in range(i+1, len(itemss)):
            if maximum < events[j]:
                pos = j
                maximum = items[j]
        # Swap maximum with the top i'th position under evaluation.
        items[pos], items[i] = items[i], items[pos]
    return items

def semi_heap_sort(items, top_k=3):
    lst = []
    heap_store = items[:top_k]
    for item in items[top_k:]:
        lst.append(heapq.heappushpop(heap_store, item))
    return heap_store + lst

def semi_sort_usingsort(items, top_k=3):
    lst = sorted(items)[-top_k:]
    return lst + [item for item in items if item not in lst]


In [7]: %timeit semi_heap_sort(range(20))
10000 loops, best of 3: 26.3 us per loop

In [8]: %timeit semi_sort_trivial(range(20))
100000 loops, best of 3: 11 us per loop

In [9]: %timeit semi_sort_usingsort(range(20))
100000 loops, best of 3: 5.89 us per loop

, . , . . , . ?

, . N = 20 k = 3 N log N 20 * 5 , , N- top_k 20 * 2. , semi_sort_usingsort. , ?

.

+3
3

-, k = 3, n = 20. python vs C . sort() , C , .

-, . . . , Python .

-, heapq.nlargest, n . .

-,

def semi_heap_sort(items, top_k=L):
    lst = []
    heap_store = items[:top_k]

heapq.heapify() , , .

    for item in items[top_k:]:

, , . .

        lst.append(heapq.heappushpop(heap_store, item))
    return heap_store + lst

:

. , : ( , .

def mod_heap_sorta(items, top_k=L):
    heap_store = items[:top_k]
    heapq.heapify(heap_store)
    remaining = itertools.islice(items, top_k, None)
    leftovers = [heapq.heappushpop(heap_store, item) for item in remaining]
    return heap_store + leftovers

, , top_k. , -, . sem_trivial_sort top_k. k sem_trivial_sort , semi_heap_sort - .

, - . , . . PyPy, .

+3

. .

,

  • O (N)

  • , O ( )

  • X k/N O (1)

  • a >= X O (N).

, O (N). ( , , , .)

. , , , O (k)

0

Another thing you can try is quicksort to ignore any interval starting from a position longer top_k.

0
source

All Articles