Iterate over all elements of a DynamoDB table

I am trying to iterate through all the items in my DynamoDB table. (I understand that this is an inefficient process, but I am doing it as a one-time job to build an index table.)

I understand that DynamoDB's scan() function returns at most 1 MB of data or the supplied limit, whichever is smaller. To compensate for this, I wrote a function that checks the result for "LastEvaluatedKey" and repeats the scan starting from that key until it has all the results.

Unfortunately, it seems that every time my function loops, every single key in the entire database is scanned, quickly eating up my provisioned read units. It is very slow.

Here is my code:

def search(self, table, scan_filter=None, range_key=None,
           attributes_to_get=None,
           limit=None):
    """ Scan a table page by page and return
        a list of items.
    """

    start_key = None
    num_results = 0
    total_results = []
    loop_iterations = 0
    request_limit = limit

    # With limit=None, keep paging until the table is exhausted.
    while limit is None or num_results < limit:
        results = self.conn.layer1.scan(table_name=table,
                                  attributes_to_get=attributes_to_get,
                                  exclusive_start_key=start_key,
                                  limit=request_limit)
        num_results = num_results + len(results['Items'])
        # LastEvaluatedKey is absent on the final page.
        start_key = results.get('LastEvaluatedKey')
        total_results = total_results + results['Items']
        loop_iterations = loop_iterations + 1
        if limit is not None:
            request_limit = request_limit - results['Count']

        print("Count: " + str(results['Count']))
        print("Scanned Count: " + str(results['ScannedCount']))
        if start_key is not None:
            print("Last Evaluated Key: " + str(start_key['HashKeyElement']['S']))
        print("Capacity: " + str(results['ConsumedCapacityUnits']))
        print("Loop Iterations: " + str(loop_iterations))

        # Stop once DynamoDB reports that there are no more pages.
        if start_key is None:
            break

    return total_results

Function call:

db = DB()
results = db.search(table='media',limit=500,attributes_to_get=['id'])

And my output:

Count: 96
Scanned Count: 96
Last Evaluated Key: kBR23QJNAwYZZxF4E3N1crQuaTwjIeFfjIv8NyimI9o
Capacity: 517.5
Loop Iterations: 1
Count: 109
Scanned Count: 109
Last Evaluated Key: ATcJFKfY62NIjTYY24Z95Bd7xgeA1PLXAw3gH0KvUjY
Capacity: 516.5
Loop Iterations: 2
Count: 104
Scanned Count: 104
Last Evaluated Key: Lm3nHyW1KMXtMXNtOSpAi654DSpdwV7dnzezAxApAJg
Capacity: 516.0
Loop Iterations: 3
Count: 104
Scanned Count: 104
Last Evaluated Key: iirRBTPv9xDcqUVOAbntrmYB0PDRmn5MCDxdA6Nlpds
Capacity: 513.0
Loop Iterations: 4
Count: 100
Scanned Count: 100
Last Evaluated Key: nBUc1LHlPPELGifGuTSqPNfBxF9umymKjCCp7A7XWXY
Capacity: 516.5
Loop Iterations: 5

Is this the expected behavior? Or what am I doing wrong?


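You are not doing anything wrong, but it helps to be clear about what Amazon means by capacity units. In short:

  • capacity units == reserved computational units
  • capacity units != reserved network transit

As a result, asking for fewer attributes reduces what is sent over the network, but it does not reduce the cost of the Scan.

Scan

  • Stop condition: 1 MB of data scanned, or the limit reached, whichever comes first
  • Cost:

Capacity units are charged for the size of the items that are scanned, not for the size of what is returned. In practice this comes to roughly 0.5 capacity / cumulated KB.

In your case, the items appear to be ~10 KB each, so roughly 100 of them fill the 1 MB limit on every call.

By contrast, the cost would be about 1.0 for 100 items with a cumulated size < 2KB.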
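
The 0.5 capacity / cumulated KB figure above can be checked against the numbers in the question. The following is only a rough sketch: the helper names `scanned_kb` and `avg_item_kb` are made up for illustration, and the 0.5 rate is taken from this answer as an assumption rather than from any official pricing table.

```python
# Assumption taken from the answer above: a scan costs about
# 0.5 capacity units per cumulated KB read.
UNITS_PER_KB = 0.5


def scanned_kb(consumed_units):
    """Estimate how many KB one scan call read, from its consumed capacity."""
    return consumed_units / UNITS_PER_KB


def avg_item_kb(consumed_units, scanned_count):
    """Estimate the average item size in KB for one scan call."""
    return scanned_kb(consumed_units) / scanned_count


# (ScannedCount, ConsumedCapacityUnits) pairs copied from the question's output.
calls = [(96, 517.5), (109, 516.5), (104, 516.0), (104, 513.0), (100, 516.5)]

for count, units in calls:
    print("%d items, ~%.0f KB read, ~%.1f KB per item"
          % (count, scanned_kb(units), avg_item_kb(units, count)))
```

Under that assumption, every call reads a little over 1000 KB, which is consistent with each request stopping at the 1 MB boundary rather than at the requested limit, whatever attributes_to_get contains.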
