How to make mapreduce-friendly (-er) datastore keys?

Question

Edit: see my answer. The problem was in our code. MR works fine; it may have a status reporting problem, but at least the input readers work correctly.

I ran an experiment several times, and now I'm sure that mapreduce (or DatastoreInputReader) behaves oddly. I suspect it has something to do with key ranges and how they are split, but that is just a guess.

In any case, here is our setup:

We have an NDB model called "AdGroup". When creating new entities of this model, we use the same id that AdWords returned (an integer), but as a string: AdGroup(id=str(adgroupId))
We have 1,163,871 of these entities in our datastore (that is what the Datastore Admin page tells us. I know it is not a completely accurate number, but we don't create or delete ad groups very often, so we can say with confidence that the number is 1.1 million or more).
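
To make the id choice concrete: as far as I understand, the datastore orders string key names lexicographically and integer ids numerically, so using id=str(adgroupId) changes the key order that any key-range splitting would see. A minimal plain-Python sketch with made-up ids (no datastore involved, and the real AdWords ids are much larger):

```python
# Hypothetical ids, chosen only to show the ordering difference.
adgroup_ids = [9, 10, 100, 25, 1000]

# Integer ids compare numerically.
int_order = sorted(adgroup_ids)

# The same ids as strings (as in id=str(adgroupId)) compare
# character by character.
str_order = sorted(str(i) for i in adgroup_ids)

print(int_order)  # [9, 10, 25, 100, 1000]
print(str_order)  # ['10', '100', '1000', '25', '9']
```

This only shows that the two orderings differ; on its own it does not explain missing mapper calls.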

We start the mapreduce (from another pipeline) as follows:

yield mapreduce_pipeline.MapreducePipeline(
    job_name='AdGroup-process',
    mapper_spec='process.adgroup_mapper',
    reducer_spec='process.adgroup_reducer',
    input_reader_spec='mapreduce.input_readers.DatastoreInputReader',
    mapper_params={
        'entity_kind': 'model.AdGroup',
        'shard_count': 120,
        'processing_rate': 500,
        'batch_size': 20,
    },
)

So, I ran this mapreduce several times today, without changing anything in the code or in the datastore. Each time I started it, the mapper-calls counter had a different value, somewhere between 450,000 and 550,000.

Correct me if I am wrong, but since I am using the most basic DatastoreInputReader, mapper-calls should equal the number of entities, i.e. 1.1 million or more.

: , , , , " 4 , !".

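For now, the only workaround I can think of is to write the keys of all entities to a blobstore file (one per line) and then use BlobstoreLineInputReader. The part that writes the blob would, of course, have to avoid DatastoreInputReader. Should I go with this, or is there a better option?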

I also tried DatastoreKeyInputReader with the same code. The result was similar: mapper-calls was between 450,000 and 550,000.

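So, finally, the questions. Does it matter how the ids of the entities are generated? Is it better to use int ids instead of str ids? In general, what can I do to make it easier for mapreduce to find and map my entities?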

PS: , .

google-app-engine python-2.7 mapreduce

Paulius 06 . '13 16:20

Paulius · Answer 1 · 2013-03-26T08:00:44+0000

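After further investigation, we found that the error was in our own code. So mapreduce works as expected (the mapper is called for every entity).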

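Our code was calling Google services that sometimes failed (with a cryptic ApplicationError). MR did not detect or report these failures in any way: it still showed "success" for the job. That is why we thought our code was fine and that something was wrong with the input reader.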
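
A sketch of how failures like the ApplicationError mentioned above could be made visible: count every mapper outcome, so that successes plus failures can be checked against the expected number of entities. This is a plain-Python simulation, not the mapreduce library. The names flaky_service, mapper-ok and mapper-failed are hypothetical, and ApplicationError here is a stand-in class, not the real App Engine one. In a real mapper I would expect to increment counters by yielding operation.counters.Increment(...), but treat that as an assumption to verify against the library.

```python
import collections
import random

counters = collections.Counter()


class ApplicationError(Exception):
    """Stand-in for the service error; not the real App Engine class."""


def flaky_service(entity_id, rng):
    # Hypothetical external call that fails roughly 30% of the time.
    if rng.random() < 0.3:
        raise ApplicationError("service failed for %s" % entity_id)
    return entity_id


def adgroup_mapper(entity_id, rng):
    """Record every outcome so that failures cannot go unnoticed."""
    try:
        flaky_service(entity_id, rng)
    except ApplicationError:
        counters["mapper-failed"] += 1
        return
    counters["mapper-ok"] += 1


rng = random.Random(42)  # fixed seed so the run is repeatable
total = 1000             # pretend entity count
for entity_id in range(total):
    adgroup_mapper(entity_id, rng)

print(dict(counters))
```

With both counters in place, a gap between the entity count and the successful calls shows up as a non-zero mapper-failed value instead of a job that simply looks successful.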
