How to make mapreduce-friendly (-er) datastore keys?

Edit: see my answer. The problem was in our code. MR is working fine; it might have a status-message problem, but at least the input readers are working correctly.

I ran an experiment several times, and now I'm sure that mapreduce (or DatastoreInputReader) behaves strangely. I suspect this has something to do with key ranges and how they are split, but that is just a guess.

In any case, here is the setup:

  • We have an NDB model called "AdGroup". When creating new entities of this model, we use the same identifier that AdWords returned (an integer), but we store it as a string: AdGroup(id=str(adgroupId))
  • We have 1,163,871 of these entities in our datastore (that is what the Datastore Admin page tells us; I know this number is not completely accurate, but we don't create or delete ad groups very often, so we can say with confidence that the number is 1.1 million or more).
  • We start the mapreduce (from another pipeline) as follows:

    yield mapreduce_pipeline.MapreducePipeline(
        job_name='AdGroup-process',
        mapper_spec='process.adgroup_mapper',
        reducer_spec='process.adgroup_reducer',
        input_reader_spec='mapreduce.input_readers.DatastoreInputReader',
        mapper_params={
            'entity_kind': 'model.AdGroup',
            'shard_count': 120,
            'processing_rate': 500,
            'batch_size': 20,
        },
    )
    

So, I ran this mapreduce several times today without changing anything in the code or in the datastore. Each time, the mapper-calls counter ended up with a different value, somewhere between 450,000 and 550,000.

Correct me if I am wrong, but since I am using the most basic DatastoreInputReader, mapper-calls should equal the number of entities, which is 1.1 million or more.
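To make the key-range guess concrete, here is a minimal sketch in plain Python (no App Engine calls) of how the same ids order differently as integers and as strings. The id values and the padding width are made up for illustration, and this only shows ordering; it does not demonstrate how DatastoreInputReader actually splits key ranges.

```python
# Hypothetical AdWords-style integer ids, used only for illustration.
numeric_ids = [9, 10, 100, 25, 1000, 3]

# As integers, the ids sort numerically.
int_order = sorted(numeric_ids)

# As strings (the way AdGroup(id=str(adgroupId)) stores them), the ids
# sort lexicographically, character by character.
str_order = sorted(str(i) for i in numeric_ids)

# Zero-padding to a fixed width (4 here, an arbitrary choice) makes the
# string order match the numeric order again.
padded_order = sorted(str(i).zfill(4) for i in numeric_ids)

print(int_order)     # [3, 9, 10, 25, 100, 1000]
print(str_order)     # ['10', '100', '1000', '25', '3', '9']
print(padded_order)  # ['0003', '0009', '0010', '0025', '0100', '1000']
```

Whether this ordering difference matters for sharding is an assumption on my part, not something established here.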

: , , , , " 4 , !".

, - blobstore ( ), BlobstoreLineInputReader. , blob , DatastoreInputReader. , - ?

I also tried DatastoreKeyInputReader, and the result was the same: mapper-calls was between 450,000 and 550,000.

, , . , ? int ids str ids? , , mapreduce, , ?

PS: , .


, . , mapreduce , (mapper ).

google, ( ApplicationError). - , . , . MR - MR "" . , - .

