Replication of data from the GAE datastore

We have an application that we are deploying to GAE. I have been asked to come up with options for replicating the data we store in the GAE datastore to a system running in the Amazon cloud.

Ideally, we could do this without transferring the entire datastore on each synchronization. Replication does not need to be anywhere near real time; synchronizing once or twice a day would be fine.

Can someone with GAE experience help me figure out what the options are? So far I have come up with:

  • Use the bulkloader.py tool provided by Google to export the data to CSV, somehow transfer the CSV files to Amazon, and import them there

  • Create a Java application that runs on GAE, reads data from the datastore, and sends it to another Java application running on Amazon.
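The CSV hand-off in the first option can be sketched with Python's standard `csv` module rather than bulkloader.py (whose own configuration and output format are not reproduced here). The sketch assumes each entity carries a hypothetical `updated` timestamp, so only rows modified since the last sync are exported, which is one way to avoid transferring everything on each run. `FIELDS`, `export_since`, and `import_csv` are illustrative names, not part of any GAE API.

```python
import csv
import io
from datetime import datetime
from typing import Dict, Iterable, List

# Hypothetical entity properties; a real export would list your model's fields.
FIELDS = ["key", "name", "updated"]


def export_since(rows: Iterable[dict], last_sync: datetime) -> str:
    """GAE side: serialize only rows modified after last_sync to CSV text."""
    buf = io.StringIO()
    # extrasaction="ignore" skips properties that are not listed in FIELDS.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        if row["updated"] > last_sync:
            # Timestamps are written as ISO 8601 strings so they survive CSV.
            writer.writerow({**row, "updated": row["updated"].isoformat()})
    return buf.getvalue()


def import_csv(text: str) -> List[Dict[str, str]]:
    """Amazon side: parse the CSV back into dicts (all values come back as str)."""
    return list(csv.DictReader(io.StringIO(text)))
```

The caller would persist `last_sync` between runs; clock skew between writers makes a timestamp filter less exact than the marker approach described in the answer below.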

Would these options work? What problems might they run into? What other options are there?

1 answer

You can use logic similar to how the App Engine HRD migration and backup tools work:

  • Mark modified entities with a child marker entity
  • Run a MapperPipeline using the App Engine MapReduce library, iterating over those marker entities with the DatastoreInputReader
  • In your map function, fetch the parent entity, serialize it to Google Storage using the FileOutputWriter, and delete the marker
  • Ping the remote host on Amazon with the Google Storage URL so it can import those entities
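The four steps above can be simulated locally to show the incremental behaviour. This is a stdlib-only sketch, not the MapReduce library's API: `FakeDatastore`, `map_marker`, and `run_export` are hypothetical stand-ins for the datastore, the map function, and the MapperPipeline, and writing to Google Storage is replaced by returning the serialized text.

```python
import json
from typing import Dict, List, Set


class FakeDatastore:
    """Hypothetical in-memory stand-in for the App Engine datastore.

    `entities` maps a parent key to its properties; `markers` holds the keys
    of parents that currently have a child "modified" marker.
    """

    def __init__(self) -> None:
        self.entities: Dict[str, dict] = {}
        self.markers: Set[str] = set()

    def put(self, key: str, props: dict) -> None:
        # Step 1: every write also records a marker for that parent entity.
        self.entities[key] = dict(props)
        self.markers.add(key)


def map_marker(store: FakeDatastore, parent_key: str, out: List[str]) -> None:
    # Step 3 (the map function): fetch the parent, serialize it, drop the marker.
    entity = store.entities.get(parent_key)
    if entity is not None:
        out.append(json.dumps({"key": parent_key, "props": entity}, sort_keys=True))
    store.markers.discard(parent_key)


def run_export(store: FakeDatastore) -> str:
    # Step 2: iterate over the marker entities (the DatastoreInputReader's role).
    # sorted() copies the set, so map_marker may remove markers while we loop.
    out: List[str] = []
    for parent_key in sorted(store.markers):
        map_marker(store, parent_key, out)
    # Step 4 stand-in: in the real pipeline this text would be a Google Storage
    # object whose URL is then sent to the Amazon host; here it is just returned.
    return "\n".join(out)
```

Calling `run_export` twice with no writes in between yields an empty export the second time, so each daily run carries only what changed since the previous one.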

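As an alternative to the third and fourth steps, you could make multiple urlfetch (POST) requests to send each serialized entity directly to the Amazon host, but this is more fragile: a single failed request can compromise the integrity of the import.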
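The per-entity POST variant can be sketched as follows. `push_changed` and the injected `send` callable are hypothetical; on App Engine, `send` might wrap `urlfetch.fetch(url, payload=body, method=urlfetch.POST)`. To contain the fragility, this sketch clears a marker only after its entity was delivered, so a failed request is retried on the next scheduled run instead of being lost. That retry policy is an assumption of the sketch, not something the answer prescribes.

```python
import json
from typing import Callable, Dict, Set


def push_changed(entities: Dict[str, dict], markers: Set[str],
                 send: Callable[[str], None]) -> int:
    """Send each marked entity as its own POST body; return how many succeeded.

    `send` is a hypothetical callable that raises OSError on a failed request.
    """
    sent = 0
    for key in sorted(markers):  # sorted() copies, so markers can shrink safely
        # A null "props" (parent already deleted) is left for the receiver to
        # interpret; treating it as a deletion is an assumption.
        body = json.dumps({"key": key, "props": entities.get(key)}, sort_keys=True)
        try:
            send(body)
        except OSError:
            continue  # keep the marker so this entity is retried next run
        markers.discard(key)
        sent += 1
    return sent
```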

You could look at the datastore admin source code for inspiration.
