I have a table in Amazon DynamoDB with a record structure like:
{"username" : "joe bloggs" , "products" : ["1","2"] , "expires1" : "01/01/2013" , "expires2" : "01/02/2013"}
where the products property is a list of the products owned by the user, and the expiresN properties (expires1, expires2, and so on) each refer to one product in that list. The list of products is dynamic, and there are many products. I need to transfer this data to S3 in a format like:
joe bloggs|1|01/01/2013
joe bloggs|2|01/02/2013
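To make the target format concrete, here is a minimal Python sketch of the flattening I am after. It is only an illustration of the desired output, not a Hive or EMR solution. The function name `flatten_record` is hypothetical, and it assumes that the suffix of each expiresN attribute matches the product id it belongs to.

```python
def flatten_record(record):
    """Yield one 'username|product|expiry' line per product in the record."""
    username = record["username"]
    for product in record.get("products", []):
        # Assumption: the attribute "expires<n>" holds the expiry for product id <n>.
        # A missing expiry attribute is emitted as an empty field.
        expiry = record.get("expires" + product, "")
        yield "|".join([username, product, expiry])


record = {
    "username": "joe bloggs",
    "products": ["1", "2"],
    "expires1": "01/01/2013",
    "expires2": "01/02/2013",
}

for line in flatten_record(record):
    print(line)
```

Applied to the sample record, this produces the two pipe-delimited lines shown above.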
Using external Hive tables I can map the username and products columns in DynamoDB, but I cannot map the dynamic columns. Is there a way to extend or adapt org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler to interpret and restructure the data received from DynamoDB before Hive ingests it? Or is there an alternative solution for converting DynamoDB data to first normal form?
One of my key requirements is that the solution supports the throttling provided by the dynamodb.throughput.read.percent parameter, so that I do not compromise the operational use of the table.