I am considering using AWS DynamoDB for the application we are building. I understand that setting up a backup job that exports data from DynamoDB to S3 involves a Data Pipeline with EMR. My question is: do I need to have the backup job configured on day 1? What are the chances of losing data?
This is really subjective. IMO, you should not worry about backups "now." There are also simpler solutions than Data Pipeline. Perhaps this will be a good start.
DynamoDB , . . , , SDK .
DynamoDB :
(1) S3 , , , ( ?)
(2) S3, -. , S3, , , RDBMS (RDS ) S3 . EMR Redshift (ETL) BI. Redshift, ELT- - Redshift
(3) ( ) ( , ) - . - , , . , , DynamoDB, .
(4) S3. , - DynamoDB - concurrency .
AWS Data Pipeline ( EMR ).
, , , , .
S3. .
Dynamo DB , ( ). - .
You can tell the pipeline to consume only, say, 25% of the table's provisioned read capacity while backing up, so that your real users do not notice a delay. Each backup is "full" (not incremental), so you can periodically delete older backups if you are concerned about storage.
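To make the two ideas above concrete, here is a minimal sketch rather than an actual Data Pipeline configuration. It computes how much read capacity a backup may use at a given ratio, and picks which old full backups to prune. The function names and the timestamp-based folder naming are hypothetical, made up for illustration. The actual S3 deletion is left out on purpose.

```python
from datetime import datetime
from typing import List

# Assumption: each full backup lives under its own S3 prefix named by start time.
# This format is hypothetical; adjust it to however your backups are actually named.
PREFIX_FORMAT = "%Y-%m-%d-%H-%M-%S"


def backup_read_budget(provisioned_rcu: int, ratio: float = 0.25) -> float:
    """Read capacity units per second the backup job is allowed to consume."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return provisioned_rcu * ratio


def backups_to_delete(prefixes: List[str], keep: int) -> List[str]:
    """Return every backup prefix except the `keep` most recent, oldest first."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    # Sort by the timestamp encoded in the prefix name, oldest first.
    ordered = sorted(prefixes, key=lambda p: datetime.strptime(p, PREFIX_FORMAT))
    # Everything before the last `keep` entries is safe to prune.
    return ordered[: max(len(ordered) - keep, 0)]


if __name__ == "__main__":
    print(backup_read_budget(400))
    print(backups_to_delete(
        ["2015-03-01-00-00-00", "2015-01-01-00-00-00", "2015-02-01-00-00-00"],
        keep=2,
    ))
```

Because every backup is a full copy, keeping only the last few loses nothing beyond older restore points. The list returned by `backups_to_delete` could then be passed to whatever tool you use to remove the objects from S3.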