Strange CloudWatch Behavior

I have a backup script that runs every 2 hours. I want to use CloudWatch to track successful runs of this script, and CloudWatch Alarms to notify me when the script has a problem.

The script puts a data point into a CloudWatch metric after each successful backup:

    mon-put-data --namespace Backup --metric-name $metric --unit Count --value 1

I have an alarm that goes into the ALARM state when the "Sum" statistic of the metric is less than 2 over a 6-hour period.
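As a rough sanity check of this threshold (a sketch of the arithmetic only, not of CloudWatch's actual evaluation logic): with one data point of value 1 every 2 hours, a full 6-hour period should sum to 3, so "Sum less than 2" should fire only once two or more runs in a period are missing. The constant and function names below are made up for illustration.

```python
# Hypothetical names; this only mirrors the alarm rule described above.
PERIOD_SECONDS = 6 * 3600     # alarm period (21600 s)
INTERVAL_SECONDS = 2 * 3600   # the backup runs every 2 hours
THRESHOLD = 2                 # alarm when Sum < 2


def expected_sum(missed_runs: int) -> int:
    """Sum of value-1 data points in one period, given some missed runs."""
    runs_per_period = PERIOD_SECONDS // INTERVAL_SECONDS
    return max(runs_per_period - missed_runs, 0)


def in_alarm(period_sum: int) -> bool:
    """The alarm condition: the Sum is less than the threshold."""
    return period_sum < THRESHOLD


for missed in range(4):
    total = expected_sum(missed)
    print(f"missed={missed} sum={total} alarm={in_alarm(total)}")
```

So a single missed backup is tolerated, and the alarm is only expected once at least two runs in the same period have failed.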

To test this setup, a day later I stopped putting data into the metric (i.e. I commented out the mon-put-data command). Sure enough, the alarm eventually went into the ALARM state and I received an email notification, as expected.

The problem is that some time later the alarm returned to the OK state, even though no new data was being added to the metric!

Two transitions (OK => ALARM, then ALARM => OK) were logged, and I reproduce the logs below. Note that although both show "period: 21600" (i.e. 6 hours), the second shows a 12-hour span between startDate and queryDate. I can see that this may explain the transition, but I don't understand why CloudWatch considers a 12-hour span when calculating a statistic with a 6-hour period!

What am I missing here? How do I set up the alarm to achieve what I want (i.e. receive a notification if backups fail)?

{
    "Timestamp": "2013-03-06T15:12:01.069Z",
    "HistoryItemType": "StateUpdate",
    "AlarmName": "alarm-backup-svn",
    "HistoryData": {
        "version": "1.0",
        "oldState": {
            "stateValue": "OK",
            "stateReason": "Threshold Crossed: 1 datapoint (3.0) was not less than the threshold (3.0).",
            "stateReasonData": {
                "version": "1.0",
                "queryDate": "2013-03-05T21:12:44.081+0000",
                "startDate": "2013-03-05T15:12:00.000+0000",
                "statistic": "Sum",
                "period": 21600,
                "recentDatapoints": [
                    3
                ],
                "threshold": 3
            }
        },
        "newState": {
            "stateValue": "ALARM",
            "stateReason": "Threshold Crossed: 1 datapoint (1.0) was less than the threshold (2.0).",
            "stateReasonData": {
                "version": "1.0",
                "queryDate": "2013-03-06T15:12:01.052+0000",
                "startDate": "2013-03-06T09:12:00.000+0000",
                "statistic": "Sum",
                "period": 21600,
                "recentDatapoints": [
                    1
                ],
                "threshold": 2
            }
        }
    },
    "HistorySummary": "Alarm updated from OK to ALARM"
}

The second one I just can't understand:

{
    "Timestamp": "2013-03-06T17:46:01.063Z",
    "HistoryItemType": "StateUpdate",
    "AlarmName": "alarm-backup-svn",
    "HistoryData": {
        "version": "1.0",
        "oldState": {
            "stateValue": "ALARM",
            "stateReason": "Threshold Crossed: 1 datapoint (1.0) was less than the threshold (2.0).",
            "stateReasonData": {
                "version": "1.0",
                "queryDate": "2013-03-06T15:12:01.052+0000",
                "startDate": "2013-03-06T09:12:00.000+0000",
                "statistic": "Sum",
                "period": 21600,
                "recentDatapoints": [
                    1
                ],
                "threshold": 2
            }
        },
        "newState": {
            "stateValue": "OK",
            "stateReason": "Threshold Crossed: 1 datapoint (3.0) was not less than the threshold (2.0).",
            "stateReasonData": {
                "version": "1.0",
                "queryDate": "2013-03-06T17:46:01.041+0000",
                "startDate": "2013-03-06T05:46:00.000+0000",
                "statistic": "Sum",
                "period": 21600,
                "recentDatapoints": [
                    3
                ],
                "threshold": 2
            }
        }
    },
    "HistorySummary": "Alarm updated from ALARM to OK"
}
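
To make the two windows in these logs concrete, here is a small sketch that splits the span between startDate and queryDate into 6-hour periods and sums the data points in each. The backup timestamps (06:00, 08:00 and 10:00 UTC on 2013-03-06) are an assumption chosen to be consistent with both log entries; the real times do not appear in the logs. The helper period_sums is hypothetical and is not how CloudWatch is implemented.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(seconds=21600)  # "period": 21600 from the logs

# ASSUMED timestamps of the last three successful backups (not in the logs).
backups = [datetime(2013, 3, 6, hour, 0) for hour in (6, 8, 10)]


def period_sums(start, query, points):
    """Sum value-1 data points in each full period from start up to query."""
    sums = []
    while start + PERIOD <= query:
        end = start + PERIOD
        sums.append(sum(1 for t in points if start <= t < end))
        start = end
    return sums


# First transition: startDate 09:12, queryDate 15:12:01 (one 6-hour period).
first = period_sums(datetime(2013, 3, 6, 9, 12),
                    datetime(2013, 3, 6, 15, 12, 1), backups)

# Second transition: startDate 05:46, queryDate 17:46:01 (two 6-hour periods).
second = period_sums(datetime(2013, 3, 6, 5, 46),
                     datetime(2013, 3, 6, 17, 46, 1), backups)

print(first)   # [1]
print(second)  # [3, 0]
```

Under these assumed timestamps, the 12-hour span in the second log covers two periods: the older one sums to 3 and the newer one contains no data points at all. That would be consistent with the single value in recentDatapoints being the older period's sum of 3, with the empty period producing no datapoint.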

( INSUFFICIENT_DATA, CloudWatch ( 6- ), 6- . 6- (, 12- , ).

" " , 1 /3600 , . INSUFFICIENT_DATA, .

, (.. , )?

1, , 0, . < 1 3 - 3600 , , , ( , ). INSUFFICIENT_DATA , , .

, .
