This is not a typical question, but I am out of ideas and don't know where else to go. If there are better places to ask about this, just point me to them in the comments. Thanks.
Situation
We have a web application that uses the Zend Framework, so it runs on PHP on the Apache web server. We use MySQL to store data and memcached to cache objects.
The application has a very unusual usage and load pattern. It is a mobile web application where, at every full hour, a cronjob scans the database for users who have information waiting or actions to perform, and sends this information to an (external) notification server, which pushes the notifications to them. After the users receive these notifications, they go to the application and use it, mostly for a very short time. An hour later, the same thing happens.
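To make the flow concrete, here is a minimal sketch of the hourly job as described above. It is written in Python rather than our actual PHP, it is not our real code, and every name in it (`PendingItem`, `run_hourly_job`, the two callables) is hypothetical; the database query and the notification server are replaced by in-memory stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class PendingItem:
    """One piece of waiting information for one user (hypothetical shape)."""
    user_id: int
    message: str


def run_hourly_job(
    fetch_pending: Callable[[], Iterable[PendingItem]],
    send_to_notification_server: Callable[[int, str], None],
) -> List[int]:
    """Scan for pending items and hand each one to the push server.

    Returns the ids of the notified users, in order, without duplicates.
    """
    notified: List[int] = []
    seen = set()
    for item in fetch_pending():
        send_to_notification_server(item.user_id, item.message)
        if item.user_id not in seen:
            seen.add(item.user_id)
            notified.append(item.user_id)
    return notified


# Stand-ins for the database scan and the external notification server.
pending = [
    PendingItem(1, "new message"),
    PendingItem(2, "action required"),
    PendingItem(1, "reminder"),
]
sent = []
notified_users = run_hourly_job(
    lambda: pending,
    lambda user_id, message: sent.append((user_id, message)),
)
```

The point of the sketch is only that all notifications leave within a short window, so the users who react to them also arrive within a short window.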
Problem
In the past few weeks, usage of the app has really started to grow. Over the past few days, we have seen very high load and a doubling of the application's response time during and after the sending of these notifications (so basically every hour). The server does not crash or stop responding to requests; it just gets slower and slower, and recovery takes up to 20 minutes, until the same thing starts again at the next full hour.
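One way to make this hourly pattern visible from raw request data is to group response times by minute of the hour. The following is a rough Python sketch under stated assumptions: the input is a list of (Unix timestamp, response time in ms) pairs, the function name `mean_response_by_minute` is hypothetical, and the sample numbers are made up for illustration, not taken from our monitoring.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mean_response_by_minute(
    samples: Iterable[Tuple[int, float]],
) -> Dict[int, float]:
    """Average response time per minute of the hour (0-59, based on UTC).

    Each sample is (unix_timestamp_seconds, response_time_ms).
    """
    buckets = defaultdict(list)
    for timestamp, response_ms in samples:
        minute_of_hour = (timestamp // 60) % 60
        buckets[minute_of_hour].append(response_ms)
    return {minute: mean(values) for minute, values in sorted(buckets.items())}


# Made-up samples: three requests at minute 0, two at minute 30.
samples = [
    (0, 400.0),
    (30, 420.0),
    (3600, 380.0),
    (1800, 200.0),
    (5400, 220.0),
]
profile = mean_response_by_minute(samples)
```

If the slowdown really follows the notifications, the first minutes of the hour should stand out clearly against the rest of the profile.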
We have extensive monitoring in place (New Relic, collectd), but I cannot figure out what is going wrong; I cannot find the bottleneck. This is where you come in:
Can you help me figure out what happened, and maybe how to fix it?
Additional Information
The server is a 16-core Intel Xeon running Ubuntu 10.04 (Linux 3.2.4-20120307 x86_64). Apache is 2.2.x and PHP is 5.3.2-1ubuntu4.11.
collectd graphs (GIF images, not reproduced here)