I run the following query weekly, but it gets to the point that it takes 22 hours now! The purpose of the report is to combine the data on impressions and conversions in the placement and the date of placement, so the main table that I request does not have a primary key, since there can be several events with the same date / placement.
The main data set contains about 400 thousand records, so it does not take more than a few minutes to run this report.
Description of the table:
tbl_ads (400,000 entries)
day_est DATE (index)
conv_day_est DATE (index)
placement_id INT (index)
adunit_id INT (index)
cost_type VARCHAR(20)
cost_value DECIMAL(10,2)
adserving_cost DECIMAL(10,2)
conversion1 INT
estimated_spend DECIMAL(10,2)
clicks INT
impressions INT
publisher_clicks INT
publisher_impressions INT
publisher_spend DECIMAL (10,2)
source VARCHAR(30)
map_external_id (75,000 entries)
placement_id INT
adunit_id INT
external_id VARCHAR (50)
primary key(placement_id,adunit_id,external_id)
SQL Query
SELECT A.day_est,A.placement_id,A.placement_name,A.adunit_id,A.adunit_name,A.imp,A.clk, C.ads_cost, C.ads_spend, B.conversion1, B.conversion2,B.ID_Matched, C.pub_imps, C.pub_clicks, C.pub_spend, COALESCE(A.cost_type,B.cost_type) as cost_type, COALESCE(A.cost_value,B.cost_value) as cost_value, D.external_id
FROM (SELECT day_est, placement_id,adunit_id,placement_name,adunit_name,cost_type,cost_value,
SUM(impressions) as imp, SUM(clicks) as clk
FROM tbl_ads
WHERE source='delivery'
GROUP BY 1,2,3 ) as A LEFT JOIN
(
SELECT conv_day_est, placement_id,adunit_id, cost_type,cost_value, SUM(conversion1) as conversion1,
SUM(conversion2) as conversion2,SUM(id_match) as ID_Matched
FROM tbl_ads
WHERE source='attribution'
GROUP BY 1,2,3
) as B on A.day_est=B.conv_day_est AND A.placement_id=B.placement_id AND A.adunit_id=B.adunit_id
LEFT JOIN
(
SELECT day_est,placement_id,adunit_id,SUM(adserving_cost) as ads_cost, SUM(estimated_spend) as ads_spend,sum(publisher_clicks) as pub_clicks,sum(publisher_impressions) as pub_imps,sum(publisher_spend) as pub_spend
FROM tbl_ads
GROUP BY 1,2,3 ) as C on A.day_est=C.day_est AND A.placement_id=C.placement_id AND A.adunit_id=C.adunit_id
LEFT JOIN
(
SELECT placement_id,adunit_id,external_id
FROM map_external_id
) as D on A.placement_id=D.placement_id AND A.adunit_id=D.adunit_id
INTO OUTFILE '/tmp/weekly_report.csv';
EXPLAIN Results:
+----+-------------+--------------------+-------+---------------+---------+---------+------+--------+----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------------+-------+---------------+---------+---------+------+--------+----------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 136518 | |
| 1 | PRIMARY | <derived3> | ALL | NULL | NULL | NULL | NULL | 5180 | |
| 1 | PRIMARY | <derived4> | ALL | NULL | NULL | NULL | NULL | 198190 | |
| 1 | PRIMARY | <derived5> | ALL | NULL | NULL | NULL | NULL | 23766 | |
| 5 | DERIVED | map_external_id | index | NULL | PRIMARY | 55 | NULL | 20797 | Using index |
| 4 | DERIVED | tbl_ads | index | NULL | PIndex | 13 | NULL | 318400 | |
| 3 | DERIVED | tbl_ads | ALL | NULL | NULL | NULL | NULL | 318400 | Using filesort |
| 2 | DERIVED | tbl_ads | index | NULL | PIndex | 13 | NULL | 318400 | Using where |
+----+-------------+--------------------+-------+---------------+---------+---------+------+--------+----------------+