How to compare values between rows and find the average similarity of the answers?

I have a MySQL table of users' answers to yes/no questions. It looks like this:

| user_id    | poll_id  | response  |
|------------|----------|-----------|
|    111     |    1     |   'yes'   |
|    111     |    2     |   'no'    |
|    111     |    3     |   'no'    |
|    222     |    1     |   'yes'   |
|    222     |    2     |   'yes'   |
|    222     |    3     |   'yes'   |
|    333     |    1     |   'no'    |
|    333     |    2     |   'no'    |
|    333     |    3     |   'no'    |

I would like to calculate the similarity between each user's responses and the responses of every other user. For example, the similarity between user 111 and user 222 is 0.333 (they gave the same answer to 1 of 3 questions), and the similarity between user 111 and user 333 is 0.666 (they gave the same answer to 2 of 3 questions).

I wrote a query that will give me the number of identical answers for the two specified users:

SELECT  COUNT(*) AS same_count
FROM    (
            SELECT  poll_id, response
            FROM    results
            WHERE   user_id = 111
        ) AS t1
    ,   (
            SELECT  poll_id, response
            FROM    results
            WHERE   user_id = 222
        ) AS t2
WHERE   t1.poll_id  = t2.poll_id    -- compare answers to the same question only
AND     t1.response = t2.response;

Now I'm trying to find a way to get this information for every pair of users, producing results like these:

| user_1  |  user_2  |  same_count  |
|---------|----------|--------------|
|  111    |   222    |    0.333     |
|  111    |   333    |    0.666     |
|  222    |   111    |    0.333     |
|  222    |   333    |    0         |
|  333    |   111    |    0.666     |
|  333    |   222    |    0         |

Or, if possible, without redundant information:

| user_1  |  user_2  |  same_count  |
|---------|----------|--------------|
|  111    |   222    |    0.333     |
|  111    |   333    |    0.666     |
|  222    |   333    |    0         |

Can this be done in a MySQL query alone, without PHP? Any ideas?


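Here is one way to do it: join the table to itself on poll_id and group by the two user_id values. To avoid listing each pair twice, require that the alias1 table's user_id be less than the alias2 table's user_id.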

A demo is available on SQL Fiddle.

Script:

CREATE TABLE poll
(
    user_id     INT         NOT NULL
  , poll_id     INT         NOT NULL
  , response    VARCHAR(10) NOT NULL  
);

INSERT INTO poll (user_id, poll_id, response) VALUES
   (111, 1, 'yes'),
   (111, 2, 'no'),
   (111, 3, 'no'),
   (222, 1, 'yes'),
   (222, 2, 'yes'),
   (222, 3, 'yes'),
   (333, 1, 'no'),
   (333, 2, 'no'),
   (333, 3, 'no');

SELECT      p1.user_id AS user_1
        ,   p2.user_id AS user_2
        ,   AVG(CASE
                    WHEN p1.response = p2.response THEN 1
                    ELSE 0
                END) AS Average_Response
FROM        poll p1
JOIN        poll p2
        ON  p1.poll_id = p2.poll_id   -- compare answers to the same poll
        AND p1.user_id < p2.user_id   -- keep each pair of users only once
GROUP BY    p1.user_id
        ,   p2.user_id;

Output:

USER_1 USER_2 AVERAGE_RESPONSE
------ ------ ----------------
111     222      0.3333
111     333      0.6667
222     333      0
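
If the first form from the question is wanted instead (every pair listed in both directions), a minimal sketch is to relax the `<` comparison to `<>`. It assumes the same `poll` table as in the script above; the `same_count` alias is taken from the question's desired output.

```sql
-- Sketch: list each pair of distinct users in both directions
-- (111-222 and 222-111). Assumes the poll table from the script above.
SELECT   p1.user_id AS user_1,
         p2.user_id AS user_2,
         AVG(CASE WHEN p1.response = p2.response THEN 1 ELSE 0 END) AS same_count
FROM     poll p1
JOIN     poll p2
  ON     p1.poll_id = p2.poll_id     -- compare answers to the same poll
 AND     p1.user_id <> p2.user_id    -- <> keeps both directions; < keeps one
GROUP BY p1.user_id, p2.user_id
ORDER BY user_1, user_2;
```

With the sample data this should return six rows, one per ordered pair of different users, and each pair has the same value in both directions.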

Another way to write the query:

SELECT
  t1.user_id AS user_1,
  t2.user_id AS user_2,
  SUM(CASE WHEN t1.response = t2.response THEN 1 ELSE 0 END) / COUNT(1)
    AS same_count
FROM t t1
JOIN t t2 ON ( t2.user_id > t1.user_id AND t2.poll_id = t1.poll_id )
GROUP BY t1.user_id, t2.user_id
ORDER BY user_1, user_2

Result:

111 222 0.333333333333333
111 333 0.666666666666667
222 333 0

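In MySQL the CASE expression can be replaced with just (t1.response = t2.response), because a comparison evaluates to 1 or 0.
The CASE form is standard SQL, so it also works in other databases.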

The condition t2.user_id > t1.user_id prevents each pair from being listed twice (111 - 222 and 222 - 111).
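
As a hedged sketch of the MySQL-only shorthand, the comparison (t1.response = t2.response) can be averaged directly instead of going through CASE. It assumes the same table `t` with `user_id`, `poll_id` and `response` columns as in the query above, and it relies on MySQL evaluating a comparison to 1 or 0, so it is not portable to every database.

```sql
-- MySQL-specific sketch: a comparison yields 1 (true) or 0 (false),
-- so AVG over it gives the share of matching answers.
-- Assumes table t(user_id, poll_id, response) as in the query above.
SELECT
  t1.user_id AS user_1,
  t2.user_id AS user_2,
  AVG(t1.response = t2.response) AS same_count
FROM t t1
JOIN t t2 ON t2.user_id > t1.user_id AND t2.poll_id = t1.poll_id
GROUP BY t1.user_id, t2.user_id
ORDER BY user_1, user_2;
```

If `response` can be NULL, the comparison yields NULL for that row and AVG skips it, whereas the CASE version counts that row as 0, so the two forms can differ on such data.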
