Grouping by column A but comparing column B

I have been stuck on this for the past few hours, and at this point I think I need help ...

I need to compare several groups from the same table and find the groups whose items in Col B are the same. For example:

Col A...............Col B
John................Apple
John................Orange
John................Banana
Mary................Orange
Mary................Strawberry
David...............Apple
David...............Orange
David...............Banana

I want John and David to be returned because their items in Col B match. Hope this makes sense! Thanks in advance! G

3 answers

Here's the SQL Fiddle for this solution so you can play with it yourself.

select A.ColA Person1, B.ColA Person2
from (select ColA, count(ColB) CountBs
      from tbl
      group by ColA) G1
join (select ColA, count(ColB) CountBs
      from tbl
      group by ColA) G2 on G1.ColA < G2.ColA
                       and G1.CountBs = G2.CountBs
join tbl A on A.ColA = G1.ColA
join tbl B on B.ColA = G2.ColA and A.ColB = B.ColB
group by A.ColA, B.ColA, G1.CountBs
having count(distinct A.ColB) = G1.CountBs

-- subqueries G1 and G2 are the same and count the expected colB per colA
-- G1 and G2 are joined together to get the candidate matches
--    of ColA with the same number of ColB's
-- we then use G1 and G2 to join into tbl, and further join
--    between A and B where the ColB match
-- finally, we count the matches between A and B and make sure the counts match
--    the expected count of B for the pairing
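
To try the query without a SQL Server instance, here is a minimal sketch that runs it against the question's sample rows in an in-memory SQLite database. The Python harness, the function name matching_pairs, and the use of SQLite are assumptions made for illustration, not part of the original answer; the SQL is the answer's query with explicit "as" aliases.

```python
import sqlite3

# Sample rows from the question (ColA, ColB).
ROWS = [
    ("John", "Apple"), ("John", "Orange"), ("John", "Banana"),
    ("Mary", "Orange"), ("Mary", "Strawberry"),
    ("David", "Apple"), ("David", "Orange"), ("David", "Banana"),
]

# The answer's query; table name tbl and column names follow the answer.
QUERY = """
select A.ColA as Person1, B.ColA as Person2
from (select ColA, count(ColB) as CountBs from tbl group by ColA) G1
join (select ColA, count(ColB) as CountBs from tbl group by ColA) G2
  on G1.ColA < G2.ColA and G1.CountBs = G2.CountBs
join tbl A on A.ColA = G1.ColA
join tbl B on B.ColA = G2.ColA and A.ColB = B.ColB
group by A.ColA, B.ColA, G1.CountBs
having count(distinct A.ColB) = G1.CountBs
"""


def matching_pairs(rows):
    """Load rows into an in-memory table and return the matching pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (ColA text, ColB text)")
    conn.executemany("insert into tbl values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(matching_pairs(ROWS))
```

On the sample data this returns the single pair (David, John). Each pair is reported once, with the alphabetically smaller name first, because of the G1.ColA < G2.ColA condition.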

To list the ColB values that appear more than once, along with who has them:

SELECT tableName.ColA, tableName.ColB
FROM (SELECT ColB
    FROM tableName
    GROUP BY ColB
    HAVING COUNT(1) > 1) fruits
INNER JOIN tableName ON fruits.ColB = tableName.ColB
ORDER BY tableName.ColB, tableName.ColA
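
As a quick check of what this query returns, here is a small sketch that runs it on the question's sample rows in an in-memory SQLite database. The Python harness and the function name shared_items are assumptions made for illustration; the SQL is unchanged.

```python
import sqlite3

# Sample rows from the question (ColA, ColB).
ROWS = [
    ("John", "Apple"), ("John", "Orange"), ("John", "Banana"),
    ("Mary", "Orange"), ("Mary", "Strawberry"),
    ("David", "Apple"), ("David", "Orange"), ("David", "Banana"),
]

QUERY = """
SELECT tableName.ColA, tableName.ColB
FROM (SELECT ColB
    FROM tableName
    GROUP BY ColB
    HAVING COUNT(1) > 1) fruits
INNER JOIN tableName ON fruits.ColB = tableName.ColB
ORDER BY tableName.ColB, tableName.ColA
"""


def shared_items(rows):
    """Return (ColA, ColB) rows whose ColB value occurs more than once."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tableName (ColA TEXT, ColB TEXT)")
    conn.executemany("INSERT INTO tableName VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for person, item in shared_items(ROWS):
    print(item, person)
```

On the sample data the output includes Mary for Orange, so this query shows which items are shared rather than which people have identical sets of items.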

ColA1 matches ColA2 if:
Count (ColA1) = Count (ColA2) = Count (ColA1 x ColA2)
where Count (ColA1 x ColA2) is the number of rows the two have in common when joined on ColB.
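
The counting criterion above can be sketched outside SQL as well. This is an illustrative Python version, not the answer's code; the function name matching_groups is hypothetical, and it assumes the table has no duplicate (ColA, ColB) rows, so that set sizes equal row counts.

```python
from collections import defaultdict
from itertools import combinations


def matching_groups(rows):
    """Return pairs of ColA values whose ColB sets are identical.

    Assumes no duplicate (ColA, ColB) rows, so set sizes equal row counts.
    """
    items = defaultdict(set)
    for person, item in rows:
        items[person].add(item)

    pairs = []
    for a, b in combinations(sorted(items), 2):
        shared = items[a] & items[b]  # rows that would join on ColB
        # Count(a) = Count(b) = Count(a x b)
        if len(items[a]) == len(items[b]) == len(shared):
            pairs.append((a, b))
    return pairs


rows = [
    ("John", "Apple"), ("John", "Orange"), ("John", "Banana"),
    ("Mary", "Orange"), ("Mary", "Strawberry"),
    ("David", "Apple"), ("David", "Orange"), ("David", "Banana"),
]
print(matching_groups(rows))
```

For the question's data, John and David each have three items and share all three, so they are the only pair that satisfies the condition. Mary shares Orange with both, but her count differs.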

This approach attempts to optimize the query speed.

Materialize the raw counts into a temp table, since they are used more than once and the temp table can declare a PK.
(A CTE is just syntax and is evaluated wherever it is referenced.)

The condition where RA.rawcount = RB.rawcount means a pair is only evaluated if the two counts are equal, and the query plan shows that it is executed first.

create table #rawcount
(ColA varchar(50) not null primary key, rawcount int  not null)  
insert into #rawcount
select   [ColA], COUNT(*) as  [rawCount]
from     [tbl]
group by [ColA]
order by [ColA]

select a.ColA as ColA1, b.ColA as ColA2, COUNT(*) [matchcount]
from tbl A
join tbl B
 on  a.ColB = b.ColB 
 and a.ColA < b.ColA
join #rawcount RA 
 on  RA.ColA = A.ColA
join #rawcount RB 
 on  RB.ColA = B.ColA
where RA.rawcount = RB.rawcount  -- only evaluate if count same
group by a.ColA, b.ColA, RA.rawcount
having COUNT(*) = RA.rawcount 
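
For readers without SQL Server at hand, here is a rough port of the same two steps to an in-memory SQLite database. It is a sketch under assumptions: the Python harness and the function name matching_pairs_with_counts are made up for illustration, and the #rawcount temp table becomes a SQLite temp table named rawcount. It says nothing about SQL Server's query plan.

```python
import sqlite3

# Sample rows from the question (ColA, ColB).
ROWS = [
    ("John", "Apple"), ("John", "Orange"), ("John", "Banana"),
    ("Mary", "Orange"), ("Mary", "Strawberry"),
    ("David", "Apple"), ("David", "Orange"), ("David", "Banana"),
]

# Step 1: materialize the per-ColA row counts with a primary key.
CREATE_COUNTS = """
create temp table rawcount
(ColA text not null primary key, rawcount int not null)
"""
FILL_COUNTS = """
insert into rawcount
select ColA, count(*) from tbl group by ColA
"""

# Step 2: self-join on ColB and keep pairs whose match count equals both counts.
QUERY = """
select a.ColA as ColA1, b.ColA as ColA2, count(*) as matchcount
from tbl a
join tbl b on a.ColB = b.ColB and a.ColA < b.ColA
join rawcount RA on RA.ColA = a.ColA
join rawcount RB on RB.ColA = b.ColA
where RA.rawcount = RB.rawcount
group by a.ColA, b.ColA, RA.rawcount
having count(*) = RA.rawcount
"""


def matching_pairs_with_counts(rows):
    """Return (ColA1, ColA2, matchcount) for groups with identical ColB sets."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tbl (ColA text, ColB text)")
    conn.executemany("insert into tbl values (?, ?)", rows)
    conn.execute(CREATE_COUNTS)
    conn.execute(FILL_COUNTS)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(matching_pairs_with_counts(ROWS))
```

On the sample data this returns one row: David, John, with a match count of 3.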