Search for similarities between users in a telecommunication network

I have an anonymized table with two columns: UserId and PhoneNumber.

It was extracted from a table of call detail records. Now I would like to build a network based on similarity between users: two users should be connected if they have called at least 3 of the same numbers.

There are over 20 million rows. A simple program written in C# would take more than 4 days to complete this task. Is it possible to write an SQL query that gives the same result and, whenever two users are similar, either inserts a row into a new table with two columns, user1 and user2, or simply returns the pair as output?

Perhaps there is another good solution for this task?

1 answer

Assuming your table is called CallingList, you should be able to use this query:

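-- Pair up users through the numbers they both called; C1.UserId < C2.UserId
-- keeps each pair once and excludes self-pairs.
SELECT C1.UserId AS User1, C2.UserId AS User2
  FROM CallingList AS C1
  JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
 WHERE C1.UserId < C2.UserId
 GROUP BY C1.UserId, C2.UserId
-- DISTINCT counts each shared number once, even if a user called it repeatedly.
HAVING COUNT(DISTINCT C1.PhoneNumber) >= 3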

Whether this will be faster than the C# program remains to be seen.

Make sure you have an index on CallingList (PhoneNumber) if your optimizer does not automatically create one behind the scenes.
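To try the idea on a small scale before running it against 20 million rows, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows, the index name idx_calling_phone, the result table SimilarUsers and the helper build_similarity are all made up for illustration, and SQLite only stands in for whatever database you actually use. The sketch counts distinct shared numbers rather than joined rows, on the assumption that the table may contain the same (UserId, PhoneNumber) pair more than once.

```python
import sqlite3

# Hypothetical toy data: (UserId, PhoneNumber) pairs. Users 1 and 3 each
# have a repeated call, to show that repeats do not inflate the count.
rows = [
    (1, "555-0001"), (1, "555-0002"), (1, "555-0003"), (1, "555-0003"),
    (2, "555-0001"), (2, "555-0002"), (2, "555-0003"),
    (3, "555-0001"), (3, "555-0001"), (3, "555-0002"),
    (4, "555-0009"),
]


def build_similarity(conn, min_shared=3):
    """Fill SimilarUsers with user pairs sharing at least min_shared numbers."""
    # Index on the join column (name is an assumption, not from the post).
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_calling_phone "
        "ON CallingList (PhoneNumber, UserId)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS SimilarUsers (User1 INTEGER, User2 INTEGER)"
    )
    # Self-join on PhoneNumber; each shared number is counted once per pair.
    conn.execute(
        """
        INSERT INTO SimilarUsers (User1, User2)
        SELECT C1.UserId, C2.UserId
          FROM CallingList AS C1
          JOIN CallingList AS C2 ON C1.PhoneNumber = C2.PhoneNumber
         WHERE C1.UserId < C2.UserId
         GROUP BY C1.UserId, C2.UserId
        HAVING COUNT(DISTINCT C1.PhoneNumber) >= ?
        """,
        (min_shared,),
    )
    return conn.execute(
        "SELECT User1, User2 FROM SimilarUsers ORDER BY User1, User2"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CallingList (UserId INTEGER, PhoneNumber TEXT)")
conn.executemany("INSERT INTO CallingList VALUES (?, ?)", rows)

pairs = build_similarity(conn)
print(pairs)  # [(1, 2)]
```

Only users 1 and 2 share three distinct numbers. Users 1 and 3 share just two, although the repeated call to 555-0001 would produce three joined rows for that pair if rows were counted instead of distinct numbers.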
