Representation and a good similarity measure between tweets to identify topics

I plan to write a tool that identifies topics on Twitter. I have been thinking about how to represent tweets and what a good similarity measure between two tweets would be, taking into account:

  • #hashtags (I think hashtags are very important when detecting topics on Twitter).
  • Replies (if someone replies to a tweet, both tweets are probably about the same topic, although two people can start out talking about the Samsung Galaxy and end the conversation on iPhone jailbreaking, etc.).
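To make the two signals above concrete, here is a minimal sketch of how they could be combined. The dictionary layout (`id`, `text`, `reply_to`), the function names, and the `reply_bonus` weight are all assumptions made up for illustration, not anything from the Twitter API or an established method.

```python
import re

# Hypothetical tweet layout: {"id": int, "text": str, "reply_to": int or absent}
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of lowercased hashtags found in a tweet's text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def tweet_affinity(t1, t2, reply_bonus=0.3):
    """Hashtag overlap, boosted when one tweet is a direct reply to the other.

    reply_bonus is an arbitrary illustrative weight; it would need tuning,
    since a reply chain can drift to a different topic.
    """
    score = jaccard(hashtags(t1["text"]), hashtags(t2["text"]))
    is_reply = t1.get("reply_to") == t2["id"] or t2.get("reply_to") == t1["id"]
    if is_reply:
        score = min(1.0, score + reply_bonus)
    return score


a = {"id": 1, "text": "Loving my new #Samsung #Galaxy"}
b = {"id": 2, "text": "The #galaxy battery is great", "reply_to": 1}
c = {"id": 3, "text": "Anyone tried #iphone #jailbreak yet?"}
print(tweet_affinity(a, b), tweet_affinity(a, c))
```

Keeping the reply signal as a bonus rather than a hard link is one way to account for conversations that change topic along the way.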

I’m thinking about implementing what I have and running some experiments. I will try classical vector-space models (for example, TF-IDF with Euclidean distance, cosine similarity, etc.) and Boolean models with several similarity measures (Hamming, Jaccard, etc.).
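As a starting point for those experiments, a TF-IDF baseline might look like the sketch below. It uses only the standard library; the tokenizer, the smoothed IDF formula, and the function names are my own assumptions, and a real experiment would probably use an existing library instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word tokens, keeping a leading # or @."""
    return re.findall(r"[#@]?\w+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many tweets each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption; other variants are equally valid).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean(u, v):
    """Euclidean distance between two sparse vectors."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


tweets = ["samsung galaxy rocks", "galaxy phone", "iphone jailbreak"]
vecs = tfidf_vectors(tweets)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

Because tweets are so short, most pairs share no terms at all and get a cosine of zero, which is one reason the hashtag and reply signals may be worth adding on top of the text.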

Any ideas on how to adapt an existing model to Twitter, or on how to create a new one?

1 answer

"Twitter affinity tags" discusses some details of the various affinity measures you can use to cluster Twitter data. We did some research on clustering Twitter users based on user connections, user mentions, geolocation, content similarity between tweets, content similarity between user descriptions, and shared #hashtags.

