I plan to write a tool to identify topics on Twitter. I have been thinking about a good measure of similarity between two tweets and how to represent them, taking into account:
- Hashtags (I think hashtags are very important when detecting topics on Twitter).
- Replies (if someone replies to a tweet, both tweets are probably about the same topic, although two people can start out talking about the Samsung Galaxy and end the conversation on iPhone jailbreaking, etc.).
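To make the two signals above concrete, here is a rough sketch of how I imagine representing a tweet. The `Tweet` class, its `in_reply_to` field, and the helper names are all hypothetical and only for illustration; they are not tied to any particular Twitter client library.

```python
import re
from dataclasses import dataclass
from typing import Dict, Optional, Set

HASHTAG_RE = re.compile(r"#(\w+)")


@dataclass
class Tweet:
    # Hypothetical minimal representation of a tweet.
    tweet_id: int
    text: str
    in_reply_to: Optional[int] = None  # id of the tweet being replied to, if any


def hashtags(tweet: Tweet) -> Set[str]:
    """Return the lowercased hashtags found in the tweet text."""
    return {tag.lower() for tag in HASHTAG_RE.findall(tweet.text)}


def conversation_root(tweet: Tweet, by_id: Dict[int, Tweet]) -> int:
    """Follow the reply chain upwards and return the id of the first tweet."""
    seen = set()
    current = tweet
    while (
        current.in_reply_to is not None
        and current.in_reply_to in by_id
        and current.tweet_id not in seen  # guard against malformed cycles
    ):
        seen.add(current.tweet_id)
        current = by_id[current.in_reply_to]
    return current.tweet_id
```

Two tweets with the same conversation root could then get a similarity bonus, while the hashtag sets can be compared directly. Since a conversation can drift from one topic to another, the reply signal should probably be weighted rather than treated as proof of a shared topic.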
I’m thinking about implementing what I have and doing some experiments. I plan to apply classical models (for example, TF-IDF with Euclidean distance, cosine similarity, etc.) and Boolean models with several similarity measures (Hamming, Jaccard, etc.).
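As a baseline for those experiments, something like the following is what I have in mind. It is only a minimal sketch in plain Python: the whitespace tokenizer is a placeholder assumption, the smoothed IDF formula is just one common variant, and all function names are my own.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    # Placeholder tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of texts containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (one common variant): log((1 + n) / (1 + df)) + 1
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two term (or hashtag) sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hamming(a: Set[str], b: Set[str]) -> int:
    """Hamming distance between the Boolean vectors of two term sets:
    the number of terms present in exactly one of them."""
    return len(a ^ b)
```

The Boolean measures work on sets, so they could be applied either to the tokens of a tweet or only to its hashtags. Tweets are very short, so I expect the TF part to be almost always 0 or 1, which is one reason I am unsure the classical models transfer well.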
Any ideas on how to adapt an existing model to Twitter, or on how to create a new one?