I am using LDA (latent Dirichlet allocation) to cluster a large collection of documents by topic. LDA requires one input parameter, the number of topics. How can I determine it?
I evaluate my approach on the Reuters corpus, whose documents come with a known set of topic categories. When clustering the Reuters texts, should I set the number of topics to the number of Reuters categories, and then compare my clustering result with the Reuters labels?
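If the number of topics is fixed to the number of Reuters categories, one common way to compare the clustering with the Reuters labels is normalized mutual information (NMI) between each document's dominant LDA topic and its category. The sketch below is a pure-Python illustration that assumes exactly one category label per document (Reuters documents can carry several, so multi-label documents would need to be filtered or handled separately); `dominant_topic` and `normalized_mutual_info` are hypothetical helper names, not a library API. NMI does not care how clusters are numbered, so topic 3 need not line up with category 3.

```python
import math
from collections import Counter


def dominant_topic(doc_topic):
    """Assign each document to its most probable topic (argmax of its theta row)."""
    return [max(range(len(row)), key=row.__getitem__) for row in doc_topic]


def normalized_mutual_info(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization; 1.0 means identical partitions."""
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    # Mutual information between the two labelings, from their contingency counts.
    mi = sum(nij / n * math.log(n * nij / (count_true[i] * count_pred[j]))
             for (i, j), nij in joint.items())
    h_true = -sum(c / n * math.log(c / n) for c in count_true.values())
    h_pred = -sum(c / n * math.log(c / n) for c in count_pred.values())
    denom = h_true + h_pred
    if denom == 0:
        return 1.0  # both labelings put every document in a single cluster
    return 2 * mi / denom
```

For example, `normalized_mutual_info(reuters_labels, dominant_topic(theta))`, where `theta` stands for the document-topic matrix produced by the LDA run and `reuters_labels` for the reference categories (both hypothetical variable names).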
In production, however, how can I find out the number of topics before I have actually clustered the documents by topic? This looks like a chicken-and-egg problem.
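When no reference labels exist, one common approach is to treat the number of topics as a hyperparameter: train LDA for several candidate values on part of the corpus and keep the value with the lowest perplexity on held-out documents. The sketch below assumes documents are already converted to lists of integer word ids; it uses a minimal collapsed Gibbs sampler written only for illustration (not any library's implementation), hypothetical helper names, and fixed priors `alpha` and `beta`. Held-out perplexity is estimated by document completion: the first half of each held-out document is used to infer its topic mixture and the second half is scored.

```python
import math
import random


def train_lda(docs, k, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Minimal collapsed Gibbs sampler for LDA; returns topic-word probabilities phi[t][w]."""
    rng = random.Random(seed)
    ndk = [[0] * k for _ in docs]                 # topic counts per document
    nkw = [[0] * vocab_size for _ in range(k)]    # word counts per topic
    nk = [0] * k                                  # total words assigned to each topic
    z = []                                        # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = [rng.randrange(k) for _ in doc]      # random initial assignments
        for w, t in zip(doc, zd):
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # Remove the current token from the counts, then resample its topic.
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [(ndk[d][j] + alpha) * (nkw[j][w] + beta) / (nk[j] + vocab_size * beta)
                           for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][i] = t
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return [[(nkw[t][w] + beta) / (nk[t] + vocab_size * beta) for w in range(vocab_size)]
            for t in range(k)]


def heldout_perplexity(phi, docs, alpha=0.1, iters=50, seed=0):
    """Document-completion perplexity: infer theta from the first half, score the second half."""
    rng = random.Random(seed)
    k = len(phi)
    log_lik, n_words = 0.0, 0
    for doc in docs:
        half = len(doc) // 2
        fit, rest = doc[:half], doc[half:]
        z = [rng.randrange(k) for _ in fit]
        ndk = [0] * k
        for t in z:
            ndk[t] += 1
        for _ in range(iters):                    # Gibbs fold-in with phi held fixed
            for i, w in enumerate(fit):
                ndk[z[i]] -= 1
                weights = [(ndk[j] + alpha) * phi[j][w] for j in range(k)]
                z[i] = rng.choices(range(k), weights=weights)[0]
                ndk[z[i]] += 1
        theta = [(ndk[j] + alpha) / (len(fit) + k * alpha) for j in range(k)]
        for w in rest:
            log_lik += math.log(sum(theta[j] * phi[j][w] for j in range(k)))
            n_words += 1
    return math.exp(-log_lik / n_words)


def choose_num_topics(train_docs, heldout_docs, vocab_size, candidates):
    """Train one model per candidate k and return the k with the lowest held-out perplexity."""
    scores = {k: heldout_perplexity(train_lda(train_docs, k, vocab_size), heldout_docs)
              for k in candidates}
    return min(scores, key=scores.get), scores
```

On real corpora the perplexity curve can keep decreasing slowly as k grows instead of showing a clear minimum, so it is worth inspecting the whole `scores` dictionary; topic-coherence measures (for example gensim's `CoherenceModel`) are a frequently used alternative criterion.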