So, I looked through a ton of articles and forums before posting this, but I keep reading conflicting answers. Firstly, the OS is not an issue: I can use either Windows or Unix, whichever is better for my problem. I have a ton of data that I need to use for read-only purposes (I'm not sure why this would matter, but in case it does, the data structure I need to go through is an array of arrays of hashes whose values are also arrays). I am essentially comparing a "query" against tons of different "sentences" and computing their relative similarities. From these values (several million) I want to take the top x% and do something with them. I need to parallelize this process. There simply isn't a good way to cut down the search space: I need to compare against everything to get good results, and it takes too long without some form of threading or forking. Again, I have seen many conflicting answers and do not know which approach to take.
Any help would be greatly appreciated. Thanks in advance.
EDIT: I don't think memory usage will be a problem, but I'm not sure (the machine has 8 GB of RAM).
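To make the shape of the job concrete, here is a rough sketch of what I mean, written in Python purely for illustration. Everything in it is an assumption: the `similarity` function is a stand-in (word-set overlap), the names `score_chunk`, `top_fraction` and `parallel_top` are made up, and the real data is the nested structure described above rather than a flat list of strings. The idea is just to split the read-only data into chunks, score each chunk in a separate process, then merge the scores and keep the top x%.

```python
import heapq
from concurrent.futures import ProcessPoolExecutor
from functools import partial


def similarity(query, sentence):
    """Stand-in score (an assumption): Jaccard overlap of the word sets."""
    a, b = set(query.split()), set(sentence.split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score_chunk(query, chunk):
    """Score one slice of the data; chunk is a list of (index, sentence)."""
    return [(similarity(query, sentence), index) for index, sentence in chunk]


def top_fraction(scored, fraction):
    """Keep the best `fraction` of (score, index) pairs, highest first."""
    k = max(1, int(len(scored) * fraction))
    return heapq.nlargest(k, scored)


def parallel_top(query, sentences, fraction=0.05, workers=4):
    """Split the data across worker processes, then merge and select."""
    indexed = list(enumerate(sentences))
    size = max(1, -(-len(indexed) // workers))  # ceiling division
    chunks = [indexed[i:i + size] for i in range(0, len(indexed), size)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(partial(score_chunk, query), chunks)
        scored = [pair for part in parts for pair in part]
    return top_fraction(scored, fraction)


if __name__ == "__main__":
    data = ["the cat sat", "a dog ran", "the cat ran", "birds fly south"]
    print(parallel_top("the cat sat", data, fraction=0.5, workers=2))
```

Each worker only ever reads its own chunk, so nothing needs locking. What I can't tell from what I've read is whether processes like this or threads are the better fit for several million comparisons.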