If I wanted to fit large amounts of data using matrices that are too large to fit in memory, what tools or libraries should I look into? In particular, if the data comes from a website that normally runs on PHP + MySQL, how would you suggest building a standalone (offline) process that can run large matrix operations in a reasonable amount of time?
Possible answers might look like "you should use this language with this distributed matrix algorithm to map-reduce across many machines." I suspect PHP is not the best language for this, so the flow would more likely be a separate standalone process that reads the data from the database, does the learning, and saves the resulting rules in a format that PHP can use later (since the other parts of the site are built in PHP).
Not sure if this is the right place to ask this (I would have asked it on the machine learning SE, but that site never left beta).
[…] Map/Reduce […] Apache Mahout […]
[…] Weka […]
Machine Learning […]
[…] (TF-IDF […] chi2) […] liblinear […] vowpal wabbit […]
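The flow described in the question (a standalone process reads rows from the database, learns, and stores the result in a format PHP can read) could be sketched roughly as follows. This is only a minimal illustration under several assumptions: an in-memory SQLite database stands in for MySQL, the `samples` table and its columns `x1`, `x2`, `label` are hypothetical, and a hand-written logistic regression trained by stochastic gradient descent stands in for whichever learner is actually chosen. The point is that rows are streamed in batches, so the full data matrix never has to sit in memory.

```python
import json
import math
import random
import sqlite3


def make_demo_db(n=2000, seed=0):
    """Build a toy database (assumption: stands in for the site's MySQL data)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (x1 REAL, x2 REAL, label INTEGER)")
    rows = []
    for _ in range(n):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        rows.append((x1, x2, 1 if x1 + x2 > 0 else 0))
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def iter_batches(conn, batch_size=500):
    """Yield lists of rows so only one batch is held in memory at a time."""
    cur = conn.execute("SELECT x1, x2, label FROM samples")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def train(conn, epochs=5, lr=0.1):
    """Fit a logistic regression by SGD, updating on one row at a time."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for rows in iter_batches(conn):
            for x1, x2, y in rows:
                z = w[0] * x1 + w[1] * x2 + b
                z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
                p = 1.0 / (1.0 + math.exp(-z))
                g = p - y  # gradient of the log loss with respect to z
                w[0] -= lr * g * x1
                w[1] -= lr * g * x2
                b -= lr * g
    return {"weights": w, "bias": b}


def predict(model, x1, x2):
    """Apply the learned rule; the same arithmetic is easy to redo in PHP."""
    z = model["weights"][0] * x1 + model["weights"][1] * x2 + model["bias"]
    return 1 if z > 0 else 0


if __name__ == "__main__":
    conn = make_demo_db()
    model = train(conn)
    # JSON is one portable way to hand the learned rule back to the website.
    print(json.dumps(model))
```

The printed JSON could be written to a file or a database row and read on the PHP side with `json_decode`; the prediction step is then just a weighted sum, which needs no machine learning library in PHP.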