Large-Scale Machine Learning from Web Data

If I wanted to do machine learning on large amounts of data, using matrices too large to fit in memory, what tools or libraries should I learn? In particular, if I am working with data from a website, usually built with PHP + MySQL, how would you suggest building a standalone process that can run large matrix operations in a reasonable amount of time?

Possible answers might look like "use this language with this distributed matrix algorithm to map/reduce across many machines." I assume PHP is not the best language for this, so the flow would be a separate standalone process that reads data from the database, learns, and saves the rules in a format PHP can use later (since the rest of the site is built in PHP).
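The pipeline described above (a standalone job that reads from the database, learns, and saves something PHP can load) can be sketched in Python for a simple model. This is a minimal sketch under several assumptions: SQLite from the standard library stands in for MySQL, the `samples` table and its `x1`, `x2`, `y` columns are hypothetical, and ordinary least-squares regression is only an illustrative choice of model. It shows the out-of-core idea: rows are streamed in batches with `fetchmany`, and only the small d×d matrix XᵀX and the vector Xᵀy are kept in memory, however many rows the table has.

```python
import json
import sqlite3


def build_demo_db():
    """In-memory SQLite stand-in for the site's MySQL table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (x1 REAL, x2 REAL, y REAL)")
    rows = [(float(i % 7), float(i % 5), 1.0 + 2.0 * (i % 7) - 3.0 * (i % 5))
            for i in range(50)]
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    return conn


def fetch_batches(conn, batch_size=1000):
    """Yield rows in fixed-size batches so the whole table is never in memory at once."""
    cur = conn.execute("SELECT x1, x2, y FROM samples")
    while True:
        rows = cur.fetchmany(batch_size)
        if not rows:
            return
        yield rows


def accumulate_normal_equations(batches, d):
    """Accumulate X^T X (d x d) and X^T y (length d) batch by batch.

    Memory depends on d, not on the number of rows.
    A constant 1.0 is prepended to each row for the intercept.
    """
    xtx = [[0.0] * d for _ in range(d)]
    xty = [0.0] * d
    for rows in batches:
        for *features, y in rows:
            x = [1.0, *features]
            for i in range(d):
                xty[i] += x[i] * y
                for j in range(d):
                    xtx[i][j] += x[i] * x[j]
    return xtx, xty


def solve(a, b):
    """Solve a @ w = b with Gaussian elimination and partial pivoting (fine for small d)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def train(conn, batch_size=1000):
    """Fit y ~ w0 + w1*x1 + w2*x2 by least squares without loading all rows at once."""
    xtx, xty = accumulate_normal_equations(fetch_batches(conn, batch_size), d=3)
    return solve(xtx, xty)


def to_json(weights):
    """Serialize the model as JSON, which PHP can read with json_decode()."""
    return json.dumps({"intercept": weights[0], "coef": weights[1:]})


if __name__ == "__main__":
    w = train(build_demo_db(), batch_size=8)
    # A real job would write this string to a file the PHP side reads.
    print(to_json(w))
```

On the PHP side, a saved file could be loaded with `json_decode(file_get_contents('model.json'), true)`, and a prediction is the intercept plus the dot product of `coef` with the features. Forming XᵀX squares the condition number, so for ill-conditioned data a QR-based or iterative solver would be the safer choice.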

Not sure if this is the right place to ask (I would have asked on the Machine Learning Stack Exchange, but it never made it out of beta).

2 answers

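If your problem can be expressed as Map/Reduce jobs, take a look at Apache Mahout, which provides distributed implementations of machine learning algorithms such as: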

  • K-Means, Fuzzy K-Means
  • …
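The Map/Reduce formulation of k-means that a tool like Mahout runs on a cluster can be illustrated in a single process. This is a plain-Python sketch, not Mahout's API, and all function names are hypothetical: the map step assigns each point to its nearest centroid, grouping by key plays the role of the shuffle, and the reduce step averages each group into a new centroid.

```python
from collections import defaultdict


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    best = min(range(len(centroids)), key=lambda k: squared_distance(point, centroids[k]))
    return best, (point, 1)


def kmeans_reduce(cluster_id, values):
    """Reduce step: add up (sum, count) pairs for one cluster and return the new centroid."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for partial_sum, n in values:
        for i in range(dim):
            total[i] += partial_sum[i]
        count += n
    return cluster_id, [t / count for t in total]


def kmeans_iteration(points, centroids):
    """One iteration: map every point, group values by key (the 'shuffle'), reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    new_centroids = list(centroids)  # a cluster that receives no points keeps its old centroid
    for key, values in groups.items():
        k, c = kmeans_reduce(key, values)
        new_centroids[k] = c
    return new_centroids


def kmeans(points, centroids, max_iter=20):
    """Repeat Map/Reduce iterations until the centroids stop changing."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(kmeans(pts, [[0.0, 0.0], [10.0, 10.0]]))
```

Because the reduce step sums (sum, count) pairs rather than raw points only, a distributed job could also run it as a combiner on each mapper, so just one partial sum per cluster crosses the network per iteration.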

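If you don't need to distribute the computation, Weka, a Java toolkit with a wide range of machine learning algorithms, is another option.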



Large-scale machine learning is a broad topic, and the right approach depends on the problem you are trying to solve and on the kind of data you have.

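For text data, a common recipe is to extract features (for example with TF-IDF weighting), optionally keep only the most informative ones (for example with a chi2 test), and then train a linear model, which scales well to large, sparse data. liblinear and vowpal wabbit are good tools for training such linear models.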
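liblinear and vowpal wabbit are separate tools with their own input formats and options, so the block below uses neither. It is a minimal plain-Python sketch of the idea behind vowpal wabbit-style online learning, assuming whitespace tokenization, a 2^18-bucket feature space, and a fixed learning rate (all illustrative choices): tokens are hashed into a fixed number of feature buckets (the hashing trick), and a logistic-regression model is updated one example at a time with stochastic gradient descent. TF-IDF weighting and chi2 selection are left out to keep it short.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 18  # illustrative size of the hashed feature space


def hashed_features(text, num_buckets=NUM_BUCKETS):
    """Hashing trick: map each lowercase token to a bucket index and count occurrences.

    zlib.crc32 is used because, unlike Python's built-in hash() for str, it is
    stable across runs, so a saved model can be reused by another process.
    """
    feats = {}
    for token in text.lower().split():
        idx = zlib.crc32(token.encode("utf-8")) % num_buckets
        feats[idx] = feats.get(idx, 0.0) + 1.0
    return feats


def predict_proba(weights, feats):
    """Logistic model: sigmoid of the sparse dot product w . x."""
    z = sum(weights.get(i, 0.0) * v for i, v in feats.items())
    z = max(-35.0, min(35.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))


def sgd_update(weights, feats, label, lr):
    """One stochastic gradient step on the log loss for a single example (label 0 or 1)."""
    error = predict_proba(weights, feats) - label  # derivative of log loss w.r.t. z
    for i, v in feats.items():
        weights[i] = weights.get(i, 0.0) - lr * error * v


def train(examples, passes=1, lr=0.5):
    """Online training: each example is used once per pass and then discarded,
    so only the sparse weight table stays in memory.
    With passes > 1, `examples` must be a re-iterable sequence such as a list."""
    weights = {}
    for _ in range(passes):
        for text, label in examples:
            sgd_update(weights, hashed_features(text), label, lr)
    return weights


if __name__ == "__main__":
    docs = [
        ("cheap pills buy now", 1),
        ("buy cheap watches now", 1),
        ("meeting agenda for monday", 0),
        ("project meeting notes", 0),
    ]
    w = train(docs, passes=10)
    print(predict_proba(w, hashed_features("buy cheap pills")))
```

Memory is bounded by the number of buckets rather than by the vocabulary or the number of documents, so a single pass of the same loop can consume rows streamed straight from a database cursor.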

