We run Solr on an Amazon Web Services EC2 instance with a 1 TB EBS volume that stores the index, so we can easily run additional read-only servers against the same index. However, our index will soon exceed 1 TB, and I really don't want to deal with interleaving multiple EBS volumes to store it. In addition, index recovery is very slow. I would like to move index generation, and possibly hosting, to Hadoop, preferably on Amazon Elastic MapReduce, although I can set up separate Hadoop servers if necessary. We use RightScale, so our ServerTemplates library is available to us.
What would be the best place to get started with running Lucene / Solr on Hadoop?