Amazon EMR - Vectorizing a Solr index with Mahout using lucene.vector

January 15, 2012

I'm trying to run a clustering job on Amazon EMR using Mahout. I have a Solr index uploaded to S3 and want to vectorize it using Mahout's lucene.vector. (This is the first step in the job flow.)

The parameters for the step are:

Jar: s3n://mahout-bucket/jars/mahout-core-0.6-job.jar
MainClass: org.apache.mahout.driver.MahoutDriver
Args: lucene.vector --dir s3n://mahout-input/solr_index/ --field name --dictOut /test/solr-dict-out/dict.txt --output /test/solr-vectors-out/vectors

The error in the log is:

Unknown program 'lucene.vector' chosen.

I've done the same process locally with Hadoop and Mahout, and it worked fine. How should I call the lucene.vector function on EMR?

The program name, lucene.vector, should come immediately after bin/mahout:

/homes/cuneyt/trunk/bin/mahout lucene.vector --dir /homes/cuneyt/lucene/index --field 0 --output lda/vector --dictOut /homes/cuneyt/lda/dict.txt
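
To check the argument order before submitting anything, the local invocation can be wrapped in a small dry-run script. This is only a sketch: build_cmd is a hypothetical helper, the paths are the placeholder values from the command above, and the script only prints the command line instead of running Mahout. The flag is spelled --dictOut here on the assumption that Mahout's documented camel-case form applies.

```shell
#!/bin/sh
# Sketch: assemble a lucene.vector invocation with the program name first.
# All defaults are placeholder paths from the example; override via environment.
MAHOUT_BIN="${MAHOUT_BIN:-/homes/cuneyt/trunk/bin/mahout}"
INDEX_DIR="${INDEX_DIR:-/homes/cuneyt/lucene/index}"
FIELD="${FIELD:-0}"
VECTORS_OUT="${VECTORS_OUT:-lda/vector}"
DICT_OUT="${DICT_OUT:-/homes/cuneyt/lda/dict.txt}"

# build_cmd (hypothetical helper) prints the full command line.
# The program name, lucene.vector, is placed directly after the mahout script.
# Assumption: the dictionary flag is spelled --dictOut, as in Mahout's docs.
build_cmd() {
    printf '%s lucene.vector --dir %s --field %s --output %s --dictOut %s\n' \
        "$MAHOUT_BIN" "$INDEX_DIR" "$FIELD" "$VECTORS_OUT" "$DICT_OUT"
}

# Dry run: print the command so the argument order can be inspected.
build_cmd
```

Once the printed line looks right, it can be pasted into a shell on a machine with Mahout installed. The same ordering, program name first and then its options, is what the Args field of the EMR step carries.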
