Amazon EMR - Vectorizing a Solr index with Mahout using lucene.vector
I'm trying to run a clustering job on Amazon EMR using Mahout. I have a Solr index uploaded to S3 and want to vectorize it using Mahout's lucene.vector. (This is the first step in the job flow.)
The parameters for the step are:
- jar: s3n://mahout-bucket/jars/mahout-core-0.6-job.jar
- mainclass: org.apache.mahout.driver.MahoutDriver
- args: lucene.vector --dir s3n://mahout-input/solr_index/ --field name --dictout /test/solr-dict-out/dict.txt --output /test/solr-vectors-out/vectors
The error in the log is:
unknown program 'lucene.vector' chosen.
I've done the same process locally with Hadoop and Mahout, and it worked fine. How should I call the lucene.vector function on EMR?
The program name, lucene.vector, should come directly after bin/mahout:
/homes/cuneyt/trunk/bin/mahout lucene.vector --dir /homes/cuneyt/lucene/index --field 0 --output lda/vector --dictout /homes/cuneyt/lda/dict.txt
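The same invocation can be sketched as a small shell script that assembles the command with the program name placed first, directly after bin/mahout. The variable names (MAHOUT_HOME, INDEX_DIR, and so on) are hypothetical and only for illustration; the paths are the ones from the command above. The script only prints the command, as a dry run, so the argument order can be checked before anything is executed.

```shell
#!/bin/sh
# Hypothetical variable names; the paths are copied from the command above.
MAHOUT_HOME="${MAHOUT_HOME:-/homes/cuneyt/trunk}"
INDEX_DIR="/homes/cuneyt/lucene/index"
FIELD="0"
OUTPUT="lda/vector"
DICT_OUT="/homes/cuneyt/lda/dict.txt"

# The program name (lucene.vector) is the first argument after bin/mahout,
# followed by its options.
CMD="$MAHOUT_HOME/bin/mahout lucene.vector --dir $INDEX_DIR --field $FIELD --output $OUTPUT --dictout $DICT_OUT"

# Dry run: print the assembled command instead of executing it.
echo "$CMD"
```

To run the job for real, the printed command can be pasted into a shell on a machine where Mahout is installed.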