Tuesday, November 27, 2012

Clustering Commands

Clustering (run from the Mahout 0.6 directory):
cd mahout0.6



Sequence File Generation

bin/mahout seqdirectory -i /home/Textfiles/ -o /home/SequenceFiles/ -c UTF-8 -chunk 64

Term Vector Creation

bin/mahout seq2sparse -i /home/SequenceFiles/ -o /home/SequenceFiles-sparse --maxDFPercent 85 --namedVector --minDF 15
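
As a rough guide to the two document-frequency options: --minDF 15 ignores terms that occur in fewer than 15 documents, and --maxDFPercent 85 ignores terms that occur in more than 85 percent of them. The sketch below only does that arithmetic for an assumed corpus of 1000 documents. The df_bounds helper and the corpus size are made up for illustration, and the exact boundary handling inside Mahout may differ.

```shell
# Hypothetical helper: rough document-count bounds implied by the two DF filters.
# Arguments: total documents (assumed), --minDF value, --maxDFPercent value.
df_bounds() {
  total_docs=$1
  min_df=$2
  max_df_percent=$3
  # Integer arithmetic: largest document count still within the percentage cap.
  max_docs=$(( total_docs * max_df_percent / 100 ))
  echo "keep terms found in roughly $min_df to $max_docs of $total_docs documents"
}

# 1000 is an assumed corpus size, not a figure from this post.
df_bounds 1000 15 85
```

On a small corpus the two bounds can squeeze out most of the vocabulary, so it may be worth running the numbers before picking values.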


K-means Clustering

bin/mahout kmeans -i /home/SequenceFiles-sparse/tfidf-vectors/ -c /home/kmeans-clusters -o /home/kmeans -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 -k 10 -ow --clustering
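
Each of the three steps so far reads what the previous one wrote, so a mistyped path breaks the chain. One way to keep the paths consistent is to hold them in variables and print the commands before running anything. This is only a sketch: the variable names and the print_pipeline function are my own, and the paths are simply the ones used in this post.

```shell
# Assumed layout: the same paths as the commands in this post.
TEXT_DIR=/home/Textfiles
SEQ_DIR=/home/SequenceFiles
SPARSE_DIR=/home/SequenceFiles-sparse
SEED_DIR=/home/kmeans-clusters
KMEANS_DIR=/home/kmeans

# Hypothetical helper: print the first three commands, one per line, without running them.
print_pipeline() {
  echo "bin/mahout seqdirectory -i $TEXT_DIR/ -o $SEQ_DIR/ -c UTF-8 -chunk 64"
  echo "bin/mahout seq2sparse -i $SEQ_DIR/ -o $SPARSE_DIR --maxDFPercent 85 --namedVector --minDF 15"
  echo "bin/mahout kmeans -i $SPARSE_DIR/tfidf-vectors/ -c $SEED_DIR -o $KMEANS_DIR -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 -k 10 -ow --clustering"
}

print_pipeline
```

Once the printed commands look right, they can be copied into the terminal one at a time from the Mahout directory.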


ClusterDumper

bin/mahout clusterdump -s hdfs://<<host name>>:9000/home/kmeans/clusters-2-final/ -d hdfs://<<host name>>:9000/home/SequenceFiles-sparse/dictionary.file-0 -dt sequencefile -b 100 -n 100
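
The clusters-2-final directory in the command above is from my run. The number depends on how many iterations k-means needed before it stopped (at most 10 with -x 10), so it may be different on another data set. A small sketch for looking the directory up instead of guessing it, assuming the k-means output sits on the local filesystem; find_final_clusters is a made-up helper name, and for output on HDFS the listing would have to come from hadoop fs -ls instead.

```shell
# Hypothetical helper: print the clusters-N-final directory under a k-means output directory.
# Assumes a local filesystem path and that at most one such directory exists.
find_final_clusters() {
  for dir in "$1"/clusters-*-final; do
    # If the glob matched nothing, $dir is the literal pattern and the -d test fails.
    if [ -d "$dir" ]; then
      echo "$dir"
      return 0
    fi
  done
  # No final clusters directory found.
  return 1
}
```

Calling find_final_clusters /home/kmeans prints the directory to pass to clusterdump with -s, or returns a non-zero status if the k-means job has not written one yet.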
