Clustering:
cd mahout0.6
Sequence File Generation
bin/mahout seqdirectory -i /home/Textfiles/ -o /home/SequenceFiles/ -c UTF-8 -chunk 64
Term Vector Creation
bin/mahout seq2sparse -i /home/SequenceFiles/ -o /home/SequenceFiles-sparse \
    --maxDFPercent 85 --namedVector --minDF 15
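The two document-frequency options above prune the vocabulary before the vectors are written: --minDF 15 drops terms that occur in fewer than 15 documents, and --maxDFPercent 85 drops terms that occur in more than 85% of them. The sketch below only illustrates that rule on a made-up "term, document frequency" list. It is not Mahout's implementation, and filter_terms is a hypothetical helper name.

```shell
# Illustration only: apply the minDF / maxDFPercent rule to a
# tab-separated "term<TAB>document_frequency" list read from stdin.
# Arguments: $1 = total number of documents, $2 = minDF, $3 = maxDFPercent
filter_terms() {
  awk -F'\t' -v n="$1" -v min="$2" -v maxpct="$3" \
    '$2 >= min && ($2 * 100 / n) <= maxpct { print $1 }'
}

# Made-up corpus of 100 documents: "the" is too common, "typo" is too rare,
# so only "hadoop" is kept.
printf 'the\t98\nhadoop\t40\ntypo\t3\n' | filter_terms 100 15 85
```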
K-means Clustering
bin/mahout kmeans -i /home/SequenceFiles-sparse/tfidf-vectors/ \
    -c /home/kmeans-clusters -o /home/kmeans \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -x 10 -k 10 -ow --clustering
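With -x 10 the run may stop before the tenth iteration, so the number in the final clusters directory (clusters-2-final in the next step) can differ from run to run. A minimal sketch for picking that directory out of a listing, assuming the last iteration's directory carries the -final suffix as it does in this post. The helper name final_clusters_dir and the sample listing are made up for illustration.

```shell
# Read a directory listing on stdin and print the path that ends in
# clusters-<N>-final (the last one, if several match).
final_clusters_dir() {
  grep -o '[^[:space:]]*clusters-[0-9][0-9]*-final' | tail -n 1
}

# Made-up listing for illustration. On a cluster the input would come from
# something like: hadoop fs -ls /home/kmeans
printf '%s\n' \
  'drwxr-xr-x - user group 0 /home/kmeans/clusters-0' \
  'drwxr-xr-x - user group 0 /home/kmeans/clusters-1' \
  'drwxr-xr-x - user group 0 /home/kmeans/clusters-2-final' \
  | final_clusters_dir
```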
ClusterDumper
bin/mahout clusterdump -s hdfs://<<hostname>>:9000/home/kmeans/clusters-2-final/ \
    -d hdfs://<<hostname>>:9000/home/SequenceFiles-sparse/dictionary.file-0 \
    -dt sequencefile -b 100 -n 100
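The first three steps can also be chained in a small script so the paths are set in one place. This is only a sketch: the variable names, the run helper and the DRY_RUN switch are assumptions added here, and clusterdump is left out because it needs the host name and the final clusters directory of the actual run. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: the seqdirectory, seq2sparse and kmeans steps as one script.
# All variable names are illustrative; the defaults are the paths used above.
MAHOUT=${MAHOUT:-bin/mahout}
TEXT_DIR=${TEXT_DIR:-/home/Textfiles/}
SEQ_DIR=${SEQ_DIR:-/home/SequenceFiles/}
SPARSE_DIR=${SPARSE_DIR:-/home/SequenceFiles-sparse}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it and stop on failure.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@" || exit 1
  fi
}

pipeline() {
  run "$MAHOUT" seqdirectory -i "$TEXT_DIR" -o "$SEQ_DIR" -c UTF-8 -chunk 64
  run "$MAHOUT" seq2sparse -i "$SEQ_DIR" -o "$SPARSE_DIR" \
      --maxDFPercent 85 --namedVector --minDF 15
  run "$MAHOUT" kmeans -i "$SPARSE_DIR/tfidf-vectors/" \
      -c /home/kmeans-clusters -o /home/kmeans \
      -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
      -x 10 -k 10 -ow --clustering
}

pipeline
```

Run it from the mahout0.6 directory with DRY_RUN=0 to actually execute the steps.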