Mahout For You
Wednesday, December 5, 2012
Tuesday, November 27, 2012
Clustering Commands
Clustering:
cd mahout0.6
Sequence File Generation
bin/mahout seqdirectory -i /home/Textfiles/ -o /home/SequenceFiles/ -c UTF-8 -chunk 64
Term Vector Creation
bin/mahout seq2sparse -i /home/SequenceFiles/ -o /home/SequenceFiles-sparse --maxDFPercent 85 --namedVector --minDF 15
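To get a feel for what the two document-frequency flags do, here is a small sketch in plain shell and awk. It is not Mahout code. It assumes that --minDF keeps a term only if it appears in at least that many documents, and that --maxDFPercent drops a term that appears in more than that percentage of documents. The function name df_filter and the one-document-per-line input format are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the terms whose document frequency (DF) passes both limits.
#   $1 = minimum DF, as a number of documents (assumed meaning of --minDF)
#   $2 = maximum DF, as a percentage of all documents (assumed meaning of --maxDFPercent)
# Input on stdin: one document per line, terms separated by whitespace.
df_filter() {
  awk -v min_df="$1" -v max_pct="$2" '
    {
      split("", seen)                      # reset the set of terms seen in this document
      for (i = 1; i <= NF; i++) {
        if (!($i in seen)) {               # count each term once per document
          seen[$i] = 1
          df[$i]++
        }
      }
    }
    END {
      for (t in df) {
        # NR is the total number of documents read
        if (df[t] >= min_df && df[t] * 100 <= max_pct * NR) print t
      }
    }
  ' | sort
}
```

For example, `printf 'a b\na c\na b d\na\n' | df_filter 2 85` prints only b: the term a occurs in every document (above 85 percent), while c and d occur in fewer than 2 documents.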
K-Means Clustering
bin/mahout kmeans -i /home/SequenceFiles-sparse/tfidf-vectors/ -c /home/kmeans-clusters -o /home/kmeans -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 -k 10 -ow --clustering
ClusterDumper
bin/mahout clusterdump -s hdfs://<<host name>>:9000/home/kmeans/clusters-2-final/ -d hdfs://<<host name>>:9000/home/SequenceFiles-sparse/dictionary.file-0 -dt sequencefile -b 100 -n 100
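The first three steps above can be chained in one script so that a failing step stops the run. This is only a sketch: the flags are copied from the commands above, while run_step, run_pipeline, DRY_RUN and the variable names are made up here. By default it only prints the commands. The clusterdump step is left out because its host name and final clusters directory depend on your cluster and on the run.

```shell
#!/bin/sh
# Sketch: chain seqdirectory, seq2sparse and kmeans, stopping at the first failure.
# All names below (MAHOUT, DRY_RUN, run_step, run_pipeline) are hypothetical.

MAHOUT="${MAHOUT:-bin/mahout}"        # run from inside the Mahout directory
INPUT=/home/Textfiles/
SEQ=/home/SequenceFiles/
SPARSE=/home/SequenceFiles-sparse
DRY_RUN="${DRY_RUN:-1}"               # 1 = print the commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$@"
  else
    "$@"
  fi
}

# The && chain makes a failed step skip everything after it.
run_pipeline() {
  run_step "$MAHOUT" seqdirectory -i "$INPUT" -o "$SEQ" -c UTF-8 -chunk 64 &&
  run_step "$MAHOUT" seq2sparse -i "$SEQ" -o "$SPARSE" --maxDFPercent 85 --namedVector --minDF 15 &&
  run_step "$MAHOUT" kmeans -i "$SPARSE/tfidf-vectors/" -c /home/kmeans-clusters -o /home/kmeans \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 -k 10 -ow --clustering
}

run_pipeline
```

Run it once as is to review the printed commands, then set DRY_RUN=0 to execute them.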
Basics of Mahout
Mahout is a library of machine learning algorithms that address three major problems:
- Recommendation
- Clustering
- Classification
Recommendation suggests items that match the taste of the user. It is based on the user's activity history. In general there are three types of recommendation:
- User-Based
- Item-Based
- Content-Based
Let's take a real-world example: buying books on Amazon. When a user buys a book, Amazon recommends more items that are similar to the user's taste.
Item-Based Recommendation:
A real-world example is Facebook recommending friends to you. If you look closely, the people it recommends are usually somewhat known to you.
Clustering:
Clustering is the process of grouping text documents into groups of topically related documents. Clustering is done on the TF-IDF weights of the terms in each document. The available algorithms include:
- K-Means
- Mean Shift
- Fuzzy K-Means
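For reference, here is the textbook form of the TF-IDF weight, next to the cosine distance that the kmeans command in the clustering post selects with CosineDistanceMeasure. The symbols are assumptions for this sketch: tf(t, d) is the count of term t in document d, df(t) is the number of documents containing t, N is the total number of documents, and a and b are two document vectors. Mahout's own weighting may differ in scaling and smoothing details.

```latex
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}
\qquad
d_{\cos}(a, b) = 1 - \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}
```

A term that appears in nearly every document gets a weight close to zero, so it contributes little to the distance between two documents.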
Classification:
Classification learns from existing categorized documents what documents of a specific category look like, and can then assign unlabelled documents to the (hopefully) correct category.
All of the methods above are readily available. Our main work is to prepare the dataset properly so that they can produce good results.
Mahout Installation Guide
Installing Mahout is easy once Hadoop is working properly.
Step 1: Check that Hadoop is working properly.
Step 2: Check that JAVA_HOME is set properly (echo $JAVA_HOME).
Step 3: Check that HADOOP_HOME and HADOOP_CONF_DIR are set.
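Steps 2 and 3 can be checked in one go with a small script. This is a sketch, not part of Mahout: check_env is a made-up helper that reports each variable and returns a non-zero status if any of them is unset or empty.

```shell
#!/bin/sh
# Hypothetical helper: report whether each named environment variable is set.
# Returns 0 if all are set and non-empty, 1 otherwise.
check_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"          # look up the variable by its name
    if [ -z "$value" ]; then
      echo "$name is not set"
      missing=1
    else
      echo "$name=$value"
    fi
  done
  return "$missing"
}

check_env JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR ||
  echo "Set the variables listed above before installing Mahout."
```

For step 1, running `hadoop version` or `hadoop fs -ls /` is a quick way to see whether Hadoop responds at all.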