
Eq5 - Dockerfile + README - Add basic support for MapReduce #6 #28

Status: Open — wants to merge 5 commits into base `master`
3 changes: 2 additions & 1 deletion 2.6.0/Dockerfile
@@ -70,6 +70,7 @@ ADD hdfs-site.xml $HADOOP_CONF_DIR/hdfs-site.xml
# 50475 = dfs.datanode.https.address (HTTPS / Secure UI)
# HDFS: Secondary NameNode (SNN)
# 50090 = dfs.secondary.http.address (HTTP / Checkpoint for NameNode metadata)
-EXPOSE 9000 50070 50010 50020 50075 50090
+# 50030 = mapred.job.tracker.http.address (HTTP / JobTracker web UI)
+EXPOSE 9000 50070 50010 50020 50075 50090 50030 9001

CMD ["hdfs"]
58 changes: 58 additions & 0 deletions README.md
@@ -93,5 +93,63 @@ Each component provides its own web UI. Open your browser at one of the URLs below
| HDFS NameNode | [http://dockerhost:50070](http://dockerhost:50070) |
| HDFS DataNode | [http://dockerhost:50075](http://dockerhost:50075) |
| HDFS Secondary NameNode | [http://dockerhost:50090](http://dockerhost:50090) |
## Running a MapReduce example

The general workflow for MapReduce is `Input -> Map -> Reduce -> Output`.

Below are steps to implement the workflow.
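As an aside, the same `Input -> Map -> Reduce -> Output` flow can be sketched with a classic Unix pipeline: `tr` plays the map step (emitting one word per line), `sort` groups the keys, and `uniq -c` reduces by counting. The sample words here are illustrative only.

```shell
# map: split the input into one word per line
# shuffle/sort: bring identical keys together
# reduce: count occurrences of each key
echo "hello world hello docker" | tr ' ' '\n' | sort | uniq -c
```

This is not how Hadoop runs the job, of course, but it shows the same three-stage shape the example below implements at cluster scale.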

### 1) Configuration

Amend `mapred-site.xml` with the following changes:

- Set the property `mapred.job.tracker` to `hdfs-namenode:9001`.
- Remove the property `mapreduce.framework.name`.
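A sketch of the resulting `mapred-site.xml` fragment, assuming the `hdfs-namenode` hostname used elsewhere in this README (any other existing properties in the file are omitted here):

```xml
<configuration>
  <!-- Point MapReduce at the JobTracker (classic MRv1 framework) -->
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs-namenode:9001</value>
  </property>
  <!-- mapreduce.framework.name has been removed, per the step above -->
</configuration>
```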


### 2) Input data

### 2.1) Create a directory for the input file in HDFS

    hadoop fs -mkdir /WordCount
    hadoop fs -mkdir /WordCount/Input

### 2.2) Prepare the input file

    mkdir ~/hdp-ex/
    cd ~/hdp-ex/
    touch in.txt

In this example we are using the following words:

    hello world hello docker hello hadoop hello mapreduce h
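Rather than opening an editor, the input file can also be written in one step; a minimal sketch, assuming the `~/hdp-ex/in.txt` path used in this walkthrough:

```shell
# create the working directory and write the sample words to the input file
mkdir -p ~/hdp-ex
echo "hello world hello docker hello hadoop hello mapreduce h" > ~/hdp-ex/in.txt
cat ~/hdp-ex/in.txt
```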

### 2.3) Copy the input file to HDFS for processing by MapReduce

    hadoop fs -copyFromLocal ~/hdp-ex/in.txt hdfs://hdfs-namenode:9000/WordCount/Input

### 3) Run the MapReduce word-count job

    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /WordCount/Input/in.txt /WordCount/Output/

### 4) Check the output

    hadoop fs -ls /WordCount/Output/

    Found 2 items
    -rw-r--r--   2 root supergroup          0 2015-09-27 21:00 /WordCount/Output/_SUCCESS
    -rw-r--r--   2 root supergroup         50 2015-09-27 21:00 /WordCount/Output/part-r-00000

Read the output file:

    hadoop fs -cat /WordCount/Output/part-r-00000

    docker 1
    h 1
    hadoop 1
    hello 4
    mapreduce 1
    world 1