An example Hadoop Install

A practical example of how Hadoop can be installed and a cluster created using low-cost hardware.

1
Apache Hadoop Install Example
  • Using Ubuntu 12.04
  • Java 1.6
  • Hadoop 1.2.0
  • Static DNS
  • 3 Machine Cluster

2
Install Step 1
  • Install Ubuntu Linux 12.04 on each machine
  • Assign a host name and static IP address to each
    machine
  • Names used here
  • hc1nn ( hadoop cluster 1 name node )
  • hc1r1m1 ( hadoop cluster 1 rack 1 machine 1 )
  • hc1r1m2 ( hadoop cluster 1 rack 1 machine 2 )
  • Install the ssh daemon on each server
  • Install the vsftpd ( ftp ) daemon on each server
  • Update /etc/hosts with all hostnames on each
    server ( see the sketch below )
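
A minimal /etc/hosts sketch; the 192.168.1.x addresses are hypothetical,
substitute the static IPs you assigned:

    192.168.1.10   hc1nn      # hadoop cluster 1 name node
    192.168.1.11   hc1r1m1    # rack 1 machine 1
    192.168.1.12   hc1r1m2    # rack 1 machine 2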

3
Install Step 2
  • Generate ssh keys for each server under the
    hadoop user
  • Copy the keys to each server's hadoop account
    ( see the sketch below )
  • Install java 1.6 ( we used openjdk )
  • Obtain the Hadoop software from hadoop.apache.org
  • Unpack the Hadoop software to /usr/local
  • Now consider the cluster architecture
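
A minimal sketch of the key setup, run as the hadoop user on each server
( assumes ssh-copy-id is available, as it is on Ubuntu 12.04 ):

    ssh-keygen -t rsa -P ""        # generate a passwordless key pair
    ssh-copy-id hadoop@hc1nn       # append the public key to the hadoop
    ssh-copy-id hadoop@hc1r1m1     # account's authorized_keys on every server
    ssh-copy-id hadoop@hc1r1m2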

4
Install Step 3
  • Start with three single-machine Hadoop installs
  • Then cluster the Hadoop machines

5
Install Step 4
  • Ensure automatic ( passwordless ) ssh works
  • From the name node (hc1nn) to both data nodes
  • From each machine to itself
  • Create a symbolic link
  • Named hadoop
  • Pointing to /usr/local/hadoop-1.2.0
  • In the hadoop user's .bashrc on each machine, set
  • HADOOP_HOME
  • JAVA_HOME
  • ( see the sketch below )
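
One possible setup; the JAVA_HOME path assumes the openjdk-6 package on
64-bit Ubuntu 12.04 and may differ on your machines:

    sudo ln -s /usr/local/hadoop-1.2.0 /usr/local/hadoop   # symbolic link named hadoop

    # in the hadoop user's ~/.bashrc on each machine
    export HADOOP_HOME=/usr/local/hadoop
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64     # assumed openjdk path
    export PATH=$PATH:$HADOOP_HOME/bin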

6
Install Step 5
  • Create Hadoop tmp dir on all servers
  • sudo mkdir -p /app/hadoop/tmp
  • sudo chown hadoop:hadoop /app/hadoop/tmp
  • sudo chmod 750 /app/hadoop/tmp
  • Set Up conf/core-site.xml
  • ( on all servers )

7
Install Step 5
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

8
Install Step 6
  • Set Up conf/mapred-site.xml
  • ( on all servers )

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

9
Install Step 7
  • Set Up conf/hdfs-site.xml
  • ( on all servers )

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.</description>
</property>

10
Install Step 8
  • Format the Hadoop file system ( on all servers )
  • hadoop namenode -format
  • Don't do this on a running HDFS, you will lose
    all data !!
  • Now start Hadoop ( on all servers )
  • $HADOOP_HOME/bin/start-all.sh
  • Check Hadoop is running with
  • sudo netstat -plten | grep java
  • you should see ports like 54310 and 54311 being
    used ( see also the jps check below )
  • All good ? Stop Hadoop on all servers
  • $HADOOP_HOME/bin/stop-all.sh
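
As a further check ( assuming the full JDK is installed, so the jps tool
is present ), jps should list the Hadoop 1.x daemons:

    jps
    # expect: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker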

11
Install Step 9
  • Now set up the cluster, do this on all servers
  • Set the $HADOOP_HOME/conf/masters file to contain
  • hc1nn
  • Set the $HADOOP_HOME/conf/slaves file to contain
  • hc1r1m1
  • hc1r1m2
  • hc1nn
  • We will be using the name node as a data node as
    well ( see the sketch below )
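
A sketch of one way to write these files, run on each server ( assumes
HADOOP_HOME is set as in Step 4 ):

    echo "hc1nn" > $HADOOP_HOME/conf/masters
    printf "hc1r1m1\nhc1r1m2\nhc1nn\n" > $HADOOP_HOME/conf/slaves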

12
Install Step 10
  • On all machines
  • Change conf/core-site.xml
  • fs.default.name  hdfs://hc1nn:54310
  • Change conf/mapred-site.xml
  • mapred.job.tracker  hc1nn:54311
  • Change conf/hdfs-site.xml
  • dfs.replication  3
  • ( see the example snippet below )
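
For example, the fs.default.name entry in conf/core-site.xml now points at
the name node rather than localhost:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hc1nn:54310</value>
</property>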

13
Install Step 11
  • Now reformat the HDFS on hc1nn
  • hadoop namenode -format
  • On the name node start HDFS
  • $HADOOP_HOME/bin/start-dfs.sh
  • On the name node start Map Reduce
  • $HADOOP_HOME/bin/start-mapred.sh
  • ( a verification sketch follows below )
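
To verify the cluster came up, one check is the dfsadmin report, which
should list three live data nodes:

    hadoop dfsadmin -report
    # look for a line like: Datanodes available: 3 (3 total, 0 dead)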

14
Install Step 12
  • Run a test Map Reduce job
  • I have data in /tmp/gutenberg
  • Load Data into HDFS
  • hadoop dfs -copyFromLocal /tmp/gutenberg
    /usr/hadoop/gutenberg
  • List Data in HDFS
  • hadoop dfs -ls /usr/hadoop/gutenberg
  • Found 18 items
  • -rw-r--r--  3 hadoop supergroup  674389 2013-07-30 19:31 /usr/hadoop/gutenberg/pg20417.txt
  • -rw-r--r--  3 hadoop supergroup  674389 2013-07-30 19:31 /usr/hadoop/gutenberg/pg20417.txt1
  • ...............
  • -rw-r--r--  3 hadoop supergroup  834980 2013-07-30 19:31 /usr/hadoop/gutenberg/pg5000.txt4
  • -rw-r--r--  3 hadoop supergroup  834980 2013-07-30 19:31 /usr/hadoop/gutenberg/pg5000.txt5

15
Install Step 13
  • Run the Map Reduce job
  • cd $HADOOP_HOME
  • hadoop jar hadoop-examples-1.2.0.jar wordcount
    /usr/hadoop/gutenberg /usr/hadoop/gutenberg-output
  • Check the output
  • 13/07/30 19:34:13 INFO input.FileInputFormat: Total input paths to process : 18
  • 13/07/30 19:34:13 INFO util.NativeCodeLoader: Loaded the native-hadoop library
  • 13/07/30 19:34:14 INFO mapred.JobClient: Running job: job_201307301931_0001
  • 13/07/30 19:34:15 INFO mapred.JobClient:  map 0% reduce 0%
  • 13/07/30 19:34:26 INFO mapred.JobClient:  map 11% reduce 0%
  • 13/07/30 19:34:34 INFO mapred.JobClient:  map 16% reduce 0%
  • 13/07/30 19:34:35 INFO mapred.JobClient:  map 22% reduce 0%
  • 13/07/30 19:34:42 INFO mapred.JobClient:  map 33% reduce 0%
  • 13/07/30 19:34:43 INFO mapred.JobClient:  map 33% reduce 7%
  • 13/07/30 19:34:48 INFO mapred.JobClient:  map 44% reduce 7%
  • 13/07/30 19:34:52 INFO mapred.JobClient:  map 44% reduce 14%
  • 13/07/30 19:34:54 INFO mapred.JobClient:  map 55% reduce 14%
  • 13/07/30 19:35:01 INFO mapred.JobClient:  map 66% reduce 14%
  • 13/07/30 19:35:02 INFO mapred.JobClient:  map 66% reduce 18%

16
Install Step 13
  • 13/07/30 19:35:17 INFO mapred.JobClient:  map 88% reduce 29%
  • 13/07/30 19:35:18 INFO mapred.JobClient:  map 100% reduce 29%
  • 13/07/30 19:35:23 INFO mapred.JobClient:  map 100% reduce 33%
  • 13/07/30 19:35:27 INFO mapred.JobClient:  map 100% reduce 100%
  • 13/07/30 19:35:28 INFO mapred.JobClient: Job complete: job_201307301931_0001
  • 13/07/30 19:35:28 INFO mapred.JobClient: Counters: 29
  • 13/07/30 19:35:28 INFO mapred.JobClient:   Job Counters
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Launched reduce tasks=1
  • 13/07/30 19:35:28 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=119572
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Launched map tasks=18
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Data-local map tasks=18
  • 13/07/30 19:35:28 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=61226
  • 13/07/30 19:35:28 INFO mapred.JobClient:   File Output Format Counters
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Bytes Written=725257
  • 13/07/30 19:35:28 INFO mapred.JobClient:   FileSystemCounters
  • 13/07/30 19:35:28 INFO mapred.JobClient:     FILE_BYTES_READ=6977160
  • 13/07/30 19:35:28 INFO mapred.JobClient:     HDFS_BYTES_READ=17600721

17
Install Step 14
  • Check the job output
  • hadoop dfs -ls /usr/hadoop/gutenberg-output
  • Found 3 items
  • -rw-r--r--  3 hadoop supergroup       0 2013-07-30 19:35 /usr/hadoop/gutenberg-output/_SUCCESS
  • drwxr-xr-x  - hadoop supergroup       0 2013-07-30 19:34 /usr/hadoop/gutenberg-output/_logs
  • -rw-r--r--  3 hadoop supergroup  725257 2013-07-30 19:35 /usr/hadoop/gutenberg-output/part-r-00000
  • Now get the results out of HDFS
  • hadoop dfs -cat /usr/hadoop/gutenberg-output/part-r-00000 > /tmp/hrun/cluster_run.txt
  • head -10 /tmp/hrun/cluster_run.txt
  • "(Lo)cra" 6
  • "1490 6
  • "1498," 6
  • "35" 6
  • "40," 6
  • "A 12
  • "AS-IS". 6

18
Install Step 15
  • Congratulations, you now have
  • A working HDFS cluster
  • With three data nodes
  • One name node
  • Tested via a Map Reduce job
  • Detailed install instructions available from our
    site shop

19
Contact Us
  • Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You can just pay for the hours that you need to
    solve your problems