An example Hadoop Install

A practical example of how Hadoop can be installed and a cluster created using low-cost hardware.

1
Apache Hadoop Install Example
  • Using Ubuntu 12.04
  • Java 1.6
  • Hadoop 1.2.0
  • Static DNS
  • 3 Machine Cluster

2
Install Step 1
  • Install Ubuntu Linux 12.04 on each machine
  • Assign a host name and static IP address to each
    machine
  • Names used here
  • hc1nn ( hadoop cluster 1 name node )
  • hc1r1m1 ( hadoop cluster 1 rack 1 machine 1 )
  • hc1r1m2 ( hadoop cluster 1 rack 1 machine 2 )
  • Install the ssh daemon on each server
  • Install the vsftpd ( ftp ) daemon on each server
  • Update /etc/hosts with all hostnames on each
    server ( see the sketch below )
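
A minimal /etc/hosts sketch; the 192.168.1.x addresses are hypothetical,
substitute the static IPs you assigned:

    192.168.1.10   hc1nn      # hadoop cluster 1 name node
    192.168.1.11   hc1r1m1    # rack 1 machine 1
    192.168.1.12   hc1r1m2    # rack 1 machine 2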

3
Install Step 2
  • Generate ssh keys for each server under the
    hadoop user
  • Copy the keys to each server's hadoop account
    ( see the sketch below )
  • Install java 1.6 ( we used openjdk )
  • Obtain the Hadoop software from hadoop.apache.org
  • Unpack the Hadoop software to /usr/local
  • Now consider the cluster architecture
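
A minimal sketch of the key setup, run as the hadoop user on each server
( assumes ssh-copy-id is available, as it is on Ubuntu 12.04 ):

    ssh-keygen -t rsa -P ""        # generate a passwordless key pair
    ssh-copy-id hadoop@hc1nn       # append the public key to the hadoop
    ssh-copy-id hadoop@hc1r1m1     # account's authorized_keys on every server
    ssh-copy-id hadoop@hc1r1m2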

4
Install Step 3
  • Start with three single-machine Hadoop installs
  • Then cluster the Hadoop machines

5
Install Step 4
  • Ensure automatic ( passwordless ) ssh works
  • From the name node (hc1nn) to both data nodes
  • From each machine to itself
  • Create a symbolic link
  • Named hadoop
  • Pointing to /usr/local/hadoop-1.2.0
  • In the hadoop user's .bashrc on each machine, set
  • HADOOP_HOME
  • JAVA_HOME
  • ( see the sketch below )
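
One possible setup; the JAVA_HOME path assumes the openjdk-6 package on
64-bit Ubuntu 12.04 and may differ on your machines:

    sudo ln -s /usr/local/hadoop-1.2.0 /usr/local/hadoop   # symbolic link named hadoop

    # in the hadoop user's ~/.bashrc on each machine
    export HADOOP_HOME=/usr/local/hadoop
    export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-amd64     # assumed openjdk path
    export PATH=$PATH:$HADOOP_HOME/bin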

6
Install Step 5
  • Create Hadoop tmp dir on all servers
  • sudo mkdir -p /app/hadoop/tmp
  • sudo chown hadoop:hadoop /app/hadoop/tmp
  • sudo chmod 750 /app/hadoop/tmp
  • Set Up conf/core-site.xml
  • ( on all servers )

7
Install Step 5
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

8
Install Step 6
  • Set Up conf/mapred-site.xml
  • ( on all servers )

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>

9
Install Step 7
  • Set Up conf/hdfs-site.xml
  • ( on all servers )

<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified at create time.</description>
</property>

10
Install Step 8
  • Format the Hadoop file system ( on all servers )
  • hadoop namenode -format
  • Don't do this on a running HDFS, you will lose
    all data !!
  • Now start Hadoop ( on all servers )
  • $HADOOP_HOME/bin/start-all.sh
  • Check Hadoop is running with
  • sudo netstat -plten | grep java
  • you should see ports like 54310 and 54311 being
    used ( see also the jps check below )
  • All good ? Stop Hadoop on all servers
  • $HADOOP_HOME/bin/stop-all.sh
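
As a further check ( assuming the full JDK is installed, so the jps tool
is present ), jps should list the Hadoop 1.x daemons:

    jps
    # expect: NameNode, DataNode, SecondaryNameNode, JobTracker, TaskTracker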

11
Install Step 9
  • Now set up the cluster, do this on all servers
  • Set the $HADOOP_HOME/conf/masters file to contain
  • hc1nn
  • Set the $HADOOP_HOME/conf/slaves file to contain
  • hc1r1m1
  • hc1r1m2
  • hc1nn
  • We will be using the name node as a data node as
    well ( see the sketch below )
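
A sketch of one way to write these files, run on each server ( assumes
HADOOP_HOME is set as in Step 4 ):

    echo "hc1nn" > $HADOOP_HOME/conf/masters
    printf "hc1r1m1\nhc1r1m2\nhc1nn\n" > $HADOOP_HOME/conf/slaves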

12
Install Step 10
  • On all machines
  • Change conf/core-site.xml
  • fs.default.name  hdfs://hc1nn:54310
  • Change conf/mapred-site.xml
  • mapred.job.tracker  hc1nn:54311
  • Change conf/hdfs-site.xml
  • dfs.replication  3
  • ( see the example snippet below )
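
For example, the fs.default.name entry in conf/core-site.xml now points at
the name node rather than localhost:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hc1nn:54310</value>
</property>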

13
Install Step 11
  • Now reformat the HDFS on hc1nn
  • hadoop namenode -format
  • On the name node start HDFS
  • $HADOOP_HOME/bin/start-dfs.sh
  • On the name node start Map Reduce
  • $HADOOP_HOME/bin/start-mapred.sh
  • ( a verification sketch follows below )
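
To verify the cluster came up, one check is the dfsadmin report, which
should list three live data nodes:

    hadoop dfsadmin -report
    # look for a line like: Datanodes available: 3 (3 total, 0 dead)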

14
Install Step 12
  • Run a test Map Reduce job
  • I have data in /tmp/gutenberg
  • Load Data into HDFS
  • hadoop dfs -copyFromLocal /tmp/gutenberg
    /usr/hadoop/gutenberg
  • List Data in HDFS
  • hadoop dfs -ls /usr/hadoop/gutenberg
  • Found 18 items
  • -rw-r--r--  3 hadoop supergroup  674389 2013-07-30 19:31 /usr/hadoop/gutenberg/pg20417.txt
  • -rw-r--r--  3 hadoop supergroup  674389 2013-07-30 19:31 /usr/hadoop/gutenberg/pg20417.txt1
  • ...............
  • -rw-r--r--  3 hadoop supergroup  834980 2013-07-30 19:31 /usr/hadoop/gutenberg/pg5000.txt4
  • -rw-r--r--  3 hadoop supergroup  834980 2013-07-30 19:31 /usr/hadoop/gutenberg/pg5000.txt5

15
Install Step 13
  • Run the Map Reduce job
  • cd $HADOOP_HOME
  • hadoop jar hadoop-examples-1.2.0.jar wordcount
    /usr/hadoop/gutenberg /usr/hadoop/gutenberg-output
  • Check the output
  • 13/07/30 19:34:13 INFO input.FileInputFormat: Total input paths to process : 18
  • 13/07/30 19:34:13 INFO util.NativeCodeLoader: Loaded the native-hadoop library
  • 13/07/30 19:34:14 INFO mapred.JobClient: Running job: job_201307301931_0001
  • 13/07/30 19:34:15 INFO mapred.JobClient:  map 0% reduce 0%
  • 13/07/30 19:34:26 INFO mapred.JobClient:  map 11% reduce 0%
  • 13/07/30 19:34:34 INFO mapred.JobClient:  map 16% reduce 0%
  • 13/07/30 19:34:35 INFO mapred.JobClient:  map 22% reduce 0%
  • 13/07/30 19:34:42 INFO mapred.JobClient:  map 33% reduce 0%
  • 13/07/30 19:34:43 INFO mapred.JobClient:  map 33% reduce 7%
  • 13/07/30 19:34:48 INFO mapred.JobClient:  map 44% reduce 7%
  • 13/07/30 19:34:52 INFO mapred.JobClient:  map 44% reduce 14%
  • 13/07/30 19:34:54 INFO mapred.JobClient:  map 55% reduce 14%
  • 13/07/30 19:35:01 INFO mapred.JobClient:  map 66% reduce 14%
  • 13/07/30 19:35:02 INFO mapred.JobClient:  map 66% reduce 18%

16
Install Step 13
  • 13/07/30 19:35:17 INFO mapred.JobClient:  map 88% reduce 29%
  • 13/07/30 19:35:18 INFO mapred.JobClient:  map 100% reduce 29%
  • 13/07/30 19:35:23 INFO mapred.JobClient:  map 100% reduce 33%
  • 13/07/30 19:35:27 INFO mapred.JobClient:  map 100% reduce 100%
  • 13/07/30 19:35:28 INFO mapred.JobClient: Job complete: job_201307301931_0001
  • 13/07/30 19:35:28 INFO mapred.JobClient: Counters: 29
  • 13/07/30 19:35:28 INFO mapred.JobClient:   Job Counters
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Launched reduce tasks=1
  • 13/07/30 19:35:28 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=119572
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Launched map tasks=18
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Data-local map tasks=18
  • 13/07/30 19:35:28 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=61226
  • 13/07/30 19:35:28 INFO mapred.JobClient:   File Output Format Counters
  • 13/07/30 19:35:28 INFO mapred.JobClient:     Bytes Written=725257
  • 13/07/30 19:35:28 INFO mapred.JobClient:   FileSystemCounters
  • 13/07/30 19:35:28 INFO mapred.JobClient:     FILE_BYTES_READ=6977160
  • 13/07/30 19:35:28 INFO mapred.JobClient:     HDFS_BYTES_READ=17600721

17
Install Step 14
  • Check the job output
  • hadoop dfs -ls /usr/hadoop/gutenberg-output
  • Found 3 items
  • -rw-r--r--  3 hadoop supergroup       0 2013-07-30 19:35 /usr/hadoop/gutenberg-output/_SUCCESS
  • drwxr-xr-x  - hadoop supergroup       0 2013-07-30 19:34 /usr/hadoop/gutenberg-output/_logs
  • -rw-r--r--  3 hadoop supergroup  725257 2013-07-30 19:35 /usr/hadoop/gutenberg-output/part-r-00000
  • Now get the results out of HDFS
  • hadoop dfs -cat /usr/hadoop/gutenberg-output/part-r-00000 > /tmp/hrun/cluster_run.txt
  • head -10 /tmp/hrun/cluster_run.txt
  • "(Lo)cra" 6
  • "1490 6
  • "1498," 6
  • "35" 6
  • "40," 6
  • "A 12
  • "AS-IS". 6

18
Install Step 15
  • Congratulations, you now have
  • A working HDFS cluster
  • With three data nodes
  • One name node
  • Tested via a Map Reduce job
  • Detailed install instructions available from our
    site shop

19
Contact Us
  • Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You can just pay for the hours that you need to
    solve your problems