Title: Big Data and Hadoop Components
1. Hadoop Components Architecture - Big Data Hadoop Training
2. Understand how the Hadoop ecosystem works to master Apache Hadoop skills and gain in-depth knowledge of the big data ecosystem and Hadoop architecture. Before you enroll for any big data Hadoop training course, it is necessary to get a basic idea of how the Hadoop ecosystem works. This presentation covers the various Hadoop components that constitute the Apache Hadoop architecture.
3. Defining Architecture Components of the Big Data Ecosystem
5. Core Hadoop Components
- 1) Hadoop Common
- 2) Hadoop Distributed File System (HDFS)
- 3) MapReduce - Distributed Data Processing Framework of Apache Hadoop
- 4) YARN
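To make the MapReduce component concrete, here is a minimal single-process sketch of its map, shuffle and reduce phases, using the classic word-count task. Real Hadoop distributes each phase across the cluster; the function names here are illustrative, not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) pairs, like a Hadoop Mapper."""
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group intermediate pairs by key, like the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts for one key, like a Hadoop Reducer."""
    return key, sum(values)

lines = ["hadoop stores big data", "hadoop processes big data"]
pairs = [p for line in lines for p in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```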
- Read More in Detail about Hadoop Components - https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
6. Data Access Components of Hadoop Ecosystem - Pig and Hive
7. Apache Pig
- Apache Pig is a convenient tool developed by Yahoo for analysing huge data sets efficiently and easily. It provides a high-level data flow language, Pig Latin, that is optimized, extensible and easy to use.
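A typical Pig Latin script loads records, groups them, and aggregates each group. As a rough illustration (the Pig Latin in the comments shows the usual LOAD/GROUP/FOREACH pattern; the file name, field names and data are made up), the same dataflow in plain Python looks like this:

```python
from collections import defaultdict

# Illustrative Pig Latin equivalent of the Python below:
#   users   = LOAD 'users.txt' AS (name, city);
#   by_city = GROUP users BY city;
#   counts  = FOREACH by_city GENERATE group, COUNT(users);
#   DUMP counts;

users = [("alice", "delhi"), ("bob", "pune"), ("carol", "delhi")]

by_city = defaultdict(list)          # GROUP users BY city
for name, city in users:
    by_city[city].append(name)

# FOREACH ... GENERATE group, COUNT(users)
counts = {city: len(names) for city, names in by_city.items()}
print(counts)  # {'delhi': 2, 'pune': 1}
```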
8. Apache Hive
- Hive, developed by Facebook, is a data warehouse built on top of Hadoop. It provides a simple language known as HiveQL, similar to SQL, for querying, data summarization and analysis. Hive makes querying faster through indexing.
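Because HiveQL is SQL-like, the kind of summarization query it expresses can be illustrated with any SQL engine. Here sqlite3 stands in for Hive purely for demonstration; the table and column names are made up for the example.

```python
import sqlite3

# sqlite3 as a stand-in for Hive, only to show the shape of a
# HiveQL-style summarization query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("u1", "/home"), ("u2", "/home"), ("u1", "/docs")])

# Count views per page - a typical data-summarization query:
rows = conn.execute(
    "SELECT url, COUNT(*) FROM page_views GROUP BY url ORDER BY url"
).fetchall()
print(rows)  # [('/docs', 1), ('/home', 2)]
```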
9. Data Integration Components of Hadoop Ecosystem - Sqoop and Flume
10. Apache Sqoop
- The Sqoop component is used for importing data from external sources into related Hadoop components like HDFS, HBase or Hive. It can also be used for exporting data from Hadoop to other external structured data stores.
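Sqoop itself is a command-line tool run against a real RDBMS and HDFS, but the core import idea can be sketched in a few lines: read rows from a relational table and write them out as delimited text records, the default file format Sqoop produces in HDFS. Below, sqlite3 stands in for the external database and an in-memory buffer stands in for an HDFS file; table and column names are made up.

```python
import io
import sqlite3

# Stand-in for the external relational source:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "asha"), (2, "ravi")])

hdfs_file = io.StringIO()            # stand-in for an HDFS output file
for row in db.execute("SELECT id, name FROM customers ORDER BY id"):
    hdfs_file.write(",".join(map(str, row)) + "\n")  # comma-delimited record

records = hdfs_file.getvalue()
print(records)  # "1,asha\n2,ravi\n"
```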
11. Flume
- The Flume component is used to gather and aggregate large amounts of data. Apache Flume is used for collecting data from its origin and sending it to its resting location (HDFS).
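Flume moves events from a source, through a buffering channel, to a sink. The following is a single-process sketch of that source/channel/sink pipeline; the event data and component names are illustrative, not Flume APIs.

```python
from collections import deque

def source():
    """Stand-in for a Flume source (e.g. tailing a web-server log)."""
    yield from ["GET /index", "GET /about", "POST /login"]

channel = deque()                    # stand-in for a Flume channel
for event in source():
    channel.append(event)            # the source puts events on the channel

sink_store = []                      # stand-in for the HDFS sink
while channel:
    sink_store.append(channel.popleft())  # the sink drains the channel

print(sink_store)  # ['GET /index', 'GET /about', 'POST /login']
```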
12. Data Storage Component of Hadoop Ecosystem - HBase
13. HBase
- HBase is a column-oriented database that uses HDFS for underlying storage of data. HBase supports random reads and also batch computations using MapReduce. With the HBase NoSQL database, enterprises can create large tables with millions of rows and columns on commodity hardware.
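HBase cells are addressed by a row key plus a column family and qualifier, and rows need not share the same columns. A nested dict models that layout well enough to show what a random read by row key looks like; the table and column names here are made up.

```python
# Toy model of an HBase table: row key -> {"family:qualifier": value}.
table = {
    "row1": {"info:name": "asha", "info:city": "delhi"},
    "row2": {"info:name": "ravi"},
}

# Random read by row key, like an HBase Get operation:
value = table["row1"]["info:name"]
print(value)  # asha

# Columns are sparse: row2 simply has no "info:city" cell stored.
print("info:city" in table["row2"])  # False
```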
14. Monitoring, Management and Orchestration Components of Hadoop Ecosystem - Oozie and Zookeeper
15. Oozie
- Oozie is a workflow scheduler in which workflows are expressed as Directed Acyclic Graphs. Oozie runs in a Java servlet container (Tomcat) and uses a database to store all running workflow instances, their states and variables, along with the workflow definitions, to manage Hadoop jobs (MapReduce, Sqoop, Pig and Hive).
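The Directed Acyclic Graph idea can be sketched directly: each workflow action runs only after all of its predecessors have finished. The action names below are hypothetical, chosen to echo the job types Oozie manages.

```python
# Toy workflow DAG: action -> list of actions it depends on.
workflow = {
    "import-sqoop": [],
    "clean-pig": ["import-sqoop"],
    "load-hive": ["clean-pig"],
}

done, order = set(), []
while len(done) < len(workflow):
    for action, deps in workflow.items():
        # Run an action once every dependency has completed.
        if action not in done and all(d in done for d in deps):
            order.append(action)     # "run" the action
            done.add(action)

print(order)  # ['import-sqoop', 'clean-pig', 'load-hive']
```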
16. Zookeeper
- Zookeeper is the king of coordination and provides simple, fast, reliable and ordered operational services for a Hadoop cluster. Zookeeper is responsible for the synchronization service, the distributed configuration service and for providing a naming registry for distributed systems.
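ZooKeeper keeps a tree of nodes ("znodes") that distributed processes use for naming, shared configuration and coordination. A flat dict keyed by path is a toy stand-in for that registry; the paths and values below are illustrative only.

```python
registry = {}  # toy stand-in for the ZooKeeper znode tree

def create(path, data):
    """Like a ZooKeeper create: fails if the znode already exists,
    which is the property coordination recipes rely on."""
    if path in registry:
        raise ValueError(f"node exists: {path}")
    registry[path] = data

create("/services/hbase/master", "host1:16000")   # naming-registry entry
create("/config/replication", "3")                # shared configuration

# Any process can look up the current master by its well-known path:
leader = registry["/services/hbase/master"]
print(leader)  # host1:16000
```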
- To Know about other Hadoop Components - https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114