Title: Big Data and Hadoop Components
1. Hadoop Components Architecture - Big Data Hadoop Training
2. Understand how the Hadoop ecosystem works to master Apache Hadoop skills and gain in-depth knowledge of the big data ecosystem and Hadoop architecture. Before you enroll for any big data Hadoop training course, it is necessary to get a basic idea of how the Hadoop ecosystem works. This presentation covers the various Hadoop components that constitute the Apache Hadoop architecture.
3. Defining Architecture Components of the Big Data Ecosystem
5. Core Hadoop Components
- 1) Hadoop Common
- 2) Hadoop Distributed File System (HDFS)
- 3) MapReduce - Distributed Data Processing Framework of Apache Hadoop
- 4) YARN
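To make the MapReduce component concrete, here is a minimal single-process sketch of its map, shuffle and reduce phases, using the classic word-count task. Real Hadoop distributes each phase across the cluster; the function names here are illustrative, not Hadoop APIs.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) pairs, like a Hadoop Mapper."""
    for word in line.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Group intermediate pairs by key, like the shuffle/sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts for one key, like a Hadoop Reducer."""
    return key, sum(values)

lines = ["hadoop stores big data", "hadoop processes big data"]
pairs = [p for line in lines for p in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)  # {'hadoop': 2, 'stores': 1, 'big': 2, 'data': 2, 'processes': 1}
```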
- Read More in Detail about Hadoop Components - https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114
6. Data Access Components of Hadoop Ecosystem - Pig and Hive
7. Apache Pig
- Apache Pig is a convenient tool developed by Yahoo for analysing huge data sets efficiently and easily. It provides a high-level data flow language, Pig Latin, that is optimized, extensible and easy to use.
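A typical Pig Latin script loads records, groups them, and aggregates each group. As a rough illustration (the Pig Latin in the comments shows the usual LOAD/GROUP/FOREACH pattern; the file name, field names and data are made up), the same dataflow in plain Python looks like this:

```python
from collections import defaultdict

# Illustrative Pig Latin equivalent of the Python below:
#   users   = LOAD 'users.txt' AS (name, city);
#   by_city = GROUP users BY city;
#   counts  = FOREACH by_city GENERATE group, COUNT(users);
#   DUMP counts;

users = [("alice", "delhi"), ("bob", "pune"), ("carol", "delhi")]

by_city = defaultdict(list)          # GROUP users BY city
for name, city in users:
    by_city[city].append(name)

# FOREACH ... GENERATE group, COUNT(users)
counts = {city: len(names) for city, names in by_city.items()}
print(counts)  # {'delhi': 2, 'pune': 1}
```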
8. Apache Hive
- Hive, developed by Facebook, is a data warehouse built on top of Hadoop. It provides a simple language known as HiveQL, similar to SQL, for querying, data summarization and analysis. Hive makes querying faster through indexing.
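Because HiveQL is SQL-like, the kind of summarization query it expresses can be illustrated with any SQL engine. Here sqlite3 stands in for Hive purely for demonstration; the table and column names are made up for the example.

```python
import sqlite3

# sqlite3 as a stand-in for Hive, only to show the shape of a
# HiveQL-style summarization query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")
conn.executemany("INSERT INTO page_views VALUES (?, ?)",
                 [("u1", "/home"), ("u2", "/home"), ("u1", "/docs")])

# Count views per page - a typical data-summarization query:
rows = conn.execute(
    "SELECT url, COUNT(*) FROM page_views GROUP BY url ORDER BY url"
).fetchall()
print(rows)  # [('/docs', 1), ('/home', 2)]
```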
9. Data Integration Components of Hadoop Ecosystem - Sqoop and Flume
10. Apache Sqoop
- The Sqoop component is used for importing data from external sources into related Hadoop components like HDFS, HBase or Hive. It can also be used for exporting data from Hadoop to other external structured data stores.
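Sqoop itself is a command-line tool run against a real RDBMS and HDFS, but the core import idea can be sketched in a few lines: read rows from a relational table and write them out as delimited text records, the default file format Sqoop produces in HDFS. Below, sqlite3 stands in for the external database and an in-memory buffer stands in for an HDFS file; table and column names are made up.

```python
import io
import sqlite3

# Stand-in for the external relational source:
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [(1, "asha"), (2, "ravi")])

hdfs_file = io.StringIO()            # stand-in for an HDFS output file
for row in db.execute("SELECT id, name FROM customers ORDER BY id"):
    hdfs_file.write(",".join(map(str, row)) + "\n")  # comma-delimited record

records = hdfs_file.getvalue()
print(records)  # "1,asha\n2,ravi\n"
```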
11. Flume
- The Flume component is used to gather and aggregate large amounts of data. Apache Flume is used for collecting data from its origin and sending it to its resting location (HDFS).
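Flume moves events from a source, through a buffering channel, to a sink. The following is a single-process sketch of that source/channel/sink pipeline; the event data and component names are illustrative, not Flume APIs.

```python
from collections import deque

def source():
    """Stand-in for a Flume source (e.g. tailing a web-server log)."""
    yield from ["GET /index", "GET /about", "POST /login"]

channel = deque()                    # stand-in for a Flume channel
for event in source():
    channel.append(event)            # the source puts events on the channel

sink_store = []                      # stand-in for the HDFS sink
while channel:
    sink_store.append(channel.popleft())  # the sink drains the channel

print(sink_store)  # ['GET /index', 'GET /about', 'POST /login']
```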
12. Data Storage Component of Hadoop Ecosystem - HBase
13. HBase
- HBase is a column-oriented database that uses HDFS for underlying storage of data. HBase supports random reads and also batch computations using MapReduce. With the HBase NoSQL database, enterprises can create large tables with millions of rows and columns on commodity hardware.
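HBase cells are addressed by a row key plus a column family and qualifier, and rows need not share the same columns. A nested dict models that layout well enough to show what a random read by row key looks like; the table and column names here are made up.

```python
# Toy model of an HBase table: row key -> {"family:qualifier": value}.
table = {
    "row1": {"info:name": "asha", "info:city": "delhi"},
    "row2": {"info:name": "ravi"},
}

# Random read by row key, like an HBase Get operation:
value = table["row1"]["info:name"]
print(value)  # asha

# Columns are sparse: row2 simply has no "info:city" cell stored.
print("info:city" in table["row2"])  # False
```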
14. Monitoring, Management and Orchestration Components of Hadoop Ecosystem - Oozie and Zookeeper
15. Oozie
- Oozie is a workflow scheduler in which workflows are expressed as Directed Acyclic Graphs. Oozie runs in a Java servlet container (Tomcat) and uses a database to store all running workflow instances, their states and variables, along with the workflow definitions, to manage Hadoop jobs (MapReduce, Sqoop, Pig and Hive).
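The Directed Acyclic Graph idea can be sketched directly: each workflow action runs only after all of its predecessors have finished. The action names below are hypothetical, chosen to echo the job types Oozie manages.

```python
# Toy workflow DAG: action -> list of actions it depends on.
workflow = {
    "import-sqoop": [],
    "clean-pig": ["import-sqoop"],
    "load-hive": ["clean-pig"],
}

done, order = set(), []
while len(done) < len(workflow):
    for action, deps in workflow.items():
        # Run an action once every dependency has completed.
        if action not in done and all(d in done for d in deps):
            order.append(action)     # "run" the action
            done.add(action)

print(order)  # ['import-sqoop', 'clean-pig', 'load-hive']
```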
16. Zookeeper
- Zookeeper is the king of coordination and provides simple, fast, reliable and ordered operational services for a Hadoop cluster. Zookeeper is responsible for the synchronization service, the distributed configuration service and for providing a naming registry for distributed systems.
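ZooKeeper keeps a tree of nodes ("znodes") that distributed processes use for naming, shared configuration and coordination. A flat dict keyed by path is a toy stand-in for that registry; the paths and values below are illustrative only.

```python
registry = {}  # toy stand-in for the ZooKeeper znode tree

def create(path, data):
    """Like a ZooKeeper create: fails if the znode already exists,
    which is the property coordination recipes rely on."""
    if path in registry:
        raise ValueError(f"node exists: {path}")
    registry[path] = data

create("/services/hbase/master", "host1:16000")   # naming-registry entry
create("/config/replication", "3")                # shared configuration

# Any process can look up the current master by its well-known path:
leader = registry["/services/hbase/master"]
print(leader)  # host1:16000
```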
- To Know about other Hadoop Components - https://www.dezyre.com/article/hadoop-components-and-architecture-big-data-and-hadoop-training/114