Title: WHAT IS BIG DATA AND THE HADOOP ECOSYSTEM?
Data is a collection of facts and statistics gathered for analysis. The advent and popularity of the internet have led to the collection of enormous amounts of data every day. Much of this data is not organized: it is heavily unstructured and unformatted, and it keeps growing without regard to any schema or format. This data is now given the term Big Data. Big Data refers to the huge volumes of data received daily from websites, social media, and email, most of it complex and in unstructured formats. Big data is commonly described by five concepts: Volume (the quantity of data), Variety (the nature of the data), Velocity (the speed at which data is generated and processed), Veracity (the quality of the captured data), and Value (the usefulness that can be extracted from it). Big data grows rapidly through cheap and numerous sources such as mobile and IoT devices, aerial sensing devices, software logs, microphones, RFID (radio-frequency identification) readers, wireless sensor networks, and cameras. To extract value from all this information, businesses need an application that can analyze this data and convert it into a readable, understandable and structured batch of information.
Hadoop is a Java-based programming framework that processes large data sets in a distributed computing environment. Hadoop allows applications to run on systems made up of thousands of nodes handling thousands of terabytes of data. It supports a distributed file system that provides rapid data transfer rates and keeps the system running even when a node fails. This reduces the risk of catastrophic system failure, even if a large number of nodes stop working. Hadoop is based on Google's MapReduce model, which breaks an application down into a large number of smaller parts that can be processed in parallel across the cluster.
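To make this concrete, below is a minimal sketch of a MapReduce job driver in Java. It uses the TokenCounterMapper and IntSumReducer classes that ship with Hadoop to count words; the class name and the input/output paths taken from the command line are placeholders for illustration, not part of the original article.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCountDriver.class);
        // Map task: split each input line into (word, 1) pairs
        job.setMapperClass(TokenCounterMapper.class);
        // Reduce task: sum the counts emitted for each word
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Placeholder paths: args[0] is the HDFS input dir, args[1] the output dir
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Hadoop splits the input into chunks, runs the map task on many nodes in parallel, and then runs the reduce task to combine the partial results.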
The Hadoop ecosystem is built of the following components:

Hadoop Common: This contains the set of Java libraries and utilities that the other Hadoop modules use. These libraries provide OS-level abstraction and contain the essential Java files and scripts needed to start Hadoop.

Hadoop Distributed File System (HDFS): This file system stores data across the cluster and gives applications high-bandwidth access to it. The main components of HDFS are the NameNode, the DataNodes and the Secondary NameNode. The NameNode maintains the file system metadata and manages the blocks stored on the DataNodes. The DataNodes are responsible for serving read/write requests from clients. The Secondary NameNode performs periodic checkpoints of the metadata, which helps restart a NameNode in case of a failure.

Hadoop YARN: This is responsible for job scheduling and resource management across the cluster.
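As an illustration of how a client talks to HDFS, here is a minimal Java sketch that writes a file and reads it back through the FileSystem API. The NameNode address and the file path are placeholder values assumed for the example.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS points at the NameNode; this address is a placeholder
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020");
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/sample.txt");
        // Write: the NameNode allocates blocks, the DataNodes store the bytes
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
        }
        // Read: the client gets block locations from the NameNode,
        // then streams the data directly from the DataNodes
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }
        fs.close();
    }
}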
Hadoop MapReduce: This element of the ecosystem is used to process large datasets. MapReduce has two core functions: the map task and the reduce task. The map task converts the input data and divides it into parts, producing intermediate key/value pairs. The reduce task combines these intermediate results to form the output that is the solution to our problem.
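The following is a minimal word-count sketch showing what a map task and a reduce task look like in Hadoop's Java API; the class and variable names are illustrative only. The mapper splits each line into words and emits (word, 1) pairs, and the reducer sums the counts per word to produce the final output.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
    // Map task: split each input line into words and emit (word, 1)
    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce task: sum the counts for each word to form the final output
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}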
Hadoop has gained popularity because of its ability to store, analyze and access large amounts of data quickly and cost-effectively across clusters of commodity hardware. It would not be wrong to say that Apache Hadoop is really a collection of several components rather than a single product. Hadoop is strongly recommended because it is easy to integrate with other tools and components, and it is capable of storing, analyzing and accessing large chunks of data in a limited time. This makes Hadoop highly convenient for users.
Learn Big Data Hadoop by taking a course from a Big Data Hadoop institute in Delhi. Madrid Software offers a 3-month course on Big Data Hadoop.