Hadoop MapReduce vs. Spark: Which big data framework to choose


1
MapReduce vs. Spark

With so many big data frameworks on the market, choosing the
right one is a challenge. A classic comparison of each
platform's advantages and disadvantages is unlikely to help,
because businesses should evaluate every framework from the
perspective of their particular needs. First of all, what are
Spark and Hadoop MapReduce?

Spark

Spark is an open-source big data framework that provides a
faster, more general-purpose data processing engine. It was
originally designed for fast computation, and it covers a wide
range of workloads, for example batch, interactive, iterative,
and streaming.

Hadoop MapReduce

Hadoop MapReduce is an open-source framework for writing
applications that process structured and unstructured data
stored in HDFS. It is designed to process large volumes of
data on a cluster of commodity hardware.
2
MapReduce processes data in batch mode.

A quick glance at the market situation

Hadoop and Spark are both open-source projects of the Apache
Software Foundation, and both are major products in big data
analytics. Hadoop has led the big data market for more than
five years. According to recent market research, Hadoop's
installed base extends to 50,000 customers, while Spark has
only 10,000 installations. However, Spark's popularity has
jumped recently, overtaking Hadoop's in only one year. The
installation growth rate for 2016/2017 shows that the trend is
continuing: Spark outperforms Hadoop with 47% vs. 14%,
respectively.

The key difference between Hadoop MapReduce and Spark

To make the comparison fair, we contrast Spark with Hadoop
MapReduce, since both are responsible for data processing.
3
  • The key difference between them lies in the
    approach to processing: Spark does it in memory,
    while Hadoop MapReduce has to read from and write
    to disk.
  • Consequently, processing speed differs greatly:
    Spark may be up to 100 times faster. However, the
    volume of data processed also differs: Hadoop
    MapReduce can work with far larger data sets
    than Spark.
  • Now, let's look at the tasks each framework is
    good for.
  • Tasks Hadoop MapReduce is good for
  • Linear processing of huge datasets
  • Hadoop MapReduce allows parallel processing of
    large amounts of data.
  • It breaks a large chunk of data into smaller ones
    that are processed separately on different data
    nodes, and automatically gathers the results from
    those nodes to return a single result.
  • If the resulting dataset is larger than the
    available RAM, Hadoop MapReduce may outperform
    Spark.
  • Economical solution, if no immediate results are
    expected
  • A study considers MapReduce a good solution
    provided that processing speed is not critical.
  • For instance, if data processing can be done
    during night hours, it makes sense to consider
    Hadoop MapReduce.
  • Tasks Spark is good for
  • Fast data processing
  • Thanks to in-memory processing, Spark is faster
    than Hadoop MapReduce: up to 100 times for data
    in RAM and up to 10 times for data in storage.
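
The split, process, and gather flow described for Hadoop MapReduce above can be sketched in plain Python. This is only an illustration of the programming model on a single machine, not Hadoop's actual API: the function names (split_input, map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for this example, and on a real cluster the map step would run in parallel on separate data nodes.

```python
from collections import defaultdict
from itertools import chain


def split_input(lines, num_splits):
    """Break the input into chunks, one per (simulated) data node."""
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: group all emitted values by key across the chunks."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_chunks):
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, num_splits=3):
    chunks = split_input(lines, num_splits)
    # On a cluster, each chunk would be mapped on a different node.
    mapped = [map_phase(chunk) for chunk in chunks]
    return reduce_phase(shuffle(mapped))


# Hypothetical sample input, invented for this sketch.
lines = [
    "spark processes data in memory",
    "mapreduce reads data from disk",
    "spark and mapreduce process big data",
]
counts = word_count(lines)
print(counts["data"], counts["spark"], counts["mapreduce"])  # prints: 3 2 2
```

The number of splits does not change the final result, which is what lets the framework spread the work over as many nodes as are available.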

4
  • Iterative processing
  • If the task is to process the same data
    repeatedly, Spark beats Hadoop MapReduce.
  • Hadoop: Apache Hadoop offers batch processing.
    The Hadoop project has invested heavily in new
    algorithms and a component stack that improve
    access to large-scale batch processing.
  • Spark's Resilient Distributed Datasets (RDDs)
    enable multiple map operations in memory, while
    Hadoop MapReduce has to write interim results to
    disk.
  • Near real-time processing
  • If a business needs instant insights, it should
    choose Spark and its in-memory processing.
  • Spark: It processes real-time data, i.e. data
    coming from real-time event streams at a rate of
    millions of events per second, such as Twitter
    and Facebook data. Spark's strength is processing
    live streams efficiently.
  • Hadoop MapReduce: MapReduce falls short when it
    comes to real-time data processing, as it was
    designed to perform batch processing on vast
    amounts of data.
  • Graph processing
  • Spark's computational model is good for the
    iterative computations that are typical in graph
    processing. Apache Spark also includes GraphX, an
    API for graph computation.
  • Machine learning
  • Spark has MLlib, a built-in machine learning
    library, whereas Hadoop needs a third party to
    provide one.
  • MLlib offers a wide range of algorithms that also run in
    memory. But if
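
The point made on this slide about iterative processing can be illustrated with a small simulation in plain Python. It is a deliberately simplified model and not a benchmark: DiskDataset, refine, iterate_from_disk, and iterate_in_memory are hypothetical names invented for this sketch, and the only thing measured is how often the input is read. Real MapReduce jobs additionally write interim results to disk between steps, which this model leaves out.

```python
class DiskDataset:
    """Stand-in for data stored on disk; counts how often it is read."""

    def __init__(self, values):
        self._values = list(values)
        self.reads = 0

    def load(self):
        self.reads += 1  # each call models one full pass over the disk
        return list(self._values)


def refine(estimate, values):
    """One iteration: move the estimate halfway toward the data mean."""
    mean = sum(values) / len(values)
    return estimate + 0.5 * (mean - estimate)


def iterate_from_disk(dataset, iterations):
    """MapReduce-style: every iteration is a new job that re-reads its input."""
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, dataset.load())
    return estimate


def iterate_in_memory(dataset, iterations):
    """Spark-style: load once, keep the working set in memory, iterate on it."""
    cached = dataset.load()
    estimate = 0.0
    for _ in range(iterations):
        estimate = refine(estimate, cached)
    return estimate


disk = DiskDataset([2.0, 4.0, 6.0])
mem = DiskDataset([2.0, 4.0, 6.0])
from_disk = iterate_from_disk(disk, 10)
in_memory = iterate_in_memory(mem, 10)
print(disk.reads, mem.reads)  # prints: 10 1
```

Both variants compute the same estimate; the difference is that the disk-based loop pays for one read per iteration, so its cost grows with the number of passes over the data.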

5
  • In terms of ease of use, Spark is easier to use
    than Hadoop: it has user-friendly APIs for Scala
    (its native language), Java, and Python, as well
    as Spark SQL.
  • An interactive REPL (Read-Eval-Print Loop) lets
    Spark users get immediate feedback on commands.
  • Hadoop: In contrast, Hadoop is written in Java,
    is difficult to program, and requires
    abstractions.
  • Although Hadoop MapReduce has no interactive
    mode, tools like Pig and Hive make it easier for
    users to work with.
  • Security
  • Spark: Spark's security is still immature,
    offering only authentication via a shared secret
    (password authentication).
  • Hadoop MapReduce: Hadoop MapReduce has better
    security features than Spark. Hadoop supports
    Kerberos authentication, a good security feature
    but one that is difficult to manage.
  • Examples of practical applications
  • We examined several examples of practical
    applications and concluded that Spark is likely
    to perform far better than MapReduce in all the
    applications below, thanks to fast or even near
    real-time processing. Let's look at the
    examples.
  • Customer segmentation
  • Analyzing customer behavior and identifying
    segments of customers that show similar behavior
    patterns helps businesses understand customer
    preferences and create a unique customer
    experience.
  • Risk management
  • Predicting various potential scenarios can help
    managers make the right decisions by choosing a
    non-risky option.

6
Industrial big data analysis

This is also about detecting and predicting anomalies, but in
this case the anomalies relate to machinery breakdowns. A
properly configured system collects data from sensors to
detect pre-failure conditions.

Cost

Hadoop and Spark are both open-source projects, so they are
free to use. When it comes to total cost, organizations need
to look at their requirements. If the task is to process very
large amounts of data, Hadoop will be cheaper, because disk
space costs far less than memory.

Compatibility

Hadoop and Spark are compatible with each other. Spark works
with all the data sources and file formats supported by
Hadoop, so it is fair to say that Spark's compatibility with
data types and data sources matches that of Hadoop MapReduce.

Which framework to choose?

Your particular business requirements should determine the
choice of framework. Hadoop MapReduce has the advantage in
linear processing of huge datasets, while Spark delivers fast
performance, iterative processing, real-time analytics, graph
processing, machine learning, and more. In many cases, Spark
outperforms Hadoop MapReduce. The good news is that Spark is
fully compatible with the Hadoop ecosystem and works smoothly
with the Hadoop Distributed File System, Apache Hive, etc.
Spark can handle batch, interactive, iterative, streaming, and
graph workloads, whereas MapReduce is limited to batch
processing.