Apache Spark with Java 8 - PowerPoint PPT Presentation

About This Presentation
Title:

Apache Spark with Java 8

Description:

Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing and other IT and Management courses.

Number of Views:124
Slides: 22
Provided by: learntek12

Transcript and Presenter's Notes

Title: Apache Spark with Java 8


1
  • Apache Spark with Java 8


3
Apache Spark with Java 8 Training
Why Spark? Spark was introduced by the Apache Software Foundation to speed up the Hadoop computing process. The main feature of Spark is its in-memory cluster computing, which greatly increases the processing speed of an application. Spark is designed to cover a wide range of workloads such as batch applications, iterative algorithms, interactive queries and streaming applications, reducing the burden of maintaining separate tools.
4
Apache Spark also has the following features:
  • Speed - Spark helps run an application in a Hadoop cluster up to 100 times faster in memory and 10 times faster on disk, by reducing the number of read/write operations to disk and by storing the intermediate processing data in memory.
  • Supports multiple languages - Spark provides 80 high-level operators for interactive querying and offers application development with built-in APIs in different languages: Java, Scala, or Python.
  • Advanced analytics - Spark not only supports Map and Reduce programming, but also SQL queries, streaming data, machine learning (ML), and graph algorithms.
5
Apache Spark with Java 8 Training
Why Java 8? With the introduction of lambda expressions, Java 8 supports functional programming in an elegant way. In addition to lambda expressions, it also introduced the Stream API, which can be thought of as a collection framework for functional programming in Java that does not store the elements. With lambda expressions, code can be written in a more concise and elegant way, and the learning curve becomes quite smooth, since one only has to learn the Apache Spark API, not Scala.
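For example, a minimal Java 8 sketch combining a lambda expression with the Stream API (class and variable names here are illustrative):

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LambdaStreamDemo {
    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "hadoop", "java", "scala");

        // Lambda expression + Stream API: filter, transform and collect
        // without building intermediate collections by hand.
        List<String> result = words.stream()
                .filter(w -> w.length() > 4)      // keep words longer than 4 characters
                .map(String::toUpperCase)         // method reference
                .collect(Collectors.toList());

        System.out.println(result);               // [SPARK, HADOOP, SCALA]
    }
}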
6
Apache Spark with Java 8 - Overview of Java 8
  • Overview of Interface, Static method and Default method in interface (see the sketch after this list)
  • Anonymous Inner Classes
  • Introduction to Lambda Expressions
  • Functional Interface, type inference
  • Method references
  • Composing Lambdas
  • Understanding Closure
  • Overview of Streams
  • Working with Streams
  • Infinite Streams
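A minimal sketch of a few of these Java 8 building blocks; the interface name and values are illustrative:

import java.util.function.Function;

public class Java8BasicsDemo {

    // A functional interface: exactly one abstract method, so it can be
    // implemented with a lambda expression.
    @FunctionalInterface
    interface Greeter {
        String greet(String name);

        // Default method: an implementation carried by the interface itself.
        default String greetLoudly(String name) {
            return greet(name).toUpperCase();
        }

        // Static method on the interface, here used as a factory.
        static Greeter polite() {
            return name -> "Hello, " + name;
        }
    }

    public static void main(String[] args) {
        Greeter greeter = Greeter.polite();               // lambda via static factory
        System.out.println(greeter.greet("Spark"));       // Hello, Spark
        System.out.println(greeter.greetLoudly("Spark")); // HELLO, SPARK

        // Method reference with type inference through a standard functional interface.
        Function<String, Integer> length = String::length;
        System.out.println(length.apply("lambda"));       // 6
    }
}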
7
Apache Spark with Java 8 - Introduction to Spark
  • Introduction to Big Data
  • The Big Data Problem
  • Scale-Up Vs Scale-Out Architecture
  • Characteristics of Scale-Out
  • Introduction to Hadoop, Map-Reduce and HDFS
  • Introducing Spark
8
Hortonworks Data Platform (HDP) using VirtualBox
  • Importing the HDP VM image using VirtualBox on a local machine
  • Configuring HDP
  • Overview of Ambari and its components
  • Overview of services configuration using Ambari
  • Overview of Apache Zeppelin
  • Creating, importing and executing notebooks in Apache Zeppelin
IDEs for Spark Applications
  • IntelliJ
  • Eclipse
  • Resolving dependencies for Spark applications
9
Spark Basics
  • Spark Shell
  • Overview of Spark architecture
  • Storage layers for Spark
  • Initializing a SparkContext and building applications
  • Submitting a Spark Application
  • Use of the Spark History Server
Spark Components
  • Spark Driver Process
  • Spark Executor
  • SparkConf and SparkContext (example below)
  • SparkSession object
  • Overview of the spark-submit command
  • Spark UI
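A minimal sketch of initializing Spark from Java, assuming a local master and an illustrative application name:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public class SparkInitDemo {
    public static void main(String[] args) {
        // Classic RDD entry point: SparkConf + JavaSparkContext.
        SparkConf conf = new SparkConf()
                .setAppName("SparkInitDemo")   // illustrative name
                .setMaster("local[*]");        // run locally with all available cores
        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("Spark version: " + sc.version());
        sc.close();

        // Spark 2.x entry point: SparkSession wraps the SparkContext.
        SparkSession spark = SparkSession.builder()
                .appName("SparkInitDemo")
                .master("local[*]")
                .getOrCreate();
        System.out.println("Default parallelism: "
                + spark.sparkContext().defaultParallelism());
        spark.stop();
    }
}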
10
RDDs
  • Overview of RDD
  • RDD and Partitions
  • Ways of Creating an RDD
  • RDD transformations and Actions
  • Lazy evaluation
  • RDD Lineage Graph (DAG)
  • Element-wise transformations
  • Map Vs FlatMap Transformation (example below)
  • Set Transformations
  • RDD Actions
  • Overview of RDD persistence
  • Methods for persisting an RDD
  • Persisting an RDD with a Storage option
  • Illustration of Caching an RDD in the DAG
  • Removal of a Cached RDD
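A small sketch of map vs flatMap, lazy evaluation and persistence, assuming the JavaSparkContext sc from the earlier snippet and an illustrative input path:

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class RddDemo {
    static void run(JavaSparkContext sc) {
        // Transformations are lazy: nothing executes until an action is called.
        JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt"); // illustrative path
        // flatMap: one line -> many words; map would give one list per line instead.
        JavaRDD<String> words = lines.flatMap(
                line -> Arrays.asList(line.split("\\s+")).iterator());
        JavaRDD<Integer> lengths = words.map(String::length);

        // Persist the word RDD in memory so later actions reuse it
        // instead of recomputing the whole lineage (DAG).
        words.persist(StorageLevel.MEMORY_ONLY());

        long totalWords = words.count();             // action: triggers evaluation
        int longest = lengths.reduce(Math::max);     // another action on the same lineage
        System.out.println(totalWords + " words, longest has " + longest + " characters");

        words.unpersist();                           // remove the cached RDD
    }
}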
11
Pair RDDs
  • Overview of Key-Value Pair RDDs
  • Ways of creating Pair RDDs
  • Transformations on Pair RDDs
  • reduceByKey(), foldByKey(), mapValues(), flatMapValues(), keys() and values() transformations
  • Grouping, Joining, Sorting on Pair RDDs
  • reduceByKey() Vs groupByKey() (example below)
  • Pair RDD Actions
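A sketch of the classic word count on a pair RDD, reusing the illustrative words RDD from above; reduceByKey() pre-aggregates values per partition before shuffling, which is why it is usually preferred over groupByKey():

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import scala.Tuple2;

public class PairRddDemo {
    static void run(JavaRDD<String> words) {
        // Build a pair RDD of (word, 1).
        JavaPairRDD<String, Integer> pairs =
                words.mapToPair(w -> new Tuple2<>(w, 1));

        // reduceByKey: values are combined per partition, then shuffled.
        JavaPairRDD<String, Integer> counts = pairs.reduceByKey(Integer::sum);

        // Equivalent result via groupByKey, but every (word, 1) pair is shuffled first.
        JavaPairRDD<String, Integer> countsViaGroup =
                pairs.groupByKey().mapValues(values -> {
                    int sum = 0;
                    for (int v : values) sum += v;
                    return sum;
                });

        counts.sortByKey().take(20).forEach(t ->
                System.out.println(t._1() + " -> " + t._2()));
    }
}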
12
Launching Spark on a cluster
  • Configuring and launching a Spark cluster on Google Cloud
  • Configuring and launching a Spark cluster on Microsoft Azure
Logging and Debugging a Spark Application
  • Setting up a Windows environment for executing a Spark application using an IDE
  • Steps of using the slf4j logging mechanism in a Spark application (example below)
  • Attaching a debugger to a Spark application
  • Example of debugging a Spark application running inside a cluster
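A minimal sketch of slf4j logging in a Spark driver class; the logger name and messages are illustrative, and Spark distributions already ship a logging backend for slf4j, so these statements appear in the driver logs:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LoggingDemo {
    // One static logger per class is the usual slf4j pattern.
    private static final Logger LOG = LoggerFactory.getLogger(LoggingDemo.class);

    static void run(JavaSparkContext sc) {
        LOG.info("Starting job on Spark {}", sc.version());

        JavaRDD<Integer> numbers = sc.parallelize(java.util.Arrays.asList(1, 2, 3, 4, 5));
        long count = numbers.filter(n -> n % 2 == 0).count();

        // Parameterized messages avoid string concatenation when the level is disabled.
        LOG.info("Found {} even numbers", count);

        if (count == 0) {
            LOG.warn("No even numbers found - check the input data");
        }
    }
}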
13
Spark Application Architecture
  • Spark Application Distributed Architecture
  • Spark Application submission modes
  • Overview of the Cluster Manager
  • Example of using the Standalone Cluster Manager
  • The Driver and its responsibilities
  • Overview of Job, Stage and Tasks
  • Spark Job Hierarchy
  • Executor
  • The spark-submit command and various submission options
  • YARN Cluster Manager
  • YARN Architecture
  • Client and Cluster Deploy-mode
14
Advanced concepts in Spark
  • Accumulator (example below)
  • Broadcast
  • RDD partitioning
  • Re-partitioning an RDD
  • Determining the RDD partitioner
  • Partition-based RDD operations such as mapPartitions, mapPartitionsWithIndex, mapPartitionsToPair
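A sketch combining a broadcast variable and an accumulator, assuming the illustrative sc and words RDD from the earlier snippets:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.util.LongAccumulator;

public class SharedVariablesDemo {
    static void run(JavaSparkContext sc, JavaRDD<String> words) {
        // Broadcast: ship the stop-word set to each executor once,
        // instead of serializing it with every task.
        Set<String> stopWords = new HashSet<>(Arrays.asList("a", "an", "the", "of"));
        Broadcast<Set<String>> stopWordsBc = sc.broadcast(stopWords);

        // Accumulator: a counter that tasks add to and the driver reads.
        LongAccumulator skipped = sc.sc().longAccumulator("skipped stop words");

        JavaRDD<String> filtered = words.filter(w -> {
            if (stopWordsBc.value().contains(w)) {
                skipped.add(1);
                return false;
            }
            return true;
        });

        long kept = filtered.count();   // action: read the accumulator only afterwards
        System.out.println("Kept " + kept + ", skipped " + skipped.value() + " stop words");
    }
}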
15
Spark SQL
  • Introduction to Spark SQL
  • Creating a SparkSession with Hive Support
  • DataFrame
  • Ways of Creating a DataFrame
  • Registering a DataFrame as a View (example below)
  • DataFrame Transformations API
  • DataFrame SQL statements
  • Aggregate Operations
  • DataFrame Actions
  • Catalyst Optimizer
  • Limitations of DataFrame
  • Introduction to Dataset
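A sketch of the DataFrame workflow in Java, assuming an illustrative JSON file with name and age columns:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class SparkSqlDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("SparkSqlDemo")
                .master("local[*]")           // for local testing
                .enableHiveSupport()          // optional: only if a Hive metastore is available
                .getOrCreate();

        // Create a DataFrame from a JSON file (illustrative path and schema).
        Dataset<Row> people = spark.read().json("hdfs:///data/people.json");

        // DataFrame transformations API with an aggregate operation.
        people.filter(col("age").gt(21))
              .groupBy(col("age"))
              .count()
              .show();

        // Register as a view and query it with SQL.
        people.createOrReplaceTempView("people");
        Dataset<Row> adults = spark.sql("SELECT name, age FROM people WHERE age > 21");
        adults.show();

        spark.stop();
    }
}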
16
  • Introduction to Encoder
  • Creating a Dataset
  • Functional transformations on a Dataset
  • Loading CSV, JSON, Parquet format files in Spark SQL
  • Loading and saving data from/in Hive, JDBC, HDFS, Cassandra
  • Introduction to User-Defined Functions (UDF)
  • Customizing a UDF
  • Usage of a UDF in the DataFrame Transformations API
  • Usage of a UDF in a Spark SQL statement (example below)
  • Introduction to Window Functions
  • Steps of defining a window function
  • Illustration of Window function usage
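A sketch of registering and using a UDF from Java, reusing the illustrative people view from the previous snippet:

import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.api.java.UDF1;
import org.apache.spark.sql.types.DataTypes;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

public class UdfDemo {
    static void run(SparkSession spark) {
        // Register a UDF that upper-cases a string (name and logic are illustrative).
        spark.udf().register(
                "toUpper",
                (UDF1<String, String>) value -> value == null ? null : value.toUpperCase(),
                DataTypes.StringType);

        // Use it in a SQL statement...
        spark.sql("SELECT toUpper(name) AS name_upper FROM people").show();

        // ...or in the DataFrame transformations API.
        spark.table("people")
             .withColumn("name_upper", callUDF("toUpper", col("name")))
             .show();
    }
}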
17
  • Introduction to UDAF
  • Customizing a UDAF
  • Illustration of customized UDAF usage
18
Basic Spark Streaming
  • Introduction to data streaming
  • Spark Streaming framework
  • Spark Streaming and Micro-batches
  • Introduction to DStreams
  • DStreams and RDDs
  • Word Count example using Socket Text Stream (example below)
  • Streaming with Twitter feeds
  • Setting up a Twitter App
  • Resolving the Twitter dependency in a Spark Streaming Application
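A sketch of the classic streaming word count over a socket text stream, following the standard JavaStreamingContext pattern; the host, port and batch interval are illustrative:

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

public class StreamingWordCount {
    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("StreamingWordCount")
                .setMaster("local[2]");   // at least 2 threads: one receiver + one processor
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(5));

        // Each 5-second micro-batch becomes one RDD inside the DStream.
        JavaReceiverInputDStream<String> lines = jssc.socketTextStream("localhost", 9999);

        JavaDStream<String> words = lines.flatMap(
                line -> Arrays.asList(line.split("\\s+")).iterator());
        JavaPairDStream<String, Integer> counts = words
                .mapToPair(w -> new Tuple2<>(w, 1))
                .reduceByKey(Integer::sum);

        counts.print();          // print the first elements of each batch

        jssc.start();            // start receiving and processing
        jssc.awaitTermination(); // block until the context is stopped
    }
}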
19
  • Steps of creating an Uber Jar
  • Example of extracting hashtags from tweet data
  • Troubleshooting Twitter Streaming issues in a Spark Application
  • Steps of creating a Spark Streaming Application
  • Architecture of Spark Streaming
  • Stateless Transformations
  • Twitter Streaming examples using stateless transformations
  • Introduction to stateful Transformations
  • Window Duration and Slide Duration
  • Window Operations (example below)
  • Naive and inverse window reduce operation
  • Checkpoint
  • Tracking the state of an event using the updateStateByKey operation
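A sketch of windowed and stateful operations on the counts DStream from the previous snippet; the inverse form of reduceByKeyAndWindow and updateStateByKey both require a checkpoint directory (the path here is illustrative):

import java.util.List;
import org.apache.spark.api.java.Optional;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class StatefulStreamingDemo {
    static void run(JavaStreamingContext jssc, JavaPairDStream<String, Integer> counts) {
        // Checkpointing is required for the inverse window reduce and for updateStateByKey.
        jssc.checkpoint("hdfs:///tmp/streaming-checkpoint");

        // Counts over a sliding window of 30 seconds, recomputed every 10 seconds.
        // The second function "subtracts" the batch that slides out of the window,
        // which is the inverse (incremental) form of the window reduce.
        JavaPairDStream<String, Integer> windowedCounts = counts.reduceByKeyAndWindow(
                Integer::sum,                 // add new values entering the window
                (total, old) -> total - old,  // remove values leaving the window
                Durations.seconds(30),        // window duration
                Durations.seconds(10));       // slide duration
        windowedCounts.print();

        // Running total per key across all batches, kept as streaming state.
        JavaPairDStream<String, Integer> runningTotals = counts.updateStateByKey(
                (List<Integer> newValues, Optional<Integer> state) -> {
                    int sum = state.isPresent() ? state.get() : 0;
                    for (int v : newValues) sum += v;
                    return Optional.of(sum);
                });
        runningTotals.print();
    }
}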
20
  • Interacting directly with RDDs using the transform() operation (example below)
  • Example of HDFS file streaming
  • Example of Spark-Kafka interaction
  • Saving DStreams to an external file system
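A sketch of the transform() operation, which exposes each micro-batch as an RDD so that arbitrary RDD code can be applied; it reuses the illustrative lines DStream from the word-count snippet:

import org.apache.spark.streaming.api.java.JavaDStream;

public class TransformDemo {
    static JavaDStream<String> cleanLines(JavaDStream<String> lines) {
        // transform() runs an RDD-to-RDD function on every batch,
        // so any RDD operation (joins, sorting, custom filters) can be used.
        return lines.transform(rdd ->
                rdd.filter(line -> !line.trim().isEmpty())   // drop blank lines
                   .distinct());                             // de-duplicate within the batch
    }
}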
Prerequisites for Apache Spark with Java 8
  • An understanding of OOP concepts and programming constructs in Java is required.
  • Programming experience in Java 7 is mandatory.
  • Understanding of or experience with lambda expressions in Java 8 is an added advantage.
21
For more Training Information, Contact Us
Email: info@learntek.org
USA: 1734 418 2465
INDIA: 40 4018 1306, 7799713624