Scala & Spark Online Training

Description:

Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing, and other IT and management courses. We are dedicated to designing, developing, and implementing training programs for students, corporate employees, and business professionals.

Learn more at: http://www.learntek.org
Slides: 24
Provided by: Learntek

Transcript and Presenter's Notes


1
  • Scala & Spark

2
Scala & Spark
  • The following topics will be covered in our
    Scala & Spark Online Training

3
What is Scala?
  • Scala is a modern multi-paradigm programming
    language designed to express common programming
    patterns in a concise, elegant, and type-safe
    way. The name comes from "Scalable Language."
    Scala smoothly integrates the features of
    object-oriented and functional programming
    languages and compiles to run on the Java
    Virtual Machine. It was created by Martin
    Odersky and first released in 2003.
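
As a rough illustration of how the object-oriented and functional styles combine, here is a minimal sketch. The names `Shape`, `Circle`, `Rectangle`, and `ShapeDemo` are hypothetical and chosen only for this example.

```scala
// Object-oriented side: a small type hierarchy with a shared interface.
trait Shape {
  def area: Double
}

final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}

final case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object ShapeDemo {
  // Functional side: transform an immutable list with map, then sum it.
  // The compiler checks that every element is a Shape (type safety).
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum

  def main(args: Array[String]): Unit = {
    val shapes = List(Circle(1.0), Rectangle(2.0, 3.0))
    println(totalArea(shapes))
  }
}
```

The code compiles to ordinary JVM bytecode, so it can sit alongside Java classes in the same project.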

4
Why Scala?
  • Scala is a type-safe JVM language that
    combines object-oriented and functional
    programming features in a concise, logical,
    and powerful language.
  • Scala offers an alternative to Java while
    keeping its syntax close to Java's, which
    minimizes the learning curve.
  • Scala was designed to improve on the features
    of Java that developers find restrictive,
    overly tedious, or frustrating.

5
What is Spark?
  • Spark is a fast cluster computing framework
    that is commonly deployed on Hadoop clusters.
    It generalizes the MapReduce programming model
    to support more types of computation
    efficiently, such as interactive queries and
    stream processing. Spark can use Hadoop in two
    ways: for storage and for processing. Because
    Spark has its own cluster management, it often
    uses Hadoop for storage only.
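
To make the MapReduce model concrete, the sketch below runs a word count on plain Scala collections on a single machine. It does not use Spark, and `WordCountSketch` and its method names are hypothetical. Spark's RDD API offers operations of the same shape (`flatMap`, `map`, `reduceByKey`) that run across a cluster instead.

```scala
object WordCountSketch {
  // "Map" phase: split each line into words and emit a (word, 1) pair per word.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\s+"))
      .filter(_.nonEmpty)
      .map(word => (word, 1))

  // "Reduce" phase: group the pairs by word and sum the counts in each group.
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs
      .groupBy { case (word, _) => word }
      .map { case (word, group) => word -> group.map(_._2).sum }

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(mapPhase(lines))
}
```

For example, `WordCountSketch.wordCount(Seq("spark is fast", "Spark is scalable"))` counts "spark" and "is" twice each.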

6
Why Spark?
  • Spark originated at UC Berkeley's AMPLab and is
    now a project of the Apache Software
    Foundation. It was designed to speed up
    Hadoop-style computation.
  • The main feature of Spark is its in-memory
    cluster computing, which greatly increases
    application processing speed.
  • Spark is designed to cover a wide range of
    workloads, such as batch applications, iterative
    algorithms, interactive queries, and streaming
    applications, which reduces the management
    burden of maintaining separate tools.

7
Introduction to Scala
  • Overview of Scala
  • Installing Scala
  • Scala Basics
  • IDE for Scala

8
Scala Programming
  • Variables
  • Methods
  • Literals
  • Reserved Words
  • Operators
  • Precedence Rules
  • If Expression
  • For Expression
  • Exception handling with Try Expression
  • Match Expression
  • While Loops
  • Do-While Loops
  • Implicit Conversion

9
Functions in Scala
  • Methods
  • First-class Functions
  • Higher-Order Methods
  • Function Literals
  • Partially Applied Functions
  • Tail Recursion
  • Closures
  • Currying
  • Control Abstraction

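
Several of the function topics above can be sketched in a few lines. This is only an illustration; `FunctionsSketch` and the names inside it are hypothetical and not part of the course material.

```scala
import scala.annotation.tailrec

object FunctionsSketch {
  // Higher-order method: takes a function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Curried method: the parameters are split across two parameter lists.
  def add(a: Int)(b: Int): Int = a + b

  // Partially applied function: the first argument of add is fixed to 5.
  val addFive: Int => Int = add(5)(_)

  // Closure: the returned function literal captures `factor`.
  def makeScaler(factor: Int): Int => Int = x => x * factor

  // Tail recursion: @tailrec makes the compiler verify that `loop`
  // is compiled to a loop and does not grow the call stack.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(i: Int, acc: BigInt): BigInt =
      if (i <= 1) acc else loop(i - 1, acc * i)
    loop(n, BigInt(1))
  }
}
```

Here `FunctionsSketch.applyTwice(_ + 3, 10)` passes a function literal as an argument, and `FunctionsSketch.makeScaler(3)` returns a new function that multiplies its input by 3.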
10
Traits & OOPs in Scala
  • Traits
  • Classes & Objects
  • Abstract Class
  • Access Modifiers
  • Functional Programming
  • Scala Class Hierarchy
  • Package and Imports

11
Case Class & Pattern Matching
  • Pattern type
  • Pattern Guard
  • Sealed Class
  • Option Type
  • Extractor
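
The topics on this slide fit together naturally in one small example. The sketch below is illustrative only; the `Event` hierarchy and `PatternSketch` are hypothetical names.

```scala
// Sealed trait: all subtypes must be declared in this file, so the
// compiler can warn when a match does not cover every case.
sealed trait Event
final case class Login(user: String) extends Event
final case class Purchase(user: String, amount: Double) extends Event
case object Logout extends Event

object PatternSketch {
  def describe(event: Event): String = event match {
    // Case classes provide extractors, so fields can be bound in patterns.
    case Login(user) => s"$user logged in"
    // Pattern guard: this case applies only when the condition holds.
    case Purchase(user, amount) if amount >= 100.0 => s"$user made a large purchase"
    case Purchase(user, _) => s"$user made a purchase"
    case Logout => "someone logged out"
  }

  // Option type: the result is Some(user) or None, never null.
  def firstBuyer(events: List[Event]): Option[String] =
    events.collectFirst { case Purchase(user, _) => user }
}
```

Removing one of the `case` lines in `describe` would typically produce a compiler warning about a non-exhaustive match, which is the main benefit of sealing the trait.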

12
Scala Collection
  • Immutable And Mutable collection
  • Array
  • Sets
  • Lists
  • Tuples
  • Maps

13
Introduction to Spark
  • Problems with Traditional Large-Scale Systems
  • Introducing Spark
  • What is Spark?

14
Spark Basics
  • Spark Installation
  • Configuring HDP 2.4 (or 2.5) on a local machine
  • Spark Shell
  • Storage layers for Spark
  • Overview of Spark architecture
  • Initializing a SparkContext and building
    applications

15
IDEs for Spark Applications
  • SBT and its overview
  • IntelliJ IDEA
  • Eclipse
  • Resolving dependencies for Spark applications

16
RDDs
  • RDD Basics
  • RDD transformations and Actions
  • Lazy evaluation
  • Element-wise transformations
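
A minimal sketch of transformations, actions, and lazy evaluation follows. It assumes Spark 2.x or later (the `spark-core` and `spark-sql` artifacts) is on the classpath and runs in local mode. `RddSketch` and `evenSquareSum` are hypothetical names.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object RddSketch {
  // Sums the even squares of 1..upTo. Assumes upTo >= 2, because
  // reduce fails on an empty RDD.
  def evenSquareSum(sc: SparkContext, upTo: Int): Int = {
    val numbers = sc.parallelize(1 to upTo)

    // Element-wise transformations: these only describe the computation.
    // Nothing runs yet (lazy evaluation).
    val evenSquares = numbers.map(n => n * n).filter(_ % 2 == 0)

    // Action: triggers the actual job and returns a value to the driver.
    evenSquares.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode using all cores; a real cluster would use
    // a different master URL.
    val spark = SparkSession.builder()
      .appName("rdd-sketch")
      .master("local[*]")
      .getOrCreate()

    println(evenSquareSum(spark.sparkContext, 10))
    spark.stop()
  }
}
```

Because `map` and `filter` are lazy, Spark can plan both steps together and run them in a single pass over each partition once `reduce` is called.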

17
Pair RDDs
  • Key-Value Pair RDD
  • Creating Pair RDDs
  • Transformations on Pair RDDs
  • Grouping, Joining, and Sorting on Pair RDDs
  • Data Partitioning
  • Determining the partitioning of a Pair RDD
  • Operations that benefit from partitioning
  • Operations that affect partitioning
  • PageRank Example

18
Advanced Concepts in Spark
  • Accumulators
  • Broadcast variables
  • Working on per-partition basis

19
Launching Spark on a Cluster
  • Configure and launch Spark Cluster on AWS
  • Configure and launch Spark Cluster on Microsoft
    Azure

20
Running Spark on a Cluster
  • Spark Runtime Architecture
  • Driver
  • Executor
  • Cluster Manager
  • Components of Execution
  • Job, Stage and Task
  • Spark Web UI
  • Driver and Executor logs
  • spark-submit command

21
Caching and Persistence
  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

22
Spark Algorithms
  • Spark SQL
  • Spark Streaming
  • MLlib
  • GraphX
