Apache Beam - PowerPoint PPT Presentation

About This Presentation
Title:

Apache Beam

Description:

This presentation gives an overview of the Apache Beam project. It shows that Beam is a means of developing generic data pipelines in multiple languages using the provided SDKs. The pipelines execute on a range of supported runners/executors. Links for further information and for connecting with the author are included.

Number of Views:193
Slides: 14
Provided by: semtechs
Tags: apache | beam | data | etl | pipeline


Transcript and Presenter's Notes


1
What Is Apache Beam?
  • A unified programming model
  • To define and execute data processing pipelines
  • For ETL, batch and stream processing
  • Open source / Apache 2.0 license
  • Written in Java, Python, Go
  • Cross platform support
  • Pipelines are defined using the Beam SDKs

2
How Does Beam Work?
  • Use the provided SDKs to define pipelines
  • In Java, Python, Go
  • Beam SDK isolated in Docker container
  • Can be run by any of the execution runners
  • A supported group of runners executes the pipeline
  • Capability matrix defines the relative capabilities of runners
  • See beam.apache.org for matrix

3
Beam Programming Guide
  • A guide for users creating data pipelines
  • Examples in Java, Python, Go
  • Can design, create and test pipelines
  • Provides multi-language functions for
  • PCollections
  • Transforms
  • Pipeline I/O
  • Schemas
  • Data encoding / type safety
  • Windowing
  • Triggers
  • Metrics
  • State and Timers
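Of the topics listed above, windowing is one that can be illustrated without Beam itself. The following is a plain-Python sketch, not the Beam API: the helper names `fixed_window` and `count_per_window` are hypothetical and exist only for this illustration. It shows how a fixed window of a given size groups elements by their timestamps.

```python
from collections import defaultdict


def fixed_window(timestamp, size):
    """Return the [start, end) fixed window of width `size` containing `timestamp`."""
    start = timestamp - (timestamp % size)
    return (start, start + size)


def count_per_window(timestamps, size):
    """Count how many timestamps fall into each fixed window."""
    counts = defaultdict(int)
    for ts in timestamps:
        counts[fixed_window(ts, size)] += 1
    return dict(counts)
```

In the Beam Python SDK the corresponding step would be a `beam.WindowInto(beam.window.FixedWindows(size))` transform applied to a PCollection; the sketch above only mirrors the grouping idea.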

4
Beam Pipelines
  • When designing pipelines consider
  • Where data is stored
  • What does the data look like
  • What do you want to do with the data
  • What does your output data look like
  • Where should the data go
  • Use PCollection and PTransform functions to
    define pipelines
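The design questions above map onto a pipeline fairly directly: a source, some transforms, and a sink. The following is a minimal word-count sketch using the Beam Python SDK, assuming the `apache_beam` package is installed. The in-memory input, the helper names `extract_words`, `format_count` and `run`, and printing as the "sink" are assumptions made for illustration and are not taken from the presentation.

```python
import re


def extract_words(line):
    """Split one line of text into lowercase words."""
    return re.findall(r"[a-z']+", line.lower())


def format_count(word_count):
    """Turn a (word, count) pair into a printable string."""
    word, count = word_count
    return f"{word}: {count}"


def run(lines):
    # Beam is imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam

    # No runner is specified, so Beam falls back to the Direct Runner.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.Create(lines)               # where the data comes from
            | "Split" >> beam.FlatMap(extract_words)     # what to do with the data
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "Count" >> beam.CombinePerKey(sum)
            | "Format" >> beam.Map(format_count)         # what the output looks like
            | "Print" >> beam.Map(print)                 # where the data goes
        )
```

Calling `run(["to be or not to be"])` would print one line per distinct word; the order of the lines is not guaranteed. Each step produces a new PCollection, and each `beam.*` call is a PTransform.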

5
Beam Example Pipelines
6
Beam Example Pipelines
7
Beam Runners
  • Supported Beam Runners are
  • Direct Runner (test and development)
  • Apache Apex
  • Apache Flink
  • Apache Gearpump
  • Apache Hadoop MapReduce
  • Apache Nemo
  • Apache Samza
  • Apache Spark
  • Google Cloud Dataflow
  • Hazelcast Jet
  • IBM Streams
  • JStorm
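Selecting a runner is normally a matter of pipeline options rather than pipeline code. The following is a sketch for the Beam Python SDK, assuming `apache_beam` is installed. `runner_flags` and `make_pipeline` are hypothetical helpers, and `KNOWN_RUNNERS` lists only a few runner names that the Python SDK accepts, not every runner on the slide.

```python
# A few runner names accepted by the Python SDK's --runner flag (not exhaustive).
KNOWN_RUNNERS = ("DirectRunner", "FlinkRunner", "SparkRunner", "DataflowRunner")


def runner_flags(runner="DirectRunner"):
    """Build the command-line style flags that select a runner (hypothetical helper)."""
    if runner not in KNOWN_RUNNERS:
        raise ValueError(f"unknown runner: {runner}")
    return [f"--runner={runner}"]


def make_pipeline(runner="DirectRunner"):
    """Create a pipeline bound to the chosen runner; the transforms stay the same."""
    # Beam is imported here so runner_flags() can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(runner_flags(runner))
    return beam.Pipeline(options=options)
```

Runners other than the Direct Runner generally need further options (cluster address, project, staging location and so on), which are omitted here.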

8
Beam Capability Matrix: What Computed
9
Beam Capability Matrix: Where Computed
10
Beam Capability Matrix: When Computed
11
Beam Capability Matrix: How Computed
12
Available Books
  • See Big Data Made Easy (Apress, Jan 2015)
  • See Mastering Apache Spark (Packt, Oct 2015)
  • See Complete Guide to Open Source Big Data Stack (Apress, Jan 2018)
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

13
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration