Apache Airflow - PowerPoint PPT Presentation

About This Presentation
Title:

Apache Airflow

Description:

This presentation gives an overview of the Apache Airflow project. It explains Apache Airflow in terms of its pipelines, tasks, integration and UI. Links for further information and for connecting with the author are included.

Number of Views:3459
Slides: 14
Provided by: semtechs

Transcript and Presenter's Notes

Title: Apache Airflow


1
What Is Apache Airflow?
  • A workflow management platform
  • Uses Python-based workflows
  • Schedules workflows by time or event
  • Open source under the Apache 2.0 license
  • Written in Python
  • Workflows are monitored in a UI
  • Has a wide range of integration options
  • Originally developed at Airbnb

2
What Is Apache Airflow?
  • Uses SQLite as the default back-end DB but can use others, e.g. MySQL, Postgres, JDBC
  • Install extra packages using the pip command
  • A wide variety is available, including
    • Many databases and cloud services
    • The Hadoop ecosystem
    • Security, web services, queues
    • Many more

3
Airflow Pipelines
  • These are Python-based workflows
  • They are actually directed acyclic graphs (DAGs)
  • Pipelines use Jinja templating
  • Pipelines contain user-defined tasks
  • Tasks can run on different workers at different times
  • Jinja scripts can be embedded in tasks
  • Comments can be added to tasks in varying formats
  • Inter-task dependencies can be defined
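To make the "directed acyclic graph" idea concrete, here is a minimal sketch in plain Python. It is not the Airflow API: it uses the standard library's graphlib module, and the task names (extract, transform, load, report) are hypothetical. It only shows how inter-task dependencies determine a valid run order and why a cycle is rejected.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}


def execution_order(deps):
    """Return one valid run order in which every task follows its upstream tasks."""
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph has no cycle, i.e. it is a valid DAG."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)
```

Because load and report both depend only on transform, either may come first in the printed order; a scheduler is free to run such tasks on different workers at different times.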

4
Airflow Pipelines

5
Airflow Tasks
  • Tasks have a lifecycle
  • Tasks are executed by operators; which operator depends on the task type
    • For instance, MySqlOperator
  • Hooks are used to access external systems, e.g. databases
  • Worker-specific queues can be used for tasks
  • XCom allows tasks to exchange messages
  • Pipelines, or DAGs, allow
    • Branching
    • SubDAGs
    • Service level agreements (SLAs)
    • Triggering rules
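A triggering rule decides whether a task runs based on how its upstream tasks ended. The sketch below is a simplified illustration in plain Python, not Airflow's implementation. The rule names match ones Airflow uses, but the function should_run is hypothetical and assumes every upstream task has already finished.

```python
def should_run(trigger_rule, upstream_states):
    """Decide whether a task may run, given the final states of its upstream tasks.

    Simplifying assumption: all upstream tasks have already finished.
    """
    succeeded = [state == "success" for state in upstream_states]
    failed = [state == "failed" for state in upstream_states]

    if trigger_rule == "all_success":
        return all(succeeded)
    if trigger_rule == "all_failed":
        return all(failed)
    if trigger_rule == "one_success":
        return any(succeeded)
    if trigger_rule == "one_failed":
        return any(failed)
    if trigger_rule == "all_done":
        return True  # upstream tasks are finished, whatever the outcome
    raise ValueError(f"unknown trigger rule: {trigger_rule}")


# A clean-up task that should run even when something upstream failed.
print(should_run("all_done", ["success", "failed"]))
```

A rule such as all_done is useful for clean-up or notification tasks, while all_success is the natural choice for tasks that need every input to be present.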

6
Airflow Task Stages
  • Tasks have lifecycle stages
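The stages can be pictured as a small state machine. The sketch below is plain Python with a simplified, assumed transition table. The state names follow those used in the Airflow documentation, but the table is not taken from the slides and leaves out states such as skipped and upstream_failed.

```python
# Assumed, simplified transitions: the usual happy path plus the retry loop.
TRANSITIONS = {
    "none": {"scheduled"},
    "scheduled": {"queued"},
    "queued": {"running"},
    "running": {"success", "failed", "up_for_retry"},
    "up_for_retry": {"scheduled"},
    "success": set(),
    "failed": set(),
}


def is_valid_path(states):
    """Return True if each consecutive pair of states is an allowed transition."""
    return all(
        nxt in TRANSITIONS.get(cur, set())
        for cur, nxt in zip(states, states[1:])
    )


happy_path = ["none", "scheduled", "queued", "running", "success"]
print(is_valid_path(happy_path))
```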

7
Airflow Task Life Cycle

8
Airflow UI
  • The Airflow UI provides views
    • DAG, Tree, Graph, Variables, Gantt Chart
    • Task duration, Code view
  • Select a task instance in any view to manage it
  • Monitor and troubleshoot pipelines in the views
  • Monitor DAGs by owner, schedule, run time etc.
  • Use the views to find pipeline problem areas
  • Use the views to find bottlenecks

9
Airflow UI

10
Airflow Integration
  • Airflow integrates with
    • Azure (Microsoft Azure)
    • AWS (Amazon Web Services)
    • Databricks
    • GCP (Google Cloud Platform)
      • Cloud Speech Translate Operators
    • Qubole
    • Kubernetes
      • Run tasks as pods

11
Airflow Metrics
  • Airflow can send metrics to StatsD
    • A network daemon that runs on Node.js
    • Listens for statistics such as counters, gauges and timers
    • Statistics are sent over UDP or TCP
  • Install the metrics (StatsD) support using the pip command
  • Specify which stats to record, e.g.
    • scheduler,executor,dagrun
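For readers unfamiliar with StatsD, the sketch below shows roughly what a metric looks like on the wire. It is plain Python using only the standard library, not Airflow's own metrics code. The helper names and the metric names are hypothetical, and port 8125 is assumed as the usual StatsD default.

```python
import socket

# StatsD type codes for the three statistic kinds named on the slide.
TYPE_CODES = {"counter": "c", "gauge": "g", "timer": "ms"}


def format_metric(name, value, metric_type):
    """Build one StatsD line of the form '<name>:<value>|<type code>'."""
    return f"{name}:{value}|{TYPE_CODES[metric_type]}"


def send_metric(line, host="localhost", port=8125):
    """Send one metric line over UDP; fire-and-forget, no reply is expected."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric names, for illustration only.
print(format_metric("example.jobs.started", 1, "counter"))
print(format_metric("example.queue.size", 12, "gauge"))
print(format_metric("example.task.duration", 320, "timer"))
```

UDP suits this use because the sender never waits for the daemon: if StatsD is down, metrics are simply lost rather than slowing the workflow.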

12
Available Books
  • See Big Data Made Easy
    • Apress, Jan 2015
  • See Mastering Apache Spark
    • Packt, Oct 2015
  • See Complete Guide to Open Source Big Data Stack
    • Apress, Jan 2018
  • Find the author on Amazon
    • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
    • www.linkedin.com/in/mike-frampton-38563020

13
Connect
  • Feel free to connect on LinkedIn
    • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
    • open-source-systems.blogspot.com/
  • I am always interested in
    • New technology
    • Opportunities
    • Technology-based issues
    • Big data integration