Apache NiFi - PowerPoint PPT Presentation

About This Presentation
Title: Apache NiFi

Description: This presentation attempts to give an overview of the Apache NiFi project. I had intended to examine the Registry specifically, but found that there was more to say about NiFi itself. It does examine the Registry project, as well as extensions and a possible registry for them. It closes with links for further information and for connecting.

Slides: 11
Provided by: semtechs

Transcript and Presenter's Notes



1
What Is Apache NiFi?
  • A data flow automation system maintained by the Apache Software
    Foundation community, with Cloudera as a major contributor
  • Written in Java
  • Open source / Apache 2.0 License
  • Cluster based and scalable
  • Has a web-based user interface
  • Highly extensible
  • Offers data flow monitoring

2
How Does NiFi Work?
  • NiFi runs in a JVM on each server (node) in the cluster
  • Uses ZooKeeper for cluster configuration and coordination
  • One node acts as the Cluster Coordinator
  • One node acts as the Primary Node
  • Each JVM encapsulates
      • A web server
      • Processors / extensions
      • Repositories for
          • FlowFiles / Content / Data Provenance
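
A processor in a cluster can be scheduled on every node or only on the Primary Node, which is how a single node ends up doing work such as polling a source once for the whole cluster. The Python sketch below models that choice; the `Node` and `ProcessorSpec` classes, the "ALL"/"PRIMARY" values and `runs_on` are hypothetical illustrations, not NiFi APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical model of one NiFi node in a cluster.
    name: str
    is_primary: bool = False
    is_coordinator: bool = False


@dataclass(frozen=True)
class ProcessorSpec:
    # Hypothetical processor description: "ALL" runs on every node,
    # "PRIMARY" runs only on the Primary Node.
    name: str
    execution: str = "ALL"


def runs_on(processor: ProcessorSpec, cluster: list[Node]) -> list[str]:
    """Return the names of the nodes that would schedule this processor."""
    if processor.execution == "PRIMARY":
        return [n.name for n in cluster if n.is_primary]
    if processor.execution == "ALL":
        return [n.name for n in cluster]
    raise ValueError(f"unknown execution mode: {processor.execution}")


cluster = [
    Node("node1", is_coordinator=True),
    Node("node2", is_primary=True),
    Node("node3"),
]
print(runs_on(ProcessorSpec("list-source", execution="PRIMARY"), cluster))  # ['node2']
print(runs_on(ProcessorSpec("transform"), cluster))  # ['node1', 'node2', 'node3']
```

The Cluster Coordinator and the Primary Node can be the same node; the sketch keeps them apart only for clarity.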

3
NiFi Architecture

4
NiFi Architecture
  • Web server for monitoring and administration
  • Flow Controller manages extensions and the resources (threads)
    they run on
  • FlowFile processors 1 .. N do the actual data flow work
      • Each processor performs a task within the NiFi data flow
  • Extensions allow connectivity to remote systems
      • Extensions can be user defined
  • FlowFile Repository tracks the state of FlowFiles currently
    active in the flow
  • Content Repository holds the content of FlowFiles in transit
  • Provenance Repository stores historical data flow (provenance)
    information
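
The three repositories divide the work: the FlowFile repository holds each FlowFile's attributes and a pointer to its content, the content repository holds the bytes, and the provenance repository records what happened. The toy Python model below illustrates that split under stated assumptions: `MiniFlow` and its classes are hypothetical, content is treated as copy-on-write (a change writes a new claim instead of editing the old one), and the event names are borrowed from NiFi's provenance event types. Durability (write-ahead logging) and cleanup of old claims are left out.

```python
from __future__ import annotations

import itertools
import time
from collections.abc import Callable
from dataclasses import dataclass, field


class ContentRepository:
    """Holds FlowFile bytes keyed by claim id; writes always create a new claim."""

    def __init__(self) -> None:
        self._claims: dict[int, bytes] = {}
        self._ids = itertools.count(1)

    def write(self, data: bytes) -> int:
        claim = next(self._ids)
        self._claims[claim] = data
        return claim

    def read(self, claim: int) -> bytes:
        return self._claims[claim]


@dataclass
class FlowFileRecord:
    # What the "FlowFile repository" tracks: metadata plus a content pointer.
    uuid: int
    attributes: dict[str, str]
    content_claim: int


@dataclass
class ProvenanceEvent:
    flowfile_uuid: int
    event_type: str
    timestamp: float = field(default_factory=time.time)


class MiniFlow:
    """Hypothetical toy engine wiring the three repositories together."""

    def __init__(self) -> None:
        self.content = ContentRepository()
        self.flowfiles: dict[int, FlowFileRecord] = {}  # FlowFile repository
        self.provenance: list[ProvenanceEvent] = []     # provenance repository
        self._uuids = itertools.count(1)

    def create(self, data: bytes, **attributes: str) -> int:
        uuid = next(self._uuids)
        claim = self.content.write(data)
        self.flowfiles[uuid] = FlowFileRecord(uuid, dict(attributes), claim)
        self.provenance.append(ProvenanceEvent(uuid, "CREATE"))
        return uuid

    def modify(self, uuid: int, transform: Callable[[bytes], bytes]) -> None:
        record = self.flowfiles[uuid]
        new_data = transform(self.content.read(record.content_claim))
        # Copy-on-write: the old claim is untouched; the record points to a new one.
        record.content_claim = self.content.write(new_data)
        self.provenance.append(ProvenanceEvent(uuid, "CONTENT_MODIFIED"))

    def drop(self, uuid: int) -> None:
        del self.flowfiles[uuid]
        self.provenance.append(ProvenanceEvent(uuid, "DROP"))


flow = MiniFlow()
ff = flow.create(b"hello", filename="a.txt")
flow.modify(ff, bytes.upper)
print(flow.content.read(flow.flowfiles[ff].content_claim))  # b'HELLO'
flow.drop(ff)
print([e.event_type for e in flow.provenance])  # ['CREATE', 'CONTENT_MODIFIED', 'DROP']
```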

5
NiFi Flow Management
  • Guaranteed data delivery
      • Uses write-ahead logs and content repositories
  • Queue buffering with back pressure
  • Configurable queue prioritization
  • Per-flow configuration trading latency against throughput
  • UI-based data flow building
  • UI-based data flow monitoring
  • UI-based data provenance
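
Back pressure and prioritization both live on the connection (queue) between two processors: once a queue reaches its object-count or data-size threshold, NiFi stops scheduling the upstream processor, and a prioritizer decides which FlowFile leaves the queue next. The Python sketch below models one connection; the class names are hypothetical, the default thresholds roughly mirror NiFi's defaults (10,000 objects, 1 GB), and the example prioritizer is only loosely modeled on NiFi's PriorityAttributePrioritizer.

```python
from __future__ import annotations

import heapq
import itertools
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    # Hypothetical minimal FlowFile: content size in bytes plus attributes.
    size: int
    attributes: dict[str, str] = field(default_factory=dict)


class Connection:
    """Queue between two processors with back pressure and a pluggable
    prioritizer (a key function: lower key is dequeued first)."""

    def __init__(self, max_objects: int = 10_000, max_bytes: int = 1024**3,
                 prioritizer: Callable[[FlowFile], int] | None = None) -> None:
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self.prioritizer = prioritizer or (lambda ff: 0)  # default: FIFO
        self._heap: list[tuple[int, int, FlowFile]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO among equal keys
        self.queued_bytes = 0

    def back_pressure_applied(self) -> bool:
        # Upstream should not be scheduled once either threshold is reached.
        return (len(self._heap) >= self.max_objects
                or self.queued_bytes >= self.max_bytes)

    def enqueue(self, ff: FlowFile) -> None:
        heapq.heappush(self._heap, (self.prioritizer(ff), next(self._seq), ff))
        self.queued_bytes += ff.size

    def dequeue(self) -> FlowFile | None:
        if not self._heap:
            return None
        _, _, ff = heapq.heappop(self._heap)
        self.queued_bytes -= ff.size
        return ff


conn = Connection(max_objects=3,
                  prioritizer=lambda ff: int(ff.attributes.get("priority", "9")))
for p in ["5", "1", "3"]:
    conn.enqueue(FlowFile(100, {"priority": p}))
print(conn.back_pressure_applied())           # True
print(conn.dequeue().attributes["priority"])  # 1
print(conn.back_pressure_applied())           # False
```

The thresholds act as a soft limit in this sketch: `enqueue` never refuses a FlowFile, it is the scheduler's job to stop feeding a connection that reports back pressure.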

6
NiFi Cluster

7
NiFi Cluster
  • NiFi can run in cluster mode, coordinated through ZooKeeper
  • Each node works on a different set of data
  • ZooKeeper
      • Elects a single Cluster Coordinator node
      • Handles node failover
  • The Cluster Coordinator manages cluster membership
  • ZooKeeper also elects a Primary Node
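
ZooKeeper's standard leader-election recipe explains how a single Cluster Coordinator (and, separately, a Primary Node) can be chosen and replaced on failure: every candidate registers an ephemeral sequential entry, the lowest sequence number leads, and when a node's session ends its entry disappears so the next one takes over. NiFi delegates this to ZooKeeper (via the Apache Curator library) rather than implementing it itself; the in-memory `ElectionPath` class below is a hypothetical model of the recipe, not NiFi or ZooKeeper code.

```python
from __future__ import annotations

import itertools


class ElectionPath:
    """In-memory stand-in for a ZooKeeper election path. Each candidate gets
    an increasing sequence number (like an ephemeral sequential znode); the
    lowest live sequence number is the leader."""

    def __init__(self) -> None:
        self._seq = itertools.count()
        self._members: dict[int, str] = {}

    def join(self, node: str) -> int:
        seq = next(self._seq)
        self._members[seq] = node
        return seq

    def session_expired(self, node: str) -> None:
        # An ephemeral entry vanishes when its owner's session ends (node failure).
        self._members = {s: n for s, n in self._members.items() if n != node}

    def leader(self) -> str | None:
        return self._members[min(self._members)] if self._members else None


# Separate elections, as for the Cluster Coordinator and the Primary Node.
coordinator = ElectionPath()
for name in ["node1", "node2", "node3"]:
    coordinator.join(name)
print(coordinator.leader())  # node1
coordinator.session_expired("node1")
print(coordinator.leader())  # node2
```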

8
NiFi Repository Storage
  • All repository storage is pluggable
      • Storage can be changed through user-defined implementations
  • The default is file system storage, with
      • Multiple file system locations
      • Multiple physical partitions
      • RAID configurations to optimize I/O
  • Archiving is available for the content repository
      • Deletion of archived content is automatic and configurable
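
Automatic deletion of archived content is driven by limits in nifi.properties, such as `nifi.content.repository.archive.max.retention.period` and `nifi.content.repository.archive.max.usage.percentage`. The Python function below is a simplified, hypothetical model of such a policy (drop archives past the retention period, then drop the oldest until disk usage is under the limit); NiFi's actual cleanup logic may differ in detail.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ArchivedClaim:
    # Hypothetical archived content record: id, size in bytes, archive time in seconds.
    claim_id: str
    size: int
    archived_at: float


def purge_archive(claims: list[ArchivedClaim], *, now: float,
                  max_retention_s: float, disk_capacity: int,
                  used_bytes: int, max_usage_pct: float
                  ) -> tuple[list[ArchivedClaim], list[str]]:
    """Return (kept_claims, purged_ids).

    used_bytes is total disk usage, including live (non-archived) content.
    """
    ordered = sorted(claims, key=lambda c: c.archived_at)  # oldest first
    kept: list[ArchivedClaim] = []
    purged: list[str] = []
    # 1. Age-based deletion.
    for c in ordered:
        if now - c.archived_at > max_retention_s:
            purged.append(c.claim_id)
            used_bytes -= c.size
        else:
            kept.append(c)
    # 2. Usage-based deletion, oldest archive first.
    limit = disk_capacity * max_usage_pct / 100
    while kept and used_bytes > limit:
        oldest = kept.pop(0)
        purged.append(oldest.claim_id)
        used_bytes -= oldest.size
    return kept, purged


claims = [ArchivedClaim("a", 300, 0), ArchivedClaim("b", 200, 500),
          ArchivedClaim("c", 100, 900)]
kept, purged = purge_archive(claims, now=1000, max_retention_s=600,
                             disk_capacity=1000, used_bytes=900, max_usage_pct=50)
print(purged)                        # ['a', 'b']
print([c.claim_id for c in kept])    # ['c']
```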

9
NiFi Extensions
  • Extensions are packaged as NiFi Archives (NARs)
  • Extension points include
      • Processors, Controller Services, Reporting Tasks,
        Prioritizers, and Custom User Interfaces
  • See these example NARs by Frank Sauer
      • For InfluxDB access
      • For JSON transformation
      • https://github.com/fsauer65/NiFi-Extensions
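
Real NiFi processors are Java classes packaged in NARs (typically extending `AbstractProcessor` and implementing `onTrigger`). To show the shape of a user-defined processor like the JSON transformation example above, the Python sketch below uses hypothetical `Processor` and `FlowFile` classes: it reads a FlowFile's content, renames top-level JSON keys and routes the result to a "success" or "failure" relationship.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    # Hypothetical FlowFile: content bytes plus string attributes.
    content: bytes
    attributes: dict[str, str] = field(default_factory=dict)


class Processor:
    """Hypothetical base class that only mirrors the shape of a NiFi processor."""

    relationships = ("success", "failure")

    def on_trigger(self, flowfile: FlowFile) -> tuple[str, FlowFile]:
        raise NotImplementedError


class RenameJsonFields(Processor):
    """Example user-defined processor: renames top-level JSON keys."""

    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def on_trigger(self, flowfile: FlowFile) -> tuple[str, FlowFile]:
        try:
            doc = json.loads(flowfile.content)
        except ValueError:  # covers invalid JSON and undecodable bytes
            return "failure", flowfile
        if not isinstance(doc, dict):
            return "failure", flowfile
        renamed = {self.mapping.get(k, k): v for k, v in doc.items()}
        out = FlowFile(json.dumps(renamed).encode(), dict(flowfile.attributes))
        out.attributes["mime.type"] = "application/json"
        return "success", out


proc = RenameJsonFields({"temp": "temperature"})
rel, out = proc.on_trigger(FlowFile(b'{"temp": 21}'))
print(rel, out.content)  # success b'{"temperature": 21}'
print(proc.on_trigger(FlowFile(b"not json"))[0])  # failure
```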

10
What Is Apache NiFi Registry ?
  • A subproject of Apache NiFi
  • For storage and management of shared resources
      • Across one or more instances of NiFi and/or MiNiFi
  • Offers version control for flows
  • Defines users, groups and access policies for flows
  • Supports Linux, Unix and Mac OS X
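
NiFi Registry organizes versioned flows into buckets, with each flow holding a numbered sequence of saved versions. The Python model below is a hypothetical in-memory illustration of that idea (names such as `Bucket`, `VersionedFlow` and `commit` are assumptions), not the Registry's REST API or data model; users, groups and policies are left out.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field


@dataclass
class FlowVersion:
    version: int
    comment: str
    snapshot: dict


@dataclass
class VersionedFlow:
    name: str
    versions: list[FlowVersion] = field(default_factory=list)

    def commit(self, snapshot: dict, comment: str) -> int:
        # Versions are numbered 1, 2, 3, ...; snapshots are deep-copied so later
        # edits to the working flow do not alter saved history.
        v = FlowVersion(len(self.versions) + 1, comment, copy.deepcopy(snapshot))
        self.versions.append(v)
        return v.version

    def get(self, version: int | None = None) -> dict:
        """Return a copy of the given version, or of the latest if None."""
        if version is None:
            version = len(self.versions)
        if not 1 <= version <= len(self.versions):
            raise KeyError(f"no version {version} of flow {self.name!r}")
        return copy.deepcopy(self.versions[version - 1].snapshot)


@dataclass
class Bucket:
    name: str
    flows: dict[str, VersionedFlow] = field(default_factory=dict)

    def flow(self, name: str) -> VersionedFlow:
        return self.flows.setdefault(name, VersionedFlow(name))


bucket = Bucket("team-a")
flow = bucket.flow("ingest")
working = {"processors": ["list-source"]}
flow.commit(working, "initial version")
working["processors"].append("transform")
flow.commit(working, "add transform step")
print(flow.get(1))  # {'processors': ['list-source']}
print(flow.get())   # {'processors': ['list-source', 'transform']}
```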

11
NiFi Extension Registry
  • There was also an extension registry proposal in 2016
      • Prototyped by Puspendu Banerjee
      • Created on GitHub at
        https://github.com/PuspenduBanerjee/nifi/tree/NIFI-ExtRegistry
  • Seems like a good idea
      • A central location for extensions
  • But there has been no update to the proposal or prototype
    since 2016

12
Available Books
  • See Big Data Made Easy
      • Apress, Jan 2015
  • See Mastering Apache Spark
      • Packt, Oct 2015
  • See Complete Guide to Open Source Big Data Stack
      • Apress, Jan 2018
  • Find the author on Amazon
      • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
      • www.linkedin.com/in/mike-frampton-38563020

13
Connect
  • Feel free to connect on LinkedIn
      • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
      • open-source-systems.blogspot.com/
  • I am always interested in
      • New technology
      • Opportunities
      • Technology-based issues
      • Big data integration