Apache Fluo - PowerPoint PPT Presentation

About This Presentation
Title:

Apache Fluo

Description:

This presentation gives an overview of the Apache Fluo project. It explains Apache Fluo in terms of its architecture, functionality and transactions, and closes with links for further information and for connecting with the author.

Slides: 13
Provided by: semtechs

Transcript and Presenter's Notes


1
What Is Apache Fluo?
  • For incremental updates to large-scale data sets
  • Open source, under the Apache 2.0 license
  • Based upon Apache Accumulo
  • Uses Hadoop HDFS to store data
  • Uses ZooKeeper for configuration
  • Partitions tables into tablets
  • It is a distributed system
  • Supports cross node transactions

2
What Is Apache Fluo?
  • Allows monitoring of large data sets to identify
    small changes and join them into the larger data
    set without processing all the data
  • Transactions allow many concurrent changes
    without data corruption
  • Fluo uses code-based observers, which act on
    table column changes
  • Offers a Java-based Fluo API
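
The idea of joining small changes into a larger data set without reprocessing everything can be sketched in plain Java. This is a toy model only, not the Fluo API: the class IncrementalCounts and its methods are hypothetical names, and everything is held in memory rather than in Accumulo.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (NOT the Fluo API): fold small changes into a large aggregate. */
class IncrementalCounts {
    // The "large data set": word -> total count over all documents seen so far.
    private final Map<String, Long> totals = new HashMap<>();
    // Last known text per document, so a changed document can be retracted.
    private final Map<String, String> documents = new HashMap<>();

    /** Applies one new or changed document; only that document's words are touched. */
    void upsert(String docId, String text) {
        String old = documents.put(docId, text);
        if (old != null) {
            apply(old, -1L); // retract the previous version's contribution
        }
        apply(text, 1L);     // add the new version's contribution
    }

    private void apply(String text, long delta) {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            totals.merge(word, delta, Long::sum);
            totals.remove(word, 0L); // drop words whose count fell to zero
        }
    }

    long count(String word) {
        return totals.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        IncrementalCounts counts = new IncrementalCounts();
        counts.upsert("doc1", "fluo accumulo fluo");
        counts.upsert("doc2", "accumulo");
        counts.upsert("doc1", "fluo"); // small change: only doc1 is reprocessed
        System.out.println("fluo=" + counts.count("fluo")
            + " accumulo=" + counts.count("accumulo"));
    }
}
```

Each upsert costs work proportional to the changed document, not to the whole data set, which is the property the slide describes.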

3
What Is Apache Fluo?
  • Use of Fluo is code based and low level
  • Fluo uses Hadoop YARN to run its processes
  • Fluo uses ZooKeeper to store its metadata and
    state information
  • Fluo data is stored in Fluo tables on Accumulo
    (HDFS)
  • Fluo tables have the same structure as Accumulo
    tables, except that rows have no timestamps

4
Fluo Architecture
5
Fluo Architecture
  • Large scale computation through small scale
    transactions
  • Clients access Fluo through Java API
  • Clients ingest data through the API
  • Application oracle processes assign transaction
    timestamps
  • Application worker processes run user code
  • User code/observers monitor column changes
  • Multiple workers can run the same observers
  • Transactions change data, snapshots read data

6
Fluo Architecture
  • Fluo provides snapshot isolation
  • A snapshot only sees transactions that committed
    before it started
  • Transactions can overlap and collide
  • Write skew is possible when overlapping
    transactions concurrently update different keys
  • Fluo supports scanners to read data ranges or
    spans
  • Fluo has a transaction-based LoaderExecutor to
    aid the loading of data
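
Snapshot isolation, collisions and write skew can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not Fluo's implementation or API: ToySnapshotStore, Tx, begin, get, set and commit are hypothetical names, a single counter stands in for the oracle, and a commit is rejected only when another transaction has written the same key since the snapshot was taken.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

/** Toy multi-version store (NOT the Fluo API) showing snapshot isolation. */
class ToySnapshotStore {
    // Stand-in for the oracle: hands out increasing timestamps.
    private final AtomicLong oracle = new AtomicLong();
    // key -> (commit timestamp -> value)
    private final Map<String, TreeMap<Long, Integer>> versions = new HashMap<>();

    final class Tx {
        private final long startTs = oracle.incrementAndGet();
        private final Map<String, Integer> writes = new HashMap<>();

        /** Reads the newest value committed at or before this snapshot's start. */
        Integer get(String key) {
            if (writes.containsKey(key)) {
                return writes.get(key);
            }
            synchronized (ToySnapshotStore.this) {
                TreeMap<Long, Integer> history = versions.get(key);
                if (history == null) {
                    return null;
                }
                Map.Entry<Long, Integer> visible = history.floorEntry(startTs);
                return visible == null ? null : visible.getValue();
            }
        }

        void set(String key, int value) {
            writes.put(key, value);
        }

        /** Fails if any written key was committed by someone else after startTs. */
        boolean commit() {
            synchronized (ToySnapshotStore.this) {
                for (String key : writes.keySet()) {
                    TreeMap<Long, Integer> history = versions.get(key);
                    if (history != null && history.lastKey() > startTs) {
                        return false; // collision on the same key
                    }
                }
                long commitTs = oracle.incrementAndGet();
                for (Map.Entry<String, Integer> w : writes.entrySet()) {
                    versions.computeIfAbsent(w.getKey(), k -> new TreeMap<>())
                            .put(commitTs, w.getValue());
                }
                return true;
            }
        }
    }

    Tx begin() {
        return new Tx();
    }

    public static void main(String[] args) {
        ToySnapshotStore store = new ToySnapshotStore();
        Tx init = store.begin();
        init.set("a", 1);
        init.set("b", 1);
        init.commit();

        // Write skew: both transactions see a + b == 2, each zeroes a different key.
        Tx t1 = store.begin();
        Tx t2 = store.begin();
        if (t1.get("a") + t1.get("b") == 2) t1.set("a", 0);
        if (t2.get("a") + t2.get("b") == 2) t2.set("b", 0);
        System.out.println("t1 committed: " + t1.commit());
        System.out.println("t2 committed: " + t2.commit());
    }
}
```

In this model both transactions commit because they wrote different keys, even though each decision was based on a value the other one changed. That is the write skew case named on the slide; two transactions writing the same key would instead collide and one would fail.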

7
Fluo Architecture
  • Fluo supports incremental processing via
    notifications and observers
  • Notifications are persistent markers set by a
    transaction that indicate an observer should run
    later for a certain row/column
  • Observers are user-provided code that is
    registered to process notifications for a
    certain column
  • An observer receives the row/column that
    triggered it plus a transaction
  • Fluo worker processes running across a cluster
    execute the observers

8
Fluo Architecture
  • Fluo supports two types of notification
  • Strong notifications guarantee an observer will
    run at most once when a column is modified, even
    for multiple updates to the same row/column
  • Weak notifications cause an observer to run at
    least once
  • Observers may run multiple times and/or
    concurrently based on a single weak notification
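
How notifications, observers and workers fit together can be sketched with a single-threaded toy model. It is not the Fluo API: ToyNotifications, Observer, observe, set, get and runWorker are hypothetical names, and the model only covers the strong-notification case, where repeated updates to one row/column before the observer runs leave a single marker.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (NOT the Fluo API) of notifications and observers. */
class ToyNotifications {
    /** User code: receives the store plus the row/column that triggered it. */
    interface Observer {
        void process(ToyNotifications store, String row, String column);
    }

    private final Map<String, Observer> observers = new HashMap<>();           // column -> observer
    private final Map<Map.Entry<String, String>, String> table = new HashMap<>(); // (row, column) -> value
    private final Set<Map.Entry<String, String>> pending = new LinkedHashSet<>(); // notification markers

    private static Map.Entry<String, String> key(String row, String column) {
        return new SimpleImmutableEntry<>(row, column);
    }

    /** Registers an observer for one column. */
    void observe(String column, Observer observer) {
        observers.put(column, observer);
    }

    /** Writes a value; if the column is observed, leaves one marker per row/column. */
    void set(String row, String column, String value) {
        table.put(key(row, column), value);
        if (observers.containsKey(column)) {
            pending.add(key(row, column)); // a set, so repeated updates collapse
        }
    }

    String get(String row, String column) {
        return table.get(key(row, column));
    }

    /** Stand-in for a worker: runs observers for all pending markers, returns how many ran. */
    int runWorker() {
        List<Map.Entry<String, String>> batch = new ArrayList<>(pending);
        pending.clear();
        for (Map.Entry<String, String> marker : batch) {
            observers.get(marker.getValue()).process(this, marker.getKey(), marker.getValue());
        }
        return batch.size();
    }

    public static void main(String[] args) {
        ToyNotifications store = new ToyNotifications();
        store.observe("count", (s, row, col) ->
            System.out.println("observer saw " + row + "/" + col + " = " + s.get(row, col)));
        store.set("r1", "count", "1");
        store.set("r1", "count", "2"); // same row/column: still a single marker
        System.out.println("observers run: " + store.runWorker());
    }
}
```

A weak notification would differ in that the same marker could legitimately trigger the observer more than once, so weak observers need to tolerate repeated or concurrent runs.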

9
Fluo Row Locking
10
Fluo Row Locking
  • For cross-node transactions, Fluo uses Accumulo
    conditional mutations
  • Conditional mutations lock entire rows on the
    server side when checking conditions
  • Row locks can impact transaction performance
  • This may be a problem if many transactions
    update separate columns in a row and those
    transactions are very likely to run concurrently
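
The row-locking concern can be sketched with a toy check-and-set store. This is an illustrative model only, not Accumulo's ConditionalWriter or its server-side implementation: ToyConditionalRows, putIfEquals and lockFor are hypothetical names, and one in-process lock per row stands in for the server-side row lock.

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model (NOT the Accumulo API): conditional updates guarded by a per-row lock. */
class ToyConditionalRows {
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Map<String, Map<String, String>> rows = new ConcurrentHashMap<>();

    /** Every column of a row shares this one lock. */
    ReentrantLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    /**
     * Sets row/column to value only if the current value equals expected
     * (null means "column must be absent"). The whole row is locked while
     * the condition is checked and the write is applied.
     */
    boolean putIfEquals(String row, String column, String expected, String value) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            Map<String, String> columns =
                rows.computeIfAbsent(row, r -> new ConcurrentHashMap<>());
            if (!Objects.equals(columns.get(column), expected)) {
                return false; // condition failed, nothing written
            }
            columns.put(column, value);
            return true;
        } finally {
            lock.unlock();
        }
    }

    String get(String row, String column) {
        Map<String, String> columns = rows.get(row);
        return columns == null ? null : columns.get(column);
    }

    public static void main(String[] args) {
        ToyConditionalRows store = new ToyConditionalRows();
        System.out.println(store.putIfEquals("r1", "a", null, "1")); // condition holds
        System.out.println(store.putIfEquals("r1", "a", null, "2")); // condition fails
        // Updates to r1/a and r1/b contend for the same lock object.
        System.out.println(store.lockFor("r1") == store.lockFor("r1"));
    }
}
```

Because the lock is per row, updates to r1/a and r1/b are serialized even though they touch different columns, while updates to different rows do not contend. That is why many concurrent transactions on separate columns of one row can hurt performance.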

11
Available Books
  • See Big Data Made Easy (Apress, Jan 2015)
  • See Mastering Apache Spark (Packt, Oct 2015)
  • See Complete Guide to Open Source Big Data
    Stack (Apress, Jan 2018)
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

12
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration