Apache Tephra - PowerPoint PPT Presentation

About This Presentation
Title:

Apache Tephra

Description:

This presentation gives an overview of the Apache Tephra project. It explains Tephra in terms of Phoenix, HBase and HDFS, and examines the project architecture and configuration. It also provides links for further information and for connecting with the author.

Slides: 15
Provided by: semtechs


Transcript and Presenter's Notes

Title: Apache Tephra


1
What Is Apache Tephra?
  • Provides transactions for HBase and Phoenix
  • Apache incubating project
  • Uses HBase's native data versioning to provide
    multi-version concurrency control (MVCC) for
    transactional reads and writes
  • Provides snapshot isolation of concurrent
    transactions
  • Open source / Apache 2.0 license
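To make the MVCC idea more concrete, here is a minimal Python sketch of a snapshot-isolation read. It is a hypothetical model, not Tephra's API: `Snapshot` and `read_latest` are illustrative names. It assumes each cell version is tagged with the ID of the transaction that wrote it (in the spirit of reusing HBase's cell versions), and that a transaction's snapshot records an upper bound on visible IDs, the transactions still in progress when it started, and any invalidated transactions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical per-transaction view of the global transaction state."""
    tx_id: int                                   # this transaction's own write ID
    read_pointer: int                            # upper bound on visible write IDs
    in_progress: FrozenSet[int] = frozenset()    # IDs still running when this tx started
    invalid: FrozenSet[int] = frozenset()        # IDs of failed or timed-out transactions

    def is_visible(self, write_id: int) -> bool:
        if write_id == self.tx_id:
            return True                          # a transaction always sees its own writes
        return (write_id <= self.read_pointer
                and write_id not in self.in_progress
                and write_id not in self.invalid)


def read_latest(versions: Dict[int, str], snapshot: Snapshot) -> Optional[str]:
    """Return the newest version of one cell that `snapshot` may see.

    `versions` maps the writing transaction's ID (used here as the cell
    version) to the stored value.
    """
    for write_id in sorted(versions, reverse=True):
        if snapshot.is_visible(write_id):
            return versions[write_id]
    return None


cell = {2: "committed", 4: "from invalid tx", 5: "uncommitted"}
reader = Snapshot(tx_id=7, read_pointer=6,
                  in_progress=frozenset({5}), invalid=frozenset({4}))
print(read_latest(cell, reader))                                 # committed
print(read_latest(cell, Snapshot(tx_id=5, read_pointer=3)))     # uncommitted
```

A reader therefore skips versions from transactions that were still running when it started, even if they commit later, which is what gives each transaction a stable snapshot.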

2
Tephra Architecture
  • Tephra has three main components
  • Transaction Server
      • Maintains global view of transaction state
      • Assigns new transaction IDs
      • Performs conflict detection
  • Transaction Client
      • Coordinates the start, commit and rollback of
        transactions
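The transaction server's core jobs, handing out transaction IDs and detecting conflicts, can be modelled in a few lines of Python. This is an illustrative single-process sketch, not Tephra's server or client API: `ToyTransactionManager` and its methods are hypothetical, and it assumes a first-committer-wins rule where a commit fails if a transaction that committed after it started changed any of the same keys.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Transaction:
    tx_id: int
    read_pointer: int
    in_progress: FrozenSet[int]


class ToyTransactionManager:
    """Hypothetical in-memory model of a transaction server."""

    def __init__(self) -> None:
        self._last_id = 0
        self._read_pointer = 0                   # highest committed transaction ID
        self._in_progress: Set[int] = set()
        self._committed: List[Tuple[int, FrozenSet[str]]] = []  # (commit ID, change set)

    def _next_id(self) -> int:
        self._last_id += 1                       # IDs are strictly increasing
        return self._last_id

    def start(self) -> Transaction:
        tx = Transaction(self._next_id(), self._read_pointer,
                         frozenset(self._in_progress))
        self._in_progress.add(tx.tx_id)
        return tx

    def commit(self, tx: Transaction, changes: Iterable[str]) -> bool:
        change_set = frozenset(changes)
        for commit_id, committed in self._committed:
            # Committed after tx started and touched an overlapping key: conflict.
            if commit_id > tx.tx_id and change_set & committed:
                return False                     # caller must roll back, then abort()
        self._committed.append((self._next_id(), change_set))
        self._in_progress.discard(tx.tx_id)
        self._read_pointer = max(self._read_pointer, tx.tx_id)
        return True

    def abort(self, tx: Transaction) -> None:
        # Assumes the client has already rolled back its own writes.
        self._in_progress.discard(tx.tx_id)


tm = ToyTransactionManager()
t1, t2 = tm.start(), tm.start()
print(tm.commit(t1, ["row1"]))           # True
print(tm.commit(t2, ["row1", "row2"]))   # False: t1 changed row1 after t2 started
tm.abort(t2)
```

Keeping every committed change set forever keeps the sketch short; a real server would need to discard change sets once no running transaction could still conflict with them.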

3
Tephra Architecture
  • Tephra has three main components
  • TransactionProcessor Coprocessor
      • Applies filtering to the data read, based on a
        given transaction's state
      • Cleans up data from old transactions that are
        no longer visible
  • Multiple transaction server instances can run
    concurrently
      • Allows for automatic failover
      • Only one server instance actively serves
        requests at a time
      • Coordinated through ZooKeeper
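The cleanup half of the coprocessor's job can be sketched as a pruning function over the versions of a single cell. This is a hypothetical Python model, not Tephra's implementation: `prune_versions` and `prune_bound` are illustrative names, and the sketch assumes the caller supplies a bound below which every non-invalid version is committed and visible to all current and future readers, so only the newest such version can still be returned by a read.

```python
from typing import AbstractSet, List, Tuple

Version = Tuple[int, str]  # (writing transaction ID, value)


def prune_versions(versions: List[Version], invalid: AbstractSet[int],
                   prune_bound: int) -> List[Version]:
    """Drop cell versions that no reader can ever see again."""
    # Data written by invalid transactions is never visible, so it can go.
    live = sorted(v for v in versions if v[0] not in invalid)
    old = [v for v in live if v[0] < prune_bound]
    recent = [v for v in live if v[0] >= prune_bound]
    # Below the bound every reader sees the same newest version; older ones are dead.
    return old[-1:] + recent


cell = [(9, "d"), (1, "a"), (5, "x"), (3, "b"), (7, "c")]
print(prune_versions(cell, invalid={5}, prune_bound=8))  # [(7, 'c'), (9, 'd')]
```

Computing a safe `prune_bound` from the transaction server's state is the hard part in practice; the sketch deliberately leaves it to the caller.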

4
Tephra Phoenix
  • Tephra is an incubating Apache project
  • Phoenix uses Tephra for transaction support, so
    this functionality is in a beta stage
  • It provides cross-row and cross-table
    transaction support with full ACID semantics
  • Remember that Phoenix uses HBase as its backing
    store
  • The next slides show the configuration

5
Phoenix Architecture (Reminder)
6
Tephra Phoenix Config
  • Add the following config to your client-side
    hbase-site.xml file to enable transactions:

    <property>
      <name>phoenix.transactions.enabled</name>
      <value>true</value>
    </property>

7
Tephra Phoenix Config
  • Add the following config to your server-side
    hbase-site.xml file to configure the transaction
    manager's snapshot directory:

    <property>
      <name>data.tx.snapshot.dir</name>
      <value>/tmp/tephra/snapshots</value>
    </property>

8
Tephra Phoenix Config
  • Add the following config to your server-side
    hbase-site.xml file to set the transaction
    timeout:

    <property>
      <name>data.tx.timeout</name>
      <value>60</value>
    </property>

  • Then start Tephra for Phoenix:
    ./bin/tephra
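Because the same `<property>` block is repeated for every setting, it can help to generate it. The sketch below uses Python's standard `xml.etree.ElementTree` to render a `<configuration>` element from a dictionary; `render_site_xml` is a hypothetical helper, and its output would still need to be merged into an existing hbase-site.xml rather than replacing it.

```python
import xml.etree.ElementTree as ET
from typing import Mapping


def render_site_xml(properties: Mapping[str, str]) -> str:
    """Render Hadoop-style <property> entries inside a <configuration> root."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode")


# Values taken from the three configuration slides above.
client_side = {"phoenix.transactions.enabled": "true"}
server_side = {"data.tx.snapshot.dir": "/tmp/tephra/snapshots",
               "data.tx.timeout": "60"}
print(render_site_xml(client_side))
print(render_site_xml(server_side))
```

Generating the XML also avoids hand-editing mistakes such as mismatched tags, which would prevent the file from parsing.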

9
Tephra Requirements
  • Java: 1.7.xx / 1.8.xx
  • HDFS
      • Apache Hadoop: 2.0.2-alpha - 2.7.x
      • CDH or HDP: (CDH) 5.0.0 - 5.12.0 / (HDP) 2.0 - 2.6
      • MapR: 4.1 - 5.1 (with MapR-FS)
  • HBase
      • Apache: 0.96.x, 0.98.x, 1.0.x, 1.1.x, 1.2.x,
        1.3.x (except 1.1.5 and 1.2.2) and 2.0.x
      • CDH or HDP: (CDH) 5.0.0 - 5.12.0 / (HDP) 2.0 - 2.6
      • MapR: 4.1 - 5.1 (with Apache HBase)
  • ZooKeeper
      • Apache: 3.4.3 - 3.4.5
      • CDH or HDP: (CDH) 5.0.0 - 5.12.0 / (HDP) 2.0 - 2.6
      • MapR: 4.1 - 5.1
10
Tephra Transaction Server Config
  • Add changes to hbase-site.xml:
      • data.tx.bind.port = 15165: port to bind to
      • data.tx.bind.address = 0.0.0.0: server address
        to listen on
      • data.tx.server.io.threads = 2: number of threads
        for socket IO
      • data.tx.server.threads = 20: number of handler
        threads
      • data.tx.timeout = 30: timeout for a transaction
        to complete
      • data.tx.long.timeout = 86400: timeout for a
        long-running transaction to complete
      • data.tx.cleanup.interval = 10: frequency of
        checks for timed-out transactions
      • data.tx.snapshot.dir: HDFS directory used to
        store snapshots
      • data.tx.snapshot.interval = 300: frequency of
        writing new snapshots
      • data.tx.snapshot.retain = 10: number of old
        transaction snapshots to retain
      • data.tx.metrics.period = 60: frequency of
        metrics reporting
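The two timeouts interact with the cleanup interval: at each cleanup interval the server checks for in-progress transactions that have run longer than their timeout. A minimal Python sketch of that check follows. `find_timed_out` is a hypothetical helper, not part of Tephra; it assumes start times and timeouts are both in seconds and that the caller knows which transactions were started as long-running.

```python
from typing import Dict, List, Set


def find_timed_out(start_times: Dict[int, float], long_running: Set[int],
                   now: float, timeout: float = 30,
                   long_timeout: float = 86400) -> List[int]:
    """Return IDs of in-progress transactions that exceeded their timeout.

    `start_times` maps each in-progress transaction ID to its start time;
    IDs in `long_running` use `long_timeout` (data.tx.long.timeout) instead
    of `timeout` (data.tx.timeout).
    """
    expired = []
    for tx_id, started in start_times.items():
        limit = long_timeout if tx_id in long_running else timeout
        if now - started > limit:
            expired.append(tx_id)
    return sorted(expired)


starts = {1: 0.0, 2: 50.0, 3: 0.0}
print(find_timed_out(starts, long_running={3}, now=60.0))  # [1]
```

What the server then does with the expired IDs (treating them as failed so their writes stay invisible) is outside this sketch.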
11
Tephra Transaction Client Config
  • Add changes to hbase-site.xml:
      • data.tx.client.timeout = 30000: client socket
        timeout (milliseconds)
      • data.tx.client.provider = pool: client provider
        strategy; "pool" uses a pool of clients,
        "thread-local" uses one client per thread
      • data.tx.client.count = 50: maximum number of
        clients for the "pool" provider
      • data.tx.client.obtain.timeout = 3000: timeout (ms)
        for getting a client from the pool provider
      • data.tx.client.retry.strategy = backoff: client
        retry strategy (backoff or n-times)
      • data.tx.client.retry.attempts = 2: number of
        times to retry (n-times)
      • data.tx.client.retry.backoff.initial = 100:
        initial sleep time (backoff)
      • data.tx.client.retry.backoff.factor = 4:
        multiplication factor for the sleep time
      • data.tx.client.retry.backoff.limit = 30000: exit
        when the sleep time reaches this limit
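With the backoff strategy, the three backoff settings determine how long a client keeps retrying. The sketch below computes the sleep schedule under assumed semantics: times are in milliseconds, the sleep starts at the initial value, is multiplied by the factor after each retry, and retrying stops once the next sleep would exceed the limit. `backoff_sleeps` is a hypothetical helper, not part of Tephra.

```python
from typing import List


def backoff_sleeps(initial: int = 100, factor: int = 4,
                   limit: int = 30000) -> List[int]:
    """Sleep times (assumed ms) between retries under multiplicative backoff."""
    if initial <= 0 or factor <= 1:
        raise ValueError("need initial > 0 and factor > 1 for the schedule to end")
    sleeps = []
    sleep = initial
    while sleep <= limit:        # give up once the next sleep would pass the limit
        sleeps.append(sleep)
        sleep *= factor
    return sleeps


print(backoff_sleeps())  # [100, 400, 1600, 6400, 25600]
```

Under these assumptions the listed values give five retries with sleeps of 100, 400, 1600, 6400 and 25600 ms (34.1 s in total). Whether "reaches this limit" includes the limit itself makes no difference here, because no sleep lands exactly on 30000.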
12
Tephra HBase Coprocessor Configuration
  • Tephra requires an HBase coprocessor to be
    installed on all tables where transactional reads
    and writes will be performed. Add this change to
    hbase-site.xml:

    <property>
      <name>hbase.coprocessor.region.classes</name>
      <value>org.apache.tephra.hbase.coprocessor.TransactionProcessor</value>
    </property>

  • Once configured, start Tephra with the Tephra binary:
    ./bin/tephra start
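The `hbase.coprocessor.region.classes` property takes a comma-separated list of classes, so if other region coprocessors are already configured, the Tephra class should be appended rather than overwriting them. A small Python sketch of that merge follows; `add_region_coprocessor` is a hypothetical helper and `com.example.AuditObserver` is a made-up class name used only for illustration.

```python
from typing import Optional

TEPHRA_PROCESSOR = "org.apache.tephra.hbase.coprocessor.TransactionProcessor"


def add_region_coprocessor(current: Optional[str], cls: str = TEPHRA_PROCESSOR) -> str:
    """Append `cls` to a comma-separated class list, keeping existing entries."""
    classes = [c.strip() for c in (current or "").split(",") if c.strip()]
    if cls not in classes:           # avoid registering the same class twice
        classes.append(cls)
    return ",".join(classes)


# com.example.AuditObserver is a hypothetical, already-configured coprocessor.
print(add_region_coprocessor("com.example.AuditObserver"))
```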

13
Available Books
  • See Big Data Made Easy
  • Apress Jan 2015
  • See Mastering Apache Spark
  • Packt Oct 2015
  • See Complete Guide to Open Source Big Data
    Stack
  • Apress Jan 2018
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

14
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration