Title: Apache Kudu


1
What Is Apache Kudu?
  • A column oriented data store
  • Open source / Apache 2.0 license
  • Written in C++
  • Provides fast processing of OLAP workloads
  • Integrates with MapReduce, Spark, Impala and the wider
    Hadoop ecosystem
  • Scales to large datasets and large clusters
  • Consistency requirements can be chosen on a per-request
    basis

2
Kudu Architecture
  • Kudu tables are split into tablet units
  • A Kudu cluster may have multiple Masters
  • One Master will lead whilst the others follow
  • Tablet servers store and serve tablet data
  • Raft consensus is used to elect leaders and
    followers
  • For any given tablet, one tablet server acts as leader
    whilst the others follow
  • This architecture supports fault tolerance and high
    availability
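
The leader election mentioned above depends on a majority of replicas agreeing. The following is a minimal sketch of that majority arithmetic only, not of Raft itself or of Kudu's implementation; the function names are illustrative assumptions.

```python
def majority(replicas: int) -> int:
    """Smallest number of votes that forms a majority of `replicas`."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority can still be formed."""
    return replicas - majority(replicas)


def wins_election(votes_received: int, replicas: int) -> bool:
    """A candidate becomes leader once it holds a majority of the votes."""
    return votes_received >= majority(replicas)


if __name__ == "__main__":
    for n in (3, 5):
        print(f"{n} replicas: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

Under this arithmetic, three masters (or three replicas of a tablet) can lose one member and still elect a leader, which is where the fault tolerance and high availability come from.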

3
Kudu Architecture

4
Kudu Schema
  • Structured data model similar to an RDBMS
  • Three main concerns for schema design: column design,
    primary key design and partitioning design
  • Kudu has strongly-typed columns
  • It uses a columnar on-disk storage format
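
To illustrate why a columnar layout suits analytic scans, here is a minimal sketch in plain Python. It does not reflect Kudu's actual on-disk format; the table contents and the `to_columns` helper are hypothetical.

```python
# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"id": 1, "host": "a", "cpu": 0.25},
    {"id": 2, "host": "b", "cpu": 0.5},
    {"id": 3, "host": "c", "cpu": 0.75},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column only touches that column's values;
# the "id" and "host" values are never read.
avg_cpu = sum(columns["cpu"]) / len(columns["cpu"])

if __name__ == "__main__":
    print(columns)
    print(avg_cpu)
```

Because every value in a column has the same type, a real columnar store can also encode and compress each column tightly, which is one reason strongly-typed columns and columnar storage go together.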

5
Kudu Schema
  • A good schema design should achieve
  • Efficient partition design
  • Even distribution of data across tablet servers
  • Even distribution of reads/writes across tablet
    servers
  • Even growth of data across tablet servers
  • Scans that read the minimum amount of data necessary
  • The last point is also affected by primary key design

6
Kudu Partitioning
  • Partitioning involves splitting tables into tablets,
    which are spread across tablet servers
  • Partitioning affects performance
  • Aim to partition evenly across the cluster
  • Strategies include range, hash and multilevel
    partitioning
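
The three strategies can be sketched as simple functions that map a row to a partition. This is an illustration under stated assumptions: CRC32 stands in for whatever hash function Kudu really uses, and the function names, bucket counts and split points are hypothetical.

```python
import bisect
import zlib
from typing import Sequence, Tuple


def hash_partition(key: str, num_buckets: int) -> int:
    """Hash partitioning: spread keys across a fixed number of buckets."""
    # CRC32 is an assumption chosen because it is deterministic across runs.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def range_partition(value: int, split_points: Sequence[int]) -> int:
    """Range partitioning: split_points must be sorted ascending.

    Partition i holds values in [split_points[i-1], split_points[i]).
    """
    return bisect.bisect_right(split_points, value)


def multilevel_partition(host: str, timestamp: int, num_buckets: int,
                         split_points: Sequence[int]) -> Tuple[int, int]:
    """Multilevel partitioning: combine a hash level with a range level."""
    return (hash_partition(host, num_buckets),
            range_partition(timestamp, split_points))


if __name__ == "__main__":
    print(range_partition(15, [10, 20]))
    print(multilevel_partition("host-1", 15, 4, [10, 20]))
```

Hashing tends to spread writes evenly, while ranges keep related values (such as a time window) together so that scans can skip whole partitions; combining the two aims for both properties.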

7
Kudu Column Types
  • Supported column types include
  • boolean
  • 8-bit signed integer
  • 16-bit / 32-bit / 64-bit signed integer
  • date (32-bit days since the Unix epoch)
  • unixtime_micros (64-bit microseconds since the
    Unix epoch)
  • single-precision (32-bit) IEEE-754 floating-point
    number
  • double-precision (64-bit) IEEE-754 floating-point
    number
  • decimal
  • varchar
  • UTF-8 encoded string (up to 64KB uncompressed)
  • binary (up to 64KB uncompressed)
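
The encodings described for some of these types can be sketched with the Python standard library. This is not the Kudu client API; the helper names are hypothetical and only show what the stored integer values mean.

```python
from datetime import date, datetime, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_days_since_epoch(d: date) -> int:
    """date column: whole days since the Unix epoch."""
    return (d - EPOCH_DATE).days


def to_unixtime_micros(dt: datetime) -> int:
    """unixtime_micros column: microseconds since the Unix epoch.

    `dt` must be timezone-aware; integer arithmetic avoids float rounding.
    """
    delta = dt - EPOCH_DATETIME
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


def signed_range(bits: int):
    """Smallest and largest value of a signed two's-complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits_string_limit(text: str, limit_bytes: int = 64 * 1024) -> bool:
    """Check a string against the 64KB limit, measured in UTF-8 bytes."""
    return len(text.encode("utf-8")) <= limit_bytes


if __name__ == "__main__":
    print(to_days_since_epoch(date(2000, 1, 1)))
    print(signed_range(8))
```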

8
Kudu Replication
  • Kudu is rack aware
  • It knows the server rack assignments
  • It replicates operations, not on-disk data
  • This is logical replication rather than physical
    replication
  • Inserts and updates do transmit data over the
    network
  • Deletes do not need to move any data
  • Compaction does not transmit the data over the
    network
  • Tablet replicas performing compactions don't need to
  • Compact at the same time
  • Use the same schedule
  • Remain in synchronisation at the physical storage layer
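
Logical replication can be sketched as shipping each operation to every replica, which then applies it locally. The `Replica` class and `replicate` function below are hypothetical illustrations, not Kudu code.

```python
class Replica:
    """A toy tablet replica that applies operations to its own local state."""

    def __init__(self):
        self.rows = {}

    def apply(self, op):
        kind, key, value = op
        if kind in ("insert", "update"):
            self.rows[key] = value
        elif kind == "delete":
            # Only the key is needed; the replica removes its own copy.
            self.rows.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {kind}")


def replicate(op, replicas):
    """Send the same logical operation to every replica."""
    for replica in replicas:
        replica.apply(op)


replicas = [Replica() for _ in range(3)]
for op in [
    ("insert", "k1", "v1"),
    ("insert", "k2", "v2"),
    ("update", "k2", "v2b"),
    ("delete", "k1", None),
]:
    replicate(op, replicas)

if __name__ == "__main__":
    print([r.rows for r in replicas])
```

Each replica ends up with the same logical rows, yet nothing in the sketch requires them to organise or compact their local storage in the same way or at the same time.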

9
Kudu Replication Terms
  • Hot replica: a tablet replica that is continuously
    receiving writes
  • Cold replica: a tablet replica that is not hot, i.e. one
    that is not frequently receiving writes
  • Data on disk: the total amount of data stored on a tablet
    server, across all disks

10
Kudu Example Scale
  • 3 master servers
  • 100 tablet servers
  • 8 TiB of stored data per tablet server, post-replication
    and post-compression
  • 1000 tablets per tablet server, post-replication
  • 60 tablets per table, per tablet server, at
    table-creation time
  • 10 GiB of stored data per tablet
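
The cluster-wide totals implied by these figures can be worked out with a little arithmetic. The replication factor of 3 below is an assumption (it is Kudu's default but is not stated on the slide), and the variable names are illustrative.

```python
TABLET_SERVERS = 100
STORED_TIB_PER_SERVER = 8      # post-replication, post-compression
TABLETS_PER_SERVER = 1000      # post-replication
REPLICATION_FACTOR = 3         # assumption: not stated on the slide

# Totals across the whole cluster, counting every replica.
total_stored_tib = TABLET_SERVERS * STORED_TIB_PER_SERVER
total_tablet_replicas = TABLET_SERVERS * TABLETS_PER_SERVER

# Dividing by the replication factor gives the pre-replication view.
unreplicated_tib = total_stored_tib / REPLICATION_FACTOR
distinct_tablets = total_tablet_replicas // REPLICATION_FACTOR

# Average data per tablet replica if a server holds both maxima at once.
avg_gib_per_tablet = STORED_TIB_PER_SERVER * 1024 / TABLETS_PER_SERVER

if __name__ == "__main__":
    print(total_stored_tib, total_tablet_replicas)
    print(round(unreplicated_tib, 1), distinct_tablets)
    print(avg_gib_per_tablet)
```

Under these assumptions the average tablet replica holds roughly 8.2 GiB, which sits below the 10 GiB per-tablet figure on the slide.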

11
Available Books
  • See Big Data Made Easy (Apress, Jan 2015)
  • See Mastering Apache Spark (Packt, Oct 2015)
  • See Complete Guide to Open Source Big Data Stack
    (Apress, Jan 2018)
  • Find the author on Amazon
  • www.amazon.com/Michael-Frampton/e/B00NIQDOOM/
  • Connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020

12
Connect
  • Feel free to connect on LinkedIn
  • www.linkedin.com/in/mike-frampton-38563020
  • See my open source blog at
  • open-source-systems.blogspot.com/
  • I am always interested in
  • New technology
  • Opportunities
  • Technology based issues
  • Big data integration