An introduction to Cloudera Impala

Description:

An introduction to Cloudera Impala: what is it, and how does it work? How can it bring real-time performance gains to Apache Hadoop?

Slides: 9
Provided by: semtechs

Transcript and Presenter's Notes


1
Impala
  • What is it?
  • How does it work?
  • Performance
  • Formats
  • Architecture

www.semtech-solutions.co.nz info_at_semtech-solutions.co.nz
2
Impala: What is it?
  • Ad hoc, real-time queries for Hadoop
  • Open source
  • Developed by Cloudera
  • Based on Google's 2010 Dremel paper
  • Direct data access via Impala engine
  • A future Parquet update for Hadoop will
    • Add columnar binary storage to Hadoop
    • Improve Impala performance
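
To illustrate why columnar storage can help query performance, here is a minimal sketch in Python. It is a toy model only and does not reflect Parquet's actual on-disk format; the sample records and the `to_columns` helper are assumptions made up for this example.

```python
# Toy illustration only: this is NOT Parquet's real file layout.
# The sample records below are invented for the example.
rows = [
    {"id": 1, "country": "NZ", "amount": 10.0},
    {"id": 2, "country": "AU", "amount": 25.5},
    {"id": 3, "country": "NZ", "amount": 4.5},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: summing one field still walks every full record.
total_from_rows = sum(record["amount"] for record in rows)

# Columnar layout: the same sum touches only the "amount" column.
total_from_columns = sum(columns["amount"])
```

Both totals are the same; the difference is how much data has to be read to compute them, which is the reason a columnar format is expected to speed up queries that touch only a few columns.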

3
Impala: How does it work?
  • Direct data access
  • Query planning / coordination on data nodes
  • Node based query engine
  • Low latency
  • Performance improvement
  • Query data on HDFS or HBase
  • Uses the same HiveQL syntax (SQL-like)
  • Has the Hue GUI
  • Allows table joins and aggregation
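
As a rough sketch of the kind of join-plus-aggregation query described above, the Python example below runs a plain SQL statement against an in-memory SQLite database. SQLite is only a stand-in so that the example is runnable anywhere; it is not Impala, and the `customers` and `orders` tables are hypothetical. The query itself uses only basic `JOIN`, `GROUP BY`, `COUNT` and `SUM`, which are assumed to carry over to HiveQL-style syntax.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
# Table names and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, country TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'NZ'), (2, 'AU');
    INSERT INTO orders VALUES (1, 10.0), (1, 4.5), (2, 25.5);
""")

# Join orders to customers, then aggregate per country.
QUERY = """
    SELECT c.country, COUNT(*) AS order_count, SUM(o.amount) AS total
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    GROUP BY c.country
    ORDER BY c.country
"""

result = conn.execute(QUERY).fetchall()
conn.close()
```

Here `result` holds one tuple per country with the order count and the summed amount.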

4
Impala: Performance
  • Impala delivers performance gains
    • I/O-bound queries (limited by hardware)
      • Minimum 3 times faster
    • Complex queries with multiple MapReduce stages
      • Minimum 7 times faster
    • Cached queries
      • Minimum 20 times faster
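
The minimum factors above can be turned into simple arithmetic. The sketch below is an assumption-laden illustration: the slide does not name the baseline the factors are measured against, the dictionary keys are invented labels, and the 420-second baseline runtime is a made-up figure.

```python
# Minimum speed-up factors as quoted on the slide.
# The keys are hypothetical labels chosen for this example.
MIN_SPEEDUP = {
    "io_bound": 3,
    "multi_stage_mapreduce": 7,
    "cached": 20,
}


def max_expected_runtime(baseline_seconds, query_type):
    """Upper bound on runtime if the minimum speed-up factor holds."""
    return baseline_seconds / MIN_SPEEDUP[query_type]


# Hypothetical baseline: a query that takes 420 seconds today.
baseline = 420
estimates = {
    query_type: max_expected_runtime(baseline, query_type)
    for query_type in MIN_SPEEDUP
}
```

Because the factors are stated as minimums, each value in `estimates` is a worst-case runtime rather than a prediction.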

5
Impala: Formats
  • Supported formats
    • Text files and SequenceFiles, which can be compressed with
      • Snappy
      • GZIP
      • BZIP
  • Future support for
    • Avro
    • RCFile
    • LZO text file
    • Parquet

6
Impala: Architecture
7
Impala: Requirements
  • What does Impala need to run?
    • CentOS 6.2 or RHEL (Red Hat Enterprise Linux)
    • CDH 4.1 (Cloudera Hadoop Distribution)
    • Cloudera Manager (advised)

8
Contact Us
  • Feel free to contact us at
    • www.semtech-solutions.co.nz
    • info_at_semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You pay only for the hours you need to solve your problems