An introduction to Databricks - PowerPoint PPT Presentation

About This Presentation
Title:

An introduction to Databricks

Description:

An introduction to Databricks: what is it, how does it work, and what can it do?

Number of Views:3826
Slides: 11
Provided by: semtechs

Transcript and Presenter's Notes

1
Databricks
  • What is Databricks?
  • Cloud services used
  • Functionality
  • Languages
  • Spark Usage
  • 3rd Party Apps
  • Architecture
  • Books

www.semtech-solutions.co.nz info_at_semtech-solutions.co.nz
2
Databricks What is it?
  • A cloud-based Apache Spark cluster service
  • Offers scalable Spark clusters based on AWS
  • Developed by the same people who created Spark
  • Multiple cluster management
  • Job scheduling and library import
  • Offers access to all Spark modules

3
Databricks Cloud Services
  • Currently uses Amazon AWS
  • Uses EC2 and has access to S3 buckets
  • Uses a minimum of 2 EC2 instances
  • Attempts to optimise EC2 usage
  • Plans to extend to other cloud providers

4
Databricks Functionality
  • Architecture based on Notebooks and folders
  • Has a cluster manager for
    • Defined (min 54 GB) clusters
    • Spot clusters
    • On-demand clusters
  • Has a job manager and scheduler
  • Has user management
  • Has full Spark functionality
  • Has strong data visualisation capability
  • Can export reports and dashboards

5
Databricks Languages
  • Can have Notebooks in
    • Scala
    • Python
    • SQL
  • SQL can be executed in non-SQL Notebooks
  • Markdown comments can be placed in Notebooks
  • Notebooks can be shared by multiple sessions
  • Libraries can be imported and called in Notebooks
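
As a rough illustration of how SQL and Markdown can live inside a Scala or Python Notebook: Databricks notebooks mark such cells with a leading magic command such as %sql or %md. The sketch below only shows the idea of routing a cell by that prefix. The helper name split_cell and the MAGIC_LANGUAGES table are hypothetical and are not part of any Databricks API.

```python
# Hypothetical helper for illustration only; not Databricks code.
# Maps a leading magic command to the language that should run the cell.
MAGIC_LANGUAGES = {
    "%sql": "sql",
    "%python": "python",
    "%scala": "scala",
    "%md": "markdown",
}


def split_cell(cell_text, notebook_language):
    """Return (language, body) for one notebook cell.

    A cell that starts with a known magic command runs in that language;
    any other cell runs in the notebook's default language.
    """
    stripped = cell_text.lstrip()
    first_line, _, rest = stripped.partition("\n")
    parts = first_line.split(None, 1)
    if parts and parts[0] in MAGIC_LANGUAGES:
        language = MAGIC_LANGUAGES[parts[0]]
        # Text after the magic on the same line belongs to the cell body.
        inline = parts[1] if len(parts) > 1 else ""
        body = "\n".join(piece for piece in (inline, rest) if piece)
        return language, body
    return notebook_language, cell_text


sql_cell = split_cell("%sql\nSELECT * FROM events", "python")
note_cell = split_cell("%md # Notes", "scala")
plain_cell = split_cell("x = 1", "python")
```

Here sql_cell is routed to SQL even though the Notebook's default language is Python, while plain_cell falls through to the default.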

6
Databricks Spark Usage
  • Latest Spark version available
    • e.g. Databricks 1.3.4 uses Spark 1.3.1 (as of June 2015)
  • All Spark modules available
    • SQL, GraphX, MLlib, Streaming
  • Strong integration between modules and
    visualisation
  • Extensive use of tables to import data
  • Tables available via SQL
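
The import-then-query workflow in the last two points can be sketched in a few lines. This example deliberately uses Python's built-in sqlite3 module as a stand-in for Spark SQL so that it runs anywhere; it is not Databricks or Spark code. The sample data, the table name site_visits and the helper load_visits are all made up for illustration.

```python
import csv
import io
import sqlite3

# Made-up sample data standing in for an imported file.
CSV_TEXT = "city,visits\nAuckland,3\nWellington,5\nAuckland,4\n"


def load_visits(conn, csv_text):
    """Import CSV text into a table named site_visits; return the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(record["city"], int(record["visits"])) for record in reader]
    conn.execute("CREATE TABLE site_visits (city TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO site_visits VALUES (?, ?)", rows)
    return len(rows)


# sqlite3 in-memory database used as a stand-in for the Spark SQL table store.
conn = sqlite3.connect(":memory:")
row_count = load_visits(conn, CSV_TEXT)

# Once the data is a table, it can be queried with plain SQL.
totals = conn.execute(
    "SELECT city, SUM(visits) FROM site_visits GROUP BY city ORDER BY city"
).fetchall()
```

The point is only the shape of the workflow: data is imported once into a named table, and every later step reaches it through SQL.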

7
Databricks 3rd Party Apps
  • Currently available, with more to come
    • Pentaho
    • Qlik
    • Tableau
    • TIBCO Jaspersoft
    • PanTera
    • ZoomData

8
Databricks Architecture
9
Available Books
  • See our Hadoop book from Apress / Springer
    • Big Data Made Easy
  • Look out for our Apache Spark-based book from Packt in 2015

10
Contact Us
  • Feel free to contact us at
    • www.semtech-solutions.co.nz
    • info_at_semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You pay only for the hours you need to solve your problems