BRIEF OVERVIEW OF HIVE

Provided by: DanB197

Learn more at: http://www.ideal.ece.utexas.edu

1
BRIEF OVERVIEW OF HIVE

2
Overview

Hive is a massively parallel data warehousing
environment
Hive provides an SQL-like programming environment
for Hadoop
Hadoop is becoming common in Big Data houses
Hadoop makes it relatively easy to quickly
implement MapReduce jobs, but often requires
plug-ins or APIs to be used to write jobs
Engineers who are familiar with SQL but not
MapReduce may be more productive with SQL
Hive queries run as MapReduce operations
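
To make the last point concrete, here is a minimal Python sketch of how a grouped count (roughly SELECT page, COUNT(*) ... GROUP BY page) could break down into map, shuffle, and reduce steps. The table contents and all function names are hypothetical, and this is only an illustration of the idea, not how Hive actually plans or executes a query.

```python
from collections import defaultdict

# Hypothetical rows of a table page_views(user_id, page).
ROWS = [
    ("u1", "home"),
    ("u2", "home"),
    ("u1", "cart"),
    ("u3", "home"),
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed on the GROUP BY column."""
    for _user_id, page in rows:
        yield page, 1


def shuffle(pairs):
    """Collect values by key, as happens between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Aggregate the values of each key; here the aggregate is a count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Run the three steps in order for the grouped count."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))
```

The SQL author only writes the one-line query; the map, shuffle, and reduce steps are what would otherwise have to be written by hand.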

3
Background on Hadoop

4
Advantages

Hive allows developers with a SQL background to
ramp up rapidly and write Hive queries
Open source Apache project
Hive is compatible with other MapReduce
operations in an infrastructure, so some groups can
use Hive and others native MapReduce
Can share tables with HBase
Hive has built-in functions for reducing data,
such as sampling
Block Sampling
Bucket Sampling
Deterministic Sampling
Non-Deterministic Sampling
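
The difference between deterministic and non-deterministic bucket sampling can be sketched in Python as follows. This is an illustration under assumptions: zlib.crc32 stands in for whatever hash Hive applies to the sampling column, and the row data and function names are hypothetical.

```python
import random
import zlib

# Hypothetical rows of a table users(id, name).
ROWS = [(i, f"user{i}") for i in range(20)]


def bucket_of(key, num_buckets):
    """Map a key to a bucket with a hash that is stable across runs.

    crc32 is an assumption used for illustration, not Hive's own hash.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucket_sample(rows, key_index, bucket, num_buckets):
    """Deterministic sample: keep rows whose key hashes into `bucket` (1-based).

    The same rows are returned on every run.
    """
    return [r for r in rows if bucket_of(r[key_index], num_buckets) == bucket - 1]


def random_sample(rows, num_buckets, rng=None):
    """Non-deterministic sample: keep each row with probability 1/num_buckets.

    A different subset may be returned on every run.
    """
    rng = rng or random.Random()
    return [r for r in rows if rng.randrange(num_buckets) == 0]


if __name__ == "__main__":
    print(bucket_sample(ROWS, 0, 1, 4))
    print(random_sample(ROWS, 4))
```

Sampling on a column's hash is repeatable, which helps when comparing query results; sampling on a random value trades repeatability for independence from how the data is keyed.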

5
Disadvantages

Not suited to real-time use unless the data is very
small (in which case, why are you using Hadoop?)
Row updates are not generally allowed
Hive queries can be very time consuming
As with an RDBMS, some experience and knowledge of
writing efficient queries is necessary in Hive
Hive features extend and modify standard SQL
operations, and some SQL operations behave
differently
SORT BY vs. ORDER BY (Local vs. Global reducer
behavior)
Large data sizes make some queries impossible to
finish in a meaningful time given the resources of
individual systems (doing an ORDER BY on all columns
in a petabyte-scale search is a bad idea).
Queries are still IO bound
Hive optimizations are still ongoing
Consider using Hadoop natively, HBase (fast, row
edits), or Pig (transforms)