Get Started with Hadoop Hive HiveQL Languages - PowerPoint PPT Presentation

About This Presentation
Title:

Get Started with Hadoop Hive HiveQL Languages

Description:

Apache Hadoop is an open-source, fault-tolerant, and scalable framework, written in Java, for storing and processing large amounts of data.

Slides: 12
Provided by: janbasktraining


Transcript and Presenter's Notes



1
Get Started with Hadoop Hive HiveQL
Languages
2
Career Options Of Hadoop Big Data Certification
  • Hadoop to HiveQL
  • Uses of Hadoop
  • Hive
  • Remember that Hive is not
  • Uses of HiveQL
  • Major Reasons to use Hadoop for Data Science
  • Bottom Line

3
Hadoop to HiveQL
Apache Hadoop is an open-source, fault-tolerant,
and scalable framework, written in Java, for
storing and processing large amounts of data.
Hadoop is often used as the basis of a data lake,
which stores data in its original format. It is
designed to scale from a single server to
thousands of machines, each offering local
computation and storage.
4
Uses of Hadoop
  • There is no need to preprocess data before
    storing it (you may store as much data as you
    want and decide later how to use it)
  • You can grow the system to handle more data by
    adding nodes (only a little administration is
    required)
  • It is well suited to workloads of millions or
    billions of transactions
  • Many cities, states, and countries use Hadoop to
    analyze data, for example to identify and manage
    traffic jams (the Smart City concept)
  • Many businesses also use big data to optimize
    their performance
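The first point above is often called schema-on-read: raw records are stored untouched, and a structure is applied only when the data is read. The following is a minimal Python sketch of that idea, not Hadoop code. The record layout, the `raw_store` list, and the `store_raw` and `read_with_schema` helpers are hypothetical names chosen for illustration.

```python
# Schema-on-read sketch: store raw lines as-is, interpret them later.
# raw_store stands in for files in a data lake; it is not a Hadoop API.
raw_store = []


def store_raw(line):
    """Append a record exactly as received, with no parsing or validation."""
    raw_store.append(line)


def read_with_schema(lines, schema, sep=","):
    """Apply a schema (list of (name, type) pairs) at read time.

    Lines that do not fit the schema are skipped here instead of being
    rejected when they were written.
    """
    rows = []
    for line in lines:
        fields = line.split(sep)
        if len(fields) != len(schema):
            continue  # wrong number of columns for this schema
        try:
            rows.append(
                {name: cast(value) for (name, cast), value in zip(schema, fields)}
            )
        except ValueError:
            continue  # a field could not be converted to its declared type
    return rows


store_raw("2022-01-01,sensor-7,41")
store_raw("free-form note with no structure")
store_raw("2022-01-02,sensor-9,not-a-number")

# The schema is decided only now, long after the data was stored.
traffic_schema = [("day", str), ("sensor", str), ("vehicles", int)]
rows = read_with_schema(raw_store, traffic_schema)
```

All three lines stay in `raw_store`, while `rows` contains only the one record that fits the schema chosen at read time. A different schema could be applied to the same raw data later without rewriting it.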

5
Hive
  • Apache Hive is a data warehouse software project
    built on top of Apache Hadoop to provide data
    query and analysis.
  • It uses a declarative, SQL-like language called
    HiveQL (HQL).
  • Hive allows programmers familiar with MapReduce
    to plug in custom mappers and reducers for more
    sophisticated analysis.

6
EcoSystem Components
The functional features of Hive are:
  • Data Summarization
  • Query
  • Analysis

7
HQL
  • The Hive Query Language (HQL) is a SQL-like
    interface used to query data stored in the
    databases and file systems that integrate with
    Hadoop. It supports simple SQL-like functions
    such as CONCAT, SUBSTR, and ROUND, and aggregate
    functions such as SUM, COUNT, and MAX.
  • It also supports clauses such as GROUP BY and
    SORT BY, and user-defined functions can be
    called from HQL queries. It builds on well-known
    concepts from the relational database world,
    such as tables, rows, columns, and schemas.
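To make the functions and clauses above concrete, here is a small runnable sketch. Hive itself cannot be started from a short snippet, so this uses Python's built-in `sqlite3` module as a stand-in: it is not Hive, but SUBSTR, ROUND, SUM, COUNT, MAX, and GROUP BY have the same names in HiveQL. The `sales` table, its columns, and the `run_demo` function are hypothetical. The sketch uses ORDER BY because SQLite has no SORT BY; in Hive, SORT BY orders rows within each reducer, while ORDER BY gives a total order.

```python
import sqlite3


def run_demo():
    """Run two SQL queries on an in-memory SQLite table (a stand-in, not Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("north", "widget", 10.2),
            ("north", "gadget", 5.53),
            ("south", "widget", 7.81),
            ("south", "gadget", 2.5),
        ],
    )
    # Scalar functions: SUBSTR and ROUND are applied row by row.
    scalar = conn.execute(
        "SELECT SUBSTR(product, 1, 3), ROUND(amount, 0) "
        "FROM sales WHERE region = 'north' ORDER BY product"
    ).fetchall()
    # Aggregate functions with GROUP BY: one output row per region.
    grouped = conn.execute(
        "SELECT region, COUNT(*), ROUND(SUM(amount), 1), MAX(amount) "
        "FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return scalar, grouped


scalar_rows, grouped_rows = run_demo()
```

`scalar_rows` holds one tuple per matching input row, while `grouped_rows` holds one tuple per region, which is the difference between simple functions and aggregate functions described above.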

8
Uses of HiveQL
  • HQL closely resembles SQL
  • HQL allows programmers to plug in custom mappers
    and reducers
  • HQL is scalable, familiar, extensible, and fast
    to use
  • It provides indexes to accelerate queries
  • Hive offers a large number of user-defined
    function APIs that can be used to build custom
    behavior into the query engine
  • It offers a higher-level alternative to Hadoop's
    low-level interfaces
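As an illustration of custom mappers and reducers: Hive can stream rows through an external script (the TRANSFORM ... USING clause), by default passing each row to the script as a tab-separated line on standard input. The sketch below shows word-count mapper and reducer logic in Python and simulates the map, sort, and reduce steps in one process. The function names are hypothetical, and this is a sketch of the pattern under those assumptions, not code taken from Hive.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line per word in each input line."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word; input must already be sorted by word."""
    parsed = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort (the shuffle step) -> reduce in a single process."""
    return list(reduce_lines(sorted(map_lines(lines))))


result = run_locally(["Hive on Hadoop", "hadoop hive hive"])
```

In a real streaming script, each function would read from `sys.stdin` and print its output lines, and the framework would take care of sorting the mapper output by key before it reaches the reducer.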

9
Major Reasons to use Hadoop for Data Science
  • When you have to deal with a large amount of
    data, Hadoop is a strong option. When you are
    planning to implement Hadoop on your data, the
    first step is to understand the complexity of
    the data and the rate at which it is going to
    grow.
  • Cluster planning is required for this, and it
    depends on the size of the company's data (GBs
    or TBs).
  • Different types of data
  • Numeric data
  • Nominal data
  • Different specific applications
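The cluster planning step above comes down to simple arithmetic on data size and growth rate. The following is a rough sketch under stated assumptions: growth compounds monthly, each block is stored three times (the HDFS default replication factor), and 25 percent headroom is reserved for temporary and intermediate data. The headroom figure, the function names, and the per-node capacity are hypothetical and should be replaced with real numbers.

```python
import math


def raw_storage_tb(initial_tb, monthly_growth, months, replication=3, headroom=0.25):
    """Estimate the raw disk needed after `months` of compound growth.

    replication=3 mirrors the HDFS default; headroom is an assumed
    fraction of extra space for temporary and intermediate data.
    """
    logical_tb = initial_tb * (1 + monthly_growth) ** months
    return logical_tb * replication * (1 + headroom)


def nodes_needed(raw_tb, usable_tb_per_node):
    """Round up to whole data nodes for the given usable capacity per node."""
    return math.ceil(raw_tb / usable_tb_per_node)


# 10 TB today, no growth: 10 * 3 * 1.25 = 37.5 TB of raw disk.
flat_tb = raw_storage_tb(10, 0.0, 12)
flat_nodes = nodes_needed(flat_tb, usable_tb_per_node=8)

# The same data growing 5 percent per month for a year needs more.
growing_tb = raw_storage_tb(10, 0.05, 12)
```

With no growth, 10 TB of data needs 37.5 TB of raw disk, or 5 nodes at 8 TB usable each. Adding a growth rate raises the estimate, which is why the data rate has to be understood before the cluster is sized.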

10
Bottom Line
Hadoop has become a de facto standard in Data
Science and is a gateway to Big Data technologies.
It underpins other Big Data technologies such as
Spark and Hive. According to Forbes, the Hadoop
market is expected to reach $99.31 billion by 2022
at a CAGR of 42.1 percent. So, this is the right
time to give a push to your skills in the field of
Big Data. Happy Reading!
11
Thank you
Happy learning