An Introduction to Apache Pig - PowerPoint PPT Presentation

About This Presentation
Title:

An Introduction to Apache Pig

Description:

An introduction to Apache Pig: what is it used for, how does it work, and why use it instead of native MapReduce code?

Slides: 9
Provided by: semtechs


Transcript and Presenter's Notes

Title: An Introduction to Apache Pig


1
Apache Pig
  • What is it?
  • How does it work?
  • Why use it?
  • Pig Latin Data Types
  • Pig Latin Maths
  • Pig Latin Example

www.semtech-solutions.co.nz info_at_semtech-solutions.co.nz
2
Pig: What is it?
  • A high-level language
  • Used to analyse large data sets
  • Used to create MapReduce jobs
  • Abstracts definition of jobs
  • Uses Pig Latin to define jobs
  • Less code needed
  • Compiles to MapReduce code
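
To make "abstracts definition of jobs" concrete, here is a minimal plain-Python sketch of the three phases that a hand-written MapReduce word count has to spell out. It is an illustration only: the function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample input are hypothetical, and everything runs in one process rather than on Hadoop. Pig generates the equivalent of these phases from a few lines of Pig Latin.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical two-line input, standing in for a large data set.
lines = ["pig latin", "pig on hadoop"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)
```

Even for this small job the native style needs three separate pieces of logic, which is the boilerplate the "less code needed" bullet refers to.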

3
Pig: How does it work?
  • Three ways to use it
    • Grunt: Pig's interactive shell
    • Write Pig Latin in a script file
    • Embed Pig commands in another language
  • Run modes
    • Local mode: runs on a single machine
    • Hadoop mode: runs on a Hadoop/MapReduce cluster
  • Creates MapReduce code automatically

4
Pig: Why use it?
  • It is quicker to write than native MapReduce code
  • It is data omnivorous
  • It is easy to learn
  • It is widely used
  • Minor performance loss compared to native code
  • It can be extended via user defined functions (UDFs)

5
Pig Latin Data Types
  • Int
  • Long
  • Float
  • Double
  • Chararray
  • Bytearray
  • Tuple
  • Bag
  • Map
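
The first six types are simple (scalar) values; tuple, bag and map are the complex types that nest them. A rough Python analogy follows, intended only to show how the types relate to each other. The Python values are stand-ins, not Pig's actual internal representation, and the record contents are made up for illustration.

```python
# Simple types: rough Python stand-ins.
age = 42              # int (32-bit) or long (64-bit) in Pig; Python ints are unbounded
score = 3.5           # float (32-bit) or double (64-bit) in Pig
name = "alice"        # chararray: a character string
raw = b"\x00\x01"     # bytearray: an uninterpreted blob of bytes

# tuple: an ordered set of fields, which may have different types.
person = (name, age, score)

# bag: a collection of tuples; duplicates are allowed.
people = [person, ("bob", 37, 4.0), person]

# map: key/value pairs; keys are chararrays, values can be of any type,
# including another bag.
attrs = {"city": "Wellington", "visits": 12, "friends": people}

print(attrs["friends"][1][0])
```

A relation in Pig Latin is itself a bag of tuples, which is why the same nesting shows up in script output.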

6
Pig Latin Maths
  • Some of the built-in maths functions
    • ABS
    • CEIL
    • EXP
    • FLOOR
    • LOG
    • ROUND
    • SIN
    • TAN

7
Pig Latin Example
  • Example borrowed from Wikipedia

    input_lines = LOAD '/tmp/my-copy-of-all-pages-on-internet' AS (line:chararray);

    -- Extract words from each line and put them into a pig bag
    -- datatype, then flatten the bag to get one word on each row
    words = FOREACH input_lines GENERATE FLATTEN(TOKENIZE(line)) AS word;

    -- filter out any words that are just white spaces
    filtered_words = FILTER words BY word MATCHES '\\w+';

    -- create a group for each word
    word_groups = GROUP filtered_words BY word;

    -- count the entries in each group
    word_count = FOREACH word_groups GENERATE COUNT(filtered_words) AS count, group AS word;

    -- order the records by count
    ordered_word_count = ORDER word_count BY count DESC;
    STORE ordered_word_count INTO '/tmp/number-of-words-on-internet';
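
To see what each relation in the script holds, the same dataflow can be traced in plain Python on a tiny in-memory input. This is only a sketch under stated assumptions: the sample lines are made up, `str.split()` is a simplification of Pig's `TOKENIZE` (which also splits on a few other delimiters such as commas and quotes), and the filter pattern `\w+` assumes the intent is to keep tokens made up entirely of word characters.

```python
import re
from collections import Counter

# Stand-in for the lines loaded from the input file (hypothetical data).
input_lines = ["the pig ate", "the  apple !"]

# FOREACH ... GENERATE FLATTEN(TOKENIZE(line)): one word per row.
words = [word for line in input_lines for word in line.split()]

# FILTER ... BY word MATCHES <regex>: Pig's MATCHES must match the whole
# string, so re.fullmatch is the closest Python equivalent.
filtered_words = [w for w in words if re.fullmatch(r"\w+", w)]

# GROUP ... BY word, then COUNT the entries in each group.
word_count = Counter(filtered_words)

# ORDER ... BY count DESC.
ordered_word_count = sorted(word_count.items(), key=lambda kv: kv[1], reverse=True)
print(ordered_word_count)
```

Each Python variable is named after the Pig relation it mirrors, so the two listings can be read side by side.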

8
Contact Us
  • Feel free to contact us at
    • www.semtech-solutions.co.nz
    • info_at_semtech-solutions.co.nz
  • We offer IT project consultancy
  • We are happy to hear about your problems
  • You can pay for just the hours you need to solve your problems