Hadoop & Bigdata Training in Bangalore - PowerPoint PPT Presentation

About This Presentation
Title:

Hadoop & Bigdata Training in Bangalore

Description:

Learn Big Data & be a Data Analyst! CodeFrux Technologies offers Hadoop & Big Data training for professionals. We provide online, classroom & corporate training. New batch starts on 20th Sep 2014. A demo is available; work with us on a live, scenario-based project. +91-80-65639331 / 9738058993 (mobile), contact@codefruxtechnology.com


Transcript and Presenter's Notes

Title: Hadoop & Bigdata Training in Bangalore


1
Big Data & Hadoop
2
Agenda
  • Introduction to Big Data
  • Why Big Data?
  • Big Data Overview
  • Hadoop Overview
  • Why Hadoop?
  • Who can learn Hadoop?
  • Trending Jobs for Hadoop and Java
  • Hadoop Architecture & Ecosystem

3
Introduction to Big Data
  • Big Data: a term for collective data sets with
    large and complex volumes of data.
  • Volumes are in petabytes (1,024 TB) or exabytes
    (1,024 PB), and will soon be in zettabytes
    (1,024 EB).
  • Hence, the data are hard to interpret & process
    with existing traditional data processing
    applications and tools.

4
Why Big Data
  • To manage huge data in a better way.
  • Benefit of data speed, capacity & scalability
    from cloud storage.
  • Potential insights through data analysis methods.
  • Companies can find new prospects & business
    opportunities.
  • Unlike other methods, with Big Data, business
    users can visualize the data.

5
Big Data Overview
  • Big Data includes
  • Traditional structured databases from
    inventories, orders and customer information.
  • Unstructured data from the web, social
    networking sites, etc.
  • The problem with these massive datasets is that
    they can't be analyzed with standard tools and
    procedures.
  • Processing these data appropriately can help an
    organization gain useful insights into its
    business prospects.

6
Unstructured Data Growth
  No. of emails sent per second: 2.9 million
  Video uploaded to YouTube per minute: 20 hours
  Data processed by Google per day: 20 petabytes
  Tweets per day: 50 million
  Minutes spent on Facebook per month: 700 billion
  Data sent & received by mobile users per day: 1.3 exabytes
  Products ordered on Amazon per second: 73 items
Source: http://www.ibm.com/
7
Unstructured Data Growth
8
(No Transcript)
9
Source: http://www.forbes.com/
10
Hadoop Overview
  • Hadoop allows batch processing of colossal data
    sets (petabytes & exabytes) as a series of
    parallel processes.
  • A Hadoop cluster comprises a number of server
    "nodes."
  • Nodes store and process data in a parallel and
    distributed fashion.
  • It's a parallelized, distributed storage &
    processing framework that can operate on
    commodity servers.

11
Commodity Hardware
  • It's hardware with an average amount of
    computing resources.
  • It doesn't imply low quality, but affordability.
  • Hadoop clusters run on commodity servers.
  • Commodity servers have an average ratio of disk
    space to memory, unlike specialized servers
    with high memory or CPU.
  • The servers are not designed specifically for a
    distributed storage & processing framework, but
    they are made to fit the purpose.

12
Benefits of Hadoop
  • Scalable: Hadoop can store and distribute very
    large data sets across hundreds of inexpensive
    servers that operate in parallel.
  • Fault tolerance: HDFS replicates files a
    specified number of times and automatically
    re-replicates data blocks from nodes that have
    failed (see the sketch below).
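
To make replication concrete, here is a minimal sketch using the standard HDFS FileSystem Java API; the NameNode URI, file path and replication factor are assumptions for illustration, not values from this presentation.

  import java.net.URI;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      // Assumed NameNode address; substitute your cluster's URI.
      FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

      Path file = new Path("/demo/events.log");
      try (FSDataOutputStream out = fs.create(file)) {
        out.writeBytes("sample record\n"); // blocks are replicated as they are written
      }

      // Keep 3 copies of every block of this file; HDFS re-creates
      // lost replicas on healthy nodes automatically.
      fs.setReplication(file, (short) 3);
      System.out.println("replication factor: "
          + fs.getFileStatus(file).getReplication());
      fs.close();
    }
  }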

13
Benefits of Hadoop
  • Cost-effective: Hadoop is a scale-out
    architecture that stores all the company's data
    for later use, offering computing and storage
    capabilities at a reasonable price.
  • Speed: Hadoop's unique storage method is based
    on a distributed file system, resulting in much
    faster data processing.
  • Flexible: Hadoop easily accesses new data
    sources and different types of data to generate
    insights.

14
Source: http://www.datanami.com/
15
Why Hadoop
  • Provides insights into daily operations.
  • Drives new product ideas.
  • Used by companies for research and development
    and marketing analysis.
  • Image and text processing.
  • Analyzes huge amounts of data in comparatively
    less time.
  • Network monitoring.
  • Log and/or click-stream analysis of various
    kinds.

16
Hadoop Forecast
Source: http://www.alliedmarketresearch.com/
17
Who can Learn Hadoop
  • Anyone with basic knowledge of Java & Linux.
  • Even if you haven't been introduced to Java &
    Linux before, you can learn them in parallel
    with Hadoop.
  • Hadoop projects are available for Architects,
    Developers, Testers, and Linux/Network/Hardware
    Administrators.
  • Some need knowledge of Java and some don't.

18
Who can Learn Hadoop
  • SQL knowledge will help in learning HiveQL, the
    query language of Hive in the Hadoop ecosystem.
  • Knowledge of Linux will be helpful in
    understanding Hadoop command-line parameters.
  • But even without any prerequisite knowledge of
    Java & Linux, you can learn Hadoop with the
    help of a few basic classes.

19
Trending Hadoop Jobs
Source: http://www.the451group.com/
20
Job Opportunities in Hadoop
  • MNCs like IBM, Microsoft & Oracle have
    integrated with Hadoop.
  • Also, companies like Facebook, Hortonworks,
    Amazon, eBay and Yahoo! are currently looking
    for Hadoop professionals.
  • So, companies are looking for IT professionals
    with solid Hadoop & MapReduce skills.

21
Salary Trend in Hadoop
Source: http://www.itproportal.com/
22
Hadoop Architecture
  • The 2 main components of Hadoop are
  • Hadoop Distributed File System (HDFS), the
    storage component that breaks files into blocks,
    then replicates and stores them across the
    cluster.
  • MapReduce, the processing component that
    distributes the workload for operations on files
    stored in HDFS and automatically restarts failed
    work (see the word-count sketch below).
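
To show what the processing component looks like in practice, here is the classic word-count job written against the standard org.apache.hadoop.mapreduce API; the input and output paths passed on the command line are assumptions for the example.

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {
    // Map: emit (word, 1) for every word in an input line.
    public static class TokenMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();
      protected void map(LongWritable key, Text value, Context ctx)
          throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
          if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
        }
      }
    }

    // Reduce: sum the counts emitted for each word.
    public static class SumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
      protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        ctx.write(key, new IntWritable(sum));
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenMapper.class);
      job.setCombinerClass(SumReducer.class); // local pre-aggregation per mapper
      job.setReducerClass(SumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /demo/input
      FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

The framework splits the input across the cluster's nodes, runs the map tasks in parallel next to the data, and restarts any task that fails, which is exactly the behaviour the bullet above describes.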

23
Source: http://www.cloudera.com/
24
Hadoop Ecosystem
  • Apache Hadoop Distributed File System offers
    storage of large files across multiple machines.
  • Apache MapReduce is a programming model for
    processing large data sets with a parallel,
    distributed algorithm on a cluster.
  • Apache Hive is a data warehouse over distributed
    storage, facilitating data summarization,
    querying and management of large datasets.
  • Apache Pig is an engine for executing data flows
    in parallel on Apache Hadoop.
  • Apache HBase is a non-relational, distributed
    database performing real-time operations on
    large tables (see the sketch after this list).
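
As a small illustration of HBase's real-time access, here is a sketch using the HBase 1.x+ Java client; the table "users", column family "info" and row key are assumptions, and the table is assumed to already exist.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HBaseDemo {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table table = conn.getTable(TableName.valueOf("users"))) {
        // Write one cell: row "u1", family "info", qualifier "name".
        Put put = new Put(Bytes.toBytes("u1"));
        put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
            Bytes.toBytes("Alice"));
        table.put(put);

        // Read the cell back in real time by row key.
        Result row = table.get(new Get(Bytes.toBytes("u1")));
        System.out.println(Bytes.toString(
            row.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
      }
    }
  }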

25
Hadoop Ecosystem
  • Apache Flume is an unstructured-data aggregator
    that feeds HDFS.
  • Apache Sqoop is a system for transferring bulk
    data between HDFS and relational databases.
  • Apache Oozie is a workflow scheduler system to
    manage Apache Hadoop jobs.
  • Apache ZooKeeper is a coordination service with
    tools for writing correct distributed
    applications.
  • Apache Avro is a framework for modelling,
    serializing and making Remote Procedure Calls
    (see the sketch after this list).
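
To show the modelling and serialization side of Avro, here is a sketch using Avro's generic Java API; the record name and fields are illustrative assumptions.

  import java.io.ByteArrayOutputStream;
  import org.apache.avro.Schema;
  import org.apache.avro.generic.GenericData;
  import org.apache.avro.generic.GenericDatumWriter;
  import org.apache.avro.generic.GenericRecord;
  import org.apache.avro.io.BinaryEncoder;
  import org.apache.avro.io.DatumWriter;
  import org.apache.avro.io.EncoderFactory;

  public class AvroDemo {
    public static void main(String[] args) throws Exception {
      // Model a record type with a JSON schema (fields are illustrative).
      Schema schema = new Schema.Parser().parse(
          "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}");

      GenericRecord user = new GenericData.Record(schema);
      user.put("name", "Alice");
      user.put("age", 30);

      // Serialize the record to Avro's compact binary format.
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
      BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
      writer.write(user, encoder);
      encoder.flush();
      System.out.println("serialized " + out.size() + " bytes");
    }
  }

The same schema drives both serialization and RPC message definitions, which is why Avro is described here as covering modelling, serialization and Remote Procedure Calls.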

26
Q & A

27
(No Transcript)