Title: Hadoop & Big Data Training in Bangalore
1. Big Data & Hadoop
2. Agenda
- Introduction to Big Data
- Why Big Data?
- Big Data Overview
- Hadoop Overview
- Why Hadoop?
- Who can learn Hadoop?
- Trending Jobs for Hadoop and Java
- Hadoop Architecture & Ecosystem
3. Introduction to Big Data
- Big Data: a term for collective data sets with large and complex volumes of data.
- Volumes are in petabytes (1024 TB) or exabytes (1024 PB), and will soon be in zettabytes (1024 EB).
- Hence, the data are hard to interpret and process with existing traditional data processing applications and tools.
4. Why Big Data?
- To manage huge data in a better way.
- To benefit from the speed, capacity & scalability of cloud storage.
- To gain potential insights through data analysis methods.
- Companies can find new prospects & business opportunities.
- Unlike other methods, with Big Data, business users can visualize the data.
5. Big Data Overview
- Big Data includes:
- Traditional structured databases from inventories, orders and customer information.
- Unstructured data from the web, social networking sites, etc.
- The problem with these massive datasets is that they can't be analyzed with standard tools and procedures.
- Processing these data appropriately can help an organization gain useful insights into its business prospects.
6. Unstructured Data Growth
No. of emails sent per second: 2.9 million
Video uploaded to YouTube per minute: 20 hours
Data processed by Google per day: 20 petabytes
Tweets per day: 50 million
Minutes spent on Facebook per month: 700 billion
Data sent & received by mobile users per day: 1.3 exabytes
Products ordered on Amazon per second: 73 items
Source: http://ibm.com/
7. Unstructured Data Growth
8. (No transcript)
9. Source: http://forbes.com/
10. Hadoop Overview
- Hadoop allows batch processing of colossal data sets (petabytes & exabytes) as a series of parallel processes.
- A Hadoop cluster comprises a number of server "nodes".
- Nodes store and process data in a parallel and distributed fashion.
- It's a parallelized, distributed storage & processing framework that can operate on commodity servers (see the command sketch below).
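As a minimal sketch of the day-to-day workflow this implies, the commands below copy a file into the cluster and submit a batch job; the file names, paths and JAR are hypothetical:

  # Copy a local file into HDFS, where it is split into blocks across the nodes
  hdfs dfs -put access.log /data/logs/
  # Confirm the upload
  hdfs dfs -ls /data/logs/
  # Submit a packaged MapReduce job to run in parallel over the stored blocks
  hadoop jar wordcount.jar WordCount /data/logs /output/wordcount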
11. Commodity Hardware
- It's the average amount of computing resources.
- It doesn't imply low quality, but affordability.
- Hadoop clusters run on commodity servers.
- Commodity servers have an average ratio of disk space to memory, unlike specialized servers with high memory or CPU.
- These servers are not designed specifically for a distributed storage & processing framework, but they are made to fit the purpose.
12. Benefits of Hadoop
- Scalable: Hadoop can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.
- Failure Tolerance: HDFS can replicate files a specified number of times and can automatically re-replicate data blocks held on nodes that have failed (see the configuration sketch below).
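As a minimal sketch, the replication factor is set cluster-wide through the dfs.replication property in hdfs-site.xml (the stock default is 3), and can also be changed per file from the command line; the file path is hypothetical:

  <!-- hdfs-site.xml: keep three copies of every block across the cluster -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  # Raise replication of one file to four copies and wait until it completes
  hdfs dfs -setrep -w 4 /data/logs/access.log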
13. Benefits of Hadoop
- Cost-Effective: Hadoop is a scale-out architecture that stores all of a company's data for later use, offering computing and storage capabilities at a reasonable price.
- Speed: Hadoop's unique storage method is based on a distributed file system, resulting in much faster data processing.
- Flexible: Hadoop easily accesses new data sources and different types of data to generate insights.
14. Source: http://datanami.com/
15. Why Hadoop?
- It provides insights into daily operations.
- It drives new product ideas.
- It is used by companies for research and development and for marketing analysis.
- Image and text processing.
- It analyses huge amounts of data in comparatively less time.
- Network monitoring.
- Log and/or clickstream analysis of various kinds.
16. Hadoop Forecast
Source: http://alliedmarketresearch.com/
17. Who Can Learn Hadoop?
- Anyone with basic knowledge of Java & Linux.
- Even if you haven't been introduced to Java & Linux before, you can learn them in parallel along with Hadoop.
- Hadoop projects are available for Architects, Developers, Testers, and Linux/Network/Hardware Administrators.
- Some need knowledge of Java and some don't.
18. Who Can Learn Hadoop?
- SQL knowledge will help in learning HiveQL, the query language of Apache Hive in the Hadoop ecosystem (see the sample query below).
- Knowledge of Linux will be helpful in understanding Hadoop command-line parameters.
- But even without any prerequisite knowledge of Java & Linux, you can learn Hadoop with the help of a few basic classes.
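As a minimal sketch of how directly SQL skills carry over, here is a HiveQL query; the orders table and its columns are hypothetical:

  -- Top ten customers by total order value; this reads like standard SQL
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
  ORDER BY total DESC
  LIMIT 10;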
19. Trending Hadoop Jobs
Source: http://the451group.com/
20. Job Opportunities in Hadoop
- MNCs like IBM, Microsoft & Oracle have integrated with Hadoop.
- Also, companies like Facebook, Hortonworks, Amazon, eBay and Yahoo! are currently looking for Hadoop professionals.
- So, companies are looking for IT professionals with strong Hadoop & MapReduce skills.
21. Salary Trend in Hadoop
Source: http://itproportal.com/
22. Hadoop Architecture
- The 2 main components of Hadoop are:
- Hadoop Distributed File System (HDFS), the storage component that breaks files into blocks, then replicates and stores them across the cluster.
- MapReduce, the processing component that distributes the workload for operations on files stored in HDFS and automatically restarts failed work (see the sketch below).
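As a minimal sketch of how the two components meet, here is the classic word-count job in Java: the map step runs in parallel over HDFS blocks and the reduce step aggregates the partial counts. Class names and paths are illustrative, not part of the course material:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {
    // Mapper: emits (word, 1) for every word in the lines of an HDFS block
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();
      public void map(Object key, Text value, Context ctx)
          throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
          if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
        }
      }
    }
    // Reducer: sums the counts collected for each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        ctx.write(key, new IntWritable(sum));
      }
    }
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenMapper.class);
      job.setReducerClass(SumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /data/logs
      FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output/wordcount
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Packaged into a JAR, it is launched with hadoop jar wordcount.jar WordCount /data/logs /output/wordcount, tying back to the commands shown earlier.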
23. Source: http://cloudera.com/
24. Hadoop Ecosystem
- Apache Hadoop Distributed File System (HDFS) offers storage of large files across multiple machines.
- Apache MapReduce is a framework for processing large data sets with a parallel, distributed algorithm on a cluster.
- Apache Hive is a data warehouse over distributed storage, facilitating data summarization, queries and the management of large datasets.
- Apache Pig is an engine for executing data flows in parallel on Apache Hadoop.
- Apache HBase is a non-relational distributed database performing real-time operations on large tables (see the shell sketch below).
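As a minimal sketch of HBase's real-time, row-level access, here is a short HBase shell session; the table and column names are hypothetical:

  # Create a table with one column family
  create 'users', 'info'
  # Write one cell, then read the row back immediately
  put 'users', 'row1', 'info:name', 'Alice'
  get 'users', 'row1'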
25. Hadoop Ecosystem
- Apache Flume is an aggregator of unstructured data into HDFS.
- Apache Sqoop is a system for transferring bulk data between HDFS and relational databases (see the example below).
- Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
- Apache ZooKeeper is a coordination service with tools for writing correct distributed applications.
- Apache Avro is a framework for modelling, serializing and making remote procedure calls.
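As a minimal sketch of a bulk transfer with Sqoop, this command imports a relational table into HDFS; the connection string, user and table are hypothetical:

  # Pull the orders table from MySQL into HDFS (-P prompts for the password)
  sqoop import \
    --connect jdbc:mysql://dbhost/shop \
    --username analyst -P \
    --table orders \
    --target-dir /data/orders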
26. Q & A
27. (No transcript)