Title: Hadoop & Big Data Training in Bangalore
1. Big Data & Hadoop
2. Agenda
- Introduction to Big Data
- Why Big Data?
- Big Data Overview
- Hadoop Overview
- Why Hadoop?
- Who can learn Hadoop?
- Trending Jobs for Hadoop and Java
- Hadoop Architecture & Ecosystem
3. Introduction to Big Data
- Big Data: a term for collective data sets with large and complex volumes of data.
- Volumes are in petabytes (1024 TB) or exabytes (1024 PB), and will soon be in zettabytes (1024 EB).
- Hence, the data are hard to interpret and process with existing traditional data processing applications and tools.
4. Why Big Data?
- To manage huge data in a better way.
- To benefit from the speed, capacity & scalability of cloud storage.
- To gain potential insights through data analysis methods.
- Companies can find new prospects & business opportunities.
- Unlike other methods, with Big Data, business users can visualize the data.
5. Big Data Overview
- Big Data includes:
- Traditional structured databases from inventories, orders and customer information.
- Unstructured data from the web, social networking sites, etc.
- The problem with these massive datasets is that they can't be analyzed with standard tools and procedures.
- Processing these data appropriately can help an organization gain useful insights into its business prospects.
6. Unstructured Data Growth
No. of emails sent per second: 2.9 million
Video uploaded to YouTube per minute: 20 hours
Data processed by Google per day: 20 petabytes
Tweets per day: 50 million
Minutes spent on Facebook per month: 700 billion
Data sent & received by mobile users per day: 1.3 exabytes
Products ordered on Amazon per second: 73 items
Source: http://ibm.com/
7. Unstructured Data Growth
8. (No transcript)
9. Source: http://forbes.com/
10. Hadoop Overview
- Hadoop allows batch processing of colossal data sets (petabytes & exabytes) as a series of parallel processes.
- A Hadoop cluster comprises a number of server "nodes".
- Nodes store and process data in a parallel and distributed fashion.
- It's a parallelized, distributed storage & processing framework that can operate on commodity servers (see the command sketch below).
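As a minimal sketch of the day-to-day workflow this implies, the commands below copy a file into the cluster and submit a batch job; the file names, paths and JAR are hypothetical:

  # Copy a local file into HDFS, where it is split into blocks across the nodes
  hdfs dfs -put access.log /data/logs/
  # Confirm the upload
  hdfs dfs -ls /data/logs/
  # Submit a packaged MapReduce job to run in parallel over the stored blocks
  hadoop jar wordcount.jar WordCount /data/logs /output/wordcount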
11. Commodity Hardware
- It's the average amount of computing resources.
- It doesn't imply low quality, but affordability.
- Hadoop clusters run on commodity servers.
- Commodity servers have an average ratio of disk space to memory, unlike specialized servers with high memory or CPU.
- These servers are not designed specifically for a distributed storage & processing framework, but they are made to fit the purpose.
12. Benefits of Hadoop
- Scalable: Hadoop can store and distribute very large data sets across hundreds of inexpensive servers that operate in parallel.
- Failure Tolerance: HDFS can replicate files a specified number of times and can automatically re-replicate data blocks held on nodes that have failed (see the configuration sketch below).
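As a minimal sketch, the replication factor is set cluster-wide through the dfs.replication property in hdfs-site.xml (the stock default is 3), and can also be changed per file from the command line; the file path is hypothetical:

  <!-- hdfs-site.xml: keep three copies of every block across the cluster -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>

  # Raise replication of one file to four copies and wait until it completes
  hdfs dfs -setrep -w 4 /data/logs/access.log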
13. Benefits of Hadoop
- Cost-Effective: Hadoop is a scale-out architecture that stores all of a company's data for later use, offering computing and storage capabilities at a reasonable price.
- Speed: Hadoop's unique storage method is based on a distributed file system, resulting in much faster data processing.
- Flexible: Hadoop easily accesses new data sources and different types of data to generate insights.
14. Source: http://datanami.com/
15. Why Hadoop?
- It provides insights into daily operations.
- It drives new product ideas.
- It is used by companies for research and development and for marketing analysis.
- Image and text processing.
- It analyses huge amounts of data in comparatively less time.
- Network monitoring.
- Log and/or clickstream analysis of various kinds.
16. Hadoop Forecast
Source: http://alliedmarketresearch.com/
17. Who Can Learn Hadoop?
- Anyone with basic knowledge of Java & Linux.
- Even if you haven't been introduced to Java & Linux before, you can learn them in parallel along with Hadoop.
- Hadoop projects are available for Architects, Developers, Testers, and Linux/Network/Hardware Administrators.
- Some need knowledge of Java and some don't.
18. Who Can Learn Hadoop?
- SQL knowledge will help in learning HiveQL, the query language of Apache Hive in the Hadoop ecosystem (see the sample query below).
- Knowledge of Linux will be helpful in understanding Hadoop command-line parameters.
- But even without any prerequisite knowledge of Java & Linux, you can learn Hadoop with the help of a few basic classes.
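As a minimal sketch of how directly SQL skills carry over, here is a HiveQL query; the orders table and its columns are hypothetical:

  -- Top ten customers by total order value; this reads like standard SQL
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id
  ORDER BY total DESC
  LIMIT 10;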
19. Trending Hadoop Jobs
Source: http://the451group.com/
20. Job Opportunities in Hadoop
- MNCs like IBM, Microsoft & Oracle have integrated with Hadoop.
- Also, companies like Facebook, Hortonworks, Amazon, eBay and Yahoo! are currently looking for Hadoop professionals.
- So, companies are looking for IT professionals with strong Hadoop & MapReduce skills.
21. Salary Trend in Hadoop
Source: http://itproportal.com/
22. Hadoop Architecture
- The 2 main components of Hadoop are:
- Hadoop Distributed File System (HDFS), the storage component that breaks files into blocks, then replicates and stores them across the cluster.
- MapReduce, the processing component that distributes the workload for operations on files stored in HDFS and automatically restarts failed work (see the sketch below).
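As a minimal sketch of how the two components meet, here is the classic word-count job in Java: the map step runs in parallel over HDFS blocks and the reduce step aggregates the partial counts. Class names and paths are illustrative, not part of the course material:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {
    // Mapper: emits (word, 1) for every word in the lines of an HDFS block
    public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();
      public void map(Object key, Text value, Context ctx)
          throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
          if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
        }
      }
    }
    // Reducer: sums the counts collected for each word
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterable<IntWritable> values, Context ctx)
          throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        ctx.write(key, new IntWritable(sum));
      }
    }
    public static void main(String[] args) throws Exception {
      Job job = Job.getInstance(new Configuration(), "word count");
      job.setJarByClass(WordCount.class);
      job.setMapperClass(TokenMapper.class);
      job.setReducerClass(SumReducer.class);
      job.setOutputKeyClass(Text.class);
      job.setOutputValueClass(IntWritable.class);
      FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /data/logs
      FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output/wordcount
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

Packaged into a JAR, it is launched with hadoop jar wordcount.jar WordCount /data/logs /output/wordcount, tying back to the commands shown earlier.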
23. Source: http://cloudera.com/
24. Hadoop Ecosystem
- Apache Hadoop Distributed File System (HDFS) offers storage of large files across multiple machines.
- Apache MapReduce is a framework for processing large data sets with a parallel, distributed algorithm on a cluster.
- Apache Hive is a data warehouse over distributed storage, facilitating data summarization, queries and the management of large datasets.
- Apache Pig is an engine for executing data flows in parallel on Apache Hadoop.
- Apache HBase is a non-relational distributed database performing real-time operations on large tables (see the shell sketch below).
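As a minimal sketch of HBase's real-time, row-level access, here is a short HBase shell session; the table and column names are hypothetical:

  # Create a table with one column family
  create 'users', 'info'
  # Write one cell, then read the row back immediately
  put 'users', 'row1', 'info:name', 'Alice'
  get 'users', 'row1'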
25. Hadoop Ecosystem
- Apache Flume is an aggregator of unstructured data into HDFS.
- Apache Sqoop is a system for transferring bulk data between HDFS and relational databases (see the example below).
- Apache Oozie is a workflow scheduler system to manage Apache Hadoop jobs.
- Apache ZooKeeper is a coordination service with tools for writing correct distributed applications.
- Apache Avro is a framework for modelling, serializing and making remote procedure calls.
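As a minimal sketch of a bulk transfer with Sqoop, this command imports a relational table into HDFS; the connection string, user and table are hypothetical:

  # Pull the orders table from MySQL into HDFS (-P prompts for the password)
  sqoop import \
    --connect jdbc:mysql://dbhost/shop \
    --username analyst -P \
    --table orders \
    --target-dir /data/orders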
26. Q & A
27. (No transcript)