Big Data for Enterprise: Managing Data and Values

About This Presentation
Title:

Big Data for Enterprise: Managing Data and Values

Description:

Summary: Data management is a painstaking task for organizations. A range of disciplines is applied for effective data management, which may include governance, data modelling, data engineering, and analytics. To lead in the data and big data analytics domain, the principles of big data and data management need to be understood thoroughly. Webinar Agenda: * How to manage data efficiently: Database Administration and the DBA; Database Development and the DAO; Governance - Data Quality and Compliance; Data Integration Development and the ETL * How to generate business value from data: Big Data; Data Engineering; Business Intelligence; Exploratory and Statistical Data Analytics; Predictive Analytics; Data Visualization

Transcript and Presenter's Notes

Title: Big Data for Enterprise: Managing Data and Values


1
Big Data for Enterprise: Managing Data and Values
Tarun Sukhani, NetCom Learning
www.netcomlearning.com info_at_netcomlearning.com
(888) 563 8266
2
Agenda
  • Data and Information
  • What is a Database Management System?
  • File Management Systems
  • Distribution Strategies for Databases
  • Data Management Framework
  • Key Supporting Data Management Components to Big
    Data
  • Data Governance Council Roles and
    Responsibilities
  • ETL
  • Data Cleansing
  • Overview of Big Data and Analytics
  • Data Lake
  • Hadoop and Its Role
  • IoT and real-time data
  • Modern Data Warehouse

3
Data and Information
DATA: Facts concerning people, objects, events or
other entities. Databases store data.
INFORMATION: Data presented in a form
suitable for interpretation. Data is converted
into information by programs and queries. Data
may be stored in files or in databases. Neither
one stores information.
KNOWLEDGE: Insights into
appropriate actions based on interpreted data.
4
Using a DBMS
  • Data
  • Database Design
  • Metadata
  • DBMS Engine Access
  • Direct access
  • Host language
  • Data Management

5
Basic Principles
DATABASE: A shared collection of interrelated
data designed to meet the varied information
needs of an organization.
DATABASE MANAGEMENT SYSTEM: A collection of
programs to create and maintain a
database: define, construct, manipulate.
6
Advantages of Database Processing
  • More information from the same data
  • Shared data
  • Balancing conflicts among users
  • Controlled redundancy
  • Consistency
  • Integrity
  • Security
  • Increased productivity
  • Data independence

7
Disadvantages of Database Processing
  • Increased size
  • Increased complexity
  • More expensive personnel
  • Increased impact of failure
  • Difficulty of recovery
  • Cost, especially for server and mainframe systems

8
Objectives of the DBMS Approach
  • SELF-DESCRIBING
  • DATA INDEPENDENCE
  • MULTIPLE VIEWS
  • MULTIPLE USERS

9
What is a Database Management System?
Data Files, Directory, Access Engine, Utility Programs
10
Database
[Diagram: DATA at the center, enclosed by METADATA, ACCESS ENGINE, and UTILITIES layers]
11
Metadata
Files and Databases
  • Data about data
  • Description of fields
  • Display and format instructions
  • Structure of files and tables
  • Security and access rules
  • Triggers and operational rules
12
History of Database Management
  • File Management Systems
  • Hierarchical Model
  • IBM Information Management System (IMS) 1966
  • Network Model
  • Charles Bachman's Integrated Data Store (IDS)
    1965
  • Conference on Data Systems Languages / Data Base
    Task Group, CODASYL/DBTG (1971)
  • Relational Model
  • E.F. Codd, 1970

13
File Management Systems
Provided facilities to extract data and share
files, but did not implement any way to connect
records in one file to those in another.
Relationships had to be implemented in
application code.
14
Database vs File Systems
[Diagram: FILE SYSTEM: Program 1, Program 2, and Program 3 each carry their own Meta-Data for the Data. DATABASE: Program 1, Program 2, and Program 3 share a single set of Meta-Data kept with the Data.]
15
Structured Databases
Relationships were implemented by physical
pointers (called sets), which allowed records
in different files to be connected. Hierarchical
databases allow only one parent set; networks
allow several. These permit efficient processing,
but the sets must be constructed on data entry
and cannot be rearranged later.
16
Relational Models
Relational models implement relationships with
matched data values in related files (called
primary and foreign keys). Any attributes can be
matched. The connection is established at
retrieval so interconnections can be developed as
needed.
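A minimal sketch of the idea above, using Python's built-in sqlite3 module. The college and student tables and their rows are hypothetical and are not taken from the slides; the point is only that the relationship is formed at retrieval time by matching key values.

```python
import sqlite3

# Hypothetical schema for illustration; the slides do not define one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE college (college_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        college_id INTEGER REFERENCES college(college_id)  -- foreign key
    );
    INSERT INTO college VALUES (1, 'Engineering'), (2, 'Business');
    INSERT INTO student VALUES (10, 'Ana', 1), (11, 'Ben', 2), (12, 'Chloe', 1);
""")

# No pointer is stored between the files. The connection is established
# at retrieval by matching student.college_id to college.college_id.
rows = conn.execute("""
    SELECT s.name, c.name
    FROM student AS s
    JOIN college AS c ON s.college_id = c.college_id
    ORDER BY s.student_id
""").fetchall()
print(rows)
```

Because the match is expressed in the query, a different join (on any comparable attributes) can be written later without restructuring the stored data.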
17
Hierarchy
SECTION
STUDENT
INSTRUCTOR
COLLEGE
COLLEGE
Each file can have only one parent. To implement
a second parent (COLLEGE) we have to implement
a shadow copy.
18
Network
SECTION
STUDENT
INSTRUCTOR
COLLEGE
Each file can have several parents. Both SECTION
and COLLEGE are parent files.
19
Relational
20
Relational Terminology
  • Entity
  • Person, place, thing or event about which we wish
    to keep data
  • Attribute
  • property of an entity
  • Relationship
  • an association among entities (entity records)

21
Distribution Strategies for Databases
Centralized: Data and processing are centralized. Dumb terminal
with "screen scraping".
Intelligent Terminal: Data and processing are centralized; data preparation
and display happen on remote devices.
Distributed Logic: Data storage is distributed and data is processed at
the optimal location. A version of parallel
processing.
Client Server: Data (usually
departmental) is maintained on a server. Subsetting
occurs on the server, processing
on client machines.
Distributed Database: Data is
distributed among different locations, and
processing accesses data wherever it is
located. Data may be replicated or partitioned.
22
23
Data Management
  • Designing and managing information in a database
    environment requires:
  • Understanding the principles of data modeling in
    system design.
  • Using SQL for data manipulation.
  • Understanding the concepts of managing data in a
    database environment.

24
Information System Modeling Approaches
PROCESS MODELING: The traditional method of
designing systems by following the changes to
data flows.
DATA MODELING: An approach to system
development that specifies the file structure
that conforms to the things important to the
organization.
PROTOTYPING: An iterative approach
that focuses on building small operating
OBJECT MODELING (Event driven design): Defines objects
that contain data and associated processing rules
encapsulated together.
25
Data Management Framework
  • Holistic approach to understand the information
    needs of the enterprise and its stakeholders
  • Consistency for planning and process development
  • 10 major functional areas, including governance
  • Aligns data with business strategy (above) and
    technology (below)
  • Takes into account the data lifecycle, from creation
    through destruction
  • Internationally recognized through the Data
    Management Association International (DAMA)

26
Key Supporting Data Management Components to Big
Data
  • Data Governance: Exercise of authority and
    controls over the management of data assets.
    Policies, processes, standards, definitions,
    metrics.
  • Councils, stewards, trustees: roles and
    responsibilities defined
  • Data Architecture: Defines data requirements,
    guides integration and control of data assets,
    aligns data investments with business strategy.
    Part of an overall enterprise architecture
    framework
  • Enterprise data models, definitions, and
    taxonomies. Enterprise data delivery
  • Master Data Management: Control over master data
    values to enable consistent, contextual use
    across systems of the most accurate, timely and
    relevant version of truth about essential
    business entities.
  • Meta Data Management: Descriptive tags about
    data, concepts, and connections between data and
    concepts.
  • Business, technical, process, and stewardship
  • Data Security: Planning, development, and
    execution of security policies and
    procedures to provide proper authentication,
    authorization

27
Key Questions to Drive Business Value from Data
  • What business opportunity/problem are we trying
    to solve?
  • What questions do we need to answer to solve the
    problem?
  • What data do we need to answer the questions?
  • What data do we have?
  • How can data help differentiate us in the market?
  • What data is IP for us? Revenue generating for us?
  • How do we integrate the right data together?
  • How do we manage the quality of the data?
  • What data does this relate to (master data)?
  • Do we have all the data about this (person,
    event, thing, etc.)?
  • What are the permissible purposes of the data?
    (compliance, regulatory environment)
  • Who is allowed to access the data? Use this data?

28
Data Management Maturity in a Social Business
Partial Source: Social Business by Design, Dion
Hinchcliffe
29
Data Governance Council Roles and Responsibilities
DGC DGP
Task Forces Tiger Teams
Lines-of-Business
30
Data Governance Operational View
31
ETL
The process of updating the data warehouse.
32
Two Data Warehousing Strategies
  • Enterprise-wide warehouse, top down, the Inmon
    methodology
  • Data mart, bottom up, the Kimball methodology
  • When properly executed, both result in an
    enterprise-wide data warehouse

33
The Data Mart Strategy
  • The most common approach
  • Begins with a single mart and architected marts
    are added over time for more subject areas
  • Relatively inexpensive and easy to implement
  • Can be used as a proof of concept for data
    warehousing
  • Can perpetuate the silos of information problem
  • Can postpone difficult decisions and activities
  • Requires an overall integration plan

34
The Enterprise-wide Strategy
  • A comprehensive warehouse is built initially
  • An initial dependent data mart is built using a
    subset of the data in the warehouse
  • Additional data marts are built using subsets of
    the data in the warehouse
  • Like all complex projects, it is expensive, time
    consuming, and prone to failure
  • When successful, it results in an integrated,
    scalable warehouse

35
Data Sources and Types
  • Primarily from legacy, operational systems
  • Almost exclusively numerical data at the present
    time
  • External data may be included, often purchased
    from third-party sources
  • Technology exists for storing unstructured data,
    and this is expected to become more important over
    time

36
Extraction, Transformation, and Loading (ETL)
Processes
  • The plumbing work of data warehousing
  • Data are moved from source to target databases
  • A very costly, time-consuming part of data
    warehousing

37
Recent Development: More Frequent Updates
  • Updates can be done in bulk and trickle modes
  • Business requirements, such as trading partner
    access to a Web site, require current data
  • For international firms, there is no good time to
    load the warehouse

38
Recent Development: Clickstream Data
  • Results from clicks at web sites
  • A dialog manager handles user interactions. An
    ODS (operational data store in the data staging
    area) helps to custom tailor the dialog
  • The clickstream data is filtered and parsed and
    sent to a data warehouse where it is analyzed
  • Software is available to analyze the clickstream
    data

39
Data Extraction
  • Often performed by COBOL routines
    (not recommended because of high program
    maintenance and no automatically generated meta
    data)
  • Sometimes source data is copied to the target
    database using the replication capabilities of
    a standard RDBMS (not recommended because of dirty
    data in the source systems)
  • Increasingly performed by specialized ETL software

40
Sample ETL Tools
  • Teradata Warehouse Builder from Teradata
  • DataStage from Ascential Software
  • SAS System from SAS Institute
  • Power Mart/Power Center from Informatica
  • Sagent Solution from Sagent Software
  • Hummingbird Genio Suite from Hummingbird
    Communications

41
Reasons for Dirty Data
  • Dummy Values
  • Absence of Data
  • Multipurpose Fields
  • Cryptic Data
  • Contradicting Data
  • Inappropriate Use of Address Lines
  • Violation of Business Rules
  • Reused Primary Keys
  • Non-Unique Identifiers
  • Data Integration Problems
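
A few of the causes listed above can be detected mechanically. The sketch below is an illustration only: the DUMMY_VALUES set, the find_dirty helper, and the sample rows are assumptions made for this example and do not come from the slides or from any particular cleansing product.

```python
from collections import Counter

# Hypothetical rule set; what counts as a dummy value is a business decision.
DUMMY_VALUES = {"999-99-9999", "N/A", "UNKNOWN", "00000"}

def find_dirty(rows, key="id"):
    """Return (row_index, field, problem) tuples for three simple checks."""
    problems = []
    key_counts = Counter(row.get(key) for row in rows)
    for i, row in enumerate(rows):
        for field, value in row.items():
            if value is None or str(value).strip() == "":
                problems.append((i, field, "absence of data"))
            elif str(value).strip().upper() in DUMMY_VALUES:
                problems.append((i, field, "dummy value"))
        # The same identifier appearing on more than one row.
        if key_counts[row.get(key)] > 1:
            problems.append((i, key, "non-unique identifier"))
    return problems

rows = [
    {"id": 1, "ssn": "999-99-9999", "zip": "30602"},
    {"id": 2, "ssn": "123-45-6789", "zip": ""},
    {"id": 2, "ssn": "987-65-4321", "zip": "30605"},
]
issues = find_dirty(rows)
for issue in issues:
    print(issue)
```

Causes such as cryptic data or violated business rules need domain knowledge and are not covered by checks this simple.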

42
Data Cleansing
  • Source systems contain dirty data that must be
    cleansed
  • ETL software contains rudimentary data cleansing
    capabilities
  • Specialized data cleansing software is often
    used. Important for performing name and address
    correction and householding functions
  • Leading data cleansing vendors include Vality
    (Integrity), Harte-Hanks (Trillium), and
    Firstlogic (i.d.Centric)

43
Steps in Data Cleansing
  • Parsing
  • Correcting
  • Standardizing
  • Matching
  • Consolidating

44
Parsing
  • Parsing locates and identifies individual data
    elements in the source files and then isolates
    these data elements in the target files.
  • Examples include parsing the first, middle, and
    last name; street number and street name; and
    city and state.
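
As a rough illustration of parsing, the sketch below splits one free-form line into separate elements with a regular expression. The input layout ("First [Middle] Last, Number Street, City, ST") and the parse_record helper are assumptions for this example; real source files are rarely this regular, and commercial tools rely on far richer rules.

```python
import re

# Assumed layout: "First [Middle] Last, Number Street, City, ST".
RECORD = re.compile(
    r"^(?P<first>\w+)\s+(?:(?P<middle>\w+)\s+)?(?P<last>\w+),\s*"
    r"(?P<street_number>\d+)\s+(?P<street_name>[^,]+),\s*"
    r"(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})$"
)

def parse_record(raw: str) -> dict:
    """Locate the individual data elements in one line and isolate them."""
    match = RECORD.match(raw.strip())
    if match is None:
        raise ValueError(f"unparseable record: {raw!r}")
    return match.groupdict()

parsed = parse_record("Mary Ann Smith, 12 Oak Street, Athens, GA")
print(parsed)
```

A record without a middle name still parses; the middle field is simply None.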

45
Correcting
  • Corrects parsed individual data components using
    sophisticated data algorithms and secondary data
    sources.
  • Examples include replacing a vanity address and
    adding a zip code.

46
Standardizing
  • Standardizing applies conversion routines to
    transform data into its preferred (and
    consistent) format using both standard and custom
    business rules.
  • Examples include adding a pre name, replacing a
    nickname, and using a preferred street name.
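
A small sketch of standardizing, assuming two tiny lookup tables. NICKNAMES, STREET_SUFFIXES, and the standardize helper are hypothetical names invented for this example; a real deployment would use maintained reference data and custom business rules.

```python
# Hypothetical lookup tables, kept tiny for illustration.
NICKNAMES = {"bill": "William", "bob": "Robert", "beth": "Elizabeth"}
STREET_SUFFIXES = {"st": "Street", "st.": "Street", "ave": "Avenue", "rd": "Road"}

def standardize(record: dict) -> dict:
    """Return a copy of the record in a preferred, consistent format."""
    out = dict(record)
    first = record["first"].strip()
    # Replace a nickname with the formal given name when one is known.
    out["first"] = NICKNAMES.get(first.lower(), first.title())
    out["last"] = record["last"].strip().title()
    # Expand an abbreviated street suffix to the preferred form.
    words = record["street_name"].split()
    if words and words[-1].lower() in STREET_SUFFIXES:
        words[-1] = STREET_SUFFIXES[words[-1].lower()]
    out["street_name"] = " ".join(word.title() for word in words)
    return out

standardized = standardize(
    {"first": "bill", "last": "SMITH", "street_name": "oak st."}
)
print(standardized)
```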

47
Matching
  • Searching and matching records within and across
    the parsed, corrected and
    standardized data based on predefined business
    rules to eliminate duplications.
  • Examples include identifying similar names and
    addresses.
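
One simple way to identify similar names and addresses is a string-similarity score. The sketch below uses difflib.SequenceMatcher from the Python standard library as a stand-in for a matching rule; the 0.85 threshold, the find_matches helper, and the sample customers are assumptions for this example, not the method of any vendor named in these slides.

```python
from difflib import SequenceMatcher
from itertools import combinations

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def find_matches(records, threshold=0.85):
    """Return index pairs whose name and address are both similar."""
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        if (similarity(r1["name"], r2["name"]) >= threshold
                and similarity(r1["address"], r2["address"]) >= threshold):
            pairs.append((i, j))
    return pairs

customers = [
    {"name": "William Smith", "address": "12 Oak Street"},
    {"name": "Wiliam Smith", "address": "12 Oak Street"},
    {"name": "Dana Lopez", "address": "98 Pine Avenue"},
]
matches = find_matches(customers)
print(matches)
```

Comparing every pair is quadratic in the number of records, so larger data sets usually group candidates first (for example by postal code) before scoring them.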

48
Consolidating
  • Analyzing and identifying relationships between
    matched records and
    consolidating/merging them into ONE
    representation.
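
A minimal sketch of consolidation, assuming a very simple survivorship rule: for each field, keep the first non-empty value seen among the matched records. The consolidate helper and the sample records are hypothetical; real rules often prefer the most recent or most trusted source instead.

```python
def consolidate(records):
    """Merge matched records into one representation, keeping the first
    non-empty value seen for each field (an assumed, simplistic rule)."""
    merged = {}
    for record in records:
        for field, value in record.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged

duplicates = [
    {"name": "William Smith", "phone": "", "email": "wsmith@example.com"},
    {"name": "William Smith", "phone": "555-0100", "email": ""},
]
golden = consolidate(duplicates)
print(golden)
```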

49
Data Staging
  • Often used as an interim step between data
    extraction and later steps
  • Accumulates data from asynchronous sources using
    native interfaces, flat files, FTP sessions, or
    other processes
  • At a predefined cutoff time, data in the staging
    file is transformed and loaded to the warehouse
  • There is usually no end user access to the
    staging file
  • An operational data store may be used for data
    staging

50
Data Transformation
  • Transforms the data in accordance with the
    business rules and standards that have been
    established
  • Examples include format changes, deduplication,
    splitting up fields, replacement of codes,
    derived values, and aggregates
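
The transformation types listed above can be sketched in a few lines of plain Python. The staged rows, the STATUS_CODES table, and the transform helper are hypothetical and exist only to show each kind of change once.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical staged rows and code table, for illustration only.
STATUS_CODES = {"A": "Active", "C": "Closed"}
staged = [
    {"id": 1, "name": "Smith, William", "date": "03/15/2018", "status": "A", "qty": 2, "price": 10.0},
    {"id": 1, "name": "Smith, William", "date": "03/15/2018", "status": "A", "qty": 2, "price": 10.0},
    {"id": 2, "name": "Lopez, Dana", "date": "03/16/2018", "status": "C", "qty": 1, "price": 25.0},
]

def transform(rows):
    seen, out = set(), []
    for row in rows:
        if row["id"] in seen:  # deduplication on the key
            continue
        seen.add(row["id"])
        last, first = [part.strip() for part in row["name"].split(",")]  # splitting up a field
        out.append({
            "id": row["id"],
            "first_name": first,
            "last_name": last,
            # format change: MM/DD/YYYY to ISO 8601
            "date": datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat(),
            "status": STATUS_CODES[row["status"]],  # replacement of codes
            "amount": row["qty"] * row["price"],    # derived value
        })
    return out

clean = transform(staged)

# Aggregate: total amount per status.
totals = defaultdict(float)
for row in clean:
    totals[row["status"]] += row["amount"]
print(clean, dict(totals))
```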

51
Data Loading
  • Data are physically moved to the data warehouse
  • The loading takes place within a load window
  • The trend is to near real time updates of the
    data warehouse as the warehouse is increasingly
    used for operational applications

52
Meta Data
  • Data about data
  • Needed by both information technology personnel
    and users
  • IT personnel need to know data sources and
    targets; database, table and column names;
    refresh schedules; data usage measures; etc.
  • Users need to know entity/attribute definitions;
    reports/query tools available; report
    distribution information; help desk contact
    information, etc.

53
Recent Development: Meta Data Integration
  • A growing realization that meta data is critical
    to data warehousing success
  • Progress is being made on getting vendors to
    agree on standards and to incorporate the
    sharing of meta data among their tools
  • Vendors like Microsoft, Computer Associates, and
    Oracle have entered the meta data marketplace
    with significant product offerings

54
Overview of Big Data and Analytics
What differentiates today's thriving
organizations? Data.
55
What is Big Data, really?
Data in all forms and sizes is being generated
faster than ever before
Capture and combine it for new insights and better,
faster decisions
56
Collect any data
Harness the growing and changing nature of data
Structured, Unstructured, Streaming

The challenge is combining transactional data stored
in relational databases with less structured
data. Big Data = All Data. Get the right
information to the right people at the right time
in the right format.
57
An illustration of the velocity of data created
Kalakota, R. (2012, October 22). Sizing Mobile
Social Big Data Stats. Retrieved from
http://practicalanalytics.wordpress.com/
58
The three Vs
59
Technology innovation accelerates value
[Chart: Value plotted against Innovation. Labels: siloed data, spreadmarts, complex implementations, enterprise data warehouse, transactional systems, Hadoop, ETL, OLAP, ad hoc analysis, dashboards, Internet of Things, operational reporting, in-memory, any data, machine learning]
60
Discover and connect
Answering new questions
Value
61
Put data to work for everyone in your organization
Inspire innovation. Accelerate decision-making.
Learn from and share insights.
62
Embrace Big Data across your business
Marketing: Build deeper customer relationships
Finance: Impact your company's bottom line
Sales: Improve revenue performance
HR: Maximize employee engagement
[Figure: sample dashboards, including "XT2000 Status List", "Units Sold, Discounts, and Profit before Tax", "Revenue and Target by Region", and "Departments Headcount"]
63
The Data Divide
70% of data generated by customers
80% of data stored
3% prepared for analysis
0.5% being analyzed
<0.5% being operationalized
IDC says that right now, about 22% of data is
useful. By 2020 that number will climb to 37%.
64
Major Fail
Gartner: Through 2017, 60% of big-data projects
will fail to go beyond piloting and
experimentation.
Paradigm4: 76% of those who have
used Hadoop or Apache Spark complained of
significant limitations.
65
Analytics Solution
Capture and integrate data from multiple
internal and external sources
Derive insight from data with rich, interactive
dashboards and reports using the tools you know
Put insight into action to increase
efficiency and constituent satisfaction
66
Advanced Analytics Defined
67
Analytics Example
68
The end result of Big Data - Icing on the cake
69
Use Cases
Data Analytics is needed everywhere
Legal discovery and document archiving
IT infrastructure and Web App optimization
Intelligence Gathering
Recommendation engines
Social network analysis
Traffic flow optimization
Weather forecasting for business planning
Location-based tracking services
Oil and Gas exploration
Churn analysis
Healthcare outcomes
Personalized Insurance
Smart meter monitoring
Equipment monitoring
Advertising analysis
Life sciences research
Fraud detection
Pricing Analysis
70
Personalized Insurance
Personalized policies can reduce costs and
better meet customer needs
Insurance companies can help (and some have
already started helping) their customers with
truly personalized insurance plans tailored to
their needs and risks
Insurance companies can collect real-time data
from in-car sensors and combine it with
geolocation and in-house systems. With
information such as distance and speed, they can provide
personalized insurance offers based on driving
amount, risk, and other factors, for a truly
personalized plan that may often save drivers
money
$1,600/yr.: US national avg. car insurance premium
71
Recommendation Engines
Significantly improve up-sell and cross-sell
opportunities
Retailers can use customer purchase and rating
information to serve recommendations to current
customers, based on similarities across many
dimensions
The vast amount of current and ever-growing
customer purchase, rating and click data can all
be collected and managed with a Hadoop-based
solution to pinpoint preferences based on
purchase history and demographics, and to
serve useful and compelling cross-sell and
up-sell recommendations.
158 Items sold/second by Amazon.com on
11/29/2010 (Cyber Monday)
72
Pricing Analysis
Significantly improve sales and customer
satisfaction
Retailers can use customer past purchase,
preference, and demographic information to
serve real-time custom pricing and instant
discounts when near the store.
Retailers, whether large, small, online or
in-store, can improve margins with more
detailed pricing analysis. When a customer is in
range of a transaction (either in the store,
online or perhaps passing by), offer
personalized offers, real-time price quotes, or
other frequent-buyer perks to help bring more
customers to the store and improve repeat
business.
Up to 30%: additional price Mac users accepted
for travel from Orbitz
73
Using Big data to determine the best train
schedules
74
Data Lake
  • What is a data lake?
  • A storage repository, usually Hadoop, that
    holds a vast amount of raw data in its
    native format until it is needed.
  • A place to store unlimited amounts of data in any
    format inexpensively
  • Allows collection of data that you may or may not
    use later ("just in case")
  • A way to describe any large data pool in which
    the schema and data requirements are not defined
    until the data is queried ("just in time" or
    "schema on read")
  • Complements the EDW and can be seen as a data source
    for the EDW, capturing all data but only
    passing relevant data to the EDW
  • Frees up expensive EDW resources (storage and
    processing), especially for data refinement
  • Allows for data exploration to be performed
    without waiting for the EDW team to model and
    load the data
  • Some processing is better done on Hadoop than in ETL
    tools like SSIS
  • Also called bit bucket, staging area, landing
    zone or enterprise data hub (Cloudera)
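
To make "schema on read" concrete, here is a small sketch in plain Python. The raw_lake records and the read_with_schema helper are hypothetical; the list stands in for files kept in native format, and the schema is supplied only when the data is queried.

```python
import json

# Raw events are kept exactly as they arrived (hypothetical sample data).
# The records do not share a fixed set of fields or types.
raw_lake = [
    '{"device": "meter-1", "kwh": 3.2, "ts": "2018-01-01T00:00:00"}',
    '{"device": "meter-2", "kwh": "4.1"}',
    '{"user": "ana", "page": "/pricing"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at query time: keep only records that have every
    requested field, and cast each field to the requested type."""
    for line in lines:
        record = json.loads(line)
        if all(field in record for field in schema):
            yield {field: cast(record[field]) for field, cast in schema.items()}

meter_readings = list(read_with_schema(raw_lake, {"device": str, "kwh": float}))
print(meter_readings)
```

A different question can apply a different schema to the same raw records, which is the flexibility the "just in case" collection relies on.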

75
Traditional Approaches
Current state of a data warehouse
DATA SOURCES (OLTP, ERP, CRM, LOB): Well manicured, often relational sources. Known and expected data volume and formats. Little to no change.
ETL: Complex, rigid transformations. Required extensive monitoring. Transformed historical into read structures.
DATA WAREHOUSE: Star schemas, views, other read-optimized structures.
BI AND ANALYTICS: Emailed, centrally stored Excel reports and dashboards. Flat, canned or multi-dimensional access to historical data. Many reports, multiple versions of the truth. 24 to 48h delay.
MONITORING AND TELEMETRY
76
Traditional Approaches
Current state of a data warehouse
DATA SOURCES (OLTP, ERP, CRM, LOB), INCREASING DATA VOLUME, NON-RELATIONAL DATA: Increase in variety of data sources. Increase in data volume. Increase in types of data. Pressure on the ingestion engine.
ETL, INCREASE IN TIME: Complex, rigid transformations can no longer keep pace. Monitoring is abandoned. Delay in data, inability to transform volumes, or react to new sources. Repair, adjust and redesign ETL.
DATA WAREHOUSE: Star schemas, views, other read-optimized structures.
BI AND ANALYTICS, STALE REPORTING: Emailed, centrally stored Excel reports and dashboards. Reports become invalid or unusable. Delay in preserved reports increases. Users begin to innovate to relieve starvation.
MONITORING AND TELEMETRY
77
New Approaches
Data Lake Transformation (ELT not ETL)
DATA SOURCES (OLTP, ERP, CRM, LOB, NON-RELATIONAL DATA, FUTURE DATA SOURCES): All data sources are considered. Leverages the power of on-prem technologies and the cloud for storage and capture. Native formats, streaming data, big data.
EXTRACT AND LOAD: Extract and load, no/minimal transform. Storage of data in near-native format. Orchestration becomes possible. Streaming data accommodation becomes possible.
DATA LAKE
DATA REFINERY PROCESS (TRANSFORM ON READ), OTHER REFINERY PROCESSES: Transform relevant data into data sets. Refineries transform data on read. Produce curated data sets to integrate with traditional warehouses.
DATA WAREHOUSE: Star schemas, views, other read-optimized structures.
BI AND ANALYTICS: Discover and consume predictive analytics, data sets and other reports. Users discover published data sets/services using familiar tools.
1998-2018 NetCom Learning
78
Hadoop and its role
What is Hadoop?
  • Distributed, scalable system on commodity HW
  • Composed of a few parts:
  • HDFS: Distributed file system
  • MapReduce: Programming model
  • Other tools: Hive, Pig, SQOOP, HCatalog,
    HBase, Flume, Mahout, YARN, Tez, Spark, Stinger,
    Oozie, ZooKeeper, Storm
  • Main players are Hortonworks, Cloudera, MapR
  • WARNING: Hadoop, while ideal for processing huge volumes
    of data, is inadequate for analyzing that data
    in real time (companies do batch
    analytics instead)
[Diagram: Hadoop stack. Data services: Flume, Hive, HCatalog, Pig, Sqoop, HBase (load and extract). Operational services: Ambari, Oozie, Falcon. Core services: MapReduce, YARN, NFS, WebHDFS, HDFS. Hadoop cluster of compute and storage nodes.]
Hadoop clusters provide scale-out storage and
distributed data processing on commodity hardware
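
The MapReduce programming model mentioned above can be sketched on a single machine with plain Python. This is not Hadoop code and uses no Hadoop API; map_phase, shuffle, and reduce_phase are names chosen for this illustration of the classic word-count example. On a cluster, the framework runs many mappers and reducers in parallel and performs the grouping step between them.

```python
from collections import defaultdict

def map_phase(document: str):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values that share one key."""
    return key, sum(values)

splits = ["big data big value", "data lake"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(counts)
```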
79
Hortonworks Data Platform 2.2
Simply put, Hortonworks ties all the open source
products together (20)
80
The real cost of Hadoop
http://www.wintercorp.com/tcod-report/
81
Use cases using Hadoop and a DW in combination
Bringing islands of Hadoop data together
  • Archiving data warehouse data to Hadoop (move)
    (Hadoop as cold storage)
  • Exporting relational data to Hadoop
    (copy) (Hadoop as backup/DR, analysis, cloud
    use)
  • Importing Hadoop data into data warehouse
    (copy) (Hadoop as staging area, sandbox, Data
    Lake)

82
What is the Internet of Things?
IoT and real-time data
Connectivity
Data
Analytics
Things
IoT sensor-acquired data
83
What is the Internet of Things (IoT)?
  • Internet-connected devices that can perceive the
    environment in some way, share their data, and
    communicate with you. IoT is just a catch-all
    term for ways of using machine-generated data to
    create something useful.
  • Each has a processor and sensor to collect
    information
  • Examples: heart monitoring implants, biochip
    transponders on farm animals, automobiles with
    built-in sensors, field operation devices that
    assist firefighters in search and rescue
  • Excludes computers, tablets, and smart phones
  • But really, it's in the sphere of business
    intelligence that IoT will really make a
    difference.
  • Cool possibilities
  • When a milk carton is almost empty it will ping
    you when you are near a store
  • An alarm clock that signals your coffee maker to
    start brewing when you wake up
  • An embedded chip that monitors your vital signs
    and notifies a medical provider if they exceed a limit
  • Gartner: 10 billion devices connected to the
    internet today, 26B by 2020
  • At some point in the future, nearly every manmade
    object will contain a device that transmits data!
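
The vital-signs example above boils down to a threshold check over a stream of sensor readings. A minimal sketch, assuming a hypothetical `VitalReading` record and an arbitrary limit of 120 bpm chosen purely for illustration (not medical guidance); `notify_provider` is a placeholder for whatever notification channel a real system would use.

```python
from dataclasses import dataclass


@dataclass
class VitalReading:
    device_id: str
    heart_rate_bpm: int


def readings_over_limit(readings, max_bpm=120):
    """Return the readings whose heart rate exceeds the configured limit."""
    return [r for r in readings if r.heart_rate_bpm > max_bpm]


def notify_provider(reading):
    """Placeholder for a real notification channel (pager, SMS, API call)."""
    return f"ALERT {reading.device_id}: heart rate {reading.heart_rate_bpm} bpm"


# Hypothetical sample data from two devices.
sample = [
    VitalReading("implant-1", 72),
    VitalReading("implant-2", 140),
    VitalReading("implant-1", 121),
]
messages = [notify_provider(r) for r in readings_over_limit(sample)]
```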

84
Modern Data Warehouse
  • Think about future needs
  • Increasing data volumes
  • Real-time performance
  • New data sources and types
  • Cloud-born data
  • Multi-platform solution
  • Hybrid architecture

85
Modern Data Warehouse Defined
86
Modern Data Warehouse
The Dream
87
The Reality
88
Federated Querying
Other names: data virtualization, logical data
warehouse, data federation, virtual database,
and decentralized data warehouse.
A model in which a single query retrieves and
combines data from multiple data sources where
the data already sits, so there is no need to
run ETL first or to learn more than one
retrieval technology.
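
A toy sketch of the idea, not a real federation engine: an in-memory SQLite table stands in for a relational source, a list of dictionaries stands in for a non-relational source, and one function answers a question that spans both without copying either into the other. The names `customers`, `CLICK_EVENTS`, and `federated_clicks_per_customer` are hypothetical.

```python
import sqlite3


def make_relational_source():
    """Relational source: a stand-in for a warehouse table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    return conn


# Non-relational source: a stand-in for documents in a NoSQL or Hadoop store.
CLICK_EVENTS = [
    {"customer_id": 1, "page": "/pricing"},
    {"customer_id": 1, "page": "/docs"},
    {"customer_id": 2, "page": "/home"},
]


def federated_clicks_per_customer(conn, events):
    """Combine both sources in place and return one result set."""
    # Count events per customer in the non-relational source.
    counts = {}
    for event in events:
        cid = event["customer_id"]
        counts[cid] = counts.get(cid, 0) + 1
    # Look up names in the relational source and join on the customer id.
    rows = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [(name, counts.get(cid, 0)) for cid, name in rows]


result = federated_clicks_per_customer(make_relational_source(), CLICK_EVENTS)
print(result)
```

The caller sees a single result set and never touches the two retrieval technologies directly, which is the point of the model.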
89
Federated Querying
[Diagram: a single Select passes through the query model and returns one result set. Sources shown: EDW, SQL Server, DB2, Oracle, MongoDB, Windows Azure HDInsight, Cloudera CDH, Hortonworks HDP, Linux; grouped as relational data and non-relational data.]
90
DW and the Cloud
Can I use the cloud with my DW?
  • Public and private cloud
  • Cloud-born data vs on-prem born data
  • Transfer cost from/to cloud and on-prem
  • Sensitive data on-prem, non-sensitive in cloud
  • Look at hybrid solutions

91
TDWI Best Practices Report (2015)
92
SMP vs MPP
  • SMP - Symmetric Multiprocessing
  • Multiple CPUs used to complete individual
    processes simultaneously
  • All CPUs share the same memory, disks, and
    network controllers (scale-up)
  • All SQL Server implementations up until now have
    been SMP
  • Mostly, the solution is housed on a shared SAN
  • MPP - Massively Parallel Processing
  • Uses many separate CPUs running in parallel to
    execute a single program
  • Shared nothing: each CPU has its own memory and
    disk (scale-out)
  • Segments communicate using a high-speed network
    between nodes
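
The shared-nothing pattern can be sketched as partition, aggregate locally, then combine. This is a single-process simulation under stated assumptions: each list in `partitions` plays the role of one node's private disk, the loop over partitions stands in for work that real nodes would do in parallel, and all function names are hypothetical.

```python
import zlib
from collections import defaultdict


def hash_partition(rows, num_nodes):
    """Assign each row to exactly one node by hashing its key."""
    partitions = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        node = zlib.crc32(key.encode("utf-8")) % num_nodes
        partitions[node].append((key, amount))
    return partitions


def local_aggregate(partition):
    """Work one node does using only its own data (no shared memory or disk)."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def combine(partials):
    """Coordinator step: merge the small per-node results sent over the network."""
    final = {}
    for partial in partials:
        for key, value in partial.items():
            final[key] = final.get(key, 0) + value
    return final


def mpp_sum_by_key(rows, num_nodes=4):
    partitions = hash_partition(rows, num_nodes)
    # On a real MPP system these calls would run concurrently, one per node.
    partials = [local_aggregate(p) for p in partitions]
    return combine(partials)


SALES = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
totals = mpp_sum_by_key(SALES, num_nodes=3)
```

Adding nodes spreads both the data and the work (scale-out), whereas an SMP system would run the same aggregation over one shared pool of memory and disk (scale-up).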

93
DW SCALABILITY SPIDER CHART
[Spider chart comparing MPP (multidimensional scalability) with SMP (tunable in one dimension at the cost of other dimensions). The spiderweb depicts important attributes to consider when evaluating data warehousing options. Big Data support is the newest dimension.]
Axes and scale labels:
  • Data Volume: 10 TB, 50 TB, 100 TB, 500 TB, 5 PB
  • Mixed Workload: Strategic; Strategic, Tactical;
    Strategic, Tactical Loads; Strategic, Tactical
    Loads, SLA
  • Query Concurrency: 100, 1,000, 10,000
  • Data Freshness: Weekly Load, Daily Load, Near
    Real Time Data Feeds
  • Query Complexity: 3-5 Way Joins, 5-10 Way Joins
    (joins, OLAP operations, aggregation, complex
    WHERE constraints, views, parallelism)
  • Query Freedom: Batch Reporting, Repetitive
    Queries; Ad Hoc Queries, Data Analysis/Mining
  • Schema Sophistication: Simple Star; Multiple,
    Integrated Stars; Multiple, Integrated Stars and
    Normalized
  • Query Data Volume: MBs, GBs, TBs
94
Recorded Webinar Video
To watch the recorded webinar video for live
demos, please access the link https://goo.gl/rPrjZf
95
About NetCom Learning
96
Recommended Courses and Marketing Assets
  • Courses
  • 20778 Analyzing Data with Power BI - Class
    scheduled on Jan 7
  • 20775 Performing Data Engineering on Microsoft
    HD Insight - Class scheduled on Jan 14
  • Tableau Desktop Level 1 Introduction - Class
    scheduled on Jan 21
  • GL660 - Hadoop For Systems Administrators -
    Class scheduled on Feb 4
  • Marketing Assets
  • Blog - Top AI, Big Data, Analytics Trends to
    Follow in 2018
  • Whitepaper - Curtailing the Talent Gap in Data
    Science

97
  • Top Reasons to Master Agile Scrum and its
    Benefits
  • Clean Architecture Patterns, Practices, and
    Principles
  • CEH Understanding Ethical Hacking
  • SQL Server 2017 Application Development Best
    Practices
98
Promotions
The year 2018 is coming to an end, but learning
is a continuous process! Build your own, your
team's, or your department's skills with the best
training courses of 2018-19. With a range of
Cloud, Security, Networking, Data and AI, Design
and Multimedia, Business Application, Application
Development, and Business Process training at
limited-time prices, you can gain in-demand
skills while saving on the training cost.
99
Follow Us On
100
101
THANK YOU !!!