Innovations in Database Technology

1 / 31
About This Presentation
Title:

Innovations in Database Technology

Description:

... not implemented within Infobright include BLOB, ENUM, SET and Auto Increment. ... Warranty and Indemnification. No. Included. INSERT/UPDATE/DELETE. No ... – PowerPoint PPT presentation

Number of Views:225
Avg rating:3.0/5.0
Slides: 32
Provided by: jode9

less

Transcript and Presenter's Notes

Title: Innovations in Database Technology


1
Innovations in Database Technology
IRMAC BI/DW SIG May 28, 2009
2
Agenda
  • About Infobright
  • Data Warehousing Challenge
  • Use Cases
  • Infobright Approach
  • Infobright Architecture
  • Infobright Versions System Requirements

3
  • About Infobright

4
About Infobright
5
  • Data Warehousing Challenge

6
Data Warehousing Challenges
.
More Kinds of Output Needed by More Users, More
Quickly
More Data, More Data Sources
Limited Resources and Budget
0101010101010101010101010101
Real time data
10
0101010101010101010101010101
10
101
Multiple databases
0101010101010101010101010
0101010101010101010101010
10
10
01
10
External Sources
101
0101010101010101010101
1
0101010101010101010101
0
1
10
10
10
10
0
101
01
01
10
10
1
0
01
1
1
01
0
0
1
1
10
10
0
0
101
10
01
10
1
01
1
01
10
10
1
0
0
01
10
0
10
101
1
101
101
10
101
1
1010
10
1
010
01
0
1
0
1
010
1010
1
0
1
10
0
01
0
1
0
01
0
0101
10
101
1
01
0
101
0
0
101
0
  • Labor intensive, heavy indexing and partitioning
  • Hardware intensive massive storage big servers
  • Expensive and complex

Traditional Data Warehousing
7
Data Warehousing Raising The Bar
New Demands Larger transaction volumes driven
by the internet Impact of Cloud Computing More -gt
Faster -gt Cheaper
Data Warehousing Matures Near real time
updates Integration with master data
management Data mining using discrete business
transactions Provision of data for business
critical applications
Early Data Warehouse Characteristics Integration
of internal systems Monthly and weekly
loads Heavy use of aggregates
8
  • Use Cases

9
Use Cases
Infobright is a good fit for
  • Loading millions of transactions with a limited
    batch window
  • Summarizing transactional data for trend analysis
  • Extracting transactional detail based on specific
    constraints
  • Ad hoc query support across many dimensional
    attributes

Avoid using Infobright for
  • Real-time transactional updates (operational data
    entry)
  • Full data extracts (select from )
  • Row based operations that need to access all
    columns of a table are typically better suited to
    row based databases

10
Customer Experience Load Speed
Business Requirement
  • Mavenir - OEM customer deploying a world wide
    telco application
  • Application provides operators with access to
    detailed SMS traffic
  • Needed a low cost solution with the ability to
    load 20K records per second
  • Peak of 70M messages per hour during Chinese New
    year

Solution
  • Custom front end developed using MySQL JDBC
    driver
  • Completed design, test, deployment in lt 3 months
    with no assistance from Infobright
  • Allowed for expansion from 7 to 90 days of online
    SMS history
  • Supports plan for 70 annual growth
  • Rollout to allow for 120 concurrent users

11
Customer Experience Query Performance
Business Requirement
  • Sulake - Online Social Networking service with
    126M users across 31 countries
  • 990M page impressions per month
  • Need to quickly analyze online spend on a daily
    basis to enhance online experience and drive
    additional revenue
  • Existing InnoDB solution was able to process
    business queries in a reasonable time frame
    (queries taking hours to complete)
  • Business opportunities were being lost due to
    inability to analyze subscriber behavior using
    transactions

Solution
  • Customer used existing data model and deployed
    the application using Business Objects Data
    Integrator for ETL, Web-Intelligence for BI
  • Existing ETL workflows were converted to
    Infobright in less than 4 weeks without
    assistance
  • Historically long running queries (hours) now
    running in minutes and seconds
  • Additional benefits due to compression were a
    reduced need for disk storage and an overall
    reduction in I/O and network traffic

12
Customer Experience - TCO
Business Requirement
  • A global provider of electronic trading solutions
    across 22 time zones and 700 financial exchanges
  • Wanted to expand analytical access to financial
    transactions to include both current (30 days)
    and archived transactions (4 years)
  • Expansion of existing Sybase solution was too
    costly

Solution
  • Infobright was able to achieve performance
    benchmarks within the first 3 days of a proof of
    concept using production data
  • 28,000 records per second load speed
  • Join 100M row with a 30Mrow table -gt 400k rows,
    returned in 185 seconds
  • Additional queries that did not complete using
    Sybase, finished in minutes using Infobright
  • Final solution deployed using Pentaho Kettle for
    ETL and Crystal Reports for BI
  • Success with modest data size (150GB) has opened
    opportunities for additional more detailed
    transactional analysis

13
Customer Experience Query Performance and TCO
Business Requirement
  • TradeDoubler Based in Sweden, a global digital
    marketing company, serving 1600 online
    advertisers across Europe and Asia.
  • TradeDoubler optimizes Web marketing campaigns by
    analyzing Web clicks, impressions and purchases.
  • Analyzing terabytes of data about the results of
    its programs is central to the companys success.
  • Selected Infobright to produce analytical results
    rapidly, seamless interoperability with their
    MySQL database and low TCO

Solution
  • Deployed solution using a single, 12,500 Dell
    server with 8 CPU cores and 16 GB RAM
  • Used Pentaho Kettle for ETL and Jaspersoft Server
    Pro Reports for BI
  • Needed to process and analyze data 20 billion
    online transactions/month
  • In POC, loaded gt 3.2 billion rows at gt 300,000
    rows / second
  • In production, achieved 30x data compression
  • Extremely fast query speed. 3 queries that
    previously did not return, now returned within a
    minute

14
  • Infobright Approach

15
Introducing Infobright
Column Orientation
  • Smarter architecture
  • Load data and go
  • No indices or partitions to build and maintain
  • Knowledge Grid automatically updated as data
    packs are created or updated
  • Super-compact data foot- print can leverage
    off-the-shelf hardware

Knowledge Grid statistics and metadata
describing the super-compressed data
Data Packs data stored in manageably sized,
highly compressed data packs Data compressed
using algorithms tailored to data type
16
Column vs. Row-Oriented
17
Data Packs and Compression
  • Data Packs
  • Each data pack contains 65, 536 data values
  • Compression is applied to each individual data
    pack
  • The compression algorithm varies depending on
    data type and data distribution

64K
64K
64K
64K
18
Knowledge Grid
This metadata layer 1 of the compressed volume
19
A Simple Query using the Knowledge Grid
SELECT count() FROM employees WHERE salary gt
50000 AND age lt 65 AND job Shipping AND
city TORONTO
20
A Join Query using the Knowledge Grid
Car Sales
SELECT MIN(sale), MAX(discount), name FROM
carsales, salesperson WHERE carsales.id
salesperson.id AND carsales.prov ON AND
carsales.date 2008-02-29 GROUP BY name
  • Any subsequent queries will be able to use the
    PPN to resolve joins between Car Sales and Sales
    Person

21
  • Infobright Architecture

22
Infobright Embedded With MySQL
MySQL/Infobright Architecture
CONNECTORS Native C API, JDBC, ODBC, .NET, PHP,
Python, Perl, Ruby, VB
Infobright Loader / Unloader
CONNECTION POOL Authentication, Thread Reuse,
Connection Limits, Check Memory, Caches
Management Services Utilities
Infobright ships with the full MySQL binaries.
The MySQL architecture is used to support
database components such as connectors, security
and memory management.
Infobright Optimizer and Executor
SQL Interface
MySQL Loader
Parser
Caches Buffers
My SQL Optimizer
  • MyISAM
  • Views
  • Users
  • Permissions
  • Tables Defs

23
Optimized SQL for Infobright
MySQL
The Infobright Optimizer supports a large amount
of MySQL syntax and functions. When the optimizer
encounters SQL syntax that is not supported, then
the query is executed using the MySQL optimizer.
  • Infobright Optimized SQL
  • Select Statements
  • Comparison Operators
  • Logical Operators
  • String Comparison Functions (LIKE, ..)
  • Aggregate Functions
  • Arithmetic Operators
  • Data Manipulation Language (I/U/D)
  • Data Definition Language (CREATE DROP)
  • String Functions
  • Date/Time Functions
  • Numeric Functions
  • Trigonometric Functions
  • Case Statements

24
Infobright Data Types
Date
Most of the data types expected for a MySQL
database engine are fully supported. The data
types that are currently not implemented within
Infobright include BLOB, ENUM, SET and Auto
Increment.
Numeric
String
25
ETL Integration
Leverage existing IT tools and resources for
fast, simple deployments and low TCO
  • Increased efficiency with popular platforms
  • Deeper ETL Integration
  • Jaspersoft, Talend, Pentaho
  • Leverages end-to-end data management provided by
    ETL tools
  • Improved support for Data Manipulation Language
    (DML)

26
Data Loading with without custom ETL connectors
  • Loading Infobright tables with custom connectors
  • Kettle from Pentaho
  • Talend ETL from Talend
  • Jaspersoft ETL (Talend) from Jaspersoft
  • Two ways to invoke Infobright loader without
    connectors
  • Generate a CSV or binary file and invoke the
    Infobright loader to load the file
  • Named pipe technique
  • Create a named pipe (i.e. mkfifo
    /home/mysql/s_mysession1.pipe)
  • Launch the Infobright loader in the background to
    read from the pipe
  • Launch the ETL process that writes data to the
    named pipe
  • When the ETL process runs, as records are written
    to the named pipe, the loader reads them and
    writes them to an Infobright database table
  •  

27
  • Infobright Versions System Requirements

28
Comparison of ICE and IEE
29
System Requirements
30
For More Information
Data Warehouse Evangelist Bob Newell Bob.Newell_at_in
fobright.com
Or join our open source community
at www.infobright.org
31
Query performance
Time in minutes, seconds, milliseconds
Write a Comment
User Comments (0)