Title: FAST Corporate Presentation
1Why Search Engines are used increasingly to Offload Queries from Databases
Bjørn Olstad, CTO, FAST Search & Transfer; Adjunct Prof., The Norwegian University of Science and Technology
Email: bjorn.olstad_at_fast.no, Cell: +47 48011157
2The Typo Problem...
3Talent Offloading ....
4The Web Search Experience
5The RDBMS Experience
High input barrier
You are viewing 5 random jobs out of 2461 jobs
in total....
6CareerBuilder: Use scenario, part 1
30956 jobs
7CareerBuilder: Use scenario, part 2
1084 jobs
8CareerBuilder: Use scenario, part 3
30 jobs
9CareerBuilder: Use scenario, part 4
5 jobs
30956 → 5 targeted jobs in 3 steps
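The narrowing shown across the four CareerBuilder screens can be thought of as successive refinements of a result set, with the hit count recomputed after each step. The following is a minimal sketch of that idea only. The job records, the field names (`category`, `city`, `employer`) and the helpers `facet_counts`, `refine` and `drill_down` are hypothetical and are not taken from the presentation or from any FAST ESP API.

```python
from collections import Counter

# Hypothetical job postings; the fields are illustrative assumptions.
JOBS = [
    {"id": 1, "category": "Engineering", "city": "Boston", "employer": "Acme"},
    {"id": 2, "category": "Engineering", "city": "Boston", "employer": "Initech"},
    {"id": 3, "category": "Engineering", "city": "Austin", "employer": "Acme"},
    {"id": 4, "category": "Sales", "city": "Boston", "employer": "Acme"},
    {"id": 5, "category": "Sales", "city": "Austin", "employer": "Globex"},
]


def facet_counts(jobs, field):
    """Count how many jobs fall under each value of one field."""
    return Counter(job[field] for job in jobs)


def refine(jobs, field, value):
    """Keep only the jobs whose field equals the chosen value."""
    return [job for job in jobs if job[field] == value]


def drill_down(jobs, steps):
    """Apply (field, value) refinements in order.

    Returns the remaining jobs and the result-set size after each step,
    starting with the unrefined size.
    """
    sizes = [len(jobs)]
    for field, value in steps:
        jobs = refine(jobs, field, value)
        sizes.append(len(jobs))
    return jobs, sizes


if __name__ == "__main__":
    print(facet_counts(JOBS, "category"))
    remaining, sizes = drill_down(
        JOBS,
        [("category", "Engineering"), ("city", "Boston"), ("employer", "Acme")],
    )
    print(sizes, [job["id"] for job in remaining])
```

Showing the counts from `facet_counts` next to each refinement is what lets a user see in advance how far a click will narrow the result set.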
10Challenger Shuttle Launch
Fax to NASA from contractor with O-ring concern
11Presentation Matters
12IYP: A Disruptive Change
Taylor or Gibson guitar?
Good local offers?
Compare offerings
Phone / Directions
BTW I'm using my iPAQ
What is the phone number to Will's Barber shop?
Product & Services
Blogs
Company web site
13ISVs: A Disruptive Change
Siebel 2000
Siebel 2005
my CRM Application
my CRM Application
Information Access Layer
3rd party content
Search is a strategic enabler
Search is a tactical afterthought
14Revisit the Assumptions
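[Figure: timeline of information milestones with year labels]
Milestones: Cave paintings, bone tools (40,000 BCE); Writing (3500 BCE); 0 C.E.; Paper (105); Printing (1450); Electricity, Telephone (1870); Transistor (1947); Computing (1950); Internet (DARPA) (late 1960s); The Web (1993)
Year labels: 1999; 2000: 3B; 2001: 6B; 2002: 12B; 2003: 24B
80% Unstructured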
15Extreme Capabilities?
- Feeding/streaming, transaction, retrieval or analytics centric?
- Content size: M, L, VL, VVVL or V^n L (n → ∞)?
- Schema centric, semi-structured XML, text, agnostic?
- Fuzzy value vs. binary completeness?
- Discovery primitives?
- User interaction part of design target?
16Query Latency: RDBMS vs ESP
Test Data
- Structured data
  - 5 million records
  - 13 fields per record
- Structured queries
  - 22 SQL queries (representative of ERP)
17Queries Per Second: RDBMS vs ESP
QPS
- Identical HW: single node, 2 CPU, 4 GB RAM, 3 SCSI disks
- Identical data: auction data from eBay, 3.6 million docs
- Identical queries: 200 queries defined by Oracle
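A throughput comparison of this kind is usually obtained by replaying the same query set against each system and dividing the number of completed queries by the elapsed wall-clock time. The sketch below shows that calculation for a single sequential client. `run_query` is a hypothetical callable standing in for either backend, and this is not the harness that produced the figures on these slides.

```python
import math
import time


def measure(run_query, queries):
    """Replay queries one after another.

    Returns (queries per second, list of per-query latencies in seconds).
    """
    latencies = []
    start = time.perf_counter()
    for query in queries:
        t0 = time.perf_counter()
        run_query(query)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return qps, latencies


def percentile(values, fraction):
    """Nearest-rank percentile, e.g. fraction=0.9 for the 90th percentile."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Stand-in workload; a real run would call each system's client here.
    workload = ["q%d" % i for i in range(200)]
    qps, latencies = measure(lambda q: sum(range(1000)), workload)
    p90_ms = percentile(latencies, 0.9) * 1000
    print(f"{qps:.0f} QPS, p90 latency {p90_ms:.3f} ms")
```

Reporting a latency percentile next to QPS matters because an average can hide a slow tail of queries.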
18Disruptive Change
Queries that fit The Model
Queries that don't fit The Model
Alternative I
- Star, snowflake schemas
- Cubes / datamarts
→ Incremental fixes to painful shortcomings
→ Adds complexity
Alternative II
- Schema agnostic
- Scalable ad-hoc querying
- BLOBS → Contextual Insight
- Real-time fusion of disparate data models
- Massive fault tolerant scalability
19Extreme Capabilities: ESP Design Targets
Powering Search Derivative Applications (SDAs)
Game Changer driven by Extreme Retrieval and on-the-fly Analytics
20Database Query Offloading: Example AutoTrader.com
RDBMS
- HW-cost 320K (32 CPUs on 4 Sun servers)
- 90% sub-second query response; average 12 s for the rest
- Relevance Sorting
- 5 FTE to maintain
ESP
- HW-cost 90K
- 100% sub-second query response
- Flexible relevance and discovery
- 0.5 FTE to maintain
Car Dealers - Product Supply
21Content Scalability: RDBMS vs ESP
Examples of ESP deployments
- Compliance case
  - 50B documents @ 80k average
  - → 4 PB (around 100 web indexes)
- Storage
  - Intelligent content addressable storage
  - XML metadata and full content
  - EMC Centera: N × 256 TB (N = 1..400)
- Webmining: WebFountain
- 60.000:1 in query capacity (ESP:DB)
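The compliance-case figures can be checked with simple arithmetic. The sketch assumes that "80k" means 80 kilobytes per document and that decimal units apply (1 PB = 10^15 bytes). It also reads the Centera line as N units of 256 TB each, with N between 1 and 400. All three are interpretations, since the slide does not spell them out.

```python
KB = 10**3
TB = 10**12
PB = 10**15


def corpus_size_bytes(num_docs, avg_doc_bytes):
    """Total raw content size for a corpus of equally weighted documents."""
    return num_docs * avg_doc_bytes


def centera_capacity_bytes(n_units):
    """Capacity of N storage units of 256 TB each (assumed reading of the slide)."""
    if not 1 <= n_units <= 400:
        raise ValueError("the slide gives N in the range 1..400")
    return n_units * 256 * TB


if __name__ == "__main__":
    compliance = corpus_size_bytes(50 * 10**9, 80 * KB)
    print(compliance / PB)                   # 4.0
    print(centera_capacity_bytes(400) / PB)  # 102.4
```

Under these assumptions, 50 billion documents at 80 kB each come to 4 PB, which matches the figure on the slide.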
22Intelligent Storage: Storage and Search Unite
Discover
Simple
Scalable
Secure
23Contextual Search
Any new suspicious financial transaction patterns?
Where is the email from Peter about ROI analysis?
FIND
EXPLORE
Contextual Relevance
Contextual Navigation
- Best of Web: Recommender / Authority
- Best of Enterprise: Linguistic / Statistic
- Contextual fact discovery
- On-the-fly metadata analysis
24Turning around the Pyramid: HBZ.de, Leading German Library Service Center
From
Librarians
To
Researchers
Single Field Search
Querying
FAST ESP
WWW (HTML, XML, WML, JavaScript)
SQL LIB
DB (×5)
STRUCTURED
25ESP @ SCOPUS
- >200M articles / 180M citations
- 180 TB capacity / 14000 journals
David Goodman standing up and declaring in
public, that Scopus is the best-designed database
he's ever seen
26Relevance Drives Revenue
Search Reduces Clicks to Purchase and Browsing and Drives Revenue
- Reduced number of clicks to buy content from > 4 to < 2
- 50% reduction in ringtone browsing
- 100% increase in search
- 20% increase in ringtone revenue
[Figure: two charts over Week 1 to Week 10, each marked "Launched search". Series and axis labels: Clicks to Purchase (page views per sale), Search, Revenue, Browsing.]
27Business Analytics: Processing of real-time streams
Example: Norwegian Customs, Foreign Exchange Transaction Monitoring
SECURITY ACCESS MODULE
ACL Monitor
User Monitor
Real-time Registration
Queries
Message Queue
Results
Database connector
Alerts
Transaction Log
Data
Validation
Firewall
Firewall
28Technology Maturity... RDBMS vs ESP
29Business Intelligence: ESP vs. RDBMS Technology
OBSERVATION: The Enterprise Search Platform (ESP), a relatively new concept, integrating advanced technologies typically associated with search engines, database tools, and analytical systems, is fast becoming able to solve modern business intelligence problems (using both structured and unstructured data) in a way that is fundamentally different from, and ultimately superior to, that of other currently available analytical or database software.
PREDICTION: Enterprise Search Platform and search centric application technology represents a true paradigm shift in the way data will be stored, analyzed and reported on in the future. Resulting realignments in the marketplace may be both rapid and tumultuous.
- Chief strategist, leading BI vendor
30If your only tool is a hammer ....
... every problem looks like a nail
31UIMA Architecture
32Text → Structure
<Category>FINANCIAL</Category>
<Author>George Stein</Author>
BC-dynegy-enron-offer-update5 Dynegy May Offer at Least 8 Bln to Acquire Enron (Update5) By George Stein SOURCE c.2001 Bloomberg News BODY
<Company>Dynegy Inc</Company>
<Person>Roger Hamilton</Person>
<Company>John Hancock Advisers Inc.</Company>
<PersonPositionCompany>
<OFFLEN OFFSET="3576" LENGTH="63" />
<Person>Roger Hamilton</Person>
<Position>money manager</Position>
<Company>John Hancock Advisers Inc.</Company>
</PersonPositionCompany>
"Dynegy has to act fast," said Roger Hamilton, a money manager with John Hancock Advisers Inc., which sold its Enron shares in recent weeks. "If Enron can't get financing and its bonds go to junk, they lose counterparties and their marvelous business vanishes."
Moody's Investors Service lowered its rating on Enron's bonds to "Baa2" and Standard & Poor's cut the debt to "BBB." in the past two weeks.
Fact
<Company>Enron Corp</Company>
<Company>Moody's Investors Service</Company>
<CreditRating>
<OFFLEN OFFSET="3814" LENGTH="61" />
<Company_Source>Moody's Investors Service</Company_Source>
<Company_Rated>Enron Corp</Company_Rated>
<Trend>downgraded</Trend>
<Rank_New>Baa2</Rank_New>
<__Type>bonds</__Type>
</CreditRating>
Event
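Once entities and events have been marked up this way, each annotation can be flattened into a structured record that is queryable like a database row. The sketch below does this for the slide's credit rating annotation, with its angle brackets written out. The function name `annotation_to_record` and the record layout (`kind`, `offset`, `length`) are illustrative assumptions, not part of the annotator or of FAST ESP.

```python
import xml.etree.ElementTree as ET

# The credit rating annotation from the slide, written as well-formed XML.
CREDIT_RATING = """
<CreditRating>
  <OFFLEN OFFSET="3814" LENGTH="61" />
  <Company_Source>Moody's Investors Service</Company_Source>
  <Company_Rated>Enron Corp</Company_Rated>
  <Trend>downgraded</Trend>
  <Rank_New>Baa2</Rank_New>
  <__Type>bonds</__Type>
</CreditRating>
"""


def annotation_to_record(xml_text):
    """Flatten one annotation element into a dict of field name -> value.

    The OFFLEN child is turned into integer 'offset' and 'length' fields
    that point back into the source text; every other child becomes a
    string field named after its tag.
    """
    root = ET.fromstring(xml_text.strip())
    record = {"kind": root.tag}
    for child in root:
        if child.tag == "OFFLEN":
            record["offset"] = int(child.attrib["OFFSET"])
            record["length"] = int(child.attrib["LENGTH"])
        else:
            record[child.tag] = (child.text or "").strip()
    return record


if __name__ == "__main__":
    for key, value in annotation_to_record(CREDIT_RATING).items():
        print(f"{key}: {value}")
```

Keeping the offset and length alongside the extracted fields lets a result page highlight the exact passage that supports the fact.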
33The BI hammer Approach
Document Vector
Antibiotics, Peptidyl, Eubacteria, RNA, Mg,
SVD Analysis
( ?1, ?2, ..., ?n )
?1, ?2, ..., ?n, Structured attributes
34Contextual Refinement: ETL and Semantic understanding unite
Direct access to RDBMSs for info from some Telcos
ESP lookup
Logic for cleansing
Ordered hits (by quality)
XML feed from other Telcos
Cleansed data to ESP
XML
Flat files (CSV or fixed) from the laggards
Ambiguous data (close hits or unidentified)
Clean data
Error database for manual inspection, correction, storage/learning
Master database for persistent storage
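The cleansing flow on this slide looks each incoming record up against known data, orders the hits by quality, and then routes the record either to the master database or to the error database. The sketch below illustrates that routing step. It substitutes Python's standard `difflib` similarity score for the ESP lookup, and the master list, the thresholds and the names `ordered_hits`, `classify` and `destination` are hypothetical.

```python
import difflib

# Hypothetical master list of known-good names.
MASTER = ["Will's Barber Shop", "Taylor Guitars", "Gibson Guitar Store"]


def ordered_hits(name, master=MASTER):
    """Return (score, candidate) pairs, best match first."""
    scored = [
        (difflib.SequenceMatcher(None, name.lower(), cand.lower()).ratio(), cand)
        for cand in master
    ]
    return sorted(scored, reverse=True)


def classify(name, accept=0.9, review=0.6):
    """Label a record 'clean', 'close_hits' or 'unidentified'."""
    best_score = ordered_hits(name)[0][0]
    if best_score >= accept:
        return "clean"
    if best_score >= review:
        return "close_hits"
    return "unidentified"


def destination(status):
    """Clean data goes to the master database, everything else to the error database."""
    return "master" if status == "clean" else "error_db"


if __name__ == "__main__":
    for incoming in ["Wills Barber shop", "Gibson Guitars", "zzzz"]:
        status = classify(incoming)
        print(incoming, "->", status, "->", destination(status))
```

The two thresholds are a design choice. Raising `accept` sends more records to manual inspection, while lowering it risks merging records that only look alike.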
35Contextual Insight: Query-time fact analysis @ sub-document level
entry probe carried to Saturn's moon Titan as part of the
Intent
Concepts
36Contextual Navigation: ThisIsTravel
37Revisit the Assumptions
[Figure: timeline of information milestones with year labels]
Scalable Search
Milestones: Cave paintings, bone tools (40,000 BCE); Writing (3500 BCE); 0 C.E.; Paper (105); Printing (1450); Electricity, Telephone (1870); Transistor (1947); Computing (1950); Internet (DARPA) (late 1960s); The Web (1993)
Year labels: 1999; 2000: 3B; 2001: 6B; 2002: 12B; 2003: 24B
80% Unstructured