The Structure of Computer Scientific Revolutions

About This Presentation

Description:

Dow Jones Enterprise Ventures. May 2006. Michael Franklin. UC Berkeley. Amalgamated Insight ... Dow Jones EV Summit May 2006. Whither Structured Data? ... – PowerPoint PPT presentation

Slides: 31

Transcript and Presenter's Notes

Title: The Structure of Computer Scientific Revolutions

1
The Structure of (Computer) Scientific Revolutions
Michael Franklin, UC Berkeley, Amalgamated Insight

Dow Jones Enterprise Ventures
May 2006

2
Data Management Then
Structured Data Processing
3
Data Management Now
4
The Structure Spectrum

Structured data (schema-first): regular, known, conforming; e.g., relational database
Unstructured data (schema-never): freeform, irregular; e.g., plain text, images, audio
Semi-structured data (schema-later): provides structural information, but less constrained; e.g., XML, tagged text/media
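
To make the three points on the spectrum concrete, here is a minimal Python sketch that stores the same fact in each form. The record contents (`t1`, `shelf0`) and field names are hypothetical, chosen only for illustration; they are not from the talk.

```python
import re
import xml.etree.ElementTree as ET

# Structured (schema-first): fields are fixed before any data arrives.
row = {"tag_id": "t1", "receptor_id": "shelf0", "count": 3}

# Semi-structured (schema-later): tags describe the data, but nothing
# forces every record to carry the same tags.
doc = ET.fromstring('<reading tag="t1"><receptor>shelf0</receptor></reading>')

# Unstructured (schema-never): structure has to be guessed from the text.
text = "Saw tag t1 on shelf0 three times."
match = re.search(r"tag (\w+) on (\w+)", text)

# The same tag id, recovered with increasing amounts of guesswork.
tags = [row["tag_id"], doc.get("tag"), match.group(1)]
```

The structured lookup cannot fail silently, while the regular expression only works as long as the sentence keeps its wording, which is the tradeoff the spectrum describes.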

5
Whither Structured Data?

Conventional wisdom: 20% of data is structured currently.
Consumer apps, enterprise search, media apps are
placing downward pressure on this.

6
A Contrarian View?

Two reasons why structured data is where the action will be:
The Data Industrial Revolution: data used to be hand-crafted; now it's generated by computers!
The data integration quagmire: structure provides crucial cues for making data usable.

7
The New Landscape

Bell's Law: every decade, a new, lower-cost class of computers emerges, defined by platform, interface, and interconnect.
Mainframes: 1960s
Minicomputers: 1970s
Microcomputers/PCs: 1980s
Web-based computing: 1990s
Devices (cell phones, PDAs, wireless sensors, RFID): 2000s

Enabling a new generation of applications
for Operational Visibility, monitoring, and
alerting.
8
Data Streams → Data Flood
PoS System
Barcodes
Phones
Sensors
RFID

Exponential data growth
New challenges: continuous, interconnected, distributed, physical
Shrinking business cycles
More complex decisions

Inventory
Transactional Systems
Telematics
Clickstream
9
State of the Art

Custom-coded implementations that are expensive
and often unsuccessful.
Can we develop the right infrastructure to
support large-scale data streaming apps?

10
High Fan In Systems

A data management infrastructure for large-scale
data streaming environments.
Uniform declarative framework:
Every node is a data stream processor that speaks SQL-ese
→ stream-oriented queries at all levels
Hierarchical, stream-based views as an organizing principle.
Can impose a view over messy devices.

11
HiFi - Taming the Data Flood
Hierarchical Aggregation: Spatial, Temporal
Headquarters
Regional Centers
In-network Stream Query Processing and Storage
Warehouses, Stores
Fast Data Path vs. Slow Data Path
Dock doors, Shelves
Receptors
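
The hierarchy on this slide can be sketched as a tree in which every level exposes the same kind of view over the levels below it. The following Python sketch is an illustration only: the `Node` class, the node names, and the per-tag counts are all hypothetical, and a real deployment would process continuous streams rather than static counts.

```python
from collections import Counter

class Node:
    """A hypothetical HiFi-style node that merges child views into its own."""

    def __init__(self, name, children=(), local_counts=None):
        self.name = name
        self.children = list(children)
        self.local_counts = Counter(local_counts or {})

    def view(self):
        """Return item counts for this node: local readings plus all children."""
        total = Counter(self.local_counts)
        for child in self.children:
            total += child.view()
        return total

# Receptors at the edge feed a warehouse, which feeds a region, then HQ.
shelf = Node("shelf", local_counts={"sku1": 3})
dock = Node("dock_door", local_counts={"sku1": 2, "sku2": 5})
warehouse = Node("warehouse", children=[shelf, dock])
hq = Node("headquarters", children=[Node("region", children=[warehouse])])

hq_view = dict(hq.view())
```

Because every level answers the same `view()` call, a query written for a warehouse works unchanged at headquarters, which is the point of using one declarative interface at all levels.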
12
Device Issues example
Shelf RFID Test - Ground Truth
13
Actual RFID Readings
Restock every time inventory goes below 5
14
Query-based Data Cleaning
Smooth
CREATE VIEW smoothed_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM cleaned_rfid_stream range by 5 sec, slide by 5 sec
 GROUP BY receptor_id, tag_id
 HAVING count(*) > count_T)
Point
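
The Smooth stage can be approximated outside a stream engine. The Python sketch below is a rough rendering under stated assumptions: readings are `(timestamp, receptor_id, tag_id)` tuples, the window is tumbling because range and slide are both 5 seconds, and the threshold `count_t` and the sample readings are hypothetical.

```python
from collections import Counter

def smooth(readings, window=5.0, count_t=2):
    """Keep (receptor_id, tag_id) pairs read more than count_t times per window.

    Windows do not overlap because the range and the slide are equal.
    Returns a dict mapping window index to the sorted surviving pairs.
    """
    windows = {}
    for ts, receptor_id, tag_id in readings:
        index = int(ts // window)
        windows.setdefault(index, Counter())[(receptor_id, tag_id)] += 1
    return {
        index: sorted(pair for pair, n in counts.items() if n > count_t)
        for index, counts in sorted(windows.items())
    }

# Hypothetical readings: tag "t1" is read steadily, "t2" only once (noise).
readings = [
    (0.5, "shelf0", "t1"), (1.5, "shelf0", "t1"), (2.5, "shelf0", "t1"),
    (3.0, "shelf0", "t2"), (6.0, "shelf0", "t1"),
]
result = smooth(readings, window=5.0, count_t=2)
```

In the first window `t1` clears the threshold and the stray `t2` reading is dropped; in the second window the single `t1` reading is too sparse to survive.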
15
Query-based Data Cleaning
Arbitrate
CREATE VIEW arbitrated_rfid_stream AS
(SELECT receptor_id, tag_id
 FROM smoothed_rfid_stream rs range by 5 sec, slide by 5 sec
 GROUP BY receptor_id, tag_id
 HAVING count(*) >= ALL
   (SELECT count(*)
    FROM smoothed_rfid_stream range by 5 sec, slide by 5 sec
    WHERE tag_id = rs.tag_id
    GROUP BY receptor_id))
Smooth
Point
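
The Arbitrate stage resolves tags that several receptors claim at once. As a minimal Python sketch for a single window: each tag is assigned to the receptor that read it most often. Keeping ties is an assumption here, as are the `arbitrate` helper and the sample readings.

```python
from collections import Counter

def arbitrate(readings):
    """Assign each tag to the receptor(s) that read it most often.

    `readings` is an iterable of (receptor_id, tag_id) pairs from one window.
    Receptors tied for the highest count are all kept (an assumption).
    """
    counts = Counter(readings)
    best = {}
    for (receptor_id, tag_id), n in counts.items():
        best[tag_id] = max(best.get(tag_id, 0), n)
    return sorted(pair for pair, n in counts.items() if n == best[pair[1]])

# Hypothetical window: shelf0 and shelf1 both see "t1", shelf0 far more often.
window = [("shelf0", "t1")] * 4 + [("shelf1", "t1")] + [("shelf1", "t2")] * 3
owners = arbitrate(window)
```

Here `t1` is credited to `shelf0` only, so an inventory count per shelf no longer double-counts a tag that sits within range of two antennas.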
16
After Query-based Cleaning
Restock every time inventory goes below 5
17
Once you have the right abstractions

Soft Sensors
Quality and lineage
Optimization (power, etc.)
Pushdown of external validation information
Data archiving
Model-based sensing
Imperative processing

18
Data Integration

Integration is the ultimate schema-first problem.
Structure is both a key enabler and a key
impediment here.

19
Search vs. Query

What if you wanted to find out which actors donated to John Kerry's presidential campaign?

20
Search vs. Query
21
Search vs. Query

What if you wanted to find out which actors donated to John Kerry's presidential campaign?

22
Search vs. Query

Search can return only what's been previously stored.

23
Also

What if you wanted to find out the average
donation of actors to each candidate?
What if you wanted to compare actor donations
this campaign to the last one?
What if you wanted to find out who gave the most
to each candidate?
What if you wanted to know where the information
came from, and how old it was?

24
A Deep-Web Query Approach
SELECT y.name, f.occupation
FROM Yahoo_Actors y, FECInfo f
WHERE y.name = f.name
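
To see why a query can answer what search cannot, the join can be tried on toy data. This sketch uses Python's built-in `sqlite3` module with two in-memory tables standing in for the deep-web sources. All rows are placeholders, and the `candidate` and `amount` columns are assumptions added to illustrate one of the follow-up questions; the talk does not show the real schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Yahoo_Actors (name TEXT);
    CREATE TABLE FECInfo (name TEXT, occupation TEXT,
                          candidate TEXT, amount REAL);
    INSERT INTO Yahoo_Actors VALUES ('Actor A'), ('Actor B');
    INSERT INTO FECInfo VALUES
        ('Actor A', 'Actor', 'Candidate X', 100.0),
        ('Actor A', 'Actor', 'Candidate X', 300.0),
        ('Actor B', 'Actor', 'Candidate Y', 50.0),
        ('Person C', 'Engineer', 'Candidate X', 75.0);
""")

# The join on name; DISTINCT is added here so repeat donors appear once.
donors = conn.execute("""
    SELECT DISTINCT y.name, f.occupation
    FROM Yahoo_Actors y, FECInfo f
    WHERE y.name = f.name
    ORDER BY y.name
""").fetchall()

# A follow-up question: the average actor donation to each candidate.
averages = conn.execute("""
    SELECT f.candidate, AVG(f.amount)
    FROM Yahoo_Actors y JOIN FECInfo f ON y.name = f.name
    GROUP BY f.candidate
    ORDER BY f.candidate
""").fetchall()
conn.close()
```

Neither source stores the answer on its own; it exists only once the two are combined. Joining on a bare name is also fragile, since two different people can share one.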
25
Yahoo Actors JOIN FECInfo
Q: Did it work?
26
The Fundamental Tradeoff
Structure enables computers to help users
manipulate and maintain the data.
Semi-Structured (schema-later)
Structured (schema-first)
Unstructured (schema-less)
27
Dataspaces

Deal with all the data from an enterprise in
whatever form
Data co-existence
no integrated schema, no single warehouse
Pay-as-you-go services
Keyword search is bare minimum.
Data manipulation and increased consistency as
you add work.
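
The "bare minimum" service can be sketched in a few lines: a keyword search that treats every source as opaque text and needs no shared schema. The `keyword_search` helper, the source names, and their contents below are hypothetical.

```python
def keyword_search(sources, keyword):
    """Return (source_name, item) pairs whose text form mentions keyword.

    Each item is matched on its string form, so records, documents, and
    messages can coexist without an integrated schema.
    """
    needle = keyword.lower()
    hits = []
    for name, items in sources.items():
        for item in items:
            if needle in str(item).lower():
                hits.append((name, item))
    return hits

# Hypothetical enterprise sources in three different forms.
sources = {
    "email": ["Q2 inventory report attached"],
    "orders_db": [{"sku": "sku1", "note": "inventory low"}],
    "wiki": ["Onboarding checklist"],
}
hits = keyword_search(sources, "inventory")
```

Richer services, such as structured queries over `orders_db`, would be layered on later as mappings between sources are added, which is the pay-as-you-go idea.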

"From Databases to Dataspaces: A New Abstraction for Information Management," Michael Franklin, Alon Halevy, David Maier, SIGMOD Record, December 2005.
28
Dataspaces vs. Databases

Databases: single schema; centralized administration; structured query; strict integrity constraints
Dataspaces: data coexistence; autonomous sources; search, browse, approximate answer; best-effort guarantees

29
The World of Dataspaces
[Figure: systems placed on two axes, administrative proximity (near to far) and semantic integration (low to high). Labeled points: Web Search, Virtual Organization, Federated DBMS, Desktop Search, DBMS.]
30
Conclusions

Structured data is not going away.
In fact, there will be lots more of it, and it must be processed as fast as it is created.
Structure is crucial for successful data integration and manipulation.
Much effort will be expended to add structural information to text and media.
Traditional (structured) database technology is not up to the task.
Great opportunities for innovation.
HiFi and Dataspaces are examples.
