Web Data Management - PowerPoint PPT Presentation

Slides: 9
Provided by: Glen138
Learn more at: https://dsf.berkeley.edu

Transcript and Presenter's Notes

1
Web Data Management
  • Raghu Ramakrishnan

2
QUIQ Lessons
  • Structured data management powers scalable
    collaboration environments
  • ASP
  • Multi-tenancy
  • Massively distributed
  • Fine-grained permissions, hierarchical ACLs
  • RDBMSs were a lousy fit

3
Cloud Computing: Computing as a Service
  • Packaged Software
  • Cloud Computing
  • CPU Intensive
  • Data Intensive
  • High-throughput (e.g., Condor)
  • Analytic (e.g., SSDS, Hadoop)

4
Implications
  • Data management as a service
  • Scientists and others who've resisted
    (installing, maintaining, and) using DBMSs will
    find it much easier to reap the benefits
  • Data centers and Computing Centers will come
    into vogue again
  • Hosted back-ends and RAD tools will make Web
    application development accessible to all
  • The Web is becoming open
  • E.g., OpenSocial, OpenID
  • Ideas will be the most valuable currency, not the
    wherewithal to build complex systems
  • Paradigm shifts possible for how we do research
    in many fields
  • Build applications that embed your algorithms and
    test them directly in the field. Computer
    scientists can interact directly with users
    (ironically, this would still be a breakthrough
    of sorts after four decades!)
  • Many other disciplines (e.g., Sociology,
    microeconomics) can design and conduct online
    experiments involving unprecedented numbers of
    participants

5
PNUTS: DB in the Cloud
  • Indexes and views
  • Geographic replication
  • Parallel database
  • Structured, flexible schema
  • Hosted, managed infrastructure
CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR )

6
Basic Consistency Model
  • Goal
  • Make it easier for applications to reason about
    updates and cope with asynchrony: an alternative
    to transactions in an asynchronous world
  • What happens to a record with primary key
    Brian?
  • Guarantees
  • Every reader will always see some consistent, but
    possibly stale version
  • Readers can request a more up-to-date version,
    but may pay extra latency
  • Special case: critical read (writers/readers see
    their own writes)
  • Writers can verify that the record is still at
    the version they expect
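
The guarantees above can be illustrated with a minimal sketch. It is not the PNUTS API: the class, the method names (read_any, read_latest, read_critical, test_and_set_write), and the single master plus one lagging replica are all assumptions made for illustration, and asynchronous replication is modeled as an explicit propagate() call.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a record; it may lag behind the master copy."""
    version: int = 0
    value: dict = field(default_factory=dict)


class Record:
    """Toy single-record store: one master copy, one lagging replica.

    All names here are hypothetical and chosen only for illustration.
    """

    def __init__(self):
        self.master = Replica()
        self.replica = Replica()

    def write(self, value):
        # Writes go to the master first; each write bumps the version by one.
        self.master = Replica(self.master.version + 1, dict(value))
        return self.master.version

    def propagate(self):
        # Stand-in for asynchronous replication catching up with the master.
        self.replica = Replica(self.master.version, dict(self.master.value))

    def read_any(self):
        # Consistent but possibly stale: served from the replica.
        return self.replica.version, self.replica.value

    def read_latest(self):
        # Most up-to-date version: served from the master, which would
        # cost extra latency in a geographically distributed system.
        return self.master.version, self.master.value

    def read_critical(self, required_version):
        # Return a version at least as new as one the caller already saw,
        # so a writer can read back its own write.
        if self.replica.version >= required_version:
            return self.read_any()
        return self.read_latest()

    def test_and_set_write(self, expected_version, value):
        # Apply the write only if the record is still at the expected
        # version; otherwise report failure with None.
        if self.master.version != expected_version:
            return None
        return self.write(value)
```

For example, after two writes with propagate() called only after the first, read_any() still returns version 1, read_critical(2) falls through to the master, and test_and_set_write(1, ...) is rejected because the record has already moved on to version 2.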

[Figure: timeline of a record's versions. Labels: Record inserted, Update, Delete; versions v. 1 to v. 4; Generations 1 to 3; horizontal axis: Time.]

7
Lots of Issues to Re-think
  • Massive distribution, replication
  • Asynchrony
  • Availability
  • Consistency
  • DBA to the world
  • Auto-tuning
  • Multi-tenancy
  • Access control (granularity, online ids)
  • Encryption
  • App-support
  • Caching

8
Querying the Web
  • Search will become more semantic: best-effort
    match-making between
  • Query intent (NLP, query logs)
  • Interpreted web content
  • Deep web has a lot of structured data
  • How we get a handle on it is an interesting
    problem
  • But this is only part of the problem: lots of
    data not here
  • Semantic web isn't working
  • Site-wrapping doesn't scale
  • Solutions?
  • Domain-wrapping
  • Mass collaboration
  • ??