1
Probabilistic Databases
  • Amol Deshpande, University of Maryland

2
Overview
  • V.S. Subrahmanian
      • ProbView, PXML, Temporal Probabilistic Databases,
        Probabilistic Aggregates
  • Lise Getoor
      • Statistical Relational Learning, Probabilistic
        Relational Models, Entity Resolution
  • Amol
      • MauveDB: Statistical Modeling in Databases,
        Correlated tuples in probabilistic databases

3
Overview of Today's Presentation
  • Model-based Views/MauveDB (Amol)
  • Statistical Relational Learning (Lise)
  • Representing arbitrarily correlated data and
    processing queries over it (Prithviraj)

4
Overview of Today's Presentation
  • Model-based Views/MauveDB (Amol)
      • Goal: making it easy to continuously apply
        statistical models to streaming data
      • Current focus is on designing declarative
        interfaces and on efficient maintenance
        algorithms
      • Less on the probabilistic database issues
  • Statistical Relational Learning (Lise)
  • Representing arbitrarily correlated data and
    processing queries over it (Prithviraj)

5
Motivation
  • Unprecedented, and rapidly increasing,
    instrumentation of our every-day world
  • Huge data volumes generated continuously that
    must be processed in real-time
  • Typically imprecise, unreliable, and incomplete
    data
      • Measurement noise, low success rates, failures,
        etc.

6
Data Processing: Step 1
  • Process data using a statistical/probabilistic
    model
  • Regression and interpolation models
      • To eliminate spatial or temporal biases, handle
        missing data, and make predictions
  • Filtering techniques (e.g. Kalman filters),
    Bayesian networks
      • To eliminate measurement noise, infer hidden
        variables, etc.

[Figure labels: Temperature monitoring; GPS data; Kalman filters etc.; Regression/interpolation models]
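
To make the filtering idea on this slide concrete, here is a minimal sketch of a scalar Kalman filter. It assumes a random-walk state observed with Gaussian noise; the function name `kalman_1d`, the variance values, and the sample readings are illustrative assumptions and do not come from the talk.

```python
def kalman_1d(measurements, process_var, meas_var, init_mean=0.0, init_var=1.0):
    """Scalar Kalman filter: random-walk state, noisy direct observations.

    Returns a list of (posterior mean, posterior variance) pairs,
    one per measurement.
    """
    mean, var = init_mean, init_var
    estimates = []
    for z in measurements:
        # Predict: a random walk keeps the mean and inflates the variance.
        var += process_var
        # Update: blend the prediction with the measurement using the Kalman gain.
        gain = var / (var + meas_var)
        mean += gain * (z - mean)
        var *= 1.0 - gain
        estimates.append((mean, var))
    return estimates


# Hypothetical temperature readings with one outlier (35.0).
readings = [20.4, 19.8, 20.9, 20.1, 35.0, 20.3]
smoothed = kalman_1d(readings, process_var=0.01, meas_var=1.0,
                     init_mean=20.0, init_var=1.0)
for raw, (mean, var) in zip(readings, smoothed):
    print(f"raw={raw:5.1f}  filtered={mean:6.2f}  variance={var:.3f}")
```

With a small process variance the filter trusts its running estimate more than any single reading, so the outlier moves the filtered value only part of the way toward 35.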
7
A Motivating Example
  • Inferring transportation mode/activities
    (Henry Kautz et al.)
  • Using easily obtainable sensor data, e.g. GPS,
    RFID proximity data
  • Can do much if we can infer these automatically

Have access to noisy GPS data. Infer the
transportation mode: walking, running, in a
car, in a bus.
8
Motivating Example
  • Inferring transportation mode/activities
    (Henry Kautz et al.)
  • Using easily obtainable sensor data, e.g. GPS,
    RFID proximity data
  • Can do much if we can infer these automatically

[Figure: path between home and office]
Preferred end result: a clean path annotated
with transportation mode
9
Dynamic Bayesian Network
Use a generative model for describing how the
observations were generated
Need conditional probability distributions,
e.g. a distribution on (velocity, location)
given the transportation mode. These come from
prior knowledge or are learned from data.
[Figure: DBN slice at time t with nodes M_t, X_t, O_t]
10
Dynamic Bayesian Network
Use a generative model for describing how the
observations were generated
[Figure: two DBN time slices, t and t+1, with nodes M_t, X_t, O_t and M_t+1, X_t+1, O_t+1]
11
Dynamic Bayesian Network
Given a sequence of observations (O_t), find the
most likely M_t's that explain it. Or could
provide a probability distribution on the
possible M_t's.
[Figure: two DBN time slices, t and t+1, with nodes M_t, X_t, O_t and M_t+1, X_t+1, O_t+1]
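
Finding the most likely sequence of modes for a sequence of observations can be sketched with the Viterbi algorithm. The sketch below simplifies the DBN to a plain hidden Markov model: the intermediate (velocity, location) variable is dropped and each observation is a discretized speed. The two modes, the "slow"/"fast" observation symbols, and all probability values are assumptions chosen for illustration, not numbers from the talk.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations.

    Works in log space; every probability passed in must be > 0.
    """
    first = observations[0]
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][first]) for s in states}]
    back = []  # back[i][s] = best predecessor of state s at step i + 1
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (all values are illustrative assumptions).
states = ["walking", "car"]
start_p = {"walking": 0.5, "car": 0.5}
trans_p = {
    "walking": {"walking": 0.9, "car": 0.1},
    "car": {"walking": 0.1, "car": 0.9},
}
emit_p = {
    "walking": {"slow": 0.9, "fast": 0.1},
    "car": {"slow": 0.2, "fast": 0.8},
}
observed = ["slow", "slow", "fast", "fast", "fast"]
path = viterbi(observed, states, start_p, trans_p, emit_p)
print(path)
```

The alternative mentioned on the slide, a probability distribution over the possible modes at each step, would use forward-backward smoothing instead of Viterbi.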
12
Statistical Modeling of Sensor Data
  • No support in database systems, so the database
    ends up being used as a backing store
      • With much replication of functionality
      • Very inefficient, not declarative
  • How can we push statistical modeling inside a
    database system?

13
Abstraction: Model-based Views
  • An abstraction analogous to traditional database
    views
  • Present the output of applying the model as
    a database view
  • The user can query it as with normal database
    views
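
As a rough illustration of the abstraction, the sketch below exposes an interpolation model behind a query method, so the caller asks for a value at any time without seeing the raw readings or the model. The class name `InterpolationView`, its method `value_at`, and the sample readings are hypothetical; this is not MauveDB's interface.

```python
from bisect import bisect_left


class InterpolationView:
    """Toy model-based view: linear interpolation over raw (time, value) readings."""

    def __init__(self, readings):
        self._readings = sorted(readings)
        self._times = [t for t, _ in self._readings]

    def value_at(self, t):
        """Return the interpolated value at time t, or None outside the data range."""
        if not self._times or t < self._times[0] or t > self._times[-1]:
            return None
        i = bisect_left(self._times, t)
        if self._times[i] == t:
            # Exact hit on a raw reading.
            return self._readings[i][1]
        (t0, v0), (t1, v1) = self._readings[i - 1], self._readings[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical temperature readings taken at minutes 0, 10, and 30.
view = InterpolationView([(0, 20.0), (10, 22.0), (30, 18.0)])
print(view.value_at(5))
print(view.value_at(20))
print(view.value_at(45))
```

The model is evaluated only when a value is requested, which is one simple way to read "query as with normal database views".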

14
Example: DBN View

Original noisy GPS data:

User  Time    Location
John  5pm     (x1, y1)
John  5:05pm  (x2, y2)

User view of the data (smoothed locations, inferred variables):

User  Time    Location  Mode     prob
John  5pm     (x1, y1)  Walking  0.9
John  5pm     (x1, y1)  Car      0.1
John  5:05pm  (x2, y2)  Walking  0
John  5:05pm  (x2, y2)  Car      1

e.g. select count(*) group by mode
sliding window 5 minutes

Application of the model/inference is pushed
inside the database. This opens up many optimization
opportunities, e.g. inference can be done lazily
when queried.
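
One way to read the example group-by query over this view is to report the expected count per mode, which by linearity of expectation is the sum of the tuple probabilities in each group. The slides do not specify the query semantics, so expected-value semantics is an assumption here, and the function name `expected_count_by_mode` and the tuple layout are made up for illustration.

```python
from collections import defaultdict

# Rows of the example view: (user, time, location, mode, prob).
view_rows = [
    ("John", "5pm", "(x1, y1)", "Walking", 0.9),
    ("John", "5pm", "(x1, y1)", "Car", 0.1),
    ("John", "5:05pm", "(x2, y2)", "Walking", 0.0),
    ("John", "5:05pm", "(x2, y2)", "Car", 1.0),
]


def expected_count_by_mode(rows):
    """Expected number of tuples per mode.

    Linearity of expectation makes this a plain sum of probabilities,
    which stays valid even when tuples are correlated.
    """
    totals = defaultdict(float)
    for _user, _time, _location, mode, prob in rows:
        totals[mode] += prob
    return dict(totals)


counts = expected_count_by_mode(view_rows)
for mode, count in sorted(counts.items()):
    print(f"{mode}: {count:.1f}")
```

Expected counts are the easy case; a query that asks for the probability of a particular count does depend on how the tuples are correlated.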
15
Correlations
User  Time    Location  Mode     prob
John  5pm     (x1, y1)  Walking  0.9
John  5pm     (x1, y1)  Car      0.1
John  5:05pm  (x2, y2)  Walking  0
John  5:05pm  (x2, y2)  Car      1

Strong and complex correlations across tuples:
  • Mutual exclusivity
  • Temporal correlations
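
A small calculation shows why mutual exclusivity matters. For the two 5pm tuples (Walking 0.9, Car 0.1), the probability that at least one of them holds is 1.0 when they are treated as alternatives, but comes out lower if they are wrongly treated as independent. The function names below are illustrative assumptions.

```python
def prob_any_exclusive(probs):
    """P(at least one tuple holds) when the tuples are mutually exclusive."""
    return sum(probs)


def prob_any_independent(probs):
    """P(at least one tuple holds) when the tuples are assumed independent."""
    none_hold = 1.0
    for p in probs:
        none_hold *= 1.0 - p
    return 1.0 - none_hold


# The two 5pm tuples for John: Walking (0.9) and Car (0.1).
mode_probs = [0.9, 0.1]
exclusive = prob_any_exclusive(mode_probs)
independent = prob_any_independent(mode_probs)
print(f"mutually exclusive: {exclusive:.2f}")
print(f"assumed independent: {independent:.2f}")
```

Under the independence assumption there is also a nonzero probability (0.9 × 0.1) that John is both walking and in a car at 5pm, which the data rules out.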
16
MauveDB Status
  • Implemented in Apache Derby, an open-source Java
    database system
  • Support for regression- and interpolation-based
    views
      • Neither produces probabilistic data
      • SIGMOD 2006 (w/ Sam Madden)
  • Currently building support for views based on
    dynamic Bayesian networks (Bhargav)
      • Kalman filters, HMMs, etc.
      • Initial focus on the user interfaces and
        efficient inference
      • Will generate probabilistic data; may not be
        able to do anything too sophisticated with it

17
Research Challenges/Future Work
  • Generalizing to arbitrary models?
      • Develop APIs for adding arbitrary models
      • Try to minimize the work of the model developer
  • Probabilistic databases
      • Uncertain data with complex correlation patterns
  • Query processing, query optimization
  • View maintenance in the presence of high-rate
    measurement streams

18
Thanks!
Mauve: Model-based User Views