Title: Elke A. Rundensteiner
Database Systems Research Group
Email: rundenst_at_cs.wpi.edu
Office: Fuller 238
Phone: Ext. 5815
Web pages: http://www.cs.wpi.edu/rundenst and http://davis.wpi.edu/dsrg
Project Topics in a Nutshell
- Distributed Data Sources
  - EVE: Data Warehousing over Distributed Data
  - TOTAL-ETL: Distributed Extract-Transform-Load
  - NSF96, NSF02, IBM
- XML/Web Data Systems
  - RAINBOW: XML to Relational Databases
  - MASS: Native XQuery Processing System
  - Verizon, IBM, NSF05
- Database Visualization
  - Scalable Visual High-Dimensional Data Exploration
  - Data and Visual Quality Support in XMDV
  - NSF97, NSF01, NSF05
- Stream Monitoring System
  - Scalable Query Engine for Data Streams
  - Fire Prediction and Monitoring Application
  - NSF06, NEC
Databases Upside Down
[Figure: traditional databases run one-time queries over static data. Stream systems turn this upside down and run standing queries over streams of data.]
Engine for Querying and Monitoring Streaming Data
- Examples of stream data applications:
  - Market analysis: streams of stock exchange data (get rich)
  - Critical care: streams of vital sign measurements (save lives)
  - Physical plant monitoring: streams of environmental readings (protect the environment)
  - Computer and network management: streams of flows and system probes (manage chaos)
  - Business, inventory, and life management: streams of RFID and sensor readings (detect and correct)
Stream Query Processing
[Figure: clients register continuous queries with a distributed stream query engine and receive answers. Streaming data flows into the engine and streaming results flow out.]
- Queries: a high workload of registered queries, with real-time and accurate responses required
- Streaming data: may have time-varying rates and high volumes
- Engine: memory and CPU resources are limited, and the resources available for executing each operator may vary over time
- Consequently, run-time distribution and adaptation are required
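The contrast between one-time and continuous queries can be sketched in a few lines of Python. This is a toy, single-node illustration only. The `ContinuousQueryEngine` class, its `register`/`push`/`answers` methods, and the sensor field names are hypothetical and are not taken from the engine described here. The sketch also ignores distribution, rate variation, and resource limits.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


class ContinuousQueryEngine:
    """Hypothetical toy engine: queries are registered once and then
    evaluated against every arriving tuple (data flows past the queries),
    instead of a query being run once over stored data."""

    def __init__(self) -> None:
        self._queries: Dict[str, Callable[[Record], bool]] = {}
        self._answers: Dict[str, List[Record]] = {}

    def register(self, name: str, predicate: Callable[[Record], bool]) -> None:
        """Register a standing (continuous) filter query."""
        self._queries[name] = predicate
        self._answers[name] = []

    def push(self, record: Record) -> None:
        """Feed one streaming tuple through every registered query."""
        for name, predicate in self._queries.items():
            if predicate(record):
                self._answers[name].append(record)

    def answers(self, name: str) -> List[Record]:
        """Answers produced so far for one query (a copy)."""
        return list(self._answers[name])


# Usage: a standing query over hypothetical temperature readings.
engine = ContinuousQueryEngine()
engine.register("high_temp", lambda r: r["temp"] > 60)
for reading in [{"sensor": 1, "temp": 21}, {"sensor": 2, "temp": 75}]:
    engine.push(reading)
print(engine.answers("high_temp"))  # [{'sensor': 2, 'temp': 75}]
```

A query registered later only sees tuples that arrive after registration. This is the "upside down" relationship between data and queries.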
Research Contributions
- Scalable Query Operators (Punctuations)
  - Adapt and select among tasks such as memory purging, stream reading, memory-to-disk shuffling, punctuation propagation, and index selection
- Synchronized Plan Spilling
  - Operators selectively spill data to disk to offset system overload, with adaptive reloading later to improve performance
- Adaptive Operator Scheduling
  - A selector scores alternative scheduling algorithms by their effect on QoS requirements and selects the best-scoring candidate
- On-line Query Plan Migration
  - On-line plan restructuring followed by on-line migration to the new plan, even for stateful operators
- Distributed Plan Execution
  - Adaptively distribute computations across multiple machines to optimize QoS requirements without information loss
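Punctuations are markers embedded in a stream. Each one asserts that no more tuples matching some pattern will arrive, which lets an otherwise blocking or ever-growing operator emit results and purge state. The sketch below assumes the simplest pattern, "no more tuples with key k", and a single group-by count operator. The `Item`, `Punctuation`, and `PunctuatedGroupCount` names are hypothetical, and propagating punctuations to downstream operators is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Item:
    key: str
    value: int


@dataclass(frozen=True)
class Punctuation:
    """Asserts that no further tuples with this key will arrive."""
    key: str


class PunctuatedGroupCount:
    """Counts tuples per key. Without punctuations it would have to keep
    every group forever. A punctuation lets it emit the final count for
    that key and purge the group's state."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def push(self, element: Union[Item, Punctuation]) -> List[Tuple[str, int]]:
        if isinstance(element, Punctuation):
            # The group is complete: emit its final count and free its memory.
            if element.key in self.state:
                return [(element.key, self.state.pop(element.key))]
            return []
        self.state[element.key] = self.state.get(element.key, 0) + 1
        return []


# Usage
op = PunctuatedGroupCount()
stream = [Item("IBM", 10), Item("HP", 5), Item("IBM", 12), Punctuation("IBM")]
out = []
for e in stream:
    out.extend(op.push(e))
print(out)       # [('IBM', 2)]
print(op.state)  # {'HP': 1}
```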
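Spilling can be illustrated for a single operator's state. When a memory budget is exceeded, the largest in-memory partition is written to disk. Spilled partitions are reloaded later, for example when the load drops, so no data is lost. This is a hypothetical sketch and not the synchronized, plan-wide policy named above. It coordinates nothing across operators, `SpillableState` and its methods are illustrative names, and the budget is counted in tuples rather than bytes.

```python
import os
import pickle
import tempfile
from collections import defaultdict
from typing import Any, Dict, List


class SpillableState:
    """Hash-partitioned operator state with a tuple-count memory budget."""

    def __init__(self, num_partitions: int, memory_limit: int) -> None:
        self.num_partitions = num_partitions
        self.memory_limit = memory_limit  # assumed budget, counted in tuples
        self.in_memory: Dict[int, List[Any]] = defaultdict(list)
        self.spilled: Dict[int, List[str]] = defaultdict(list)  # partition -> files

    def tuples_in_memory(self) -> int:
        return sum(len(rows) for rows in self.in_memory.values())

    def insert(self, key: Any, row: Any) -> None:
        pid = hash(key) % self.num_partitions
        self.in_memory[pid].append(row)
        # Offset overload by pushing whole partitions to disk.
        while self.tuples_in_memory() > self.memory_limit:
            self._spill_largest()

    def _spill_largest(self) -> None:
        pid = max(self.in_memory, key=lambda p: len(self.in_memory[p]))
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(self.in_memory.pop(pid), f)
        self.spilled[pid].append(path)

    def reload_all(self) -> Dict[int, List[Any]]:
        """Bring every spilled partition back into memory and delete its file.
        The budget is deliberately not enforced here: reloading is assumed to
        happen when resources are available again."""
        for pid, paths in self.spilled.items():
            for path in paths:
                with open(path, "rb") as f:
                    self.in_memory[pid].extend(pickle.load(f))
                os.remove(path)
        self.spilled.clear()
        return dict(self.in_memory)


# Usage
state = SpillableState(num_partitions=4, memory_limit=3)
for k in range(6):
    state.insert(k, ("row", k))
print(state.tuples_in_memory())  # 3
rows = [t for part in state.reload_all().values() for t in part]
print(sorted(rows) == [("row", k) for k in range(6)])  # True
```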
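One simple way a selector could score alternative scheduling algorithms against QoS requirements is a weighted sum of normalized, observed metrics. The sketch below assumes lower is better for every metric. The function `select_strategy`, the metric names, the weights, and all numbers are hypothetical placeholders, not measurements from the system.

```python
from typing import Dict


def select_strategy(observed: Dict[str, Dict[str, float]],
                    qos_weights: Dict[str, float]) -> str:
    """Score each candidate strategy by a weighted sum of its observed QoS
    metrics (lower is better) and return the best-scoring one."""
    # Normalize each metric by its maximum across strategies so that
    # metrics in different units are comparable (guard against zero).
    maxima = {m: max(stats[m] for stats in observed.values()) or 1.0
              for m in qos_weights}

    def score(stats: Dict[str, float]) -> float:
        return sum(w * stats[m] / maxima[m] for m, w in qos_weights.items())

    return min(observed, key=lambda name: score(observed[name]))


# Usage with made-up statistics for three candidate schedulers.
observed = {
    "FIFO":       {"latency_ms": 40.0, "memory_mb": 900.0},
    "Chain":      {"latency_ms": 95.0, "memory_mb": 300.0},
    "RoundRobin": {"latency_ms": 60.0, "memory_mb": 600.0},
}
print(select_strategy(observed, {"latency_ms": 0.7, "memory_mb": 0.3}))  # FIFO
print(select_strategy(observed, {"latency_ms": 0.2, "memory_mb": 0.8}))  # Chain
```

Changing the QoS weights changes the winner. In an adaptive setting, the selector would be rerun as fresh statistics arrive.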
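On-line migration is hard for stateful operators because the old plan's state, such as the contents of join windows, cannot simply be thrown away. One known approach, often called the parallel-track strategy, runs the old and new plans side by side.

- The new plan sees only tuples arriving after the migration point.
- The old plan keeps only results that involve at least one pre-migration tuple, so nothing is duplicated.
- The old plan is dropped once every pre-migration tuple has left the window.

The sketch below is a hypothetical single-join illustration. It assumes non-decreasing timestamps and a time-based window of `w` units. `WindowJoin`, `ParallelTrackMigrator`, and `Tup` are illustrative names, not the engine's actual migration code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Tup:
    src: str  # "A" or "B": which input stream the tuple came from
    ts: int   # arrival timestamp (assumed non-decreasing)
    key: str  # join attribute


Pair = Tuple[Tup, Tup]  # (tuple from A, tuple from B)


class WindowJoin:
    """Symmetric sliding-window equi-join of streams A and B: two tuples
    join if their keys match and their timestamps differ by less than w."""

    def __init__(self, w: int) -> None:
        self.w = w
        self.state: Dict[str, List[Tup]] = {"A": [], "B": []}

    def push(self, t: Tup) -> List[Pair]:
        other = "B" if t.src == "A" else "A"
        # Purge opposite-side tuples that have fallen out of the window.
        self.state[other] = [s for s in self.state[other] if t.ts - s.ts < self.w]
        matches = [s for s in self.state[other] if s.key == t.key]
        self.state[t.src].append(t)
        return [(t, s) if t.src == "A" else (s, t) for s in matches]


class ParallelTrackMigrator:
    """Runs the old and new plans in parallel from timestamp `start` on."""

    def __init__(self, old: WindowJoin, new: WindowJoin, w: int, start: int) -> None:
        self.old: Optional[WindowJoin] = old
        self.new = new
        self.w = w
        self.start = start

    def push(self, t: Tup) -> List[Pair]:
        results = self.new.push(t)  # the new plan only sees post-migration tuples
        if self.old is not None:
            if t.ts - self.start >= self.w:
                # Every pre-migration tuple is out of the window: retire the old plan.
                self.old = None
            else:
                # Keep old-plan results that involve a pre-migration tuple; results
                # made only of new tuples are already produced by the new plan.
                results += [p for p in self.old.push(t)
                            if any(x.ts < self.start for x in p)]
        return results


# Usage
old = WindowJoin(w=10)
a1 = Tup("A", 0, "x")
old.push(a1)  # the old plan runs alone before migration
m = ParallelTrackMigrator(old, WindowJoin(w=10), w=10, start=5)
b1, a2, b2 = Tup("B", 6, "x"), Tup("A", 7, "x"), Tup("B", 16, "x")
print(m.push(b1) == [(a1, b1)])                 # True: involves old tuple a1
print(m.push(a2) == [(a2, b1)])                 # True: emitted once, by the new plan
print(m.push(b2) == [(a2, b2)], m.old is None)  # True True: old plan retired
```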
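To illustrate distributing computation across machines, the sketch below uses a classic greedy heuristic, longest-processing-time-first. It takes operators in decreasing order of estimated cost and places each one on the currently least-loaded machine. Adaptivity is approximated by rerunning the placement with fresh cost estimates. This is a hypothetical stand-in, not the research group's distribution algorithm: `assign_operators`, the operator names, and the cost figures are illustrative. The sketch also ignores communication cost and state transfer, so it does not address the "without information loss" requirement.

```python
import heapq
from typing import Dict, List


def assign_operators(op_costs: Dict[str, float],
                     machines: List[str]) -> Dict[str, List[str]]:
    """Greedy LPT placement: most expensive operators first, each onto
    the machine with the smallest accumulated estimated load."""
    heap = [(0.0, m) for m in machines]  # (current load, machine)
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {m: [] for m in machines}
    for op, cost in sorted(op_costs.items(), key=lambda kv: -kv[1]):
        load, m = heapq.heappop(heap)
        placement[m].append(op)
        heapq.heappush(heap, (load + cost, m))
    return placement


# Usage with made-up per-operator cost estimates.
costs = {"select": 2.0, "join1": 8.0, "join2": 6.0, "agg": 3.0}
print(assign_operators(costs, ["m1", "m2"]))
# {'m1': ['join1', 'select'], 'm2': ['join2', 'agg']}
```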
Good news for a research student
- We can lean on the oldies but goodies,
- yet so many new and unsolved problems are at our fingertips thanks to the new angle (and spirit)!
- Interesting (yet doable) research challenges
- Real potential for practical impact and possibilities for a start-up (if so inclined)
Skills to apply, acquire, and perfect...
- If you are a theory-inclined guy
  - algorithms for NP-complete optimization, graph theory
- If you like system-ish stuff
  - distributed allocation, scheduling, and parallelism of query execution
- If you are into networking land
  - quality-of-query, load shedding, grid computing
- If you are from the intelligent plant/AI
  - learning of scheduling selection, run-time adaptation
- If you are a software engineering guru
  - huge query engine code base: we really need you!
So where is the database in this stuff?
- One answer
  - Who cares? If it's fun, it's database stuff!
- Second answer
  - It's all over: a data-centric frame of mind, as CS borders are breaking down
- Third answer
  - Development of the next-generation DB engine
A driving application: FIRE
Sensors in Buildings
- Can we track a smoke cloud? Are any sensor readings faulty? Which path leads out of the building? Is this a prank?
Managing My Life (and Yours)
Are my kids together?
Questions? Email me at rundenst_at_cs.wpi.edu, or drop by the DSRG Labs in Fuller 318/319.
My office: Fuller 238