Title: Streaming Knowledge Bases
Streaming Knowledge Bases
- Onkar Walavalkar, Anupam Joshi, Tim Finin and Yelena Yesha
- University of Maryland, Baltimore County
- 27 October 2008
Overview
- Motivation
- Streaming databases
- Streaming knowledge bases
- Experiments and results
- Conclusions
Operating Room of the Future
[Figure: the operating room of the future (ORF); labels: drugs, tools, devices, patient monitors, staff, RFID, AwarePoint, WiFi, Bluetooth]
- ORs will be awash in low-level data, much of it noisy or incomplete
- Challenges include coping with the noise and interpreting the low-level data to recognize high-level events and activities
Initial work in OR training
- UMD Mastri Center is experimenting with OR technologies and training environments
- The Human Patient Simulator from METI
  - Designed to react like a human
  - Responds to medical treatment
  - Generates continuous streams of data, moderated by
    - initial conditions (e.g., a blunt trauma, multiple injuries scenario)
    - human interactions
Efficient Data Stream Management
Traditional DBMS
- Data is stored and indexed in the system
- Queries are applied to the stored data as they stream through
Stream Management System
- Queries are stored and indexed in the system
- Data is applied to the stored queries as it streams through
Several efforts: Tapestry, Aurora, TelegraphCQ
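The inversion described above (queries stay put, data flows past them) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Tapestry, Aurora or TelegraphCQ are implemented: the `StreamEngine` and `ContinuousQuery` names and the vital-sign field names are hypothetical, and real systems index the stored queries rather than scanning all of them for every record.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]

@dataclass
class ContinuousQuery:
    """A standing query: a name plus a predicate checked against each arriving record."""
    name: str
    predicate: Callable[[Record], bool]

class StreamEngine:
    """Toy stream processor (hypothetical): queries are stored once, data flows through them."""

    def __init__(self) -> None:
        self.queries: List[ContinuousQuery] = []

    def register(self, query: ContinuousQuery) -> None:
        # Queries are kept in the system, the way data is in a traditional DBMS.
        self.queries.append(query)

    def push(self, record: Record) -> List[str]:
        # Each record is applied to every stored query and then discarded;
        # only the names of the queries it satisfies are returned.
        return [q.name for q in self.queries if q.predicate(record)]

engine = StreamEngine()
engine.register(ContinuousQuery("tachycardia", lambda r: r.get("heart_rate", 0) > 120))
engine.register(ContinuousQuery("low_spo2", lambda r: r.get("spo2", 100) < 90))
print(engine.push({"heart_rate": 130, "spo2": 95}))  # ['tachycardia']
```

Nothing is retained after `push` returns, which is the point of the contrast: the stored state is the set of queries, not the data.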
What's wrong with this picture?
- We need to enhance this to support semantic interoperability for medical data and knowledge
- The medical community has a long history of developing and using standard ontologies and metadata
- Incoming streams of data can be in RDF
  - and reference terms in appropriate ontologies
What's wrong with this picture?
- Streaming database systems use continuous queries specified over a sliding time window
  - e.g., range by 30 seconds, slide by 10 seconds
- Issues
  - Where do we do the reasoning?
  - How do we answer queries against a sliding window of data?
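A minimal sketch of a "range by 30 seconds, slide by 10 seconds" window over a timestamped triple stream, assuming the stream is time-ordered and starts at t = 0. The function name and the (window end, triples) output shape are hypothetical; TelegraphCQ's own window semantics may differ in details such as whether boundaries are inclusive.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Event = Tuple[float, Triple]         # (timestamp in seconds, triple)
Window = Tuple[float, List[Triple]]  # (window end time, triples in the window)

def sliding_windows(events: Iterable[Event], range_s: float = 30.0,
                    slide_s: float = 10.0) -> List[Window]:
    """Split a time-ordered triple stream into overlapping windows.

    A window closing at time `end` holds every triple with a timestamp in
    (end - range_s, end]. Windows close every slide_s seconds, starting at
    slide_s (assumes the stream starts at t = 0)."""
    buffer: Deque[Event] = deque()
    windows: List[Window] = []
    next_end = slide_s

    def close_window(end: float) -> None:
        # Evict triples that have fallen out of the range, then snapshot the rest.
        while buffer and buffer[0][0] <= end - range_s:
            buffer.popleft()
        windows.append((end, [triple for _, triple in buffer]))

    for ts, triple in events:
        # Close every window whose end time passed before this event arrived.
        while ts > next_end:
            close_window(next_end)
            next_end += slide_s
        buffer.append((ts, triple))

    if buffer:
        close_window(next_end)  # flush the final, partially filled window
    return windows
```

With a 30-second range and a 10-second slide, each triple lands in up to three consecutive windows, so any per-window reasoning revisits the same triples several times.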
RDF Stream Processing
[Figure: RDF stream processing architecture. Components: input triple stream, input stream handler, static data store (RangeInfo, DomainInfo, Classtree, PropertyTree, InverseInfo), special domain rules queries, enhanced stream, query for class of concern, detected instances]
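One way to read this architecture is that each incoming triple is expanded with entailments looked up in pre-computed static tables before the class-of-concern query sees it. The sketch below assumes RDFS-style rules (subclass, subproperty, domain, range) plus owl:inverseOf; the `StaticStore` fields only loosely mirror the Classtree, PropertyTree, DomainInfo, RangeInfo and InverseInfo boxes, `enhance` is a hypothetical name, and all `ex:` terms are made up. It does a single lookup pass per triple rather than computing a full closure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"

@dataclass
class StaticStore:
    """Hypothetical pre-computed ontology tables."""
    super_classes: Dict[str, Set[str]] = field(default_factory=dict)  # class -> all superclasses
    super_props: Dict[str, Set[str]] = field(default_factory=dict)    # property -> all superproperties
    domain_of: Dict[str, str] = field(default_factory=dict)           # property -> rdfs:domain class
    range_of: Dict[str, str] = field(default_factory=dict)            # property -> rdfs:range class
    inverse_of: Dict[str, str] = field(default_factory=dict)          # property -> owl:inverseOf property

def enhance(triple: Triple, store: StaticStore) -> Set[Triple]:
    """Return the incoming triple plus the triples found by one pass of table lookups."""
    s, p, o = triple
    out: Set[Triple] = {triple}
    typed: List[Tuple[str, str]] = []  # (instance, class) pairs still to expand
    if p == RDF_TYPE:
        typed.append((s, o))
    else:
        # The property itself plus its superproperties (subPropertyOf).
        for q in [p, *sorted(store.super_props.get(p, set()))]:
            out.add((s, q, o))
            if q in store.domain_of:
                typed.append((s, store.domain_of[q]))
            if q in store.range_of:
                typed.append((o, store.range_of[q]))
            if q in store.inverse_of:
                # Inverse triples are emitted but not expanded further.
                out.add((o, store.inverse_of[q], s))
    # Each asserted or inferred type also implies every superclass (subClassOf).
    for instance, cls in typed:
        out.add((instance, RDF_TYPE, cls))
        for sup in store.super_classes.get(cls, set()):
            out.add((instance, RDF_TYPE, sup))
    return out

store = StaticStore(super_classes={"ex:Heron": {"ex:Bird", "ex:Animal"}})
for t in sorted(enhance(("h1", RDF_TYPE, "ex:Heron"), store)):
    print(t)
```

Because every lookup is a dictionary access, the per-triple cost depends on how many entailments a triple has, not on the size of the ontology.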
Experiments and results
- Three simple reasoners
  - Jena, in core
  - Pre-computed custom hash tables
  - Using tables in TelegraphCQ
- Various scenarios
  - Ontology size: 118 – 23.1 MB
  - Number of subclasses: 49 – 57,000
  - Subclass depth: 2 – 9
  - Data rate: 1 – 50 triples per second
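The slides do not give the layout of the pre-computed custom hash tables. A plausible minimal version, assumed here, maps each class to the full set of its superclasses, so a subclass check during streaming becomes one dictionary lookup instead of a walk up a hierarchy that may be up to 9 levels deep. `precompute_superclasses` is a hypothetical name.

```python
from typing import Dict, Set

def precompute_superclasses(direct: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn direct subClassOf edges (class -> parents) into a class -> all-ancestors table.

    Assumes the class hierarchy is acyclic; multiple inheritance is allowed."""
    closure: Dict[str, Set[str]] = {}

    def ancestors(cls: str) -> Set[str]:
        # Memoized depth-first walk: each class's ancestor set is computed once.
        if cls not in closure:
            result: Set[str] = set()
            for parent in direct.get(cls, set()):
                result.add(parent)
                result |= ancestors(parent)
            closure[cls] = result
        return closure[cls]

    for cls in direct:
        ancestors(cls)
    return closure
```

The trade-off is memory for speed: the table is built once, offline, and grows with the number of classes times the average depth, which matters at the 57,000-subclass end of the tested range.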
Domain Example
- Monitor Bioblitz and eco-blogging data streams for observations of invasive species
- Uses our Ethan ontologies for ecoinformatics
  - Tree of life (340K taxa from ITIS and other sources)
  - Species profiles
  - Invasive species definitions
  - Observations
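As a hypothetical query for the class of concern in this domain, the sketch below flags observation triples whose taxon is, or descends from, a taxon marked invasive. The property name `ex:observedTaxon`, the function name and the taxon identifiers are invented for illustration and are not terms from the Ethan ontologies; the ancestor table is assumed to be pre-computed from the tree of life.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def detect_invasive(stream: Iterable[Triple],
                    ancestors: Dict[str, Set[str]],
                    invasive: Set[str],
                    taxon_prop: str = "ex:observedTaxon") -> List[Tuple[str, str]]:
    """Return (observation, taxon) pairs whose taxon is, or falls under, an invasive taxon."""
    hits: List[Tuple[str, str]] = []
    for obs, prop, taxon in stream:
        if prop != taxon_prop:
            continue  # ignore triples that do not name an observed taxon
        lineage = {taxon} | ancestors.get(taxon, set())
        if lineage & invasive:
            hits.append((obs, taxon))
    return hits
```

Marking a genus as invasive then catches observations reported at the species level without listing every species separately.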
[Figures (four slides): Reasoning delay comparison for all approaches]
[Figure: VM usage comparison of all 3 approaches]
[Figure: VM usage for Jena for different classes]
[Figure: VM usage comparison for Hashtable and TCQ]
Conclusions
- If the incoming triple data rate exceeds a certain limit, reasoning starts to lag behind and tends to slow down the incoming stream.
- The speedups achieved by using TCQ and a hashtable prove the value of pre-processing an ontology, particularly for fast-streaming facts.
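The "certain limit" can be read as the reasoner's sustained throughput. Under a simple fluid approximation (an assumption, not the model used in the experiments), if triples arrive at λ per second and are reasoned over at μ per second with λ > μ, the backlog grows by λ - μ every second and never drains. The rates below are hypothetical, not measured figures, and `backlog_over_time` is an illustrative name.

```python
from typing import List

def backlog_over_time(arrival_rate: float, service_rate: float, seconds: int) -> List[float]:
    """Triples still waiting for the reasoner at the end of each second.

    Fluid approximation: constant arrival and service rates, no bursts."""
    backlog = 0.0
    history: List[float] = []
    for _ in range(seconds):
        # Whatever the reasoner cannot absorb this second carries over to the next.
        backlog = max(0.0, backlog + arrival_rate - service_rate)
        history.append(backlog)
    return history

print(backlog_over_time(12, 10, 5))  # [2.0, 4.0, 6.0, 8.0, 10.0]
print(backlog_over_time(8, 10, 5))   # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Below the limit the backlog stays at zero; above it, delay grows without bound, which is why cheaper per-triple reasoning (as with pre-computed tables) raises the usable data rate.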
http://ebiquity.umbc.edu/