Title: CoBase: Scalable and Extensible Cooperative Information System
1 CoBase: Scalable and Extensible Cooperative Information System
Wesley W. Chu
Computer Science Department
University of California, Los Angeles
http://www.cobase.cs.ucla.edu
2 Conventional Query Answering
- Need to know the detailed database schema
- Cannot get approximate answers
- Cannot answer conceptual queries
Cooperative Query Answering
- Derive approximate answers
- Answer conceptual queries
3 Cooperative Queries
4 Generalization and Specialization
5 Type Abstraction Hierarchy (TAH)
Provide multi-level knowledge representations
[Figure: Chemical-Suit Size TAH (a non-numerical TAH) with root All_Sizes; intermediate nodes Small_Size, Large_Size, Very_Small, Small_to_Medium, Large_to_Extra_Large, and Very_Large; leaf sizes XXXS, XXS, S, M, L, XL, and XXL.]
6 Type Abstraction Hierarchy (TAH) (Location Example)
7 Relaxation Agent
Uses a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchies) to relax the following for matching:
- query conditions
- constraints
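To make generalization and specialization concrete, here is a minimal Python sketch of one relaxation step over a TAH: a query value is generalized to its parent node and then specialized to every leaf under that parent. The tree reuses the chemical-suit node names from slide 5, but the parent/child grouping, and the functions `parent_of`, `leaves_under`, and `relax`, are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical chemical-suit size TAH. The node names come from slide 5,
# but which leaves sit under which intermediate node is assumed here.
TAH: Dict[str, List[str]] = {
    "All_Sizes": ["Small_Size", "Large_Size"],
    "Small_Size": ["Very_Small", "Small_to_Medium"],
    "Large_Size": ["Large_to_Extra_Large", "Very_Large"],
    "Very_Small": ["XXXS", "XXS"],
    "Small_to_Medium": ["S", "M"],
    "Large_to_Extra_Large": ["L", "XL"],
    "Very_Large": ["XXL"],
}


def parent_of(node: str) -> Optional[str]:
    """Return the parent of `node`, or None if `node` is the root."""
    for parent, children in TAH.items():
        if node in children:
            return parent
    return None


def leaves_under(node: str) -> List[str]:
    """Specialization: all leaf values below `node` (the node itself if a leaf)."""
    children = TAH.get(node)
    if not children:
        return [node]
    leaves: List[str] = []
    for child in children:
        leaves.extend(leaves_under(child))
    return leaves


def relax(value: str, levels: int = 1) -> List[str]:
    """Generalize `value` up to `levels` steps, then specialize back to leaves."""
    node = value
    for _ in range(levels):
        parent = parent_of(node)
        if parent is None:  # already at the root; nothing more to generalize
            break
        node = parent
    return leaves_under(node)


print(relax("M"))     # ['S', 'M']
print(relax("M", 2))  # ['XXXS', 'XXS', 'S', 'M']
```

Each extra level widens the answer set, which is why relaxation control (order, not-relaxable attributes, answer size) matters.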
8 Query Relaxation
10 Visualization of Relaxation Process
Query: Find seaports in the given region.
[Figure: the given region and the relaxed region]
12 Relaxation Control Primitives
- not-relaxable
  - e.g., runway-length
- relaxation-order
  - e.g., (runway length, location)
- preference-list
- unacceptable-list
- answer-size
- relaxation-level
13 Relaxation Primitives
- approximate
  - e.g., approximately 9 am
- between
- near-to (context-sensitive)
  - Airport near-to LAX
  - Restaurant near-to UCLA
- similar-to
  - Airport similar-to LAX based-on (traffic, runway)
- within
14 Similar-to
Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.
select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to 'Bizerte' based-on ((runway_length 1.0) (runway_width 2.0))
and country_state_name = 'Tunisia'
and countries.glc_cd = runways.glc_cd
15 Similar-to Result
The similar-to module ranks the returned answers according to their mean-squared error.
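A minimal sketch of ranking by weighted mean-squared error, using the weights from the Bizerte query (runway length 1.0, runway width 2.0). The airport names and values are hypothetical, and scaling each difference by the target's value is an assumption made so that length and width are comparable; the slides do not say how CoBase normalizes attributes.

```python
from typing import Dict, List, Tuple


def weighted_mse(candidate: Dict[str, float], target: Dict[str, float],
                 weights: Dict[str, float]) -> float:
    """Weighted mean-squared error over the based-on attributes.

    Differences are taken relative to the target's value so that attributes
    on different scales (runway length vs. width) are comparable; this
    scaling is an assumption of the sketch.
    """
    error = 0.0
    for attr, weight in weights.items():
        rel_diff = (candidate[attr] - target[attr]) / target[attr]
        error += weight * rel_diff ** 2
    return error / sum(weights.values())


def rank_similar(target: Dict[str, float], candidates: Dict[str, Dict[str, float]],
                 weights: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (name, error) pairs, most similar (smallest error) first."""
    scored = [(name, weighted_mse(attrs, target, weights))
              for name, attrs in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical runway data, not real airport figures.
target = {"runway_length": 8000.0, "runway_width": 150.0}
candidates = {
    "airport_A": {"runway_length": 8000.0, "runway_width": 100.0},
    "airport_B": {"runway_length": 6000.0, "runway_width": 150.0},
    "airport_C": {"runway_length": 7800.0, "runway_width": 145.0},
}
weights = {"runway_length": 1.0, "runway_width": 2.0}  # as in the Bizerte query
ranking = rank_similar(target, candidates, weights)
print([name for name, _ in ranking])  # ['airport_C', 'airport_B', 'airport_A']
```

airport_A matches the target length exactly yet ranks last: its 33% width deviation counts twice as heavily as airport_B's 25% length deviation.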
16 Unacceptable List Operator
[Figure: the CoBase Relaxation Manager applies an unacceptable-list constraint to the Type Abstraction Hierarchy for Tunisia (NE, NW, Central, and SW Tunisia; airports including Bizerte, Gafsa, and El Borma), producing a trimmed TAH that retains only Central and SW Tunisia with Gafsa and El Borma.]
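One way to picture the trimmed TAH is to prune unacceptable values from the hierarchy before relaxation starts. The sketch below assumes a small location TAH (the region each airport belongs to is assumed, not taken from the figure) and a hypothetical `trim` function; it removes unacceptable nodes and any interior node left without children.

```python
from typing import Dict, List, Set

# Hypothetical location TAH; the region assigned to each airport is assumed.
TAH: Dict[str, List[str]] = {
    "Tunisia": ["NE Tunisia", "Central Tunisia", "SW Tunisia"],
    "NE Tunisia": ["Bizerte"],
    "Central Tunisia": ["Gafsa"],
    "SW Tunisia": ["El Borma"],
}


def trim(tah: Dict[str, List[str]], root: str,
         unacceptable: Set[str]) -> Dict[str, List[str]]:
    """Copy the TAH under `root`, dropping unacceptable nodes.

    An interior node is also dropped once none of its children survive,
    so relaxation never climbs into a branch with no acceptable values.
    """
    trimmed: Dict[str, List[str]] = {}

    def keep(node: str) -> bool:
        if node in unacceptable:
            return False
        children = tah.get(node, [])
        if not children:
            return True  # acceptable leaf value
        kept = [child for child in children if keep(child)]
        if kept:
            trimmed[node] = kept
        return bool(kept)

    keep(root)
    return trimmed


print(trim(TAH, "Tunisia", {"Bizerte"}))
```

With Bizerte unacceptable, the NE Tunisia branch disappears and the root keeps only Central Tunisia and SW Tunisia.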
17 TAH Generation for Numerical Attribute Values
- Relaxation error
  - The difference between the exact value and the returned approximate value
  - The expected error is weighted by the probability of occurrence of each value
- DISC (Distribution Sensitive Clustering) is based on the attribute values and the frequency distribution of the data
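The relaxation error of a cluster can be computed as the expected difference between two values drawn by their frequencies, and a TAH can then be built by repeatedly choosing the cut that minimizes the frequency-weighted error of the resulting subclusters. The following is a binary-split sketch in that spirit, not the published DISC algorithm, which may differ in split arity and stopping rule; the data and `max_leaf` threshold are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple


def relaxation_error(freq: Dict[float, int]) -> float:
    """Expected |x_i - x_j| when both values are drawn by their frequencies,
    i.e. RE(C) = sum_i sum_j P(x_i) P(x_j) |x_i - x_j|."""
    total = sum(freq.values())
    return sum((freq[a] / total) * (freq[b] / total) * abs(a - b)
               for a in freq for b in freq)


def best_split(freq: Dict[float, int]) -> float:
    """Cut value minimizing P(C1) RE(C1) + P(C2) RE(C2); values < cut go left."""
    values = sorted(freq)
    total = sum(freq.values())
    best_cut, best_err = values[1], float("inf")
    for k in range(1, len(values)):
        left = {v: freq[v] for v in values[:k]}
        right = {v: freq[v] for v in values[k:]}
        p_left = sum(left.values()) / total
        err = p_left * relaxation_error(left) + (1 - p_left) * relaxation_error(right)
        if err < best_err:
            best_cut, best_err = values[k], err
    return best_cut


def build_tah(freq: Dict[float, int], max_leaf: int = 2) -> Tuple:
    """Recursively split into a nested (low, high, children) hierarchy."""
    values = sorted(freq)
    if len(values) <= max_leaf:
        return (values[0], values[-1], [])
    cut = best_split(freq)
    left = {v: c for v, c in freq.items() if v < cut}
    right = {v: c for v, c in freq.items() if v >= cut}
    return (values[0], values[-1], [build_tah(left, max_leaf), build_tah(right, max_leaf)])


# Hypothetical runway lengths in feet.
data = [5000, 5000, 5200, 5300, 8000, 8100, 8100, 11000, 12000]
print(build_tah(Counter(data)))
```

The first cut falls between 5300 and 8000, where the frequency-weighted spread drops the most, so the short and long runways form separate subtrees.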
18 TAH Generation for Non-numerical Attribute Values
- Pattern Based Knowledge Induction (PBKI)
  - Rule-based approach
  - Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships)
  - Provides an attribute correlation value (measures how well the rules apply to the database)
19 Type Abstraction Hierarchy (TAH)
Provide multi-level knowledge representations
20 Associative Query Answering
Provide relevant information not explicitly asked for by the user.
User query: List all airports with runway length between 8500 and approximately 10000 feet.
21 CoBase and GLAD Integration
22 CoBase Functionality
- Provide approximate matching
  - Find HETs with a capacity of approximately 5 tons
- Provide conceptual query answering
  - Find Earth Moving Equipment
- Provide content-sensitive spatial queries
  - Find storage sites near a selected location
  - (Integration with MATT map server)
- Provide relaxation control
  - Relaxation order
  - Not-relaxable
  - At-least (answer-set size, quantity on hand)
23 Cooperative Operations Added to GLAD
- Implicit query relaxation
- Explicit query relaxation
  - Approximate operator
  - Similar-to/based-on
  - Spatial relaxation
- Relaxation control
  - Relaxation-order
  - Not-relaxable
  - At-least (answer-set size, quantity on hand)
24 CoBase Features Added to GLAD
- Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
- Display the query relaxation process
  - modified query conditions (value, spatial)
  - type abstraction hierarchies
- Rank returned answers with similarity measures
  - e.g., spatial relaxation ranks answers according to their distance from the selected location
25 CoBase and GLAD TIE
[Figure: CoBase and GLAD TIE architecture. Components: Report Collection, Spatial Area Selection, Filter Editor, Display Generator, Query Collection, NSNs, Object Cache, Report Query Constructor, CoBase Query Editor, CoBase Relaxation Manager, GLAD, Data Cache, CoBase Data Source Manager, and Databases.]
26 GLAD Query
Find NSNs of aircraft with passenger capacity > 10, combat type 'I', capacity weight < 2 tons, and price < 700,000.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7' and upper(cbs_category_nomen) = 'AIRCRAFT'
and price < 700000 and pax_capacity_qty > 10
and upper(combat_type) = 'I' and capacity_wt_ston < 2)
27 CoGLAD Query with Relaxation Control Operators
Find NSNs of aircraft with passenger capacity > 10, combat type 'I', capacity weight < 2 tons, and price < 700,000. The passenger capacity attribute is not relaxable. Relax price first and then capacity weight.
select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7' and upper(cbs_category_nomen) = 'AIRCRAFT'
and price < 700000 and pax_capacity_qty > 10
and upper(combat_type) = 'I' and capacity_wt_ston < 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston
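The effect of `not-relaxable` and `relaxation-order` can be sketched as a loop that loosens one bound at a time, in the stated order, until enough answers appear. Everything here is hypothetical: the NSN rows, the 10% step size, the three steps per attribute, the `at_least` target, and the functions `answers` and `relax_query`; only the attribute names come from the query above.

```python
from typing import Dict, List, Set, Tuple

Condition = Tuple[str, float]  # (operator, bound); operator is "<" or ">"

# Hypothetical NSN records; the values are invented for illustration.
ROWS = [
    {"nsn": "A1", "price": 650000, "pax_capacity_qty": 12, "capacity_wt_ston": 1.5},
    {"nsn": "A2", "price": 760000, "pax_capacity_qty": 14, "capacity_wt_ston": 1.8},
    {"nsn": "A3", "price": 720000, "pax_capacity_qty": 11, "capacity_wt_ston": 2.3},
    {"nsn": "A4", "price": 500000, "pax_capacity_qty": 8, "capacity_wt_ston": 1.0},
]


def answers(rows: List[dict], conds: Dict[str, Condition]) -> List[dict]:
    """Rows satisfying every (operator, bound) condition."""
    def ok(row: dict) -> bool:
        return all(row[a] < b if op == "<" else row[a] > b
                   for a, (op, b) in conds.items())
    return [r for r in rows if ok(r)]


def relax_query(rows: List[dict], conds: Dict[str, Condition], order: List[str],
                not_relaxable: Set[str], at_least: int,
                step: float = 0.1, max_steps: int = 3):
    """Relax one attribute at a time, following `order` and skipping
    not-relaxable attributes, until at least `at_least` rows qualify.
    Each step loosens a bound by `step` times its original value
    (10% and 3 steps per attribute are assumptions)."""
    conds = dict(conds)
    result = answers(rows, conds)
    for attr in order:
        if attr in not_relaxable or attr not in conds:
            continue
        op, original = conds[attr]
        for i in range(1, max_steps + 1):
            if len(result) >= at_least:
                return result, conds
            delta = i * step * abs(original)
            conds[attr] = (op, original + delta if op == "<" else original - delta)
            result = answers(rows, conds)
    return result, conds


conds = {"price": ("<", 700000), "capacity_wt_ston": ("<", 2), "pax_capacity_qty": (">", 10)}
result, final = relax_query(ROWS, conds, ["price", "capacity_wt_ston"],
                            {"pax_capacity_qty"}, at_least=3)
print([r["nsn"] for r in result])  # ['A1', 'A2', 'A3']
```

Because price is relaxed through all of its steps before capacity weight is touched, the final price bound can end up looser than strictly necessary; a real relaxation manager may interleave attributes or backtrack. A4 never qualifies because passenger capacity stays fixed.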
28 CoGLAD Query with Similar-to Operator
Find aircraft similar to NSN '0000IB0000961' based on the attributes price, passenger capacity, and air mileage. Passenger capacity has a weight of 8, and price and air mileage each have a weight of 1.
select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961'
based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4
Note: '0000IB0000961' is an answer from the previous query.
29 CoGLAD Query with Approximate Operator
Find the DLA stock report with NSN like 8340 (FSC for tents and tarpaulins) and an on-hand quantity of approximately 150.
select nsn, ric
from dla_stock_report
where nsn like 8340
and on_hand_quantity 150
30 Adding Constraints to a Query
GLAD query:
select nsn, ric
from dla_stock_report
where nsn like 8340 and nomenclature like TARP
Query with added constraints:
select nsn, ric
from dla_stock_report
where nsn like 8340 and nomenclature like TARP
and on_hand_quantity 150 and size_in_square_feet 350
31 Example of Spatial Relaxation
32 Spatial Relaxation with Relaxation Control
- relaxation-order: size, (latitude, longitude)
- not-relaxable: price
- at-least
  - value: size of the tarpaulin
  - quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
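The at-least control on quantity on hand can be sketched as a search circle that keeps growing until the sites inside it together hold enough stock, with answers ranked by distance from the selected location. The sketch covers only the location relaxation; the depot names, coordinates, quantities, doubling growth factor, and the `relax_until_quantity` function are hypothetical, and great-circle (haversine) distance is an assumed metric.

```python
import math
from typing import List, Tuple

Site = Tuple[str, float, float, int]  # (name, latitude, longitude, quantity on hand)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def relax_until_quantity(sites: List[Site], center: Tuple[float, float],
                         radius_km: float, needed: int, growth: float = 2.0,
                         max_radius_km: float = 5000.0):
    """Widen the search circle until the sites inside it together hold
    `needed` units (or the radius reaches `max_radius_km`).
    Returns the final radius and the sites, nearest first."""
    ranked = sorted((haversine_km(center[0], center[1], lat, lon), name, qty)
                    for name, lat, lon, qty in sites)
    while True:
        inside = [t for t in ranked if t[0] <= radius_km]
        if sum(qty for _, _, qty in inside) >= needed or radius_km >= max_radius_km:
            return radius_km, [(name, round(d, 1), qty) for d, name, qty in inside]
        radius_km *= growth


sites = [("depot_A", 34.0, -118.0, 40), ("depot_B", 34.5, -118.0, 60),
         ("depot_C", 36.0, -118.0, 100)]
radius, found = relax_until_quantity(sites, (34.0, -118.0), 10.0, needed=150)
print(radius, [name for name, _, _ in found])  # 320.0 ['depot_A', 'depot_B', 'depot_C']
```

At 80 km only depot_A and depot_B qualify (100 units), so the circle doubles twice more before the 150-unit requirement is met.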
33 Scalable and Extensible CoBase Architecture
34 Mediator Inter-Communications via KQML
[Figure: KQML-based mediator inter-communication. Labels: CoBase Ontology, Module Objects, APIs, Content Language, Data, Actions.]
36 Query Answers Without CoBase
Query: Find chemical suits.
43 Electronic Warfare
- Identify and locate sources of radiated electromagnetic energy
- Determine the emitter type based on the operating parameters of observed signals
  - Radio Frequency (RF)
  - Pulse Repetition Frequency (PRF)
  - Pulse Duration (PD)
  - Scan Period (SP)
  - other operating parameters
- Determine platform sites near the line of bearing of an emitter
This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ.
44 Performance Improvement by Using CoBase in EW
- Conventional: DB holds parameter ranges from emitter specifications
- CoBase: DB holds peak parameters (RF, PRF) and parameter ranges (PD, SP); KB holds TAHs based on RF and PRF peak parameters and TAHs based on PD and SP parameter ranges
- Case 1: emitter signals without noise
- Case 2: added noise: PD and SP (10), PRF (5), RF (2.5)
- Sample size: 1000 signals
- Emitter types: 75
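The slides do not describe how observed signals are matched to emitter types, so the following is only an illustrative stand-in for approximate matching under noise: each emitter's parameter ranges are widened step by step until some emitter matches. CoBase itself used TAHs built from RF/PRF peak parameters and PD/SP ranges. The emitter library is hypothetical, and reading the Case 2 noise levels (RF 2.5, PRF 5, PD and SP 10) as percentage tolerances is an assumption.

```python
from typing import Dict, List, Optional, Tuple

Range = Tuple[float, float]

# Hypothetical emitter library (parameter ranges per emitter type).
EMITTERS: Dict[str, Dict[str, Range]] = {
    "emitter_X": {"RF": (9000, 9500), "PRF": (1000, 1200), "PD": (0.5, 1.0), "SP": (2.0, 4.0)},
    "emitter_Y": {"RF": (2900, 3100), "PRF": (300, 400), "PD": (1.0, 2.0), "SP": (10.0, 12.0)},
}

# Relative tolerance per parameter, reading the Case 2 noise levels
# (RF 2.5, PRF 5, PD and SP 10) as percentages; that reading is an assumption.
TOLERANCE = {"RF": 0.025, "PRF": 0.05, "PD": 0.10, "SP": 0.10}


def identify(signal: Dict[str, float], max_level: int = 3) -> Tuple[Optional[int], List[str]]:
    """Match the observed parameters against each emitter's ranges; if no
    emitter matches, widen every range by one more tolerance step and retry.
    Returns (relaxation level used, matching emitter types)."""
    for level in range(max_level + 1):
        matches = [
            name for name, spec in EMITTERS.items()
            if all(lo * (1 - level * TOLERANCE[p]) <= signal[p] <= hi * (1 + level * TOLERANCE[p])
                   for p, (lo, hi) in spec.items())
        ]
        if matches:
            return level, matches
    return None, []


print(identify({"RF": 3000, "PRF": 350, "PD": 1.5, "SP": 11.0}))  # (0, ['emitter_Y'])
print(identify({"RF": 9600, "PRF": 1150, "PD": 0.8, "SP": 3.0}))  # (1, ['emitter_X'])
```

The second signal's RF lies just above emitter_X's nominal range, so an exact match fails and one relaxation step is needed.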
45 Current CoBase Users and Applications
46 Conclusions
- Provide user- and context-sensitive query relaxations (structured and unstructured data)
- Provide additional information (associative query answering) based on past cases
- CoSQL (Cooperative SQL)
  - similar-to, near-to, approximate
  - relaxation control operators
- GUI
  - map server, high-level query formation
48 CoSent: An Active Database Technology
- Natural language-like rules support conceptual and approximate terms
- Natural language-like rules are decomposed into low-level rules via a knowledge base (TAHs)
- Mimics the human cognitive process, easing rule specification
- Eases rule maintenance
49 CoSent: An Active Database Technology
CoSent monitors temporal composite events and executes rules with conceptual and approximate terms.
- Triggers with high-level rules containing
  - conceptual terms (e.g., bad, heavy) and
  - approximate operators (e.g., similar-to, near-to, approximate)
- Allows trigger conditions to be specified with fuzzy and conceptual terms
- Mimics human cognitive expression
50 Key Features of CoSent
- User-defined rules are transformed into low-level range values via a knowledge base (Type Abstraction Hierarchies, TAHs)
- TAHs are typically generated automatically from data sources
- Leverages conventional DBMS triggering systems (e.g., Oracle, Sybase, Teradata)
- Rules are either specified by a domain expert or derived by data mining technologies
51 Example of Rule Definitions with Data Mining Technology
- Find attributes that frequently appear together for a given target attribute, e.g.:
  - If the road condition is bad and the weather is also bad, then traffic congestion results.
  - If a person wrote many bad checks and also has a past eviction, then this person is a poor credit risk.
- Based on the frequency of occurrence, the derived rules can be ranked according to an information measure.
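A minimal sketch of mining such rules: enumerate small sets of conditions, count how often each co-occurs with the target, and rank the resulting rules. The records are hypothetical, and confidence (support divided by antecedent frequency) stands in for the unspecified information measure; `mine_rules` is a hypothetical name.

```python
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical observations: each record is the set of conditions that held.
RECORDS: List[Set[str]] = [
    {"bad_road", "bad_weather", "congestion"},
    {"bad_road", "bad_weather", "congestion"},
    {"bad_weather", "congestion"},
    {"bad_road"},
    {"bad_weather"},
    {"bad_road", "bad_weather"},
    {"holiday", "congestion"},
]


def mine_rules(records: List[Set[str]], target: str, max_size: int = 2,
               min_support: int = 2) -> List[Tuple[Tuple[str, ...], int, float]]:
    """Find condition sets that frequently co-occur with `target`.

    support    = records containing the antecedent and the target
    confidence = support / records containing the antecedent
    Confidence stands in for the unspecified information measure.
    min_support must be at least 1 (it also guards the division).
    """
    items = sorted({item for r in records for item in r} - {target})
    rules = []
    for size in range(1, max_size + 1):
        for antecedent in combinations(items, size):
            covered = [r for r in records if set(antecedent) <= r]
            support = sum(1 for r in covered if target in r)
            if support >= min_support:
                rules.append((antecedent, support, support / len(covered)))
    # Rank by confidence, then by support.
    return sorted(rules, key=lambda rule: (-rule[2], -rule[1]))


for antecedent, support, confidence in mine_rules(RECORDS, "congestion"):
    print(" and ".join(antecedent), "->", f"support={support}, confidence={confidence:.2f}")
```

The combined condition "bad road and bad weather" ranks first, matching the shape of the traffic-congestion rule above.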
52 Conventional vs. Natural Language-Like Rules
Conventional rule: If wind_speed > MAX_WIND_SPEED and wave_height > MAX_WAVE_HEIGHT, then notify affected units in the regions.
Natural language-like rule: If the weather turns bad, then notify all affected units in that region and all those that are near to that region.
53 Natural Language-Like Rule Specifications
Example 1: If the number of departures of large cargo carriers (e.g., C-5, C-141) becomes significantly low in the past seven days, notify the Air Mobility Command.
Example 2: If the aircraft has a fuel contamination problem and the aircraft type is similar-to C-5 based on the fuel type and fueling method, then notify the authority.
54 Example
DoD Transportation Planning: Weather Report Table
Wind speed is the hourly average over an eight-minute period for buoys and a two-minute period for land stations. Wave height is sampled over a 20-minute period.
55 TAH Example
Wave Height TAH
- Wave Height [0.6, 7.2]
  - VERY LOW [0.6, 1.25]
  - LOW [1.25, 1.75]
  - HIGH [1.75, 2.45]
  - VERY HIGH [2.45, 7.2]
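Translating a conceptual term into a low-level condition amounts to a lookup in the TAH. The sketch below uses the wave-height ranges from slide 55; treating each range as half-open [low, high), so a shared boundary belongs to the higher concept, is an assumption, as are the function names `to_predicate` and `concept_of`.

```python
from typing import Dict, Tuple

# Leaf concepts of the wave-height TAH on slide 55. Treating each range as
# [low, high), so a shared boundary belongs to the upper concept, is an assumption.
WAVE_HEIGHT_TAH: Dict[str, Tuple[float, float]] = {
    "VERY LOW": (0.6, 1.25),
    "LOW": (1.25, 1.75),
    "HIGH": (1.75, 2.45),
    "VERY HIGH": (2.45, 7.2),
}


def to_predicate(attr: str, term: str, tah: Dict[str, Tuple[float, float]]) -> str:
    """Translate a conceptual term into a low-level range condition."""
    low, high = tah[term.upper()]
    return f"{attr} >= {low} and {attr} < {high}"


def concept_of(value: float, tah: Dict[str, Tuple[float, float]]) -> str:
    """Map a measured value back to its conceptual term."""
    for term, (low, high) in tah.items():
        if low <= value < high:
            return term
    raise ValueError(f"{value} lies outside the TAH's range")


print(to_predicate("wave_height", "very high", WAVE_HEIGHT_TAH))
# wave_height >= 2.45 and wave_height < 7.2
print(concept_of(2.0, WAVE_HEIGHT_TAH))  # HIGH
```

With the half-open convention, the root's upper bound 7.2 itself falls outside every concept; a production version would close the last range.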
56 A Portion of Wave Height TAH
57 Triggering Based on Temporal Composite Events
Notify the commander if, within the past seven days, the total departure of C-5 is significantly low and the filter problem on C-5 is extremely high.
C-5 Departure TAH
- Low 9-134.5
  - Significantly Low 9-53
  - Very Low 53-134.5
- High 134.5-208
  - Very High 134.5-162
  - Significantly High 162-208
C-5 Filter Problem TAH
- Low 0-53
  - Extra Low 0-36
  - Very Low 36-53
- High 53-79
  - Very High 53-60
  - Extremely High 60-79
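A composite temporal event can be evaluated by aggregating each stream over the time window and checking both conceptual conditions. The sketch uses the leaf ranges of the C-5 TAHs on slide 57; interpreting those values as seven-day totals, treating the ranges as closed, and the daily counts in the example are all assumptions.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

# Leaf ranges of the C-5 TAHs on slide 57. Reading the values as
# seven-day totals and treating ranges as closed are assumptions.
DEPARTURE: Dict[str, Range] = {
    "Significantly Low": (9, 53), "Very Low": (53, 134.5),
    "Very High": (134.5, 162), "Significantly High": (162, 208),
}
FILTER_PROBLEM: Dict[str, Range] = {
    "Extra Low": (0, 36), "Very Low": (36, 53),
    "Very High": (53, 60), "Extremely High": (60, 79),
}


def is_concept(value: float, tah: Dict[str, Range], term: str) -> bool:
    """True if `value` lies in the range the TAH assigns to `term`."""
    low, high = tah[term]
    return low <= value <= high


def should_notify(daily_departures: List[int], daily_filter_problems: List[int]) -> bool:
    """Composite event over a seven-day window: C-5 departures are
    'significantly low' AND C-5 filter problems are 'extremely high'."""
    departures = sum(daily_departures[-7:])
    problems = sum(daily_filter_problems[-7:])
    return (is_concept(departures, DEPARTURE, "Significantly Low")
            and is_concept(problems, FILTER_PROBLEM, "Extremely High"))


print(should_notify([5, 6, 4, 5, 7, 3, 5], [10, 9, 8, 11, 9, 10, 8]))  # True
```

Here 35 departures and 65 filter problems fall in the "significantly low" and "extremely high" ranges, so the rule fires.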
58 Natural Language-Like Rule Translations
[Figure: rule translation/relaxation]
59 CoSent Architecture
60 CoSent Demo
- Natural language-like rule with the conceptual terms "very high wave height" and "very strong wind speed"
- Natural language-like rule with the approximate term "nearby" and the conceptual term "bad weather"
- Install a trigger by drag-and-drop at the desired location on the map
61 Natural Language-Like Rule
- A natural language-like rule containing conceptual terms, such as wave_height very-high and wind_speed very-strong, can be translated into range values using domain knowledge, for instance, a type abstraction hierarchy.
- Natural language-like rules reduce the number of rules, thus easing rule maintenance.
67 Rules With Approximate Terms
- Rules can contain approximate terms, such as near-by and approximate, which eases rule specification.
- The trigger can be installed at the desired location on a map by drag-and-drop.
- The near-by region affected by the bad-weather condition is specified by the trigger condition and shown as a red circle.
75 Map Server Architecture
76 Current Capabilities of Map Server
- Visualization of query answers
  - Icons
  - Paths
- Enter query constraints graphically
- Visualization of the query relaxation process
77 Visualization of Relaxation Process
Query: Find seaports in the given region.
[Figure: the given region and the relaxed region]
78 Explanation Agent
- Based on process traces and invocation rules, generate English-like explanations of
  - the relaxation process
  - the quality of approximate matching
- Provide further explanation of the definitions and terms used in an explanation
79 Explanation of Relaxation Process
80 Relaxation Primitive: within
81 Extend near-to Primitive: Points to Regions
82 Dynamic Nearness
- Uses transaction history to identify nearness between tuples and values
- If two tuples (or attribute values) appear together in a query answer, that is a piece of evidence that they should be clustered together
- Gather evidence over time
- Evolve the hierarchy
83 The BOOKS Relation
84 Schematic of a Browsing System
85 Schematic of a Query Modification System
86 The Links Between Tuples in BOOKS
87 Dynamic Links After Two Queries
88 Links with Counts
89 Number of Links with Threshold Value q
90 Number of Links Determined by Maximum Answer Set Size a
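The evidence-gathering idea behind dynamic nearness can be sketched as link counting: every pair of tuples returned together in an answer gets one more count, and only links whose count reaches the threshold q are kept. How the maximum answer set size a is used is not spelled out in the slides; here it is assumed that answers larger than a are ignored. The tuple ids and query history are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def add_evidence(counts: Counter, answer: Iterable[str], a: int) -> None:
    """Count one co-occurrence for every pair of tuples in a query answer.
    Answers with more than `a` tuples are skipped, on the assumption that
    large answer sets say little about nearness."""
    tuples = sorted(set(answer))
    if len(tuples) > a:
        return
    for pair in combinations(tuples, 2):
        counts[pair] += 1


def links(counts: Counter, q: int) -> Set[Tuple[str, str]]:
    """Keep only links whose evidence count reaches the threshold q."""
    return {pair for pair, c in counts.items() if c >= q}


# Hypothetical query answers over BOOKS tuple ids.
history: List[List[str]] = [["b1", "b2"], ["b1", "b2", "b3"], ["b2", "b3"],
                            ["b1", "b2", "b3", "b4", "b5"]]
counts: Counter = Counter()
for answer in history:
    add_evidence(counts, answer, a=3)
print(sorted(links(counts, q=2)))  # [('b1', 'b2'), ('b2', 'b3')]
```

Raising q prunes weakly supported links, while lowering a discards more of the broad, uninformative answers.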
91 Query Formation from High-Level Concepts for Relational Databases
- Guogen Zhang
- Wesley Chu
- Frank Meng
- Gladys Kong
92 Outline
- Overview
- Semantic Graph Model
- High-Level Query Formation for SPJ queries
- Incremental Query Formation for Complex Queries
- Conclusions
93 Overview: Query Formation
- Based on a semantic graph model, including user-defined relationships
- User specifies requests and constraints
- Formulate a simple query by a graph search technique
- Candidates ranked by information measure
- English-like query description
- A complex query can be formulated by a series of simple queries
94 Related Work
- Query formulation as a Steiner tree problem (Wald and Sorenson, 1984): limited to partial 2-tree graphs
- Formulating simple Select-Project-Join (SPJ) queries via the Universal Relation Model: no need to specify natural joins (Ullman, 1988; Vardi, 1988)
- Object-oriented query path expression completion: a partial order relationship between different paths for ranking (Ioannidis and Lashkari, 1994)
- Query-by-Icon (QBI) (Massari and Chrysanthis, 1995)
- Natural language interfaces (text/voice): logical form to query
95 Semantic Graph Model
- Weighted graph G(V, E)
- Nodes: entities (strong, weak, user-defined)
- Links: relationships (ISA, HAS, simple, complex, user-defined)
- For relational databases:
  - nodes: relations
  - links: natural and user-defined joins
- Weight: information measure of a node or link
96 Query Feature
- Query expression in a semantic graph
  - Query topic, T: a set of joins represented by links
  - Query constraints, C: query conditions
  - Query aspect, A: attribute list
97 A query topic for: aircraft can land on airports at geographical locations of countries
98 Semi-Automatic Generation of Semantic Model
- Find natural joins through keys and foreign keys between nodes.
- User-defined links can be added to the graph model.
- Designers need to specify link types and assign names to all the elements in the graph.
99 Example of Semantic Model Generation
- AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT; key APORT_NM
- RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT, RUNWAY_WIDTH_FT; key RUNWAY_NM
- GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE; key GLC_CD
- COUNTRY: CY_CD, CY_NM; key CY_CD
- Links
  - AIRPORT--RUNWAY: APORT_NM
  - AIRPORT--GEOLOC: GLC_CD
  - RUNWAY--GEOLOC: GLC_CD
  - GEOLOC--COUNTRY: CY_CD
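The natural-join links on slide 99 can be derived mechanically: a relation S links to a relation R through attribute K when K is the key of S and R also carries K. This minimal sketch uses the slide's schema and treats a shared attribute name as a foreign key, which is an assumption (a real generator would read declared foreign-key constraints); `natural_join_links` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

# Relations, attributes, and keys as listed on slide 99.
SCHEMA: Dict[str, Tuple[List[str], str]] = {
    "AIRPORT": (["APORT_NM", "GEOLOC_TYPE", "GLC_CD", "ELEV_FT"], "APORT_NM"),
    "RUNWAY": (["APORT_NM", "RUNWAY_NM", "GLC_CD", "RUNWAY_LENGTH_FT", "RUNWAY_WIDTH_FT"], "RUNWAY_NM"),
    "GEOLOC": (["GLC_CD", "GLC_NM", "CY_CD", "LATITUDE", "LONGITUDE"], "GLC_CD"),
    "COUNTRY": (["CY_CD", "CY_NM"], "CY_CD"),
}


def natural_join_links(schema: Dict[str, Tuple[List[str], str]]) -> Set[Tuple[str, str, str]]:
    """Link relation S to relation R through attribute K whenever K is the
    key of S and R has an attribute with the same name (name matching stands
    in for declared foreign-key constraints)."""
    found = set()
    for s, (_, key) in schema.items():
        for r, (attrs, _) in schema.items():
            if r != s and key in attrs:
                found.add((s, r, key))
    return found


for s, r, key in sorted(natural_join_links(SCHEMA)):
    print(f"{s}--{r}: {key}")
```

The four links printed are the same undirected links listed on the slide; user-defined links such as CAN LAND would still be added by the designer.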
100 Information Measure
- Information measure of a node or link $a$:
  $I(a) = -\log P(a)$
  where $P(a)$ is the probability of $a$ being used in queries.
- Assuming nodes and links are independent, for a subgraph with a set of elements $A = \{a_i \mid i = 1, \ldots, n\}$, the information measure is additive:
  $I(A) = \sum_{i=1}^{n} I(a_i)$
101 Information Measure (cont.)
- Initial information measure
  - all the nodes: 1
  - different nodes have a different value
- Information measure is normalized and converted into counts
  - The probability of a node or link is $P(a_i) = c_i / c$, where $c_i$ is the count of $a_i$ and $c$ is the total count
- Update the information measure
- Ranking is based on the information measure and thus adapts to user feedback
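The count-based information measure can be sketched as a small class: probabilities come from usage counts, information is the negative log probability, subgraph information is additive, and recording a use updates the counts. Using base-2 logarithms and starting every element at count 1 are assumptions; the class name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


class InformationMeasure:
    """Usage counts for nodes and links, with I(a) = -log2 P(a), P(a) = c_a / c.

    Base-2 logarithms and an initial count of 1 per element are assumptions.
    """

    def __init__(self, elements: Iterable[Hashable]) -> None:
        self.counts: Counter = Counter({e: 1 for e in elements})

    def info(self, element: Hashable) -> float:
        return -math.log2(self.counts[element] / sum(self.counts.values()))

    def info_of(self, elements: Iterable[Hashable]) -> float:
        # Additive under the independence assumption: I(A) = sum of I(a_i).
        return sum(self.info(e) for e in elements)

    def record_use(self, elements: Iterable[Hashable]) -> None:
        # User feedback: elements used in accepted queries become more
        # probable, so they carry less information and rank higher.
        for e in elements:
            self.counts[e] += 1


im = InformationMeasure(["AIRPORT", "RUNWAY", "GEOLOC", "COUNTRY"])
print(im.info_of(["AIRPORT", "RUNWAY"]))  # 4.0 (2 bits each at P = 1/4)
im.record_use(["AIRPORT"])
im.record_use(["AIRPORT"])
print(im.info("AIRPORT"))  # 1.0 (P = 3/6)
```

Because frequently used elements become cheaper, later completion candidates that reuse them rank ahead of rarely used alternatives.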
102 Query Formulation
- To formulate (simple) queries without knowledge of the query language or database schema
- Example: Find airports in Tunisia that can land a C-5 cargo plane
- User input:
  - Query aspect: AIRPORTS.APORT_NM
  - Constraints:
    - AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = 'C-5'
    - COUNTRY_STATE.CY_NM = 'Tunisia'
  - Links: CAN LAND
103 Formulated Query
SELECT R3.APORT_NM
FROM AIRCRAFT_AIRFIELD_CHARS R0, AIRPORTS R3, COUNTRY_STATE R11, GEOLOC R12, RUNWAYS R16
WHERE R0.AC_TYPE_NM = 'C-5'
AND R11.CY_NM = 'Tunisia'
AND R0.WT_MIN_AVG_LAND_DIST_FT < R16.RUNWAY_LENGTH_FT
AND R0.WT_MIN_RUNWAY_WIDTH_FT < R16.RUNWAY_WIDTH_FT
AND R12.GLC_CD = R3.GLC_CD
AND R3.APORT_NM = R16.APORT_NM
AND R11.CY_CD = R12.CY_CD
104 Query Completion as Graph Search Problem
- Given: an incomplete input query topic $T_i$
- Find: a set of links to complete the topic (to make $T_i$ connected)
- Minimum Missing Information principle
  - The query completion candidate $T_c$ (the missing links and nodes) for an incomplete input topic $T_i$ contains the minimum information
105 Query Formulation Algorithm
- Input: subgraph T of the semantic graph G
- Find candidates with the minimum information measure
- Two methods are used to limit the search scope:
  - L-step-bound paths: paths that connect two components with at most L links, limiting the search to the neighborhood of the input subgraph
  - k-minimum completion candidates: only at most k candidates with the minimum information measure are kept (alpha-beta pruning)
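For the two-component case, the two scope limits can be sketched directly: enumerate simple paths of at most L links between the components, score each by the information of the nodes and links it adds, and keep the k cheapest. The graph reuses the slide 99 relations, but the information values are hypothetical, `heapq.nsmallest` replaces the alpha-beta pruning mentioned on the slide, and joining more than two components (a Steiner-tree-like problem) is not handled.

```python
import heapq
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Element = Union[str, FrozenSet[str]]  # a node name or an undirected link

EDGES = [("AIRPORT", "RUNWAY"), ("AIRPORT", "GEOLOC"),
         ("RUNWAY", "GEOLOC"), ("GEOLOC", "COUNTRY")]
# Hypothetical information measures for nodes and links.
INFO: Dict[Element, float] = {n: 1.0 for e in EDGES for n in e}
INFO.update({frozenset(("AIRPORT", "RUNWAY")): 1.0,
             frozenset(("AIRPORT", "GEOLOC")): 1.0,
             frozenset(("RUNWAY", "GEOLOC")): 1.5,
             frozenset(("GEOLOC", "COUNTRY")): 1.0})


def completions(comp_a: Set[str], comp_b: Set[str], L: int,
                k: int) -> List[Tuple[float, List[str]]]:
    """Enumerate simple paths of at most L links from component A to
    component B and keep the k whose missing nodes and links carry the
    least information (heapq.nsmallest stands in for the pruning step)."""
    adj: Dict[str, List[str]] = {}
    for u, v in EDGES:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    given = comp_a | comp_b
    found: List[Tuple[float, List[str]]] = []

    def dfs(path: List[str]) -> None:
        node = path[-1]
        if len(path) > 1 and node in comp_b:
            missing_nodes = [n for n in path if n not in given]
            missing_links = [frozenset(p) for p in zip(path, path[1:])]
            cost = sum(INFO[n] for n in missing_nodes) + sum(INFO[e] for e in missing_links)
            found.append((cost, path))
            return
        if len(path) - 1 == L:  # L-step bound reached
            return
        for nxt in adj[node]:
            if nxt not in path:
                dfs(path + [nxt])

    for start in comp_a:
        dfs([start])
    return heapq.nsmallest(k, found, key=lambda c: c[0])


print(completions({"RUNWAY"}, {"COUNTRY"}, L=2, k=2))
print(completions({"RUNWAY"}, {"COUNTRY"}, L=3, k=2))
```

With L = 2 only RUNWAY-GEOLOC-COUNTRY (cost 3.5) is reachable; L = 3 also admits RUNWAY-AIRPORT-GEOLOC-COUNTRY (cost 5.0), which ranks second because it adds an extra node and link.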
106 Initial Components and 2-Step-Bound Paths for the CAN LAND Query
107 The Semantic Graph for the Transportation Domain
108 Incremental Query Formulation
- Incremental query formulation
  - Assists the user in reaching a complex query goal with a series of simple queries
  - Subsequent queries may depend on the results of preceding queries (derived relations)
- Issues
  - Incorporate derived relations into the semantic graph
  - Suggest missing attributes to link isolated derived nodes to the graph
109 Incremental Query Examples
- Find airports in Tunisia.
- Which of these airports can land a C-5?
- What is the weather at these airports?
110 Incorporating Derived Relations
- Source relation: contributes attributes to the derived relations
- Derived relation: inherits the properties of attributes from its source relations
- Deriving link: links to the source relations through inherited keys
- Inherited link: inherits links from the source relations
111 Extended semantic graph showing derived nodes, derived links, and inherited links
112 Suggesting Key Attributes for a Query
- Find the source relations for the isolated derived relation.
- Suggest the keys of the source relations as attributes to include.
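The two steps above can be sketched as follows, assuming each derived relation records which source relation every attribute came from (its lineage). The keys are those listed on slide 99; the lineage example (a derived "airports in Tunisia" result holding ELEV_FT and CY_NM) and the function name `suggest_key_attributes` are hypothetical.

```python
from typing import Dict, List

# Keys of the source relations as listed on slide 99.
KEYS: Dict[str, str] = {"AIRPORT": "APORT_NM", "RUNWAY": "RUNWAY_NM",
                        "GEOLOC": "GLC_CD", "COUNTRY": "CY_CD"}


def suggest_key_attributes(lineage: Dict[str, str]) -> List[str]:
    """`lineage` maps each attribute of a derived relation to its source
    relation. Suggest each source relation's key that the derived relation
    lacks, so the derived node can be linked back into the semantic graph."""
    sources = sorted(set(lineage.values()))
    return [KEYS[s] for s in sources if KEYS[s] not in lineage]


# Hypothetical derived relation: elevations of airports in Tunisia.
lineage = {"ELEV_FT": "AIRPORT", "CY_NM": "COUNTRY"}
print(suggest_key_attributes(lineage))  # ['APORT_NM', 'CY_CD']
```

Once APORT_NM is included, the derived node can inherit the AIRPORT links, so a follow-up such as "Which of these airports can land a C-5?" can be completed by the same graph search.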
113 Concept and Attribute Specification Interface
114 Query Constraint Specification
115 Action Specification
116 English-Like Query Description and the Formulated Query
117 Conclusions
- The semantic graph model provides a basis for query formulation search
- Ranking query candidates by information measure during formulation provides adaptive behavior
- Incremental query formulation is effective for complex queries
- GUI and voice interfaces can be built for query formulation from high-level concepts