Title: Complex relationships for the semantic web
1Complex relationships for the semantic web
- Amir Ali Khosravi
- 82702903
2- The emphasis will shift from finding documents to finding facts, actionable information, and insights
- Discover relevant and interesting relationships amongst the entities that these documents describe
3Introduction
- To take maximum advantage of awareness, we need to support meaningful information requests
- Current ontological representational schemes represent knowledge as a hierarchical taxonomy of concepts and relationships such as is-a/role-of, instance-of/member-of, and part-of
- These schemes support only limited complexity
- InfoQuilt extends support for semantics by supporting computations involving lateral, user-defined relationships
- Relationships across domains may not necessarily be hierarchical in nature
4Introduction
- Does nuclear testing cause earthquakes?
- Natural-Disasters.Earthquake, Nuclear-Weapons.Nuclear-Testing
- The meaning of "cause" could be based on the proximity in time and distance between two events
5Classification of relationships
- Content independent relationships: two documents may be related to each other by virtue of being stored on the same server or file system; another example is the relationship between a document and its date of modification
6- Content dependent relationships: based either on the information content the entities refer to or on some representation of it
- Direct content dependent relationships: a relation between two entities mentioned in the same paragraph
- Content descriptive relationships: the fact that X is CEO of a company Y is computed based on the existence of an ontology (intra-domain relationships and inter-domain relationships)
7Content descriptive relationships
- Direct semantic relationships: Intel is-a-competitor-of Motorola
- Complex transitive relationships: Remzi and Dick are linked to the same terrorist organization
- Inter-domain multi-ontology relationships
8Representation of relationships
- Arity
- Cardinality
- Direct vs. transitive relationships
- Crisp vs. fuzzy
- Properties vs. relations
- Structural composition
9- A deficiency shared by most of these languages is the absence of a mechanism that can model complex operators
- Earthquake causing tsunami: temporal and spatial proximity
- InfoQuilt supports such operators by allowing the use of user-defined functions as operators
- The system particularly needs to extract domain-specific or contextually relevant metadata from all the data sources
- Knowledge of the characteristics of the domains of interest gives us the ability to optimize the processing of information requests
10- A form of knowledge discovery: HAND
- Domain modeling uses ontologies, domain rules, functional dependencies, and support for user-defined inter-ontological relationships
- Rich and powerful querying mechanism: IScape
- User-defined functions as complex operators
- Ability to handle heterogeneous content
- Information request processing utilizing domain and resource characteristics
11Domain Modeling
- InfoQuilt uses ontologies to model the domains
12- Attributes
- Earthquake(latitude, longitude, region, eventDate, Description, damagePhoto, numberOfDeaths, magnitude)
- Domain Rules
- latitude > -90, latitude < 90
- Earthquake(latitude, longitude, region, eventDate, Description, damagePhoto, numberOfDeaths, magnitude, latitude > -90, latitude < 90, longitude > -180, longitude < 180)
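The domain rules on this slide can be read as range checks on attribute values. The following is a minimal sketch under that assumption; the `DOMAIN_RULES` table and the `violated_rules` helper are hypothetical names for illustration, not part of InfoQuilt.

```python
# Hypothetical encoding of the Earthquake domain rules as open intervals (low, high).
DOMAIN_RULES = {
    "latitude": (-90.0, 90.0),
    "longitude": (-180.0, 180.0),
}


def violated_rules(record, rules=DOMAIN_RULES):
    """Return the attributes whose values fall outside the open interval (low, high)."""
    violations = []
    for attribute, (low, high) in rules.items():
        value = record.get(attribute)
        # A missing attribute is unknown, so it cannot violate a rule here.
        if value is not None and not (low < value < high):
            violations.append(attribute)
    return violations
```

A record that returns an empty list is consistent with the domain rules; any listed attribute marks data that can be rejected.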
13- Functional Dependencies (FD)
- An FD is used to retrieve information (attribute values) that is missing from a resource by using another resource
- testSite -> latitude longitude
- Earthquake(latitude, longitude, region, eventDate, Description, damagePhoto, numberOfDeaths, magnitude, latitude > -90, latitude < 90, longitude > -180, longitude < 180, testSite -> latitude longitude)
14Inter-ontological relationships
- We can say that some nuclear test could have caused an earthquake if we see that the earthquake occurred some time after the nuclear test was conducted and in a nearby region
- NuclearTest Causes Earthquake
- < dateDifference(NuclearTest.eventDate, Earthquake.eventDate) < 30 AND distance(NuclearTest.latitude, NuclearTest.longitude, Earthquake.latitude, Earthquake.longitude) < 10000
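As an illustration, the "causes" relationship can be sketched as a predicate built from two user-defined functions. This is not InfoQuilt's implementation: the slide does not state the units, so the sketch assumes days and kilometers, uses the haversine formula for `distance`, and adds the "earthquake occurs after the test" condition from the prose as an explicit check.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumption: distances are measured in kilometers


def date_difference(first, second):
    """Days elapsed from `first` to `second` (negative if `second` is earlier)."""
    return (second - first).days


def distance(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points, in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def causes(test, quake, max_days=30, max_distance=10000):
    """True if the earthquake follows the test closely in both time and space."""
    days = date_difference(test["eventDate"], quake["eventDate"])
    dist = distance(test["latitude"], test["longitude"],
                    quake["latitude"], quake["longitude"])
    # Assumption: the earthquake must not precede the nuclear test.
    return 0 <= days < max_days and dist < max_distance
```

The thresholds are parameters so that the same predicate can be reused with other limits.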
15Operations
- Find all earthquakes with epicenter in a 5000-mile radius area of the location at latitude 60.790 North and longitude 97.570 East
- Another important advantage: the system can support complex post-processing of data
- To be able to dynamically and easily add new operations as well as update and delete existing ones, InfoQuilt maintains a Function Store
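One way to picture a function store is as a registry that maps operation names to callables. The sketch below is a hypothetical illustration of adding, updating, deleting, and invoking operations; the class and method names are assumptions, not InfoQuilt's API.

```python
class FunctionStore:
    """Hypothetical registry of named operations that can change at runtime."""

    def __init__(self):
        self._functions = {}

    def add(self, name, func):
        """Register a new operation; refuse to overwrite an existing one."""
        if name in self._functions:
            raise KeyError(f"operation already exists: {name}")
        self._functions[name] = func

    def update(self, name, func):
        """Replace the implementation of an existing operation."""
        if name not in self._functions:
            raise KeyError(f"unknown operation: {name}")
        self._functions[name] = func

    def delete(self, name):
        """Remove an operation; raises KeyError if it is not registered."""
        del self._functions[name]

    def call(self, name, *args):
        """Look up an operation by name and apply it to the arguments."""
        return self._functions[name](*args)
```

Because operations are looked up by name at call time, queries that refer to an operation pick up updates without being rewritten.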
16Information Scapes (IScapes)
- A conventional query generally specifies explicitly the exact sources and how data from these sources should be integrated; the system does not understand what the user is asking
- An IScape embeds semantic information, so the system can understand what the user is inquiring about
17- A computing paradigm that allows users to query and analyze the data available from diverse autonomous sources, gain a better understanding of the domains and their interactions, and discover and study relationships
- Find all earthquakes with epicenter in a 5000-mile radius area of the location at latitude 60.790 North and longitude 97.570 East and find all tsunamis that they might have caused
18- Preset constraints and runtime-configurable constraints, similar to WHERE
- IScape builder to construct and execute IScapes and analyze results
19Human Assisted Knowledge Discovery (HAND)
- Transitive relationships: Earthquake causes Tsunami, Tsunami affects Environment, Earthquake affects Environment
- IScape 1: When was the earliest recorded nuclear test conducted?
- IScape 2: Find the total number of earthquakes with a magnitude 5.8 or higher on the Richter scale per year starting from year 1990
- IScape 3: Find the average number of earthquakes per year with a magnitude 5.8 or higher on the Richter scale for the period 1900-1949 and for the period 1950-present
20- IScape 4: For each group of earthquakes with magnitudes in the ranges 5.8-6, 6-7, 7-8, 8-9, and magnitudes higher than 9 on the Richter scale starting from year 1900, find the number of earthquakes
- IScape 5: Find nuclear tests conducted after January 1, 1950 and find any earthquakes that occurred not later than a certain number of days after the test and such that its epicenter was located no farther than a certain distance from the test site
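To make the post-processing in these IScapes concrete, the sketch below shows how the grouping asked for in IScape 2 might be computed over already-retrieved earthquake records. The record layout (dictionaries with `eventDate` and `magnitude`) and the function name are assumptions for illustration only.

```python
from collections import Counter
from datetime import date


def earthquakes_per_year(earthquakes, min_magnitude=5.8, start_year=1990):
    """Count earthquakes at or above `min_magnitude` per year, from `start_year` on."""
    counts = Counter(
        quake["eventDate"].year
        for quake in earthquakes
        if quake["magnitude"] >= min_magnitude
        and quake["eventDate"].year >= start_year
    )
    # Return a plain dict ordered by year for easy display.
    return dict(sorted(counts.items()))
```

IScape 3 and IScape 4 follow the same pattern with a different grouping key: a period or a magnitude range instead of the year.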
21InfoQuilt runtime architecture
- Multi-agent information brokering architecture at
runtime
22Steps of processing an IScape
- User agent sends the IScape to the broker agent for processing
- Broker agent sends the IScape to the planning agent
- Planning agent creates an execution plan for the IScape
- Planning agent returns the plan to the broker agent
- Broker agent sends the plan to the correlation agent for processing
- Correlation agent starts executing the plan
- Correlation agent returns the final result to the broker agent
- Broker agent forwards the result to the user agent
- Supports easy dynamic addition and removal of resources
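The message flow in the steps above can be sketched with three small classes. This is only a schematic under stated assumptions: the class names, the IScape dictionary with a `resources` key, and the placeholder plan are all hypothetical and say nothing about how the real agents plan or correlate.

```python
class PlanningAgent:
    def create_plan(self, iscape):
        # Placeholder plan: one query step per resource named in the IScape.
        return [("query", resource) for resource in iscape["resources"]]


class CorrelationAgent:
    def execute(self, plan):
        # Placeholder execution: produce one labelled result per plan step.
        return [f"result from {resource}" for _, resource in plan]


class BrokerAgent:
    """Coordinates the other agents and records each hand-off in `log`."""

    def __init__(self, planner, correlator):
        self.planner = planner
        self.correlator = correlator
        self.log = []

    def process(self, iscape):
        self.log.append("broker -> planning agent: IScape")
        plan = self.planner.create_plan(iscape)
        self.log.append("planning agent -> broker: plan")
        self.log.append("broker -> correlation agent: plan")
        result = self.correlator.execute(plan)
        self.log.append("correlation agent -> broker: result")
        return result  # the caller plays the role of the user agent
```

Keeping the broker as the only coordinator means the planning and correlation agents never need to know about each other.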
23Information Extraction and integration
- Extract it as needed vs. extract it offline and maintain it in a local database
- Important metrics for information extraction: recall, precision
- Information integration systems usually provide a uniform means of representing the information from multiple sources
- It is done offline and for sources with relatively static data
- The METIS toolkit is used to create a single repository of information for a domain
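For reference, the two extraction metrics named on this slide have standard definitions: precision is the fraction of extracted items that are correct, and recall is the fraction of relevant items that were extracted. A minimal sketch follows; the function name is an assumption, and items are compared as set members.

```python
def precision_recall(extracted, relevant):
    """Return (precision, recall) for a set of extracted items against the relevant set."""
    extracted, relevant = set(extracted), set(relevant)
    correct = extracted & relevant
    # Convention chosen here: an empty denominator yields 0.0 rather than an error.
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall
```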
24METIS architecture
25Resource Modeling
- Resource Attributes
- SignificantEarthquakesDB(eventDate, Description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto)
- EarthquakesAfter1990(eventDate, region, magnitude, numberOfDeaths, damagePhoto)
- Binding Patterns (BP)
- A set of attributes that the system must be able to supply values for in order to query the resource
- fromCity, fromState, toCity, toState, departureDate
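A binding pattern can be checked before a resource is queried: if the system cannot supply a value for every attribute in the pattern, the resource cannot be used as-is. The sketch below assumes the pattern is a set of attribute names; `missing_bindings` is a hypothetical helper.

```python
# Binding pattern from the slide, encoded as a set of attribute names.
FLIGHT_BINDING_PATTERN = {"fromCity", "fromState", "toCity", "toState", "departureDate"}


def missing_bindings(known_values, binding_pattern=FLIGHT_BINDING_PATTERN):
    """Return the binding-pattern attributes that have no value yet, sorted by name."""
    return sorted(binding_pattern - set(known_values))
```

An empty result means the resource can be queried; otherwise the listed attributes must first be obtained from the request or from another resource.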
26- Data Characteristic (DC) Rules
- AirTranAirways(airlineCompany, flightNumber, fromCity, fromState, toState, departureDate, fare, departureTime, arrivalTime, dc airlineCompany = "AirTran Airways", fromCity, fromState, toCity, toState, departureDate)
- Find all the flights operated by Delta Airlines from Boston, MA to Los Angeles, CA on February 19, 2001
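The Delta Airlines request shows why DC rules matter: a resource whose data characteristic says every record has airlineCompany "AirTran Airways" cannot contribute to it. The sketch below handles equality rules only and uses hypothetical names; it is one possible reading of the pruning, not the planner's actual logic.

```python
def may_contain(dc_rules, query_constraints):
    """False only when an equality DC rule contradicts an equality constraint of the query."""
    for attribute, wanted in query_constraints.items():
        if attribute in dc_rules and dc_rules[attribute] != wanted:
            return False  # the resource is known to hold no matching data
    return True


# DC rule of the AirTranAirways resource, as given on the slide.
AIRTRAN_DC = {"airlineCompany": "AirTran Airways"}
```

With these rules, `may_contain(AIRTRAN_DC, {"airlineCompany": "Delta Airlines"})` is false, so the resource can be skipped without being queried.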
27- Local Completeness (LC) Rules
- The resource contains all the information for the described subset of the domain
- AirTranAirways(airlineCompany, flightNumber, fromCity, fromState, toState, departureDate, fare, departureTime, arrivalTime, dc airlineCompany = "AirTran Airways", lc airlineCompany = "AirTran Airways")
28Planning and optimization
- Two ontologies: NuclearTest, Earthquake
- Specifications of the information sources
- NuclearTestsDB(testSite, explosiveYield, bodyWaveMagnitude, testType, eventDate, conductedBy, dc bodyWaveMagnitude > 3, dc eventDate > January 1, 1985)
- NuclearTestSites(testSite, latitude, longitude)
- SignificantEarthquakeDB(eventDate, description, region, magnitude, latitude, longitude, numberOfDeaths, damagePhoto, dc eventDate > January 1, 1970)
29Planning and optimization
- Planning agent uses the following rules to select the resources that are relevant
- Locally complete sources
- Non locally complete sources
- Binding patterns
- Associate resources to supply values for missing attributes
30- NuclearTestsDB has two missing attributes, latitude and longitude; the planner uses the FD testSite -> latitude longitude with NuclearTestSites as an associate resource, and the function testSiteEquals from the function store is used
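The use of the FD with an associate resource can be sketched as a lookup that fills in the missing coordinates. The matching function below is a stand-in for testSiteEquals; its behavior (case-insensitive comparison of trimmed names) is an assumption, as are the helper names and record layout.

```python
def same_test_site(a, b):
    # Stand-in for testSiteEquals. Assumption: names match ignoring case and whitespace.
    return a.strip().lower() == b.strip().lower()


def fill_coordinates(nuclear_tests, test_sites, site_equals=same_test_site):
    """Apply testSite -> latitude longitude using `test_sites` as the associate resource."""
    completed = []
    for record in nuclear_tests:
        row = dict(record)  # copy so the input records are left unchanged
        for site in test_sites:
            if site_equals(record["testSite"], site["testSite"]):
                row["latitude"] = site["latitude"]
                row["longitude"] = site["longitude"]
                break  # the FD guarantees at most one coordinate pair per site
        completed.append(row)
    return completed
```

Records whose site is not found keep their original attributes, so the caller can tell which tests still lack coordinates.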
31IScape Execution and monitoring
- User agent passes the IScape to the broker agent
- The broker agent starts the processing of the IScape by coordinating other agents
- It first asks the planning agent to create an execution plan; the planning agent interacts with the knowledge agent to access information about the domains in the IScape and the resources
- The planning agent creates an execution plan
- It sends the plan back to the broker
- Broker sends the plan to the correlation agent
- Final result is returned to the broker and the user agent
32IScape processing monitor
- Execution of the plan in the correlation agent is
multi-threaded and parallel
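As a rough illustration of multi-threaded plan execution, independent plan steps can be submitted to a thread pool and collected in plan order. The sketch assumes each step is a zero-argument callable with no dependency on other steps; `execute_plan` is a hypothetical name, and the real correlation agent may schedule work differently.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_plan(steps, max_workers=4):
    """Run independent plan steps in parallel; results are returned in step order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(step) for step in steps]
        # result() blocks until each step finishes and re-raises its exception, if any.
        return [future.result() for future in futures]
```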
33Knowledge Builder (KB)
- Create specifications of domains, inter-domain
relationships, operations and the available
information sources
34IScape Builder (IB)
- Provides a graphical interface to create and execute IScapes
- Step 1: select the ontologies that the user wants, and select the relationships
35- Step 2: specify functions that operate on attributes
- Step 3: specify the conditions that make up the structure of the IScape
- Step 4: specify additional constraints on the way the result of the IScape query should be grouped
- Step 5: specify the runtime projection parameters
39Any Questions?