Title: Diapositiva 1
1PhD Programme in Information Engineering
XVI doctoral cycle - II cycle, New Series
From Data to Information: the MOMIS system
Dr. Eng. Francesco Guerra, advisor Prof. Sonia Bergamaschi
2Outline
- Intelligent Integration of Information
- Matching
- The MOMIS system
- MOMIS in the Semantic Web
- MOMIS as the basis of a virtual marketplace
- MOMIS to manage collaborative processes (the WINK project)
- MOMIS as a semantic search engine (the SEWASIE project)
3Intelligent Integration of Information
- Distinguishing elements
- Kinds of managed sources
- The Global-as-View vs. the Local-as-View approach
- Data Model
- Building the Global View
- Querying the Global View
- Description Logics techniques
- Updating the Global View
4Intelligent Integration of Information: the systems
5Intelligent Integration of Information: the systems
6Matching comparison
- Distinguishing elements
- Different kinds of mapping representation (granularity, cardinality)
- Mappings extraction (structure-instances analysis, lexical analysis, external tools exploitation)
7Matching comparison
Extended from E. Rahm and P.A. Bernstein, A survey of approaches to automatic schema matching, VLDB Journal, 10(4): 334-350, 2001
8Matching comparison
9The MOMIS System
- MOMIS (Mediator envirOnment for Multiple Information Sources) is a framework to perform information extraction and integration from both structured and semistructured data sources.
- An object-oriented language with an underlying Description Logic, called ODL-I3 and derived from the ODMG standard, is introduced for information extraction. Information integration is then performed in a semi-automatic way, by exploiting the knowledge in a Common Thesaurus and the ODL-I3 descriptions of source schemas with a combination of clustering techniques and Description Logics. This integration process gives rise to a virtual integrated view of the underlying sources (the Global Virtual View), for which mapping rules and integrity constraints are specified to handle heterogeneity.
- The MOMIS system, based on a conventional wrapper/mediator architecture, provides methods and open tools for data management in Internet-based information systems by using a CORBA-2 interface. MOMIS was developed as a joint collaboration between the University of Modena and Reggio Emilia and the Universities of Milano and Brescia.
10The MOMIS System
Distributed information stored in multiple,
heterogeneous sources
- Sources integration provides a Global Schema (which is a virtual view)
- The Global Schema allows the user to send a query and get a unified answer from all the involved sources (transparently)
- All information at http://www.dbgroup.unimo.it
- INTERDATA (1999-2000) and D2I (from Data to Information) (2001-2002): research programmes of significant national interest
- WINK (Web-linked Integration of Network-based Knowledge) (2002-2003)
- SEWASIE (Semantic Webs and AgentS in Integrated Economies) (2002-2005)
11The MOMIS System - Architecture
12MANUAL ANNOTATION
SEMI-AUTOMATIC ANNOTATION
13Local sources annotation
- The integration designer has to manually choose the appropriate WordNet (www.cogsci.princeton.edu/wn/) meaning for each element of the conceptual schema provided by wrappers.
- Motivations of the annotation
  - Exploiting semantics associated with the names of the schemas/structures of the information sources
  - Having a well-known meaning for each term of the sources
- The annotation phase is composed of two steps
  - Word Form choice. The WordNet morphologic processor aids the designer by suggesting a word form corresponding to the given term.
  - Meaning choice. The designer can choose to map an element on zero, one or more senses. The designer can choose a sense among the existing ones in WordNet and can also add new senses to the DB.
14Global Virtual View annotation
- The GVV has to be annotated to become exportable knowledge.
- Annotating a GVV means providing Global Classes with a name and with meanings.
- Starting from the annotations of local sources and the mappings between the GVV and the local ontologies, we have developed a semi-automatic methodology to generate the annotations of the GVV.
15GVV annotation
Annotated local classes
- CS.Essay <essay, essay1>
- CS.Publication <publication, publication2>
- UNI.Article <article, article1>
The CT relationships
- UNI.Article NT CS.Publication
- CS.Essay NT CS.Publication
Broadest meaning among the meanings of the local classes
$BLC_{GC} = \{\, LC \in GC \mid \neg\exists\, y \in GC : (LC \ \mathrm{NT}\ y) \lor (y \ \mathrm{BT}\ LC) \,\}$
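The selection of the broadest local classes of a global class can be sketched as follows. This is only an illustrative sketch, not the MOMIS implementation: the function name and the representation of the Common Thesaurus as a set of (narrower, broader) pairs are assumptions, and every "y BT x" relationship is assumed to have been rewritten as "x NT y" beforehand.

```python
def broadest_local_classes(global_class, nt_relationships):
    """Return the local classes of a global class that have no broader class in it.

    global_class: set of local class names (hypothetical representation).
    nt_relationships: set of (narrower, broader) pairs, i.e. "narrower NT broader";
    BT relationships are assumed to be stored in this inverted form.
    """
    return {
        lc
        for lc in global_class
        if not any((lc, y) in nt_relationships for y in global_class if y != lc)
    }


# Data taken from the slide: three annotated local classes and two NT relationships.
gc = {"UNI.Article", "CS.Essay", "CS.Publication"}
nt = {("UNI.Article", "CS.Publication"), ("CS.Essay", "CS.Publication")}
broadest = broadest_local_classes(gc, nt)
```

With the relationships of the slide, only CS.Publication has no broader class in the global class, so its meaning would be proposed as the annotation of the global class.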
16Updating the GVV
- A created GVV can change
  - By adding a new source to the system
  - By updating an existing data source schema
  - By deleting a previously integrated source
- Adding a new source: two possible scenarios
  - Integration from scratch: the integration process is applied again; in this case only the Common Thesaurus of the previous GVV can be exploited.
  - Integration with the GVV: the process exploits the automatically annotated GVV and the Common Thesaurus.
17Adding a new source
Annotated GVV
- Common Thesaurus
- intra/inter schema relationships
- (only new sources)
- lexicon relationships
- (GVV and new sources annotated)
- relationships inserted by user
- inferred relationships
XML
New
New
18Adding a new source
- Three scenarios
  - A new global class is composed of only one old global class and one or more new local classes
  - A global class of the new integrated schema is composed of only new local classes
  - A global class of the new integrated schema is composed of more than one global class of the old GVV and at least one local class of the new source
19GVV - integrated ontology
- A GVV may be thought of as a domain ontology for the integrated sources; the usual approach in the Semantic Web is based on the a priori existence of an ontology connected to the sources by means of semantic markup
MOMIS
Semantic Web
Ontology
Ontology Builder
20GVV - integrated ontology
- The MOMIS ontology is composed of the following components
  - Global Virtual View
  - Mapping Rules
  - Integrity constraint rules
  - Intensional and extensional inter- and intra-schema relationships (Common Thesaurus)
- We express the ontology by using the ODL-I3 language or an OWL file.
21Using the MOMIS system
- The MOMIS system was exploited
  - To create a virtual marketplace
  - To support collaborative processes within the European WINK project
  - To build an advanced semantic search engine within the European SEWASIE project (under development)
22SEWASIE
- SEWASIE (SEmantic Webs and AgentS in Integrated Economies) is a research project funded by the EU under the Semantic Web action line (May 2002 - April 2005)
- The consortium
  - Università degli Studi di Modena e Reggio Emilia (ITALY)
  - CNA SERVIZI Modena s.c.a.r.l. (ITALY)
  - Università degli Studi di Roma La Sapienza (ITALY)
  - Rheinisch Westfaelische Technische Hochschule Aachen (GERMANY)
  - Libera Università di Bolzano (ITALY)
  - Thinking Networks AG (GERMANY)
  - IBM Italia SPA (ITALY)
  - Fraunhofer-Gesellschaft Institut Angewandte Informationstechnik (GERMANY)
23SEWASIE Objectives
The SEWASIE project aims to develop an advanced
search engine enabling intelligent access to
heterogeneous data sources on the web, via
semantic enrichment, to provide the basis for
structured web-based communication.
- The SEWASIE project pursues the following aims
  - To develop an agent-based, secure, scalable and distributed system architecture for semantic search (based on ontologies) and for structured web-based communication.
  - To develop a general framework for query management and information reconciliation based on semantically enriched data and a trusted agent structure.
  - To develop an information brokering component which includes methods for collecting, contextualizing and visualizing data.
  - To provide the end-user with an efficient interface for formulating queries using a graphical representation and for intelligent navigation through the semantically enriched information space.
24The SEWASIE architecture
- The SEWASIE system realizes a virtual network, the SEWASIE Virtual Network (SVN), whose nodes are SEWASIE Information Nodes (SINodes): multi-database mediator-based systems, each including a Virtual Data Store, an Ontology Builder, and a Query Manager
- Brokering Agents maintain the knowledge related to the SEWASIE Virtual Network and the user profiles.
- In the query solving phase, starting from a specified SINode, a Query Agent accesses other SINodes and thus collects partial answers.
- To select the SINodes useful to solve a query, a Query Agent interacts with one or more Brokering Agents.
25The SEWASIE architecture
[Figure: the SEWASIE architecture, organised in a user interface layer, an intermediaries layer and an information layer connected by the SEWASIE interconnection infrastructure. Labelled components: User Interface (Query Interface, Metadata Interface, Monitoring Interface, Comm. Interface, Visualisation), User Profile, Monitor Profiles, OLAP Tool and OLAP Reports, Communication Tool and Comm. Agent, Monitoring Agent (MA), Query Results, Brokering Agents (BA) with Ontology and Metadata Repositories, SINodes (Virtual Data Store, Query Manager, Ontology Builder), Wrappers with Semantic Enrichment, and the underlying structured (RDBs, XML), semi-structured and unstructured (HTML, text documents) sources.]
26Future Work
- Ontology evolution within an SINode
  - Update of existing sources
  - Deletion of previously integrated sources
- Extending WordNet
  - If a source description element has no corresponding concept in WordNet, the designer may add a new meaning and the proper relationships connecting it to existing meanings.
- Multilingual functionalities
  - SEWASIE multilingual technologies will allow users to share information and resources available all over the world, while preserving their original local qualities.
  - Enrichment of the multilingual lexicon ontology with the aid of statistical analysis techniques for multilingual text corpora (for example, techniques for the generation of multilingual dictionaries).
27Participation in national and European research projects
- D2I project (From Data to Information), funded by MIUR, research programme of significant national interest (2000-2001)
- Project "Agenti software e commercio elettronico: profili giuridici, tecnologici e psico-sociali" (Software agents and e-commerce: legal, technological and psycho-social aspects), funded by MIUR, research programme of significant national interest (2001-2002)
- Project "Tecnologie per arricchire e fornire accesso a contenuti" (Technologies to enrich and provide access to content), funded by the Fondo Speciale Innovazione 2000 (2001-2002)
- SEWASIE project (SEmantic Web and AgentS in Integrated Economies), funded by the European Community (2002-2005)
- WINK project (Web-linked Integration of Network-based Knowledge), funded by the European Community (EUTIST-AMI cluster) (2002-2003)
28Publications
International Journals (RI) and Chapters in International Books (CLI)
- RI1 S. Bergamaschi, G. Cabri, F. Guerra, L. Leonardi, M. Vincini, F. Zambonelli, Exploiting Agents to Support Information Integration, Special Issue of the International Journal on Cooperative Information Systems, vol. 11(3-4): 293-314, 2002, ISSN 0218-8430
- RI2 I. Benetti, D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, An Information Integration Framework for E-Commerce, IEEE Intelligent Systems Magazine, Jan/Feb 2002, pp. 18-25
- RI3 D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, Synthesizing an Integrated Ontology, IEEE Internet Computing, September-October 2003, pp. 42-51, ISSN 1089-7801
- RI4 I. Benetti, S. Bergamaschi, F. Guerra, M. Vincini, Soap-enabled web services for knowledge management, to appear in Int. J. Web Engineering and Technology, InderScience Publishers
- RI5 D. Beneventano, F. Guerra, S. Magnani, M. Vincini, A Web Service based framework for the semantic mapping between product classification schemas, to appear in Journal of Electronic Commerce Research, ISSN 1526-6133
- CLI1 D. Beneventano, S. Bergamaschi, J. Gelati, F. Guerra, M. Vincini, MIKS: an agent framework supporting information access and integration, Intelligent Information Agents - The AgentLink Perspective (editors S. Bergamaschi, M. Klusch, P. Edwards, P. Petta), March 2003, Lecture Notes in Computer Science N. 2586, Springer Verlag, pp. 22-49, ISSN 0302-9743, ISBN 3-540-00759-8
National Journals (RN)
- RN1 G. Gelati, F. Guerra, M. Vincini, Agents Supporting Information Integration: the MIKS Framework, AIIA Notizie, Periodico dell'Associazione Italiana per l'Intelligenza Artificiale, Year XIV, No. 4, December 2001
29Publications
International Conferences (CI)
- CI1 D. Beneventano, S. Bergamaschi, I. Benetti, A. Corni, F. Guerra, G. Malvezzi, SI-Designer: a tool for intelligent integration of information, 34th Annual Hawaii International Conference on System Sciences (HICSS-34), January 3-6, 2001, Maui, Hawaii, Track 9, IEEE Computer Society
- CI2 D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, The Momis approach to Information Integration, IEEE and AAAI International Conference on Enterprise Information Systems (ICEIS01), Setúbal, Portugal, 7-10 July 2001, pp. 194-198, ISBN 972-98050-2-4
- CI3 I. Benetti, D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, SI-Designer: an Integration Framework for E-Commerce, IJCAI01 Workshop on E-Business and the Intelligent Web, Seattle, USA, August 5, 2001
- CI4 S. Bergamaschi, G. Cabri, F. Guerra, L. Leonardi, M. Vincini, F. Zambonelli, Supporting information integration with autonomous agents, Fifth International Workshop on Cooperative Information Agents (CIA-2001), September 6-8, 2001, Modena, Italy, pp. 88-99
- CI5 D. Calvanese, S. Castano, F. Guerra, D. Lembo, M. Melchiori, G. Terracina, D. Ursino, M. Vincini, Towards a comprehensive methodological framework for integration, 8th International Workshop on Knowledge Representation meets Databases (KRDB-2001), Roma, Italy, 2001
- CI6 S. Bergamaschi, F. Guerra, M. Vincini, A Data Integration Framework for E-commerce product classification, 1st International Semantic Web Conference (ISWC2002), Sardegna, Italy, 9-12 June 2002, LNCS 2342, Springer 2002, ISBN 3-540-43760-6, pp. 379-393
30Publications
- CI7 S. Bergamaschi, F. Guerra, Peer to Peer Paradigm for a Semantic Search Engine, in proceedings of the International Workshop on Agents and Peer-to-Peer Computing, Bologna, 15 July 2002, LNCS 2530, Springer, ISBN 3-540-40538-0
- CI8 S. Bergamaschi, F. Guerra, M. Vincini, Product Classification Integration for E-Commerce, Second International Workshop on Electronic Business Hubs - WEBH 2002, in conjunction with DEXA 2002, September 2-6, 2002, Aix En Provence, France, published by IEEE Computer Society, Los Alamitos (CA), ISBN 0-7695-1668-8, pp. 861-867
- CI9 D. Beneventano, S. Bergamaschi, S. Castano, V. De Antonellis, A. Ferrara, F. Guerra, F. Mandreoli, G. Ornetti, M. Vincini, Semantic Integration and Query Optimization of Heterogeneous Data Sources, 1st Int.l Workshop on Efficient Web-based Information Systems (EWIS), 2002, Montpellier, France, pp. 154-165
- CI10 S. Bergamaschi, F. Guerra, M. Vincini, A peer-to-peer information system for the semantic web, in proceedings of the International Workshop on Agents and Peer-to-Peer Computing, in AAMAS 2003, Melbourne, Australia, July 14, 2003
- CI11 D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, Building an Ontology with MOMIS, in proceedings of the Semantic Integration Workshop within the Second International Semantic Web Conference, October 20, 2003, Sundial Resort, Sanibel Island, Florida, USA
- CI12 D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, Building an integrated Ontology within SEWASIE system, in proceedings of the First International Workshop on Semantic Web and Databases, co-located with VLDB 2003, Berlin, Germany, 2003
- CI13 S. Bergamaschi, G. Gelati, F. Guerra, M. Vincini, WINK: a Web-based Enterprise System for Collaborative Project Management in Virtual Enterprises, 4th International Conference on Web Information Systems Engineering, Roma, Italy, 10-12 December 2003
31Publications
National Conferences (CN)
- CN1 D. Beneventano, S. Bergamaschi, F. Guerra, M. Vincini, Exploiting extensional knowledge for query reformulation and object fusion in a data integration system, Proceedings of SEBD2001, Venezia, 27-29 June 2001, pp. 257-271
- CN2 G. Gelati, F. Guerra, M. Vincini, Agents Supporting Information Integration: the MIKS Framework, Proc. AIIA and TABOO Workshop From Object to Agents, Pitagora Editrice, Bologna, ISBN 88-371-1272-6, September 2001
- CN3 D. Beneventano, S. Bergamaschi, D. Bianco, F. Guerra, M. Vincini, SI-Web: a Web based interface for the MOMIS project, Proceedings of SEBD2002, 19-22 June 2002, pp. 407-411
- CN4 D. Beneventano, S. Bergamaschi, D. Gazzotti, G. Gelati, F. Guerra, M. Vincini, The WINK Project for Virtual Enterprise Networking and Integration, Proceedings of SEBD2002, 2002, pp. 283-290
- CN5 D. Beneventano, S. Bergamaschi, M. Felice, D. Gazzotti, G. Gelati, F. Guerra, M. Vincini, An Agent framework for Supporting the MIKS Integration, Proc. AIIA and TABOO Workshop From Object to Agents, 18-19 November 2002, Milano, Università Bicocca
- CN6 D. Beneventano, S. Bergamaschi, A. Fergnani, F. Guerra, M. Vincini, D. Montanari, A Peer-to-Peer Agent-Based Semantic Search Engine, Proceedings of SEBD2003, Cetraro (CS), 2003, pp. 283-290
- CN7 S. Bergamaschi, G. Gelati, F. Guerra, M. Vincini, Experiencing AUML for the WINK Multi-Agent System, Proc. AIIA and TABOO Workshop From Object to Agents, 10-11 September 2003, Villasimius (CA)
32Global Instance Computation
- For the definition of a Global Class we have to define the following elements
  - Mapping Table: defines the mapping between the global class attributes and the local class attributes
  - Join condition: we assume that there is a Join Condition between each pair of overlapping relations, to identify tuples corresponding to the same object and fuse them
  - Full disjunction: the GC contains a unique tuple resulting from the merge of all the different tuples representing the same real-world object.
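The fusion step can be illustrated with a deliberately simplified sketch. It assumes that all local classes have already been translated to global attribute names, that a single shared attribute acts as the join condition for every pair of classes, and that overlapping values never conflict. The function name and the sample tuples are hypothetical, and a general full disjunction (with pairwise join conditions and null join values) is more involved than this.

```python
def full_disjunction(local_instances, join_attribute, global_attributes):
    """Fuse tuples that share the same join value into one global tuple.

    local_instances: list of local class extensions, each a list of dicts
    keyed by global attribute names. Attributes unknown to every source
    stay None in the fused tuple.
    """
    fused = {}
    for tuples in local_instances:
        for t in tuples:
            merged = fused.setdefault(
                t[join_attribute], dict.fromkeys(global_attributes)
            )
            for attr, value in t.items():
                # Keep the first non-null value seen for each attribute.
                if merged.get(attr) is None:
                    merged[attr] = value
    return list(fused.values())


# Hypothetical instances of two overlapping local classes.
l1 = [{"name": "Ann Rossi", "e_mail": "ann@example.org", "year": 2}]
l2 = [
    {"name": "Ann Rossi", "e_mail": "ann@example.org", "dept_code": "D1", "s_code": "S1"},
    {"name": "Bob Bianchi", "e_mail": "bob@example.org", "dept_code": "D2", "s_code": "S2"},
]
global_instance = full_disjunction(
    [l1, l2], "e_mail", ["name", "e_mail", "year", "dept_code", "s_code"]
)
```

The two tuples describing Ann Rossi are merged into one, while the tuple found in only one source is kept and padded with nulls.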
33Global Instance Computation
- S(l1) = (firstn, lastn, year, e_mail)
- S(l2) = (name, e_mail, dept_code, s_code)
- Two functions
  - Global function: renaming the attributes of the local classes into attributes of the global class
  - Local function: converting a tuple of elements of a local class by suitable functions such as string concatenation.
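A mapping table for the two local schemas above might be encoded as in the following sketch. The set of global attributes, the dictionary layout and the helper name `to_global` are assumptions made for illustration: a string entry stands for a plain renaming, a callable for a conversion such as the concatenation of first and last name.

```python
# Assumed global class schema (not stated on the slide).
GLOBAL_ATTRIBUTES = ["name", "e_mail", "year", "dept_code", "s_code"]

# Hypothetical mapping table: local class -> global attribute -> rule.
MAPPING_TABLE = {
    "l1": {
        "name": lambda t: f"{t['firstn']} {t['lastn']}",  # string concatenation
        "e_mail": "e_mail",
        "year": "year",
    },
    "l2": {
        "name": "name",
        "e_mail": "e_mail",
        "dept_code": "dept_code",
        "s_code": "s_code",
    },
}


def to_global(local_class, local_tuple):
    """Translate one local tuple into the global schema; unmapped attributes become None."""
    mapping = MAPPING_TABLE[local_class]
    result = {}
    for attr in GLOBAL_ATTRIBUTES:
        rule = mapping.get(attr)
        if rule is None:
            result[attr] = None
        elif callable(rule):
            result[attr] = rule(local_tuple)
        else:
            result[attr] = local_tuple[rule]
    return result


converted = to_global(
    "l1", {"firstn": "Ann", "lastn": "Rossi", "year": 2, "e_mail": "ann@example.org"}
)
```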
34Global Instance Computation
- Semantic Homogeneity property condition
Join Attribute
Join Attribute
Full Disjunction
35Global Instance Computation
- Semantic Homogeneity property condition not verified
- Resolution functions
  - Random
  - Priority
  - User defined function
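When two sources disagree on the value of a global attribute, a resolution function picks the value to return. The three kinds listed above could look like the sketch below; the function names, their signatures and the sample values are illustrative assumptions rather than the MOMIS API.

```python
import random


def resolve_priority(values, priority):
    """values: dict source -> value; priority: sources listed from most to least trusted."""
    for source in priority:
        if values.get(source) is not None:
            return values[source]
    return None


def resolve_random(values, rng=random):
    """Pick one of the non-null conflicting values at random."""
    candidates = [v for v in values.values() if v is not None]
    return rng.choice(candidates) if candidates else None


def resolve_user_defined(values, function):
    """Delegate the choice to a designer-supplied function over the non-null values."""
    return function([v for v in values.values() if v is not None])


# Hypothetical conflict on the e_mail attribute of the same real-world object.
conflict = {"l1": "ann@cs.example.org", "l2": "ann@uni.example.org"}
by_priority = resolve_priority(conflict, ["l2", "l1"])
by_random = resolve_random(conflict)
by_user = resolve_user_defined(conflict, lambda vs: sorted(vs)[0])
```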
36Clusters generation
- Clustering Algorithm (hierarchical clustering techniques - Everitt)
  - Compute all pair-wise Global Affinity coefficients GA(cji, chk) and store them in the affinity matrix M.
  - Place each class into a separate cluster
  - Repeat
    - Select the pair of current clusters ch, ck with the highest GA in M: M(h,k) = max i,j M(i,j)
    - Form a new cluster by combining ch, ck
    - Update M by deleting the rows and columns corresponding to ch and ck
    - Define a new row and column (hk) for the new cluster ch,k
  - until the rank of M is 1 (that is, repeat while it is greater than 1).
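The agglomerative loop of this slide can be sketched in a few lines. The slide does not say how the affinity between a merged cluster and the remaining ones is recomputed, so the sketch assumes the maximum of the two old values (single-link style); the class names follow the example sources, while the affinity values and helper names are made up for illustration.

```python
def pair_key(a, b):
    """Order-independent key for a pair of clusters (or class names)."""
    return (a, b) if a <= b else (b, a)


def hierarchical_clustering(classes, ga):
    """Return the list of merges (cluster_h, cluster_k, affinity) in merge order.

    classes: iterable of class names.
    ga: dict pair_key(name_a, name_b) -> Global Affinity; missing pairs count as 0.
    """
    clusters = [(c,) for c in sorted(classes)]
    # Affinity matrix M, stored as a dict over unordered cluster pairs.
    m = {}
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            m[pair_key(a, b)] = ga.get(pair_key(a[0], b[0]), 0.0)
    merges = []
    while len(clusters) > 1:
        # Select the pair of clusters with the highest affinity.
        (h, k), affinity = max(m.items(), key=lambda item: item[1])
        new = tuple(sorted(h + k))
        others = [c for c in clusters if c not in (h, k)]
        # Assumption: affinity of the merged cluster = max of the two old affinities.
        new_row = {
            pair_key(new, c): max(m[pair_key(h, c)], m[pair_key(k, c)])
            for c in others
        }
        # Delete the rows/columns of h and k, then add the row for the new cluster.
        m = {key: v for key, v in m.items() if h not in key and k not in key}
        m.update(new_row)
        clusters = others + [new]
        merges.append((h, k, affinity))
    return merges


# Hypothetical affinity values between classes of the example sources.
ga = {
    pair_key("CS.Student", "TP.Student"): 0.8,
    pair_key("CS.Student", "UNI.School_Member"): 0.7,
    pair_key("TP.Student", "UNI.School_Member"): 0.6,
}
merges = hierarchical_clustering(
    ["CS.Student", "TP.Student", "UNI.School_Member", "UNI.Department"], ga
)
```

The returned merge list corresponds to an affinity tree; cutting it at a chosen affinity threshold would yield the clusters from which global classes are built.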
37Affinity evaluation
Two names have Affinity when they are connected through a path in the Thesaurus
The Structural Affinity of two classes measures the affinity of their attributes (Dice's function)
The Global Affinity of two classes is computed as the weighted sum of their Name Affinity and Structural Affinity
Extensional evaluation
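As a rough illustration of these coefficients, the sketch below computes a Dice-style structural affinity and combines it with a name affinity. It simplifies in two labelled ways: attributes are matched by identical name only (rather than through the Thesaurus), and the weights of 0.5 are arbitrary placeholders.

```python
def dice(attrs_a, attrs_b):
    """Dice coefficient: 2 * |common attributes| / (|A| + |B|)."""
    a, b = set(attrs_a), set(attrs_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def global_affinity(name_affinity, structural_affinity, w_name=0.5, w_struct=0.5):
    """Weighted sum of name and structural affinity (weights are assumptions)."""
    return w_name * name_affinity + w_struct * structural_affinity


# Attribute sets of two classes from the example sources.
school_member = ["name", "school", "year", "e_mail"]
tax_student = ["name", "s_code", "school_name", "e_mail", "tax_fee"]
sa = dice(school_member, tax_student)   # two shared attributes out of 4 + 5
ga_value = global_affinity(0.8, sa)     # 0.8 is a made-up name affinity
```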
38ODB-Tools
Schema Validator: automatically builds the class taxonomy and preserves the coherence with respect to the inheritance and aggregation hierarchies.
Query Optimizer: executes the semantic expansion of the query.
Available at http://www.dbgroup.unimo.it/ODB-Tools.html
39Example
University source (relational)
Department(dept_code, dept_name, budget)
Research_Staff(name, e_mail, dept_code, s_code) FK dept_code REF Department, s_code REF Section
School_Member(name, school, year, e_mail)
Section(s_code, section_name, length, room_code) FK room_code REF Department, s_code REF Room
Room(room_code, seats_number, notes)
Tax_Position source (XML)
<!ELEMENT ListOfStudent (Student)>
<!ELEMENT Student (name,s_code,school_name,e_mail,tax_fee)>
<!ELEMENT name (#PCDATA)>
40Example
Computer_Science source (object)
CS_Person(first_name, last_name)
Professor:CS_Person(belongs_to:Division, rank)
Student:CS_Person(year, takes:set<Course>, rank, e_mail)
Division(description, address:Location)
Location(city, street, number, country)
Course(course_name, taught_by:Professor)
41Source Acquisition Module
42Common Thesaurus (Domain Ontology)
Set of terminological relationships between class and attribute names (terms); it expresses both intra-schema and inter-schema knowledge.
Relationships added to the Common Thesaurus:
(1) schema derived
(2) lexicon derived
(3) designer supplied
(4) inferred, exploiting ODB-Tools capabilities
43Schema-derived relationships
- Terminological and extensional intra-schema relationships
- RT relationships derived from foreign keys in a relational schema
  UNI.Section RT UNI.Department
- BT/NT relationships derived from inheritance relationships in an object-oriented schema or integrity constraints in a relational schema
  CS.Student NT CS.CS_Person
  CS.Professor NT CS.CS_Person
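How such relationships might be read off a schema can be shown with a small sketch. The input dictionaries (foreign keys per relation, superclass per class) and the function name are hypothetical encodings chosen for the example; they are not the wrapper output format used by MOMIS.

```python
def schema_derived_relationships(source, foreign_keys=None, superclasses=None):
    """Derive Common Thesaurus triples (term, relationship, term) from one schema.

    foreign_keys: dict relation -> list of referenced relations (gives RT).
    superclasses: dict class -> parent class (gives NT).
    """
    relationships = []
    for relation, referenced in (foreign_keys or {}).items():
        for target in referenced:
            relationships.append((f"{source}.{relation}", "RT", f"{source}.{target}"))
    for cls, parent in (superclasses or {}).items():
        relationships.append((f"{source}.{cls}", "NT", f"{source}.{parent}"))
    return relationships


# The examples of the slide, encoded by hand.
uni = schema_derived_relationships("UNI", foreign_keys={"Section": ["Department"]})
cs = schema_derived_relationships(
    "CS", superclasses={"Student": "CS_Person", "Professor": "CS_Person"}
)
```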
44Schema Derived Relationships
45Lexicon-derived relationships
Extracted from the WordNet lexical database (Princeton University): 129,625 lemmas organized in 99,759 synonym sets (synsets)
Synonymy, Polysemy
Tax_position_xml.Student.name SYN University.School_member.name
CS.Professor NT CS.CS_Person
46Lexicon Derived Relationships
47Lexicon Derived Relationships
48Lexicon Derived Relationships
49Lexicon Derived Relationships
50Lexicon Derived Relationships
51Inferred relationships
Exploiting Description Logics techniques (ODB-Tools system), a new set of terminological relationships is inferred
University.Research_Staff RT CS.Course
52Common Thesaurus
53Mediator global schema
Global schema generation (interaction with ARTEMIS module)
Affinity calculation
Cluster generation
Global attributes and mapping table generation
- A global class gci is generated for each cluster Cli
- SI-Designer builds the attribute set to be associated with the cluster
  - Union of the attributes of all classes belonging to the cluster
  - Fusion of similar attributes
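The construction of the global attribute set can be pictured with the following sketch, which takes the union of the attributes of the clustered classes and fuses those declared similar. The `synonyms` dictionary stands in for the SYN relationships of the Common Thesaurus, and the school_name/school pair is an assumed example, not a relationship stated in the slides.

```python
def global_attribute_set(cluster_attributes, synonyms):
    """Union of the cluster's attributes, with similar attributes fused.

    cluster_attributes: dict class name -> list of attribute names.
    synonyms: dict attribute -> canonical attribute it should be fused into.
    """
    merged = []
    for attrs in cluster_attributes.values():
        for attr in attrs:
            canonical = synonyms.get(attr, attr)
            if canonical not in merged:
                merged.append(canonical)
    return merged


cluster = {
    "UNI.School_Member": ["name", "school", "year", "e_mail"],
    "TP.Student": ["name", "s_code", "school_name", "e_mail", "tax_fee"],
}
attributes = global_attribute_set(cluster, {"school_name": "school"})
```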
54Affinity tree and Cluster
55Affinity tree and Cluster
56Affinity tree and Cluster
57Mapping table example
- Each global class includes mapping rules between global and local attributes (and/or relationships, default/null values)
- A mapping is generated for each global class gci
58Mapping table
59Mapping table