Diapositiva 1 - PowerPoint PPT Presentation

About This Presentation
Title:

Diapositiva 1

Slides: 57
Provided by: francesc55

Transcript and Presenter's Notes

Title: Diapositiva 1


1
PHD PROGRAMME IN INFORMATION ENGINEERING
XVI doctoral cycle - II cycle, New Series
From Data to Information: the MOMIS system
Dr. Eng. Francesco Guerra, tutor: Prof. Sonia
Bergamaschi
2
Outline
  • Intelligent Integration of Information
  • Matching
  • The MOMIS system
  • MOMIS in the Semantic Web
  • MOMIS as the basis of a virtual marketplace
  • MOMIS to manage collaborative processes (the WINK
    project)
  • MOMIS as a semantic search engine (the SEWASIE
    project)

3
Intelligent Integration of Information
  • Distinguishing elements
  • Kinds of managed sources
  • The Global-as-View vs. the Local-as-View approach
  • Data Model
  • Building the Global View
  • Querying the Global View
  • Description Logics techniques
  • Updating the Global View

4
Intelligent Integration of Information: the systems
5
Intelligent Integration of Information: the systems
6
Matching comparison
  • Distinguishing elements
  • Different kinds of mappings representation
    (granularity, cardinality)
  • Mappings extraction (structure-instances
    analysis, lexical analysis, external tools
    exploitation)

7
Matching comparison
Extended from E. Rahm and P.A. Bernstein, A
survey of approaches to automatic schema
matching, VLDB Journal, 10(4):334-350, 2001
8
Matching comparison
9
The MOMIS System
  • MOMIS (Mediator envirOnment for Multiple
    Information Sources) is a framework to perform
    information extraction and integration from both
    structured and semistructured data sources.
  • An object-oriented language, with an underlying
    Description Logic, called ODL-I3, derived from
    the standard ODMG is introduced for information
    extraction. Information integration is then
    performed in a semi-automatic way, by exploiting
    the knowledge in a Common Thesaurus and ODL-I3
    descriptions of source schemas with a combination
    of clustering techniques and Description Logics.
    This integration process gives rise to a virtual
    integrated view of the underlying sources (the
    Global Virtual View) for which mapping rules and
    integrity constraints are specified to handle
    heterogeneity.
  • The MOMIS system, based on a conventional
    wrapper/mediator architecture, provides methods
    and open tools for data management in
    Internet-based information systems by using a
    CORBA-2 interface. MOMIS was developed as a joint
    collaboration between the University of Modena
    and Reggio Emilia and the Universities of Milano
    and Brescia.

10
The MOMIS System
Distributed information stored in multiple,
heterogeneous sources
  • Sources integration provides a Global Schema
    (which is a virtual view)
  • the Global Schema allows the user to send a query
    and get a unified answer from all the involved
    sources (transparently)
  • All information at http://www.dbgroup.unimo.it
  • Projects: INTERDATA (1999-2000) and D2I (From
    Data to Information) (2001-2002), research
    programmes of significant national interest
    (Programmi di ricerca scientifica di rilevante
    interesse nazionale); WINK (Web-linked
    Integration of Network-based Knowledge)
    (2002-2003); SEWASIE (Semantic Webs and AgentS
    in Integrated Economies) (2002-2005)

11
The MOMIS System - Architecture
12
MANUAL ANNOTATION
SEMI-AUTOMATIC ANNOTATION
13
Local sources annotation
  • The integration designer has to manually choose
    the appropriate WordNet
    (www.cogsci.princeton.edu/wn/) meaning for each
    element of the conceptual schema provided by
    wrappers.
  • Motivations of the annotation:
  • Exploiting semantics associated with the names of
    the schemas/structures of the information sources
  • Having a well-known meaning for each term of the
    sources
  • The annotation phase is composed of two steps:
  • Word Form choice. The WordNet morphologic
    processor aids the designer by suggesting a word
    form corresponding to the given term.
  • Meaning choice. The designer can map an element
    to zero, one or more senses, choosing among the
    senses already in WordNet or adding new senses
    to the database.

14
Global Virtual View annotation
  • The GVV has to be annotated to become exportable
    knowledge.
  • Annotating a GVV means providing Global Classes
    with a name and with meanings.
  • By starting from annotations of local sources and
    mappings between the GVV and the local
    ontologies, we have developed a semi-automatic
    methodology to generate the annotations of the
    GVV.

15
GVV annotation
Annotated local classes:
CS.Essay <essay, essay1>
CS.Publication <publication, publication2>
UNI.Article <article, article1>
The CT relationships:
UNI.Article NT CS.Publication
CS.Essay NT CS.Publication
Broadest meanings (the local classes of GC with no broader class in GC):
$BLC_{GC} = \{\, LC \in GC \mid \nexists\, y \in GC,\ (LC\ \mathrm{NT}\ y) \lor (y\ \mathrm{BT}\ LC) \,\}$
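
Selecting the broadest local classes of a global class, i.e. those with no broader class inside the same global class, can be sketched in Python as follows. This is only an illustration, not the MOMIS implementation: the function name `broadest_local_classes` and the representation of the Common Thesaurus as a set of (narrower, broader) pairs are assumptions, and BT is treated as the inverse of NT.

```python
def broadest_local_classes(global_class, nt_pairs):
    """Return the local classes that have no broader class in the same global class.

    global_class: set of local class names (hypothetical representation).
    nt_pairs: set of (narrower, broader) pairs, read as 'narrower NT broader';
              'broader BT narrower' is assumed to be the inverse relationship.
    """
    return {
        lc
        for lc in global_class
        if not any((lc, y) in nt_pairs for y in global_class)
    }


# Local classes and CT relationships taken from the slide.
gc = {"CS.Essay", "CS.Publication", "UNI.Article"}
ct = {("UNI.Article", "CS.Publication"), ("CS.Essay", "CS.Publication")}
broadest = broadest_local_classes(gc, ct)
print(broadest)
```

With the relationships listed on the slide, only CS.Publication survives, so its meanings would be the natural candidates for annotating the global class.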
16
Updating the GVV
  • A created GVV can change:
  • By adding a new source to the system
  • By updating an existing data source schema
  • By deleting a previously integrated source
  • Adding a new source: two possible scenarios
  • Integration from scratch: the integration process
    is applied again; in this case only the Common
    Thesaurus of the previous GVV can be exploited.
  • Integration with the GVV: the process exploits
    the automatically annotated GVV and the Common
    Thesaurus.

17
Adding a new source
Annotated GVV
  • Common Thesaurus:
  • intra/inter-schema relationships
    (only new sources)
  • lexicon relationships
    (GVV and new sources annotated)
  • relationships inserted by the user
  • inferred relationships

18
Adding a new source
  • Three scenarios:
  • A new global class is composed of only one old
    global class and one or more new local classes
  • A global class of the new integrated schema is
    composed of only new local classes
  • A global class of the new integrated schema is
    composed of more than one global class of the old
    GVV and at least one local class of the new source

19
GVV- integrated ontology
  • A GVV may be thought of as a domain ontology for
    the integrated sources. The usual approach in the
    Semantic Web is instead based on the a priori
    existence of an ontology connected to the sources
    by means of semantic markups.

MOMIS
Semantic Web
Ontology
Ontology Builder
20
GVV- integrated ontology
  • The MOMIS ontology is composed of the following
    components:
  • Global Virtual View
  • Mapping Rules
  • Integrity constraint rules
  • Intensional and extensional inter- and
    intra-schema relationships (Common Thesaurus)
  • We express the ontology by using the ODL-I3
    language or an OWL file.

21
Using the MOMIS system
  • The MOMIS system was exploited:
  • To create a virtual marketplace
  • To support collaborative processes within the
    European WINK project
  • To build an advanced semantic search engine
    within the European SEWASIE project (under
    development)

22
SEWASIE
  • SEWASIE (SEmantic Webs and AgentS in Integrated
    Economies) is a research project funded by the EU
    under the Semantic Web action line (May 2002 -
    April 2005)
  • The consortium:
  • Università degli Studi di Modena e Reggio Emilia
    (ITALY)
  • CNA SERVIZI Modena s.c.a.r.l. (ITALY)
  • Università degli Studi di Roma La Sapienza
    (ITALY)
  • Rheinisch Westfaelische Technische Hochschule
    Aachen (GERMANY)
  • Libera Università di Bolzano (ITALY)
  • Thinking Networks AG (GERMANY)
  • IBM Italia SPA (ITALY)
  • Fraunhofer-Gesellschaft Institut Angewandte
    Informationstechnik (GERMANY)

23
SEWASIE Objectives
The SEWASIE project aims to develop an advanced
search engine enabling intelligent access to
heterogeneous data sources on the web, via
semantic enrichment, to provide the basis for
structured web-based communication.
  • The SEWASIE project pursues the following aims:
  • To develop an agent-based secure, scalable and
    distributed system architecture for semantic
    search (based on ontologies) and for structured
    web-based communication.
  • To develop a general framework for query
    management and information reconciliation based
    on a semantically enriched data and trusted agent
    structure.
  • To develop an information brokering component
    which includes methods for collecting,
    contextualizing and visualizing data.
  • To provide the end-user with an efficient
    interface for formulating queries using a
    graphical representation and for intelligent
    navigation through the semantically enriched
    information space.

24
The SEWASIE architecture
  • The SEWASIE system realizes a virtual network,
    the SEWASIE Virtual Network (SVN), whose nodes
    are SEWASIE Information Nodes (SINodes),
    multi-database mediator-based systems, each
    including a Virtual Data Store, an Ontology
    Builder, and a Query Manager
  • Brokering Agents maintain the knowledge related
    to the SEWASIE Virtual Network and the user
    profiles.
  • In the query-solving phase, starting from a
    specified SINode, a Query Agent accesses other
    SINodes and thus collects partial answers.
  • To select the SINodes useful for solving a query,
    a Query Agent interacts with one or more
    Brokering Agents.
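
The interaction between a Query Agent, Brokering Agents and SINodes can be sketched as below. This is a toy model under stated assumptions, not the SEWASIE implementation: the classes `SINode` and `BrokeringAgent`, the function `solve_query`, the topic-based node selection and the sample data are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SINode:
    """Hypothetical stand-in for a SEWASIE Information Node."""
    name: str
    topics: set
    records: list = field(default_factory=list)

    def answer(self, topic):
        # A real SINode would delegate to its Query Manager.
        return [r for r in self.records if r["topic"] == topic]


@dataclass
class BrokeringAgent:
    """Hypothetical broker that knows which SINodes cover which topics."""
    known_nodes: list

    def select_nodes(self, topic):
        return [n for n in self.known_nodes if topic in n.topics]


def solve_query(topic, start_node, brokers):
    """Collect partial answers from the start node, then from broker-selected nodes."""
    visited, answers = [], []
    candidates = [start_node] + [n for b in brokers for n in b.select_nodes(topic)]
    for node in candidates:
        if node.name in visited:
            continue  # each SINode is asked at most once
        visited.append(node.name)
        answers.extend(node.answer(topic))
    return answers


n1 = SINode("textile", {"fabric"}, [{"topic": "fabric", "value": "silk"}])
n2 = SINode("mechanics", {"fabric", "gear"},
            [{"topic": "fabric", "value": "felt"}, {"topic": "gear", "value": "cog"}])
result = solve_query("fabric", n1, [BrokeringAgent([n1, n2])])
print([r["value"] for r in result])
```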

25
The SEWASIE architecture
[Figure: the SEWASIE architecture, organized into a
user interface layer, an intermediaries layer and an
information layer. Labelled components: User
Profile, Monitor Profiles, OLAP Tool and OLAP
Reports, Visualisation, Communication Tool,
Comm. Agent, Monitoring Agent (MA), the Query,
Monitoring, Comm. and Metadata Interfaces, Query
Results; BA nodes with Metadata, Repository and
Ontology labels; the SEWASIE interconnection
infrastructure; SINodes with Virtual Data Store,
Query Manager and Ontology Builder; Wrappers with
Semantic Enrichment over structured sources (RDBs,
databases), semi-structured sources (<XML>,
<DATA>...</DATA>) and unstructured sources (<HTML>,
text documents).]
26
Future Work
  • Ontology evolution within an SINode:
  • Update of existing sources
  • Deletion of previously integrated sources
  • Extending WordNet:
  • If a source description element has no
    corresponding concept in WordNet, the designer
    may add a new meaning and the proper
    relationships connecting it to existing meanings.
  • Multilingual functionalities:
  • SEWASIE multilingual technologies will allow
    users to share information and resources
    available all over the world, but also to
    preserve their original local qualities.
  • Enrichment of the multilingual lexicon ontology
    with the aid of statistical analysis techniques
    for multilingual text corpora (for example with
    techniques for the generation of multilingual
    dictionaries).

27
Participation in national and European research
projects
  • D2I project (From Data to Information), funded
    by MIUR, Programma di ricerca scientifica di
    rilevante interesse nazionale (2000-2001)
  • project "Agenti software e commercio
    elettronico: profili giuridici, tecnologici e
    psico-sociali" (software agents and e-commerce:
    legal, technological and psycho-social aspects),
    funded by MIUR, Programma di ricerca scientifica
    di rilevante interesse nazionale (2001-2002)
  • project "Tecnologie per arricchire e fornire
    accesso a contenuti" (technologies for enriching
    and providing access to content), funded by the
    Fondo Speciale Innovazione 2000 (2001-2002)
  • SEWASIE project (SEmantic Webs and AgentS in
    Integrated Economies), funded by the European
    Community (2002-2005)
  • WINK project (Web-linked Integration of
    Network-based Knowledge), funded by the European
    Community (EUTIST-AMI cluster) (2002-2003)

28
Publications
International Journals (RI) and Chapters in
International Books (CLI)
  • RI1 S. Bergamaschi, G. Cabri, F. Guerra, L.
    Leonardi, M. Vincini, F. Zambonelli, Exploiting
    Agents to Support Information Integration,
    Special Issue of the International Journal on
    Cooperative Information Systems, vol. 11(3-4):
    293-314, 2002, ISSN 0218-8430
  • RI2 I. Benetti, D. Beneventano, S. Bergamaschi,
    F. Guerra, M. Vincini, An Information Integration
    Framework for E-Commerce, IEEE Intelligent
    Systems Magazine, Jan/Feb 2002, pp. 18-25
  • RI3 D. Beneventano, S. Bergamaschi, F. Guerra,
    M. Vincini, Synthesizing an Integrated Ontology,
    IEEE Internet Computing, September-October 2003,
    pp. 42-51, ISSN 1089-7801
  • RI4 I. Benetti, S. Bergamaschi, F. Guerra, M.
    Vincini, Soap-enabled web services for knowledge
    management, to appear in Int. J. Web Engineering
    and Technology, InderScience Publishers
  • RI5 D. Beneventano, F. Guerra, S. Magnani, M.
    Vincini, A Web Service based framework for the
    semantic mapping between product classification
    schemas, to appear in Journal of Electronic
    Commerce Research, ISSN 1526-6133
  • CLI1 D. Beneventano, S. Bergamaschi, J. Gelati,
    F. Guerra, M. Vincini, MIKS: an agent framework
    supporting information access and integration,
    Intelligent Information Agents - The AgentLink
    Perspective (editors S. Bergamaschi, M. Klusch,
    P. Edwards, P. Petta), March 2003, Lecture Notes
    in Computer Science N. 2586, Springer Verlag,
    pp. 22-49, ISSN 0302-9743, ISBN 3-540-00759-8
National Journals (RN)
  • RN1 G. Gelati, F. Guerra, M. Vincini, Agents
    Supporting Information Integration: the MIKS
    Framework, AIIA Notizie, Periodico
    dell'Associazione Italiana per l'Intelligenza
    Artificiale, Anno XIV, N. 4, December 2001
29
Publications
International Conferences (CI)
  • CI1 D. Beneventano, S. Bergamaschi, I. Benetti,
    A. Corni, F. Guerra, G. Malvezzi, SI-Designer: a
    tool for intelligent integration of information,
    34th Annual Hawaii International Conference on
    System Sciences (HICSS-34), January 3-6, 2001,
    Maui, Hawaii - Track 9, IEEE Computer Society
  • CI2 D. Beneventano, S. Bergamaschi, F. Guerra,
    M. Vincini, The MOMIS approach to Information
    Integration, IEEE and AAAI International
    Conference on Enterprise Information Systems
    (ICEIS01), Setúbal, Portugal, 7-10 July 2001,
    pp. 194-198, ISBN 972-98050-2-4
  • CI3 I. Benetti, D. Beneventano, S. Bergamaschi,
    F. Guerra, M. Vincini, SI-Designer: an
    Integration Framework for E-Commerce, IJCAI01
    Workshop on E-Business and the Intelligent Web,
    Seattle, USA, August 5, 2001
  • CI4 S. Bergamaschi, G. Cabri, F. Guerra, L.
    Leonardi, M. Vincini, F. Zambonelli, Supporting
    information integration with autonomous agents,
    Fifth International Workshop CIA-2001 on
    Cooperative Information Agents, September 6-8,
    2001, Modena, Italy, pp. 88-99
  • CI5 D. Calvanese, S. Castano, F. Guerra, D.
    Lembo, M. Melchiori, G. Terracina, D. Ursino, M.
    Vincini, Towards a comprehensive methodological
    framework for integration, 8th International
    Workshop on Knowledge Representation meets
    Databases (KRDB-2001), Roma, Italy, 2001
  • CI6 S. Bergamaschi, F. Guerra, M. Vincini, A
    Data Integration Framework for E-commerce product
    classification, 1st International Semantic Web
    Conference (ISWC2002), Sardegna, Italy, 9-12 June
    2002, LNCS 2342, Springer 2002, pp. 379-393,
    ISBN 3-540-43760-6
30
Publications
  • CI7 S. Bergamaschi, F. Guerra, Peer to Peer
    Paradigm for a Semantic Search Engine, in
    proceedings of the International Workshop on
    Agents and Peer-to-Peer Computing, Bologna, 15
    July 2002, LNCS 2530, Springer ISBN
    3-540-40538-0
  • CI8 S. Bergamaschi, F. Guerra, M. Vincini,
    Product Classification Integration for
    E-Commerce, Second International Workshop on
    Electronic Business Hubs - WEBH 2002, in
    conjunction with DEXA 2002, September 2-6, 2002,
    Aix-en-Provence, France, published by IEEE
    Computer Society, Los Alamitos (CA), ISBN
    0-7695-1668-8, pp. 861-867
  • CI9 D. Beneventano, S. Bergamaschi, S.
    Castano, V. De Antonellis, A. Ferrara, F. Guerra,
    F. Mandreoli, G. Ornetti, M. Vincini, Semantic
    Integration and Query Optimization of
    Heterogeneous Data Sources, 1st Int'l Workshop on
    Efficient Web-based Information Systems (EWIS),
    2002, Montpellier, France, pp. 154-165.
  • CI10 S. Bergamaschi, F. Guerra, M. Vincini, A
    peer-to-peer information system for the semantic
    web, in proceedings of the International Workshop
    on Agents and Peer-to-Peer Computing, in AAMAS
    2003 Melbourne, Australia, July 14, 2003
  • CI11 D. Beneventano, S. Bergamaschi, F.
    Guerra, M. Vincini, Building an Ontology with
    MOMIS, in proceedings of the Semantic Integration
    Workshop within the Second International Semantic
    Web Conference, October 20, 2003, Sundial Resort,
    Sanibel Island, Florida, USA.
  • CI12 D. Beneventano, S. Bergamaschi, F.
    Guerra, M. Vincini, Building an integrated
    Ontology within SEWASIE system, in proceedings of
    the First International Workshop on Semantic Web
    and Databases, Co-located with VLDB 2003 Berlin,
    Germany, (2003)
  • CI13 S. Bergamaschi, G. Gelati, F. Guerra, M.
    Vincini, WINK: a Web-based Enterprise System for
    Collaborative Project Management in Virtual
    Enterprises, 4th International Conference on Web
    Information Systems Engineering, Roma, Italy,
    10-12 December 2003

31
Publications
National Conferences (CN)
  • CN1 D. Beneventano, S. Bergamaschi, F. Guerra,
    M. Vincini, Exploiting extensional knowledge for
    query reformulation and object fusion in a data
    integration system, Proceedings of SEBD2001,
    Venezia, 27-29 June 2001, pp. 257-271
  • CN2 G. Gelati, F. Guerra, M. Vincini, Agents
    Supporting Information Integration: the MIKS
    Framework, Proc. AIIA and TABOO Workshop From
    Object to Agents, Pitagora Editrice, Bologna,
    ISBN 88-371-1272-6, September 2001
  • CN3 D. Beneventano, S. Bergamaschi, D. Bianco,
    F. Guerra, M. Vincini, SI-Web: a Web based
    interface for the MOMIS project, Proceedings of
    SEBD2002, 19-22 June 2002, pp. 407-411
  • CN4 D. Beneventano, S. Bergamaschi, D.
    Gazzotti, G. Gelati, F. Guerra, M. Vincini, The
    WINK Project for Virtual Enterprise Networking
    and Integration, Proceedings of SEBD2002, 2002,
    pp. 283-290
  • CN5 D. Beneventano, S. Bergamaschi, M. Felice,
    D. Gazzotti, G. Gelati, F. Guerra, M. Vincini, An
    Agent framework for Supporting the MIKS
    Integration, Proc. AIIA and TABOO Workshop From
    Object to Agents, 18-19 November 2002, Milano,
    Università Bicocca
  • CN6 D. Beneventano, S. Bergamaschi, A.
    Fergnani, F. Guerra, M. Vincini, D. Montanari, A
    Peer-to-Peer Agent-Based Semantic Search Engine,
    Proceedings of SEBD2003, Cetraro (CS), 2003,
    pp. 283-290
  • CN7 S. Bergamaschi, G. Gelati, F. Guerra, M.
    Vincini, Experiencing AUML for the WINK
    Multi-Agent System, Proc. AIIA and TABOO Workshop
    From Object to Agents, 10-11 September 2003,
    Villasimius (CA)
32
Global Instance Computation
  • For the definition of a Global Class we have to
    define the following elements:
  • Mapping Table: defines the mapping between the
    global class attributes and the local class
    attributes
  • Join condition: we assume that there is a Join
    Condition between each pair of overlapping
    relations to identify tuples corresponding to the
    same object and fuse them
  • Full disjunction: the GC contains a unique tuple
    resulting from the merge of all the different
    tuples representing the same real-world object.
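
For two overlapping local classes the full disjunction reduces to a full outer join followed by tuple fusion, which can be sketched as below. This is an illustration under assumptions, not the MOMIS algorithm: the function name `full_disjunction`, the dictionary representation of tuples, the sample data, and the premise that the join attribute identifies an object within each local class are all hypothetical. Conflicting values are not resolved here; the first non-null value is kept.

```python
def full_disjunction(left, right, join_attr):
    """Fuse tuples of two local classes that agree on the join attribute.

    Tuples are dicts already expressed over global attribute names.
    Attributes missing on one side stay absent (treated as null).
    """
    fused = {}
    for row in list(left) + list(right):
        merged = fused.setdefault(row[join_attr], {})
        for attr, value in row.items():
            # Keep the first non-null value seen for each attribute.
            if merged.get(attr) is None:
                merged[attr] = value
    return list(fused.values())


l1 = [{"name": "Ada Rossi", "e_mail": "ada@example.org", "year": 2}]
l2 = [
    {"name": "Ada Rossi", "e_mail": "ada@example.org", "dept_code": "D1"},
    {"name": "Bo Verdi", "e_mail": "bo@example.org", "dept_code": "D2"},
]
global_instance = full_disjunction(l1, l2, "e_mail")
for row in global_instance:
    print(row)
```

The two tuples describing the same person are merged into one, while the tuple present in only one source is kept with its missing attributes left null.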

33
Global Instance Computation
  • S(l1) = (firstn, lastn, year, e_mail)
  • S(l2) = (name, e_mail, dept_code, s_code)
  • Two functions:
  • Global function: renames the attributes of the
    local classes into attributes of the global class
  • Local function: converts a tuple of elements of
    a local class by suitable functions such as
    string concatenation.
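
A minimal sketch of the two kinds of mapping function for the schemas S(l1) and S(l2). The global attribute names (`name`, `section_code`) and the function names are assumptions chosen for illustration; only the local attribute names come from the slide.

```python
def to_global_l1(row):
    """Map a tuple of l1 (firstn, lastn, year, e_mail) to global attributes."""
    return {
        # Local function: string concatenation of two local attributes.
        "name": f"{row['firstn']} {row['lastn']}",
        # Global function: renaming (here the names happen to coincide).
        "year": row["year"],
        "e_mail": row["e_mail"],
    }


def to_global_l2(row):
    """Map a tuple of l2 (name, e_mail, dept_code, s_code) to global attributes."""
    return {
        "name": row["name"],
        "e_mail": row["e_mail"],
        "dept_code": row["dept_code"],
        "section_code": row["s_code"],  # renaming s_code (assumed global name)
    }


g1 = to_global_l1({"firstn": "Ada", "lastn": "Rossi", "year": 2, "e_mail": "ada@example.org"})
g2 = to_global_l2({"name": "Ada Rossi", "e_mail": "ada@example.org", "dept_code": "D1", "s_code": "S7"})
print(g1)
print(g2)
```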

34
Global Instance Computation
  • Semantic Homogeneity property condition

[Figure labels: Join Attribute, Full Disjunction]
35
Global Instance Computation
  • Semantic Homogeneity property condition not
    verified
  • Resolution functions:
  • Random
  • Priority
  • User-defined function
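
The three kinds of resolution function can be illustrated as follows, for the case where two sources give different values for the same global attribute. The function names, the `{source: value}` representation and the sample values are assumptions made for this sketch.

```python
import random


def resolve_random(candidates, rng=None):
    """Random: pick the value of an arbitrary source."""
    rng = rng or random.Random()
    return rng.choice(list(candidates.values()))


def resolve_priority(candidates, priority):
    """Priority: take the value of the first listed source that has one."""
    for source in priority:
        if candidates.get(source) is not None:
            return candidates[source]
    return None


def resolve_user_defined(candidates, func):
    """User-defined: delegate to a function supplied by the designer."""
    return func(list(candidates.values()))


# Conflicting values of the global attribute 'year' from two local classes.
year = {"l1": 2, "l2": 3}
by_priority = resolve_priority(year, ["l2", "l1"])
by_user = resolve_user_defined(year, max)
by_random = resolve_random(year, random.Random(0))
print(by_priority, by_user, by_random)
```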

36
Clusters generation
  • Clustering Algorithm (hierarchical clustering
    techniques - Everitt)
  • Compute all pair-wise Global Affinity
    coefficients GA(c_ji, c_hk) and store them in a
    matrix M.
  • Place each class into a separate cluster
  • Repeat while the rank of M is greater than 1:
  • Select the pair c_ji, c_hk of current clusters
    with the highest GA in M: M(h,k) = max_{i,j} M(i,j)
  • Form a new cluster by combining c_h and c_k
  • Update M by deleting the rows and columns
    corresponding to c_h and c_k
  • Define a new row and column (hk) for the new
    cluster c_h,k.
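
The loop can be sketched in Python as below. The slide does not say how the new row and column are filled in, so this sketch assumes the affinity of a merged cluster to another cluster is the maximum of its members' affinities (single linkage); the function name `cluster`, the dictionary standing in for M, and the affinity values are likewise hypothetical.

```python
def cluster(classes, ga):
    """Agglomerative clustering over Global Affinity coefficients.

    classes: list of distinct class names.
    ga: dict mapping frozenset({a, b}) to the affinity of classes a and b.
    Returns the merges in the order performed, as (cluster_a, cluster_b, affinity).
    """
    # Each class starts in its own cluster; m plays the role of the matrix M.
    clusters = [frozenset([c]) for c in classes]
    m = {}
    for i, a in enumerate(clusters):
        for b in clusters[i + 1:]:
            (x,), (y,) = a, b
            m[frozenset([a, b])] = ga.get(frozenset([x, y]), 0.0)
    merges = []
    while len(clusters) > 1:
        # Select the pair of current clusters with the highest affinity.
        pair = max(m, key=m.get)
        a, b = pair
        merged = a | b
        merges.append((set(a), set(b), m[pair]))
        others = [c for c in clusters if c != a and c != b]
        # New row/column for the merged cluster (assumption: single linkage).
        new_row = {
            frozenset([merged, c]): max(m[frozenset([a, c])], m[frozenset([b, c])])
            for c in others
        }
        # Delete the rows and columns of the two merged clusters.
        m = {k: v for k, v in m.items() if a not in k and b not in k}
        m.update(new_row)
        clusters = others + [merged]
    return merges


names = ["UNI.School_Member", "CS.Student", "UNI.Department"]
affinity = {
    frozenset(["UNI.School_Member", "CS.Student"]): 0.8,
    frozenset(["CS.Student", "UNI.Department"]): 0.2,
    frozenset(["UNI.School_Member", "UNI.Department"]): 0.1,
}
merges = cluster(names, affinity)
for step in merges:
    print(step)
```

Read in order, the returned merges describe an affinity tree: the most affine classes are joined first, and weaker links appear closer to the root.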

37
Affinity evaluation
Two names have Affinity when connected through a
path in the Thesaurus
The Structural Affinity of two classes measures
the affinity of their attributes (Dice's
function)
The Global Affinity of two classes is computed
as the weighted sum of the Name Affinity and the
Structural Affinity
Extensional evaluation
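
A simplified sketch of the structural and global affinity computation. Attribute affinity is reduced here to equality of attribute names, whereas the slides base it on thesaurus relationships; the function names, the equal weights and the name affinity value 0.8 are assumptions, and only the attribute lists come from the example sources.

```python
def dice(attrs_a, attrs_b):
    """Dice's function: 2 * |A intersect B| / (|A| + |B|)."""
    a, b = set(attrs_a), set(attrs_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def global_affinity(name_aff, struct_aff, w_name=0.5, w_struct=0.5):
    """Weighted sum of Name Affinity and Structural Affinity (weights assumed)."""
    return w_name * name_aff + w_struct * struct_aff


school_member = ["name", "school", "year", "e_mail"]
student = ["year", "takes", "rank", "e_mail"]
sa = dice(school_member, student)
ga = global_affinity(0.8, sa)  # 0.8 is a made-up name affinity
print(sa, ga)
```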
38
ODB-Tools
Schema Validator: automatically builds the
class taxonomy and preserves coherence
with respect to the inheritance and
aggregation hierarchies.
Query Optimizer: executes the semantic expansion
of the query.
Available at
http://www.dbgroup.unimo.it/ODB-Tools.html
39
Example
University source (relational)
Department(dept_code, dept_name, budget)
Research_Staff(name, e_mail, dept_code, s_code)
  FK dept_code REF Department, s_code REF Section
School_Member(name, school, year, e_mail)
Section(s_code, section_name, length, room_code)
  FK room_code REF Department, s_code REF Room
Room(room_code, seats_number, notes)
Tax_Position source (XML)
<!ELEMENT ListOfStudent (Student)>
<!ELEMENT Student (name,s_code,school_name,e_mail,tax_fee)>
<!ELEMENT name (#PCDATA)>
40
Example
Computer_Science source (object)
CS_Person(first_name, last_name)
Professor:CS_Person(belongs_to:Division, rank)
Student:CS_Person(year, takes:set<Course>, rank, e_mail)
Division(description, address:Location)
Location(city, street, number, country)
Course(course_name, taught_by:Professor)
41
Source Acquisition Module
42
Common Thesaurus (Domain Ontology)
Set of terminological relationships between
class and attribute names (terms); expresses
both intra-schema and inter-schema knowledge.
Relationships added to the Common Thesaurus:
(1) schema-derived
(2) lexicon-derived
(3) designer-supplied
(4) inferred, exploiting ODB-Tools capabilities
43
Schema-derived relationships
  • Terminological and extensional intra-schema
    relationships
  • RT relationships derived from foreign keys in
    a relational schema, e.g. UNI.Section RT
    UNI.Department
  • BT/NT relationships derived from inheritance
    relationships in an object-oriented schema or
    integrity constraints in a relational schema,
    e.g. CS.Student NT CS.CS_Person, CS.Professor NT
    CS.CS_Person
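
Deriving these relationships from schema metadata can be sketched as follows. The function name `schema_derived` and the pair-based input format are assumptions; the sketch covers foreign keys and inheritance only, not integrity constraints.

```python
def schema_derived(source, foreign_keys, inheritance):
    """Derive thesaurus relationships as (term, relation, term) triples.

    foreign_keys: (referencing_class, referenced_class) pairs, giving RT.
    inheritance:  (subclass, superclass) pairs, giving NT and the inverse BT.
    """
    rels = []
    for child, parent in foreign_keys:
        rels.append((f"{source}.{child}", "RT", f"{source}.{parent}"))
    for sub, sup in inheritance:
        rels.append((f"{source}.{sub}", "NT", f"{source}.{sup}"))
        rels.append((f"{source}.{sup}", "BT", f"{source}.{sub}"))
    return rels


uni = schema_derived("UNI", [("Section", "Department")], [])
cs = schema_derived("CS", [], [("Student", "CS_Person"), ("Professor", "CS_Person")])
for rel in uni + cs:
    print(*rel)
```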

44
Schema Derived Relationships
45
Lexicon-derived relationships
Extracted from the WordNet lexical database
(Princeton University): 129625 lemmas organized in
99759 synonym sets (synsets)
Synonymy, Polysemy
Tax_position_xml.Student.name SYN
University.School_member.name
CS.Professor NT CS.CS_Person
46
Lexicon Derived Relationships
47
Lexicon Derived Relationships
48
Lexicon Derived Relationships
49
Lexicon Derived Relationships
50
Lexicon Derived Relationships
51
Inferred relationships
Exploiting Description Logics techniques
(ODB-Tools system), a new set of terminological
relationships is inferred, e.g.
University.Research_Staff RT CS.Course
52
Common Thesaurus
53
Mediator global schema
Global schema generation (interaction with the
ARTEMIS module):
  • Affinity calculation
  • Cluster generation
  • Global attributes and mapping table generation
  • A global class gc_i is generated for each cluster
    Cl_i
  • SI-Designer builds the attribute set to be
    associated with the cluster:
  • Union of the attributes of all classes belonging
    to the cluster
  • Fusion of similar attributes
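
The union-and-fusion step can be sketched as below. It assumes that similar attributes are identified through a synonym table (for instance built from SYN relationships in the Common Thesaurus); the function name `global_attributes`, the `synonyms` table and the choice of cluster members are hypothetical, not SI-Designer's actual procedure.

```python
def global_attributes(cluster_classes, synonyms):
    """Union the attributes of all classes in a cluster, fusing similar ones.

    cluster_classes: dict class_name -> list of attribute names.
    synonyms: dict attribute_name -> representative global attribute name.
    Returns dict global_attribute -> list of (class, local_attribute) pairs,
    which is also a rudimentary mapping table.
    """
    mapping = {}
    for cls, attrs in cluster_classes.items():
        for attr in attrs:
            rep = synonyms.get(attr, attr)
            mapping.setdefault(rep, []).append((cls, attr))
    return mapping


cluster_classes = {
    "UNI.School_Member": ["name", "school", "year", "e_mail"],
    "Tax_Position.Student": ["name", "s_code", "school_name", "e_mail", "tax_fee"],
}
table = global_attributes(cluster_classes, {"school_name": "school"})
for global_attr, local_attrs in table.items():
    print(global_attr, local_attrs)
```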

54
Affinity tree and Cluster
55
Affinity tree and Cluster
56
Affinity tree and Cluster
57
Mapping table example
  • each global class includes mapping rules between
    global and local attributes (and/or
    relationships, default/null values)
  • a mapping is generated for each global class gc_i

58
Mapping table
59
Mapping table