Data integration - PowerPoint PPT Presentation

About This Presentation
Title:

Data integration

Description:

Given: Several sources and a query ... sources; Need to have a global picture. ... LaV approach a particular set of queries can be considered a priori and their ... – PowerPoint PPT presentation

Number of Views:12
Avg rating:3.0/5.0
Slides: 8
Provided by: publi5
Category:

less

Transcript and Presenter's Notes

Title: Data integration


1
Data integration
  • Chitta Baral
  • Arizona State University

2
Example 1
  • Data Source 1
  • List of courses with the title Database
    Systems taught anywhere, their instructors, and
    university names.
  • View R1(prof, course, university)
  • Data Source 2
  • List of Ph.D level courses taught at ASU,
    professors name and course
  • View R2(title,prof,course)
  • Query List the courses of courses taught at
    ASU, and the professor names who teach the
    course.
  • Partial answer obtained by using the following
    query
  • SELECT course, prof
  • FROM R1 WHERE university ASU
  • UNION
  • SELECT course, prof
  • FROM R2

3
General question
  • Given Several sources and a query
  • Problem How do we best answer this query using
    the several sources that are available?
  • First Step Need to model the sources Need to
    have a global picture.
  • Two basic approaches Global as view (GaV) and
    Local as view (LaV).
  • LaV for the last example.
  • Global schema Teaches(prof,course,title,semester
    ,university)
  • Create view R1 as
  • SELECT prof, course, university FROM Teaches
  • WHERE title Database systems .
  • Create view R2 as
  • SELECT title, prof, course
    FROM Teaches
  • WHERE univesrity ASU and course gt 500
  • Now given a query (in English), we need to
    express it in terms of the global schema, and
    then reformulate it (to the extent possible) in
    terms of the sources (R1 and R2 here).
  • To do that use the relation between R1 (and R2)
    and the global schema.

4
A Global as View (GaV) example
  • 3 movie sources
  • S1(title,dir,year,genre) from until 1980.
  • S2(title,dir) since 1970
  • S3(title, year, genre) all movies
  • A global view S1 union (S2 join S3)
  • SELECT FROM S1
  • UNION
  • SELECT S2.title, S2.dir, S3.year, S3.genre
  • FROM S2, S3 WHERE S2.title S3.title
  • Another global view union of S1, (S2 join S3)
    and 4-tuples made up of tuples in S2 and S3
    (where the title does not appear in the other)
    with added null values.
  • If we have S2(xyz, uvw) and xyz is not a title
    that appears in S3 then we assume
    (xyz,uvw,null,null) is part of the global view.

5
LaV vs GaV
  • Given a query reformulating it in terms of the
    sources
  • Is easier in GAV (just needs unfolding of the
    query)
  • Is harder in LaV
  • Adding a new source
  • Supposedly easier in LaV (just need to express
    the new source as a view of the global schema)
  • Harder in GaV (as the global schema needs to be
    revised)

6
Steps for Projects of type 2
  • Given Some NIH/NCBI/Others data sources
  • Goal Virtual integration of these sources
  • First Step Explore each data source to figure
    out the view of each source.
  • Second Step Come up with a global schema (dont
    think too much about the sources or keep the
    global schema general enough that if GaV is used
    then adding new sources does not change the
    global schema)
  • Third Step (GaV based approach)
  • Define the global schema in terms of the source
    views.
  • Now any global query can be unfolded to a query
    (can be done in real time) in terms of the source
    views.
  • Alternative third step (LaV approach)
  • Define each source in terms of the global schema.
  • Now any global query needs to be reformulated
    in terms of the source views.
  • Several reformulation techniques are available.
  • For LaV approach a particular set of queries can
    be considered a priori and their reformulation
    could be made before hand (rather than in real
    time)
  • Havasu and Biohavasu follow this approach.

7
HW 2 Review Sample answers
  • Correct understanding about the queries
  • List all human genes, name of their discoverer
    and the project through which it was discovered
  • In this I tried to find out the human genes that
    are completely discovered.
  • Querying the genome database for human genome did
    this.
  • But the links field did not give links to the
    Pubmed articles relevant to that specific gene.
  • Had to type in another query with the specific
    gene information for Pubmed.
  • Misunderstanding about what queries are
  • How are the DNA probes identified in a DNA chip?
  • What are the four nucleotide bases in a DNA ?
Write a Comment
User Comments (0)
About PowerShow.com