CSE 636 Data Integration - PowerPoint PPT Presentation

Slides: 14
Provided by: michailpe
Learn more at: https://cse.buffalo.edu
Transcript and Presenter's Notes

Title: CSE 636 Data Integration


1
CSE 636 Data Integration
  • Introduction

2
Staff
  • Instructor: Dr. Michalis Petropoulos
  • Email: mpetropo_at_cse.buffalo.edu
  • Location: 210 Bell Hall
  • Office Hours: Wednesday & Friday, 1:00-2:00pm
  • By Appointment
  • Web Page:
  • http://www.cse.buffalo.edu/mpetropo/CSE636-FA08/
  • Newsgroup:
  • sunyab.cse.636

3
Course Goals
  • Data integration applications and architectures
  • Issues in building such applications
  • Really big and currently active research area
  • Solutions to several of them
  • Provide foundation for
  • understanding current research problems
  • criticizing proposed solutions
  • proposing your own solution!
  • Acquire valuable experience by implementing the project

4
Prerequisites
  • An introductory database course
  • CSE 520, CSE 562 or equivalent
  • Data structures and algorithms
  • Knowledge Representation
  • Distributed systems
  • Complexity theory
  • Mathematical Logic
  • Curiosity!
  • You should ask a lot of questions
  • Have a lot of fun!

5
Relevant Material
  • Textbooks
  • Database Systems: The Complete Book
  • by Garcia-Molina, Ullman and Widom
  • Database Management Systems
  • by Ramakrishnan
  • Fundamentals of Database Systems
  • by Elmasri and Navathe
  • Foundations of Databases
  • by Abiteboul, Hull and Vianu
  • Data on the Web
  • by Abiteboul, Buneman and Suciu

6
Course Format
  • Assignments: 15%
  • Three assignments will be given, 5% each
  • Final: 20% (take home)
  • Projects: 60%
  • Detailed specs will be given
  • Can be used to satisfy the M.S. project requirement
  • Participation: 5%

7
What is Data Integration?
  • The problem of providing
  • uniform (sources transparent to users)
  • access to (query)
  • multiple (even 2 is a problem)
  • autonomous (integration must not affect the behavior of the sources)
  • heterogeneous (different data models, schemas)
  • structured (at least semistructured)
  • data sources (not only databases)
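The definition above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the two sources, their schemas, and every function name below are hypothetical, and a real mediator would also handle query rewriting and limited source capabilities. One source is a relational database, the other is a semistructured feed; each gets a wrapper that translates it into a shared mediated schema, and the user queries only that schema.

```python
import sqlite3

# Hypothetical source 1: a relational database with its own schema.
def make_publisher_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (isbn TEXT, name TEXT, list_price REAL)")
    conn.execute("INSERT INTO titles VALUES ('111', 'Data on the Web', 50.0)")
    conn.commit()
    return conn

# Hypothetical source 2: a non-database source with semistructured records.
REVIEW_FEED = [
    {"book": {"id": "111"}, "stars": 4},
    {"book": {"id": "222"}, "stars": 5},
]

def query_books(conn):
    """Wrapper: map the publisher's schema onto the mediated book schema."""
    rows = conn.execute("SELECT isbn, name, list_price FROM titles")
    return [{"isbn": isbn, "title": name, "price": price}
            for isbn, name, price in rows]

def query_reviews(feed):
    """Wrapper: flatten the nested feed into the mediated review schema."""
    return [{"isbn": r["book"]["id"], "rating": r["stars"]} for r in feed]

def books_with_ratings(conn, feed):
    """Mediated query: one uniform answer; the sources stay untouched."""
    ratings = {}
    for review in query_reviews(feed):
        ratings.setdefault(review["isbn"], []).append(review["rating"])
    return [dict(book, ratings=ratings.get(book["isbn"], []))
            for book in query_books(conn)]

result = books_with_ratings(make_publisher_db(), REVIEW_FEED)
print(result)
```

The wrappers only read from the sources, which is one way to respect autonomy: neither the database nor the feed has to change to take part in the integration.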

8
The Data Integration Problem
[Figure: MyBookstore.com mediated schema (Books, Inventory, Orders, Shipping, Reviews) over autonomous sources: publishers (Morgan Kaufmann, Addison Wesley, Prentice Hall), inventory (East, West), Orders, shipping (FedEx, UPS), and reviews (Customer Reviews, NY Times), each reached as a database (DB), web site (Site), or web service (WS)]
Uniform query capability across autonomous, heterogeneous data sources on the Internet

9
Motivation
  • Enterprise data integration
  • Web site construction
  • WWW
  • Comparison shopping
  • Portals integrating data from multiple sources
  • B2B, electronic marketplaces
  • Sciences
  • Geology: integrating geological data across the US continent (text as well as spatial data)
  • Biology: integrating genomic data

10
Current Solutions
  • Mostly ad-hoc programming
  • Create a special solution for every case
  • Pay consultants a lot of money
  • Data Warehousing (Data Exchange)
  • Load all the data periodically into a warehouse
  • Separates operational DBMS from decision support DBMS (not only a solution to data integration)
  • Performance is good
  • Data may not be fresh
  • Need to clean data
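The freshness trade-off of warehousing can be seen in a few lines. This is a minimal sketch, not a real warehouse: the `Warehouse` class, the inventory data, and the cleaning rule are all assumptions made for illustration. Queries hit a local snapshot (fast), but an update made at the source after a load stays invisible until the next periodic load.

```python
# Hypothetical operational source; its contents change over time.
source_inventory = {" 111 ": 10}

class Warehouse:
    """Keeps a cleaned copy of the source data, refreshed only on load()."""

    def __init__(self):
        self.snapshot = {}

    def load(self, source):
        # Periodic bulk load with a simple cleaning step:
        # trim whitespace in keys and clamp negative quantities to zero.
        self.snapshot = {isbn.strip(): max(qty, 0)
                         for isbn, qty in source.items()}

    def stock(self, isbn):
        # Answered from the local snapshot, never from the source itself.
        return self.snapshot.get(isbn)

warehouse = Warehouse()
warehouse.load(source_inventory)

source_inventory[" 111 "] = 7      # the source changes after the load
stale = warehouse.stock("111")     # still the old value from the snapshot

warehouse.load(source_inventory)   # next periodic load
fresh = warehouse.stock("111")     # now reflects the update

print(stale, fresh)
```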

11
Course Outline (Tentative)
  • Data Integration Scenarios & Architectures
  • Find out what the problems are
  • Data Models & Type Systems
  • XML/Semistructured Data, DTDs, XML Schema
  • Query & Transformation Languages
  • Datalog, XPath, XQuery, XSLT
  • Data Integration Approaches
  • Different approaches depending on application characteristics
  • Schema Integration
  • Schema Mapping/Matching
  • Semi-automate the discovery of schema mappings
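As a rough illustration of what semi-automating the discovery of schema mappings can mean, the sketch below proposes candidate mappings from source columns to mediated-schema attributes using plain name similarity. It is an assumption-laden toy, not a technique taken from the course: the column names and the 0.5 threshold are made up, and real matchers also use data types, instance values, and structure. The output is a set of suggestions for a person to confirm, which is the "semi" part.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """String similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def propose_mappings(source_columns, mediated_attributes, threshold=0.5):
    """For each source column, suggest the most similar mediated attribute.

    Columns whose best score falls below the threshold map to None,
    meaning a human has to decide.
    """
    proposals = {}
    for column in source_columns:
        best = max(mediated_attributes, key=lambda attr: similarity(column, attr))
        proposals[column] = best if similarity(column, best) >= threshold else None
    return proposals

# Hypothetical source columns and mediated-schema attributes.
source_columns = ["isbn", "book_title", "list_price", "warehouse_code"]
mediated_attributes = ["isbn", "title", "price"]

mapping = propose_mappings(source_columns, mediated_attributes)
print(mapping)
```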

12
Course Outline (cont.)
  • Distributed Query Processing Algorithms
  • Query Rewriting Algorithms
  • Limited Query Capabilities
  • We don't have full access to any database
  • Consistent Query Answers
  • Web Services
  • What can they do for data integration?
  • Semantic Web
  • RDF, SPARQL
  • Workflow Languages
  • How is this related to data integration?

13
References
  • Data Integration: A Status Report
  • Alon Halevy
  • German Database Conference (BTW), 2003
  • Invited Talk
  • Lecture Slides
  • Alon Halevy
  • http://www.cs.washington.edu/education/courses/cse544/00sp/lectures/ps/l12.ps