Integrating Structured - PowerPoint PPT Presentation

About This Presentation
Title:

Integrating Structured & Unstructured Data

Description:

Query language & data model. Sharp vs fuzzy / complete vs best-effort ... Feature extraction from unstructured data. Role of meta data & integrity constraints ... – PowerPoint PPT presentation

Slides: 12
Provided by: RakeshA5

Transcript and Presenter's Notes



1
Integrating Structured & Unstructured Data
2
Goals
  • Identify some applications that have a crucial
    requirement for integrating unstructured and
    structured data
  • Identify key technical issues in integrating
    unstructured and structured data
  • Identify potential approaches

3
Definitions (simplified)
  • Structured object
    • <oid, <name, value>>
  • Unstructured object
    • <oid, word>
    • <oid, unknown/complex structure>
  • Semi-structured object
    • <oid, <name, value>, word>
    • <name, value> pairs may be
      • Given (e.g. author, title, etc.)
      • Extracted (e.g. Date, Zipcode, etc.)
      • Inferred (e.g. Topic)
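As a rough illustration of these definitions, the sketch below models the three object kinds as Python dataclasses and turns an unstructured object into a semi-structured one by attaching given, extracted and inferred <name, value> pairs. The extractors (an ISO-date regex, a US ZIP-code regex) and the keyword-to-topic table are toy assumptions standing in for real feature extraction; all class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class StructuredObject:
    """<oid, <name, value>>: every field is a named value."""
    oid: int
    attrs: Dict[str, str]


@dataclass
class UnstructuredObject:
    """<oid, word>: only a sequence of words."""
    oid: int
    words: List[str]


@dataclass
class SemiStructuredObject:
    """<oid, <name, value>, word>: words plus name/value pairs, each pair
    tagged with its provenance ("given", "extracted" or "inferred")."""
    oid: int
    words: List[str]
    attrs: Dict[str, Tuple[str, str]] = field(default_factory=dict)


# Toy extractors (assumption): ISO dates and US ZIP codes.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")
# Toy topic "inference" (assumption): a keyword-to-topic lookup table.
TOPIC_KEYWORDS = {"privacy": "privacy", "xquery": "query languages"}


def infer_topic(words: List[str]) -> Optional[str]:
    """Return the topic of the first word found in TOPIC_KEYWORDS, if any."""
    for w in words:
        topic = TOPIC_KEYWORDS.get(w.lower())
        if topic:
            return topic
    return None


def to_semi_structured(obj: UnstructuredObject,
                       given: Dict[str, str]) -> SemiStructuredObject:
    semi = SemiStructuredObject(obj.oid, list(obj.words))
    for name, value in given.items():           # pairs supplied with the object
        semi.attrs[name] = (value, "given")
    text = " ".join(obj.words)
    date = DATE_RE.search(text)                 # pairs pulled out of the text
    if date:
        semi.attrs["date"] = (date.group(), "extracted")
    zipcode = ZIP_RE.search(text)
    if zipcode:
        semi.attrs["zipcode"] = (zipcode.group(), "extracted")
    topic = infer_topic(obj.words)              # pairs guessed from the content
    if topic:
        semi.attrs["topic"] = (topic, "inferred")
    return semi
```

Keeping the provenance tag on each pair matters later: extracted and inferred values are uncertain, so a query processor may want to treat them as fuzzy rather than sharp.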

4
Representative Applications
  • BPI messages (unstructured)
  • Web applications (unstructured pages)
  • Corporate portals
  • DSS involving a combination of simulation with a
    database system
  • News syndication (author etc. & story)
  • Call centers (customer interaction & structured
    component of complaint)
  • Mail systems/document systems
  • Tourist information systems
  • Product catalogs/engineering spec sheets
  • Patents/chemistry documents
  • Matching legal documents (with cross-citations)
    with building codes (representative)

5
Key Technical Issues
  • Query language & data model
    • Sharp vs fuzzy / complete vs best-effort
    • Boolean vs similarity queries (relationship to
      value)
  • Integration strategies
    • Loose vs tight coupling architectures (many
      possibilities)
    • Search engine into DBMS or DBMS into search
      engine
    • Late & early binding (warehousing vs virtual)
    • Integration vs articulation (union vs
      intersection)
  • Feature extraction from unstructured data
  • Role of meta data & integrity constraints
    • Inconsistency of data sources
    • Priority rules for mediation
  • Management & data organization issues
    • Version management, freshness, security
    • Continuous queries over streams
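To make the "sharp vs fuzzy / complete vs best-effort" and "Boolean vs similarity" distinctions concrete, here is a minimal sketch over a toy in-memory collection. The Boolean query returns every document containing all terms (complete, unranked); the similarity query returns only the top-k documents ranked by Jaccard similarity (best-effort, ranked). Jaccard is just one assumed choice of similarity measure, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def boolean_query(docs: Dict[int, str], terms: Set[str]) -> List[int]:
    """Sharp/complete: all docs containing every term, unranked (sorted by id)."""
    return sorted(d for d, text in docs.items() if terms <= tokens(text))


def similarity_query(docs: Dict[int, str], terms: Set[str],
                     k: int) -> List[Tuple[int, float]]:
    """Fuzzy/best-effort: top-k docs by Jaccard similarity to the term set."""
    scored = []
    for d, text in docs.items():
        toks = tokens(text)
        union = terms | toks
        if not union:
            continue
        score = len(terms & toks) / len(union)
        if score > 0:
            scored.append((d, score))
    scored.sort(key=lambda ds: (-ds[1], ds[0]))  # best score first, then by id
    return scored[:k]


docs = {
    1: "privacy preserving data mining",
    2: "data mining association rules",
    3: "query optimization in relational databases",
}
terms = {"privacy", "data", "mining"}
print(boolean_query(docs, terms))        # [1]
print(similarity_query(docs, terms, 2))  # [(1, 0.75), (2, 0.4)]
```

Document 2 is invisible to the Boolean query but still surfaces as a partial match under similarity, which is exactly the trade-off a query language over mixed data has to expose.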

6
  • Structured: People(firstname, lastname, company,
    location)
  • Semi-structured: Papers(title, authors, text)
  • Unstructured: Reviews
  • Q1: Reviews of papers by Almaden authors on II
    • Search reviews using Join(People.<fn, ln>,
      Papers.authors).keywords
  • Q2: Folks in Almaden and Watson working on the same
    topic
    • Join on Papers.text, followed by a join with names
      in People
  • Q3: Papers on privacy data mining by Agarwal in
    Watson
    • Combine ranks of results from People and Papers
  • Q4: Almaden authors whose papers had negative
    reviews
    • Infer sentiment of a review, plus interesting joins
  • Q5: Current research topics in Almaden
    • Join People and Papers, followed by clustering
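The Q1 plan (join People with Papers.authors on full name, then use the resulting papers' keywords to search the unstructured Reviews) can be sketched in memory as follows. The sample rows are invented for illustration; "keywords" is assumed to mean the paper title, a review is assumed to match if it mentions that title, and "II" is read as information integration, approximated by the word "integration" in the title. All of these are assumptions, not the slide's definitions.

```python
from typing import Dict, List

# Hypothetical sample data for the three sources on the slide.
people: List[Dict[str, str]] = [
    {"firstname": "Ann", "lastname": "Lee", "company": "IBM", "location": "Almaden"},
    {"firstname": "Bob", "lastname": "Ray", "company": "IBM", "location": "Watson"},
]
papers = [
    {"title": "Mediators for Information Integration", "authors": ["Ann Lee"], "text": "..."},
    {"title": "Stream Joins", "authors": ["Bob Ray"], "text": "..."},
]
reviews: List[str] = [
    "Mediators for Information Integration is well motivated but lacks experiments.",
    "Stream Joins: solid results.",
    "An unrelated review.",
]


def q1(people, papers, reviews, location="Almaden", topic="integration"):
    # Structured selection: full names <fn, ln> of people at the location.
    names = {f"{p['firstname']} {p['lastname']}"
             for p in people if p["location"] == location}
    # Structured-to-semi-structured join on Papers.authors, filtered by topic
    # (assumption: "II" ~ the word "integration" in the title).
    titles = [pa["title"] for pa in papers
              if names & set(pa["authors"]) and topic in pa["title"].lower()]
    # Keyword search over unstructured reviews using the joined papers' titles.
    return [r for r in reviews
            if any(t.lower() in r.lower() for t in titles)]


print(q1(people, papers, reviews))
```

Even this toy version shows the mixed pipeline: an exact relational predicate, an exact join on a semi-structured field, then an approximate text match whose quality depends entirely on how "keywords" are chosen.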

7
Combining Scores
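Query: Papers on privacy data mining by Agarwal in
Watson
[Diagram: the Query enters a Chopper, which sends one part to the DB and one to IR; a Combiner merges their scored results into the Result.]
  • DB results
    • Aggarwal, Watson, s1
    • Agarwal, Almaden, s2
    • Agrawal, Almaden, s3
  • IR results
    • SIGMOD 00 paper, r2
    • PODS 01 papers, r1
    • KDD 00 paper, r3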
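A minimal sketch of the Combiner, assuming the Chopper has already split the query into a DB part (author name, location) and an IR part whose hits each carry a relevance score r and an author to join on. The DB score s is assumed to be a fuzzy name similarity (Python's `difflib.SequenceMatcher` ratio), halved on a location mismatch, so that the spelling variants Aggarwal, Agarwal and Agrawal all receive partial credit. The weighted sum w·s + (1 − w)·r is only one of many possible combination rules; the paper ids, scores and the weight are invented for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def db_score(name: str, location: str, q_name: str, q_location: str) -> float:
    """Fuzzy DB score (assumed rule): name similarity in [0, 1],
    halved when the location does not match the query."""
    sim = SequenceMatcher(None, name.lower(), q_name.lower()).ratio()
    return sim if location == q_location else 0.5 * sim


def combine(db_rows: List[Tuple[str, str]],
            ir_hits: List[Tuple[str, str, float]],
            q_name: str, q_location: str,
            w: float = 0.5) -> List[Tuple[str, float]]:
    """Join IR hits to DB rows on author, then rank by w * s + (1 - w) * r."""
    s: Dict[str, float] = {n: db_score(n, loc, q_name, q_location)
                           for n, loc in db_rows}
    merged = [(paper, w * s.get(author, 0.0) + (1 - w) * r)
              for paper, author, r in ir_hits]
    return sorted(merged, key=lambda pr: pr[1], reverse=True)


db_rows = [("Aggarwal", "Watson"), ("Agarwal", "Almaden"), ("Agrawal", "Almaden")]
ir_hits = [("P1", "Aggarwal", 0.6), ("P2", "Agarwal", 0.9), ("P3", "Agrawal", 0.7)]
for paper, score in combine(db_rows, ir_hits, "Agarwal", "Watson"):
    print(paper, round(score, 3))
```

Here P1 wins despite the lowest IR score because its author is at Watson and the name is a near match; changing `w` shifts the balance between the structured and the text evidence.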
8
Query Processing
9
Approaches (1)
  • Query Languages
  • XML-based extensions for queries
  • W3C working group on XQuery considering an extension
    for full text
  • XXL (Weikum), XIRQL (Fuhr)
  • Specialized languages for highly structured data
    (e.g. chemical molecules)?
  • Graph-based models & languages (RDF, Protégé from
    Stanford)
  • Extended relational (e.g. SQL/MM)
  • Inverse queries on business events
  • Reasoning systems
  • Statistical approaches (approximate / data mining)
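In the spirit of the "extended relational" approach (pushing the search engine into the DBMS), the sketch below registers a toy text-scoring function inside SQLite via Python's `sqlite3` `create_function`, so one SQL statement mixes a structured join and predicate with a text-relevance score. This is not SQL/MM Full-Text; it is an assumed stand-in showing the coupling style. The schema, rows and the `kw_score` scoring rule (fraction of query terms present) are hypothetical.

```python
import sqlite3


def kw_score(text, query):
    """Toy IR score (assumption): fraction of query terms that occur in text."""
    if not text or not query:
        return 0.0
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Expose the IR scorer to SQL: the "search engine into DBMS" direction.
    conn.create_function("kw_score", 2, kw_score)
    conn.executescript("""
        CREATE TABLE people(firstname TEXT, lastname TEXT, company TEXT, location TEXT);
        CREATE TABLE papers(title TEXT, author_last TEXT, text TEXT);
        INSERT INTO people VALUES ('Ann', 'Lee', 'IBM', 'Almaden'),
                                  ('Bob', 'Ray', 'IBM', 'Watson');
        INSERT INTO papers VALUES ('P1', 'Lee', 'privacy preserving data mining'),
                                  ('P2', 'Ray', 'privacy in data streams'),
                                  ('P3', 'Lee', 'query optimization');
    """)
    return conn


def ranked_papers(conn: sqlite3.Connection, location: str, keywords: str):
    """Structured join + predicate, ranked by the in-engine text score."""
    sql = """
        SELECT pa.title, kw_score(pa.text, :kw) AS score
        FROM papers pa JOIN people pe ON pa.author_last = pe.lastname
        WHERE pe.location = :loc AND kw_score(pa.text, :kw) > 0
        ORDER BY score DESC, pa.title
    """
    return conn.execute(sql, {"kw": keywords, "loc": location}).fetchall()


conn = build_db()
print(ranked_papers(conn, "Almaden", "privacy data mining"))  # [('P1', 1.0)]
```

Tight coupling of this kind lets the DBMS optimize and secure the whole query, but the optimizer treats `kw_score` as an opaque function, which hints at why query optimization appears among the pluses and open issues of tight coupling.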

10
Approaches (2)
  • Pluses of tight coupling
  • Enforcement of ontologies, schemas
  • Security, management, query optimization,
    integrity constraints
  • Negatives of tight coupling
  • Does not address federation issues/autonomy
  • Pluses of loose coupling
  • Flexibility
  • Negatives of loose coupling

And the dinner bell rings
11
Concluding Remarks
  • We need further discussion on issues and
    approaches during the rest of the workshop