Web Data Management Panel Presentation WITS - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
Web Data Management: Panel Presentation, WITS 97
  • Joachim Hammer
  • University of Florida
  • December 14, 1997

2
World Wide Web
  • Easy-to-use interface (pages and links)
  • Ubiquitous
  • Excellent storefront
  • Lots of valuable information
  • Irregular structure (semistructured)
  • Highly dynamic
  • Difficult searching/navigation
  • Limited querying
  • Human is query processor

3
Web Management Issues & Research
  • Information browsing through the Web
  • Effective, simple-to-program GUI
  • MOBIE (Stanford University)
  • Dealing with existing (static) Web data
  • Can't query static Web pages
  • TSIMMIS Data Extraction (Stanford University)
  • Information discovery
  • Too much quantity, too little quality
  • WebMining (University of Florida)

4
MOBIE
  • Formats and displays data objects as a web of
    hypertext documents
  • Traverse hyperlinks to explore nested structure
    and contents
  • Based on HTTP and HTML
  • Provides simple, world-wide access to information
    servers
  • New way of exploring databases
  • Much like readers explore contents of a book

5
Raw Data Object
<collection, b1, a1, ...>
b1 <book, t, a>
t <title, Database and ...>
a <author, Jeff Ullman>
a1 <article, v, w, x>
v <title, Mediators in ...>
w <author-list, ...>
x ...
...
6
Formatted - Hyperlinked
collection
  book
    title Database and ...
    author Jeff Ullman
  article
    title Mediators in the ...
    author Gio Wiederhold
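
The step from the raw object to the hyperlinked view can be sketched roughly as follows. This is only an illustration, not MOBIE's actual code or data format: the `OBJECTS` table, the `root` id, the `<id>.html` link scheme, and the `render_page` helper are all assumptions made for the example.

```python
from html import escape

# Hypothetical object table: id -> (label, value).
# A value is either a string (atomic object) or a list of child ids
# (complex object). The ids mirror the raw data object on slide 5;
# "root" is an invented id for the collection.
OBJECTS = {
    "root": ("collection", ["b1", "a1"]),
    "b1": ("book", ["t", "a"]),
    "t": ("title", "Database and ..."),
    "a": ("author", "Jeff Ullman"),
    "a1": ("article", ["v"]),
    "v": ("title", "Mediators in ..."),
}


def render_page(oid, objects=OBJECTS):
    """Return one HTML page for object `oid`; each child becomes a hyperlink."""
    label, value = objects[oid]
    lines = [f"<h1>{escape(label)}</h1>"]
    if isinstance(value, str):
        # Atomic object: show its value directly.
        lines.append(f"<p>{escape(value)}</p>")
    else:
        # Complex object: link to one page per nested object.
        lines.append("<ul>")
        for child in value:
            child_label = objects[child][0]
            lines.append(f'<li><a href="{child}.html">{escape(child_label)}</a></li>')
        lines.append("</ul>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_page("root"))
```

Following a link such as `b1.html` would call `render_page("b1")`, so the reader explores the nested structure one page at a time, much like leafing through a book.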
7
Data Extraction & Querying
  • Configurable parser (Python)
  • Declarative description of HTML source
  • Location of data on page
  • How to package data into result object
  • Regular expression-like syntax
  • Human intelligence rather than A.I.
  • Returns data as OEM (Object Exchange Model)
    objects
  • TSIMMIS interchange format
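
A rough sketch of what a declarative, regular-expression-driven extractor might look like. The `SPEC` format, the `extract` function, the sample page, and the nested `(label, value)` result are assumptions for illustration only; the real TSIMMIS specification language and OEM encoding are not shown on the slides and are not reproduced here.

```python
import re

# Hypothetical declarative specification: each entry names a result label
# and a regular expression whose first group locates the data on the page.
SPEC = [
    ("title", r"<h1>(.*?)</h1>"),
    ("price", r"Price:\s*\$([0-9.]+)"),
]


def extract(html, spec=SPEC, root_label="page"):
    """Apply each pattern and package the matches as (label, value) pairs
    nested under one root object, loosely in the spirit of OEM."""
    children = []
    for label, pattern in spec:
        for match in re.finditer(pattern, html, flags=re.DOTALL):
            children.append((label, match.group(1).strip()))
    return (root_label, children)


# Invented sample page used only to exercise the sketch.
SAMPLE = "<html><h1>Used Books</h1><p>Price: $12.50</p></html>"

if __name__ == "__main__":
    print(extract(SAMPLE))
```

The point of the design is that adapting to a new page means editing `SPEC`, not the parser: the human supplies the knowledge of where the data sits.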

8
Approach
  • Extract data from Web page(s)
  • On demand
  • Periodic/on update
  • Use wrapper/DBMS as query processor

[Figure: architecture diagram with labels Wrapper, Query/Result, World Wide Web, Extractor or Persistent Storage, Specification]
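
The two extraction modes (on demand versus periodic) could be sketched as a wrapper that either re-extracts on every query or answers from a stored snapshot until it becomes too old. The `Wrapper` class, its `max_age` parameter, and the injected `fetch`/`extract` callables are hypothetical names chosen for this example, and an in-memory list stands in for persistent storage.

```python
import time


class Wrapper:
    """Answers queries either by extracting on demand or from a stored snapshot."""

    def __init__(self, fetch, extract, max_age=None, clock=time.monotonic):
        self.fetch = fetch        # callable returning the page text
        self.extract = extract    # callable turning page text into records
        self.max_age = max_age    # None means: extract on every query
        self.clock = clock        # injectable clock, handy for testing
        self._records = None
        self._loaded_at = None

    def _stale(self):
        """True when there is no usable snapshot for this query."""
        if self.max_age is None or self._records is None:
            return True
        return self.clock() - self._loaded_at > self.max_age

    def query(self, predicate):
        """Return the records satisfying `predicate`, refreshing if needed."""
        if self._stale():
            self._records = self.extract(self.fetch())
            self._loaded_at = self.clock()
        return [r for r in self._records if predicate(r)]
```

With `max_age=None` the Web page is fetched for every query; with a number of seconds, the snapshot is reused and only refreshed once it has aged out, which approximates periodic extraction.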
9
Evaluation
  • Better than
  • Writing programs
  • YACC, PERL, etc.
  • Want to do even better
  • GUI tool to simplify the generation of extractor
    specification

10
Information Discovery
  • Improve existing search engines
  • Efficient crawling techniques (reduce data
    shipping)
  • Quality ranking of pages
  • Apply data mining techniques to categorize web
    pages
  • e.g., clustering algorithms, proximity
  • Inferencing of knowledge
  • Making connections among entities
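
As one concrete reading of "clustering algorithms, proximity", the sketch below groups pages by the cosine similarity of their word counts using a greedy single-pass scheme. This is a stand-in technique chosen for brevity, not the WebMining project's method; the `cluster` function, the `threshold` value, and the sample pages are assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def cluster(pages, threshold=0.3):
    """Greedy single-pass clustering: a page joins the first cluster whose
    seed page is similar enough, otherwise it starts a new cluster."""
    clusters = []  # list of (seed_vector, [page names])
    for name, text in pages.items():
        vec = vectorize(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Invented sample pages used only to exercise the sketch.
PAGES = {
    "p1": "database query optimization index",
    "p2": "query optimization for database systems",
    "p3": "chocolate cake recipe",
}

if __name__ == "__main__":
    print(cluster(PAGES))
```

The result depends on page order and on the threshold, which is the usual trade-off of single-pass schemes against more careful (and more expensive) clustering algorithms.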

11
Goals of Web Management
  • Putting vast amounts of previously unavailable
    data on the Web
  • Digital libraries, long distance learning,
    research, etc.
  • Data managed by DBMS
  • Leverage existing DBMS technology
  • New tools for managing semistructured data
  • Support of electronic storefronts
  • Dynamic creation of customizable Web pages