Table Structure Understanding by Sibling Page Comparison

Description: Find mappings between table cells. Find structure patterns. 12/2/09. 9. HTML Table Components ... Molecular biology: 95.6% Car ad: 100%. Dynamic adjustment ...

Slides: 17
Provided by: cui1
Learn more at: https://www.deg.byu.edu

Transcript and Presenter's Notes

1
Table Structure Understanding by Sibling Page Comparison
Cui Tao
Data Extraction Group
Department of Computer Science
Brigham Young University
Supported by NSF
2
Table Structure Understanding
  • Motivation
    • Many documents contain tables
    • Data extraction
    • Data integration
    • Ontology evolution
  • Solution
    • Locate tables
    • Locate table labels
    • Locate table values
    • Find label/value associations

3
Table Structure Understanding
4
Table Structure Understanding
2
(Gene Model, 1) F18H3.5a
(Gene Model, 2) F18H3.5b
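
The label/value pairs on this slide can be read as a mapping from a (label, position) key to a cell value. The following is a minimal sketch of that reading, not the presenter's implementation; the function name `associate` and the 1-based position index are assumptions made for illustration.

```python
def associate(label, values):
    """Pair each value with a (label, position) key.

    Positions start at 1 to mirror the slide's "(Gene Model, 1)" notation.
    `associate` is a hypothetical helper, not part of the original work.
    """
    return {(label, i): value for i, value in enumerate(values, start=1)}


# Values taken from the slide; the list layout is an assumption.
pairs = associate("Gene Model", ["F18H3.5a", "F18H3.5b"])
for key, value in pairs.items():
    print(key, value)
```

Keying on the full (label, position) tuple keeps repeated labels distinguishable when one label governs several values.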
5
(No Transcript)
6
(No Transcript)
7
Sibling Pages
  • Generated output pages
    • User query
    • Results in a predefined page structure
  • Same web site, same structure

8
Problems
  • Data-rich area: discard the irrelevant parts
  • Find table correspondences
  • Find mappings between table cells
  • Find structure patterns
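
One way to picture the cell-mapping step is to compare two sibling tables position by position. The sketch below rests on an assumption that the slides do not state: cells whose text is identical on both pages are treated as candidate labels, and cells whose text differs are treated as candidate values. The function `classify_cells`, the equal-shape requirement, and the car-ad sample data are all hypothetical.

```python
def classify_cells(table_a, table_b):
    """Split cell positions into candidate labels and candidate values.

    Both tables are lists of rows of cell text and are assumed to have the
    same shape. Heuristic (an assumption, not taken from the slides):
    text shared by both sibling pages is a label, differing text is a value.
    """
    labels, values = [], []
    for r, (row_a, row_b) in enumerate(zip(table_a, table_b)):
        for c, (cell_a, cell_b) in enumerate(zip(row_a, row_b)):
            if cell_a == cell_b:
                labels.append((r, c))
            else:
                values.append((r, c))
    return labels, values


# Made-up sibling pages from a car-ad site, for illustration only.
page_a = [["Make", "Ford"], ["Year", "2004"]]
page_b = [["Make", "Honda"], ["Year", "2006"]]
labels, values = classify_cells(page_a, page_b)
print("labels:", labels)
print("values:", values)
```

A position-by-position comparison only works when the sibling tables line up exactly; tables that differ in shape need a structural matching step first.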

9
HTML Table Components
10
Data Rich Area
11
Table Unnesting
12
DOM Tree
13
Simple Tree Matching
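  • Simple Tree Matching (STM) [Yang91]
  • Finds the maximum number of matching node pairs between two trees
  • Time complexity: O(mn), where m and n are the numbers of nodes in the two trees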
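
The matching can be sketched with the commonly described dynamic-programming formulation of STM. This is an illustrative version, not necessarily the exact variant used in the presentation; the `(label, children)` tuple representation and the sample trees are assumptions.

```python
def simple_tree_matching(a, b):
    """Return the number of node pairs in a maximum matching of trees a and b.

    Each tree is a (label, children) tuple, where children is a list of trees.
    Roots with different labels do not match at all. Otherwise the children
    are aligned in order with a dynamic-programming table, as in a
    longest-common-subsequence computation weighted by subtree matches.
    """
    if a[0] != b[0]:
        return 0
    kids_a, kids_b = a[1], b[1]
    m, n = len(kids_a), len(kids_b)
    # table[i][j]: best matching between the first i children of a
    # and the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i][j - 1],
                table[i - 1][j],
                table[i - 1][j - 1]
                + simple_tree_matching(kids_a[i - 1], kids_b[j - 1]),
            )
    return table[m][n] + 1  # +1 counts the matched roots


# Two made-up table trees whose rows hold different numbers of cells.
tree_a = ("table", [("tr", [("td", []), ("td", [])]), ("tr", [("td", [])])])
tree_b = ("table", [("tr", [("td", [])]), ("tr", [("td", []), ("td", [])])])
print(simple_tree_matching(tree_a, tree_b))
```

Because children are aligned in order and a node can only match a node with the same label, each pair of nodes is compared at most once, which is where the O(mn) bound comes from.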

14
Table Structure Pattern
15
Table Structure Pattern
16
Experimental Results
  • Initial Test
  • General pattern extraction
    • Molecular biology: 95.6%
    • Car ad: 100%
  • Dynamic adjustment
    • Unseen structure
    • Structure variations