Structure Based Information Extraction (SBIE) - PowerPoint PPT Presentation

Provided by: BYU1

1
Structure Based Information Extraction (SBIE)
Hua Lei 06/09/2004
2
Disadvantages of the BYU tool
  1. Ontologies, lexicons, and data patterns must be
    defined for each domain.
  2. Ontologies, lexicons, and data patterns are
    defined and updated manually.
  3. Results rely heavily on the lexicons and data
    patterns.

3
(No Transcript)
4
HTML code of a data group
<TD vAlign=top width="33"><A class=prodtitle
href="http://www.kmart.com/product/index.jsp?productId=1789425&cp=784867.784872.784574&parentPage=family">
<CENTER><IMG src="kmart_files/p1472053th.gif" border=0></CENTER><BR>
Sharp LC-20B4U-S 20-Inch Flact LCD Television in Silver</A><BR><BR>
<SPAN class=listprice>List Price 1299.99 <BR></SPAN>
<SPAN class=ourprice><B>Our Price 1199.99</B></SPAN>
</TD>
Phenomenon: all data groups on the same web page
have the same structure. Idea: extract data from
web pages based on the data group structure.
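
The observation that data groups share a structure can be checked mechanically. The following is a minimal sketch, not the SBIE implementation: it reduces a data group to its sequence of tags (keeping the class attribute, dropping text and other attributes) and compares two groups. The names TagSequence and signature are illustrative, and group_b is a hypothetical second data group that does not appear in the slides.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Record tag names (plus class, if present) in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        self.tags.append(f"<{tag}.{cls}>" if cls else f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def signature(html):
    """Return the tag sequence of a fragment, ignoring text and other attributes."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


# The data group shown on the slide.
group_a = (
    '<TD vAlign=top width="33"><A class=prodtitle '
    'href="http://www.kmart.com/product/index.jsp?productId=1789425'
    '&cp=784867.784872.784574&parentPage=family">'
    '<CENTER><IMG src="kmart_files/p1472053th.gif" border=0></CENTER><BR>'
    'Sharp LC-20B4U-S 20-Inch Flact LCD Television in Silver</A><BR><BR>'
    '<SPAN class=listprice>List Price 1299.99 <BR></SPAN>'
    '<SPAN class=ourprice><B>Our Price 1199.99</B></SPAN></TD>'
)

# Hypothetical second data group from the same page (values invented).
group_b = (
    '<TD vAlign=top width="33"><A class=prodtitle '
    'href="http://www.example.com/product-b">'
    '<CENTER><IMG src="example_files/b.gif" border=0></CENTER><BR>'
    'Example 15-Inch LCD Television</A><BR><BR>'
    '<SPAN class=listprice>List Price 499.99 <BR></SPAN>'
    '<SPAN class=ourprice><B>Our Price 449.99</B></SPAN></TD>'
)

# Same structure, different data.
print(signature(group_a) == signature(group_b))
```

Only the tag sequence is compared here, so two groups with different product names and prices still count as structurally identical.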
5
Method of SBIE
Step 1. Choose a data group with a typical data
structure as the initial one.
6
Method of SBIE
Step 2. Analyze the HTML code of the web page and
find the structure of the data group.
  • Recognize the annotations, data patterns and
    relative position of each data item.
  • The algorithm integrates all of this structural
    information to derive the structure of the data
    group.
  • Validate the data group structure by recognizing
    other data groups based on this data group
    structure.
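
One way to picture Step 2 is sketched below, under stated assumptions. For each piece of text in the initial data group it records the relative position (the path of enclosing tags), the annotation (any label preceding a price, such as "List Price"), and the data pattern. The PRICE regular expression, the FieldFinder class, and the dictionary layout are assumptions made for illustration; the slides do not specify how SBIE represents a structure.

```python
import re
from html.parser import HTMLParser

# Assumed data pattern: a number with two decimal places, e.g. 1299.99.
PRICE = re.compile(r"\d+\.\d{2}")

DATA_GROUP = """
<TD vAlign=top width="33"><A class=prodtitle
href="http://www.kmart.com/product/index.jsp?productId=1789425&cp=784867.784872.784574&parentPage=family">
<CENTER><IMG src="kmart_files/p1472053th.gif" border=0></CENTER><BR>
Sharp LC-20B4U-S 20-Inch Flact LCD Television in Silver</A><BR><BR>
<SPAN class=listprice>List Price 1299.99 <BR></SPAN>
<SPAN class=ourprice><B>Our Price 1199.99</B></SPAN>
</TD>
"""


class FieldFinder(HTMLParser):
    """Describe every text item by position, annotation, and data pattern."""

    VOID = {"br", "img"}  # tags that never get a closing tag

    def __init__(self):
        super().__init__()
        self.path = []    # currently open tags, outermost first
        self.fields = []  # one description per text item

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            cls = dict(attrs).get("class")
            self.path.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        # Naive pop: assumes the fragment's tags are properly nested.
        if tag not in self.VOID and self.path:
            self.path.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return  # whitespace between tags
        match = PRICE.search(text)
        self.fields.append({
            "position": "/".join(self.path),
            "annotation": text[:match.start()].strip() if match else None,
            "pattern": PRICE.pattern if match else "text",
        })


finder = FieldFinder()
finder.feed(DATA_GROUP)
finder.close()
for field in finder.fields:
    print(field)
```

The resulting list of three field descriptions (title, list price, our price) plays the role of the data group structure that the later steps rely on.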

7
<TD vAlign=top width="33"><A class=prodtitle
href="http://www.kmart.com/product/index.jsp?productId=1789425&cp=784867.784872.784574&parentPage=family">
<CENTER><IMG src="kmart_files/p1472053th.gif" border=0></CENTER><BR>
Sharp LC-20B4U-S 20-Inch Flact LCD Television in Silver</A><BR><BR>
<SPAN class=listprice>List Price 1299.99 <BR></SPAN>
<SPAN class=ourprice><B>Our Price 1199.99</B></SPAN>
</TD>
8
Method of SBIE
Step 3. Use the structure found in Step 2 to extract
the other data groups on that web page.
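
Step 3 could look roughly like the sketch below. It assumes the structure from Step 2 is available as a mapping from relative position to field name (the STRUCTURE dictionary, a hypothetical representation) and that every data group is a TD cell. The second cell in PAGE is invented for illustration and does not come from the slides.

```python
import re
from html.parser import HTMLParser

# Hypothetical result of Step 2: relative position -> field name.
STRUCTURE = {
    "td/a.prodtitle": "title",
    "td/span.listprice": "list_price",
    "td/span.ourprice/b": "our_price",
}
PRICE = re.compile(r"\d+\.\d{2}")  # assumed price pattern

PAGE = """
<TD vAlign=top width="33"><A class=prodtitle
href="http://www.kmart.com/product/index.jsp?productId=1789425&cp=784867.784872.784574&parentPage=family">
<CENTER><IMG src="kmart_files/p1472053th.gif" border=0></CENTER><BR>
Sharp LC-20B4U-S 20-Inch Flact LCD Television in Silver</A><BR><BR>
<SPAN class=listprice>List Price 1299.99 <BR></SPAN>
<SPAN class=ourprice><B>Our Price 1199.99</B></SPAN>
</TD>
<TD vAlign=top width="33"><A class=prodtitle href="http://www.example.com/product-b">
<CENTER><IMG src="example_files/b.gif" border=0></CENTER><BR>
Example 15-Inch LCD Television</A><BR><BR>
<SPAN class=listprice>List Price 499.99 <BR></SPAN>
<SPAN class=ourprice><B>Our Price 449.99</B></SPAN>
</TD>
"""


class GroupExtractor(HTMLParser):
    """Emit one record per TD cell, using positions from a known structure."""

    VOID = {"br", "img"}  # tags that never get a closing tag

    def __init__(self, structure):
        super().__init__()
        self.structure = structure
        self.path, self.record, self.records = [], {}, []

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            cls = dict(attrs).get("class")
            self.path.append(f"{tag}.{cls}" if cls else tag)

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.path:
            return
        self.path.pop()
        if tag == "td" and self.record:  # end of one data group
            self.records.append(self.record)
            self.record = {}

    def handle_data(self, data):
        field = self.structure.get("/".join(self.path))
        text = " ".join(data.split())
        if field and text:
            match = PRICE.search(text)
            # Keep only the price where one is present, else the whole text.
            self.record[field] = match.group() if match else text


extractor = GroupExtractor(STRUCTURE)
extractor.feed(PAGE)
extractor.close()
for record in extractor.records:
    print(record)
```

No lexicon or ontology is consulted: each value is picked up purely because it sits at a known position inside a data group.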
9
Method of SBIE
10
(No Transcript)
11
Method of SBIE
  • Machine Learning
  • A machine learning technique will be combined with
    the structure recognition algorithm.
  • A database of data group structures, lexicons
    and data patterns will be created or updated after
    each extraction.
  • The machine learning tool can analyze a new data
    group structure based on the structure
    information in the database.
  • Once SBIE is well trained, it could analyze the
    data group structures in the HTML code and
    extract data without the initial data group (Step
    1).
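
The slides do not name a specific machine learning technique. As a rough stand-in, the sketch below keeps previously seen data group structures in memory and matches a new structure against them by sequence similarity, using difflib.SequenceMatcher from the Python standard library. The StructureStore class, the tag-sequence representation, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


class StructureStore:
    """In-memory stand-in for the database of data group structures."""

    def __init__(self):
        self.known = []  # tag sequences seen in earlier extractions

    def add(self, structure):
        """Store a structure after an extraction, skipping exact duplicates."""
        if structure not in self.known:
            self.known.append(structure)

    def best_match(self, structure, threshold=0.8):
        """Return the most similar stored structure, or None if none is close."""
        best, best_score = None, 0.0
        for candidate in self.known:
            score = SequenceMatcher(None, candidate, structure).ratio()
            if score > best_score:
                best, best_score = candidate, score
        return best if best_score >= threshold else None


store = StructureStore()
seen = ("td", "a", "/a", "span", "/span", "/td")
store.add(seen)

# A new page whose data groups differ by one extra tag still matches.
similar = ("td", "a", "/a", "br", "span", "/span", "/td")
print(store.best_match(similar) == seen)

# An unrelated structure does not match anything stored.
unrelated = ("ul", "li", "/li", "/ul")
print(store.best_match(unrelated))
```

A lookup like this is what would let a trained system skip Step 1: if a new page contains a fragment close to a stored structure, that fragment can serve as the initial data group.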

12
Conclusion
SBIE
  • Extends the BYU tool.
  • Extracts information without defining ontologies.
  • Creates and updates data patterns and lexicons
    automatically.
  • Extracts data based on their relative
    positions.
  • A machine learning technique makes it smart.

13
Questions