Semiautomatic Generation of Resilient Data Extraction Ontologies

Provided by: deg7
Learn more at: https://www.deg.byu.edu

1
Semiautomatic Generation of Resilient Data
Extraction Ontologies
  • Yihong Ding
  • Data Extraction Group
  • Brigham Young University
  • Sponsored by NSF

2
Data Extraction Ontology
  • Goal: extract data from web pages
  • Components
    • concepts
    • relations between the concepts
    • participation constraints
  • Resilient
  • Difficulty: manual ontology generation is costly

3
Generation Procedure
[Diagram labels: Train, Test, Knowledge Sources, Knowledge Selection Processing, Extraction Processing, Database]

4
Knowledge Collection
  • Assumptions about the knowledge base
    • general
    • contains meaningful relationships
    • pre-existing
    • XML or easy to convert to XML
  • Current input
    • Mikrokosmos ontology Mik
    • auxiliary data frame library

5
Selection of Concepts
  • PROCEDURE ConceptSelection(Tdoc, Kbase)
    • SourceDoc ← Parse(Tdoc)
    • PrimarySelectedConceptsList ← MikroSelection(M-Ontology)
    • SecondarySelectedConceptsList ← DataFrameSelection(DF-Library)
    • ConflictHandling()
    • SelectedSubgraphGeneration()
  • MANY ISSUES
    • selection strategies, conflict resolution, ...
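
The ConceptSelection procedure above can be sketched in Python as follows. This is only an illustration under loud assumptions: the helper names mirror the slide, but their bodies, the toy ontology, the toy data frame patterns, and the merge-and-deduplicate conflict policy are all hypothetical, and SelectedSubgraphGeneration is left out.

```python
import re

def parse(tdoc):
    """Split a training document into lowercase word tokens."""
    return re.findall(r"[a-z]+", tdoc.lower())

def mikro_selection(tokens, m_ontology):
    """Pick concepts whose name or one of whose synonyms occurs in the document."""
    words = set(tokens)
    return [c for c, names in m_ontology.items() if words & names]

def data_frame_selection(tdoc, df_library):
    """Pick concepts whose data frame pattern matches the raw text."""
    return [c for c, pattern in df_library.items() if re.search(pattern, tdoc)]

def conflict_handling(primary, secondary):
    """Placeholder policy (an assumption): merge both lists, drop duplicates."""
    return list(dict.fromkeys(primary + secondary))

def concept_selection(tdoc, m_ontology, df_library):
    tokens = parse(tdoc)
    primary = mikro_selection(tokens, m_ontology)
    secondary = data_frame_selection(tdoc, df_library)
    return conflict_handling(primary, secondary)

# Toy knowledge sources, invented for this sketch.
M_ONTOLOGY = {
    "CapitalCity": {"capital"},
    "Population": {"population", "inhabitants"},
}
DF_LIBRARY = {
    "Area": r"\d[\d,]*\s*sq\.\s*km",
    "Population": r"\d+(\.\d+)?\s*million",
}
DOC = "Capital--Kabul. Area: 648,000 sq. km. Population: 17.7 million."

print(concept_selection(DOC, M_ONTOLOGY, DF_LIBRARY))
# ['CapitalCity', 'Population', 'Area']
```

Keeping the two selection passes separate makes it easy to swap in a real conflict-resolution strategy later, which the slide lists as an open issue.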

6
Basic Selection Strategy
  • Select from Mikrokosmos Ontology
  • Afghanistan
  • smaller than Texas.
  • Area: 648,000 sq. km.
  • Capital--Kabul,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population: 17.7 million.
  • Agriculture: Wheat, corn, barley, rice, cotton,
    fruit, nuts, karakul pelts, wool, mutton.

7
Basic Selection Strategy
  • Select from Mikrokosmos Ontology
    • concept names and their synonyms
  • Afghanistan
  • smaller than Texas.
  • Area<GeographicalArea>: 648,000 sq. km.
  • Capital<CapitalCity><FinancialCapital>--Kabul,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population<Population>: 17.7 million.
  • Agriculture: Wheat, corn, barley, rice, cotton,
    fruit, nuts, karakul pelts, wool, mutton.

8
Basic Selection Strategy
  • Select from Mikrokosmos Ontology
    • concept names and their synonyms
    • concept values and their synonyms
  • Afghanistan<Nation>
  • smaller than Texas<USState>.
  • Area<GeographicalArea>: 648,000 sq. km.
  • Capital<CapitalCity><FinancialCapital>--Kabul<CapitalCity>,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population<Population>: 17.7 million.
  • Agriculture: Wheat<FoodStuff><AgriculturalProduct>,
    corn, barley, rice, cotton, fruit, nuts, karakul
    pelts, wool, mutton.
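
The name-and-value matching shown on this slide could be approximated as below. This is a rough sketch, not the presenter's implementation: the lexicon is a hypothetical stand-in for Mikrokosmos concept names, values, and synonyms, and the `<Concept>` tag format simply copies the annotations in the example text.

```python
import re

# Hypothetical lexicon: lowercase surface term -> candidate concept labels.
LEXICON = {
    "area": ["GeographicalArea"],
    "capital": ["CapitalCity", "FinancialCapital"],
    "population": ["Population"],
    "afghanistan": ["Nation"],
    "texas": ["USState"],
    "kabul": ["CapitalCity"],
    "wheat": ["FoodStuff", "AgriculturalProduct"],
}

def annotate(text, lexicon=LEXICON):
    """Append a <Concept> tag after every word found in the lexicon."""
    def tag(match):
        word = match.group(0)
        labels = lexicon.get(word.lower(), [])
        return word + "".join(f"<{label}>" for label in labels)
    return re.sub(r"[A-Za-z]+", tag, text)

print(annotate("Capital--Kabul,"))
# Capital<CapitalCity><FinancialCapital>--Kabul<CapitalCity>,
```

A word that maps to more than one label, such as "Capital" here, is exactly the kind of ambiguity that a later conflict-handling step has to resolve.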

9
Basic Selection Strategy
  • Select from Mikrokosmos Ontology
    • concept names and their synonyms
    • concept values and their synonyms
  • Select from Data Frame Libraries
  • Afghanistan
  • smaller than Texas.
  • Area: 648,000 sq. km.
  • Capital--Kabul,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population: 17.7 million.
  • Agriculture: Wheat, corn, barley, rice, cotton,
    fruit, nuts, karakul pelts, wool, mutton.

10
Basic Selection Strategy
  • Select from Mikrokosmos Ontology
    • concept names and their synonyms
    • concept values and their synonyms
  • Select from Data Frame Libraries
    • extract result based on the data frames
  • Afghanistan
  • smaller than Texas.
  • Area: 648,000<Area><Mileage> sq. km.
  • Capital--Kabul,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population: 17.7<Time> million<Population><Price>.
  • Agriculture: Wheat, corn, barley, rice, cotton,
    fruit, nuts, karakul pelts, wool, mutton.
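
Data-frame-based selection can be pictured as a set of value recognizers run over the raw text. The sketch below assumes each data frame reduces to a single regular expression, which is a simplification; the three patterns are invented for illustration and are not taken from the actual data frame library.

```python
import re

# Hypothetical data frames: concept -> regular expression for its values.
DATA_FRAMES = {
    "Area": r"\d{1,3}(?:,\d{3})+",
    "Mileage": r"\d{1,3}(?:,\d{3})+",
    "Time": r"\d{1,2}\.\d{1,2}",
}

def data_frame_matches(text, frames=DATA_FRAMES):
    """Map each matched substring to the concepts whose pattern recognizes it."""
    hits = {}
    for concept, pattern in frames.items():
        for m in re.finditer(pattern, text):
            hits.setdefault(m.group(0), []).append(concept)
    return hits

print(data_frame_matches("Area: 648,000 sq. km. Population: 17.7 million."))
# {'648,000': ['Area', 'Mileage'], '17.7': ['Time']}
```

As in the annotated example, a purely pattern-based recognizer over-generates: "648,000" looks like both an area and a mileage, and "17.7" looks like a time.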

11
Document-Level Conflict
  • Afghanistan
  • smaller than Texas.
  • Area: 648,000<Area><Mileage> sq. km.
  • Capital<CapitalCity><FinancialCapital>--Kabul<CapitalCity>,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population: 17.7<Time> million<Population><Price>.
  • Agriculture: Wheat, corn, barley, rice, cotton,
    fruit, nuts, karakul pelts, wool, mutton.

12
Concept-Level Conflict
  • Afghanistan
  • smaller than Texas.
  • Area<GeographicalArea>: 648,000<Area> sq. km.
  • Capital--Kabul,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population<Population>: 17.7 million<Population>.
  • Agriculture: Wheat<FoodStuff><AgriculturalProduct>,
    corn, barley, rice, cotton, fruit, nuts, karakul
    pelts, wool, mutton.

13
Relation Retrieval
  • Theoretical solution
    • all paths in the subgraph
    • too expensive: NP-complete
  • Heuristic solution
    • find the shortest path between any two nodes
    • set a threshold distance
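
The heuristic can be sketched with a breadth-first search over the selected subgraph. Everything concrete here is an assumption made for illustration: the graph is treated as unweighted and undirected, the toy adjacency list is invented, and the function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def shortest_distance(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def retrieve_relations(graph, selected, threshold):
    """Keep concept pairs whose shortest path is within the threshold."""
    relations = {}
    for a, b in combinations(sorted(selected), 2):
        d = shortest_distance(graph, a, b)
        if d is not None and d <= threshold:
            relations[(a, b)] = d
    return relations

# Toy undirected graph as an adjacency list (invented for this sketch).
GRAPH = {
    "CapitalCity": ["City"],
    "City": ["CapitalCity", "Nation"],
    "Nation": ["City", "Population"],
    "Population": ["Nation"],
}

print(retrieve_relations(GRAPH, ["CapitalCity", "Nation", "Population"], 2))
# {('CapitalCity', 'Nation'): 2, ('Nation', 'Population'): 1}
```

With a threshold of 2, the CapitalCity to Population pair (distance 3) is dropped, so the threshold directly trades coverage against the amount of search.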

14
Participation Constraints
  • Afghanistan<Nation>
  • smaller than Texas.
  • Area: 648,000 sq. km.
  • Capital--Kabul<CapitalCity>,
  • Other cities--Kandahar Mazar-e-Sharif Konduz
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population: 17.7 million.
  • Agriculture: Wheat, corn, barley, rice, cotton,
    fruit, nuts, karakul pelts, wool, mutton.

CapitalCity 1:1 IsA.CITY.PartOf Nation 1:1

15
Participation Constraints (cont.)
  • Afghanistan<Nation>
  • smaller than Texas.
  • Area: 648,000 sq. km.
  • Capital--Kabul<City>,
  • Other cities<City>--Kandahar<City>
    Mazar-e-Sharif<City> Konduz<City>
  • Terrain: Landlocked; mostly mountains and desert.
  • Climate: Dry, with cold winters and hot summers.
  • Population: 17.7 million.
  • Agriculture: Wheat, corn, barley, rice, cotton,
    fruit, nuts, karakul pelts, wool, mutton.

City 1:1 PartOf Nation 1

16
Performance Evaluation
  • Speed of generation
  • Precision and recall of the generation process
  • Precision and recall of the generated ontology
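
For reference, precision and recall over a set of generated items can be computed as in the sketch below. It uses the standard definitions; the concept sets are toy data invented here, not results from the presentation.

```python
def precision_recall(generated, expected):
    """Return (precision, recall) of generated items against expected ones."""
    generated, expected = set(generated), set(expected)
    correct = len(generated & expected)
    precision = correct / len(generated) if generated else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall

# Toy example: 3 of 4 generated concepts are correct, 3 of 5 expected are found.
generated = {"Nation", "CapitalCity", "Area", "Price"}
expected = {"Nation", "CapitalCity", "Area", "Population", "City"}
print(precision_recall(generated, expected))
# (0.75, 0.6)
```

The same function applies whether the items are selected concepts, retrieved relations, or values extracted by the generated ontology.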

17
Generation Time with Distance Threshold

18
Precision and Recall of Generation Process

19
Conclusion
  • Data Extraction Ontology generated
  • Knowledge sources exploited
  • Many issues addressed
  • Many more to explore