Concepts, Ontologies, and Project TANGO - PowerPoint PPT Presentation

Slides: 96
Provided by: derylewl
Learn more at: https://tango.byu.edu

Transcript and Presenter's Notes



1
Concepts, Ontologies, and Project TANGO
  • Deryle Lonsdale
  • BYU Linguistics and English Language
  • lonz_at_byu.edu

2
Outline
  • NSF projects
  • Semantic Web
  • Concepts
  • Project TIDIE
  • Ontologies
  • Project TANGO
  • Tables
  • Ontology generation

3
Acknowledgements
  • NSF
  • David Embley (BYU CS), Steve Liddle (BYU Marriott
    School) and Yuri Tijerino
  • BYU Data Extraction Group members

4
The National Science Foundation
  • Federal agency, $5.5 billion budget, funds 20% of
    all federally supported basic research conducted
    by America's colleges and universities
  • 7 directorates
  • Biological Sciences, Computer and Information
    Science and Engineering, Engineering,
    Geosciences, Mathematics and Physical Sciences,
    Social, Behavioral and Economic Sciences, and
    Education and Human Resources
  • 200,000 scientists, engineers, educators and
    students at universities, laboratories and field
    sites
  • 10,000 awards/year, 3 years duration (avg.)

5
The NSF Nifty 50 (general)
  • ACCELERATING, EXPANDING UNIVERSE
  • ANTARCTIC OZONE HOLE RESEARCH
  • ARABIDOPSIS: A PLANT GENOME PROJECT
  • BAR CODES
  • BLACK HOLES CONFIRMED
  • BUCKY BALLS
  • COMPUTER VISUALIZATION TECHNIQUES
  • DATA COMPRESSION TECHNOLOGY
  • DISCOVERY OF PLANETS
  • DOPPLER RADAR
  • EFFECTS OF ACID RAIN
  • EL NIÑO AND LA NIÑA PREDICTIONS
  • FIBER OPTICS
  • GEMINI TELESCOPES
  • HANTAVIRUS IDENTIFICATION
  • DNA FINGERPRINTING
  • MRI: MAGNETIC RESONANCE IMAGING
  • NANOTECHNOLOGY
  • THE NATIONAL OBSERVATORIES
  • OVERCOMING HEAVY METALS
  • OVERCOMING SALT TOXICITY
  • TISSUE ENGINEERING
  • TUMOR DETECTION
  • VOLCANIC ERUPTION DETECTION
  • YELLOW BARRELS

6
Language-related Nifty 50
  • AMERICAN SIGN LANGUAGE DICTIONARY DEVELOPMENT
  • COMPUTER VISUALIZATION TECHNIQUES
  • THE DARCI CARD
  • DATA COMPRESSION TECHNOLOGY
  • THE "EYE CHIP" OR RETINA CHIP
  • THE INTERNET
  • PERSONS WITH DISABILITIES ACCESS TO THE WEB
  • PROJECT LISTEN
  • SPEECH RECOGNITION TECHNOLOGY
  • vBNS: VERY HIGH SPEED BACKBONE NETWORK SYSTEM
  • WEB BROWSERS

7
Browsing the Semantic Web
8
Browsing the Semantic Web
9
Desirable, not (yet) possible
  • Word sense disambiguation
  • Other types of queries (e.g. services)
  • What is the cheapest available round-trip flight
    to Cancun the day after finals this semester?
  • Set up an appointment with my optometrist for
    next week.
  • List available 4-person BYU-approved apartments
    in Orem for under $150/month.
  • Find me a linguistics job in Tahiti.

10
Project TIDIE
  • Apr 10, 2001 to May 12, 2005

11
Overview of TIDIE
  • 3-year NSF project at BYU
  • Total amount about $430,000
  • PI David Embley (BYU CS), 4 co-PIs from BYU
  • 18 grad students, 45 publications
  • Demos, tools, papers, presentations at website
    (www.deg.byu.edu/)

12
Goal of TIDIE
  • Target-Based Independent-of-Document Information
    Extraction
  • Target-based: user specifies what to find
  • Not just keyword search, but concept-based search
    using an ontology
  • Document independent
  • Should work even if pages change over time, and on
    new documents
  • IE: match, merge, retrieve, and format information
  • Present results in a way that the user can search
    and query

13
Document-based IE
14
Recognition and extraction
15
Concepts
  • What drives the matching process for IE
  • Inherent in words, numbers, phrases, text
  • Linguistics: lexical semantics
  • Denotations: entities, attributes
  • Location: relationships
  • Occurrences: constraints

16
Concept matching
  • We use exhaustive concept matching techniques to
    find concepts in documents including
  • Lexical information (lexicons)
  • Natural language processing (NLP) techniques
  • Similarity of values
  • Features of value
  • Data frames
  • Constraints

17
Lexicons
  • Repositories of enumerable classes of lexical
    information
  • FirstNames, LastNames, USStates, ProvoOremApts,
    CarMakes, Drugs, CampGroundFeats, etc.
  • WordNet (synonyms, word senses,
    hypernyms/hyponyms)

18
The data-frame library
  • Snippets of real-world knowledge about data
    (type, length, nearby keywords, patterns as in
    regexps, functional relations, etc)
  • Low-level patterns implemented as regular
    expressions
  • Match items such as email addresses, phone
    numbers, names, etc.
  • Mileage matches 8
  • constant extract "\b[1-9]\d{0,2}k"
    substitute "[kK]" -> "000",
  • extract "[1-9]\d{0,2}?,\d{3}"
  • context "[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"
    substitute "," -> "",
  • extract "[1-9]\d{0,2}?,\d{3}"
  • context "(mileage\\s)[^\$\d][1-9]\d{0,2}?,\d{3}[^\d]"
    substitute "," -> "",
  • extract "[1-9]\d{3,6}"
  • context "[^\$\d][1-9]\d{3,6}\smi(\.\b\les\b)",
  • extract "[1-9]\d{3,6}"
  • context "(mileage\\s)[^\$\d][1-9]\d{3,6}\b"
  • keyword "\bmiles\b", "\bmi\.", "\bmi\b",
    "\bmileage\b"
  • end
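The mileage data frame can be approximated with ordinary regular expressions. The Python sketch below is a simplified re-implementation under stated assumptions, not the DEG data-frame engine: the name `extract_mileage`, the combined patterns, and the rule that a mileage keyword must appear within a few characters of the number are illustrative choices. It keeps the three ingredients listed above: extraction patterns, a context check, and substitutions that normalize values such as "7,000" and "12k".

```python
import re

# Illustrative mileage data frame (simplified; not the original DEG syntax).
# "12k"-style values: substitute [kK] -> 000.
K_VALUE = re.compile(r"\b([1-9]\d{0,2})[kK]\b")
# "7,000" or "85000": not preceded by "$", a digit, or a comma,
# and not followed by further digits.
NUMBER = re.compile(r"(?<![$\d,])([1-9]\d{0,2},\d{3}|[1-9]\d{3,6})(?!,?\d)")
# Keywords taken from the data frame's keyword clause.
KEYWORD = re.compile(r"\bmiles\b|\bmi\.|\bmi\b|\bmileage\b", re.IGNORECASE)


def extract_mileage(text: str, window: int = 12) -> list[int]:
    """Return normalized mileage values (as integers) found in text."""
    values = []
    for m in K_VALUE.finditer(text):
        values.append(int(m.group(1)) * 1000)
    for m in NUMBER.finditer(text):
        before = text[max(0, m.start() - window):m.start()]
        after = text[m.end():m.end() + window]
        # Context check: accept the number only if a keyword is nearby.
        if KEYWORD.search(before) or KEYWORD.search(after):
            values.append(int(m.group(1).replace(",", "")))  # "," -> ""
    return values


print(extract_mileage("'97 CHEVY Cavalier, only 7,000 miles. Asking only $11,995."))
```

Numbers preceded by "$" are skipped here, which keeps a price such as $11,995 from also being read as a mileage; the window size of 12 characters is an arbitrary assumption.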

19
Isolated concepts are OK, but...
  • We're also interested in the relations between
    concepts
  • This is often best done graphically
  • Ontology: an arrangement of concepts that makes
    their relations and constraints explicit
  • Conceptual modeling: the field of CS/linguistics
    that deals with formalizing concepts and using
    such information
  • BYU has its own well-known conceptual modeling
    framework (OSM)

20
Conceptual modeling (OSM)
21
Ontologies and IE
(figure labels: Source, Target)
22
Constant/keyword recognition
'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles.
Previous owner heart broken! Asking only
$11,995. 1415. JERRY SEINER MIDVALE, 566-3800
or 566-3888
Descriptor/String/Position(start/end)
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
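The descriptor/string/position listing can be reproduced with a handful of regular expressions run over the ad. This is a sketch with simplified, made-up recognizers (the real ones come from the data frames and lexicons), and it covers only the first line of the ad. It reports 1-based, inclusive start/end offsets, which is the convention the listing above appears to use.

```python
import re

AD = "'97 CHEVY Cavalier, Red, 5 spd, only 7,000 miles."

# Illustrative recognizers: (descriptor, regular expression). The two Make
# entries stand in for a lexicon lookup, so both "CHEV" and "CHEVY" are reported.
RECOGNIZERS = [
    ("Year", r"(?<=')\d{2}"),
    ("Make", r"\bCHEV"),
    ("Make", r"\bCHEVY\b"),
    ("Model", r"\bCavalier\b"),
    ("Feature", r"\bRed\b"),
    ("Feature", r"\b5 spd\b"),
    ("Mileage", r"\b[1-9]\d{0,2},\d{3}\b"),
    ("KEYWORD(Mileage)", r"\bmiles\b"),
]


def recognize(text: str) -> list[tuple[str, str, int, int]]:
    """Return (descriptor, string, start, end) with 1-based, inclusive offsets."""
    rows = []
    for descriptor, pattern in RECOGNIZERS:
        for m in re.finditer(pattern, text):
            # Python spans are 0-based and end-exclusive; shift to 1-based inclusive.
            rows.append((descriptor, m.group(), m.start() + 1, m.end()))
    return sorted(rows, key=lambda row: (row[2], row[3]))


for descriptor, string, start, end in recognize(AD):
    print(f"{descriptor}|{string}|{start}|{end}")
```

Running it prints the first eight rows of the listing, from `Year|97|2|3` to `KEYWORD(Mileage)|miles|44|48`.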
23
Database instance generator
Year|97|2|3
Make|CHEV|5|8
Make|CHEVY|5|9
Model|Cavalier|11|18
Feature|Red|21|23
Feature|5 spd|26|30
Mileage|7,000|38|42
KEYWORD(Mileage)|miles|44|48
Price|11,995|100|105
Mileage|11,995|100|105
PhoneNr|566-3800|136|143
PhoneNr|566-3888|148|155
insert into Car values(1001, '97', 'CHEVY', 'Cavalier', '7,000', '11,995', '566-3800');
insert into CarFeature values(1001, 'Red');
insert into CarFeature values(1001, '5 spd');
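Turning accepted recognitions into rows can be sketched with Python's built-in `sqlite3` module and parameterized inserts. The schema, the `generate_instance` and `first` helpers, and the policy of keeping only the first candidate for each single-valued attribute are assumptions for illustration; choosing between conflicting candidates (such as 11,995 as Price versus Mileage) is assumed to have happened already. Like the slide, the sketch keeps one phone number; a schema faithful to `PhoneNr 1..* is for Car 0..*` would give PhoneNr its own table.

```python
import sqlite3

# Illustrative schema matching the column order of the INSERT statements above.
SCHEMA = """
CREATE TABLE Car (CarId INTEGER PRIMARY KEY, Year TEXT, Make TEXT,
                  Model TEXT, Mileage TEXT, Price TEXT, PhoneNr TEXT);
CREATE TABLE CarFeature (CarId INTEGER REFERENCES Car(CarId), Feature TEXT);
"""


def first(values):
    """First candidate value, or None when nothing was recognized."""
    return values[0] if values else None


def generate_instance(conn, car_id, recognized):
    """Insert one car; `recognized` maps a descriptor to its accepted strings."""
    conn.execute(
        "INSERT INTO Car VALUES (?, ?, ?, ?, ?, ?, ?)",
        (car_id, first(recognized.get("Year")), first(recognized.get("Make")),
         first(recognized.get("Model")), first(recognized.get("Mileage")),
         first(recognized.get("Price")), first(recognized.get("PhoneNr"))),
    )
    # Features are many-valued (Car 0..* has Feature), hence a separate table.
    conn.executemany(
        "INSERT INTO CarFeature VALUES (?, ?)",
        [(car_id, feature) for feature in recognized.get("Feature", [])],
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
generate_instance(conn, 1001, {
    "Year": ["97"], "Make": ["CHEVY"], "Model": ["Cavalier"],
    "Mileage": ["7,000"], "Price": ["11,995"],
    "PhoneNr": ["566-3800", "566-3888"], "Feature": ["Red", "5 spd"],
})
print(conn.execute("SELECT * FROM Car").fetchall())
```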
24
Car ads extraction ontology
25
Car ads ontology (textual)
  • Car -> object
  • Car 0..1 has Year 1..*
  • Car 0..1 has Make 1..*
  • Car 0..1 has Model 1..*
  • Car 0..1 has Mileage 1..*
  • Car 0..* has Feature 1..*
  • Car 0..1 has Price 1..*
  • PhoneNr 1..* is for Car 0..*
  • PhoneNr 0..1 has Extension 1..*
  • Year matches 4
  • constant extract \d{2}
  • context "([^\$\d])[4-9]\d[^\d]"
  • substitute "^" -> "19",
  • End
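The cardinalities can also be read as checks on extracted data: `Car 0..1 has Mileage` means a single car takes at most one Mileage value. The sketch below encodes the Car-side maxima as a plain Python dictionary (an illustrative encoding, not OSM's own format) and reports attributes with too many distinct candidates. This is how the two Mileage readings recognized in the ad, 7,000 and 11,995, would be flagged for resolution.

```python
# Upper bound on how many values one Car may have per relationship set, read
# from the Car-side participation constraint (0..1 -> 1, 0..* -> unbounded).
CAR_MAX = {
    "Year": 1, "Make": 1, "Model": 1, "Mileage": 1, "Price": 1,
    "Feature": None,
}


def violations(candidates: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the attributes whose distinct candidates exceed the Car-side maximum."""
    bad = {}
    for attr, values in candidates.items():
        limit = CAR_MAX.get(attr)
        if limit is not None and len(set(values)) > limit:
            bad[attr] = values
    return bad


# Two Mileage candidates were recognized in the ad; only one is allowed.
print(violations({"Mileage": ["7,000", "11,995"], "Price": ["11,995"],
                  "Feature": ["Red", "5 spd"]}))
```

A violation only says that a choice must be made; picking 7,000 over 11,995 still needs other evidence, such as the nearby "miles" keyword.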

26
A gene ontology
27
A genealogy data model
28
Finding jobs in linguistics
  • Built an ontology for linguistics jobs: what
    defines a linguistics job
  • Data frames and lexicons: language names
    (www.ethnologue.com), subfields of linguistics
    (www.linguistlist.org), tools linguists use,
    programming languages, activities,
    responsibilities, country names
  • Documents: 3500 web pages and emails to me
  • Complete results reported in DLLS 2003

29
Sample query
30
Sample output
31
Subfield expertise sought
32
Technical skills sought
33
Sample observations
  • 270 don't have "linguist" (!)
  • Computer/computational background required for
    almost 1/3 (1116)
  • Noticeable amount of headhunting, particularly in
    the Seattle and DC areas
  • Often a job title is not even listed (!)
  • Great need for ontologies related to linguistics
  • job titles
  • theoretical frameworks, subfields
  • typical linguist job activities
  • linguistic research/development venues

34
An engineering discipline?
  • 160 linguistics job titles ending in "engineer"
  • Software development cycle
  • research e., software design e.
  • development e., software e.
  • software quality e., linguistic test e.,
    linguistic quality e.
  • linguistic support e., user experience e.
  • presales e., technical sales e.
  • Specific subfields
  • web site e.
  • speech e., voice recognition e., speech
    recognition application e., ASR tuning e.,
    audio e.
  • dialog e.
  • tools e.
  • AI e., NLP e.
  • knowledge e., ontology e.
  • linguist e., natural language e.
  • staff e.
  • human factors e., user interface e.

35
A recent ontologist job ad
  • Date: Thu, 28 Jul 2005 11:44:40
  • Subject: General Linguistics Ontologist, Denver,
    USA
  • Job Rank: Ontologist
  • Specialty Areas: General Linguistics
  • Position Summary: Ontologist
  • This person will be responsible for modifying and
    editing Ontology structures.
  • Skills
  • Basic computer skills such as Internet, email,
    and spreadsheet programs
  • In-depth knowledge of any major industry, such as
    Health Care, Automotive, Legal, Construction,
    and so forth helpful
  • Superior communication skills, both oral and
    written. Ability to communicate effectively with
    reports, peers, superiors, and customers
    essential
  • Travel and/or foreign language experience desired
  • Personal Characteristics
  • A healthy sense of logic, and a love for details
  • A deep and abiding love of language, and of
    rule-governed classification systems. This
    person should be excited by the challenge of
    figuring out the precise place where a word
    belongs, and be delighted with the prospect of
    performing such tasks as the major part of their
    job
  • Position Qualifications
  • -Bachelor's degree, preferably in Linguistics,
    Library Science, English, or related field

36
Another recent ontologist ad
  • Position Summary: Lead Ontologist
  • The Lead Ontologist will be responsible for
    creating and designing Ontology and Ontology
    structures. This person will be responsible for
    innovation and general Ontology development as
    Ontology requirements change. They will serve as
    Team Lead on various Ontology projects, and they
    will assist the Director with certain aspects of
    management, including the development of
    department culture and standards. They will also
    serve as a liaison between the Director and the
    rest of the team.
  • Skills
  • Ability to edit and manipulate text highly desired,
    using tools such as Emacs and Perl. High level
    programming language experience and SQL also
    desired
  • Knowledge of Ontology structures, and experience
    with developing and maintaining such structures
  • Ability to assist with Ontology development and
    use problem-solving skills to overcome obstacles
  • Ability to QA own Ontology work, and work of
    others
  • Ability to lead projects from set-up through to
    QA
  • Leadership or management experience a plus
  • Position Qualifications
  • -Bachelor's degree in Linguistics, Library
    Science, or related field
  • -2-3 years experience in Ontology or related
    field
  • Application Deadline: Open until filled.

37
Matching request with ontology
  • Tell me about cruises on San Francisco Bay. I'd
    like to know scheduled times, cost, and the
    duration of cruises on Friday of next week.

38
Building a query
(figure: query built from the request; labels: Friday, Oct. 29th; cost; duration; Result)
39
StartTime | Price | Duration | Source
10:45 am, 12:00 pm, 1:15, 2:30, 4:00 | 20.00, 16.00, 12.00 | | 1
10:00 am, 10:45 am, 11:15 am, 12:00 pm, 12:30 pm, 1:15 pm, 1:45 pm, 2:30 pm, 3:00 pm, 3:45 pm, 4:15 pm, 5:00 pm | 17.00, 16.00, 12.00 | 1 Hour | 2
40
Another example
  • Service Request
  • Match with Task Ontology
  • Domain Ontology
  • Process Ontology
  • Complete, Negotiate, Finalize

I want to see a dermatologist next week; any day
would be OK for me, at 4:00 p.m. The
dermatologist must be within 20 miles of my
home and must accept my insurance.
41
Service domain ontology
42
43
Relevant mini-ontology
44
Ontologies: issues
  • Most successful in data-rich, narrow-domain
    applications
  • Ambiguities are problematic; context only
    partially eliminates them
  • Incompleteness: implicit information
  • Commonsense world pragmatics: evasive
  • Knowledge prerequisites are steep
  • Major efforts in creation, maintenance
  • Must be created by experts
  • Experts are biased in knowledge; agreement needed
  • Ontologies continually change: upkeep is a massive
    task

45
Ontologies: possible solutions
  • Some automation is needed
  • Current automatic generation of ontologies is not
    successful, because ontologies are extracted from
    free-form, unstructured text.
  • A more effective alternative is to extract
    ontologies from structured data on the web
    (tables, charts, etc.)
  • TANGO project
  • Part 1: Extract tables from the web
  • Part 2: Define mini-ontologies from tables
  • Part 3: Merge into growing domain ontology

46
Project TANGO
47
Overview
  • Table ANalysis for Generating Ontologies
  • 3-year NSF-funded project
  • Joint BYU/RPI project
  • Uses and extends TIDIE concepts, ontologies
  • Goal is to process tables, generate ontologies,
    use results for IE

48
Motivation
  • Keyword search and link analysis are not enough
    to find information in tables
  • Structure in tables can lead to domain knowledge,
    which includes concepts, relationships and
    constraints (ontologies)
  • Tables on the web, created for human use, can lead
    to robust domain ontologies

49
Table understanding
  • What is a table?
  • Why table normalization?
  • What is table understanding?
  • What is mini-ontology generation?

50
What is a table?
  • a two-dimensional assembly of cells used to
    present information
  • Lopresti and Nagy
  • Normalized tables (row-column format)
  • Small paper (using OCR) and/or electronic tables
    (marked up) intended for human use

51
?
Olympus C-750 Ultra Zoom
Sensor Resolution | 4.2 megapixels
Optical Zoom | 10 x
Digital Zoom | 4 x
Installed Memory | 16 MB
Lens Aperture | F/8-2.8/3.7
Focal Length min | 6.3 mm
Focal Length max | 63.0 mm
55
Digital Camera
Olympus C-750 Ultra Zoom
Sensor Resolution | 4.2 megapixels
Optical Zoom | 10 x
Digital Zoom | 4 x
Installed Memory | 16 MB
Lens Aperture | F/8-2.8/3.7
Focal Length min | 6.3 mm
Focal Length max | 63.0 mm
56
?
Flight | Class | From | Time/Date | To | Time/Date | Stops
Delta 16 | Coach | JFK | 6:05 pm, 02 01 04 | CDG | 7:35 am, 03 01 04 | 0
Delta 119 | Coach | CDG | 10:20 am, 09 01 04 | JFK | 1:00 pm, 09 01 04 | 0
58
Airline Itinerary
Flight | Class | From | Time/Date | To | Time/Date | Stops
Delta 16 | Coach | JFK | 6:05 pm, 02 01 04 | CDG | 7:35 am, 03 01 04 | 0
Delta 119 | Coach | CDG | 10:20 am, 09 01 04 | JFK | 1:00 pm, 09 01 04 | 0
59
?
Place | Bonnie Lake
County | Duchesne
State | Utah
Type | Lake
Elevation | 10,000 feet
USGS Quad | Mirror Lake
Latitude | 40.711ºN
Longitude | 110.876ºW
62
Maps
Place | Bonnie Lake
County | Duchesne
State | Utah
Type | Lake
Elevation | 10,100 feet
USGS Quad | Mirror Lake
Latitude | 40.711ºN
Longitude | 110.876ºW
63
Table normalization
Raw table
  • Take any table and produce a standard row-column
    table with all data cells containing expanded
    values and type information
Normalized table:
Country | GDP/PPP | GDP/PPP Per Capita | Real-Growth Rate | Inflation
Afghanistan | 21,000,000,000 | 800 | ? | ?
Albania | 13,200,000,000 | 3,800 | 7.3 | 3.0
Algeria | 177,000,000,000 | 5,600 | 3.8 | 3.0
Andorra | 1,300,000,000 | 19,000 | 3.8 | 4.3
Angola | 13,300,000,000 | 1,330 | 5.4 | 110.0
Antigua and Barbuda | 674,000,000 | 10,000 | 3.5 | 0.4
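One small piece of normalization, expanding an abbreviated value such as a hypothetical raw cell "$21 billion" into the full number shown in the normalized table, might look like this. The scale words, the US number formatting, and the function name `expand_value` are assumptions for illustration; the real normalizer also attaches type information, which this sketch omits. `Decimal` is used so that values like 13.2 billion expand exactly.

```python
import re
from decimal import Decimal

# Assumed scale words; real tables also use "bn", "mil.", footnote markers, etc.
SCALE = {"thousand": 10**3, "million": 10**6, "billion": 10**9, "trillion": 10**12}
PATTERN = re.compile(
    r"\$?\s*(\d[\d,]*(?:\.\d+)?)\s+(thousand|million|billion|trillion)",
    re.IGNORECASE,
)


def expand_value(cell: str) -> str:
    """Expand '$21 billion' to '21,000,000,000'; other cells are returned as-is."""
    m = PATTERN.fullmatch(cell.strip())
    if m is None:
        return cell
    amount = Decimal(m.group(1).replace(",", "")) * SCALE[m.group(2).lower()]
    return f"{int(amount):,}"


for raw in ["$21 billion", "$13.2 billion", "674 million", "7.3", "?"]:
    print(raw, "->", expand_value(raw))
```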
64
Normalizing across hyperlinks
65
Normalized table
?? | Population | Population Growth rate | Population Density | Birth Rate | Death Rate | Migration Rate | Life Expectancy Male | Life Expectancy Female | Infant Mortality
Afghanistan | 25,824,882 | 3.95 | 39.88 persons/km2 | 4.19 | 1.70 | 1.46 | 47.82 years | 46.82 years | 14.06
Albania | 3,364,571 | 1.05 | 122.79 persons/km2 | 2.07 | 0.74 | -0.29 | 65.92 years | 72.33 years | 4.29
Algeria | 31,133,486 | 2.10 | 13.07 persons/km2 | 2.70 | 0.55 | -0.05 | 68.07 years | 70.46 years | 4.38
American Samoa | 63,786 | 2.64 | 320.53 persons/km2 | 2.65 | 0.40 | 0.39 | 71.23 years | 79.95 years | 1.02
Andorra | 65,939 | 2.24 | 146.53 persons/km2 | 1.03 | 0.55 | 1.76 | 80.55 years | 86.55 years | 0.41
Angola | 11,510 | 2.84 | 8.97 persons/km2 | 4.31 | 1.64 | 0.16 | 46.08 years | 50.82 years | 12.92
...
Western Sahara | 239,333 | 2.34 | 0.90 persons/km2 | 4.54 | 1.66 | -0.54 | 47.98 years | 50.57 years | 13.67
World | 5,995,544,836 | 1.30 | 14.42 persons/km2 | 2.20 | 0.90 | ? | 61.00 years | 65.00 years | 5.60
Yemen | 16,942,230 | 3.34 | 32.09 persons/km2 | 4.33 | 0.99 | 0.00 | 58.17 years | 61.88 years | 6.98
Zambia | 9,663,535 | 2.12 | 13.05 persons/km2 | 4.45 | 2.26 | 0.08 | 36.72 years | 37.21 years | 9.19
Zimbabwe | 11,163,160 | 1.02 | 28.87 persons/km2 | 3.06 | 2.04 | ? | 38.77 years | 38.94 years | 6.12
66
How to understand tables
  • Captions in vicinity of table (above, below
    etc)
  • Footnotes on annotated column labels or data
    cells
  • Embedded information in rows, columns or cells,
    e.g., $, %, (1,000), billions, etc.
  • Links to other views of the table, possibly with
    new information

67
Use of normalized data
  • Take a table as an input and produce standard
    records in the form of attribute-value pairs as
    output
  • Discover constraints among columns
  • Understand the data values

<Country: Afghanistan>, <GDP/PPP: 21,000,000,000>,
<GDP/PPP Per Capita: 800>, <Real-Growth Rate: ?>,
<Inflation: ?>
has(Country, GDP/PPP), has(Country, GDP/PPP Per Capita),
has(Country, Real-Growth Rate), has(Country, Inflation)
Left-most column: primary key
Country | GDP/PPP | GDP/PPP Per Capita | Real-Growth Rate | Inflation
Afghanistan | 21,000,000,000 | 800 | ? | ?
Albania | 13,200,000,000 | 3,800 | 7.3 | 3.0
Algeria | 177,000,000,000 | 5,600 | 3.8 | 3.0
Andorra | 1,300,000,000 | 19,000 | 3.8 | 4.3
Angola | 13,300,000,000 | 1,330 | 5.4 | 110.0
Antigua and Barbuda | 674,000,000 | 10,000 | 3.5 | 0.4
Dollar amount (from data frame)
Percentage (from data frame)
Country names (from data frame)
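Assuming, as the slide does, that the left-most column is the primary key, turning rows into attribute-value pairs and the header into has() relationships is mechanical. A minimal sketch with hypothetical helpers `to_records` and `has_relations`, and the first two rows of the table hard-coded:

```python
HEADER = ["Country", "GDP/PPP", "GDP/PPP Per Capita", "Real-Growth Rate", "Inflation"]
ROWS = [
    ["Afghanistan", "21,000,000,000", "800", "?", "?"],
    ["Albania", "13,200,000,000", "3,800", "7.3", "3.0"],
]


def to_records(header: list[str], rows: list[list[str]]) -> list[list[tuple[str, str]]]:
    """One list of (attribute, value) pairs per table row."""
    return [list(zip(header, row)) for row in rows]


def has_relations(header: list[str]) -> list[str]:
    """Assume the left-most column is the key that 'has' every other column."""
    key, *attributes = header
    return [f"has({key}, {attr})" for attr in attributes]


print(to_records(HEADER, ROWS)[0])
print(has_relations(HEADER))
```

Recognizing the value types (dollar amount, percentage, country name) would be a separate step using data frames.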
68
Ontology generation overview
69
Example: Creating a domain ontology
(figure: domain ontology with object sets Geopolitical Entity, Country, City, Location, Latitude, Longitude, Name, Time, and Distances; relationships "Latitude and longitude designates location", "has names", "has GMT", and "Duration between time zones"; annotations "Includes procedural knowledge" and "Has associated data frames")
70
Example: Table understanding to mini-ontology generation
Agglomeration | Population | Continent | Country
Tokyo | 31,139,900 | Asia | Japan
New York-Philadelphia | 30,286,900 | The Americas | United States of America
Mexico | 21,233,900 | The Americas | Mexico
Seoul | 19,969,100 | Asia | Korea (South)
Sao Paulo | 18,847,400 | The Americas | Brazil
Jakarta | 17,891,000 | Asia | Indonesia
Osaka-Kobe-Kyoto | 17,621,500 | Asia | Japan
...
Niigata | 503,500 | Asia | Japan
Raurkela | 503,300 | Asia | India
Homjel | 502,200 | Europe | Belarus
Zunyi | 501,900 | Asia | China
Santiago | 501,800 | The Americas | Dominican Republic
Pingdingshan | 501,500 | Asia | China
Fargona | 501,000 | Asia | Uzbekistan
Kirov | 500,200 | Europe | Russia
Newcastle | 500,000 | Australia/Oceania | Australia
71
Example: Concept matching to ontology merging
(figure: Merge and Results panels; relationship "Has GMT" appears in both ontologies)
72
Ontology merging/growing
  • Direct merge (no conflicts)
  • Use results of matching phase to find similar
    concepts in ontologies (e.g., data value
    similarities, data frames, NLP, etc)
  • Conflict resolution
  • Interactively identify evidence and counter
    evidence of functional relationships among
    mini-ontologies using constraint resolution
  • IDS (Issues, Default strategy, Suggestions):
    interaction with a human knowledge engineer
  • Identify issues
  • Apply a default strategy
  • Make suggestions
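The direct-merge case and the hand-off to the knowledge engineer can be pictured with mini-ontologies encoded as a set of object sets plus relationship sets. This is an illustrative simplification, not the TANGO merge algorithm: concepts match only by identical names (standing in for the data-value, data-frame, and NLP evidence listed above). A relationship whose functional flag differs between the two inputs is reported as a conflict rather than resolved automatically. The class `MiniOntology` and function `merge` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MiniOntology:
    object_sets: set[str] = field(default_factory=set)
    # (subject, relationship name, object) -> True if subject functionally determines object
    relationships: dict[tuple[str, str, str], bool] = field(default_factory=dict)


def merge(a: MiniOntology, b: MiniOntology) -> tuple[MiniOntology, list[tuple[str, str, str]]]:
    """Union two mini-ontologies, matching concepts by identical name only."""
    merged = MiniOntology(a.object_sets | b.object_sets, dict(a.relationships))
    conflicts = []
    for rel, functional in b.relationships.items():
        if rel in merged.relationships and merged.relationships[rel] != functional:
            # Evidence and counter-evidence disagree: leave it for the knowledge engineer.
            conflicts.append(rel)
        else:
            merged.relationships.setdefault(rel, functional)
    return merged, conflicts


cities = MiniOntology({"City", "Country", "Time"},
                      {("City", "is in", "Country"): True, ("City", "has GMT", "Time"): True})
places = MiniOntology({"City", "Latitude", "Time"},
                      {("City", "has", "Latitude"): True, ("City", "has GMT", "Time"): False})
merged, conflicts = merge(cities, places)
print(sorted(merged.object_sets), conflicts)
```

In the IDS style, each entry in `conflicts` would become an issue, keeping the first ontology's value is the default strategy, and a suggestion would be presented to the user.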

73
Example: Another mini-ontology generation
74
Example: Another mini-ontology generation
(figure: mini-ontology merged into the domain ontology; concepts Geopolitical Entity, Location, Latitude, Longitude, Name, Time, City, Agglomeration, Country, Continent, and Population; relationships "Latitude and longitude designates location", "has names", and "has GMT")
75
Example: Concept mapping to ontology merging
(figure: merged domain ontology; concepts Geopolitical Entity, Geopolitical Entity with population, Location, Latitude, Longitude, Name, Time, Population, Country, City/town, Agglomeration, Continent, Place, State, Area, Elevation, USGS Quad, Lake, Mine, and Reservoir; relationships "Latitude and longitude designates location", "has names", and "has GMT")
76
Recognize Table Information

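Country | Population (July 2001 est.) | Albanian Orthodox | Muslim | Roman Catholic | Shia Muslim | Sunni Muslim | other
Afghanistan | 26,813,057 | | | | 15 | 84 | 1
Albania | 3,510,484 | 20 | 70 | 30 | | |
(the last six columns are grouped under a spanning "Religion" header)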
77
Construct Mini-Ontology
78
Discover Mappings
79
Merge
80
Review the TANGO process
  • Start out with normalized table
  • Generate likely candidates for
  • Object Sets
  • Relationship Sets
  • Functional Constraints
  • Inclusion Constraints/Hierarchical Structure
  • Get help from user when needed
  • Choose best candidate for the ontology

81
Generate concepts
Create list of candidate concepts (usually column
names)
82
Example 1: Generate Concepts
Determine lexicalization (columns with associated
values are lexical)
83
Example 1: Generate Concepts
Current ontology
84
Example 1: Generate Relationships
  • Decide relationship sets
  • Exponential number of combinations
  • Basic assumption: one main concept relates to all
    others (attributes)
  • Goal: find the central column of interest

85
Example 1: Generate Relationships
Look for mapping between one column and title of
table
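One way to look for that mapping is to compare each column header with the words of the table title and pick the best-scoring column. The scoring rule (word overlap with a naive plural strip, falling back to the left-most column) and the name `central_column` are assumptions for illustration, not the heuristic the project actually uses. The usage line borrows the Language example that comes up in Example 2.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercase words, with a naive plural strip ("languages" -> "language")."""
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in re.findall(r"[a-z]+", text.lower())}


def central_column(title: str, headers: list[str]) -> int:
    """Index of the header sharing most words with the title; left-most if none."""
    title_words = _words(title)
    scores = [len(_words(h) & title_words) for h in headers]
    best = max(scores, default=0)
    return scores.index(best) if best > 0 else 0


print(central_column("Languages of the World", ["Family", "Group", "Language"]))
```

A real matcher would also compare column values with the title and consult lexicons such as WordNet for synonyms.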
86
Example 1: Generate Relationships
Current ontology
87
Example 1: Generate Constraints
  • FDs and Participation Constraints
  • FD definition: $X \rightarrow Y$ iff $(X_i = X_j) \Rightarrow (Y_i = Y_j)$
    for all row indexes $i$ and $j$.
  • Unless it is a solid case (two or more rows with
    the same value), only consider FDs from the
    central object to attributes
  • Use heuristics for setting exact participation
    constraints (0:1, 1, etc.)
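The FD definition translates directly into a check over the table's rows. In the sketch below, the "solid case" is read as "some X value occurs in two or more rows" (an interpretation), because if every X value is distinct, X → Y holds vacuously for every Y and the check carries no evidence. The function names are hypothetical, and the rows are taken from the agglomeration table shown earlier.

```python
# Rows taken from the agglomeration table (a few columns only).
ROWS = [
    {"Agglomeration": "Tokyo", "Continent": "Asia", "Country": "Japan"},
    {"Agglomeration": "Seoul", "Continent": "Asia", "Country": "Korea (South)"},
    {"Agglomeration": "Sao Paulo", "Continent": "The Americas", "Country": "Brazil"},
    {"Agglomeration": "Osaka-Kobe-Kyoto", "Continent": "Asia", "Country": "Japan"},
]


def holds_fd(rows: list[dict], x: str, y: str) -> bool:
    """X -> Y holds iff any two rows that agree on X also agree on Y."""
    seen = {}
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False
        seen.setdefault(row[x], row[y])
    return True


def is_solid(rows: list[dict], x: str) -> bool:
    """True if some X value repeats, so the FD test is backed by real evidence."""
    values = [row[x] for row in rows]
    return len(set(values)) < len(values)


print(holds_fd(ROWS, "Country", "Continent"), is_solid(ROWS, "Country"))  # True True
print(holds_fd(ROWS, "Continent", "Country"))  # False
```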

88
Example 1: Generate Concepts
Numerical values are usually functionally
determined by column of interest and have 0
participation constraint.
89
Example 1: Generate Constraints
Completed mini-ontology
90
Example 2: Generate Concepts
  • SubFamily, Group, and SubGroup are generic types
  • Enumerate column values as object sets because
    there are fewer than 5 divisions (recursively)

91
Example 2: Generate Relationships
  • Found mapping of central column of interest to
    title (Language)
  • Exceptions to basic assumption:
  • Hierarchy (enumerated object sets)
  • Transitive FDs (given $X \rightarrow Y$ and $Y \rightarrow Z$, remove $X \rightarrow Z$)
  • Create ISA hierarchy from table structure
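The transitive-FD rule can be sketched as a transitive reduction over single-attribute FDs: drop X → Z whenever Z is still reachable from X through other FDs. This assumes the FD graph is acyclic, as a hierarchy is; the FD set in the usage example is hypothetical, loosely following the Language/SubGroup/Group/SubFamily columns.

```python
def transitive_reduction(fds: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Drop X -> Z whenever Z is still reachable from X without that edge."""
    successors: dict[str, set[str]] = {}
    for x, y in fds:
        successors.setdefault(x, set()).add(y)

    def reachable(start: str, skip: tuple[str, str]) -> set[str]:
        # Depth-first search over the FD graph, ignoring the edge being tested.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if (node, nxt) != skip and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {(x, z) for x, z in fds if z not in reachable(x, (x, z))}


FDS = {
    ("Language", "SubGroup"), ("SubGroup", "Group"), ("Group", "SubFamily"),
    ("Language", "Group"), ("Language", "SubFamily"), ("SubGroup", "SubFamily"),
}
print(sorted(transitive_reduction(FDS)))
```

The surviving chain Language → SubGroup → Group → SubFamily is the shape an ISA hierarchy would be built from.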

92
Example 2: Generate Relationships
Current ontology
93
Example 2: Generate Hierarchical Constraints
  • Assign members to each object set for easy
    calculation
  • Find inclusion dependencies
  • Union: all members of a parent are members of one
    or more children
  • Intersection (less common): child members are
    always in both parents
  • Mutual exclusion: the intersection of any two
    children's member sets is empty.
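Once members are assigned to each object set, the three hierarchical constraints reduce to set operations. A minimal sketch with hypothetical function names; the Germanic language groups in the usage lines are illustrative data, not taken from the slide's table.

```python
def is_union(parent: set[str], children: list[set[str]]) -> bool:
    """Union: every member of the parent belongs to at least one child."""
    return parent <= set().union(*children)


def is_mutually_exclusive(children: list[set[str]]) -> bool:
    """Mutual exclusion: no two children share a member."""
    return all(not (a & b) for i, a in enumerate(children) for b in children[i + 1:])


def is_intersection(child: set[str], parent_a: set[str], parent_b: set[str]) -> bool:
    """Intersection: every member of the child is in both parents."""
    return child <= (parent_a & parent_b)


germanic = {"English", "German", "Dutch", "Swedish"}
west, north = {"English", "German", "Dutch"}, {"Swedish"}
print(is_union(germanic, [west, north]), is_mutually_exclusive([west, north]))
```

When both union and mutual exclusion hold, the children form a partition of the parent.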

94
Example 2: Generate Hierarchical Constraints
Completed mini-ontology
95
Future direction
  • Start with multiple tables (or URLs) and generate
    mini-ontologies
  • Identify most suitable mini-ontologies to merge
    by calculating which tables have most overlap of
    concepts
  • Generate multiple domain ontologies
  • Integrate with form-based data extraction tools
    (smarter Web search engines)
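Scoring "most overlap of concepts" could be as simple as a Jaccard similarity over the concept names in each mini-ontology, merging the highest-scoring pair first. Jaccard is a stand-in chosen for illustration, not necessarily the measure the project adopts. Exact name equality also ignores the synonyms that the concept-matching techniques described earlier would catch. The mini-ontology contents below are drawn from table headers in this presentation.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared concepts divided by all concepts (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_pair(ontologies: dict[str, set[str]]) -> tuple[str, str]:
    """Names of the two mini-ontologies with the greatest concept overlap."""
    return max(combinations(sorted(ontologies), 2),
               key=lambda pair: jaccard(ontologies[pair[0]], ontologies[pair[1]]))


minis = {
    "cities": {"Agglomeration", "Population", "Continent", "Country"},
    "countries": {"Country", "Population", "GDP/PPP", "Inflation"},
    "lakes": {"Place", "County", "State", "Elevation"},
}
print(best_pair(minis))
```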