Title: TANGO
1 TANGO
- Table Analysis for Generating Ontologies
- David W. Embley (BYU), George Nagy (RPI)
- Under NSF Awards 0414644 and 0414854, Information and Knowledge Management, Dr. Maria Zemankova
- (a) Table Interpretation
- (b) Query by Table
2 TANGO STEPS
TABLE
Wang Notation Tool
INTERPRETED TABLE
Wang Notation XML
MINI ONTOLOGY
Ontology Editor
GROWING ONTOLOGY
Annotated Semantic Web Pages
Standard Ontology Language (OWL)
Ontology Based Web Services
Form Based Specification
Extraction Ontologies
Relational Databases
Query By Table
3 This presentation
4 (a) Table Interpretation
Confirm or correct
HTML web pages
Extract table
Matlab table
XML table
Construct Wang notation
Wang Notation
Confirm or correct
Mini Ontology
5 Median Income table
http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income
6 Median Income table from Statistics Canada,
displayed in the TANGO Wang Notation Tool
7 Wang Notation
- An abstract table is specified by an ordered pair (C, δ): (category, delta)
- C is a finite set of labeled domains (headers, subheaders of tables, etc.)
- δ maps the labels in C to the corresponding individual values within the table
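As a rough illustration of the ordered pair (C, delta), the sketch below models categories as nested Python dictionaries, with an empty dictionary playing the role of phi, and delta as a dictionary keyed by one label path per category. The names `CATEGORIES`, `DELTA`, `leaf_paths`, and `is_complete` are hypothetical and not part of the TANGO tools; the four values come from the Median Income example used in this presentation.

```python
from itertools import product

# Categories: label -> sub-labels. An empty dict stands in for phi
# (a label with no subcategories). Only a small excerpt is shown.
CATEGORIES = {
    "Region_Virtual": {"Canada": {}, "Newfoundland and Labrador": {}},
    "Year_Virtual": {"2001": {}, "2002": {}},
}

# delta: a set holding one label path per category -> content cell value.
DELTA = {
    frozenset({"Year_Virtual.2001", "Region_Virtual.Canada"}): 53500,
    frozenset({"Year_Virtual.2002", "Region_Virtual.Canada"}): 55000,
    frozenset({"Year_Virtual.2001",
               "Region_Virtual.Newfoundland and Labrador"}): 41400,
    frozenset({"Year_Virtual.2002",
               "Region_Virtual.Newfoundland and Labrador"}): 43200,
}


def leaf_paths(name, subtree):
    """Return dotted paths from a category root to each of its leaf labels."""
    if not subtree:
        return [name]
    paths = []
    for label, children in subtree.items():
        paths.extend(f"{name}.{path}" for path in leaf_paths(label, children))
    return paths


def is_complete(categories, delta):
    """Check that delta has a value for every combination of leaf paths."""
    per_category = [leaf_paths(name, tree) for name, tree in categories.items()]
    return all(frozenset(combo) in delta for combo in product(*per_category))
```

Keying delta by a `frozenset` is one possible design choice: it makes the lookup independent of the order in which the category paths are listed.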
8 Categories
- Two categories in the previous table.
- CATEGORY 1: (Region_Virtual, (Canada,phi), (Newfoundland and Labrador,phi), (Prince Edward Island,phi), (Nova Scotia,phi), (New Brunswick,phi), (Quebec,phi), (Ontario,phi), (Manitoba,phi), (Saskatchewan,phi), (Alberta,phi), (British Columbia,phi), (Yukon Territory,phi), (Northwest Territories,phi), (Nunavut,phi))
- CATEGORY 2: (Year_Virtual, (2001,phi), (2002,phi), (2003,phi), (2004,phi), (2005,phi))
9 Content (leaf) cells
- Delta Notation for two (of 15) rows
- delta(Year_Virtual.2001,Region_Virtual.Canada) = 53,500
- delta(Year_Virtual.2002,Region_Virtual.Canada) = 55,000
- delta(Year_Virtual.2003,Region_Virtual.Canada) = 56,000
- delta(Year_Virtual.2004,Region_Virtual.Canada) = 58,100
- delta(Year_Virtual.2005,Region_Virtual.Canada) = 60,600
- delta(Year_Virtual.2001,Region_Virtual.Newfoundland and Labrador) = 41,400
- delta(Year_Virtual.2002,Region_Virtual.Newfoundland and Labrador) = 43,200
- delta(Year_Virtual.2003,Region_Virtual.Newfoundland and Labrador) = 44,800
- delta(Year_Virtual.2004,Region_Virtual.Newfoundland and Labrador) = 46,100
- delta(Year_Virtual.2005,Region_Virtual.Newfoundland and Labrador) = 47,600
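Entries in delta notation are regular enough to be read mechanically. The following is a minimal sketch, assuming each entry is written as `delta(path,path) = value` with a thousands separator in the value; the function name `parse_delta_line` and the regular expression are illustrative assumptions, not the format handling used by the TANGO tools.

```python
import re

# Assumed textual form: delta(<path>,<path>,...) = <number with commas>
DELTA_LINE = re.compile(r"^delta\((?P<paths>.+)\)\s*=\s*(?P<value>[\d,]+)$")


def parse_delta_line(line):
    """Parse 'delta(A.x,B.y) = 1,234' into (frozenset of paths, int value).

    Labels may contain spaces; labels containing commas are not handled
    by this simple split.
    """
    match = DELTA_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a delta line: {line!r}")
    paths = frozenset(p.strip() for p in match.group("paths").split(","))
    value = int(match.group("value").replace(",", ""))
    return paths, value
```

For example, `parse_delta_line("delta(Year_Virtual.2001,Region_Virtual.Canada) = 53,500")` yields the two category paths as a set together with the integer 53500.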
10 XML Representation
Schema for (1) table, (2) categories, (3) data cells, (4) augmentation
<InterpretedTable xsi:noNamespaceSchemaLocation="G:\RPI\XML\02_TableInterface.XS.070803.xml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Table TableOID="Table2" Number="2" DocumentCitation="Wang's Thesis" Title="Wang table" Caption="Grades in 1991 and 1992">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C12" Label="Newfoundland and Labrador"></CategoryNode>
      <CategoryNode CategoryNodeOID="C13" Label="Prince Edward Island"></CategoryNode>
      <CategoryNode CategoryNodeOID="C14" Label="Nova Scotia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C15" Label="New Brunswick"></CategoryNode>
      <CategoryNode CategoryNodeOID="C16" Label="Quebec"></CategoryNode>
      <CategoryNode CategoryNodeOID="C17" Label="Ontario"></CategoryNode>
      <CategoryNode CategoryNodeOID="C18" Label="Manitoba"></CategoryNode>
      <CategoryNode CategoryNodeOID="C19" Label="Saskatchewan"></CategoryNode>
      <CategoryNode CategoryNodeOID="C110" Label="Alberta"></CategoryNode>
      <CategoryNode CategoryNodeOID="C111" Label="British Columbia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C112" Label="Yukon Territory"></CategoryNode>
      <CategoryNode CategoryNodeOID="C113" Label="Northwest Territories"></CategoryNode>
      <CategoryNode CategoryNodeOID="C114" Label="Nunavut"></CategoryNode>
      <CategoryNode CategoryNodeOID="C2" Label="Year (Virtual)"></CategoryNode>
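A file in this XML form can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a shortened sample; the closing tags in `SAMPLE` are added here so the fragment is well-formed, the schema-location attribute is omitted, and the helper name `category_labels` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed sample modeled on the excerpt above.
# The closing tags are an assumption; the slide shows only the opening part.
SAMPLE = """
<InterpretedTable>
  <Table TableOID="Table2" Number="2">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C12" Label="Newfoundland and Labrador"></CategoryNode>
      <CategoryNode CategoryNodeOID="C2" Label="Year (Virtual)"></CategoryNode>
    </CategoryNodes>
  </Table>
</InterpretedTable>
"""


def category_labels(xml_text):
    """Map each CategoryNodeOID to its Label, in document order."""
    root = ET.fromstring(xml_text.strip())
    return {
        node.get("CategoryNodeOID"): node.get("Label")
        for node in root.iter("CategoryNode")
    }


if __name__ == "__main__":
    for oid, label in category_labels(SAMPLE).items():
        print(oid, label)
```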
11 Verification tool: category headers for a
selected content cell
12 Verification tool: content cells for a selected
header
13 Verification tool: hierarchical category
structure for a selected content cell
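The two lookups that the verification tool displays can be sketched on a small grid. This is only an illustration under stated assumptions: regions are taken to be row headers and years column headers, and the names `ROW_HEADERS`, `COL_HEADERS`, `VALUES`, `headers_for_cell`, and `cells_for_header` are hypothetical rather than taken from the tool.

```python
# A 2 x 2 excerpt of the Median Income table (assumed orientation:
# regions as rows, years as columns).
ROW_HEADERS = ["Canada", "Newfoundland and Labrador"]
COL_HEADERS = ["2001", "2002"]
VALUES = [
    [53500, 55000],
    [41400, 43200],
]


def headers_for_cell(row, col):
    """Return the (row header, column header) pair that indexes one cell."""
    return ROW_HEADERS[row], COL_HEADERS[col]


def cells_for_header(label):
    """Return (row, col, value) for every content cell under a header."""
    if label in ROW_HEADERS:
        row = ROW_HEADERS.index(label)
        return [(row, col, VALUES[row][col]) for col in range(len(COL_HEADERS))]
    if label in COL_HEADERS:
        col = COL_HEADERS.index(label)
        return [(row, col, VALUES[row][col]) for row in range(len(ROW_HEADERS))]
    return []
```

A person checking the interpretation would compare these answers against the highlighted cells in the displayed table and confirm or correct them.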
14 (b) Query by Table
Income 2002 4500 2003 3300 2004 1240 2005 3400
Income 2002 2003 2004 2005
QBT
Interpret Query Table
Database
Ontology from many tables
15 Query Table
Composed in MS-Excel by a person seeking
information from an ontology compiled from many
web tables
16 Display of automatically processed Query Table
for human verification
17 Wang notation for Query Table
18 QBT identifies requested data
19 URLs of tables in the Example Database
- Median Total Income: http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income
- Number of Induced Abortions: http://www40.statcan.ca/l01/cst01/health40a.htm?sdi=abortions
- Number of Divorces: http://www40.statcan.ca/l01/cst01/famil02.htm?sdi=number%20divorces
- Infant Mortality Rate: http://www40.statcan.ca/l01/cst01/health21a.htm?sdi=infant%20mortality%20rate
- Trips By Canadians in Canada: http://www40.statcan.ca/l01/cst01/arts26a.htm
- Number of Homicides: http://www40.statcan.ca/l01/cst01/legal12a.htm?sdi=homicide
- Population: http://www40.statcan.ca/l01/cst01/demo02a.htm?sdi=population
- Number of Persons with Diabetes: http://www40.statcan.ca/l01/cst01/health54a.htm?sdi=diabetes
- Number of Persons with Asthma: http://www40.statcan.ca/l01/cst01/health50a.htm?sdi=asthma
- University Degrees Awarded to Males: http://www40.statcan.ca/l01/cst01/educ51b.htm
- University Degrees Awarded to Females: http://www40.statcan.ca/l01/cst01/educ51c.htm
- Food services and drinking places (13 tables): http://www40.statcan.ca/l01/cst01/serv24j
20 Fields in the Example Database
- IDENTIFIER
- REGION
- YEAR
- NUMBER_OF_ABORTIONS
- ABORTION_RATE
- NUMBER_OF_DIVORCES
- INFANT_MORTALITY_RATE
- NUMBER_OF_TRIPS
- MEDIAN_TOTAL_INCOME
- POPULATION
- NUMBER_OF_HOMICIDES
- GENDER
- INCIDENCE_OF_DIABETES
- UNIVERSITY_DEGREES_AWARDED
- INCIDENCE_OF_ASTHMA
- RESTAURANT_OPERATING_REVENUE
- RESTAURANT_OPERATING_EXPENSES
- RESTAURANT_OPERATING_PROFIT_MARGIN
- RESTAURANT_OPERATING_WAGES
21 QBT fills in requested data from Example Database
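The fill-in step can be pictured as one database lookup per empty cell of the query table. The sketch below is a hedged illustration using an in-memory SQLite database: the table name `facts`, the helper names, and the restriction to three fields are assumptions made for the example, and the values are the Canada figures from the Median Income table. It is not the QBT implementation.

```python
import sqlite3

# Only fields on this allow-list may be interpolated into the SQL text.
ALLOWED_FIELDS = {"MEDIAN_TOTAL_INCOME"}


def build_example_db():
    """Create a tiny stand-in for the Example Database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facts (REGION TEXT, YEAR INTEGER, MEDIAN_TOTAL_INCOME INTEGER)"
    )
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [
            ("Canada", 2002, 55000),
            ("Canada", 2003, 56000),
            ("Canada", 2004, 58100),
            ("Canada", 2005, 60600),
        ],
    )
    return conn


def fill_query_table(conn, field, region, years):
    """Return {year: value} for the requested field; None where no data exists."""
    if field not in ALLOWED_FIELDS:
        raise ValueError(f"unknown field: {field}")
    filled = {}
    for year in years:
        row = conn.execute(
            f"SELECT {field} FROM facts WHERE REGION = ? AND YEAR = ?",
            (region, year),
        ).fetchone()
        filled[year] = row[0] if row else None
    return filled
```

Calling `fill_query_table(build_example_db(), "MEDIAN_TOTAL_INCOME", "Canada", [2002, 2003, 2004, 2005])` returns the four income values keyed by year, which is the information needed to populate a query table with year headers and an income label.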
22 A current puzzle
Year Region Gender Diabetics
2002 Alberta Male XX
2002 Alberta Female XX
2002 Ontario Male XX
2002 Ontario Female XX
Year Region Diabetics Diabetics
Year Region Male Female
2002 Alberta XX XX
2002 Ontario XX XX
- How can QBT tell that these two query tables represent the same request?
- NB: Although plausible, both of these tables exemplify poor layout.
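One way to frame the comparison, offered as a sketch and not as the authors' answer to the puzzle, is to reduce each query table to the set of cells it asks for and compare the sets. The code assumes the hard part is already done: the labels Male and Female have been recognized as values of a Gender category in both layouts. All function and variable names are hypothetical.

```python
def requests_from_long(rows, measure):
    """Long layout: one row per (year, region, gender), one blank measure column."""
    return {
        frozenset({("Year", year), ("Region", region),
                   ("Gender", gender), ("Measure", measure)})
        for year, region, gender in rows
    }


def requests_from_wide(rows, gender_columns, measure):
    """Wide layout: one row per (year, region), one blank column per gender."""
    return {
        frozenset({("Year", year), ("Region", region),
                   ("Gender", gender), ("Measure", measure)})
        for year, region in rows
        for gender in gender_columns
    }


LONG_ROWS = [
    ("2002", "Alberta", "Male"),
    ("2002", "Alberta", "Female"),
    ("2002", "Ontario", "Male"),
    ("2002", "Ontario", "Female"),
]
WIDE_ROWS = [("2002", "Alberta"), ("2002", "Ontario")]

long_requests = requests_from_long(LONG_ROWS, "Diabetics")
wide_requests = requests_from_wide(WIDE_ROWS, ["Male", "Female"], "Diabetics")
same_request = long_requests == wide_requests
```

Under these assumptions both layouts reduce to the same four requested cells, so the remaining difficulty lies in interpreting each layout correctly, not in the comparison itself.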
23 Next steps
- Complete the conversion of Wang/XML table descriptions to mini ontologies
- Improve the interface for generating cumulative ontology from mini ontologies
- Implement database generation from ontology
- Embed logging routines for statistical evaluation of time/error trade-offs