Semantic web role and its method: Domain ontology

Department of Information Management, Chaoyang University of Technology

1
Semantic web role and its method Domain ontology

Rung Ching Chen

5
Outline

Introduction
Literature reviews
Ontology application
Ontology construction
Experimental results
Conclusions and current research

6
Introduction

Introduction
Background
Motivation
Objective
Literature reviews
Ontology construction
Experimental results
Conclusions and future works

7
Background (1/2)

The content of web sites changes rapidly and grows very fast.
Understanding what users are asking for and finding related web pages on the Internet are very important.
Yahoo vs. Google

8
Background (2/2)

The main drawback of current search engines is that they can't read the real semantics of web page content. They don't use domain-specific knowledge for web page analysis.
The concept of the Semantic Web has been proposed recently.

9
Motivation

Semantic web and ontology
The construction of a successful semantic web depends on whether the ontology can be constructed rapidly and easily.
In most research on ontology construction, the ontology is defined by domain experts. It is difficult to modify the concepts of an existing domain ontology for a semantic web.

10
Objective

A large number of ontology representation methods have been proposed.
We use the hierarchical tree structure to represent the domain ontology because it is the most general one.
Methods of constructing an ontology
Manual construction
Semi-automatic construction
Fully automatic construction

11
Literature reviews

Introduction
Literature reviews
Semantic web
Ontology
Information classification model
Singular value decomposition
Adaptive resonance theory network
Ontology construction
Experimental results
Conclusions and future works

12
Semantic web (1/2)

Drawbacks of the existing web
Information is presented in documents.
The existing web cannot process or extract the information that people actually need.
The semantic web is an extension of the existing web structure
Provides a new foundation for data description.
Promotes the automatic development of network services.
Makes the information understandable to machines.

13
Semantic web (2/2)

Builds the high-level languages on low-level
languages progressively.
Offers the information that the computer can read
without revising the existing webpage content.

14
Ontology (1/4)

The W3C has defined ontology as knowledge for
describing and expressing various domains using
concepts, definitions, and relations.
Ontology usually appears in the form of semantic
web.
A node represents a concept or an individual
entity on the semantic web.

15
Ontology (2/4)

Gruber's definition: An ontology is a formal, explicit specification of a shared conceptualization.
Conceptualization: an abstract model of a phenomenon in the domain, built from the relevant concepts of that phenomenon.
Shared: the ontology is shared by a group, not an individual.
Formal: the ontology can be read and understood by computers.
Explicit: the concepts and constraints of the ontology are expressed clearly.

16
Ontology (3/4)

Gruber held that the elements of an ontology include:
Concept: A concept can represent anything in the real world. Concepts are usually organized as a tree structure in an ontology.
Relation: A relation is a connection of a certain type between concepts.
Function: A function is a special case of a relation.
Axiom: An axiom is used to model a fact.
Instance: An instance is a concrete occurrence of a concept.

17
Ontology (4/4)

The ontology language is extended from XML (Extensible Markup Language) syntax.
The W3C is responsible for formulating and updating it.

18
Domain Ontology Applications

Grigoris Antoniou
Frank van Harmelen

20

Horizontal Information Products at Elsevier
Data Integration at Audi
Skill Finding at Swiss Life
Think Tank Portal at EnerSearch
E-Learning
Web Services
Other Scenarios

21
Elsevier: The Setting

Elsevier is a leading scientific publisher.
Its products are organized mainly along traditional lines
Subscriptions to journals
Online availability of these journals has until now not really changed the organisation of the product line
Customers of Elsevier can take subscriptions to online content

22
Elsevier: The Problem

Traditional journals are vertical products
Division into separate sciences covered by
distinct journals is no longer satisfactory
Customers of Elsevier are interested in covering
certain topic areas that spread across the
traditional disciplines/journals
The demand is rather for horizontal products

23
Elsevier: The Problem (2)

Currently, it is difficult for large publishers
to offer such horizontal products
Barriers of physical and syntactic heterogeneity
can be solved (with XML)
The semantic problem remains unsolved
We need a way to search the journals on a
coherent set of concepts against which all of
these journals are indexed

24
Elsevier: The Contribution of Semantic Web Technology

Ontologies and thesauri (very lightweight
ontologies) have proved to be a key technology
for effective information access
They help to overcome some of the problems of
free-text search
They relate and group relevant terms in a
specific domain
They provide a controlled vocabulary for indexing
information

25
Elsevier: The Contribution of Semantic Web Technology (2)

A number of thesauri have been developed in different domains of expertise
Medical information: MeSH and Elsevier's life science thesaurus EMTREE
RDF is used as an interoperability format between heterogeneous data sources
EMTREE is itself represented in RDF

26
Elsevier: The Contribution of Semantic Web Technology (3)

Each of the separate data sources is mapped onto
this unifying ontology
The ontology is then used as the single point of
entry for all of these data sources

27
Ontology construction
28
Information classification model

There are three traditional information
classification models
Vector space model
Probabilistic model
Boolean model

29
Vector space and probabilistic model

Vector space model
Each element of a document vector represents the number of times a keyword appears in the document. The cosine similarity method is used to find related web pages.
Probabilistic model
This model uses a probabilistic approach to
evaluate the relationships among web pages and to
judge whether they are related.
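
As an illustration of the vector space model, the following minimal Python sketch computes the cosine similarity between keyword-count vectors. The vocabulary and the counts are hypothetical and do not come from the slides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length keyword-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a page with no keywords is treated as unrelated
    return dot / (norm_a * norm_b)


# Hypothetical keyword counts over the vocabulary [guitar, piano, beer].
page_a = [3, 1, 0]
page_b = [6, 2, 0]   # same keyword proportions as page_a
page_c = [0, 0, 4]   # shares no keyword with page_a

print(cosine_similarity(page_a, page_b))
print(cosine_similarity(page_a, page_c))
```

Pages with the same keyword proportions score close to 1 regardless of length, while pages with no keyword in common score 0.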

30
Boolean model

It is the simplest classification method and is based on set theory and Boolean algebra. The Boolean model distinguishes three relations: inheritance, intersection, and independence.

31
Singular Value Decomposition (1/2)

Rows represent documents and columns represent keywords.
Each element indicates whether a keyword appears in a document.

32
Singular Value Decomposition (2/2)

Latent Semantic Analysis (LSA) projects documents and keywords into a low-dimensional space.
Singular Value Decomposition (SVD) is used to remove unnecessary information.
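
For reference, the standard form of the decomposition can be written as follows. The symbols are assumptions introduced here, since the slides give no formula: $A$ is the document-by-keyword matrix, $\Sigma$ holds its singular values in decreasing order, and $k$ is the number of dimensions kept.

```latex
A = U \Sigma V^{\mathsf{T}}, \qquad
A \approx A_k = U_k \Sigma_k V_k^{\mathsf{T}}, \quad k < \operatorname{rank}(A)
```

Keeping only the $k$ largest singular values discards the directions that carry the least information, which is the sense in which SVD removes unnecessary information.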

33
Adaptive resonance theory network (1/3)

ART network is an unsupervised learning network
Principle
The theory of ART grew from the theory of
cognition.
It is similar to a human neural system. Not only
does it learn new examples, but also preserves
old memories.

34
Adaptive resonance theory network (2/3)

Characteristic
It has the features of both stability and plasticity.
To resolve the stability-plasticity dilemma, the ART network adjusts the vigilance value.
Advantage
Learning is fast.
Memory consumption is small.
The number of groups does not have to be set in advance.

35
Adaptive resonance theory network (3/3)

The structure of the ART network
Input layer: The input data are the training samples.
Output layer: This presents the results of the trained network.
Weight connections: These connect the input layer and the output layer.

36
Ontology construction

Introduction
Literature reviews
Ontology construction
Analyzing web pages
Finding the TF-IDF values of terms
Reducing the matrix and converting elements to binary data
Using a recursive ART network to cluster the web
pages
Applying a Boolean model to construct an ontology
Representing the ontology using a Jena package
Experimental results
Conclusions and future works

37
Ontology construction
[Flowchart of the construction process. Box labels: WWW; web page analysis; stop-word removal; finding TF-IDF; SVD operation; ART network for clustering; check whether the low-document condition is satisfied; use TF-IDF to find the concept of each group; Boolean method; construct relations; create ontology; produce RDF ontology.]
38
Analyzing web pages (1/2)

After collecting web pages, the system removes stop words.
Removing stop words avoids misjudgments caused by unimportant words that appear with high frequency.

39
Analyzing web pages (2/2)

Most web pages are written in HTML. HTML uses opening and closing tags to mark up the elements of a web page.
$T_{ij} = n_{ij} \times W_m$
$T_{ij}$ is the weight of concept $C_j$ in web page $d_i$.
$n_{ij}$ is the frequency with which concept $C_j$ appears under each tag.
$W_m$ is the weight of tag $m$.
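
The tag weighting can be sketched in Python as follows. This is only an illustration under stated assumptions: the tag names and weight values are hypothetical (the slides do not list them), and summing the weighted counts over all tags is an assumed reading of the formula.

```python
# Hypothetical tag weights W_m; the slides do not give the actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}


def weighted_frequency(counts_by_tag, tag_weights=TAG_WEIGHTS):
    """Sum n_ij * W_m over the tags m under which a concept occurs.

    counts_by_tag maps a tag name to the number of times the concept
    appears under that tag in one web page. Unknown tags get weight 1.0.
    """
    return sum(n * tag_weights.get(tag, 1.0) for tag, n in counts_by_tag.items())


# The concept "guitar" appears once in the title and four times in the body.
t_guitar = weighted_frequency({"title": 1, "body": 4})
print(t_guitar)
```

With these example weights, one occurrence in the title counts as much as three occurrences in the body text.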

40
TF-IDF

Our research uses the product of TF and IDF to represent the importance of a keyword in a document.
$TF_{i,j}$ is the relative frequency of keyword $i$ in document $j$ after the tag-weight operation.
$IDF_i$ is the inverse document frequency of term $i$, that is, the reciprocal of the frequency with which term $i$ appears across all documents.
$N$ is the total number of documents.
$n_i$ is the number of the $N$ documents in which term $i$ appears.
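
A minimal Python sketch of the TF-IDF product follows. The exact formula on the slide was not preserved in this transcript, so the common logarithmic form $IDF_i = \log(N / n_i)$ is used here as an assumption, and the input values are hypothetical.

```python
import math


def tf_idf(weighted_tf, doc_freq, n_docs):
    """TF-IDF weight of one term in one document.

    weighted_tf: tag-weighted frequency of the term in the document (TF_ij)
    doc_freq:    number of documents that contain the term (n_i)
    n_docs:      total number of documents (N)
    """
    idf = math.log(n_docs / doc_freq)  # assumed logarithmic IDF
    return weighted_tf * idf


# A term found in every document carries no weight; a rare term scores higher.
common = tf_idf(5.0, doc_freq=100, n_docs=100)
rare = tf_idf(5.0, doc_freq=10, n_docs=100)
print(common, rare)
```

The two calls show why IDF is useful: with the same in-document frequency, the term that appears in fewer documents is ranked as more important.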

41
Reducing the matrix and converting elements to binary data

We list the keywords and the web page documents to form a binary matrix.
If a keyword appears in a document, the element is set to 1; if not, it is set to 0. The SVD operation is used to reduce the large matrix to a small one.
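
Building the binary matrix can be sketched in Python as follows. The documents and keywords are hypothetical, and each document is assumed to be already tokenized with stop words removed.

```python
def binary_matrix(documents, keywords):
    """Rows are documents, columns are keywords; 1 if the keyword occurs."""
    rows = []
    for doc in documents:
        present = set(doc)  # tokens of this document
        rows.append([1 if kw in present else 0 for kw in keywords])
    return rows


# Hypothetical tokenized web pages and keyword list.
docs = [
    ["guitar", "string"],
    ["piano"],
    ["guitar", "piano", "drum"],
]
keywords = ["guitar", "piano", "drum"]

matrix = binary_matrix(docs, keywords)
for row in matrix:
    print(row)
```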

42
Using the recursive ART network to cluster the
web pages

We propose a recursive ART network algorithm to
produce a tree structure
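
The idea of growing a tree by re-clustering can be sketched in Python as below. This is a simplified ART1-style pass over binary vectors, not the authors' algorithm, whose details are not given in this transcript. The choice function, its constant 0.5, the fast-learning update, and the rule of raising the vigilance by a fixed step at each level are all assumptions made for illustration.

```python
def art_cluster(vectors, vigilance):
    """One simplified ART1-style pass over binary vectors.

    Returns a list of clusters, each a list of indices into `vectors`.
    """
    prototypes, clusters = [], []
    for idx, vec in enumerate(vectors):
        ones = sum(vec)
        best, best_score = None, -1.0
        for c, proto in enumerate(prototypes):
            overlap = sum(a & b for a, b in zip(vec, proto))
            # Vigilance test: share of the input matched by the prototype.
            if ones and overlap / ones >= vigilance:
                score = overlap / (0.5 + sum(proto))  # choice function
                if score > best_score:
                    best, best_score = c, score
        if best is None:
            prototypes.append(list(vec))  # open a new category
            clusters.append([idx])
        else:
            # Fast learning: the prototype keeps only the shared features.
            prototypes[best] = [a & b for a, b in zip(vec, prototypes[best])]
            clusters[best].append(idx)
    return clusters


def recursive_art(vectors, indices=None, vigilance=0.1, step=0.1, max_vigilance=0.9):
    """Re-cluster each group with a higher vigilance to grow a tree."""
    if indices is None:
        indices = list(range(len(vectors)))
    node = {"docs": indices, "children": []}
    if len(indices) <= 1 or vigilance > max_vigilance + 1e-9:
        return node
    groups = art_cluster([vectors[i] for i in indices], vigilance)
    if len(groups) == 1:
        # No split at this vigilance: try again with a stricter value.
        return recursive_art(vectors, indices, vigilance + step, step, max_vigilance)
    for group in groups:
        child = [indices[k] for k in group]
        node["children"].append(
            recursive_art(vectors, child, vigilance + step, step, max_vigilance)
        )
    return node


# Hypothetical binary document vectors (one row per web page).
vectors = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
tree = recursive_art(vectors)
print(tree)
```

A low vigilance yields a few coarse groups near the root, and each increase splits a group further, so the nesting of groups forms the hierarchy.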

43
Recursive ART
44
Recursive ART

45
Applying Boolean operation

The Boolean model is used to determine and construct the relations between different concepts.
For example, imagine ten documents involving four concepts: transports, fly, boat, and airplane.
Documents containing transports: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
Documents containing fly: 2, 3, 6, 7, 9, 10.
Documents containing boat: 1, 4, 5, 8.
Documents containing airplane: 6, 7, 10.
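
The example above can be checked with Python sets. The mapping of set relations to the three Boolean relations is an assumption made for this sketch: a proper subset is read as inheritance, a partial overlap as intersection, and disjoint sets as independence.

```python
# Document sets taken from the example on the slide.
docs = {
    "transports": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
    "fly": {2, 3, 6, 7, 9, 10},
    "boat": {1, 4, 5, 8},
    "airplane": {6, 7, 10},
}


def relation(a, b):
    """Classify the relation between two concepts from their document sets."""
    if a < b or b < a:  # one set is a proper subset of the other
        return "inheritance"
    if a & b:           # the sets overlap without containment
        return "intersection"
    return "independence"


print(relation(docs["fly"], docs["transports"]))   # fly is under transports
print(relation(docs["airplane"], docs["fly"]))     # airplane is under fly
print(relation(docs["boat"], docs["fly"]))         # boat and fly are unrelated
```

Under this reading, fly and boat both sit below transports, airplane sits below fly, and boat is independent of fly and airplane, which gives a small three-level hierarchy.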

46
Generating ontology through the Jena package (1/3)

The Resource Description Framework (RDF) is a framework developed by the W3C and metadata groups.
It can carry multiple kinds of metadata as they are exchanged over the Internet.
RDF provides interoperability between applications that exchange machine-understandable information on the web.

47
Generating ontology through the Jena package (2/3)

Describes web resource data
Resource: anything that has a URI
Description: describes the properties of a resource
Three main elements
Subject
Predicate
Object

48
Generating ontology through the Jena package (3/3)

A given statement may be represented as an RDF graph, where the URI is a web resource and author is a property with the value John.
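
To make the subject, predicate, and object concrete, the sketch below writes such a statement as one line of N-Triples in plain Python. It does not use the Jena package named in the slides (Jena is a Java library), and both URIs are hypothetical placeholders because the slide's actual URI is not preserved in this transcript.

```python
def to_ntriples(subject_uri, predicate_uri, literal):
    """Serialize one (subject, predicate, object) statement as N-Triples."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{predicate_uri}> "{escaped}" .'


# Hypothetical URIs: a web page as the subject and an "author" property.
line = to_ntriples(
    "http://example.org/page",
    "http://example.org/terms/author",
    "John",
)
print(line)
```

The resource is the subject, the author property is the predicate, and the literal value John is the object.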

49
Experiments
50
Experimental results

Experiment environment
Pentium 4, 2.4 GHz
512 MB RAM
Java programming language
RDF ontology language

51
Experimental results

Introduction
Literature reviews
Ontology construction
Experimental results
First stage experiment
Second stage experiment
Conclusions and future works

52
First stage experiment

We selected a musical instrument ontology constructed by an expert for the semi-automatic experiment.
We use the keywords of the existing domain ontology to produce a new ontology with our method.
After the new ontology has been created, we compare it with the expert ontology to demonstrate the precision of our method.

53
Data (1/2)

Ontology
http://www.db-net.aueb.gr/thesus/onto/instrum.rdf
52 concepts
"has" and "sub-class" relations
Data
Web pages collected from the Music/Instruments/ domain.
There are 36 categories in that domain.
518 web pages.

54
Data (2/2)
Category     Number  Category  Number  Category   Number  Category   Number
Instrument   15      Lute      5       Gong       2       Woodwind   2
Synthesizer  5       Bass      32      Accordion  44      Bassoon    8
Stringed     3       Cello     9       Brass      17      Clarinet   12
Percussion   9       Viola     5       Horn       14      Flute      13
Wind         6       Violin    20      Saxophone  25      Oboe       12
Banjo        26      Mandolin  24      Trombone   11      Panpipes   3
Guitar       24      Piano     19      Trumpet    29      Piccolo    5
Harp         20      Bell      3       Tuba       6       Recorder   26
Harpsichord  14      Drums     33      Harmonium  6       Harmonica  14
55
Mark matrix

After the web pages are analyzed, a matrix is built in which the columns denote keywords and the rows represent web documents. If a keyword is found in a web document, the element is set to 1; otherwise it is set to 0.

56
Recursive ART (1/2)

The recursive ART network will check whether the
output values are greater than the vigilance. We
test the vigilance step-by-step from 0.1 to 0.9
with an increment of 0.1.

57
Recursive ART (2/2)

Clustering with the ART network results in 78 groups.
We calculated the TF-IDF values of the keywords in each group and used the keyword with the highest value to represent the group.
Each group generates one representative keyword. After identical representative keywords from different groups are deleted, only 40 keywords remain.

58
Group  Key-term     Group  Key-term
1      Drum         21     Trumpet
2      Pinched      22     Viola
3      Bass         23     Tuba
4      Harp         24     Clarinet
5      Mandolin     25     String
6      Piccolo      26     Wind
7      Harmonica    27     Trombone
8      Piano        28     Flute
9      Harpsichord  29     Woodwind
10     Violin       30     Bell
11     Guitar       31     Brass
12     Cymbal       32     Recorder
13     Accordion    33     Gong
14     Oboe         34     Panpipes
15     Cello        35     Battery
16     Lyre         36     Tambourine
17     Instrument   37     Triangle
18     Percussion   38     Harmonium
19     Synthesizer  39     Bassoon
20     Saxophone    40     Banjo
59
Output ontology

We obtain a 5-level ontology from the 40 candidate nodes by Boolean logic level operations.

60
Evaluation (1/5)

After producing the ontology, we compared it with the expert-defined ontology.
Precision and recall rates are then used to evaluate our ontology.
To estimate the precision of the system, we defined two kinds of precision measures.

61
Evaluation (2/5)

Concept precision demonstrates the precision of the keywords the system selects.
Concept_location precision not only demonstrates the precision of the selected keywords but also shows the precision of their location in the hierarchy relations.
With A to E denoting the counts in the evaluation table:
Precision (C_P): $\frac{A}{A+B}$
Precision (C_L_P): $\frac{C}{C+D}$
Recall (R): $\frac{A}{A+E}$

62
Evaluation (3/5)
                                  Expert-defined  Not defined  Expert-defined,  Expert-defined,
                                  concepts        by expert    right location   location in error
Keywords generated by system      A               B            C                D
Keywords not generated by system  E
(Columns: expert concepts; rows: system keywords)
63
Experts defined ontology

The ontology of the musical instrument domain
generated by the experts.

64
Evaluation (4/5)
                                  Expert-defined  Not defined  Expert-defined,  Expert-defined,
                                  concepts        by expert    right location   location in error
Keywords generated by system      40              0            29               11
Keywords not generated by system  12
(Columns: expert concepts; rows: system keywords)
65
Evaluation (5/5)

Compared with the ontology defined by an expert, the experimental results for our proposed method are:
Precision (C_P): 100% concept precision.
Precision (C_L_P): 73% concept hierarchy precision.
Recall: 77%.
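
The reported figures can be reproduced from the counts in the evaluation table with a few lines of Python. The three formulas are inferred from those counts and the reported percentages, since the original formula images are not preserved in this transcript, so they should be read as an assumption.

```python
def evaluate(a, b, c, d, e):
    """Return (concept precision, concept-location precision, recall).

    a: system keywords that are expert-defined concepts
    b: system keywords not defined by the expert
    c: expert-defined keywords placed at the right location
    d: expert-defined keywords placed at a wrong location
    e: expert concepts the system did not generate
    """
    concept_precision = a / (a + b)
    location_precision = c / (c + d)
    recall = a / (a + e)
    return concept_precision, location_precision, recall


# Counts from the first-stage evaluation table.
c_p, c_l_p, r = evaluate(a=40, b=0, c=29, d=11, e=12)
print(f"C_P = {c_p:.3f}, C_L_P = {c_l_p:.3f}, R = {r:.3f}")
```

These come out to 40/40, 29/40 and 40/52, which agree with the reported 100%, 73% (72.5% before rounding) and 77%.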

66
Second stage experiment (1/2)

We selected the beer domain and collected web pages from the Internet. There are 18 catalogues and 212 web pages.

Catalogue  Number of web pages  Catalogue     Number of web pages
ale        26                   pilsner       7
beer       36                   microbrewery  4
bitter     6                    hop           23
brewery    26                   festival      10
lager      14                   bock          5
liquid     2                    bitter        6
yeast      6                    ingredient    11
stout      11                   organization  5
porter     7                    award         7
67
Second stage experiment (2/2)

The system selected 1,688 noun terms from the 6,914 input terms. It then kept the terms with the higher TF-IDF values to obtain useful keywords from the 1,688 terms.
We also constructed a matrix in which the columns denote ontology keywords and the rows represent web documents.
If a keyword is found in a web document, the element is set to 1; otherwise, it is set to 0.

68
Keyword       TF-IDF value  Keyword         TF-IDF value
ale           0.91          fermentation    0.617
association   0.89          grist           0.61
award         0.88          kraeusen        0.61
beer          0.872         mash            0.61
bitter        0.81          maltose         0.6
bock          0.81          pasteurization  0.6
brewery       0.81          wort            0.6
festival      0.80          cask            0.6
hop           0.77          firkin          0.59
ingredient    0.72          exchanger       0.58
lager         0.71          adjunct         0.58
liquid        0.70          dme             0.57
malt          0.70          hops            0.57
microbrewery  0.698         malt            0.57
organization  0.698         yeast           0.56
pilsner       0.69          alcoholic       0.56
porter        0.69          aroma           0.56
shout         0.68          astringent      0.56
yeast         0.66          bitter          0.55
dope          0.66          diacetyl        0.55
69
dunker        0.66          esters          0.55
farmhouse     0.66          grainy          0.52
hefeweizen    0.66          happyhours      0.51
helles        0.658         skunked         0.5
kolsch        0.65          oxidation       0.5
lager         0.65          phenolic        0.5
lambic        0.64          yeasty          0.49
maibock       0.64          brewpub         0.483
marzen        0.63          camre           0.47
mead          0.62          breweriana      0.47
mild          0.62          rauchbier       0.46
munchener     0.62          saison          0.4
pilsener      0.51          steinbier       0.4
pilsner       0.51          stout           0.4
pils          0.51          vienna          0.4
porter        0.51
70
Recursive ART (1/2)

The recursive ART network will check whether the output values are greater than the vigilance. We test the vigilance step-by-step from 0.1 to 0.9 with an increment of 0.1.

71
Recursive ART (2/2)

The clustering performed by recursive ART network
yields 29 groups.

Group  Documents  Group  Documents
1      26         16     8
2      17         17     9
3      22         18     8
4      23         19     2
5      23         20     6
6      9          21     6
7      2          22     7
8      17         23     8
9      8          24     6
10     6          25     8
11     7          26     9
12     12         27     8
13     6          28     4
14     4          29     4
15     8
72
Output ontology

In this manner, each group generates one representative keyword. After identical representative keywords from different groups are deleted, only 13 keywords remain. Boolean logic is used to calculate the relationships between levels of concepts.

73
Evaluation (1/2)

After producing the ontology, its precision must be evaluated. However, there was no other ontology to compare it with, so we invited domain experts to evaluate its precision.

                                   Identifies  Does not identify  Identifies the term,  Identifies the term,
                                   the term    the term           location is right     location is in error
The system generates the concepts  A           B                  C                     D
(Columns: user view of the terms; rows: system terms)
74
Evaluation (2/2)

Precision (C_P): $\frac{A}{A+B}$
Precision (C_L_P): $\frac{C}{C+D}$
The average Precision (C_P) from the domain experts' evaluations is 0.794 (about 79%), and the average Precision (C_L_P) is 0.742 (about 74%).

75
RDF format

Finally, we used the W3C standard ontology web language to record the ontology, and output the results in RDF format using the Jena package.

76
Conclusions (1/2)

An ontology can help users learn and search for related information effectively. Constructing an ontology quickly and correctly has become an important topic for content-based search on the Internet.
Our proposed method requires less time to select keywords and defines the relations automatically, with limited human intervention.

77
Conclusions (2/2)

The proposed method facilitates users' understanding of the content of data and its relevance, and is able to suggest content that is highly relevant.
In the future, we will focus on investigating a better method for finding multiple relations among terms, and on extending the system's abilities to cover a multi-field ontology as the foundation for robust and accurate ontology construction.

78
Current Research