Title: Semantic web role and its method: Domain ontology
1Semantic web role and its method Domain ontology
5Outline
- Introduction
- Literature reviews
- Ontology application
- Ontology construction
- Experimental results
- Conclusions and current research
6Introduction
- Introduction
- Background
- Motivation
- Objective
- Literature reviews
- Ontology construction
- Experimental results
- Conclusions and future works
7Background (1/2)
- The content of web sites changes rapidly and grows very fast.
- Understanding querists' needs and finding related web pages on the Internet are very important.
- Yahoo vs. Google
8Background (2/2)
- The main drawback of current search engines is that they can't read the real semantics of the web page content. They don't use domain-specific knowledge for web page analysis.
- The concept of the Semantic Web has been proposed recently.
9Motivation
- Semantic web and ontology
- The construction of a successful semantic web depends on whether the ontology can be constructed rapidly and easily.
- Most research on ontology construction relies on domain experts. It is difficult to modify the concepts of an existing domain ontology for a semantic web.
10Objective
- A large number of ontology representation methods have been proposed.
- We use the hierarchical tree structure to represent the domain ontology because it is the most general one.
- Methods of constructing an ontology
- Manual construction
- Semi-automatic construction
- Fully automatic construction
11Literature reviews
- Introduction
- Literature reviews
- Semantic web
- Ontology
- Information classification model
- Singular value decomposition
- Adaptive resonance theory network
- Ontology construction
- Experimental results
- Conclusions and future works
12Semantic web (1/2)
- Drawbacks of the existing network
- The information is presented in documents.
- It is unable to process or extract the information that people actually need.
- The Semantic Web is an extension of the existing network structure
- Provides a new foundation for data description.
- Promotes the automatic development of network services.
- Makes the information understandable to machines.
13Semantic web (2/2)
- Builds high-level languages on low-level languages progressively.
- Offers information that the computer can read without revising the existing webpage content.
14Ontology (1/4)
- The W3C has defined ontology as knowledge for describing and expressing various domains using concepts, definitions, and relations.
- Ontology usually appears in the form of a semantic web.
- A node represents a concept or an individual entity on the semantic web.
15Ontology (2/4)
- Gruber's definition: "An ontology is a formal, explicit specification of a shared conceptualization."
- Conceptualization: an abstract model of an existing phenomenon in the field, built from the relevant concepts of that phenomenon.
- Shared: the ontology is shared by a group, not an individual.
- Formal: the ontology can be read and understood by computers.
- Explicit: the concept forms and restrictions of the ontology are expressed in a clear way.
16Ontology (3/4)
- Gruber thought the elements of an ontology include:
- Concept: A concept can represent anything in the real world. Concepts are usually organized as a tree structure in an ontology.
- Relation: A relation is a connection between concepts of certain types.
- Function: A function is a special case of a relation.
- Axiom: An axiom is used to model a fact.
- Instance: An instance is a concrete occurrence of a concept.
17Ontology (4/4)
- Ontology languages are extended from the XML (Extensible Markup Language) syntax.
- The W3C is responsible for formulating and revising them.
18Domain Ontology Applications
- Grigoris Antoniou
- Frank van Harmelen
20
- Horizontal Information Products at Elsevier
- Data Integration at Audi
- Skill Finding at Swiss Life
- Think Tank Portal at EnerSearch
- E-Learning
- Web Services
- Other Scenarios
21Elsevier: The Setting
- Elsevier is a leading scientific publisher.
- Its products are organized mainly along traditional lines
- Subscriptions to journals
- Online availability of these journals has until now not really changed the organisation of the product line
- Customers of Elsevier can take subscriptions to online content
22Elsevier: The Problem
- Traditional journals are vertical products
- Division into separate sciences covered by distinct journals is no longer satisfactory
- Customers of Elsevier are interested in covering certain topic areas that spread across the traditional disciplines/journals
- The demand is rather for horizontal products
23Elsevier: The Problem (2)
- Currently, it is difficult for large publishers to offer such horizontal products
- Barriers of physical and syntactic heterogeneity can be solved (with XML)
- The semantic problem remains unsolved
- We need a way to search the journals on a coherent set of concepts against which all of these journals are indexed
24Elsevier: The Contribution of Semantic Web Technology
- Ontologies and thesauri (very lightweight ontologies) have proved to be a key technology for effective information access
- They help to overcome some of the problems of free-text search
- They relate and group relevant terms in a specific domain
- They provide a controlled vocabulary for indexing information
25Elsevier: The Contribution of Semantic Web Technology (2)
- A number of thesauri have been developed in different domains of expertise
- Medical information: MeSH and Elsevier's life science thesaurus EMTREE
- RDF is used as an interoperability format between heterogeneous data sources
- EMTREE is itself represented in RDF
26Elsevier: The Contribution of Semantic Web Technology (3)
- Each of the separate data sources is mapped onto this unifying ontology
- The ontology is then used as the single point of entry for all of these data sources
27Ontology construction
28Information classification model
- There are three traditional information classification models
- Vector space model
- Probabilistic model
- Boolean model
29Vector space and probabilistic model
- Vector space model
- Each element of a document vector represents the number of times a keyword appears in the document. The cosine similarity method is used to find the related web pages.
- Probabilistic model
- This model uses a probabilistic approach to evaluate the relationships among web pages and to judge whether they are related.
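As an illustration of the vector space model, the following minimal sketch compares pages by the cosine of the angle between their keyword-count vectors. The vocabulary and the counts are invented for the example and do not come from the experiments.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two keyword-count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a page with no keywords is treated as unrelated
    return dot / (norm_a * norm_b)

# Hypothetical keyword counts over the vocabulary (guitar, violin, drum)
page_a = [3, 1, 0]
page_b = [6, 2, 0]   # same proportions as page_a
page_c = [0, 0, 4]   # shares no keywords with page_a

sim_ab = cosine_similarity(page_a, page_b)
sim_ac = cosine_similarity(page_a, page_c)
```

Pages with the same keyword proportions score close to 1 regardless of length, while pages with no keywords in common score 0.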
30Boolean model
- It is the simplest classification method, based on set theory and Boolean algebra. The relations in the Boolean model can be divided into three kinds: inheritance, intersection and independence.
31Singular Value Decomposition (1/2)
- Rows represent documents and columns represent keywords.
- Each element records whether a keyword appears in a document.
32Singular Value Decomposition (2/2)
- Latent Semantic Analysis (LSA) projects documents and keywords into a low-dimensional space.
- Singular Value Decomposition (SVD) is used to remove unnecessary information.
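To make the reduction step concrete, here is a dependency-free sketch that extracts only the leading singular value and vectors of a small document-by-keyword matrix. It uses power iteration instead of a full SVD, which is a simplification chosen for brevity; a library routine such as `numpy.linalg.svd` would normally be used. The matrix is hypothetical.

```python
import math

def top_singular_triplet(matrix, iterations=200):
    """Leading singular value and vectors via power iteration on A^T A.

    Assumes the uniform start vector is not orthogonal to the leading
    right singular vector (true for the example below).
    """
    rows, cols = len(matrix), len(matrix[0])
    v = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iterations):
        u = [sum(matrix[r][c] * v[c] for c in range(cols)) for r in range(rows)]  # A v
        w = [sum(matrix[r][c] * u[r] for r in range(rows)) for c in range(cols)]  # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    u = [sum(matrix[r][c] * v[c] for c in range(cols)) for r in range(rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma > 0:
        u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical binary matrix: rows are documents, columns are keywords
docs_by_keywords = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
sigma, u, v = top_singular_triplet(docs_by_keywords)
# One-dimensional coordinates of each document in the reduced space
doc_coords = [sigma * x for x in u]
```

The first two documents, which share the same keywords, receive the same coordinate, while the third falls near zero on this leading dimension.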
33Adaptive resonance theory network (1/3)
- The ART network is an unsupervised learning network
- Principle
- The theory of ART grew from the theory of cognition.
- It is similar to the human neural system: it not only learns new examples but also preserves old memories.
34Adaptive resonance theory network (2/3)
- Characteristic
- It has the features of both stability and plasticity.
- To resolve the conflict between stability and plasticity, the ART network adjusts the vigilance value.
- Advantage
- Learning is fast.
- Memory consumption is small.
- The number of groups does not have to be set in advance.
35Adaptive resonance theory network (3/3)
- The structure of the ART network
- Input layer: the input data are the training samples.
- Output layer: presents the results of the trained network.
- Weight connections: connect the input layer and the output layer.
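The behaviour described on the ART slides can be sketched with a simplified ART1 network using fast learning on binary vectors. The slides do not state which ART variant or parameters were used, so the choice function, the `beta` constant and the sample data here are assumptions made for illustration.

```python
def art1_cluster(samples, vigilance=0.5, beta=0.5):
    """Cluster binary vectors with a simplified ART1 (fast learning).

    Returns (labels, weights): one cluster index per sample and the
    binary prototype stored on the connections of each output node.
    """
    weights = []  # one binary prototype per output node
    labels = []
    for x in samples:
        size_x = sum(x)
        # Rank existing output nodes by the choice function
        scored = []
        for j, w in enumerate(weights):
            overlap = sum(a & b for a, b in zip(x, w))
            scored.append((overlap / (beta + sum(w)), overlap, j))
        scored.sort(key=lambda item: -item[0])
        winner = None
        for _, overlap, j in scored:
            match = overlap / size_x if size_x else 0.0
            if match >= vigilance:  # vigilance test passed: resonance
                winner = j
                break
        if winner is None:
            # No node is similar enough: add a new one (plasticity)
            weights.append(list(x))
            winner = len(weights) - 1
        else:
            # Keep only features shared with the prototype (stability)
            weights[winner] = [a & b for a, b in zip(x, weights[winner])]
        labels.append(winner)
    return labels, weights

# Hypothetical binary document vectors
samples = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
labels, weights = art1_cluster(samples, vigilance=0.5)
```

No group count is supplied in advance: new output nodes appear only when no existing prototype passes the vigilance test, and a higher vigilance value yields more, tighter groups.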
36Ontology construction
- Introduction
- Literature reviews
- Ontology construction
- Analyzing web pages
- Finding the TF-IDF values of terms
- Reducing the matrix and converting elements to binary data
- Using a recursive ART network to cluster the web pages
- Applying a Boolean model to construct an ontology
- Representing the ontology using the Jena package
- Experimental results
- Conclusions and future works
37Ontology construction
[Figure: flowchart of the ontology construction process. Box labels: WWW; Web pages analysis; Stop-word; Finding TF-IDF; SVD operation; ART network for cluster; Whether satisfied low document; Use TF-IDF to find the concept of each group; Boolean method; Construct relation; Create ontology; Produce RDF ontology]
38Analyzing web pages (1/2)
- After collecting web pages, the system removes stop words.
- Removing stop words avoids wrong judgments caused by unimportant words that appear with high frequency.
39Analyzing web pages (2/2)
- Most web pages are written in HTML. HTML uses opening and closing tags to indicate web page commands.
- $T_{ij} = \sum_{m} n_{ij} \times W_m$
- $T_{ij}$ is the weight with which concept $C_j$ appears in web page $d_i$.
- $n_{ij}$ is the frequency with which concept $C_j$ appears under each tag.
- $W_m$ is the weight of tag $m$.
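A minimal sketch of the tag-weighted term score follows, reading the formula as a sum over tags of the frequency under each tag multiplied by that tag's weight. The tag names and weight values are hypothetical; the slides do not list the weights that were used.

```python
# Hypothetical tag weights W_m; the source does not give the actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5, "body": 1.0}

def weighted_term_score(tag_counts, tag_weights=TAG_WEIGHTS):
    """T_ij: sum over tags of (frequency under the tag) x (tag weight).

    tag_counts maps a tag name to how often the concept appears under it.
    Tags without a listed weight default to 1.0 (an assumption).
    """
    return sum(count * tag_weights.get(tag, 1.0) for tag, count in tag_counts.items())

# A concept that appears once in the title and four times in the body text
score = weighted_term_score({"title": 1, "body": 4})
```

With these assumed weights, one title occurrence counts as much as three body occurrences, so the example concept scores 7.0.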
40TF-IDF
- Our research uses the product of TF and IDF to represent the importance of a keyword in a document.
- $TF_{i,j}$: the relative frequency of keyword i in document j after the weighting operation.
- $IDF_i$: the inverse document frequency of term i, that is, the reciprocal of the frequency with which term i appears across all documents.
- N is the number of all documents.
- $n_i$ is the number of documents, out of N, in which term i appears.
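The definitions above can be sketched as follows. The slides do not show the exact formula, so this example assumes the common form tf = (weighted count) / (total weighted count in the document) and idf = log(N / n_i); the input documents are invented.

```python
import math

def tf_idf(weighted_counts):
    """weighted_counts: one {term: weighted frequency} dict per document.

    Returns one {term: tf-idf} dict per document, assuming
    tf = count / total and idf = log(N / n_i).
    """
    n_docs = len(weighted_counts)
    doc_freq = {}  # n_i: number of documents containing term i
    for doc in weighted_counts:
        for term in doc:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    result = []
    for doc in weighted_counts:
        total = sum(doc.values())
        scores = {}
        for term, count in doc.items():
            tf = count / total if total else 0.0
            idf = math.log(n_docs / doc_freq[term])
            scores[term] = tf * idf
        result.append(scores)
    return result

# Hypothetical weighted term frequencies for two documents
docs = [{"guitar": 3, "string": 1}, {"violin": 2, "string": 2}]
scores = tf_idf(docs)
```

A term that occurs in every document ("string" here) gets an IDF of zero, so it cannot become a representative keyword, while terms concentrated in few documents score higher.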
41Reducing the matrix and converting elements to binary data
- We list the keywords and web page documents to make a binary matrix.
- If a keyword appears in a document, the corresponding element is set to 1; if not, it is set to 0. The SVD operation is used to reduce the large matrix to a small one.
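Building the binary matrix can be sketched in a few lines. The documents and keywords below are hypothetical and stand in for the analysed web pages.

```python
def binary_matrix(documents, keywords):
    """Rows are documents, columns are keywords; 1 if the keyword occurs."""
    return [[1 if kw in doc else 0 for kw in keywords] for doc in documents]

# Hypothetical documents, each already reduced to a set of terms
documents = [{"guitar", "string"}, {"drum"}, {"violin", "string"}]
keywords = ["guitar", "violin", "drum", "string"]
matrix = binary_matrix(documents, keywords)
```

Each row of `matrix` can then be passed to the reduction and clustering steps as a binary document vector.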
42Using the recursive ART network to cluster the
web pages
- We propose a recursive ART network algorithm to
produce a tree structure
43Recursive ART
44Recursive ART
45Applying Boolean operation
- The Boolean model is used to adjust and construct the relations between different concepts.
- For example, imagine ten documents involving four concepts: transports, fly, boat, and airplane.
- Documents containing "transports": 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
- Documents containing "fly": 2, 3, 6, 7, 9, 10.
- Documents containing "boat": 1, 4, 5, 8.
- Documents containing "airplane": 6, 7, 10.
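Using the ten-document example, the Boolean relations can be sketched with set operations. Mapping "one document set contains the other" to inheritance, "partial overlap" to intersection and "no overlap" to independence is an interpretation of the three relations named earlier, and the `parent_of` helper is a hypothetical way to turn inheritance into tree edges.

```python
def relation(docs_a, docs_b):
    """Classify two concepts by comparing the sets of documents containing them."""
    if docs_a <= docs_b or docs_b <= docs_a:
        return "inheritance"    # one concept's documents contain the other's
    if docs_a & docs_b:
        return "intersection"   # partial overlap
    return "independence"       # no shared documents

def parent_of(concept, postings):
    """Smallest concept whose document set strictly contains this one's."""
    supersets = [c for c, docs in postings.items()
                 if c != concept and postings[concept] < docs]
    return min(supersets, key=lambda c: len(postings[c]), default=None)

# Document sets from the example
postings = {
    "transports": set(range(1, 11)),
    "fly": {2, 3, 6, 7, 9, 10},
    "boat": {1, 4, 5, 8},
    "airplane": {6, 7, 10},
}
tree_edges = {concept: parent_of(concept, postings) for concept in postings}
```

For this data the edges form the hierarchy transports → fly → airplane and transports → boat, with fly and boat independent of each other.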
46Generating ontology through the Jena package (1/3)
- The Resource Description Framework (RDF) is a framework developed by the W3C and metadata groups.
- It is able to carry a variety of metadata across the Internet.
- RDF provides interoperability between applications that exchange machine-understandable information on the web.
47Generating ontology through the Jena package (2/3)
- Describes Web resource data
- Resource: anything that has a URI
- Description: describes the properties of a resource
- Three main elements
- Subject
- Predicate
- Object
48Generating ontology through the Jena package (3/3)
- A given statement may be represented by an RDF graph, where the URI is a web resource and author is a property with the value John
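The subject, predicate and object of the "author is John" statement can be written out as one line of N-Triples, a W3C serialization of RDF. This sketch is plain Python and does not use the Jena API; both URIs are placeholders invented for the example.

```python
def n_triple(subject_uri, predicate_uri, literal):
    """Serialize one statement with a plain-literal object as an N-Triples line."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{predicate_uri}> "{escaped}" .'

# Hypothetical URIs for the web resource (subject) and the author property (predicate)
line = n_triple(
    "http://example.org/page",
    "http://example.org/terms/author",
    "John",
)
```

The resulting line names the resource first, then the property, then the value, which mirrors the subject, predicate and object roles.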
49Experiments
50Experimental results
- Experiment environment
- Pentium-4 2.4G
- 512MB RAM
- Java programming language
- RDF ontology language
51Experimental results
- Introduction
- Literature reviews
- Ontology construction
- Experimental results
- First stage experiment
- Second stage experiment
- Conclusions and future works
52First stage experiment
- We select a musical instrument ontology constructed by an expert for the semi-automatic experiment.
- We use the keywords of the existing domain ontology to produce a new ontology with our method.
- After the new ontology has been created, we compare it with the expert ontology to demonstrate the precision of our method.
53Data (1/2)
- Ontology
- http://www.db-net.aueb.gr/thesus/onto/instrum.rdf
- 52 concepts
- "has" and "sub-class" relations
- Data
- Collected Web pages in the Music/Instruments/ domain.
- There are 36 catalogs in that domain.
- 518 Web pages.
54Data (2/2)
Category     Number  Category  Number  Category   Number  Category   Number
Instrument   15      Lute      5       Gong       2       Woodwind   2
Synthesizer  5       Bass      32      Accordion  44      Bassoon    8
Stringed     3       Cello     9       Brass      17      Clarinet   12
Percussion   9       Viola     5       Horn       14      Flute      13
Wind         6       Violin    20      Saxophone  25      Oboe       12
Banjo        26      Mandolin  24      Trombone   11      Panpipes   3
Guitar       24      Piano     19      Trumpet    29      Piccolo    5
Harp         20      Bell      3       Tuba       6       Recorder   26
Harpsichord  14      Drums     33      Harmonium  6       Harmonica  14
55Mark matrix
- After the web pages are analyzed, a matrix is built in which each column denotes a keyword and each row represents a web document. If the keyword can be found in the web document, the element is set to 1; otherwise it is set to 0.
56Recursive ART (1/2)
- The recursive ART network will check whether the
output values are greater than the vigilance. We
test the vigilance step-by-step from 0.1 to 0.9
with an increment of 0.1.
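The slides do not spell out how the recursion works. One plausible reading, sketched below purely as an assumption, is to cluster at a low vigilance, then re-cluster each resulting group at the next vigilance step (0.1 to 0.9 in increments of 0.1) so that nested groups form a tree. The compact ART1 helper, its `beta` constant and the sample vectors are all hypothetical.

```python
def art1_labels(samples, vigilance, beta=0.5):
    """Simplified ART1 (fast learning) over binary vectors; returns cluster labels."""
    weights, labels = [], []
    for x in samples:
        size_x = sum(x) or 1
        ranked = sorted(
            range(len(weights)),
            key=lambda j: -sum(a & b for a, b in zip(x, weights[j])) / (beta + sum(weights[j])),
        )
        for j in ranked:
            overlap = sum(a & b for a, b in zip(x, weights[j]))
            if overlap / size_x >= vigilance:  # passes the vigilance test
                weights[j] = [a & b for a, b in zip(x, weights[j])]
                labels.append(j)
                break
        else:  # no existing node matched: open a new one
            weights.append(list(x))
            labels.append(len(weights) - 1)
    return labels

def recursive_art(samples, ids, vigilance=0.1, step=0.1, max_vigilance=0.9):
    """Return a nested list: inner lists are tree nodes, integers are document ids."""
    if len(ids) <= 1 or vigilance > max_vigilance + 1e-9:
        return list(ids)
    labels = art1_labels([samples[i] for i in ids], vigilance)
    groups = {}
    for doc_id, label in zip(ids, labels):
        groups.setdefault(label, []).append(doc_id)
    if len(groups) == 1:  # no split at this vigilance: try the next step
        return recursive_art(samples, ids, vigilance + step, step, max_vigilance)
    return [recursive_art(samples, g, vigilance + step, step, max_vigilance)
            for g in groups.values()]

# Hypothetical binary document vectors
samples = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
tree = recursive_art(samples, ids=[0, 1, 2])
```

In this example documents 0 and 1 stay together at low vigilance and separate only once the vigilance reaches 0.7, so they end up as siblings under one node while document 2 forms its own branch.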
57Recursive ART (2/2)
- The clustering of the ART network results in 78 groups.
- We calculate the keywords' TF-IDF values for each group and use the keyword with the highest value to represent the group.
- Each group generates a representative keyword. After identical representative keywords among different groups are deleted, only 40 keywords remain.
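Selecting one representative keyword per group and removing repeats can be sketched as below. The group scores are invented; only the procedure (highest TF-IDF per group, then de-duplication) follows the description above.

```python
def representative_keywords(group_scores):
    """group_scores: one {keyword: tf-idf} dict per ART group.

    Picks the top-scoring keyword of each group and drops repeats,
    keeping the order in which keywords were first seen.
    """
    seen, result = set(), []
    for scores in group_scores:
        if not scores:
            continue  # an empty group contributes no keyword
        best = max(scores, key=scores.get)
        if best not in seen:
            seen.add(best)
            result.append(best)
    return result

# Hypothetical TF-IDF scores for three groups; two of them share the same top keyword
groups = [{"drum": 0.9, "bass": 0.4}, {"harp": 0.7}, {"drum": 0.8, "gong": 0.2}]
candidates = representative_keywords(groups)
```

Because two groups elect "drum", three groups yield only two candidate nodes, which is the same effect that reduces 78 groups to 40 keywords in the experiment.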
58
group  Key-term     group  Key-term
1      Drum         21     Trumpet
2      Pinched      22     Viola
3      Bass         23     Tuba
4      Harp         24     Clarinet
5      Mandolin     25     String
6      Piccolo      26     Wind
7      Harmonica    27     Trombone
8      Piano        28     Flute
9      Harpsichord  29     Woodwind
10     Violin       30     Bell
11     Guitar       31     Brass
12     Cymbal       32     Recorder
13     Accordion    33     Gong
14     Oboe         34     Panpipes
15     Cello        35     Battery
16     Lyre         36     Tambourine
17     Instrument   37     Triangle
18     Percussion   38     Harmonium
19     Synthesizer  39     Bassoon
20     Saxophone    40     Banjo
59Output ontology
- We obtain a 5-level ontology from the 40 candidate nodes by Boolean logic level operations.
60Evaluation (1/5)
- After producing the ontology, we compare this new ontology with the expert-defined ontology.
- Precision and recall rates are then used to evaluate our ontology.
- To estimate the precision of the system, we define two kinds of precision evaluation methods.
61Evaluation (2/5)
- Concept precision demonstrates the precision of the keywords the system selects.
- Concept_location precision demonstrates not only the precision of the selected keywords but also the precision of their location in the hierarchy relations.
- Precision (C_P)
- Precision (C_L_P)
- Recall (R)
62Evaluation (3/5)
                                  Expert-defined  Not defined  Expert-defined,  Expert-defined,
                                  concepts        by expert    right location   location in error
Keywords generated by system      A               B            C                D
Keywords not generated by system  E
(Diagram labels: Expert concepts, System keywords)
63Experts defined ontology
- The ontology of the musical instrument domain
generated by the experts.
64Evaluation (4/5)
                                  Expert-defined  Not defined  Expert-defined,  Expert-defined,
                                  concepts        by expert    right location   location in error
Keywords generated by system      40              0            29               11
Keywords not generated by system  12
(Diagram labels: Expert concepts, System keywords)
65Evaluation (5/5)
- Compared with the ontology defined by an expert, the experimental results indicate that our proposed method achieves:
- Precision (C_P): 100% concept precision.
- Precision (C_L_P): 73% concept hierarchy precision.
- Recall: 77%.
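The reported figures can be checked against the evaluation counts. The formulas below are inferred rather than quoted, since the slides do not show them: C_P = A / (A + B), C_L_P = C / (C + D) and R = A / (A + E). With the counts from the first experiment they reproduce the reported values.

```python
def evaluate(a, b, c, d, e):
    """Return (C_P, C_L_P, recall) from the evaluation counts A to E.

    The ratios are an inference from the reported results, not formulas
    quoted from the slides.
    """
    concept_precision = a / (a + b)    # generated keywords that the expert also defined
    location_precision = c / (c + d)   # of those, placed at the right location
    recall = a / (a + e)               # expert concepts that the system recovered
    return concept_precision, location_precision, recall

# Counts from the first-stage experiment
c_p, c_l_p, r = evaluate(a=40, b=0, c=29, d=11, e=12)
```

This gives 1.0 for C_P, 0.725 for C_L_P (reported as 73) and about 0.769 for recall (reported as 77).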
66Second stage experiment (1/2)
- We selected the beer domain and collected web
pages from the Internet. There are 18 catalogues,
212 web pages.
Catalogue  Number of web pages  Catalogue     Number of web pages
ale        26                   pilsner       7
beer       36                   microbrewery  4
bitter     6                    hop           23
brewery    26                   festival      10
lager      14                   bock          5
liquid     2                    bitter        6
yeast      6                    ingredient    11
stout      11                   organization  5
porter     7                    award         7
67Second stage experiment (2/2)
- The system selected 1,688 noun terms from the 6,914 input terms. The system then used the higher TF-IDF values to obtain useful keywords from the 1,688 terms.
- We also constructed a matrix in which each column denotes an ontology keyword and each row represents a web document.
- If the keyword can be found in the web document, the element is set to 1; otherwise, it is set to 0.
68
keyword       TF-IDF value  keyword         TF-IDF value
ale           0.91          fermentation    0.617
association   0.89          grist           0.61
award         0.88          kraeusen        0.61
beer          0.872         mash            0.61
bitter        0.81          maltose         0.6
bock          0.81          pasteurization  0.6
brewery       0.81          wort            0.6
festival      0.80          cask            0.6
hop           0.77          firkin          0.59
ingredient    0.72          exchanger       0.58
lager         0.71          adjunct         0.58
liquid        0.70          dme             0.57
malt          0.70          hops            0.57
microbrewery  0.698         malt            0.57
organization  0.698         yeast           0.56
pilsner       0.69          alcoholic       0.56
porter        0.69          aroma           0.56
stout         0.68          astringent      0.56
yeast         0.66          bitter          0.55
dope          0.66          diacetyl        0.55
69
dunker        0.66          esters          0.55
farmhouse     0.66          grainy          0.52
hefeweizen    0.66          happyhours      0.51
helles        0.658         skunked         0.5
kolsch        0.65          oxidation       0.5
lager         0.65          phenolic        0.5
lambic        0.64          yeasty          0.49
maibock       0.64          brewpub         0.483
marzen        0.63          camre           0.47
mead          0.62          breweriana      0.47
mild          0.62          rauchbier       0.46
munchener     0.62          saison          0.4
pilsener      0.51          steinbier       0.4
pilsner       0.51          stout           0.4
pils          0.51          vienna          0.4
porter        0.51
70Recursive ART (1/2)
- The recursive ART network will check whether the output values are greater than the vigilance. We test the vigilance step-by-step from 0.1 to 0.9 with an increment of 0.1.
71Recursive ART (2/2)
- The clustering performed by recursive ART network
yields 29 groups.
group  documents  group  documents
1      26         16     8
2      17         17     9
3      22         18     8
4      23         19     2
5      23         20     6
6      9          21     6
7      2          22     7
8      17         23     8
9      8          24     6
10     6          25     8
11     7          26     9
12     12         27     8
13     6          28     4
14     4          29     4
15     8
72Output ontology
- In this manner, each group generates a representative keyword. After identical representative keywords among different groups are deleted, only 13 keywords remain. Boolean logic is used to calculate the relationships between levels of concepts.
73Evaluate (1/2)
- After producing the ontology, its precision must be evaluated. However, there was no other ontology to compare it with, so we invited domain experts to evaluate its precision.
                                   Identifies  Does not identify  Identifies the term,  Identifies the term,
                                   the term    the term           location is right     location is in error
The system generates the concepts  A           B                  C                     D
(Diagram labels: User view of the terms, System terms)
74Evaluate (2/2)
- Precision (C_P)
- Precision (C_L_P)
- The average Precision (C_P) from the domain experts' evaluations is 0.794 (about 79%), and the average Precision (C_L_P) from the domain experts' evaluations is 0.742 (about 74%).
75RDF format
- Finally, we recorded the ontology using the W3C standard for web ontology languages and used the Jena package to output the results in RDF format.
76Conclusions (1/2)
- An ontology can help users learn and search for related information effectively. Constructing an ontology quickly and correctly has become an important topic for content-based search on the Internet.
- Our proposed method requires less time to select keywords and to define the relations automatically, with human intervention.
77Conclusions (2/2)
- The proposed method facilitates users' understanding of the content of data and its relevancy, and is able to suggest content that is highly relevant.
- In the future, we will focus on investigating a better method for finding multi-relations among terms, and on extending the system's abilities to cover a multi-field ontology as the foundation for robust and accurate ontology construction.
78Current Research
- Sensors Network Intrusion Detection.
- Ontology application on Medical Knowledge
- Ontology merging and alignment
- Using applied soft computing to solve problems
- Web pages analysis
- Image processing
- RFID Application