Ontological Classification of Web Pages

Zafer Erenel
Many users rely on search engines to locate and buy
goods and services (such as choosing a vacation).
Web pages on the internet do not conform to any
data organization standard, and search engines
provide only primitive query capabilities for
retrieving relevant data [1].
In addition, search engines do not list sites
equally and are inclined toward listing more
popular pages. These tendencies brush many web
pages aside and leave users with a limited number
of alternatives.
I have created a lightweight domain ontology that
consists of a taxonomic hierarchy, and an automated
agent uses it to classify web pages on the
internet.
The automated agent discovers and classifies
relevant pages with the help of the Yahoo and
Google search engines.

Related Research
Desai and Spink presented a clustering scheme
that groups documents into partially and
substantially relevant pages by using similarity
measures and ranking heuristics [2].
They worked with end-user queries (a limited
number of terms) to obtain the relevance score.
My automated agent instead works from an
established ontology to discover and classify
documents.
Chiang, Chua, and Storey parsed the snippets of
returned links and used the ratio of matching
terms to rank the web pages for relevance [1].
I believe snippets contain too few words for us
to judge a web page on that basis. My agent scours
the entire web page, which is more time-consuming
but more effective.

Yahoo and Google search results contain scores of
links.
My lightweight domain ontology consists of 7
branches. Each branch is made up of predetermined
terms.
The score of a web page in a given branch
increases each time the agent comes across one of
that branch's predetermined terms in the HTML
code.
I've chosen a country ontology because internet
users' interest in a certain country can be quite
high.
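
The scoring rule described above can be sketched roughly as follows. This is a Python illustration rather than the agent's own code, and it makes several assumptions: the branch names are taken from the rankings mentioned later in the talk (price, nature, activities), the terms inside each branch are invented for the example, and a page's score in a branch is simply the number of term occurrences in its HTML source.

```python
import re
from collections import Counter

# Hypothetical branches and terms. The talk says the ontology has 7 branches
# of predetermined terms but does not list them; these are placeholders.
ONTOLOGY = {
    "price": {"cheap", "discount", "insurance"},
    "nature": {"adventure", "mountain", "beach"},
    "activities": {"scuba", "diving", "hiking"},
}


def score_page(html, ontology=ONTOLOGY):
    """Return, per branch, how many times its terms occur in the page source."""
    # Lowercase the raw HTML and split it into alphabetic words.
    words = Counter(re.findall(r"[a-z]+", html.lower()))
    return {
        branch: sum(words[term] for term in terms)
        for branch, terms in ontology.items()
    }


page = "<html><body>Cheap flights and scuba diving. Cheap hotels!</body></html>"
scores = score_page(page)
```

Sorting the pages of one branch by this score would then give a ranking for that branch.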

The rank of each web page within a cluster
clarifies its content to the user.
In addition, we can compare the result sets of
different search engines (Yahoo and Google) for
the same queries and find the complement and
intersection of those result sets to get a clear
understanding of the search engines' behavior.
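
A minimal Python sketch of that comparison, assuming each engine's results are available as a list of URL strings. The function name and the example URLs are hypothetical, and URLs are compared as exact strings with no normalization.

```python
def compare_results(google_urls, yahoo_urls):
    """Split two result lists into shared and engine-specific URLs."""
    google, yahoo = set(google_urls), set(yahoo_urls)
    return {
        "both": google & yahoo,         # intersection
        "google_only": google - yahoo,  # complement of Yahoo within Google
        "yahoo_only": yahoo - google,   # complement of Google within Yahoo
    }


# Placeholder result lists for illustration only.
overlap = compare_results(
    ["http://a.example/", "http://b.example/"],
    ["http://b.example/", "http://c.example/"],
)
```

The three sets correspond to the regions of a two-circle Venn diagram.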

I've used the web stream classes of the C#
programming language to create my agent.
A WebRequest is an object that requests a Uniform
Resource Identifier (URI) such as the URL for a
web page [3].
You can use a WebRequest object to create a
WebResponse object that will encapsulate the
object pointed to by the URI.
Once you get the actual object (e.g., a web page)
pointed to by the URI, what you get back is a
stream of the web page.
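
For readers who want to try the same request, response, and stream flow outside .NET, here is a rough Python analogue built on the standard library's `urllib.request`. It is not the agent's code; the function name, the timeout, and the UTF-8 fallback are assumptions made for this sketch.

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10.0):
    """Open the URL and read the whole response stream into a string."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the response, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_page` with the URL of a web page returns the page source as text, which can then be scanned for ontology terms.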

I used this capability to read a page from a site
and extract the information I need. I created two
web requests using the search syntax given below:

http://www.google.com/search?q=cyprus+vacation+travel&lr=&start=0&sa=N
http://search.yahoo.com/search?p=cyprus+vacation+travel&b=1
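
Query URLs of this kind can be assembled programmatically. The Python sketch below assumes the parameter names `q` and `start` for Google and `p` and `b` for Yahoo, as read from the two URLs above; the helper names are hypothetical, and neither engine is guaranteed to answer scripted requests in this form today.

```python
from urllib.parse import urlencode


def google_url(query, start=0):
    """Build a Google search URL; `start` is the offset of the first result."""
    return "http://www.google.com/search?" + urlencode({"q": query, "start": start})


def yahoo_url(query, first=1):
    """Build a Yahoo search URL; `first` is the index of the first result."""
    return "http://search.yahoo.com/search?" + urlencode({"p": query, "b": first})


g = google_url("cyprus vacation travel")
y = yahoo_url("cyprus vacation travel")
```

`urlencode` replaces the spaces in the query with `+`, which matches the form of the search URLs used in the talk.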

The Google search returned 200 URLs, and I created
200 web requests to extract relevant information
from each web page.
The Yahoo search engine was used in the same
manner to extract relevant information.
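
One way to pull the returned URLs out of a results page before requesting each of them is sketched below in Python with the standard `html.parser` module. This is an assumption about how such a step could look, not a description of the agent: it keeps every absolute `href` it finds, whereas a real results page would also need engine-specific filtering of navigation and advertisement links.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and href.startswith(("http://", "https://")):
                self.links.append(href)


def extract_links(html):
    """Return the absolute links found in the page, in document order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


links = extract_links('<a href="http://a.example/">A</a><a href="/local">B</a>')
```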

If we analyze the price rankings, we come across
pages that have information about student
flights, travel insurance, vacation package
discounts, cheap flights, etc.
If we analyze the nature rankings, we come across
web pages that offer adventure, etc.
If I want to go scuba diving on my vacation, I
know from the activities rankings that Hawaii and
Fiji are among my options.

Interestingly enough, in the top 100 search
lists, the number of web pages that appear on
both Google and Yahoo is 19.

In the top 200 search lists, the number of web
pages that appear on both Google and Yahoo is 24.

As a result, ontologically organized clusters of
web sites that offer information about a given
country's vacation and travel alternatives serve
our objective better in finding what we are
searching for.
In my work, I have used 2 search engines and a
single ontology. Search engines' shortcomings can
be mitigated by combining multiple engines with
multiple ontologies to ease the search for the
most needed information on the internet.
The Venn diagrams show that a single search
engine is not very effective by itself.

References
[1] R.H.L. Chiang, C.E.H. Chua, V.C. Storey, A smart web query method for semantic retrieval of web data, Data & Knowledge Engineering 38 (2001) 63-84.
[2] M. Desai, A. Spink, An algorithm to cluster documents based on relevance, Information Processing and Management 41 (2005) 1035-1049.
[3] J. Liberty, Programming C#, 3rd ed., O'Reilly, 2003.
