Title: Ontological Classification of Web Pages
1- Ontological Classification of Web Pages
- Zafer Erenel
- Many users use search engines to locate and buy
goods and services (such as choosing a vacation).
- Web pages presented on the internet do not
conform to any data organization standard and
search engines provide primitive query
capabilities for users to retrieve relevant data
1. - In addition to that, they do not list sites
equally and are inclined toward listing more
popular pages. These tendencies brush many web
pages aside and leave a limited number of
alternatives to the users.I have created a
lightweight domain ontology that consists of a
taxonomic hierarchy and made use of it by an
automated agent to classify web pages on the
internet. - The automated agent discovers and classifies
relevant pages with the help of Yahoo and Google
search engines.
2- Related Research
- Desai and Spink presented a clustering scheme
that groups documents into partially and
substantially relevant pages by using similarity
measures and ranking heuristics 2. - They worked with the end-user queries (limited
number of terms) to obtain the relevance score.
Instead, my automated agent will act along with
an established ontology to discover and classify
documents. - Chiang, Chua, and Storey parsed snippets of
returned links to find the ratio of the number of
matching terms to rank the web pages for
relevance 1. - I believe snippets consist of a very few number
of words and we cannot judge the web page on the
basis of snippets. My agent scours the entire web
page which is more time-consuming but more
effective
3- Yahoo and Google search results contain scores of
links. - My lightweight domain ontology consists of 7
branches. Each branch is comprised of
predetermined terms. - Score of the web page increases in a certain
branch as the agent comes across these
predetermined terms on the html code. - Ive chosen country ontology because internet
users interest in a certain country can be quite
high.
4(No Transcript)
5(No Transcript)
6(No Transcript)
7- The ranks of web pages in each cluster will
clarify their content to the user. - In addition to that, we can compare result sets
of different search engines (Yahoo and Google)
for the same queries and find complement and
intersection of their result sets to have a clear
understanding of search engines behaviors.
8- Ive used web stream classes in C Programming
language to create my agent. - A WebRequest is an object that requests a Uniform
Resource Identifier (URI) such as the URL for a
web page 3. - You can use a WebRequest object to create a
WebResponse object that will encapsulate the
object pointed to by the URI. - Once you get the actual object (e.g., a web page)
pointed to by the URI, what you get back is a
stream of the web page.
9- I used this capability for reading a page from a
site to extract the information I need. I have
created two web requests using search syntax
given below
- http//www.google.com/search?qcyprusvacationtra
vellrstart0saN - http//search.yahoo.com/search?pcyprusvacationt
ravelb1
10- Google search engine has returned 200 URLs and I
have created 200 web requests to extract relevant
information from each web page. - Yahoo search engine has been used in the same
manner to extract relevant information.
11(No Transcript)
12(No Transcript)
13- If we analyze price rankings, we come across
pages that have information about student
flights, travel insurances, vacation package
discounts, cheap flights and etc.. - If we analyze nature rankings, we come across web
pages that offer adventure and etc. - If I want to do scuba diving on my vacation, I
know that hawai and fiji are among my options by
looking at activities rankings
14- Interestingly enough, in the top 100 search
lists, the number of web pages that both appear
on Google and Yahoo is 19.
15- In the top 200 search lists, the number of web
pages that both appear on Google and Yahoo is 24.
16(No Transcript)
17- As a result, ontologically organized clusters of
web sites that are offering information about a
given country regarding vacation and travel
alternatives serve our objective to a greater
extent in finding what we are in search of. - In my work, I have used 2 search engines and a
single ontology. Search Engines shortcomings can
be prevented by combining multiple engines with
multiple ontologies to ease the search for most
needed information on the internet. - Venn Diagrams prove that a specific search engine
is not very effective by itself.
18References 1 R.H.L. Chiang, C.E.H. Chua, V.C.
Storey, A smart web query method for semantic
retrieval of web data, Data Knowledge
Engineering 38 (2001) 63-84. 2 M. Desai, A.
Spink, An algorithm to cluster documents based on
relevance, Information Processing and Management
41 (2005) 1035-1049. 3 Liberty, J.,
Programming C,3rd ed. OREILLY, 2003.