Ontological Classification of Web Pages

1
  • Ontological Classification of Web Pages
  • Zafer Erenel
  • Many users use search engines to locate and buy
    goods and services (such as choosing a vacation).
  • Web pages on the internet do not conform to any
    data organization standard, and search engines
    provide only primitive query capabilities for
    users to retrieve relevant data [1].
  • In addition, search engines do not list sites
    equally and are inclined toward listing more
    popular pages. These tendencies brush many web
    pages aside and leave users with a limited
    number of alternatives.
  • I have created a lightweight domain ontology
    that consists of a taxonomic hierarchy, and an
    automated agent uses it to classify web pages on
    the internet.
  • The automated agent discovers and classifies
    relevant pages with the help of Yahoo and Google
    search engines.

2
  • Related Research
  • Desai and Spink presented a clustering scheme
    that groups documents into partially and
    substantially relevant pages by using similarity
    measures and ranking heuristics [2].
  • They worked with end-user queries (a limited
    number of terms) to obtain the relevance score.
    Instead, my automated agent works with an
    established ontology to discover and classify
    documents.
  • Chiang, Chua, and Storey parsed snippets of
    returned links and used the ratio of matching
    terms to rank the web pages for relevance [1].
  • I believe snippets contain too few words to
    judge a web page by. My agent scans the entire
    web page, which is more time-consuming but more
    effective.

3
  • Yahoo and Google search results contain scores of
    links.
  • My lightweight domain ontology consists of 7
    branches. Each branch comprises a set of
    predetermined terms.
  • The score of a web page in a given branch
    increases each time the agent comes across one
    of that branch's predetermined terms in the HTML
    code.
  • I've chosen a country ontology because internet
    users' interest in a certain country can be quite
    high.
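
The branch scoring described above can be sketched as follows. This is a minimal Python illustration, not the author's agent: the branch names are taken from the rankings mentioned later in the slides, but the term lists are hypothetical, and the sketch counts whole-word matches in the raw HTML (so tag names are scanned too).

```python
import re
from collections import Counter

# Hypothetical branches and terms, for illustration only. The slides mention
# price, nature and activities rankings but do not list the actual terms.
ONTOLOGY = {
    "price": {"cheap", "discount", "insurance"},
    "nature": {"adventure", "beach", "mountain"},
    "activities": {"diving", "hiking", "sailing"},
}

def score_page(html, ontology=ONTOLOGY):
    """Count, per branch, how often that branch's terms occur in the page."""
    # Lowercase the raw HTML and split it into alphabetic words.
    words = Counter(re.findall(r"[a-z]+", html.lower()))
    return {
        branch: sum(words[term] for term in terms)
        for branch, terms in ontology.items()
    }

page = "<html><body>Cheap flights, discount packages and scuba diving.</body></html>"
print(score_page(page))
```

A page can score in several branches at once, which is what lets the same page appear in more than one ranking.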

4
(No Transcript)
5
(No Transcript)
6
(No Transcript)
7
  • The ranks of web pages in each cluster will
    clarify their content to the user.
  • In addition, we can compare the result sets of
    different search engines (Yahoo and Google) for
    the same queries and find the complement and
    intersection of their result sets to get a clear
    understanding of the search engines' behavior.
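
The result-set comparison can be expressed with ordinary set operations. The Python sketch below assumes URLs are compared as exact strings (no normalization of trailing slashes or `www.` prefixes), and the example URLs are hypothetical.

```python
def compare_result_sets(google_urls, yahoo_urls):
    """Split two result lists into the intersection and the two complements."""
    google, yahoo = set(google_urls), set(yahoo_urls)
    return {
        "both": google & yahoo,          # pages returned by both engines
        "google_only": google - yahoo,   # pages only Google returned
        "yahoo_only": yahoo - google,    # pages only Yahoo returned
    }

# Hypothetical result lists, for illustration only.
google_hits = ["http://a.example/", "http://b.example/", "http://c.example/"]
yahoo_hits = ["http://b.example/", "http://d.example/"]

overlap = compare_result_sets(google_hits, yahoo_hits)
print(len(overlap["both"]), sorted(overlap["google_only"]))
```

The size of `both` is the kind of overlap count that a Venn diagram of the two result sets would show.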

8
  • I've used the web stream classes of the C#
    programming language to create my agent.
  • A WebRequest is an object that requests a Uniform
    Resource Identifier (URI), such as the URL for a
    web page [3].
  • You can use a WebRequest object to create a
    WebResponse object that encapsulates the object
    pointed to by the URI.
  • Once you have the actual object (e.g., a web
    page) pointed to by the URI, you can read its
    content back as a stream.
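
The request/response/stream pattern has a close analog in Python's standard library. The sketch below is that analog, not the author's code: `fetch_page`, the `User-Agent` value and the 10-second timeout are assumptions made for illustration.

```python
from urllib.request import Request, urlopen

def fetch_page(url, timeout=10):
    """Request a URI and read the response stream into a string."""
    # The User-Agent value is a placeholder chosen for this sketch.
    request = Request(url, headers={"User-Agent": "ontology-agent-sketch"})
    # urlopen returns a file-like response object that is read as a stream.
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch_page` with the URL of a web page would return that page's HTML as text, ready to be scanned for ontology terms.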

9
  • I used this capability to read a page from a
    site and extract the information I need. I have
    created two web requests using the search syntax
    given below:
  • http://www.google.com/search?q=cyprus+vacation+travel&lr=&start=0&sa=N
  • http://search.yahoo.com/search?p=cyprus+vacation+travel&b=1
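
Query URLs of this shape can be assembled from a list of terms. In the Python sketch below, the parameter names (`q`, `lr`, `start`, `sa`, `p`, `b`) come from the two URLs on the slide; treating `start` and `b` as result offsets for paging is an assumption, since the slides do not describe them.

```python
from urllib.parse import urlencode

def google_search_url(terms, start=0):
    """Build a Google query URL; `start` is assumed to be a 0-based offset."""
    params = {"q": " ".join(terms), "lr": "", "start": start, "sa": "N"}
    return "http://www.google.com/search?" + urlencode(params)

def yahoo_search_url(terms, first=1):
    """Build a Yahoo query URL; `b` is assumed to be the 1-based first result."""
    params = {"p": " ".join(terms), "b": first}
    return "http://search.yahoo.com/search?" + urlencode(params)

terms = ["cyprus", "vacation", "travel"]
print(google_search_url(terms))
print(yahoo_search_url(terms))
```

`urlencode` joins the terms with `+` and the parameters with `&`, so changing only the offset argument would yield the URL for a later page of results.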

10
  • The Google search engine returned 200 URLs, and
    I created 200 web requests to extract relevant
    information from each web page.
  • The Yahoo search engine was used in the same
    manner to extract relevant information.
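
Turning a results page into a list of URLs to request could look like the Python sketch below. It is a generic link extractor built on the standard `html.parser` module, not the author's method: a real results page would also need the engine's own navigation and advertising links filtered out, which is not shown here.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from anchor tags, without duplicates."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only absolute links and skip URLs that were already seen.
        if href and href.startswith(("http://", "https://")) and href not in self.links:
            self.links.append(href)

def extract_links(html):
    """Return the absolute links found in an HTML string, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Each returned URL would then get its own web request, one per result.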

11
(No Transcript)
12
(No Transcript)
13
  • If we analyze the price rankings, we come across
    pages that have information about student
    flights, travel insurance, vacation package
    discounts, cheap flights, and so on.
  • If we analyze the nature rankings, we come across
    web pages that offer adventure and similar
    experiences.
  • If I want to go scuba diving on my vacation, I
    know that Hawaii and Fiji are among my options by
    looking at the activities rankings.
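
A per-branch ranking like the ones described above can be derived by sorting pages on one branch's score. The Python sketch below uses hypothetical URLs and scores; dropping pages that score zero in the branch is a design choice of the sketch, not something the slides specify.

```python
def rank_by_branch(page_scores, branch):
    """Return (url, score) pairs for one branch, highest score first."""
    scored = [(url, scores.get(branch, 0)) for url, scores in page_scores.items()]
    # Drop pages with no matching terms in this branch, then sort descending.
    return sorted((pair for pair in scored if pair[1] > 0),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical per-page branch scores, for illustration only.
page_scores = {
    "http://dive.example/": {"activities": 7, "price": 1},
    "http://deals.example/": {"activities": 0, "price": 9},
    "http://tours.example/": {"activities": 3, "price": 4},
}

print(rank_by_branch(page_scores, "activities"))
```

The same table of scores yields a different ordering for each branch, so one crawl can serve the price, nature and activities rankings alike.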

14
  • Interestingly enough, in the top 100 search
    lists, only 19 web pages appear on both Google
    and Yahoo.

15
  • In the top 200 search lists, 24 web pages appear
    on both Google and Yahoo.

16
(No Transcript)
17
  • As a result, ontologically organized clusters of
    web sites offering vacation and travel
    information about a given country help users
    find what they are looking for more effectively.
  • In my work, I have used 2 search engines and a
    single ontology. Search engines' shortcomings
    can be mitigated by combining multiple engines
    with multiple ontologies to ease the search for
    the most needed information on the internet.
  • The Venn diagrams show that a single search
    engine is not very effective by itself.

18
References
[1] R.H.L. Chiang, C.E.H. Chua, V.C. Storey, A smart web query method for semantic retrieval of web data, Data & Knowledge Engineering 38 (2001) 63-84.
[2] M. Desai, A. Spink, An algorithm to cluster documents based on relevance, Information Processing and Management 41 (2005) 1035-1049.
[3] J. Liberty, Programming C#, 3rd ed., O'Reilly, 2003.