Semantic Web Site Usage Analysis: The ORGAN system


1
Semantic Web Site Usage Analysis: The ORGAN system
  • John Garofalakis, Theodoula Giannakoudi,
    Evangelos Sakkopoulos

RA Computer Technology Institute, Internet and
Multimedia Technologies RU 5 and WestGate, N.
Kazantzaki str., 26504, Greece
University of Patras, Computer Engineering &
Informatics Dept., 26500 Patras, Greece
2
ORGAN overview
  • ORGAN: Ontology-oRiented usaGe Analysis system
  • A web site usage analysis system that uses semantic
    knowledge
  • Traditional usage analysis combined with web site
    domain knowledge yields semantically enhanced
    queries about the web site usage
  • Available as an integrated and as a standalone system

3
System architecture
4
Keywords Extraction
  • Content parsing, HTML tag removal and indexing
  • Removal of stop-words
  • Calculation of word incidence
  • Extraction of the most frequent keywords
  • Spotting of the area around the specific link
    in the HTML code, which is anchored 100 bytes
    before the link and 100 bytes after the link.
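
The extraction steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the ORGAN implementation: the stop-word list, the function names (`strip_tags`, `top_keywords`, `link_context`) and the choice of `html.parser` are all assumptions made for the example, and the tokenizer only handles plain English letters.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}


class TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def strip_tags(html):
    """Return the text of an HTML fragment with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def top_keywords(html, k=5):
    """Count non-stop-words in the page text and return the k most frequent."""
    words = re.findall(r"[a-z]+", strip_tags(html).lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]


def link_context(html, href, window=100):
    """Text found up to `window` bytes before and after the <a> element for `href`.

    Assumes a double-quoted href attribute. The byte window may cut through
    a tag or a multi-byte character, which this sketch does not repair.
    """
    raw = html.encode("utf-8")
    pattern = rb'<a\b[^>]*href="' + re.escape(href.encode("utf-8")) + rb'"[^>]*>.*?</a>'
    match = re.search(pattern, raw, re.S | re.I)
    if match is None:
        return ""
    before = raw[max(0, match.start() - window):match.start()]
    after = raw[match.end():match.end() + window]
    text = before.decode("utf-8", errors="ignore") + " " + after.decode("utf-8", errors="ignore")
    return strip_tags(text)
```

Calling `top_keywords(page_html)` gives candidate keywords for a page, while `link_context(page_html, "/some/page.html")` gives the text surrounding one link, which could then go through the same stop-word and frequency steps.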

5
ORGAN Translation
  • Translation of web page keywords for sites not
    written in English, to facilitate lexical
    processing (through WordNet, which covers words
    of the English vocabulary only)
  • Automatic keyword translation through a local
    web service

6
ORGAN Metadata Assigner
  • Calculation of semantic similarity
  • WordNet: an electronic lexical database (for words
    of the English vocabulary only) that follows a
    hierarchical organization of concepts
  • The Wu & Palmer measure calculates relatedness by
    considering the depths of the two synsets (sets of
    one or more synonyms) in the WordNet taxonomies
  • score = 2 · depth(lcs) / (depth(s1) + depth(s2))
  • where lcs is the least common subsumer of s1 and
    s2, the root of a taxonomy has depth 1,
  • and s1, s2 are the synsets.
  • 0 < score ≤ 1
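
To make the Wu & Palmer formula concrete, here is a small Python sketch over a toy taxonomy. The taxonomy and the function names are invented for illustration and do not come from WordNet. It also assumes each concept has a single parent, which is a simplification: WordNet synsets can have several hypernyms.

```python
# Hypothetical toy taxonomy (child -> parent); the root has no parent.
PARENT = {
    "entity": None,
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "tree": "plant",
}


def path_to_root(node):
    """List of nodes from `node` up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lcs(s1, s2):
    """Least common subsumer: the deepest ancestor shared by s1 and s2."""
    ancestors = set(path_to_root(s1))
    for node in path_to_root(s2):
        if node in ancestors:
            return node
    raise ValueError("no common ancestor")


def wu_palmer(s1, s2):
    """score = 2 * depth(lcs) / (depth(s1) + depth(s2))"""
    return 2 * depth(lcs(s1, s2)) / (depth(s1) + depth(s2))
```

In this toy tree, "dog" and "cat" share "animal" (depth 2) and each has depth 3, so the score is 4/6. "dog" and "tree" only share the root, so the score drops to 2/6. Because the root has depth 1, the score is never 0, and a concept compared with itself scores 1.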

7
ORGAN LogPro
  • Log files preprocessing
  • Data cleaning: removal of records of requests for
    images and for other non-informational files
  • Data cleaning: removal of records of requests that
    were not successfully served or were submitted by
    search spiders and robots
  • Session identification based on the user IP,
    the user agent and the time interval between
    subsequent requests.
  • Path completion: filling in the page references
    that are missing due to local browser caching
    mechanisms
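
The cleaning and session identification steps above could look roughly like this in Python. It is a sketch under several assumptions that the slides do not state: the log is in Apache combined format, "successfully served" means a 2xx status, robots are detected by simple substrings in the user agent, and a session ends after 30 minutes of inactivity. Path completion is not covered.

```python
import re
from datetime import datetime, timedelta

# Apache combined log format (assumed; the slides do not name the format).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical filters for non-informational files and robots.
IGNORED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico", ".css", ".js")
ROBOT_HINTS = ("bot", "spider", "crawler")


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def is_clean(record):
    """Keep only successful page requests that do not look like robot traffic."""
    path = record["url"].split("?", 1)[0].lower()
    if path.endswith(IGNORED_EXTENSIONS) or path == "/robots.txt":
        return False
    if not 200 <= record["status"] < 300:
        return False
    agent = record["agent"].lower()
    return not any(hint in agent for hint in ROBOT_HINTS)


def sessions(records, timeout=timedelta(minutes=30)):
    """Group records by (IP, user agent) and split on gaps longer than `timeout`."""
    by_user = {}
    for record in sorted(records, key=lambda r: r["time"]):
        by_user.setdefault((record["ip"], record["agent"]), []).append(record)
    result = []
    for visits in by_user.values():
        current = [visits[0]]
        for previous, record in zip(visits, visits[1:]):
            if record["time"] - previous["time"] > timeout:
                result.append(current)
                current = []
            current.append(record)
        result.append(current)
    return result
```

A typical use is to parse every line, drop the `None` results and the records rejected by `is_clean`, and pass the rest to `sessions`. The 30-minute timeout is only a common convention and would need tuning for a particular site.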

8
ORGAN Analyzer (1)
  • Query builder that mines logs with respect to
    site semantics
  • Ontology querying functionality through the
    Protégé API
  • Analysis of knowledge derived from three
    resources:
  • Web server raw log files
  • Metadata information of the site
  • OWL ontology
  • Query types
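
As a rough illustration of how the three resources might be combined in one query, the Python sketch below counts page requests that fall under a given concept or any of its sub-concepts. Everything in it is hypothetical: the concept names, the page metadata and the function names are made up, and plain dictionaries stand in for the OWL ontology, which the slides say is queried through the Protégé API.

```python
from collections import Counter

# Hypothetical concept hierarchy standing in for the OWL ontology (child -> parent).
SUBCLASS_OF = {
    "Wine": "Drink",
    "RedWine": "Wine",
    "WhiteWine": "Wine",
    "Beer": "Drink",
}

# Hypothetical site metadata: page URL -> concept assigned to that page.
PAGE_CONCEPT = {
    "/merlot.html": "RedWine",
    "/chardonnay.html": "WhiteWine",
    "/lager.html": "Beer",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or is one of its descendants."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = SUBCLASS_OF.get(concept)
    return False


def hits_for_concept(requested_urls, concept):
    """Count requests for pages whose assigned concept falls under `concept`."""
    hits = Counter()
    for url in requested_urls:
        page_concept = PAGE_CONCEPT.get(url)
        if page_concept is not None and is_a(page_concept, concept):
            hits[url] += 1
    return hits
```

Given the URLs from cleaned log sessions, `hits_for_concept(urls, "Wine")` would count visits to both red and white wine pages, something a purely URL-based log report cannot express without the site's domain knowledge.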

9
ORGAN Analyzer (2)
10
Conclusion & Future Work
  • ORGAN: an integrated tool with a service-oriented
    architecture that provides semantic log analysis
    of web site usage.
  • Future steps:
  • More sophisticated keyword extraction
  • Enhancement of log file processing
  • Support of web site reorganization tasks

11
Thank you for your attention!