Activity Based Metadata for Semantic Desktop Search

1
Activity Based Metadata for Semantic Desktop Search
Paul-Alexandru Chirita, Rita Gavriloaie, Stefania Ghita, Wolfgang Nejdl, and Raluca Paiu
ESWC 2005, Heraklion, Greece
2
Activity Based Metadata for Semantic Desktop Search
  • Metadata
  • Greek: meta- (about) + data (information)
  • Data about data
  • Semantic
  • Greek: semantikos (significant meaning)
  • Study of meaning
  • We create data about the data existing on the
    desktop by analyzing the user's activities, and
    then use these metadata to improve search on the
    semantic desktop.

3
Outline
  • Motivation
  • Current Approaches to Desktop Search
  • Context Metadata Desktop Search
  • Representing Context Information
  • Desktop Search Architecture Prototype
  • Conclusions and Future Work

4
Motivation
  • Hard disk storage capacity has increased
  • So has the difficulty of finding something on the
    desktop
  • Q: Where do you search for something when you
    know you have it on your desktop?
  • A: Google

5
Current Approaches to Desktop Search 1
  • Google Desktop Search: http://desktop.google.com
  • Finds:
  • Emails (Outlook, Outlook Express, Netscape,
    Thunderbird)
  • Files (Text, Word, Excel, PowerPoint, PDF, Music,
    Video, Images)
  • Web History (IE, Netscape, Mozilla, Firefox)
  • Chats (AOL Instant Messenger)
  • MSN desktop search application:
    http://beta.toolbar.msn.com

6
Current Approaches to Desktop Search 2
  • Spotlight Search:
    http://www.apple.com/macosx/tiger/spotlight.html
  • Indexes:
  • Mail, Address Book contacts
  • Folders / directories, files, applications
  • Images
  • Video and audio files
  • Incorporates semantics: it uses explicit
    information such as file size, creator, last
    modification date, and metadata embedded in
    specific files.
  • Beagle desktop search:
    http://gnome.org/projects/beagle
  • Open source project for Linux

7
What is Context?
  • Contextual information refers to all aspects
    important for a certain situation: ideas, facts,
    persons, publications, etc.
  • It includes all relevant relationships as well as
    interaction history

8
Context Metadata Desktop Search
  • Users tend to associate things to certain
    contexts
  • Contextual information should be used to
    enrich search results on the desktop
  • Three types of user web search behavior:
  • Navigational
  • Informational
  • Resource seeking
  • Searching on the desktop: navigational queries

9
Representing Context Information
  • Current desktop search approaches do not make use
    of desktop-specific information, especially
    context information
  • Some of these missed opportunities include:
  • Email context
  • File hierarchy context
  • Web cache context

10
Exploiting Email Context 1
  • Scenario
  • Alice is interested in distributed page ranking
  • She remembers she discussed this topic with a
    colleague, who also sent her an article via email
  • The article does not mention distributed
    PageRank, but talks about distributed trust
    networks (equivalent topic)
  • Using the enhanced desktop search environment,
    she would be able to find the article, based on
    this additional information
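
The scenario above can be sketched as a lookup over context triples. This is only an illustration in Python, not the prototype's code: the sample data, the identifiers such as `mail:42`, and the `search` helper are assumptions, while the property names `subject`, `body`, `has_attachment` and `stored_as` come from the email context ontology of this talk.

```python
# Hypothetical sample data: (subject, property, object) triples.
TRIPLES = [
    ("mail:42", "subject", "Re: distributed page ranking"),
    ("mail:42", "body", "Here is the article we discussed."),
    ("mail:42", "has_attachment", "att:7"),
    ("att:7", "stored_as", "/home/alice/papers/trust_networks.pdf"),
]


def objects(subject, prop):
    """All objects of triples with the given subject and property."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def search(query):
    """Return files attached to mails whose subject or body contains the query."""
    q = query.lower()
    mails = {s for s, p, o in TRIPLES
             if p in ("subject", "body") and q in o.lower()}
    hits = []
    for mail in sorted(mails):
        for attachment in objects(mail, "has_attachment"):
            hits.extend(objects(attachment, "stored_as"))
    return hits


print(search("page ranking"))
```

The query never touches the article text itself: the match comes from the mail, and the file is reached through the attachment link.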

11
Exploiting Email Context 2
[Figure: email context ontology. Classes: Mail, Person, MailAddress, Attachment, File; literal types: String, Date. Properties: accessed, reply_to, name, sent, to, body, belongs_to, from, subject, status, has_attachment, stored_as.]
12
Exploiting File Hierarchy Context 1
  • Scenario
  • Alex spent his holidays in Hanover, Germany and
    took a lot of pictures
  • He usually stores his photos under a folder named
    after the city or region he visited
  • If he forgets the directory name, no ordinary
    search can retrieve his photos, as the only word
    he remembers, Germany, does not appear in the
    file names, nor in the directory structure

13
Exploiting File Hierarchy Context 2
[Figure: file hierarchy context ontology. Classes: File, Directory, Attachment, VisitedWebPage, Person, WordNetTerm; literal types: String, Date. Properties: type, stored_from, last_accessed, last_modified, owned_by, created, name, in_directory, subClassOf, hypernym_to, hyponym_to, holonym_to, meronym_to, synonym_to.]
14
Exploiting the Web Cache 1
  • Visualization
  • Scenario
  • Paul is looking for the Microsoft internships web
    page, which he already visited, coming from the
    Microsoft main page
  • He does not remember the right set of keywords to
    jump directly to the desired page
  • Desktop search should provide the list of links
    he clicked from the Microsoft web page on his
    last visit
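
A minimal sketch of this lookup, assuming the browsing history is kept as a simple log. The log format, the `example.org` URLs and the `departed_to` helper are hypothetical; only the idea of recording where the user arrived from and departed to comes from the talk.

```python
# Hypothetical browsing log: (time, visited page, page the user arrived from).
LOG = [
    (1, "http://example.org/", None),
    (2, "http://example.org/news", "http://example.org/"),
    (3, "http://example.org/", None),
    (4, "http://example.org/internships", "http://example.org/"),
]


def departed_to(log, page):
    """Pages reached from `page` since the user last opened it."""
    visits = [t for t, url, _ in log if url == page]
    if not visits:
        return []
    last_visit = max(visits)
    return [url for t, url, src in log if src == page and t > last_visit]


print(departed_to(LOG, "http://example.org/"))
```

Links followed on earlier visits (here the news page) are left out, so the result only lists what was clicked on the most recent visit.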

15
Exploiting the Web Cache 2
[Figure: web cache context ontology. Classes: VisitedWebPage, File; literal type: Date. Properties: accessed_at, stored_as, arrived_from, departed_to.]
16
Exploiting the Web Cache 3
  • Enriching Search Results
  • Scenario
  • Alice browses through CiteSeer for papers on a
    specific topic, following reference links and
    downloading the most relevant papers
  • As soon as she stores the papers on her desktop,
    all the relations among the papers are lost
  • The semantic desktop environment should preserve
    this information and make it available as
    explicit metadata

17
Exploiting the Web Cache 4
[Figure: publication context ontology. Classes: VisitedWebPage, Publication, PDF_file, PS_file, File. Properties: subClassOf, referenced_by, references, stored_as_pdf, stored_as_ps.]
18
Desktop Search Architecture Prototype
  • Extending Beagle with metadata desktop search
  • Generating input metadata
  • Metadata generator applications
  • Displaying and enriching search results
  • Main characteristics of our desktop search
    architecture
  • Metadata generation
  • On-the-fly, event-based indexing
  • Triggered by events raised when the file system
    changes
  • Notification functionality provided by the
    inotify-enabled Linux kernel

19
Metadata Generator Applications
  • Depending on the type and context of the events,
    metadata generation is performed by the
    appropriate metadata generator application
  • Email Metadata Generator
  • Web Cache Metadata Generator
  • File Metadata Generator
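
One way to picture this dispatch is a table from event kind to generator. The sketch below is an assumption throughout: the slides do not name the event kinds, and the three generator functions are stubs standing in for the real applications.

```python
# Hypothetical stub generators; each would produce metadata for one event.
def email_generator(path):
    return {"generator": "email", "source": path}


def web_cache_generator(path):
    return {"generator": "web_cache", "source": path}


def file_generator(path):
    return {"generator": "file", "source": path}


# Assumed event kinds; the slides do not name them.
GENERATORS = {
    "mail_received": email_generator,
    "page_visited": web_cache_generator,
    "file_changed": file_generator,
}


def handle(event):
    """Route one event to its generator; unknown kinds use the file generator."""
    generator = GENERATORS.get(event["kind"], file_generator)
    return generator(event["path"])


print(handle({"kind": "page_visited", "path": "/home/paul/.cache/page.html"}))
```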

20
Email Metadata Generator
  • Built on top of the JavaMail API
  • Incoming mails are processed into a class derived
    from the Message class, defined in JavaMail
  • Metadata generated for incoming emails includes:
  • Sender and Recipient
  • Subject, Body and Status
  • Date when the email was sent or accessed
  • Attachments, etc.
  • Metadata stored as RDF, using the Jena toolkit
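
The prototype uses JavaMail and Jena. As a rough illustration of the same extraction step, the sketch below uses Python's standard `email` package instead and returns plain triples rather than RDF. The sample message, the `mail:1` identifier and the `mail_metadata` helper are assumptions, and attachments are left out for brevity.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical sample message.
RAW = """\
From: Bob <bob@example.org>
To: Alice <alice@example.org>
Subject: Distributed page ranking
Date: Mon, 30 May 2005 10:15:00 +0200
Content-Type: text/plain

The article is attached.
"""


def mail_metadata(raw, mail_id="mail:1"):
    """Extract sender, recipient, subject, date and body as triples."""
    msg = message_from_string(raw)
    sent = parsedate_to_datetime(msg["Date"]).isoformat()
    return [
        (mail_id, "from", msg["From"]),
        (mail_id, "to", msg["To"]),
        (mail_id, "subject", msg["Subject"]),
        (mail_id, "sent", sent),
        (mail_id, "body", msg.get_payload().strip()),
    ]


for triple in mail_metadata(RAW):
    print(triple)
```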

21
Web Cache Metadata Generator
  • Indexing is triggered by browsing pages that are
    not yet in the cache
  • Annotations include
  • Access date
  • Connections between web pages
  • Which hyperlinks of the current page are
    traversed
  • Generated metadata stored as RDF files

22
File Metadata Generator
  • Implemented in Java and uses the JWNL API
  • Generated metadata include information about
  • Type of the file
  • Name
  • Date of creation
  • Date of last change
  • Location of file on the disk
  • WordNet additional metadata for the file name and
    the path to the file
  • Annotations are stored as RDF files
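
The generator itself is written in Java on top of JWNL. The Python sketch below only illustrates the idea: it reads basic file attributes with `os.stat` and annotates each word in the path using a tiny hand-written lexicon that stands in for WordNet. The `LEXICON` contents, the `file_metadata` helper and the triple layout are assumptions. The creation date is omitted because it is not available portably.

```python
import os
import re
import tempfile
from datetime import datetime, timezone

# Hypothetical stand-in for WordNet lookups.
LEXICON = {
    "hanover": {"holonym": ["germany"]},
    "cat": {"hypernym": ["feline"], "hyponym": ["jaguar"]},
}


def file_metadata(path):
    """Return (file, property, value) triples for one file."""
    st = os.stat(path)
    modified = datetime.fromtimestamp(st.st_mtime, timezone.utc).isoformat()
    triples = [
        (path, "name", os.path.basename(path)),
        (path, "type", os.path.splitext(path)[1].lstrip(".") or "unknown"),
        (path, "last_modified", modified),
        (path, "in_directory", os.path.dirname(path)),
    ]
    # Annotate every word in the path with related terms from the lexicon.
    for term in re.findall(r"[a-z]+", path.lower()):
        for relation, words in LEXICON.get(term, {}).items():
            for word in words:
                triples.append((path, relation, word))
    return triples


if __name__ == "__main__":
    folder = os.path.join(tempfile.mkdtemp(), "Hanover")
    os.mkdir(folder)
    photo = os.path.join(folder, "img.jpg")
    open(photo, "w").close()
    for triple in file_metadata(photo):
        print(triple)
```

With such annotations, a query for "germany" can match a photo stored under a folder named after the city, even though the word appears nowhere in the path.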

23
WordNet
  • English lexical reference system
  • Parts of speech: nouns, adjectives, adverbs, and
    verbs, organized in synonym sets
  • Relationships
  • Meronym - The name of a constituent part of, the
    substance of, or a member of something. X is a
    meronym of Y if X is a part of Y.
  • Holonym - The name of the whole of which the
    meronym names a part. Y is a holonym of X if X is
    a part of Y.
  • Hyponym - The specific term used to designate a
    member of a class. X is a hyponym of Y if X is a
    (kind of) Y.
  • Hypernym - The generic term used to designate a
    whole class of specific instances. Y is a
    hypernym of X if X is a (kind of) Y.
  • Synonym - A word that is interchangeable with
    another in some context. X is a synonym of Y if Y
    can substitute for X in a certain context without
    altering the meaning.
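
The definitions above come in inverse pairs: hyponym is the inverse of hypernym, and meronym is the inverse of holonym. A small Python sketch makes this concrete. The facts are hand-written for illustration (in the prototype they come from WordNet), and the `invert` helper is an assumption.

```python
# Hypothetical facts. X -> Y means "Y is a hypernym of X" (X is a kind of Y).
HYPERNYM = {"cat": "feline", "jaguar": "cat"}
# X -> Y means "Y is a holonym of X" (X is a part of Y).
HOLONYM = {"hanover": "germany"}


def invert(relation):
    """Turn an X -> Y mapping into Y -> [X, ...]."""
    inverse = {}
    for x, y in relation.items():
        inverse.setdefault(y, []).append(x)
    return inverse


HYPONYM = invert(HYPERNYM)  # Y -> hyponyms of Y
MERONYM = invert(HOLONYM)   # Y -> meronyms of Y

print(HYPONYM["cat"])
print(MERONYM["germany"])
```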

24
WordNet - Example
  <rdf:Description rdf:about="file:\\C:\beautiful\home\plant\cat.txt">
    <j.0:sense>computerized_tomography</j.0:sense>
    <j.0:hyponym>jaguar</j.0:hyponym>
    <j.0:hypernym>feline</j.0:hypernym>
    <j.0:location_info>
      <rdf:Description rdf:about="file:\\C:\beautiful\">
        <j.0:synonym>ravishing</j.0:synonym>
        <j.0:sense>beautiful</j.0:sense>
      </rdf:Description>
    </j.0:location_info>
  </rdf:Description>

25
Beagle: Finding More than Documents
26
Beagle: Displaying Additional Context
27
Conclusions and Future Work
  • We need contextual information to enable us to
    find what we want and to increase the utility of
    what we find
  • Current and Future Work:
  • Generalize context display
  • Contextual information for ranking on the desktop
  • Exchanging resources and contextual information
    among the members of an interest group → social
    semantic desktop

28
  • Beagle Demo