Title: Activity Based Metadata for Semantic Desktop Search
1 Activity Based Metadata for Semantic Desktop Search
Paul-Alexandru Chirita, Rita Gavriloaie, Stefania Ghita, Wolfgang Nejdl, and Raluca Paiu
ESWC 2005, Heraklion, Greece
2 Activity Based Metadata for Semantic Desktop Search
- Metadata
  - Greek meta- (about), data (information)
  - Data about data
- Semantic
  - Greek semantikos (significant meaning)
  - Study of meaning
- We create data about the data existing on the desktop by analyzing the user's activities, and then use these metadata to improve search on the semantic desktop.
3 Outline
- Motivation
- Current Approaches to Desktop Search
- Context Metadata Desktop Search
- Representing Context Information
- Desktop Search Architecture Prototype
- Conclusions and Future Work
4 Motivation
- Storage capacity of hard disks has increased
- Difficulty of finding something on the desktop has increased
- Where do you search for something, although you know you have it on your desktop?
- A Google
5 Current Approaches to Desktop Search (1)
- Google desktop search: http://desktop.google.com
- Finds
  - Emails (Outlook, Outlook Express, Netscape, Thunderbird)
  - Files (Text, Word, Excel, PowerPoint, PDF, Music, Video, Images)
  - Web History (IE, Netscape, Mozilla, Firefox)
  - Chats (AOL Instant Messaging)
- MSN desktop search application: http://beta.toolbar.msn.com
6 Current Approaches to Desktop Search (2)
- Spotlight Search: http://www.apple.com/macosx/tiger/spotlight.html
  - Mail, Address Book contacts
  - Folders / directories, files, applications
  - Incorporates semantics: it uses explicit information, such as file size, creator, last modification date, and metadata embedded into specific files.
- Beagle desktop search: http://gnome.org/projects/beagle
  - Open source project for Linux
7 What is Context?
- Contextual information refers to all aspects important for a certain situation: ideas, facts, persons, publications, etc.
- It includes all relevant relationships as well as interaction history
8 Context Metadata Desktop Search
- Users tend to associate things with certain contexts
- Contextual information should be used to enrich search results on the desktop
- 3 types of user web search behavior
  - Navigational
  - Informational
  - Resource seeking
- Searching on the desktop: navigational queries
9 Representing Context Information
- Current desktop search approaches do not make use of desktop-specific information, especially context information
- Some of these missed opportunities include
  - Email context
  - File hierarchy context
  - Web cache context
10 Exploiting Email Context (1)
- Scenario
  - Alice is interested in distributed page ranking
  - She remembers she discussed this topic with a colleague, who also sent her an article via email
  - The article does not mention distributed PageRank, but talks about distributed trust networks (equivalent topic)
  - Using the enhanced desktop search environment, she would be able to find the article, based on this additional information
11 Exploiting Email Context (2)
[Figure: email context ontology. Node labels: Mail, Person, MailAddress, Attachment, File, Date, String. Edge labels: accessed, reply_to, name, sent, to, body, belongs_to, from, subject, status, has_attachment, stored_as.]
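The Alice scenario can be sketched against metadata of this shape. The following is a minimal Python sketch, not the authors' implementation: it assumes the metadata is available as (subject, property, object) triples, reuses the property names that appear as labels in the diagram, and invents the identifiers, the sample data, and the function names purely for illustration.

```python
# Hypothetical sample metadata as (subject, property, object) triples.
# Property names follow the diagram labels; all identifiers are made up.
TRIPLES = [
    ("mail:1", "from", "bob@example.org"),
    ("mail:1", "subject", "Re: distributed page ranking"),
    ("mail:1", "body", "Here is the article we discussed."),
    ("mail:1", "has_attachment", "att:1"),
    ("att:1", "stored_as", "/home/alice/papers/trust-networks.pdf"),
    ("mail:2", "subject", "Lunch?"),
    ("mail:2", "body", "Are you free at noon?"),
]


def objects(triples, subject, prop):
    """Return all objects o such that (subject, prop, o) is a triple."""
    return [o for s, p, o in triples if s == subject and p == prop]


def find_attachments(triples, query):
    """Return files attached to mails whose subject or body contains every query term."""
    terms = query.lower().split()
    hits = []
    mails = sorted({s for s, p, _ in triples if p in ("subject", "body")})
    for mail in mails:
        text = " ".join(
            objects(triples, mail, "subject") + objects(triples, mail, "body")
        ).lower()
        if all(term in text for term in terms):
            # Follow has_attachment, then stored_as, to reach the file on disk.
            for attachment in objects(triples, mail, "has_attachment"):
                hits.extend(objects(triples, attachment, "stored_as"))
    return hits
```

Here the query "distributed page ranking" reaches the PDF through the mail it was attached to, even though the file name itself never mentions the topic.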
12 Exploiting File Hierarchy Context (1)
- Scenario
  - Alex spent his holidays in Hanover, Germany and took a lot of pictures
  - He usually stores his photos under a folder named after the city or region he visited
  - If he forgets the directory name, no ordinary search can retrieve his photos, as the only word he remembers, Germany, does not appear in the file names, nor in the directory structure
13 Exploiting File Hierarchy Context (2)
[Figure: file hierarchy context ontology. Node labels: File, Directory, Attachment, VisitedWebPage, Person, WordNetTerm, Date, String. Edge labels: type, stored_from, last_accessed, last_modified, owned_by, created, name, in_directory, subClassOf, hypernym_to, hyponym_to, holonym_to, meronym_to, synonym_to.]
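The Alex scenario can be illustrated with a small Python sketch. It assumes a toy lexicon standing in for WordNet lookups; the lexicon contents, the function names, and the sample path are hypothetical, and the real prototype may annotate paths differently.

```python
from pathlib import PurePosixPath

# Hypothetical stand-in for WordNet lookups: term -> {relation: [related terms]}.
TOY_LEXICON = {
    "hanover": {"holonym": ["germany"], "hypernym": ["city"]},
    "photos": {"synonym": ["pictures"], "hypernym": ["images"]},
}


def annotate_path(path, lexicon=TOY_LEXICON):
    """Return (relation, term) annotations for every component of a file path."""
    annotations = set()
    for part in PurePosixPath(path).parts:
        word = part.rsplit(".", 1)[0].lower()  # drop a file extension, if any
        for relation, terms in lexicon.get(word, {}).items():
            annotations.update((relation, term) for term in terms)
    return annotations


def matches(path, keyword, lexicon=TOY_LEXICON):
    """True if the keyword occurs in the path itself or in its annotations."""
    keyword = keyword.lower()
    if keyword in path.lower():
        return True
    return any(term == keyword for _, term in annotate_path(path, lexicon))
```

With these assumptions, `matches("/home/alex/Hanover/img_001.jpg", "Germany")` succeeds through the holonym annotation on the directory name, although "Germany" appears nowhere in the path.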
14 Exploiting the Web Cache (1)
- Visualization
- Scenario
  - Paul is looking for the Microsoft internships web page, which he already visited, coming from the Microsoft main page
  - He does not remember the right set of keywords to jump directly to the desired page
  - Desktop search should provide the list of links he clicked from the Microsoft web page on his last visit
15 Exploiting the Web Cache (2)
[Figure: web cache ontology. Node labels: VisitedWebPage, File, Date. Edge labels: accessed_at, stored_as, arrived_from, departed_to.]
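Paul's scenario amounts to following departed_to links from a page, restricted to its most recent visit. The Python sketch below assumes a browsing log of (source, target, time) entries, where each entry corresponds to one departed_to link and, read backwards, one arrived_from link. The log format, the URLs, and the rule that a "visit" means one calendar day are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical browsing log: (source page, clicked target, access time).
VISITS = [
    ("http://example.com/", "http://example.com/about", datetime(2005, 5, 1, 10, 0)),
    ("http://example.com/", "http://example.com/jobs", datetime(2005, 5, 20, 9, 0)),
    ("http://example.com/jobs", "http://example.com/jobs/internships", datetime(2005, 5, 20, 9, 1)),
    ("http://example.com/", "http://example.com/products", datetime(2005, 5, 20, 9, 5)),
]


def links_from_last_visit(visits, page):
    """Return the targets clicked from `page` on the most recent day it was left."""
    clicks = [(when, target) for source, target, when in visits if source == page]
    if not clicks:
        return []
    last_day = max(when for when, _ in clicks).date()
    # Sorting the (time, target) pairs yields the links in click order.
    return [target for when, target in sorted(clicks) if when.date() == last_day]
```

A search front end could show this list next to a hit for the start page, so the user can retrace the path instead of guessing keywords for the target page.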
16 Exploiting the Web Cache (3)
- Enriching Search Results
- Scenario
  - Alice browses through CiteSeer for papers on a specific topic, following reference links and downloading the most relevant papers
  - As soon as she stores the papers on her desktop, all the relations among the papers are lost
  - The semantic desktop environment should preserve this information and make it available as explicit metadata
17 Exploiting the Web Cache (4)
[Figure: publication ontology. Node labels: VisitedWebPage, Publication, PDF_file, PS_file, File. Edge labels: subClassOf, referenced_by, references, stored_as_pdf, stored_as_ps.]
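One way to picture how the references and stored_as_pdf relations keep citation links alive after download is the Python sketch below. The dictionaries, paper identifiers, and file paths are hypothetical, and the prototype itself stores such relations as RDF rather than as Python dictionaries.

```python
def file_level_references(references, stored_as_pdf):
    """Map each downloaded file to the downloaded files of the papers it references.

    references:    publication id -> list of referenced publication ids
    stored_as_pdf: publication id -> local path of the downloaded PDF
    """
    linked = {}
    for publication, path in stored_as_pdf.items():
        cited = references.get(publication, [])
        # Keep only cited papers that were actually downloaded.
        linked[path] = [stored_as_pdf[c] for c in cited if c in stored_as_pdf]
    return linked


def invert(relation):
    """Derive referenced_by from references (or the other way round)."""
    inverse = {key: [] for key in relation}
    for source, targets in relation.items():
        for target in targets:
            inverse.setdefault(target, []).append(source)
    return inverse
```

For example, if paper A references papers B and C but only A and B were downloaded, the file for A links to the file for B, and inverting the result gives the referenced_by direction.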
18 Desktop Search Architecture Prototype
- Extending Beagle with metadata desktop search
  - Generating input metadata
  - Metadata generator applications
  - Displaying and enriching search results
- Main characteristics of our desktop search architecture
  - Metadata generation
  - Indexing on the fly (based on events)
  - Triggered by events upon occurrence of file system changes
  - Notification functionality provided by the inotify-enabled Linux kernel
19 Metadata Generator Applications
- Depending on the type and context of the events, metadata generation is performed by the appropriate metadata generator application
  - Email Metadata Generator
  - Web Cache Metadata Generator
  - File Metadata Generator
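The routing of events to generators can be sketched as follows. This Python sketch covers only the dispatch step and assumes that file system events arrive from some notification source (such as inotify) as simple dictionaries; the directory locations, the event format, and the stub generators are hypothetical.

```python
# Stub generators: each would normally produce RDF metadata for the given path.
def email_generator(path):
    return {"generator": "email", "path": path}


def web_cache_generator(path):
    return {"generator": "web_cache", "path": path}


def file_generator(path):
    return {"generator": "file", "path": path}


# Hypothetical locations; a real setup would read these from configuration.
MAIL_DIR = "/home/user/Mail"
WEB_CACHE_DIR = "/home/user/.browser/cache"


def select_generator(path):
    """Pick a generator from the context of the event (here: where the file lives)."""
    if path.startswith(MAIL_DIR + "/"):
        return email_generator
    if path.startswith(WEB_CACHE_DIR + "/"):
        return web_cache_generator
    return file_generator


def handle_event(event):
    """Handle one event given as {'kind': ..., 'path': ...}; ignore irrelevant kinds."""
    if event["kind"] not in ("created", "modified"):
        return None
    return select_generator(event["path"])(event["path"])
```

Keeping the selection rule in one function means a new generator can be added without touching the event loop.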
20 Email Metadata Generator
- Built on top of the JavaMail API
- Incoming mails are processed into a class derived from the Message class, defined in JavaMail
- Generated metadata for the incoming emails include information like
  - Sender and Recipient
  - Subject, Body and Status
  - Date when the email was sent or accessed
  - Attachments, etc.
- Metadata stored as RDF, using the Jena toolkit
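To make the extraction step concrete, here is a minimal sketch that uses Python's standard `email` package as a stand-in for JavaMail and plain triples as a stand-in for Jena RDF. It is not the authors' generator: the sample message, the `mail:1` identifier, and the function name are hypothetical, and only a few of the listed fields are covered.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message with one text part and one attachment.
SAMPLE = """\
From: Bob <bob@example.org>
To: Alice <alice@example.org>
Subject: Article on trust networks
Date: Mon, 02 May 2005 10:15:00 +0200
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

See the attached article.
--XYZ
Content-Type: application/pdf
Content-Disposition: attachment; filename="trust.pdf"
Content-Transfer-Encoding: base64

JVBERi0=
--XYZ--
"""


def mail_metadata(raw, mail_id="mail:1"):
    """Extract sender, recipients, subject, date, body and attachment names as triples."""
    msg = message_from_string(raw)
    triples = []
    for _, address in getaddresses(msg.get_all("From", [])):
        triples.append((mail_id, "from", address))
    for _, address in getaddresses(msg.get_all("To", [])):
        triples.append((mail_id, "to", address))
    triples.append((mail_id, "subject", msg.get("Subject", "")))
    if msg["Date"]:
        sent = parsedate_to_datetime(msg["Date"]).isoformat()
        triples.append((mail_id, "sent", sent))
    for part in msg.walk():
        filename = part.get_filename()
        if filename:
            triples.append((mail_id, "has_attachment", filename))
        elif part.get_content_type() == "text/plain":
            triples.append((mail_id, "body", part.get_payload().strip()))
    return triples
```

Calling `mail_metadata(SAMPLE)` yields triples that could then be serialized as RDF by whatever toolkit is in use.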
21 Web Cache Metadata Generator
- Indexing triggered by browsing pages which are not in the cache
- Annotations include
  - Access date
  - Connections between web pages
  - Which hyperlinks of the current page are traversed
- Generated metadata stored as RDF files
22 File Metadata Generator
- Implemented in Java, using the JWNL API
- Generated metadata include information about
  - Type of the file
  - Name
  - Date of creation
  - Date of last change
  - Location of file on the disk
  - WordNet additional metadata for the file name and the path to the file
- Annotations are stored as RDF files
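The non-WordNet part of this metadata can be read straight from the file system. The sketch below uses Python's standard library rather than the Java implementation described above; the dictionary keys borrow the property names of the file ontology, and the function name is hypothetical. The creation date is left out because it is not portably available through `os.stat` (on Linux, `st_ctime` is the time of the last metadata change).

```python
import mimetypes
import os
from datetime import datetime, timezone


def file_metadata(path):
    """Collect basic metadata for one file as a property -> value dictionary."""
    info = os.stat(path)
    directory, name = os.path.split(os.path.abspath(path))
    mime, _ = mimetypes.guess_type(name)  # guessed from the extension only

    def iso(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "name": name,
        "type": mime or "unknown",
        "in_directory": directory,
        "last_modified": iso(info.st_mtime),
        "last_accessed": iso(info.st_atime),
    }
```

The resulting dictionary could be combined with lexical annotations for the file name and path before being written out as RDF.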
23 WordNet
- English lexical reference system
- POS: nouns, adjectives, adverbs, verbs organized in synonym sets
- Relationships
  - Meronym: The name of a constituent part of, the substance of, or a member of something. X is a meronym of Y if X is a part of Y.
  - Holonym: The name of the whole of which the meronym names a part. Y is a holonym of X if X is a part of Y.
  - Hyponym: The specific term used to designate a member of a class. X is a hyponym of Y if X is a (kind of) Y.
  - Hypernym: The generic term used to designate a whole class of specific instances. Y is a hypernym of X if X is a (kind of) Y.
  - Synonym: A set of words that are interchangeable in some context. X is a synonym of Y if Y can substitute X in a certain context without altering the meaning.
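The definitions above pair up: meronym with holonym, and hyponym with hypernym. A small Python sketch shows how the inverse facts follow mechanically. The fact format and the example words are illustrative, and treating synonymy as symmetric is an assumption of this sketch.

```python
# Each relation and the relation that holds in the opposite direction.
INVERSE = {
    "meronym": "holonym",
    "holonym": "meronym",
    "hyponym": "hypernym",
    "hypernym": "hyponym",
    "synonym": "synonym",  # assumed symmetric for this sketch
}


def close_under_inverses(facts):
    """Given facts (x, relation, y) meaning 'x is a <relation> of y', add the inverse facts."""
    closed = set(facts)
    for x, relation, y in facts:
        closed.add((y, INVERSE[relation], x))
    return closed
```

For instance, from "wheel is a meronym of car" the sketch derives "car is a holonym of wheel", and from "jaguar is a hyponym of cat" it derives "cat is a hypernym of jaguar".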
24 WordNet - Example
<rdf:Description rdf:about="file:\\C:\beautiful\home\plant\cat.txt">
  <j.0:sense>computerized_tomography</j.0:sense>
  <j.0:hyponym>jaguar</j.0:hyponym>
  <j.0:hypernym>feline</j.0:hypernym>
  <j.0:location_info>
    <rdf:Description rdf:about="file:\\C:\beautiful\">
      <j.0:synonym>ravishing</j.0:synonym>
      <j.0:sense>beautiful</j.0:sense>
    </rdf:Description>
  </j.0:location_info>
</rdf:Description>
25 Beagle: Finding More than Documents
26 Beagle: Displaying Additional Context
27 Conclusions and Future Work
- We need contextual information to enable us to find what we want and to increase the utility of what we find
- Current and Future Work
  - Generalize context display
  - Contextual information for ranking on the desktop
  - Exchanging resources and contextual information among the members of an interest group → social semantic desktop