Dr. Susan Gauch - PowerPoint PPT Presentation

Slides: 30
Provided by: jasonb2
Transcript and Presenter's Notes

1
Intelligent Access to Time-Sensitive Information
  • Dr. Susan Gauch

2
Motivation
  • Soldiers in the field need effective access to
    time-sensitive information
  • Need to be able to make new information available
  • Quickly
  • Effectively
  • Accurately
  • Need to be able to validate quality of
    informally-collected information
  • Need to tailor information to the soldier's mission
    and expertise

3
Phase I Goals
  • Goal 1: Make informally collected information
    available quickly
  • Develop automatic techniques to harvest informal
    information sources
  • Develop automatic techniques to identify
    content-bearing chunks
  • Develop automatic techniques to segment and tag
    chunks to produce learning objects
  • Develop a Web accessible search site for the
    learning objects

4
Phase I Goals
  • Goal 2: Improve accuracy of informal information
  • Develop manual editing system to improve/extend
    tagging of learning objects
  • Develop manual annotation system to allow
    end-users to add comments, corrections, ratings
  • Develop new search algorithms that preferentially
    select highly-rated items

5
Phase II Goals
  • Goal 1: Tailor the information to the individual
  • Develop automatic techniques to implicitly create
    profiles for the users of the system
  • Incorporate explicit information about the users
  • Develop automatic techniques to select most
    relevant information for the user
  • Develop automatic techniques to tailor
    information presentation to the user

6
Phase II Goals
  • Goal 2: Improve the usefulness of the
    information
  • Create a system to analyze the annotations to
    identify objects requiring editing
  • Create an alert system to automatically notify
    interested parties about urgent new information
    and changes to information they contributed

7
(No Transcript)
8
Related Projects
  • IKME
  • Intelligent Knowledge Management Archive
  • Goal: Given XML-tagged content
  • Provide automatic data cleanup
  • Track schema changes
  • Migrate objects to new schema
  • Provide Web-accessible searching
  • XML database
  • Added full-text relevance ranking

9
IKME (cont'd)
  • Provide Web-form to reuse learning objects
  • Manuals
  • Lesson objects
  • Identify similar learning objects
  • Automatically cluster related objects together

10
ChatTrack
  • Goal: Intelligent access to chat data
  • http://www.ittc.ku.edu/chattrack
  • 3 components
  • Archiving
  • Profiling
  • Searching
  • XML-markup of chat messages
  • Relational database for fast access

11
Architecture
[Figure: architecture diagram. Labeled components: Client, Internet, two IRC Clients, Chat Server (with ChatLog), Chat Data Archive (XML/SQL), Concept Database, Classifier, Indexer, ChatProfile, ChatRetrieve, Administrator / Intelligence Agent]
12
Chat Archiving (2)
<join>
  <date>2004-04-17</date>
  <time>08:57:50</time>
  <username>jmb</username>
  <channel>jayhawk</channel>
</join>
<msg>
  <date>2004-04-17</date>
  <time>08:58:14</time>
  <sender>Jason</sender>
  <receiver>jayhawk</receiver>
  <data>There is going to be weather, whether or not. Uh oh, I'll be RIGHT back!</data>
</msg>
<quit>
  <date>2004-04-17</date>
  <time>08:58:19</time>
  <username>jmb</username>
</quit>
<msg>
  <date>2004-04-17</date>
  <time>08:59:46</time>
  <sender>Alice</sender>
  <receiver>jayhawk</receiver>
  <data>Poof..Left in a hurry! Must be a tornado outside his door or something. lol</data>
</msg>
  • XML chat data → SQL database
  • ChatLog Library XML schema can be used for
    almost any client/server-based system
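The XML-to-SQL step can be sketched roughly as follows. This is only an illustration, not the ChatLog library itself: the element names come from the sample log on this slide, while the `events` table, its columns, and the `load_log` function are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample log in the shape shown on the slide (element names taken from it).
LOG = """
<join><date>2004-04-17</date><time>08:57:50</time>
  <username>jmb</username><channel>jayhawk</channel></join>
<msg><date>2004-04-17</date><time>08:58:14</time>
  <sender>Jason</sender><receiver>jayhawk</receiver>
  <data>Uh oh, I'll be RIGHT back!</data></msg>
<quit><date>2004-04-17</date><time>08:58:19</time>
  <username>jmb</username></quit>
"""

def load_log(xml_text, conn):
    """Insert every event of an XML chat log into an assumed 'events' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "kind TEXT, date TEXT, time TEXT, who TEXT, target TEXT, data TEXT)"
    )
    # The log is a sequence of sibling elements, so wrap it in a root to parse.
    root = ET.fromstring("<log>" + xml_text + "</log>")
    for ev in root:
        # join/quit events carry a username; msg events carry a sender.
        who = ev.findtext("sender") or ev.findtext("username")
        target = ev.findtext("receiver") or ev.findtext("channel")
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
            (ev.tag, ev.findtext("date"), ev.findtext("time"),
             who, target, ev.findtext("data")),
        )
    conn.commit()
    return len(root)

conn = sqlite3.connect(":memory:")
count = load_log(LOG, conn)
rows = conn.execute(
    "SELECT time, who, data FROM events WHERE kind = 'msg'"
).fetchall()
print(count, rows)
```

Keeping one row per event, with the event type in a column, lets join, quit, and message records share a single table that can be filtered by speaker, channel, or time.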

13
Chat Profile
  • User Profile: focuses on one chat participant
  • Session Profile: filters by chat room name only
  • Analyst selects criteria of interest

14
Chat Profile
  • Chat data collected from archive (stopword
    removal, Porter stemming)
  • Classification performed once chat utterances
    collected
  • Classifier creates vector of keywords from chat
    data
  • Similarity measure
  • Determines similarity between chat data vector
    and vectors for each trained concept
  • Concepts sorted, top matches returned
  • Represented in hierarchical fashion
  • Asterisks represent relative concept weights at
    each level
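The profiling steps listed on this slide might look like the sketch below. The slide does not name the similarity measure, so cosine similarity is an assumption here; the stopword list, the concept names, and the training text are toy stand-ins, and Porter stemming is left out to keep the example dependency-free.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "is", "to", "of", "and", "in", "for", "on"}  # toy list

def vectorize(text):
    """Lowercase, tokenize, drop stopwords, and count terms (no stemming here)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def profile(chat_text, concept_docs, top_n=3):
    """Return the top_n (concept, similarity) pairs, best match first."""
    chat_vec = vectorize(chat_text)
    scores = {c: cosine(chat_vec, vectorize(d)) for c, d in concept_docs.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_n]

def stars(score, best, width=5):
    """Render a weight as asterisks relative to the best score."""
    if best <= 0 or score <= 0:
        return ""
    return "*" * max(1, round(width * score / best))

# Hypothetical trained concepts, each represented by a bag of training words.
CONCEPTS = {
    "Society/Politics": "election vote senate president campaign policy",
    "Sports/Baseball": "pitcher inning baseball league home run",
    "Science/Weather": "tornado storm weather forecast rain",
}
chat = "The president and the senate argue about the election campaign"
top = profile(chat, CONCEPTS)
for concept, score in top:
    print(f"{concept:20s} {stars(score, top[0][1])}")
```

Scaling the asterisks to the best score mirrors the idea of relative concept weights; a hierarchical display would apply the same scaling separately at each level of the concept tree.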

15
Classification in american-politics (two hours in January 2004, Undernet): SESSION PROFILE
16
Classification in american-politics (two hours in January 2004, Undernet): USER PROFILE
(selected one public chat participant)
17
Chat Retrieve
  • Some profiles warrant further analysis
  • Agent/admin needs ability to explore chat
    session linked to profiles in question
  • Incremental Indexing system keeps data current

18
Chat Retrieve (3)
  • Queries based on
  • speaker name
  • keywords
  • date/time range
  • listener name
  • (combo of above)
  • Keyword retrieval based on tf-idf
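A minimal sketch of such a combined query, assuming a small in-memory archive: the `MESSAGES` tuples, the `search` signature, and the plain `tf * log(N / df)` weighting are illustrative choices, not the ChatRetrieve implementation.

```python
import math
from collections import Counter

# Hypothetical archived messages: (speaker, listener, ISO timestamp, text).
MESSAGES = [
    ("Jason", "jayhawk", "2004-04-17T08:58:14", "there is going to be weather"),
    ("Alice", "jayhawk", "2004-04-17T08:59:46", "must be a tornado outside his door"),
    ("Alice", "jayhawk", "2004-04-18T10:00:00", "the tornado warning is over"),
]

def tfidf_scores(query, docs):
    """Score each document by the summed tf * idf of the query terms."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        scores.append(sum(
            tf[q] * math.log(n / df[q])
            for q in query.lower().split() if df[q]
        ))
    return scores

def search(keywords=None, speaker=None, listener=None, start=None, end=None):
    """Apply any combination of filters; rank by tf-idf when keywords are given."""
    texts = [m[3] for m in MESSAGES]
    scores = tfidf_scores(keywords, texts) if keywords else [0.0] * len(texts)
    hits = [
        (s, m) for s, m in zip(scores, MESSAGES)
        if (speaker is None or m[0] == speaker)
        and (listener is None or m[1] == listener)
        and (start is None or m[2] >= start)   # ISO strings sort chronologically
        and (end is None or m[2] <= end)
        and (not keywords or s > 0)
    ]
    if keywords:
        hits.sort(key=lambda p: p[0], reverse=True)
    return [m for _, m in hits]

results = search(keywords="tornado door", speaker="Alice")
for speaker, listener, when, text in results:
    print(when, speaker, text)
```

The idf values are computed over the whole archive rather than over the filtered subset, so narrowing a query by speaker or date does not change how rare a keyword is considered to be.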

19
Chat Retrieve (4)
  • Selecting chat room name replays chat session
  • Includes utterances spoken by all participants
  • Tracks all chat room participants
  • Even if they do not contribute to sessions

20
Public Demo: http://www.ittc.ku.edu/chattrack
  • ChatTrack: provides agents with new tools for
    vigilance against crime
  • ChatProfile generates conceptual profiles from
    chat data
  • Reduces manual efforts to classify chat sessions
  • ChatRetrieve facilitates manual analysis for
    session retrieval by agents, administrators,
    parents
  • Filter based on listener/sender, date/time, chat
    room session name

21
Beta Version: ChatTrend
22
Future (ChatTrend 2)
23
Search Engines Today
  • Common problems of search engines
  • ambiguity (e.g., rock, salsa)
  • retrieved results are based on web popularity
    rather than user's interests
  • Goal
  • Improve search accuracy by retrieving by concept
    (e.g., music, dance)
  • Improve search accuracy by matching user
    interests

24
Search Engine Personalization
  • Ongoing research to investigate ways
  • to implicitly collect information about the user
  • to represent information about the user.
  • Use user profiles
  • to re-rank the results returned from the initial
    retrieval process
  • to filter for results that better fit the user's
    interests

25
Sources of User Information
  • User explicit information
  • - users too lazy
  • - information becomes out of date/inaccurate
  • User browsing histories
  • - must collect information via desktop robot or
    have user connect to Internet via a proxy
  • User desktops
  • - contextual retrieval
  • User search histories
  • - information available to search engine itself

26
User Profile Creation
  • Collect information about the user's interests
    (search history)
  • Categorize representative texts into concept
    hierarchy
  • - use Open Directory Project for concepts
  • - train classifier on representative pages
  • - compare representative texts to training
    texts to identify the concepts discussed
  • Concept weights represent user interests
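As a rough sketch of how concept weights could be accumulated from a search history: the three concepts and their training text below are stand-ins for Open Directory Project categories, and the term-overlap `classify` function is a deliberately simple placeholder for a trained classifier, not the one used in this work.

```python
from collections import Counter, defaultdict

# Hypothetical training text per concept (stand-ins for ODP category pages).
TRAINING = {
    "Arts/Music": "rock band guitar album concert salsa",
    "Science/Geology": "rock mineral granite basalt strata",
    "Home/Cooking": "salsa recipe tomato chili sauce",
}

def classify(text, top_k=2):
    """Score a text against each concept by shared-term count (toy classifier)."""
    words = Counter(text.lower().split())
    scores = {}
    for concept, training_text in TRAINING.items():
        overlap = sum(words[t] for t in set(training_text.split()))
        if overlap:
            scores[concept] = overlap
    best = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    return dict(best)

def build_profile(history):
    """Sum concept scores over the search history and normalize to weights."""
    totals = defaultdict(float)
    for text in history:
        for concept, score in classify(text).items():
            totals[concept] += score
    norm = sum(totals.values())
    return {c: w / norm for c, w in totals.items()} if norm else {}

# Hypothetical search history (queries or result snippets).
history = ["rock guitar concert", "rock band album", "granite rock"]
user_profile = build_profile(history)
for concept, weight in sorted(user_profile.items(), key=lambda kv: -kv[1]):
    print(f"{concept:18s} {weight:.2f}")
```

With this history the ambiguous term "rock" ends up weighted mostly toward music, which is the kind of signal a personalized ranker can use to disambiguate later queries.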

27
Personalizing Search Results
  • Submit query to Internet search engine (e.g.,
    Google)
  • Categorize each result into same concept
    hierarchy (e.g., ODP) to create result profiles
  • Conceptual match is calculated based on
    similarity between result profiles and user
    profile
  • Rerank results based on conceptual match
  • - the rank order produced is called the conceptual rank
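The reranking step could be sketched as below. The slide says only that the match is based on similarity between profiles, so cosine similarity is an assumption, and the result titles and concept weights are made-up examples.

```python
import math

def cosine(p, q):
    """Cosine similarity between two concept-weight dictionaries."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def rerank(results, user_profile):
    """Order results by conceptual match; ties keep the engine's original order."""
    scored = [(cosine(profile, user_profile), title) for title, profile in results]
    scored.sort(key=lambda p: p[0], reverse=True)  # stable sort preserves ties
    return [title for _, title in scored]

# Hypothetical user profile and result profiles over the same concept hierarchy.
user_profile = {"Arts/Music": 0.8, "Science/Geology": 0.2}
results = [  # listed in the search engine's original order
    ("Granite and basalt field guide", {"Science/Geology": 1.0}),
    ("Salsa recipes", {"Home/Cooking": 1.0}),
    ("Classic rock albums", {"Arts/Music": 0.9, "Science/Geology": 0.1}),
]
conceptual_rank = rerank(results, user_profile)
print(conceptual_rank)
```

Because the sort is stable, results with equal conceptual match stay in the order the search engine returned them, so the original ranking acts as a tie-breaker.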

28
Summary
  • Built user profiles based on queries submitted
    and snippets of user-selected results.
  • This information was sufficient to build user
    profiles that were able to significantly improve
    personalized rankings.
  • A query-based profile produced an improvement of
    33%.
  • A snippet-based profile produced an equivalent
    improvement of 34%.
  • http://www.ittc.ku.edu/mirco/demo/

29
Conclusions
  • Search engines can capture information submitted
    to their site in order to create personalized
    search.
  • Users need not install proxy servers or desktop
    bots.
  • Privacy issues arise with any personalized
    service.
  • Need to look at combination of short-term,
    long-term user interests with current task focus.
  • http://www.ittc.ku.edu/mirco/demo/search.php