Title: Dr. Susan Gauch
1Intelligent Access to Time-Sensitive Information
2Motivation
- Soldiers in the field need effective access to time-sensitive information
- Need to be able to make new information available
- Quickly
- Effectively
- Accurately
- Need to be able to validate quality of informally-collected information
- Need to tailor information to the soldier's mission and expertise
3Phase I Goals
- Goal 1: Make informally collected information available quickly
- Develop automatic techniques to harvest informal information sources
- Develop automatic techniques to identify content-bearing chunks
- Develop automatic techniques to segment and tag chunks to produce learning objects
- Develop a Web-accessible search site for the learning objects
4Phase I Goals
- Goal 2: Improve accuracy of informal information
- Develop manual editing system to improve/extend tagging of learning objects
- Develop manual annotation system to allow end-users to add comments, corrections, ratings
- Develop new search algorithms that preferentially select highly-rated items
5Phase II Goals
- Goal 1: Tailor the information to the individual
- Develop automatic techniques to implicitly create profiles for the users of the system
- Incorporate explicit information about the users
- Develop automatic techniques to select most relevant information for the user
- Develop automatic techniques to tailor information presentation to the user
6Phase II Goals
- Goal 2: Improve the usefulness of the information
- Create a system to analyze the annotations to identify objects requiring editing
- Create an alert system to automatically notify interested parties about urgent new information, changes to information they contributed
8Related Projects
- IKME
- Intelligent Knowledge Management Archive
- Goal: Given XML-tagged content
- Provide automatic data cleanup
- Track schema changes
- Migrate objects to new schema
- Provide Web-accessible searching
- XML database
- Added full-text relevance ranking
9IKME (cont'd)
- Provide Web-form to reuse learning objects
- Manuals
- Lesson objects
- Identify similar learning objects
- Automatically cluster related objects together
10ChatTrack
- Goal: Intelligent Access to Chat data
- http://www.ittc.ku.edu/chattrack
- 3 components
- Archiving
- Profiling
- Searching
- XML-markup of chat messages
- Relational database for fast access
11Architecture
(Architecture diagram. Components shown: Client, Internet, IRC Clients, Chat Server (with ChatLog), Chat Data Archive (XML/SQL), Concept Database, Classifier, Indexer, ChatProfile, ChatRetrieve, Administrator / Intelligence Agent)
12Chat Archiving (2)
<join>
  <date>2004-04-17</date> <time>085750</time>
  <username>jmb</username> <channel>jayhawk</channel>
</join>
<msg>
  <date>2004-04-17</date> <time>085814</time>
  <sender>Jason&gt;</sender> <receiver>jayhawk</receiver>
  <data>There is going to be weather, whether or not. Uh oh, I'll be RIGHT back!</data>
</msg>
<quit>
  <date>2004-04-17</date> <time>085819</time>
  <username>jmb</username>
</quit>
<msg>
  <date>2004-04-17</date> <time>085946</time>
  <sender>Alice</sender> <receiver>jayhawk</receiver>
  <data>Poof..Left in a hurry! Must be a tornado outside his door or something. lol</data>
</msg>
- XML chat data → SQL database
- ChatLog Library XML schema can be used for
almost any client/server-based system
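The XML-to-SQL step above can be sketched as follows. This is a minimal illustration, not the ChatLog library itself: the `<chatlog>` root element, the single `events` table, and its column names are assumptions made for the example, and the sample log is abridged from the slide's tag layout.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Sample log using the tag layout shown on the slide, wrapped in a
# hypothetical <chatlog> root so it parses as one XML document.
SAMPLE = """<chatlog>
  <join><date>2004-04-17</date><time>085750</time>
        <username>jmb</username><channel>jayhawk</channel></join>
  <msg><date>2004-04-17</date><time>085814</time>
       <sender>Jason</sender><receiver>jayhawk</receiver>
       <data>There is going to be weather, whether or not.</data></msg>
  <quit><date>2004-04-17</date><time>085819</time>
        <username>jmb</username></quit>
</chatlog>"""


def load_chat_log(xml_text, conn):
    """Insert every event of the XML log into one 'events' table (assumed schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "kind TEXT, date TEXT, time TEXT, "
        "actor TEXT, target TEXT, data TEXT)"
    )
    root = ET.fromstring(xml_text)
    for event in root:
        # findtext returns None when the child element is absent.
        actor = event.findtext("sender") or event.findtext("username")
        target = event.findtext("receiver") or event.findtext("channel")
        conn.execute(
            "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
            (event.tag, event.findtext("date"), event.findtext("time"),
             actor, target, event.findtext("data")),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_chat_log(SAMPLE, conn)
rows = conn.execute(
    "SELECT kind, actor, target FROM events ORDER BY time"
).fetchall()
print(rows)
```

Keeping join, quit, and message events in one table is only one possible design; it makes it easy to list who was present in a channel even when they never spoke.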
13Chat Profile
- User Profile: focuses on one chat participant
- Session Profile: filters by chat room name only
- Analyst selects criteria of interest
14Chat Profile
- Chat data collected from archive (stopword removal, Porter stemming)
- Classification performed once chat utterances collected
- Classifier creates vector of keywords from chat data
- Similarity measure
- Determines similarity between chat data vector and vectors for each trained concept
- Concepts sorted; top matches returned
- Represented in hierarchical fashion
- Asterisks represent relative concept weights at each level
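The profiling pipeline on this slide can be sketched roughly as below. Several parts are assumptions for illustration: the slide does not name the similarity measure, so cosine similarity is used here; the `stem` function is a crude suffix stripper standing in for Porter stemming; and the concept names and training texts are invented.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "on"}

# Hypothetical trained concepts: one short training text per concept.
TRAINING = {
    "Society/Politics": "election vote senate candidate campaign president",
    "Sports/Baseball": "pitcher inning baseball home run league",
    "Science/Weather": "storm tornado forecast rain wind weather",
}


def stem(word):
    """Crude suffix stripping; a stand-in for Porter stemming, not the real algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_vector(text):
    """Lowercase, drop stopwords, stem, and count the remaining keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(stem(w) for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def classify(chat_text, top_n=2):
    """Return the top_n (concept, similarity) pairs, best match first."""
    chat_vec = to_vector(chat_text)
    scores = {c: cosine(chat_vec, to_vector(t)) for c, t in TRAINING.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def show(ranked, width=10):
    """Render relative concept weights as rows of asterisks."""
    top = ranked[0][1] or 1.0
    return [f"{c:<20} {'*' * round(width * s / top)}" for c, s in ranked]


utterances = [
    "The candidate lost votes in the senate election",
    "Is the president campaigning during the storm?",
]
profile = classify(" ".join(utterances))
print("\n".join(show(profile)))
```

A session profile would pass in every utterance from one chat room, while a user profile would pass in only the utterances of the selected participant.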
15Classification in american-politics (two hours in January 2004, Undernet): SESSION PROFILE
16Classification in american-politics (two hours in January 2004, Undernet): USER PROFILE
selected one public chat participant
17Chat Retrieve
- Some profiles warrant further analysis
- Agent/admin needs ability to explore chat session linked to profiles in question
- Incremental indexing system keeps data current
18Chat Retrieve (3)
- Queries based on
- speaker name
- keywords
- date/time range
- listener name
- (combo of above)
- Keyword retrieval based on tf-idf
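A query that combines the filters above with keyword ranking might look like the sketch below. The message records, field names, and the exact weighting (raw term frequency times log inverse document frequency) are assumptions for illustration; the slides do not specify which tf-idf variant ChatRetrieve uses.

```python
import math
import re
from collections import Counter

# Made-up sample messages; the field names mirror the XML tags in the archive.
MESSAGES = [
    {"sender": "Jason", "receiver": "jayhawk", "date": "2004-04-17",
     "data": "There is going to be weather, whether or not"},
    {"sender": "Alice", "receiver": "jayhawk", "date": "2004-04-17",
     "data": "Must be a tornado outside his door"},
    {"sender": "Alice", "receiver": "jayhawk", "date": "2004-04-18",
     "data": "The tornado missed us, no tornado damage here"},
    {"sender": "Bob", "receiver": "jayhawk", "date": "2004-04-18",
     "data": "Anyone watching the game tonight"},
]


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def search(messages, keywords, sender=None, receiver=None, start=None, end=None):
    """Filter by speaker, listener, and date range, then rank by tf-idf."""
    # Document frequencies are computed over the whole collection.
    n = len(messages)
    df = Counter()
    for m in messages:
        df.update(set(tokenize(m["data"])))

    # Structured filters; ISO dates compare correctly as strings.
    candidates = [
        m for m in messages
        if (sender is None or m["sender"] == sender)
        and (receiver is None or m["receiver"] == receiver)
        and (start is None or m["date"] >= start)
        and (end is None or m["date"] <= end)
    ]

    scored = []
    for m in candidates:
        tf = Counter(tokenize(m["data"]))
        score = sum(tf[k] * math.log(n / df[k])
                    for k in tokenize(keywords) if df[k])
        if score > 0:
            scored.append((score, m))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored]


hits = search(MESSAGES, "tornado")
filtered = search(MESSAGES, "tornado", sender="Alice", end="2004-04-17")
```

Here `hits` ranks the message that mentions the keyword twice ahead of the one that mentions it once, and `filtered` narrows the same query to one speaker and date range.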
19Chat Retrieve (4)
- Selecting chat room name replays chat session
- Includes utterances spoken by all participants
- Tracks all chat room participants
- Even if they do not contribute to sessions
20Public Demo: http://www.ittc.ku.edu/chattrack
- ChatTrack: Provides agents with new tools for vigilance against crime
- ChatProfile generates conceptual profiles from chat data
- Reduces manual efforts to classify chat sessions
- ChatRetrieve facilitates manual analysis for session retrieval by agents, administrators, parents
- Filter based on listener/sender, date/time, chat room session name
21Beta Version ChatTrend
22Future (ChatTrend 2)
23Search Engines Today
- Common problems of search engines
- ambiguity (e.g., rock, salsa)
- retrieved results are based on web popularity rather than user's interests
- Goal
- Improve search accuracy by retrieving by concept (e.g., music, dance)
- Improve search accuracy by matching user interests
24Search Engine Personalization
- Ongoing research to investigate ways
- to implicitly collect information about the user
- to represent information about the user.
- Use user profiles
- to re-rank the results returned from the initial retrieval process
- to filter for results that better fit the user's interests
25Sources of User Information
- User explicit information
- - users too lazy
- - information becomes out of date/inaccurate
- User browsing histories
- - must collect information via desktop robot or have user connect to Internet via a proxy
- User desktops
- - contextual retrieval
- User search histories
- - information available to search engine itself
26User Profile Creation
- Collect information about the user's interests (search history)
- Categorize representative texts into concept hierarchy
- - use Open Directory Project for concepts
- - train classifier on representative pages
- - compare representative texts to training texts to identify the concepts discussed
- Concept weights represent user interests
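The profile-building steps above can be illustrated with a toy example. Everything concrete here is assumed: the three concepts stand in for Open Directory Project categories, the training texts are invented, and the shared-term score is a deliberately simple substitute for a trained classifier.

```python
from collections import Counter

# Hypothetical concepts with invented training text; a real system would
# train on representative pages from the Open Directory Project.
CONCEPT_TRAINING = {
    "Arts/Music": "rock band guitar concert album salsa",
    "Science/Geology": "rock mineral granite volcano sediment",
    "Recreation/Food": "salsa recipe tomato pepper cooking",
}


def concept_scores(text):
    """Score a text against each concept by the fraction of its words
    that appear in the concept's training text (a toy classifier)."""
    words = set(text.lower().split())
    scores = {}
    for concept, training in CONCEPT_TRAINING.items():
        overlap = len(words & set(training.split()))
        if overlap:
            scores[concept] = overlap / len(words)
    return scores


def build_profile(search_history):
    """Accumulate concept scores over the history and normalize to sum to 1."""
    weights = Counter()
    for text in search_history:
        weights.update(concept_scores(text))  # adds the per-concept scores
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()} if total else {}


history = ["rock concert tickets", "guitar album reviews", "salsa band"]
user_profile = build_profile(history)
for concept, weight in sorted(user_profile.items(), key=lambda kv: -kv[1]):
    print(f"{concept:<18} {weight:.2f}")
```

Because the weights accumulate across the whole history, an ambiguous term such as "rock" contributes to both music and geology, but repeated music-related queries make the music concept dominate the profile.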
27Personalizing Search Results
- Submit query to Internet search engine (e.g., Google)
- Categorize each result into same concept hierarchy (e.g., ODP) to create result profiles
- Conceptual match is calculated based on similarity between result profiles and user profile
- Rerank results based on conceptual match
- - rank order produced called conceptual rank
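The reranking step can be sketched as below, assuming cosine similarity as the conceptual match (the slides say only "similarity") and using invented result profiles and concept names for the ambiguous query "salsa".

```python
import math


def cosine(a, b):
    """Cosine similarity between two concept-weight dictionaries."""
    dot = sum(a[c] * b[c] for c in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def conceptual_rerank(results, user_profile):
    """Order results by conceptual match with the user profile.
    sorted() is stable, so ties keep the search engine's original order."""
    return sorted(
        results,
        key=lambda r: cosine(r["profile"], user_profile),
        reverse=True,
    )


# Hypothetical user profile and result profiles, listed in the order the
# search engine returned them.
user_profile = {"Arts/Music": 0.8, "Recreation/Food": 0.2}
results = [
    {"title": "Salsa recipes",
     "profile": {"Recreation/Food": 1.0}},
    {"title": "Salsa dance music",
     "profile": {"Arts/Music": 0.9, "Arts/Dance": 0.1}},
    {"title": "History of salsa sauce",
     "profile": {"Recreation/Food": 0.7, "Society/History": 0.3}},
]

reranked = conceptual_rerank(results, user_profile)
for position, result in enumerate(reranked, start=1):
    print(position, result["title"])
```

For a user whose profile is dominated by music, the music-related result moves from second place to first, which is the effect the conceptual rank is meant to produce.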
28Summary
- Built user profiles based on queries submitted and snippets of user-selected results.
- This information was sufficient to build user profiles that were able to significantly improve personalized rankings.
- A query-based profile produced an improvement of 33%.
- A snippet-based profile produced an equivalent improvement of 34%.
- http://www.ittc.ku.edu/mirco/demo/
29Conclusions
- Search engines can capture information submitted to their site in order to create personalized search.
- Users need not install proxy servers or desktop bots.
- Privacy issues arise with any personalized service.
- Need to look at combination of short-term, long-term user interests with current task focus.
- http://www.ittc.ku.edu/mirco/demo/search.php