Title: Evaluation of Mixed Initiative Systems
1. Evaluation of Mixed Initiative Systems
- Michael J. Pazzani
- University of California, Irvine
- National Science Foundation
2. Overview
- Evaluation
- Micro-level Modules
- Macro-level Behavior of System Users
- Caution: Don't lose sight of the goal in
evaluation
- National Science Foundation
- CISE (Re)organization
- Funding for Mixed Initiative Systems
- Tip on writing better proposals: Evaluate
3. Evaluation
- Micro level
- Does the module (machine learning, user modeling,
information retrieval and visualization, etc.)
work properly?
- Has been responsible for measurable progress in
most specialized domains of intelligent systems
- Relatively easy to do using well-known metrics:
error rate, precision, recall, time and space
complexity, goodness of fit, ROC curves
- Builds upon a long history in the hard sciences and
engineering
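As a minimal sketch of the micro-level metrics named above, the following computes error rate, precision, and recall for a module that makes binary relevant/not-relevant predictions. The function names and the sample labels are hypothetical and chosen only for illustration; they are not data from the talk.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary labels (1 = relevant)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    tn = sum(1 for a, p in pairs if not a and not p)
    return tp, fp, fn, tn


def micro_metrics(actual, predicted):
    """Return error rate, precision, and recall as a dict."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    total = tp + fp + fn + tn
    return {
        # Fraction of all predictions that were wrong.
        "error_rate": (fp + fn) / total if total else 0.0,
        # Fraction of items predicted relevant that really were relevant.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Fraction of truly relevant items that the module found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical ground truth and module output (illustrative only).
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = micro_metrics(actual, predicted)
print(metrics)
```

Metrics like these judge the module in isolation; they say nothing yet about whether a user paired with the module does better.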
4. Evaluation
- Macro level
- Does the complex system, involving a user and a
machine, work as desired?
- Builds upon history in human (and animal)
experimentation, not always taught in (or
respected by) engineering schools
- Allows controlled experiments comparing two
systems (or one system with two variations)
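One common way to analyze a controlled experiment that compares two systems is a two-proportion z-test on a success rate, such as the fraction of users who read a story. The sketch below assumes users were randomly assigned to the two systems; the function name and the counts are hypothetical, and the talk does not say which statistical test was used.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    Returns (z, p_value), where z > 0 means system B has the higher rate.
    """
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled rate under the null hypothesis that both systems perform equally.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 120 of 300 users succeed with system A, 150 of 300 with B.
z, p_value = two_proportion_z_test(120, 300, 150, 300)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between the two systems is unlikely to be chance alone, which is the kind of evidence a macro-level evaluation aims for.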
5. Adaptive Personalization
6. Micro: Evaluating the Hybrid User Model
7. Micro: Speed to Effectiveness
Initially, AIS is as effective as a static system
in finding relevant content. After only one
usage, the benefits of AdaptiveInfo's Intelligent
Wireless Specific Personalization are clear;
after three sessions, even more so; and after 10
sessions, the full benefits of Adaptive
Personalization are realized.
8. Macro: Probability a Story is Read
There is a 40% probability a user will read one of
the top 4 stories selected by an editor, but a 64%
chance they'll read one of the top 4 personalized
stories: the AIS user is 60% more likely to
select a story than a non-AIS user.
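The "60 more likely" figure follows from the two probabilities on this slide when they are read as percentages (40% and 64%). A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
def relative_lift(baseline_rate, treatment_rate):
    """Relative improvement of the treatment rate over the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (treatment_rate - baseline_rate) / baseline_rate


editor_rate = 0.40        # top 4 stories selected by an editor
personalized_rate = 0.64  # top 4 personalized stories

lift = relative_lift(editor_rate, personalized_rate)
print(f"relative lift: {lift:.0%}")  # prints "relative lift: 60%"
```

Note the difference between the relative lift (60%) and the absolute difference (24 percentage points); a macro-level report should state which one it means.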
9. Macro: Increased Page Views
After looking at 3 or more screens of headlines,
users read 43% more of the personally selected
news stories, clearly showing AIS's ability to
dramatically increase the stickiness of a wireless
web application.
10. Macro: Readership and Stickiness
20% more LA Times users who receive personalized
news return to the wireless site 6 weeks after
the first usage.
11. Cautions
- Optimizing a micro-level evaluation may have
little impact on the macro level. It may even
have a counter-intuitive effect.
- If personalization causes a noticeable delay, it
may decrease readership.
- Don't lose sight of the goal.
- The metrics are just approximations of the goal.
- Optimizing the metric may not optimize the goal.
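The delay caution can be illustrated with a toy model, not taken from the talk. Suppose a fraction of users abandons the page because of the added personalization delay, and the rest read a story with the probability reported on the "Probability a Story is Read" slide (40 and 64, read as percentages). The abandonment probability and both function names are hypothetical assumptions.

```python
def expected_read_rate(read_prob, abandon_prob):
    """Probability that a user both waits for the page and reads a story."""
    return (1 - abandon_prob) * read_prob


def break_even_abandonment(baseline_read_prob, personalized_read_prob):
    """Abandonment rate at which personalization merely matches the baseline."""
    return 1 - baseline_read_prob / personalized_read_prob


baseline = 0.40      # editor-selected stories, assumed to load with no extra delay
personalized = 0.64  # personalized stories, before accounting for any delay

threshold = break_even_abandonment(baseline, personalized)
print(f"break-even abandonment: {threshold:.1%}")  # prints "break-even abandonment: 37.5%"

# Hypothetical case: half of the users give up while waiting.
print(expected_read_rate(personalized, 0.5) < baseline)  # prints "True"
```

Under these assumptions, a better micro-level result (more relevant stories) turns into a worse macro-level outcome once more than 37.5% of users leave because of the delay.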
12. R&D within the NSF Organization
13. CISE Directorate 2004
- Computing & Communication Foundations
- Computer & Network Systems
- Information and Intelligent Systems (IIS)
- Deployed Infrastructure
14. Information and Intelligent Systems Programs
- Information and Data Management
- Artificial Intelligence and Cognitive Science
- Human Language and Communication
- Robotics and Computer Vision
- Digital Society and Technologies
- Human Computer Interaction
- Universal Access
- Digital Libraries
- Science and Engineering Informatics
15. Types of proposals/awards
- IIS Regular Proposal Deadlines: $250-600K, 3 yr,
12/12
- CAREER Program ? ($400-500K, 5 year) late July
- REU & RET supplements? ($10-30K, 1 year) 3/1
- Information Technology Research (ITR): Probably
Feb
16. NSF Merit Review Criteria
- Looking for important, innovative, achievable
projects
- Criterion 1: What is the intellectual merit and
quality of the proposed activity?
- Criterion 2: What are the broader impacts of the
proposed activity?
- NSF will return a proposal without review if the
single-page proposal summary does not address
each criterion in separate statements
- An Evaluation Plan at both the micro & macro levels
is essential, using metrics that you propose (and
your peers believe are appropriate)