Web Search Studies: Approaches and Methods - PowerPoint PPT Presentation


1
Web Search Studies: Approaches and Methods
  • Amanda Spink
  • Queensland University of Technology

2
Overview
  • Introduction
  • Key Issues in Search Studies
  • Transaction log analysis
  • Search Evaluation
  • Conclusions

3
Search Studies
  • 1960s onwards: online search, library focus
  • 1990s onwards: Web search focus
    - Commercial search engines
    - Enterprise search
  • User search modeling studies important for
    academia, industry and organizations offering
    information via search engines

4
Book
  • Amanda Spink & Jim Jansen (2004). Web Search:
    Public Searching of the Web. Springer.

5
Research Background
  • Since 1990: user/IR system interaction
  • Since 1997: user/Web search engine interaction
  • Relevance measurement: relevance regions
  • IR interaction measures

6
Research Background
  • Examine patterns, trends, user modeling,
    systems/interface design ideas and significant insights
  • Human information behavior (HIB) focus: system
    interaction embedded in HIB

7
Web Search Studies
  • Web search engines
    - Alta Vista
    - Ask Jeeves
    - Excite
    - AlltheWeb
    - Vivisimo
    - Dogpile
  • Transaction log analysis studies
  • Focus on user search analysis for competitive
    advantage

8
Data Collection Methods
  • Various combinations of methods and approaches
  • Transaction log analysis
  • Videotaping and audiotaping
  • Think-aloud protocols
  • Usability and HCI techniques
  • Focus groups
  • Interviews
  • Surveys
  • Experiments
  • Diaries

9
Data Analysis Methods
  • Quantitative and statistical analysis
  • Qualitative analysis: grounded theory
  • Combination of both methods

10
Key Issues in Search Studies
  • What is the goal of the project?
    - Insights, understanding, theory development
    - User modeling
    - Trends analysis
    - Interface/systems design
    - User training
  • What resources are available: sample size,
    expertise and funds?
  • Academic or industry research?
  • Time pressures?

11
Key Issues in Search Studies
  • What variables to measure?
  • How much data is enough?
  • Methods used: single or multiple?
  • HCI approach: test interface/system features

12
Transaction Log Analysis (TLA)
  • File or log of communications between user and
    system
  • File recorded on a server (server-side
    recordings)
  • Log or file formats vary

13
Why Collect and Analyze Log Data?
  • Gain understanding of user interaction with
    system and interface
  • Goal: improve system and interface design, and
    improve user training
  • Transaction log analysis is extensively used in
    academia and industry

14
TLA Process
  • Goals and objectives
  • Data collection
  • Log preparation
  • Data analysis
  • Making sense

15
Goals and Objectives
  • Gain understanding of user interaction with
    system and interface
  • Theoretical modeling and user modeling
  • Improve system and interface design, and improve
    user training
  • Examine trends and patterns

16
Data Collection
  • Process of collecting the interaction data for a
    given period in a transaction log
  • Collect data on the search episode
  • User identification
  • Date
  • Time
  • Search session content
  • Resources accessed, e.g., URLs
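
The fields listed above can be captured as one record per logged interaction. A minimal sketch, assuming a hypothetical tab-separated layout (user_id, date, time, query, url); since log formats vary, the column order, the date format and the names LogRecord and parse_line are illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogRecord:
    """One logged interaction (hypothetical layout, not a standard format)."""
    user_id: str            # user identification
    timestamp: datetime     # date + time
    query: str              # search session content
    url: Optional[str]      # resource accessed, if any


def parse_line(line: str) -> Optional[LogRecord]:
    """Parse one tab-separated line: user_id, date, time, query, url.

    Returns None for malformed lines so they can be counted and
    discarded during data preparation.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 5:
        return None
    user_id, date, time, query, url = parts
    try:
        timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return None
    return LogRecord(user_id, timestamp, query, url or None)


record = parse_line("u42\t2004-05-04\t10:15:03\tcheap flights\thttp://example.com/")
print(record)
```

Returning None rather than raising keeps corrupted lines countable, which matters later when deciding how much data was lost.
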

17
Logging Software
  • Custom and commercial applications
  • WinWhatWhere spy software
  • Morae 1.1 software
  • Camtasia Studio

18
Data Preparation
  • Process of cleaning and preparing the log data
    for analysis
  • Loading the log data into a relational database
  • Cleaning the log: removing corrupted data
  • Parsing the log, e.g., removing Web sessions with
    over 100 queries
  • Normalizing the log
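
The preparation steps above can be sketched with Python's built-in sqlite3 module as the relational database. This assumes a hypothetical two-column table of (session_id, query), treats a missing or blank query as "corrupted", and normalizes by lowercasing and trimming; the 100-query cutoff is the slide's example, while the other rules and the name prepare are illustrative assumptions.

```python
import sqlite3

# Hypothetical raw rows: (session_id, query); None marks a corrupted entry.
raw_rows = [("s1", "Web Search "), ("s1", None), ("s2", "JAGUAR cars")]
raw_rows += [("s3", f"q{i}") for i in range(120)]  # an unusually long session


def prepare(rows, max_queries=100):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (session_id TEXT NOT NULL, query TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?)", rows)

    # Cleaning: drop corrupted rows (here: missing or blank query).
    conn.execute("DELETE FROM log WHERE query IS NULL OR TRIM(query) = ''")

    # Parsing: drop whole sessions with more than max_queries queries.
    conn.execute(
        "DELETE FROM log WHERE session_id IN ("
        " SELECT session_id FROM log GROUP BY session_id HAVING COUNT(*) > ?)",
        (max_queries,),
    )

    # Normalizing: lowercase and trim surrounding whitespace.
    conn.execute("UPDATE log SET query = LOWER(TRIM(query))")
    conn.commit()
    return conn


conn = prepare(raw_rows)
rows = conn.execute("SELECT session_id, query FROM log ORDER BY session_id").fetchall()
print(rows)  # [('s1', 'web search'), ('s2', 'jaguar cars')]
```

Doing the steps as SQL statements keeps the raw import and each cleaning rule separate, so the effect of each rule can be counted before moving on.
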

19
Log Analysis: Three Levels
  • Term
  • Query
  • Session

20
Term Level Analysis
  • Term occurrence
  • Total terms
  • High- and low-usage terms
  • Term distribution
  • Co-occurring terms
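
The term-level measures above can be computed from a list of cleaned queries. A minimal sketch, assuming terms are whitespace-separated tokens and that co-occurrence means two distinct terms appearing in the same query; the function name term_stats and the sample queries are hypothetical.

```python
from collections import Counter
from itertools import combinations


def term_stats(queries):
    """Return term occurrences, co-occurring pairs and the total term count."""
    term_counts = Counter()
    pair_counts = Counter()
    for q in queries:
        terms = q.split()
        term_counts.update(terms)
        # Count each unordered pair of distinct terms once per query.
        unique = sorted(set(terms))
        pair_counts.update(combinations(unique, 2))
    total_terms = sum(term_counts.values())
    return term_counts, pair_counts, total_terms


queries = ["web search engines", "search engines", "cheap flights", "web design"]
term_counts, pair_counts, total_terms = term_stats(queries)

high_usage = term_counts.most_common(3)                       # most frequent terms
low_usage = [t for t, c in term_counts.items() if c == 1]     # terms used once
distribution = Counter(term_counts.values())                  # frequency -> number of terms
print(total_terms, high_usage, low_usage, distribution)
```

The last Counter is one simple way to view term distribution: how many distinct terms occur once, twice, and so on.
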

21
Term Distribution
22
Terms Per Query 1997-2001
23
Queries Per User 1997-2001

24
Pages Viewed Per User 1997-2001
25
Top 10 Query Terms 1997-2001
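
The per-query and per-user statistics charted on these slides are simple averages over the prepared log. A minimal sketch, assuming a hypothetical list of (user_id, query, pages_viewed) entries where pages_viewed is the number of results pages viewed for that query; the name summary and the sample data are illustrative.

```python
from statistics import mean

# Hypothetical prepared log: (user_id, query, results pages viewed for that query)
entries = [
    ("u1", "web search", 2),
    ("u1", "web search engines", 1),
    ("u2", "jaguar", 0),
    ("u3", "cheap flights paris", 3),
]


def summary(entries):
    terms_per_query = mean(len(q.split()) for _, q, _ in entries)
    queries_by_user, pages_by_user = {}, {}
    for user, _, pages in entries:
        queries_by_user[user] = queries_by_user.get(user, 0) + 1
        pages_by_user[user] = pages_by_user.get(user, 0) + pages
    return {
        "terms_per_query": terms_per_query,
        "queries_per_user": mean(queries_by_user.values()),
        "pages_per_user": mean(pages_by_user.values()),
    }


stats = summary(entries)
print(stats)
```

Running the same function over each year's log gives the kind of 1997-2001 trend lines the slides refer to.
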
26
Query Level Analysis
  • Initial query
  • Subsequent queries
  • Modified queries (query reformulation)
  • Identical queries
  • Query complexity
  • Boolean use
  • Spelling
  • Types of queries
  • Query topics
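
Initial, identical and modified queries can be labeled by comparing each query with the previous one in the same session. A minimal sketch under an assumed rule, not necessarily the one used in the studies cited here: "identical" means the same normalized text, "modified" means at least one shared term, and "new" means no shared terms. Boolean use is detected by uppercase AND/OR/NOT or a leading +/- operator; both rules and the function names are hypothetical.

```python
import re

# Uppercase operators only (lowercase "and" is treated as an ordinary term),
# or a +/- prefix at the start of a term.
BOOLEAN = re.compile(r"\b(AND|OR|NOT)\b|(?:^|\s)[+-]\w")


def classify_session(queries):
    """Label each query in one session as initial, identical, modified, or new."""
    labels = []
    prev_text, prev_terms = None, set()
    for q in queries:
        text = " ".join(q.lower().split())
        terms = set(text.split())
        if prev_text is None:
            labels.append("initial")
        elif text == prev_text:
            labels.append("identical")
        elif terms & prev_terms:
            labels.append("modified")
        else:
            labels.append("new")
        prev_text, prev_terms = text, terms
    return labels


def uses_boolean(query):
    return bool(BOOLEAN.search(query))


session = ["Jaguar", "jaguar", "jaguar cars", "cheap flights"]
print(classify_session(session))  # ['initial', 'identical', 'modified', 'new']
print(uses_boolean("jaguar AND cars"), uses_boolean("+jaguar -cat"), uses_boolean("android"))
```

The overlap rule is deliberately crude; a real study would document its own definition of a modified query before coding the log.
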

27
Query Subjects: Alta Vista 2002 vs. Vivisimo 2004 (%)
  Alta Vista 2002
  • 1. People/Places 49.2
  • 2. Commerce, etc. 12.5
  • 3. Computers, etc. 12.4
  • 4. Health/Sciences 7.4
  • 5. Education/Humanities 5
  • 6. Entertainment, etc. 4.5
  • 7. Sex/Pornography 3.2
  • 8. Society/Culture, etc. 3.1
  • 9. Government 1.5
  • 10. Performing/Fine Arts 0.6
  Vivisimo 2004
  • 1. Commerce, etc. 21
  • 2. Indiscernible 19
  • 3. People/Places, etc. 15
  • 4. Computers/Internet 13
  • 5. Social/Culture 9
  • 6. Health/Sciences 6
  • 7. Education/Humanities 5
  • 8. Sex/Pornography 4
  • 9. Performing/Fine Arts 3
  • 10. Government 3
  • 11. Entertainment, etc. 2
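
Ranked percentages like those above follow directly from counts once each query in a sample has been assigned a topic. A minimal sketch, assuming the topic labels already exist (for example, assigned by human coders); the function name topic_distribution and the eight-query sample are hypothetical.

```python
from collections import Counter


def topic_distribution(coded_topics):
    """Return (topic, percent) pairs ranked by share, rounded to one decimal."""
    counts = Counter(coded_topics)
    total = sum(counts.values())
    return [(topic, round(100 * n / total, 1)) for topic, n in counts.most_common()]


# Hypothetical hand-coded sample of 8 queries.
sample = ["Commerce", "People/Places", "Commerce", "Computers/Internet",
          "Commerce", "Indiscernible", "People/Places", "Health/Sciences"]

for rank, (topic, pct) in enumerate(topic_distribution(sample), start=1):
    print(f"{rank}. {topic} {pct}")
```

Rounding each share separately is why published columns can sum to slightly more or less than 100.
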

28
Session Level Analysis
  • Duration
  • Patterns
  • Successive and multitasking sessions
  • Page or resource viewing
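
Session-level measures first require grouping logged events into sessions. A minimal sketch, assuming events are (user_id, timestamp) pairs and that a new session starts when the user changes or after more than 30 minutes of inactivity; the 30-minute timeout and the names sessionize and duration are assumptions, not values from the slides.

```python
from datetime import datetime, timedelta


def sessionize(events, timeout=timedelta(minutes=30)):
    """Group (user_id, timestamp) events into sessions (assumed inactivity rule)."""
    sessions = []
    for user, ts in sorted(events):  # sorts by user, then by time
        last_user, last_ts = sessions[-1][-1] if sessions else (None, None)
        if last_user == user and ts - last_ts <= timeout:
            sessions[-1].append((user, ts))
        else:
            sessions.append([(user, ts)])
    return sessions


def duration(session):
    """Time between the first and last event in a session."""
    return session[-1][1] - session[0][1]


t0 = datetime(2004, 5, 4, 10, 0)
events = [
    ("u1", t0),
    ("u1", t0 + timedelta(minutes=4)),
    ("u1", t0 + timedelta(hours=2)),   # long gap: starts a second u1 session
    ("u2", t0 + timedelta(minutes=1)),
]
sessions = sessionize(events)
print([duration(s) for s in sessions])
```

Any reported session duration depends on this boundary rule, so it is worth stating explicitly alongside the results.
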

29
Web Session Duration (Minutes)
  • 56% of sessions less than 1 minute
  • 72% of sessions less than 5 minutes
  • 81% of sessions less than 15 minutes
  • Mean approx. 58 minutes and 2 seconds
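
Cumulative shares like those above, and the mean, can be computed from a list of session durations. A minimal sketch with hypothetical durations in minutes; the name duration_profile and the thresholds argument are illustrative.

```python
def duration_profile(durations_min, thresholds=(1, 5, 15)):
    """Percent of sessions shorter than each threshold, plus the mean (minutes)."""
    n = len(durations_min)
    shares = {t: 100 * sum(d < t for d in durations_min) / n for t in thresholds}
    return shares, sum(durations_min) / n


# Hypothetical durations in minutes; one very long session pulls the mean up.
durations = [0.5, 0.2, 3, 4, 10, 12, 30, 240]
shares, mean_min = duration_profile(durations)
print(shares, round(mean_min, 2))
```

In this toy sample three quarters of sessions last under 15 minutes, yet the mean is above 37 minutes because of a single long session, which is why a median or the cumulative shares are useful alongside the mean.
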

30
Pages Viewed Per User
  • 2004: Most users view VERY FEW pages beyond the
    first one or two pages
  • 14% of users view Web pages for less than 30
    seconds

31
Log Analysis Methods
  • Quantitative and statistical analysis requires
    software and expertise
  • Qualitative analysis requires training
  • Creativity factor
  • Combination of quantitative and qualitative
    methods

32
TLA Strengths
  • Data from a large user base
  • Reasonable and non-intrusive
  • Less time than other methods
  • Can be relatively inexpensive

33
TLA Limitations
  • TLA does not include user demographic and other
    data
  • Lacks data on search reasons and motivations
  • Incomplete data due to corrupted logging

34
Relevance Judgments
  • User-centered approaches to relevance have led to
    a better understanding of the user/IR system
    interaction process
  • Studies have addressed the limitations of
    precision and recall as effective measures of IR
    performance
  • Few studies have offered any new IR evaluation
    measures

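
For reference, precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. A minimal sketch of these standard definitions, assuming binary relevance judgments and document IDs as strings; the function name precision_recall and the sample IDs are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Binary precision and recall over sets of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(p, r)
```

Both measures treat relevance as a yes/no judgment and say nothing about partially relevant items, which is the gap the following slides address.
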
35
Relevance Judgment Assumptions
  • Relevance research based on assumptions about
    user behavior
  • Only highly relevant items are important to the user
  • Partially relevant items are not important to the user

36
Relevance Judgments Distribution
37
Relevance Judgment Levels

  • Relevant: A judgment that confirms that some
    relationship by inference exists between the
    retrieved item and the information problem at
    hand
  • Partially Relevant: A judgment that confirms that
    some relation by inference exists, but the
    relationship is weaker than a relevant judgment
  • Partially Not Relevant: A judgment that confirms
    that some non-relation by inference exists, but
    the relationship is not strong enough to totally
    reject the relationship as not relevant
  • Not Relevant: A judgment that confirms that a
    relationship by inference does not exist between
    the retrieved item and the information problem
    at hand

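
A four-level scale lets an analysis report how judgments are distributed instead of collapsing them to yes/no. A minimal sketch, assuming the levels are coded as an ordinal enum; the numeric codes, the names Judgment and judgment_profile, and the "strict" (relevant only) versus "lenient" (relevant plus partially relevant) shares are illustrative assumptions, not measures defined on these slides.

```python
from collections import Counter
from enum import IntEnum


class Judgment(IntEnum):
    """Ordinal coding of the four judgment levels (codes are assumptions)."""
    NOT_RELEVANT = 0
    PARTIALLY_NOT_RELEVANT = 1
    PARTIALLY_RELEVANT = 2
    RELEVANT = 3


def judgment_profile(judgments):
    counts = Counter(judgments)
    n = len(judgments)
    distribution = {level.name: counts[level] / n for level in Judgment}
    strict = counts[Judgment.RELEVANT] / n
    lenient = (counts[Judgment.RELEVANT] + counts[Judgment.PARTIALLY_RELEVANT]) / n
    return distribution, strict, lenient


judgments = [Judgment.RELEVANT, Judgment.PARTIALLY_RELEVANT,
             Judgment.PARTIALLY_RELEVANT, Judgment.NOT_RELEVANT]
dist, strict, lenient = judgment_profile(judgments)
print(dist, strict, lenient)
```

The gap between the strict and lenient shares shows how much a binary measure would discard under the assumption that partially relevant items do not matter to the user.
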
38
Measuring Search Impact
  • IR interaction measures: impact of system
    interaction on users' information-seeking progress

39
One Search Assumption
  • One-search assumption: users conduct a single
    search on an information problem
  • Search is more complex and holistic
  • Search is embedded in human information behaviors

40
Search Levels
41
Conclusions
  • Search analysis is a complex process with many
    choices
  • TLA is a powerful tool
  • Requires planning, training and expertise
  • Can be combined with other data collection and
    analysis techniques

42
Conclusions
  • Search is more complex than the Web "single search,
    single query" paradigm
  • Search context is important
  • Search technology is changing; however, many user
    search characteristics are relatively stable
  • Does new search technology (e.g., history,
    visualization) impact user search behavior?

43
Conclusions
  • Need for more comparison of Web search engine
    performance
  • Comparison of single versus meta-search engines
  • Need for better user-based evaluation measures
  • Better usability testing of Web search engine
    interfaces and techniques

44
Conclusions
  • How do users coordinate their information
    behaviors?
  • Relation to information seeking stage, domain
    knowledge, gender or other cognitive variables?
  • Model multitasking and dual-tasking behaviors

45
Conclusions
  • Need for better search technology
  • Need for improved user effort
  • Technology is not the complete answer: users need
    more awareness of the search process and of their
    own human information behavior
  • Significant improvement in search will only come
    from improved user effort

46
Further Reading
  • Jansen, B. J. (forthcoming). Search log analysis:
    What is it, what's been done, how to do it.
    Library and Information Science Research.
  • Jansen, B. J., & Spink, A. (2005). How are we
    searching the Web? A comparison of nine search
    engine transaction logs. Information Processing
    and Management, 42(1), 248-263.
  • Peters, T. (1993). The history and development of
    transaction log analysis. Library Hi Tech,
    11(2), 41-66.
  • Spink, A., & Jansen, B. J. (2004). Web Search:
    Public Searching of the Web. Springer.
  • Spink, A., Jansen, B. J., Wolfram, D., &
    Saracevic, T. (2002). From e-sex to e-commerce:
    Web search changes. IEEE Computer, 35(3),
    133-135.
  • Spink, A., Park, M., & Jansen, B. J. (2006).
    Multitasking during Web search sessions.
    Information Processing and Management, 42(1),
    264-275.

47
Questions?
  • Thank You