Faceted Metadata in Image Search

1
Faceted Metadata in Image Search & Browsing
Using Words to Browse a Thousand Images
  • Ka-Ping Yee, Kirsten Swearingen, Kevin Li, Marti
    Hearst
  • Group for User Interface Research
  • UC Berkeley
  • CHI 2003
  • Research funded by
  • NSF CAREER Grant IIS-9984741
  • IBM Faculty Fellowship

2
Outline
  • How do people search and browse for images?
  • Current approaches
  • Keywords
  • Spatial similarity
  • Our approach
  • Hierarchical Faceted Metadata
  • Very careful UI design and testing
  • Usability Study
  • Conclusions

3
How do people want to search and browse images?
  • Ethnographic studies of people who use images
    intensely
  • Finding specific objects is easy
  • Find images of the Empire State Building
  • Browsing is difficult
  • People want to use rich descriptions.

4
Ethnographic Study
  • Markkula & Sormunen 00
  • Journalists and newspaper editors
  • Choosing photos from a digital archive
  • Searching for specific objects is trivial
  • Stressed a need for browsing
  • Photos need to deal with themes, places, types of
    objects, views
  • Had access to a powerful interface, but it had 40
    entry forms and was generally hard to use; no one
    used it.

5
Markkula & Sormunen 00
6
Query Study
  • Armitage & Enser 97
  • Analyzed 1,749 queries submitted to 7 image and
    film archives
  • Classified queries into a 3x4 facet matrix
  • Rio Carnivals: Geo Location x Kind of Event
  • Concluded that users want to search images
    according to combinations of topical categories.

7
Ethnographic Study
  • Ame Elliot 02
  • Architects
  • Common activities
  • Use images for inspiration
  • Browsing during early stages of design
  • Collage making, sketching, pinning up on walls
  • This is different from illustrating PowerPoint
  • Maintain sketchbooks & shoeboxes of images
  • Young professionals have 500, older 5k
  • No formal organization scheme
  • None of 10 architects interviewed about their
    image collections used indexes
  • Do not like to use computers to find images

8
Current Approaches to Image Search
  • Keyword based
  • WebSeek (Smith and Jain 97)
  • Commercial web image search systems
  • Commercial image vendors (Corbis, Getty)
  • Museum web sites

9
Current Approaches to Image Search
  • Using Visual Content
  • Extract color, texture, shape
  • QBIC (Flickner et al. 95)
  • Blobworld (Carson et al. 99)
  • Piction: images + text (Srihari et al. 91-99)
  • Two uses
  • Show a clustered similarity space
  • Show those images similar to a selected one
  • Usability studies
  • Rodden et al.: a series of studies
  • Clusters don't work; showing textual labels is
    promising.

10
Rodden et al., CHI 2001
11
Rodden et al., CHI 2001
12
Rodden et al., CHI 2001
13
How Best to Support Browsing?
  • To support serendipity, want to view images that
    are related along multiple dimensions.
  • But clusters are not comprehensible.
  • Instead, allow users to steer through the
    multi-dimensional category space in a flexible
    manner.

14
Some Challenges
  • Users don't like new search interfaces.
  • How to show lots more information without
    overwhelming or confusing?

15
Our Approach
  • Integrate the search seamlessly into the
    information architecture.
  • Use proper HCI methodologies.
  • Use faceted metadata
  • More flexible than canned hyperlinks
  • Less complex than full search
  • Help users see where to go next and return to
    what happened previously

16
Metadata: data about data
Facets: orthogonal categories
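
As a minimal sketch of what "orthogonal categories" can mean in practice: each item carries one value per facet, and a query is a conjunction of facet constraints. The records, facet names, and the `select` helper below are hypothetical illustrations, not the system described in this talk.

```python
from typing import Dict, List

# Hypothetical records: each image has one value per facet.
IMAGES: List[Dict[str, str]] = [
    {"id": "img1", "artist": "Artist A", "media": "woodcut", "location": "US"},
    {"id": "img2", "artist": "Artist B", "media": "woodcut", "location": "Japan"},
    {"id": "img3", "artist": "Artist A", "media": "etching", "location": "US"},
]


def select(images: List[Dict[str, str]], **constraints: str) -> List[Dict[str, str]]:
    """Return the images that satisfy every facet constraint (a conjunction)."""
    return [
        img
        for img in images
        if all(img.get(facet) == value for facet, value in constraints.items())
    ]


# Facets are independent, so they can be combined in any order.
woodcuts_us = select(IMAGES, media="woodcut", location="US")
print([img["id"] for img in woodcuts_us])
```

Because the facets do not depend on one another, constraining `media` first and `location` second gives the same result as the reverse order.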
17
Hierarchical Faceted Metadata Example
Biological Subject Headings
  • 1. Anatomy A
  • 2. Organisms B
  • 3. Diseases C
  • 4. Chemicals and Drugs D
  • 5. Analytical, Diagnostic and Therapeutic
    Techniques and Equipment E
  • 6. Psychiatry and Psychology F
  • 7. Biological Sciences G
  • 8. Physical Sciences H
  • 9. Anthropology, Education, Sociology and
    Social Phenomena I
  • 10. Technology and Food and Beverages J
  • 11. Humanities K
  • 12. Information Science L
  • 13. Persons M
  • 14. Health Care N
  • 15. Geographic Locations Z

18
Hierarchical Faceted Metadata
  • 1. Anatomy A
  • Subcategories of Anatomy A: Body Regions A01,
    Musculoskeletal System A02, Digestive System A03,
    Respiratory System A04, Urogenital System A05
  • 2. B, 3. C, 4. D, 5. E, 6. F, 7. G
  • 8. Physical Sciences H
  • 9. I, 10. J, 11. K, 12. L, 13. M

19
Hierarchical Faceted Metadata
  • 1. Anatomy A
  • Subcategories of Anatomy A: Body Regions A01,
    Musculoskeletal System A02, Digestive System A03,
    Respiratory System A04, Urogenital System A05
  • Subcategories of Body Regions A01: Abdomen A01.047,
    Back A01.176, Breast A01.236, Extremities A01.378,
    Head A01.456, Neck A01.598, ...
  • 2. B, 3. C, 4. D, 5. E, 6. F, 7. G
  • 8. Physical Sciences H
  • 9. I, 10. J, 11. K, 12. L, 13. M

20
Hierarchical Faceted Metadata
  • 1. Anatomy A
  • Subcategories of Anatomy A: Body Regions A01,
    Musculoskeletal System A02, Digestive System A03,
    Respiratory System A04, Urogenital System A05
  • Subcategories of Body Regions A01: Abdomen A01.047,
    Back A01.176, Breast A01.236, Extremities A01.378,
    Head A01.456, Neck A01.598, ...
  • 2. B, 3. C, 4. D, 5. E, 6. F, 7. G
  • 8. Physical Sciences H
  • Subcategories of Physical Sciences H: Electronics,
    Astronomy, Nature, Time, Weights and Measures, ...
  • 9. I, 10. J, 11. K, 12. L, 13. M

21
Hierarchical Faceted Metadata
  • 1. Anatomy A
  • Subcategories of Anatomy A: Body Regions A01,
    Musculoskeletal System A02, Digestive System A03,
    Respiratory System A04, Urogenital System A05
  • Subcategories of Body Regions A01: Abdomen A01.047,
    Back A01.176, Breast A01.236, Extremities A01.378,
    Head A01.456, Neck A01.598, ...
  • 2. B, 3. C, 4. D, 5. E, 6. F, 7. G
  • 8. Physical Sciences H
  • Subcategories of Physical Sciences H: Electronics,
    Astronomy, Nature, Time, Weights and Measures, ...
  • Subcategories of Electronics: Amplifiers;
    Electronics, Medical; Transducers
  • 9. I, 10. J, 11. K, 12. L, 13. M

22
Hierarchical Faceted Metadata
  • 1. Anatomy A
  • Subcategories of Anatomy A: Body Regions A01,
    Musculoskeletal System A02, Digestive System A03,
    Respiratory System A04, Urogenital System A05
  • Subcategories of Body Regions A01: Abdomen A01.047,
    Back A01.176, Breast A01.236, Extremities A01.378,
    Head A01.456, Neck A01.598, ...
  • 2. B, 3. C, 4. D, 5. E, 6. F, 7. G
  • 8. Physical Sciences H
  • Subcategories of Physical Sciences H: Electronics,
    Astronomy, Nature, Time, Weights and Measures, ...
  • Subcategories of Electronics: Amplifiers;
    Electronics, Medical; Transducers
  • Subcategories of Weights and Measures: Calibration,
    Metric System, Reference Standard
  • 9. I, 10. J, 11. K, 12. L, 13. M
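
The category codes on these slides (A, A01, A01.047) suggest that a category's position in the hierarchy can be read off its code. The sketch below assumes exactly that: a dotted code's parent is the code with its last segment dropped, and a code such as A01 sits under its one-letter facet. The `TREE` table holds only labels shown on the slides; the `parent`, `path`, and `is_under` helpers are hypothetical.

```python
from typing import List, Optional

# Labels taken from the slides; the code-to-parent rule below is an assumption.
TREE = {
    "A": "Anatomy",
    "A01": "Body Regions",
    "A01.047": "Abdomen",
    "A01.176": "Back",
    "A01.456": "Head",
    "A02": "Musculoskeletal System",
}


def parent(code: str) -> Optional[str]:
    """Return the parent code, or None for a top-level facet."""
    if "." in code:
        return code.rsplit(".", 1)[0]  # "A01.047" -> "A01"
    if len(code) > 1:
        return code[0]  # "A01" -> "A"
    return None


def path(code: str) -> List[str]:
    """Return the labels from the top-level facet down to the given code."""
    labels = []
    current: Optional[str] = code
    while current is not None:
        labels.append(TREE[current])
        current = parent(current)
    return list(reversed(labels))


def is_under(code: str, ancestor: str) -> bool:
    """True if code equals ancestor or lies below it in the hierarchy."""
    current: Optional[str] = code
    while current is not None:
        if current == ancestor:
            return True
        current = parent(current)
    return False


print(" > ".join(path("A01.047")))
```

A lookup like `is_under` is what lets a selection of Body Regions A01 also match items labeled with Abdomen A01.047.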

23
Questions we are trying to answer
  • How many facets are allowable?
  • Should facets be mixed and matched?
  • How much is too much?
  • Should hierarchies be progressively revealed,
    tabbed, some combination?
  • How should free-text search be integrated?

24
An Important Trend in Information Architecture
Design
  • Generating web pages from databases
  • Implications
  • Web sites can adapt to user actions
  • Web sites can be instrumented

25
A Taxonomy of Web Sites
[Figure: web sites placed along two axes, Complexity
of Data (low to high) and Complexity of Applications
(low to high)]
From The (Short) Araneus Guide to Website
development, by Mecca et al., Proceedings of
WebDB99,
http://www-rocq.inria.fr/cluet/WEBDB/procwebdb99.html
26
The Interface Design
  • Chess metaphor
  • Opening
  • Middle game
  • End game

36
The Interface Design
  • Tightly Integrated Search
  • Supports Expand as well as Refine
  • Dynamically Generated Pages
  • Paths can be taken in any order
  • Consistent Color Coding
  • Consistent Backup and Bookmarking
  • Standard HTML
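
One way to picture "Expand as well as Refine" together with the query previews mentioned later in the talk: refining adds a facet constraint, expanding drops one, and a preview counts how many results each remaining facet value would leave. This is a minimal sketch over hypothetical data, not the interface's actual code; `ITEMS`, `previews`, `refine`, and `expand` are illustrative names.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical collection; every item has a value for every facet.
ITEMS: List[Dict[str, str]] = [
    {"artist": "A", "media": "woodcut", "date": "1930s"},
    {"artist": "A", "media": "etching", "date": "1930s"},
    {"artist": "B", "media": "woodcut", "date": "1940s"},
    {"artist": "C", "media": "woodcut", "date": "1930s"},
]


def results(items: List[Dict[str, str]], query: Dict[str, str]) -> List[Dict[str, str]]:
    """Items matching every constraint in the query."""
    return [it for it in items if all(it[f] == v for f, v in query.items())]


def previews(items: List[Dict[str, str]], query: Dict[str, str]) -> Dict[str, Counter]:
    """For each facet not yet constrained, count current results per value."""
    current = results(items, query)
    facets = sorted({f for it in items for f in it} - set(query))
    return {facet: Counter(it[facet] for it in current) for facet in facets}


def refine(query: Dict[str, str], facet: str, value: str) -> Dict[str, str]:
    """Narrow the result set by adding one facet constraint."""
    return {**query, facet: value}


def expand(query: Dict[str, str], facet: str) -> Dict[str, str]:
    """Widen the result set by dropping one facet constraint."""
    return {f: v for f, v in query.items() if f != facet}


q = refine({}, "media", "woodcut")
print(previews(ITEMS, q))            # counts shown next to each link
q = refine(q, "date", "1930s")
print(len(results(ITEMS, q)))        # narrowed
q = expand(q, "media")
print(len(results(ITEMS, q)))        # widened again
```

Since a preview only lists values that occur in the current results, following any previewed link cannot lead to an empty result set.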

37
What is Tricky About This?
  • It is easy to do it poorly
  • Yahoo directory structure
  • It is hard to be not overwhelming
  • Most users prefer simplicity unless complexity
    really makes a difference
  • It is hard to make it flow
  • Can it feel like browsing the shelves?

38
Project History
  • Identify Target Population
  • Architects, city planners
  • Needs assessment.
  • Interviewed architects and conducted contextual
    inquiries.
  • Lo-fi prototyping.
  • Showed paper prototype to 3 professional
    architects.
  • Design / Study Round 1.
  • Simple interactive version. Users liked metadata
    idea.
  • Design / Study Round 2
  • Developed 4 different detailed versions
    evaluated with 11 architects; results somewhat
    positive, but many problems identified. Matrix
    emerged as a good idea.
  • Metadata revision.
  • Compressed and simplified the metadata
    hierarchies

39
Project History
  • Design / Study Round 3.
  • New version based on results of Round 2
  • Highly positive user response
  • Identified new user population/collection
  • Students and scholars of art history
  • Fine arts images
  • Study Round 4
  • Compare the metadata system to a strong,
    representative baseline

40
New Usability Study
  • Participants & Collection
  • 32 Art History Students
  • 35,000 images from SF Fine Arts Museum
  • Study Design
  • Within-subjects
  • Each participant sees both interfaces
  • Balanced in terms of order and tasks
  • Participants assess each interface after use
  • Afterwards they compare them directly
  • Data recorded in behavior logs, server logs,
    paper surveys; one or two experienced testers at
    each trial.
  • Used 9 point Likert scales.
  • Session took about 1.5 hours; pay was $15/hour
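
A within-subjects design that is "balanced in terms of order and tasks" can be sketched as a rotation through every combination of interface order and task-set order. The two task sets named below are hypothetical placeholders, and the rotation scheme is an assumption; the slides do not say how the assignment was actually made.

```python
from collections import Counter
from itertools import cycle, permutations, product
from typing import Dict, List

INTERFACES = ("Baseline", "FC")
TASK_SETS = ("task set 1", "task set 2")  # hypothetical placeholders


def assign(n_participants: int) -> List[Dict]:
    """Rotate participants through all interface-order x task-order conditions."""
    conditions = list(product(permutations(INTERFACES), permutations(TASK_SETS)))
    plan = []
    for pid, (ifaces, tasks) in zip(range(1, n_participants + 1), cycle(conditions)):
        # Each participant sees both interfaces, each paired with one task set.
        plan.append({"participant": pid, "sessions": list(zip(ifaces, tasks))})
    return plan


plan = assign(32)
first_interface = Counter(p["sessions"][0][0] for p in plan)
print(first_interface)
```

With 32 participants and 4 conditions, each condition is used 8 times, so half of the participants start with each interface.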

41
The Baseline System
  • Floogle
  • Take the best of the existing keyword-based image
    search systems

42
Comparison of Common Image Search Systems
43
sword
47
Evaluation Quandary
  • How to assess the success of browsing?
  • Timing is usually not a good indicator
  • People often spend longer when browsing is going
    well.
  • Not the case for directed search
  • Can look for comprehensiveness and correctness
    (precision and recall)
  • But subjective measures seem to be most
    important here.
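
For reference, precision and recall for a single task can be computed from the set of images a participant gathered and the set judged relevant. This is a minimal sketch; the image identifiers are made up for illustration.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical task: 4 images gathered, 3 images actually relevant, 2 in common.
p, r = precision_recall(
    retrieved=["img1", "img2", "img3", "img4"],
    relevant=["img2", "img3", "img5"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```

A participant who "found all relevant results" has a recall of 1.0 on that task, whatever the precision.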

48
Hypotheses
  • We attempted to design tasks to test the
    following hypotheses
  • Participants will experience greater search
    satisfaction, feel greater confidence in the
    results, produce higher recall, and encounter
    fewer dead ends using FC over Baseline
  • FC will be perceived to be more useful and flexible
    than Baseline
  • Participants will feel more familiar with the
    contents of the collection after using FC
  • Participants will use FC to create multi-faceted
    queries

49
Four Types of Tasks
  • Unstructured (3) Search for images of interest
  • Structured Task (11-14) Gather materials for an
    art history essay on a given topic, e.g.
  • Find all woodcuts created in the US
  • Choose the decade with the most
  • Select one of the artists in this period and
    show all of their woodcuts
  • Choose a subject depicted in these works and find
    another artist who treated the same subject in a
    different way.
  • Structured Task (10) compare related images
  • Find images by artists from 2 different countries
    that depict conflict between groups.
  • Unstructured (5) search for images of interest

50
Other Points
  • Participants were NOT walked through the
    interfaces.
  • The wording of Task 2 reflected the metadata; not
    the case for Task 3
  • Within tasks, queries were not different in
    difficulty (t-tests not significant at the 0.05
    level, according to post-task questions)
  • Flamenco is an order of magnitude slower than
    Floogle on average.
  • In task 2 users were allowed 3 more minutes in FC
    than in Baseline.
  • Time spent in tasks 2 and 3 was significantly
    longer in FC (about 2 min more).

51
Results
  • Participants felt significantly more confident
    they had found all relevant images using FC (Task
    2: t(62) = 2.18, p ...)
  • Participants felt significantly more satisfied
    with the results (Task 2: t(62) = 3.78, p ...)
  • Recall scores
  • Task 2a: In Baseline 57% of participants found all
    relevant results, in FC 81% found all.
  • Task 2b: In Baseline 21% found all relevant, in
    FC 77% found all.

52
Post-Interface Assessments
All significant at p ... overwhelming
53
Perceived Uses of Interfaces
[Chart comparing Baseline and FC]
54
Post-Test Comparison
[Chart comparing Baseline and FC]
  • Which Interface Preferable For
  • Find images of roses
  • Find all works from a given period
  • Find pictures by 2 artists in same media
55
Post-Test Comparison
[Chart comparing Baseline and FC]
  • Which Interface Preferable For
  • Find images of roses
  • Find all works from a given period
  • Find pictures by 2 artists in same media
  • Overall Assessment
  • More useful for your tasks
  • Easiest to use
  • Most flexible
  • More likely to result in dead ends
  • Helped you learn more
  • Overall preference
56
Facet Usage
  • Facets driven largely by task content
  • Multiple facets 45% of time in structured tasks
  • For unstructured tasks,
  • Artists (17%)
  • Date (15%)
  • Location (15%)
  • Others ranged from 5-12%
  • Multiple facets 19% of time
  • From end game, expansion from
  • Artists (39%)
  • Media (29%)
  • Shapes (19%)

57
Qualitative Observations
  • Baseline
  • Simplicity, similarity to Google a plus
  • Also noted the usefulness of the category links
  • FC
  • Starting page well-organized, gave ideas for
    what to search for
  • Query previews were commented on explicitly by 9
    participants
  • Commented on matrix prompting where to go next
  • 3 were confused about what the matrix shows
  • Generally liked the grouping and organizing
  • End game links seemed useful; 9 explicitly
    remarked positively on the guidance provided
    there.
  • Often get requests to use the system in future

58
Study Results Summary
  • Strongly positive results for the faceted
    metadata interface.
  • Moderate use of multiple facets.
  • Strong preference over the current state of the
    art.
  • Chair of Architecture Dept: "It felt like I was
    browsing the shelves!"
  • This kind of enthusiasm is not seen in
    similarity-based image search interfaces.
  • Hypotheses are supported.

59
Implementation
  • All open source code
  • MySQL database
  • Python web server (WebKit)
  • Python code
  • Lucene search engine (Java)

60
Metadata Availability
  • Many collections already have rich metadata
    associated with them.
  • Automated methods are improving.
  • This tool may be helpful for resolving metadata
    creation wars.

61
Summary
  • Usability studies done on 3 collections
  • Recipes: 13,000 items
  • Architecture Images: 40,000 items
  • Fine Arts Images: 35,000 items
  • Conclusions
  • Users like and are successful with the dynamic
    faceted hierarchical metadata, especially for
    browsing tasks
  • Very positive results, in contrast with studies
    on earlier iterations
  • Note: it seems you have to care about the
    contents of the collection to like the interface

62
Advantages of the Approach
  • Supports different search types
  • Highly constrained known-item searches
  • Open-ended, browsing tasks
  • Can easily switch from one mode to the other
    midstream
  • Can both expand and refine
  • Allows different people to add content without
    breaking things
  • Can make use of standard technology

63
Other Domains
  • Applying this to
  • Text
  • Tobacco Documents Archives
  • Medline biomedical texts
  • Products/Catalogs
  • Don't have a collection; would like one

64
Future Work
  • What about information visualization?
  • How to integrate with relevance feedback (more
    like this)?
  • How to incorporate user preferences and past
    behavior?
  • How to combine facets to reflect tasks?

65
Thanks to
  • Andrea Sahli
  • Rashmi Sinha
  • NSF CAREER Grant IIS-9984741
  • IBM Faculty Fellowship
Try the Demo: flamenco.berkeley.edu