Some Thoughts on Tagging

1
Some Thoughts on Tagging
Marti Hearst, UC Berkeley
2
Outline
  • What are Tags?
  • Organizing Tags for Navigation
  • Facets and faceted navigation
  • How to (semi)automatically create facet
    hierarchies
  • What's up with Tag Clouds?

3
Social Tagging
  • Metadata assignment without all the bother
  • Spontaneous, easy, and tends towards single terms
  • Usually used in the context of social media

4
Example from del.icio.us
5
The Tagging Opportunity
  • At last! Content-oriented metadata in the large!
  • Attempts at metadata standardization always end
    up with something like the Dublin Core
  • author, date, publisher, yaaawwwwnnn.
  • I've always thought the action was in the subject
    metadata, and have focused on how to navigate
    collections given such data.

6
The Tagging Opportunity
  • Tags are inherently faceted!
  • It is assumed that multiple labels will be
    assigned to each item
  • Rather than placing them into a folder
  • Rather than placing them into a hierarchy
  • Concepts are assigned from many different content
    categories
  • Helps alleviate the metadata wars
  • Allows for both splitters and lumpers
  • Is this a bird or a robin?
  • Doesn't matter, you can do both!
  • Allows for differing organizational views
  • Does NASCAR go under sports or entertainment?
  • Doesn't matter, you can do both!

7
Tagging Problems
  • Tags aren't organized
  • Thorough coverage isn't controlled for
  • The haphazard assignments lead to problems with
  • Synonymy
  • Homonymy
  • See how this author attempts to compensate

8
Tagging Problems / Opportunities
  • Some tags are fleeting in meaning or too personal
  • toread, todo
  • Tags are not professional
  • (I personally don't think this matters)
  • Great example from Trant
  • "Anecdotal evidence also shows that
    professional cataloguers find the basic
    description of visual elements surprisingly
    difficult; a curator exhibited significant
    discomfort during this description task. When
    asked what was wrong, he blurted out 'everything
    I know isn't in the picture'."
  • "Investigating social tagging and folksonomy in
    the art museum with steve.museum", J. Trant, B.
    Wyman, WWW 2006 Collaborative Tagging Workshop

9
"Investigating social tagging and folksonomy in
the art museum with steve.museum", J. Trant, B.
Wyman, WWW 2006 Collaborative Tagging Workshop
10
What about Browsing?
  • I think tags need some organization
  • Currently most tags are used as a direct index
    into items
  • Click on tag, see items assigned to it, end of
    story
  • Co-occurring tags are not shown
  • Grouping into small hierarchies is not usually
    done
  • del.icio.us now has bundles, but navigation isn't
    good
  • IBM's dogear and RawSugar come the closest
  • I think the solution is to organize tags into
    faceted hierarchies and do browsing in the
    standard way
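
The "direct index" described above, and the co-occurring tags that most sites do not show, can be sketched in a few lines. This is only an illustration: the bookmark data and the function names build_tag_index and co_occurring_tags are assumptions made for the example, not taken from del.icio.us or any other system mentioned here.

```python
from collections import Counter, defaultdict

# Hypothetical bookmarks: item -> set of tags (illustrative data only).
ITEMS = {
    "page1": {"python", "tutorial", "web"},
    "page2": {"python", "web"},
    "page3": {"recipes", "thai"},
    "page4": {"python", "tutorial"},
}


def build_tag_index(items):
    """Map each tag to the set of items it was assigned to (the direct index)."""
    index = defaultdict(set)
    for item, tags in items.items():
        for tag in tags:
            index[tag].add(item)
    return index


def co_occurring_tags(items, tag):
    """Count how often other tags appear on the same items as `tag`."""
    counts = Counter()
    for tags in items.values():
        if tag in tags:
            counts.update(tags - {tag})
    return counts


index = build_tag_index(ITEMS)
related = co_occurring_tags(ITEMS, "python")
print(sorted(index["python"]))   # items reached by clicking the tag
print(related.most_common())     # tags that could be shown alongside it
```

Clicking a tag corresponds to the index lookup; the co-occurrence counts are one simple signal that could be used to group tags before building any hierarchy.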

11
Faceted Navigation and Flamenco
12
The Problem With Hierarchy
  • Most things can be classified in more than one
    way.
  • Most organizational systems do not handle this
    well.
  • Example: Animal Classification

[Figure: otter, penguin, robin, salmon, wolf, cobra, bat, classified by Skin Covering, Locomotion, and Diet]
13
The Problem with Hierarchy
  • Inflexible
  • Force the user to start with a particular
    category
  • What if I don't know the animal's diet, but the
    interface makes me start with that category?
  • Wasteful
  • Have to repeat combinations of categories
  • Makes for extra clicking and extra coding
  • Difficult to modify
  • To add a new category type, must duplicate it
    everywhere or change things everywhere

14
The Problem With Hierarchy
start
15
The Idea of Facets
  • Facets are a way of labeling data
  • A kind of Metadata (data about data)
  • Can be thought of as properties of items
  • Facets vs. Categories
  • Items are placed INTO a category system
  • Multiple facet labels are ASSIGNED TO items

16
The Idea of Facets
  • Create INDEPENDENT categories (facets)
  • Each facet has labels (sometimes arranged in a
    hierarchy)
  • Assign labels from the facets to every item
  • Example: recipe collection

Ingredient: Chicken, Bell Pepper
Cooking Method: Stir-fry, Curry
Course: Main Course
Cuisine: Thai
17
The Idea of Facets
  • Break out all the important concepts into their
    own facets
  • Sometimes the facets are hierarchical
  • Assign labels to items from any level of the
    hierarchy

Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy, Ice Cream, Sorbet, Flan
Fruits: Cherries, Berries, Blueberries, Strawberries, Bananas, Pineapple
18
Using Facets
  • Now there are multiple ways to get to each item

Preparation Method: Fry, Saute, Boil, Bake, Broil, Freeze
Desserts: Cakes, Cookies, Dairy, Ice Cream, Sherbet, Flan
Fruits: Cherries, Berries, Blueberries, Strawberries, Bananas, Pineapple
Fruit > Pineapple, Dessert > Cake, Preparation > Bake
Dessert > Dairy > Sherbet, Fruit > Berries > Strawberries, Preparation > Freeze
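
A minimal sketch of how the two labeled items on this slide could be reached from several starting points. The item names ("pineapple cake", "strawberry sherbet") and the functions matches and browse are hypothetical, invented for the example; this is not Flamenco's implementation. Each item stores one label path per facet, and a selection matches when the item's path begins with the selected path, so choosing a higher-level label such as Berries also retrieves items labeled Strawberries.

```python
# Each item carries one label path per facet, following the slide's examples.
# Item names are hypothetical.
RECIPES = {
    "pineapple cake": {
        "Fruit": ("Pineapple",),
        "Dessert": ("Cake",),
        "Preparation": ("Bake",),
    },
    "strawberry sherbet": {
        "Fruit": ("Berries", "Strawberries"),
        "Dessert": ("Dairy", "Sherbet"),
        "Preparation": ("Freeze",),
    },
}


def matches(item_facets, selection):
    """True if, for every selected facet, the item's path starts with the selected path."""
    for facet, wanted in selection.items():
        path = item_facets.get(facet, ())
        if path[: len(wanted)] != tuple(wanted):
            return False
    return True


def browse(items, selection):
    """Return the names of all items consistent with the current selection."""
    return sorted(name for name, facets in items.items() if matches(facets, selection))


# The same item is reachable by starting from different facets.
print(browse(RECIPES, {"Fruit": ("Berries",)}))
print(browse(RECIPES, {"Preparation": ("Freeze",)}))
print(browse(RECIPES, {"Dessert": ("Cake",), "Preparation": ("Bake",)}))
```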
19
The Flamenco Interface
  • Fine Arts Museum Example

31
Advantages of the Approach
  • Systematically integrates search results
  • reflect the structure of the info architecture
  • retain the context of previous interactions
  • Gives users control and flexibility
  • Over order of metadata use
  • Over when to navigate vs. when to search
  • Allows integration with advanced methods
  • Collaborative filtering, predicting users'
    preferences

32
Advantages of Facets
  • Can't end up with empty result sets
  • (except with keyword search)
  • Helps avoid feelings of being lost.
  • Easier to explore the collection.
  • Helps users infer what kinds of things are in the
    collection.
  • Evokes a feeling of browsing the shelves
  • Is preferred over standard search for collection
    browsing in usability studies.
  • (Interface must be designed properly)
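
One way to see why faceted browsing cannot lead to an empty result set: the interface only offers refinements that occur among the items currently shown. The sketch below is an assumption-laden illustration (the item data and the names current_results and refinement_counts are hypothetical), not a description of any particular system.

```python
from collections import Counter

# Hypothetical flat facet assignments: item -> {facet: label}.
ITEMS = {
    "thai chicken curry": {"Cuisine": "Thai", "Course": "Main Course"},
    "pad thai": {"Cuisine": "Thai", "Course": "Main Course"},
    "flan": {"Cuisine": "Spanish", "Course": "Dessert"},
}


def current_results(items, selection):
    """Items whose labels agree with every (facet, label) chosen so far."""
    return {
        name: facets
        for name, facets in items.items()
        if all(facets.get(f) == label for f, label in selection.items())
    }


def refinement_counts(items, selection):
    """For each unselected facet, count labels among the current results only."""
    counts = {}
    for facets in current_results(items, selection).values():
        for facet, label in facets.items():
            if facet not in selection:
                counts.setdefault(facet, Counter())[label] += 1
    return counts


options = refinement_counts(ITEMS, {"Cuisine": "Thai"})
print(options)
```

Because "Dessert" never appears among the Thai items in this toy data, it is not offered as a next step, so every link the user can click leads to at least one item. Free-text keyword search has no such guarantee, which is the exception noted on the slide.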

33
Related Work: Automated Tag Organization
  • Some efforts are on tag prediction
  • Mishne 06
  • Uses IR techniques to find the closest tagged
    documents, uses their tags to assign new tags.
    Measures how well new tags are predicted
  • Xu et al. 06
  • Use tags that have already been predicted for a
    document to predict which to show to a new user
    who is tagging the document
  • Some efforts on tag organization
  • Brooks & Montanez 06
  • Tries to see if tags can predict document
    clusters, which in my book aren't really
    categories
  • After clustering based on text they try to induce
    a tag hierarchy by agglomerative clustering the
    text. Results not described in detail
  • Begelman et al. 06
  • Use clustering and tag co-occurrence to find
    associated tags. Not clear what the
    organizational goal is
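
The general idea behind the tag-prediction work listed above (find the closest tagged documents, reuse their tags) can be sketched as follows. This is a toy illustration only and not the method of any cited paper: the word-overlap (Jaccard) score stands in for real IR similarity, and the data and the names jaccard and suggest_tags are assumptions.

```python
from collections import Counter

# Hypothetical tagged posts: (text, tags).
TAGGED = [
    ("stir fry chicken with bell pepper", {"recipes", "thai"}),
    ("bake a pineapple cake for dessert", {"recipes", "dessert"}),
    ("install python and write a web server", {"python", "web"}),
]


def jaccard(a, b):
    """Word-set overlap; a stand-in for a real IR similarity score."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def suggest_tags(text, tagged, k=2, top=3):
    """Pool the tags of the k most similar tagged documents and rank them by votes."""
    nearest = sorted(tagged, key=lambda doc: jaccard(text, doc[0]), reverse=True)[:k]
    votes = Counter()
    for doc_text, tags in nearest:
        if jaccard(text, doc_text) > 0:  # ignore neighbors with no overlap at all
            votes.update(tags)
    return [tag for tag, _ in votes.most_common(top)]


print(suggest_tags("chicken curry with bell pepper", TAGGED))
```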

34
RawSugar
  • A company/website that organizes tags from blogs
    into facets
  • They are undergoing a revamp, will move to
    channels
  • However, nothing published on this
  • (presumably, patents filed)

38
How to Create Facet Hierarchies?
  • Our Approach: Castanet
  • (Stoica & Hearst, to appear at HLT-NAACL 07)

39
Example: Recipes (3500 docs)
40
Castanet Output (shown in Flamenco)
41
Castanet Output (shown in Flamenco)
42
Castanet Output (shown in Flamenco)
43
Example: Biology Journal Titles. Castanet Output
(shown in Flamenco)
44
Castanet Algorithm
  • Leverage the structure of WordNet

Documents
45
1. Select Terms
  • Select well-distributed terms from the collection

[Diagram labels: Documents, Select terms, Get hypernym paths, Build tree, Comp. tree, WordNet]
46
2. Get Hypernym Path
red
blue
47
3. Build Tree
[Diagram labels: Documents, Select terms, Get hypernym paths, Build tree, Comp. tree, WordNet]
red
blue
48
4. Compress Tree
[Diagram labels: Documents, Select terms, Get hypernym paths, Build tree, Comp. tree, WordNet]
color
chromatic color
red, redness
blue, blueness
green, greenness
red
blue
green
49
4. Compress Tree (cont.)
[Diagram labels: Documents, Select terms, Get hypernym paths, Build tree, Comp. tree, WordNet]
color
color
chromatic color
red
blue
green
red
blue
green
50
5. Divide into Facets
Divide into facets
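
Steps 2 to 4 can be sketched on the color example from the slides. This is a simplified illustration, not the Castanet implementation: a small hand-written hypernym table (hypothetical, child to parent) stands in for WordNet, term selection and the division into facets are omitted, and the only compression rule applied is an assumed one that splices out non-root nodes with exactly one child.

```python
# Toy hypernym table standing in for WordNet (child -> parent); hypothetical data
# shaped like the color example on the slides.
HYPERNYM = {
    "red": "red, redness",
    "blue": "blue, blueness",
    "green": "green, greenness",
    "red, redness": "chromatic color",
    "blue, blueness": "chromatic color",
    "green, greenness": "chromatic color",
    "chromatic color": "color",
}


def hypernym_path(term):
    """Step 2: walk from a term up to its root, returned root-first."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path[::-1]


def build_tree(terms):
    """Step 3: merge the root-first paths into one nested dict."""
    tree = {}
    for term in terms:
        node = tree
        for label in hypernym_path(term):
            node = node.setdefault(label, {})
    return tree


def compress(tree, is_root=True):
    """Step 4 (simplified): splice out non-root nodes that have exactly one child."""
    out = {}
    for label, children in tree.items():
        children = compress(children, is_root=False)
        if not is_root and len(children) == 1:
            out.update(children)  # promote the only child in place of its parent
        else:
            out[label] = children
    return out


tree = build_tree(["red", "blue", "green"])
print(compress(tree))
```

With this single rule the tree becomes color > chromatic color > red, blue, green. The slides go one step further and also drop "chromatic color"; that second rule is not attempted here.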
51
Disambiguation
  • Ambiguity in
  • Word senses
  • Paths up the hypernym tree

52
How to Select the Right Senses and Paths?
  • First build core tree
  • (1) Create paths for words with only one sense
  • (2) Use Domains
  • WordNet has 212 Domains
  • medicine, mathematics, biology, chemistry,
    linguistics, soccer, etc.
  • Automatically scan the collection to see which
    domains apply
  • The user selects which of the suggested domains
    to use, or may add their own
  • Paths for terms that match the selected domains
    are added to the core tree
  • Then add remaining terms to the core tree.

53
Castanet Evaluation Method
  • Information architects assessed the category
    systems
  • For each of 2 systems' output
  • Examined and commented on top-level
  • Examined and commented on two sub-levels
  • Also compared to a baseline system
  • Then commented on overall properties
  • Meaningful?
  • Systematic?
  • Likely to use in your work?

54
Castanet Evaluation Results
  • Results on recipes collection for
    "Would you use this system in your work?"
  • "Yes in some cases" or "yes, definitely":
  • Castanet 29/34
  • LDA 0/18
  • Subsumption 6/16
  • Baseline 25/34
  • Average response to questions about quality
    (4 = strongly agree)

55
Will Castanet Work on Tags?
  • Class project by Simon King and Jeff Towle, 2004
  • 1650 captions captured from mobile phones
  • "Blocks with Grandpa", "Weezer", "A veterans day
    tour of berkeley in front of south hall.", "Bad
    photo", "Kitchen", "Jgj"
  • Wanted to organize them.
  • Use the Castanet WordNet-based facet-hierarchy
    creation algorithm
  • by Stoica & Hearst, to appear at HLT-NAACL 07
  • Had to first remove proper names

56
Example Photos Captions (King & Towle)
very scary x-mas tree
Hp presentation
chasing a cat in the dark
My cat
57
  • instrumentality, (112)
  • vehicle (26)
  • car (9)
  • bike (8)
  • vessel, watercraft (4)
  • mayflower (2)
  • ferry (1)
  • gig (1)
  • truck (3)
  • airplane (2)
  • device (20)
  • machine (7)
  • computer (4)
  • laptop (1)
  • sander (1)
  • container (16)
  • vessel (7)
  • bottle (5)
  • water_bottle (2)
  • jug (1)
  • pill_bottle (1)
  • bath (2)
  • bowl (1)
  • can (2)
  • backpack (1)
  • bumper (1)
  • empty (1)
  • salt_shaker (1)
  • furniture, piece of furniture, article of
    furniture (12)
  • seat (8)
  • bench (2)
  • chair (2)
  • couch (2)
  • lounge (1)

58
Research Questions for Tags Search
  • The role of interface on tag convergence
  • There seems to be a big effect
  • Would be really interesting to experiment with
    this
  • Also, for facet grouping
  • Anchor text vs. tags?
  • How are they the same? How do they differ?
  • How to get tag expertise?
  • Right now, in many cases it is
    least-common-denominator
  • ESP-game

59
What's up with Tag Clouds?
What does a typical tag cloud look like?
60
Definition
  • Tag Cloud: A visual representation of social
    tags, organized into paragraph-style layout,
    usually in alphabetical order, where the relative
    size and weight of the font for each tag
    corresponds to the relative frequency of its use.
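
The definition can be made concrete with a short sketch: tags in alphabetical order, each with a font size that grows with its frequency. The pixel bounds, the linear scaling, the sample counts, and the name tag_cloud are all assumptions chosen for illustration; real sites may scale differently (for example logarithmically).

```python
def tag_cloud(counts, min_px=12, max_px=36):
    """Return (tag, font_size_px) pairs in alphabetical order.

    Font size is interpolated linearly between min_px and max_px by frequency.
    """
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    cloud = []
    for tag in sorted(counts):
        # When every tag has the same count, show them all at the largest size.
        weight = (counts[tag] - lo) / span if span else 1.0
        cloud.append((tag, round(min_px + weight * (max_px - min_px))))
    return cloud


# Hypothetical tag frequencies.
counts = {"web": 10, "python": 40, "design": 25}
print(tag_cloud(counts))  # [('design', 24), ('python', 36), ('web', 12)]
```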

61
Definition
  • Tag Cloud: A visual representation of social
    tags, organized into paragraph-style layout,
    usually in alphabetical order, where the relative
    size and weight of the font for each tag
    corresponds to the relative frequency of its use.

62
Flickr's tag cloud
63
del.icio.us
64
del.icio.us
65
blogs
66
  • ma.gnolia.com

67
NYTimes.com: tags from most frequent search terms
68
IBM's manyeyes project
69
Amazon.com: Tag clouds on term frequencies
70
Alternative Semantic Layout
  • "Improving Tag-Clouds as Visual Information
    Retrieval Interfaces", Yusef Hassan-Montero
    and Víctor Herrero-Solana, InSciT2006
  • Tags grouped by similarity, based on clustering
    techniques and co-occurrence analysis

71
I was puzzled by the questions
  • What are designers' and authors' intentions in
    creating or using tag clouds?
  • How do they expect their readers to use them?

72
On the positive side
  • Compact
  • Draws the eye towards the most frequent
    (important?) tags
  • You get three dimensions simultaneously!
  • alphabetical order
  • size indicating importance
  • the tags themselves

73
Weirdnesses
  • Initial encounters unencouraging
  • Some reports from industry
  • Is the computer broken?
  • Is this a ransom note?

74
Weirdnesses
  • Violates principles of perceptual design
  • Longer words grab more attention than shorter
  • Length of tag is conflated with its size
  • White space implies meaning when there is none
    intended
  • Ascenders and descenders can also affect focus
  • Eye moves around erratically, no flow or guides
    for visual focus
  • Proximity does not hold meaning
  • The paragraph-style layout makes it quite
    arbitrary which terms are above, below, and
    otherwise near which other terms
  • Position within paragraph has saliency effects
  • Visual comparisons difficult (see Tufte)

75
Weirdnesses
  • Meaningful associations are lost
  • Where are the different country names in this tag
    cloud?

76
Weirdnesses
  • Which operating systems are mentioned?

77
Tag Cloud Study (1)
  • First part compared tag cloud layouts
  • Independent Variables
  • Tag size
  • Tag proximity to a large font
  • Tag quadrant position
  • Task: recall after a distractor task
  • 13 participants; effects for size and quadrant
  • Second part compared tag clouds to lists
  • 11 participants
  • Tested recognition (from a set of like words) and
    impression formation
  • Alphabetical lists were best for the latter; no
    differences for the former
  • "Getting our head in the clouds: Toward
    evaluation studies of tagclouds", Walkyria
    Rivadeneira, Daniel M. Gruen, Michael J. Muller,
    David R. Millen, CHI 2007 note

78
Tag Cloud Study (2)
  • 62 participants did a selection task
  • (find this country out of a list of 10 countries)
  • Independent Variables
  • Horizontal list
  • Horizontal list, alphabetical
  • Vertical list
  • Vertical list, alphabetical
  • Spatial tag cloud
  • Spatial tag cloud, alphabetical
  • Order for non-alphabetical not described
  • Alphabetical fastest in all cases, lists faster
    than spatial
  • May have used poor clouds (some people couldn't
    see larger font answers)
  • "An Assessment of Tag Presentation Techniques",
    Martin Halvey, Mark Keane, poster at WWW 2007.

79
A Justifying Claim
  • You get three dimensions simultaneously!
  • alphabetical order
  • size indicating importance
  • the tags themselves
  • but is this really a conscious design decision?

80
Solution: Celebrity Interviews
  • I was really confused about tag clouds, so I
    decided to ask the people behind the puffs
  • 15 interviews, conducted at foocamp06
  • Several web 2.0 leaders
  • 5 more interviews at Google and Berkeley

81
A Surprise
  • 7 interviewees DID NOT REALIZE that alphabetical
    ordering is standard.
  • 2 of these people were in charge of such sites
    but had had others write the code
  • What was the answer given to "what order are tags
    shown in?"
  • hadn't thought about it
  • don't think about tag clouds that way
  • random order
  • ordered by semantic similarity
  • Suggests that perhaps people are too distracted
    by the layout to use the alphabetical ordering

82
Suggested main purposes
  • To signal the presence of tags on the site
  • A good way to get the gist of the site
  • An inviting and fun way to get people interacting
    with the site
  • To show what kinds of information are on the site
  • Some of these said they are good for navigation
  • Easy to implement

83
Tag Clouds as Self-Descriptions
  • Several noted that a tag cloud showing one's own
    tags can be evocative
  • A good summary of what one is thinking and
    reading about
  • Useful for self-reflection
  • Useful for showing others one's thoughts
  • One example: comparing someone else's tags to
    one's own to see what you have in common, and
    what special interests differentiate you
  • Useful for tracking changes in friends' lives
  • Oh, a new girl's name has gotten larger; he must
    have a new girlfriend!

84
Tag Clouds as showing Trends
  • Several people used this term, that tag clouds
    show trends in someone's behavior
  • Trends are usually patterns across time, which
    are not inherently visible in tag clouds
  • To note a trend using a tag cloud, one must
    remember what was there at an earlier time, and
    what changed
  • tracking the girls' names example
  • This suggests a reason for the importance of the
    large tags: they draw one's attention to what is
    big now versus what used to be large.
  • Suggests also why it doesn't matter that you
    can't see small tags.

85
New Perspective: Tag Clouds are Social!
  • It's not about the information!
  • Not surprising in retrospect; tagging is in large
    part about the social aspect
  • Seems to work mainly when the tags can be seen by
    many
  • Even better when items can be tagged by many and
    seen by many
  • What does this mean though when tag clouds are
    applied to non-social information?

86
Follow-up Study
  • Informed by the interview results, we searched
    for, read, and coded web pages that mentioned tag
    clouds.
  • Looked at about 140 discussions
  • Developed 21 codes
  • Looked at another 90 discussions
  • Used web queries "tag clouds", "usability tag
    clouds", etc.
  • Sampled every 10th url
  • 58 personal blogs
  • 20 commercial blogs
  • 10 commercial web pages
  • rest from group blogs and discussion lists
  • Doesn't tell us what people who don't write about
    tag clouds think.

87
The Role of Popularity
  • Popularity in the sense that tag clouds (and
    tagging) are trendy and popular.
  • Some people liked the visualization, but their
    popularity made them less appealing
  • Famous post: "Tag clouds are the new mullets"
  • Led to self-consciousness about liking them
  • Many complained about unaesthetic cloud designs
  • Little consensus on if they are a fad or have
    staying power
  • Popularity also in the sense of the large font
    size for more popular tags
  • Many people like the prominence of large tags,
    but several commented on the tyranny of the
    popular

88
The Role of Navigation
  • Opinions vary
  • Many simply state they are useful for navigation,
    but with no support for this claim
  • Some claim the compactness makes navigation
    easier than a vertical list
  • Some object to the varying font size on
    scannability
  • Others object to the lack of organization
  • Overall, there is no evidence either way that we
    could find in the blog community

89
Aesthetic Considerations
  • Disagreement on the aesthetic and emotional
    appeal, especially for lay users.
  • Those who like them find them fun and appealing
  • Those who don't find them messy, strange, like a
    ransom note
  • Informal reports with first time users who are
    not in the Web 2.0 community are negative

90
Trends again
  • As in the interviews, the benefit of trends was
    mentioned many times.
  • There is another sense of trend as tendency or
    inclination, and this might be what people mean.

91
Summary of Stated Reasons for Tag Clouds (Note:
some refuted by studies)
92
Tag Clouds as Social Information
  • An emphasis that tag clouds are meant to show
    human behavior.
  • We found reports of people commenting on other
    uses that were invalid because they did not
    reflect live user input
  • One blogger noted the incongruity of an online
    library using keyword frequencies in a tag cloud
    rather than having it reflect patrons' usage of
    the collection.
  • An online community noticed one site's cloud
    didn't change over time and realized the sizes
    were decided by marketing. This was greeted with
    derision.

93
Implications
  • Assume tag clouds are meant to reflect human
    mental activity (individual or group)
  • Then what might seem design flaws from an
    information conveyance perspective may not be
  • A large part of the appeal is the fun and
    liveliness.
  • The informality of the layout reflects the human
    activity beneath it.

94
Judith Donath, CACM 45(4), 2002
  • "Traditional data visualization focuses on
    making abstract numbers and relationships into
    concrete, spatialized images; the goal is to
    highlight important patterns while also
    representing the data accurately. This is a fine
    approach for social scientists studying the
    dynamics of online interactions. Yet for our
    purpose it is also important that the
    visualization evoke an appropriate intuitive
    response, representing the feel of the
    conversation as well as depicting its dynamics"

95
Judith Donath, CACM 45(4), 2002
  • "One argument for deliberately designing
    evocative visualizations for online social
    environments is the existing default textual
    interfaces are themselves evocative, they simply
    evoke an aura of business-like monotony rather
    than the lively social scene that actually
    exists."

96
Tag Cloud Alternatives
  • Provided by Martin Wattenberg

97
Conclusions
  • Social tagging is, in my view, a terrific way to
    get good content metadata.
  • I think automated techniques can do a lot to help
    clean them up and organize them.
  • They are an inherently social phenomenon, part of
    social media, which is a really exciting area.
  • The socialness of social media can yield
    surprises, like tag clouds.