Title: Information Classification and Retrieval
1. Information Classification and Retrieval: The folks and their tags
Roger Hudson, Web Usability
The Web Standards Group, Sydney, 17 August 2006
2. Classification and navigation
Hoover and Port
What's in a name? That which we call a rose
By any other name would smell as sweet.
But only if we can find the rose.
3. Classifying things
Systema Naturae, 1758
Classification of living things
- Kingdom: Animalia
- Phylum: Chordata
- Subphylum: Vertebrata
- Class: Mammalia
- Subfamily: Homininae
- Genus: Homo
- Species: Homo sapiens
Carl (Carolus) Linnaeus, from Sweden: the Father of Taxonomy.
The classification of living things, Systema Naturae (10th edition), published in 1758.
4. Classifying information (books)
Dewey System, 1876
Melville Dewey
Dewey grew up in a small U.S. town in the 1850s.
He was an insular man with an American Christian view of the world.
Dewey developed a system for classifying books
which has become the most widely used
classification system in the world.
5. Dewey Decimal Classification System
The Dewey Decimal Classification System was published in 1876.
Organises non-fiction books into 10 general
subject areas.
Each subject area has:
- 10 subcategories
- 10 sub-subcategories within each subcategory
- and so on
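Because of this ten-way split at every level, a class number encodes its own place in the hierarchy: the hundreds digit picks the general subject area, the tens digit a subcategory, the units digit a sub-subcategory, and each digit after the decimal point narrows it further. A minimal Python sketch, assuming three-digit base numbers and ignoring the real scheme's tables and notational rules; the function name `dewey_hierarchy` is hypothetical:

```python
def dewey_hierarchy(number: str) -> list:
    """Return the chain of enclosing classes for a Dewey-style number.

    Simplified sketch: hundreds digit = general subject area,
    tens digit = subcategory, units digit = sub-subcategory, and each
    decimal digit adds one narrower level.
    """
    base, _, decimals = number.partition(".")
    if len(base) != 3 or not base.isdigit() or (decimals and not decimals.isdigit()):
        raise ValueError(f"not a Dewey-style number: {number!r}")
    levels = [base[0] + "00", base[:2] + "0", base]
    for i in range(1, len(decimals) + 1):
        levels.append(f"{base}.{decimals[:i]}")
    # Numbers such as 005 repeat their first levels ("000", "000"),
    # so duplicates are dropped while keeping the order.
    return list(dict.fromkeys(levels))


print(dewey_hierarchy("682"))     # ['600', '680', '682']
print(dewey_hierarchy("005.72"))  # ['000', '005', '005.7', '005.72']
```

Since each number's meaning is fixed by its position, a new subject such as the web has to be fitted into slots that already exist, which is the difficulty the 682 and 004.67 examples illustrate.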
6. Dewey and information retrieval
A taxonomy with closely defined subject
categories
- Reflects needs at the time of development
- Harder to evolve to meet changing and future needs
682 Small forge work (blacksmithing): great in the 1870s. What about the internet today?
- 004.019 Human Computer Interaction
- 004.67 General Internet books
- 005.72 Web usability books
- 808 Web Writing (in Literature)
7. Website taxonomy
Traditional approach to classifying web content:
- Content holders determine the information architecture.
- Rigid hierarchical site structure.
- The site taxonomy underpins the navigational hierarchies.
- Less reliance on site search.
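In a rigid hierarchy every page has exactly one parent, so the navigation path to a page (for example a breadcrumb trail) follows directly from the taxonomy. A minimal Python sketch; the site pages and the `breadcrumb` helper are hypothetical:

```python
from __future__ import annotations

# Hypothetical site taxonomy: each page has exactly one parent page,
# so its position in the navigation is fixed by the content holders.
SITE_TAXONOMY = {
    "Home": None,
    "Services": "Home",
    "Usability testing": "Services",
    "Accessibility audits": "Services",
    "About us": "Home",
}


def breadcrumb(page: str, taxonomy: dict[str, str | None]) -> list[str]:
    """Walk up the single-parent hierarchy from a page to the root."""
    trail = []
    current: str | None = page
    while current is not None:
        trail.append(current)
        current = taxonomy[current]
    return trail[::-1]


print(" > ".join(breadcrumb("Usability testing", SITE_TAXONOMY)))
# Home > Services > Usability testing
```

The single-parent mapping is what makes this simple, and also what makes it rigid: a page that belongs in two branches has to be forced into one of them.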
8. Navigation and information retrieval
In the early days, simplicity was the way to go.
But as sites got bigger, new approaches were needed.
9. More navigation menus
10. Expanding menus
- All's well in the world
- Everything clearly defined
- And in its right place
But not everything can be clearly defined.
11. Back to the library for a moment
The Dewey Decimal System
- enumerates all possible subjects
- and provides slots for all documents.
Indian mathematician and librarian S. R.
Ranganathan.
Ranganathan saw limitations in the Dewey System in the 1930s.
12. Introducing Facets
Ranganathan introduced the idea of classifying complex objects by the different facets they contain. He proposed five facets for library material.
Rather than putting an object into a slot, facets
allow for a composite classification of the
object.
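To make the slot versus composite distinction concrete: instead of one class per item, each item carries a value in several independent facets, and one index per facet lets the same item be reached from any of them. A minimal Python sketch; the wine catalogue, its facet names and values, and `build_facet_index` are hypothetical illustrations, not Ranganathan's five facets:

```python
from collections import defaultdict

# Hypothetical catalogue: each wine is described by a value in several
# independent facets rather than being filed in a single slot.
WINES = {
    "w1": {"colour": "red", "grape": "shiraz", "region": "Barossa"},
    "w2": {"colour": "white", "grape": "riesling", "region": "Clare"},
    "w3": {"colour": "red", "grape": "cabernet", "region": "Coonawarra"},
}


def build_facet_index(items):
    """Return facet -> value -> set of item ids (an inverted index per facet)."""
    index = defaultdict(lambda: defaultdict(set))
    for item_id, facets in items.items():
        for facet, value in facets.items():
            index[facet][value].add(item_id)
    return index


index = build_facet_index(WINES)
print(sorted(index["colour"]["red"]))      # ['w1', 'w3']
print(sorted(index["region"]["Barossa"]))  # ['w1']
```

Adding a new wine, even one from a new region, only adds entries to the index; no existing slot has to be renumbered.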
S. R. Ranganathan, Colon Classification, published in 1933
13. Facets and the I.T. age
Colon Classification and its notation are complex and not widely used by libraries. However, the concept of facets underpins developments in information technologies:
- Relational databases
- Metadata
- Keyword search
- Alternative website navigation
14. Facets and the web
Web content is virtual and accessible from
anywhere via hyperlinks.
Wine Facets
Facets allow content holders to:
- Identify web content by definable properties
- Categorise content from different perspectives
- Provide users with different starting points for content retrieval
Faceted systems are also flexible and can easily accommodate new content entries.
15. Many ways to find a recipe
Epicurious recipe facets:
- main ingredient
- cuisine
- special considerations
- preparation method
- season / occasion
- course / meal
- dish
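Faceted browsing of this kind is commonly built as a drill-down: each choice narrows the result set, and the values remaining in every facet are recounted so users only see options that still lead somewhere. A minimal Python sketch under those assumptions; the recipes and the `drill_down` helper are hypothetical, not how Epicurious actually works:

```python
from collections import Counter

# Hypothetical recipes described by three of the facets listed above.
FACETS = ("main ingredient", "cuisine", "course")
RECIPES = [
    {"name": "Tabouleh", "main ingredient": "parsley", "cuisine": "Middle Eastern", "course": "side"},
    {"name": "Falafel", "main ingredient": "chickpeas", "cuisine": "Middle Eastern", "course": "main"},
    {"name": "Hummus", "main ingredient": "chickpeas", "cuisine": "Middle Eastern", "course": "starter"},
    {"name": "Minestrone", "main ingredient": "beans", "cuisine": "Italian", "course": "starter"},
]


def drill_down(recipes, selected):
    """Keep recipes matching every selected facet value, then count the
    values still available in each facet (the numbers shown next to links)."""
    matches = [r for r in recipes
               if all(r.get(facet) == value for facet, value in selected.items())]
    counts = {facet: Counter(r[facet] for r in matches) for facet in FACETS}
    return matches, counts


matches, counts = drill_down(RECIPES, {"cuisine": "Middle Eastern"})
print([r["name"] for r in matches])  # ['Tabouleh', 'Falafel', 'Hummus']
print(counts["main ingredient"])     # Counter({'chickpeas': 2, 'parsley': 1})
```

Starting from `{"course": "side"}` or `{"main ingredient": "parsley"}` instead reaches the same tabouleh recipe, which is the "many ways to find a recipe" idea.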
16. Browsing for tabouleh
17. Browsing for tabouleh
18. Looking for craft: more facets
Etsy facets:
- Time of listing
- Object colour
- Location
- Top 100 items
- Top 100 sellers
NB: Objects are also listed by category.
19. Shopping by colour
Cool for some
But maybe not for grumpy old men!
20. Patriotic memory bracelet
21. Tagging and the folk
2004: tagging takes off with the release of two folksonomic tagging sites:
- flickr (users tag photos)
- del.icio.us (users tag links to web pages)
Folksonomy
A folksonomy is a set of uncontrolled tags, provided by individuals for their own retrieval of an object, that are shared publicly.
Thomas Vander Wal
http://www.vanderwal.net/index.html
22. Folksonomy and tagging
- Folksonomy is an open-ended labelling system that allows users to categorise online content.
- Users provide descriptive keywords or tags, which use familiar, shared vocabularies.
Folksonomy is the sharing of tags provided by different users.
Assumption: if enough people tag an object, interesting and useful patterns will emerge.
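The assumption can be explored with simple counting: how many users chose each tag, and which tags tend to be chosen together. A minimal Python sketch; the user names and tags are hypothetical, and lowercasing is the only normalisation applied:

```python
from collections import Counter
from itertools import combinations

# Hypothetical tags that four users gave the same photo.
USER_TAGS = {
    "ann": ["Sydney", "bridge", "sunset"],
    "bob": ["sydney", "Harbour Bridge", "sunset"],
    "cy": ["Sydney", "sunset", "yachts"],
    "dee": ["sydney", "opera house"],
}


def tag_popularity(user_tags):
    """Count how many users applied each tag (case-insensitive, once per user)."""
    counts = Counter()
    for tags in user_tags.values():
        counts.update({t.lower() for t in tags})
    return counts


def co_occurrence(user_tags):
    """Count how often two tags were applied together by the same user."""
    pairs = Counter()
    for tags in user_tags.values():
        pairs.update(combinations(sorted({t.lower() for t in tags}), 2))
    return pairs


popular = tag_popularity(USER_TAGS)
print(popular.most_common(2))                  # [('sydney', 4), ('sunset', 3)]
print(sorted(t for t, n in popular.items() if n == 1))
# ['bridge', 'harbour bridge', 'opera house', 'yachts']
print(co_occurrence(USER_TAGS).most_common(1))  # [(('sunset', 'sydney'), 3)]
```

Tags that many users agree on form the emergent pattern; tags only one user chose are the long tail, comparable to the "unique tags" counted in the survey.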
23. Da Vinci search with del.icio.us
Results relate to Da Vinci the painter and to The Da Vinci Code, the book.
Tagging produced interesting and useful associations.
24. Potential benefits
- Users offer differing perspectives on how resources can be organised and described.
- Users designate terms that make sense to them.
- Users provide machine-readable metadata for information content.
- Tagging can enhance search engine information retrieval.
Folksonomies can help support emergent
vocabularies and multilingual information
classification and retrieval.
25. Cat chat
Germaine from Switzerland
Tags: cat, cats, chat, chats, canon, switzerland, jeans, cute, kitty
26. Cat chat
Ella from Poland
Tags: cat, kot, kotek, gato, katz, chat, pet, animal
27. Da Vinci search with flickr
All of the first-page results relate to the book, not the painter.
Interesting, but perhaps not so useful.
28. Potential issues
- Individual freedom or mob rule?
- Tag swamp leading to cognitive overload.
- Multiple words with the same meaning.
- Tags mean different things to different users.
- How many users know what tags are?
- How many users will tag?
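One common, partial response to "multiple words with the same meaning" is to fold tags onto a canonical form before counting them. A naive Python sketch using the tags from the two "Cat chat" slides; the synonym table, the plural rule and `normalise_tag` are hypothetical and deliberately simple, and their failures are part of the point:

```python
import re

# Hypothetical, hand-made synonym table. "chat" is French for "cat" but
# also an English word, so this mapping will be wrong for some users.
SYNONYMS = {"kot": "cat", "kotek": "cat", "gato": "cat", "katz": "cat",
            "chat": "cat", "kitty": "cat"}


def normalise_tag(tag: str) -> str:
    """Naively fold a raw tag onto a canonical form."""
    t = re.sub(r"\s+", " ", tag.strip().lower())
    t = SYNONYMS.get(t, t)
    # Crude plural folding ("cats" -> "cat"); it also turns "jeans" into "jean".
    if len(t) > 3 and t.endswith("s") and not t.endswith("ss"):
        t = t[:-1]
    return SYNONYMS.get(t, t)


germaine = "cat cats chat chats canon switzerland jeans cute kitty".split()
ella = "cat kot kotek gato katz chat pet animal".split()
print(sorted({normalise_tag(t) for t in germaine + ella}))
# ['animal', 'canon', 'cat', 'cute', 'jean', 'pet', 'switzerland']
```

Fifteen distinct raw tags collapse to seven, but "chat" is now always read as "cat" and "jeans" has become "jean": normalisation trades one kind of noise for another.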
29. Looking for answers
Rough and ready survey
July 2006. Participants:
- 10 media workers (radio reporters and producers)
- 10 library workers (librarians, archivists and researchers)
- 10 web workers (producers, designers and developers)
- 10 museum workers (scientists and program managers)
Key questions:
- Are they aware of tags and social bookmarking?
- What sort of tags might they use?
- How likely are they to tag in the future?
30. Who has tagged in the past
How many of the participants have previously tagged web content?
8 out of 40 participants
How will the participants tag two survey photos?
31. Photo 1 tags
49 different tags used
Most common tags:
Sydney (27), Sydney Harbour Bridge (25), Sunset (22), Opera House (21), Sydney Harbour (16), Sydney Opera House (15), Harbour (13), Bridge (10), Australia (7), Harbour Bridge (7), Sydney sunset (3), Clouds (3), Silhouette (3)
29 unique tags, including:
Australian icon, bridge climb, calm waters, home, Manly ferry, New South Wales, pylon, sea, sightseeing, sky, skyline, tourism, yachts, yellow.
32. Photo 2 tags
67 different tags used
Most common tags:
Redback / Redback spider (33, various spellings), Spider (27), Spider web (8), Web (8), Prey (6), Australian spider (5), Insect (5), Poisonous spider (5), Predator (4), Arachnid (4), Arachnology (2), Entomology (2)
47 unique tags, including:
Australian wildlife, Australian bush, biology, black widow, dangerous, death, eating, exotic animal, feeding, green, locust, sheds, Slim Dusty, spider killing, venom.
33. How many will tag in the future
At the end of the survey, each participant was asked:
If in the future you could provide tags for web content (pages, images) that might be helpful to you and other users, how often would you do this?
34. How many will tag in the future
n = 40
- Never: 4
- Infrequently: 15
- Sometimes: 10
- Often: 6
- Always: 5
Comments include:
- I just want to get the information and get out.
- I might if it helps other people.
- Don't have the time.
- What's in it for me?
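Taking the reported counts (Never 4, Infrequently 15, Sometimes 10, Often 6, Always 5; n = 40), the shares work out as follows. A small Python sketch; grouping "Sometimes", "Often" and "Always" together is one possible reading, not a grouping made in the survey itself:

```python
# Survey responses to the "how often would you tag?" question (n = 40).
RESPONSES = {"Never": 4, "Infrequently": 15, "Sometimes": 10, "Often": 6, "Always": 5}


def shares(counts):
    """Return each answer's share of all responses."""
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}


for answer, share in shares(RESPONSES).items():
    print(f"{answer:<13}{RESPONSES[answer]:>3}  {share:.1%}")

# One possible grouping: anyone answering "Sometimes" or more often.
total = sum(RESPONSES.values())
regular = RESPONSES["Sometimes"] + RESPONSES["Often"] + RESPONSES["Always"]
print(f"Sometimes or more often: {regular}/{total} = {regular / total:.1%}")
# prints: Sometimes or more often: 21/40 = 52.5%
```

On that reading, just over half the group would tag at least sometimes, while 47.5% said never or only infrequently.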
35. Issues for discussion
What do you do with large numbers of tags?
- Do you take all tags?
- If not, how do you cull the list: by machine or by human?
- Processing demands
How do you handle wilfully misleading tags?
- Deliberately disruptive
- Spammers and scammers
- Search engine cheaters
Do you allow/encourage idiosyncratic tagging?
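A machine cull can combine a few simple rules. A minimal Python sketch; the thresholds, the blocklist, the sample data and `cull_tags` are hypothetical, and a real system would need much more than this to resist determined spammers:

```python
from collections import Counter


def cull_tags(user_tags, min_users=2, max_tags_per_user=10, blocklist=frozenset()):
    """Cull a large tag set by machine (all thresholds are hypothetical):
    - skip users who submit implausibly long tag lists (tag stuffing),
    - drop blocklisted terms (known spam),
    - keep only tags that at least `min_users` different users applied."""
    counts = Counter()
    for tags in user_tags.values():
        unique = {t.lower() for t in tags}
        if len(unique) > max_tags_per_user:
            continue  # looks like keyword stuffing: ignore this user's tags
        counts.update(unique - set(blocklist))
    return {tag: n for tag, n in counts.items() if n >= min_users}


user_tags = {
    "u1": ["spider", "Redback"],
    "u2": ["redback", "web"],
    "u3": ["redback", "cheap-pills"],
    "u4": [f"keyword{i}" for i in range(50)],  # a stuffing attempt
}
print(cull_tags(user_tags, blocklist={"cheap-pills"}))  # {'redback': 3}
```

The `min_users` threshold removes spam and idiosyncratic tags alike: "spider" and "web" disappear here too, which is exactly the trade-off behind the question of whether to allow idiosyncratic tagging.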
36. Tagging idiosyncrasy
Tags: photoshop Da Vinci code cod fish gag
37. Tag clouds: another way to find things
Back to Etsy
Seller tags in a cloud
38. Popular link tags
39. Cloud: number of directory entries
40. Tag cloud mock-up
Back to the rough and ready survey
Two questions:
- What is this? (the tag cloud)
- Why are some items bigger?
41. Survey responses 1
n = 40
What is this? (the tag cloud)
Most seem to recognise it as a list or index of links relating to Sydney.
- Links to information sites about Sydney (10)
- List (index) of things to see and do in Sydney (9)
- Information about Sydney / attractions (7)
- Tag cloud, tag thing (5)
- Search keywords / results (4)
- Keywords for areas of interest (3)
- Commonly used words as tags (2)
42. Survey responses 2
n = 40
Why are some items bigger?
A wide range of responses:
- Paid more to be bigger / marketing / sponsored links (10)
- Developer (site owner) thinks they are more important (9)
- Visitations / links (areas) most clicked / popular (7)
- Areas with the most information (6)
- No idea / don't know (4)
- Results most relevant to search inquiry (3)
- How many people use the label (as a tag) (1)
43. Other concerns
The use of tags and tag clouds for information retrieval raises some interesting questions:
- Do users know what tag clouds are?
- How is the weighting of tags determined?
- Is the weighting open to misinterpretation?
- Is tagging the tyranny of the majority?
- When do items that are not in the cloud slip from the collective consciousness?
- Are tag clouds accessible?
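One common answer to "how is the weighting of tags determined?" is to map each tag's frequency onto a font-size range, often on a logarithmic scale so that one very popular tag does not dwarf the rest. A minimal Python sketch using four of the Photo 1 survey counts; the pixel range and the `cloud_sizes` helper are hypothetical:

```python
import math


def cloud_sizes(counts, min_px=12, max_px=36, log_scale=True):
    """Map tag frequencies onto font sizes (in pixels) for a tag cloud."""
    scale = math.log if log_scale else float
    lo = scale(min(counts.values()))
    hi = scale(max(counts.values()))
    span = (hi - lo) or 1.0  # every tag equally common: avoid dividing by zero
    return {tag: round(min_px + (max_px - min_px) * (scale(n) - lo) / span)
            for tag, n in counts.items()}


photo1 = {"Sydney": 27, "Sunset": 22, "Bridge": 10, "Clouds": 3}
print(cloud_sizes(photo1, log_scale=False))
# {'Sydney': 36, 'Sunset': 31, 'Bridge': 19, 'Clouds': 12}
print(cloud_sizes(photo1))
# {'Sydney': 36, 'Sunset': 34, 'Bridge': 25, 'Clouds': 12}
```

Whichever scale is chosen, the survey suggests it needs explaining: only one participant read size as "how many people use the label".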
44. Where to now?
- Information Architecture is dead
- Who knows more about what they want than the user?
- Folksonomy is a mess
- With mob indexing, how will we know where anything is or find what we want?
Traditional hierarchies, facets, tags and
folksonomies are all interesting and potentially
useful. It is a question of finding the right
balance.
45. There's more than one way to skin a cat
(with apologies to cat lovers)
- Web sites are different and their users have different needs.
- For site navigation and information retrieval, there is no one-size-fits-all solution.
- Find the system that best meets your circumstances.
Be cautious when people say they know the one and only way. Unless, of course, the answer is 42.
46. Thank you for listening.
Roger Hudson, Web Usability