Information Classification and Retrieval - PowerPoint PPT Presentation

Transcript and Presenter's Notes

Title: Information Classification and Retrieval


1
Information Classification and Retrieval: The
folks and their tags
Roger Hudson, Web Usability
The Web Standards Group, Sydney, 17 August 2006
2
Classification and navigation
Hoover and Port
What's in a name? That which we call a rose
By any other name would smell as sweet.
But only if we can find the rose.
3
Classifying things
Systema Naturae, 1758
Classification of living things
  • Kingdom: Animalia
  • Phylum: Chordata
  • Subphylum: Vertebrata
  • Class: Mammalia
  • Family: Hominidae
  • Genus: Homo
  • Species: Homo sapiens
Carl (Carolus) Linnaeus, from Sweden, is known as
the Father of Taxonomy.
His classification of living things, Systema
Naturae, tenth edition published in 1758.
4
Classifying information (books)
Dewey System, 1876
Melville Dewey
Dewey grew up in a small U.S. town in the 1850s.
An insular man with an American Christian view
of the world.
Dewey developed a system for classifying books
which has become the most widely used
classification system in the world.
5
Dewey Decimal Classification System
Dewey Decimal Classification System published in
1876. 
Organises non-fiction books into 10 general
subject areas.
Each subject area has
  • 10 sub categories
  • 10 sub-sub categories
  • etc
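
The ten-by-ten nesting can be sketched in a few lines of Python. This is only an illustration: the helper names `dewey_path` and `main_class` are hypothetical, and the captions for the ten main classes are the commonly cited ones, not something listed on the slide.

```python
# Commonly cited captions for the ten main classes (an assumption of this
# sketch; the slide only says there are ten general subject areas).
MAIN_CLASSES = {
    0: "Computer science, information & general works",
    1: "Philosophy & psychology",
    2: "Religion",
    3: "Social sciences",
    4: "Language",
    5: "Science",
    6: "Technology",
    7: "Arts & recreation",
    8: "Literature",
    9: "History & geography",
}


def dewey_path(number: str) -> list[str]:
    """Return the chain of ever-narrower slots that contain a Dewey number."""
    whole, _, fraction = number.partition(".")
    whole = whole.zfill(3)          # "4" becomes "004"
    path = [
        whole[0] + "00",            # main class: one of 10
        whole[:2] + "0",            # division: one of 100
        whole,                      # section: one of 1000
    ]
    # Every decimal digit narrows the subject by another factor of ten.
    for i in range(1, len(fraction) + 1):
        path.append(f"{whole}.{fraction[:i]}")
    return path


def main_class(number: str) -> str:
    """Look up the caption of the main class a Dewey number belongs to."""
    first_digit = int(number.partition(".")[0].zfill(3)[0])
    return MAIN_CLASSES[first_digit]


print(dewey_path("005.72"))  # ['000', '000', '005', '005.7', '005.72']
print(main_class("682"))     # Technology
```

Every book gets exactly one path through this tree, which is what makes the scheme easy to shelve by and hard to extend.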

6
Dewey and information retrieval
A taxonomy with closely defined subject
categories
  • Reflects needs at the time of development
  • Harder to evolve to meet changing and future needs

682 Small forge work (blacksmithing)
Great in the 1870s. What about the internet today?
  • 004.019 Human Computer Interaction
  • 004.67 General Internet books
  • 005.72 Web usability books
  • 808 Web Writing (in Literature)
7
Website taxonomy
Traditional approach to classifying web content
  • Content holders determine the information
    architecture.
  • Rigid hierarchical site structure.
  • The site taxonomy underpins the navigational
    hierarchies.
  • Less reliance on site search.

8
Navigation and information retrieval
In the early days, simplicity was the way to go.
But as sites got bigger new approaches were
needed.
9
More navigation menus
10
Expanding menus
  • All's well in the world
  • Everything clearly defined
  • And in its right place

But, not everything can be clearly defined
11
Back to the library for a moment
The Dewey Decimal System
  • enumerates all possible subjects
  • and provides slots for all documents.

The Indian mathematician and librarian S. R.
Ranganathan saw limitations with the Dewey
System in the 1930s.
12
Introducing Facets
Ranganathan introduced the idea of classifying
complex objects by the different facets they
contain. He proposed five facets for library
material: personality, matter, energy, space and
time.
Rather than putting an object into a slot, facets
allow for a composite classification of the
object.
S. R. Ranganathan, Colon Classification, published
in 1933
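
The difference between a slot and a composite classification can be sketched as a small Python record. This is a minimal illustration, not Colon Classification notation: the class name `FacetedRecord`, the `matches` helper and the example values are all hypothetical, and the field names assume Ranganathan's five facets (personality, matter, energy, space, time).

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FacetedRecord:
    """A composite description: one value per facet instead of one slot."""
    personality: str  # the core subject
    matter: str       # material or property
    energy: str       # process or activity
    space: str        # place
    time: str         # period

    def matches(self, **wanted: str) -> bool:
        """True if every requested facet holds the requested value."""
        values = asdict(self)
        return all(values[facet] == value for facet, value in wanted.items())


# Hypothetical record, invented purely for illustration.
book = FacetedRecord(
    personality="wine",
    matter="grapes",
    energy="fermentation",
    space="Australia",
    time="1990s",
)

print(book.matches(space="Australia"))                # True
print(book.matches(space="Australia", time="1870s"))  # False
```

The same record can be reached from any facet, so no single hierarchy has to be chosen in advance.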
13
Facets and the I.T. age
Colon Classification and notation is complex and
not widely used by libraries. The concept of
Facets underpins developments in information
technologies.
  • Relational databases
  • Metadata
  • Keyword search
  • Alternative website navigation

14
Facets and the web
Web content is virtual and accessible from
anywhere via hyperlinks.
Facets allow content holders to
  • Identify web content by definable properties
  • Categorise content from different perspectives
  • Provide users with different content retrieval
    starting points

Also, faceted systems are flexible and can easily
accommodate new content entries.
Example shown: Wine Facets
15
Many ways to find a recipe
Epicurious recipe facets
  • main ingredient
  • cuisine
  • special considerations
  • preparation method
  • season / occasion
  • course / meal
  • dish
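
Faceted browsing of this kind can be sketched as a filter over records that carry one value per facet. The facet names below come from the slide. The three recipes, their values and the helper names `browse` and `facet_counts` are invented for illustration and do not describe how Epicurious itself works.

```python
from collections import Counter

# Hypothetical mini-catalogue: facet names follow the slide, values are made up.
RECIPES = [
    {"name": "Tabouleh", "main ingredient": "bulgur",
     "cuisine": "Middle Eastern", "course / meal": "salad"},
    {"name": "Hummus", "main ingredient": "chickpeas",
     "cuisine": "Middle Eastern", "course / meal": "appetiser"},
    {"name": "Greek salad", "main ingredient": "tomato",
     "cuisine": "Greek", "course / meal": "salad"},
]


def browse(recipes, selected):
    """Keep the recipes whose facets match every selected value."""
    return [
        r for r in recipes
        if all(r.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(recipes, facet):
    """Count how many recipes fall under each value of one facet."""
    return Counter(r[facet] for r in recipes)


# Two different starting points lead to the same recipe.
by_cuisine = browse(RECIPES, {"cuisine": "Middle Eastern"})
via_cuisine = browse(by_cuisine, {"course / meal": "salad"})
by_course = browse(RECIPES, {"course / meal": "salad"})
via_course = browse(by_course, {"cuisine": "Middle Eastern"})

print([r["name"] for r in via_cuisine])  # ['Tabouleh']
print([r["name"] for r in via_course])   # ['Tabouleh']
```

`facet_counts` gives the numbers a site would typically show beside each facet value so users can see how many items sit behind a link.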
16
Browsing for tabouleh
17
Browsing for tabouleh
18
Looking for craft: more facets
Etsy facets
  • Time of listing
  • Object colour
  • Location
  • Top 100 items
  • Top 100 sellers

NB: Objects are also listed by categories
19
Shopping by colour
Cool for some
But, maybe not for grumpy old men!
20
Patriotic memory bracelet
21
Tagging and the folk
2004: tagging takes off with the release of two
folksonomic tagging sites
  • flickr (users tag photos)
  • del.icio.us (users tag links to web pages)
Folksonomy
A folksonomy is a set of uncontrolled tags
provided by individuals for their own retrieval
purposes of that object and these tags are shared
publicly.
Thomas Vander Wal
http://www.vanderwal.net/index.html
22
Folksonomy and tagging
  • Folksonomy is an open-ended labelling system
    that allows users to categorise online content.
  • Users provide descriptive keywords or tags,
    which use familiar, shared vocabularies.

Folksonomy is the sharing of tags provided by
different users.
Assumption: If enough people tag an object,
interesting and useful patterns will emerge.
23
Da Vinci search with del.icio.us
Results relate to Da Vinci the painter and the Da
Vinci Code book.
Tagging produced interesting and useful
associations.
24
Potential benefits
  • Users offer differing perspectives on how
    resources can be organised and described.
  • Users designate terms that make sense to them.
  • Users provide machine-readable metadata for
    information content.
  • Tagging can enhance search engine information
    retrieval.

Folksonomies can help support emergent
vocabularies and multilingual information
classification and retrieval.
25
Cat chat
Germaine from Switzerland
Tags: cat, cats, chat, chats, canon, switzerland,
jeans, cute, kitty
26
Cat chat
Ella from Poland
Tags: cat, kot, kotek, gato, katz, chat, pet, animal
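
Pooling the two tag sets above gives a small picture of how shared vocabulary might emerge from individual tagging. The sketch below is an assumption about how a site could count tags (one vote per user per tag). The function name `pool_tags` is hypothetical, and "cute" is reconstructed from a tag that was split in the transcript.

```python
from collections import Counter

# Tag sets as transcribed from the two "Cat chat" slides.
germaine = ["cat", "cats", "chat", "chats", "canon",
            "switzerland", "jeans", "cute", "kitty"]
ella = ["cat", "kot", "kotek", "gato", "katz", "chat", "pet", "animal"]


def pool_tags(*tag_sets):
    """Count how many users applied each tag (one vote per user per tag)."""
    counts = Counter()
    for tags in tag_sets:
        counts.update({tag.lower() for tag in tags})
    return counts


pooled = pool_tags(germaine, ella)
shared = sorted(tag for tag, n in pooled.items() if n > 1)

print(shared)       # ['cat', 'chat']
print(len(pooled))  # 15
```

With only two users, the overlap is just the English and French words for the animal. The folksonomy assumption is that with many users such overlaps become strong enough to be useful.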
27
Da Vinci search with flickr
All of the first page results relate to the book,
not the painter.
Interesting, but perhaps not so useful.
28
Potential issues
  • Individual freedom or mob rule?
  • Tag swamp leading to cognitive overload.
  • Multiple words with the same meaning.
  • Tags mean different things to different users.
  • How many users know what tags are?
  • How many users will tag?
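
One possible response to the "multiple words with the same meaning" problem is to normalise tags before counting them. The sketch below is a hedged illustration only: the `SYNONYMS` table and the `normalise` function are hypothetical, and any real site would have to curate its own mapping.

```python
import re

# Hypothetical synonym table mapping variants to one preferred tag.
SYNONYMS = {
    "redback": "redback spider",
    "red back": "redback spider",
    "red back spider": "redback spider",
}


def normalise(tag: str) -> str:
    """Lower-case, turn hyphens and runs of whitespace into single spaces,
    then map known variants to the preferred tag."""
    cleaned = re.sub(r"[\s\-]+", " ", tag.strip().lower())
    return SYNONYMS.get(cleaned, cleaned)


print(normalise("Red-back  Spider"))  # redback spider
print(normalise("Redback"))           # redback spider
print(normalise(" Spider web "))      # spider web
```

Normalising like this trades away some of the individual freedom that makes tagging attractive, which is the tension the first bullet points at.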

29
Looking for answers
Rough and ready survey
July 2006, participants
  • 10 media workers (radio reporters and producers)
  • 10 library workers (librarians, archivists and
    researchers)
  • 10 web workers (producers, designers and
    developers)
  • 10 museum workers (scientists and program
    managers)

Key questions
  • Are they aware of tags and social
    bookmarking?
  • What sort of tags might they use?
  • How likely are they to tag in the future?

30
Who has tagged in the past
How many of the participants have previously
tagged web content?
8 out of 40 participants
How will the participants tag two survey photos?
31
Photo 1 tags
49 different tags used
Most common tags
  • Sydney (27)
  • Sydney Harbour Bridge (25)
  • Sunset (22)
  • Opera House (21)
  • Sydney Harbour (16)
  • Sydney Opera House (15)
  • Harbour (13)
  • Bridge (10)
  • Australia (7)
  • Harbour Bridge (7)
  • Sydney sunset (3)
  • Clouds (3)
  • Silhouette (3)

29 unique tags, including
Australian icon, bridge climb, calm waters, home,
Manly ferry, New South Wales, pylon, sea,
sightseeing, sky, skyline, tourism, yachts,
yellow.
32
Photo 2 tags
67 different tags used
Most common tags
  • Redback / Redback spider (33) (various
    spellings)
  • Spider (27)
  • Spider web (8)
  • Web (8)
  • Prey (6)
  • Australian spider (5)
  • Insect (5)
  • Poisonous spider (5)
  • Predator (4)
  • Arachnid (4)
  • Arachnology (2)
  • Entomology (2)

47 unique tags, including
Australian wildlife, Australian bush, biology,
black widow, dangerous, death, eating, exotic
animal, feeding, green, locust, sheds, Slim
Dusty, spider killing, venom.
33
How many will tag in the future
At the end of the survey each participant was
asked
If in the future you could provide tags for web
content (pages, images) that might be helpful to
you and other users, how often would you do this?
34
How many will tag in the future
n = 40
  • Never: 4
  • Infrequently: 15
  • Sometimes: 10
  • Often: 6
  • Always: 5
Comments include
  • I just want to get the information and get out.
  • I might if it helps other people.
  • Don't have the time.
  • What's in it for me?

35
Issues for discussion
What do you do with large numbers of tags?
  • Do you take all tags?
  • If not, how do you cull the list: machine or
    human?
  • Processing demands

How do you handle wilfully misleading tags?
  • Deliberately disruptive
  • Spammers and scammers
  • Search engine cheaters

Do you allow/encourage idiosyncratic tagging?
36
Tagging idiosyncrasy
Tags: photoshop, Da Vinci code, cod, fish, gag
37
Tag Clouds: Another way to find things
Back to Etsy
Seller tags in a cloud
38
Popular link tags
39
Cloud: number of directory entries
40
Tag cloud mock-up
Back to the rough and ready survey
Two questions
  • What is this? (the tag cloud)
  • Why are some items bigger?

41
Survey Responses 1
n = 40
What is this? (the tag cloud)
Most seem to recognise it as a list or index of
links relating to Sydney.
  • Links to information sites about Sydney (10)
  • List (index) of things to see and do in Sydney
    (9)
  • Information about Sydney / attractions (7)
  • Tag cloud, Tag thing (5)
  • Search keywords / results (4)
  • Keywords for areas of interest (3)
  • Commonly used words as tags (2)

42
Survey Responses 2
n = 40
Why are some items bigger?
Wide range of responses
  • Paid more to be bigger / marketing / sponsored
    links (10)
  • Developer (site owner) thinks they are more
    important (9)
  • Visitations / links (areas) most clicked /
    popular (7)
  • Areas with the most information (6)
  • No idea / don't know (4)
  • Results most relevant to search inquiry (3)
  • How many people use the label (as a tag) (1)

43
Other concerns
Use of tags and Tag clouds for information
retrieval raises some interesting questions
  • Do users know what tag clouds are?
  • How is the weighting of tags determined?
  • Is the weighting open to misinterpretation?
  • Is tagging the tyranny of the majority?
  • When do items which are not in the cloud slip
    from the collective consciousness?
  • Are tag clouds accessible?
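
On the weighting question, one common approach (assumed here, not stated in the talk) is to scale font size linearly with how often a tag is used. The function name `cloud_sizes` and the pixel range are hypothetical, and the sample counts are taken from the Photo 1 survey results.

```python
def cloud_sizes(counts, min_px=12, max_px=36):
    """Map tag counts to font sizes by linear interpolation."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        tag: round(min_px + (n - lo) * (max_px - min_px) / span)
        for tag, n in counts.items()
    }


# Counts from the Photo 1 survey slide.
sample = {"Sydney": 27, "Sunset": 22, "Harbour": 13, "Clouds": 3}
sizes = cloud_sizes(sample)

print(sizes)  # {'Sydney': 36, 'Sunset': 31, 'Harbour': 22, 'Clouds': 12}
```

Other sites may weight by clicks, recency or a logarithmic scale instead, so a reader cannot tell from size alone what is being measured. That matches the survey, where only 1 of 40 participants linked size to how many people use the label.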

44
Where to now?
  • Information Architecture is dead
  • Who knows more about what they want than the
    user?
  • Folksonomy is a mess
  • With mob indexing how will we know where
    anything is or find what we want?

Traditional hierarchies, facets, tags and
folksonomies are all interesting and potentially
useful. It is a question of finding the right
balance.
45
There's more than one way to skin a cat
(with apologies to cat lovers)
  • Web sites are different and their users have
    different needs.
  • For site navigation and information retrieval,
    there is no one size fits all.
  • Find the system that best meets your
    circumstances.

Be cautious when people say they know the one and
only way. Unless, of course, the answer is 42.
46
Thank you for listening.
Roger Hudson, Web Usability