Title: Considering a Faceted Search-based Model
1. Considering a Faceted Search-based Model
- Marti Hearst
- UCB SIMS
- hearst_at_sims.berkeley.edu
- NAS CSTB DNS Meeting on Internet Navigation and the Domain Name System: Technical Alternatives and Policy Implications
- July 12, 2001
2. Outline
- The Klensin proposal
- Synopsis
- Issues
- Recommendations
- UIs and faceted search
3. A Proposal
- A Search-based access model for the DNS
- IETF Internet-Draft by John Klensin
- http://www.ietf.org/internet-drafts/draft-klensin-dns-search-00.txt
- A multi-layer approach to naming
- Faceted descriptions are used to facilitate both flexible naming and inexact search
- This talk
- What does research tell us about the search issues?
4. Klensin's proposal
Layer 4: Free-text Search (unregulated)
Layer 3: Faceted System (detailed, unregulated)
Layer 2: Faceted Classification System (simple, regulated)
Layer 1: DNS (unchanged)
5. Layer 2
Language: Spanish
Name: Jose's Pizza
Industry Category: Restaurant
Geolocation: Miami
Network Location
6. Layer 2
Inputs: search values for one or more facets
Outputs: appropriate DNS names and all tuples with matched facets
Allow for partial (fuzzy) match
Faceted System (simple, regulated)
Jose's Pizza, Miami
Alberto's Pizza, Miami
Jose's Bistro, Miami
Jose's Pizza, Saratoga
Joe's Pizza, Miami
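The Layer 2 lookup described on this slide can be sketched roughly as follows. This is only an illustration, not part of the Klensin draft: the registry contents reuse the five example tuples above, while the DNS names (under the reserved `.example` TLD), the function names, the use of Python's `difflib` for fuzzy matching, and the 0.8 threshold are all assumptions made for the sketch.

```python
from difflib import SequenceMatcher

# Hypothetical Layer 2 registry: facet tuple -> DNS name.
# The DNS names are invented placeholders, not real registrations.
REGISTRY = [
    ({"name": "Jose's Pizza", "geolocation": "Miami"}, "joses-pizza-miami.example"),
    ({"name": "Alberto's Pizza", "geolocation": "Miami"}, "albertos-pizza-miami.example"),
    ({"name": "Jose's Bistro", "geolocation": "Miami"}, "joses-bistro-miami.example"),
    ({"name": "Jose's Pizza", "geolocation": "Saratoga"}, "joses-pizza-saratoga.example"),
    ({"name": "Joe's Pizza", "geolocation": "Miami"}, "joes-pizza-miami.example"),
]


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(query, registry=REGISTRY, threshold=0.8):
    """Return (score, dns_name, facets) for every tuple whose queried
    facets all match at least `threshold`, best match first."""
    hits = []
    for facets, dns_name in registry:
        scores = [similarity(value, facets.get(facet, ""))
                  for facet, value in query.items()]
        if scores and min(scores) >= threshold:
            hits.append((min(scores), dns_name, facets))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


for score, dns_name, facets in search({"name": "Joses Pizza", "geolocation": "Miami"}):
    print(f"{score:.2f}  {dns_name}  {facets}")
```

With the assumed threshold, the misspelled query "Joses Pizza" in Miami matches both Jose's Pizza and Joe's Pizza, so a user would still have to confirm which one was meant.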
7. Layer 2 Selling Points
- Allows sharing of name space among different (commercial) entities
- Allows specification according to meaningful attributes
8. Layer 2 DNS Issues
- How to guarantee uniqueness?
- How to determine appropriate descriptors?
- How to use in a hyperlink?
- Requires a user interface for confirmation of correct choice
9. Layer 2 Descriptor Issues
- Emphasis on geolocation may be problematic
- May be too spare
- SFMOMA
- SFMOMA exhibits
- SFMOMA exhibit on digital art called 101010
10. Layer 3
Not centrally coordinated (provided by commercial services)
More detailed facets
Allow for inheritance
Context-sensitive (e.g., restaurant has menu attribute; auto repair has services, etc.)
Inputs: service-dependent
Outputs: layer 2 names
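One way to picture "inheritance" and "context-sensitive" facets is a small category schema in which each category inherits its parent's facets and adds its own. The sketch below is an assumption for illustration only: the schema layout, the category names beyond "restaurant" and "auto repair", and the function `facets_for` are hypothetical and do not come from the proposal.

```python
# Hypothetical Layer 3 schema: each category lists only the facets it adds;
# everything else is inherited from its parent.
SCHEMA = {
    "business": {"parent": None, "facets": ["name", "language", "geolocation"]},
    "restaurant": {"parent": "business", "facets": ["menu"]},
    "auto repair": {"parent": "business", "facets": ["services"]},
}


def facets_for(category, schema=SCHEMA):
    """Collect facets from the root category down to `category`."""
    chain = []
    while category is not None:
        node = schema[category]
        chain.append(node["facets"])
        category = node["parent"]
    # Walk the chain root-first so inherited facets come before specific ones.
    return [facet for facets in reversed(chain) for facet in facets]


print(facets_for("restaurant"))   # inherited facets plus "menu"
print(facets_for("auto repair"))  # inherited facets plus "services"
```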
11. Layer 4
Free-text Search (unregulated)
Use standard search to find sites that discuss topics that relate to the query (as web search works today)
12. Relation to Web Search
- Web search is perceived to work better today than two years ago. Why?
- Finds appropriate starting points
- Also known as source selection
- Search for "toyota" no longer returns Tony's Toyota pages as the top-ranked hit
- Before the web, source selection was a separate operation from free text search
- Also, queries tended to be longer
- Web search engines could do this exclusively but they do other things as well.
13. Recommendations on Klensin Proposal
- A promising, intriguing approach
- One tweak
- Combine layers 2 and 3
- Have a partly regulated portion, and an open portion
- This however is susceptible to spamming
- Not clear if this should be regulated
14. General Pitfalls of Controlled Vocabularies
- Difficult to get agreement on the set of labels
- Difficult to assign labels consistently
- Granularity
- Salience / Emphasis
- Context
- Connotations
- New labels always appearing; old ones shift in meaning
- Lay people won't know the system
15. How to do it wrong: Force into a Hierarchy
Let's try to find UCB
16. How to do it wrong
17. How to do it wrong
18. What is the problem?
- Two deeply hierarchical facets
- Region
- Education
- Forced in convoluted ways into one hierarchy with irregular cross links
19. Two Approaches
- Statistical approaches map words into metadata terms
- Create flexible user interfaces that progressively reveal appropriate subparts of the system
- (How to do so is a topic of our research.)
20. The Practice
- Using descriptors under the hood
- The limited empirical work indicates
- Combining free text and descriptors works best
- Some e-commerce sites do this for finding products
- Can sometimes match queries to standard information needs
- buy palm
- review crouching tiger
- berkeley gap
21. walmart.com: Uses metadata under the hood
22. The Promise
- Using descriptors in the User Interface
- Use faceted metadata for navigation
- Query Previews
- Tailored Search Forms
- Tightly Combine Navigation and Search
23. Facets
- Orthogonal sets of descriptors
- Gets complicated when they are hierarchical
- Example: recipes
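To make "orthogonal but hierarchical" concrete, here is a minimal sketch in which each facet value is a path from general to specific, and selecting a node matches everything beneath it. The recipe records, facet names, and helper functions are invented for illustration and are not taken from any of the sites discussed in this talk.

```python
# Invented recipe records; each facet value is a path from general to specific.
RECIPES = [
    {"title": "Lemon tart",
     "ingredient": [("fruit", "citrus", "lemon")],
     "course": [("dessert",)]},
    {"title": "Apple pie",
     "ingredient": [("fruit", "apple")],
     "course": [("dessert",)]},
    {"title": "Lemon chicken",
     "ingredient": [("fruit", "citrus", "lemon"), ("meat", "poultry", "chicken")],
     "course": [("main",)]},
]


def falls_under(path, prefix):
    """True if `path` equals `prefix` or lies below it in the hierarchy."""
    return path[:len(prefix)] == prefix


def select(recipes, selection):
    """Keep recipes that satisfy every selected facet; facets are independent,
    so they can be combined in any order."""
    return [
        recipe["title"] for recipe in recipes
        if all(any(falls_under(path, prefix) for path in recipe.get(facet, []))
               for facet, prefix in selection.items())
    ]


print(select(RECIPES, {"ingredient": ("fruit", "citrus")}))
print(select(RECIPES, {"ingredient": ("fruit",), "course": ("dessert",)}))
```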
24. Metadata Facets
Advantage: Great for Mixing and Matching
25. Faceted Recipe Metadata
26. Sunset.com: Not the right way
27. Dynamic Previews
- Avoid empty result sets
- Show the possible next steps
- A way to seamlessly integrate
- Related topics
- User preferences (personalization)
- Context-sensitivity
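A dynamic preview can be approximated by counting, for the current result set, how many items carry each value of every facet not yet selected. Only values with a nonzero count are offered as next steps, so no choice leads to an empty result. The sketch below assumes flat, multi-valued facets; the item data and the function names `matching` and `previews` are hypothetical.

```python
from collections import Counter

# Invented items; each facet maps to the set of values assigned to the item.
ITEMS = [
    {"course": {"dessert"}, "season": {"summer"}, "ingredient": {"lemon"}},
    {"course": {"dessert"}, "season": {"winter"}, "ingredient": {"apple"}},
    {"course": {"main"}, "season": {"summer"}, "ingredient": {"lemon", "chicken"}},
]


def matching(items, selection):
    """Items that carry every selected facet value."""
    return [item for item in items
            if all(value in item.get(facet, set())
                   for facet, value in selection.items())]


def previews(items, selection):
    """For each facet not yet selected, count the current results per value.
    Values that would give an empty result never appear in the counts."""
    counts = {}
    for item in matching(items, selection):
        for facet, values in item.items():
            if facet not in selection:
                counts.setdefault(facet, Counter()).update(values)
    return counts


for facet, counter in previews(ITEMS, {"course": "dessert"}).items():
    print(facet, dict(counter))
```

After choosing course = dessert, "chicken" is not offered as an ingredient because no dessert in this toy data uses it.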
32. Metadata Usage in Epicurious
- Can choose category types in any order
- But categories never more than one level deep
- And can never use more than one instance of a category
- Even though items may be assigned more than one of each category type
- Items (recipes) are dead-ends
- Don't link to "more like this"
- Not fully integrated with search
33. Epicurious Metadata Usage
- Problem: lacks integration with search
34. This is fixed in marthastewart.com
35. Advanced search
More specific than sunset.com; also allows for disjunction, thus less likely to get null results
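The effect of disjunction on null results can be shown with a small sketch: values within one facet are combined with OR, while different facets are combined with AND. The items and the `search` function are invented for illustration and do not describe how any of the sites mentioned here are implemented.

```python
# Invented items; each facet maps to the set of values assigned to the item.
ITEMS = [
    {"title": "Lemon tart", "course": {"dessert"}, "season": {"summer"}},
    {"title": "Apple pie", "course": {"dessert"}, "season": {"winter"}},
    {"title": "Grilled fish", "course": {"main"}, "season": {"summer"}},
]


def search(items, selection):
    """OR within a facet, AND across facets.
    `selection` maps each facet to the set of acceptable values."""
    return [item["title"] for item in items
            if all(item.get(facet, set()) & accepted
                   for facet, accepted in selection.items())]


# A single-valued query can come back empty...
print(search(ITEMS, {"course": {"main"}, "season": {"winter"}}))
# ...while widening one facet with a disjunction recovers a result.
print(search(ITEMS, {"course": {"main", "dessert"}, "season": {"winter"}}))
```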
36. UIs for faceted metadata
- Use dynamic previews
- Allow user to select metadata in any order
- At each step, show different types of relevant metadata,
- based on prior steps and personal history,
- include number of documents
- Previews restricted to only those metadata types that might be helpful
- Tightly integrate with keyword search
37. The Flamenco Research Project
- Systematically determine what works for integrating metadata into search interfaces
- Develop recommendations that reflect both the task structure and the richness of the information structure
- http://bailando.sims.berkeley.edu/flamenco.html
38. Summary
- Agreement on metadata descriptors and their assignment is difficult to achieve
- Descriptors need to be constantly updated
- Layer 2 is probably not rich enough
- Assigning specifiers is quite different from searching for specified items
- Fuzzy search can help, but
- Requires a UI for confirmation of correct choices
- This will end up looking like a search service
- Can make search more meaningful and task-based
39. Summary
- Web search engines can do source selection, but
- Sometimes users do want source selection,
- But often search hits based on the content of pages are closer to what users want to do
- We need to be certain not to confuse source selection with content search