About - PowerPoint PPT Presentation

Slides: 21
Provided by: viv55

Transcript and Presenter's Notes

2
About
  • Vivisimo Inc. is an enterprise software company
  • Creates innovative software to access and cluster
    the world's information, for better search and
    discovery
  • Founded June 2000 in Pittsburgh; 20 employees,
    sustained profitability
  • Organic growth, no venture capital funding
  • About 80 customers: Cisco, JNJ, NSA, JAMA,
    Micropatent, AOL, AAAS, etc.
  • Clusty.com, acclaimed web search engine
    • Launched on Sept 30 of last year
  • Raul Valdes-Perez, CEO & co-founder
    • Last stop: Carnegie Mellon Computer Science Dept
      (1986-2000)

3
Problem: Information Overload & Overlook
  • Information Overload and Information Overlook
    • Look for information → get too much back
    • Most people handle overload by overlooking most
      information
  • How to Get People to Overlook Less Information?
    • Provide categorized information!

4
Categorization at Creation Time
  • Also Known as Taxonomy Building
    • Create a controlled vocabulary (taxonomy) of
      categories
    • Index new documents into the taxonomy
    • Update the taxonomy over time
    • At search time, group results into the most
      frequent categories
  • Examples
    • National Library of Medicine's Medical Subject
      Headings (MeSH)
      • Developed over decades
      • Human indexers assign MeSH terms to new articles
      • Ten people refine MeSH over time; recently added
        SARS
    • Yahoo directory, Open Directory
  • Why Not Widespread in Electronic World?
    • Taxonomy model is costly & complicated
    • End-users need to meta-search, which undermines
      taxonomies
    • Much of the electronic world is still in "stack of
      books on the floor" mode
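
The creation-time workflow on this slide (define a controlled vocabulary, index documents into it, then group search results by their most frequent categories) can be sketched roughly as follows. This is a minimal illustration only: the class name `TaxonomyIndex`, its methods, and the category labels in the usage note are assumptions made for this example and do not come from MeSH or any real indexing system.

```python
from collections import Counter


class TaxonomyIndex:
    """Toy creation-time categorizer (hypothetical, for illustration only)."""

    def __init__(self, categories):
        # The controlled vocabulary: only these labels may be assigned.
        self.categories = set(categories)
        self.doc_categories = {}

    def add_category(self, category):
        # Updating the taxonomy over time, e.g. when a new topic appears.
        self.categories.add(category)

    def index_document(self, doc_id, categories):
        # An indexer assigns taxonomy terms to a new document.
        unknown = set(categories) - self.categories
        if unknown:
            raise ValueError(f"not in taxonomy: {sorted(unknown)}")
        self.doc_categories[doc_id] = set(categories)

    def group_results(self, result_ids, top_k=2):
        # At search time, count how often each category occurs among the
        # results and keep the top_k most frequent categories as groups.
        counts = Counter(
            cat
            for doc_id in result_ids
            for cat in self.doc_categories.get(doc_id, ())
        )
        return {
            cat: [d for d in result_ids if cat in self.doc_categories.get(d, ())]
            for cat, _ in counts.most_common(top_k)
        }
```

A caller would build the index once, for example `TaxonomyIndex({"Infectious Disease", "Cardiology"})`, call `index_document` as each article arrives, and call `group_results` on the hit list of every search. The expensive part is the human indexing step, which this sketch reduces to a single method call.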

9
New Way: Clustering
  • To cluster means to form groups
    • Stars, candy, disease outbreaks, computers
  • Quick linguistic/statistical analysis of search
    results
    • Forms groups based on the major themes in the top
      N results
    • Can leverage taxonomies, as shown at
      http://Clustermed.info
  • Software based on 6 years of research &
    development funded by the National Science Foundation
    • Builds on many research-years by others in
      academia & industry
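
To make "forms groups based on the major themes in the top N results" concrete, here is a deliberately simple sketch. It is not Vivisimo's method, which the slides do not describe; it substitutes a plain document-frequency heuristic: terms that recur across the top N snippets (excluding the query itself) become group labels. The function names, the stopword list, and the parameters are all assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "for", "to", "on", "is", "with"}


def terms(text):
    """Lowercased word set of a text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def cluster_results(query, snippets, top_n=50, max_clusters=3, min_size=2):
    """Group the top_n snippets under their most widely shared terms.

    Runs on the fly over search results: no taxonomy and no
    pre-processing of the underlying collection is needed.
    """
    top = snippets[:top_n]
    query_terms = terms(query)
    # Query words appear in nearly every hit, so they make poor labels.
    doc_terms = [terms(s) - query_terms for s in top]
    # Document frequency: in how many snippets does each term occur?
    df = Counter(t for ts in doc_terms for t in ts)
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [t for t, n in ranked if n >= min_size][:max_clusters]
    # A snippet may fall into several groups; values are snippet positions.
    return {
        label: [i for i, ts in enumerate(doc_terms) if label in ts]
        for label in labels
    }
```

Calling `cluster_results("jaguar", snippets)` on a mixed result list would tend to separate car-related hits from animal-related ones, because each sense brings its own recurring vocabulary. A production system would need phrase detection, stemming, and better label selection, none of which is attempted here.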

10
End User Benefits of Clustering
User Benefits
  1. See what's available: At a glance, the folders
     show you an information landscape.
  2. See much further: Following our interests, we
     navigate to low-ranked but interesting search
     results, which we're unlikely ever to see otherwise.
  3. See similar information together: We don't have
     to be satisfied with the first reference we come
     across. We can compare several and pick the best
     one.
Key Advantages
  1. Works on the fly: no need for pre-processing
  2. Spontaneous categories: no need to pre-define them
11
Question: Clustering, Query Refinement,
Personalization, or Entity Extraction?
  • Query Refinement: show alternative queries to
    the user
    • Pro: little computation; doesn't need search
      results as input
    • Con
      • History-based, not matched to search results;
        relevance is a challenge
      • Click on a query, your screen disappears, and you
        must absorb a new context
  • Personalization
    • Pro: it's about you!
    • Con
      • People's interests aren't static (Olympics,
        Oscars, tsunamis, etc.)
      • Shared computers lead to shared personalization
  • Entity Extraction
    • Pro: nouns are informative and familiar
    • Con
      • Verbs matter (and adjectives, adverbs, etc.)
      • Search-result descriptions are ungrammatical


13
Question: Index Everything or Meta-Search?
  • Index Everything
    • Pros: centralization, universal ranking,
      simplicity
    • Cons: centralization, universal ranking,
      complexity
  • Meta-Search
    • Pros
      • Decentralization; can leverage partners,
        secondary content, free/government search engines
      • Ranking methods can leverage voting
      • Publishers & aggregators can create vertical
        destination sites (vortals)
    • Cons
      • Need to create relationships with partners
      • Need meta-search software
      • Don't have a single ranking score for everything
  • Is Meta-Search a Practical Necessity?
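
The slide says only that meta-search ranking "can leverage voting" and does not name a scheme. One common way to vote across engines that lack a shared relevance score is a Borda count, sketched below as an assumption rather than as the method used by any particular product. The function name and the engine names in the usage note are hypothetical.

```python
from collections import defaultdict


def merge_by_voting(rankings):
    """Merge per-engine ranked lists with a Borda count.

    rankings maps an engine name to its ordered list of result URLs.
    Each engine awards len(list) points to its first result, one fewer
    to the next, and so on; results an engine did not return get
    nothing from it. Ties are broken alphabetically for determinism.
    """
    scores = defaultdict(int)
    for ranked in rankings.values():
        n = len(ranked)
        for position, url in enumerate(ranked):
            scores[url] += n - position
    return sorted(scores, key=lambda u: (-scores[u], u))
```

For example, `merge_by_voting({"engine_a": [...], "engine_b": [...]})` returns one combined list in which results endorsed by several sources rise to the top. Only rank positions are used, which is why the approach works even though the engines "don't have a single ranking score for everything".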

15
Some Early Adopters of Clustering
16
Old Way of Thinking
  • If a search returns many hits, then you're a
    novice
  • Most patrons/users are novice searchers!
  • New way
    • Purposely start out with broad searches
    • Learn something by viewing the results
    • Use what you learned to search again
    • Find what you're looking for
    • ...or discover what you'd normally miss

17
Clustering Will Go Mainstream in 2005-06
  • Why? Critical Mass of Installations and Buzz
    • AOL & Clusty reach about 10% of web searches
  • Unbeatable Value Proposition
    • Instant organized information without taxonomies
    • Can leverage taxonomies where they exist
    • No surprises
  • Real End-User Benefits
    • Users can consider lots more info with the same
      effort
    • Makes people smarter!
  • Transform the online world of information
    • Less like a 1-person used bookstore with piles of
      books
    • More like Barnes & Noble or Borders
