EDB Overview - PowerPoint PPT Presentation



1
Finding Stuff: LSI and Database Searching
A Business Use Case
Joe Tragert, EBSCO Publishing
Bentley, June 26, 2006
2
Overview
  • EBSCO Publishing overview
  • Latent Semantic Indexing: pros and cons
  • Integrating diverse content types: the Executive
    Daily Brief use case
  • Discovering obfuscated records: the USPTO
    example

3
EBSCO Industries
  • Ranked 162nd among Forbes' America's Largest
    Private Companies in 2005

4
EBSCO Publishing
  • Research reference solutions
    • Corporate
    • Medical
    • Academic
    • Public Library
    • K-12
  • 73 terabytes of content, configured into over 100
    different proprietary full-text databases
  • Redistribute 100 3rd-party reference products
  • Founded in 1987, 550 employees worldwide, HQ in
    Ipswich, MA

5
Latent Semantic Indexing
  • Searching is focused on the words, not indices or
    metadata.
  • The engine can be trained to optimize results
    by domain (engineering, medical, general
    business, etc.)
  • Engine creates a vector space based upon the data
    it sees. All articles are placed within that
    vector space.
  • Updates are quickly assigned values within the
    vector space, enabling real-time integration of
    RSS feeds.
  • Multiple data sources are integrated rapidly,
    requiring a few hours to a few days.
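
The vector-space idea above can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not the engine described in the talk: the four-document corpus is invented, the function names are hypothetical, and a plain-Python power iteration stands in for the optimized SVD routine a real system would use. It builds a term-document matrix, keeps the top two singular directions as the "concept" axes, and places new text in that existing space by projection instead of rebuilding the space.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration (not data from the presentation).
DOCS = [
    "motorcycle engine wheel",
    "saddle vehicle engine wheel",   # shares context with doc 0, lacks its key word
    "knee prosthesis implant",
    "knee joint implant",
]

def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw counts."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w, count in Counter(d.lower().split()).items():
            matrix[index[w]][j] = float(count)
    return matrix, index

def top_singular_triplets(a, k, iters=500):
    """Top-k (sigma, u, v) of `a` by power iteration with deflation on a^T a."""
    m, n = len(a), len(a[0])
    c = [[sum(a[t][i] * a[t][j] for t in range(m)) for j in range(n)]
         for i in range(n)]
    triplets = []
    for _ in range(k):
        v = [1.0 / math.sqrt(n)] * n      # fixed start keeps the sketch deterministic
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(lam)
        u = [sum(a[t][j] * v[j] for j in range(n)) / sigma for t in range(m)]
        triplets.append((sigma, u, v))
        for i in range(n):                # deflate: remove the direction just found
            for j in range(n):
                c[i][j] -= lam * v[i] * v[j]
    return triplets

MATRIX, INDEX = term_document_matrix(DOCS)
TRIPLETS = top_singular_triplets(MATRIX, k=2)

# Document coordinates in the concept space: sigma * v[j] on each axis.
DOC_VECTORS = [[s * v[j] for s, _, v in TRIPLETS] for j in range(len(DOCS))]

def fold_in(text):
    """Place new text in the existing space by projecting onto the term axes."""
    counts = Counter(w for w in text.lower().split() if w in INDEX)
    return [sum(n * u[INDEX[w]] for w, n in counts.items()) for _, u, _ in TRIPLETS]

def cosine(x, y):
    dot = sum(p * q for p, q in zip(x, y))
    nx, ny = math.sqrt(sum(p * p for p in x)), math.sqrt(sum(q * q for q in y))
    return dot / (nx * ny) if nx and ny else 0.0

def search(text):
    """Cosine similarity of the folded-in query against every document."""
    query = fold_in(text)
    return [cosine(query, d) for d in DOC_VECTORS]

if __name__ == "__main__":
    for doc, score in zip(DOCS, search("motorcycle")):
        print(f"{score:+.2f}  {doc}")
```

In this toy space the query "motorcycle" scores highly against the second document even though that document never uses the word, because both load on the same concept axis. Words the space has never seen are simply ignored when folding in, which is one reason a production system would periodically rebuild the space.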

6
LSI Advantages
  • Conceptual search: concepts are matched, not key
    words
    • Easier to create searches by using chunks of
      text as search terms
    • No need to understand thesauri or Boolean
      operators
  • Integrated content: databases, blogs, RSS, etc.
    • Multiple databases can be searched at once
      (similar to federated search, but different)
    • Since the words are searched, no need to
      normalize indices or record structures of
      source data sets
  • Real-time content
    • The engine can rapidly assign new content to
      the existing vector space, enabling
      integration of current content with archival
      material
  • Language agnostic
    • Since all content is converted to values in the
      vector space, multiple languages can be
      searched and returned in a single result list

7
LSI Disadvantages
  • Precision: matching concepts does not lead to the
    one perfect article
  • Multiple content types in one result set require
    robust filtering and refining functionality to
    minimize confusion
  • Default date-order sorting can overwhelm a
    result list
  • Multilingual search is seductive, but it requires
    a quality translation feature to get the best
    utility from the results
  • Can be difficult for the Google generation to
    grasp the concept of concepts

8
Why Use LSI?
  • Structured data: users tend not to care about
    metadata
  • Currency is king: users tend to focus on
    real-time content (news sites, blogs), but
    periodicals can provide real value
  • Skills: not everyone is a librarian; actually,
    most aren't
  • Tools: slow to learn, slower to change
  • Perspective: impatient with complexity

9
LSI Use Case
  • Customizable monitoring and alert service
  • Supports non-librarian corporate uses: brand
    management, corporate intelligence, general
    counsel, IP management, etc.
  • Two types of search
    • Content Analyst LLC's patented Concept Search
    • EBSCO's keyword search
  • Multiple content types
    • Premium business content (EBSCO structured
      content)
    • Newspapers
    • RSS feeds (blogs, news sites)
    • Licensed databases (USPTO, INSPEC, etc.)
    • Intranet repositories

10
Multiple Content Types and Search Methods
  1. Users can set up folders, and monitor for content
    related conceptually (same meaning, but different
    words) to key words or article examples already
    in the folders
  2. Users can search for immediate results that are
    related to words, articles, emails or external
    documents, using Concept Search or Key Word
    Search
  3. Users can link to advanced key word search
    options, thesauri, and visual searching

11
Folders Are Determined by End Users
  • Users can add, delete or edit alerts (folders)
    as needed
  • Users put words, phrases, paragraphs, full
    articles, emails, MS Word docs, etc. into the
    folders.
  • EDB adds matches to the folders
  • Results for a folder appear when the folder is
    selected
  • Users can easily make a result into a concept
    (example) and put it into a folder
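
The folder workflow above can be sketched as a small matching loop. This is a hedged illustration only: `Folder`, `offer`, `promote`, and the 0.3 threshold are hypothetical names and values, and a plain bag-of-words cosine stands in for the Concept Search engine, which is proprietary and not described in detail here.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

def vectorize(text):
    """Bag-of-words counts; a stand-in for a real concept vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class Folder:
    name: str
    threshold: float = 0.3                         # assumed cut-off, not from the talk
    examples: list = field(default_factory=list)   # words, paragraphs, full articles
    matches: list = field(default_factory=list)    # (score, document) pairs

    def add_example(self, text):
        self.examples.append(text)

    def score(self, document):
        """Best similarity between the document and any example in the folder."""
        doc_vec = vectorize(document)
        return max((cosine(vectorize(e), doc_vec) for e in self.examples),
                   default=0.0)

    def offer(self, document):
        """Add the document to the folder's matches if it is close enough."""
        s = self.score(document)
        if s >= self.threshold:
            self.matches.append((s, document))
            return True
        return False

    def promote(self, document):
        """Turn a result into a new example to be monitored."""
        self.examples.append(document)

folder = Folder("knee implants")
folder.add_example("prosthetic knee joint implant")
matched = folder.offer("new knee implant system announced")
ignored = folder.offer("quarterly earnings beat forecasts")
```

Promoting a matched result with `folder.promote(...)` widens what the folder will catch next time, which mirrors the "make a result into a concept" step on the slide.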

12
Structured Content in Familiar Layout
  • The full text is viewed in a pop-up window
  • The user will link to the source (the article on
    EBSCOhost, news site, the RSS feed provider,
    licensed database or intranet file)
  • Users can email, save, print the document, or add
    it to their folder as a new example to be
    monitored

13
Linking to RSS Providers Simplifies Access
  • Selected RSS articles are viewed in a pop-up
    window
  • The user links to the source

14
Results Are Refined, Interactively
  • Users can sort results by Date, Title,
    Publication and Relevance
  • Users can narrow results by Publication or
    Content Type
  • Users can delete previously read content, content
    of a specific relevance, or content published
    before a specific date
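
A minimal sketch of these refinements, assuming a simple in-memory result record. The `Result` fields, the function names, and the sample rows are all made up for illustration, and "content of a specific relevance" is read here as "content below a relevance floor", which is an assumption.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Result:
    title: str
    publication: str
    content_type: str
    published: date
    relevance: float
    read: bool = False

def sort_results(results, by):
    """Sort by 'published', 'title', 'publication' or 'relevance'."""
    newest_or_best_first = by in ("published", "relevance")
    return sorted(results, key=lambda r: getattr(r, by),
                  reverse=newest_or_best_first)

def narrow(results, publication=None, content_type=None):
    """Keep only results from one publication and/or one content type."""
    return [r for r in results
            if (publication is None or r.publication == publication)
            and (content_type is None or r.content_type == content_type)]

def prune(results, drop_read=False, below_relevance=None, before=None):
    """Delete read items, low-relevance items, or items older than a date."""
    kept = []
    for r in results:
        if drop_read and r.read:
            continue
        if below_relevance is not None and r.relevance < below_relevance:
            continue
        if before is not None and r.published < before:
            continue
        kept.append(r)
    return kept

# Invented sample rows, for illustration only.
RESULTS = [
    Result("Knee implant market grows", "Example Business Weekly",
           "periodical", date(2006, 6, 1), 0.82),
    Result("Saddle-type vehicle patent filed", "Example Patent Feed",
           "patent", date(2006, 5, 12), 0.91, read=True),
    Result("Brand chatter roundup", "Example Blog",
           "rss", date(2006, 6, 20), 0.40),
]

by_relevance = sort_results(RESULTS, "relevance")
unread_and_relevant = prune(RESULTS, drop_read=True, below_relevance=0.5)
```

Keeping sorting, narrowing, and pruning as separate functions lets them be chained in any order, which matches the interactive refinement the slide describes.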

15
Alerts Controlled by End User
  • Users can set up email lists (groups and
    individuals) to automatically forward documents
  • Users can set a higher relevancy threshold for
    shared documents than for their own inbox (only
    send the best articles to colleagues)
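
The two-threshold idea can be sketched as a small routing rule. The class name, the threshold values, and the addresses below are hypothetical; the sketch only shows that a document must clear a higher bar to be forwarded than to reach the user's own inbox.

```python
from dataclasses import dataclass, field

@dataclass
class AlertRule:
    inbox_threshold: float = 0.5      # assumed value for the user's own inbox
    share_threshold: float = 0.8      # assumed higher bar for forwarding
    recipients: list = field(default_factory=list)

    def route(self, title, relevance):
        """Return (destination, title) pairs for one scored document."""
        deliveries = []
        if relevance >= self.inbox_threshold:
            deliveries.append(("inbox", title))
        if relevance >= self.share_threshold:
            deliveries.extend((address, title) for address in self.recipients)
        return deliveries

rule = AlertRule(recipients=["team@example.com", "counsel@example.com"])
inbox_only = rule.route("Moderately relevant article", 0.6)
shared = rule.route("Highly relevant article", 0.9)
dropped = rule.route("Barely relevant article", 0.2)
```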

16
LSI Use Case
  • Find deliberately obscured patents
  • Compare prior art to current research
  • Monitor pending patents
  • Search patents in native languages
    • USPTO
    • European Patent Organization
    • Japan Patent Office
  • Expose patent search to more staff
    • Bench scientists
    • Competitive intelligence
    • Risk managers

17
Sneak Peek: EBSCO Patent Monitor
  • In development: Fall 2006 release
  • Use Concept Searching to identify conceptually
    related patents
  • Enable cross-database searching
    • Patents (various sources)
    • Published STM literature
    • Proprietary research intranets

18
Searching on "motorcycle" finds patents that do
not include the term "motorcycle"
19
Patent 6,085,857 does not contain the word
"motorcycle", but it sure looks like one
aka motorcycle
20
Running a concept search on the patent abstract
creates an Instant Context list
These terms are found in the USPTO database and
relate to saddle-type riding vehicles. Users
can search the USPTO database to find those
patents, or they can research the individuals to
see who else is an expert.
21
The terms and names on the Instant Context list
can indicate the true nature of the patent
Shinobu Tsutsumikoshi is a developer at Suzuki...
22
Search using a press release on the new Maxim Knee
System and get hundreds of related patents.
23
US Patent 6,090,144 is about prosthetic knees
even though the Maxim press release never used
the term "prosthesis"
24
Finding Stuff: The Dead Mouse Test
  • LSI, key words, proximity, etc.
  • The real question is not which mousetrap works
    better
  • Just: did we kill the mouse?

25
Thank You
Joe Tragert, Director, Market Development, EBSCO
Publishing
O: 800-653-2726 ext. 661
E: jtragert_at_epnet.com