Title: EDB Overview
1. Finding Stuff: LSI and Database Searching, a Business Use Case
Joe Tragert, EBSCO Publishing
Bentley, June 26, 2006
2. Overview
- EBSCO Publishing overview
- Latent Semantic Indexing pros and cons
- Integrating diverse content types: the Executive Daily Brief use case
- Discovering obfuscated records: the US PTO example
3. EBSCO Industries
- Ranked 162 in Forbes' America's Largest Private Companies in 2005
4. EBSCO Publishing
- Research reference solutions
  - Corporate
  - Medical
  - Academic
  - Public Library
  - K-12
- 73 terabytes of content, configured into over 100 different proprietary full-text databases
- Redistributes 100 third-party reference products
- Founded in 1987, 550 employees worldwide, HQ in Ipswich, MA
5. Latent Semantic Indexing
- Searching is focused on the words, not indices or metadata.
- The engine can be trained to optimize results by domain (engineering, medical, general business, etc.).
- The engine creates a vector space based on the data it sees. All articles are placed within that vector space.
- Updates are quickly assigned values within the vector space, enabling real-time integration of RSS feeds.
- Multiple data sources are integrated rapidly, requiring a few hours to a few days.
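The vector-space idea above can be sketched in a few lines. This is a toy illustration only, not Content Analyst's engine: the corpus is invented, every function name (`term_vectors`, `concept_axes`, `place`, and so on) is hypothetical, and a simple power iteration stands in for the singular value decomposition a production LSI system would use. It shows two things from the slide: a query can match an article that shares none of its words, and a new item can be placed in the existing space without retraining.

```python
import math
import random

def tokenize(text):
    return text.lower().split()

def count_vector(text, index):
    vec = [0.0] * len(index)
    for w in tokenize(text):
        if w in index:              # words unseen at training time are ignored
            vec[index[w]] += 1.0
    return vec

def term_vectors(docs):
    """Return (vocabulary index, one raw term-count vector per document)."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    return index, [count_vector(d, index) for d in docs]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def concept_axes(doc_vecs, k, iters=200, seed=0):
    """Top-k left singular vectors of the term-document matrix, found by
    power iteration with deflation (adequate for toy data only)."""
    rng = random.Random(seed)
    m = len(doc_vecs[0])
    axes = []
    for _ in range(k):
        v = [rng.random() for _ in range(m)]
        for _ in range(iters):
            # w = (A A^T) v, computed as the sum over documents of (d . v) d
            w = [0.0] * m
            for d in doc_vecs:
                s = dot(d, v)
                for i in range(m):
                    w[i] += s * d[i]
            for a in axes:          # remove directions already found
                s = dot(w, a)
                w = [wi - s * ai for wi, ai in zip(w, a)]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [wi / norm for wi in w]
        axes.append(v)
    return axes

def place(vec, axes):
    """Coordinates of a term-count vector in the reduced concept space."""
    return [dot(vec, a) for a in axes]

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

# Invented five-article corpus: three about vehicles, two about gardening.
docs = [
    "car engine wheel",
    "automobile engine wheel",
    "car automobile engine",
    "flower petal garden",
    "garden flower seed",
]
index, doc_vecs = term_vectors(docs)
axes = concept_axes(doc_vecs, k=2)
placed = [place(v, axes) for v in doc_vecs]

# The query "car" scores high against docs[1], which never uses the word "car".
query = place(count_vector("car", index), axes)
scores = [cosine(query, p) for p in placed]

# A late-arriving item is placed using the existing axes, with no retraining.
new_item = place(count_vector("automobile wheel", index), axes)
```

In this toy space the vehicle articles all land on one axis and the gardening articles on the other, so `scores[1]` comes out near 1 and `scores[3]` near 0. Real collections need weighting, far more dimensions, and a proper SVD routine.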
6. LSI Advantages
- Conceptual search: concepts are matched, not key words
  - Easier to create searches by using chunks of text as search terms
  - No need to understand thesauri or Boolean operators
- Integrated content: databases, blogs, RSS, etc.
  - Multiple databases can be searched at once (similar to federated search, but different)
  - Since the words are searched, no need to normalize indices or record structures of source data sets
- Real-time content
  - The engine can rapidly assign new content to the existing vector space, enabling integration of current content with archival material
- Language agnostic
  - Since all content is converted to values in the vector space, multiple languages can be searched and returned in a single result list
7. LSI Disadvantages
- Precision: matching concepts does not lead to the one perfect article
- Multiple content types in one result set require robust filtering and refining functionality to minimize confusion
- Default date-order sorting can overwhelm a result list
- Multiple-language search is seductive, but requires a quality translation feature to get the best utility from the results
- Can be difficult for the Google generation to grasp the concept of concepts
8. Why Use LSI?
- Structured data: users tend not to care about metadata
- Currency is king: users tend to focus on real-time content (news sites, blogs), but periodicals can provide real value
- Skills: not everyone is a librarian; actually, most aren't
- Tools: slow to learn, slower to change
- Perspective: impatient with complexity
9. LSI Use Case
- Customizable monitoring and alert service
- Supports non-librarian corporate uses: brand management, corporate intelligence, general counsel, IP management, etc.
- Two types of search
  - Content Analyst LLC's patented Concept Search
  - EBSCO's keyword search
- Multiple content types
  - Premium business content (EBSCO structured content)
  - Newspapers
  - RSS feeds (blogs, news sites)
  - Licensed databases (USPTO, INSPEC, etc.)
  - Intranet repositories
10. Multiple Content Types and Search Methods
- Users can set up folders and monitor for content related conceptually (same meaning, but different words) to key words or article examples already in the folders
- Users can search for immediate results that are related to words, articles, emails or external documents, using Concept Search or Key Word Search
- Users can link to advanced key word search options, thesauri, and visual searching
11. Folders Are Determined by End Users
- Users can add, delete or edit alerts (folders) as needed
- Users put words, phrases, paragraphs, full articles, emails, MS Word docs, etc. into the folders
- EDB adds matches to the folders
- Results for a folder appear when the folder is selected
- Users can easily make a result into a concept (example) and put it into a folder
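The folder workflow above might be modelled roughly as follows. This is a sketch under stated assumptions, not EDB's implementation: the `Folder` class, its `threshold` field, and the `offer` and `promote` methods are hypothetical names, and the `similarity` function is a plain word-count cosine standing in for the Concept Search score.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

def similarity(a, b):
    """Stand-in scorer: cosine over raw word counts. A concept engine
    would score in its own vector space instead."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    num = sum(ca[w] * cb[w] for w in ca)
    den = (math.sqrt(sum(v * v for v in ca.values()))
           * math.sqrt(sum(v * v for v in cb.values())))
    return num / den if den else 0.0

@dataclass
class Folder:
    name: str
    examples: list = field(default_factory=list)  # words, paragraphs, whole articles
    matches: list = field(default_factory=list)   # (score, text) pairs added by the service
    threshold: float = 0.3                        # hypothetical cut-off

    def add_example(self, text):
        self.examples.append(text)

    def offer(self, text):
        """Score incoming text against every example; keep it if any is close enough."""
        best = max((similarity(text, ex) for ex in self.examples), default=0.0)
        if best >= self.threshold:
            self.matches.append((best, text))
            return True
        return False

    def promote(self, position):
        """Turn a stored result into a new example to be monitored."""
        self.examples.append(self.matches[position][1])

# Invented sample content.
folder = Folder("knee implants")
folder.add_example("prosthetic knee joint implant")
kept = folder.offer("new knee implant approved for joint replacement")
skipped = folder.offer("quarterly earnings rise on strong ad sales")
folder.promote(0)   # the kept result now also acts as an example
```

Promoting a result widens what the folder watches for, which mirrors the "make a result into a concept" step on the slide.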
12. Structured Content in Familiar Layout
- The full text is viewed in a pop-up window
- The user can link to the source (the article on EBSCOhost, news site, the RSS feed provider, licensed database or intranet file)
- Users can email, save, print the document, or add it to their folder as a new example to be monitored
13. Linking to RSS Providers Simplifies Access
- Selected RSS articles are viewed in a pop-up window
- The user links to the source
14. Results Are Refined, Interactively
- Users can sort results by Date, Title, Publication and Relevance
- Users can narrow results by Publication or Content Type
- Users can delete previously read content, content of a specific relevance, or content published before a specific date
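The three refinement actions above (sort, narrow, delete) can be sketched as simple operations on a result list. The `Result` fields and the function names are hypothetical, the sample records are invented, and "content of a specific relevance" is interpreted here as content below a relevance floor, which is an assumption.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Result:
    title: str
    publication: str
    content_type: str
    published: date
    relevance: float
    read: bool = False

def sort_results(results, key):
    """Sort by 'date', 'title', 'publication' or 'relevance'.
    Date and relevance sort descending (newest or best first)."""
    fields = {"date": "published", "title": "title",
              "publication": "publication", "relevance": "relevance"}
    attr = fields[key]
    return sorted(results, key=lambda r: getattr(r, attr),
                  reverse=key in ("date", "relevance"))

def narrow(results, publication=None, content_type=None):
    """Keep only results from one publication and/or of one content type."""
    return [r for r in results
            if (publication is None or r.publication == publication)
            and (content_type is None or r.content_type == content_type)]

def prune(results, drop_read=False, min_relevance=None, published_since=None):
    """Drop read items, items below a relevance floor, or items older than a date."""
    kept = []
    for r in results:
        if drop_read and r.read:
            continue
        if min_relevance is not None and r.relevance < min_relevance:
            continue
        if published_since is not None and r.published < published_since:
            continue
        kept.append(r)
    return kept

# Invented sample records.
results = [
    Result("Patent filed for knee system", "USPTO", "patent", date(2006, 5, 1), 0.91),
    Result("Brand chatter roundup", "Example Blog", "rss", date(2006, 6, 20), 0.40, read=True),
    Result("Market outlook", "Example Weekly", "periodical", date(2006, 3, 15), 0.75),
]
by_relevance = sort_results(results, "relevance")
by_date = sort_results(results, "date")
patents_only = narrow(results, content_type="patent")
fresh_unread = prune(results, drop_read=True, published_since=date(2006, 4, 1))
```

Each function returns a new list, so the refinements can be chained in any order without changing the underlying result set.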
15. Alerts Controlled by End User
- Users can set up email lists (groups and individuals) to automatically forward documents
- Users can set a higher relevancy threshold for shared documents than for their own inbox (only send the best articles to colleagues)
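The two-threshold idea above can be sketched as a small routing rule. The `AlertRule` class, its field names, and the example addresses are all hypothetical; the slide only states that shared documents can require a higher relevancy than the user's own inbox.

```python
from dataclasses import dataclass, field

@dataclass
class AlertRule:
    owner: str
    inbox_threshold: float      # minimum relevance for the owner's own inbox
    share_threshold: float      # stricter minimum for forwarding to colleagues
    recipients: list = field(default_factory=list)  # individuals or expanded groups

def route(rule, relevance):
    """Return the addresses that should receive a document with this relevance."""
    targets = []
    if relevance >= rule.inbox_threshold:
        targets.append(rule.owner)
    if relevance >= rule.share_threshold:
        targets.extend(rule.recipients)
    return targets

# Invented addresses and thresholds.
rule = AlertRule(
    owner="analyst@example.com",
    inbox_threshold=0.5,
    share_threshold=0.8,
    recipients=["counsel@example.com", "brand-team@example.com"],
)
strong = route(rule, 0.9)    # owner plus both colleagues
moderate = route(rule, 0.6)  # owner only
weak = route(rule, 0.2)      # nobody
```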
16. LSI Use Case
- Find deliberately obscured patents
- Compare prior art to current research
- Monitor pending patents
- Search patents in native languages
  - USPTO
  - European Patent Organization
  - Japan Patent Office
- Expose patent search to more staff
  - Bench scientists
  - Competitive intelligence
  - Risk managers
17. Sneak Peek: EBSCO Patent Monitor
- In development: Fall 2006 release
- Use Concept Searching to identify conceptually related patents
- Enable cross-database searching
  - Patents (various sources)
  - Published STM literature
  - Proprietary research intranets
18. Searching on "motorcycle" finds patents that do not include the term "motorcycle"
19. Patent 6,085,857 does not contain the word "motorcycle", but it sure looks like one
aka "motorcycle"
20. Running a concept search on the patent abstract creates an instant context list
These terms are found in the USPTO database and relate to saddle-type riding vehicles. Users can search the USPTO database to find those patents, or they can research the individuals to see who else is an expert.
21. The terms and names on the Instant Context list can indicate the true nature of the patent
Shinobu Tsutsumikoshi is a developer at Suzuki...
22. Search using a press release on the new Maxim Knee System and get hundreds of related patents
23. US Patent 6,090,144 is about prosthetic knees, even though the Maxim press release never used the term "prosthesis"
24. Finding Stuff: The Dead Mouse Test
- LSI, key words, proximity, etc.
- The real question is not which mousetrap works better, just: did we kill the mouse?
25. Thank You
Joe Tragert, Director, Market Development, EBSCO Publishing
O: 800-653-2726 ext. 661
E: jtragert_at_epnet.com