Title: The START Information Access System
1 The START Information Access System
Boris Katz http://www.ai.mit.edu/projects/infolab/
2 The Problem
Finding information online
Two Approaches
1. Keyword search (search engines, e.g., AltaVista)
2. Natural language processing
3 What's Wrong with Keyword Search?
4 What's Right About Natural Language Processing?
5 What's Wrong with Natural Language Processing (today)?
- 1. Too hard
  - Full-text NL understanding still beyond reach
    - Intersentential reference
    - Paraphrasing
    - Summarization
    - Common sense implication
- 2. Too slow
- 3. Not all information is language
  - Most Web resources are not textual
    - Maps and Images
    - Sound and Video
    - Multimedia
  - Web resources are distributed across numerous non-traditional databases
6 What is START?
START (SynTactic Analysis using Reversible Transformations) provides multimedia information access using natural language.
- Natural language: Natural language is human language. You don't have to learn a special language to use START. Ask your questions in English; enter information using English.
- Multimedia access using natural language annotations: START lets you use English to access any kind of information: text, pictures, movies, and more.
- Just the right information: START gives you the answer you want without including a thousand others.
- Virtual collaboration: START retrieves information from its own knowledge base and from databases all over the Web.
7 Natural Language
Natural language is human language. You don't
have to learn a special language to use START.
Ask your questions in English; enter information
using English.
8 Multimedia Access Using Natural Language Annotations
START lets you use English to access any kind of
information: text, pictures, movies, and more.
9 Just the Right Information
START gives you the answer you want without
including a thousand other answers.
10 Virtual Collaboration
START retrieves information from its own
knowledge base and from databases all over the
Web.
11 Natural Language Annotations
- Bridge the gap between our ability to analyze natural language sentences and other information and our desire to access the huge amount of data now available on the Web.
- Annotations are collections of natural language sentences and phrases that describe the content of various information segments.
- START
  - analyzes these annotations
  - creates the necessary representational structures
  - produces special pointers to the information segments summarized by the annotations.
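The three steps START performs on an annotation can be sketched roughly as follows. This is a minimal illustration, not START's actual implementation: the `analyze` function is a hypothetical stand-in that handles a single sentence pattern with a regular expression, the (subject, relation, object) triple is an assumed representational structure, and the segment identifier `doc-neptune` is invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for real syntactic analysis: it recognizes only
# sentences of the form "<subject> was <verb> using <object>."
PATTERN = re.compile(r"^(\w+) was (\w+) using (\w+)\.?$", re.IGNORECASE)


def analyze(annotation):
    """Turn one annotation sentence into a (subject, relation, object) triple."""
    match = PATTERN.match(annotation.strip())
    if match is None:
        raise ValueError(f"pattern not covered by this sketch: {annotation!r}")
    subject, verb, obj = (part.lower() for part in match.groups())
    return (subject, verb, obj)


class AnnotationIndex:
    """Maps each representational structure to pointers to information segments."""

    def __init__(self):
        self._pointers = defaultdict(list)

    def add(self, annotation, segment_id):
        structure = analyze(annotation)               # analyze the annotation
        self._pointers[structure].append(segment_id)  # store structure and pointer
        return structure

    def segments_for(self, structure):
        # .get avoids creating empty entries for unknown structures
        return list(self._pointers.get(structure, []))


index = AnnotationIndex()
key = index.add("Neptune was discovered using mathematics.", "doc-neptune")
print(key, index.segments_for(key))
```

The index stores only the structure and a pointer, so the annotated segment itself can be text, an image, or any other kind of resource.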
12 Natural Language Annotations
[Figure: an Information Provider's Document carries the Annotation "Neptune was discovered using mathematics."; an Information Seeker's Question "How was Neptune discovered?" is submitted to the START Servers (negotiation), and the Document is retrieved.]
13 Uniform Access
[Figure: NL questions go to START, which returns Multimedia responses; Queries and Data pass between START and Omnibase; sources shown are IMDb, U.S. Census, Fortune500, POTUS, and HPKB.]
- Local knowledge base of ternary expressions
- Core vocabulary
- Uniform interface to multiple database formats (Web, text, etc.)
- Extended lexicon
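The idea of a uniform interface over several database formats might look something like the sketch below. It is an illustration under stated assumptions, not a description of Omnibase: the `UniformInterface` class, the adapter functions, the (object, attribute) query shape, and all source names and values are hypothetical placeholders.

```python
from typing import Callable, Dict, Optional

# A source adapter answers (object, attribute) lookups; each adapter hides
# the native format of one database. All data below is placeholder text.
Adapter = Callable[[str, str], Optional[str]]


def dict_adapter(table: Dict[str, Dict[str, str]]) -> Adapter:
    """Wrap a nested dict (standing in for a structured database)."""
    def lookup(obj: str, attribute: str) -> Optional[str]:
        return table.get(obj, {}).get(attribute)
    return lookup


def text_adapter(lines: str) -> Adapter:
    """Wrap 'object|attribute|value' lines (standing in for a text source)."""
    rows = [line.split("|") for line in lines.strip().splitlines()]

    def lookup(obj: str, attribute: str) -> Optional[str]:
        for row_obj, row_attr, value in rows:
            if (row_obj, row_attr) == (obj, attribute):
                return value
        return None
    return lookup


class UniformInterface:
    """One query method over many differently formatted sources."""

    def __init__(self) -> None:
        self._sources: Dict[str, Adapter] = {}

    def register(self, name: str, adapter: Adapter) -> None:
        self._sources[name] = adapter

    def get(self, source: str, obj: str, attribute: str) -> Optional[str]:
        return self._sources[source](obj, attribute)


omni = UniformInterface()
omni.register("films", dict_adapter({"example-film": {"director": "example-director"}}))
omni.register("places", text_adapter("example-city|population|example-count"))

print(omni.get("films", "example-film", "director"))
print(omni.get("places", "example-city", "population"))
```

The caller issues the same `get` call regardless of how each source stores its data, which is the property the slide calls uniform access.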
14 How START Works
[Figure: a Web browser exchanges English and HTML with START through Scripts. Components shown inside START: Parser, Generator, Matcher, Input T-exps, T-exps from KB, Annotations, Native knowledge, and a Database of T-exps.]
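The flow named on this slide (input T-exps matched against T-exps from the knowledge base, then English generated from the result) might be sketched as below. Everything here is an assumption for illustration: the triple layout, the `None` wildcard, the `discovered-using` relation name, and the template-based `generate` function are hypothetical, and the question is translated into a T-exp by hand rather than parsed.

```python
from typing import List, Optional, Tuple

TExp = Tuple[Optional[str], str, Optional[str]]  # None marks the unknown slot

# Toy database of T-exps; a real system would build these by parsing English.
KB: List[TExp] = [
    ("neptune", "discovered-using", "mathematics"),
]


def match(query: TExp, kb: List[TExp]) -> List[TExp]:
    """Return every stored T-exp that agrees with the query on all known slots."""
    return [
        fact for fact in kb
        if all(q is None or q == f for q, f in zip(query, fact))
    ]


def generate(fact: TExp) -> str:
    """Render a matched T-exp back into English with a fixed template."""
    subject, relation, obj = fact
    verb, preposition = relation.split("-")
    return f"{subject.capitalize()} was {verb} {preposition} {obj}."


# "How was Neptune discovered?" translated by hand: the object slot is unknown.
question: TExp = ("neptune", "discovered-using", None)
answers = [generate(fact) for fact in match(question, KB)]
print(answers)
```

Because matching works on structures rather than raw words, a question only retrieves facts whose known slots all agree with it.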