Overview - PowerPoint PPT Presentation

About This Presentation
Title:

Overview

Description:

Overview. Market Leader: Intelligent Capture & Exchange Solutions. Talk about FSM here? Accuracy went from 98% on doc review via ...

Slides: 22
Provided by: mknconsul
Category:
Tags: overview


Transcript and Presenter's Notes

Title: Overview


1
Overview
Market Leader: Intelligent Capture & Exchange Solutions
2
Information comes in many forms
  • Structured Content
  • Information is predictable
  • Location of information is predictable
  • Examples
  • Waybill
  • Traffic Citations
  • Tax Forms
  • Mail Order Forms
  • Applications
  • Insurance Claims

3
Information comes in many forms
  • Semi-Structured Content
  • Information is predictable
  • Location of information is NOT predictable
  • Examples
  • Accounts Payable
  • Accounts Receivable
  • Transportation
  • Bills of Lading
  • Medical Billing

4
Information comes in many forms
  • Unstructured Content
  • Information is NOT predictable
  • Location of information is NOT predictable
  • Examples
  • Mortgage Folders
  • Medical Records
  • Email Classification
  • Digital Mailroom
  • Litigation Support

5
Where Did Kofax Classification / Separation
Originate?
It was funded by In-Q-Tel, the joint venture capital startup group
owned by the CIA.
6
Enabling the automation of Document
Classification Processes
  • Processing millions of captured foreign documents
  • Automating the categorization of content to
    expedite linguistic activities
  • Connecting to an internal content management
    solution

7
Kofax Transformation - Advanced Document
Separation
  • Automatically identify document type and
    individual document boundaries (start/end) within
    a batch of multiple documents
  • Goal: Perform separation/recognition just as if
    physical separator sheets were inserted between
    each document
  • Applies multiple classification and separation
    methods in a waterfall sequence

8
KTM Advanced Document Separation Process
Typical Process Flow
[Diagram: process flow with stages labeled Scan/Import, Classify, Separate, Extraction, Document Review, Data Validation, and Release]
9
Vector Space Machines: Under the Hood
Warning: The following slides may require pocket
protectors.
10
Classification
  • CLASSIFICATION METHODS
  • Each classification method can be used
    independently or in combination.
  • Mark Registration
  • Text Registration Image
  • Image
  • Advanced Text
  • Manual Rules

Template Technology
Self-Learning Document Classification
NEW
Powered by INDICIUS
NEW
Used to augment Advanced Text
11
Classification Waterfall
[Flowchart: each document passes through Template, Image, Advanced Text, and Rules classification in turn. If a stage sets the document type (Yes), classification is done; if not (No), the document falls through to the next stage, and finally to Document Review / Completion.]
12
Learn-By-Example Approach
Advanced Text and Separation
Subject experts provide example documents
text
Learning Algorithm
Engine
Attributes
  • Training and Model Development Phase
  • Requires example documents correctly placed in
    each category
  • Generates a small footprint model used by
    run-time engine
  • No rules to write!
  • Run-time Operation
  • Uses model to apply patterns learned in training
    phase to new, incoming documents
  • Analyzes document attributes and determines the
    appropriate category
  • Provides associated confidence score for each
    result
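
The two phases above can be sketched in a few lines. This is only an illustration of the learn-by-example idea, not the product's algorithm: it assumes a simple nearest-centroid model over word counts with cosine similarity as the confidence score, and the names `tokenize`, `train`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Training phase: examples is a list of (category, text) pairs.

    Returns a small model: one unit-length term vector per category.
    """
    totals = defaultdict(Counter)
    for category, text in examples:
        totals[category].update(tokenize(text))
    model = {}
    for category, counts in totals.items():
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        model[category] = {term: c / norm for term, c in counts.items()}
    return model


def classify(model, text):
    """Run-time phase: return (best_category, confidence between 0 and 1)."""
    counts = Counter(tokenize(text))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0 or not model:
        return None, 0.0
    scores = {
        category: sum(weights.get(term, 0.0) * c / norm for term, c in counts.items())
        for category, weights in model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Assumed toy training data; real training needs many examples per category.
MODEL = train([
    ("Sports", "pitched four innings and returned to the lineup"),
    ("Entertainment", "a hit cartoon show ordered by the network"),
])
```

Calling `classify(MODEL, "a new cartoon on the network")` picks the category whose training examples share the most vocabulary with the new text, and the returned score plays the role of the confidence value mentioned above.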

13
Advanced Text Training Phase: Model Builder
Advanced Text Classification and Separation is
trained on a directory of files containing sample
documents placed in the correct order and in
the correct categories.
HUD - Page 1
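
As a rough sketch of what such a training directory could look like in code: the layout below (one folder per category, one text file per page, page order given by file name) is an assumption made for illustration, as are the function name `load_training_set` and the sample category "Appraisal".

```python
import tempfile
from pathlib import Path


def load_training_set(root):
    """Return {category: [page_text, ...]} read from root/<category>/*.txt.

    Pages are returned in file-name order, so names such as
    page_001.txt, page_002.txt keep the original page sequence.
    """
    training = {}
    for category_dir in sorted(Path(root).iterdir()):
        if not category_dir.is_dir():
            continue
        pages = [
            page.read_text(encoding="utf-8")
            for page in sorted(category_dir.glob("*.txt"))
        ]
        if pages:
            training[category_dir.name] = pages
    return training


# Build a tiny throwaway directory to show the assumed layout.
with tempfile.TemporaryDirectory() as root:
    samples = {
        "HUD": ["settlement statement", "signature page"],
        "Appraisal": ["property value"],
    }
    for category, pages in samples.items():
        folder = Path(root) / category
        folder.mkdir()
        for number, text in enumerate(pages, start=1):
            (folder / f"page_{number:03d}.txt").write_text(text, encoding="utf-8")
    demo = load_training_set(root)
```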
14
Training on Text Patterns: Learning Algorithm
PHOENIX (AP) - Ray Durham hit a leadoff homer and
Brett Tomko pitched four innings, helping the San
Francisco Giants beat the Milwaukee Brewers 9-2
Tuesday. Milwaukee center fielder Scott
Podsednik, returned to the lineup and doubled
twice....
SARASOTA, Fla. (AP) - Jose Acevedo made a strong
bid for a spot in Cincinnati's rotation, pitching
five solid innings Tuesday night in a 5-0 victory
over the Boston Red Sox. Adam Dunn returned to
the starting lineup and went 1-for-2 with a
sacrifice fly....
Sports
NEW YORK (AP) - A spinoff of its hit cartoon
"Dora the Explorer" and a comedy that stars Julia
Roberts' niece are among nine new programs
ordered by Nickelodeon for the network's next
season. "Go, Diego, Go" will feature Dora's
rough-and-tumble....
NEW YORK (AP) - If Regis Philbin once saved ABC,
Donald Trump has certain bragging rights at NBC.
In two months, the hit show "The Apprentice" has
made a huge difference on Thursday nights for
NBC, an evening the network....
Entertainment
15
Using Spatial Vectors for Text Classification
Sports
Simple Vector Approach
Tech.
Ent.
Mohomine Approach: Space Vector Machine algorithm
confidence (sports category) = 0.3*pitch + 0.4*inning + 0.7*lineup + 0.1*hit
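
The weighted terms on this slide can be read as a linear score over word features. A minimal sketch follows, assuming each feature is 1 when the word occurs in the text and 0 otherwise; the slide does not say how the features are computed, and `sports_score` is a hypothetical name.

```python
import re

# Term weights taken from the slide (sports category).
SPORTS_WEIGHTS = {"pitch": 0.3, "inning": 0.4, "lineup": 0.7, "hit": 0.1}


def sports_score(text, weights=SPORTS_WEIGHTS):
    """Sum the weights of the terms that occur in the text.

    Matching is on exact lowercase words: no stemming is applied,
    so "innings" does not count as "inning" in this sketch.
    """
    present = set(re.findall(r"[a-z]+", text.lower()))
    return sum(weight for term, weight in weights.items() if term in present)
```

A text mentioning "lineup" and "hit" scores the sum of those two weights, while a text with none of the four terms scores zero.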
16
Separation
  • Strict Rule Separation
  • For example, the solution creates a new
    document every time a page of type X is seen
  • Advanced Separation
  • Uses probabilities to ascertain from page
    classifications the most likely document
    structure
  • Rule-based separation
  • Implemented through scripting (Ascent Capture
    platform only, introduced in version 4)
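
Strict rule separation is the simplest of the three to picture in code. The sketch below assumes the page types have already been assigned by classification; `separate_on_page_type` is a hypothetical helper, not a product function.

```python
def separate_on_page_type(page_types, start_type):
    """Start a new document every time a page of start_type is seen.

    Returns a list of documents, each a list of page indexes.
    Pages that appear before the first start_type page form their own document.
    """
    documents = []
    for index, page_type in enumerate(page_types):
        if page_type == start_type or not documents:
            documents.append([])
        documents[-1].append(index)
    return documents
```

For the page types `["X", "A", "B", "X", "A"]` and start type `"X"`, the pages split into two documents at the second `"X"`.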

17
Advanced Text Separation Methodology
  • 1st Pass: Determine whether a given page is the
    first, middle, or last page of a known form (each
    page may receive multiple answers)
  • 2nd Pass: Individual page assignments are used
    to identify the most likely form order and
    separation points within the document (using the
    context of pages before/after each page)

[Diagram: pages 1-9 with first-pass classifier results (First, Middle, or Last page of Form X, Form Y, or Form Z; one page marked "?") and the most likely grouping into Form X, Form Y, and Form Z]
19
Thank You!
Kofax Confidential
20
Automatic Document ID and Indexing
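[Diagram: per-page confidence scores for the page labels S, M, and E]
Scores: S 90, E 90, S 65, M 70, E 85, S 72, E 80, S 85, E 50, S 55, M 65, E 70, S 70, E 75, E 22, M 15, S 12, M 10, E 65, S 12, E 30
Sequence: S E S M E S E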
21
Automatic Document ID and Indexing
Page Identification
Document Separation
S E S M E S E
Index
23
Classification Waterfall Technique
Using multiple classification engines
  • Performance is optimized by attempting the fastest
    classification techniques first and accepting
    results only if very confident
  • Mohomine text classification is used as the
    catch-all method: very accurate with the widest
    reach, but dependent on full-page OCR

[Diagram: pages 1-8 labeled as First, Middle, or Last pages of Form X, Form Y, and Form Z, with classification times of 1 ms, 20 ms, 200 ms, and 1000 ms]
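
A minimal sketch of the waterfall idea: try the cheapest classifier first and fall through to the next one unless the result is very confident. The stage functions, the 0.9 threshold, and the names `waterfall` and `STAGES` are assumptions for illustration; real stages would wrap template, image, and text engines.

```python
def waterfall(page, stages, threshold=0.9):
    """Run classifiers in order (fastest first).

    Each stage is (name, function) where the function returns
    (document_type or None, confidence between 0 and 1).
    Returns (document_type, stage_name), or (None, "review") when
    no stage is confident enough.
    """
    for name, classifier in stages:
        doc_type, confidence = classifier(page)
        if doc_type is not None and confidence >= threshold:
            return doc_type, name
    return None, "review"


# Toy stand-ins for real engines (hypothetical behavior).
def template_stage(page):
    return ("Form X", 0.99) if page.startswith("FORM X") else (None, 0.0)


def text_stage(page):
    if "invoice" in page.lower():
        return "Invoice", 0.95
    return "Letter", 0.4  # a guess, but not confident enough to accept


STAGES = [("template", template_stage), ("advanced_text", text_stage)]
```

Ordering the stages by cost means the slow full-text stage only runs on pages that the fast stages could not settle, and anything still below the threshold is routed to manual review.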
24
How do we actually build a model?
Business
Dictionary
NEW YORK (Reuters) - Former WorldCom Inc. finance
chief Scott Sullivan, who has become the star
witness against Bernard Ebbers, admitted on
Wednesday to a history of lies, saying he had
deceived shareholders, analysts and the board
while his staff undertook an $11 billion
accounting fraud. Sharply questioned by the lead
attorney for Ebbers, the one-time chief executive
officer
SAN JOSE, Calif. (AP) -- One week after firing
its top executive, Hewlett-Packard Co. reported
quarterly earnings that were essentially flat,
and the interim chief executive acknowledged,
"There is work to be done." For the three
months ended Jan. 31, HP reported a profit of
$943 million, or 32 cents per share, only 0.7
percent higher than the $936 million, or 30 cents
per share, it earned in the first fiscal
quarter
Sports
PARIS (AP) -- Still hungry to race but wary he is
not in the best shape, Lance Armstrong wants to
take his Tour de France record to even mightier
heights: He will try for a seventh straight title
this summer. Armstrong had left open the
possibility he wouldn't compete this year in
cycling's showcase event to pursue other races.
But in an announcement Wednesday on the Web site
of his Discovery Channel team the Tour's only
six-time winner
Saying this was a "sad, regrettable day,"
Commissioner Gary Bettman announced today that
the National Hockey League was canceling the
season because negotiators had failed to come to
an agreement with the players' union on salary
caps. With his announcement, the N.H.L. becomes
the first major pro sports league in North
America to lose an entire season to a labor
dispute
Technology
SAN FRANCISCO, Feb. 15 - Late in the summer of
1973, two young scientists in the nascent field
of computer networks hunkered down in a
conference room of the Cabana Hyatt Hotel in Palo
Alto, Calif., a clean but bland stopping place
for salesmen and the parents of students at
nearby Stanford University. Their goal was to
thrash out a way to make different, isolated
computer networks talk to each other.
A new battery-powered Etch A Sketch will rely on
digital electronics for a speedy interpretation
of each knob twist. It is designed, its makers
say, to transmit data along a wire plugged into a
television set that will display every line and
detail in real time, with accompanying sounds and
optional color. It will cost $20, twice the price
of the traditional Etch A Sketch. "I think the
kids are becoming more advanced in
25
The Problem: Document Separation
  • Separation of unstructured documents is a
    significant expense for a high volume capture
    system
  • Typical structured recognition technologies are
    not applicable
  • Manual insertion of separator sheets is the
    primary solution today
  • 50% of document preparation labor is spent sorting
    documents and inserting separator pages

Where does one document stop and the next begin?
Here?
Here?
Here?
SS
26
How Document Separation Works
[Diagram: pages 1-5 with mC results and confidence scores]
mC Result: First Form X (97), Middle Form X (92), Last Form X (95), First Form Y (84), Last Form Y (95)
FSM Constraints
  • A First page must be followed by Middle or
    Last of same type
  • After a Last page must come a First
  • Custom Business Rules

Best Path Analysis
Form X
Form Y
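
The best path analysis can be sketched as a Viterbi-style search over page labels that respects the FSM constraints. This is a sketch under assumptions: confidences are treated as independent probabilities, labels a page did not receive get a small floor value, Middle pages are assumed to follow the same rule as First pages, and a batch must start with a First page and end with a Last page. The names `allowed`, `best_path`, and `SLIDE_PAGES` are hypothetical.

```python
import math


def allowed(prev, cur):
    """FSM constraints: prev and cur are (position, form) labels."""
    (prev_pos, prev_form), (cur_pos, cur_form) = prev, cur
    if prev_pos in ("First", "Middle"):
        # First -> Middle/Last of same type (slide); Middle likewise (assumed).
        return cur_pos in ("Middle", "Last") and cur_form == prev_form
    return cur_pos == "First"  # after a Last page must come a First


def best_path(pages, floor=0.01):
    """pages: list of {(position, form): confidence in 0..1}, one dict per page.

    Returns the most likely label sequence that satisfies the constraints,
    or None if no valid sequence exists.
    """
    if not pages:
        return []
    labels = sorted({label for page in pages for label in page})

    def logp(page, label):
        return math.log(max(page.get(label, 0.0), floor))

    # best[label] = (log score, path) for the best valid path ending in label
    best = {l: (logp(pages[0], l), [l]) for l in labels if l[0] == "First"}
    for page in pages[1:]:
        step = {}
        for cur in labels:
            candidates = [
                (score + logp(page, cur), path + [cur])
                for prev, (score, path) in best.items()
                if allowed(prev, cur)
            ]
            if candidates:
                step[cur] = max(candidates, key=lambda c: c[0])
        best = step
    finals = [entry for label, entry in best.items() if label[0] == "Last"]
    return max(finals, key=lambda c: c[0])[1] if finals else None


# Scores from the slide, converted from percentages to 0..1.
SLIDE_PAGES = [
    {("First", "X"): 0.97},
    {("Middle", "X"): 0.92},
    {("Last", "X"): 0.95},
    {("First", "Y"): 0.84},
    {("Last", "Y"): 0.95},
]
```

Because the search scores whole sequences, a page whose top-scoring label would break the constraints can be overruled by a lower-scoring label that fits its neighbors. A new document starts at every First label in the returned path.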
27
Customer Success Story
  • Residential mortgage processing, 12 million
    images/month
  • Each customer folder 100 pages, 60-80 doc types
  • Before automatic document separation
  • 60 people doing document separation and
    preparation
  • 16 people to review (QC) a customer folder
  • 8.25 minutes per folder to review
  • With automatic document separation
  • 10 people doing document separation and
    preparation
  • 3 people to review (exceeded goal to reduce staff
    to 8)
  • 2 minutes per folder to review
  • Exceeded processing goal targets at each step
  • $420,000 annual savings in labor
  • $100,000 annual savings in separator sheet
    consumables
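
The before/after figures above translate into relative reductions with simple arithmetic. A small sketch, using only the numbers from this slide (`reduction` is a hypothetical helper):

```python
def reduction(before, after):
    """Fractional reduction from before to after."""
    return (before - after) / before


FIGURES = {
    "separation and preparation staff": (60, 10),
    "review (QC) staff": (16, 3),
    "review minutes per folder": (8.25, 2),
}

for name, (before, after) in FIGURES.items():
    print(f"{name}: {before} -> {after} ({reduction(before, after):.1%} lower)")
```

That works out to roughly 83% fewer separation and preparation staff, 81% fewer reviewers, and 76% less review time per folder.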

28
Capabilities Overview
  • Classification
  • Content (text)
  • Layout (topography)
  • Combination of the above
  • Extraction
  • Rules (format, database)
  • Learn-by-example
  • Templates
  • Any document
  • Structured (inc. legacy forms)
  • Semi-structured, e.g. invoices
  • Unstructured documents, e.g. correspondence

29
Key Applications/Use Cases
  • Invoices (AP automation)
  • Speed up AP process and reduce manual keying
  • Pre-configured solution already available
  • Sales Orders
  • Improve sales order process and accuracy
  • Mailroom applications/Workflow automation
  • Automatic classification and routing
  • Indexing (< 3 fields) for archive
  • No need for pre-sorting
  • Image to archive automation
  • Automatic classification and indexing for storage
    in dm system
  • Better, quicker, more accurate batch capture
  • Business process automation
  • Full data capture
  • Straight thru processing
  • Semi-structured and unstructured documents
  • Invoices and credit notes
  • Correspondence
  • Reports

30
Kofax KTM Differentiators
  • Integrated with Kofax Capture (offering HA, xx)
  • Learn-by-example extraction
  • Learn-by-example classification
  • Continuous supervised learning in production
  • Single product for all document types that is
    upgradable

31
Kofax Solution Strengths
  • Market leader
  • Out-of-the-box
  • Unlimited import options
  • VRS integrated with QC Later
  • Better Recognition/Multiple Document Types
  • API Integrated export
  • Secure handling of images & data
  • Out-of-the-box reports
  • You won't outgrow it