Census Data Capture Challenge Intelligent Document Capture Solution - PowerPoint PPT Presentation

About This Presentation
Title:

Census Data Capture Challenge Intelligent Document Capture Solution

Description:

Census Data Capture Challenge. Intelligent Document Capture Solution ... and languages and Based on state-of-the-art Machine Learning algorithms ... – PowerPoint PPT presentation

Slides: 29
Provided by: yossirub
Learn more at: https://unstats.un.org

Transcript and Presenter's Notes

Title: Census Data Capture Challenge Intelligent Document Capture Solution


1
Census Data Capture Challenge Intelligent
Document Capture Solution
Amir Angel, Director of Government Projects
  • UNSD Workshop - Minsk Dec 2008

2
The evolution of data capture in census projects
Five steps
From OCR into IDR Solution
eFLOW
3
The evolution of data capture in census projects
  • Manual data entry (Key from paper)
  • Slow process
  • High error rate in the data entry process
  • Recruitment, training and management of personnel
  • Key from Image
  • Archive
  • Approx. 20% faster than key from paper

4
The evolution of data capture in census projects
  • OMR (Hardware readers for checkbox)
  • Requires special scanners and specially printed
    forms
  • Cannot handle handwritten/printed data
  • Forms are not user-friendly
  • OMR requires more answers > more space >
    increased paper expenditures > more handling and
    printing costs
  • Not flexible, difficult to adjust to other
    applications once census is over
  • No possibility to add business rules: imputation,
    validations, coding

5
The evolution of data capture in census projects
  • Automated Data Capture
  • Requires less human intervention and allows the
    census data capture to be completed much faster
    (less space, lower salary costs, less hardware)
  • Full flexibility in the type of data gathered
    (checkbox, OMR, handwritten, alpha and numeric,
    barcode)
  • Ensures data integrity: enables the use of
    automatic and manual online validations,
    exception handling, coding
  • The most advanced and proven technology for
    Censuses, recommended by the UN and used by all
    modern countries for census projects
  • Creates a correlation between the image and the
    actual form
  • Remote capabilities enable all forms to be
    scanned locally and then sent to a central site
    for processing
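
The automatic validations and exception handling mentioned above can be pictured as small rules run against each captured record. The following is only a minimal sketch: the field names (age, marital_status), the thresholds, and the function name are assumptions made for illustration and are not taken from the presentation or from eFLOW.

```python
def validate_person(record):
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    age = record.get("age")
    # Range check (hypothetical rule): age must be present and plausible.
    if age is None or not 0 <= age <= 120:
        errors.append("age out of range")
    # Cross-field check (hypothetical rule): young children are not married.
    elif age < 12 and record.get("marital_status") == "married":
        errors.append("marital status inconsistent with age")
    return errors


print(validate_person({"age": 34, "marital_status": "married"}))
print(validate_person({"age": 7, "marital_status": "married"}))
```

Records that return a non-empty list would be routed to an operator as exceptions, while clean records continue through the workflow.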

eFLOW
6
The evolution of data capture in census projects
  • Intelligent data capture platform (IDR)
  • by using OCR/ICR/OMR/PDA/Web/email
  • Automated data capture
  • Automatic classification for documents
  • Understands and differentiates between various
    types of documents and languages, based on
    state-of-the-art machine learning algorithms
  • Artificial intelligence algorithms that provide
    enough information for the system to find the
    location of the fields on its own

eFLOW
7
Traditional Data Capture
Back-Office
Mail Room
Scanning
Data Entry
End Users
Document prep Sorting
Manual Key from image
8
Intelligent Document Capture
Back-Office
Mail Room
Scanning
Data Entry
End Users
Document prep No sorting
Reduce manual data entry by 40-70%
Increase accuracy and consistency
9
  • India 2001
  • Turkey 1997
  • Brazil 2000
  • South Africa 2001
  • Ireland 2002
  • Italy 2002
  • Cyprus 2002
  • Turkey 2000
  • Kenya 2000
  • Slovak Republic 2001
  • Hong Kong 2001
  • Thailand 2008 (Community)
  • Slovenia 2006
  • Hong Kong 2006
  • South Africa Survey 2007
  • Ireland 2006

10
Manual
Automated Data Capture time saving
Saving of 25%
Saving of 50%
(Source: CSO, Central Statistics Office, Ireland)
11
The technology is there
  • No need to reinvent the wheel
  • Reducing risk by using off-the-shelf
    technologies

12
Data Types
OCR
ICR
OMR
13
Automatic Recognition
14
Improve Recognition: Voting mechanism
15
Voting: Single Engine vs. Virtual Engines
16
Figure Of Merit Example
  • A system recognizes 90% of the characters
    contained in a batch, but misclassifies 4%
  • 90% - (10 x 4%) = 50%
  • The Figure Of Merit in this example is 50%
  • A system recognizes 80% of the characters
    contained in a batch, but misclassifies 1%
  • 80% - (10 x 1%) = 70%
  • The Figure Of Merit in this example is 70%
  • The second system is more efficient
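
The arithmetic in this example reads as the recognition rate minus ten times the misclassification rate. A minimal sketch of that calculation follows; the function name and the penalty parameter are assumptions introduced for illustration, with the weight of 10 inferred from the slide's two examples.

```python
def figure_of_merit(recognized_pct, misclassified_pct, penalty=10.0):
    """Recognition rate minus a weighted penalty for misclassified characters.

    The penalty weight of 10 is inferred from the slide's examples.
    """
    return recognized_pct - penalty * misclassified_pct


system_a = figure_of_merit(90, 4)  # 90 - 10 * 4
system_b = figure_of_merit(80, 1)  # 80 - 10 * 1
print(system_a, system_b)
```

Under this measure a misclassified character costs far more than an unrecognized one, which is why the system with the lower recognition rate but fewer errors scores higher.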

17
Benefits of Multiple ICRs
2 8 9 5 6 3 7 4 3 1 6 7 8 5
18
Unique Tiling station: Checking for false
positives
  • Identify false positives
  • Alpha Numeric fields
  • Highlight for verifications
  • Quality control for ICR

19
Voting Methods Example
  • Assume we have a virtual engine (V. engine) that
    includes 4 engines
  • We want to identify the following number: 253478
  • The results of each engine are displayed on the
    right
  • The final results of the V. engine will be:
  • Safe: 28
  • Normal: 2578
  • Majority: 253478
  • Order: 255378
  • Equalizer: ??????
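
The per-engine results shown on the original slide did not survive in this transcript, so the sketch below uses made-up engine outputs to illustrate how a per-character majority vote could work. The function names, the unanimity variant, and the "?" placeholder are assumptions for illustration only; they are not the definitions of the Safe, Normal, Order or Equalizer methods listed above.

```python
from collections import Counter


def majority_vote(readings, unknown="?"):
    """Keep a character only if more than half of the engines agree on it."""
    result = []
    for chars in zip(*readings):  # one tuple of candidates per character position
        char, count = Counter(chars).most_common(1)[0]
        result.append(char if count > len(readings) / 2 else unknown)
    return "".join(result)


def unanimous_vote(readings, unknown="?"):
    """Stricter variant: keep a character only if every engine agrees."""
    return "".join(c[0] if len(set(c)) == 1 else unknown for c in zip(*readings))


# Hypothetical outputs of four engines reading the number 253478.
engine_outputs = ["253478", "258478", "253473", "253478"]
print(majority_vote(engine_outputs))
print(unanimous_vote(engine_outputs))
```

Positions left as "?" would be the ones sent to an operator for completion, so a stricter rule trades more manual work for fewer false positives.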

20
Processing Example
3
3
8
3
21
Automatic Recognition Time + Completion Time +
Correction Time = THROUGHPUT
22
Fuzzy/Approximate Search
Recognition
Image
Completion
23
Image
Recognition
Completion
24
Other Approaches
  • Auto Coding
  • Coding tasks and data validations performed on
    the data capture platform: a cost-effective
    solution
  • Use artificial-intelligence statistical software
    to understand sentences
  • Q: What do you do for a living?
  • A: I am guiding children
    (coded as Teacher 2030)
  • Use Approximate Search tools to improve
    results via DB (Exorbyte)
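
To make the approximate-search idea concrete, here is a minimal sketch that matches a misspelled answer against a small code table using Python's standard difflib module. It is not the Exorbyte tool named above; the table contents (apart from "Teacher 2030"), the codes marked as made up, the cutoff, and the function name are all assumptions for illustration.

```python
import difflib

# Hypothetical code table; only "teacher" -> "2030" comes from the slide,
# the other codes are made up for the example.
OCCUPATION_CODES = {"teacher": "2030", "nurse": "9991", "farmer": "9992"}


def auto_code(answer, table=OCCUPATION_CODES, cutoff=0.8):
    """Return (label, code) for the closest entry, or None for manual coding."""
    key = answer.strip().lower()
    matches = difflib.get_close_matches(key, list(table), n=1, cutoff=cutoff)
    if not matches:
        return None
    return matches[0], table[matches[0]]


print(auto_code("Teachr"))                 # misspelled answer still matches
print(auto_code("I am guiding children"))  # needs sentence understanding
```

String similarity recovers typing and recognition errors, but a free-text answer such as "I am guiding children" falls below the cutoff and would need the sentence-understanding software or a human coder.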

25
Process integrity, questionnaire integrity - a
workflow according to the client's needs
Flexibility
Scanning
OCR
Validation
Export
26
Flexibility
27
Flexibility
28
Thank You
Census Data Capture Platform