Title: Toward Automatic Processing and Indexing of Microfilm
2 Microfilm Processing
- Images are scanned from ribbons of microfilm.
- Each image on the microfilm ribbon is then cropped and de-skewed.
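The slides do not say how cropping and de-skewing are carried out. As a minimal sketch only, assuming the frame has already been binarized into a list of rows of 0/1 pixels, the code below trims empty borders and picks a skew angle from a set of candidates by testing which one lines the ink up into the sharpest rows. The function names and the candidate angles are illustrative assumptions, not part of the original system.

```python
import math

def crop_to_content(img):
    """Trim all-zero border rows and columns from a binary image."""
    rows = [y for y, row in enumerate(img) if any(row)]
    if not rows:
        return []
    cols = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def profile_sharpness(img, angle_deg):
    """Sum of squared row counts after shearing columns by the candidate angle.

    When the candidate matches the true slope of the printed lines, ink
    collapses onto few rows and this score is large.
    """
    slope = math.tan(math.radians(angle_deg))
    counts = {}
    for y, row in enumerate(img):
        for x, ink in enumerate(row):
            if ink:
                level = y - round(x * slope)
                counts[level] = counts.get(level, 0) + 1
    return sum(c * c for c in counts.values())

def estimate_skew(img, candidates):
    """Return the candidate angle (degrees) with the sharpest row profile."""
    return max(candidates, key=lambda a: profile_sharpness(img, a))
```

A real pipeline would then rotate the image by the negative of the estimated angle; that step is omitted here because it needs an imaging library.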
3 Microfilm Processing
Cropped and De-skewed Image
4 Image Zoning
- Lines in a document produce a unique signature.
- The algorithm searches for these patterns to detect the lines that describe a table.
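The slides do not define the line signature. One common reading, used here purely as an assumption, is that a ruled line shows up as a row or column that is almost entirely dark in the binarized image. The sketch below finds such rows and columns; the `min_fill` threshold and the function name are hypothetical.

```python
def find_ruling_lines(img, min_fill=0.8):
    """Return (horizontal_rows, vertical_columns) that look like ruled lines.

    img is a list of rows of 0/1 pixels. A row or column counts as a line
    when at least min_fill of its pixels are dark.
    """
    height = len(img)
    width = len(img[0])
    horizontal = [y for y, row in enumerate(img) if sum(row) >= min_fill * width]
    vertical = [
        x for x in range(width)
        if sum(row[x] for row in img) >= min_fill * height
    ]
    return horizontal, vertical

# A 7x5 frame with a border, one inner vertical rule, and one stray text pixel.
sample = [[1] * 7] + [[1, 0, 0, 1, 0, 0, 1] for _ in range(3)] + [[1] * 7]
sample[2][1] = 1
horizontal, vertical = find_ruling_lines(sample)
```

The detected rows and columns bound the cells (zones) of the table; the stray text pixel does not reach the fill threshold and is ignored.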
5 Image Zoning
Automatically identifies table structure.
6 Optical Character Recognition
- A neural net evaluates each zone in the image.
- The neural net converts the printed characters in each zone into ASCII text.
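The slides give no detail about the network's architecture or training. Purely to illustrate the idea of mapping pixels to characters, the toy below is a single-layer network with a softmax output over three 3x3 glyphs. Its weights are set by hand from the glyph templates rather than learned, and the glyph set, names, and sizes are all assumptions.

```python
import math

# Hypothetical 3x3 glyph templates (1 = ink), listed row by row.
GLYPHS = {
    "I": [0, 1, 0, 0, 1, 0, 0, 1, 0],
    "L": [1, 0, 0, 1, 0, 0, 1, 1, 1],
    "T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
}
# One output unit per character; weights are +1 on ink and -1 elsewhere.
WEIGHTS = {c: [1.0 if p else -1.0 for p in px] for c, px in GLYPHS.items()}

def classify(pixels):
    """Return (character, probability) for one 3x3 glyph of 0/1 pixels."""
    inputs = [1.0 if p else -1.0 for p in pixels]  # bipolar encoding
    scores = {c: sum(w * x for w, x in zip(ws, inputs)) for c, ws in WEIGHTS.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}  # stable softmax
    total = sum(exps.values())
    best = max(exps, key=exps.get)
    return best, exps[best] / total

def read_zone(glyphs):
    """Convert a sequence of glyphs from one zone into a text string."""
    return "".join(classify(g)[0] for g in glyphs)
```

A production recognizer would learn its weights from labeled samples and handle a full character set; this sketch only shows the forward pass.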
8 Column-Row Recognition
- The algorithm uses the geometry of each zone to identify the table's columns and rows.
- The algorithm associates each column and row label with its values in the table.
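One way to picture this step, offered as a sketch under assumptions rather than the presenters' algorithm: treat each zone as a dictionary with `x`, `y`, and `text`, group zones whose left edges nearly coincide into columns and whose top edges nearly coincide into rows, then read the first row as column labels. The zone format, the pixel tolerance, and the function names are hypothetical, and row labels are left out for brevity.

```python
def cluster(values, tol):
    """Group sorted coordinates whose neighbouring gaps are at most tol."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups

def group_index(groups, value):
    """Return the index of the group that contains value."""
    for i, group in enumerate(groups):
        if value in group:
            return i
    raise ValueError(value)

def build_table(zones, tol=5):
    """Map each column label (first row) to the values beneath it."""
    cols = cluster([z["x"] for z in zones], tol)
    rows = cluster([z["y"] for z in zones], tol)
    grid = {}
    for z in zones:
        grid[(group_index(rows, z["y"]), group_index(cols, z["x"]))] = z["text"]
    return {
        grid.get((0, c), ""): [grid.get((r, c), "") for r in range(1, len(rows))]
        for c in range(len(cols))
    }

zones = [
    {"x": 10, "y": 10, "text": "Name"}, {"x": 120, "y": 11, "text": "Relation"},
    {"x": 11, "y": 40, "text": "John Eyres"}, {"x": 121, "y": 41, "text": "Head"},
    {"x": 9, "y": 70, "text": "Annie Eyres"}, {"x": 120, "y": 69, "text": "Wife"},
]
table = build_table(zones)
```

The tolerance absorbs small misalignments left over after de-skewing; cells with no zone come back as empty strings.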
10 Identify Labels
- The algorithm maps the printed text of each label to a standardized name.
- The standardized names correspond to the fields in a database.
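A minimal sketch of such a mapping, using the three label examples that appear in this deck. The normalization rule (lowercase, punctuation stripped, whitespace collapsed) and the lookup table are assumptions for illustration; the slides do not say how the real system matches labels.

```python
import re

# Normalized printed label -> standardized database field name.
STANDARD_NAMES = {
    "road street c and no or name of house": "Address",
    "name and surname of each person": "Full Name",
    "relation to head of family": "Relationship",
}

def normalize(label):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", label.lower()).split())

def standard_name(label):
    """Return the standardized field name, or None for an unknown label."""
    return STANDARD_NAMES.get(normalize(label))

field = standard_name("RELATION to Head of Family")
```

Exact lookup after normalization tolerates differences in case and punctuation only; OCR errors inside a label would need approximate matching, which is not shown.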
11 Identify Labels
ROAD, STREET, &c., and No. or NAME of HOUSE
Address
12 Identify Labels
NAME and Surname of each Person
Full Name
Address
13 Identify Labels
RELATION to Head of Family
Relationship
Address
Full Name
14 Extract Data
- The algorithm identifies factored table values.
- The algorithm stores each record in an XML file.
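The slides do not define "factored" values. The sketch below assumes it means a value written once and shared by the rows beneath it, as the address "Collafer" is shared by the household members in this deck. It copies such a value down into the rows that leave the field blank, then serializes the records with Python's standard `xml.etree.ElementTree`. The element names and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

def expand_factored(rows, field):
    """Copy the last non-empty value of field into rows that leave it blank."""
    last = None
    expanded = []
    for row in rows:
        row = dict(row)  # leave the caller's rows untouched
        if row.get(field):
            last = row[field]
        elif last is not None:
            row[field] = last
        expanded.append(row)
    return expanded

def to_xml(rows):
    """Serialize records as <records><record>...</record></records>.

    Keys are used directly as element names, so they must be valid XML names.
    """
    root = ET.Element("records")
    for row in rows:
        record = ET.SubElement(root, "record")
        for name, value in row.items():
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")

rows = [
    {"Address": "Collafer", "FullName": "John Eyres", "Relationship": "Head"},
    {"Address": "", "FullName": "Annie Eyres", "Relationship": "Wife"},
]
expanded = expand_factored(rows, "Address")
document = to_xml(expanded)
```

Writing `document` to disk would give one XML file per table; whether the original system used one file per record or per page is not stated.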
15 Extract Data
Collafer
Extracted by hand.
16 Extract Data
John Eyres
Head
Collafer
Extracted by hand.
17 Extract Data
Annie Eyres
Wife
Collafer
Extracted by hand.
18 Extract Data
Lehailes Eyre
Son
Collafer
Extracted by hand.
19 Microfilm Queries
- A web form provides the interface to query the microfilm database.
- Individuals can enter keywords (such as a first and last name), and the system locates matching records in the indexed microfilm documents.
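A keyword lookup of this kind can be sketched with an inverted index, assuming each record is a dictionary of string fields. The index maps every lowercase word to the records that contain it, and a query returns the records that contain all keywords. This is an illustration with hypothetical names, not the system's actual query engine, and it matches whole words only.

```python
from collections import defaultdict

def build_index(records):
    """Map each lowercase word to the set of record positions containing it."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        for value in record.values():
            for word in value.lower().split():
                index[word].add(position)
    return index

def search(index, records, keywords):
    """Return the records that contain every keyword (exact word match)."""
    matches = None
    for keyword in keywords:
        hits = index.get(keyword.lower(), set())
        matches = hits if matches is None else matches & hits
    return [records[i] for i in sorted(matches or [])]

records = [
    {"FullName": "John Eyres", "Relationship": "Head", "Address": "Collafer"},
    {"FullName": "Annie Eyres", "Relationship": "Wife", "Address": "Collafer"},
]
index = build_index(records)
found = search(index, records, ["John", "Eyres"])
```

Spelling variants such as "Eyre" versus "Eyres" would not match under this scheme; handling them would need prefix or approximate matching.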
20 Web Query
John
Eyre
21 Search Results
- The system returns the indexed images that contain the results.
- Because the database indexes both the text and geometry of the document, the process can return just the relevant regions of the microfilm image.
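To illustrate how stored geometry lets a result be cut down to a region, the sketch below assumes each matching zone is stored with a bounding box (`x`, `y`, `w`, `h` in pixels). It computes one padded box around the matching zones, clamps it to the image, and crops a list-of-rows image to that box. The zone format, the padding, and the function names are assumptions.

```python
def region_for(zones, padding, image_width, image_height):
    """Return (left, top, right, bottom) enclosing all zones, with padding."""
    left = max(0, min(z["x"] for z in zones) - padding)
    top = max(0, min(z["y"] for z in zones) - padding)
    right = min(image_width, max(z["x"] + z["w"] for z in zones) + padding)
    bottom = min(image_height, max(z["y"] + z["h"] for z in zones) + padding)
    return left, top, right, bottom

def crop(img, box):
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]

# Two zones on the same table row, for example a name and a relationship.
hit_zones = [
    {"x": 10, "y": 40, "w": 80, "h": 20},
    {"x": 120, "y": 40, "w": 50, "h": 20},
]
box = region_for(hit_zones, padding=5, image_width=200, image_height=100)
```

Returning only the cropped region keeps the response small compared with sending the full microfilm frame.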
24 Just-In-Time Browsing
- To make the query results display quickly, the system uses Just-In-Time Browsing.
- Just-In-Time Browsing will allow people to browse digitized microfilm and other large collections of images over the Internet at interactive rates.