Title: Handwritten Word Recognition: A New CAPTCHA Challenge
1Handwritten Word Recognition: A New CAPTCHA Challenge
- Amalia Rusu and Venu Govindaraju
- CEDAR
- University at Buffalo
2CAPTCHA
- Completely Automated Public Turing test to tell Computers and Humans Apart
- An automated test that humans can pass but that current computer programs fail because it lies beyond the state of the art
- Exploits the difference in abilities between humans and machines (e.g. recognition of text, speech, or facial features)
- A new formulation of Alan Turing's test: "Can machines think?"
3Objective
- Example of interface and handwritten CAPTCHA to
confirm registration.
4User Authentication Steps using HCAPTCHA
Automatic Authentication Session for Web Services.
- Initialization
- Handwritten CAPTCHA Challenge
- User Response
- Verification
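The four steps above can be sketched as a small challenge-response session. This is only an illustrative sketch: the class name `HCaptchaSession`, the in-memory store, and the case-insensitive comparison are assumptions that do not come from the slides, and image generation is stubbed out with a placeholder string.

```python
import secrets


class HCaptchaSession:
    """Toy in-memory model of the four authentication steps (hypothetical API)."""

    def __init__(self, words):
        self.words = list(words)   # candidate truth words
        self.pending = {}          # session id -> truth word

    def initialize(self):
        # Step 1 (Initialization): open a session with an unguessable id.
        session_id = secrets.token_hex(8)
        self.pending[session_id] = None
        return session_id

    def challenge(self, session_id):
        # Step 2 (Handwritten CAPTCHA Challenge): pick a truth word at random.
        # A real system would return a distorted handwritten image of it.
        truth = secrets.choice(self.words)
        self.pending[session_id] = truth
        return f"<image of '{truth}'>"

    def verify(self, session_id, response):
        # Steps 3-4 (User Response, Verification): each challenge is single-use.
        truth = self.pending.pop(session_id, None)
        return truth is not None and response.strip().lower() == truth.lower()
```

Making each challenge single-use is a design choice of this sketch: a replayed answer for an already verified session is rejected.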
5Desirable Properties
- CAPTCHA should be automatically generated and graded
- Test can be taken quickly and easily by human users
- Test will accept virtually all human users and reject software agents
- Test will resist automatic attack for many years despite technology advances and prior knowledge of algorithms
6Previous Work
- First CAPTCHA designed in 1997 (for the AltaVista website URL filter)
- CMU: Gimpy, EZ-Gimpy, Gimpy-R, Bongo, Pix, Eco
- PARC: BaffleText
- UCB and PARC: PessimalPrint
- Microsoft: ARTiFACIAL
- Bell Labs: Reverse Turing test using speech
- GIT: Character morphing
7CAPTCHA Tests
AltaVista URL filter uses isolated random
characters and digits on a cluttered background.
PessimalPrint uses a degradation model simulating
physical defects caused by copying and scanning
of printed text.
BaffleText uses pronounceable character strings
that are not in the English dictionary, renders
each string into an image using a font (without
physics-based degradations), and then generates
a mask image (shown on the slide).
8CAPTCHA Tests
EZ-Gimpy uses real English words.
Gimpy asks the user to type 3 different English
words appearing in the picture.
Gimpy-R uses nonsense words.
Character morphing is an algorithm that
transforms a string into its graphical form.
9Why Handwritten CAPTCHA?
- No handwritten text based CAPTCHA exists so far!
- Several machine printed text based CAPTCHAs are already broken
- Greg Mori and Jitendra Malik of UCB have written a program that can solve EZ-Gimpy with 83% accuracy
- Thayananthan, Stenger, Torr, and Cipolla of the Cambridge vision group have written a program that achieves a 93% correct recognition rate against EZ-Gimpy
- Gabriel Moy, Nathan Jones, Curt Harkless, and Randy Potter of Areté Associates have written a program that achieves 78% accuracy against Gimpy-R
- Machine recognition of handwriting is more difficult than recognition of printed text
- Handwriting recognition is a task that humans perform easily and reliably
- Research is in the early stages: a promising field
- Handwritten CAPTCHAs will challenge the KBCS community!
10State-of-the-art
              Lexicon Driven                       Grapheme Model
Lexicon size  time (secs)  Top 1 (%)  Top 2 (%)    time (secs)  Top 1 (%)  Top 2 (%)
10            0.027        96.53      98.73        0.021        96.56      98.77
100           0.044        89.22      94.13        0.031        89.12      94.06
1000          0.144        75.38      86.29        0.089        75.38      86.29
20000         1.827        58.14      66.56        0.994        58.14      66.49
- Speed and accuracy of a handwriting recognizer (HR). Feature extraction
time is excluded. Testing platform is an
Ultra-SPARC.
11Source of Errors for HW Recognizers
- Image quality: background noise, printing surface, writing styles
- Image features: variable stroke width, slope, rotations, stretching, compressing
- Segmentation errors: over-segmentation, merging, fragmentation, ligatures, scrawls
- Recognition errors: confusion with similar lexicon entries, large lexicons
12Creating H-CAPTCHAS
- Use handwritten word images that current recognizers cannot read
- Controlled distortion of existing handwritten word images
- Create handwritten images by concatenating handwritten character images
- Use handwritten US city name images (4,000 from CEDAR CDROM)
- Character images were discretely printed to begin with
- Character images are automatically segmented out of handwritten word images
- Use a set of 20,000 handwritten character images (extracted by program)
- Synthesize sentence images by gluing together isolated upper and lower case handwritten characters or word images
13H-CAPTCHA Generation Algorithm
- Input.
- Original (random) handwritten image (an existing US city name image, a synthetic word image of length 5 to 8 characters, or a meaningful sentence).
- Lexicon containing the image's truth word.
- Output.
- H-CAPTCHA image.
- Method.
- Randomly choose a number of transformations.
- Randomly select that many transformations from: adding lines, circles, grids, arcs, or background noise (multiplicative or impulse); random convolution masks; blur, wave, spread, and median filters; thickening or thinning characters in the vertical or horizontal direction; etc.
- An a priori order is assigned to each transformation based on experimental results. Sort the list of chosen transformations by priority and apply them in sequence, so that the effect is cumulative.
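The method above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are plain lists of 0/1 rows, only three toy transformations are included, and the priority values in `TRANSFORMS` are placeholders rather than the experimentally derived order mentioned on the slide.

```python
import random


# Each transformation maps a binary image (list of rows of 0/1) to a new image.
def add_line(img, rng):
    out = [row[:] for row in img]
    out[rng.randrange(len(out))] = [1] * len(out[0])   # horizontal stroke
    return out


def impulse_noise(img, rng, p=0.05):
    # Flip each pixel independently with probability p.
    return [[1 - px if rng.random() < p else px for px in row] for row in img]


def thicken(img, rng):
    # Horizontal dilation: a pixel is ink if it or a horizontal neighbour is ink.
    w = len(img[0])
    return [[max(row[max(x - 1, 0):min(x + 2, w)]) for x in range(w)] for row in img]


# name -> (priority, function); lower priority values are applied first.
# The priorities are placeholders chosen for this sketch.
TRANSFORMS = {
    "thicken": (1, thicken),
    "add_line": (2, add_line),
    "impulse_noise": (3, impulse_noise),
}


def generate_hcaptcha(img, rng):
    k = rng.randint(1, len(TRANSFORMS))                 # random number of transformations
    chosen = rng.sample(sorted(TRANSFORMS), k)          # random choice of which ones
    chosen.sort(key=lambda name: TRANSFORMS[name][0])   # a priori priority order
    for name in chosen:                                 # cumulative application
        img = TRANSFORMS[name][1](img, rng)
    return img, chosen
```

Passing an explicit `random.Random` instance keeps the sketch reproducible in tests; a deployed generator would presumably draw from an unpredictable source instead.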
14Handwritten text images
- Examples of handwritten characters used to
generate random words.
Examples of handwritten US city name images used
as a base for transformations.
Examples of synthetic handwritten sentence images.
15H-CAPTCHA by Image Quality Transforms
Add lines, grids, arcs, background noise,
convolution masks and special filters
16H-CAPTCHA by Image Features Transforms
Variable stroke width, slope, rotations,
stretching, compressing
17H-CAPTCHA by Segmentation Transform
Delete ligatures, use touching letters/digits,
merge characters to cause over-segmentation or to
make segmentation impossible
18H-CAPTCHA by Lexicon Transform
- Lexicon challenges: size, density, availability
19H-CAPTCHA Evaluation
- No risk of image repetition
- Image generation is completely automated: words, images, and distortions are chosen at random
- The transformed images cannot be easily normalized or rendered noise-free by present computer programs, even though the original images must be public knowledge
- Deformed images do not pose problems to humans
- Human subjects succeeded on our test images
- Tested against the state-of-the-art recognizers WMR and Accuscript
- CAPTCHAs unbroken by CEDAR recognizers
20H-CAPTCHAs
- Handwritten US city name images that defeat both
WMR and Accuscript recognizers.
21H-CAPTCHA Challenge
Word Recognizer   Number of Recognized Images   Accuracy (%)
WMR               383                           9.28
Accuscript        182                           4.41
Low accuracy of handwriting recognizers. The
lexicons are created so as to contain all the
truths of the test images. The total number of
tested images is 4,127 (and so is the lexicon size).
Number of Students   Number of Test Images   Human Accuracy (%)   WMR Accuracy (%)   Accuscript Accuracy (%)
12                   15                      82                   0                  0
Low accuracy of handwriting recognizers vs.
humans on a subset of test images.
22CAPTCHA using Gestalt Psychology
- Gestalt psychology is based on the observation that we often experience things that are not a part of our simple sensations
- What we are seeing is an effect of the whole event, not contained in the sum of the parts (holistic approach)
- Organizing principles: Gestalt laws
- law of closure
- law of similarity
- law of proximity
- law of symmetry
- law of continuity
- law of familiarity
- figure and ground
- Not restricted to perception
- memory
OXXXXXX
XOXXXXX
XXOXXXX
XXXOXXX
XXXXOXX
XXXXXOX
XXXXXXO
23H-CAPTCHA based on Gestalt Laws
Gestalt laws: law of proximity, symmetry,
familiarity, continuity
Methods: create horizontal or vertical overlaps
- for same words: smaller-distance overlaps
- for different words: bigger-distance overlaps
24H-CAPTCHA based on Gestalt Laws
Gestalt laws: law of closure, proximity,
continuity
Methods: create occlusions by circles,
rectangles, lines with random angles
25H-CAPTCHA based on Gestalt Laws
Gestalt laws: law of closure, proximity,
continuity
Methods: add occlusions by waves running from left
to right across the entire image, with various
amplitudes and wavelengths, or rotate the waves by an angle
26H-CAPTCHA based on Gestalt Laws
Gestalt laws: law of closure, proximity,
continuity, background
Methods: use empty letters, broken letters, edgy
contour, fragmentation
27H-CAPTCHA based on Gestalt Laws
Gestalt laws: memory, internal metrics,
familiarity of letters
- vertical mirror: difficult for humans
- horizontal mirror: difficult for humans
- flip-flop: OK for humans!
Methods: change word orientation entirely, or the
orientation of a few letters only
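The three orientation changes can be illustrated on an image stored as a list of pixel rows. The function names are hypothetical, and two readings are assumed here rather than stated on the slide: "vertical mirror" is taken as a top-to-bottom reflection, and "flip-flop" as both mirrors combined (equivalent to a 180-degree rotation).

```python
def vertical_mirror(img):
    # Top-to-bottom reflection: reverse the order of the rows.
    return [row[:] for row in reversed(img)]


def horizontal_mirror(img):
    # Left-to-right reflection: reverse each row.
    return [row[::-1] for row in img]


def flip_flop(img):
    # Both mirrors together, i.e. a 180-degree rotation (assumed meaning).
    return horizontal_mirror(vertical_mirror(img))
```

Under this reading, a flip-flopped word is simply upside down, which would fit the slide's observation that humans can still read it.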
28Gestalt H-CAPTCHA Results
Transformation               WMR (%)   Accuscript (%)
Horizontal Overlap (Small)   24.35     2.93
Horizontal Overlap (Large)   12.93     2.42
Vertical Overlap             27.88     12.64
Occlusion by Waves           15.43     10.56
Occlusion by Circles         35.93     32.34
Empty Letters                0.89      0.06
Less Fragmentation           0         0.18
More Fragmentation           0.48      0
Old Transforms               9.28      4.41
Recognition accuracy by transformation type. The
number of tested images is 4,127 for each type of
transformation.
29Future Work
Personalizing Email Addresses
- Creates transformed alias e-mail addresses to
prevent mining by software agents
30Future Work
Adult vs. Child vs. Machine
- A few methods to differentiate between adult and child
- Asking a question whose answer is in the handwritten sentence
- Giving an incomplete handwritten sentence and asking the user to infer the missing word
- Comparing the handwritten text with a standard word list
- Using longer, more complicated handwritten sentences on advanced topics from technical fields such as math, physics, or finance
- Useful for Internet services, given the spread of websites harmful to minors
- Delimiting reading abilities
- Machine vs. 1st grade child
- Adult vs. 7th grade child
31Future Work
- HCAPTCHA based on Handwritten Sentence Reading and Understanding
- Incorporate and adjust the image complexity factor as a parameter of error
- Try out more image transformations and compare results against human performance
- Cognitive aspects of HCAPTCHA for adult vs. child protocol
- HCAPTCHA as a Challenge Response Protocol for Security Systems
- Online-Handwriting CAPTCHA
- HCAPTCHA as a Biometric?
- HCAPTCHA normalization concerns based on future technology development
32 33Handwritten CAPTCHA Applications
- Wide variety of web applications
- Suppressing spam and worms
- Only accept an email if I know there is a human behind the other computer.
- Prove you are human before you can get a free email account.
- Search engine bots
- There is an HTML tag to prevent search engine bots from reading web pages, but it only serves to say "no bots, please"; it does not guarantee that bots won't enter a web site.
- Thwarting password guessing
- Prevent a computer from being able to iterate through the entire space of passwords.
- Blocking denial-of-service attacks
- Prevent congestion-based DoS attacks from denying any users access to the web servers targeted by those attacks.
34Handwritten CAPTCHA Applications
- Preventing ballot stuffing
- Can the result of any online poll be trusted? Not unless the poll requires that only humans can vote.
- Protecting databases
- E.g. eBay protecting its data from auction portals that search across auction sites to provide listings and price information for their users, while prohibiting copying of that data.
- Email address personalization
- You will only be able to read the address and send the email if you are a human.
35CAPTCHA Tests
- PIX
- Uses a large database of labeled images. All of
these images are pictures of concrete objects (a
horse, a table, a house, a flower, etc). In our
example an egg. The program picks an object at
random, finds 4 random images of that object from
its database, distorts them at random, presents
them to the user and then asks the question "what
are these pictures of?"
- ECO
- Sounds can be thought of as a sound version
of Gimpy. The program picks a word or a sequence
of numbers at random, renders the word or the
numbers into a sound clip and distorts the clip.
It then presents the distorted sound clip to its
user and asks the user to type in the contents of
the sound clip.
36CAPTCHA Tests
- ARTiFACIAL
- For each user request, it automatically
synthesizes an image with a distorted face
embedded in a cluttered background. The user is
asked to first find the face and then click on 6
points (4 eye corners and 2 mouth corners) on the
face.
37Power of Context
Context
Ranked Lexicon
38Lexicon Driven Model
Distance between the first character (w) of the
lexicon entry "word" and the image between
- segments 1 and 4 is 5.0
- segments 1 and 3 is 7.2
- segments 1 and 2 is 7.6
Find the best way of accounting for the characters
w, o, r, d by consuming all segments 1
to 8 in the process
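The search described above is naturally expressed as dynamic programming over (character index, segments consumed). The sketch below is one way to do it, not the recognizer's actual code: `match_word`, `max_span`, and the `dist(char, first, last)` callback are hypothetical names, and only the three distances for w come from the slide; the rest of the table is invented for illustration.

```python
from functools import lru_cache


def match_word(word, n_segments, dist, max_span=4):
    """Minimum total distance when each character of `word` consumes a run of
    1..max_span consecutive segments and all segments are used."""

    @lru_cache(maxsize=None)
    def best(c, s):
        # Best cost of matching word[c:] against segments s+1 .. n_segments.
        if c == len(word):
            return 0.0 if s == n_segments else float("inf")
        cost = float("inf")
        for e in range(s + 1, min(s + max_span, n_segments) + 1):
            cost = min(cost, dist(word[c], s + 1, e) + best(c + 1, e))
        return cost

    return best(0, 0)


# (character, first segment, last segment) -> distance
table = {
    ("w", 1, 4): 5.0, ("w", 1, 3): 7.2, ("w", 1, 2): 7.6,   # values from the slide
    ("o", 5, 5): 2.0, ("o", 4, 5): 6.0,                     # invented for illustration
    ("r", 6, 6): 3.0, ("d", 7, 8): 1.5,                     # invented for illustration
}


def example_dist(ch, first, last):
    # Unlisted combinations get a large penalty.
    return table.get((ch, first, last), 99.0)


total = match_word("word", 8, example_dist)   # w:1-4, o:5, r:6, d:7-8
```

Running the same function for every lexicon entry and ranking the totals yields a ranked lexicon, which is why recognition time grows with lexicon size.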
39Lexicon Free Model
- Image from segment 1 to 3 is a u with 0.5 confidence
- Image from segment 1 to 4 is a w with 0.7 confidence
- Image from segment 1 to 5 is a w with 0.6 confidence and an m with 0.3 confidence
Graph edge labels (character and confidence): w.6, m.3; w.7; d.8; o.5; u.5, v.2; i.8, l.8; i.7; r.4; u.3; m.2; m.1
Find the best path in the graph from segment 1 to 8: w
o r d
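Finding the best path through such a segmentation graph can be sketched as a single pass over the nodes in left-to-right order. Several things here are assumptions: nodes are segment boundaries 0..8, a path is scored by the product of its edge confidences, and the edge endpoints in `edges` are invented (the slide's figure is not recoverable); only the confidence values are taken from the slide.

```python
def best_path(n_segments, edges):
    """edges: list of (start, end, char, confidence) with start < end; an edge
    covers segments start+1 .. end. Returns (score, string) for the path from
    node 0 to node n_segments with the highest product of confidences,
    or None if no such path exists."""
    best = {0: (1.0, "")}
    for node in range(1, n_segments + 1):
        candidates = [
            (best[a][0] * conf, best[a][1] + ch)
            for a, b, ch, conf in edges
            if b == node and a in best
        ]
        if candidates:
            best[node] = max(candidates)
    return best.get(n_segments)


# Hypothetical graph layout; confidences reuse values listed on the slide.
edges = [
    (0, 3, "u", 0.5), (0, 3, "v", 0.2),
    (0, 4, "w", 0.7),
    (0, 5, "w", 0.6), (0, 5, "m", 0.3),
    (3, 4, "i", 0.7),
    (4, 6, "o", 0.5),
    (5, 6, "m", 0.1),
    (6, 7, "r", 0.4),
    (6, 8, "u", 0.3),
    (7, 8, "d", 0.8),
]

score, text = best_path(8, edges)
```

Unlike the lexicon-driven model, nothing here constrains the output to be a dictionary word; with a different layout the best-scoring path could spell a non-word.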
40Grapheme Model
Structural features labeled in the figure: loops, ends, junction, turns
41Matching - Structural Features
Statistical analysis of the feature attributes
42Hidden Markov Models
- The occurrence of the structural features can be modeled as an HMM
- The HMM can be converted to an SFSA by assigning observations and probabilities to the transitions instead of to the states
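One standard way to move observations from states to transitions is sketched below; it is not necessarily the construction used by the authors. Each arc i -> j labeled with symbol o receives probability A[i][j] * B[j][o], and a new "start" state absorbs the initial distribution. The function names and the toy parameters (two states emitting "loop" and "turn") are hypothetical.

```python
def hmm_to_sfsa(pi, A, B):
    """Convert a state-emitting HMM into a transition-emitting automaton.

    pi[j]: initial probability, A[i][j]: transition probability,
    B[j][o]: probability that state j emits symbol o.
    Returns arcs[(src, symbol)] -> list of (dst, prob); "start" is a new
    initial state.
    """
    n = len(pi)
    arcs = {}
    for j in range(n):
        for o, b in B[j].items():
            arcs.setdefault(("start", o), []).append((j, pi[j] * b))
            for i in range(n):
                arcs.setdefault((i, o), []).append((j, A[i][j] * b))
    return arcs


def hmm_likelihood(pi, A, B, obs):
    # Forward algorithm on the HMM (obs must be non-empty).
    n = len(pi)
    alpha = [pi[j] * B[j].get(obs[0], 0.0) for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j].get(o, 0.0)
                 for j in range(n)]
    return sum(alpha)


def sfsa_likelihood(arcs, obs):
    # Forward pass over the automaton: probabilities live on the arcs only.
    weight = {"start": 1.0}
    for o in obs:
        nxt = {}
        for src, w in weight.items():
            for dst, p in arcs.get((src, o), []):
                nxt[dst] = nxt.get(dst, 0.0) + w * p
        weight = nxt
    return sum(weight.values())
```

Both models assign the same probability to any observation sequence, which is a quick sanity check that the conversion preserves the distribution.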
43Law of closure
If something is missing in an otherwise complete
figure, we will tend to add it (e.g. a triangle
with a small part of its edge missing will still
be seen as a triangle). We will close the gap.
A set of dots outlining the shape of a B is
likely to be perceived as a B, not as a set of
dots. We tend to complete the figure, make it
the way it should be, finish it.
44Law of similarity
OXXXXXXXXXX
XOXXXXXXXXX
XXOXXXXXXXX
XXXOXXXXXXX
XXXXOXXXXXX
XXXXXOXXXXX
XXXXXXOXXXX
XXXXXXXOXXX
XXXXXXXXOXX
XXXXXXXXXOX
XXXXXXXXXXO
We tend to group similar items together, to see
them as forming a larger form. It is just natural
for us to see the O's as a line within a field of
X's.
45Law of proximity
Things that are close together are seen as
belonging together. You are much more likely to
see three lines of close-together symbols than 14
vertical collections of 3 symbols each.
46Law of symmetry
Despite the pressure of proximity to group the
brackets nearest each other together, symmetry
overwhelms our perception and makes us see them
as pairs of symmetrical brackets.
47Law of continuity
We can see a line, for example, as continuing
through another line, rather than stopping and
starting, as in this example, which we see as
composed of two lines, not as a combination of
two angles.
- Ambiguous segmentation
- Segmentation based on good continuity, which follows the path of minimal curvature change
- Perceptually implausible segmentation
48Law of familiarity
The elements are grouped together if we are used
to seeing them together, e.g. we are used to
seeing rectangles and squares rather than the
shape in (c).
- Ambiguous segmentation
- Perceptual segmentation
- Segmentation based on good continuity proves to
be erroneous
49Figure and ground
We seem to have an innate tendency to perceive
one aspect of an event as the figure or
foreground and the other as the ground or
background. There is only one image here, and
yet, by changing nothing but our attitude, we can
see two different things. It doesn't even seem
to be possible to see them both at the same time!
50Memory
- If you see an irregular figure, it is likely that your memory will straighten it out for you a bit.
- Or, if you experience something that doesn't quite make sense to you, you will tend to remember it as having meaning that may not have been there. Dreams are a good example: watch yourself the next time you tell someone a dream and see if you don't notice yourself modifying the dream a little to force it to make sense!
- The world is an outside iconic memory with internal metric relations.
After flip-flop (vertical mirror / horizontal
mirror)