Title: Image Understanding
1 Image Understanding & Web Security
- Henry Baird
- Joint work with
  - Richard Fateman, Allison Coates, Kris Popat,
    Monica Chew, Tom Breuel, Mark Luk
2 A fast-emerging research topic
- Human Interactive Proofs (definition later)
  - first instance in 1999
  - research took hold first in the CS security theory field
  - intersects image understanding, cog sci, etc.
  - fast attracting researchers, engineers, users
- This talk
  - A brief history of HIPs
  - Professional activities so far, incl. the 1st Intl Workshop
  - Existing systems, w/ my critiques
  - Next steps for the field
  - In detail: PARC's PessimalPrint & BaffleText
H. Baird & K. Popat, "Web Security & Document
Image Analysis," in J. Hu & A. Antonacopoulos
(Eds.), Web Document Analysis, World Scientific,
2003 (in press).
3 Early rumblings
- 1990s: spammers trolling for email addresses
  - in defense, people disguise them, e.g.
    - baird at parc dot com
- 1997: abuse of the Add-URL feature at AltaVista
  - some wrote programs to add their URL many times
  - this skewed the popularity rankings
- Andrei Broder et al. (then at DEC SRC)
  - a user action which is legitimate when performed once
    becomes abusive when repeated many times
  - no effective legal recourse
  - how to block or slow down these programs?
4 The first known instance: AltaVista's Add-URL filter
An image of text, not ASCII
- 1999: "ransom note" filter
  - randomly pick letters, fonts, rotations; render as an image
  - every user required to read it and type it in correctly
  - reduced spam Add-URL submissions by over 95%
- Weaknesses: isolated chars, filterable noise, affine deformations
M. D. Lillibridge, M. Abadi, K. Bharat, & A. Z.
Broder, "Method for Selectively Restricting
Access to Computer Systems," U.S. Patent No.
6,195,698, issued February 27, 2001.
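A minimal sketch of the "ransom note" idea, for illustration only. It picks a random letter, font, and rotation per character and grades the typed response; it does not render an image. The font names, the rotation limit, and the function names `make_challenge` and `grade` are assumptions, not details of the AltaVista filter.

```python
import random
import string

# Hypothetical font names; the real filter's font set is not given in the talk.
FONTS = ["Times", "Courier", "Helvetica"]


def make_challenge(length=6, max_rotation_deg=30, rng=None):
    """Pick a random letter, font, and rotation for each character.

    Returns (glyphs, answer). A renderer (not shown) would draw each
    glyph into one image; only the image is sent to the user.
    """
    rng = rng or random.Random()
    glyphs = []
    for _ in range(length):
        glyphs.append({
            "char": rng.choice(string.ascii_uppercase),
            "font": rng.choice(FONTS),
            "rotation_deg": rng.uniform(-max_rotation_deg, max_rotation_deg),
        })
    answer = "".join(g["char"] for g in glyphs)
    return glyphs, answer


def grade(answer, response):
    """Accept the response if it matches the answer, ignoring case and
    surrounding whitespace (an assumed leniency)."""
    return response.strip().upper() == answer
```

Because every character is drawn independently, a scheme like this presents isolated characters to an attacker, which is one of the weaknesses the slide lists.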
5 Yahoo!'s Chat Room Problem
- September 2000
  - Udi Manber asked Prof. Manuel Blum's group at CMU-SCS:
  - programs impersonate people in chat rooms,
  - then hand out ads (ugh!)
  - how can all machines be denied access to a Web site
    without inconveniencing any human users?
- I.e., how to distinguish between machines and people on-line
  - some variation on Turing tests!
6 Alan Turing (1912-1954)
- 1936: a universal model of computation
- 1940s: helped break the Enigma (U-boat) cipher
- 1949: first serious uses of a working computer
  - including plans to read printed text
  - (he expected it would be easy)
- 1950: proposed a strong test for machine intelligence
7 Turing Tests
- How to judge that a machine can think
  - play an imitation game conducted via teletypes
  - a human judge & two invisible interlocutors
    - a human
    - a machine pretending to be human
  - after asking any questions (challenges) he/she wishes,
    the judge decides which is human
  - failure to decide correctly would be convincing
    evidence of machine intelligence (Turing asserted)
- Modern GUIs invite richer challenges than teletypes.
A. Turing, "Computing Machinery and Intelligence,"
Mind, Vol. 59(236), 1950.
8 CAPTCHAs: Completely Automated Public Turing
Tests to Tell Computers and Humans Apart
(M. Blum, L. A. von Ahn, J. Langford, et al., CMU
SCS)
- challenges can be generated & graded automatically
  - (i.e. the judge is a machine)
- accepts virtually all humans, quickly & easily
- rejects virtually all machines
- resists automatic attack for many years
  - (even assuming that its algorithms are known?)
- NOTE: the machine administers, but cannot pass, the test!
L. von Ahn, M. Blum, N. J. Hopper, & J. Langford,
"CAPTCHA: Using Hard AI Problems for Security,"
Proc., EuroCrypt 2003, Warsaw, Poland, May 4-8,
2003 (to appear).
9 CMU's Gimpy CAPTCHA
- Randomly pick
  - English words, deformations, occlusions, backgrounds, etc.
- Challenge user to type in any three of the words
- Designed by CMU team; tried out by Yahoo!
- Problem: users hated it, so it was withdrawn
L. von Ahn, M. Blum, N. J. Hopper, & J. Langford,
The CAPTCHA Web Page, http://www.captcha.net.
10 Yahoo!'s present CAPTCHA: EZ-Gimpy
- Randomly pick
  - one English word, deformations, degradations, occlusions,
    colored backgrounds
- Better tolerated by users
- Now used on a large scale to protect various services
- Weaknesses: a single typeface, English lexicon
11 PayPal's CAPTCHA
- Nothing published
- Seems to use one typeface
- Picks, at random
  - letters, overlain pattern
- Weaknesses: single typeface, simple grid,
  no image degradations, letters spaced apart
12 Cropping up everywhere
- In use today, defending against
  - skewing search-engine rankings (AltaVista, 1999)
  - infesting chat rooms, etc. (Yahoo!, 2000)
  - gaming financial accounts (PayPal, 2001)
  - robot spamming (SpamArrest, MailBlock, 2002)
  - also Overture, Chinese website, CD-rebate, TicketMaster
  - have you seen others?
- Coming up over the horizon: they can discourage
  - password guessing
  - denial-of-service attacks
  - ballot stuffing
  - many others
- Similar problems w/ scrapers; also, likely on Intranets.
D. P. Baron, "eBay and Database Protection," Case
No. P-33, Case Writing Office, Stanford Graduate
School of Business, Stanford Univ., 2001.
13 The Known Limits of Image Understanding Technology
- There remains a large gap in ability
  - between human and machine vision systems,
  - even in reading printed text
- The performance of OCR machines has been systematically studied
  - 7-year-olds can consistently do better!
- Researchers have developed
  - stochastic models of document image degradation
  - so we can generate challenging word images pseudorandomly
S. Rice, G. Nagy, & T. Nartker, OCR: An Illustrated
Guide to the Frontier, Kluwer Academic
Publishers, 1999.
H. Baird, "Document Image Defect Models," in H.
Baird, H. Bunke, & K. Yamamoto (Eds.), Structured
Document Image Analysis, Springer-Verlag, New
York, 1992.
14 Can You Read These Degraded Images?
Of course you can ... but OCR machines cannot!
15 Experiments by PARC & UCB-CS
- Pick words at random
  - 70 words commonly used on the Web
  - w/out ascenders or descenders (cf. Spitz)
- Vary physics-based image degradation parameters
  - blur, threshold, x-scale, within certain ranges
- Pick fonts at random from a large set
  - Times Roman (TR), Times Italic (TI),
  - Palatino Roman (PR), Palatino Italic (PI),
  - Courier Roman (CR), Courier Oblique (CO), etc.
- Test legibility on
  - ten human volunteers (UC Berkeley CS Dept grad students)
  - three OCR machines
    - Expervision TR (E), ABBYY FineReader (A), IRIS Reader (I)
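To make the blur and threshold parameters concrete, here is a rough sketch of a blur-then-threshold degradation on a tiny ink image. It is not the degradation model used in the experiments. The Gaussian blur, the ink convention (1.0 is black, 0.0 is white), the rule that a pixel turns black when its blurred ink reaches the threshold, and all function names are assumptions. The x-scale parameter is left out.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian weights, normalised to sum to 1 (identity if sigma <= 0)."""
    if sigma <= 0:
        return [1.0]
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping at the image border."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        new_row = []
        for x in range(n):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = min(max(x + k - radius, 0), n - 1)
                acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def degrade(image, blur_sigma, threshold):
    """Blur an ink image in x and y, then binarise it (1 = black)."""
    kernel = gaussian_kernel(blur_sigma)
    blurred = transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))
    return [[1 if v >= threshold else 0 for v in row] for row in blurred]
```

Under these conventions, a low threshold after blurring thickens strokes, while a high threshold erodes them or erases them entirely.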
16 Results: OCR Accuracy, by machine
Each machine has its peculiar blind spots
17 OCR Accuracy: varying blur & threshold
They share some blind spots
18 PessimalPrint: exploiting image degradations
- Three OCR machines fail when
  - blur 0.0 and threshold 0.02 - 0.08
  - threshold 0.02 and any value of blur
[Figure: OCR outputs]
but people find these easy to read
A. Coates, H. Baird, & R. Fateman, "PessimalPrint:
A Reverse Turing Test," Proc., 6th IAPR Intl
Conf. on Document Analysis & Recognition (ICDAR'01),
Seattle, WA, Sep. 10-13, 2001.
19 Jan 2002: High Time for a Workshop!
- Manuel Blum proposes it, rounds up some key speakers
- Henry Baird offers PARC as venue; Kris Popat helps run it
- Goals
  - Invite known principals: theory, systems, engineers, users
  - Describe the state of the art
  - Plan next steps for the field
- Organization
  - 30 attendees
  - abstracts only, 1-5 pages, no refereeing, no archival publication
  - 100% participation: everyone gives a (short) talk
  - mixing it up: panel & working group discussions
  - 2-1/2 days, lots of breaks for informal socializing
  - plenary talk by John McCarthy, "Father of AI"
20 NSF 1st Intl HIP Workshop, Jan 9-11, 2002, Palo Alto, CA
21 HIP2002 Participants
- CMU - SCS, Aladdin Center
  - Manuel Blum, Lenore Blum, Luis von Ahn, John
    Langford, Guy Blelloch, Nick Hopper, Ke Yang,
    Brighten Godfrey, Bartosz Przydatek, Rachel Rue
- PARC - SPIA/Security/Theory
  - Henry Baird, Kris Popat, Tom Breuel, Prateek
    Sarkar, Tom Berson, Dirk Balfanz, David Goldberg
- UCB - CS & SIMS
  - Richard Fateman, Allison Coates, Jitendra Malik,
    Doug Tygar, Alma Whitten, Rachna Dhamija, Monica
    Chew, Adrian Perrig, Dawn Song
- RPI
  - George Nagy
- Stanford
  - John McCarthy
- NSF
  - Robert Sloan
- AltaVista
  - Andrei Broder
- Yahoo!
  - Udi Manber
- Bell Labs
  - Dan Lopresti
- IBM T.J. Watson
  - Charles Bennett
- InterTrust Star Labs
  - Stuart Haber
- City Univ. of Hong Kong
  - Nancy Chan
- Weizmann Institute
  - Moni Naor
- RSA Security Laboratories
  - Ari Juels
- Document Recognition Techs, Inc
  - Larry Spitz
22 Variations & Generalizations
- CAPTCHA
  - Completely Automated Public Turing test to tell
    Computers and Humans Apart
- HUMANOID
  - Text-based dialogue which an individual can use
    to authenticate that he/she is himself/herself
    ("naked in a glass bubble")
- PHONOID
  - Individual authentication using spoken language
- Human Interactive Proof (HIP)
  - An automatically administered challenge/response protocol
    allowing a person to authenticate him/herself
    as belonging to a certain group over a network,
    without the burden of passwords, biometrics,
    mechanical aids, or special training.
23 Highlights
- Theory
  - some text-based CAPTCHAs are provably breakable
- Ability Gaps
  - vision: gestalt, segmentation, noise immunity, style consistency
  - speech: noise of many kinds, clutter (cocktail party effect)
  - intelligence: puzzles, analogical reasoning, weak logic
  - gestures, reflexes, common knowledge, ...
- Applications
  - subtle system-level vulnerabilities
  - aggressive arms race with shadowy enemies
24 Funding & Partnerships
- NSF
  - Robert Sloan, Dir., Theory of Computing Pgm
  - strongly supportive of this newborn field
  - encouraged grant proposals
- Yahoo!
  - willing to run field trials
  - user acceptance laboratory
  - able to detect intrusion
25 Disciplines
- Participating
  - Cryptography
  - Security
  - Document Image Analysis
  - Computer Vision
  - Artificial Intelligence
- Needed
  - Cognitive Science
  - Psychophysics (esp. of Reading)
  - Biometrics
  - eCommerce, Business
  - ...?
26 Weaknesses of Existing Reading-Based CAPTCHAs
- English lexicon is too predictable
  - dictionaries are too small
  - only 1.2 bits of entropy per character (cf. Shannon)
- Physics-based image degradations are vulnerable
  - to well-studied image restoration attacks, e.g.
    - ?
- Complex images irritate people
  - even when they can read them
  - need user-tolerance experiments
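A small worked calculation shows why 1.2 bits per character is weak. Multiplying the slide's per-character figure by word length is a simplification, and the six-letter word length and the helper name `guess_space_bits` are assumptions chosen for illustration.

```python
import math


def guess_space_bits(n_chars, bits_per_char):
    """Total entropy, in bits, of an n-character challenge string."""
    return n_chars * bits_per_char


WORD_LEN = 6  # assumed challenge length

# English text at the slide's 1.2 bits per character.
english_bits = guess_space_bits(WORD_LEN, 1.2)

# Letters drawn uniformly and independently from a 26-letter alphabet.
random_bits = guess_space_bits(WORD_LEN, math.log2(26))

print(f"English word:   {english_bits:.1f} bits, "
      f"about {2 ** english_bits:.0f} equally likely guesses")
print(f"Random letters: {random_bits:.1f} bits, "
      f"about {2 ** random_bits:.0f} equally likely guesses")
```

On this rough accounting, a six-letter English word is worth about 7 bits, against about 28 bits for six random letters. That is the gap an attacker with a lexicon can exploit.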
27 Strengths of Human Reading
- Literature on the psychophysics of reading is relevant
  - familiarity helps, e.g. English words
  - optimal word-image size (subtended angle)
    is known (0.3-2 degrees)
  - optimal contrast conditions known
  - other factors measured for the best performance
    - to achieve and sustain critical reading speed
- BUT gives no answer to
  - where's the optimal comfort zone?
G. E. Legge, D. G. Pelli, G. S. Rubin, & M. M.
Schleske, "Psychophysics of Reading: I. Normal
Vision," Vision Research 25(2), 1985.
J. Grainger & J. Segui, "Neighborhood Frequency
Effects in Visual Word Recognition," Perception &
Psychophysics 47, 1990.
28 Designing a Stronger CAPTCHA: BaffleText principles
- Nonsense words.
  - generate pronounceable, not spellable, words
  - using a variable-length character n-gram Markov model
  - they look familiar, but aren't in any lexicon, e.g.
    - ablithan wouquire quasis
- Gestalt perception.
  - force inference of a whole word-image
  - from fragmentary or occluded characters
  - using a single familiar typeface also helps people
M. Chew & H. S. Baird, "BaffleText: A Human
Interactive Proof," Proc., SPIE/IS&T Conf. on
Document Recognition & Retrieval X, Santa Clara,
CA, January 23-24, 2003.
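The nonsense-word idea can be sketched with a character Markov model. This is a simplified, fixed-order (order-2) version, not the variable-length n-gram model named on the slide. The toy training list and the names `train`, `generate`, and `nonsense_word` are assumptions for illustration.

```python
import random
from collections import defaultdict

# Toy training lexicon (an assumption); a real system would use a large one.
LEXICON = ["quasar", "require", "within", "about", "blithe",
           "than", "would", "basis", "acquire", "northern"]


def train(words, order=2):
    """Map each `order`-character context to the characters that follow it."""
    model = defaultdict(list)
    for w in words:
        padded = "^" * order + w + "$"   # ^ marks the start, $ marks the end
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def generate(model, order=2, rng=None, max_len=12):
    """Walk the chain from the start context until the end marker or max_len."""
    rng = rng or random.Random()
    context, out = "^" * order, []
    while len(out) < max_len:
        nxt = rng.choice(model[context])
        if nxt == "$":
            break
        out.append(nxt)
        context = (context + nxt)[-order:]
    return "".join(out)


def nonsense_word(model, lexicon, order=2, rng=None, min_len=5, max_tries=1000):
    """Return a generated word that is absent from the lexicon, or None."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        w = generate(model, order, rng)
        if len(w) >= min_len and w not in lexicon:
            return w
    return None
```

Rejecting any output that appears in the lexicon keeps the challenge out of reach of a dictionary attack. The letter statistics it borrows from real words are what make it look familiar to a human reader.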
29 Mask Degradations
- Parameters of pseudorandom mask generator
  - shape type: square, circle, ellipse, mixed
  - density: black-area / whole-area
  - range of radii of shapes
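A hedged sketch of a pseudorandom mask generator driven by the three listed parameters. It scatters shapes until the black fraction reaches the requested density. The stopping rule, the omission of ellipses, and the name `make_mask` are assumptions, and how the mask is then combined with a word image is not shown.

```python
import random


def make_mask(width, height, density, radius_range, shape="circle", rng=None):
    """Return a height x width grid of 0/1 with black fraction >= density.

    shape is "circle", "square", or "mixed"; radius_range is an inclusive
    (min, max) pair of integer radii in pixels.
    """
    if not 0 <= density < 1:
        raise ValueError("density must be in [0, 1)")
    rng = rng or random.Random()
    mask = [[0] * width for _ in range(height)]
    black = 0
    target = density * width * height
    while black < target:
        kind = rng.choice(["circle", "square"]) if shape == "mixed" else shape
        r = rng.randint(*radius_range)
        cx, cy = rng.randrange(width), rng.randrange(height)
        # Paint the shape, clipped to the image, counting newly black pixels.
        for y in range(max(0, cy - r), min(height, cy + r + 1)):
            for x in range(max(0, cx - r), min(width, cx + r + 1)):
                inside = (kind == "square"
                          or (x - cx) ** 2 + (y - cy) ** 2 <= r * r)
                if inside and not mask[y][x]:
                    mask[y][x] = 1
                    black += 1
    return mask
```

For example, `make_mask(40, 20, 0.25, (1, 3), "mixed")` yields a mask in which at least a quarter of the pixels are black.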
30 BaffleText Experiment at PARC
- Goal: map the margins of accurate & comfortable
  human reading on this family of images
- Metrics
  - objective difficulty: accuracy
  - subjective difficulty: rating
  - response time
  - exit survey: how tolerable overall
- Participation
  - 41 individual sessions
  - >1200 challenge/response trials
  - 18 exit surveys
31 BaffleText challenge webpage
32 BaffleText user rating
33 User Acceptance
- Subjects who say they're willing to solve a BaffleText
  - 17%: every time they send email
  - 39%: if it cut spam by 10x
  - 89%: every time they register for an e-commerce site
  - 94%: if it led to more trustworthy recommendations
  - 100%: every time they register for an email account
Out of 18 responses to the exit survey.
34 Subjective difficulty tracks objective difficulty
35 How to engineer BaffleText
- When we generate a challenge,
  - need to be able to estimate its difficulty
  - throw away if too easy or too hard
- Apply an idea from the psychophysics of reading
  - image complexity metric: how hard to read
  - simple to compute: perimeter² / black-area
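A minimal sketch of the generate-and-filter step, assuming the complexity metric is perimeter squared divided by black area. Estimating the perimeter by counting exposed pixel edges is an assumption. The function names are hypothetical, and the default 50-100 acceptance window is taken from the talk's engineering guidelines.

```python
def perimetric_complexity(image):
    """Perimeter squared over black area for a binary image (1 = black).

    The perimeter is estimated as the number of black-pixel edges that
    face a white pixel or the image border (4-connectivity).
    """
    h = len(image)
    w = len(image[0]) if h else 0
    area = 0
    perimeter = 0
    for y in range(h):
        for x in range(w):
            if not image[y][x]:
                continue
            area += 1
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < h and 0 <= nx < w)
                if outside or not image[ny][nx]:
                    perimeter += 1
    if area == 0:
        raise ValueError("image has no black pixels")
    return perimeter ** 2 / area


def acceptable(image, low=50, high=100):
    """Keep a challenge only if its complexity falls inside [low, high]."""
    return low <= perimetric_complexity(image) <= high
```

A solid 4 x 4 square scores 16 * 16 / 16 = 16 and would be discarded as too easy. Thin or fragmented strokes have more perimeter per unit of ink and score higher.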
36 Image complexity predicts objective difficulty
37 Image complexity predicts subjective difficulty
38 Engineering guidelines
- For high performance, image complexity
  - should fall in the range 50-100
- Within this regime, BaffleText performs well
  - 100% of human subjects willing to try to read it
  - 89% accuracy by humans
  - 0% accuracy by commercial OCR
  - 3.3 difficulty rating, out of 10 (on average)
  - 8.7 seconds / trial on average
39 The latest serious (known or published) attack
G. Mori & J. Malik, "Recognizing Objects in
Adversarial Clutter," submitted to CVPR'03,
Madison, WI, June 16-22, 2003.
- Greg Mori & Jitendra Malik (UCB-CS)
- Generalized Shape Context CV method
  - requires known lexicon; else, fails completely
  - expects known font (or fonts); else, does worse
- Results of Mori-Malik attacks (Dec 2002), with
  foreknowledge of both lexicon and font
40 BaffleText: the strongest known CAPTCHA?
- Resists many known attacks
  - physics-based image restoration
  - recognizing into a lexicon
  - typeface targeting
  - segmenting then recognizing
- Exploits hard-to-automate human cognition powers
  - Gestalt perception
  - semi-linguistic familiarity
  - style consistency
41 PARC's Leadership Role
- Published 1st refereed paper on CAPTCHAs
  - A. L. Coates, H. S. Baird, & R. Fateman,
    "Pessimal Print: a Reverse Turing Test," Proc.,
    6th IAPR Intl Conf. on Document Analysis &
    Recognition, Seattle, WA, Sept. 10-13, 2001.
- Hosted first professional event: 1st NSF Intl
  Workshop on HIPs, Jan. 9-11, 2002, Palo Alto, CA.
- Plays both offense & defense
  - attacks CAPTCHAs: builds high-performance OCR systems
  - builds strong CAPTCHAs
- Validates using human-factors research
  - human-subject trials measuring accuracy & tolerance
  - PARC's interdisciplinary tradition: social & computer sciences
42 The Arms Race
- Will serious technical attacks be launched?
  - spam kings make millions
  - two spam-blocking e-commerce firms use CAPTCHAs
- How long can a CAPTCHA stand against attack?
  - especially if its algorithms are published or guessed
- Keep a pipeline of defenses in reserve
  - a long partnership between research & users
43 Lots of Open Research Questions
- What are the most intractable obstacles to machine vision?
  - segmentation, occlusion, degradations, ...?
- Under what conditions is human reading most robust?
  - linguistic & semantic context, Gestalt, style consistency, ...?
- Where are ability gaps located?
  - quantitatively, not just qualitatively
- How to generate challenges strictly within ability gaps?
  - fully automatically
  - an indefinitely long sequence of distinct challenges
44 HIP Research Community
- HIP Website at Aladdin Center, CMU SCS
  - www.captcha.net
- Volunteers for a CAPTCHA usability test?
- PARC CAPTCHA experimental software tools
  - FreeType-based, C, C++ for Linux etc. (T. Breuel)
  - Doc. image degradation generator (H. Baird)
  - New Gestalt-inspired degradations (M. Chew, UCB)
  - PHP4 code for CAPTCHA test web site (M. Chew, M. Luk)
  - would a free GPL license be acceptable?
- A 2nd HIP Workshop soon?
45 Alan Turing might have enjoyed the irony
- A technical problem, machine reading,
  - which he thought would be easy,
  - has resisted attack for 50 years, and
  - now allows the first widespread
    practical use of variants of
    his test for artificial intelligence.
46 Contact
- Henry S. Baird
  - baird_at_parc.com
  - www.parc.com/baird
50 OCR Accuracy: varying blur
51 OCR Accuracy: varying blur
52 OCR Accuracy: varying blur
53 Yahoo!'s current CAPTCHA
- Randomly pick
  - one English word, typeface, distortions, occlusions, background
- More tolerable to users
- Used on a large scale to protect various services