1
Microsoft's Cursive Recognizer
  • Jay Pittman
  • and the entire
  • Microsoft Handwriting Recognition
  • Research and Development Team
  • jpittman_at_microsoft.com

2
The Handwriting Recognition Team
  • An experiment
  • A research group, but not housed in MSR
  • Positioned inside a product group
  • Our direction and inspiration come directly from
    the users
  • This isn't for everyone, but we like it
  • Just over a dozen researchers
  • Half with PhDs
  • Mostly CS, but 1 Chemistry, 1 Industrial
    Engineering, 1 Math, 1 Speech
  • Mostly neural network researchers
  • Small to moderate experience in other recognition
    technologies

3
Neural Network Review
[Diagram: small example network; nodes show activations between 0.0 and 1.0, arcs show weights such as -2.3, 1.4, and -0.8]
  • Directed acyclic graph
  • Nodes and arcs, each containing a simple value
  • Nodes contain activations, arcs contain weights
  • At run-time, we do a forward pass, which computes activations from inputs to hiddens, and then to outputs (see the sketch after this list)
  • From the outside, the application only sees the
    input nodes and output nodes
  • Node values (in and out) range from 0.0 to 1.0
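To make the forward pass concrete, here is a minimal sketch in Python. It is illustrative only: the layer sizes, random weights, and use of NumPy are stand-ins rather than the shipping recognizer, and the logistic sigmoid is one standard way to keep node values between 0.0 and 1.0.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(inputs, w_hidden, b_hidden, w_output, b_output):
        # Forward pass: input activations -> hidden activations -> outputs
        hidden = sigmoid(inputs @ w_hidden + b_hidden)
        return sigmoid(hidden @ w_output + b_output)

    # Toy sizes: 3 input nodes, 4 hidden nodes, 2 output nodes
    rng = np.random.default_rng(0)
    x = np.array([1.0, 0.1, 0.6])                    # input node activations
    w_h, b_h = rng.normal(size=(3, 4)), np.zeros(4)  # arc weights into hidden nodes
    w_o, b_o = rng.normal(size=(4, 2)), np.zeros(2)  # arc weights into output nodes
    print(forward(x, w_h, b_h, w_o, b_o))            # output activations lie in (0.0, 1.0)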

4
TDNN: Time-Delay Neural Network
[Diagram: ink segmented into items 1-6; each column of hidden nodes sees only a window of neighboring segments, and the same weights are repeated across columns]
  • This is still a normal back-propagation network
  • All the points in the previous slide still apply
  • The difference is in the connections
  • Connections are limited
  • Weights are shared
  • The input is segmented, and the same features are computed for each segment (see the sketch after this list)
  • Small detail: edge effects
  • For the first two and last two columns, the
    hidden nodes and input nodes that reach outside
    the range of our input receive zero activations
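A hedged sketch of the shared weights and zero-padded edge effects, with made-up window size, feature count, and hidden-layer size (the real network's dimensions are not given in these slides):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tdnn_hidden(segments, weights, bias, window=5):
        # segments: (num_segments, features_per_segment)
        # weights:  (window * features_per_segment, hidden_per_column)
        # Every column of hidden nodes applies the SAME weights to a window of
        # segments; windows that reach past the ends of the ink see zero activations.
        num_segments, num_features = segments.shape
        half = window // 2
        padded = np.vstack([np.zeros((half, num_features)),   # zero padding at the
                            segments,                         # leading edge and
                            np.zeros((half, num_features))])  # trailing edge
        columns = []
        for t in range(num_segments):               # one hidden column per segment
            window_feats = padded[t:t + window].ravel()
            columns.append(sigmoid(window_feats @ weights + bias))
        return np.array(columns)                    # (num_segments, hidden_per_column)

    # Toy example: 6 segments, 8 features each, 5-segment window, 10 hidden nodes
    rng = np.random.default_rng(1)
    ink = rng.random((6, 8))
    w = rng.normal(size=(5 * 8, 10))
    print(tdnn_hidden(ink, w, np.zeros(10)).shape)  # (6, 10)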

5
Training
  • We use back-propagation training (a sketch of one update step follows this list)
  • We collect millions of words of ink data from
    thousands of writers
  • Young and old, male and female, left handed and
    right handed
  • Natural text, newspaper text, URLs, email
    addresses, street addresses
  • We collect in nearly two dozen languages around
    the world
  • Training on such large databases takes weeks
  • We constantly worry about how well our data
    reflect our customers
  • Their writing styles
  • Their text content
  • We can be no better than the quality of our
    training sets
  • And that goes for our test sets too
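Back-propagation itself is standard; the following is a minimal sketch of one gradient-descent update for the toy two-layer sigmoid network sketched earlier, using squared error. It is illustrative only; the actual loss, learning-rate schedule, and network layout used in training are not described in these slides.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def backprop_step(x, target, w_h, b_h, w_o, b_o, lr=0.1):
        # Forward pass
        hidden = sigmoid(x @ w_h + b_h)
        output = sigmoid(hidden @ w_o + b_o)
        # Output-layer error term: derivative of squared error times sigmoid'
        delta_o = (output - target) * output * (1.0 - output)
        # Hidden-layer error term, propagated back through the output weights
        delta_h = (delta_o @ w_o.T) * hidden * (1.0 - hidden)
        # Gradient-descent updates
        w_o -= lr * np.outer(hidden, delta_o)
        b_o -= lr * delta_o
        w_h -= lr * np.outer(x, delta_h)
        b_h -= lr * delta_h
        return w_h, b_h, w_o, b_o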

6
Languages
  • We ship now in
  • English (US), English (UK), French, German,
    Spanish, Italian
  • We have done some initial work in
  • Dutch, Portuguese, Swedish, Danish, Norwegian,
    Finnish
  • We cannot predict when we might ship these
  • We are starting initial research in more
  • Using a completely different approach, we also
    ship now in
  • Japanese, Chinese (Simplified), Chinese
    (Traditional), Korean

7
Recognizer Architecture
[Diagram: ink segments feed the TDNN, which fills an output matrix of per-segment character scores; a beam search guided by the lexicon reads that matrix and produces the top-10 list, e.g. dog 68, clog 57, dug 51, doom 42, divvy 37, ooze 35, cloy 34, doxy 29, client 22, dozy 13]

8
Language Model
  • We get better recognition if we bias our
    interpretation of the output matrix with a
    language model
  • Better recognition means we can handle sloppier
    cursive
  • You can write faster, in a more relaxed manner
  • The lexicon (system dictionary) is the main part
  • But there is also a user dictionary
  • And there are regular expressions for things like
    dates and currency amounts
  • We want a generator
  • We ask it: what characters could be next after this prefix?
  • It answers with a set of characters (see the sketch after this list)
  • We still output the top letter recognitions
  • In case you are writing a word out-of-dictionary
  • You will have to write more neatly
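A hedged sketch of how a lexicon-driven beam search can read the output matrix, with the lexicon acting as a generator of legal next characters. The trie, the 0-100 scores, and the toy word list are stand-ins; the shipping recognizer's scoring, pruning, and out-of-dictionary channel are certainly more involved.

    import heapq

    def build_trie(words):
        root = {}
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node["$"] = {}                      # end-of-word marker
        return root

    def beam_search(score_matrix, lexicon, beam_width=10):
        # score_matrix[t][ch] = score (0-100) of character ch in ink segment t
        trie = build_trie(lexicon)
        beam = [("", trie, 0)]                  # (prefix, trie node, total score)
        for scores in score_matrix:
            candidates = []
            for prefix, node, total in beam:
                for ch, child in node.items():  # generator: what can follow this prefix?
                    if ch == "$":
                        continue
                    candidates.append((prefix + ch, child, total + scores.get(ch, 0)))
            beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[2])
        return [(prefix, total) for prefix, node, total in beam if "$" in node]

    # Toy example: 3 ink segments and a tiny lexicon
    matrix = [{"d": 92, "c": 86}, {"o": 77, "l": 76}, {"g": 68, "t": 12}]
    print(beam_search(matrix, ["dog", "dot", "cog", "clog"]))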

9
Clumsy Lexicon Issue
  • The lexicon includes all the words in the
    spellchecker
  • The spellchecker includes obscenities
  • Otherwise they would get marked as misspelled
  • But people get upset if these words are offered
    as corrections for other misspellings
  • So the spellchecker marks them as restricted
  • We live in an apparently stochastic world
  • We will throw up 6 theories about what you were
    trying to write
  • If your ink is near an obscene word, we might
    include that
  • Dilemma
  • We want to recognize your obscene word when you write it
  • Otherwise we are censoring, which is NOT our place
  • We DON'T want to offer these outputs when you don't write them
  • Solution (weak)
  • We took these words out of the lexicon
  • You can still write them, because you can write
    out-of-dictionary
  • But you have to write very neat cursive, or nice
    handprint
  • Only works at the word level
  • Can't remove words with dual meanings
  • Can't handle phrases that are obscene when the individual words are not

10
Regular Expressions
  • Many built-in, callable by ISVs, web pages
  • Number, date, time, currency amount, phone
    number, address, URL, email address, file name,
    phrase list
  • Many components of the above
  • Month, day of month, day of week, year, area
    code, hour, minute
  • Isolated characters
  • Digit, lowercase letter, uppercase letter
  • None
  • Yields an out-of-dictionary-only system (turns
    off the language model)
  • Great for form-filling apps and web pages
  • Accuracy is greatly improved
  • This is in addition to the ability to load the
    user dictionary
  • One could load 500 color names for a color field
    in a form-based app
  • Or 8000 drug names in a prescription app
  • The regular expression compiler is available at
    run time
  • Software vendors can add their own regular
    expressions
  • One could imagine the DMV adding automobile VINs
  • Example expressions (from the built-in date format; a standard-regex rendering follows this list)
  • digit = "0123456789"
  • nummonth = ["0"] "123456789" | "1" "012"   (i.e., an optional leading 0 and a digit 1-9, or 1 followed by 0, 1, or 2)
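For illustration, the nummonth grammar can be rendered as an ordinary regular expression (this is standard regex notation, not the recognizer's own grammar language): an optional leading 0 followed by a digit 1-9, or 1 followed by 0-2, i.e. month numbers 1 through 12.

    import re

    # Hypothetical rendering of the slide's nummonth example as a standard regex
    nummonth = re.compile(r"^(0?[1-9]|1[0-2])$")

    for text in ["1", "09", "10", "12", "13", "0"]:
        print(text, bool(nummonth.match(text)))
    # 1 True, 09 True, 10 True, 12 True, 13 False, 0 False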

11
Default Factoid
  • Used when no factoid is set
  • Intended for natural text, such as the body of an
    email
  • Includes system dictionary, user dictionary,
    hyphenation rule, number grammar, web address
    grammar
  • All wrapped by optional leading punctuation and
    trailing punctuation
  • Hyphenation rule allows a sequence of dictionary words with hyphens between them
  • Alternatively, the match can be a single character (any character supported by the system); a sketch of this structure follows the diagram below

[Diagram: Start → optional Leading Punc → one of SysDict, UserDict, Number, Web, Hyphenation, or Single Char → optional Trailing Punc → Final]
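A hedged sketch of that wiring as a single standard regular expression. The sub-grammars here (dictionary word, number, web address) are toy placeholders; only the overall structure of optional leading punctuation, one of several cores, and optional trailing punctuation follows the slide.

    import re

    dictionary_word = r"[A-Za-z]+"                 # stand-in for SysDict / UserDict
    number = r"\d+(?:[.,]\d+)?"                    # stand-in for the number grammar
    web = r"(?:https?://|www\.)\S+"                # stand-in for the web-address grammar
    hyphenated = rf"{dictionary_word}(?:-{dictionary_word})+"   # hyphenation rule
    single_char = r"."                             # any single supported character

    core = f"(?:{hyphenated}|{dictionary_word}|{number}|{web}|{single_char})"
    leading_punc = r"""["'(\[]?"""                 # optional leading punctuation
    trailing_punc = r"""[.,;:!?)\]"']?"""          # optional trailing punctuation
    default_factoid = re.compile(f"^{leading_punc}{core}{trailing_punc}$")

    for text in ["hello,", "well-known", "3.14", "(www.example.com"]:
        print(text, bool(default_factoid.match(text)))   # all True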
12
Error Correction: SetTextContext()
Goal: Better context usage for error-correction scenarios
  1. User writes "Dictionary"
  2. Recognizer misrecognizes it as "Dictum"
  3. User selects "um" and rewrites "ionary"
  4. TIP notes the partial-word selection, puts the recognizer into correction mode with left and right context
  5. Beam search artificially recognizes the left context
  6. Beam search runs the ink as normal
  7. Beam search artificially recognizes the right context
  8. This produces "ionary" in the top-10 list; the TIP must insert it to the right of "Dict"

[Diagram: panels 1-4 show the ink for "Dictionary" misread as "Dictum", the selection of "um", and the rewritten "ionary"; panels 5-7 show the beam search forcing the left context "Dict" (e.g. d 100, i 100, c 100, t 100), recognizing the new ink normally, and forcing the right context. A sketch of this forced-context trick follows.]
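A hedged sketch of the forced-context idea: the known left and right context is turned into artificial output-matrix columns in which the context character gets full score and everything else gets zero, so only the rewritten ink is genuinely recognized. The names and the lowercase toy alphabet are illustrative, and this reuses the toy beam_search sketched earlier.

    def context_columns(text, alphabet="abcdefghijklmnopqrstuvwxyz"):
        # One artificial score column per context character (toy lowercase alphabet)
        return [{c: (100 if c == ch else 0) for c in alphabet} for ch in text]

    def recognize_with_context(left, ink_columns, right, beam_search, lexicon):
        # ink_columns holds the real TDNN scores for the rewritten ink;
        # the context columns pin the beam search to the surrounding text.
        matrix = context_columns(left) + ink_columns + context_columns(right)
        return beam_search(matrix, lexicon)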
13
Calligrapher
  • The Russian recognition company Paragraph sold
    itself to SGI (Silicon Graphics, Incorporated),
    who then sold it to Vadem, who sold it to
    Microsoft.
  • In the purchase we obtained
  • Calligrapher
  • Cursive recognizer that shipped on the first
    Apple Newton (but not the second)
  • Transcriber
  • Handwriting app for handheld computers (shipped
    on PocketPC)
  • Calligrapher has a very similar architecture
  • Instead of a TDNN it employs a hand-built HMM
  • The lexicon and beam search are similar in nature (many small differences)
  • We combined our system with Calligrapher
  • We use a voting system (neural nets) to combine each recognizer's top-10 list (a toy sketch follows this list)
  • They are very different, and make different
    mistakes
  • We get the best of both worlds
  • If either recognizer outputs a single-character
    word we forget these lists and run the isolated
    character recognizer
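A toy sketch of combining the two recognizers' top-10 lists. The real combiner is a trained neural network; a simple weighted vote stands in here just to show the data flow, and the words and scores are made up.

    def combine_top10(tdnn_list, calligrapher_list, weight=0.5):
        # Each list is [(word, score)] with scores 0-100, best first
        combined = {}
        for word, score in tdnn_list:
            combined[word] = combined.get(word, 0) + weight * score
        for word, score in calligrapher_list:
            combined[word] = combined.get(word, 0) + (1 - weight) * score
        ranked = sorted(combined.items(), key=lambda ws: ws[1], reverse=True)
        return ranked[:10]

    tdnn = [("dog", 68), ("clog", 57), ("dug", 51)]
    calligrapher = [("dog", 74), ("dag", 60), ("clog", 40)]
    print(combine_top10(tdnn, calligrapher))   # "dog" wins with support from both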

14
Personalization
  • Ink shape personalization
  • Simple concept: just do the same training on this customer's ink
  • Start with components already trained on a massive database of ink samples
  • Train further on the specific user's ink samples
  • Explicit training
  • User must go to a wizard and copy a short script
  • We do have labels from the customer
  • Limited in quantity, because the process is tedious
  • Implicit training
  • Data is collected in the background during normal
    use
  • We don't have labels from the customer
  • We must assume correctness of our recognition
    result using our confidence measure
  • We get more data
  • Much of the work is in the infrastructure
  • GUI, database, management of different users' trained networks, etc.
  • Lexicon personalization: Harvesting
  • Simple concept: just add the user's new words to the lexicon (a sketch follows this list)
  • Examples (at Microsoft): RTM, dev, SDET, dogfooding, KKOMO, featurization
  • Happens when correcting words in the TIP
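A minimal sketch of harvesting, assuming the simplest possible policy: when the user's correction is not in either dictionary, add it to the user dictionary so the language model accepts it next time. The data structures are toy sets; the shipping feature presumably applies more filtering than this.

    def harvest(corrected_word, system_lexicon, user_dictionary):
        # Add out-of-dictionary corrections to the user dictionary
        word = corrected_word.strip()
        if word and word not in system_lexicon and word not in user_dictionary:
            user_dictionary.add(word)

    system_lexicon = {"dog", "clog", "dug"}
    user_dictionary = set()
    harvest("dogfooding", system_lexicon, user_dictionary)
    print(user_dictionary)   # {'dogfooding'}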

15
Best Job at Microsoft
  • Bill Gates makes more money, but I have more fun
  • No one hassles me for money or slots
  • I remember senior people at several research institutions saying "waste of time and money"
  • I still have a sense of wonder that it works at
    all
  • It's as if your dog started talking to you
  • People tell me it recognizes their writing when
    no one else can
  • But I also know there are others who get poor
    recognition
  • I wonder if Gary Trudeau has tried it
  • People will adapt to a recognizer, if they use it
    enough
  • Just as they adapt to the people they live with
    and work with
  • My physician in Issaquah gets perfect recognition
    on a Newton
  • Biggest complaint: we don't yet ship their language
  • Other complaints
  • Weak on URLs, email addresses, slashes
  • Some handprint gets poor recognition
  • Adaptation to my handwriting style (coming)
