1
Microsoft's Cursive Recognizer
  • Jay Pittman
  • and the entire
  • Microsoft Handwriting Recognition
  • Research and Development Team
  • jpittman_at_microsoft.com

2
The Handwriting Recognition Team
  • An experiment
  • A research group, but not housed in MSR
  • Positioned inside a product group
  • Our direction and inspiration come directly from
    the users
  • This isn't for everyone, but we like it
  • Just over a dozen researchers
  • Half with PhDs
  • Mostly CS, but 1 Chemistry, 1 Industrial
    Engineering, 1 Math, 1 Speech
  • Mostly neural network researchers
  • Small to moderate experience in other recognition
    technologies

3
Neural Network Review
[Diagram: small example network; nodes show activations between 0.0 and 1.0, arcs show weights such as -2.3, 1.4, and -0.8]
  • Directed acyclic graph
  • Nodes and arcs, each containing a simple value
  • Nodes contain activations, arcs contain weights
  • At run-time, we do a forward pass, which computes activations from inputs to hiddens, and then to outputs (see the sketch after this list)
  • From the outside, the application only sees the
    input nodes and output nodes
  • Node values (in and out) range from 0.0 to 1.0
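To make the forward pass concrete, here is a minimal sketch in Python. It is illustrative only: the layer sizes, random weights, and use of NumPy are stand-ins rather than the shipping recognizer, and the logistic sigmoid is one standard way to keep node values between 0.0 and 1.0.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(inputs, w_hidden, b_hidden, w_output, b_output):
        # Forward pass: input activations -> hidden activations -> outputs
        hidden = sigmoid(inputs @ w_hidden + b_hidden)
        return sigmoid(hidden @ w_output + b_output)

    # Toy sizes: 3 input nodes, 4 hidden nodes, 2 output nodes
    rng = np.random.default_rng(0)
    x = np.array([1.0, 0.1, 0.6])                    # input node activations
    w_h, b_h = rng.normal(size=(3, 4)), np.zeros(4)  # arc weights into hidden nodes
    w_o, b_o = rng.normal(size=(4, 2)), np.zeros(2)  # arc weights into output nodes
    print(forward(x, w_h, b_h, w_o, b_o))            # output activations lie in (0.0, 1.0)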

4
TDNN: Time-Delay Neural Network
[Diagram: ink segmented into items 1-6; each column of hidden nodes sees only a window of neighboring segments, and the same weights are repeated across columns]
  • This is still a normal back-propagation network
  • All the points in the previous slide still apply
  • The difference is in the connections
  • Connections are limited
  • Weights are shared
  • The input is segmented, and the same features are computed for each segment (see the sketch after this list)
  • Small detail: edge effects
  • For the first two and last two columns, the
    hidden nodes and input nodes that reach outside
    the range of our input receive zero activations
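A hedged sketch of the shared weights and zero-padded edge effects, with made-up window size, feature count, and hidden-layer size (the real network's dimensions are not given in these slides):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tdnn_hidden(segments, weights, bias, window=5):
        # segments: (num_segments, features_per_segment)
        # weights:  (window * features_per_segment, hidden_per_column)
        # Every column of hidden nodes applies the SAME weights to a window of
        # segments; windows that reach past the ends of the ink see zero activations.
        num_segments, num_features = segments.shape
        half = window // 2
        padded = np.vstack([np.zeros((half, num_features)),   # zero padding at the
                            segments,                         # leading edge and
                            np.zeros((half, num_features))])  # trailing edge
        columns = []
        for t in range(num_segments):               # one hidden column per segment
            window_feats = padded[t:t + window].ravel()
            columns.append(sigmoid(window_feats @ weights + bias))
        return np.array(columns)                    # (num_segments, hidden_per_column)

    # Toy example: 6 segments, 8 features each, 5-segment window, 10 hidden nodes
    rng = np.random.default_rng(1)
    ink = rng.random((6, 8))
    w = rng.normal(size=(5 * 8, 10))
    print(tdnn_hidden(ink, w, np.zeros(10)).shape)  # (6, 10)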

5
Training
  • We use back-propagation training (a sketch of one update step follows this list)
  • We collect millions of words of ink data from
    thousands of writers
  • Young and old, male and female, left handed and
    right handed
  • Natural text, newspaper text, URLs, email
    addresses, street addresses
  • We collect in nearly two dozen languages around
    the world
  • Training on such large databases takes weeks
  • We constantly worry about how well our data
    reflect our customers
  • Their writing styles
  • Their text content
  • We can be no better than the quality of our
    training sets
  • And that goes for our test sets too
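Back-propagation itself is standard; the following is a minimal sketch of one gradient-descent update for the toy two-layer sigmoid network sketched earlier, using squared error. It is illustrative only; the actual loss, learning-rate schedule, and network layout used in training are not described in these slides.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def backprop_step(x, target, w_h, b_h, w_o, b_o, lr=0.1):
        # Forward pass
        hidden = sigmoid(x @ w_h + b_h)
        output = sigmoid(hidden @ w_o + b_o)
        # Output-layer error term: derivative of squared error times sigmoid'
        delta_o = (output - target) * output * (1.0 - output)
        # Hidden-layer error term, propagated back through the output weights
        delta_h = (delta_o @ w_o.T) * hidden * (1.0 - hidden)
        # Gradient-descent updates
        w_o -= lr * np.outer(hidden, delta_o)
        b_o -= lr * delta_o
        w_h -= lr * np.outer(x, delta_h)
        b_h -= lr * delta_h
        return w_h, b_h, w_o, b_o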

6
Languages
  • We ship now in
  • English (US), English (UK), French, German,
    Spanish, Italian
  • We have done some initial work in
  • Dutch, Portuguese, Swedish, Danish, Norwegian,
    Finnish
  • We cannot predict when we might ship these
  • We are starting initial research in more
  • Using a completely different approach, we also
    ship now in
  • Japanese, Chinese (Simplified), Chinese
    (Traditional), Korean

7
Recognizer Architecture
[Diagram: ink segments feed the TDNN, which fills an output matrix of per-segment character scores; a beam search guided by the lexicon reads that matrix and produces the top-10 list, e.g. dog 68, clog 57, dug 51, doom 42, divvy 37, ooze 35, cloy 34, doxy 29, client 22, dozy 13]

8
Language Model
  • We get better recognition if we bias our
    interpretation of the output matrix with a
    language model
  • Better recognition means we can handle sloppier
    cursive
  • You can write faster, in a more relaxed manner
  • The lexicon (system dictionary) is the main part
  • But there is also a user dictionary
  • And there are regular expressions for things like
    dates and currency amounts
  • We want a generator
  • We ask it: what characters could be next after this prefix?
  • It answers with a set of characters (see the sketch after this list)
  • We still output the top letter recognitions
  • In case you are writing a word out-of-dictionary
  • You will have to write more neatly
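A hedged sketch of how a lexicon-driven beam search can read the output matrix, with the lexicon acting as a generator of legal next characters. The trie, the 0-100 scores, and the toy word list are stand-ins; the shipping recognizer's scoring, pruning, and out-of-dictionary channel are certainly more involved.

    import heapq

    def build_trie(words):
        root = {}
        for word in words:
            node = root
            for ch in word:
                node = node.setdefault(ch, {})
            node["$"] = {}                      # end-of-word marker
        return root

    def beam_search(score_matrix, lexicon, beam_width=10):
        # score_matrix[t][ch] = score (0-100) of character ch in ink segment t
        trie = build_trie(lexicon)
        beam = [("", trie, 0)]                  # (prefix, trie node, total score)
        for scores in score_matrix:
            candidates = []
            for prefix, node, total in beam:
                for ch, child in node.items():  # generator: what can follow this prefix?
                    if ch == "$":
                        continue
                    candidates.append((prefix + ch, child, total + scores.get(ch, 0)))
            beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[2])
        return [(prefix, total) for prefix, node, total in beam if "$" in node]

    # Toy example: 3 ink segments and a tiny lexicon
    matrix = [{"d": 92, "c": 86}, {"o": 77, "l": 76}, {"g": 68, "t": 12}]
    print(beam_search(matrix, ["dog", "dot", "cog", "clog"]))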

9
Clumsy Lexicon Issue
  • The lexicon includes all the words in the
    spellchecker
  • The spellchecker includes obscenities
  • Otherwise they would get marked as misspelled
  • But people get upset if these words are offered
    as corrections for other misspellings
  • So the spellchecker marks them as restricted
  • We live in an apparently stochastic world
  • We will throw up 6 theories about what you were
    trying to write
  • If your ink is near an obscene word, we might
    include that
  • Dilemma
  • We want to recognize your obscene word when you write it
  • Otherwise we are censoring, which is NOT our place
  • We DON'T want to offer these outputs when you don't write them
  • Solution (weak)
  • We took these words out of the lexicon
  • You can still write them, because you can write
    out-of-dictionary
  • But you have to write very neat cursive, or nice
    handprint
  • Only works at the word level
  • Can't remove words with dual meanings
  • Can't handle phrases that are obscene when the individual words are not

10
Regular Expressions
  • Many built-in, callable by ISVs, web pages
  • Number, date, time, currency amount, phone
    number, address, URL, email address, file name,
    phrase list
  • Many components of the above
  • Month, day of month, day of week, year, area
    code, hour, minute
  • Isolated characters
  • Digit, lowercase letter, uppercase letter
  • None
  • Yields an out-of-dictionary-only system (turns
    off the language model)
  • Great for form-filling apps and web pages
  • Accuracy is greatly improved
  • This is in addition to the ability to load the
    user dictionary
  • One could load 500 color names for a color field
    in a form-based app
  • Or 8000 drug names in a prescription app
  • The regular expression compiler is available at
    run time
  • Software vendors can add their own regular
    expressions
  • One could imagine the DMV adding automobile VINs
  • Example expressions (from the built-in date format; a standard-regex rendering follows this list)
  • digit = "0123456789"
  • nummonth = ["0"] "123456789" | "1" "012"   (i.e., an optional leading 0 and a digit 1-9, or 1 followed by 0, 1, or 2)
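For illustration, the nummonth grammar can be rendered as an ordinary regular expression (this is standard regex notation, not the recognizer's own grammar language): an optional leading 0 followed by a digit 1-9, or 1 followed by 0-2, i.e. month numbers 1 through 12.

    import re

    # Hypothetical rendering of the slide's nummonth example as a standard regex
    nummonth = re.compile(r"^(0?[1-9]|1[0-2])$")

    for text in ["1", "09", "10", "12", "13", "0"]:
        print(text, bool(nummonth.match(text)))
    # 1 True, 09 True, 10 True, 12 True, 13 False, 0 False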

11
Default Factoid
  • Used when no factoid is set
  • Intended for natural text, such as the body of an
    email
  • Includes system dictionary, user dictionary,
    hyphenation rule, number grammar, web address
    grammar
  • All wrapped by optional leading punctuation and
    trailing punctuation
  • Hyphenation rule allows a sequence of dictionary words with hyphens between them
  • Alternatively, the match can be a single character (any character supported by the system); a sketch of this structure follows the diagram below

[Diagram: Start → optional Leading Punc → one of SysDict, UserDict, Number, Web, Hyphenation, or Single Char → optional Trailing Punc → Final]
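A hedged sketch of that wiring as a single standard regular expression. The sub-grammars here (dictionary word, number, web address) are toy placeholders; only the overall structure of optional leading punctuation, one of several cores, and optional trailing punctuation follows the slide.

    import re

    dictionary_word = r"[A-Za-z]+"                 # stand-in for SysDict / UserDict
    number = r"\d+(?:[.,]\d+)?"                    # stand-in for the number grammar
    web = r"(?:https?://|www\.)\S+"                # stand-in for the web-address grammar
    hyphenated = rf"{dictionary_word}(?:-{dictionary_word})+"   # hyphenation rule
    single_char = r"."                             # any single supported character

    core = f"(?:{hyphenated}|{dictionary_word}|{number}|{web}|{single_char})"
    leading_punc = r"""["'(\[]?"""                 # optional leading punctuation
    trailing_punc = r"""[.,;:!?)\]"']?"""          # optional trailing punctuation
    default_factoid = re.compile(f"^{leading_punc}{core}{trailing_punc}$")

    for text in ["hello,", "well-known", "3.14", "(www.example.com"]:
        print(text, bool(default_factoid.match(text)))   # all True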
12
Error Correction: SetTextContext()
Goal: Better context usage for error-correction scenarios
  1. User writes "Dictionary"
  2. Recognizer misrecognizes it as "Dictum"
  3. User selects "um" and rewrites "ionary"
  4. TIP notes the partial-word selection, puts the recognizer into correction mode with left and right context
  5. Beam search artificially recognizes the left context
  6. Beam search runs the ink as normal
  7. Beam search artificially recognizes the right context
  8. This produces "ionary" in the top-10 list; the TIP must insert it to the right of "Dict"

[Diagram: panels 1-4 show the ink for "Dictionary" misread as "Dictum", the selection of "um", and the rewritten "ionary"; panels 5-7 show the beam search forcing the left context "Dict" (e.g. d 100, i 100, c 100, t 100), recognizing the new ink normally, and forcing the right context. A sketch of this forced-context trick follows.]
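A hedged sketch of the forced-context idea: the known left and right context is turned into artificial output-matrix columns in which the context character gets full score and everything else gets zero, so only the rewritten ink is genuinely recognized. The names and the lowercase toy alphabet are illustrative, and this reuses the toy beam_search sketched earlier.

    def context_columns(text, alphabet="abcdefghijklmnopqrstuvwxyz"):
        # One artificial score column per context character (toy lowercase alphabet)
        return [{c: (100 if c == ch else 0) for c in alphabet} for ch in text]

    def recognize_with_context(left, ink_columns, right, beam_search, lexicon):
        # ink_columns holds the real TDNN scores for the rewritten ink;
        # the context columns pin the beam search to the surrounding text.
        matrix = context_columns(left) + ink_columns + context_columns(right)
        return beam_search(matrix, lexicon)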
13
Calligrapher
  • The Russian recognition company Paragraph sold
    itself to SGI (Silicon Graphics, Incorporated),
    who then sold it to Vadem, who sold it to
    Microsoft.
  • In the purchase we obtained
  • Calligrapher
  • Cursive recognizer that shipped on the first
    Apple Newton (but not the second)
  • Transcriber
  • Handwriting app for handheld computers (shipped
    on PocketPC)
  • Calligrapher has a very similar architecture
  • Instead of a TDNN it employs a hand-built HMM
  • The lexicon and beam search are similar in nature (many small differences)
  • We combined our system with Calligrapher
  • We use a voting system (neural nets) to combine each recognizer's top-10 list (a toy sketch follows this list)
  • They are very different, and make different
    mistakes
  • We get the best of both worlds
  • If either recognizer outputs a single-character
    word we forget these lists and run the isolated
    character recognizer
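A toy sketch of combining the two recognizers' top-10 lists. The real combiner is a trained neural network; a simple weighted vote stands in here just to show the data flow, and the words and scores are made up.

    def combine_top10(tdnn_list, calligrapher_list, weight=0.5):
        # Each list is [(word, score)] with scores 0-100, best first
        combined = {}
        for word, score in tdnn_list:
            combined[word] = combined.get(word, 0) + weight * score
        for word, score in calligrapher_list:
            combined[word] = combined.get(word, 0) + (1 - weight) * score
        ranked = sorted(combined.items(), key=lambda ws: ws[1], reverse=True)
        return ranked[:10]

    tdnn = [("dog", 68), ("clog", 57), ("dug", 51)]
    calligrapher = [("dog", 74), ("dag", 60), ("clog", 40)]
    print(combine_top10(tdnn, calligrapher))   # "dog" wins with support from both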

14
Personalization
  • Ink shape personalization
  • Simple concept: just do the same training on this customer's ink
  • Start with components already trained on a massive database of ink samples
  • Train further on the specific user's ink samples
  • Explicit training
  • User must go to a wizard and copy a short script
  • We do have labels from the customer
  • Limited in quantity, because the process is tedious
  • Implicit training
  • Data is collected in the background during normal
    use
  • We don't have labels from the customer
  • We must assume correctness of our recognition
    result using our confidence measure
  • We get more data
  • Much of the work is in the infrastructure
  • GUI, database, management of different users' trained networks, etc.
  • Lexicon personalization: Harvesting
  • Simple concept: just add the user's new words to the lexicon (a sketch follows this list)
  • Examples (at Microsoft): RTM, dev, SDET, dogfooding, KKOMO, featurization
  • Happens when correcting words in the TIP
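A minimal sketch of harvesting, assuming the simplest possible policy: when the user's correction is not in either dictionary, add it to the user dictionary so the language model accepts it next time. The data structures are toy sets; the shipping feature presumably applies more filtering than this.

    def harvest(corrected_word, system_lexicon, user_dictionary):
        # Add out-of-dictionary corrections to the user dictionary
        word = corrected_word.strip()
        if word and word not in system_lexicon and word not in user_dictionary:
            user_dictionary.add(word)

    system_lexicon = {"dog", "clog", "dug"}
    user_dictionary = set()
    harvest("dogfooding", system_lexicon, user_dictionary)
    print(user_dictionary)   # {'dogfooding'}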

15
Best Job at Microsoft
  • Bill Gates makes more money, but I have more fun
  • No one hassles me for money or slots
  • I remember senior people at several research institutions saying "waste of time and money"
  • I still have a sense of wonder that it works at
    all
  • It's as if your dog started talking to you
  • People tell me it recognizes their writing when
    no one else can
  • But I also know there are others who get poor
    recognition
  • I wonder if Gary Trudeau has tried it
  • People will adapt to a recognizer, if they use it
    enough
  • Just as they adapt to the people they live with
    and work with
  • My physician in Issaquah gets perfect recognition
    on a Newton
  • Biggest complaint: we don't yet ship their language
  • Other complaints
  • Weak on URLs, email addresses, slashes
  • Some handprint gets poor recognition
  • Adaptation to my handwriting style (coming)
