Lightweight Natural Language Processing - PowerPoint PPT Presentation

About This Presentation
Title:

Lightweight Natural Language Processing

Description:

Lightweight Natural Language Processing. Lin Ziheng. Introduction: Mobile devices are becoming the dominant computing platform for apps like web browsing, data/address ... – PowerPoint PPT presentation

Slides: 18
Provided by: linz158

Transcript and Presenter's Notes

Title: Lightweight Natural Language Processing


1
Lightweight Natural Language Processing
  • Lin Ziheng

2
Introduction
  • Mobile devices are becoming the dominant computing
    platform for apps like web browsing and data/address
    books
  • These apps use NLP and machine learning algorithms
  • NLP and ML algorithms are normally run on high-powered
    desktops, with no concern for power/memory consumption
  • An NLP and ML pipeline for mobile devices needs to
    take these constraints into account

3
  • To develop an entire NLP processing pipeline (POS
    tagging, NE recognition, parsing, semantic role
    labeling, etc.) for use in a lightweight environment
  • To create a software suite that can make
    accuracy/resource-consumption tradeoffs

4
POS Tagging
  • Features (a feature-extraction sketch follows this
    list)
    • Capitalization
    • Numbers
    • Hyphens
    • Previous tags
      • uni-, bi-, tri-grams: 6804 features
    • Prefix, suffix
      • up to length 4: 14023 features
    • Surrounding words
      • uni-, bi-, tri-grams: 153820 features
  • Machine learning
    • Maximum Entropy, SVM
    • We use Naive Bayes and decision trees
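
As a rough illustration of the feature types listed
above, here is a minimal Python sketch of binary
feature extraction; the feature names, padding tokens,
and exact n-gram choices are illustrative assumptions,
not the presentation's actual feature set.

    def extract_features(words, i):
        """Return the set of binary features that fire for words[i]."""
        w = words[i]
        feats = set()
        # Orthographic features
        if w[0].isupper():
            feats.add("cap")
        if any(c.isdigit() for c in w):
            feats.add("number")
        if "-" in w:
            feats.add("hyphen")
        # Prefixes and suffixes up to length 4
        for n in range(1, min(4, len(w)) + 1):
            feats.add("prefix=" + w[:n])
            feats.add("suffix=" + w[-n:])
        # Surrounding words (unigram context shown; bi-/tri-grams are analogous)
        ctx = ["<s>"] + words + ["</s>"]
        j = i + 1  # index of w in the padded list
        feats.add("prev=" + ctx[j - 1])
        feats.add("next=" + ctx[j + 1])
        return feats

    print(sorted(extract_features(["The", "well-known", "act"], 1)))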

5
(No Transcript)
6
Naive Bayes Classifier

  t* = argmax_t  p(T = t) * p(F_1, ..., F_n | T = t)
     = argmax_t  p(T = t) * prod_i p(F_i | T = t)

  p(T = t) is the prior; p(F_i | T = t) is the likelihood.
7
The Model
  • Binary features
    • F_i = true or F_i = false
    • Only need to store p(F_i = true | T = t)
      (a scoring sketch follows below)
    • p(F_i = false | T = t) = 1 - p(F_i = true | T = t)

p(T = t_i) * p(F_1 = true | T = t_i) * p(F_2 = true | T = t_i) * ... * p(F_n = true | T = t_i)
(Model table: one row of stored feature likelihoods per tag, t_1 through t_45)
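
A minimal Python sketch of Naive Bayes scoring with
this storage trick: only p(F_i = true | T = t) is
kept, and the false case is derived from it. Log
probabilities are used for numerical stability; the
toy model values below are invented for illustration.

    import math

    def score(tag, fired, prior, likelihood, all_features):
        """log p(T=t) plus, over ALL binary features, log p(F_i | T=t).
        Only p(F_i=true | T=t) is stored; the false case is 1 - p."""
        s = math.log(prior[tag])
        for f in all_features:
            p_true = likelihood[(f, tag)]
            s += math.log(p_true if f in fired else 1.0 - p_true)
        return s

    def classify(fired, prior, likelihood, all_features):
        return max(prior, key=lambda t: score(t, fired, prior, likelihood, all_features))

    # Toy model: two tags, two features, smoothed probabilities.
    prior = {"NN": 0.6, "VB": 0.4}
    likelihood = {("cap", "NN"): 0.30, ("cap", "VB"): 0.10,
                  ("suffix=ing", "NN"): 0.05, ("suffix=ing", "VB"): 0.40}
    print(classify({"suffix=ing"}, prior, likelihood, ["cap", "suffix=ing"]))  # VB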
8
  • Tagging a word requires accessing the whole
    model, which is very large
  • May cause many cache misses

9
  • The usual way to tag words

(Diagram: words W1, W2, W3, W4 tagged one at a time, each pass reading the whole model)
10
  • Tag a set of words at the same time (sketched below)
  • Word window WWIN, e.g., WWIN = 4
  • Previous tags are no longer available, so the tag
    feature set must be removed
  • Needs some extra memory to store intermediate
    information for this set of words

(Diagram: a window of words W1..W4 tagged together)
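
A minimal sketch of window-based tagging, assuming
WWIN words are scored together so that one sequential
sweep over the model serves the whole window; the
in-memory dict stands in for the real model layout,
which the slides do not specify.

    import math

    WWIN = 4  # number of words tagged together

    def tag_window(window_feats, prior, likelihood, all_features, tags):
        """window_feats[k] is the fired-feature set for word k of the window.
        One sequential sweep over the model scores all WWIN words at once."""
        scores = [{t: math.log(prior[t]) for t in tags} for _ in window_feats]
        for f in all_features:          # single pass over the (large) model
            for t in tags:
                p_true = likelihood[(f, t)]
                for k, fired in enumerate(window_feats):
                    scores[k][t] += math.log(p_true if f in fired else 1 - p_true)
        return [max(s, key=s.get) for s in scores]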
11
  • Read a subset of the feature likelihoods at a time
    (sketched below)
  • Feature window FWIN

(Diagram: the word window W1..W4 scored against one feature window at a time)
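
A minimal sketch of streaming the model in FWIN-sized
chunks so that only a small slice of feature
likelihoods is resident at a time; load_chunk is a
hypothetical reader standing in for a block read of
the stored model.

    import math

    FWIN = 1000  # number of feature likelihoods resident at a time

    def tag_with_feature_window(window_feats, prior, tags, n_features, load_chunk):
        """load_chunk(lo, hi) is a hypothetical reader returning the
        stored {(feature, tag): p_true} entries for features lo..hi-1."""
        scores = [{t: math.log(prior[t]) for t in tags} for _ in window_feats]
        for lo in range(0, n_features, FWIN):
            chunk = load_chunk(lo, min(lo + FWIN, n_features))
            for (f, t), p_true in chunk.items():
                for k, fired in enumerate(window_feats):
                    scores[k][t] += math.log(p_true if f in fired else 1 - p_true)
        return [max(s, key=s.get) for s in scores]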
12
  • Integrate with a tag dictionary (sketched below)
  • Collected from the training data
  • May encounter unknown words

(Diagram: the word window W1..W4 with dictionary-restricted candidate tags)
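
A minimal sketch of the dictionary lookup: known words
are scored only against the tags seen for them in
training, and unknown words fall back to the full tag
set. The example entries are invented.

    def candidate_tags(word, tag_dict, all_tags):
        """Known words: only tags seen in training. Unknown words: all tags."""
        return tag_dict.get(word, all_tags)

    tag_dict = {"than": {"IN", "RB", "RBR"}, "the": {"DT"}}
    all_tags = {"DT", "IN", "JJ", "NN", "RB", "RBR", "VB"}
    print(sorted(candidate_tags("than", tag_dict, all_tags)))   # ['IN', 'RB', 'RBR']
    print(sorted(candidate_tags("blorf", tag_dict, all_tags)))  # all 7 tags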
13
  • Early termination
If a certain condition is met, then stop and return
t; else continue.

(Diagram: the word window W1..W4 with the early-termination check applied)
14
  • Collected from the training data (a collection
    sketch follows the table below)
  • Most of the words have only one tag
  • Out of the 6278 words that have more than one tag,
    72% of the time a word takes one particular tag
  • E.g., than
    • IN: 99.84%
    • RB: 0.11%
    • RBR: 0.05%

  # tags   # words
     1      40537
     2       5041
     3       1074
     4        117
     5         33
     6         11
     7          2
  words with more than one tag: 6278
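
A minimal sketch of collecting these per-word tag
statistics from a tagged training corpus; the toy
corpus is invented and the helper name is
illustrative.

    from collections import defaultdict

    def build_tag_dict(tagged_corpus):
        """tagged_corpus: iterable of (word, tag) pairs from training data."""
        counts = defaultdict(lambda: defaultdict(int))
        for word, tag in tagged_corpus:
            counts[word][tag] += 1
        return counts

    counts = build_tag_dict([("than", "IN")] * 5 + [("than", "RB")])
    total = sum(counts["than"].values())
    for tag, n in sorted(counts["than"].items()):
        print(tag, round(n / total, 2))  # IN 0.83, RB 0.17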
15
If the argmax for W2 is t and t has probability >
0.75, then stop and return t; else continue.
(Diagram: the early-termination test applied to W2 in the word window W1..W4)
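
A minimal sketch of this early-termination test,
assuming the running scores are kept as log
probabilities and normalized before comparison
against the 0.75 threshold.

    import math

    def early_stop(log_scores, threshold=0.75):
        """Return the best tag if its normalized probability exceeds the
        threshold, else None (meaning: keep reading more of the model)."""
        m = max(log_scores.values())
        probs = {t: math.exp(s - m) for t, s in log_scores.items()}
        z = sum(probs.values())
        best = max(probs, key=probs.get)
        return best if probs[best] / z > threshold else None

    print(early_stop({"IN": -1.0, "RB": -4.0, "RBR": -5.0}))  # IN
    print(early_stop({"NN": -1.0, "VB": -1.1}))               # None: too close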
16
  • Features
    • Capitalization
    • Numbers
    • Hyphens
    • Previous tags
      • uni-, bi-, tri-grams: 6804 features
    • Prefix, suffix
      • up to length 4: 14023 features
    • Surrounding words
      • uni-, bi-, tri-grams: 153820 features

17
If the argmax for the word is t and t has probability
> 0.75, then stop and return t; else continue.
(Diagram: words W1..W4 scored in stages over the feature groups: basic, affix, surrounding words; a cascade sketch follows)
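
A speculative sketch of the staged cascade the diagram
suggests: fold in one feature group at a time (basic,
then affix, then surrounding words) and stop as soon
as the best tag clears the 0.75 threshold. The group
contents and model layout are illustrative
assumptions, reusing the stored p(F = true | T = t)
layout from the sketches above.

    import math

    # Hypothetical feature groups; the contents are illustrative.
    FEATURE_GROUPS = [
        ["cap", "number", "hyphen"],      # basic
        ["prefix=un", "suffix=ing"],      # affix
        ["prev=the", "next=of"],          # surrounding words
    ]

    def cascade_tag(fired, prior, likelihood, tags, threshold=0.75):
        scores = {t: math.log(prior[t]) for t in tags}
        for group in FEATURE_GROUPS:
            for f in group:                    # fold in one feature group
                for t in tags:
                    p = likelihood[(f, t)]     # stored p(F = true | T = t)
                    scores[t] += math.log(p if f in fired else 1 - p)
            m = max(scores.values())           # normalize and test confidence
            z = sum(math.exp(s - m) for s in scores.values())
            best = max(scores, key=scores.get)
            if math.exp(scores[best] - m) / z > threshold:
                return best                    # confident: skip later groups
        return max(scores, key=scores.get)     # fall back to the full model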