Title: Advanced word vector representations
1???????????
- Advanced word vector representations
2??
- Lecture 1 ?????
- ????????(Bengio 2003)
- Word2vec (Mikolov 2013)
- (CBOW Skip-gram) (HS NEG)
- ????????
- Softmax????(?PPT??)
- ????????
3????
cs224d Lecture 3 ??
- Lecture 1 ????? (Refresher)
- ????????????? (GD SGD)
- ??????? (evaluate)
- Softmax?? (softmax classification)
- ????? (problem set)
????
- Lecture 1 ?????
- ????????(Bengio 2003)
- Word2vec (Mikolov 2013)
- (CBOW Skip-gram) (HS NEG)
- ????????
- Softmax????(?PPT??)
- ????????
4????????
- ?????
- One-hot Representation
- ????? 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ...
- Distributional Representation
- ????? 0.792, -0.177, -0.107, 0.109, -0.542, ...
- ?????
- SVD,LSA,LDA
- Based on lexical co-occurrence
- Learning representations
- Predict surrounding words of every word
- Eg. word2vec
5????? input output vector
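- Each word w has two vector representations
- input vector: used when the word is the center word (center vector), v
- output vector: used when the word is a context word (external vector), v'
- Example with window size 1 and the text "I like learning"
- like: v_like
- I, learning: v'_I, v'_learning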
6????? Simple word2vec
- Predict surrounding words in a window of length c of every word.
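A minimal sketch of how the (center word, surrounding word) training pairs could be enumerated. It assumes "window of length c" means c words on each side of the center word; the function name `window_pairs` is illustrative only.

```python
def window_pairs(tokens, c):
    """Return (center, context) pairs, looking c words to the left and right."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - c)
        hi = min(len(tokens), i + c + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


pairs = window_pairs(["I", "like", "learning"], c=1)
print(pairs)
```

With c=1, "like" is paired with both neighbours, while "I" and "learning" each have one.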
7????? Word2Vec GloVe
- Word2Vec
- Efficient Estimation of Word Representations in Vector Space. Mikolov et al. (2013)
- GloVe
- GloVe: Global Vectors for Word Representation. Pennington et al. (2014)
- aggregated global word-word co-occurrence statistics from a corpus
8????????????(??)
- ????????????????
- ???NLP????(?N-gram??)
- ????????????
- Machine translation: P(high winds tonite) > P(large winds tonite)
- Spelling correction: P(about fifteen minutes from) > P(about fifteen minuets from)
- Speech recognition: P(I saw a van) >> P(eyes awe of an)
- Pinyin-to-character conversion: P(?????? | nixianzaiganshenme) > P(??????? | nixianzaiganshenme)
- ??????????... ...
9????????????(??)
- ???NLP????(?N-gram??)
- ????????????
- ???????
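- $p(S) = p(w_1, w_2, \ldots, w_n)$
- $= p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_1, w_2) \cdots p(w_n \mid w_1, w_2, \ldots, w_{n-1})$
- $\approx p(w_1)\,p(w_2 \mid w_1)\,p(w_3 \mid w_2) \cdots p(w_n \mid w_{n-1})$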
- ??????
- ??????
- ?????MaxEnt??????????MEMM????????CRF(???????????)
- ????????
- Bengio2003?Mikolov2013?
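The bigram factorization on this slide can be sketched with plain counts. This is a toy maximum-likelihood estimate without smoothing. The corpus, the `<s>` start marker (so that p(w1) becomes p(w1 | `<s>`)), and the function names are assumptions made for illustration.

```python
from collections import Counter


def train_bigram(sentences):
    """Count bigrams and their left contexts; "<s>" marks the sentence start."""
    context_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        context_counts.update(tokens[:-1])          # words that have a successor
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts


def sentence_prob(sentence, context_counts, bigram_counts):
    """p(S) as a product of p(w_i | w_{i-1}) estimated by relative frequency."""
    tokens = ["<s>"] + sentence.split()
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:               # unseen context word
            return 0.0
        prob *= bigram_counts[(prev, cur)] / context_counts[prev]
    return prob


corpus = ["a cat is walking", "a mouse is walking", "a cat is sleeping"]
context_counts, bigram_counts = train_bigram(corpus)
p_seen = sentence_prob("a cat is walking", context_counts, bigram_counts)
p_new = sentence_prob("a mouse is sleeping", context_counts, bigram_counts)
p_oov = sentence_prob("a dog is walking", context_counts, bigram_counts)
print(p_seen, p_new, p_oov)
```

The unseen word "dog" drives the whole sentence probability to zero, which is the sparsity problem that smoothing and, later, neural language models try to address.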
10?????????????(??)
- 2000?,??,??????????
- Can Artificial Neural Networks Learn Language Models?
- Builds a bigram language model (i.e. $P(w_t \mid w_{t-1})$) with a neural network
11?????????????(??)
- 2003?,Bengio,?????????????
- A Neural Probabilistic Language Model
12?????????????(??)
- 2008, Ronan Collobert and Jason Weston
- C&W model
- Natural Language Processing (Almost) from Scratch
- 2008, Andriy Mnih and Geoffrey Hinton
- A scalable hierarchical distributed language model
- 2010, Mikolov
- RNNLM
- Recurrent neural network based language model
- 2012, Huang
- Improving Word Representations via Global Context and Multiple Word Prototypes
13Bengio2003
14Bengio2003
- ?????????
- ??????v(w), w??Dictionary
- ?????WUpq
- ?????
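- Projection layer: (n-1)m units; n is the number of context words, usually at most 5; m is the word vector dimension, on the order of $10^1$ to $10^3$
- Hidden layer: n_hidden units, chosen by the user, usually on the order of $10^2$
- Output layer: N units, the vocabulary size, which depends on the corpus, on the order of $10^4$ to $10^5$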
- most computation here (?? word2vec ??????)
- Projection layer to hidden layer: tanh
- Output layer: softmax
15Bengio2003
- ??????
- ???
- ????(n-1)m ,??x
- ???(???)
- ???h???
- tanh(Hx + d)
- ???????????? U (|V| × h)
- ???
- V???,softmax?????? y
- ???
- W??
- ???????(????)??????,????
- Bengio ???????????????,????????????
- ?????????????,????????????
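A minimal sketch of the forward pass of the Bengio (2003) model, using the formulation from the paper: y = b + Wx + U tanh(d + Hx), followed by a softmax over the vocabulary. The toy sizes, the random initialization and the helper names (`matvec`, `rand_matrix`, `nnlm_forward`) are assumptions for illustration. Pure Python is used to keep the example dependency-free, not because it is efficient.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def softmax(z):
    m = max(z)                                  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]


def nnlm_forward(context_ids, C, H, d, U, W, b):
    # Projection layer: concatenate the (n-1) context word vectors into x.
    x = [val for i in context_ids for val in C[i]]
    # Hidden layer: tanh(Hx + d).
    hidden = [math.tanh(a + bias) for a, bias in zip(matvec(H, x), d)]
    # Output scores: direct connections Wx plus hidden-to-output U * hidden plus bias b.
    y = [bi + wx + uh for bi, wx, uh in zip(b, matvec(W, x), matvec(U, hidden))]
    return softmax(y)


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
V, m, n, h = 6, 4, 3, 5                         # toy vocabulary, vector dim, n-gram order, hidden size
C = rand_matrix(V, m, rng)                      # word vectors v(w)
H = rand_matrix(h, (n - 1) * m, rng)
d = [0.0] * h
U = rand_matrix(V, h, rng)
W = rand_matrix(V, (n - 1) * m, rng)
b = [0.0] * V

probs = nnlm_forward([1, 4], C, H, d, U, W, b)  # distribution over the next word
print(probs)
```

The two products with `U` and `W` touch every vocabulary word, which is why the output layer dominates the cost.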
16ffnnlm??? ngram?????
- ?????????????????
- ????? cat ? mouse ?ffnnlm??????,??ngram?cat???????
- A cat is walking on the street 10000
- A mouse is walking on the street 1
- ??????????
- ????p(w | context)
17ffnnlm???????
18??????
- ????????1986?Hinton??
- 2003?Bengio????ffnnlm
- Feed-forward Neural Net Language Model
- SENNA?HLBL?Word2vec ?SSWE?
- GloVe
19Word2Vec????(Mikolov 2013)
- ?????????????????????,???????,?????
- ???tanh??,?????,???????
- ????????????,???????
- ????????????
- ????softmax
- ??negative sampling
20Word2Vec?????
21(CBOW Skip-gram) 2
- ????(????)
- CBOW (Continuous Bag-Of-Words Model)
- Skip-gram (Continuous Skip-gram Model)
- ????(????)
- Hierarchical Softmax
- Negative Sampling
22CBOW+HS (structure)
- ???
- ???2C????(m?)
- ??
- ?????????
- ???
- Huffman Tree (Why?)
- ????
- ????(D-1 ?)
- ????????
- ???(D?)
- ??????
23CBOW+HS (huffman code)
- Huffman tree
- ??????,????huffman code,??00101
- ???????????,?????1,????0
- ?????????????????,??????????????theta
- ??????????
- ????
- ????(??? 1)
- ????(??? 0)
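A small sketch of building Huffman codes for a vocabulary from word frequencies, so that frequent words get short paths. The toy frequencies and the choice of which child receives bit 0 or 1 are assumptions; the slide's own 0/1 convention may be the opposite.

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Map each word to its Huffman code (a string of 0/1 bits)."""
    tie = count()                                # tie-breaker so nodes are never compared
    heap = [(f, next(tie), w) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)        # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):              # internal (non-leaf) node
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                    # leaf = a vocabulary word
            codes[node] = prefix

    walk(heap[0][2], "")
    return codes


freqs = {"the": 15, "cat": 8, "mouse": 6, "street": 5, "walking": 4, "is": 3}
codes = huffman_codes(freqs)
for word, code in sorted(codes.items(), key=lambda kv: len(kv[1])):
    print(word, code)
```

A vocabulary of D words yields D leaves and D-1 internal nodes; each internal node on a word's path is one binary decision.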
24CBOW+HS (a train sample)
- Train sample (Context(??), ??)
- Train huffman path 1001
- ????
- Loss function ( ?0/1)
- i.e.
25CBOW+HS (Gradient Ascent Method)
- GD (Gradient Descent Method)
- ???????????????J(?)??????????????
- SGD (Stochastic Gradient Descent Method)
- ????,??????
- ??????(SGD)??????????????????
26CBOW+HS (Gradient Ascent Method)
- SGD (Stochastic Gradient Descent Method)
- ??????(SGD)??????????????????
- ?????????,????2c-1??,??????????????
- ????
- ?????????
- ???????L?L'????
- ????????????????????????????????????
27CBOW+HS (Gradient Ascent Method)
- theta update (theta gradient)
- word_vector update (word_vector gradient)
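A hedged sketch of one stochastic gradient ascent step for CBOW with hierarchical softmax. It assumes the convention of the word2vec C code: the projection x is the sum of the context vectors, each internal node j on the Huffman path has a parameter vector theta_j, and a code bit d_j is scored as $\sigma(x \cdot \theta_j)^{1-d_j}\,(1-\sigma(x \cdot \theta_j))^{d_j}$. Function names and toy values are illustrative; the path code 1001 is the one used as the example on the training-sample slide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sum_vectors(vectors):
    return [sum(col) for col in zip(*vectors)]


def hs_log_likelihood(x, path_thetas, code):
    """Log-probability of the target word's Huffman path given projection x."""
    total = 0.0
    for theta, bit in zip(path_thetas, code):
        p = sigmoid(dot(x, theta))
        total += math.log(p) if bit == 0 else math.log(1.0 - p)
    return total


def cbow_hs_step(context_vecs, path_thetas, code, lr):
    """Update theta on the path and every context word vector in place."""
    x = sum_vectors(context_vecs)
    err = [0.0] * len(x)                         # accumulated gradient w.r.t. x
    for theta, bit in zip(path_thetas, code):
        g = lr * (1 - bit - sigmoid(dot(x, theta)))
        for k in range(len(x)):
            err[k] += g * theta[k]               # word-vector gradient uses the old theta
            theta[k] += g * x[k]                 # theta update
    for vec in context_vecs:                     # word_vector update
        for k in range(len(vec)):
            vec[k] += err[k]


context_vecs = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
code = [1, 0, 0, 1]
path_thetas = [[0.0] * 3 for _ in code]

before = hs_log_likelihood(sum_vectors(context_vecs), path_thetas, code)
for _ in range(20):
    cbow_hs_step(context_vecs, path_thetas, code, lr=0.1)
after = hs_log_likelihood(sum_vectors(context_vecs), path_thetas, code)
print(before, after)
```

Repeating the step on the same sample raises the log-likelihood of its path, which is the sense in which the slides call this gradient ascent.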
28CBOW+HS (hierarchical)
- No hierarchical structure
- ???????????,??????O(V)
- Binary tree
- O(log2(V))
29CBOW+HS (softmax)
- softmax
- softmax???logistic(sigmoid)?????
- sigmoid???????,?softmax????
- ?????z_j????z,??softmax???????1,??????0
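A short numeric check of the relation between softmax and the logistic (sigmoid) function: a two-class softmax over the scores [z, 0] gives exactly sigmoid(z) for the first class. The score values are arbitrary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


z = 1.7
two_class = softmax([z, 0.0])
print(two_class[0], sigmoid(z))                  # the two values agree

peaked = softmax([10.0, 0.0, 0.0])               # one score much larger than the rest
print(peaked)
```

When one score dominates, its probability approaches 1 and the others approach 0.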
30Skip-gram+HS
- ??????
- ????????
- ??????
- ???????
- ??????CBOWHS
31Negative Sampling
- Negative Sampling (??NEG)
- ????Noise Contrastive Estimation(NCE)
- ????????????????
- Hierarchical softmax?????
- ?????????tree structure
32Negative Sampling
- Negative Sampling ?? Negative Sample?
- ?CBOW??
- ??Context(w) ? ?w
- ???? ?w
- ???? ???w??????
- Negative Sampling???
- ??????
- ????????
- ??????
33CBOW Negative Sampling
- ?????Context(w) ? ?w,
- ????
- ??
- i.e.
- ???????(????)
- ???????(????)
- ????????
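A hedged sketch of one CBOW negative-sampling step. The target word w is the positive sample (label 1) and k sampled words are negatives (label 0); each is scored with a sigmoid on $x \cdot \theta_u$, where x is the sum of the context vectors. Drawing negatives in proportion to frequency raised to the 0.75 power follows Mikolov et al. (2013); the toy vocabulary, function names and sizes are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def draw_negatives(freqs, target, k, rng):
    """Sample k negatives (duplicates possible), weighted by frequency ** 0.75."""
    words = [w for w in freqs if w != target]
    weights = [freqs[w] ** 0.75 for w in words]
    return rng.choices(words, weights=weights, k=k)


def cbow_neg_step(context_vecs, out_vecs, target, negatives, lr):
    """Update the output vectors of target/negatives and the context vectors in place."""
    x = [sum(col) for col in zip(*context_vecs)]
    err = [0.0] * len(x)
    for word, label in [(target, 1)] + [(u, 0) for u in negatives]:
        theta = out_vecs[word]
        g = lr * (label - sigmoid(dot(x, theta)))
        for k in range(len(x)):
            err[k] += g * theta[k]               # gradient w.r.t. x, using the old theta
            theta[k] += g * x[k]
    for vec in context_vecs:
        for k in range(len(vec)):
            vec[k] += err[k]


rng = random.Random(0)
freqs = {"the": 15, "cat": 8, "mouse": 6, "street": 5, "walking": 4}
out_vecs = {w: [0.0, 0.0, 0.0] for w in freqs}
context_vecs = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
target = "cat"
negatives = draw_negatives(freqs, target, k=3, rng=rng)

for _ in range(10):
    cbow_neg_step(context_vecs, out_vecs, target, negatives, lr=0.1)

x = [sum(col) for col in zip(*context_vecs)]
pos_score = sigmoid(dot(x, out_vecs[target]))
neg_scores = [sigmoid(dot(x, out_vecs[u])) for u in negatives]
print(pos_score, neg_scores)
```

Each step costs k+1 dot products instead of a pass over the whole vocabulary or a walk down a tree.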
34Skip-gram Negative Sampling
- ????
- ??
- i.e.
- ??,
- ???????(????)
- ???????(????)
- ????????
35?????(???)
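- Intrinsic
- Evaluation on a specific intermediate subtask
- Fast to compute
- Helps to understand the system
- Not clear whether it really helps unless a correlation with a real NLP task is established
- Extrinsic
- Evaluation on a real task
- Computing accuracy can take a long time
- Unclear whether the subsystem itself is the problem, or its interaction with other subsystems
- If replacing one subsystem with another improves accuracy -> Winning!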
36????? (Intrinsic - Word Vector Analogies)
- ???????????????????????????????
- ??????????????????
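The analogy test named in this slide's title is commonly run as "a is to b as c is to ?", answered by the word whose vector has the highest cosine similarity to x_b - x_a + x_c, with the three input words excluded from the search. A minimal sketch, using hand-made 2-dimensional vectors that are purely hypothetical (not trained embeddings):

```python
import math


def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den


def analogy(a, b, c, vectors):
    """Return the word d maximizing cos(x_b - x_a + x_c, x_d), excluding a, b, c."""
    query = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(query, vectors[w]))


# Hypothetical toy vectors: first axis ~ "royalty", second axis ~ "gender".
vectors = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

answer = analogy("man", "king", "woman", vectors)
print(answer)
```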
37????? (Intrinsic -Semantic)
38????? (Intrinsic - Syntactic)
39????? (Intrinsic - Sem. Syn.)
40????? (Sem. Syn. using GloVe)
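- Asymmetric context (only words to the left) is not as good
- Best dimensionality is around 300, with a slight drop-off afterwards
- For GloVe vectors, a window size of 8 around each center word works well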
41????? (Sem. Syn. using GloVe)
- ?????
- ??GloVe?????
- Word2vec???
42????? (Sem. Syn. using GloVe)
43???????? (??????)
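- One might hope a single vector captures both kinds of information (e.g. run is both a noun and a verb), but the vector is then pulled in different directions
- Idea: cluster the word windows around each word and retrain with each word assigned to multiple clusters, e.g. bank1, bank2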
44???????? (??????)
45????? (Intrinsic - correlation)
46????? (Extrinsic)
47?????????? (?PPT??)
- ???????????????????
- ?????????????
- ?????????????
- ????????????????????
- ?????????????????????
- ?????????????????
- ??????????????/????
48???? ??? softmax ??(1)
- Softmax classification: given a word vector x, obtain the probability of class y
49???? ??? softmax ??(2)
- Loss function
- Cost function
- Objective function
- Softmax loss: cross entropy
- Ground-truth distribution p = [0, ..., 0, 1, 0, ..., 0]; the computed distribution is q
- In that case, minimizing the cross entropy is equivalent to minimizing the KL divergence
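A small numeric sketch of the softmax cross-entropy loss for a one-hot target. With a one-hot p, the cross entropy reduces to the negative log-probability of the true class, and because the entropy of a one-hot p is zero, it also equals KL(p || q). The score values are arbitrary.

```python
import math


def softmax(scores):
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(p, q):
    """H(p, q) = -sum_c p_c log q_c, skipping classes with p_c = 0."""
    return -sum(pc * math.log(qc) for pc, qc in zip(p, q) if pc > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_c p_c log(p_c / q_c), skipping classes with p_c = 0."""
    return sum(pc * math.log(pc / qc) for pc, qc in zip(p, q) if pc > 0)


scores = [2.0, 0.5, -1.0]                        # unnormalized class scores, e.g. rows of W times x
q = softmax(scores)                              # computed distribution
p = [1.0, 0.0, 0.0]                              # one-hot ground truth: class 0 is correct

loss = cross_entropy(p, q)
print(loss, -math.log(q[0]), kl_divergence(p, q))
```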
50??????????
- ??????
- ????
- Option 1: train only the softmax weights W
- Option 2: also train the word vectors
- ????????????????
- Pro: better fit on the training data
- Con: worse generalization
51?????????? -??????????????
[Figure: words plotted in vector space: fun, enjoyable, worth, right, blarblar, blarblar, dull, boring]
52??????????
- NLP??
- ????????,word analogy,?????????
- ????????????????????,?????????feature,NER?CHK???
- ????
- Relational extraction
- Connecting images and sentences, image understanding
- ?NLP??
- ?qq???doc,????word,??user distributed representation,????user
- ?query session??doc,query??word,??query distributed representation,????query
- ???????????doc,??????word,??product distributed representation,????product
53????(??PPT????)
- Socher,CS224d Slides
- fandywang,??????????? Language Modeling
- licstar,Deep Learning in NLP (?)????????
- falao_beiliu,????word2vec??
- hisen,word2vec????
- Mikolov,word2vec source code
- shujun_deng,Deep Learning???word2vec _at_????
- ???,word2vec ????????
- ??,Google ???? word2vec ????_at_??
- ????,?????????(??)
- ??,Softmax ????????????_at_??
- 52nlp,????????????????
54Thanks Q&A