Advanced word vector representations


1
Advanced word vector representations

2
Outline
  • Lecture 1 refresher
  • Neural network language models (Bengio 2003)
  • Word2vec (Mikolov 2013)
  • (CBOW / Skip-gram) × (Hierarchical Softmax / Negative Sampling)
  • Evaluating word vectors
  • Softmax classification (see the appendix of this deck)
  • Applications of word vectors

3
Outline
Contents of cs224d Lecture 3:
  • Lecture 1 refresher
  • Training word vectors: gradient descent and SGD (GD, SGD)
  • Evaluating word vectors (evaluate)
  • Softmax classification
  • Problem set

This lecture:
  • Lecture 1 refresher
  • Neural network language models (Bengio 2003)
  • Word2vec (Mikolov 2013)
  • (CBOW / Skip-gram) × (Hierarchical Softmax / Negative Sampling)
  • Evaluating word vectors
  • Softmax classification (see the appendix of this deck)
  • Applications of word vectors

4
Word vector representations
  • Two kinds of representation:
  • One-hot representation
  • e.g. [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ...]
  • Distributional representation
  • e.g. [0.792, -0.177, -0.107, 0.109, -0.542, ...]
  • Two ways to obtain them:
  • Count-based: SVD, LSA, LDA
  • based on lexical co-occurrence
  • Learning representations (prediction-based)
  • predict the surrounding words of every word
  • e.g. word2vec
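The contrast on this slide can be made concrete with a tiny sketch (the numbers are illustrative toy values, not trained vectors; `vocab` and `dense` are made up for this example):

```python
# One-hot vs. distributional representations (toy illustration).
vocab = ["I", "like", "learning", "cat", "mouse"]

def one_hot(word):
    """One-hot: a |V|-dim vector with a single 1 at the word's index."""
    vec = [0.0] * len(vocab)
    vec[vocab.index(word)] = 1.0
    return vec

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Any two distinct one-hot vectors are orthogonal: their similarity is
# always 0, so the representation says nothing about word relatedness.
print(dot(one_hot("cat"), one_hot("mouse")))  # 0.0

# Distributional: dense, low-dimensional vectors in which similar words
# can get similar directions (values chosen by hand here).
dense = {
    "cat":   [0.792, -0.177, -0.107],
    "mouse": [0.751, -0.203, -0.095],
    "like":  [-0.542, 0.109, 0.810],
}
print(dot(dense["cat"], dense["mouse"]) > dot(dense["cat"], dense["like"]))  # True
```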

5
Word vectors: input and output vectors
  • Every word w gets two vector representations
  • the input vector is the center-word vector v
  • the output vector is the context (external) vector v'
  • With window size 1 and the sentence "I like learning":
  • center word "like" → v_like
  • context words "I" and "learning" → v'_I, v'_learning

6
Word vectors: simple word2vec
  • Predict the surrounding words in a window of length c around every word.
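In the standard skip-gram notation (my reconstruction; the deck's formula images did not survive), this prediction task is the objective

```latex
J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \;\sum_{\substack{-c \le j \le c \\ j \neq 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(o \mid c) = \frac{\exp\!\big(u_o^{\top} v_c\big)}{\sum_{w=1}^{V} \exp\!\big(u_w^{\top} v_c\big)}
```

where v is the input (center) vector and u the output (context) vector of a word, matching the two-vector setup of the previous slide.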

7
Word vectors: Word2Vec and GloVe
  • Word2Vec
  • Efficient Estimation of Word Representations in Vector Space. Mikolov et al. (2013)
  • GloVe
  • GloVe: Global Vectors for Word Representation. Pennington et al. (2014)
  • built on aggregated global word-word co-occurrence statistics from a corpus

8
Language models (refresher)
  • A language model assigns a probability to a sentence
  • a core NLP task (e.g. N-gram models)
  • i.e. compute the probability of a word sequence
  • Machine translation: P(high winds tonite) > P(large winds tonite)
  • Spelling correction: P(about fifteen minutes from) > P(about fifteen minuets from)
  • Speech recognition: P(I saw a van) >> P(eyes awe of an)
  • Pinyin input: P(你现在干什么 | nixianzaiganshenme) > P(你西安在干什么 | nixianzaiganshenme)
  • and many more applications ...

9
Language models (continued)
  • A core NLP task (e.g. N-gram models)
  • compute the probability of a word sequence
  • How to compute p(S)? Chain rule:
  • p(S) = p(w1, w2, w3, w4, w5, ..., wn)
  •      = p(w1) p(w2|w1) p(w3|w1,w2) ... p(wn|w1,w2,...,wn-1)
  • bigram (first-order Markov) approximation:
  •      ≈ p(w1) p(w2|w1) p(w3|w2) ... p(wn|wn-1)
  • Other statistical sequence models:
  • maximum entropy models (MaxEnt), maximum entropy Markov models (MEMM), conditional random fields (CRF)
  • Neural network language models:
  • Bengio 2003, Mikolov 2013
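The bigram decomposition is easy to sketch with maximum-likelihood counts (a toy corpus invented for this example; real models also need smoothing):

```python
from collections import Counter

# Toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> I like learning </s>",
    "<s> I like cats </s>",
    "<s> cats like mice </s>",
]

unigrams = Counter()
bigrams = Counter()
for sent in corpus:
    words = sent.split()
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_bigram(w, prev):
    """MLE estimate p(w | prev) = count(prev, w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

def p_sentence(sent):
    """p(S) ~ product of p(w_i | w_{i-1}): the Markov approximation."""
    words = sent.split()
    p = 1.0
    for prev, w in zip(words, words[1:]):
        p *= p_bigram(w, prev)
    return p

print(p_sentence("<s> I like mice </s>"))  # ≈ 0.2222 (= 2/3 · 1 · 1/3 · 1)
```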

10
Neural network language models (history)
  • 2000, Wei Xu: the first proposal to learn language models with neural networks
  • Can Artificial Neural Networks Learn Language Models?
  • the model estimated conditional probabilities such as P(w_t | w_{t-1})

11
Neural network language models (history)
  • 2003, Bengio: the neural probabilistic language model
  • A Neural Probabilistic Language Model

12
Neural network language models (history)
  • 2008, Ronan Collobert and Jason Weston
  • the C&W model
  • Natural Language Processing (Almost) from Scratch
  • 2008, Andriy Mnih and Geoffrey Hinton
  • A Scalable Hierarchical Distributed Language Model
  • 2010, Mikolov
  • RNNLM
  • Recurrent Neural Network Based Language Model
  • 2012, Huang
  • Improving Word Representations via Global Context and Multiple Word Prototypes

13
Bengio 2003
14
Bengio 2003
  • Parameters to learn:
  • a word vector v(w) for every w in the dictionary
  • the network parameters W, U, p, q
  • Layer sizes:
  • input layer: (n-1)m, where n is the context size (typically 5) and m the word-vector dimension (10 to 10^3)
  • hidden layer: n_hidden, tuned by hand, typically around 10^2
  • output layer: N, the vocabulary size, typically 10^4 to 10^5
  • most computation happens here (the part word2vec later optimizes away)
  • hidden activation: tanh
  • output activation: softmax

15
Bengio 2003
  • Network structure:
  • Input layer:
  • the (n-1) context word vectors concatenated into x, of dimension (n-1)m
  • (this is the projection layer: the word vectors themselves)
  • Hidden layer h:
  • tanh(Hx + d)
  • Output layer: scores over the vocabulary, y = U tanh(Hx + d)
  • normalized by a softmax over the V vocabulary entries
  • Direct connections:
  • optional edges W from the input straight to the output
  • they roughly halve the number of training iterations but do not improve perplexity
  • Bengio already observed that the learned word vectors were a valuable by-product
  • later work increasingly focused on the vectors themselves

16
ffnnlm??? ngram?????
  • ?????????????????
  • ????? cat ? mouse ?ffnnlm??????,??ngram?cat???????
  • A cat is walking on the street 10000
  • A mouse is walking on the street 1
  • ??????????
  • ????p(wcontext)

17
ffnnlm???????
  • ????????
  • ????????
  • ?????

18
??????
  • ????????1986?Hinton??
  • 2003?Bengio????ffnnlm
  • Feed-forward Neural Net Language Model
  • SENNA?HLBL?Word2vec ?SSWE?
  • GloVe

19
Word2Vec????(Mikolov 2013)
  • ?????????????????????,???????,?????
  • ???tanh??,?????,???????
  • ????????????,???????
  • ????????????
  • ????softmax
  • ??negative sampling

20
Word2Vec?????
  • CBOW Skip-gram

21
(CBOW Skip-gram) 2
  • ????(????)
  • CBOW (Continuous Bag-Of-Words Model)
  • Skip-gram (Continuous Skip-gram Model)
  • ????(????)
  • Hierarchical Softmax
  • Negative Sampling

22
CBOWHS (structure)
  • ???
  • ???2C????(m?)
  • ??
  • ?????????
  • ???
  • Huffman Tree (Why?)
  • ????
  • ????(D-1 ?)
  • ????????
  • ???(D?)
  • ??????

23
CBOWHS (huffman code)
  • Huffman tree
  • ??????,????huffman code,??00101
  • ???????????,?????1,????0
  • ?????????????????,??????????????theta
  • ??????????
  • ????
  • ????(??? 1)
  • ????(??? 0)
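The tree construction itself can be sketched in a few lines (my own minimal implementation with `heapq`, not word2vec's C code; frequencies are toy values):

```python
import heapq
import itertools

def huffman_codes(freqs):
    """Build Huffman codes from word frequencies. Frequent words end up
    with short codes, which is why word2vec uses a Huffman tree: the
    expected number of binary classifications per word is minimized."""
    counter = itertools.count()  # tie-breaker so heapq never compares dicts
    heap = [(f, next(counter), {w: ""}) for w, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)
        f2, _, codes2 = heapq.heappop(heap)
        # One branch extends every code with '1', the other with '0'
        # (prepended, since we build the tree bottom-up).
        merged = {w: "1" + c for w, c in codes1.items()}
        merged.update({w: "0" + c for w, c in codes2.items()})
        heapq.heappush(heap, (f1 + f2, next(counter), merged))
    return heap[0][2]

codes = huffman_codes({"the": 100, "cat": 10, "mouse": 9, "walks": 5})
print(codes)  # "the" is most frequent, so its code is the shortest
```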

24
CBOWHS (a train sample)
  • Train sample (Context(??), ??)
  • Train huffman path 1001
  • ????
  • Loss function ( ?0/1)
  • i.e.
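The loss shown as an image on this slide can be reconstructed in the notation of the standard Chinese word2vec notes this deck follows (my reconstruction): with x_w the sum of the context vectors, l^w the path length, d_j^w the Huffman code bits, and θ_{j-1}^w the internal-node vectors,

```latex
\mathcal{L}(w) = \sum_{j=2}^{l^{w}} \Big[ \big(1 - d_j^{w}\big)\log \sigma\!\big(x_w^{\top}\theta_{j-1}^{w}\big) + d_j^{w}\,\log\!\big(1 - \sigma(x_w^{\top}\theta_{j-1}^{w})\big) \Big]
```

Each term is a binary logistic log-likelihood with target label 1 - d_j^w, i.e. one 0/1 classification per internal node on the path.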

25
CBOWHS (Gradient Ascent Method)
  • GD (Gradient Descent Method)
  • ???????????????J(?)??????????????
  • SGD (Stochastic Gradient Descent Method)
  • ????,??????
  • ??????(SGD)??????????????????

26
CBOWHS (Gradient Ascent Method)
  • SGD (Stochastic Gradient Descent Method)
  • ??????(SGD)??????????????????
  • ?????????,????2c-1??,??????????????
  • ????
  • ?????????
  • ???????L?L'????
  • ????????????????????????????????????

27
CBOWHS (Gradient Ascent Method)
  • theta update (theta gradient)
  • word_vector update (word_vector gradient)
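The two update rules, shown as images in the original deck, can be reconstructed from the path log-likelihood (my reconstruction, with learning rate η, Huffman code bits d_j^w, internal-node vectors θ_{j-1}^w, and x_w the sum of the context vectors):

```latex
% internal-node parameter update, for j = 2, \dots, l^{w}
\theta_{j-1}^{w} \;\leftarrow\; \theta_{j-1}^{w} + \eta\,\big[\,1 - d_j^{w} - \sigma\!\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]\, x_w
```

```latex
% word-vector update, applied to every u \in Context(w)
v(u) \;\leftarrow\; v(u) + \eta \sum_{j=2}^{l^{w}} \big[\,1 - d_j^{w} - \sigma\!\big(x_w^{\top}\theta_{j-1}^{w}\big)\big]\, \theta_{j-1}^{w}
```

The shared factor 1 - d_j^w - σ(·) is the derivative of the per-node logistic log-likelihood with respect to the score x_w^⊤θ_{j-1}^w.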

28
CBOWHS (hierarchical)
  • No hierarchical structure
  • ???????????,??????O(V)
  • Binary tree
  • O(log2(V))

29
CBOWHS (softmax)
  • softmax
  • softmax???logistic(sigmoid)?????
  • sigmoid???????,?softmax????
  • ?????z_j????z,??softmax???????1,??????0
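The softmax/sigmoid relationship on this slide is easy to verify numerically (a minimal stdlib-only sketch; the max-subtraction trick is standard for numerical stability and not specific to this deck):

```python
import math

def softmax(z):
    """Numerically stable softmax: subtracting max(z) leaves the result
    unchanged but prevents exp() overflow for large scores."""
    m = max(z)
    exps = [math.exp(x - m) for x in z]
    s = sum(exps)
    return [e / s for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# With two classes, softmax reduces to the logistic sigmoid of the
# score difference -- the generalization stated on the slide.
z = [2.0, -1.0]
assert abs(softmax(z)[0] - sigmoid(z[0] - z[1])) < 1e-12

# If one score dominates, its probability saturates toward 1.
print(softmax([10.0, 0.0, 0.0]))  # first entry ≈ 0.9999
```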

30
Skip-gram + HS
  • Input layer: the center word's vector
  • Projection layer: identity
  • Output layer: a Huffman tree, as in CBOW
  • Training:
  • the derivation parallels CBOW + HS

31
Negative Sampling
  • Negative Sampling (??NEG)
  • ????Noise Contrastive Estimation(NCE)
  • ????????????????
  • Hierarchical softmax?????
  • ?????????tree structure

32
Negative Sampling
  • Negative Sampling ?? Negative Sample?
  • ?CBOW??
  • ??Context(w) ? ?w
  • ???? ?w
  • ???? ???w??????
  • Negative Sampling???
  • ??????
  • ????????
  • ??????
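The frequency-weighted draw can be sketched with a precomputed table, mimicking word2vec's approach (my own simplified sketch; `table_size` is tiny here, the C implementation uses 10^8 entries):

```python
import random

def build_sampling_table(freqs, table_size=1000, power=0.75):
    """word2vec draws negatives from the unigram distribution raised to
    the 3/4 power: frequent words are still drawn more often, but the
    distribution is flattened so rare words get a chance."""
    weights = {w: f ** power for w, f in freqs.items()}
    total = sum(weights.values())
    table = []
    for w, wt in weights.items():
        # Each word occupies a share of the table proportional to f^0.75.
        table.extend([w] * max(1, round(table_size * wt / total)))
    return table

def sample_negatives(table, positive, k, rng):
    """Draw k negative words, skipping the positive target."""
    negs = []
    while len(negs) < k:
        w = rng.choice(table)
        if w != positive:
            negs.append(w)
    return negs

rng = random.Random(0)
table = build_sampling_table({"the": 1000, "cat": 50, "mouse": 10})
print(sample_negatives(table, "cat", 5, rng))
```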

33
CBOW Negative Sampling
  • ?????Context(w) ? ?w,
  • ????
  • ??
  • i.e.
  • ???????(????)
  • ???????(????)
  • ????????
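The objective shown as an image on this slide can be reconstructed in the usual NEG notation (my reconstruction): with x_w the sum of the context vectors, θ^u the auxiliary vector of word u, and NEG(w) the sampled negatives, maximize

```latex
g(w) = \sigma\!\big(x_w^{\top}\theta^{w}\big)\, \prod_{u \in NEG(w)} \big[1 - \sigma\!\big(x_w^{\top}\theta^{u}\big)\big]
\;\;\Rightarrow\;\;
\log g(w) = \log \sigma\!\big(x_w^{\top}\theta^{w}\big) + \sum_{u \in NEG(w)} \log \sigma\!\big(-x_w^{\top}\theta^{u}\big)
```

i.e. one logistic classification per word: label 1 for the true target w, label 0 for each sampled negative.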

34
Skip-gram Negative Sampling
  • ????
  • ??
  • i.e.
  • ??,
  • ???????(????)
  • ???????(????)
  • ????????

35
?????(???)
  • Intrinsic
  • ?????????(???)?????
  • ????
  • ??????????
  • ???????????????????NLP????????????
  • Extrinsic
  • ????????????
  • ?????????????
  • ????????????????????????
  • ??????????????????????gt??(Winning!)

36
Word vector evaluation (intrinsic: word vector analogies)
  • evaluate by how well the cosine distance after vector addition captures intuitive semantic and syntactic analogy questions (a : b :: c : ?)
  • note: discard the input words from the search!
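The analogy test can be sketched directly (toy hand-made vectors chosen so the classic king : man :: queen : woman relation holds; real evaluations use trained embeddings):

```python
import math

# Toy embeddings; real evaluations use trained vectors (word2vec, GloVe).
vecs = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.7, 0.2],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def analogy(a, b, c):
    """a : b :: c : ?  Maximize cosine(v_b - v_a + v_c, v_d),
    discarding the three input words from the search, as the slide says."""
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("man", "king", "woman"))  # queen
```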

37
Word vector evaluation (intrinsic: semantic)
  • examples:
  • (semantic analogy figures in the original deck)

38
Word vector evaluation (intrinsic: syntactic)
39
Word vector evaluation (intrinsic: semantic + syntactic)
40
????? (Sem. Syn. using GloVe)
  • ??????(???????)?????
  • ???????300??,????????
  • ??GloVe????????????8

41
Word vector evaluation (sem. + syn. using GloVe)
  • training time:
  • more training time helps GloVe
  • (comparison with Word2vec shown as a plot in the original deck)

42
Word vector evaluation (sem. + syn. using GloVe)
  • more data helps
  • Wikipedia text works better than news text

43
Dealing with word ambiguity
  • Most words have several senses (e.g. "run" as a noun vs. as a verb); a single vector per word conflates them
  • One approach: cluster the context windows of each word, relabel occurrences by cluster (e.g. bank1, bank2), and learn a vector per sense

44
Dealing with word ambiguity (continued)
45
Word vector evaluation (intrinsic: correlation with human judgments)
46
????? (Extrinsic)
  • ??????????NLP??
  • ???????

47
?????????? (?PPT??)
  • ???????????????????
  • ?????????????
  • ?????????????
  • ????????????????????
  • ?????????????????????
  • ?????????????????
  • ??????????????/????

48
Appendix: word vectors and softmax classification (1)
  • softmax classification: given a word vector x, predict the probability of each class label y

49
Appendix: word vectors and softmax classification (2)
  • Loss function
  • Cost function
  • Objective function
  • (the three terms are used interchangeably)
  • The softmax loss is the cross entropy
  • between the true distribution p = [0, ..., 0, 1, 0, ..., 0] (one-hot) and the predicted distribution q
  • moreover, minimizing this cross entropy is equivalent to minimizing the KL divergence from p to q
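Spelled out (the deck's formulas were images; this is the standard derivation), with c* the index of the correct class:

```latex
H(p, q) = -\sum_{c=1}^{C} p_c \log q_c
\;\overset{\text{one-hot } p}{=}\; -\log q_{c^*},
\qquad
H(p, q) = H(p) + D_{\mathrm{KL}}(p \,\|\, q) = D_{\mathrm{KL}}(p \,\|\, q)
```

The last equality holds because a one-hot p has entropy H(p) = 0, so minimizing the cross entropy and minimizing the KL divergence are the same thing.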

50
Using word vectors in classification
  • For a supervised task there are two options:
  • option 1: train only the softmax weights W, keeping the word vectors fixed
  • option 2: also fine-tune the word vectors for the task
  • i.e. treat the word vectors as additional model parameters
  • Pro: the vectors adapt and fit the training data better
  • Con: words absent from the training data stay where they were, so the shared geometry breaks

51
Using word vectors in classification: what can go wrong with retraining
  (figure in the original deck: training-set words such as "Fun", "Enjoyable", "Worth", "Right" move during retraining, while words absent from the training data, such as "dull" and "boring", stay behind)
52
Applications of word vectors
  • NLP tasks
  • used directly: word analogy, finding similar words, etc.
  • used as features for supervised models, e.g. NER and chunking (CHK)
  • higher-level tasks:
  • relational extraction
  • connecting images and sentences, image understanding
  • Beyond NLP
  • treat a QQ user's friend list as a "doc" and each friend as a "word": learn user distributed representations, then find similar users
  • treat a query session as a "doc" and each query as a "word": learn query distributed representations, then find similar queries
  • treat a user's purchase history as a "doc" and each product as a "word": learn product distributed representations, then recommend similar products

53
References (sources for this deck)
  • Socher, CS224d slides
  • fandywang, Statistical Language Modeling slides
  • licstar, Deep Learning in NLP (1): word vectors and language models
  • falao_beiliu, notes on deep learning and word2vec
  • hisen, word2vec ????
  • Mikolov, word2vec source code
  • shujun_deng, Deep Learning 实战之 word2vec @ NetEase Youdao
  • ???, word2vec ????????
  • ??, notes on Google's open-source word2vec @ ??
  • ????, ?????????(??)
  • ??, Softmax ???????????? @ ??
  • 52nlp, word2vec experiments on Chinese and English Wikipedia corpora

54
Thanks! Q&A