More Xkwic and Tgrep

LING 5200. Computational Corpus Linguistics. Martha Palmer ... LING 5200, 2006. Based on Kevin Cohen's LING 5200.



1
More Xkwic and Tgrep
  • LING 5200
  • Computational Corpus Linguistics
  • Martha Palmer
  • March 2, 2006

2
Resources: Laura is bugging me to make a CU Corpora page
  • Like this
  • http://www.stanford.edu/dept/linguistics/corpora/cas-home.html
  • TGREP: http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html

3
Searching with pos tags and !
  • [word = "[tT]he" & !(pos = "DT")] wsj
  • [!(word = "water" & pos = "NN")]
  • [!(word = "water") | !(pos = "NN")]
  • [word != "water" | pos != "NN"]

4
Operator precedence
  • The precedence of the (logical) operators is defined
    by the following list: if operator x is listed before
    operator y, x has precedence over y. Operators are
    evaluated left to right.
  • =, !=, !, &, |
  • [! word = "water" | ! pos = "NN"]
    disambiguates as
  • [!(word = "water") | !(pos = "NN")]
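
As a rough illustration of how the negated queries relate, the sketch below models a token as a Python dict and evaluates three predicate forms on a few made-up tokens. It assumes the reading in which the first form negates a conjunction and the other two are its De Morgan equivalents. The token data and function names are hypothetical. As with the attribute tests in the queries, Python evaluates the comparison before `not`, and `not` before `or`.

```python
# Toy tokens: hypothetical data, not taken from any corpus.
tokens = [
    {"word": "water", "pos": "NN"},
    {"word": "water", "pos": "VB"},
    {"word": "rain", "pos": "NN"},
    {"word": "the", "pos": "DT"},
]

def negated_conjunction(t):
    # not (word is "water" and pos is "NN")
    return not (t["word"] == "water" and t["pos"] == "NN")

def disjunction_of_negations(t):
    # Each comparison is evaluated first, then `not`, then `or`.
    return (not t["word"] == "water") or (not t["pos"] == "NN")

def inequality_form(t):
    # The same test written with != instead of an explicit negation.
    return t["word"] != "water" or t["pos"] != "NN"

results = [
    (negated_conjunction(t), disjunction_of_negations(t), inequality_form(t))
    for t in tokens
]
```

Under this reading, only the token that is both "water" and tagged NN is rejected, and the three forms agree on every token.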

5
Searching sequences with quantifiers such as ?
  • "Bill" [pos = "NP"]
  • [pos = "NP"] [pos = "NP"] [pos = "NP"]
  • ([pos = "NP"] [pos = "NP"])
  • ([pos = "NP"] "of" [pos = "NP"])
  • ([pos = "NP"] "of"? [pos = "NP"])
  • Note: First match applies

6
Corpus Position wild cards and contexts
  • "give" "up"
  • "give" []{0,5} "up"
  • "give" "up" within 7
  • "Clinton" expand to 5
  • "Clinton" expand left to 5
  • "Clinton" expand right to 5
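
A distance constraint such as "give" followed by "up" with 0 to 5 intervening corpus positions can be pictured as a scan over token indices. The following is a minimal sketch under that assumption. The function name `find_within` and the example sentence are hypothetical, and it keeps only the nearest "up" for each "give", which is a simplification rather than a statement about how the query processor resolves matches.

```python
def find_within(tokens, first, second, min_gap=0, max_gap=5):
    """Return (start, end) index pairs where `first` is followed by `second`
    with between min_gap and max_gap tokens in between."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + 1 + gap
            if j < len(tokens) and tokens[j] == second:
                hits.append((i, j))
                break  # keep only the nearest match for this start position
    return hits

# Hypothetical example sentence.
sentence = "they will never give the whole plan up without a fight".split()
hits = find_within(sentence, "give", "up")            # three tokens intervene
too_tight = find_within(sentence, "give", "up", max_gap=2)
```

Here `hits` holds the index pair for "give ... up", while `too_tight` is empty because three tokens separate the two words.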

7
Assignments and Intersect
  • Q1 = "rain"
  • Q2 = [pos = "NN"]
  • intersect Q1 Q2
  • Q1 = [pos = "JJ"] [pos = "NN"]
  • Q2 = "acid" "rain"
  • intersect Q1 Q2
  • [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]

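The intersect step on the Assignments slide can be pictured as set intersection over match positions: each named query yields a set of (start, end) spans, and intersecting two of them keeps only the spans both queries matched. The sketch below assumes that picture. The tagged tokens and the helper `match_pairs` are hypothetical.

```python
# Hypothetical (word, pos) tokens.
tagged = [
    ("acid", "JJ"), ("rain", "NN"), ("fell", "VBD"),
    ("heavy", "JJ"), ("rain", "NN"),
    ("acid", "NN"), ("rain", "NN"),
]

def match_pairs(tokens, test_first, test_second):
    """Return the set of (i, i + 1) spans whose two tokens pass the two tests."""
    return {
        (i, i + 1)
        for i in range(len(tokens) - 1)
        if test_first(tokens[i]) and test_second(tokens[i + 1])
    }

# Q1: adjective followed by noun.  Q2: "acid" followed by "rain".
q1 = match_pairs(tagged, lambda t: t[1] == "JJ", lambda t: t[1] == "NN")
q2 = match_pairs(tagged, lambda t: t[0] == "acid", lambda t: t[0] == "rain")
both = q1 & q2

# The single combined query tests word and pos on each token at once.
combined = match_pairs(
    tagged,
    lambda t: t == ("acid", "JJ"),
    lambda t: t == ("rain", "NN"),
)
```

In this toy data the intersection and the combined query select the same single span, which is the point of the last bullet on the slide.
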
8
Structural restrictions
  • "give" "up" within s
  • ("gain" "profit") | ("profit" "gain") within 3 s
  • ("gain" "profit") | ("profit" "gain") within article
  • "Clinton" expand left to 2 s

9
Defining structural restrictions
  • Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"]
  • Nounphrase
  • pos JJ
  • Go back to select

10
For fun
  • <s> [pos = "V."] [pos = "PN."] </s>
  • ([pos = "V."] [pos = "PN."]) within s
  • Not a question, not beginning of sentence

11
less is more
  • less <filename>
  • cat ??/* | less
  • Switches
  • SPACE: next screenful
  • b: previous screenful
  • /<reg exp pattern>, e.g. /RNR: search for pattern
  • ?<reg exp pattern>: search backwards for pattern
  • q: quit

12
Searching for a word
  • tgrep Halloween: what happens?
  • Why don't you have to specify a file?
  • babel> grep tgrep .cshrc
  • tgrep stuff
  • setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp
  • setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp
  • Count results: tgrep research | wc -l
  • cat ??/* | grep Halloween | wc -l

13
Tgrep Switches
  • -a Match on all patterns in a sentence
  • -w Return the whole sentence
  • -n Put the entire string on one line
  • -t Print only the terminals

14
Viewing it in sentential context
  • tgrep -wn Halloween | more
  • tgrep -wn research | more (20,865 hits)
  • Can also use less

15
Viewing it in sentential context
  • tgrep -wn research | more

16
Searching by POS
  • tgrep NNS | more

Another way to do your sanity check
17
See more data?
  • tgrep NNS | grep . | more

18
Sentential context (again)
  • tgrep -wn NNS | more

19
Searching by syntactic constituent
  • tgrep NP | more

20
Single-line outputs
  • tgrep -n NP | more

21
Viewing tree-like output
  • tgrep -w NP | head -20

22
Searching for relations between nodes
  • tgrep 'NP < CC' | head -16

23
tgrep g (whole language)
  • A < B: A immediately dominates B
  • A > B: A is immediately dominated by B
  • A << B: A dominates B
  • A >> B: A is dominated by B
  • A . B: A immediately precedes B
  • A .. B: A precedes B
  • A <<, B: B is the leftmost descendant of A
  • A <<` B: B is the rightmost descendant of A
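
To make the dominance relations concrete, here is a minimal sketch that evaluates a few of them on a toy parse tree. It is not how tgrep is implemented. The tree, the nested-tuple representation, and every function name are hypothetical and chosen only for illustration.

```python
# A tree is (label, child, child, ...); a leaf is a plain string.
# Hypothetical parse, not taken from the treebank.
tree = ("S",
        ("NP", ("NNP", "Clinton"), ("CC", "and"), ("NNP", "Gore")),
        ("VP", ("VBD", "kicked"), ("PRT", ("RP", "off")),
               ("NP", ("DT", "the"), ("NN", "campaign"))))

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)

def immediately_dominates(node, b):
    # B is a child of the node.
    return any(label(c) == b for c in children(node))

def dominates(node, b):
    # B occurs anywhere below the node.
    return any(label(d) == b for c in children(node) for d in subtrees(c))

def leftmost_descendant(node, b):
    # Follow first children downward looking for B.
    kids = children(node)
    while kids:
        if label(kids[0]) == b:
            return True
        kids = children(kids[0])
    return False

def rightmost_descendant(node, b):
    # Follow last children downward looking for B.
    kids = children(node)
    while kids:
        if label(kids[-1]) == b:
            return True
        kids = children(kids[-1])
    return False

def search(root, a, relation, b):
    """Return every node labelled `a` that stands in `relation` to `b`."""
    return [n for n in subtrees(root) if label(n) == a and relation(n, b)]

np_over_cc = search(tree, "NP", immediately_dominates, "CC")
vp_over_dt = search(tree, "VP", dominates, "DT")
```

The `search` helper plays the role of a pattern with one relation: it returns the nodes labelled A for which the relation to B holds, so `np_over_cc` contains only the coordinated noun phrase.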

24
Alternation
  • node names can be ORed e.g.
  • tgrep 'Clinton|Gore' | head

25
Character classes
  • Regular expressions
  • tgrep '/[Cc]hild/' | egrep . | head

26
Working towards that weird example
  • tgrep '/[Pp]resident/' | head

27
Combining alternation and a regular expression
  • tgrep 'Clinton|Gore|/[Pp]resident/' | head
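
The combination of ORed node names with a regular expression can be sketched in Python as follows. This assumes that a literal name must equal the node label exactly, while a regular expression may match anywhere inside the label. That is an assumption about tgrep's matching, and the word list and the function `node_matches` are hypothetical.

```python
import re

LITERAL_NAMES = {"Clinton", "Gore"}          # exact node names, ORed together
PRESIDENT = re.compile(r"[Pp]resident")      # character class: P or p

def node_matches(name):
    """True if the label is one of the literal names or contains the regex."""
    return name in LITERAL_NAMES or PRESIDENT.search(name) is not None

# Hypothetical terminal labels.
words = ["Clinton", "president", "President", "Gores",
         "vice-presidential", "resident", "Gore"]
matches = [w for w in words if node_matches(w)]
```

Under these assumptions "Gores" is rejected because the literal name does not match exactly, while "vice-presidential" is accepted because the regular expression is unanchored.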

28
Searching for a transitive verb
  • tgrep -w 'VP << like < NP << DT' | more

29
Verbs and Particles
  • tgrep -w 'VP << kick' > kick
  • tgrep 'VP << /kick./ <2 PRT' kick
  • tgrep 'VP <1 VB <2 PRT' kick
  • tgrep -nw 'VP <1 /VB./ <2 PRT' kick
  • tgrep 'VP <1 (VB < kick) <2 PRT' kick
  • tgrep 'VP <1 (/VB./ < kick) <2 PRT' kick
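
The numbered-child patterns above ask for a VP whose first child is a verb tag and whose second child is PRT. A minimal Python sketch of that test on toy trees follows. The trees and function names are hypothetical, and the regular expression is assumed to match anywhere in the label.

```python
import re

# (label, child, child, ...) tuples; leaves are plain strings. Hypothetical parses.
vp_with_particle = ("VP", ("VBD", "kicked"), ("PRT", ("RP", "off")),
                    ("NP", ("DT", "the"), ("NN", "campaign")))
vp_plain = ("VP", ("VBD", "kicked"), ("NP", ("DT", "the"), ("NN", "ball")))

def nth_child_label(node, n):
    """Label of the n-th child (1-based), or None if there is no such child."""
    kids = node[1:]
    if n > len(kids):
        return None
    child = kids[n - 1]
    return child if isinstance(child, str) else child[0]

def is_verb_particle(node):
    """VP whose first child matches VB plus one more character, second child PRT."""
    first = nth_child_label(node, 1)
    second = nth_child_label(node, 2)
    return (
        node[0] == "VP"
        and first is not None
        and re.search(r"VB.", first) is not None
        and second == "PRT"
    )

with_particle = is_verb_particle(vp_with_particle)
without_particle = is_verb_particle(vp_plain)
```

Only the first tree passes, since its second child is PRT. The second tree has an NP in that position.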