Title: More Xkwic and Tgrep
1 More Xkwic and Tgrep
- LING 5200
- Computational Corpus Linguistics
- Martha Palmer
- March 2, 2006
2 Resources
- Laura is bugging me to make a CU Corpora page
- Like this:
- http://www.stanford.edu/dept/linguistics/corpora/cas-home.html
- TGREP: http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html
3 Searching with pos tags and !
- [word = "[tT]he" & !(pos = "DT")] wsj
- [!(word = "water" & pos = "NN")]
- [!(word = "water") | !(pos = "NN")]
- [word != "water" | pos != "NN"]
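To make the effect of a negated attribute test concrete, here is a minimal Python sketch that mimics the first query on this slide (a form of "the" that is not tagged DT). The token list and its tags are hypothetical, and this only illustrates the logic; it is not how the query processor itself works.

```python
import re

# Hypothetical mini-corpus of (word, pos) pairs; tags are made up for illustration.
tokens = [("The", "DT"), ("the", "NNP"), ("water", "NN"),
          ("water", "VB"), ("the", "DT")]

def the_but_not_dt(token):
    """Word matches [tT]he AND NOT (pos is DT)."""
    word, pos = token
    return re.fullmatch(r"[tT]he", word) is not None and not (pos == "DT")

# Keep only the tokens that satisfy both conditions.
hits = [token for token in tokens if the_but_not_dt(token)]
print(hits)
```

Only tokens that pass the word test and fail the tag test survive, which is the kind of query that is handy for spotting unusual taggings.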
4 Operator precedence
- The precedence properties of the (logical) operators are defined by the following list, i.e. if operator x is listed before operator y, operator x has precedence over y. Operators are evaluated left-right:
- =, !=, !, &, |
- [! word = "water" | ! pos = "NN"] disambiguates as
- [!(word = "water") | !(pos = "NN")]
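Python's own operators happen to bind in a comparable order (comparison first, then `not`, then `and`, then `or`), so the disambiguation can be checked with a small sketch. The tokens are hypothetical, and joining the two negated tests with `or` is an assumption made for this illustration.

```python
# Hypothetical (word, pos) tokens for illustration.
tokens = [("water", "NN"), ("water", "VB"), ("rain", "NN"), ("rain", "VB")]

results = []
for word, pos in tokens:
    # No parentheses: Python evaluates ==, then not, then or.
    bare = not word == "water" or not pos == "NN"
    # The fully parenthesized reading of the same test.
    explicit = (not (word == "water")) or (not (pos == "NN"))
    # De Morgan: negating the conjunction gives the same truth value.
    de_morgan = not (word == "water" and pos == "NN")
    assert bare == explicit == de_morgan
    results.append(bare)

print(results)
```

The unparenthesized and parenthesized forms agree on every token, and only the token that is both "water" and NN is rejected.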
5 Searching sequences with and ?
- "Bill" [pos = "NP"]
- [pos = "NP"] [pos = "NP"] [pos = "NP"]
- ([pos = "NP"] [pos = "NP"])
- ([pos = "NP"] "of" [pos = "NP"])
- ([pos = "NP"] "of"? [pos = "NP"])
- Note: First match applies
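As a rough illustration of an optional element inside a token sequence, the sketch below looks for a proper noun, an optional "of", and another proper noun. The token list is hypothetical, and the scanning strategy (always consume "of" when it is present) is a simplification, not a description of the real matcher.

```python
# Hypothetical tagged tokens; NP is used here as the proper-noun tag.
tokens = [("Bank", "NP"), ("of", "IN"), ("America", "NP"),
          ("said", "VBD"), ("Bill", "NP"), ("Clinton", "NP")]

def np_of_np(tokens):
    """Return (start, end) index pairs for: NP, optional "of", NP."""
    spans = []
    for i, (_, pos) in enumerate(tokens):
        if pos != "NP":
            continue
        j = i + 1
        if j < len(tokens) and tokens[j][0] == "of":
            j += 1  # the optional element is present, so step over it
        if j < len(tokens) and tokens[j][1] == "NP":
            spans.append((i, j))
    return spans

spans = np_of_np(tokens)
print(spans)
```

Both "Bank of America" and "Bill Clinton" are found, because the "of" may be present or absent.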
6 Corpus Position wild cards and contexts
- "give" "up"
- "give" []{0,5} "up"
- "give" "up" within 7
- "Clinton" expand to 5
- "Clinton" expand left to 5
- "Clinton" expand right to 5
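The idea of allowing between zero and five arbitrary tokens between "give" and "up" can be sketched in Python as follows. The function name and the sample sentences are hypothetical, and keeping only the nearest "up" for each "give" is a simplifying assumption.

```python
def give_up(words, max_gap=5):
    """Return (start, end) pairs where "give" is followed by "up"
    with at most max_gap tokens in between."""
    spans = []
    for i, word in enumerate(words):
        if word != "give":
            continue
        # j runs over positions that leave 0..max_gap tokens between i and j.
        for j in range(i + 1, min(i + 2 + max_gap, len(words))):
            if words[j] == "up":
                spans.append((i, j))
                break  # keep only the nearest "up" (a simplification)
    return spans

print(give_up("they give up".split()))
print(give_up("give the whole thing up".split()))
print(give_up("give a b c d e f up".split()))
```

The last call finds nothing because six tokens intervene, one more than the gap allows.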
7 Assignments and Intersect
- Q1 = "rain"
- Q2 = [pos = "NN"]
- intersect Q1 Q2
- Q1 = [pos = "JJ"] [pos = "NN"]
- Q2 = "acid" "rain"
- intersect Q1 Q2
- [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
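One way to think about intersecting two named query results is as a set intersection over match spans. The sketch below assumes each result is a set of (start, end) token positions; the token list and helper name are hypothetical.

```python
# Hypothetical tagged tokens; the tags are illustrative only.
tokens = [("acid", "JJ"), ("rain", "NN"), ("heavy", "JJ"), ("rain", "NN"),
          ("acid", "NN"), ("rain", "VB")]

def bigram_spans(tokens, first_ok, second_ok):
    """Return the set of (start, end) spans where two adjacent tokens pass the tests."""
    return {(i, i + 1)
            for i in range(len(tokens) - 1)
            if first_ok(tokens[i]) and second_ok(tokens[i + 1])}

# Q1: adjective followed by noun; Q2: "acid" followed by "rain".
q1 = bigram_spans(tokens, lambda t: t[1] == "JJ", lambda t: t[1] == "NN")
q2 = bigram_spans(tokens, lambda t: t[0] == "acid", lambda t: t[0] == "rain")
both = q1 & q2

# The single combined query tests word and tag on each token at once.
single = bigram_spans(tokens,
                      lambda t: t == ("acid", "JJ"),
                      lambda t: t == ("rain", "NN"))
print(sorted(both), both == single)
```

On this toy data the intersection and the single combined query select the same span, which is the point of the last bullet.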
8 Structural restrictions
- "give" "up" within s
- ("gain" "profit") | ("profit" "gain") within 3 s
- ("gain" "profit") | ("profit" "gain") within article
- "Clinton" expand left to 2 s
9 Defining structural restrictions
- Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"]
- Nounphrase
- pos JJ
- Go back to select
10 For fun
- <s> [pos = "V."] [pos = "PN."] </s>
- ([pos = "V."] [pos = "PN."]) within s
- Not a question, not beginning of sentence
11 less is more
- less <filename>
- cat ??/* | less
- Switches
- SPACE: next screenful
- b: previous screenful
- /<reg exp pattern> (e.g. /RNR): search for pattern
- ?<reg exp pattern>: search backwards for pattern
- q: quit
12 Searching for a word
- tgrep Halloween (what happens?)
- Why don't you have to specify a file?
- babel> grep tgrep .cshrc
- tgrep stuff
- setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp
- setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp
- Count results: tgrep research | wc -l
- cat ??/* | grep Halloween | wc -l
13 Tgrep Switches
- -a Match on all patterns in a sentence
- -w Return the whole sentence
- -n Put the entire string on one line
- -t Print only the terminals
14 Viewing it in sentential context
- tgrep -wn Halloween | more
- tgrep -wn research | more (20,865 hits)
- Can also use less
15 Viewing it in sentential context
16 Searching by POS
Another way to do your sanity check
17 See more data?
18 Sentential context (again)
19 Searching by syntactic constituent
20 Single-line outputs
21 Viewing tree-like output
22 Searching for relations between nodes
23 tgrep g (whole language)
- A < B: A immediately dominates B
- A > B: A is immediately dominated by B
- A << B: A dominates B
- A >> B: A is dominated by B
- A . B: A immediately precedes B
- A .. B: A precedes B
- A <<, B: B is the leftmost descendant of A
- A <<` B: B is the rightmost descendant of A
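To make the dominance relations listed above concrete, here is a small Python sketch over a toy parse tree. The nested-tuple tree format, the helper names, and the example sentence are all assumptions for illustration; tgrep itself works on a precompiled corpus, not on structures like this.

```python
# A parse tree as nested tuples: (label, child, child, ...); a leaf is a plain string.
tree = ("S",
        ("NP", ("NNP", "Kim")),
        ("VP", ("VBD", "kicked"),
               ("NP", ("DT", "the"), ("NN", "ball"))))

def label(node):
    return node if isinstance(node, str) else node[0]

def children(node):
    return [] if isinstance(node, str) else list(node[1:])

def subtrees(node):
    """Yield the node itself and every node below it."""
    yield node
    for child in children(node):
        yield from subtrees(child)

def imm_dominates(tree, a, b):
    """Some node labelled a has a child labelled b."""
    return any(label(n) == a and any(label(c) == b for c in children(n))
               for n in subtrees(tree))

def dominates(tree, a, b):
    """Some node labelled a has a descendant labelled b."""
    return any(label(n) == a and
               any(label(d) == b for c in children(n) for d in subtrees(c))
               for n in subtrees(tree))

def leftmost_descendant(tree, a, b):
    """b lies on the leftmost path below some node labelled a."""
    for n in subtrees(tree):
        if label(n) != a:
            continue
        cur = n
        while children(cur):
            cur = children(cur)[0]
            if label(cur) == b:
                return True
    return False

print(imm_dominates(tree, "VP", "NP"), dominates(tree, "S", "DT"),
      leftmost_descendant(tree, "VP", "kicked"))
```

Note that words count as nodes here, so a word such as "kicked" can be the leftmost descendant of a VP.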
24 Alternation
- node names can be ORed, e.g.
- tgrep 'Clinton|Gore' | head
25 Character classes
- Regular expressions
- tgrep '/[Cc]hild/' | egrep . | head
26 Working towards that weird example
27 Combining alternation and a regular expression
- tgrep 'Clinton|Gore|/[Pp]resident/' | head
28 Searching for a transitive verb
- tgrep -w 'VP << like < NP << DT' | more
29 Verbs and Particles
- tgrep -w 'VP << kick' > kick
- tgrep 'VP << /kick./ <2 PRT' kick
- tgrep 'VP <1 VB <2 PRT' kick
- tgrep -nw 'VP <1 /VB./ <2 PRT' kick
- tgrep 'VP <1 (VB < kick) <2 PRT' kick
- tgrep 'VP <1 (/VB./ < kick) <2 PRT' kick
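The verb-particle patterns above ask for a VP whose first child is a verb tag and whose second child is PRT. A minimal Python sketch of that check follows; the nested-tuple tree and the function name are hypothetical, and the regular expression is assumed to match anywhere in the label, as `re.search` does.

```python
import re

# Toy parse tree as nested tuples: (label, child, ...); leaves are plain strings.
tree = ("S",
        ("NP", ("PRP", "They")),
        ("VP", ("VBD", "kicked"),
               ("PRT", ("RP", "off")),
               ("NP", ("DT", "the"), ("NN", "game"))))

def verb_particle_vps(node):
    """Yield VP nodes whose 1st child label matches VB. and whose 2nd child is PRT."""
    if isinstance(node, str):
        return
    kids = node[1:]
    if (node[0] == "VP" and len(kids) >= 2
            and not isinstance(kids[0], str) and re.search(r"VB.", kids[0][0])
            and not isinstance(kids[1], str) and kids[1][0] == "PRT"):
        yield node
    for kid in kids:
        yield from verb_particle_vps(kid)

hits = list(verb_particle_vps(tree))
print(len(hits), hits[0][1])
```

Because the pattern `VB.` needs one more character after VB, it picks up tags such as VBD or VBZ but not a bare VB, which is why the slide lists both variants.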