Title: An Introduction to Using Semgrex
1An Introduction to Using Semgrex
Chloé Kiddon
2What is Semgrex?
- A Java utility (in JavaNLP) for identifying patterns in Stanford JavaNLP SemanticGraph structures
- Much like Tregex, which does this for tree structures (Levy & Andrew 2006) and is based on tgrep-2 style syntax and functionality. (These slides are adapted from the structure of theirs.)
- Applied the same way you use regular expressions to find patterns in strings
- Ex. {tag:/VB.*/} >dobj ({} >amod {lemma:red})
3Semgrex Overview
- SemgrexPatterns are composed of nodes, representing IndexedFeatureLabels, and relations between them, representing edges in a SemanticGraph
- SemgrexMatchers can be used on a single SemanticGraph OR on two SemanticGraphs and an Alignment between them
  - Ex. an RTE problem has the hypothesis graph, the text graph, and the alignment from the hypothesis graph's IndexedFeatureLabels to the text graph's IndexedFeatureLabels
- SemgrexPatterns return matches for IndexedFeatureLabels in a SemanticGraph
4Syntax - Nodes
- Nodes are represented as {attr1:value1;attr2:value2}
- Attributes are regular strings; values can be strings or regular expressions marked by /s
- {lemma:run;pos:/VB.*/} => any verb form of the word run
- {} is any node in the graph
- {$} is any root in the graph
- {#} is the empty word (IndexedFeatureLabel.NO_WORD)
  - Comes up when working with alignments
- Descriptions can be negated with !
- !{lemma:boy} => any word that isn't boy
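
To make the node description rules concrete, here is a minimal Java sketch of how a description such as {lemma:run;pos:/VB.*/} could be checked against one node's attributes. It is not Semgrex code: the class NodeSketch and its matches method are hypothetical, nodes are modeled as plain attribute maps, and it assumes that a /regex/ value must match the whole attribute value.

```java
import java.util.Map;
import java.util.regex.Pattern;

final class NodeSketch {
    // Returns true if every attribute in the description matches the node.
    // A value written as /.../ is treated as a regular expression that must
    // match the whole attribute value (an assumption of this sketch).
    static boolean matches(Map<String, String> description, Map<String, String> node) {
        for (Map.Entry<String, String> entry : description.entrySet()) {
            String actual = node.get(entry.getKey());
            if (actual == null) {
                return false; // the node lacks this attribute
            }
            String expected = entry.getValue();
            boolean isRegex = expected.length() >= 2
                    && expected.startsWith("/") && expected.endsWith("/");
            boolean ok = isRegex
                    ? Pattern.matches(expected.substring(1, expected.length() - 1), actual)
                    : expected.equals(actual);
            if (!ok) {
                return false;
            }
        }
        return true; // an empty description, like {}, matches any node
    }

    public static void main(String[] args) {
        Map<String, String> runVerb = Map.of("lemma", "run", "pos", "/VB.*/");
        System.out.println(matches(runVerb, Map.of("lemma", "run", "pos", "VBD"))); // true
        System.out.println(matches(runVerb, Map.of("lemma", "run", "pos", "NN")));  // false
    }
}
```

Negation with ! would simply invert the result of matches for that description.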
5Grouping Nodes
- Perhaps you want a node that is either a word with an ner TIME tag or the lemma when. The node {ner:TIME;lemma:when} does not accomplish this OR operation
- Can use brackets [ ] and | (or &) to specify these groupings
- [{lemma:locate} | {ner:LOCATION}]
  - A node that is either a word with a lemma locate or a word with LOCATION ner
- Can also be negated by putting a ! in front
- By default, & takes precedence over |, but & has no reason to be used (attributes inside one node description are already conjoined)
6Syntax - Relations
- Relationships between nodes can be specified
- Relations in Semgrex have two parts: the relation symbol and the relation type, e.g. <nsubj
- A <reln B: A is the dependent of a reln relation with B
- A >reln B: A is the governor of a reln relation with B
- A <<reln B: There is some node in a dep->gov chain from A that is the dependent of a reln relation with B
- A >>reln B: There is some node in a gov->dep chain from A that is the governor of a reln relation with B
- A @ B: A is aligned to B through an Alignment object
- Relation types can be regular strings or regular expressions encased by /s
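
The relation symbols can be read as small graph queries. The sketch below models a dependency graph as a list of labeled governor-to-dependent edges and spells out >, < and >> as plain Java. Everything in it is hypothetical (the class RelationSketch, nodes represented by their words), and it assumes the gov->dep chain for >> may have length zero, so that A itself counts as a candidate governor.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RelationSketch {
    // Each edge is stored as {governor, relation, dependent}.
    private final List<String[]> edges = new ArrayList<>();

    void addEdge(String governor, String reln, String dependent) {
        edges.add(new String[] {governor, reln, dependent});
    }

    // A >reln B: A is the governor of a reln edge to B.
    boolean governs(String a, String reln, String b) {
        for (String[] e : edges) {
            if (e[0].equals(a) && e[1].equals(reln) && e[2].equals(b)) {
                return true;
            }
        }
        return false;
    }

    // A <reln B: A is the dependent of a reln edge from B.
    boolean dependsOn(String a, String reln, String b) {
        return governs(b, reln, a);
    }

    // A >>reln B: some node on a gov->dep chain from A governs B via reln.
    // Assumption: the chain may have length zero, so A itself is checked too.
    boolean governsViaChain(String a, String reln, String b) {
        Deque<String> toVisit = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        toVisit.push(a);
        while (!toVisit.isEmpty()) {
            String node = toVisit.pop();
            if (!seen.add(node)) {
                continue; // already visited; guards against cycles
            }
            if (governs(node, reln, b)) {
                return true;
            }
            for (String[] e : edges) {
                if (e[0].equals(node)) {
                    toVisit.push(e[2]); // follow the edge down to the dependent
                }
            }
        }
        return false;
    }
}
```

With edges ate -nsubj-> dog, ate -dobj-> apple and apple -amod-> red, governs("ate", "amod", "red") is false while governsViaChain("ate", "amod", "red") is true, which is the difference between > and >>.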
7Building complex expressions
- Relations can be strung together for "and"
- All relations are relative to the first node in the string
- {} >nsubj {} >dobj {}
  - A node that is the governor of both an nsubj relation and a dobj relation
- & symbol is optional: {} >nsubj {} & >dobj {}
- Nodes can be grouped w/ parentheses
- {pos:NN} @ ({} <nsubj {})
  - A noun that is aligned to a node that is the dependent of an nsubj relation
- Not the same as {pos:NN} @ {} <nsubj {}, where <nsubj applies to the noun itself
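
The rule that every relation in a string is relative to the first node can be sketched on a toy edge list. The hypothetical helper below finds the nodes that a pattern like {} >nsubj {} >dobj {} would select: both relations are tested against the same candidate governor, not chained from one dependent to the next. The class name and the {governor, relation, dependent} edge encoding are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ConjunctionSketch {
    // True if the node governs at least one edge with the given relation.
    static boolean hasOutgoing(List<String[]> edges, String node, String reln) {
        for (String[] e : edges) {
            if (e[0].equals(node) && e[1].equals(reln)) {
                return true;
            }
        }
        return false;
    }

    // Nodes that govern both a reln1 edge and a reln2 edge.
    // Each edge is {governor, relation, dependent}.
    static List<String> governorsOfBoth(List<String[]> edges, String reln1, String reln2) {
        Set<String> result = new LinkedHashSet<>();
        for (String[] e : edges) {
            String candidate = e[0];
            // Both tests are anchored at the same first node.
            if (hasOutgoing(edges, candidate, reln1) && hasOutgoing(edges, candidate, reln2)) {
                result.add(candidate);
            }
        }
        return new ArrayList<>(result);
    }
}
```

For the edges ate -nsubj-> dog, ate -dobj-> apple and barked -nsubj-> cat, only "ate" is returned, since "barked" has a subject but no direct object.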
8Other Operators on Relations
- Operators can be combined via "or" with |
- Ex: {} <agent {} | <nsubj {}
  - A node that is either an agent or an nsubj in the graph
- Like with nodes, & takes precedence over |
- Ex: {} <agent {} | <nsubj {} & >amod {lemma:red}
  - An agent node OR a subject modified by the word red
- Equivalent operators are left-associative
- Any relation can be negated with a ! prefix
- Ex: {tag:/VB.*/} !@ {tag:/VB.*/}
  - A verb that is not aligned to another verb
9Other Operators on Relations
- When a pattern is matched on a pair of graphs and their alignment, the default search point is the graph that the alignments are from
- To override this, place a @ at the beginning of the pattern
- Ex: for a hypGraph, txtGraph and alignment hyp->txt
- {ner:LOCATION} @ {}
  - Represents all LOCATION nodes in the hypGraph aligned to nodes in the txtGraph
- @ {ner:LOCATION} @ {}
  - Represents all LOCATION nodes in the txtGraph that are aligned to nodes in the hypGraph
10Grouping relations
- To specify operation order, use [ and ]
- Ex: {tag:nn} [<prep_in {} | <prep_on {}] @ {#}
  - A noun that is the dependent of either a prep_in or prep_on relation and is aligned to NO_WORD
- Grouped relations can be negated
  - Just put ! before the [
11Named Relations
- Suppose we want to find two nodes connected by any relation which have a pair of nodes aligned to them with the same relation
- Name relations with =
- The first occurrence of a named relation in a pattern is the one that is stored as the relation
- ({} >/.*subj|agent/=reln {}) @ ({} >=reln {})
- We can retrieve the string form of the relation found in the graph later by using that name
12Named Nodes
- We can name nodes as well as relations
- Name nodes with =, and if the node matches, we can retrieve the node by that name
- Ex: {} <nsubj {}=verb
  - The verb whose subject is found by this pattern is stored under the name verb
- The first occurrence of a named node in the pattern is the one stored under that name. All others must be equal to that first one
- Ex. ({} >nsubj {}=subject) @ ({} >nsubj ({} @ {}=subject))
  - Finds a node that is both the governor of an nsubj relation to a node called subject and aligned to a node that is the governor of an nsubj relation to a node aligned to the node labeled as subject
13Optional Relations to Nodes
- Sometimes we want to try to match a sub-expression to retrieve named nodes if they exist, but still match if the sub-expression fails
- Use the optional relation prefix ?
- Ex: {} >/nsubj|agent/ {}=subject ?>/.*obj/ {}=object
  - Matches nodes that are governors of nsubj or agent relations
  - If the node is also the governor of some sort of object relation, then we can retrieve the object using the key object
  - If there is no object, the expression will still match
- Cannot be combined with negation
- Can be used in front of bracketed relations: ?[...]
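
The effect of the ? prefix on named nodes can be sketched without Semgrex itself. The hypothetical method below mimics {} >/nsubj|agent/ {}=subject ?>/.*obj/ {}=object for one candidate governor on a toy edge list: the subject relation is required, the object relation only adds a binding when it is present. The class name, the edge encoding, and the returned map of name-to-word bindings are illustrative assumptions, not the matcher's real data structures.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class OptionalSketch {
    // Each edge is {governor, relation, dependent}.
    // Returns the named bindings if the governor matches, or null if it does not.
    static Map<String, String> match(List<String[]> edges, String governor) {
        Map<String, String> named = new LinkedHashMap<>();
        for (String[] e : edges) {
            if (!e[0].equals(governor)) {
                continue;
            }
            if (e[1].matches("nsubj|agent")) {
                named.putIfAbsent("subject", e[2]); // required relation
            }
            if (e[1].matches(".*obj")) {
                named.putIfAbsent("object", e[2]);  // optional relation
            }
        }
        // Only the required part decides whether the pattern matches at all.
        return named.containsKey("subject") ? named : null;
    }
}
```

A governor with a subject but no object still matches and simply has no "object" entry, which mirrors retrieving a named node that was never bound.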
14Use of Semgrex classes
- Semgrex usage is like java.util.regex
- Two ways of calling the matcher: for a single SemanticGraph, or for two SemanticGraphs and an Alignment between them

    String s = "({} >nsubj {}=subject) @ ({} >nsubj ({} @ {}=subject))";
    SemgrexPattern p = SemgrexPattern.compile(s);
    // Either match against a single graph ...
    SemgrexMatcher m = p.matcher(graph);
    // ... or against two graphs and the alignment between them
    SemgrexMatcher m = p.matcher(hypGraph, alignment, txtGraph);
    while (m.find())
        System.out.println(m.getMatch().word());
15Use of Semgrex classes
- Named nodes are retrieved w/ getNode()
- Named relations are retrieved w/ getRelnString()

    IndexedFeatureLabel subj = m.getNode("subject");
    String subjReln = m.getRelnString("subjReln");
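
Since the slides compare Semgrex usage to java.util.regex, the same compile / matcher / find() rhythm is shown below with the standard regex classes only, with a named group playing the role that a named node plays in Semgrex. This is an analogy, not Semgrex code; the class RegexAnalogy and the toy pattern are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexAnalogy {
    // Collects the "subject" group of every match, the way a Semgrex loop
    // would collect a named node from every match.
    static List<String> subjects(String text) {
        Pattern p = Pattern.compile("(?<subject>\\w+) (?:ate|barked)"); // compile once
        Matcher m = p.matcher(text);                                    // bind to the input
        List<String> found = new ArrayList<>();
        while (m.find()) {                                              // step through matches
            found.add(m.group("subject"));                              // retrieve by name
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(subjects("dog ate apple; cat barked")); // [dog, cat]
    }
}
```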
16A Real Code Example - Before
    private void checkCopula(Problem problem, SemanticGraph hypGraph, SemanticGraph txtGraph) {
      IndexedFeatureLabel root = hypGraph.getFirstRoot();

      IndexedFeatureLabel subj = hypGraph.getChildWithReln(root, "nsubj");
      if (subj != null) {
        IndexedFeatureLabel alignedRoot = problem.getTxtWord(root);
        if (alignedRoot != IndexedFeatureLabel.NO_WORD) {
          IndexedFeatureLabel appos = txtGraph.getChildWithReln(alignedRoot, "appos");
          List<IndexedFeatureLabel> appositionList;
          try {
            appositionList = txtGraph.getChildrenWithReln(problem.getTxtWord(subj), "nn");
          } catch (IllegalArgumentException e) {
            appositionList = new ArrayList<IndexedFeatureLabel>();
          }
          if (appos != null) {
            if (problem.getTxtWord(subj).equals(appos)) {
              problem.addFeature(this, Feature.APPOSITION_MATCH,
                  "apposition in text between " + root.word() + " and " + subj.word());
            }
            else
17A Real Code Example - After
    private void checkCopula(Problem problem, SemanticGraph hypGraph, SemanticGraph txtGraph) {
      IndexedFeatureLabel root = hypGraph.getFirstRoot();
      if (checkAttributiveStructure(hypGraph) && !checkAttributiveStructure(txtGraph)) {
        if (VERBOSE) System.err.println("in check copula");
        SemgrexPattern copulaPat = SemgrexPattern.compile(
            "({}=subj <nsubj ({}=root @ {}=alignedRoot)) @ ({} >nn {}=alignedRoot | <appos {}=alignedRoot)");
        SemgrexMatcher copulaMatcher = copulaPat.matcher(hypGraph, problem.getAlignment(), txtGraph);
        if (copulaMatcher.find())
          problem.addFeature(this, Feature.APPOSITION_MATCH,
              "apposition in text between " + copulaMatcher.getNode("root").word()
              + " and " + copulaMatcher.getNode("subj").word());
        else
          problem.addFeature(this, Feature.APPOSITION_MISMATCH,
              "no apposition in text between " + copulaMatcher.getNode("root").word()
              + " and " + copulaMatcher.getNode("subj").word());
      }
    }
18For More Help
- There is a JUnit test in the Semgrex package called SemgrexPatternTest that can be used to test patterns for validity and view what their parses are
- If you find a bug (i.e. a pattern that should work but doesn't) or need more help, email chloe@cs.stanford.edu
19Thanks!