Title: Linguistics 187/287 Week 2
1Linguistics 187/287 Week 2
Engineering and Linguistic Generalizations
2- Homework
- Due Friday
- Can discuss in class or via email or ask us for office hours
- Last assignment
- How much time?
- Trouble: access, procedure?
- Issues: XLE, LFG, grammar?
3Topics for this week
- Notation in LFG (more background)
- Templates
- Lexical rules
- Configurations
- Feature declaration
- Metarulemacro
4Grammar engineering for deep processing
- Draws on theoretical linguistics, software engineering
- Theoretical linguistics => papers
- Generalizations, universality, idealization (competence)
- Software engineering => programs
- Coverage, interface, QA, maintainability, efficiency, practicality
- Grammar engineering
- Grammar is to theory as program is to programming language
- Reflect linguistic generalizations
- Respect special cases of ordinary language
- Deal with large-scale interactions
- Theory/practice trade-offs
5Grammar Engineering and Linguistic Theory
- Description vs. representation
- Program vs. data
- Expressiveness of notation
- Regular predicates for c-structure
- Boolean combinations (esp. disjunction)
- Equality, set-membership
- Defaults and marking conventions
- Constraining vs. defining, existentials, defaults
- Abbreviation and factoring
- Templates, macros, lexical rules
- Configuration management
- Combining rules, templates, lexicons
- Priority of core/specializations/extensions
6Description vs. Representation
- Complexity trades (program vs. data)
- Simplify descriptions but complicate representations
- Complicate descriptions but simplify representations
- Example: Arguments and adjuncts
- Different behavior
- Arguments: selected by predicate, unique
- Adjuncts: modify predicate, multiple instances
- Similar behavior: both can be questioned
- Representation solution (HPSG)
- ARG, ADJ, and DEP covering both ARG and ADJ (new type)
- Description solution (LFG)
- ARG, ADJ, and the disjunction {ARG | ADJ}
7Description vs. Representation
- External constraints on representation
- Linguistic theory
- Applications
- Multilingual/cross-grammar similarity
8Expressiveness of notation
- Regular predicates for c-structure
- Simple context-free rules: NP --> N    NP --> Det N
- Compact notation: NP --> (Det) N    (optionality)
9Expressiveness of notation and Representation
- Equality: attribute values
- Set-membership: sets and elements
- Adjuncts: PP annotated so that ! is a member of (^ ADJUNCT)
- PP: ! $ (^ ADJUNCT)
- Coordination (more next week)
- NP --> NP: ! $ ^; CONJ NP: ! $ ^.
- Semantic forms
- (^ PRED)='kick<(^ SUBJ)(^ OBJ)>'
- Semantic relations, instantiation, subcategorization
10Defaults and Marking Conventions
- Constraining vs. defining
- Must be assigned nom: (^ SUBJ CASE) =c nom
- Is nom: (^ SUBJ CASE) = nom
- Existentials
- Must have case: (^ CASE)
- Defaults
- NTYPE: proper, pronoun, common
- { (^ NTYPE) (^ NTYPE) ~= common
- | (^ NTYPE) = common }
- (make choices disjoint)
11Abbreviations and Factoring
- Templates
- Capture generalizations of annotations
- Maintainability: changes, mistakes
- Compare HPSG type hierarchy
- Macros
- Capture generalizations of rules
- Lexical Rules
- Theoretical proposal to manipulate predicates
- Implemented to expand lexicons consistently
12Example The verb bakes
- Belongs to several classes
- Third-person, singular, present-tense verb
- Transitive or intransitive
- Shares
- Some properties with falls
- Other properties with cooked
13The lexicon à la Kiparsky
- A dumping ground for exceptions
- A kind of appendix to the grammar, whose
function is to list what is unpredictable and
irregular about the words of a language
14The lexicon à la Bresnan
- A repository of linguistic generalizations
- Active and passive forms are related by lexical rules, not syntactic transformations
- (^ SUBJ) --> (^ OBL-AG)
- (^ OBJ) --> (^ SUBJ)
- Rules relating lexical items are a prime locus of syntactic generalizations
15The lexicon à la Flickinger
- A hierarchical structure of classes
- Each class represents some piece of syntactic information
- bakes belongs to
- the third-person singular present-tense class
- (like appears)
- the transitive/intransitive class
- (like cooked)
- and others
- Classes may be subclasses of other classes
- Classes may partition other classes along several
dimensions
16LFG Relations between descriptions
LFG can encode linguistic generalizations as relations between descriptions of structures
- LFG functional description is a collection of equations
- These can be named
- This name can stand for those equations in linguistic descriptions
- Named descriptions are referred to as templates
- Interpretation: Simple substitution
- The template description is substituted for the template name that appears in (is invoked by) another description
173SG and PRESENT templates
- 3SG = (^ SUBJ PERSON) = 3 (^ SUBJ NUM) = SG.
- 3SG names (^ SUBJ PERSON)=3 (^ SUBJ NUM)=SG
- PRESENT = (^ TENSE) = PRES.
- @ marks invocation (in lexicon, rules, templates)
- Substitute (^ TENSE)=PRES for @PRESENT in other descriptions
18Templates enable hierarchical generalizations
- Template definitions can refer to other templates by name
- E.g. further divide 3SG into
- 3PERS = (^ SUBJ PERSON) = 3.
- SING = (^ SUBJ NUM) = SG.
- then 3SG = @3PERS @SING.
- Hierarchy of references represents inclusion hierarchy of named descriptions
- Frequently repeated subdescriptions
- specified in one place
- effective in many
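The substitution reading of templates can be sketched in a few lines of Python. This is only an illustration, not how XLE implements templates: descriptions are treated as plain strings, the table `TEMPLATES` and the function `expand` are hypothetical names, and the definitions are assumed to be acyclic.

```python
import re

# Hypothetical template table: each name stands for a description (a string here).
TEMPLATES = {
    "3PERS": "(^ SUBJ PERSON)=3",
    "SING": "(^ SUBJ NUM)=SG",
    "3SG": "@3PERS @SING",
    "PRESENT": "(^ TENSE)=PRES",
}

def expand(description, templates=TEMPLATES):
    """Replace every @NAME by its (recursively expanded) definition."""
    def substitute(match):
        return expand(templates[match.group(1)], templates)
    return re.sub(r"@([A-Za-z0-9-]+)", substitute, description)

print(expand("@3SG"))
print(expand("@PRESENT @3SG"))
```

Changing the definition of SING in the table changes every description that reaches it through 3SG, which is the "specified in one place, effective in many" point.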
19Hierarchy of template invocations
- Sharing in verb agreement
[Diagram: hierarchy of template invocations among SING, 3PERS, PRESENT, 3SG, PRES3SG]
- Boolean combinations of template references
- (just like ordinary descriptions)
- Sharing is distinct from mode of combination
20Functional description for bakes
- { (^ PRED)='bake<SUBJ,OBJ>' | (^ PRED)='bake<SUBJ>' }
- (^ TENSE)=PRES
- (^ SUBJ PERS)=3
- (^ SUBJ NUM)=SG
- With agreement template
- { (^ PRED)='bake<SUBJ,OBJ>' | (^ PRED)='bake<SUBJ>' }
- @PRES3SG
- Agreement template invoked by other verbs
21Templates with parameters Valency
Pargram convention: Parameters begin with _
- TRANS-OR-INTRANS(_p) = { (^ PRED) = '_p<SUBJ, OBJ>' | (^ PRED) = '_p<SUBJ>' }.
- PRED value as a parameter of the template
- @TRANS-OR-INTRANS(bake)
- => { (^ PRED) = 'bake<SUBJ, OBJ>' | (^ PRED) = 'bake<SUBJ>' }
- Arguments can substitute for any part of an f-description
- Attributes
- Values
- Semantic relation-names
- Descriptions
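Parameter passing can be pictured the same way, as textual substitution of an argument for a parameter name. The sketch below is a hedged illustration in Python; the table layout and the function `invoke` are hypothetical and say nothing about XLE internals.

```python
import re

# Hypothetical template table: name -> (parameter names, body).
TEMPLATES = {
    "TRANS-OR-INTRANS": (
        ["_p"],
        "{ (^ PRED)='_p<SUBJ, OBJ>' | (^ PRED)='_p<SUBJ>' }",
    ),
}

def invoke(name, args, templates=TEMPLATES):
    """Substitute each argument for its parameter in the template body."""
    params, body = templates[name]
    if len(params) != len(args):
        raise ValueError(f"{name} expects {len(params)} argument(s)")
    for param, arg in zip(params, args):
        # The lookahead keeps _p from matching inside a longer name such as _pred.
        pattern = re.escape(param) + r"(?![A-Za-z0-9_-])"
        body = re.sub(pattern, lambda match, arg=arg: arg, body)
    return body

print(invoke("TRANS-OR-INTRANS", ["bake"]))
```

Because the argument is substituted as text, it could equally be an attribute, a value, or a whole description.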
22Valency hierarchy
- TRANS-OR-INTRANS(p) = { @INTRANSITIVE(p) | @TRANSITIVE(p) }.
- INTRANSITIVE(p) = (^ PRED)='p<SUBJ>'.
- TRANSITIVE(p) = (^ PRED)='p<SUBJ, OBJ>'.
[Diagram: TRANS-OR-INTRANS invokes INTRANSITIVE and TRANSITIVE]
23Templates and generalizations bakes
- bakes: @TRANS-OR-INTRANS(bake) @PRES3SG
- TRANS-OR-INTRANS(p): shared by eat, cooked, ...
- PRES3SG: shared by appears, goes, cooks, ...
- PRESENT
- used by PRES3SG template
- shared by bake, laugh, etc.
24Lexical sharing
[Diagram: template hierarchy (3PERS, SING, INTRANSITIVE, TRANSITIVE, PRESENT, 3SG, TRANS-OR-INTRANS, PRES3SG) shared by the lexical entries bakes, cooked, falls]
25Type hierarchy vs. templates
- Templates can play the same role as hierarchical type systems in theories like HPSG
- A notational device for factoring descriptions
- Interpreted as simple substitution
- Not part of a formal ontology
- Do not require an elaborate mathematical
characterization
26Templates also invoked by Rules
- Rule annotations can also call templates
- Global changes, typo prevention
- Example: adjunct annotation
- PP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP
- ADVP: ! $ (^ ADJUNCT) (! ADJ-TYPE)=VP
- ADJ(_T) = ! $ (^ ADJUNCT) (! ADJ-TYPE)=_T.
- PP: @(ADJ VP)    PP: @(ADJ NP)
- ADVP: @(ADJ VP)    ADVP: @(ADJ S)
27Templates Rules
- Example: null pronouns
- Push it! They left (in order) to be on time.
- NULL-PRON(_P) = (_P PRED)='pro' (_P PRON-TYPE)=null.
- VPimp --> VP: @(NULL-PRON (^ SUBJ)).
- VPimp --> VP: (^ SUBJ PRED)='pro' (^ SUBJ PRON-TYPE)=null.
28Templates Extend notation
- DEFAULT(D V) = { D D~=V | D=V }.
- e.g. @(DEFAULT (^ NTYPE) common)
- IF(P1 P2) = { ~P1 | P2 }.
- IFF(P1 P2) = { P1 P2 | ~P1 ~P2 }.
29Templates and Principles
- Subject principle: every verb has a subject.
- Implementation
- VERB = (^ SUBJ).
- Put @VERB in every verbal entry.
- or
- Put @VERB in the templates called by the verbal entries.
30Lexical Rules
- Theoretical construct
- Templates can often achieve the same result
- Disjunction of several templates
- Parameterization of a complex template
31Lexical Rules Example
- Active
- They ate the cake.
- (^ PRED)='eat<(^ SUBJ)(^ OBJ)>'
- Passive
- The cake was eaten.
- (^ PRED)='eat<NULL (^ SUBJ)>'
- Could have VTRANS have two disjuncts
- Or manipulate PRED with lexical rule
32Lexical Rules Example
- Passive lexical rule
- _SCHEMA is a subcategorization frame
- PASSIVE(_SCHEMA) =
- { _SCHEMA (^ PASSIVE)=-
- | _SCHEMA
- (^ SUBJ) --> NULL
- (^ OBJ) --> (^ SUBJ)
- (^ PASSIVE)=c + }.
- Example calls
- TRANS(_P) = @(PASSIVE (^ PRED)='_P<(^ SUBJ)(^ OBJ)>').
- DITRANS(_P) = @(PASSIVE (^ PRED)='_P<(^ SUBJ)(^ OBJ)(^ OBJ2)>').
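The effect of the passive rule on a subcategorization frame can be sketched in Python. This is a simplified illustration under stated assumptions: a frame is modelled as a list of grammatical-function names, the function names are hypothetical, and the PASSIVE feature is simply recorded as a value even though the slide states the passive case as a constraint.

```python
def passive_frame(args):
    """Apply SUBJ --> NULL and OBJ --> SUBJ simultaneously to a list of functions."""
    renaming = {"SUBJ": "NULL", "OBJ": "SUBJ"}
    # Each original function is looked up once, so the new SUBJ is not renamed again.
    return [renaming.get(gf, gf) for gf in args]

def passive_alternation(pred, args):
    """Return the two disjuncts of the rule: the unchanged frame and the passive one."""
    active = {"PRED": (pred, list(args)), "PASSIVE": "-"}
    passive = {"PRED": (pred, passive_frame(args)), "PASSIVE": "+"}
    return [active, passive]

for option in passive_alternation("eat", ["SUBJ", "OBJ"]):
    print(option)
```

A ditransitive frame goes through the same function: only SUBJ and OBJ are affected, and OBJ2 is left alone.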
33Lexical Rules Summary
- Lexical rules manipulate arguments of predicates
- capture systematic alternations like active-passive
- Rename and remove roles
- No good implementation for adding roles
- causative
- complex predicates
- benefactives
34Configuration Management
- Combining rules, templates, lexicons
- System needs to know where everything is
- For large grammars, need modularization (multiple grammar rule files, multiple lexicons)
- Priority of core/specializations/extensions
- Want to specialize a grammar
- No questions in instruction manuals
- Loosen subj-V agreement
- Have lexicons of varying quality
35Combining Rules, Templates, Lexicons
- XLE configuration section
- Specify what files are called
- Specify which rule, template, and lexicon sections are used
- RULES (TOY ENGLISH).
- RULES (CORE ENGLISH) (SPECIAL ENGLISH).
- Other grammar information
36Configurations and Declarations
- Configurations
- File management
- Priority
- Declarations
- Governable relations and semantics
- Features
- Global Operators
- METARULEMACRO
37Files
- Priority ordered: rules/entries in later files override those in earlier ones
- Example
- FILES standard-english-rules.lfg
- eureka-english-rules.lfg
- standard-english-lexicon.lfg
- eureka-english-lexicon.lfg.
38Eureka vs. Standard rules
- STANDARD ENGLISH RULES (1.0)
- N --> { @NOUN-COMMON | @NOUN-PROPER }.
- NOUN-COMMON ->
- NOUN-PROPER ->
- EUREKA ENGLISH RULES (1.0)
- N --> { @NOUN-COMMON | @NOUN-PROPER | @NOUN-EUREKA | N PL }.
- NOUN-EUREKA --> EUR-PART EUR-NUM .
39Sections Used
- All lexicon, rule, and template sections have names and versions.
- These are called in priority order in the config.
- Use with the file order to create overrides.
- RULES (STANDARD RULES) (EUREKA RULES).
- LEXENTRIES (all all).
Versions allow for future XLE upgrades
40Multiple Lexicon Sections
- LEXENTRIES (AUTOMATIC ENGLISH) (CORRECTED ENGLISH).
- AUTOMATIC ENGLISH LEXICON (1.0)
- appear V XLE { @(V-TRANS appear) | @(V-INTRANS appear) }.
- CORRECTED ENGLISH LEXICON (1.0)
- appear V XLE { @(V-INTRANS appear) | @(V-SUBJ-XCOMP appear) }.
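The override behaviour of prioritized lexicon sections can be sketched as a dictionary merge in which later sections win. This is an illustrative Python model only: entries are reduced to lists of template names, the `bake` entry is a hypothetical addition, and `merge_lexicons` is not an XLE function.

```python
def merge_lexicons(*sections):
    """Combine sections in priority order; a later entry replaces an earlier one."""
    merged = {}
    for section in sections:
        merged.update(section)
    return merged

# Entries modelled as headword -> list of template names (hypothetical layout).
automatic = {"appear": ["V-TRANS", "V-INTRANS"], "bake": ["V-TRANS"]}
corrected = {"appear": ["V-INTRANS", "V-SUBJ-XCOMP"]}

lexicon = merge_lexicons(automatic, corrected)
print(lexicon["appear"])
print(lexicon["bake"])
```

The corrected entry for appear replaces the automatic one as a whole, while words that only the automatic lexicon lists stay available.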
41Other Configuration Information
- ROOTCAT: default top level category
- Standard: ROOT, Eureka: FIELD
- Nondistributives for coordination
- External attributes for applications
- Character encoding
- Reparse category and Optimality order for robustness
- See XLE documentation for complete list
42Declarations
- Must declare grammatical and semantic functions for each grammar.
- Used for completeness and coherence
- GOVERNABLERELATIONS
- Functions (features) that must be subcategorized for in the PRED
- SUBJ OBJ OBL-? ?COMP etc.
- SEMANTICFUNCTIONS
- Functions that must have a PRED
- ADJUNCT NMOD
43Feature Declaration
- List of all the features
- GFs and semantic functions need not be listed
- all other features must be listed
- List of their possible values
- atomic
- f-structure
- Multiple feature declarations
- multilingual setting
- grammar specialization
44Why a feature declaration?
- Good engineering practice
- Catch typos and old analyses
- Grammar easier to read
- NB: Theory doesn't have typos
45Declaration format
- STANDARD LANGUAGE FEATURES (1.0)
- feature1: -> $ {val1 val2 val3}.
- feature2: -> $ {val4 val5}.
- feature3: -> << [feature1 feature2].
- feature4.
- ----
46Sample feature declaration
- TOY ENGLISH FEATURES (1.0)
- NUM: -> $ {sg pl}.
- PERS: -> $ {1 2 3}.
- TNS-ASP: -> << [TENSE MOOD ASPECT].
- TENSE.
- MOOD: -> $ {indicative subjunctive}.
- ASPECT: -> << [PERF PROG].
- PERF: -> $ {+ -}.
- PROG: -> $ {+ -}.
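What a feature declaration buys can be sketched as a small checker in Python. This is an illustration rather than XLE's own check: the dictionary encoding is an assumption (a set for atomic values, a list for the features allowed inside an f-structure value, `None` for a feature declared without a value list), and `violations` is a hypothetical helper.

```python
# The toy declaration in a hypothetical Python encoding.
DECLARATION = {
    "NUM": {"sg", "pl"},
    "PERS": {"1", "2", "3"},
    "TNS-ASP": ["TENSE", "MOOD", "ASPECT"],
    "TENSE": None,
    "MOOD": {"indicative", "subjunctive"},
    "ASPECT": ["PERF", "PROG"],
    "PERF": {"+", "-"},
    "PROG": {"+", "-"},
}

def violations(fstructure, declaration=DECLARATION):
    """List the features and values that the declaration does not allow."""
    problems = []
    for feature, value in fstructure.items():
        if feature not in declaration:
            problems.append(f"undeclared feature: {feature}")
            continue
        allowed = declaration[feature]
        if isinstance(value, dict):
            if not isinstance(allowed, list):
                problems.append(f"{feature} does not take an f-structure value")
                continue
            for sub in value:
                if sub not in allowed:
                    problems.append(f"{sub} is not allowed inside {feature}")
            problems.extend(violations(value, declaration))
        elif isinstance(allowed, list):
            problems.append(f"{feature} needs an f-structure value")
        elif allowed is not None and value not in allowed:
            problems.append(f"bad value for {feature}: {value}")
    return problems

print(violations({"NUM": "sg", "TNS-ASP": {"TENSE": "pres", "MOOD": "indicative"}}))
print(violations({"NMU": "sg", "PERS": "4"}))
```

The second call shows the engineering payoff: a misspelled feature name and an out-of-range value are both reported instead of silently producing an odd f-structure.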
47XLE and the feature declaration
- XLE will not load a grammar with a violation of the feature declaration.
- To catch violations in the lexicon, the generator must be loaded.
- regenerate some-sentence-to-parse
- parse, then choose generate in f-str window
- create-generator grammar-name.lfg
- print-unused-feature-declarations
48Multiple feature declarations
- List in priority order in the configuration
- FEATURES (STANDARD COMMON) (STANDARD ENGLISH).
- New features are listed as usual
- Changes to features use edit operators
- + add a new value
- & intersect the values
- ! replace the feature entirely
49Multiple feature declarations
- STANDARD COMMON FEATURES (1.0)
- NUM: -> $ {sg pl dual}.
- CASE: -> $ {nom acc}.
- TENSE: -> << [PAST FUTURE].
- PAST: -> $ {+ -}.
- FUTURE: -> $ {+ -}.
- STANDARD ENGLISH FEATURES (1.0) (resulting declaration shown after =>)
- PERS: -> $ {1 2 3}.  =>  PERS: -> $ {1 2 3}.
- &NUM: -> $ {sg pl}.  =>  NUM: -> $ {sg pl}.
- +CASE: -> $ {gen}.  =>  CASE: -> $ {nom acc gen}.
- !TENSE: -> $ {pres past fut}.  =>  TENSE: -> $ {pres past fut}.
- !PAST: -> .
- !FUTURE: -> .
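How a later feature declaration edits an earlier one can be sketched with set operations in Python. The sketch is hedged in two ways: the operator symbols `+` (add) and `&` (intersect) are assumed here, since only `!` is legible in the transcript, and only atomic-valued features are modelled. The function `apply_edits` is hypothetical.

```python
def apply_edits(base, edits):
    """Apply (operator, feature, values) edits from a later declaration to a base one."""
    result = {feature: set(values) for feature, values in base.items()}
    for operator, feature, values in edits:
        values = set(values)
        if operator == "+":        # add new values to the existing ones
            result[feature] = result.get(feature, set()) | values
        elif operator == "&":      # keep only the values both declarations allow
            result[feature] = result.get(feature, set()) & values
        elif operator in ("!", ""):  # replace entirely, or declare a new feature
            result[feature] = values
        else:
            raise ValueError(f"unknown edit operator: {operator}")
    return result

common = {"NUM": {"sg", "pl", "dual"}, "CASE": {"nom", "acc"}}
english = [
    ("", "PERS", ["1", "2", "3"]),
    ("&", "NUM", ["sg", "pl"]),
    ("+", "CASE", ["gen"]),
    ("!", "TENSE", ["pres", "past", "fut"]),
]
merged = apply_edits(common, english)
for feature in sorted(merged):
    print(feature, sorted(merged[feature]))
```

The merged result corresponds to the right-hand column of the slide: NUM loses dual, CASE gains gen, and TENSE is replaced outright.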
50Using Multiple Feature Decl.
- Multilingual contexts
- Language universal features
- Customize to particular language
- Grammar specialization
- Add new features for odd constructions
- Remove unused choices
51Global Operations METARULEMACRO
- System defined function
- Operates on every category
- Global statements
- Linguistic: subject condition
- SUBJ < OBJ
- coordination
- Engineering: quotes
- bracketing
52METARULEMACRO
- Right-hand side of each grammar rule is the result of applying the macro to the rule
- METARULEMACRO(_CAT _BASECAT _RHS) = _RHS.
53Punctuation and METARULEMACRO
- Surround any constituent with quotes
- METARULEMACRO(_CAT _BASECAT _RHS) =
- { _RHS
- | L-QT _CAT R-QT
- | L-DQT _CAT R-DQT }.
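The idea that one macro rewrites the right-hand side of every rule can be sketched in Python. This is a toy model, not XLE: a right-hand side is a list of alternatives, each alternative a list of category names, the sample rules are hypothetical, and the `_BASECAT` argument is left out because the quote macro does not use it.

```python
def metarulemacro(cat, rhs):
    """Return the original alternatives plus the single- and double-quoted versions."""
    return rhs + [["L-QT", cat, "R-QT"], ["L-DQT", cat, "R-DQT"]]

def apply_to_grammar(rules):
    """Rewrite every rule's right-hand side through the macro."""
    return {cat: metarulemacro(cat, rhs) for cat, rhs in rules.items()}

# Hypothetical toy grammar.
rules = {"NP": [["Det", "N"], ["N"]], "PP": [["P", "NP"]]}
expanded = apply_to_grammar(rules)
for cat, alternatives in expanded.items():
    print(cat, alternatives)
```

No individual rule mentions quotes, yet every category can now appear between quote marks, so the statement cannot be forgotten for one category or attached to the wrong one.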
54Punctuation cont.
- Mary and John left them there.
- We saw them in the garden.
- They appeared and then disappeared.'
55Punctuation Problem
- Vacuous branching results in many analyses
[Tree diagram: non-branching chain from NP down through Nzero and N to the word bagels]
56Solution PUSHUP
- If non-branching, push up to highest node.
- METARULEMACRO(_CAT _BASECAT _RHS) =
- { _RHS
- | L-QT _CAT: @PUSHUP; R-QT }.
- How to define PUSHUP?
- Need to test existence of sister nodes MOTHER
SISTER
PUSHUP ( MOTHER LEFT_SISTER)
( MOTHER RIGHT_SISTER)
( MOTHER LEFT_SISTER)
( MOTHER MOTHER) .
57Summary
- Lexical rules allow for generalizations over predicate alternations
- Configurations and declarations allow management of large-scale grammars
- readability and consistency
- maintenance
- specialization
- Global operators allow for cross-grammar generalizations
- coordination
59The HPSG lexicon a type hierarchy
- More specific types inherit information from less specific
- Types and subtypes
- A mathematical relation between structures: AND/OR lattice
- Different subtypes represent alternatives/disjunction
- Multiple supertypes represent conjunction
[Diagram (Malouf): AND/OR type lattice with nodes head, noun, relational, c-noun, gerund, verb]
- LFG does not use typed feature structures for
lexical generalizations
but type inheritance is not the only (best?)
way to express generalizations
60Coordination without METARULEMACRO
- Want to coordinate any constituent
- Coordination macro
- SCCOORD(_CAT) =
- _CAT: ! $ ^
- COMMA
- _CAT: ! $ ^
- CONJ
- _CAT: ! $ ^.
- Put call in each rule
- NP --> { (DET) AP N PP
- | @(SCCOORD NP) }.
- Engineering problem
- forget to call
- put in wrong category
61Coordination with METARULEMACRO
- Call SCCOORD as part of MRM
- METARULEMACRO(_CAT _BASECAT _RHS) =
- { _RHS
- | @(SCCOORD _CAT) }.
- NP rule now
- NP --> (DET) AP N PP.
- Effectively
- NP --> { (DET) AP N PP
- | @(SCCOORD NP) }.