Title: CS626-449: NLP, Speech and Web-Topics-in-AI
1CS626-449 NLP, Speech and Web-Topics-in-AI
- Pushpak Bhattacharyya, CSE Dept., IIT Bombay
- Lecture 36 UNL
2Internet for the Masses
English interface
Spanish interface
Internet
Spanish viewer
English viewer
Hindi interface
Hindi viewer
3Vauquois Triangle
Interlingua based (do deep semantic
processing before entering the target language)
Generation
Analysis
Transfer based (analyze the source sentence to an intermediate
structure, then transfer it to the target language)
Vauquois: an eminent French machine translation
researcher, originally a physicist
Direct (enter the target language
immediately through a dictionary)
4UNL representation
Representation of Knowledge
Ram is reading the newspaper
5Knowledge Representation
UNL Graph - relations
read
agt
obj
Ram
newspaper
6Knowledge Representation
UNL Graph - UWs
read(icl>interpret)
obj
agt
newspaper(icl>print_media)
Ram(iof>person)
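A minimal sketch of how the graph above might be held in memory, assuming a plain list of labelled edges. The `Relation` class and the `dependents` helper are illustrative names only, not part of any UNL toolkit; the restrictions are written in the usual `icl>` / `iof>` notation.

```python
from typing import NamedTuple

class Relation(NamedTuple):
    """One labelled, directed edge of a UNL graph (hypothetical structure)."""
    label: str   # relation name, e.g. "agt"
    head: str    # UW the edge starts from
    dep: str     # UW the edge points to

# "Ram is reading the newspaper" as two binary relations on the verb.
graph = [
    Relation("agt", "read(icl>interpret)", "Ram(iof>person)"),
    Relation("obj", "read(icl>interpret)", "newspaper(icl>print_media)"),
]

def dependents(graph, head):
    """Map each relation label leaving `head` to its dependent UW."""
    return {r.label: r.dep for r in graph if r.head == head}

roles = dependents(graph, "read(icl>interpret)")
```

Looking up `roles["agt"]` then gives the agent of the reading event.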
7The noun read has 1 sense (no senses from tagged texts)
1. read -- (something that is read; "the article was a very good read")
The verb read has 11 senses (first 8 from tagged texts)
1. (117) read -- (interpret something that is written or printed; "read the advertisement"; "Have you read Salman Rushdie?")
2. (17) read, say -- (have or contain a certain wording or form; "The passage reads as follows"; "What does the law say?")
3. (15) read -- (look at, interpret, and say out loud something that is written or printed; "The King will read the proclamation at noon")
4. (5) read, scan -- (obtain data from magnetic tapes; "This dictionary can be read by the computer")
8
5. (5) read -- (interpret the significance of, as of palms, tea leaves, intestines, the sky; also of human behavior; "She read the sky and predicted rain"; "I can't read his strange behavior"; "The fortune teller read his fate in the crystal ball")
6. (5) take, read -- (interpret something in a certain way; convey a particular meaning or impression; "I read this address as a satire"; "How should I take this message?"; "You can't take credit for this!")
7. (4) read, register, show, record -- (indicate a certain reading; of gauges and instruments; "The thermometer showed thirteen degrees below zero"; "The gauge read 'empty'")
8. (4) learn, study, read, take -- (be a student of a certain subject; "She is reading for the bar exam")
9. read -- (audition for a stage role by reading parts of a role; "He is auditioning for 'Julius Caesar' at Stratford this year")
10. read -- (to hear and understand; "I read you loud and clear!")
11. understand, read, interpret, translate -- (make sense of a language; "She understands French"; "Can you read Greek?")
9The boy who works here went to school
Another Example
10UNL System
11The UNL System An Overview
12Dependency Parsers
- The UNL generator is a kind of dependency parser; other such parsers are
- Link Parser
- Mini Parser
- Stanford Dependency Parser
- MaltParser
- MaxEnt Parser
13Universal Networking Language
- Universal Words (UWs)
- Relations
- Attributes
- Knowledge Base
14UNL Graph
He forwarded the mail to the minister.
15UNL Expression
- agt(forward(icl>send).@entry.@past, he(icl>person))
- obj(forward(icl>send).@entry.@past, mail(icl>collection).@def)
- gol(forward(icl>send).@entry.@past, minister(icl>person))
16Universal Word (UW)
- What is a Universal Word (UW)?
- What are the features of a UW?
- How to create UWs?
17What is a Universal Word (UW)?
- Words of UNL
- Constitute the UNL vocabulary, the syntactic-semantic units that form UNL expressions
- A UW represents a concept
- Basic UW (an English word/compound word/phrase with no restrictions or constraint list)
- Restricted UW (with a constraint list)
- Examples
- crane(icl>device)
- crane(icl>bird)
18The Features of a UW
- Every concept existing in any language must correspond to a UW
- The constraint list should be as small as necessary to disambiguate the headword
- Every UW should be defined in the UNL Knowledge Base
19Restricted UWs
- Examples
- He will hold office until the spring of next year.
- The spring was broken.
- Restricted UWs, which are headwords with a constraint list, for example
- spring(icl>season)
- spring(icl>device)
- spring(icl>jump)
- spring(icl>fountain)
20How to create UWs?
- Pick a concept
- the concept of "crane"
- as "a device for lifting heavy loads"
- or
- as "a long-legged bird that wades in water in search of food"
- Choose an English word for the concept.
- In the case of "crane", since it is a word of English, the corresponding word should be 'crane'
- Choose a constraint list for the word.
- 'crane(icl>device)'
- 'crane(icl>bird)'
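The three steps above can be sketched as a small helper that joins a headword with its constraint list. This is only an illustration: `make_uw` is a hypothetical name, and the output follows the `headword(rel>target,...)` notation used for restricted UWs.

```python
def make_uw(headword, constraints=()):
    """Build a UW string from a headword and an optional constraint list.

    `constraints` is a sequence of (relation, target) pairs, e.g. [("icl", "bird")].
    """
    if not constraints:
        return headword            # basic UW: the headword alone
    body = ",".join(f"{rel}>{target}" for rel, target in constraints)
    return f"{headword}({body})"   # restricted UW

crane_device = make_uw("crane", [("icl", "device")])
crane_bird = make_uw("crane", [("icl", "bird")])
```

Keeping the constraint list as data makes it easy to check that it is as short as disambiguation allows.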
21UNL Relations
- Constitute the syntax of UNL
- Express how concepts (UWs) constitute a sentence
- Represented as strings of 3 characters or less
- A set of 41 relations specified in UNL (e.g., agt, aoj, ben, gol, obj, plc, src, tim)
- Refer to a semantic role between two lexical items in a sentence
- E.g., John has composed this poem.
22AGT / AOJ / OBJ
- AGT (Agent). Definition: Agt defines a thing which initiates an action
- AOJ (Thing with attribute). Definition: Aoj defines a thing which is in a state or has an attribute
- OBJ (Affected thing). Definition: Obj defines a thing in focus which is directly affected by an event or state
23Examples
- John broke the window.
- agt ( break.@entry.@past, John )
- This flower is beautiful.
- aoj ( beautiful.@entry, flower )
- He blamed John for the accident.
- obj ( blame.@entry.@past, John )
24BEN
- BEN (Beneficiary). Definition: Ben defines a not directly related beneficiary or victim of an event or state
- Can I do anything for you?
- ben ( do.@entry.@interrogation.@politeness, you )
- obj ( do.@entry.@interrogation.@politeness, anything )
- agt ( do.@entry.@interrogation.@politeness, I )
25BEN UNL Graph
He carved a toy for the baby.
carve(icl>cut)
@entry @past
agt
ben
obj
he(iof>person)
baby(icl>child)
@def
toy(icl>plaything)
26GOL / SRC
- GOL (Goal: final state). Definition: Gol defines the final state of an object or the thing finally associated with an object of an event
- SRC (Source: initial state). Definition: Src defines the initial state of an object or the thing initially associated with an object of an event
27GOL
- I deposited my money in my bank account.
28SRC
- They make a small income from fishing.
make(icl>do)
@entry @present
src
obj
agt
income(icl>gain)
fishing(icl>business)
they(icl>persons)
mod
small(aoj>thing)
29PUR
- PUR (Purpose or objective). Definition: Pur defines the purpose or objective of the agent of an event, or the purpose for which a thing exists
- This budget is for food.
- pur ( food.@entry, budget )
- mod ( budget, this )
30RSN
- RSN (Reason). Definition: Rsn defines a reason why an event or a state happens
- They selected him for his honesty.
- agt ( select(icl>choose).@entry, they )
- obj ( select(icl>choose).@entry, he )
- rsn ( select(icl>choose).@entry, honesty )
31TIM
- TIM (Time). Definition: Tim defines the time an event occurs or a state is true
- I wake up at noon.
- agt ( wake up.@entry, I )
- tim ( wake up.@entry, noon(icl>time) )
32TMF
- TMF (Initial time). Definition: Tmf defines the time an event starts
- The meeting started from morning.
- obj ( start.@entry.@past, meeting.@def )
- tmf ( start.@entry.@past, morning(icl>time) )
33TMT
- TMT (Final time). Definition: Tmt defines the time an event ends
- The meeting continued till evening.
- obj ( continue.@entry.@past, meeting.@def )
- tmt ( continue.@entry.@past, evening(icl>time) )
34PLC
- PLC (Place). Definition: Plc defines the place an event occurs or a state is true or a thing exists
- He is very famous in India.
- aoj ( famous.@entry, he )
- man ( famous.@entry, very )
- plc ( famous.@entry, India )
35PLF
- PLF (Initial place). Definition: Plf defines the place an event begins or a state becomes true
- Participants come from the whole world.
- agt ( come.@entry, participant.@pl )
- plf ( come.@entry, world )
- mod ( world, whole )
36PLT
- PLT (Final place). Definition: Plt defines the place an event ends or a state becomes false
- We will go to Delhi.
- agt ( go.@entry.@future, we )
- plt ( go.@entry.@future, Delhi )
37INS
- INS (Instrument). Definition: Ins defines the instrument to carry out an event
- I solved it with a computer.
- agt ( solve.@entry.@past, I )
- ins ( solve.@entry.@past, computer )
- obj ( solve.@entry.@past, it )
38INS UNL Graph
John covered the baby with a blanket.
39Attributes
- Constitute syntax of UNL
- Play the role of bridging the conceptual world and the real world in UNL expressions
- Show how and when the speaker views what is said, and with what intention, feeling, and so on
- Seven types
- Time with respect to the speaker
- Aspects
- Speaker's view of reference
- Speaker's emphasis, focus, topic, etc.
- Convention
- Speaker's attitudes
- Speaker's feelings and viewpoints
40Tense @past
He went there yesterday
- The past tense is normally expressed by @past
- {unl}
- agt ( go.@entry.@past, he )
-
- {/unl}
41Aspects @progress
It's raining hard.
- {unl}
- man ( rain.@entry.@present.@progress, hard )
- {/unl}
42Speaker's view of reference
- @def (Specific concept (already referred))
- The house on the corner is for sale.
- @indef (Non-specific class)
- There is a book on the desk
- @not is always attached to the UW which is negated.
- He didn't come.
- agt ( come.@entry.@past.@not, he )
43Speaker's emphasis
- @emphasis
- John his name is.
- mod ( name, he )
- aoj ( John.@emphasis.@entry, name )
- @entry denotes the entry point or main UW of a UNL expression
44Tense Aspect
Single
Continuous / Progressive
Present
Perfect / Complete
Tense
Aspect
45Three Types of Verbs
- Transitive
- Intransitive
- Stative
46@entry attribute
- @entry: the attribute indicating the main predicate.
- Within a sub-graph the entry node is the type of semantic relation.
47UNL Knowledge Base (UNLKB)
- What is the UNL Knowledge Base?
- Linguistic Background
- How to define the UWs in the UNL Knowledge-Base?
48What is the UNL Knowledge Base?
- A semantic network comprising every directed binary relation between UWs
- Categorized according to the role of a concept relative to other concepts
49Linguistics Background
- Ungrammaticality
- The boy relied on the girl.
- The boy relied the girl.
- The boy relied.
- Grammatically Sound but Semantically Odd
- The boy frightens sincerity.
- Sincerity kicked the boy.
50Ungrammaticality
- Given sentences
- The boy relied on the girl.
- The boy relied the girl.
- The boy relied.
- PS Rules
- VP → V (NP) (PP)
- NP → Det N
- V → rely
- Det → the
- N → boy, girl
51Subcategorization Frames
- Specify the categorial class of the lexical item.
- Specify the environment.
- Examples
- kick V _ NP
- cry V _
- rely V _PP
- put V _ NP PP
- think V _ S
52Subcategorization Rules
Subcategorization Rule
V → y / { _NP, _, _PP, _NP PP, _S }
53Subcategorization Rules
The boy relied on the girl.
- 1. S → NP VP
- 2. VP → V (NP) (PP) (S)
- 3. NP → Det N
- 4. V → rely / _PP
- 5. P → on / _NP
- 6. Det → the
- 7. N → boy, girl
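A minimal sketch of how subcategorization frames rule out "The boy relied the girl" and "The boy relied" while accepting "The boy relied on the girl". The `FRAMES` table copies the example frames (kick, cry, rely, put, think); the function name and the exact-match policy are assumptions made for illustration.

```python
# Complement categories each verb requires, taken from the example frames.
FRAMES = {
    "kick": ["NP"],
    "cry": [],
    "rely": ["PP"],
    "put": ["NP", "PP"],
    "think": ["S"],
}

def is_subcategorized(verb, complements):
    """True if the complement categories match the verb's frame exactly."""
    return FRAMES.get(verb) == list(complements)

relied_on_the_girl = is_subcategorized("rely", ["PP"])   # "relied on the girl"
relied_the_girl = is_subcategorized("rely", ["NP"])      # "relied the girl"
relied = is_subcategorized("rely", [])                   # "relied"
```

Only the first call succeeds, which mirrors rule 4 (V → rely / _PP).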
54Semantically Odd Constructions
- Can we exclude these two ill-formed structures ?
- The boy frightened sincerity.
- Sincerity kicked the boy.
- Selectional Restrictions
55Selectional Restrictions
- Inherent Properties of Nouns
- +/- ABSTRACT, +/- ANIMATE
- E.g.,
- Sincerity: +ABSTRACT
- Boy: +ANIMATE
56Selectional Rules
- A selectional rule specifies certain selectional
restrictions associated with a verb.
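As a rough illustration, selectional rules can be checked by comparing noun features against what a verb demands of its arguments. The feature values for "boy" and "sincerity" follow the examples above; the specific rules for "frighten" and "kick" are assumptions chosen to reject the two odd sentences, not a published rule set.

```python
NOUN_FEATURES = {
    "boy": {"ANIMATE": True, "ABSTRACT": False},
    "sincerity": {"ANIMATE": False, "ABSTRACT": True},
}

# Assumed rules: features a verb requires of its subject and/or object.
SELECTIONAL_RULES = {
    "frighten": {"object": {"ANIMATE": True}},
    "kick": {"subject": {"ANIMATE": True}},
}

def satisfies(verb, subject, obj):
    """True if subject and object carry every feature the verb's rule requires."""
    rules = SELECTIONAL_RULES.get(verb, {})
    for role, noun in (("subject", subject), ("object", obj)):
        for feature, value in rules.get(role, {}).items():
            if NOUN_FEATURES[noun].get(feature) != value:
                return False
    return True

odd_1 = satisfies("frighten", "boy", "sincerity")   # "The boy frightened sincerity."
odd_2 = satisfies("kick", "sincerity", "boy")       # "Sincerity kicked the boy."
```

Both sentences pass the phrase-structure rules but fail here, which is the distinction the slides draw between ungrammatical and semantically odd.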
57Subcategorization Frame
forward: V, __ NP PP
e.g., We will be forwarding our new catalogue to you
invitation: N, __ PP
e.g., An invitation to the party
accessible: A, __ PP
e.g., A program making science more accessible to young people
58Thematic Roles
The man forwarded the mail to the minister.
59How to define the UWs in UNL Knowledge-Base?
- Nominal concept
- Abstract
- Concrete
- Verbal concept
- Do
- Occur
- Be
- Adjective concept
- Adverbial concept
60Nominal Concept Abstract thing
- abstract thing(icl>thing)
- culture(icl>abstract thing)
- civilization(icl>culture>abstract thing)
- direction(icl>abstract thing)
- east(icl>direction>abstract thing)
- duty(icl>abstract thing)
- mission(icl>duty>abstract thing)
- responsibility(icl>duty>abstract thing)
- accountability(icl>responsibility>duty)
- event(icl>abstract thing,icl>time>abstract thing)
- meeting(icl>event>abstract thing,icl>group>abstract thing)
- conference(icl>meeting>event)
- TV conference(icl>conference>meeting)
61Nominal Concept Concrete thing
- concrete thing(icl>thing,icl>place>thing)
- building(icl>concrete thing)
- factory(icl>building>concrete thing)
- house(icl>building>concrete thing)
- substance(icl>concrete thing)
- cloth(icl>substance>concrete thing)
- cotton(icl>cloth>substance)
- fiber(icl>substance>concrete thing)
- synthetic fiber(icl>fiber>substance)
- textile fiber(icl>fiber>substance)
- liquid(icl>substance>concrete thing)
- beverage(icl>food,icl>liquid>substance)
- coffee(icl>beverage>food)
- liquor(icl>beverage>food)
- beer(icl>liquor>beverage)
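The icl chains above form an inclusion hierarchy that can be walked upward. The sketch below assumes a single icl parent per concept for simplicity (the list itself gives "beverage" two parents, food and liquid); `ICL_PARENT`, `ancestors` and `is_a` are illustrative names.

```python
# Immediate icl (inclusion) parent of a few headwords from the list above.
# Simplification: only one parent is kept per concept.
ICL_PARENT = {
    "concrete thing": "thing",
    "substance": "concrete thing",
    "liquid": "substance",
    "beverage": "liquid",
    "liquor": "beverage",
    "beer": "liquor",
}

def ancestors(concept):
    """Return the chain of icl parents, nearest first."""
    chain = []
    while concept in ICL_PARENT:
        concept = ICL_PARENT[concept]
        chain.append(concept)
    return chain

def is_a(concept, candidate):
    """True if `candidate` is reachable from `concept` by following icl links."""
    return candidate in ancestors(concept)

beer_chain = ancestors("beer")
```

A full knowledge base would need a set of parents per concept rather than one, so that beverage is reachable from both food and liquid.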
62Verbal concept do
- do(icl>do,agt>thing,gol>thing,obj>thing)
- express(icl>do(agt>thing,gol>thing,obj>thing))
- state(icl>express(agt>thing,gol>thing,obj>thing))
- explain(icl>state(agt>thing,gol>thing,obj>thing))
- add(icl>do(agt>thing,gol>thing,obj>thing))
- change(icl>do(agt>thing,gol>thing,obj>thing))
- convert(icl>change(agt>thing,gol>thing,obj>thing))
- classify(icl>do(agt>thing,gol>thing,obj>thing))
- divide(icl>classify(agt>thing,gol>thing,obj>thing))
63Verbal concept occur and be
- occur(icl>occur,gol>thing,obj>thing)
- melt(icl>occur(gol>thing,obj>thing))
- divide(icl>occur(gol>thing,obj>thing))
- arrive(icl>occur(obj>thing))
- be(icl>be,aoj>thing,obj>thing)
- exist(icl>be(aoj>thing))
- born(icl>be(aoj>thing))
64How to define the UWs in UNL Knowledge Base?
- In order to distinguish among the verb classes
headed by 'do', 'occur' and 'be', the following
features are used
UW        need an agent   need an object   English
'do'      +               +                "to kill"
'occur'   -               +                "to fall"
'be'      -               -                "to know"
65How to define the UWs in UNL Knowledge-Base?
- The verbal UWs (do, occur, be) also take some
pre-defined semantic cases, as follows
UW        PRE-DEFINED CASES                English
'do'      takes necessarily agt>thing      "to kill"
'occur'   takes necessarily obj>thing      "to fall"
'be'      takes necessarily aoj>thing      "to know"
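The two tables can be read as a small decision procedure. The sketch below assumes the feature table means do = needs an agent, occur = no agent but needs an object, be = neither; `verb_class` and `MANDATORY_CASE` are hypothetical names used only for illustration.

```python
# Case each verb class necessarily takes, as listed in the table above.
MANDATORY_CASE = {"do": "agt", "occur": "obj", "be": "aoj"}

def verb_class(needs_agent, needs_object):
    """Classify a verb under 'do', 'occur' or 'be' from the two features."""
    if needs_agent:
        return "do"
    return "occur" if needs_object else "be"

kill = verb_class(needs_agent=True, needs_object=True)     # "to kill"
fall = verb_class(needs_agent=False, needs_object=True)    # "to fall"
know = verb_class(needs_agent=False, needs_object=False)   # "to know"
```

`MANDATORY_CASE[kill]` then names the case a UW in that class must declare, here `agt`.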
66Complex sentence
I want to watch this movie.
@entry.@past
01
watch(icl>do)
@entry.@inf
want(icl>)
obj
obj
agt
agt
I(iof>person)
movie(icl>)
I(iof>person)
@def
67BREAK
68Enconversion
69Operations in Analysis
- Movement of heads
- Addition of two nodes
- Deletion of a node
- Creating relation between two nodes
- Adding dynamically inferred attributes to node
70EnConverter
- Simultaneously does
- Morphological
- Syntactic
- Semantic analysis
- Works like Turing machine with many heads
- Heads move over the input to and fro
71Enco - Operation
- Analysis windows -Two in number
- Left Analysis Window (LAW)
- Right Analysis Window (RAW)
- Condition windows - Many in number
- Left Condition Window (LCW)
- Right Condition Window (RCW)
72Deconversion
UNL expression
Output sentence
Deconverter
Rule base
Dictionary
73Operations in DeConverter
- Insertion of node
- Changing attributes
- Movement of windows
- Copying of node
74Process
- Syntax Planning
- Parent child placement
- Morphology
- For Noun, Verb and Adjectives
75The Lexicon
- Format of the dictionary entry
- minister minister(icl>person) (N,ANIMT,PHSCL,PRSN)
- Head word
- Universal word
- Attributes
- Morphological - Pl (plural), V_ed (past tense form)
- Syntactic - V (verb), VOA (verb of action)
- Semantic - ANIMT (animate), PLACE, TIME
headword Universal word (Attribute list)
76The Lexicon
He forwarded the mail to the minister.
- Content words
- forward forward(icl>send) (V,VOA) <E,0,0>
- mail mail(icl>collection) (N,PHSCL,INANI) <E,0,0>
- minister minister(icl>person) (N,ANIMT,PHSCL,PRSN) <E,0,0>
Headword
Universal Word
Attributes
77The Lexicon
He forwarded the mail to the minister.
- Closed words
- He he (PRON,SUB,SING,3RD) <E,0,0>
- the the (ART,THE) <E,0,0>
- to to (PRE,TO) <E,0,0>
Headword
Universal Word
Attributes
78Rule-base
- If (condition) then action
- Conditions matching attributes or the HW or UW
- Actions
- Create relation
- Add attributes
- Move windows
- Priority
79Rule format
Rule fields: type of rule, condition windows, analysis windows, priority, end marker
>(SHEAD){N,ANI::agt:}{V,VAUX:+AGTRES::}(PRE,ON)(BLK)P20;
80Condition window
- Format
- (attribute1, attribute2, ...)
- E.g.
- (N,TIME)
- has noun attribute and TIME attribute
81Analysis window
- Format
- conditions : +/- attributes : relation
- If the conditions are matched then do the action
- i.e., add (+) / remove (-) its attributes
- Or create the specified relation with the other analysis window
- Or move the heads according to the type of rule
82Rule-Priority
- Indicated by number 1 to 255
- lowest 1
- highest 255
- General conditions Low priority
- (N) (V)
- Priority is 20
- Specific conditions High priority
- (N,ANIMT) (V,VOA)
- Priority is 30
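A minimal sketch of priority-based rule selection, assuming a rule fires when its required attributes are a subset of the attributes under each analysis window and that the highest-priority match wins. The `Rule` class and `pick_rule` are illustrative, not the EnConverter's actual data structures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    left: frozenset    # attributes required under the left analysis window
    right: frozenset   # attributes required under the right analysis window
    priority: int      # 1 (lowest) to 255 (highest)

def pick_rule(rules, left_attrs, right_attrs):
    """Return the matching rule with the highest priority, or None."""
    matching = [r for r in rules
                if r.left <= left_attrs and r.right <= right_attrs]
    return max(matching, key=lambda r: r.priority, default=None)

general = Rule(frozenset({"N"}), frozenset({"V"}), 20)
specific = Rule(frozenset({"N", "ANIMT"}), frozenset({"V", "VOA"}), 30)

chosen = pick_rule([general, specific], {"N", "ANIMT", "PRSN"}, {"V", "VOA"})
```

Both rules match an animate noun followed by a verb of action, and the more specific one is chosen because of its higher priority.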
83UNL Rule
- Right shift
- R{V,VOA:::}{N,ANIMT,PRSN:::}(PRE,OF)P60;
- IF
- the left analysis window is on a verb (V) which is also a verb of action (VOA)
- AND
- the right analysis window is on a noun (N) which is animate (ANIMT) and a person (PRSN)
- AND
- the preposition "of" follows the noun (N), indicated by (PRE,OF)
- THEN
- shift right (indicated by R at the start of the rule)
84Right shift
Before application of rule
asked
director
of
...
- R{V,VOA:::}{N,ANIMT,PRSN:::}(PRE,OF)P60;
After application of rule
asked
director
of
...
85UNL Rule for a Semantic Relation
- Create relation between V and N2, after resolving the preposition preceding N2
- <{V,VOA:::}{N,TIME,DAY,ONRES,PRERES::tim:}P25;
- IF
- the left analysis window is on a verb (V) which is a verb of action (VOA)
- AND
- the right analysis window is on a noun (N) that has the TIME and DAY attributes and for which the preceding preposition (on) has been processed and deleted
- THEN
- set up the tim relation between V and N2 (indicated by < at the start of the rule)
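The rule above can be simulated on a toy node list. This sketch assumes nodes are (word, attribute set) pairs, that the right analysis window sits immediately after the left one, and that the noun is removed from the list once it is absorbed into the relation; `apply_tim_rule` is a hypothetical helper, not EnConverter code.

```python
def apply_tim_rule(nodes, law):
    """Try the tim rule with the left analysis window at index `law`.

    Returns (relation, remaining_nodes) on success, or None if the
    conditions on either analysis window are not met.
    """
    raw = law + 1                       # right analysis window
    if raw >= len(nodes):
        return None
    (left_word, left_attrs), (right_word, right_attrs) = nodes[law], nodes[raw]
    if not {"V", "VOA"} <= left_attrs:
        return None
    if not {"N", "TIME", "DAY", "ONRES", "PRERES"} <= right_attrs:
        return None
    relation = ("tim", left_word, right_word)
    remaining = nodes[:raw] + nodes[raw + 1:]   # noun leaves the node list
    return relation, remaining

nodes = [
    ("came", {"V", "VOA"}),
    ("monday", {"N", "TIME", "DAY", "ONRES", "PRERES"}),
]
result = apply_tim_rule(nodes, 0)
```

After the call only "came" remains in the node list, and the tim relation between "came" and "monday" has been recorded.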
86Semantic relation
Before application of rule
came
monday
...
- <{VRB,VOA:::}{N,TIME,DAY,ONRES,PRERES::tim:}P25;
After application of rule
came
...
87UNL expression
- [S]
- he is playing chess
- {unl}
- obj(play(icl>compete):06.@entry.@present.@progress, chess(icl>game):0E)
- agt(play(icl>compete):06.@entry.@present.@progress, he:00)
- {/unl}
- [/S]
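A small sketch of reading binary relations such as those in the expression above back into triples. It assumes the plain `rel(uw1, uw2)` form with `icl>` restrictions and `.@attribute` suffixes, and it does not handle node identifiers or scopes; `parse_relation` and `split_uw` are illustrative helpers.

```python
def parse_relation(line):
    """Split 'rel(arg1, arg2)' into (rel, arg1, arg2), respecting nested parentheses."""
    line = line.strip()
    label, rest = line.split("(", 1)
    body = rest.rsplit(")", 1)[0]       # drop the closing parenthesis of the relation
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:  # the comma separating the two UWs
            return label.strip(), body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"no top-level comma in {line!r}")

def split_uw(node):
    """Separate a node into its UW and its list of attribute names."""
    uw, *attrs = node.split(".@")
    return uw, attrs

rel, head, dep = parse_relation(
    "agt(play(icl>compete).@entry.@present.@progress, he)"
)
head_uw, head_attrs = split_uw(head)
```

Tracking parenthesis depth is what keeps the comma inside a constraint list such as `icl>do,agt>thing` from being mistaken for the argument separator.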
88Suggested Readings
- UNDL Foundation. 2003. The Universal Networking Language (UNL) Specifications, Version 3, Edition 2. http://www.undl.org
89M.Tech Seminar Topics 2009
90
1. Cross Lingual Information Retrieval: The goal is to study the techniques of information retrieval when the language of the query is different from the language of the web page. After covering the basics of information retrieval (crawling, indexing, ranking etc.), the student is expected to study the complexities of cross lingual search (see http://clef-campaign.org). Considerable work on this has happened at IIT Bombay, which leads the national project on CLIR (http://www.clia.iitb.ac.in/clia-beta-ext; the site may not always be up).
2. Semantic Role Labeling (UNL): Sentences have inherent structure in terms of agent, object, instrument, time, place and similar relationships between words. When extracted, such information is used in a large number of applications such as machine translation, entailment, question answering and information extraction. We will study semantic roles in the form of the Universal Networking Language (UNL, http://www.undl.org) and the ongoing work on UNL extraction spanning the last several years (visit http://www.cse.iitb.ac.in/~pb, click on publications and see the publications on Semantics).
91
3. Empirical Methods in NLP-ML: This is a foundational topic proposing to investigate mathematical and statistical principles and methods cutting across different areas of NLP and Machine Learning. Examples of these are Spectral Analysis, Expectation Maximization, Graphical Models, Lexical Semantic Association Techniques and so on. Additional topics of study will be NLP-suitable probability distributions, e.g., the Dirichlet distribution. Specific NLP and ML applications will always be kept in focus. Browse the proceedings of the EMNLP conference (http://www.isca-students.org/emnlp_2009) and the CoNLL conference (http://www.cnts.ua.ac.be/conll/).
4. Text Entailment: This studies the methodologies involved in deciding if a piece of text (H) is inferable from another (e.g., "France beat Brazil in the FIFA World Cup semi final" entails "Brazil bowed out of the World Cup"). Grounded in predicate calculus, the topic has recently seen considerable work on using machine learning techniques for entailment (http://pascallin.ecs.soton.ac.uk/Challenges/RTE/). A number of students have worked with me on this topic and on the use of UNL in entailment.
92
5. Foundation of Artificial Intelligence: We will study work by Minsky, Newell and Simon, McCarthy, Penrose, Dreyfus, Hofstadter, Marr, Chomsky and so on, including Indian thoughts on intelligence and consciousness.
6. Machine Translation, Principles and Paradigms: Starting with Interlingua Based Machine Translation using UNL, we have made inroads in Statistical Machine Translation (http://www.cse.iitb.ac.in/~pb/papers/acl09-smt.pdf). This topic will study many different approaches to machine translation, which is a key problem (along with CLIR) for a multilingual country like India. We are members of large national projects on English-to-Indian-Language and Indian-Language-to-Indian-Language machine translation.
7. Shallow Parsing for Indian Languages: This is ongoing critical work attempting to build common platforms for morphology, POS tagging and chunk processing for Indian languages, especially Hindi and Marathi. Talk to seniors Mugdha Bapat (mbapat@cse) and Harshada Gune (harshada@cse).