Title: Linguistics and Language Technologies
1 Linguistics and Language Technologies
- Lori Levin
- 11-721 Grammars and Lexicons
- Fall Term 2003
2 Linguistics
- Linguistics is a
  - Cognitive science
  - Social science
  - Area of the humanities
  - Also neuroscience, an area of mathematics, computer science, etc.
- Primarily about the human mind and human communication behavior.
3 Linguistics as a Cognitive Science
- Knowledge of language is not conscious knowledge.
  - Like knowing how to walk without knowing which neurons and muscles are involved.
- What does knowledge of a language consist of?
  - Sub-areas of linguistic knowledge: grammar of sentences (syntax), grammar of words (morphology), sentence meaning (semantics), word meaning (lexical semantics), language use in context (pragmatics and discourse analysis).
- Do human languages differ from each other in random ways, or are there common, universal properties?
- How are human languages different from mathematical languages, logical languages, programming languages, and animal communication systems?
4 Linguistics as a Cognitive Science
- First language acquisition: How do human babies learn something so complex so quickly with such imperfect input?
- Second language acquisition: How do adults learn a second language, and why are they so bad at something that babies are so good at?
  - Do adults learn languages better with immediate or delayed feedback on errors?
  - Does explanation of foreign language grammar help adults learn the foreign language?
- Psycholinguistics: How is human language processed in the brain and how is human language produced?
  - When you hear the sentence "He put his money in the bank," does your brain activate only the sense of "bank" that is related to money, or all of the senses of "bank" because they sound the same (e.g., river bank)?
  - Why do you have to do a double take to understand this sentence?
    - The cotton shirts are made of is soft.
- Neurolinguistics: What areas of the brain are activated during language processing? How do brain injuries affect language production and comprehension?
5 Linguistics as a Social Science
- Historical linguistics: How do human languages change over time?
  - Drift
    - "Corn" used to mean all small grains, e.g., pepper corn, barley corn.
    - What happened to the word "britches"?
    - English "f" is systematically related to French "p". What was the common sound that they both derived from in some ancient language?
      - Foot/pied
      - Father/père
  - Contact
    - Languages in proximity to each other will influence each other's vocabulary and grammar, even if the languages were previously unrelated.
6 Language as a Social Science
- Sociolinguistics
  - How do human languages vary with social factors such as
    - Geography
    - Age
    - Ethnic group
    - Sex
    - Race
    - Economic class
  - In situations of language contact, what are the factors that determine whether there will be bilingualism or language loss?
7 Language Technologies
- Computer-based tools for processing human languages
  - Speech recognition
  - Speech synthesis
  - Machine translation
  - Human-machine dialogue systems
  - Information retrieval, extraction, and summarization
  - Computer-assisted language learning
8 Why should language technologists learn linguistics?
9 What does knowledge of a language consist of?
- Can "he" and "Sam" be the same person?
  - He thinks that Sam is wrong.
  - Sam expected to see him.
  - Sam thinks that he is wrong.
  - Sam believed him to be wrong.
  - Sam expected Bill to see him.
  - The person that he saw likes Sam.
10 What does knowledge of a language consist of?
- Recognition of ambiguity
  - I saw a man with a telescope.
  - We sold her dog biscuits.
  - Milk drinkers turn to powder.
  - I saw a friend of John's brother.
  - Grandmother of nine makes hole in one.
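One way to make "recognition of ambiguity" concrete is to count how many syntactic structures a grammar assigns to a sentence. The sketch below is only an illustration under stated assumptions: the grammar rules, the lexicon, and the function name `count_parses` are invented here for the telescope example and are not part of the lecture. It uses a CYK-style chart to count parse trees.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (an assumption made for illustration).
# Each rule is (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the verb phrase (seeing with a telescope)
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase (a man who has a telescope)
    ("PP", "P", "NP"),
]

# Hypothetical lexicon covering only the words of the example.
LEXICON = {
    "i": ["NP"], "saw": ["V"], "a": ["Det"],
    "man": ["N"], "telescope": ["N"], "with": ["P"],
}

def count_parses(sentence):
    """Return the number of distinct parse trees rooted in S."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    chart = defaultdict(int)  # (start, end, category) -> number of trees
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[(i, i + 1, category)] += 1
    # Build longer spans from shorter ones.
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, "S")]

print(count_parses("I saw a man with a telescope."))
print(count_parses("I saw a man."))
```

Under this toy grammar the telescope sentence receives two parses, one for each attachment of "with a telescope", while "I saw a man." receives one.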
11 What does knowledge of a language consist of?
- Recognition of grammaticality.
  - Many linguists (probably a majority) assume that people can distinguish strings of words that are sentences of their language from strings of words that are not sentences of their language.
  - So imagine that you are a machine or a classifier that takes a sentence as input, and returns "accept" or "reject" as output.
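The accept/reject idea can be sketched as a function from a sentence to one of two labels. The example below is a deliberately tiny stand-in, not a claim about how speakers or real systems do it: the lexicon, the tag names, the pattern, and the function name `classify` are all assumptions made for illustration. It tags each word and accepts the sentence only if the tag sequence fits one fixed pattern.

```python
import re

# Hypothetical toy lexicon: word -> part-of-speech tag.
LEXICON = {
    "sam": "N", "sue": "N", "bill": "N",
    "saw": "V", "with": "P", "and": "C",
}

# A noun phrase is a noun, optionally conjoined with further nouns.
NP = r"N( C N)*"
# Accepted shape: noun phrase, verb, noun phrase, optional prepositional phrase.
PATTERN = re.compile(rf"{NP} V {NP}( P {NP})?")

def classify(sentence):
    """Return "accept" or "reject" for a sentence over the toy vocabulary."""
    words = sentence.lower().rstrip(".?!").split()
    if any(word not in LEXICON for word in words):
        return "reject"  # unknown words are rejected in this sketch
    tags = " ".join(LEXICON[word] for word in words)
    return "accept" if PATTERN.fullmatch(tags) else "reject"

print(classify("Bill saw Sam and Sue."))
print(classify("Saw Bill Sam."))
```

A pattern over tag sequences is far too weak for real language, but it shows the interface: a string of words goes in and a binary judgement comes out.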
12 Grammaticality
- I gave back the car to him.
- I gave the car back to him.
- I gave the car to him back.
- I gave back him the car.
- I gave him back the car.
- I gave him the car back.
14 Grammaticality
- A string of words that you recognize as a sentence in your native language is grammatical.
- A string of words that you do not recognize as a sentence in your native language is ungrammatical.
- When you decide whether a sentence is grammatical or ungrammatical, this is called giving a grammaticality judgement.
- Ungrammatical sentences are preceded by an asterisk or star (*). Sometimes they are called starred sentences.
- If native speakers can't decide whether the sentence is grammatical or ungrammatical, it is preceded by a combination of stars and question marks.
15 Grammaticality: Descriptive and Prescriptive Linguistics
- Linguists describe what people say.
  - Me and him went to the movies.
  - Sam wants to boldly go where no one has gone before.
- Linguists do not prescribe what people should say.
- Language technologists don't get a say in the matter.
  - If it's in the input, you have to deal with it.
- When you give a grammaticality judgement, you are not supposed to judge whether the sentence is the most elegant or appropriate, just whether it is a sentence of your language or not.
16 Grammaticality
- Grammaticality is not completely determined by meaning.
  - Sentences 1 and 2 have similar meaning.
    - 1. Bill saw Sam and Sue.
    - 2. Bill saw Sam with Sue.
  - Sentence 2 can be transformed into a question by (1) changing "Sue" to "Who", (2) moving it to the beginning of the sentence, and (3) making some changes to the verb.
    - Who did Bill see Sam with?
  - The same process applied to Sentence 1 does not result in a grammatical sentence.
    - Who did Bill see Sam and?
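The three-step question formation described above can be written as a purely mechanical string operation, which makes the point visible: the procedure itself has no way of knowing that one of its outputs is not a sentence. This is a minimal sketch under strong assumptions: the subject is a single word, the verb is the second word, and `BASE_FORM` and `naive_wh_question` are hypothetical names introduced only for this illustration.

```python
# Hypothetical lookup from past-tense form to base form; only "saw" is needed here.
BASE_FORM = {"saw": "see"}

def naive_wh_question(sentence, questioned):
    """Apply the three steps blindly, with no check on grammaticality."""
    words = sentence.rstrip(".").split()
    # Steps 1 and 2: remove the questioned word; "Who" will go at the front.
    rest = [word for word in words if word != questioned]
    # Step 3: replace the past-tense verb with "did" plus its base form.
    subject, verb, *tail = rest
    return " ".join(["Who", "did", subject, BASE_FORM[verb]] + tail) + "?"

print(naive_wh_question("Bill saw Sam with Sue.", "Sue"))
print(naive_wh_question("Bill saw Sam and Sue.", "Sue"))
```

The first call yields "Who did Bill see Sam with?" and the second yields "Who did Bill see Sam and?". Both come from the same steps applied to sentences of similar meaning, yet only the first is grammatical, so something beyond meaning and this surface procedure must be part of a speaker's knowledge.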
17 Grammaticality
- Sentences that are only possible in poetry are probably not grammatical.
  - To her we laurels bring.
    - indirect-object subject direct-object verb
  - We bring laurels to her.
    - subject verb direct-object indirect-object
18 Grammaticality
- Sentences that are only possible in poetry are probably not grammatical.
  - Bring we to our alma mater trust and honor due.
    - verb subject indirect-object direct-object
  - We bring trust and honor (that are) due to our alma mater.
    - subject verb direct-object indirect-object
19 Grammaticality
- Sentences that are understandable, but sound like mistakes, are probably not grammatical.
  - These are things that I don't know anyone who says.
20 Grammaticality
- However, many types of sentences that are found in writing, or are restricted to special contexts, are considered to be grammatical and even have names.
  - Locative Inversion: In this village live many people.
  - Topicalization: Sam, I like.
  - Heavy NP Shift: I presented to the students many examples of strange and unusual constructions. (The indirect object comes before the direct object because the direct object is too long.)
21 Problems with Grammaticality
- Dialect differences
  - The car needs washed.
    - (The car needs to be washed.)
  - We go to the movies a lot anymore.
    - (We go to the movies a lot these days.)
  - I gave it her.
    - (I gave it to her.)
  - It were me what told her.
    - (It was me that told her.)
  - Mine is bigger than what yours is.
    - (Mine is bigger than yours is.)
22 Problems with Grammaticality
- What is the source of the problem?
- Colorless green ideas sleep furiously.
- Sleep ideas green furiously.
23 Grammaticality in Language Technologies
- Real input (especially spoken input) is not always well-formed, so you should not build a program that accepts only grammatical sentences.
- Can we do away with grammar in language technologies?
24 Grammaticality in Language Technologies
- You cannot extract the meaning of a sentence without processing the grammar.
  - Sue interviewed Sam.
  - Sam interviewed Sue.
- LT output has to be comprehensible, and therefore mostly grammatical.
  - Synthesized speech
  - An automatically produced translation
  - An automatically produced summary
- Error detection programs for computer-assisted language instruction or for word processing must distinguish grammatical from ungrammatical sentences.
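The "Sue interviewed Sam" and "Sam interviewed Sue" pair can be used to show why ignoring grammar loses meaning. The sketch below is an illustration only, and the function names `bag_of_words` and `subject_verb_object` are assumptions introduced here. It compares an order-free representation of the two sentences with a trivially order-sensitive one that assumes a three-word subject-verb-object sentence.

```python
from collections import Counter

def bag_of_words(sentence):
    """Order-free representation: word counts only."""
    return Counter(sentence.lower().rstrip(".").split())

def subject_verb_object(sentence):
    """Order-sensitive reading; assumes exactly subject, verb, object."""
    subject, verb, obj = sentence.rstrip(".").split()
    return {"subject": subject, "verb": verb, "object": obj}

a = "Sue interviewed Sam."
b = "Sam interviewed Sue."

# The two sentences contain the same words, so the bags are identical.
print(bag_of_words(a) == bag_of_words(b))
# Using word order recovers who interviewed whom, and the readings differ.
print(subject_verb_object(a) == subject_verb_object(b))
```

The first comparison prints `True` and the second prints `False`: a system that discards word order cannot tell the two sentences apart, while even a very crude use of sentence structure can.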