How linguistics can help you
  • Lori Levin
  • LTI Immigration Course 2005

  • What is linguistics?
  • The influence of linguistics on language
  • The long rule of rationalism
  • The triumph of empiricism
  • What was forgotten
  • The use of linguistics in Language Technologies
  • Outside of LTI
  • Inside LTI
  • Linguistics courses

  • Linguistics is a
  • Cognitive Science
  • Social Science
  • Area of the Humanities
  • Neuroscience
  • Computer science
  • Primarily about the human mind and human
    communication behavior.

Linguistics as a Cognitive Science
  • Knowledge of language is not conscious knowledge.
  • Like knowing how to walk without knowing which
    neurons and muscles are involved.
  • Sub-areas of linguistic knowledge
  • Grammar of sentences (syntax), grammar of words
    (morphology), sentence meaning (semantics), word
    meaning (lexical semantics), language use in
    context (pragmatics and discourse analysis).

Linguistics as a Cognitive Science
  • Do human languages differ from each other in
    random ways, or are there common, universal
  • How are human languages different from
    mathematical languages, logical languages,
    programming languages, and animal communication?

Linguistics as a Cognitive Science
  • First language acquisition: How do human babies
    learn something so complex so quickly with such
    imperfect input?
  • Second language acquisition: How do adults learn
    a second language, and why are they so bad at
    something that babies are so good at?
  • How can you teach something that is not conscious
    knowledge for experts (native speakers)?
  • Do adults learn languages better with immediate
    or delayed feedback on errors?
  • Does explanation of foreign language grammar help
    adults learn the foreign language?

Linguistics as a Cognitive Science
  • Psycholinguistics: How is human language
    processed in the brain and how is human language
  • Why do you have to do a double take to understand
    this sentence (a garden path sentence)?
  • The cotton shirts are made of is soft.
  • Neuro-linguistics: What areas of the brain are
    activated during language processing? How do
    brain injuries affect language production and

Linguistics as a Social Science
  • Historical Linguistics: How do human languages
    change over time?
  • Drift
  • Corn used to mean all small grains, e.g.,
    peppercorn, barleycorn.
  • What happened to the word britches?
  • English f is systematically related to French
    p. What was the common sound that they both
    derived from in some ancient language?
  • Foot/pied
  • Father/père
  • Contact
  • Languages in proximity to each other will
    influence each other's vocabulary and grammar,
    even if the languages were previously unrelated.

Linguistics as a Social Science
  • Sociolinguistics
  • How do human languages vary with social factors
    such as
  • Geography
  • Age
  • Ethnic group
  • Sex
  • Race
  • Economic class
  • Social setting
  • In situations of language contact, what are the
    factors that determine whether there will be
    bilingualism or language loss?

Documentary Linguistics
  • Computer-based tools for describing languages
  • Annotating corpora
  • Databases for annotated corpora
  • Managing lexicons
  • Has become urgent because of increased rate of
    language death.

Computational Linguistics
  • Formalisms for describing human languages
  • Based on formal language theory
  • Enable precise, testable formulations of
    linguistic rules
  • Using linguistic rules for language processing.

Language Technologies
  • Computer-based tools for processing human
    language
  • Speech recognition
  • Speech synthesis
  • Machine translation
  • Human-machine dialogue systems
  • Information Retrieval, Extraction, and
  • Computer-assisted language learning

  • Before Language Technologies there was
    Computational Linguistics
  • Cognitive Science
  • Artificial Intelligence: How can a computer
    understand a story as well as a human does?
  • Psycholinguistics: How is language processed by
    the human brain?
  • Formal Linguistic Theory: Use formal language
    theory to model human language
  • All of these topics would be covered at
    Computational Linguistics conferences
  • Models of human linguistic knowledge and human
    language processing were thought to be relevant
    to computer-based processing of language
  • All computational linguists knew a lot of

  • Computers got faster
  • Toy systems and papers on theories of language
    gave way to implementations that worked on a real
  • Computational linguistics became more of a
    computer science than a cognitive science.

History: Two Philosophical Approaches
  • Rationalism
  • The source of knowledge is reason.
  • Knowledge comes from the mind.
  • Empiricism
  • The source of knowledge is experience.
  • Knowledge comes from data.

History: Two Philosophical Approaches
  • Rationalism
  • The source of knowledge is reason.
  • Goal: To discover the mental representation of
    linguistic rules
  • The primary data for studying language is the
    grammaticality judgment
  • Your head contains a formal grammar that accepts
    strings of words that are in your language and
    rejects strings of words that are not in your
    language.

History: Rationalism
  • Grammaticality judgments
  • The car needs to be washed. YES
  • The car needs washed. NO
  • Car the washed needs. NO
  • This music gives a headache to me. NO
  • This music gives me a headache. YES
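
The idea of a formal grammar that accepts some word strings and rejects others can be sketched in a few lines of Python. This is only an illustration: the grammar rules, the category names, and the function names (`GRAMMAR`, `expand`, `is_grammatical`) are made up for this sketch and cover nothing beyond the five example sentences. The YES/NO judgments themselves reflect one speaker's dialect.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical toy grammar. Each nonterminal maps to its possible
# right-hand sides. Any symbol that is not a key is a literal word.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Pro",)],
    "VP": [("needs", "to", "be", "Part"), ("gives", "NP", "NP")],
    "Det": [("the",), ("this",), ("a",)],
    "N": [("car",), ("music",), ("headache",)],
    "Pro": [("me",)],
    "Part": [("washed",)],
}


def expand(symbol: str, words: List[str], start: int) -> Iterator[int]:
    """Yield every position where `symbol` can end if it begins at `start`."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must match the next word exactly.
        if start < len(words) and words[start] == symbol:
            yield start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand_sequence(rhs, words, start)


def expand_sequence(rhs: Tuple[str, ...], words: List[str], start: int) -> Iterator[int]:
    """Yield every end position for the symbols in `rhs` matched in order."""
    if not rhs:
        yield start
        return
    for middle in expand(rhs[0], words, start):
        yield from expand_sequence(rhs[1:], words, middle)


def is_grammatical(sentence: str) -> bool:
    """Accept the sentence only if S spans every word."""
    words = sentence.lower().rstrip(".").split()
    return any(end == len(words) for end in expand("S", words, 0))


if __name__ == "__main__":
    for s in ["The car needs to be washed.", "The car needs washed.",
              "Car the washed needs.", "This music gives a headache to me.",
              "This music gives me a headache."]:
        print(s, "YES" if is_grammatical(s) else "NO")
```

The recognizer tries rules top-down with backtracking, which is fine here because the toy grammar has no left recursion. A grammar that also accepted "The car needs washed." would model a different dialect.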

History: Rationalism
  • Mental models of language are real
  • Linguistics should strive to model human
    grammaticality judgment
  • Corpora are artifacts
  • Pieces of the mental model get mixed up with
    speech and writing errors, constraints of time
    and space, etc.

  • Empiricism
  • The source of knowledge is experience
  • Goal: To discover how meaning is negotiated in
  • The primary data is what is attested in a corpus

History: Empiricism
  • Sentences are found in corpora with different
  • This music would give a headache to anyone with
    refined sensibilities.
  • This music gives a headache to me.
  • This music gives anyone with refined
    sensibilities a headache.
  • This music gives me a headache.
  • The car needs washed.

History: Rationalism and Empiricism
  • From 1957 until very close to the present time,
    linguistics was vehemently rationalist.
  • Actually, some empiricists survived through the
    20th century, but they typically didn't formalize
    their theories, so they didn't have a way to
    influence computational linguistics.
  • Rationalists and empiricists were intellectual
  • Computational Linguistics was mostly rationalist
    until the mid 1980s.

History: Late 1980s and Early 1990s
  • Rationalist vs Empiricist debates
  • Empiricism triumphs in speech recognition

History: The Rise of Empiricism, 1990s
  • Speech recognition
  • Statistical Machine Translation
  • Information Retrieval

  • I amar prestar aen. The world is changed. Han
    mathon ne nen. I feel it in the waters. Han
    mathon ne chae. I feel it in the earth. A han
    noston ned 'wilith. I smell it in the air. Much
    that once was is lost. For none now live who
    remember it.
  • And some things that should not have been
    forgotten were lost.
    Lord of the Rings

  • Movie Script

What was forgotten?
  • Parsers
  • Treebanks
  • Lexicons
  • Morphological analyzers

  • (TOP (S
  • (NP (DT The) (NNP National) (NNP Park)
  • (NNP Service))
  • (VP (VBZ hopes) (PP (IN by) (NP (CD 1966)))
  • (S (NP (-NONE- )) (AUX (TO to))
  • (VP (VB have)
  • (NP (CD 30,000) (NNS campsites))
  • (ADJP (JJ available)
  • (PP (IN for)
  • (NP (NP (NP (CD
  • (NNS
  • (NP (DT
    a) (NN day))).

Things that were lost that should not have been
  • Can you understand the Treebank?
  • Can you evaluate the Treebank?
  • Can you build a Treebank?
  • If you train a statistical parser on the Treebank
    and it doesn't work well, can you attribute blame
    to your training or to mistakes in the Treebank?
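
A first step toward understanding or checking a treebank is being able to read its bracketed notation. The sketch below is a minimal reader, assuming Penn Treebank-style bracketing. The function names (`parse_bracketed`, `tagged_words`) and the example tree are made up for illustration, and the tree is not taken from the Treebank itself.

```python
import re
from typing import List, Tuple


def parse_bracketed(text: str):
    """Turn '(LABEL child ...)' into a nested (label, children) tuple.

    Raises ValueError when the brackets do not balance, which is one
    very basic check on treebank data.
    """
    tokens = re.findall(r"[()]|[^\s()]+", text)
    pos = 0

    def next_token() -> str:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: input ended early")
        token = tokens[pos]
        pos += 1
        return token

    def read_tree():
        # Called just after an opening bracket has been consumed.
        label = next_token()
        if label in ("(", ")"):
            raise ValueError("expected a category label after '('")
        children = []
        while True:
            token = next_token()
            if token == ")":
                return (label, children)
            children.append(read_tree() if token == "(" else token)

    if next_token() != "(":
        raise ValueError("a tree must start with '('")
    tree = read_tree()
    if pos != len(tokens):
        raise ValueError("extra material after the tree")
    return tree


def tagged_words(tree) -> List[Tuple[str, str]]:
    """Collect (word, part-of-speech tag) pairs from the leaves, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs: List[Tuple[str, str]] = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(tagged_words(child))
    return pairs


if __name__ == "__main__":
    # Hypothetical example tree, written by hand for this sketch.
    example = ("(S (NP (DT This) (NN music)) "
               "(VP (VBZ gives) (NP (PRP me)) (NP (DT a) (NN headache))))")
    print(tagged_words(parse_bracketed(example)))
```

Reading the trees is only the mechanical part. Judging whether a given bracketing is a reasonable analysis of the sentence still takes knowledge of the linguistic categories behind the labels.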

Language Technologies Today
  • Influenced by both rationalism and empiricism.
  • Rule based systems
  • Rationalist linguistics
  • Statistical systems
  • Corpus linguistics
  • Statistics and machine learning

Jamie Callan
Full-Text Indexing: German Decompounding
  • Compounds that behave like English phrases
  • computerviren (computer viruses) vs. computer
    and viren
  • sonnenenergie (solar energy) vs. sonnen and
    energie
  • Compounds that probably don't behave like English
    phrases
  • gemüseexporteure (vegetable exporters)
  • fussballeuropameisterschaft (European football
    championship)
  • vs. fuss, ball, europa, meisterschaft
  • vs. fussball, europa, meisterschaft
  • vs. europameisterschaft im fußball
  • Slightly irregular compounds
  • schönheitskönigin (beauty queen) vs. schönheit
    and königin
  • Note the introduction of s between the components
  • erdbeben (earthquake) vs. erde and beben
  • Note that the final e in erde is elided
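
The decompounding cases above can be sketched as a lexicon-driven splitter. This is a simplified illustration and not the method of any system named in these slides. The lexicon, the function names (`candidate_stems`, `decompound`), and the two repair rules (drop a linking s, restore an elided final e) are assumptions made for the sketch.

```python
from typing import List, Set

# Hypothetical toy lexicon; a real system would load a full German word list.
LEXICON: Set[str] = {
    "computer", "viren", "sonnen", "energie",
    "fuss", "ball", "fussball", "europa", "meisterschaft",
    "europameisterschaft", "schönheit", "königin", "erde", "beben",
}


def candidate_stems(fragment: str, is_last: bool, lexicon: Set[str]) -> List[str]:
    """Return lexicon entries that `fragment` could stand for inside a compound."""
    stems = []
    if fragment in lexicon:
        stems.append(fragment)
    if not is_last:
        # Linking s: "schönheits" stands for "schönheit".
        if fragment.endswith("s") and fragment[:-1] in lexicon:
            stems.append(fragment[:-1])
        # Elided final e: "erd" stands for "erde".
        if fragment + "e" in lexicon:
            stems.append(fragment + "e")
    return stems


def decompound(word: str, lexicon: Set[str] = LEXICON) -> List[List[str]]:
    """Return every way to split `word` into lexicon entries."""
    word = word.lower()
    splits: List[List[str]] = []
    for end in range(1, len(word) + 1):
        head, rest = word[:end], word[end:]
        for stem in candidate_stems(head, not rest, lexicon):
            if not rest:
                splits.append([stem])
            else:
                for tail in decompound(rest, lexicon):
                    splits.append([stem] + tail)
    return splits


if __name__ == "__main__":
    print(decompound("schönheitskönigin"))  # [['schönheit', 'königin']]
    print(decompound("erdbeben"))           # [['erde', 'beben']]
    for split in decompound("fussballeuropameisterschaft"):
        print(split)
```

With this lexicon the football example comes back with several competing splits, including both candidates listed on the slide. Choosing among them for indexing is the part that needs linguistic or statistical knowledge, and the two repair rules will overgenerate on a larger lexicon.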

(Chen, 2002)
Jamie Callan
Example of using linguistics with speech
  • Integrate parsing and speech recognition so that
    only parsable hypotheses are considered.
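
One simple way to picture this is as a filter over the recognizer's n-best list. The sketch below is a simplification: a real integration would apply the grammar inside the recognizer's search instead of afterwards. The names (`best_parsable`, `toy_is_parsable`), the hypotheses, and the scores are all invented for illustration.

```python
from typing import Callable, List, Optional, Tuple

# (word string, recognizer score); higher scores are better.
Hypothesis = Tuple[str, float]


def best_parsable(nbest: List[Hypothesis],
                  is_parsable: Callable[[str], bool]) -> Optional[Hypothesis]:
    """Return the highest-scoring hypothesis that the parser accepts, if any."""
    parsable = [hyp for hyp in nbest if is_parsable(hyp[0])]
    if not parsable:
        return None
    return max(parsable, key=lambda hyp: hyp[1])


def toy_is_parsable(sentence: str) -> bool:
    """Hypothetical stand-in for a parser: accepts 'the/this N needs to be washed'."""
    words = sentence.lower().split()
    return (len(words) == 6
            and words[0] in {"the", "this"}
            and words[1] in {"car", "shirt"}
            and words[2:] == ["needs", "to", "be", "washed"])


if __name__ == "__main__":
    # Made-up n-best list; the top-scoring string is not parsable.
    nbest = [
        ("the car needs two be washed", -4.1),
        ("the car needs to be washed", -4.3),
        ("the car knees to be washed", -5.0),
    ]
    print(best_parsable(nbest, toy_is_parsable))
```

Passing the parser in as a function keeps the selection logic independent of any particular grammar formalism.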

Answer Verification
  • Parse passages to create a dependency tree among
  • Attempt to unify logical forms of question and
    answer text

(M. Pasca and S. Harabagiu, SIGIR 2001)
Jamie Callan
For More Information
  • S. Harabagiu, D. Moldovan, C. Clark, M. Bowden,
    J. Williams, and J. Bensley. Answer mining by
    combining extraction techniques with abductive
    reasoning. In Proceedings of the Twelfth Text
    Retrieval Conference (TREC 2003). 2004.
  • M. Pasca and S. Harabagiu. High performance
    question answering. In SIGIR 2001 conference

Recent Work on Combining Linguistic Structure
with Statistical MT
  • David Chiang, Best Paper Award, Association for
    Computational Linguistics, 2005
  • Johns Hopkins Workshop, 2005, Statistical Machine
    Translation by Parsing, led by Dan Melamed.

Motivation for SMT by Parsing
  • State-of-the-art SMT often produces word salad.
  • Bolting trees onto FST-based (IBM-style) SMT
    doesn't seem to help
  • SMT is very compute-intensive (slow).
  • SMT systems getting very complicated, making them
    hard to study and improve.

Dan Melamed
The Engineering Motivation for Syntax
  • Need fewer parameters to express ordering
  • E.g. Arabic adjectives always follow their
  • Fewer parameters are easier to learn, given
    limited training data and/or computing resources.
  • Less training data needed to reach a given level
    of accuracy.
  • Better accuracy on fixed amount of data.
  • All parameters interact during learning, so
    better estimates for syntactic parameters lead to
    better estimates for other types.

Dan Melamed
But isn't syntax too expensive?
  • Myth: Translation models involving syntax are
    computationally too expensive to train.
  • Fact: Finite-state models are more expensive!
    (more parameters)
  • Of course, bolting syntax on top of a
    finite-state model incurs the combined cost of
    both. (So we avoided that.)
  • In machine learning with structured inference
    (most of NLP), better models should train faster.

Dan Melamed
LTI projects that use linguistic knowledge
  • Javelin: Question answering
  • Nyberg, Mitamura
  • Radar: Information extraction from email
  • Nyberg, Frederking, Levin
  • Let's Go: Human-machine bus information system
  • Eskenazi, Black
  • Writing Tutor: Detect errors made by English
  • Mitamura

LTI projects that use linguistic knowledge
  • AVENUE: Automatically learning machine
    translation rules for minor-major language pairs
  • Carbonell, Lavie, Levin, Frederking, Brown
  • SAMPLE: Reading assistant for English as a second

LTI faculty with training in linguistics
  • Lori Levin, Linguistics
  • Teruko Mitamura, Linguistics
  • Eric Nyberg, Computational Linguistics
  • Alan Black, Cognitive Science

Linguistics Courses at LTI
  • Grammars and Lexicons, 11-721
  • Levin and Mitamura
  • Fall 2005
  • Goals: Skills
  • Write grammar rules that can be used for parsing
  • Work on multilingual applications
  • Understand syntactic annotation of data
  • Goals: Knowledge
  • Linguistic categories
  • Noun, verb, noun phrase, verb phrase
  • Linguistic Structures
  • Main clauses, embedded clauses, relative clauses
  • Linguistic Variation
  • How to recognize the categories and structures
    even though they look different
  • Grammar writing
  • Writing grammar rules for a parser (English and

Linguistics Courses at LTI
  • Grammar Formalisms, 11-722
  • Levin, Lavie, Black
  • Spring 2006
  • Goal
  • How to implement basic linguistic structures and
    semantics in several formalisms
  • Lexical Functional Grammar, Head Driven Phrase
    Structure Grammar, Categorial Grammar, Tree
    Adjoining Grammar
  • How parsers can be implemented for these
    formalisms

Linguistics Courses at LTI
  • Formal Semantics, 11-723
  • Mandy Simons (Philosophy)
  • Next offered in 2006-2007?
  • Apply formal logic to the modeling of natural
    language meaning.
  • Another thing that should not have been lost.

Useful Linguistics Courses at the University of
Pittsburgh
  • Phonetics and phonemics
  • The inventory of human speech sounds
  • Phonology
  • Patterns of sounds and syllable structure
  • Morphology
  • Prefixes, suffixes, and other processes for
    making words out of smaller pieces
  • Morphosyntax
  • Morphology that affects syntax, e.g., passive and
    causative affixes
  • Syntactic Theory
  • The course taught at Pitt is not the most
    relevant kind of syntactic theory for LT, but it
    will give you insight into what languages have in