How linguistics can help you - PowerPoint PPT Presentation


1
How linguistics can help you
  • Lori Levin
  • LTI Immigration Course 2005

2
Outline
  • What is linguistics?
  • The influence of linguistics on language
    technologies
  • The long rule of rationalism
  • The triumph of empiricism
  • What was forgotten
  • The use of linguistics in Language Technologies
  • Outside of LTI
  • Inside LTI
  • Linguistics courses

3
Linguistics
  • Linguistics is a:
  • Cognitive Science
  • Social Science
  • Area of the Humanities
  • Neuroscience
  • Computer Science
  • Primarily about the human mind and human
    communication behavior.

4
Linguistics as a Cognitive Science
  • Knowledge of language is not conscious knowledge.
  • Like knowing how to walk without knowing which
    neurons and muscles are involved.
  • Sub-areas of linguistic knowledge:
  • Grammar of sentences (syntax), grammar of words
    (morphology), sentence meaning (semantics), word
    meaning (lexical semantics), language use in
    context (pragmatics and discourse analysis).

5
Linguistics as a Cognitive Science
  • Do human languages differ from each other in
    random ways, or are there common, universal
    properties?
  • How are human languages different from
    mathematical languages, logical languages,
    programming languages, and animal communication
    systems?

6
Linguistics as a Cognitive Science
  • First language acquisition: How do human babies
    learn something so complex so quickly with such
    imperfect input?
  • Second language acquisition: How do adults learn
    a second language, and why are they so bad at
    something that babies are so good at?
  • How can you teach something that is not conscious
    knowledge for experts (native speakers)?
  • Do adults learn languages better with immediate
    or delayed feedback on errors?
  • Does explanation of foreign language grammar help
    adults learn the foreign language?

7
Linguistics as a Cognitive Science
  • Psycholinguistics: How is human language
    processed in the brain and how is human language
    produced?
  • Why do you have to do a double take to understand
    this sentence (a garden path sentence)?
  • The cotton shirts are made of is soft.
  • Neuro-linguistics: What areas of the brain are
    activated during language processing? How do
    brain injuries affect language production and
    comprehension?

8
Linguistics as a Social Science
  • Historical Linguistics: How do human languages
    change over time?
  • Drift
  • Corn used to mean all small grains, e.g., pepper
    corn, barley corn.
  • What happened to the word britches?
  • English f is systematically related to French
    p. What was the common sound that they both
    derived from in some ancient language?
  • Foot/pied
  • Father/père
  • Contact
  • Languages in proximity to each other will
    influence each other's vocabulary and grammar,
    even if the languages were previously unrelated.

9
Linguistics as a Social Science
  • Sociolinguistics
  • How do human languages vary with social factors
    such as:
  • Geography
  • Age
  • Ethnic group
  • Sex
  • Race
  • Economic class
  • Social setting
  • In situations of language contact, what are the
    factors that determine whether there will be
    bilingualism or language loss?

10
Documentary Linguistics
  • Computer-based tools for describing languages
  • Annotating corpora
  • Databases for annotated corpora
  • Managing lexicons
  • Has become urgent because of the increasing rate
    of language death.

11
Computational Linguistics
  • Formalisms for describing human languages
  • Based on formal language theory
  • Enable precise, testable formulations of
    linguistic rules
  • Using linguistic rules for language processing.

12
Language Technologies
  • Computer-based tools for processing human
    languages
  • Speech recognition
  • Speech synthesis
  • Machine translation
  • Human-machine dialogue systems
  • Information Retrieval, Extraction, and
    Summarization
  • Computer-assisted language learning

13
History
  • Before Language Technologies there was
    Computational Linguistics
  • Cognitive Science
  • Artificial Intelligence: How can a computer
    understand a story as well as a human does?
  • Psycholinguistics: How is language processed by
    the human brain?
  • Formal Linguistic Theory: Use formal language
    theory to model human language
  • All of these topics would be covered at
    Computational Linguistics conferences
  • Models of human linguistic knowledge and human
    language processing were thought to be relevant
    to computer-based processing of language
  • All computational linguists knew a lot of
    linguistics

14
History
  • Computers got faster
  • Toy systems and papers on theories of language
    gave way to implementations that worked on a real
    scale.
  • Computational linguistics became more of a
    computer science than a cognitive science.

15
History: Two Philosophical Approaches
  • Rationalism
  • The source of knowledge is reason.
  • Knowledge comes from the mind.
  • Empiricism
  • The source of knowledge is experience.
  • Knowledge comes from data.

16
History: Two Philosophical Approaches
  • Rationalism
  • The source of knowledge is reason.
  • Goal: To discover the mental representation of
    linguistic rules
  • The primary data for studying language is the
    grammaticality judgment
  • Your head contains a formal grammar that accepts
    strings of words that are in your language and
    rejects strings of words that are not in your
    language.

17
History: Rationalism
  • Grammaticality judgments
  • The car needs to be washed. YES
  • The car needs washed. NO
  • Car the washed needs. NO
  • This music gives a headache to me. NO
  • This music gives me a headache. YES

18
History: Rationalism
  • Mental models of language are real
  • Linguistics should strive to model human
    grammaticality judgment
  • Corpora are artifacts
  • Pieces of the mental model get mixed up with
    speech and writing errors, constraints of time
    and space, etc.

19
History
  • Empiricism
  • The source of knowledge is experience
  • Goal: To discover how meaning is negotiated in
    context
  • The primary data is what is attested in a corpus

20
History: Empiricism
  • Sentences are found in corpora with different
    probabilities
  • This music would give a headache to anyone with
    refined sensibilities.
  • This music gives a headache to me.
  • This music gives anyone with refined
    sensibilities a headache.
  • This music gives me a headache.
  • The car needs washed.

21
History: Rationalism and Empiricism
  • From 1957 until very close to the present time,
    linguistics was vehemently rationalist.
  • Actually, some empiricists survived through the
    20th century, but they typically didn't formalize
    their theories, so they didn't have a way to
    influence computational linguistics.
  • Rationalists and empiricists were intellectual
    enemies.
  • Computational Linguistics was mostly rationalist
    until the mid 1980s.

22
History: Late 1980s and Early 1990s
  • Rationalist vs Empiricist debates
  • Empiricism triumphs in speech recognition

23
History: The Rise of Empiricism, 1990s
  • Speech recognition
  • Statistical Machine Translation
  • Information Retrieval

24
  • I amar prestar aen. The world is changed. Han
    mathon ne nen. I feel it in the waters. Han
    mathon ne chae. I feel it in the earth. A han
    noston ned 'wilith. I smell it in the air. Much
    that once was is lost. For none now live who
    remember it.
  • And some things that should not have been
    forgotten were lost.
  • Lord of the Rings, movie script
25
What was forgotten?
  • Parsers
  • Treebanks
  • Lexicons
  • Morphological analyzers

26
  (TOP (S
    (NP (DT The) (NNP National) (NNP Park) (NNP Service))
    (VP (VBZ hopes) (PP (IN by) (NP (CD 1966)))
      (S (NP (-NONE- )) (AUX (TO to))
        (VP (VB have)
          (NP (CD 30,000) (NNS campsites))
          (ADJP (JJ available)
            (PP (IN for)
              (NP (NP (NP (CD 100,000) (NNS campers))
                      (NP (DT a) (NN day))).

27
Things that were lost that should not have been
forgotten
  • Can you understand the Treebank?
  • Can you evaluate the Treebank?
  • Can you build a Treebank?
  • If you train a statistical parser on the Treebank
    and it doesn't work well, can you attribute blame
    to your training or to mistakes in the Treebank?
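
A first step toward understanding or evaluating a treebank is being able to read its bracketed notation programmatically. The following is a minimal sketch, not any official Treebank tool: the function names `parse_bracketed` and `leaves` are made up for illustration, and `FRAGMENT` is a shortened, balanced piece of the tree shown on the earlier slide.

```python
# Minimal reader for Penn-Treebank-style bracketed trees (illustrative sketch).
# A tree is represented as (label, children), where each child is either
# another (label, children) tuple or a plain word string.

def parse_bracketed(text):
    """Parse one bracketed tree; raise ValueError if brackets are unbalanced."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def peek():
        if pos >= len(tokens):
            raise ValueError("unbalanced brackets: input ended early")
        return tokens[pos]

    def read_node():
        nonlocal pos
        if peek() != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = peek()
        pos += 1
        children = []
        while peek() != ")":
            if peek() == "(":
                children.append(read_node())
            else:
                children.append(peek())
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("extra material after the tree")
    return tree


def leaves(tree):
    """Return the words of the tree in left-to-right order."""
    _, children = tree
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(leaves(child))
        else:
            words.append(child)
    return words


# Shortened, balanced fragment of the example tree (an assumption for the demo).
FRAGMENT = "(S (NP (DT The) (NNP National) (NNP Park) (NNP Service)) (VP (VBZ hopes)))"

if __name__ == "__main__":
    tree = parse_bracketed(FRAGMENT)
    print(tree[0])       # root label
    print(leaves(tree))  # words in order
```

A reader like this also gives a cheap sanity check on annotation: a tree with unbalanced brackets raises an error instead of silently feeding a parser bad training data.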

28
Language Technologies Today
  • Influenced by both rationalism and empiricism.
  • Rule-based systems
  • Rationalist linguistics
  • Statistical systems
  • Corpus linguistics
  • Statistics and machine learning

29
Jamie Callan
30
Full-Text Indexing: German Decompounding
  • Compounds that behave like English phrases
  • computerviren (computer viruses) vs. computer
    and viren
  • sonnenenergie (solar energy) vs. sonnen and
    energie
  • Compounds that probably don't behave like English
    phrases
  • gemüseexporteure (vegetable exporters)
  • fussballeuropameisterschaft (European football
    cup)
  • vs. fuss, ball, europa, meisterschaft
  • vs. fussball, europa, meisterschaft
  • vs. europameisterschaft im fußball
  • Slightly irregular compounds
  • schönheitskönigin (beauty queen) vs. schönheit
    and königin
  • Note the introduction of s between the parts of
    the compound
  • erdbeben (earthquake) vs. erde and beben
  • Note that the final e in erde is elided

(Chen, 2002)
Jamie Callan
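
The slightly irregular cases above can be handled by a dictionary-based splitter that also tries a linking s and a restored final e. This is a toy sketch under stated assumptions, not the method of Chen (2002): `split_compound`, `LEXICON`, and the `min_len` threshold are hypothetical, and the lexicon holds only words from the slide.

```python
# Toy dictionary-based German decompounder (illustrative sketch).
# Handles two irregularities from the slide: a linking "s" between parts
# and a final "e" that is dropped from the first part.

LEXICON = {"schönheit", "königin", "erde", "beben", "computer", "viren"}


def split_compound(word, lexicon=LEXICON, min_len=3):
    """Return a list of lexicon entries that make up `word`, or None."""
    word = word.lower()
    if word in lexicon:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        head, rest = word[:i], word[i:]
        candidates = [head]
        if head.endswith("s"):
            candidates.append(head[:-1])  # undo linking s: schönheits -> schönheit
        candidates.append(head + "e")     # restore elided e: erd -> erde
        for cand in candidates:
            if cand in lexicon:
                tail = split_compound(rest, lexicon, min_len)
                if tail is not None:
                    return [cand] + tail
    return None


if __name__ == "__main__":
    for w in ["computerviren", "schönheitskönigin", "erdbeben"]:
        print(w, "->", split_compound(w))
```

Note that the splitter returns the first analysis it finds and always splits as far as the lexicon allows. Whether an index should store the parts, the whole compound, or both is the question the slide raises, and this sketch does not answer it.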
31
Example of using linguistics with speech
recognition
  • Integrate parsing and speech recognition so that
    only parsable hypotheses are considered.
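
One simple way to approximate this idea is to filter a recognizer's n-best list with a grammar and keep the best-scoring hypothesis that parses. The sketch below is a simplification (true integration would prune hypotheses during the search, not afterwards). The toy grammar, the n-best list with its scores, and the names `parsable` and `best_parsable` are all assumptions made for illustration.

```python
# Filter speech-recognition hypotheses with a CYK recognizer (illustrative sketch).
# The grammar is in Chomsky normal form: LEXICAL maps words to categories,
# BINARY maps a pair of child categories to the possible parent categories.

LEXICAL = {"the": {"Det"}, "a": {"Det"}, "car": {"N"}, "wash": {"N"}, "needs": {"V"}}
BINARY = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}


def parsable(words, lexical=LEXICAL, binary=BINARY, start="S"):
    """Return True if the grammar derives `words` from the start symbol."""
    n = len(words)
    if n == 0:
        return False
    # chart[i][j] holds the categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left in chart[i][k]:
                    for right in chart[k][j]:
                        chart[i][j] |= binary.get((left, right), set())
    return start in chart[0][n]


def best_parsable(nbest):
    """Return the highest-scoring hypothesis text that parses, or None."""
    for text, _score in sorted(nbest, key=lambda h: h[1], reverse=True):
        if parsable(text.split()):
            return text
    return None


# Hypothetical n-best list: (hypothesis, log score); higher is better.
NBEST = [
    ("the car needs wash a", -10.2),
    ("the car needs a wash", -10.5),
    ("car the needs a wash", -11.0),
]

if __name__ == "__main__":
    print(best_parsable(NBEST))
```

Here the top-scoring hypothesis is rejected because the grammar cannot parse it, so the second one is chosen.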

32
Answer Verification
  • Parse passages to create a dependency tree among
    words
  • Attempt to unify logical forms of question and
    answer text

(M. Pasca and S. Harabagiu, SIGIR 2001)
Jamie Callan
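
To make "unify logical forms" concrete, here is a toy sketch of unification over flat predicates. It is not the system of Pasca and Harabagiu: the predicate tuples, the `?`-prefixed variable convention, and the function names are assumptions chosen only to illustrate the idea that a question's open slot gets bound by a matching answer.

```python
# Toy unification of flat logical forms (illustrative sketch).
# A logical form is a tuple such as ("invented", "?who", "telephone");
# strings that start with "?" are variables, everything else is a constant.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def unify(question, answer, bindings=None):
    """Return variable bindings that make the two forms equal, or None."""
    bindings = dict(bindings or {})
    if len(question) != len(answer):
        return None
    for q, a in zip(question, answer):
        q = bindings.get(q, q)  # use an existing binding if there is one
        a = bindings.get(a, a)
        if q == a:
            continue
        if is_var(q):
            bindings[q] = a
        elif is_var(a):
            bindings[a] = q
        else:
            return None  # two different constants cannot be unified
    return bindings


if __name__ == "__main__":
    question = ("invented", "?who", "telephone")
    print(unify(question, ("invented", "bell", "telephone")))
    print(unify(question, ("invented", "edison", "phonograph")))
```

A candidate answer is accepted only if unification succeeds; the binding for the question variable is then the answer string.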
33
For More Information
  • S. Harabagiu, D. Moldovan, C. Clark, M. Bowden,
    J. Williams, and J. Bensley. Answer mining by
    combining extraction techniques with abductive
    reasoning. In Proceedings of the Twelfth Text
    Retrieval Conference (TREC 2003). 2004.
  • M. Pasca and S. Harabagiu. High performance
    question answering. In SIGIR 2001 conference
    proceedings.

34
Recent Work on Combining Linguistic Structure
with Statistical MT
  • David Chiang, Best Paper Award, Association for
    Computational Linguistics, 2005
  • Johns Hopkins Workshop, 2005, Statistical Machine
    Translation by Parsing, led by Dan Melamed.

35
Motivation for SMT by Parsing
  • State-of-the-art SMT often produces word salad.
  • Bolting trees onto FST-based (IBM-style) SMT
    doesn't seem to help
  • SMT is very compute-intensive (slow).
  • SMT systems are getting very complicated, making
    them hard to study and improve.

Dan Melamed
36
The Engineering Motivation for Syntax
  • Need fewer parameters to express ordering
    preferences.
  • E.g., Arabic adjectives always follow their
    nouns.
  • Fewer parameters are easier to learn, given
    limited training data and/or computing resources.
  • Less training data needed to reach a given level
    of accuracy.
  • Better accuracy on fixed amount of data.
  • All parameters interact during learning, so
    better estimates for syntactic parameters lead to
    better estimates for other types.

Dan Melamed
37
But isn't syntax too expensive?
  • Myth: Translation models involving syntax are
    computationally too expensive to train.
  • Fact: Finite-state models are more expensive!
    (more parameters)
  • Of course, bolting syntax on top of a
    finite-state model incurs the combined cost of
    both. (So we avoided that.)
  • In machine learning with structured inference
    (most of NLP), better models should train faster.

Dan Melamed
38
LTI projects that use linguistic knowledge
  • Javelin: Question answering
  • Nyberg, Mitamura
  • Radar: Information extraction from email
  • Nyberg, Frederking, Levin
  • Let's Go: Human-machine bus information system
  • Eskenazi, Black
  • Writing Tutor: Detect errors made by English
    learners
  • Mitamura

39
LTI projects that use linguistic knowledge
  • AVENUE: Automatically learning machine
    translation rules for minor-major language pairs
  • Carbonell, Lavie, Levin, Frederking, Brown
  • SAMPLE: Reading assistant for English as a second
    language.

40
LTI faculty with training in linguistics
  • Lori Levin, Linguistics
  • Teruko Mitamura, Linguistics
  • Eric Nyberg, Computational Linguistics
  • Alan Black, Cognitive Science

41
Linguistics Courses at LTI
  • Grammars and Lexicons, 11-721
  • Levin and Mitamura
  • Fall 2005
  • Goals: Skills
  • Write grammar rules that can be used for parsing
  • Work on multilingual applications
  • Understand syntactic annotation of data
  • Goals: Knowledge
  • Linguistic categories
  • Noun, verb, noun phrase, verb phrase
  • Linguistic Structures
  • Main clauses, embedded clauses, relative clauses
  • Linguistic Variation
  • How to recognize the categories and structures
    even though they look different
  • Grammar writing
  • Writing grammar rules for a parser (English and
    Japanese)

42
Linguistics Courses at LTI
  • Grammar Formalisms, 11-722
  • Levin, Lavie, Black
  • Spring 2006
  • Goal
  • How to implement basic linguistic structures and
    semantics in several formalisms
  • Lexical Functional Grammar, Head-Driven Phrase
    Structure Grammar, Categorial Grammar, Tree
    Adjoining Grammar
  • How parsers can be implemented for these
    formalisms

43
Linguistics Courses at LTI
  • Formal Semantics, 11-723
  • Mandy Simons (Philosophy)
  • Next offered in 2006-2007?
  • Apply formal logic to the modeling of natural
    language meaning.
  • Another thing that should not have been lost.

44
Useful Linguistics Courses at the University of
Pittsburgh
  • Phonetics and phonemics
  • The inventory of human speech sounds
  • Phonology
  • Patterns of sounds and syllable structure
  • Morphology
  • Prefixes, suffixes, and other processes for
    making words out of smaller pieces
  • Morphosyntax
  • Morphology that affects syntax, e.g., passive and
    causative affixes
  • Syntactic Theory
  • The course taught at Pitt is not the most
    relevant kind of syntactic theory for LT, but it
    will give you insight into what languages have in
    common.