CS4705 - PowerPoint PPT Presentation

About This Presentation
Title:

CS4705

Description:

CS4705 Natural Language Processing – PowerPoint PPT presentation

Provided by: JuliaHir1
Tags: cs4705 | speech


Transcript and Presenter's Notes

Title: CS4705


1
  • CS4705
  • Natural Language Processing

2
What is Natural Language Processing?
  • The study of human languages and how they can be
    represented computationally and analyzed,
    recognized, and generated algorithmically
  • The cat is on the mat. --> on (mat, cat)
  • on (mat, cat) --> The cat is on the mat.
  • Studying NLP involves studying natural language,
    formal representations, and algorithms for their
    manipulation
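
The two mappings on this slide (sentence to representation, and back) can be sketched in a few lines. This is a minimal illustration under strong assumptions, not a method from the course: the regular expression, the function names `analyze` and `generate`, and the restriction to sentences of the form "The X is P the Y." are all hypothetical.

```python
import re

# Hypothetical toy pattern: only handles "The X is <preposition> the Y."
PATTERN = re.compile(r"^The (\w+) is (\w+) the (\w+)\.$")

def analyze(sentence):
    """Map a sentence to a (predicate, landmark, figure) tuple, or None."""
    m = PATTERN.match(sentence)
    if m is None:
        return None
    figure, prep, landmark = m.groups()
    return (prep, landmark, figure)

def generate(relation):
    """Map a (predicate, landmark, figure) tuple back to a sentence."""
    prep, landmark, figure = relation
    return f"The {figure} is {prep} the {landmark}."

rel = analyze("The cat is on the mat.")
print(rel)            # ('on', 'mat', 'cat')
print(generate(rel))  # The cat is on the mat.
```

Anything outside the toy pattern returns `None`, which is a reminder of how much of the language a single hand-written rule leaves uncovered.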

3
What must we know about human language to do NLP?
  • Morphology: words and their composition
  • cat, cats, dogs
  • child, children
  • undo, union
  • Phonetics and Phonology: speech sounds, their
    production, and the rule systems that govern
    their use
  • tap, butter
  • nice white rice; height/hot; kite/cot;
    night/not...
  • city hall, parking lot, city hall parking lot
  • The cat is on the mat. The cat is on the mat?
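
The morphology examples above (cat/cats, child/children, undo/union) can be illustrated with a deliberately naive affix stripper. This is a sketch under stated assumptions: the exception table, the `+PL` and `un+` labels, and the function name `naive_analyze` are hypothetical and stand in for a real morphological analyzer.

```python
# Hypothetical toy exception table, for illustration only.
IRREGULAR_PLURALS = {"children": "child"}

def naive_analyze(word):
    """Split a word into morphemes by blind affix stripping."""
    if word in IRREGULAR_PLURALS:
        return [IRREGULAR_PLURALS[word], "+PL"]
    if word.startswith("un") and len(word) > 3:
        # Blindly treats every initial "un" as the negative prefix.
        return ["un+"] + naive_analyze(word[2:])
    if word.endswith("s") and len(word) > 3:
        return [word[:-1], "+PL"]
    return [word]

for w in ["cat", "cats", "dogs", "child", "children", "undo", "union"]:
    print(w, naive_analyze(w))
```

The rule that correctly splits "undo" into "un+" and "do" also splits "union" into "un+" and "ion", which is wrong. Spelling alone does not determine morphological structure, so some lexical knowledge is needed.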

4
  • Syntax: the structuring of words into larger
    phrases
  • John hit Bill
  • Bill was hit by John (passive)
  • Bill, John hit (preposing)
  • Who John hit was Bill (wh-cleft)
  • Semantics: the (truth-functional) meaning of
    words and phrases
  • gun(x) & holster(y) & in(x,y)
  • fake (gun (x)) (compositional semantics)
  • The king of France is bald (presupposition
    violation)
  • bass fishing, bass playing (word sense
    disambiguation)
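
The "bass fishing" versus "bass playing" contrast can be sketched as a cue-word overlap count. This is a simplified overlap heuristic chosen for illustration, not necessarily the approach taught in the course; the sense labels, cue-word sets, and the function name `disambiguate` are hypothetical.

```python
# Hypothetical sense inventory: each sense of "bass" lists a few cue words.
SENSES = {
    "bass/fish": {"fishing", "lake", "caught", "river", "boat"},
    "bass/instrument": {"playing", "guitar", "band", "music", "player"},
}

def disambiguate(context):
    """Pick the sense whose cue words overlap most with the context words."""
    words = set(context.lower().split())
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    # Return None when no cue word appears at all.
    return best if scores[best] > 0 else None

print(disambiguate("bass fishing"))  # bass/fish
print(disambiguate("bass playing"))  # bass/instrument
```

With no cue word in the context (for example, the bare word "bass"), the sketch returns `None` rather than guessing.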

5
  • Pragmatics and Discourse: the meaning of words
    and phrases in context
  • George got married and had a baby.
  • George had a baby and got married.
  • Some people left early.
  • Prosodic Variation
  • German teachers
  • Bill doesn't drink because he's unhappy.
  • John only introduced Mary to Sue.
  • John called Bill a Republican and then he
    insulted him.
  • John likes his mother, and so does Bill.

6
Bureaucracy
  • Instructor: Julia Hirschberg
  • (julia_at_cs.columbia.edu)
  • Office and hours: CEPSR 705, TBA
  • Teaching Assistant: Sameer Maskey
  • (smaskey_at_cs.columbia.edu)
  • Office and hours: CEPSR 720, TBA
  • Syllabus available at
    http://www1.cs.columbia.edu/julia/cs4705/syllabus.html

7
  • Text: Daniel Jurafsky and James H. Martin, Speech
    and Language Processing, Prentice-Hall, 2000
    (available at Morningside Bookstore)
  • Note: errata available on website; please check
    before reading each chapter
  • Assignments: 3 homework assignments, midterm,
    final (1 extra homework for graduate students)
  • Evaluation: 40% homework, 40% exams, 20% class
    participation

8
Academic Integrity
  • Copying or paraphrasing someone's work (code
    included), or permitting your own work to be
    copied or paraphrased, even if only in part, is
    forbidden, and will result in an automatic grade
    of 0 for the entire assignment or exam in which
    the copying or paraphrasing was done. Your grade
    should reflect your own work. If you are going to
    have trouble completing an assignment, talk to
    the instructor or TA in advance of the due date,
    please. Everyone: read/write protect your
    homework files at all times.

9
NLP Applications
  • Speech Synthesis, Speech Recognition, IVR Systems
    (TOOT: more or less succeeds)
  • Information Retrieval (SCANMail demo)
  • Information Extraction
  • Question Answering (AQUA)
  • Machine Translation (SYSTRAN)
  • Summarization (NewsBlaster)
  • Automated Psychotherapy (Eliza)

10
For Next Class
  • Read Chapters 1-2
  • For fun: experiment with Eliza
  • Does she pass the Turing Test?
  • What kind of input defeats her?
  • How could you improve her ability to fool people
    into thinking she is human? Using FSAs alone?
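
For experimenting with these questions, an Eliza-style responder can be sketched as an ordered list of regular-expression rules. The rules, the reply templates, and the function name `respond` are hypothetical and are not Weizenbaum's original script; the patterns use no backreferences, so the matching itself stays within what finite-state machinery can do.

```python
import re

# Hypothetical rule set: (pattern, reply template) pairs, tried in order.
RULES = [
    (re.compile(r"\bI am (.+)", re.IGNORECASE),
     "Why do you say you are {0}?"),
    (re.compile(r"\bI feel (.+)", re.IGNORECASE),
     "How long have you felt {0}?"),
    (re.compile(r"\bmy (mother|father)\b", re.IGNORECASE),
     "Tell me more about your {0}."),
]
DEFAULT = "Please go on."

def respond(utterance):
    """Return the reply for the first matching rule, else a stock phrase."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        m = pattern.search(text)
        if m:
            return template.format(*m.groups())
    return DEFAULT

print(respond("I am unhappy."))         # Why do you say you are unhappy?
print(respond("The weather is nice."))  # Please go on.
```

An input such as "I am angry at you" produces "Why do you say you are angry at you?", because the sketch echoes the captured text without swapping pronouns. That is one kind of input that defeats a pure pattern matcher.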