Title: 11-719 Computational Models of Discourse Analysis
1 11-719 Computational Models of Discourse Analysis
- Carolyn Penstein Rosé
- Language Technologies Institute
- and Human-Computer Interaction Institute
2 New York Times Article: What strikes you about the
agent's style of speaking?
- June 24, 2010
- Computers Learn to Listen, and Some Talk Back
- By STEVE LOHR and JOHN MARKOFF
- "Hi, thanks for coming," the medical assistant
says, greeting a mother with her 5-year-old son.
"Are you here for your child or yourself?"
- "The boy," the mother replies. "He has diarrhea."
- "Oh no, sorry to hear that," she says, looking
down at the boy.
- The assistant asks the mother about other
symptoms, including fever ("slight") and
abdominal pain ("He hasn't been complaining").
- She turns again to the boy. "Has your tummy been
hurting?" "Yes," he replies.
- After a few more questions, the assistant
declares herself "not that concerned at this
point." She schedules an appointment with a
doctor in a couple of days. The mother leads her
son from the room, holding his hand. But he keeps
looking back at the assistant, fascinated, as if
reluctant to leave.
- Maybe that is because the assistant is the
disembodied likeness of a woman's face on a
computer screen, a no-frills avatar. Her words
of sympathy are jerky, flat and mechanical. But
she has the right stuff: the ability to
understand speech, recognize pediatric conditions
and reason according to simple rules to make an
initial diagnosis of a childhood ailment and its
seriousness. And to win the trust of a little
boy.
3 Not all so rosy
4 Are we missing something?
Sociolinguists and discourse analysts have been
studying social aspects of language since the
1920s and 1930s!
5 Ask yourself this: Where do I sound like I'm from?
Actually from California, but picked up some
accent from my dad from New York... Did you
notice the "a" in "Carolyn"? But not the back-open
"r". And if you heard me say "daughter"... But how
often do I say that in class?
Note that context is everything. The "a" in "sat"
doesn't have the same significance as the "a" in
"Carolyn".
6 What information are we throwing away or
ignoring that would allow us to distinguish
meaningful variation from meaningless variation?
7 What will you get out of this class?
- Learn to read the primary literature in
sociolinguistics, discourse analysis, and
pragmatics
- Get a more intimate familiarity with the
state of the art in language processing applied
to analysis of social media, especially
conversation and narrative
- Explore what insights these fields of linguistics
can contribute to language technologies
- Explore what language technologies might be able
to do to advance these fields of linguistics
- Get hands-on experience working on both
8 Please Introduce Yourself
- What experience do you have with discourse
analysis?
- What do you most want to get out of this class?
9 Review from my LTI Colloquium talk
10 Discourse and Identity
- Identity is reflected in the way we present
ourselves in conversational interactions
- Reflects who we are, how we think, and where we
belong
- Also reflects how we think of our audience
- Examples
- Regional dialect shows my identification with
where I am from, but also shows I am comfortable
letting you identify me that way
- Jargon and technical terms show my
identification with a work community, but also
show I expect you to be able to relate to that
part of my life
- Level of formality shows where we stand in
relation to one another
- Explicitness in reference shows whether I am
treating you like an insider or an outsider
11 Discourse and Identity
- Discourse is text above the clause level (Martin
& Rose, 2007)
- A Discourse is an ongoing conversation (a type)
- Socialization is the process of joining a
Discourse (Lave & Wenger, 1991; Sfard, 2010)
- We join Discourses that match our core identity
(de Fina, Schiffrin, & Bamberg, 2006)
- In moving from the periphery to the core of a
Discourse community, we sound more and more like
the community (Arguello et al., 2006)
- A discourse is one instance of it (a token)
- All discourses contain echoes of previous
discourses (Bakhtin, 1983)
Lakoff & Johnson, 1980
Lave & Wenger, 1991
12 Metaphors Structure our Experience
- We describe arguments using terms related to war
- Using a typical war script to structure a story
about an argument
- We orient towards arguments as though they were
wars
- Our conversational partner is our opponent
- We may feel that we won or lost
- We may feel wounded as a result
13 Discourses, Frames, and Metaphors
- Frame: A portion of a discourse belonging to a
distinct Discourse
- Metaphor: One linguistic device that can be used
to define a set of discourse practices that
constitute a frame
- Topic models: a technical approach that makes
sense for identifying frames within a discourse
- A discourse could be drawn from a mixture of
Discourses
- Within the same conversation, we may wear a
variety of hats
- E.g., the same discourse with a co-worker may
contain exchanges pertaining to our relationship
as colleagues and others to our relationship as
friends
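The idea that one discourse mixes several Discourses can be made concrete with a small sketch. This is not a topic model: it swaps in hand-built keyword lists as a stand-in, and the frame names, lexicons, and sample conversation are all invented for illustration. It only shows what "a mixture" could look like as numbers.

```python
from collections import Counter

# Hypothetical frame lexicons, invented for illustration only.
# A real topic model would learn word groupings from a corpus instead.
FRAME_LEXICONS = {
    "colleague": {"deadline", "meeting", "report", "budget"},
    "friend": {"weekend", "dinner", "movie", "kids"},
}


def tokenize(turn):
    """Lowercase a turn and strip surrounding punctuation from each word."""
    return [w.strip(".,!?;:\"'").lower() for w in turn.split()]


def frame_mixture(turns):
    """Return each frame's share of all lexicon hits in the conversation."""
    hits = Counter()
    for turn in turns:
        for word in tokenize(turn):
            for frame, lexicon in FRAME_LEXICONS.items():
                if word in lexicon:
                    hits[frame] += 1
    total = sum(hits.values())
    if total == 0:
        # No lexicon word occurred, so no frame gets any share.
        return {frame: 0.0 for frame in FRAME_LEXICONS}
    return {frame: hits[frame] / total for frame in FRAME_LEXICONS}


# Toy conversation (made up): two work turns, one social turn.
conversation = [
    "Did you finish the report before the meeting?",
    "Almost, the deadline is Friday.",
    "Are you free for dinner this weekend?",
]
mixture = frame_mixture(conversation)
print(mixture)
```

In this toy conversation the "colleague" lexicon accounts for three of the five hits and the "friend" lexicon for two, which is the "variety of hats" idea expressed as proportions.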
14 Now it's your turn
15 http://video.google.com/videoplay?docid=-6547777336881961043&hl=en
16 Discussion Questions
- What other stories/movies/genres does this remind
you of?
- What is the message being communicated about
Hummers?
- What is communicated about the company that makes
them?
- What is communicated about the assumed audience?
- What are other messages?
- E.g., are any political statements being made?
17 Semester Plan
- In each Unit
- Readings from Discourse Analysis and
Sociolinguistics
- Readings from Language Technologies
- Hands-on assignment
- Implementation and corpus-based experiment
- Competitive error analysis
- Student Presentations
- Unit 1: Theoretical Foundation
- Unit 2: Linguistic Structure
- Unit 3: Sentiment
- Unit 4: Identity and Personality
- Unit 5: Social Positioning
18 Grading: people who make a good-faith effort
always do well in my courses
- 15% for each of 5 Unit assignments
- First one is a discourse analysis
- Others are corpus-based experiments
- We provide the corpus
- You implement a feature extractor, test it, do an
error analysis, and present your well-motivated
idea and evaluation in class
- 10% for class participation
- Doing readings (will be posted to course Drupal)
- Posting to Drupal discussion by 10pm the night
before class
- Actively contributing to class discussions
- 15% for final critique of a technical paper
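The "do an error analysis" step in the unit assignments can be sketched roughly as follows. This is a minimal illustration and not a required procedure: the function names, the labels, and the four toy review snippets are invented, and the predictions are made up rather than produced by any real classifier.

```python
from collections import Counter


def confusion_counts(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return Counter(zip(gold, predicted))


def misclassified(texts, gold, predicted):
    """Return (text, gold, predicted) for every example the model got wrong."""
    return [(t, g, p) for t, g, p in zip(texts, gold, predicted) if g != p]


# Toy data, invented for illustration (not from any course corpus).
texts = ["Great phone", "Not great at all", "Terrible battery", "Not bad"]
gold = ["pos", "neg", "neg", "pos"]
predicted = ["pos", "pos", "neg", "neg"]

counts = confusion_counts(gold, predicted)
errors = misclassified(texts, gold, predicted)

for (g, p), n in sorted(counts.items()):
    print(f"gold={g} predicted={p}: {n}")
for text, g, p in errors:
    print(f"ERROR: {text!r} gold={g} predicted={p}")
```

Reading the misclassified examples side by side is the point of the exercise: here both errors involve negation, which would suggest a feature idea to motivate and test next.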
19 Corpora for experimentation
- Unit 2: MapTask data (Negotiation coding)
- Possibly other chat corpora with same coding as
well
- Unit 3: Product Reviews (Sentiment)
- Unit 4: Blog corpus (Age and Gender)
- Unit 5: AMI meeting corpus (Dialogue Acts)
- Other corpora
- Email discussion list (Social Support coding)
20 SIDE Workbench for Experimentation
- http://www.cs.cmu.edu/~cprose/SIDE.html
21 SIDE
22 SIDE
23 Two Options
- Create your own feature extractor plugins
- We will provide documented abstract classes that
you create specializations of
- Programmed in Java
- Elijah is the developer and can answer your
questions
- Use SIDE's feature creation functionality to
create novel functions
- Grades will be based on
- The extent to which your features are theory
motivated or data motivated
- The depth of your error analysis
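The first option, specializing a provided abstract class, follows a common plugin pattern. The sketch below shows that pattern in Python for brevity; SIDE's real plugins are written in Java, and the class names, method name, and pronoun lists here are hypothetical and are not SIDE's actual API.

```python
from abc import ABC, abstractmethod


class FeatureExtractor(ABC):
    """Hypothetical base class standing in for a provided abstract class."""

    @abstractmethod
    def extract(self, text):
        """Map one document to a dict of feature name -> numeric value."""


class PronounFeatures(FeatureExtractor):
    """Example specialization: counts first- and second-person pronouns.

    The pronoun lists are assumptions chosen for illustration.
    """

    FIRST = {"i", "me", "my", "we", "us", "our"}
    SECOND = {"you", "your"}

    def extract(self, text):
        # Lowercase and strip surrounding punctuation before matching.
        tokens = [w.strip(".,!?;:\"'").lower() for w in text.split()]
        return {
            "first_person": sum(t in self.FIRST for t in tokens),
            "second_person": sum(t in self.SECOND for t in tokens),
        }


features = PronounFeatures().extract("I think you should send me your draft.")
print(features)
```

A feature like this would count as theory motivated if it is tied to a claim from the readings, for example that how speakers refer to themselves and their audience reflects how they position one another.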
24 SIDE
25 SIDE
26 SIDE
27 Setting up the course Drupal
- If you are not registered, please do so
- If you don't have an Andrew account, make sure I
have your email address
- We will manage the course through Drupal
- All materials, including PDFs for required
readings, will be posted to Drupal
- Slides for all lectures will be posted to Drupal
after class
- Discussion threads in preparation for each
lecture will be found on Drupal
28 Assignment 1 (not due until Jan 26)
- Transcribe a scene from a favorite movie, play, or
TV show
- As a shortcut, you can find a script online
- Excerpt should be no more than one page of text
- Select one of the methodologies we are discussing
in Unit 1 (e.g., from Gee, Martin & Rose, or
Levinson)
- Do a qualitative analysis of the script and write
it up
- Use readings from Unit 1 as a collection of
models to choose from
- Due on Week 3 lecture 2
- Turn in transcript, raw analysis (can be
annotations added to the transcript), and write-up
(your interpretation of the analysis)
- Prepare a PowerPoint presentation for class (no
more than 5 minutes of material)
29 For next time
- You will receive login information for Drupal
- http://kanagawa.lti.cs.cmu.edu/11719/
- Read excerpts from James Gee's book (linked to
syllabus entry for Wednesday's lecture)
- Post to Drupal (in response to discussion
question posted for Week 1 Lecture 2)
30 Questions?
- Carolyn Penstein Rosé
- http://www.cs.cmu.edu/~cprose
- cprose_at_cs.cmu.edu
- Gates-Hillman Center 5415