Title: Automatic Identification of Discourse Moves in Scientific Article Introductions
1. Automatic Identification of Discourse Moves in Scientific Article Introductions
Nick Pendar and Elena Cotos, Iowa State University
The 3rd Workshop on Innovative Use of NLP for Building Educational Applications, June 19, 2008
2. Outline
- Background and motivation
- Discourse move identification
- Data and annotation scheme
- Feature selection
- Sentence representation
- Classifier
- Evaluation
- Inter-annotator agreement
- Further work
3. Automated evaluation: Background
- Automated essay scoring (AES) in performance-based and high-stakes standardized tests (e.g., ACT, GMAT, TOEFL, etc.)
- Automated error detection in L2 output (Burstein and Chodorow, 1999; Chodorow et al., 2007; Han et al., 2006; Leacock and Chodorow, 2003)
- Assessment of various constructs, e.g., topical content, grammar, style, mechanics, syntactic complexity, and deviance or plagiarism (Burstein, 2003; Elliott, 2003; Landauer et al., 2003; Mitchell et al., 2002; Page, 2003; Rudner and Liang, 2002)
- Text organization limited to recognizing the five-paragraph essay format, thesis, and topic sentences
- AntMover (Anthony and Lashkia, 2003)
4. Automated evaluation: CALI Motivation
- Wide range of possibilities for high-quality evaluation and feedback (Criterion; Burstein, Chodorow, and Leacock, 2004)
- Potential in formative assessment, but the effects of intelligent formative feedback are not fully investigated
- Warschauer and Ware (2006) call for the development of a classroom research agenda that would help evaluate and guide the application of AES in writing pedagogy
- "the potential of automated essay evaluation for improving student writing is an empirical question, and virtually no peer-reviewed research has yet been published" (Hyland and Hyland, 2006, p. 109)
5. Automated evaluation: EAP Motivation
- EAP pedagogical approaches (Cortes, 2006; Levis and Levis-Muller, 2003; Vann and Myers, 2001) fail to provide NNSs with sufficient academic writing practice and remedial guidance
- Problem of disciplinarity
- An NLP-based academic discourse evaluation application could address this drawback
- Such an application has not yet been developed
6. Automated evaluation: Research Motivation
- Long-term research goals
  - Design and implementation of IADE (Intelligent Academic Discourse Evaluator)
  - Analysis of IADE's effectiveness for formative assessment purposes
7. IADE
- Evaluates students' research article introductions in terms of moves/steps (Swales, 1990, 2004)
- Draws from
  - SLA models: interactionist views (Carroll, 1999; Gass, 1997; Long, 1996; Long and Robinson, 1998; Mackey, Gass, and McDonough, 2000; Swain, 1993) and Systemic Functional Linguistics (Martin, 1992; Halliday, 1985)
  - Skill Acquisition Theory of learning (DeKeyser, 2007)
- Is informed by empirical research on the provision of feedback
- Is informed by Evidence-Centered Design principles (Mislevy et al., 2006)
8. Discourse Move Identification
- Approached as a classification problem (similar to Burstein et al., 2003): given a sentence and a finite set of moves and steps, what move/step does the sentence signify?
- ISUAW corpus: 1,623 articles; 1,322,089 words; average article length 814.09 words
- Stratified sampling of 401 introduction sections representative of 20 academic disciplines
- Sub-corpus: 267,029 words; average length 665.91 words; 11,149 sentences
- Manual annotation
9. Discourse Move Identification
- Annotation scheme (Swales, 1990; Swales, 2004)
10. Discourse Move Identification
- Multiple layers of annotation for cases where the same sentence signified more than one move or more than one step
11. Feature Selection
- Features that reliably indicate a move/step
- Text-categorization approach (see Sebastiani, 2002)
- Each sentence treated as a data item to be classified and represented as an n-dimensional vector in Euclidean space
- The task of the learning algorithm is to find a function f : S → M that maps the sentences in the corpus S to the classes in M = {m1, m2, m3}
- Identification of moves, not yet steps
12. Feature Selection
- Extraction of word unigrams, bigrams, and trigrams from the annotated corpus
- Preprocessing (see the sketch below)
  - All tokens stemmed using the NLTK port of the Porter stemmer (Porter, 1980)
  - All numbers in the texts replaced by the string _number_
  - The tokens inside each bigram and trigram alphabetized
  - All n-grams with a frequency of less than five excluded
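A minimal sketch of this preprocessing pipeline in Python; the regex tokenizer and the in-memory corpus format are assumptions, since the slides specify only the four steps above.

    import re
    from collections import Counter
    from nltk.stem.porter import PorterStemmer

    stemmer = PorterStemmer()

    def preprocess(sentence):
        """Tokenize, replace numbers with _number_, and stem every token."""
        tokens = re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?", sentence)
        return ["_number_" if t[0].isdigit() else stemmer.stem(t.lower())
                for t in tokens]

    def ngrams(tokens, n):
        """Yield n-grams; tokens inside bigrams and trigrams are alphabetized."""
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            yield tuple(gram) if n == 1 else tuple(sorted(gram))

    def candidate_features(sentences, min_freq=5):
        """Count uni-, bi-, and trigrams; drop those with frequency < min_freq."""
        counts = Counter()
        for sent in sentences:
            toks = preprocess(sent)
            for n in (1, 2, 3):
                counts.update(ngrams(toks, n))
        return {g: c for g, c in counts.items() if c >= min_freq}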
13. Feature Selection
- Odds ratio used as the feature selection criterion
- Conditional probabilities calculated as maximum likelihood estimates
- N-grams with the maximum odds ratios selected as features (see the sketch below)
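The formula itself is not reproduced in this text, so the sketch below assumes the standard text-categorization odds ratio (cf. Sebastiani, 2002): OR(t, m) = [P(t|m) (1 - P(t|~m))] / [(1 - P(t|m)) P(t|~m)], with the conditional probabilities estimated by maximum likelihood over sentences. The smoothing constant eps and the input format are also assumptions, added for illustration.

    def odds_ratio(count_in_move, n_move, count_in_rest, n_rest, eps=0.5):
        """OR(t, m) with MLE probabilities; eps is an assumed smoothing term."""
        p_m = (count_in_move + eps) / (n_move + 2 * eps)   # P(t|m)
        p_r = (count_in_rest + eps) / (n_rest + 2 * eps)   # P(t|~m)
        return (p_m * (1 - p_r)) / ((1 - p_m) * p_r)

    def select_features(move_counts, k=3000):
        """Keep the k n-grams with the highest odds ratio for each move.

        move_counts: {move: (Counter of n-gram sentence counts, n_sentences)},
        a hypothetical input format for illustration.
        """
        selected = {}
        for move, (counts, n_move) in move_counts.items():
            rest = [(c, n) for m, (c, n) in move_counts.items() if m != move]
            n_rest = sum(n for _, n in rest)
            scores = {g: odds_ratio(c, n_move,
                                    sum(rc.get(g, 0) for rc, _ in rest), n_rest)
                      for g, c in counts.items()}
            selected[move] = sorted(scores, key=scores.get, reverse=True)[:k]
        return selected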
14. Sentence Representation
- Each sentence represented as a vector
- Presence or absence of the selected terms in a sentence recorded as Boolean values (0 for the absence of the corresponding term, 1 for its presence); see the sketch below
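A minimal sketch of this Boolean encoding; the fixed feature ordering is an assumption:

    def vectorize(sentence_ngrams, feature_list):
        """Map a sentence's n-grams onto a 0/1 vector over the selected features."""
        present = set(sentence_ngrams)
        return [1 if f in present else 0 for f in feature_list]

    # e.g. vectorize([("studi",), ("present", "studi")], features) -> [0, 1, ...]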
15. Classifier
- Support Vector Machines (SVMs) (Basu et al., 2003; Burges, 1998; Cortes and Vapnik, 1995; Joachims, 1998; Vapnik, 1995)
- Five-fold cross-validation
- Machine learning environment: RapidMiner (Mierswa et al., 2006)
- RBF kernel; kernel parameters found by trying a set of different parameter settings on the feature set with 3,000 unigrams
- These parameters are not necessarily the best; exhaustive searches will be performed on the other feature sets (a sketch of the setup follows below)
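The experiments were run in RapidMiner, so the following scikit-learn equivalent is purely illustrative; the C and gamma values are placeholders, not the parameters the authors found.

    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def evaluate_svm(X, y, C=1.0, gamma=0.01):
        """Five-fold cross-validation of an RBF-kernel SVM on the Boolean vectors."""
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        return cross_val_score(clf, X, y, cv=5)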
16. Evaluation
- Five-fold cross-validation was performed on 14 different feature sets
17. Evaluation
- Accuracy: the proportion of classifications that agreed with the manually assigned labels
18. Evaluation
- Precision: the proportion of the items assigned to a given category that actually belonged to it
- Recall: the proportion of the items actually belonging to a category that were labeled correctly (see the sketch below)
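A minimal sketch of these per-move metrics, computed from parallel lists of gold and predicted move labels (an assumed input format):

    def metrics(gold, predicted, move):
        """Accuracy over all sentences; precision and recall for one move."""
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == move and p == move)
        fp = sum(1 for g, p in pairs if g != move and p == move)
        fn = sum(1 for g, p in pairs if g == move and p != move)
        accuracy = sum(1 for g, p in pairs if g == p) / len(pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return accuracy, precision, recall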
19. Evaluation
- Trigram models result in the best precision
- Unigram models result in the best recall
20. Evaluation
- Error analysis reveals that Move 2 is the most difficult to identify; Move 2 gets misclassified as Move 1
- Possible remedy: use the relative position of the sentence in the text to disambiguate the move involved (see the sketch below)
- Check what percentage of Move 2 sentences identified as Move 1 by the system were also labeled Move 1 by the annotator
- Extracted features are not discipline-dependent
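One way to realize the proposed positional cue, sketched as an extra vector component; the exact encoding is an assumption, since the slide only names the idea:

    def add_position(vector, sentence_index, n_sentences):
        """Append the sentence's relative position in its introduction (0 to 1);
        Move 1 sentences tend to appear earlier than Move 2 sentences."""
        return vector + [sentence_index / max(n_sentences - 1, 1)]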
21. This just in
- Built a model with the top 3,000 unigrams and the top 3,000 trigrams
- Precision: 91.14
- Recall: 82.98
- Kappa: 87.57
22. Inter-annotator agreement
- Second annotations collected on a sample of files across all 20 disciplines (487 sentences)
- k (kappa): inter-annotator agreement, k = (P(A) - P(E)) / (1 - P(E))
  - P(A): observed probability of agreement
  - P(E): expected probability of agreement
- Average k = 0.945 over the three moves (see the sketch below)
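A minimal sketch of this agreement computation (Cohen's kappa for two annotators over the move labels); the parallel-list input format is an assumption:

    from collections import Counter

    def cohen_kappa(labels_a, labels_b):
        """k = (P(A) - P(E)) / (1 - P(E))."""
        n = len(labels_a)
        p_a = sum(1 for a, b in zip(labels_a, labels_b) if a == b) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        p_e = sum(freq_a[m] * freq_b[m] for m in freq_a) / (n * n)  # chance agreement
        return (p_a - p_e) / (1 - p_e)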
23. Further work on IADE
- Ongoing experiments to improve accuracy: experimenting with different kernel parameters to find optimal models
- More annotation
- Inter-annotator agreement (3 annotators)
- Identification of steps
- Development of intelligent feedback
- Web interface design
24. Further research with IADE
- Evaluation of IADE effectiveness
  - Learning potential
  - Learner fit
  - Meaning focus
  - Authenticity
  - Impact
  - Practicality (Chapelle, 2001)
- Process/product research direction: interaction between use and outcome (Warschauer and Ware, 2006)
- Target for evaluation: what is taught through technology (Chapelle, 2007, p. 30)
25. Questions? Suggestions?
Thank you!