1
Automatic Identification of Discourse Moves in
Scientific Article Introductions
Nick Pendar and Elena Cotos
Iowa State University
The 3rd Workshop on Innovative Use of NLP for Building Educational Applications
June 19, 2008
2
Outline
  • Background and motivation
  • Discourse move identification
  • Data and annotation scheme
  • Feature selection
  • Sentence representation
  • Classifier
  • Evaluation
  • Inter-annotator agreement
  • Further work

3
Automated evaluation: Background
  • Automated essay scoring (AES) in performance-based and high-stakes standardized tests (e.g., ACT, GMAT, TOEFL)
  • Automated error detection in L2 output (Burstein and Chodorow, 1999; Chodorow et al., 2007; Han et al., 2006; Leacock and Chodorow, 2003)
  • Assessment of various constructs, e.g., topical content, grammar, style, mechanics, syntactic complexity, and deviance or plagiarism (Burstein, 2003; Elliott, 2003; Landauer et al., 2003; Mitchell et al., 2002; Page, 2003; Rudner and Liang, 2002)
  • Evaluation of text organization limited to recognizing the five-paragraph essay format, thesis, and topic sentences
  • AntMover (Anthony and Lashkia, 2003)

4
Automated evaluation: CALI Motivation
  • Wide range of possibilities for high-quality evaluation and feedback (Criterion; Burstein, Chodorow, and Leacock, 2004)
  • Potential in formative assessment, but the effects of intelligent formative feedback are not fully investigated
  • Warschauer and Ware (2006) call for the development of a classroom research agenda that would help evaluate and guide the application of AES in writing pedagogy
  • "the potential of automated essay evaluation for improving student writing is an empirical question, and virtually no peer-reviewed research has yet been published" (Hyland and Hyland, 2006, p. 109)

5
Automated evaluation: EAP Motivation
  • EAP pedagogical approaches (Cortes, 2006; Levis and Levis-Muller, 2003; Vann and Myers, 2001) fail to provide non-native speakers (NNSs) with sufficient academic writing practice and remedial guidance
  • Problem of disciplinarity
  • An NLP-based academic discourse evaluation software application could address this drawback
  • Such an application has not yet been developed

6
Automated evaluation: Research Motivation
  • Long-term research goals:
  • design and implementation of IADE (Intelligent Academic Discourse Evaluator)
  • analysis of IADE effectiveness for formative assessment purposes

7
IADE
  • Evaluates students' research article introductions in terms of moves/steps (Swales, 1990, 2004)
  • Draws from:
  • SLA models: interactionist views (Carroll, 1999; Gass, 1997; Long, 1996; Long and Robinson, 1998; Mackey, Gass, and McDonough, 2000; Swain, 1993) and Systemic Functional Linguistics (Martin, 1992; Halliday, 1985)
  • Skill Acquisition Theory of learning (DeKeyser, 2007)
  • Is informed by empirical research on the provision of feedback
  • Is informed by Evidence-Centered Design principles (Mislevy et al., 2006)

8
Discourse Move Identification
  • Approached as a classification problem (similar to Burstein et al., 2003):
  • given a sentence and a finite set of moves and steps, what move/step does the sentence signify?
  • ISUAW corpus: 1,623 articles; 1,322,089 words; average article length 814.09 words
  • Stratified sampling of 401 introduction sections representative of 20 academic disciplines
  • Sub-corpus: 267,029 words; average length 665.91 words; 11,149 sentences
  • Manual annotation

9
Discourse Move Identification
  • Annotation scheme based on the CARS model (Swales, 1990; Swales, 2004): Move 1 (establishing a territory), Move 2 (establishing a niche), Move 3 (occupying the niche)

10
Discourse Move Identification
  • Multiple layers of annotation for cases when the
    same sentence signified more than one move or
    more than one step

11
Feature Selection
  • Features that reliably indicate a move/step
  • Text-categorization approach (see Sebastiani, 2002)
  • Each sentence treated as a data item to be classified and represented as an n-dimensional vector in Euclidean space
  • The task of the learning algorithm is to find a function F: S → M that maps the sentences in the corpus S to the classes in M = {m1, m2, m3}
  • Identification of moves, not yet steps

12
Feature Selection
  • Extraction of word unigrams, bigrams, and trigrams from the annotated corpus
  • Preprocessing (a minimal sketch follows this list):
  • All tokens stemmed using the NLTK port of the Porter stemmer (Porter, 1980)
  • All numbers in the texts replaced by the string _number_
  • The tokens inside each bigram and trigram alphabetized
  • All n-grams with a frequency of less than five excluded
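
A minimal Python sketch of these preprocessing steps, assuming NLTK is installed; the function names and the tokenized-sentence input format are illustrative assumptions, not the authors' code:

    import re
    from collections import Counter
    from itertools import chain
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()

    def preprocess(tokens):
        # Stem each token; replace any number with the placeholder _number_.
        return ["_number_" if re.fullmatch(r"\d+([.,]\d+)*", t)
                else stemmer.stem(t.lower()) for t in tokens]

    def ngrams(tokens, n):
        # Extract n-grams; alphabetize the tokens inside bigrams and trigrams.
        grams = (tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        return [tuple(sorted(g)) if n > 1 else g for g in grams]

    def collect_ngrams(sentences, min_freq=5):
        # Count uni-, bi-, and trigrams over all sentences;
        # drop those seen fewer than five times.
        counts = Counter(chain.from_iterable(
            ngrams(preprocess(s), n) for s in sentences for n in (1, 2, 3)))
        return {g: c for g, c in counts.items() if c >= min_freq}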

13
Feature Selection
  • Candidate n-grams scored by odds ratio; for an n-gram t and move m:
    OR(t, m) = P(t|m)(1 − P(t|¬m)) / [(1 − P(t|m)) P(t|¬m)]
  • Conditional probabilities calculated as maximum likelihood estimates
  • N-grams with maximum odds ratios selected as features
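
A sketch of this selection step under the formula above; the count layout and the epsilon guard against division by zero are illustrative assumptions:

    def odds_ratio(n_tm, n_m, n_t_notm, n_notm, eps=1e-9):
        # MLE conditional probabilities from raw sentence counts:
        # n_tm sentences of move m contain t, out of n_m move-m sentences;
        # n_t_notm sentences of other moves contain t, out of n_notm.
        p1 = n_tm / n_m
        p0 = n_t_notm / n_notm
        return (p1 * (1 - p0) + eps) / ((1 - p1) * p0 + eps)

    def select_features(counts, k=3000):
        # counts: n-gram -> (n_tm, n_m, n_t_notm, n_notm) for one move.
        # Keep the k n-grams with the highest odds ratio.
        scored = {t: odds_ratio(*c) for t, c in counts.items()}
        return sorted(scored, key=scored.get, reverse=True)[:k]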

14
Sentence Representation
  • Each sentence represented as a vector
  • Presence or absence of terms in a sentence recorded as Boolean values (0 for absence of the corresponding term, 1 for presence)
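
A one-function sketch of this encoding (names illustrative):

    def sentence_vector(sentence_ngrams, features):
        # 1 if a selected n-gram occurs in the sentence, else 0.
        present = set(sentence_ngrams)
        return [1 if f in present else 0 for f in features]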

15
Classifier
  • Support Vector Machines (SVM) (Basu et al., 2003; Burges, 1998; Cortes and Vapnik, 1995; Joachims, 1998; Vapnik, 1995)
  • Five-fold cross-validation
  • Machine learning environment: RAPIDMINER (Mierswa et al., 2006)
  • RBF kernel, with parameters found by trying a set of different parameter settings on the feature set with 3,000 unigrams
  • Parameters not necessarily the best; exhaustive searches will be performed on the other feature sets
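
The original experiments ran in RAPIDMINER; a rough scikit-learn equivalent, with dummy data and illustrative RBF parameters (both assumptions, not the paper's settings):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Dummy stand-ins: Boolean sentence vectors X, move labels y.
    rng = np.random.default_rng(0)
    X = rng.integers(0, 2, size=(30, 8))           # 30 "sentences", 8 features
    y = np.repeat(["m1", "m2", "m3"], 10)          # three moves

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # illustrative parameters
    scores = cross_val_score(clf, X, y, cv=5)      # five-fold cross-validation
    print("mean accuracy:", scores.mean())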

16
Evaluation
  • Five-fold cross-validation was performed on 14 different feature sets

17
Evaluation
  • Accuracy - the proportion of classifications that agreed with the manually assigned labels
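
Stated as a formula (a standard restatement of this definition):

\[ \mathrm{Accuracy} = \frac{\#\{\text{sentences whose predicted move matches the manual label}\}}{\#\{\text{all sentences}\}} \]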
18
Evaluation
  • Precision - what proportion of the items assigned
    to a given category actually belonged to it
  • Recall - what proportion of the items actually
    belonging to a category were labeled correctly
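
For each move m, counting true positives, false positives, and false negatives against the manual labels, these are the standard definitions:

\[ \mathrm{Precision}_m = \frac{TP_m}{TP_m + FP_m}, \qquad \mathrm{Recall}_m = \frac{TP_m}{TP_m + FN_m} \]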

19
Evaluation
  • Trigram models result in the best precision
  • Unigram models result in the best recall

20
Evaluation
  • Move 2 is the most difficult to identify; error analysis reveals that Move 2 sentences get misclassified as Move 1
  • Possible remedy: use the relative position of the sentence in the text to disambiguate the move involved
  • Check what percentage of Move 2 sentences identified as Move 1 by the system were also labeled Move 1 by the annotator
  • Extracted features are not discipline-dependent

21
This just in
  • Built a model with the top 3,000 unigrams and the top 3,000 trigrams
  • Precision: 91.14
  • Recall: 82.98
  • Kappa: 87.57

22
Inter-annotator agreement
  • Second annotations on a sample of files across all 20 disciplines (487 sentences)
  • Inter-annotator agreement measured with kappa:
    k = (P(A) − P(E)) / (1 − P(E))
    where P(A) is the observed probability of agreement and P(E) the expected probability of agreement
  • Average k of 0.945 over the three moves

23
Further work on IADE
  • Ongoing experiments to improve accuracy
  • experimenting with different kernel parameters to find optimal models
  • More annotation
  • Inter-annotator agreement (3 annotators)
  • Identification of steps
  • Development of intelligent feedback
  • Web interface design

24
Further research with IADE
  • Evaluation of IADE effectiveness
  • Learning potential
  • Learner fit
  • Meaning focus
  • Authenticity
  • Impact
  • Practicality (Chapelle, 2001)
  • Process/product research direction - interaction between use and outcome (Warschauer and Ware, 2006)
  • Target for evaluation - "what is taught through technology" (Chapelle, 2007, p. 30)

25
Questions? Suggestions?
THANK YOU!