Claire Cardie (CS & IS), Cynthia Farina (Law), - PowerPoint PPT Presentation

About This Presentation

Description:

Claire Cardie (CS & IS), Cynthia Farina (Law), Matt Rawding (IS), Adil Aijaz (CS) ... The 'Sophisticated Commenter' At the other extreme... – PowerPoint PPT presentation

Slides: 28
Provided by: car111
Learn more at: http://www.lrec-conf.org

Transcript and Presenter's Notes



1
An eRulemaking Corpus Identifying Substantive
Issues in Public Comments
  • Claire Cardie (CS & IS), Cynthia Farina (Law),
  • Matt Rawding (IS), Adil Aijaz (CS)
  • CeRI (Cornell eRulemaking Initiative)
  • Cornell University

2
Plan for the Talk
  • Background
  • E-rulemaking
  • CeRI FTA Grant Circulars Corpus
  • Text Categorization Experiments

3
Rulemaking → E-Rulemaking
  • Rulemaking: one of the principal methods of
    making regulatory policy in the US
  • 4000 per year
  • notice-and-comment rulemaking: the formal public
    participation phase
  • 10 to 500,000 comments per rule
  • comment length: 1 sentence to 10s of pages
  • agency legally bound to respond to all
    substantive issues
  • E-rulemaking: e-notice and e-comment

4
Current Agency Practice
5
Goals of Our Current Work
  • Determine the degree to which automatic issue
    categorization can facilitate analysis of
    comments by identifying and categorizing
    relevant issues.
  • Framed as a text categorization task
  • Given a comment set, the automated system
    should determine, for each sentence in each
    comment, which of a group of pre-defined issue
    categories it raises, if any.
  • Builds on the work of Kwon & Hovy (2007) and
    Kwon et al. (2006)
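The task framing above amounts to a per-sentence, multi-label mapping with a NONE fallback. The minimal Python sketch below assumes a comment set is a dict from a comment id to its list of sentences and that some per-sentence classifier already exists; `categorize_comments`, `keyword_classifier`, and the issue names `FUNDING` and `ELIGIBILITY` are hypothetical stand-ins, not the CeRI system or its issue set.

```python
from typing import Callable, Dict, List, Set, Tuple

NONE = "NONE"  # label for sentences that raise no pre-defined issue

# A per-sentence classifier maps one sentence to zero or more issue labels.
SentenceClassifier = Callable[[str], Set[str]]


def categorize_comments(
    comments: Dict[str, List[str]], classify: SentenceClassifier
) -> Dict[Tuple[str, int], Set[str]]:
    """Label every sentence of every comment with its issues, or {NONE}."""
    labels: Dict[Tuple[str, int], Set[str]] = {}
    for comment_id, sentences in comments.items():
        for index, sentence in enumerate(sentences):
            issues = classify(sentence)
            labels[(comment_id, index)] = set(issues) if issues else {NONE}
    return labels


def keyword_classifier(sentence: str) -> Set[str]:
    """Hypothetical stand-in for a trained model: substring keyword matching."""
    keywords = {"funding": "FUNDING", "eligib": "ELIGIBILITY"}
    lowered = sentence.lower()
    return {label for key, label in keywords.items() if key in lowered}


comments = {"c1": ["Funding rules ignore eligibility.", "Thank you."]}
labels = categorize_comments(comments, keyword_classifier)
```

Swapping `keyword_classifier` for a trained model leaves `categorize_comments` unchanged; only the sentence-to-label-set interface matters.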

6
Plan for the Talk
  • Background
  • CeRI FTA Grant Circulars Corpus
  • Difficulties
  • Interannotator agreement results
  • Text Categorization Experiments

7
FTA Grant Circulars Rule
  • Topic guidance to public and private
    transportation providers applying for federal aid
    for elderly, disabled and low income persons
  • 267 comments
  • shortest 1 sentence
  • longest 1420 sentences
  • 11,094 sentences total

8
FTA Grant Circulars Issue Set
17 top-level issues
39 fine-grained issues
9
Kwon & Hovy (2007)
vs.
10
Difficulties for Text Categorization
  • Large, hierarchical issue set

11
FTA Grant Circulars Issue Set
17 top-level issues
39 fine-grained issues
12
Difficulties for Text Categorization
  • Large, hierarchical issue set
  • NONE category
  • Skewed distribution across issues
  • 87% of the sentences are from 6 categories
  • 13% of the sentences are from 33 categories
  • Potentially multiple issues per sentence.
  • Even long sentences contain few words.
  • Variation in comment quality, scope, vocabulary
    and form.
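One quick way to quantify the skew above is the share of labeled sentences that fall in the k most frequent categories. A small Python sketch, assuming one list entry per label (a sentence with two issues contributes two entries); the function name and the toy labels are illustrative, not drawn from the FTA corpus.

```python
from collections import Counter
from typing import Iterable


def top_k_share(labels: Iterable[str], k: int) -> float:
    """Fraction of label occurrences that fall in the k most frequent categories."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(k))
    return top / total


# Toy labels (not corpus data): two categories cover 8 of 10 occurrences.
toy = ["A"] * 5 + ["B"] * 3 + ["C", "D"]
share = top_k_share(toy, 2)  # 0.8
```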

13
The Annotators
14
Interannotator Agreement
  • 146 comments used for the study
  • 6 annotators
  • 2.66 annotators per comment
  • 41.5 sentences per comment
  • Overlap agreement measure
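The slide does not define the overlap agreement measure. As an assumption, one plausible reading for multi-label sentence annotation is per-sentence set overlap (Jaccard: size of the intersection over size of the union of two annotators' issue sets), averaged over annotator pairs and sentences. The sketch assumes every annotator in the dict labeled the same sentences in the same order; `jaccard` and `pairwise_overlap` are hypothetical names.

```python
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Intersection size over union size; two empty sets count as full agreement."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_overlap(annotations: Dict[str, List[Set[str]]]) -> float:
    """Mean Jaccard overlap over every annotator pair and every shared sentence.

    `annotations` maps an annotator id to one label set per sentence,
    with all lists in the same sentence order.
    """
    scores = [
        jaccard(x, y)
        for a, b in combinations(sorted(annotations), 2)
        for x, y in zip(annotations[a], annotations[b])
    ]
    return sum(scores) / len(scores) if scores else 0.0


annotations = {"ann1": [{"A"}, {"A", "B"}], "ann2": [{"A"}, {"B"}]}
agreement = pairwise_overlap(annotations)  # (1.0 + 0.5) / 2 = 0.75
```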

15
Category-by-Category IAG Results
16
Plan for the Talk
  • Background
  • E-rulemaking
  • Public comment analysis
  • CeRI FTA Grant Circulars Corpus
  • Difficulties
  • Interannotator agreement results
  • Text Categorization Experiments

17
Standard Text Categorization Algorithms
  • Fine-grained issues (39)
  • Coarse-grained issues (17)

  • Standard (flat) text categorization methods: SVMs (0/1 loss),
    Maxent, Naïve Bayes
  • Hierarchical text categorization methods: cascaded
    classification (Dumais & Chen, 2000)
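Cascaded classification, as listed on the slide, predicts a coarse issue first and then refines it with a classifier trained only for that coarse issue. The Python sketch below shows the control flow under stated assumptions: classifiers are plain callables returning label sets, and the keyword classifiers and labels (`FUNDING`, `FUNDING/LOCAL-MATCH`) are hypothetical stand-ins for trained SVM or Maxent models.

```python
from typing import Callable, Dict, Set

# A classifier maps a sentence to a (possibly empty) set of labels.
Classifier = Callable[[str], Set[str]]


def cascade(sentence: str, coarse: Classifier, fine: Dict[str, Classifier]) -> Set[str]:
    """Two-level cascade: predict coarse issues first, then run only the
    fine-grained classifier attached to each predicted coarse issue."""
    labels: Set[str] = set()
    for top in coarse(sentence):
        sub = fine.get(top)
        # Fall back to the coarse label if there is no finer classifier,
        # or if the finer classifier assigns nothing.
        predicted = sub(sentence) if sub is not None else set()
        labels |= predicted if predicted else {top}
    return labels


# Hypothetical keyword classifiers standing in for trained models.
def coarse_clf(s: str) -> Set[str]:
    return {"FUNDING"} if "fund" in s.lower() else set()


fine_clfs: Dict[str, Classifier] = {
    "FUNDING": lambda s: {"FUNDING/LOCAL-MATCH"} if "match" in s.lower() else set()
}
```

Because the first level gates the second, a coarse issue missed at the top can never be recovered below it; the fallback only preserves the coarse label when the fine classifier is silent.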
18
Cascaded Categorization
19
Cascaded Categorization
20
Cascaded Categorization
21
Gold Standard Data Set
  • Simulate agency comment analysis process
  • One analyst / rule
  • Six data sets
  • One data set / annotator

22
SVM Results with tf.idf Features
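The slide does not say which tf-idf variant was used. As an assumption, the Python sketch below uses raw term counts times log(N / df), treating each tokenized sentence as a document (categorization here is per sentence), and returns one sparse vector (a dict) per sentence; the function name is hypothetical. A term that occurs in every sentence gets weight zero.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """tf-idf vectors: raw term count times log(N / df), one dict per document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


sentences = [["bus", "fare", "fare"], ["bus", "route"]]
vecs = tfidf(sentences)
```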
23
Best-Performing Fine-Grained Issues (Annotator 1)
24
Progress and Plans
  • Promising initial results for rule-specific issue
    categorization of public comments
  • Annotate comments for more rules
  • Expert (rulewriter) vs. law student annotation
  • Integrate automatic text categorization into
    annotation interface
  • Active learning (Purpura, Cardie & Simons, dg.o
    2008)
  • Collaboration with HCI colleagues in InfoSci
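The slides do not say which active learning strategy the cited work uses. As an illustration only, the sketch below implements plain uncertainty sampling for a single binary issue classifier: it sends the annotator the unlabeled sentences whose predicted probability is closest to 0.5. The probability function, the scores, and `select_for_annotation` are hypothetical.

```python
from typing import Callable, List, Sequence


def select_for_annotation(
    sentences: Sequence[str], prob: Callable[[str], float], batch: int
) -> List[str]:
    """Uncertainty sampling: pick the `batch` unlabeled sentences whose
    predicted probability of raising the issue is closest to 0.5."""
    ranked = sorted(sentences, key=lambda s: abs(prob(s) - 0.5))
    return ranked[:batch]


# Hypothetical model scores for four unlabeled sentences.
scores = {"s1": 0.90, "s2": 0.55, "s3": 0.10, "s4": 0.48}
to_label = select_for_annotation(list(scores), scores.__getitem__, batch=2)
```

In a full loop, the newly labeled sentences would be added to the training set and the classifier retrained before the next selection round.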

25
The End
  • For more on
    • the hierarchical text categorization method:
      Cardie et al. (dg.o 2008)
    • a new structural learning approach for
      hierarchical classification: Purpura et al. (in preparation)
    • active learning methods for hierarchical text
      categorization: Purpura, Cardie & Simons (dg.o 2008)

26
(No Transcript)
27
(No Transcript)
28
Minimizing the Costliest Errors
Underinclusive errors are the most costly
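If underinclusive errors are read as missed issues (false negatives), one standard way to make an evaluation reflect that cost is the F-beta measure with beta > 1, which weights recall above precision: F_β = (1 + β²)·P·R / (β²·P + R). This is an illustration, not a metric the slides report; the sketch below takes raw true-positive, false-positive, and false-negative counts, and `f_beta` is a hypothetical name.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta from raw counts; beta > 1 weights recall above precision."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Same counts, scored two ways: a recall shortfall costs more under F2 than F1.
f1 = f_beta(tp=6, fp=0, fn=4, beta=1.0)  # P = 1.0, R = 0.6 -> 0.75
f2 = f_beta(tp=6, fp=0, fn=4, beta=2.0)  # 3.0 / 4.6, about 0.652
```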
29
The Sophisticated Commenter
30
At the other extreme
  • I am disabled and take medications and fear
    flying because of the new government conditions
    on Air Marshalls to determine if someone looks
    suspicious behaviour but what if passengers take
    psychiatric medications and have side effects
    such as execisive sweating or shallow breathing
    due to medications? I hope my concern be
    properly addressed where Airlines can also
    increase the seating on plans to provide
    additional information of medicaitons and items
    which can accomodate passengers when flying
    instead of assume and act without knowing of
    there history of medications. I guess disabled
    people are being forced to give up privacy just
    to avoid any problems from Air Marshalls.Jon