Title: Claire Cardie (CS & IS), Cynthia Farina (Law)
1An eRulemaking Corpus: Identifying Substantive
Issues in Public Comments
- Claire Cardie (CS & IS), Cynthia Farina (Law),
- Matt Rawding (IS), Adil Aijaz (CS)
- CeRI (Cornell eRulemaking Initiative)
- Cornell University
2Plan for the Talk
- Background
- E-rulemaking
- CeRI FTA Grant Circulars Corpus
- Text Categorization Experiments
3Rulemaking → E-Rulemaking
- Rulemaking: one of the principal methods of
making regulatory policy in the US (4000 per year)
- Notice-and-comment rulemaking: formal public
participation phase (10 to 500,000 comments per rule)
- Comment length: 1 sentence to 10s of pages
- Agency legally bound to respond to all
substantive issues
- E-rulemaking: e-notice and e-comment
4Current Agency Practice
5Goals of Our Current Work
- Determine the degree to which automatic issue
categorization can facilitate analysis of
comments by identifying and categorizing
relevant issues.
- Framed as a text categorization task
- Given a comment set, the automated system
should determine, for each sentence in each
comment, which of a group of pre-defined issue
categories it raises, if any.
- Builds on the work of Kwon & Hovy (2007) and
Kwon et al. (2006)
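The task framing above can be sketched as a sentence-level labeling loop. This is only an illustration: the `Sentence` record, the `categorize` helper, the toy `keyword_classifier`, and the issue names `FUNDING` and `ELIGIBILITY` are all hypothetical and are not taken from the CeRI issue set or system.

```python
from dataclasses import dataclass, field


@dataclass
class Sentence:
    comment_id: str
    index: int            # position of the sentence within its comment
    text: str
    issues: set = field(default_factory=set)  # empty set means "no issue raised"


def categorize(comments, classify):
    """Label every sentence of every comment.

    comments: dict mapping comment id -> list of sentence strings
    classify: callable mapping a sentence string -> set of issue labels
    """
    labeled = []
    for comment_id, sentences in comments.items():
        for i, text in enumerate(sentences):
            labeled.append(Sentence(comment_id, i, text, set(classify(text))))
    return labeled


def keyword_classifier(text):
    # Toy stand-in for a trained classifier (hypothetical rules and labels).
    rules = {"funding": "FUNDING", "eligib": "ELIGIBILITY"}
    low = text.lower()
    return {label for key, label in rules.items() if key in low}


comments = {"c1": ["Funding levels are too low.", "Thank you for your time."]}
labeled = categorize(comments, keyword_classifier)
for s in labeled:
    print(s.comment_id, s.index, sorted(s.issues) or ["NONE"])
```

Returning a set per sentence leaves room for a sentence that raises several issues, and an empty set stands in for a sentence that raises none.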
6Plan for the Talk
- Background
- CeRI FTA Grant Circulars Corpus
- Difficulties
- Interannotator agreement results
- Text Categorization Experiments
7FTA Grant Circulars Rule
- Topic: guidance to public and private
transportation providers applying for federal aid
for elderly, disabled and low-income persons
- 267 comments
- Shortest: 1 sentence
- Longest: 1,420 sentences
- 11,094 sentences total
8FTA Grant Circulars Issue Set
17 top-level issues
39 fine-grained issues
9Kwon & Hovy (2007)
vs.
10Difficulties for Text Categorization
- Large, hierarchical issue set
11FTA Grant Circulars Issue Set
17 top-level issues
39 fine-grained issues
12Difficulties for Text Categorization
- Large, hierarchical issue set
- NONE category
- Skewed distribution across issues
- 87% of the sentences are from 6 categories
- 13% of the sentences are from 33 categories
- Potentially multiple issues per sentence.
- Even long sentences contain few words.
- Variation in comment quality, scope, vocabulary
and form.
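One way to quantify the skew noted above is the share of labeled sentences covered by the k most frequent categories. The sketch below is a minimal illustration; the function name `top_k_share` and the toy label counts are invented for the example (chosen only to echo an 87 to 13 split) and are not corpus data.

```python
from collections import Counter


def top_k_share(labels, k):
    """Fraction of all labels that belong to the k most frequent categories."""
    counts = Counter(labels)
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total


# Invented toy distribution: two dominant categories and two rare ones.
labels = ["A"] * 50 + ["B"] * 37 + ["C"] * 8 + ["D"] * 5
share = top_k_share(labels, 2)
print(f"top-2 categories cover {share:.0%} of the labels")
```

A classifier trained on such a distribution sees very few examples for most categories, which is the difficulty the slide points to.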
13The Annotators
14Interannotator Agreement
- 146 comments used for the study
- 6 annotators
- 2.66 annotators per comment
- 41.5 sentences per comment
- Overlap agreement measure
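The slides do not define the overlap measure, so the following is only one plausible reading, stated as an assumption: two annotators agree on a sentence when their issue sets share at least one label, or when both assign no issue. The function name and the sample labels are hypothetical.

```python
def overlap_agreement(labels_a, labels_b):
    """Fraction of sentences on which two annotators' issue sets overlap.

    Assumption (not from the source): two empty sets, i.e. both annotators
    chose NONE, count as agreement.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("annotators must label the same sentences")
    if not labels_a:
        return 0.0
    hits = 0
    for a, b in zip(labels_a, labels_b):
        if (not a and not b) or (a & b):
            hits += 1
    return hits / len(labels_a)


# Hypothetical annotations for four sentences.
ann1 = [{"FUNDING"}, set(), {"ELIGIBILITY", "REPORTING"}, {"FUNDING"}]
ann2 = [{"FUNDING"}, set(), {"REPORTING"}, {"ELIGIBILITY"}]
score = overlap_agreement(ann1, ann2)
print(score)
```

An overlap criterion is more lenient than exact set match, which matters when a sentence may carry several issues.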
15Category-by-Category IAG Results
16Plan for the Talk
- Background
- E-rulemaking
- Public comment analysis
- CeRI FTA Grant Circulars Corpus
- Difficulties
- Interannotator agreement results
- Text Categorization Experiments
17Standard Text Categorization Algorithms
- Fine-grained issues (39)
- Coarse-grained issues (17)
- Standard (flat) text categorization methods: SVMs (0/1 loss), Maxent, Naïve Bayes
- Hierarchical text categorization methods: cascaded classification, Dumais & Chen (2000)
18Cascaded Categorization
19Cascaded Categorization
20Cascaded Categorization
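A generic two-stage cascade over a two-level issue hierarchy might look like the sketch below: a first classifier picks a top-level issue, then a classifier specific to that issue picks among its fine-grained children. This is a sketch under stated assumptions, not the configuration used in the talk; the toy classifiers and the issue names are hypothetical.

```python
def cascade(sentence, top_classifier, fine_classifiers):
    """Two-stage cascade.

    top_classifier: callable str -> top-level label, or None for no issue
    fine_classifiers: dict top-level label -> callable str -> fine label
    Returns (top_label, fine_label); either may be None.
    """
    top = top_classifier(sentence)
    if top is None:
        return None, None
    fine_classifier = fine_classifiers.get(top)
    fine = fine_classifier(sentence) if fine_classifier else None
    return top, fine


# Toy stand-ins for trained classifiers (hypothetical labels and rules).
def top_clf(sentence):
    low = sentence.lower()
    if "fund" in low:
        return "FUNDING"
    if "eligib" in low:
        return "ELIGIBILITY"
    return None


fine_clfs = {
    "FUNDING": lambda s: "FUNDING/MATCH" if "match" in s.lower() else "FUNDING/OTHER",
}

result = cascade("The local match for funding is too high.", top_clf, fine_clfs)
print(result)
```

Because the second stage only sees sentences routed to its parent issue, each fine-grained classifier chooses among a handful of siblings, but a mistake at the top level cannot be recovered below it.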
21Gold Standard Data Set
- Simulate agency comment analysis process
- One analyst / rule
- Six data sets
- One data set / annotator
22SVM Results with tf.idf Features
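For readers unfamiliar with tf.idf features, here is a minimal sketch of one common weighting, raw term frequency times log(N / df). Many variants exist and the slides do not say which one was used, so the formula, the function name `tfidf`, and the sample sentences are assumptions for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one dict (term -> weight) per doc,
    using raw term frequency times log(N / document frequency)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


# Invented sentences, whitespace-tokenized.
sentences = [
    "the funding formula is unfair".split(),
    "the eligibility rules are unclear".split(),
]
vectors = tfidf(sentences)
print(vectors[0])
```

Terms that occur in every sentence get weight zero, while rarer terms are weighted up; sparse vectors of this kind are a typical input to a linear classifier such as an SVM.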
23Best-Performing Fine-Grained Issues (Annotator 1)
24Progress and Plans
- Promising initial results: rule-specific issue
categorization of public comments
- Annotate comments for more rules
- Expert (rulewriter) vs. law student annotation
- Integrate automatic text categorization into
annotation interface
- Active learning (Purpura, Cardie & Simons, dg.o
2008)
- Collaboration with HCI colleagues in InfoSci
25The End
- For more on
- the hierarchical text categorization method
- Cardie et al. (dg.o 2008)
- a new structural learning approach for
hierarchical classification
- Purpura et al. (in preparation)
- active learning methods for hierarchical text
categorization
- Purpura, Cardie & Simons (dg.o 2008)
28Minimizing the Costliest Errors
Underinclusive errors are the most costly
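If an underinclusive error is read as a missed issue (a false negative), one standard way to reflect its higher cost in evaluation is an F-beta score with beta greater than 1, which weights recall above precision. The slides do not name a metric, so this choice and the toy sentence ids below are assumptions.

```python
def precision_recall(gold, predicted):
    """gold, predicted: sets of sentence ids assigned to one issue."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 penalizes low recall more than low precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


gold = {1, 2, 3, 4}
overinclusive = {1, 2, 3, 4, 5, 6, 7, 8}   # finds everything, half is noise
underinclusive = {1, 2}                    # clean, but misses half the issue

f2_over = f_beta(*precision_recall(gold, overinclusive))
f2_under = f_beta(*precision_recall(gold, underinclusive))
f1_over = f_beta(*precision_recall(gold, overinclusive), beta=1.0)
f1_under = f_beta(*precision_recall(gold, underinclusive), beta=1.0)
print(f2_over, f2_under, f1_over, f1_under)
```

With beta set to 1 the two systems score the same, while beta set to 2 ranks the overinclusive system higher, matching the view that missing a substantive issue is the costlier mistake.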
29The Sophisticated Commenter
30At the other extreme
- I am disabled and take medications and fear
flying because of the new government conditions
on Air Marshalls to determine if someone looks
suspicious behaviour but what if passengers take
psychiatric medications and have side effects
such as execisive sweating or shallow breathing
due to medications? I hope my concern be
properly addressed where Airlines can also
increase the seating on plans to provide
additional information of medicaitons and items
which can accomodate passengers when flying
instead of assume and act without knowing of
there history of medications. I guess disabled
people are being forced to give up privacy just
to avoid any problems from Air Marshalls.Jon