Stylometry Project - PowerPoint PPT Presentation

Slides: 11
Provided by: Geral7

Transcript and Presenter's Notes


1
Stylometry Project
  • IT691 CS615 Computer Information Systems
    Projects
  • December, 2007

2
Team Members
  • Geraldine McCabe, Team Leader
  • Huriya Manzar, Programmer
  • Melissa Connors, Programmer
  • Kristina Calix, Implementer
  • De Havaland Levy, Quality Assurance

3
Overview
  • This program can be used by any researcher
    attempting to identify the authorship of email
    text messages

4
Overview of Existing Program
  • A pattern recognition system to identify the
    author of arbitrary email using Stylometry
    features
  • The existing C# program took raw keystroke data
    and converted it into simple text files
  • It performs feature extraction for statistical
    analysis, followed by classification using
    k-nearest neighbor
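
The classification step can be illustrated with a minimal k-nearest-neighbor sketch. It is written in Python for brevity and is not the project's C# code; the function names, the Euclidean distance, the value of k, and the sample vectors are all assumptions made for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, author_label) pairs."""
    # Keep the k training samples closest to the query vector.
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    # Majority vote among those neighbors; ties go to the label seen first.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical, already-normalized two-feature vectors for two authors.
train = [
    ([0.10, 0.20], "author_a"),
    ([0.15, 0.25], "author_a"),
    ([0.80, 0.90], "author_b"),
    ([0.85, 0.70], "author_b"),
]
predicted = knn_predict(train, [0.12, 0.22], k=3)
print(predicted)
```

With a real data set each vector would hold all extracted features, and an odd k is a common way to reduce tied votes when there are two candidate authors.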

5
Program Modifications
  • Collected a larger data set of plain-text email
    samples to improve testing accuracy: 10 samples
    from each of 12 different authors, averaging
    150 words
  • Keystroke features were removed from the existing
    program and new features were added, giving a
    total of 55 stylistic features for extraction
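
The slides do not list the 55 stylistic features, so the sketch below only shows the general shape of feature extraction from a plain-text sample. The five features and their names are hypothetical examples, not the project's actual feature set, and the code is Python rather than the project's C#.

```python
import re

def extract_features(text):
    # Words are runs of letters and apostrophes; sentences end at . ! or ?
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        "comma_rate": text.count(",") / n_chars if n_chars else 0.0,
        "uppercase_rate": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
    }

# Hypothetical email fragment.
sample = "Hi Bob, thanks for the notes. See you Monday!"
features = extract_features(sample)
for name, value in features.items():
    print(name, value)
```

Rates are divided by text length so that samples of different sizes remain comparable.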

6
Modifications Cont.
  • Demographic information for each author was
    added at the client's request
  • A Reset option was added so that demographic
    information is entered once and reused for
    multiple samples from the same author
  • Feature vector data was normalized to the range
    0-1 and written out as a CSV file
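
One common way to bring features into the range 0-1 is min-max scaling, (value - min) / (max - min), applied per feature column. The slides do not say which method the project used, so the sketch below is an assumption; the column names, author labels, and the choice to map constant columns to 0.0 are likewise illustrative.

```python
import csv
import io

def min_max_normalize(rows):
    """Scale each column of equal-length numeric rows to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # A constant column has no spread, so it is mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def to_csv(header, labels, rows):
    # Build the CSV in memory; swap io.StringIO for an open file to save it.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["author"] + header)
    for label, row in zip(labels, rows):
        writer.writerow([label] + row)
    return buf.getvalue()

# Hypothetical raw feature vectors: word count and average word length.
raw = [[120, 4.0], [180, 5.0], [150, 4.5]]
scaled = min_max_normalize(raw)
csv_text = to_csv(["word_count", "avg_word_length"], ["a", "b", "c"], scaled)
print(csv_text)
```

Normalizing before classification keeps features with large raw values, such as word counts, from dominating distance calculations.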

7
Modifications Cont.
  • The GUI was enhanced to eliminate unnecessary
    menu options and provide relevant options for
    the new modifications

8
Demonstration
  • Plaintext email samples
  • Create Base Data Set
  • Normalize Base Data Set
  • Output normalized data as CSV/Excel file
  • Compare unknown author

9
Future Work
  • Add further features per the client's requests
  • Since formatting plays a big part in stylometry,
    features such as indentation, number of blank
    lines between paragraphs, number of blank lines
    between the last sentence and the closing, and
    number of spaces after periods (some people type
    1 space, some people type 2 spaces) could be
    added
  • Grammatical features: for example, stylometry
    experts have noticed that women tend to use
    adverbs more than men
  • Identify gender based on stylistic linguistic
    habits
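
A few of the formatting features proposed above could be measured along the following lines. This is a rough Python sketch of future work, not part of the delivered program; the feature names and the regular expression are assumptions, and every period followed by spaces is counted, including those in abbreviations.

```python
import re

def formatting_features(text):
    lines = text.split("\n")
    # Number of spaces between a period and the next non-space character.
    gaps = [len(m.group(1)) for m in re.finditer(r"\.( +)(?=\S)", text)]
    # Lengths of blank-line runs that sit between two non-blank lines.
    blank_runs, run, seen_text = [], 0, False
    for line in lines:
        if line.strip():
            if run and seen_text:
                blank_runs.append(run)
            run, seen_text = 0, True
        else:
            run += 1
    # Non-blank lines that start with a space or a tab.
    indented = sum(1 for line in lines if line.strip() and line[0] in " \t")
    return {
        "avg_spaces_after_period": sum(gaps) / len(gaps) if gaps else 0.0,
        "avg_blank_lines_between_blocks":
            sum(blank_runs) / len(blank_runs) if blank_runs else 0.0,
        "indented_line_count": indented,
    }

# Hypothetical email body with mixed spacing and an indented closing.
email = "Hi Bob.  Thanks for the notes. See you soon.\n\n\n    Regards,\nAnn"
layout = formatting_features(email)
print(layout)
```

These counts only make sense on samples whose original whitespace has been preserved, so any preprocessing that collapses spaces or blank lines would need to run after this step.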

10
Questions
  • Contact gm60518w_at_pace.edu for more information or
    visit
    http://utopia.csis.pace.edu/cs691/2007-2008/team2/index2.htm