Text Deception Detection System - PowerPoint PPT Presentation

About This Presentation
Title:

Text Deception Detection System

Slides: 26
Provided by: nichalinsu

Transcript and Presenter's Notes

Title: Text Deception Detection System


1
Text Deception Detection System
  • MIS 497/597A Final Project Presentation
  • May 3rd, 2006
  • By
  • Nichalin Suakkaphong
  • Leroy Walters

2
Agenda
  • Overview
  • Survey of Related Technologies and Systems
  • Text Deception Detection System Design
  • Demo
  • Discussion and Conclusion

3
Overview
  • There are many tools that process natural
    language, each with its own purpose.
  • E.g. speech recognition, named entity recognition
    in text, relation extraction, phrase chunking,
    sentence splitting, morphological analysis, etc.
  • The majority of these tools are intended for
    research use.
  • A combination of tools is used in deception
    detection.
  • GATE and WEKA are used for text deception
    detection.
  • Our objective in this project is to produce a
    text deception detection tool for general
    users.
  • We call it TxtDetector.

4
Overview
  • The TxtDetector system is designed to analyze
    text and classify it as either truthful or
    deceptive.
  • Easy-to-use GUI
  • No preprocessing necessary
  • Flexibility to input single or multiple documents
  • Ability to display user-selected cues
  • Extensible to any text cue analysis that may
    come up in the future

5
Agenda
  • Overview
  • Survey of Related Technologies and Systems
  • Text Deception Detection System Design
  • Demo
  • Discussion and Conclusion

6
Survey of Related Technologies and Systems
  • In text deception detection, researchers have
    been using GATE and WEKA
  • GATE is a general architecture for text
    engineering. (University of Sheffield)
  • WEKA is a collection of machine learning
    algorithms for data mining tasks. (University of
    Waikato)

7
GATE Screen shot
8
GATE Components as seen by user
  • Applications
      • A combination of language resources and
        processing resources
      • Have a run button
      • (Analogy: the user puts the food into a
        microwave, sets the program, and starts the
        microwave.)
  • Language resources
      • Text corpus
      • (Analogy: the uncooked food)
  • Processing resources
      • Sets of small Java files that compute text
        features and annotations
      • (Analogy: the programs on the microwave)
  • Data stores
      • Sets of documents
      • (Analogy: the refrigerator)

9
GATE Technical Components
  • The GATE system is based on an object-oriented
    data model. The three major components are:
  • GATE Document Manager (GDM) is a database that
    stores information concerning corpus texts. The
    database stores comments regarding the texts
    separately from the main text, with links back
    to the main text by way of character offsets.
  • Collection of Reusable Objects for Language
    Engineering (CREOLE) is a data library
    containing data resource wrappers or APIs that
    allow users to connect other programs to GATE.
    It does all the real work of analyzing texts.
    Its modules can also be created from scratch
    for GATE.
  • GATE Graphical Interface (GGI) is a graphical
    tool that allows users to view and edit the
    document collections managed by the GDM. It also
    allows users to view annotations in either the
    raw format or an annotation-specific format.
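
The offset-based linking described for the GDM can be illustrated with a minimal Python sketch. This is not GATE's API (GATE is a Java library); the `Document` and `Annotation` classes and the `category` feature are hypothetical names used only to show how annotations can live apart from the text and still point back into it.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A stand-off annotation: it points into the text instead of modifying it."""
    start: int   # offset of the first character covered
    end: int     # offset one past the last character covered
    kind: str    # illustrative label such as "Token"
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, start, end, kind, **features):
        # Reject offsets that do not describe a span inside the text.
        if not 0 <= start <= end <= len(self.text):
            raise ValueError("offsets fall outside the document text")
        ann = Annotation(start, end, kind, features)
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann):
        """Follow the offsets back to the span of the main text."""
        return self.text[ann.start:ann.end]


doc = Document("I never saw the report.")
pronoun = doc.annotate(0, 1, "Token", category="self_reference")
verb = doc.annotate(8, 11, "Token", category="verb")
print(doc.covered_text(pronoun), doc.covered_text(verb))
```

Because the text itself is never edited, several processing steps can add their own annotations to the same document without interfering with one another.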

10
Current set of cues (features)
  • We use the features specified by the previous
    year's project.
  • Content_Word_Diversity
  • Redundancy
  • Misspelled_Words
  • You_References
  • Self_References
  • Group_References
  • Other_References
  • Pleasantness
  • Activation
  • Imagery
  • Pos_Pleasantness_1dev
  • Pos_Activation_1dev
  • Pos_Imagery_1dev
  • Pos_Pleasantness_2dev
  • Pos_Activation_2dev
  • Pos_Imagery_2dev
  • Neg_Pleasantness_1dev
  • Neg_Activation_1dev
  • Neg_Imagery_1dev
  • conditionID
  • messageID
  • subjectID
  • partnerNumber
  • Lexical_Diversity
  • Emotiveness
  • Pausality
  • Word_Quantity
  • Verb_Quantity
  • Modifier_Quantity
  • Sentence_Quantity
  • passive_verb_ratio
  • modal_verb_ratio
  • Affect_Ratio
  • Sensory_Ratio
  • Temporal_Immediate_Ratio
  • Temporal_NonImmediate_Ratio
  • Spatial_Far_Ratio
  • Spatial_Close_Ratio
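
As a rough illustration of how a few of these cues might be computed, here is a minimal Python sketch. The slides do not define the cues, so every formula below is an assumption: Lexical_Diversity is taken as unique words over total words, Pausality as punctuation marks per sentence, and Self_References as a count over a small, invented pronoun list. The project's actual GATE processing resources may compute them differently.

```python
import re

# Assumed word list for Self_References; the project's actual list is not given.
SELF_WORDS = {"i", "me", "my", "mine", "myself"}


def compute_cues(text):
    """Compute a handful of cues under the assumed definitions above."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    punctuation = re.findall(r"[.,;:!?]", text)

    word_count = len(words)
    sentence_count = len(sentences)
    return {
        "Word_Quantity": word_count,
        "Sentence_Quantity": sentence_count,
        "Lexical_Diversity": len(set(words)) / word_count if word_count else 0.0,
        "Pausality": len(punctuation) / sentence_count if sentence_count else 0.0,
        "Self_References": sum(1 for w in words if w in SELF_WORDS),
    }


cues = compute_cues("I was home. I saw nothing, and I heard nothing.")
print(cues)
```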

11
GATE to WEKA
  • GATE needs the Weka Output plug-in processing
    resource to generate an ARFF file.
  • ARFF is the file format understood by WEKA.
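
To make the file format concrete, the following Python sketch writes a small ARFF file by hand. It does not reproduce the Weka Output plug-in; the `to_arff` helper, the relation name `deception_cues`, and the sample values are invented for illustration. Only the ARFF layout itself (`@relation`, `@attribute`, `@data`, comma-separated rows) follows the format WEKA reads.

```python
def to_arff(relation, feature_names, rows, class_values):
    """Render rows of (feature_values, class_label) as ARFF text."""
    lines = [f"@relation {relation}", ""]
    for name in feature_names:
        lines.append(f"@attribute {name} numeric")
    # The class is a nominal attribute listing its allowed values.
    lines.append("@attribute class {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for values, label in rows:
        if len(values) != len(feature_names):
            raise ValueError("row length does not match the attribute list")
        if label not in class_values:
            raise ValueError(f"unknown class label: {label}")
        lines.append(",".join(str(v) for v in values) + "," + label)
    return "\n".join(lines) + "\n"


# Invented sample values; not data from the project.
arff_text = to_arff(
    "deception_cues",
    ["Word_Quantity", "Lexical_Diversity", "Pausality"],
    [([10, 0.7, 1.5], "deceptive"), ([42, 0.81, 2.0], "truthful")],
    ["truthful", "deceptive"],
)
print(arff_text)
```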

12
WEKA Screen shot
13
WEKA Screen shot
14
WEKA
  • Machine Learning Algorithms
      • J48
      • MultilayerPerceptron
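
J48 is WEKA's implementation of the C4.5 decision tree learner, which chooses splits using entropy-based measures. The Python sketch below shows only the information-gain idea for a single numeric cue; it is not WEKA code, it omits the gain ratio and pruning that C4.5 also uses, and the cue values and labels are invented toy data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting on value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0
    total = len(labels)
    remainder = (len(left) / total) * entropy(left) \
        + (len(right) / total) * entropy(right)
    return entropy(labels) - remainder


# Invented toy values for one cue; not data from the project.
pausality = [1.0, 1.2, 1.5, 2.8, 3.0, 3.1]
labels = ["truthful", "truthful", "truthful",
          "deceptive", "deceptive", "deceptive"]

# Try each observed value (except the largest) as a candidate threshold.
best = max(pausality[:-1], key=lambda t: information_gain(pausality, labels, t))
print(best, information_gain(pausality, labels, best))
```

A tree learner repeats this search over all cues at each node and recurses on the resulting subsets.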

15
Agenda
  • Overview
  • Survey of Related Technologies and Systems
  • Text Deception Detection System Design
  • Demo
  • Discussion and Conclusion

16
Design Process
  • User requirements
  • An email with Dr. Burgoon
  • Preliminary GUI design
  • Study of the APIs
  • GATE and WEKA Javadoc
  • Design and code
  • Class diagram
  • Use-case diagram
  • Java code in JBuilder

17
GUI Design
[Figure: GUI layout with labeled regions: Run button, File list, Current Filename, Original Text, Cues, Features, Result]
18
High-level Technical Design
  • The user interacts with the GUI, which makes
    calls to the GATE and WEKA libraries.
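
The flow on this slide (documents in, cues extracted, a class label out) can be sketched in Python as follows. This is an assumption-laden outline, not TxtDetector's code: `run_pipeline`, the two cue extractors, and `toy_classifier` are hypothetical stand-ins for the GUI's calls into GATE (cue extraction) and WEKA (classification), and the classification rule is a placeholder rather than a trained model.

```python
def run_pipeline(documents, extractors, classify):
    """Apply every cue extractor to each document, then classify the cues."""
    results = {}
    for name, text in documents.items():
        cues = {cue_name: fn(text) for cue_name, fn in extractors.items()}
        results[name] = {"cues": cues, "label": classify(cues)}
    return results


# Hypothetical cue extractors; a new cue is added by registering a function.
extractors = {
    "Word_Quantity": lambda text: len(text.split()),
    "Self_References": lambda text: sum(
        w.strip(".,!?").lower() in {"i", "me", "my"} for w in text.split()
    ),
}


def toy_classifier(cues):
    # Placeholder rule standing in for a trained WEKA model; not a real model.
    return "deceptive" if cues["Self_References"] == 0 else "truthful"


results = run_pipeline(
    {"a.txt": "I left my keys at home.", "b.txt": "The keys were left at home."},
    extractors,
    toy_classifier,
)
for name, result in results.items():
    print(name, result["label"], result["cues"])
```

Keeping the extractors in a dictionary is one simple way to accept single or multiple documents and to add cues later without touching the pipeline itself.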

19
Extensibility
20
Use case diagram
21
Class Diagram
[Figure: class diagram with labeled groups: GATE lib, WEKA lib]
22
Agenda
  • Overview
  • Survey of Related Technologies and Systems
  • Text Deception Detection System Design
  • Demo
  • Discussion and Conclusion

23
Agenda
  • Overview
  • Survey of Related Technologies and Systems
  • Text Deception Detection System Design
  • Demo
  • Discussion and Conclusion

24
The advantages of this system
  • Integration of GATE and WEKA makes text deception
    detection easier for general users.
  • Extensibility
  • Intuitive GUI
  • Possible use: machine-bias experiment

25
Future Work
  • Additional functionality
  • Add modules for researchers to train and generate
    new models and configurations.
  • Add plug-in selection capabilities (add a
    drop-down menu)
  • Add graph, histogram, and multi-file views
  • Visualize the connection between cues and
    features
  • Provide standard Help documentation to educate
    the user on text deception.
  • Add a feedback module. (We may be able to collect
    more data to improve the model, i.e.
    unsupervised learning.)
  • Perform proper testing and user evaluation

Note: As this is a prototype, certain file paths
are currently hard-coded.