Long Text Keystroke Biometrics Study

Slides: 23

Provided by: cspc3

Learn more at: http://csis.pace.edu

Transcript and Presenter's Notes

1
Long Text Keystroke Biometrics Study

Gary Bartolacci, Mary Curtin, Marc Katzenberg,
Ngozi Nwana
Sung-Hyuk Cha, Charles Tappert
(Software Engineering Project Team DPS Student)

2
Keystroke Biometric

Biometrics important for security apps
Advantage - inexpensive and easy to implement,
the only hardware needed is a keyboard
Disadvantage - behavioral rather than
physiological biometric, easy to disguise
One of the least studied biometrics, thus good
for dissertation studies

3
Focus of Study

Previous studies mostly concerned with short
character string input
Password hardening
Short name strings
We focus on large text input
200 or more characters per sample

4
Focus of Study (cont)

Applications of interest
Identification
1-of-n classification problem
e.g., sender of inappropriate e-mail in a
business environment with a limited number of
employees
Verification
Binary classification problem, yes/no
e.g., student taking online exam

5
Software Components

Raw Keystroke Data Capture over the Internet
(Java applet)
Feature Extraction (SAS software)
Classification (SAS software)
Training
Testing

6
Keystroke Data Capture (Java Applet)

Raw data recorded for each entry
Key's character
Key's code text equivalent
Key's location on keyboard
1 = standard, 2 = left, 3 = right
Time key was pressed (msec)
Time key was released (msec)
Number of left, right, double mouse clicks
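
The raw record listed above can be pictured as a small data structure. This is only a sketch in Python, not the applet's actual Java code: the class names, field names, and the `transition_ms` helper are hypothetical, and defining a key transition as the gap from one key's release to the next key's press is an assumption, since the slides do not define it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyEvent:
    """One raw keystroke entry (field names are illustrative)."""
    char: str          # key's character
    code_text: str     # key's code text equivalent
    location: int      # 1 = standard, 2 = left, 3 = right
    press_ms: int      # time key was pressed (msec)
    release_ms: int    # time key was released (msec)

    @property
    def duration_ms(self) -> int:
        """How long the key was held down."""
        return self.release_ms - self.press_ms


@dataclass
class Sample:
    """One typed sample; storing click counts per sample is an assumption."""
    events: List[KeyEvent] = field(default_factory=list)
    left_clicks: int = 0
    right_clicks: int = 0
    double_clicks: int = 0


def transition_ms(first: KeyEvent, second: KeyEvent) -> int:
    """Assumed definition: release of the first key to press of the second."""
    return second.press_ms - first.release_ms


h = KeyEvent("h", "H", 1, press_ms=0, release_ms=90)
e = KeyEvent("e", "E", 1, press_ms=150, release_ms=230)
sample = Sample(events=[h, e])
print(h.duration_ms, transition_ms(h, e))
```

A transition can be negative under this definition when the next key is pressed before the previous one is released.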

7
Keystroke Data Capture (Java Applet)
8
Aligned Raw Data File (Hello World!)
9
Feature Extraction

10 means and 10 standard deviations of key press durations
8 most frequent alphabet letters (e, a, r, i, o,
t, n, s)
Space and shift keys
10 means and 10 standard deviations of key transitions
8 most common digrams (in, th, ti, on, an, he,
al, er)
Space-to-any-letter and any-letter-to-space
18 other measurements: total number of keypresses for
space, backspace, delete, insert, home, end,
enter, ctrl, 4 arrow keys, shift (left), shift
(right); total entry time; left, right, and double
mouse clicks
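
As a rough illustration of the duration and transition features, the sketch below computes means and standard deviations for the 8 letters and 8 digrams only. The study did this step in SAS, so this Python is not the original code. The function names and feature keys are hypothetical, the transition is assumed to run from release of the first key to press of the second, and the population standard deviation is an arbitrary choice.

```python
from statistics import mean, pstdev

LETTERS = ["e", "a", "r", "i", "o", "t", "n", "s"]
DIGRAMS = ["in", "th", "ti", "on", "an", "he", "al", "er"]


def mean_std(values):
    """Return (mean, std) of a list, or (0.0, 0.0) when it is empty."""
    if not values:
        return 0.0, 0.0
    return mean(values), pstdev(values)


def extract_features(events):
    """events: list of (char, press_ms, release_ms) in typing order."""
    durations = {c: [] for c in LETTERS}
    transitions = {d: [] for d in DIGRAMS}
    for i, (char, press, release) in enumerate(events):
        c = char.lower()
        if c in durations:
            durations[c].append(release - press)
        if i + 1 < len(events):
            next_char, next_press, _ = events[i + 1]
            pair = c + next_char.lower()
            if pair in transitions:
                # Assumed transition: release of first key to press of second
                transitions[pair].append(next_press - release)
    features = {}
    for c in LETTERS:
        features[f"dur_mean_{c}"], features[f"dur_std_{c}"] = mean_std(durations[c])
    for d in DIGRAMS:
        features[f"trans_mean_{d}"], features[f"trans_std_{d}"] = mean_std(transitions[d])
    return features


events = [("t", 0, 80), ("h", 100, 170), ("e", 200, 300), ("e", 400, 480)]
features = extract_features(events)
print(features["dur_mean_e"], features["trans_mean_th"])
```

The space and shift features and the 18 count features would be added the same way; they are left out here to keep the sketch short.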

10
Feature Extraction Preprocessing

Outlier removal
Remove samples > 2 std from mean
Prevents skewing of feature measurements caused
by pausing of the keystroker
Standardization
x' = (x - xmin) / (xmax - xmin)
Scales to range 0-1 to give roughly equal weight
to each feature
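
The two preprocessing steps can be sketched as follows. This is an illustration in Python, not the SAS code used in the study. It assumes that outlier removal applies to the individual timing measurements behind one feature (the slide does not say), and the function names and the parameter `k` are hypothetical.

```python
from statistics import mean, pstdev


def remove_outliers(values, k=2.0):
    """Drop measurements more than k standard deviations from the mean."""
    if len(values) < 2:
        return list(values)
    m, s = mean(values), pstdev(values)
    if s == 0:
        return list(values)
    return [v for v in values if abs(v - m) <= k * s]


def min_max_scale(column):
    """Apply x' = (x - xmin) / (xmax - xmin) to one feature column."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature carries no information; map it to 0
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# A long pause (1000 msec) among otherwise typical key press durations
durations = [100, 110, 90, 105, 95, 1000]
cleaned = remove_outliers(durations)
scaled = min_max_scale([10, 20, 30])
print(cleaned, scaled)
```

Here xmin and xmax are taken over one feature across all samples, so every feature ends up in the range 0-1.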

11
Sample Datasets
Prior to Standardization
After Standardization
12
Classification

Identification
Nearest neighbor classifier using Euclidean
distance
Input sample compared to every training sample
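
A minimal nearest neighbor sketch, assuming each sample is already a list of standardized feature values. The study ran classification in SAS; the function names and the subject labels below are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(sample, training):
    """Compare the input sample to every training sample.

    training: list of (label, feature_vector) pairs.
    Returns the label and distance of the closest training sample.
    """
    best_label, best_dist = None, math.inf
    for label, vector in training:
        d = euclidean(sample, vector)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label, best_dist


training = [
    ("subject_1", [0.1, 0.2]),
    ("subject_2", [0.9, 0.8]),
]
label, dist = nearest_neighbor([0.2, 0.2], training)
print(label)
```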

13
Experimental Design: Identification Experiment

8 subjects who know the purpose of the experiment
Training: 10 reps of text a (approx. 600 characters)
Testing:
10 reps of text a
10 reps of text b (same length as text a)
10 reps of text c (half length of text a)

14
Experimental Design: Instructions for Subjects

Subjects were told to input the data using their
normal keystroke dynamics
Subjects were asked to leave at least a day between
entering samples

15
Experimental Design: Text a (about 600 characters)

This is an Aesop fable about the bat and the
weasels. A bat who fell upon the ground and was
caught by a weasel pleaded to be spared his life.
The weasel refused, saying that he was by nature
the enemy of all birds. The bat assured him that
he was not a bird, but a mouse, and thus was set
free. Shortly afterwards the bat again fell to
the ground and was caught by another weasel, whom
he likewise entreated not to eat him. The weasel
said that he had a special hostility to mice. The
bat assured him that he was not a mouse, but a
bat, and thus a second time escaped. The moral of
the story: it is wise to turn circumstances to
good account.

16
Expected Outcomes: Recognition Accuracy

Accuracy on text a > that on text b
text a is the training text
Accuracy on text b > that on text c
text b is longer than text c
Accuracy on texts a, b, c > that on arbitrary text
texts a, b, c are similar, all Aesop fables

17
Preliminary Results: Reduced Experiment

Reduced identification experiment
Smaller text input
The quick brown fox jumps over the lazy dog.
Fewer subjects
Three project team members
Fewer feature measurements
Mean and std for e and o key press durations
Accuracy of 80%, which is promising

18
Results: Comparison to Same Text

Prior to standardization: only 59% accuracy
With standardization: 100% accuracy
(76 out of 76)
[Figure: confusion matrix of results after standardization; axes: Predicted, Actual]

19
Results: Comparison to Different Text of Equal
Length

Prior to standardization: only 38% accuracy
With standardization: 98.5% accuracy
(65 out of 66)
[Figure: confusion matrix of results after standardization; axes: Predicted, Actual]

20
Results: Comparison to Different Text of Shorter
Length

Prior to standardization: only 14% accuracy
With standardization: 97% accuracy
(74 out of 76)
[Figure: confusion matrix of results after standardization; axes: Predicted, Actual]
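
The accuracies after standardization follow directly from the correct/total counts on the three results slides. The short check below uses only those reported counts; the function name and dictionary keys are illustrative.

```python
def accuracy_pct(correct, total):
    """Recognition accuracy as a percentage."""
    return 100.0 * correct / total


# (correct, total) counts as reported on the results slides
reported = {
    "same text": (76, 76),
    "different text, equal length": (65, 66),
    "different text, shorter length": (74, 76),
}

for name, (correct, total) in reported.items():
    print(f"{name}: {correct}/{total} = {accuracy_pct(correct, total):.1f}%")
```
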
21
Conclusions

System is a viable means of differentiating
between individuals based on typing patterns
Standardization is crucial to the accuracy of the
system
It is likely that the shorter the text used for
verification, the lower the accuracy
Decreasing the number of feature measurements
also decreases accuracy