Title: Index of Coincidence


1
Index of Coincidence
  • Meghan Emilio
  • Professor Ralph Morelli
  • February 18, 2004

2
Introduction
  • Index of Coincidence (IC) is a statistical
    measure of text which distinguishes text
    encrypted with a polyalphabetic substitution
    cipher from plain text.
  • IC was introduced by William Friedman in The
    Index of Coincidence and Its Applications in
    Cryptography (1920).
  • It has been called "the most important single
    publication in cryptology."

3
The Idea
  • IC is defined to be the probability that two
    randomly selected letters will be identical.
  • Imagine a hat filled with the 26 letters of the
    alphabet. The chance of pulling out an A is
    1/26.
  • If we had two such hats, the probability of
    pulling out two As simultaneously is
    (1/26)(1/26).
  • The chance of drawing any pair of identical
    letters is
  • 26(1/26)(1/26) = 1/26 ≈ 0.0385
  • So the IC of an evenly distributed set of letters
    is 0.0385.
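
The two-hat calculation can be checked with a minimal Python sketch. The function name expected_ic is an illustrative choice, not something from the slides; it assumes the two draws are independent, one from each hat.

```python
from fractions import Fraction


def expected_ic(probabilities):
    """Probability that two independent draws give the same letter.

    `probabilities` holds one probability per letter (hypothetical helper,
    named here for illustration only).
    """
    return sum(p * p for p in probabilities)


# 26 equally likely letters, as in the hat example.
uniform = [Fraction(1, 26)] * 26
ic_uniform = expected_ic(uniform)
print(ic_uniform, float(ic_uniform))
```

Exact fractions are used so that the result can be compared with 1/26 without floating-point rounding.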

4
The Idea (cont.)
  • Suppose we fill each hat with 100 letters, with
    the number of each letter corresponding to the
    average frequency of that letter in the English
    language
  • (i.e. 8 As, 3 Cs, 13 Es, etc.).
  • The chance of drawing any pair of identical
    letters is (8/100)(8/100) + (3/100)(3/100) +
    (13/100)(13/100) + ... ≈ 0.0667
  • This is the IC for English.
  • Every language has such an IC, for example:
  • Russian 0.0529
  • German 0.0762
  • Spanish 0.0775
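
Written out as a sum, the weighted-hat calculation takes the following form. The symbol n_i (the number of copies of the ith letter in each 100-letter hat) is introduced here for illustration and does not appear on the slides.

```latex
\[
\mathrm{IC} \;=\; \sum_{i=0}^{25} \left(\frac{n_i}{100}\right)^{2}
\;=\; \left(\frac{8}{100}\right)^{2} + \left(\frac{3}{100}\right)^{2}
      + \left(\frac{13}{100}\right)^{2} + \cdots
\]
```

The three terms shown (A, C and E) contribute 0.0064 + 0.0009 + 0.0169 = 0.0242. The remaining 23 letters make up the rest of the total.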

5
Calculating the IC
  • The formula used to calculate IC:
  • IC = Σ f_i (f_i - 1) / (N (N - 1))
  • where the sum runs over 0 ≤ i ≤ 25,
  • f_i is the frequency (number of occurrences) of
    the ith letter of the alphabet in the sample,
  • and N is the number of letters in the sample
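
A minimal sketch of this formula in Python follows. It assumes that only the letters A to Z count toward N and that case is ignored. The function name index_of_coincidence is illustrative.

```python
from collections import Counter


def index_of_coincidence(text):
    """IC = sum(f_i * (f_i - 1)) / (N * (N - 1)) over the letters A-Z."""
    # Keep only letters; spaces, digits and punctuation are ignored.
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters to compute an IC")
    counts = Counter(letters)  # f_i for each letter that occurs
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))
```

Letters that occur only once contribute f(f-1) = 0, so only repeated letters raise the IC.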

6
Example
  • The IC of the text THE INDEX OF COINCIDENCE would
    be given by
  • c(3×2) + d(2×1) + e(4×3) + f(1×0) + h(1×0) + i(3×2)
    + n(3×2) + o(2×1) + t(1×0) + x(1×0) = 34
  • divided by N(N-1) = 21×20 = 420
  • which gives us an IC of 34/420 ≈ 0.0810
  • The IC of the text BMQVSZFPJTCSSWGWVJLIO would be
    given by
  • b(1×0) + c(1×0) + f(1×0) + g(1×0) + i(1×0) + j(2×1)
    + l(1×0) + m(1×0) + o(1×0) + p(1×0) + q(1×0) + s(3×2)
    + t(1×0) + v(2×1) + w(2×1) + z(1×0) = 12
  • divided by N(N-1) = 21×20 = 420
  • which gives us an IC of 12/420 ≈ 0.0286
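
The per-letter terms in both examples can be reproduced with a short Python sketch. The helper name ic_breakdown is hypothetical, and it assumes that spaces and other non-letters are skipped.

```python
from collections import Counter


def ic_breakdown(text):
    """Return (per-letter f*(f-1) terms, numerator, denominator N*(N-1))."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    terms = {letter: f * (f - 1)
             for letter, f in sorted(Counter(letters).items())}
    return terms, sum(terms.values()), n * (n - 1)


for sample in ("THE INDEX OF COINCIDENCE", "BMQVSZFPJTCSSWGWVJLIO"):
    terms, numerator, denominator = ic_breakdown(sample)
    print(sample)
    print("  terms:", terms)
    print(f"  IC = {numerator}/{denominator} = {numerator / denominator:.4f}")
```

Both samples have 21 letters, so they share the denominator 420 and differ only in how often letters repeat.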

7
How is this helpful?
  • IC can be used to test whether text is plain text
    or polyalphabetic cipher text.
  • Text encrypted with a polyalphabetic substitution
    cipher would have an IC closer to 0.0385, since
    the letter frequencies would be closer to random.
  • A simple (monoalphabetic) substitution only
    relabels the letters, so it leaves the IC
    unchanged.
  • English plaintext would have an IC closer to
    0.0667.
  • This measure allows computers to score possible
    decryptions effectively.
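
One way a program might use the IC to score candidate decryptions is sketched below. The two reference values are the ones quoted in the slides. The names english_score and looks_like_english, and the nearest-value rule, are assumptions made for illustration rather than a method given in the presentation.

```python
from collections import Counter

ENGLISH_IC = 0.0667  # IC for English, as quoted in the slides
RANDOM_IC = 0.0385   # IC for evenly distributed letters (1/26)


def index_of_coincidence(text):
    """IC = sum(f_i * (f_i - 1)) / (N * (N - 1)) over the letters A-Z."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    n = len(letters)
    if n < 2:
        raise ValueError("need at least two letters to compute an IC")
    counts = Counter(letters)
    return sum(f * (f - 1) for f in counts.values()) / (n * (n - 1))


def english_score(text):
    """Distance of the text's IC from the English value (smaller is better)."""
    return abs(index_of_coincidence(text) - ENGLISH_IC)


def looks_like_english(text):
    """True if the IC is nearer the English value than the random value."""
    ic = index_of_coincidence(text)
    return abs(ic - ENGLISH_IC) < abs(ic - RANDOM_IC)


candidates = ["BMQVSZFPJTCSSWGWVJLIO", "THE INDEX OF COINCIDENCE"]
best = min(candidates, key=english_score)
print(best)
```

For texts as short as these the IC is noisy, so a score like this is more dependable on longer samples.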

8
References
  • Kahn, David. The Codebreakers. The Macmillan
    Company, New York, 1967.
  • http://raphael.math.uic.edu/jeremy/crypt/coincidence.html
  • http://codebook.org/node26.html
  • http://members.fortunecity.com/jpeschel/gillog1.htm
  • http://mywebpages.comcast.net/erfarmer201/vigenere/

9
Questions?