Clustering Algorithms for Perceptual Image Hashing - PowerPoint PPT Presentation

Transcript and Presenter's Notes



1
Clustering Algorithms for Perceptual Image Hashing
IEEE Eleventh DSP Workshop, August 3rd 2004
Vishal Monga, Arindam Banerjee, and Brian L. Evans
{vishal, abanerje, bevans}@ece.utexas.edu
Embedded Signal Processing Laboratory
Dept. of Electrical and Computer Engineering
The University of Texas at Austin
http://signal.ece.utexas.edu
Research supported by a gift from the Xerox Foundation
2
Hash Example
  • Hash function: projects a value from a set with a large (possibly infinite) number of members to a set with a fixed (smaller) number of members
  • Irreversible
  • Provides a short, simple representation of a large digital message
  • Example: sum of ASCII codes for the characters in a name, modulo N, a prime number (N = 7); a toy sketch follows below

Database name search example
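A minimal Python sketch of this toy hash, assuming nothing beyond what the slide states (the names and the printed buckets are illustrative only, not from the slides):

```python
# Toy hash from this slide: sum of the ASCII codes of the characters in a
# name, reduced modulo a small prime N (the slide uses N = 7).

N = 7  # prime modulus from the slide example

def name_hash(name: str) -> int:
    """Map a name to one of N buckets via the sum of its ASCII codes."""
    return sum(ord(ch) for ch in name) % N

# Database name search example: group names by bucket, so a lookup only
# needs to compare against the names in the matching bucket.
# (The names below are illustrative.)
names = ["Alice", "Bob", "Carol"]
buckets = {}
for n in names:
    buckets.setdefault(name_hash(n), []).append(n)
print(buckets)
```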
3
Perceptual Hash: Desirable Properties
  • Perceptual robustness
  • Fragility to distinct inputs
  • Randomization
  • Necessary in security applications to minimize vulnerability against malicious attacks

4
Hashing Framework
  • Two-stage hash algorithm
  • Goal: retain perceptual significance
  • Let (li, lj) denote vectors in the metric space of feature vectors V, with 0 < ε < δ; then the desired property (sketched below) is that perceptually close vectors map to the same hash value while perceptually distant ones map to different values
  • Minimizing the average distance between clusters is inappropriate
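The formula itself is an image in the original slides; a plausible reading of the desired property, consistent with the ε/δ setup above (an assumed reconstruction, not a verbatim transcription), is:

```latex
% D(.,.) is the metric on V; C(l) is the cluster (hash value) assigned to l.
% Assumed reading of the slide's desired property:
\begin{align*}
  D(l_i, l_j) < \varepsilon &\;\Rightarrow\; C(l_i) = C(l_j)   \\
  D(l_i, l_j) > \delta      &\;\Rightarrow\; C(l_i) \neq C(l_j)
\end{align*}
```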

5
Cost Function for Feature Vector Compression
  • Define joint cost matrices C1 and C2 (n × n)
  • n = total number of vectors to be clustered; C(li), C(lj) denote the clusters that these vectors are mapped to
  • Exponential cost
  • Ensures a severe penalty if feature vectors that are far apart (perceptually distinct) are clustered together

α > 0, Γ > 1 are algorithm parameters
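The cost expressions themselves are images in the original deck; the following is only a hedged sketch of an exponential cost of the kind described, with the gating by δ and the exact functional form assumed rather than taken from the paper:

```python
# Hedged sketch: an n x n joint cost that penalizes pairs of feature vectors
# which are far apart (D > delta, i.e. perceptually distinct) yet mapped to
# the same cluster. The exponential term alpha * gamma**D grows sharply with
# distance; alpha > 0 and gamma > 1 are the algorithm parameters named above.
# The exact expression on the slide may differ.

def joint_cost_c1(features, labels, dist, delta, alpha=0.5, gamma=2.0):
    n = len(features)
    c1 = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = dist(features[i], features[j])
            if d > delta and labels[i] == labels[j]:  # distinct vectors, same cluster
                c1[i][j] = alpha * gamma ** d
    return c1

# C2 (not shown) would symmetrically penalize close vectors (D < epsilon)
# that end up in different clusters.
```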
6
Cost Function for Feature Vector Compression
  • Define S1 as
  • S2 is defined similarly
  • Normalize by S1, S2 to get the normalized cost matrices
  • Then, minimize the expected cost
  • p(i) = p(li), p(j) = p(lj)

7
Basic Clustering Algorithm
  • 1. Obtain ε, δ; set k = 1. Select the data point with the highest probability mass and label it l1
  • 2. Form the first cluster by including all unclustered points lj such that D(l1, lj) < ε/2
  • 3. k = k + 1. Select the highest-probability data point lk among the unclustered points such that
  • where S is any cluster and C is the set of clusters formed up to this step
  • 4. Form the kth cluster Sk by including all unclustered points lj such that D(lk, lj) < ε/2
  • 5. Repeat steps 3-4 until no more clusters can be formed (a sketch follows below)
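A hedged Python sketch of this stage follows. The selection condition in step 3 is an equation image in the original deck; here it is replaced by an assumed rule (a new center must be more than 3ε/2 from every already-clustered point, which by the triangle inequality gives the inter-cluster separation stated on the next slide). The names and that rule are assumptions, not the paper's exact condition:

```python
# Sketch of the basic clustering stage (assumed details noted above).
# points: list of feature vectors; probs: probability mass of each point;
# dist: metric D on the feature space; eps: the epsilon threshold.

def basic_clustering(points, probs, dist, eps):
    unclustered = set(range(len(points)))
    clusters = []

    def admissible(i):
        # Assumed stand-in for the step-3 condition: keep new centers more
        # than 3*eps/2 from every clustered point, so member-to-member
        # distances across clusters exceed eps.
        return all(dist(points[i], points[j]) > 1.5 * eps
                   for cluster in clusters for j in cluster)

    while True:
        candidates = [i for i in unclustered if admissible(i)]
        if not candidates:
            break  # step 5: no more clusters can be formed
        center = max(candidates, key=lambda i: probs[i])  # highest probability mass
        members = {j for j in unclustered
                   if dist(points[center], points[j]) < eps / 2}
        members.add(center)
        clusters.append(members)
        unclustered -= members

    # Points still unclustered here are handled by Approach 1 or 2
    # on the later slides.
    return clusters, unclustered
```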

8
Observations
  • For any (li, lj) in cluster Sk
  • No errors up to this stage of the algorithm
  • Each cluster is at least ε away from any other cluster
  • Within each cluster, the maximum distance between any two points is at most ε

9
Approach 1
  • 1. Select the data point l among the unclustered data points that has the highest probability mass
  • 2. For each existing cluster Si, i = 1, 2, …, k, compute
  • Let S(δ) = {Si such that di < δ}
  • 3. IF S(δ) = ∅ THEN k = k + 1 and Sk = {l} is a cluster of its own
  • ELSE for each Si in S(δ) define
  • where S̄i denotes the complement of Si, i.e., all clusters in S(δ) except Si. Then, l is assigned to the cluster S* = arg min F(Si)
  • 4. Repeat steps 1 through 3 until all data points are exhausted

10
Approach 2
  • 1. Select the data point l among the unclustered data points that has the highest probability mass
  • 2. For each existing cluster Si, i = 1, 2, …, k, define
  • and β lies in [1/2, 1]
  • Here, S̄i denotes the complement of Si, i.e., all existing clusters except Si. Then, l is assigned to the cluster S* = arg min F(Si)
  • 3. Repeat steps 1 and 2 until all data points are exhausted

11
Summary
  • Approach 1
  • Tries to minimize one normalized expected cost conditioned on the other being 0
  • Approach 2
  • Smoothly trades off the minimization of the two normalized expected costs via the parameter β
  • β = 1/2 ⇒ joint minimization
  • β = 1 ⇒ exclusive minimization of one of the costs
  • Final hash length determined automatically!
  • Given by ⌈log2 k⌉ bits, where k is the number of clusters formed
  • Proposed clustering can compress feature vectors in any metric space, e.g. Euclidean, Hamming, and Levenshtein (example metrics below)
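Since the compression stage only assumes a metric D on the feature space, any such distance can be plugged into a clustering sketch like the hypothetical basic_clustering shown earlier; for instance (illustrative code, not from the slides):

```python
# Example metrics for the feature space V. Hamming distance suits the binary
# feature vectors on the next slide; Euclidean suits real-valued features.

def hamming(x, y):
    """Hamming distance between two equal-length bit sequences (e.g. 240-bit features)."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def euclidean(x, y):
    """Euclidean distance between real-valued feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5
```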

12
Clustering Results
  • Compress binary feature vectors of L = 240 bits
  • Final hash length: 46 bits, with Approach 2, β = 1/2
  • Value of the cost function is orders of magnitude lower for the proposed clustering

13
Conclusion & Future Work
  • Two-stage framework for image hashing
  • Feature extraction followed by feature vector
    compression
  • Second stage is media independent
  • Clustering algorithms for compression
  • Novel cost function for hashing applications
  • Applicable to feature vectors in any metric space
  • Trade-offs facilitated between robustness and
    fragility
  • Final hash length determined automatically
  • Future work
  • Randomized clustering for secure hashing
  • Information theoretically secure hashing