Mining Discrete Patterns via Binary Matrix Factorization

1
Mining Discrete Patterns via Binary Matrix Factorization
  • Jieping Ye
  • Arizona State University

Joint work with Baohong Shen and Shuiwang Ji
2
Rank-One Binary Matrix Factorization
compression, clustering, pattern discovery
10110101110110
01110000000110
00110101110110
00110101110110
00110101110111
00000111101010
indicator vector
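To make the role of the indicator vector concrete, here is a minimal Python sketch. It assumes the bit strings above are six 14-bit rows of a binary matrix. The names `A`, `pattern`, `indicator`, and `rank_one_error`, and the particular pattern chosen, are illustrative assumptions rather than values from the slides.

```python
# Rows read from the bit strings on the slide, assuming 14 bits per row.
A = [
    "10110101110110",
    "01110000000110",
    "00110101110110",
    "00110101110110",
    "00110101110111",
    "00000111101010",
]
A = [[int(bit) for bit in row] for row in A]

# Hypothetical choice: take the third row as the pattern and flag the rows
# that resemble it in the indicator vector.
pattern = A[2]
indicator = [1, 0, 1, 1, 1, 0]


def rank_one_error(A, indicator, pattern):
    """Count entries where A differs from the outer product indicator * pattern^T."""
    return sum(
        a != x * y
        for row, x in zip(A, indicator)
        for a, y in zip(row, pattern)
    )


print(rank_one_error(A, indicator, pattern))
```

Rows flagged with 1 are approximated by the pattern and rows flagged with 0 by an all-zero row, so the whole matrix is summarized by two binary vectors.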
3
Application I: Image Compression
4
Application II: Hierarchy Construction
An example of a tree for 45 images from stage range 4-6, built by our algorithm
5
Application III: Pattern Discovery
M. Koyuturk, A. Grama, and N. Ramakrishnan,
Compression, clustering and pattern discovery in
very high dimensional discrete-attribute
datasets, IEEE TKDE, 2005.
6
Binary Rank-One Approximation: Problem Formulation
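The formula on this slide did not survive extraction. The standard statement of binary rank-one approximation is sketched below. The symbols $A$, $x^{(1)}$, $x^{(2)}$, $m$, and $n$ are notation assumed here, not recovered from the slide.

```latex
\min_{x^{(1)} \in \{0,1\}^{m},\; x^{(2)} \in \{0,1\}^{n}}
  \left\| A - x^{(1)} \left( x^{(2)} \right)^{T} \right\|_F^2,
\qquad A \in \{0,1\}^{m \times n}.
```

Because every entry is binary, this objective simply counts the entries in which $A$ and the rank-one matrix disagree.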
7
Binary Rank-One Approximation: Challenges
  • Can we compute an approximate solution with a
    guaranteed error bound?
  • Can we compute it efficiently?
  • Conjectured to be NP-Hard.
  • The existing approach is based on iterative
    updating:
  • Koyutürk, M., Grama, A. PROXIMUS: A framework
    for analyzing very high dimensional
    discrete-attributed datasets. KDD'03.
  • It is a heuristic, with no known guarantees on
    approximation errors.
  • It very often results in undesirable rank-one
    approximations.
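The iterative-updating idea can be sketched as a simple alternating scheme: fix one binary vector, choose the other optimally, and repeat. This is a minimal illustration under that assumption, not the PROXIMUS implementation. The function names and the example matrix are hypothetical.

```python
def approximation_error(A, x, y):
    """Number of entries where A differs from the outer product x * y^T."""
    return sum(
        (A[i][j] - x[i] * y[j]) ** 2
        for i in range(len(A))
        for j in range(len(A[0]))
    )


def best_response(lines, fixed):
    """Optimal binary vector for one side while the other side is fixed.

    Setting an entry to 1 lowers the error exactly when the line overlaps
    the fixed vector in more than half of the fixed vector's ones.
    """
    norm = sum(fixed)
    return [
        1 if 2 * sum(a * f for a, f in zip(line, fixed)) > norm else 0
        for line in lines
    ]


def alternating_rank_one(A, y, max_iters=100):
    """Alternate optimal updates of x (rows) and y (columns) until stable."""
    columns = [list(col) for col in zip(*A)]
    x = best_response(A, y)
    for _ in range(max_iters):
        new_y = best_response(columns, x)
        new_x = best_response(A, new_y)
        if new_x == x and new_y == y:
            break
        x, y = new_x, new_y
    return x, y


A = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
x, y = alternating_rank_one(A, y=[1, 0, 0])
print(x, y, approximation_error(A, x, y))
```

Each update can only keep or lower the error, so the loop stops at a local optimum that depends on the starting vector. That dependence is why no guarantee on the final error follows from the scheme itself.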

8
Regularized Binary Rank-One Approximation
9
Equivalent Reformulation
Maximum Weight Problem (MWP)
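The reformulation itself is not captured in the transcript. For the unregularized objective, a short derivation shows why a maximum-weight form arises. The weight matrix $U = 2A - \mathbf{1}\mathbf{1}^{T}$ below is what this derivation yields; the regularized weights used on the slides may differ, and the symbols $x^{(1)}$, $x^{(2)}$ are assumed notation for the two binary vectors.

```latex
\begin{aligned}
\left\| A - x^{(1)} (x^{(2)})^{T} \right\|_F^2
  &= \sum_{i,j} \left( a_{i,j} - x^{(1)}_i x^{(2)}_j \right)^2 \\
  &= \sum_{i,j} a_{i,j}
     - 2 \sum_{i,j} a_{i,j}\, x^{(1)}_i x^{(2)}_j
     + \sum_{i,j} x^{(1)}_i x^{(2)}_j
     \qquad (\text{binary entries satisfy } t^2 = t) \\
  &= \|A\|_F^2 - \sum_{i,j} \left( 2 a_{i,j} - 1 \right) x^{(1)}_i x^{(2)}_j \\
  &= \|A\|_F^2 - (x^{(1)})^{T} U \, x^{(2)},
     \qquad U = 2A - \mathbf{1}\mathbf{1}^{T}.
\end{aligned}
```

Since $\|A\|_F^2$ is constant, minimizing the approximation error is the same as maximizing $(x^{(1)})^{T} U x^{(2)}$ over binary vectors.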
10
Our Main Contributions
  • An exact formulation for MWP, using integer
    linear programming.
  • A formulation for error-bounded approximation,
    using integer linear programming.
  • The proof of an error bound.
  • Efficient algorithms to solve the error-bounded
    approximation.

11
Overview
  • This is the first polynomial time algorithm that
    computes an approximate solution with a
    guaranteed error bound.

Binary Rank-One Matrix Approximation
  -> (reformulation) Maximum Weight Problem (MWP)
  • This is the first work that explicitly connects
    binary matrix factorization and minimum s-t cut.

12
Formulation for Exact Solutions
Original formulation
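  • If $x^{(1)}_i = x^{(2)}_j = 1$, then $z_{i,j} \le 1$.
  • $U_{i,j} > 0 \Rightarrow z_{i,j} = 1$.
  • If one of $x^{(1)}_i$ and $x^{(2)}_j$ is 0, then $z_{i,j} \le 0.5$.
  • $z_{i,j}$ is an integer $\Rightarrow z_{i,j} = 0$.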
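The reasoning in these bullets can be checked by brute force on a tiny instance. The sketch below assumes the positive-weight constraint has the form 2·z ≤ x1_i + x2_j, which is consistent with the bullets but not shown in the transcript. The negative-weight constraint z ≥ x1_i + x2_j − 1 is the standard linearization and is also an assumption. All function names are hypothetical.

```python
from itertools import product


def mwp_value(U, x1, x2):
    """Bilinear objective: sum of U[i][j] * x1[i] * x2[j]."""
    return sum(
        U[i][j] * x1[i] * x2[j]
        for i in range(len(x1))
        for j in range(len(x2))
    )


def linearized_value(U, x1, x2):
    """Best value of sum U[i][j] * z[i][j] over z in {0, 1} for fixed x1, x2."""
    total = 0
    for i, j in product(range(len(x1)), range(len(x2))):
        u = U[i][j]
        feasible = [
            z
            for z in (0, 1)
            # Assumed form for positive weights: 2 * z <= x1_i + x2_j.
            if (u <= 0 or 2 * z <= x1[i] + x2[j])
            # Assumed (standard) form for negative weights: z >= x1_i + x2_j - 1.
            and (u >= 0 or z >= x1[i] + x2[j] - 1)
        ]
        total += max(u * z for z in feasible)
    return total


def solve_mwp_brute_force(U):
    """Enumerate all binary x1, x2; only usable for very small U."""
    m, n = len(U), len(U[0])
    return max(
        (mwp_value(U, x1, x2), x1, x2)
        for x1 in product((0, 1), repeat=m)
        for x2 in product((0, 1), repeat=n)
    )


# Hypothetical weights: 2A - 1 for A = [[1, 1, 0], [1, 1, 0], [0, 0, 1]].
U = [[1, 1, -1], [1, 1, -1], [-1, -1, 1]]
matches = all(
    linearized_value(U, x1, x2) == mwp_value(U, x1, x2)
    for x1 in product((0, 1), repeat=3)
    for x2 in product((0, 1), repeat=3)
)
print(matches, solve_mwp_brute_force(U))
```

With integer z, the linear constraints force z to equal the product x1_i · x2_j wherever it matters for the objective, which is what makes the integer linear program exact.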

13
Formulation for Approximate Solutions I
14
Formulation for Approximate Solutions II
Proposition: The objective value of ILP2 is no
less than that of ILP1 for the same problem
instance.
15
Approximation Error
  • ILP2 achieves an error-bounded approximation.

16
Linear Programming Relaxation of ILP2
  • Proposition: The coefficient matrix of the
    constraints in ILP2 is totally unimodular.
  • I. Heller and C. B. Tompkins. An extension of a
    theorem of Dantzig's. Ann. of Math. Stud.,
    no. 38, pages 247-254, 1956.
  • We can obtain an exact solution of ILP2 by
    solving its LP relaxation.
  • LP is still computationally expensive for a large
    matrix A.
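Total unimodularity means every square submatrix has determinant -1, 0, or 1, which is why the LP relaxation has integral optimal vertices. ILP2's constraint matrix is not reproduced in this transcript, so the sketch below checks the property by brute force on two stand-in matrices chosen for illustration: the vertex-edge incidence matrix of a small bipartite graph and that of a triangle. The function names are hypothetical.

```python
from itertools import combinations


def det(M):
    """Integer determinant by Laplace expansion (fine for tiny matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum(
        (-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
        for j in range(len(M))
    )


def is_totally_unimodular(M):
    """Check every square submatrix; exponential, so only for small M."""
    rows, cols = len(M), len(M[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[M[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Incidence matrix of the complete bipartite graph K_{2,2}
# (rows: u0, u1, v0, v1; columns: edges u0v0, u0v1, u1v0, u1v1).
bipartite = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
# Incidence matrix of a triangle, which contains an odd cycle.
triangle = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
]
print(is_totally_unimodular(bipartite), is_totally_unimodular(triangle))
```

The brute-force check is only a sanity test for small matrices. For matrices of realistic size the property has to be established by a structural argument such as the one cited above.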

17
Overview
Binary Rank-One Matrix Approximation
  -> (reformulation) Maximum Weight Problem (MWP)
  -> (reformulation) Integer Linear Programming (ILP1)
  -> (error-bounded approximation) Integer Linear Programming (ILP2)
  -> (LP relaxation) Linear Programming Relaxation of ILP2
  -> (reformulation) minimum s-t cut problem
18
Generalized Independent Set Problem
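  • Generalized Independent Set Problem (GIS). Given:
  • An undirected graph G = (V, E),
  • A nonnegative weight w(v) for each vertex v in V,
  • A nonnegative penalty p(e) for each edge e in E.
  • GIS problem: find a vertex subset S of V that
    maximizes the total weight of S minus the total
    penalty of the edges with both endpoints in S.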
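As a small illustration of the definition, the sketch below enumerates every vertex subset of a tiny graph. It assumes the usual GIS objective: total weight of the chosen vertices minus the penalties of edges whose endpoints are both chosen. The graph, weights, penalties, and function names are made up for the example.

```python
from itertools import combinations


def gis_value(S, weight, penalty):
    """Total weight of S minus the penalties of edges with both endpoints in S."""
    S = set(S)
    gain = sum(weight[v] for v in S)
    loss = sum(p for (u, v), p in penalty.items() if u in S and v in S)
    return gain - loss


def gis_brute_force(weight, penalty):
    """Try every vertex subset; exponential, so only for tiny graphs."""
    vertices = list(weight)
    subsets = (
        c for k in range(len(vertices) + 1) for c in combinations(vertices, k)
    )
    return max(subsets, key=lambda S: gis_value(S, weight, penalty))


# Hypothetical triangle graph with made-up weights and penalties.
weight = {"a": 4, "b": 3, "c": 2}
penalty = {("a", "b"): 5, ("b", "c"): 1, ("a", "c"): 1}
best = gis_brute_force(weight, penalty)
print(best, gis_value(best, weight, penalty))
```

When every penalty is large enough, no optimal solution contains both endpoints of an edge, and the problem reduces to maximum weight independent set.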

19
Transform ILP2 into a GIS Problem
  • ILP2 defines an instance of GIS, and the
    corresponding graph is bipartite.

20
Efficient Approximation
  • GIS is NP-Hard for general graphs.
  • However, it can be solved in polynomial time for
    bipartite graphs.
  • GIS for bipartite graphs can be solved by solving
    minimum s-t cuts / maximum flows.
  • Hochbaum, D. S., Pathria, A. Forest harvesting
    and minimum cuts: a new approach to handling
    spatial constraints. Forest Science, 1997, 43,
    544-554.
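The reduction from bipartite GIS to a minimum s-t cut can be sketched as follows. This is one standard construction, assumed here for illustration and not necessarily the exact network used in the talk or in the cited paper. Source edges carry left-vertex weights, sink edges carry right-vertex weights, and each graph edge carries its penalty. The maximum flow routine is a plain Edmonds-Karp, and all names are hypothetical.

```python
from collections import deque


def max_flow(cap, s, t):
    """Edmonds-Karp on a dense capacity matrix; returns (flow value, residual)."""
    n = len(cap)
    residual = [row[:] for row in cap]
    total = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:  # no augmenting path left
            return total, residual
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            residual[parent[v]][v] -= bottleneck
            residual[v][parent[v]] += bottleneck
            v = parent[v]
        total += bottleneck


def gis_bipartite(w_left, w_right, penalty):
    """Solve GIS on a bipartite graph; penalty maps (left i, right j) to p >= 0."""
    nl, nr = len(w_left), len(w_right)
    s, t = nl + nr, nl + nr + 1
    cap = [[0] * (nl + nr + 2) for _ in range(nl + nr + 2)]
    for i, w in enumerate(w_left):
        cap[s][i] = w  # cut when left vertex i is left out
    for j, w in enumerate(w_right):
        cap[nl + j][t] = w  # cut when right vertex j is left out
    for (i, j), p in penalty.items():
        cap[i][nl + j] = p  # cut when both endpoints are selected
    cut, residual = max_flow(cap, s, t)
    # Nodes reachable from s in the residual graph form the source side.
    source_side, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in range(len(cap)):
            if v not in source_side and residual[u][v] > 0:
                source_side.add(v)
                stack.append(v)
    left = [i for i in range(nl) if i in source_side]
    right = [j for j in range(nr) if nl + j not in source_side]
    return sum(w_left) + sum(w_right) - cut, left, right


# Made-up instance: two left vertices, two right vertices, three edges.
value, left, right = gis_bipartite([5, 1], [4, 2], {(0, 0): 3, (0, 1): 6, (1, 0): 1})
print(value, left, right)
```

A left vertex is selected when it ends up on the source side of the cut and a right vertex when it ends up on the sink side, so the cut capacity equals the weight given up plus the penalties paid. Any polynomial-time maximum flow algorithm could replace the simple one used here.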

21
Experimental Evaluation: Error Bound
  • We report the results of the minimum s-t cut
    (P1), the improvement from iterative updating
    (P2), and the theoretical upper bounds.

22
Experimental Evaluation: Error Bound
  • We report the results of the minimum s-t cut
    (P1), the improvement from iterative updating
    (P2), and the theoretical upper bounds.

23
Experimental Evaluation: Running Time
  • One dimension of the matrix is fixed at 1000.

24
Conclusion
Binary Rank-One Matrix Approximation
  -> (reformulation) Maximum Weight Problem (MWP)
  -> (reformulation) Integer Linear Programming (ILP1)
  -> (error-bounded approximation) Integer Linear Programming (ILP2)
  -> (LP relaxation) Linear Programming Relaxation of ILP2
  -> (reformulation) minimum s-t cut problem
25
Thank you!