A New Approach for Video Text Detection and Localization



1
A New Approach for Video Text Detection and
Localization
  • M. Cai, J. Song and M.R. Lyu
  • VIEW Technologies
  • The Chinese University of Hong Kong

2
Related work
  • Text Area Detection
  • Uncompressed domain methods
  • Texture-based
  • Color-based
  • Edge-based
  • Compressed domain methods
  • DCT coefficients
  • Number of intra-coded blocks on P- / B- frames
  • Text String Localization
  • Bottom-up scheme
  • Top-down scheme

3
Language-independent characteristics
  • Contrast
  • An adaptive contrast threshold according to the
    background complexity
  • Color
  • Color bleeding caused by compression
  • Orientation
  • Well-defined size and orientation make it easy to
    understand
  • Stationary location
  • Text stays on screen for a sustained period

4
Language-dependent characteristics
Characteristic               English            Chinese
Stroke density               Roughly similar    Varies dramatically
Min(Font size)               10 pixels high     20 pixels high
Min(Aspect ratio)            Relatively large   Relatively small
Stroke direction statistics  Mainly vertical    Vertical, horizontal, left diagonal, right diagonal

5
Workflow
6
A sequential multi-resolution paradigm
7
Text detection
  • Edge detection
  • Sobel edge detector
  • Local thresholding
  • Adaptive to background complexity
  • Text-like area recovery
  • Enhance the density of text areas
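
The edge detection step can be illustrated with a minimal sketch, assuming a grayscale frame held in a NumPy array. The function name `sobel_edge_strength` and the edge-replicating border handling are assumptions for illustration; the slides only name the Sobel detector.

```python
import numpy as np

def sobel_edge_strength(gray):
    """Return a Sobel gradient-magnitude map for a 2-D grayscale image."""
    # Replicate border pixels so the output has the same shape as the input.
    g = np.pad(gray.astype(float), 1, mode="edge")
    # Horizontal gradient: right column minus left column, weighted 1-2-1.
    gx = (g[:-2, 2:] + 2 * g[1:-1, 2:] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[1:-1, :-2] - g[2:, :-2])
    # Vertical gradient: bottom row minus top row, weighted 1-2-1.
    gy = (g[2:, :-2] + 2 * g[2:, 1:-1] + g[2:, 2:]
          - g[:-2, :-2] - 2 * g[:-2, 1:-1] - g[:-2, 2:])
    return np.hypot(gx, gy)
```

The resulting edge-strength map is the input that a thresholding step would then binarize.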

8
Local Thresholding
  • Use a small kernel (gray) to scan the whole edge
    map row by row.
  • In the bigger window surrounding the kernel,
    check the background type: Clear or Noisy.
  • For a Clear background, determine the local
    threshold from the low part of the edge strength
    histogram in the bigger window; for a Noisy
    background, use the high part.
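
The scan described above might look like the following sketch. The slides do not say how Clear and Noisy are told apart or which parts of the histogram are used, so the classification rule and every parameter value here (`noise_floor`, `noisy_ratio`, `low_pct`, `high_pct`, the kernel and margin sizes) are assumptions chosen only for illustration.

```python
import numpy as np

def local_threshold(edge, kernel=4, margin=8, noise_floor=50.0,
                    noisy_ratio=0.3, low_pct=60.0, high_pct=90.0):
    """Binarize an edge-strength map with a window-dependent threshold.

    All parameter values are illustrative assumptions, not from the slides.
    """
    h, w = edge.shape
    out = np.zeros((h, w), dtype=bool)
    for y in range(0, h, kernel):            # scan kernels row by row
        for x in range(0, w, kernel):
            # Bigger window surrounding the kernel, clipped to the image.
            win = edge[max(0, y - margin):y + kernel + margin,
                       max(0, x - margin):x + kernel + margin]
            # Assumed rule: many non-trivial edges in the window => Noisy.
            noisy = np.mean(win > noise_floor) > noisy_ratio
            # Clear: threshold from the low part of the histogram;
            # Noisy: threshold from the high part.
            t = np.percentile(win, high_pct if noisy else low_pct)
            t = max(t, noise_floor)          # never keep near-zero edges
            block = edge[y:y + kernel, x:x + kernel]
            out[y:y + kernel, x:x + kernel] = block > t
    return out
```

With these assumed settings, a strong text stroke over a uniformly textured background survives while the background texture is rejected, which a single global threshold at the noise floor would not achieve.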

9
Thresholding result comparison
10
Text-like area recovery
  • Labeling: classify each current edge pixel as TEXT
    or NON_TEXT based on its local density.
  • Recovery/Suppression
  • Bring back neighboring lower-strength edge pixels
    of the TEXT edge pixels.
  • The NON_TEXT edge pixels are suppressed.
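
A rough sketch of the labeling and recovery/suppression steps follows. The density window radius, the density cutoff, and the weak-edge threshold (`radius`, `min_density`, `weak_thresh`) are hypothetical values, and `box_sum` is a helper introduced here; the slides give no concrete numbers.

```python
import numpy as np

def box_sum(mask, radius):
    """Sum of mask values in the (2*radius+1)^2 box around each pixel."""
    k = 2 * radius + 1
    p = np.pad(mask.astype(np.int64), radius)
    # Integral image with a leading row and column of zeros.
    ii = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def recover_text_areas(edge, strong, radius=2, min_density=0.3,
                       weak_thresh=20.0):
    """Label strong edge pixels as TEXT or NON_TEXT, then recover neighbours.

    edge   : float edge-strength map
    strong : boolean map of pixels that survived thresholding
    """
    area = (2 * radius + 1) ** 2
    density = box_sum(strong, radius) / area
    # Labeling: strong pixels in dense neighbourhoods are TEXT.
    text = strong & (density >= min_density)
    # Suppression: NON_TEXT strong pixels are dropped by not carrying them on.
    # Recovery: weaker edge pixels adjacent to a TEXT pixel are brought back.
    near_text = box_sum(text, 1) > 0
    recovered = near_text & (edge > weak_thresh)
    return text | recovered
```

In this sketch an isolated strong pixel is suppressed, while a weak pixel touching a dense cluster is restored, which raises the density of text areas.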

11
Coarse-to-fine Text localization
  • Projection-based top-down localization.
  • To handle complex text layout.
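
One way to picture projection-based top-down localization is a recursive split on projection profiles, sketched below. This is an illustration under assumptions, not the authors' exact procedure: the helper `runs`, the function `localize`, the box format `(top, left, bottom, right)`, and the "divisible" test are all hypothetical.

```python
import numpy as np

def runs(profile, min_count=1):
    """Return (start, stop) index pairs where profile >= min_count."""
    on = np.concatenate(([False], profile >= min_count, [False]))
    change = np.flatnonzero(on[1:] != on[:-1])
    return [(int(a), int(b)) for a, b in zip(change[::2], change[1::2])]

def localize(mask, top=0, left=0):
    """Recursively split a binary text mask into (top, left, bottom, right) boxes."""
    boxes = []
    for r0, r1 in runs(mask.sum(axis=1)):          # horizontal projection
        band = mask[r0:r1]
        for c0, c1 in runs(band.sum(axis=0)):      # vertical projection
            block = band[:, c0:c1]
            tight = runs(block.sum(axis=1))
            if tight == [(0, r1 - r0)]:
                # Not divisible any further: emit a box.
                boxes.append((top + r0, left + c0, top + r1, left + c1))
            else:
                # Still divisible: repeat the projections on the sub-block.
                boxes.extend(localize(block, top + r0, left + c0))
    return boxes
```

Alternating the two projections lets the sketch separate a tall block on the left from two stacked lines on the right, a layout that a single horizontal projection would merge into one band.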

12
Localization steps
13
Experimental results
14
Experimental results
15
Performance statistics
  • Statistics of 10 news videos
  • Processing time per frame: 0.25 s (Pentium III 1 GHz CPU)
  • Detection rate: 93.6%
  • Detection accuracy: 87.2%
  • Localization accuracy: > 90%