Title: Text Detection in Video
1 Text Detection in Video
2 Background
- Video OCR: text detection, extraction and recognition
- Detection target: artificial text
- Text detection
  - Detect the region from a single frame
  - Refine the region by combining consecutive frames
3 Existing Work
Feature extraction    Text detection based on feature
Color                 Connected-component
Texture               Texture segmentation
Edge                  Top-down, bottom-up
4 Connected-component-based methods
- Basic idea
  - Treat text as having a uniform color (color level) and classify each pixel as text or non-text according to its color value.
  - Combine connected text pixels into connected components.
  - Group collinear connected components into a text string.
- Advantage
  - Can detect text of arbitrary orientation, provided the text has a similar color throughout and sits on a simple background.
- Disadvantage
  - Sensitive to color variance
  - Lossy compression of video introduces color bleeding
  - Complex background
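The three steps of the basic idea can be sketched as follows. This is a minimal illustration, not the method of any particular system: the function names, the gray-level tolerance `tol`, and the `max_gap` grouping rule are assumptions, and the grouping step only handles horizontal strings even though the approach in general allows arbitrary orientation.

```python
from collections import deque

def text_mask(gray, text_level, tol):
    """Classify a pixel as text if its gray level is within tol of text_level."""
    return [[abs(p - text_level) <= tol for p in row] for row in gray]

def connected_components(mask):
    """Bounding boxes (top, left, bottom, right) of 4-connected text pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes

def group_into_strings(boxes, max_gap):
    """Chain boxes that share rows and lie at most max_gap columns apart.

    Assumption: horizontal strings only; a full version would test collinearity.
    """
    strings = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for s in strings:
            last = s[-1]
            rows_overlap = box[0] <= last[2] and last[0] <= box[2]
            if rows_overlap and box[1] - last[3] <= max_gap:
                s.append(box)
                break
        else:
            strings.append([box])
    return strings
```

The fixed tolerance around one text level is exactly where the sensitivity to color variance shows up: color bleeding pushes text pixels outside the tolerance and breaks the components apart.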
5 Texture Segmentation method
- Basic idea
  - Treat text as a type of texture
  - Use texture segmentation algorithms to detect text
    - Gabor filter
    - Gaussian derivatives
- Advantage
  - Can separate text areas from graphic areas efficiently when the background is simple. It is usually used in document analysis.
- Disadvantage
  - Time-consuming
  - Cannot handle text embedded in varied backgrounds well.
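As a rough illustration of the texture view, the sketch below builds one real Gabor kernel and measures the filter response at each pixel. The names `gabor_kernel` and `filter_energy` and all parameter values are assumptions made for this example; a practical detector would use a bank of such filters at several orientations and scales and then segment the responses.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma):
    """Real (cosine) Gabor kernel with odd side length `size`, shifted to zero mean."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength))
        kernel.append(row)
    mean = sum(map(sum, kernel)) / (size * size)
    # Zero mean, so that flat (textureless) areas give a zero response.
    return [[v - mean for v in row] for row in kernel]

def filter_energy(gray, kernel):
    """Absolute filter response where the kernel fits entirely; 0 at the border."""
    h, w, k = len(gray), len(gray[0]), len(kernel)
    half = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(half, h - half):
        for x in range(half, w - half):
            acc = 0.0
            for j in range(k):
                for i in range(k):
                    acc += kernel[j][i] * gray[y + j - half][x + i - half]
            out[y][x] = abs(acc)
    return out
```

The four nested loops visit every kernel tap at every pixel, and a filter bank repeats that for each filter, which is one way to see why the approach is time-consuming.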
6 Bottom-Up method
- Basic idea
  - A seed region is defined as a small region with high edge density.
  - Grow seed regions into successively larger components until all seed regions in the image have been reached.
- Advantage
  - It is a generic method for detecting a homogeneous object of various shapes. That is, it can detect not only rectangular objects but also other shapes.
- Disadvantage
  - Sensitive to noise.
  - Cannot handle a large range of font sizes.
  - Sensitive to the stroke density (which differs between languages).
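A minimal sketch of seed selection and growing, under simplifying assumptions: the input is a binary edge map, seeds are fixed-size square tiles, and growing is reduced to merging 8-adjacent seed tiles. The names `seed_tiles` and `grow_regions` and the parameters `tile` and `min_density` are hypothetical.

```python
def seed_tiles(edges, tile, min_density):
    """Mark each tile x tile block whose fraction of edge pixels is >= min_density."""
    rows, cols = len(edges) // tile, len(edges[0]) // tile
    seeds = [[False] * cols for _ in range(rows)]
    for ty in range(rows):
        for tx in range(cols):
            count = sum(edges[ty * tile + j][tx * tile + i]
                        for j in range(tile) for i in range(tile))
            seeds[ty][tx] = count / (tile * tile) >= min_density
    return seeds

def grow_regions(seeds):
    """Merge 8-adjacent seed tiles; return each region as a sorted list of (row, col)."""
    rows, cols = len(seeds), len(seeds[0])
    visited = [[False] * cols for _ in range(rows)]
    regions = []
    for ty in range(rows):
        for tx in range(cols):
            if not seeds[ty][tx] or visited[ty][tx]:
                continue
            visited[ty][tx] = True
            stack, region = [(ty, tx)], []
            while stack:
                cy, cx = stack.pop()
                region.append((cy, cx))
                for ny in range(cy - 1, cy + 2):
                    for nx in range(cx - 1, cx + 2):
                        if (0 <= ny < rows and 0 <= nx < cols
                                and seeds[ny][nx] and not visited[ny][nx]):
                            visited[ny][nx] = True
                            stack.append((ny, nx))
            regions.append(sorted(region))
    return regions
```

The fixed `tile` size and `min_density` illustrate the listed disadvantages: one tile size suits only a narrow range of font sizes, and one density threshold suits only scripts with similar stroke density.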
7 Top-Down method
- Basic idea
  - Based on the run-length smoothing algorithm
  - Analyze horizontal and vertical projection profiles
- Advantage
  - Can detect the boundary of a horizontally aligned text string quickly and correctly
  - Insensitive to noise
- Disadvantage
  - Cannot handle diagonally aligned text.
  - One pass of horizontal and vertical projection cannot handle a complex layout.
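The two building blocks can be sketched as follows, assuming a binary image stored as lists of 0/1 values. The helper names and the `max_gap` and `threshold` parameters are assumptions for illustration.

```python
def rlsa_horizontal(binary, max_gap):
    """Run-length smoothing: fill runs of 0s no longer than max_gap between two 1s."""
    out = []
    for row in binary:
        new = row[:]
        last_one = None
        for x, v in enumerate(row):
            if v:
                if last_one is not None and 0 < x - last_one - 1 <= max_gap:
                    for i in range(last_one + 1, x):
                        new[i] = 1
                last_one = x
        out.append(new)
    return out

def projection(binary, axis):
    """axis=0: one sum per row (horizontal profile); axis=1: one sum per column."""
    if axis == 0:
        return [sum(row) for row in binary]
    return [sum(col) for col in zip(*binary)]

def runs_above(profile, threshold):
    """Inclusive (start, end) index pairs of maximal runs where profile > threshold."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs
```

For example, smoothing with `max_gap=2` joins three separate character strokes on a row into one run, after which the row and column profiles give the top/bottom and left/right boundaries of the string. Both profiles are axis-aligned sums, which is why diagonal text does not produce clean peaks.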
8 Analysis (1)
- A certain contrast against the background
  - Artificial text strings are designed to be read easily
- A certain stroke density
- Text strings always appear horizontally
- Spatial cohesion
  - Characters of the same text string have similar height, orientation and spacing
- Size constraint
  - Text strings are subject to certain size restrictions
- A text string appears in multiple consecutive frames at a similar position.
9 Analysis (2)
Problems                                          Resolutions
How to extract more useful edges?                 Local thresholding
How to highlight text areas?                      Text area recovery
How to detect text regions fast and correctly?    Coarse-to-fine detection
10 Single Threshold
11 Local threshold (1)
- Use a small kernel (red) to scan the whole image.
- In a bigger window (gray) surrounding the kernel, calculate the local threshold from the window's local histogram.
[Figure: a. Window move; b. Local threshold selection]
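A sketch of the kernel-and-window scheme follows. The slide does not say how the threshold is derived from the local histogram, so Otsu's method is used here purely as a stand-in; the function names, the non-overlapping kernel steps, and the `kernel` and `margin` parameters are also assumptions. Pixel values are expected to be integers in 0..255.

```python
def otsu_threshold(values, levels=256):
    """Threshold maximizing between-class variance; levels - 1 if the data is flat."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * c for i, c in enumerate(hist))
    best_t, best_var = levels - 1, -1.0
    w0 = sum0 = 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so t does not split the data
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def local_threshold(gray, kernel=2, margin=1):
    """Binarize each kernel tile with a threshold taken from the window around it."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for ky in range(0, h, kernel):
        for kx in range(0, w, kernel):
            # The window is the kernel tile enlarged by `margin`, clipped to the image.
            y0, y1 = max(0, ky - margin), min(h, ky + kernel + margin)
            x0, x1 = max(0, kx - margin), min(w, kx + kernel + margin)
            window = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            t = otsu_threshold(window)
            for y in range(ky, min(h, ky + kernel)):
                for x in range(kx, min(w, kx + kernel)):
                    out[y][x] = 1 if gray[y][x] > t else 0
    return out
```

Because each tile gets its own threshold, a weak peak in a dark area and a strong peak in a bright area can both be kept, which a single global threshold generally cannot do.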
12 Local threshold (2)
13 Text-like area recovery (1)
[Figure: before recovery / after recovery]
14 Text-like area recovery (2)
[Figure: before recovery / after recovery]
15 High pass filter
16 Coarse-to-Fine detection
- Use a top-down scheme to detect text-like areas
17 Detect text-like areas
[Figure: panels 1) to 4); b. Coarse vertical projection]
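One way to realize a coarse-to-fine, top-down pass is to alternate row and column projections and split recursively until a region no longer divides. The sketch below is an assumption about how such a scheme could look, not a transcription of the steps in the figure; `runs`, `detect`, and the `min_gap` parameter are hypothetical names. Boxes are `(top, left, bottom, right)` tuples with inclusive coordinates.

```python
def runs(profile, min_gap):
    """Inclusive runs of nonzero entries; gaps shorter than min_gap are bridged."""
    out = []
    for i, v in enumerate(profile):
        if not v:
            continue
        if out and i - out[-1][1] - 1 < min_gap:
            out[-1][1] = i  # extend the current run across a short gap
        else:
            out.append([i, i])
    return [tuple(r) for r in out]

def detect(binary, box, min_gap=1, split_rows=True, tried_other=False):
    """Split `box` (a tuple) by alternating row and column projections."""
    top, left, bottom, right = box
    if split_rows:
        profile = [sum(binary[y][left:right + 1]) for y in range(top, bottom + 1)]
        parts = [(top + a, left, top + b, right) for a, b in runs(profile, min_gap)]
    else:
        profile = [sum(binary[y][x] for y in range(top, bottom + 1))
                   for x in range(left, right + 1)]
        parts = [(top, left + a, bottom, left + b) for a, b in runs(profile, min_gap)]
    if parts == [box]:
        # No split in this direction: try the other one once, then stop.
        if tried_other:
            return [box]
        return detect(binary, box, min_gap, not split_rows, True)
    boxes = []
    for part in parts:
        boxes += detect(binary, part, min_gap, not split_rows, False)
    return boxes
```

A larger `min_gap` gives the coarse behavior (whole lines or blocks), a smaller one the fine behavior. The recursion lets the sketch separate two side-by-side blocks that share rows with a third block, a layout a single projection pass would merge.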
18 Refinement
- Combine neighboring text areas of similar height
- Use size constraints to remove areas that do not satisfy them
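Both refinement steps can be sketched on bounding boxes `(top, left, bottom, right)` with inclusive coordinates. The merging rule and every threshold here (`max_gap`, `ratio`, `min_h`, `max_h`, `min_aspect`) are illustrative assumptions; the slides do not give concrete values.

```python
def similar_height(a, b, ratio):
    """True if the shorter box is at least `ratio` times as tall as the taller one."""
    ha, hb = a[2] - a[0] + 1, b[2] - b[0] + 1
    return min(ha, hb) / max(ha, hb) >= ratio

def merge_neighbors(boxes, max_gap=4, ratio=0.8):
    """Merge boxes that share rows, are at most max_gap columns apart, and are
    of similar height. Single left-to-right pass; a full version might iterate."""
    merged = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for i, m in enumerate(merged):
            rows_overlap = box[0] <= m[2] and m[0] <= box[2]
            close = box[1] - m[3] - 1 <= max_gap
            if rows_overlap and close and similar_height(box, m, ratio):
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged

def size_filter(boxes, min_h, max_h, min_aspect):
    """Keep boxes whose height is in [min_h, max_h] and width/height >= min_aspect."""
    keep = []
    for top, left, bottom, right in boxes:
        h, w = bottom - top + 1, right - left + 1
        if min_h <= h <= max_h and w / h >= min_aspect:
            keep.append((top, left, bottom, right))
    return keep
```

Merging before filtering matters: two word-sized boxes that would each pass anyway become one line-sized box, while isolated small or tall-and-narrow boxes are left alone and then removed by the size constraints.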
19 Multi-frame analysis
- Text region matching
  - Find all the regions corresponding to the same text
- Text region enhancement
  - Enhance the text image quality by multi-frame integration
- Repetitive text elimination
  - Record the text only at its first appearance.
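The three steps can be sketched as follows, assuming per-frame bounding boxes `(top, left, bottom, right)` are already available. Matching by intersection-over-union against the previous frame and integration by pixel-wise averaging are assumptions chosen for illustration, as are the names `iou`, `track_text`, `integrate`, and the `min_iou` threshold.

```python
def iou(a, b):
    """Intersection over union of two boxes with inclusive coordinates."""
    h = min(a[2], b[2]) - max(a[0], b[0]) + 1
    w = min(a[3], b[3]) - max(a[1], b[1]) + 1
    if h <= 0 or w <= 0:
        return 0.0
    inter = h * w
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / (area_a + area_b - inter)

def track_text(frames_boxes, min_iou=0.5):
    """Link each box to the best-overlapping track that ended in the previous frame.

    Each track stores `first_frame`, so a text is recorded once, where it first appears.
    """
    tracks = []
    for f, boxes in enumerate(frames_boxes):
        for box in boxes:
            best_score, best_track = 0.0, None
            for t in tracks:
                if t["last_frame"] == f - 1:
                    score = iou(t["boxes"][-1], box)
                    if score >= min_iou and score > best_score:
                        best_score, best_track = score, t
            if best_track is None:
                tracks.append({"first_frame": f, "last_frame": f, "boxes": [box]})
            else:
                best_track["boxes"].append(box)
                best_track["last_frame"] = f
    return tracks

def integrate(crops):
    """Pixel-wise mean of equally sized gray-level crops of the same text."""
    n = len(crops)
    rows, cols = len(crops[0]), len(crops[0][0])
    return [[sum(c[y][x] for c in crops) / n for x in range(cols)] for y in range(rows)]
```

Averaging keeps the static text while smoothing a changing background; since the crops must be the same size, a real pipeline would first align and resize the matched regions.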
20 Thank you!