Title: Semiautomatic Segmentation and Tracking of Semantic Video Objects
1Semiautomatic Segmentation and Tracking of
Semantic Video Objects
- by Chuang Gu, Ming-Chieh Lee
- from IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 5, September 1998
Presenter: Wei-Cheng Lin
Advisor: Prof. Ja-Ling Wu
2Introduction (1/3)
- What is semantic visual information?
- Representing a meaningful entity in the input data
- Called a semantic video object in the digital video domain
- Why is the extraction difficult?
- The vague definition of "semantic"
- Limited mechanism
- Background noise sensitivity
3Introduction (2/3)
- Unsupervised segmentation domain
- region-based: homogeneous color criterion
- object-based: homogeneous motion criterion
- object tracking
- Problems in the object segmentation
- color-oriented: lack of generality and robustness
- motion-oriented: an object may contain parts with different motions
- tracking: not much research!!
4Introduction (3/3)
- Our solution
- Premise: a human knows the real meaning of "semantic", and a computer helps the human find the precise location of the boundaries.
- Two phases: supervised segmentation for the I-frame and unsupervised segmentation for the P-frames
- Technologies: mathematical morphology and global perspective motion estimation/compensation
5System Overview (1/2)
- Points in designing an extraction system
- Generality, Quality, Flexibility, Complexity
- Phase one: get a good 2-D template of the semantic boundary in the I-frame.
- user assistance
- creation of in and out boundaries
- classification
6System Overview (2/2)
- Phase two: use motion information to track the semantic boundary in the P-frames.
- motion prediction
- rigid-body motion estimation
- boundary warping
- boundary adjustment
7User assistance in I-frame Segmentation
- Pixel-based: the user inputs the positions of the opaque pixels or the transparent pixels.
- Contour-based: only the outline of the boundary needs to be drawn.
- Hybrid method: the approximation has two parts, a polygonal part and a pixelwise part.
8Creation of In and Out Boundaries in I-frame
Segmentation
- The structuring element s is chosen interactively by the user so that [formula lost in extraction]
- See Fig. 4
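The formula on this slide did not survive extraction. As a minimal sketch, assume the in boundary comes from eroding the user's approximate object mask and the out boundary from dilating it (the experimental results mention an erosion/dilation size). The square structuring element and the names `erode`, `dilate`, and `in_out_markers` are illustrative assumptions, not taken from the paper.

```python
def erode(mask, size):
    """Binary erosion with a (2*size+1) x (2*size+1) square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            keep = 1
            for dy in range(-size, size + 1):
                for dx in range(-size, size + 1):
                    yy, xx = y + dy, x + dx
                    # Pixels outside the image count as background.
                    if not (0 <= yy < h and 0 <= xx < w) or not mask[yy][xx]:
                        keep = 0
            out[y][x] = keep
    return out


def dilate(mask, size):
    """Binary dilation with the same square structuring element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for dy in range(-size, size + 1):
                    for dx in range(-size, size + 1):
                        yy, xx = y + dy, x + dx
                        if 0 <= yy < h and 0 <= xx < w:
                            out[yy][xx] = 1
    return out


def in_out_markers(mask, size):
    """Return (in_marker, out_marker, uncertain) as binary masks."""
    inner = erode(mask, size)    # surely inside the object
    outer = dilate(mask, size)   # everything beyond this is surely outside
    out_marker = [[1 - v for v in row] for row in outer]
    uncertain = [[o - i for o, i in zip(ro, ri)] for ro, ri in zip(outer, inner)]
    return inner, out_marker, uncertain
```

A larger `size` widens the uncertain band, which tolerates a rougher user outline at the cost of more pixels to classify.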
9Classification in I-frame Segmentation
- Task of classification: find the cluster centers and assign the remaining units to them.
- Cluster centers: the pixels along the interior and exterior outlines, each represented as a 5-dimensional vector (r, g, b, x, y).
- Grouping methods: pixelwise classification and morphological watershed.
10Classification in I-frame Segmentation
- Pixelwise classification
- For each pixel, compute its distance to the cluster centers and assign it to the nearest one.
- It is sensitive to noise and destroys the geometric relationships among pixels.
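A minimal sketch of pixelwise classification, assuming each pixel and each cluster center is an (r, g, b, x, y) tuple. The slide does not name the distance used here, so the unweighted sum of absolute differences and the function name `classify_pixelwise` are assumptions.

```python
def classify_pixelwise(pixels, in_centers, out_centers):
    """Label each 5-D pixel vector 'in' or 'out' by its nearest cluster center."""
    def dist(p, q):
        # Assumed metric: unweighted sum of absolute differences over (r, g, b, x, y).
        return sum(abs(a - b) for a, b in zip(p, q))

    labels = []
    for p in pixels:
        d_in = min(dist(p, c) for c in in_centers)
        d_out = min(dist(p, c) for c in out_centers)
        labels.append("in" if d_in <= d_out else "out")
    return labels
```

Each pixel is decided independently of its neighbors, which is why a single noisy pixel can end up isolated inside the opposite class.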
11Classification in I-frame Segmentation
- Morphological watershed
- The gray-tone region-growing version of the watershed is extended and improved to handle color images; the result is called the multivalued watershed.
- It starts from markers and extends them until they occupy all the available space of interest.
- In the multivalued watershed, a point is chosen because it is in the neighborhood of a marker and the similarity between them is the highest among all pairs of points and neighboring markers.
12Classification in I-frame Segmentation
- Calculation of the similarity
- Step 1: evaluate the representation of the marker. In practice, use the multivalued mean of the color image over the marker. Once a point is assigned to the marker, the representation of that marker is updated accordingly.
- Step 2: calculate the distance using the absolute distance function.
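The two steps can be sketched as a small class that keeps a running per-channel mean for one marker. Reading the "absolute distance function" as the sum of absolute channel differences is an assumption, and the class name `Marker` is illustrative.

```python
class Marker:
    """Running multivalued (per-channel) mean of the pixels assigned to a marker."""

    def __init__(self, channels=3):
        self.total = [0.0] * channels
        self.count = 0

    def add(self, color):
        """Step 1: update the representation when a pixel joins the marker."""
        self.total = [t + c for t, c in zip(self.total, color)]
        self.count += 1

    def mean(self):
        return [t / self.count for t in self.total]

    def distance(self, color):
        """Step 2: absolute distance between a color and the marker mean (assumed L1)."""
        return sum(abs(c - m) for c, m in zip(color, self.mean()))
```

Keeping sums and a count makes each update constant-time, so the representation can be refreshed after every single assignment.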
13Hierarchical Queue - Implementation of the
Multivalued Watershed
- The priority in the hierarchical queue is the opposite of the distance between the pixel concerned and the representation of the marker.
- Pull the pixel position out of the highest-priority queue. Once that queue is empty, consider the first nonempty queue of lower priority.
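A hierarchical queue can be sketched as a list of FIFO queues, one per priority level. Quantizing the distance to an integer level and the class name `HierarchicalQueue` are assumptions made for illustration.

```python
from collections import deque


class HierarchicalQueue:
    """One FIFO queue per integer distance level; smaller distance = higher priority."""

    def __init__(self, levels):
        self.queues = [deque() for _ in range(levels)]

    def push(self, distance, item):
        # Clamp the quantized distance into the available levels.
        level = min(max(int(distance), 0), len(self.queues) - 1)
        self.queues[level].append(item)

    def pop(self):
        # Scan from the highest priority; fall through to the first nonempty queue.
        for q in self.queues:
            if q:
                return q.popleft()
        raise IndexError("pop from empty hierarchical queue")

    def __bool__(self):
        return any(self.queues)
```

Items at the same level leave in arrival order, so pixels of equal similarity are flooded first come, first served.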
14Hierarchical Queue - Implementation of the
Multivalued Watershed
- Initialization
- Put all the neighboring pixels of all in and out markers into the hierarchical queue according to their similarity to the corresponding markers.
- Flooding
- Extract a pixel from the queue.
- If it hasn't been classified, calculate the distance between it and all of the neighboring markers.
- Classify the pixel to the most similar marker and update the representation of the marker.
- Put all its neighbors into the hierarchical queue according to their similarity to the representation of the marker.
15Hierarchical Queue - Implementation of the
Multivalued Watershed
- Gradually, all of the uncertain areas between the in and out boundaries are assigned to the corresponding markers. The places where the in and out pixels meet form the semantic video object boundary, and the final in area constitutes the segmented semantic video object.
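The initialization and flooding steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: Python's `heapq` with an insertion counter stands in for the hierarchical queue, pixels are 4-connected, the distance is the sum of absolute channel differences, and the name `multivalued_watershed` is hypothetical.

```python
import heapq


def multivalued_watershed(image, labels):
    """Grow markers over uncertain pixels.

    image:  2-D list of color tuples.
    labels: 2-D list, 0 = uncertain, any other int = marker id (modified in place).
    """
    h, w = len(image), len(image[0])
    sums, counts = {}, {}  # per-marker running sums for the multivalued mean
    for y in range(h):
        for x in range(w):
            m = labels[y][x]
            if m:
                s = sums.setdefault(m, [0.0] * len(image[y][x]))
                for k, c in enumerate(image[y][x]):
                    s[k] += c
                counts[m] = counts.get(m, 0) + 1

    def dist(color, m):
        return sum(abs(c - s / counts[m]) for c, s in zip(color, sums[m]))

    def neighbors(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                yield yy, xx

    heap, order = [], 0

    def push_neighbors(y, x, m):
        nonlocal order
        for yy, xx in neighbors(y, x):
            if labels[yy][xx] == 0:
                # The counter keeps FIFO order among equal distances.
                heapq.heappush(heap, (dist(image[yy][xx], m), order, yy, xx))
                order += 1

    # Initialization: queue every uncertain pixel that touches a marker.
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                push_neighbors(y, x, labels[y][x])

    # Flooding: always expand the most similar queued pixel first.
    while heap:
        _, _, y, x = heapq.heappop(heap)
        if labels[y][x]:
            continue  # already classified
        adjacent = {labels[yy][xx] for yy, xx in neighbors(y, x) if labels[yy][xx]}
        best = min(adjacent, key=lambda m: dist(image[y][x], m))
        labels[y][x] = best
        for k, c in enumerate(image[y][x]):
            sums[best][k] += c
        counts[best] += 1
        push_neighbors(y, x, best)
    return labels
```

Because a pixel can only join a marker it touches, the regions stay connected, which is the geometric property that pixelwise classification loses.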
16Rigid Body Motion Estimation - In P-frame Tracking
- 2-D planar perspective transformation
17Rigid Body Motion Estimation - In P-frame Tracking
- Use the color information in the object to find the parameters (a, b, c, d, e, f, g).
20Rigid Body Motion Estimation - In P-frame Tracking
- The Levenberg-Marquardt iterative nonlinear algorithm is employed to perform the object-based minimization and obtain the parameters.
- It requires computing the partial derivatives of e_i, for the pixels in the semantic object, with respect to the unknown motion parameters (a, b, c, d, e, f, g).
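A compact sketch of a Levenberg-Marquardt loop, with several simplifications that are assumptions rather than the paper's method: the Jacobian is approximated by forward differences instead of the analytic partial derivatives of e_i, the damping follows a common multiply-or-divide-by-ten rule, and the demo estimates a two-parameter translation between two synthetic frames (hypothetical functions `prev_frame` and `cur_frame`) rather than the perspective parameters.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def levenberg_marquardt(residuals, p0, iters=50, lam=1e-3, eps=1e-6):
    """Minimize sum(residuals(p)**2) starting from the initial guess p0."""
    p = list(p0)
    r = residuals(p)
    cost = sum(v * v for v in r)
    n = len(p)
    for _ in range(iters):
        # Forward-difference Jacobian columns: cols[k][i] = d r_i / d p_k.
        cols = []
        for k in range(n):
            q = p[:]
            q[k] += eps
            rq = residuals(q)
            cols.append([(a - b) / eps for a, b in zip(rq, r)])
        # Damped normal equations: (J^T J + lam * diag(J^T J)) step = -J^T r.
        A = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
             for i in range(n)]
        g = [-sum(a * b for a, b in zip(cols[i], r)) for i in range(n)]
        for i in range(n):
            A[i][i] += lam * A[i][i] + 1e-12
        step = solve(A, g)
        trial = [a + s for a, s in zip(p, step)]
        r_trial = residuals(trial)
        cost_trial = sum(v * v for v in r_trial)
        if cost_trial < cost:   # accept the step and relax the damping
            p, r, cost, lam = trial, r_trial, cost_trial, lam / 10
        else:                   # reject the step and increase the damping
            lam *= 10
    return p


# Demo on synthetic data: a smooth blob that moves by (1.5, -0.75) between frames.
def prev_frame(x, y):
    return math.exp(-((x - 8) ** 2 + (y - 8) ** 2) / 20.0)


def cur_frame(x, y):
    return prev_frame(x - 1.5, y + 0.75)


points = [(x, y) for x in range(16) for y in range(16)]


def residuals(p):
    # e_i: intensity difference between the motion-compensated current frame
    # and the previous frame at object pixel i.
    return [cur_frame(x + p[0], y + p[1]) - prev_frame(x, y) for x, y in points]


estimate = levenberg_marquardt(residuals, [0.0, 0.0])
```

The initial guess `[0.0, 0.0]` is the argument that motion prediction would replace with the parameters found for the previous frame.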
21Motion prediction - In P-frame Tracking
- A good initial guess not only produces more accurate results but also reduces the number of iterations.
- In the real world, the trajectory of a semantic video object tends to be smooth. Therefore, the motion information from the previous frame provides a good initial guess for the current frame.
22Boundary Warping - In P-frame Tracking
- After obtaining (a, b, c, d, e, f, g), the semantic video object in the previous frame is warped toward the current frame. Because the warped points may not fall on integer pixel coordinates, we use an inverse warping process to get the warped boundary.
23Boundary Warping - In P-frame Tracking
-
- This approximation accounts for the rigid-body motion.
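Inverse warping can be sketched as follows: for every integer pixel of the current frame, map back into the previous frame and read the object mask there. Writing the perspective motion as a 3x3 matrix is an assumption, since the slide's own formula and its exact (a, ..., g) parametrization were not recovered; nearest-neighbor sampling and the function names are also illustrative.

```python
def invert3x3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply(m, x, y):
    """Map (x, y) through the 3x3 projective matrix m."""
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return ((m[0][0] * x + m[0][1] * y + m[0][2]) / w,
            (m[1][0] * x + m[1][1] * y + m[1][2]) / w)


def inverse_warp_mask(prev_mask, forward):
    """Warp a binary object mask from the previous frame into the current frame.

    forward maps previous-frame coordinates to current-frame coordinates.
    """
    h, w = len(prev_mask), len(prev_mask[0])
    backward = invert3x3(forward)
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Where did this current-frame pixel come from in the previous frame?
            sx, sy = apply(backward, x, y)
            ix, iy = int(round(sx)), int(round(sy))  # nearest-neighbor sampling
            if 0 <= iy < h and 0 <= ix < w:
                out[y][x] = prev_mask[iy][ix]
    return out
```

Looping over destination pixels guarantees that every pixel of the current frame receives a value, whereas forward warping can leave holes where no source pixel lands.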
24Boundary Adjustment - In P-frame Tracking
- To deal with nonrigid-body motion, we use the warped boundary as an approximation and apply the same method as in I-frame segmentation to adjust the boundary.
25Experimental Results
- The three selected color video sequences are all in QCIF format (176×144) at 30 Hz.
- Only the size of the erosion/dilation needs to be set.
- See Figs. 10 and 11 for the I-frame (size 2).
26Experimental Results
- See Fig. 12 for the P-frames (no dropped frames).
- Limitation: occluded or newly exposed background areas with colors similar to the foreground semantic video object. See Fig. 14.
- The experimental results were obtained on a 133-MHz Pentium!!
27Conclusion
- Provides a semantic object extraction system using supervised I-frame segmentation and an unsupervised P-frame tracking algorithm.
- Current tracking has difficulty dealing with large nonrigid-body movement.
- Four interesting new research directions: mesh, elastic deformation, articulated bodies, and fluids.