Title: Motion From 2D Image Sequences
1. Motion From 2D Image Sequences
Dr. Ramprasad Bala
Computer and Information Science, UMASS Dartmouth
CIS 585: Image Processing and Machine Vision
2. Motion Analysis
- A changing scene may be observed via a sequence of images.
- Motion can be observed due to motion of the objects, motion of the observer (camera), or both.
- Changes in a scene provide features for detecting objects that are moving or computing their trajectories.
3. Motion Phenomena
- Four general cases of motion:
  - Still camera, single moving object, constant background
  - Still camera, several moving objects, constant background
  - Moving camera, relatively constant scene
  - Moving camera, several moving objects
4. Motion Applications
- The simplest application is the detection of motion against a constant background
  - Security checkpoints, or automatically switching lights on
- Tracking of objects or people
  - Objects or people can be tracked over time to predict trajectories. Multiple cameras can be used to predict 3D motion.
5. Motion Applications
- A moving camera creates image changes even if the 3D scene is static
- It creates more observations of the scene than a single static camera
- It makes possible the computation of relative depth: closer objects tend to change faster
- It provides perception and measurement of the 3D shape of nearby objects via triangulation, similar to stereo vision
6. Motion Applications
- The most difficult motion problem involves moving sensors and scenes containing so many moving objects that it is difficult to identify any constant background
  - Robots navigating through traffic
  - Football games!
- Report the outcome of Exercise 9.1!
7. Motion Detection: Image Subtraction
- In surveillance applications a stationary camera might be observing a non-uniform background. Image subtraction can be used effectively to observe changes in the scene.
- If images arrive at 30 fps, sampling the frames (rather than processing every one) can be more efficient.
- The size and location of the change can be obtained easily, as in the sketch below.
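A minimal sketch of the image-subtraction idea, assuming 8-bit grayscale frames as NumPy arrays (the threshold values and function name are illustrative, not from the slides):

```python
import numpy as np

def detect_change(frame_a, frame_b, threshold=25, min_pixels=50):
    """Flag and localize change between two grayscale frames.

    Pixels whose absolute intensity difference exceeds `threshold`
    are marked as changed; if enough pixels changed, the bounding
    box of the changed region gives its size and location.
    """
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    mask = diff > threshold
    if mask.sum() < min_pixels:
        return None                                  # no significant change
    ys, xs = np.nonzero(mask)                        # changed pixel coordinates
    return (xs.min(), ys.min(), xs.max(), ys.max())  # bounding box
```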
9. Motion Vectors
- Motion of 3D scene points results in motion of the image points to which they project.
- Zooming out can be performed by reducing the focal length of a still camera, or by backing away from the scene while keeping the focal length fixed.
- The optical axis points toward a scene point whose image does not move; this is the focus of contraction.
10.
- Zooming in is performed by increasing the focal length of a still camera, or by moving toward a particular scene point whose image does not change; this is called the focus of expansion.
- Panning a camera, or turning our heads, causes the images of the 3D scene points to translate.
11. Motion Fields
- A 2D array of 2D vectors representing the motion of 3D scene points is called the motion field. The motion vectors in the image represent the displacements of the images of moving 3D points. Each motion vector might be formed with its tail at the image of a 3D point at time t and its head at the image of that same 3D point at time t + Δt. Alternately, each motion vector might correspond to an instantaneous velocity estimate at time t.
12. FOE and FOC
- The focus of expansion (FOE) is the image point from which all motion field vectors diverge. The FOE is typically the image of a 3D scene point towards which the sensor is moving.
- The focus of contraction (FOC) is the image point towards which all motion vectors converge, and is typically the image of a 3D scene point from which the sensor is receding.
13. Motion Fields
- Computation of the motion field can support both the recognition of objects and an analysis of their motion.
- We assume that the intensity of the image of a 3D scene point P, and that of its neighbors, remains nearly constant during the time interval (t1, t2) over which the motion estimate for P is made.
- Image flow is the motion field computed under this assumption that image intensity near corresponding points is relatively constant.
14. Computing Motion Flow
- Using point correspondences:
  - A sparse motion field can be computed by identifying pairs of points that correspond in two images taken at times t1 and t1 + Δt.
  - The points used must be distinctive in some way so that they can be identified and located in both images.
15. The Point Correspondence Problem
- Automatically extracting point correspondences is not a trivial problem; it is a research topic in its own right.
- Several methods have been proposed:
  - Corner detectors or high-interest points
  - Centroids of persistent moving regions from segmented images
  - An interest operator that computes intensity variances in the vertical, horizontal, and diagonal directions (sketched below)
  - Searching in a small neighborhood using a mask
  - The texture-based operator described in Exercise 9.3
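One plausible realization of such an interest operator, in the spirit of Moravec's: the interest value of a pixel is the minimum intensity variance taken over the horizontal, vertical, and two diagonal directions, so a point scores highly only if it is distinctive in every direction. A sketch (the window size and names are assumptions):

```python
import numpy as np

def interest_value(img, r, c, half=2):
    """Minimum directional intensity variance at pixel (r, c).

    Assumes (r, c) is at least `half` pixels from the image border.
    A low variance in any single direction (e.g., along an edge)
    makes the point uninteresting.
    """
    offs = range(-half, half + 1)
    directions = [
        [img[r, c + d] for d in offs],      # horizontal
        [img[r + d, c] for d in offs],      # vertical
        [img[r + d, c + d] for d in offs],  # main diagonal
        [img[r + d, c - d] for d in offs],  # anti-diagonal
    ]
    return min(np.var(np.array(line, dtype=float)) for line in directions)
```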
17. Correspondences
- Once this procedure has been applied to an image at time t1, the search for interest points in the subsequent image can be guided by the locations of the points in the first image.
- Given that the motion between subsequent images is small, a small neighborhood can be searched and matched using cross-correlation, as in the sketch below.
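A sketch of guided matching by normalized cross-correlation, assuming grayscale NumPy frames; the patch and search-window sizes are illustrative:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_point(img1, img2, r, c, patch=5, search=8):
    """Locate in img2 the best match for the patch around (r, c) in img1.

    Because inter-frame motion is assumed small, only a small
    neighborhood of (r, c) is searched.
    """
    p = patch // 2
    template = img1[r - p:r + p + 1, c - p:c + p + 1].astype(float)
    best_score, best_pos = -1.0, (r, c)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if rr - p < 0 or cc - p < 0:
                continue                      # window falls off the top/left edge
            cand = img2[rr - p:rr + p + 1, cc - p:cc + p + 1].astype(float)
            if cand.shape != template.shape:
                continue                      # truncated at the bottom/right edge
            score = ncc(template, cand)
            if score > best_score:
                best_score, best_pos = score, (rr, cc)
    return best_pos, best_score
```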
22. MPEG Compression of Video
- MPEG compression uses complex operations to compress a video stream by up to 200:1.
- An MPEG encoder replaces an entire 16x16 image block in one frame with a motion vector defining how to locate the best-matching 16x16 block of intensities in some previous frame.
- A uniform grid of blocks is used, and a match for each block is sought by searching a previous image of the video sequence (see the sketch below).
- Ideally each block Bk can be replaced by a single vector. Changes in intensities can also be transmitted using a small number of bits.
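A hedged sketch of the block-matching step for a single block: exhaustively search a window of the previous frame for the displacement that minimizes the sum of absolute differences (SAD). Real encoders use much faster search strategies; the search range and names here are assumptions:

```python
import numpy as np

def best_motion_vector(prev, cur, r, c, block=16, search=7):
    """Motion vector for the block at (r, c) in `cur`.

    Returns the offset (dr, dc) of the best-matching block in the
    previous frame `prev`, plus its SAD score.
    """
    target = cur[r:r + block, c:c + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if rr < 0 or cc < 0 or rr + block > prev.shape[0] or cc + block > prev.shape[1]:
                continue                # candidate block lies outside the frame
            cand = prev[rr:rr + block, cc:cc + block].astype(np.int32)
            sad = np.abs(target - cand).sum()
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dr, dc)
    return best_mv, best_sad
```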
24. Computing Image Flow
25. Computing Image Flow
- We will look at a classical method that combines spatial and temporal gradients computed from at least two frames.
- We assume that the reflectivity and the illumination of the object do not change during the interval (t1, t2).
- We assume that the distances of the object from the camera and from the light sources do not vary significantly over this interval.
- We shall also assume that each small intensity neighborhood N(x, y) at time t1 is observed in some shifted position N(x + Δx, y + Δy) at time t2.
26. The Image Flow Equation
- Using the continuous intensity function f(x, y, t), we apply its Taylor series representation in a small neighborhood of an arbitrary point (x, y, t).
- This is a multivariable version of the very intuitive one-variable approximation f(x + Δx) ≈ f(x) + f'(x)Δx.
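In standard notation (f_x, f_y, f_t are the partial derivatives of f; higher-order terms are dropped), the Taylor approximation reads:

```latex
f(x + \Delta x,\; y + \Delta y,\; t + \Delta t) \;\approx\;
  f(x, y, t) + f_x\,\Delta x + f_y\,\Delta y + f_t\,\Delta t
```

Under the constant-intensity assumption the left-hand side equals f(x, y, t); dividing by Δt and writing V = (u, v) = (Δx/Δt, Δy/Δt) gives the image flow equation:

```latex
f_x\,u + f_y\,v + f_t = 0
  \qquad\text{equivalently}\qquad
  \nabla f \cdot V = -f_t
```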
28.
- The image flow equation does not give a unique solution for the flow vector V; it imposes only a single linear constraint. Only the component of V along the image gradient is determined (the aperture problem).
29. Solving for Image Flow
- The image flow equation provides a constraint that can be applied at every pixel position.
- By assuming coherence, neighboring pixels are constrained to have similar flow vectors (see the sketch below).
- By propagating constraints we can reach two conclusions:
  - First, only at interesting corner points can image flow be safely computed using small apertures.
  - Second, constraints on the flow vectors at the corners can be propagated along the edges; however, as Figure 9.12(c) shows, it might take many iterations to reach an interpretation for edge points, such as P, that are distant from any corner.
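As one concrete way to impose coherence, the flow constraints from all pixels in a small window can be stacked and solved jointly in the least-squares sense, in the style of Lucas and Kanade; this is a sketch under the stated assumptions, not necessarily the book's exact algorithm:

```python
import numpy as np

def flow_at(fx, fy, ft, r, c, half=2):
    """Least-squares flow vector V = (u, v) at pixel (r, c).

    fx, fy, ft are the spatial and temporal gradient images. Each
    pixel in the (2*half+1)^2 window contributes one constraint
    fx*u + fy*v = -ft; the stacked system is solved jointly.
    """
    win = np.s_[r - half:r + half + 1, c - half:c + half + 1]
    A = np.stack([fx[win].ravel(), fy[win].ravel()], axis=1)  # N x 2
    b = -ft[win].ravel()
    # A rank-deficient A (e.g., a window on a straight edge) is the
    # aperture problem showing up as an ill-conditioned solve.
    V, *_ = np.linalg.lstsq(A, b, rcond=None)
    return V  # (u, v)
```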
31. Computing the Path of Moving Points
- If the intensity neighborhood of each point is uniquely textured, then we should be able to track the point over time using normalized cross-correlation.
- Domain knowledge might also make it easier to track an object: an orange tennis ball in a tennis match, or a pink face in front of a workstation, etc.
32. Tracking Objects
- We can exploit the following general assumptions that hold for physical objects in 3D:
  - The location of a physical object changes smoothly over time
  - The velocity of a physical object changes smoothly (in both speed and direction) over time
  - An object can be at only one location in space at a given time
  - Two objects cannot occupy the same location at the same time
33.
- The first three assumptions hold for 2D projections of 3D space, i.e., smooth 3D motion results in smooth 2D trajectories.
- The fourth may be violated under projection, since one object might occlude another.
- We will see an algorithm that uses these four assumptions.
34. Tracking Algorithm
- Definition: if an object i is observed at time instants t = 1, 2, ..., n, then the sequence of image points Ti = (p_{i,1}, p_{i,2}, ..., p_{i,t}, ..., p_{i,n}) is called the trajectory of i.
- Between any two consecutive points of the trajectory we can define the difference vector
  V_{i,t} = p_{i,t+1} - p_{i,t}
35.
- We can define a smoothness value S_{i,t} at a trajectory point p_{i,t} in terms of the difference vectors reaching and leaving that point.
- Smoothness of direction is measured by their (normalized) dot product.
- Smoothness of speed is measured by comparing the geometric mean of their magnitudes to their average magnitude.
- The weight w of the two factors is set between 0 and 1, such that S_{i,t} is between 0 and 1.
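A reconstruction of the smoothness value consistent with this description (it matches the Sethi-Jain path-coherence form; treat the exact weighting as an assumption):

```latex
S_{i,t} \;=\; w\,
  \frac{V_{i,t-1} \cdot V_{i,t}}
       {\lVert V_{i,t-1} \rVert\, \lVert V_{i,t} \rVert}
  \;+\; (1 - w)\,
  \frac{2\sqrt{\lVert V_{i,t-1} \rVert\, \lVert V_{i,t} \rVert}}
       {\lVert V_{i,t-1} \rVert + \lVert V_{i,t} \rVert}
```

The first term is the cosine of the turning angle (direction smoothness); the second is the ratio of the geometric mean of the two speeds to their arithmetic mean, which equals 1 exactly when the speeds are equal.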
36.
- Note that for a straight trajectory with equally spaced points, all the difference vectors are the same and the equation yields 1.0, the optimal point-smoothness value.
- Changes in speed or direction will decrease the value of S_{i,t}.
- Given m points observed over n frames, the problem is to construct m trajectories Ti with the maximum smoothness value.
- The total smoothness is defined below.
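A hedged reconstruction of the total smoothness, summing the point smoothness over all m trajectories and all interior time instants:

```latex
T_s \;=\; \sum_{i=1}^{m} \; \sum_{t=2}^{n-1} S_{i,t}
```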
38. Exercise 9.10 as practice.
40. Detecting Significant Changes in Video
- Changes can take several forms:
  - Scene change (large change in the background)
  - Shot change (switching to a different camera)
  - Camera pan (motion vectors to one side)
  - Camera zoom (changes in focal length)
  - Camera effects: fade, dissolve, and wipe
41. Segmenting a Video Sequence
- The goal of the analysis is to parse a long sequence into sub-sequences representing single shots or scenes.
- For example, consider the evening news:
  - Often there are several different shots of the event being reported, with transitions between them.
  - The transitions can be used to segment the video, and can be detected by large changes in the image features over time.
42.
- One obvious method of computing the difference between two frames of a sequence is to compute the average difference between corresponding pixels, as written out below.
- Depending on the camera effect, Δt might be one or more frames.
- This measure is likely to yield large numbers even for small amounts of change.
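The measure described above, written out (W and H are the frame width and height; Δt is the frame offset):

```latex
D(t,\, t + \Delta t) \;=\; \frac{1}{W H}
  \sum_{x=1}^{W} \sum_{y=1}^{H}
  \bigl|\, f(x, y, t) - f(x, y, t + \Delta t) \,\bigr|
```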
43.
- A more robust measure is to break the image into blocks and compute the mean (μ) and variance (σ²) of the intensities in each block.
- Then compare the corresponding blocks: if most blocks remain the same, the shot remains the same.
- Kasturi and Jain proposed the following likelihood ratio for this purpose.
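As it is usually given (μ1, μ2 and σ1², σ2² are a block's intensity mean and variance in the two frames; a shot boundary is declared when λ exceeds a threshold for many blocks):

```latex
\lambda \;=\;
  \frac{\left[ \dfrac{\sigma_1^2 + \sigma_2^2}{2}
        + \left( \dfrac{\mu_1 - \mu_2}{2} \right)^{2} \right]^{2}}
       {\sigma_1^2\, \sigma_2^2}
```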
44.
- An alternate solution is to compute the histogram (of either color or intensities) of the two frames and compute the similarity between the histograms, which is faster than the method described earlier (see the sketch below).
- However, this method has the same weakness as before: no spatial relation is taken into account.
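A sketch of the histogram-based comparison using the normalized sum of absolute bin differences on intensity histograms (the bin count and normalization are illustrative):

```python
import numpy as np

def histogram_difference(frame_a, frame_b, bins=64):
    """Compare two 8-bit grayscale frames by their intensity histograms.

    Returns the sum of absolute bin differences, normalized by frame
    size; a large value suggests a shot change. As noted above, all
    spatial layout is ignored.
    """
    ha, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    return np.abs(ha - hb).sum() / frame_a.size
```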
47. Ignoring Some Camera Effects
- Certain camera effects can be detected and ignored.
- For example, zooming or panning within a shot should still yield a single segment.
- Zooming and panning can be detected using motion fields; a block-wise approach similar to MPEG's can be used for efficiency.
48. Storing the Sub-sequences
- Once segmented, these sub-sequences can be stored in a database and retrieved as in CBIR (content-based image retrieval).
- Key frames can be identified for indexing.
49. Image Segmentation, Part I