Video Search: What's New - PowerPoint PPT Presentation

About This Presentation

Title:

Video Search: What's New

Description:

Video Search: How does it work? 'Conventional' methods: catalogs, ... MPEG-4: DivX, Xvid, 3ivX implementations of certain compression recommendations of MPEG-4. ... – PowerPoint PPT presentation

Slides: 48

Provided by: rohmann

Transcript and Presenter's Notes

Title: Video Search: What's New

1
Video Search: What's New

Gloria Rohmann
NYU Libraries
October 14, 2005

2
The problem: I know it's in there somewhere

Gist (what it's about)
Genre
Style
Scenes
People
Objects
Dialogue
Soundtrack

3
Video Search: How does it work?

Conventional methods: catalogs, databases and analog previewing
Why digitize?
Discovering video structure
Automatic and manual indexing
Data models and user interfaces
Prospects for the future: mobile and web services

4
Conventional Methods: Browse and Search

Structured databases
AV cataloging (AACR2, MARC 21)
Shot lists
Asset management systems
Pathfinders (librarians, archivists)
Embedded markers: hints, chapters, scenes (DVD)
Video logging systems
Hardware browse/skim: FF, slow-mo, etc.

5
Video Search in Libraries

Mainly MARC
245 Title (usually main entry)
300 Description (physical piece)
505 Contents
508 Credits
511 Performer note
520 Summary

6
Sample screen from BobCat video record
505 Contents
520 Summary
650 Subject headings
7
Enhanced metadata: shot lists, transcripts (Open University video collection)
8
Footage - Opening credits. Chocolate factory workers. Alan Coxon and Kathy Sykes preparing food. Man biting into chocolate bar (0'00"-0'50").
Alan opening fridge and walking over to Kathy at table. Kathy grating orange. Alan showing ingredients for cheesecake. Cooking chocolate. Alan and Kathy breaking chocolate and smelling it. Breaking chocolate. Kathy tasting chocolate (0'51"-2'00").
9
Video Pathfinders
10
Asset Management Systems

Building digital library collections
What metadata (METS, MPEG-21, etc.)?
Distribution standards required
Not born digital: ingest problem
DRM: what drives commercial distribution?

11
Browse and skim: Analog Control (VCRs)

Pause, FF, rewind (all VCRs)
Some VCRs:
Pause and frame-by-frame
High-speed picture search (AKA FF)
Variable speed picture search
Index recording: VCR marks the beginning of each recording on a tape.

12
Browse and skim DVDs: Digital Advantages

Pause, FF, rewind
Frame-by-frame
Navigate: menus, chapters or tracks
Insert markers, repeat play
Change audio and subtitle languages; show closed captioning
Shuttle/scrub onscreen

13
Browse and Skim: Media Players
DVD player clones can be enhanced with SDKs

Media Players are DECODERS
Pause, FF, rewind
Variable speed
Navigate: menus, chapters, tracks
Insert markers
Change audio, subtitles
Show closed captioning
Shuttle/scrub

14
Media Player Example
DVD player clones can be enhanced with SDKs
File markers added by end-user
Play speed settings: 0.5 >> 3X
Start, stop, pause, rewind to beginning, FF to end, advance by frame
15
What Is Video?

Authored video has:
Series of still images @ 25-30 fps
Structure: frames >> shots >> scenes
MODALITIES
(Audio tracks)
(Text: captioning, subtitles, etc.)
(Graphics: logos, running tickers, etc.)
Production metadata: timestamp, datestamp, flash on/off

16
Advantages of Digital Video

Store and deliver over networks
Allow analysis by computers
Allow automatic and manual indexing
USING
Image processing
Signal processing
Information visualization

17
Why Compress Video?

1 frame (@ TV brightness) = 0.9 megabytes (MB) of storage
At 29 fps, each second = 26.1 MB of storage
30-minute film = about 47 gigabytes (GB) of storage (26.1 MB x 1,800 seconds)
OBJECT: Make the file smaller; retain as much information as possible
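
The arithmetic on this slide can be reproduced in a few lines. This is a minimal sketch that takes the slide's 0.9 MB per frame and 29 fps at face value and assumes decimal units (1 GB = 1,000 MB); the function and constant names are illustrative only. With those inputs, 30 minutes of uncompressed video comes to roughly 47 GB.

```python
# Figures taken from the slide; decimal units (1 GB = 1000 MB) are an assumption.
MB_PER_FRAME = 0.9
FRAMES_PER_SECOND = 29


def uncompressed_size_mb(minutes, mb_per_frame=MB_PER_FRAME, fps=FRAMES_PER_SECOND):
    """Return the uncompressed storage needed for `minutes` of video, in MB."""
    seconds = minutes * 60
    return mb_per_frame * fps * seconds


per_second_mb = MB_PER_FRAME * FRAMES_PER_SECOND
half_hour_gb = uncompressed_size_mb(30) / 1000

print(f"{per_second_mb:.1f} MB per second")
print(f"{half_hour_gb:.1f} GB per 30 minutes")
```

Changing `minutes` or `fps` shows how quickly raw video outgrows ordinary storage, which is the motivation for the codecs on the following slides.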

18
Encoding Formats

These formats use some kind of compression: similar encoding methods, many CODECS, some lossy, others lossless
AVI: Audio Video Interleave
QuickTime
MPEG family: MPEG-1, 2, 4
H.261: for video conferencing
New: H.264, JPEG 2000

19
CODECS

Compressor/Decompressor, or Coder/Decoder
Produce and work with encoding formats.
Central to compression and encoding: perform signal and image processing tasks
Examples: Cinepak, Indeo, Windows Media Video.
MPEG-4: DivX, Xvid, 3ivX are implementations of certain compression recommendations of MPEG-4.

20
How Do CODECS Work?

Movement creates temporal aliasing: human eye/brain fills in the gaps
Blurring produced by camera shutter softens edges
Modeled by CODECS and algorithms
Goal: acceptable facsimile of moving scene

21
Configuring CODECS for analysis
Psychovisual enhancements
Maximum Keyframe Interval
22
What looks best to you?
Segmentation method B
Segmentation method A
Original image
Jermyn, I. Psychovisual Evaluation of Image
Database Retrieval and Image Segmentation
23
Encoding Methods: predictive

Sampling: value of a function @ regular intervals (example: brightness of pixels); the sampling frequency can vary (1 in 10 vs. 1 in 100 frames)
Quantization: rounding each sampled value to one of a limited set of levels
Discrete cosine transform (DCT): an array of data (not just one pixel) is transformed into another set of values.
Inter-frame vs. intra-frame encoding
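
To make the DCT bullet concrete, here is a minimal sketch of a one-dimensional DCT applied to a row of eight pixel brightness values, followed by a coarse rounding step. Here quantization means rounding each coefficient to a multiple of a step size, which is where a lossy codec discards information. The pixel values, the step size of 10, and all function names are made up for illustration; real codecs work on two-dimensional blocks with tuned quantization tables.

```python
import math


def dct(samples):
    """Orthonormal DCT-II: turn N samples into N frequency coefficients."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, x in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def quantize(coeffs, step):
    """Round each coefficient to the nearest multiple of `step` (stored as an integer level)."""
    return [round(c / step) for c in coeffs]


def dequantize(levels, step):
    """Map integer levels back to approximate coefficient values."""
    return [q * step for q in levels]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical pixel brightness values
coeffs = dct(row)
levels = quantize(coeffs, step=10)       # small integers, cheap to store
restored = idct(dequantize(levels, step=10))

print(levels)
print([round(v) for v in restored])
```

Without the rounding step the round trip is exact; with it, the restored row is only close to the original, which is the trade-off between file size and fidelity.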

24
Video Structure

Video
Scene
Shot
Frame

25
Using Encoding Methods to Discover Structure
26
Shot Boundary Detection

Algorithms that compare the similarity of nearby frames. When the similarity falls below a predetermined threshold, a shot boundary is automatically declared.
Edge detection
Compare color histograms
Compare motion vectors
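
The histogram comparison named above can be sketched as follows. This is a toy version under stated assumptions: frames are flat lists of grayscale values (0-255), similarity is measured by histogram intersection, and the bin count and the 0.5 threshold are arbitrary choices for illustration, not values from the presentation.

```python
def histogram(frame, bins=8, max_value=256):
    """Fraction of pixels falling into each brightness bin."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


def shot_boundaries(frames, threshold=0.5):
    """Return each index i where frame i is too unlike frame i-1, i.e. starts a new shot."""
    hists = [histogram(f) for f in frames]
    return [i for i in range(1, len(hists))
            if similarity(hists[i - 1], hists[i]) < threshold]


# Toy "video": three dark frames followed by three bright frames.
dark = [10, 20, 30, 25]
dark_variant = [12, 22, 28, 27]
bright = [200, 220, 240, 230]
frames = [dark, dark_variant, dark, bright, bright, bright]

print(shot_boundaries(frames))
```

Small changes within a shot (the dark variant) leave the histogram almost unchanged, while the cut to the bright frames drops the similarity to zero and is reported as a boundary.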

27
Revealing Video Structure with Non-linear
Editors

Clips are the basis for video editing
Non-linear editors (like iMovie, Windows Movie
Maker) can create clips based on keyframes and
shot boundary detection
NLEs can also isolate frames
Video logging software works the same way
(Virage, Scenalyzer Live)

28
Clip Creation with NLEs
29
Spatial and Temporal Segmentation

1. Use shot boundary detection and keyframes to define shots; choose representative frames
2. Use CBIR (Content-based Image Retrieval) techniques to reveal features in representative frames (shapes, colors, textures)
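
Step 1 can be sketched as below. The slide does not say how representative frames are chosen; picking the middle frame of each shot is an assumption made here for simplicity, and the boundary indices and function names are hypothetical.

```python
def shots_from_boundaries(boundaries, frame_count):
    """Turn boundary frame indices into (start, end) ranges, end exclusive."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [frame_count]
    return list(zip(starts, ends))


def representative_frames(shots):
    """Pick the middle frame of each shot as its representative frame (an assumed heuristic)."""
    return [(start + end - 1) // 2 for start, end in shots]


# Hypothetical 450-frame clip with cuts detected at frames 120 and 300.
shots = shots_from_boundaries([120, 300], frame_count=450)
keyframes = representative_frames(shots)

print(shots)
print(keyframes)
```

The resulting frame indices are the images that step 2 would hand to the CBIR feature extractors.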

30
CBIR Techniques

Images (frames) have no inherent semantic meaning: only arrays of pixel intensities
Color retrieval: compare histograms
Texture retrieval: relative brightness of pixel pairs
Shape retrieval: humans recognize objects primarily by their shape
Retrieval by position within the image
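
As an illustration of texture retrieval from pixel pairs, the sketch below scores each image by the mean brightness difference between horizontally adjacent pixels and ranks a small collection by closeness to a query image. This single contrast score is a deliberately simple stand-in for the richer pair statistics real CBIR systems use; the tiny images and all names are invented for the example.

```python
def texture_contrast(image):
    """Mean absolute brightness difference between horizontally adjacent pixel pairs."""
    diffs = [abs(row[x] - row[x + 1])
             for row in image
             for x in range(len(row) - 1)]
    return sum(diffs) / len(diffs)


def rank_by_texture(query, collection):
    """Sort image names by how close their texture score is to the query's."""
    target = texture_contrast(query)
    return sorted(collection,
                  key=lambda name: abs(texture_contrast(collection[name]) - target))


# Tiny grayscale images as rows of pixel intensities (hypothetical data).
smooth = [[100, 101, 102], [100, 101, 102]]
striped = [[0, 255, 0], [255, 0, 255]]
checker = [[10, 240, 10], [240, 10, 240]]

collection = {"smooth": smooth, "striped": striped}
print(rank_by_texture(checker, collection))
```

The query has no label such as "checkerboard"; it matches the striped image purely because their pixel-pair statistics are similar, which is the point of the first bullet on this slide.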

31
MPEG-4: Content-based Encoding

Encodes objects that can be tracked from frame to
frame.
Video frames are layers of video object planes (VOPs).
Each VOP is segmented and coded separately throughout the shot
Background encoded only once.
Objects are not defined as to what they
represent, only their motion, shapes, colors and
textures, allowing them to be tracked through
time.
Objects and their backgrounds are brought
together again by the decoder.

32
MPEG-4: Content-based encoding
Video object plane (VOP)
Video object plane (VOP)
Background encoded only once
Ghanbari, M. (1999) Video Coding: An Introduction to Standard Codecs
33
AMOS: Tracking Objects Beyond the Frame
http://www.ctr.columbia.edu/dzhong/rtrack/demo.htm
34
Are We Doing Multimedia? Multimodal Indexing

Ramesh Jain: To solve multimedia problems, we should use as much context as we can.
Visual (frames, shots, scenes)
Audio (soundtrack, speech recognition)
Text (closed captions, subtitles)
Context: hyperlinks, etc.
IEEE Multimedia. Oct-Nov. 2003
http://jain.faculty.gatech.edu/media_vision/doing_mm.pdf

35
Multimodal Indexing
Settings, Objects, People
Modalities: Video, audio, text
Snoek, C., Worring, M. Multimodal Indexing: A Review of the State-of-the-art. Multimedia Tools and Applications. January 2005
36
Building Video Indexes

Same as any indexing process: decide
What to index: granularity
How to index: modalities (images, audio, etc.)
Which features?
Discover spatial and temporal structure: deconstructing the authoring process
Construct data models for access

37
Building Video Indexes: Structured modeling

Predict relationship between shots
Pattern recognition
Hidden Markov Models
SVM (support vector machines)
Neural networks
Relevance feedback via machine learning

38
Data Models for Video IR

Based on text (DBMS, MARC)
Semi-structured (video XML or hypertext)
MPEG-7, SMIL
Based on context: Yahoo Video, Blinkx, Truveo
Multimodal: Marvel, Virage

39
Virage VideoLogger™
Mark and annotate clips
SMPTE timecode
Keyframes
Text or audio extracted automatically
40
Annotation Metadata Schemes

MPEG-7
MPEG-21
METS
SMIL

41
IBM MPEG-7 Annotation Tool
42
MPEG-7 Output from IBM Annotation Tool
Duration of shot in frames
<MediaTime>
  <MediaTimePoint>T00:00:27:20830F30000</MediaTimePoint>
  <MediaIncrDuration mediaTimeUnit="PT1001N30000F">248</MediaIncrDuration>
</MediaTime>
<TemporalDecomposition>
  <VideoSegment>
    <MediaTime>
      <MediaTimePoint>T00:00:31:23953F30000</MediaTimePoint>
    </MediaTime>
    <SpatioTemporalDecomposition>
      <StillRegion>
        <TextAnnotation>
          <FreeTextAnnotation>Indoors</FreeTextAnnotation>
        </TextAnnotation>
        <SpatialLocator>
          <Box mpeg7:dim="2 2">14 15 351 238</Box>
        </SpatialLocator>
      </StillRegion>
Location and dimension of spatial locator in pixels
Annotation
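
The timing values in this output can be converted to seconds with a short script. This is a sketch under assumptions: it reads the time unit PT1001N30000F as 1001/30000 of a second per frame, and it expects the time point in the colon-separated form T00:00:27:20830F30000 (hours, minutes, seconds, then a fraction counted in 1/30000ths of a second). The function names are illustrative and not part of any MPEG-7 library.

```python
import re
from fractions import Fraction


def parse_time_unit(unit):
    """Parse a unit like 'PT1001N30000F' into seconds per unit (here 1001/30000)."""
    match = re.fullmatch(r"PT(\d+)N(\d+)F", unit)
    if match is None:
        raise ValueError(f"unsupported mediaTimeUnit: {unit!r}")
    return Fraction(int(match.group(1)), int(match.group(2)))


def parse_time_point(point):
    """Parse 'Thh:mm:ss:nFN' into seconds from the start of the video."""
    match = re.fullmatch(r"T(\d+):(\d+):(\d+):(\d+)F(\d+)", point)
    if match is None:
        raise ValueError(f"unsupported MediaTimePoint: {point!r}")
    hours, minutes, seconds, numerator, denominator = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + Fraction(numerator, denominator)


# Values copied from the annotation tool output shown on the slide.
unit = parse_time_unit("PT1001N30000F")
duration = 248 * unit                      # MediaIncrDuration of 248 units
start = parse_time_point("T00:00:27:20830F30000")

print(f"shot starts at {float(start):.3f} s and lasts {float(duration):.3f} s")
```

Using `Fraction` keeps the 1001/30000 frame period exact, so frame counts convert to times without accumulating rounding error.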
43
Browse Video Surrogates
44
SMIL: Hypertext and Hypermedia
<window type="generic" duration="13000" height="480" width="320" underline_hyperlinks="true" />
<font face="arial" size="2">
<ol>
  <li><a href="command:seek(00)" target="_player">Intro</a></li>
  <br/>
  <li><a href="command:seek(210)" target="_player">Q1 to Kerry</a>, <a href="command:seek(426)" target="_player">Bush rebuttal</a></li>
45
Scholarly Primitives

Low-level methods for higher-level research
Discovering
Annotating
Comparing
Referring
Sampling
Illustrating
Representing

Unsworth, John. (2000) Scholarly Primitives: what methods do humanities researchers have in common, and how might our tools reflect this?
46
User Interfaces for Video IR