Title: Next Generation of System Architectures for Tele-Immersive Environments
1. Next Generation of System Architectures for Tele-Immersive Environments
- Klara Nahrstedt
- (klara_at_cs.uiuc.edu)
- University of Illinois at Urbana-Champaign
2. Outline
- Tele-immersive Environments
- Challenges
- Multi-camera Systems
- Multi-display Systems
- Internet2 Networking
- Architectural Design Space
- 3D Compression, 3D Streaming, Adaptation
- Case Studies: Comparison between TEEVE and Coliseum
- Conclusion
3. Tele-Immersive Environments (Application Model)
4. TI Goals
- Investigate tools for improving the effectiveness of remote collaboration using COTS components for a broader audience
- Investigate architectures for tele-immersive systems with 3D realism
- Develop infrastructures, algorithms, systems, and technologies that support
  - Ease of use
  - Quality
  - Perceptual cues
  - Scalability
  - 3D video and audio
5. Experimental Examples of Tele-Immersive Environments
TI in UIUC (Prof. Nahrstedt)
TI in UC Berkeley (Prof. Bajcsy)
Internet 2
6. Experimental Examples of Tele-Immersive Environments
TI in HP: Single-PC Coliseum System (Dr. Harlyn Baker, Dr. Nina Bhatti, et al.)
7. 3D Camera Challenges
- 3D Camera
  - RGB and depth yield 5 bytes per pixel
  - Frame size: 640x480 pixels
  - Calibration is very time-consuming
  - Real-time capturing and content creation (15 fps)
  - Each 3D camera needs one PC
  - Cameras need a FireWire bus to get high throughput
- 3D Multi-camera Environment
  - 10-20 3D cameras
  - Camera spacing issue: different spacing is needed for physical activity than for collaborative conferencing
  - 180-, 300-, or 360-degree camera coverage
8. 3D Content Challenges
- Huge TI frame size
  - Macro-frame (e.g., 10 3D frames in one time instance)
- TI frame rate
  - 15 macro-frames per second
- Huge bandwidth uncompressed
  - One 3D stream: 640x480 (pixels/frame) x 5 (bytes/pixel) x 15 (frames/sec) = 23,040,000 bytes/sec (approx. 23 MBytes/sec)
  - One TI session (10 3D streams): 230,400,000 bytes/sec (approx. 230 MBytes/sec)
- 3D compression
  - RGB can be delivered in a lossy manner
  - Depth must be delivered in a lossless manner
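The uncompressed-bandwidth figures above follow directly from the frame geometry. A quick sketch of the arithmetic (the slide only states 5 bytes per pixel total; any RGB/depth split is an assumption):

```python
# Uncompressed bandwidth of one 3D stream and one TI session,
# using the slide's example numbers.
WIDTH, HEIGHT = 640, 480       # pixels per frame
BYTES_PER_PIXEL = 5            # RGB + depth (the split is not specified)
FPS = 15                       # frames per second per stream
NUM_STREAMS = 10               # 3D streams per TI macro-frame

stream_bps = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS   # 23,040,000 B/s, approx. 23 MB/s
session_bps = NUM_STREAMS * stream_bps                # 230,400,000 B/s, approx. 230 MB/s
```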
9. Networking Challenges
- Multi-stream coordination: due to large bursts at routers, we can't send all streams at the same time; we need spacing
- Synchronization of multiple streams: we need to send all streams within a constrained time interval so that the rendered frame can be completed in a timely fashion
- Decision about transport protocol: UDP takes small packet sizes, hence large context-switch overhead; TCP performs positive-acknowledgement algorithms, hence incurs larger delays over the lossy Internet
- QoS issues
  - Keep end-to-end latency low
  - Need high throughput
  - Need low or no loss rate
  - Deal with power issues on 3D cameras in terms of thermal problems
  - Keep synchronization skews low
  - Deal with very heterogeneous networks (Gigabit LAN, Internet2, 100 Mbps LAN)
  - Deal with heterogeneous computing platforms (processors, OS, busses, ...)
  - Deal with scale (camera arrays, participants, microphone arrays, ...)
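The spacing requirement above can be illustrated with a toy scheduler that staggers stream transmissions evenly across one frame period; the even-offset policy and the function name are illustrative, not TEEVE's actual gateway logic:

```python
# Sketch: space N stream transmissions evenly across one frame period
# instead of sending them all at once (which bursts at routers).
# Even spacing is an assumed policy for illustration only.

def send_offsets(num_streams: int, frame_rate: float) -> list[float]:
    """Start time (seconds) of each stream's send within one frame period."""
    period = 1.0 / frame_rate
    return [i * period / num_streams for i in range(num_streams)]

# 10 streams at 15 fps: all sends fit inside the ~66.7 ms frame period,
# ~6.7 ms apart, so no two streams burst into the router simultaneously.
offsets = send_offsets(num_streams=10, frame_rate=15)
```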
10. Timing Challenges
11. Multi-streaming without Coordination
12. Rendering Challenges
- Rendering in real time
  - Consider between 5 and 15 3D streams, each with a frame rate of 10 fps, rendered into one 4D stream at 10 fps
  - This requires a rendering station that runs at 50-150 fps with 640x480 pixels per frame and 5 bytes per pixel
- Placement of rendering capabilities
  - Placement is possible at the sender side or at the receiver side
- Assistance in display management
  - Rendering may need to adapt based on the viewer model
- Feedback to the sender side with meaningful information
  - If rendering can't happen at the desired speed, which streams should be stopped at the sender side, and which information dropped?
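The 50-150 fps requirement above is simply stream count times per-stream frame rate; a quick check of the arithmetic:

```python
# Rendering-station load: the renderer must consume every incoming 3D frame,
# so its frame-scheduling rate is (number of streams) x (per-stream fps).
WIDTH, HEIGHT, BYTES_PER_PIXEL = 640, 480, 5

def render_rate(num_streams: int, fps: int) -> int:
    return num_streams * fps

low = render_rate(5, 10)     # 50 fps at the low end
high = render_rate(15, 10)   # 150 fps at the high end
# At the high end the renderer ingests ~230 MB of pixel data per second.
pixel_bytes_per_sec = high * WIDTH * HEIGHT * BYTES_PER_PIXEL
```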
13. Display Challenges
- Selection of scene views
- Automatic computation of scene views
- Visual information management
- Selection of screen layouts
- Mapping of camera views to display locations if a multi-view display is desired
- Mapping of user feedback onto view/layout selection
14. Next Generation of System Architectures: Design Choices
- Methodology
  - We will consider two case studies, their system architectures, and their approaches to the challenges
    - TEEVE system (UIUC/UCB) and Coliseum system (HP/U. Colorado)
  - Other tele-immersive systems
    - U. North Carolina at Chapel Hill (H. Fuchs, Ketan Mayer-Patel, et al.): VR TI environment
    - Microsoft Research (Yong Rui et al.): video conferencing for a virtual classroom
- Design choices
  - Architectural and functional choices
  - 3D camera choices
  - Data model choices
  - Compression choices
  - Streaming protocol choices
  - Rendering choices
  - Display choices
15. TEEVE Architecture
16. Coliseum Architecture (Complete Dataflow in a Coliseum Session)
17. 3D Camera Choices
- TEEVE: cluster of four 2D cameras, where 3 cameras are BW and 1 camera is RGB
- Coliseum: five CCD cameras with CMOS imagers
18. Data Model
- TEEVE
  - 3D data model
  - Each camera cluster produces a depth image f_{i,j} containing depth and color information per pixel
  - Each 3D camera cluster creates a 3D video stream
  - At time T_j the capturing tier has N 3D reconstructed frames that constitute a macro-frame F_j = {f_{1,j}, f_{2,j}, ..., f_{N,j}}, showing the same scene from different viewing angles
  - TEEVE operates N 3D video streams
- Coliseum
  - 2D data model
  - Each camera produces color information
  - All five 2D streams get reconstructed and rendered into one 3D stream
  - The 3D stream includes RGB and alpha (depth) information
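The TEEVE macro-frame model above can be sketched as two small types; all names here are illustrative, not TEEVE's actual code, and the RGB/depth byte split is an assumption:

```python
# Illustrative types for TEEVE's data model: each camera cluster i produces a
# depth image f_ij (color + depth per pixel); the N frames captured at time
# T_j form the macro-frame F_j = {f_1j, ..., f_Nj}.
from dataclasses import dataclass

@dataclass
class DepthImage:              # f_ij
    stream_id: int             # i: which 3D camera cluster produced it
    timestamp: float           # T_j: capture instant
    rgb: bytes                 # color plane (e.g., 640*480*3 bytes)
    depth: bytes               # depth plane (e.g., 640*480*2 bytes)

@dataclass
class MacroFrame:              # F_j
    timestamp: float
    frames: list[DepthImage]   # one per stream; same scene, different angles

    def complete(self, n_streams: int) -> bool:
        """True once every one of the N streams has contributed its frame."""
        return len({f.stream_id for f in self.frames}) == n_streams

frame = MacroFrame(timestamp=0.0,
                   frames=[DepthImage(1, 0.0, b"", b""),
                           DepthImage(2, 0.0, b"", b"")])
```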
19. 3D Compression
- TEEVE
  - Lossy compression for RGB data
    - Scheme A: color reduction
    - Scheme B: Motion JPEG
  - Lossless (run-length coding) compression for the depth data
- Coliseum
  - Lossy compression for RGB data
    - MPEG-4
  - Lossless RLE compression for the alpha data
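Since the depth channel must survive losslessly, run-length coding is a natural fit: depth maps have long constant runs (e.g., background). Here is a minimal byte-level RLE sketch, an illustration of the technique rather than either system's actual codec:

```python
# Minimal lossless run-length coder: emit (count, value) byte pairs,
# with runs capped at 255 so the count fits in one byte.

def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(enc: bytes) -> bytes:
    out = bytearray()
    for k in range(0, len(enc), 2):
        out += bytes([enc[k + 1]]) * enc[k]     # repeat value, count times
    return bytes(out)

# A depth row that is mostly background compresses from 640 bytes to 8.
depth_row = bytes([0] * 500 + [17] * 3 + [0] * 137)
encoded = rle_encode(depth_row)
```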
20. 3D Video Streaming
- TEEVE
  - End-to-end multi-tier streaming protocols
    - Bandwidth estimation
    - Rate adaptation
  - TCP/IP per 3D video stream, to use the large packet size and avoid frequent context switching
  - Spacing and rate shaping at the gateway
  - Coordination among gateways, using a token scheme, to avoid packet drops at routers
- Coliseum
  - UDP for the 3D rendered video stream
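The gateway token scheme mentioned above can be pictured as a round-robin token ring in which only the token holder may transmit its burst, so gateway bursts never overlap at a shared router. The policy below is an assumed illustration, not TEEVE's actual protocol:

```python
# Illustration of token-based gateway coordination: gateways take turns
# transmitting. Round-robin token passing is an assumed policy.

class TokenRing:
    def __init__(self, gateways: list[str]):
        self.gateways = gateways
        self.holder = 0                    # index of the gateway with the token

    def may_send(self, gateway: str) -> bool:
        return self.gateways[self.holder] == gateway

    def pass_token(self) -> str:
        """Called after the holder finishes its burst; returns the new holder."""
        self.holder = (self.holder + 1) % len(self.gateways)
        return self.gateways[self.holder]

ring = TokenRing(["gw-sender-1", "gw-sender-2"])   # hypothetical gateway names
ring.pass_token()                                  # gw-sender-1 done; 2's turn
```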
21. Rendering Choices
- TEEVE
  - Rendering happens at the receiver side
    - Reason: allows for more inter- and intra-stream adaptation
  - Current maximum rendering is possible at 4 macro-frames per second with up to 12 3D camera clusters, creating 12 3D streams of 640x480 pixels, rendered together into a 4D video stream (i.e., the rendering processor must schedule up to 48 frames per second)
- Coliseum
  - Rendering happens at the sender side, i.e., the system renders all desired viewpoints from the single 3D IBVH model of the local user and ships them independently to the different participating sites
    - Reason: all streams are needed to reconstruct the 3D stream (depth); once reconstructed, it can be rendered together
  - Current rendering is possible at 15 frames per second with 5 CCD (FireWire, Point Grey) cameras, creating and rendering one 3D video stream of 640x480 pixels
22. Stream/Display Mapping Choices (TEEVE)
- Mapping of streams onto displays is based on the locality of cameras
- Selection of scene views
  - Default scene views are automatically computed using the geometry of the 3D camera locations
- Selection of screen layout
  - Three parameters determine how scene views rendered at view points are presented to users:
    - Number of view points
    - Number of displays
    - Which view point is the focus view point
23. Display Choice (Coliseum)
- Use of a single display, since it is a single-user video conferencing environment with 3D realism
- Single 3D view point of multiple users in a conferencing setup
- This was the first version; it later became all prettified (molded plastic, retractable arms, etc.)
24. Experimental Setup (TEEVE)
- Metrics
  - Overall throughput of macro-frame F_j
  - Completion time interval for macro-frame F_j
  - Individual throughput of stream i
  - End-to-end delay of a macro-frame F_j
- Experimental parameters of the remote testbed (UIUC-UCB)
  - Number of sender gateways: 2
  - Number of receiver gateways: 2
  - Number of 3D streams: 12
  - Frame rate: 4 fps
- Equipment
  - Dell Precision 450 (dual Xeon processor with 1 GByte of memory), running Fedora Core 2
  - 100 Mbps LAN and Internet2 between UCB and UIUC
25. Validation (TEEVE)
(Figures: macro-frame delay, overall throughput, completion interval)
26. Validation of Compression (TEEVE)
(Figures: compression time and visual quality for Scheme A and Scheme B, before and after compression)
27. Experimental Setup and Results (Coliseum)
- Multi-PC IBVH, 733 MHz P3s (2-person conference)
  - 8-10 Hz QVGA (224 x 224 displays / 7 computers)
- Single-PC IBVH, dual-processor 2 GHz P4
  - 17 Hz on VGA (300 x 300 displays)
- In tests of up to 10 users:
  - Observed latency: 250 ms
  - Network latency: 6 ms local, 25 ms to Boulder
  - Bandwidth (with MPEG): 620 Kbps per stream
  - Scales in a work-conserving way with participants
28. Validation (Coliseum)
- Overall latencies
  - Cameras: 20
  - Processing: 50
  - Network: <5
  - Display: 25
Coliseum is compute-intensive, so we can control end-node behavior and system performance.
29. Conclusion
- Tele-immersive environments are emerging, but to make them cost-effective and high-performance, we need new, jointly designed hardware and software architectures
- New 3D COTS cameras can be built, but they present serious challenges to vision, system, and network design
  - New 3D camera hardware needed
  - New vision software needed
  - New streaming protocols needed
  - New 3D compression algorithms needed
  - New multi-tier feedback schemes needed
  - New user view models needed
  - New timing and synchronization models needed
30. Acknowledgements
- The TEEVE work is supported by NSF funding and is joint work with
  - UIUC researchers: Zhenyu Yang, Bin Yu, Jin Liang, Yi Cui, Zahid Anwar, Robert Bocchino, Nadir Kiyanclar, Professor Roy Campbell, and William Yurcik
  - UCB researchers: Professor Ruzena Bajcsy and Dr. Sang-Hack Jung
- We would like to acknowledge the use of material from the ACM Multimedia 2003 paper about the Coliseum system, with further information from Dr. Nina Bhatti, Hewlett-Packard
- Publications
  - Z. Yang et al., "Real-time 3D Video Compression for Tele-Immersive Environments," SPIE/ACM Multimedia Computing and Networking (MMCN), January 2006, San Jose, CA
  - Z. Yang et al., "TEEVE: The Next Generation Architecture for Tele-immersive Environments," IEEE International Symposium on Multimedia (ISM), December 2005, Irvine, CA
  - H. Baker et al., "Computation and Performance Issues in Coliseum, an Immersive Videoconferencing System," ACM Multimedia 2003, Berkeley, CA
  - H. Baker et al., "Understanding Performance in Coliseum, an Immersive Videoconferencing System," ACM Transactions on Multimedia Computing, Communications, and Applications, Vol. 1, 2005