Multi-core SOC for Future Media Processing

About This Presentation
Title: Multi-core SOC for Future Media Processing
Author: qx
Last modified by: wanggx
Created Date: 6/4/2003 8:59:09 AM

Slides: 18
Provided by: qx
Tags: soc | core | future | media | mpeg | multi | processing

Transcript and Presenter's Notes

1
Multi-core SOC for Future Media Processing
  • Qin Xing, Yan Xiaolang
  • The Institute of VLSI Design, Zhejiang University

2
Outline
  • Opportunities and challenges from media processing
  • Multimedia algorithm characteristics and mapping
  • Multi-core SOC architecture and technology
  • Benchmarking results
  • Project status
  • Future work

3
Opportunities
  • Video conference
  • IP-phone
  • Smart terminal
  • PDA
  • Video camera
  • HDTV
  • Set-top box

4
Challenges: multiple standards
[Figure: bit rate (Mbit/s, axis 0 to 6) versus year (1994 to 2005) for successive encoder generations, from the 1st MPEG-2 encoder through the 5th generation encoder, annotated with the standards MPEG-2, MPEG-4, H.263, H.26L, H.264 / MPEG-4 part 10, WMV, VP3, and AVS]

5
Challenges: excellent hardware
  • Very high computational complexity
  • H.264 encoding of 720 x 576 pixels at 30 frames/s
    needs up to 30 GOPS
  • Multiple standards co-exist
  • Demands flexibility and programmability
  • Low power
  • Low cost

Best choice: Application-Specific Instruction Processor
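
To put the 30 GOPS figure from this slide in per-pixel terms, the arithmetic can be sketched as follows. This is only a back-of-the-envelope illustration: it assumes 1 GOPS means 10^9 operations per second, and the function names `pixel_rate` and `ops_per_pixel` are hypothetical, not from the presentation.

```c
#include <stdint.h>

/* Pixels processed per second for a given frame size and frame rate. */
int64_t pixel_rate(int width, int height, int fps)
{
    return (int64_t)width * height * fps;
}

/* Average operation budget per pixel.
   Assumption: 1 GOPS = 1e9 operations per second. */
double ops_per_pixel(double gops, int width, int height, int fps)
{
    return gops * 1e9 / (double)pixel_rate(width, height, fps);
}
```

For 720 x 576 pixels at 30 frames/s, `pixel_rate` gives 12,441,600 pixels/s, so 30 GOPS corresponds to roughly 2,400 operations per pixel.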
6
Multimedia algorithm characteristics
  • Outer loop and inner loop
  • Outer loop
    • Interface (GUI)
    • OS (Linux)
    • Bit-stream parsing (pack/unpack, VLC, CABAC)
    • Data transfer
  • Inner loop
    • Regular algorithms (prediction, FIR, DCT, motion estimation)
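
As an illustration of why bit-stream parsing counts as irregular, bit-wise outer-loop work, here is a minimal sketch of decoding unsigned Exp-Golomb code words, one of the variable-length codes used for H.264 syntax elements. The slide does not show any parsing code; the `BitReader` type and the `read_bit` and `read_ue` functions are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit reader over a byte buffer (not from the presentation). */
typedef struct {
    const uint8_t *buf;  /* bytes holding the bit stream */
    size_t size;         /* buffer length in bytes */
    size_t pos;          /* current read position in bits */
} BitReader;

/* Read one bit, most significant bit first; returns 0 past the end. */
static unsigned read_bit(BitReader *br)
{
    if (br->pos >= br->size * 8)
        return 0;
    unsigned bit = (br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u;
    br->pos++;
    return bit;
}

/* Decode one unsigned Exp-Golomb code word:
   N leading zero bits, a 1 bit, then N info bits.
   The prefix is capped at 31 zeros so malformed input cannot loop forever. */
unsigned read_ue(BitReader *br)
{
    unsigned zeros = 0;
    while (zeros < 31 && read_bit(br) == 0)
        zeros++;
    unsigned info = 0;
    for (unsigned i = 0; i < zeros; i++)
        info = (info << 1) | read_bit(br);
    return (1u << zeros) - 1u + info;
}
```

Every decoded value depends on a data-dependent number of bits, which is the kind of control-heavy work the deck assigns to the outer loop rather than to a vector unit.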

7
Multimedia algorithm mapping
  • Programmable and heterogeneous processors are the
    preferred choice for the implementation
  • General MCU (RISC core): outer loop
  • Enhanced DSP (EDSP, bit-wise operations): outer loop
  • Vector processor (VP, VLIW + SIMD): inner loop

8
Multi-core SOC architecture
  • Top level
[Figure: top-level architecture diagram; one labeled block is the media processing kernel]

9
Inside the media processing kernel
[Figure: block diagram of the media processing kernel. Labeled blocks: GAG1 to GAG4, GDM, V-DM1 to V-DM4, GTM, EDSP control path, vector control path, E-DP, V-DP1 to V-DP4, a 2D crossbar connection network, and DMA and off-chip memories]

10
Technologies: specified instruction set
Integer IDCT in H.264
    for (j = 0; j < BLOCK_SIZE; j++)
    {
      for (i = 0; i < BLOCK_SIZE; i++)
        m5[i] = img->cof[i0][j0][i][j];
      m6[0] = (m5[0] + m5[2]);
      m6[1] = (m5[0] - m5[2]);
      m6[2] = (m5[1] >> 1) - m5[3];
      m6[3] = m5[1] + (m5[3] >> 1);
    }
Intel MMX: 13 cycles
    __asm {
      mov edx, mptr
      movdqu xmm1, [edx]
      packssdw xmm1, xmm1     // read m5[0] from memory to xmm1
    }
    __asm {
      movdqu xmm4, [edx+48]
      packssdw xmm4, xmm4     // read m5[3] from memory
    }
    __asm {
      movq xmm5, xmm1
      psubw xmm1, xmm3        // m6[1] = (m5[0] - m5[2])
      paddw xmm3, xmm5        // m6[0] = (m5[0] + m5[2])
      movq xmm5, xmm2
      psraw xmm2, 1
      psubw xmm2, xmm4        // m6[2] = (m5[1] >> 1) - m5[3]
      psraw xmm4, 1
      paddw xmm4, xmm5        // m6[3] = m5[1] + (m5[3] >> 1)
    }
Our IS: 6 cycles
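
For reference, the butterfly excerpted on this slide can be completed into a scalar version of the H.264 4x4 inverse integer transform. This is a plain C sketch, not the authors' instruction-set implementation. It assumes the usual ordering (rows first, then columns, then rounding with `(x + 32) >> 6`) and assumes that `>>` on negative values is an arithmetic shift. The function names `idct4_1d` and `idct4x4` are hypothetical.

```c
/* 1-D 4-point inverse transform butterfly.
   A temporary array is used, so in and out may point to the same storage.
   Assumption: >> on negative ints is an arithmetic shift. */
void idct4_1d(const int in[4], int out[4])
{
    int m6[4];
    m6[0] = in[0] + in[2];
    m6[1] = in[0] - in[2];
    m6[2] = (in[1] >> 1) - in[3];
    m6[3] = in[1] + (in[3] >> 1);
    out[0] = m6[0] + m6[3];
    out[1] = m6[1] + m6[2];
    out[2] = m6[1] - m6[2];
    out[3] = m6[0] - m6[3];
}

/* 4x4 inverse transform: transform each row, then each column,
   then round the result with (x + 32) >> 6. */
void idct4x4(int coef[4][4], int res[4][4])
{
    int tmp[4][4];
    for (int r = 0; r < 4; r++)
        idct4_1d(coef[r], tmp[r]);
    for (int c = 0; c < 4; c++) {
        int col[4], t[4];
        for (int r = 0; r < 4; r++)
            col[r] = tmp[r][c];
        idct4_1d(col, t);
        for (int r = 0; r < 4; r++)
            res[r][c] = (t[r] + 32) >> 6;
    }
}
```

The transform needs only additions, subtractions, and shifts, which is what makes it a natural target for a small number of specialized SIMD instructions.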
11
Technologies: instruction merging
6-tap sub-pixel interpolation
    result = 0;
    pres_y = dy == 1 ? y_pos : y_pos + 1;
    pres_y = max(0, min(maxold_y, pres_y));          // load
    for (x = -2; x < 4; x++) {                       // control
      pres_x = max(0, min(maxold_x, x_pos + x));     // load
      result += imY[pres_y][pres_x] * COEF[x + 2];   // computation, permutation and load
    }
    result1 = max(0, min(255, (result + 16) / 32));  // computation
Time breakdown before merging: Load/Store 30%, Permutation 25%, Computation 35%, Control 10%
After merging: Ld/St and Perm. merged, Computation, Control
Result: execution time reduced by half
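
The interpolation loop on this slide can be written as a self-contained scalar function. The sketch below handles the horizontal half-pixel case only and assumes the H.264 luma filter taps (1, -5, 20, 20, -5, 1) for `COEF`, which the slide does not spell out. The names `clampi` and `interp6_h` and the flat row-major image layout are hypothetical choices for this example.

```c
/* Clamp v into the range [lo, hi]. */
static int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Horizontal 6-tap half-pixel interpolation between columns x_pos and
   x_pos + 1 on row y_pos. img is a row-major 8-bit image.
   Coordinates are clamped to the picture, as in the slide's code. */
int interp6_h(const unsigned char *img, int width, int height,
              int x_pos, int y_pos)
{
    /* Assumed filter taps: the H.264 luma half-sample filter. */
    static const int COEF[6] = {1, -5, 20, 20, -5, 1};
    int pres_y = clampi(y_pos, 0, height - 1);        /* load address */
    int result = 0;
    for (int x = -2; x < 4; x++) {                    /* control */
        int pres_x = clampi(x_pos + x, 0, width - 1); /* load address */
        result += img[pres_y * width + pres_x] * COEF[x + 2];
    }
    /* Round, normalize by 32, and clip to the 8-bit range. */
    return clampi((result + 16) / 32, 0, 255);
}
```

Each output sample costs two clamps and a load per tap before any arithmetic happens, which is consistent with the slide's point that load/store and permutation work is worth merging.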
12
Benchmarking results for CPU core
  • CK520

13
Simulation results for DSP performance
  • Enhanced DSP
    • CAVLC (context-adaptive variable-length coding)
    • OGG (new audio standard)

CAVLC, MIPS/frame:
Sequence (CIF)    Max          Average
Foreman           0.147,832    0.029,898
Mobile            0.541,943    0.134,240

OGG:
Function          MIPS/frame
MDCT              6
De_VQ             2.5
Floor/Coupling    3.5

14
Simulation results for DSP performance
  • Vector processor
    • H.264 baseline decoder

MIPS at 30 frames/s (298-frame sequences):
Format    Sequence    Max      Average
QCIF      Foreman     28.1     12.7
QCIF      Akiyo       19.8     5.3
CIF       Foreman     116.3    52.3
CIF       Akiyo       92.9     22.8

15
Project status
  • Finished 2 versions of CPU Core
  • Released DSP instruction set
  • Writing and verifying RTL of the enhanced DSP
  • Benchmarking vector processor
  • Developing software tools

16
Future work
  • Scheduling for task-level parallelism (TLP)
    between heterogeneous processors
  • Simulation/debugging tools for heterogeneous
    processors
  • Methodologies for design space exploration

17
Thank you!