Title: Paul Scherrer Institute
1 Paul Scherrer Institute
Stefan Ritt
The PSI DRS4 Integrated Circuit Chip
2 Agenda
- Introduction to Switched Capacitor Array Chips
- Comparison with FADCs
- Overview of chips on the market
- The DRS4 chip
- Design principles
- Special features
- Some applications
- New ideas for DRS5 chip to be designed in 2011
- Increased bandwidth
- Zero dead time
3 Introduction to Switched Capacitor Array Chips
4 Detectors in Particle Physics
- Particles interact with matter and produce light
Signal
100s mV
10-100 ns
5 Flash ADC Technique
FADC
Q-sensitive Preamplifier
60 MHz, 12 bit
Shaper
PMT/APD Wire
Amplitude
TDC
Time
- Shaper is used to optimize signals for slow 60 MHz FADC
- Shaping stage can only remove information from the signal
- Shaping is unnecessary if FADC is fast enough
- All operations (CFD, optimal filtering, integration) can be done digitally
6 Nyquist-Shannon Theorem
- If a function x(t) contains no frequencies higher than F Hertz, it is completely determined by giving its ordinates at a series of points spaced 1/(2F) seconds apart.
If a detector produces frequencies up to 500 MHz (0.6 ns rise time), all information from that detector is recorded if it is sampled at 1 GSPS with a good enough signal-to-noise ratio.
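The sampling condition above can be illustrated with a minimal sketch. The function names are hypothetical and only serve this example. It shows that 500 MHz content needs 1 GSPS, and that a 700 MHz tone sampled at 1 GSPS yields exactly the same samples as a 300 MHz tone, which is why frequencies above F must be absent.

```python
import math

def nyquist_rate(f_max_hz):
    """Minimum sampling rate for a signal with no content above f_max_hz."""
    return 2.0 * f_max_hz

def alias_frequency(f_signal_hz, f_sample_hz):
    """Apparent frequency of a tone at f_signal_hz after sampling at f_sample_hz."""
    f = f_signal_hz % f_sample_hz
    return min(f, f_sample_hz - f)

def sample_cosine(f_hz, f_sample_hz, n):
    """First n samples of a unit-amplitude cosine at f_hz."""
    return [math.cos(2.0 * math.pi * f_hz * k / f_sample_hz) for k in range(n)]

# 500 MHz detector bandwidth needs 1 GSPS
required = nyquist_rate(500e6)

# A 700 MHz tone violates the condition at 1 GSPS and aliases to 300 MHz
too_fast = sample_cosine(700e6, 1e9, 16)
alias = sample_cosine(alias_frequency(700e6, 1e9), 1e9, 16)
```

After sampling, the two sample lists agree to within floating-point rounding, so the digitizer alone cannot tell the two tones apart.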
7 How to measure best timing?
Simulation of MCP with realistic noise and different discriminators
K. Byrum, H. Frisch, J.-F. Genat et al., IEEE Trans. Nucl. Sci. 57, 525 (2010)
8 Currently available fast ADCs
- 8 bits, 3 GS/s, 1.9 W → 24 Gbits/s
- 10 bits, 3 GS/s, 3.6 W → 30 Gbits/s
- 12 bits, 3.6 GS/s, 3.9 W → 43.2 Gbits/s
- 14 bits, 0.4 GS/s, 2.5 W → 5.6 Gbits/s
24x1.8 Gbits/s
- Requires high-end FPGA
- Complex board design
- FPGA power
1.8 GHz!
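The data rates on this slide follow from resolution times sampling rate. A small sketch of that arithmetic is given here, using integer Mbit/s to avoid rounding issues. Reading the "24x1.8 Gbits/s" annotation as 24 serial lanes of 1.8 Gbit/s each is an assumption of this example, and the function names are hypothetical.

```python
def adc_data_rate_mbps(bits, msps):
    """Raw output data rate in Mbit/s for a given resolution and sampling rate (MS/s)."""
    return bits * msps

def lanes_needed(rate_mbps, lane_mbps):
    """Number of serial lanes needed for rate_mbps, rounded up."""
    return -(-rate_mbps // lane_mbps)  # integer ceiling division

# (bits, MS/s) for the four ADCs listed on the slide
ADCS = [(8, 3000), (10, 3000), (12, 3600), (14, 400)]

rates = [adc_data_rate_mbps(bits, msps) for bits, msps in ADCS]
for (bits, msps), rate in zip(ADCS, rates):
    print(f"{bits} bit @ {msps / 1000:g} GS/s -> {rate / 1000:g} Gbit/s")

# Assumption: the 12 bit, 3.6 GS/s stream is split over 1.8 Gbit/s lanes
lanes = lanes_needed(adc_data_rate_mbps(12, 3600), 1800)
```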
9 ADC boards
- PX1500-4: 2 channels, 3 GS/s, 8 bits
- ADC12D1X00RB: 1 channel, 1.8 GS/s, 12 bits
1-10 k / channel
10 Switched Capacitor Array
0.2-2 ns
Inverter Domino ring chain
IN
Waveform stored
Out
FADC 33 MHz
Clock
Shift Register
Time stretcher: GHz → MHz
11 Switched Capacitor Array
- Cons
- No continuous acquisition
- Limited sampling depth
- Nonlinear timing
- Pros
- High speed (5 GHz) high resolution (11.5 bit)
- High channel density (9 channels on 5x5 mm2)
- Low power (10-40 mW / channel)
- Low cost ( 10 / channel)
[Figure: sampling intervals Δt]
Goal: Minimize Limitations
12 The DRS4 Chip
13 Design Options
- CMOS process (typically 0.35 to 0.13 μm) → sampling speed
- Number of channels, sampling depth, differential input
- PLL for frequency stabilization
- Input buffer or passive input
- Analog output or (Wilkinson) ADC
- Internal trigger
- Exact design of sampling cell
PLL
Trigger
ADC
14 DRS History
1995
DSC
Roger Schnyder, Christian Brönnimann, πβ
Tiny signal
20 pF
0.2 pF
DRS1
2002
I
Temperature Dependence
kT
2004
DRS2
Roberto Dinapoli
DRS3
2007
2008
DRS4
PLL-regulated Sampling Speed
15 DRS4
- Fabricated in 0.25 μm 1P5M MMC process (UMC), 5 x 5 mm2, radiation hard
- 8+1 ch. each 1024 bins, 4 ch. 2048, ..., 1 ch. 8192
- Passive differential inputs/outputs
- Sampling speed 700 MHz to 5 GHz
- On-chip PLL stabilization
- Readout speed 30 MHz, multiplexed or in parallel
16 12 bit resolution
<8 bits effective resolution
11.5 bits effective resolution
17 Bandwidth
- Bandwidth is determined by bond wire and internal bus resistance/capacitance
- 850 MHz (QFP), 950 MHz (QFN), ??? (flip-chip)
2 nH
Bond wire
Parasitic 10 pF
final bus width
QFP package
850 MHz (-3dB)
Simulation
Measurement
Ueli Hartmann
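As a rough plausibility check of the numbers on this slide, the 2 nH bond wire and the 10 pF parasitic capacitance can be treated as a simple LC circuit. This is a simplifying assumption of this sketch, not the simulation shown on the slide. It ignores the bus resistance, so it only gives the order of magnitude.

```python
import math

def lc_resonance_hz(l_henry, c_farad):
    """Resonance frequency of an ideal LC circuit, f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

# Values from the slide: 2 nH bond wire, 10 pF parasitic capacitance
f0 = lc_resonance_hz(2e-9, 10e-12)
print(f"LC resonance: {f0 / 1e9:.2f} GHz")
```

The result is on the order of 1 GHz, the same range as the quoted 850 MHz (QFP) and 950 MHz (QFN) bandwidths. A smaller inductance, as with bump bonding, moves this frequency up.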
18 Bump Bonding
- Reduce input inductance by using bump bonding instead of wire bonding
200 μm
75 μm
19 How to minimize dead time?
- Fast analog readout: 30 ns / sample
- Parallel readout
- Region-of-interest readout
- Simultaneous write / read
AD9222: 12 bit, 8 channels
20 ROI readout mode
normal trigger stop after latency
delayed trigger stop
Trigger
stop
Delay
33 MHz
e.g. 100 samples @ 33 MHz → 3 μs dead time → 300,000 events / sec.
readout shift register
Patent pending!
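The dead-time estimate for region-of-interest readout is simple arithmetic. The sketch below uses hypothetical function names and assumes that the dead time is just the number of samples read divided by the readout clock. The slide rounds the results to 3 μs and 300,000 events per second.

```python
def readout_dead_time_s(n_samples, readout_clock_hz):
    """Time needed to shift out n_samples at the given readout clock."""
    return n_samples / readout_clock_hz

def max_event_rate_hz(dead_time_s):
    """Highest event rate if each event blocks the chip for dead_time_s."""
    return 1.0 / dead_time_s

roi_dead_time = readout_dead_time_s(100, 33e6)    # region of interest: 100 samples
full_dead_time = readout_dead_time_s(1024, 33e6)  # full 1024-cell channel
roi_rate = max_event_rate_hz(roi_dead_time)

print(f"ROI:  {roi_dead_time * 1e6:.2f} us -> {roi_rate:.0f} events/s")
print(f"Full: {full_dead_time * 1e6:.2f} us")
```

Reading 100 samples instead of all 1024 cells shortens the dead time by about a factor of ten.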
21 Daisy-chaining of channels
[Figure: two panels, each showing the Domino wave, the clock, and the enable inputs (0 or 1) of channels 0-7]
DRS4 can be partitioned into 8x1024, 4x2048, 2x4096 or 1x8192 cells.
Chip daisy-chaining is possible to reach virtually unlimited sampling depth.
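The partitioning options trade channel count for depth while the total of 8 x 1024 cells stays fixed. A minimal sketch, with hypothetical helper names, lists the configurations and the time window each depth covers at a given sampling speed.

```python
TOTAL_CELLS = 8 * 1024  # eight channels of 1024 cells each

def partitions(total_cells=TOTAL_CELLS, base_depth=1024):
    """List (channels, depth) pairs obtained by repeatedly chaining channels in pairs."""
    configs = []
    depth = base_depth
    while depth <= total_cells:
        configs.append((total_cells // depth, depth))
        depth *= 2
    return configs

def window_ns(depth, gsps):
    """Time window in ns covered by depth cells at gsps gigasamples per second."""
    return depth / gsps

for channels, depth in partitions():
    print(f"{channels} x {depth}: {window_ns(depth, 5.0):.1f} ns at 5 GS/s")
```

At 5 GS/s a single 1024-cell channel covers about 205 ns, and the fully chained 8192-cell configuration covers about 1.6 μs.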
22 Simultaneous Write/Read
[Figure: FPGA reading out channels 0-7, which act as an 8-fold analog multi-event buffer]
Expected crosstalk: few mV
23 DRS4 around the world
Shipped (through Jan 2011): 2200 chips, 120 evaluation boards
24 MEG Experiment
- MEG experiment @ PSI searches for μ → eγ decay
- After 10 years of chip design, DAQ setup, firmware programming, MEG runs with 3000 channels as designed
- 40 ps timing resolution between all channels, running at 1.6 GS/s
- Double buffer readout mode increases live time to 99.7% at 10 Hz event rate (3 MB/event)
- Took 400 TB in 2010
25 DRS4 @ MEG
LMK03000
4 x DRS4
32 channels
3000 Channels
26 On-line waveform display
Σ 848 PMTs
virtual oscilloscope
template fit
click
pedestal histo
27 Crosstalk elimination
Crosstalk removal by subtracting empty channel
subtract
Hit
Hit
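The subtraction can be sketched as a sample-by-sample difference between the channel with the hit and a neighbouring empty channel. This assumes the crosstalk pattern is the same in both channels, and the sample values below are made up for illustration.

```python
def subtract_crosstalk(hit_channel, empty_channel):
    """Remove common crosstalk by subtracting an empty channel sample by sample."""
    if len(hit_channel) != len(empty_channel):
        raise ValueError("waveforms must have the same length")
    return [h - e for h, e in zip(hit_channel, empty_channel)]

# Illustrative values in mV: a pulse plus a crosstalk pattern common to both channels
pulse = [0, 0, 50, 100, 40, 0]
crosstalk = [3, -2, 4, -1, 2, -3]
hit = [p + c for p, c in zip(pulse, crosstalk)]
empty = list(crosstalk)

cleaned = subtract_crosstalk(hit, empty)
```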
28 Template Fit
πβ Experiment, 500 MHz sampling
- Determine standard PMT pulse by averaging over many events → Template
- Find hit in waveform
- Shift (TDC) and scale (ADC) template to hit
- Minimize χ²
- Compare fit with waveform
- Repeat if above threshold
- Store ADC and TDC values
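The shift-and-scale step can be sketched as a brute-force least-squares search. This is a simplified illustration, not the experiment's code. It assumes integer-sample shifts, equal uncertainties on all samples, and a single hit per waveform, and `fit_template` is a hypothetical name.

```python
def fit_template(waveform, template):
    """Return (shift, scale, chi2) of the best match of template to waveform.

    For each integer shift the optimal scale follows from linear least squares;
    chi2 is the sum of squared residuals over the whole waveform.
    """
    norm = sum(t * t for t in template)
    best = None
    for shift in range(len(waveform) - len(template) + 1):
        window = waveform[shift:shift + len(template)]
        scale = sum(w * t for w, t in zip(window, template)) / norm
        model = [0.0] * len(waveform)
        for k, t in enumerate(template):
            model[shift + k] = scale * t
        chi2 = sum((w - m) ** 2 for w, m in zip(waveform, model))
        if best is None or chi2 < best[2]:
            best = (shift, scale, chi2)
    return best

# Illustrative data: the template scaled by 2.5 and placed at sample 7
template = [1.0, 4.0, 2.0, 1.0]
waveform = [0.0] * 20
for k, t in enumerate(template):
    waveform[7 + k] = 2.5 * t

shift, scale, chi2 = fit_template(waveform, template)
```

The shift plays the role of the TDC value and the scale that of the ADC value. For multiple hits, the fitted template would be subtracted and the search repeated while the residual stays above threshold.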
29 Trigger and DAQ on same board
- SCA can only sample a limited window (1024 bins) → many applications require a wider window, and trigger capability would require continuous digitization
- Using a multiplexer in DRS4, input signals can be simultaneously digitized at 120 MHz and sampled in the DRS
- FPGA can make a local trigger (or a global one) and stop the DRS upon a trigger
- DRS readout (5 GSPS) through the same 8-channel FADCs
global trigger bus
trigger
FPGA
DRS
FADC 12 bit 65 MHz
analog front end
LVDS
SRAM
30 Slow waveform and Fast window
Window only limited by RAM
Continuous Waveform 120 MSPS (8 ns bins)
31 Sine Curve Fit Method
- y_ji: i-th sample of measurement j
- a_j, f_j, α_j, o_j: sine wave parameters
- β_i: phase error → fixed jitter
- Iterative global fit
- Determine rough sine wave parameters for each measurement by fit
- Determine β_i using all measurements where sample i is near zero crossing
- Make several iterations
S. Lehner, B. Keil, PSI
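One step of this procedure can be sketched as follows. The sketch assumes the sine parameters of each measurement are already known from the rough fit. Near a zero crossing the sine is almost linear, so the timing error of cell i is approximately the voltage deviation divided by the slope. The function names, the `min_cos` cut and the averaging over measurements are assumptions of this example, not the authors' implementation.

```python
import math

def nominal_phase(i, fs, f, alpha):
    """Phase of the sine at the nominal sampling time i / fs of cell i."""
    return 2.0 * math.pi * f * i / fs + alpha

def time_error_estimate(y, i, fs, a, f, alpha, o):
    """Linearized timing error of cell i from one measured sample y."""
    phase = nominal_phase(i, fs, f, alpha)
    expected = a * math.sin(phase) + o
    slope = 2.0 * math.pi * f * a * math.cos(phase)  # dy/dt at the nominal time
    return (y - expected) / slope

def cell_time_error(measurements, i, fs, min_cos=0.9):
    """Average the estimate over measurements where cell i is near a zero crossing.

    measurements is a list of (y, a, f, alpha, o) tuples, one per measurement j.
    """
    estimates = [
        time_error_estimate(y, i, fs, a, f, alpha, o)
        for (y, a, f, alpha, o) in measurements
        if abs(math.cos(nominal_phase(i, fs, f, alpha))) >= min_cos
    ]
    if not estimates:
        raise ValueError("no measurement has sample i near a zero crossing")
    return sum(estimates) / len(estimates)
```

In an iterative scheme, the estimated errors would be fed back into the sine fits and the two steps repeated a few times.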
32 Fixed Pattern Jitter Results
- TD_i typically 50 ps RMS @ 5 GHz
- TI_i goes up to 600 ps
- Jitter is mostly constant over time → can be measured and corrected
- Residual random jitter 3-4 ps RMS
- Achievable resolution exceeds best CFD + HPTDC
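A short sketch of how a fixed timing pattern could be applied as a correction. It reads TD_i as the deviation of cell i's sampling interval from the nominal value and TI_i as the accumulated deviation, which is an interpretation of the slide's symbols, and the numbers are made up for illustration.

```python
from itertools import accumulate

def integral_deviation(td_ps):
    """Accumulated timing deviation TI_i from per-cell deviations TD_i."""
    return list(accumulate(td_ps))

def calibrated_times_ps(td_ps, nominal_dt_ps):
    """Corrected time of each cell boundary relative to the start of the array."""
    ti = integral_deviation(td_ps)
    return [(k + 1) * nominal_dt_ps + ti[k] for k in range(len(td_ps))]

# Illustrative values: nominal 200 ps per cell (5 GS/s) with small deviations
td = [10, -20, 30, -20]
ti = integral_deviation(td)
times = calibrated_times_ps(td, 200)
```

Because the pattern is mostly constant over time, such a table can be measured once and reused, leaving only the residual random jitter.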
33 Time-of-Flight PET
- Conventional electronics: CFD + TDC, 500 ps RMS
- TOF needs
- 100-200 ps
- >1 MHz rate
C. Levin, Stanford University
34 ToF-PET Project
- Started fall 2010 after NSS/MIC in Knoxville (Siemens PET R&D home)
- New project started to replace current PET electronics with DRS4 (5)
- PCB ready summer 2011, firmware by Univ. Tübingen
- Simulations show that SCA technique can achieve 100 ps easily
[Figure: Ping-Pong Scheme, with the FPGA reading a region of interest (ROI) from channels 0-7]
20 samples (10 ns @ 2 GS/s) x 30 ns / sample = 600 ns; 600 ns + 40 ns overhead = 640 ns → 1 MHz rate
35 DRS5 Chip Ideas
36 Plans for DRS5
- Increase analog bandwidth 5 GHz
- Smaller input capacitance
- Increase sampling speed 10 GS/s
- Switch to 110 nm technology
- Deeper sampling depth
- 8 x 4096 / chip
- Minimize readout time (dead time free) for μSR and ToF-PET
- (minor) reduction in analog readout time per sample (30 ns → 20 ns)
- Implement FIFO technology
J. Milnes, J. Howoth, Photek
μSR
MHz event rate
CTA
37 Next Generation SCA
Short sampling depth
- Low parasitic input capacitance → high bandwidth
- Large area → low resistance bus, low resistance analog switches → high bandwidth
Deep sampling depth
- Digitize long waveforms
- Accommodate long trigger delay
- Faster sampling speed for a given trigger latency
How to combine best of both worlds?
38 Cascaded Switched Capacitor Arrays
shift register
input
- 32 fast sampling cells (10 GSPS/110nm CMOS)
- 100 ps sample time, 3.1 ns hold time
- Hold time long enough to transfer voltage to secondary sampling stage with moderately fast buffer (300 MHz)
- Shift register gets clocked by inverter chain from fast sampling stage
fast sampling stage
secondary sampling stage
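The timing figures for the fast stage can be reproduced with a short calculation. With 32 cells at 10 GSPS each cell samples for 100 ps and then holds its value for the remaining 31 sample periods. Treating the 300 MHz buffer as a single-pole system to judge settling is an assumption of this sketch, and the function names are hypothetical.

```python
import math

def sample_time_ps(gsps):
    """Duration of one sample in ps at gsps gigasamples per second."""
    return 1000.0 / gsps

def hold_time_ps(n_cells, gsps):
    """Time a cell holds its value while the other n_cells - 1 cells are sampling."""
    return (n_cells - 1) * sample_time_ps(gsps)

def settling_time_constants(hold_ps, buffer_bw_mhz):
    """Hold time in units of the time constant of a single-pole buffer (assumption)."""
    tau_ps = 1e6 / (2.0 * math.pi * buffer_bw_mhz)
    return hold_ps / tau_ps

t_sample = sample_time_ps(10)                   # 100 ps
t_hold = hold_time_ps(32, 10)                   # 3100 ps = 3.1 ns
n_tau = settling_time_constants(t_hold, 300.0)  # between 5 and 6 time constants
```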
39 How noise affects timing
[Figure: rising signal edge with a voltage noise band; labels: signal height U, voltage noise Δu, rise time t_r, timing uncertainty Δt, number of samples on slope]
- Timing jitter arises from voltage noise
- Timing jitter is much smaller for faster rise time
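The relation between the labelled quantities can be sketched numerically. On a linear edge of height U and rise time t_r the slope is U / t_r, so a voltage noise Δu maps to a timing uncertainty Δt = (Δu / U) · t_r. Dividing by the square root of the number of samples on the slope assumes the noise of those samples is uncorrelated. The formula is the standard geometric estimate, reconstructed here from the figure labels rather than copied from the slide.

```python
import math

def timing_jitter_s(noise_v, height_v, rise_time_s, n_samples=1):
    """Timing uncertainty on a linear edge, averaged over n_samples on the slope."""
    return (noise_v / height_v) * rise_time_s / math.sqrt(n_samples)

# Illustrative values: 1 mV noise on a 100 mV signal
slow_edge = timing_jitter_s(1e-3, 0.1, 1e-9)      # 1 ns rise time
fast_edge = timing_jitter_s(1e-3, 0.1, 0.25e-9)   # 0.25 ns rise time
averaged = timing_jitter_s(1e-3, 0.1, 1e-9, n_samples=4)
```

With these example values a four times faster edge gives four times less jitter, and four samples on the slope halve it.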
40 TDC vs. Waveform Digitizing
Constant Fraction Discriminator
Q-sensitive Preamplifier
Shaper
PMT/APD Wire
TDC
- CFD and TDC on same board → crosstalk
- CFD depends on noise on a single point, while waveform digitizing can average over several points
- Inverter chain is the same in both TDCs and SCAs
- Can we replace TDCs by SCAs? → Yes, if the readout rate is sufficient
41 Typical Waveform
42 Dead-time free acquisition
- Self-trigger writing of short 32-bin segments
- Simultaneous reading of segments
- Quasi dead time-free
- Data driven readout
- Ext. ADC runs continuously
- ASIC tells FPGA when there is new data
- Coarse timing from 300 MHz counter
- Fine timing by waveform digitizing and analysis in FPGA
- 20 x 20 ns = 0.4 μs readout time → 2 MHz sustained event rate
- Attractive replacement for CFD+TDC
DRS5
43 Plug & Play Firmware
- Emphasis shift from dedicated hardware to firmware
- Pre-designed modules for CFD, TDC, peak sensing ADC, ...
- Modules can be configured by user and downloaded
CFD
TDC
FIFO
ADC Readout
SCALER
Interface
FIFO
FIFO
ADC
FIFO
Data bus
Parameter bus
44 Conclusions
- DRS4 chip successfully used in many areas; true potential of SCA technology is only now being discovered
- Planned DRS5 chip will increase bandwidth and decrease readout dead time
- SCA technology should be able to replace most traditional electronics in particle detection
45 Thanks to
- Roland Horisberger: original idea
- Roberto Dinapoli: analog design of DRS3 and DRS4
- Ueli Hartmann: DRS4 evaluation boards
- PSI chip design core team