## **Exploring the Space-Time limits in Next Generation X-ray Imager Readout**



(basis of slides)



Mānoa

### **Overview**

- Basis is Switched Capacitor Array acquisition
  - Low-cost, commodity CMOS processes
  - Excellent timing, frame-rate, dynamic range
  - 100's $\rightarrow$ 10's of kSamples $\rightarrow$ MSamples
- Active research
  - Technology in its infancy
  - Space-Time limits? (micron spatial resolution with fs timing?)
- Key Elements going forward





• Pipelined storage = array of T/H elements, with output buffering



### Switched Capacitor Array Sampling







### An Initial Selling Point



### **Basic Functional components**



### **Design Choices**

- Input coupling
  - Differential versus single-ended input
  - Needed analog bandwidth
  - Gain needed?
- Sampling Options
  - On-chip PLL/DLL
  - External DLL
  - Analog transfer vs. interrogate in situ
- ADC and readout options
  - Sequential output select vs. random access
  - On-chip vs. off-chip ADC
  - Serial, parallel, massively parallel

Many variants have been explored...

### **Toward increased timing precision**

| ASIC       | # chan | Depth/chan | Time Resolution [ps] | Vendor | Size [nm] | Year   |
|------------|--------|------------|----------------------|--------|-----------|--------|
| LABRADOR 3 | 8      | 260        | 16                   | TSMC   | 250       | 2005   |
| BLAB       | 1      | 65536      | 1-4                  | TSMC   | 250       | 2009   |
| STURM2     | 8      | 4x8        | <10 (3GHz ABW)       | TSMC   | 250       | 2010   |
| DRS4       | 8      | 1024       | ~1 (short baseline)  | IBM    | 250       | 2014   |
| PSEC4      | 6      | 256        | ~1 (short baseline)  | IBM    | 130       | 2014   |
| RITC3      | 3      | Continuous | TBD                  | IBM    | 130       |        |
| PSEC5      | 4      | 32768      | TBD                  | TSMC   | 130       |        |
| DRS5       | 8/16?  | 128x32     | TBD                  | UMC    | 110       |        |
| SamPic     | 16     | 64         | ~3 [pic 0]           | AMS    | 180       | [2014] |
| RFpix      | 128?   | TBD        | <= 100fs (target)    | TSMC   | 45 ?      |        |



- 10 real bits (1.3V/1.3mV noise)
- Excellent linearity, noise
- Sampling rates already meet Type I and Type II specifications

### Starting point: Predictions

#### 1GHz analog bandwidth, 5GSa/s



Time Difference Dependence on Signal-Noise Ratio (SNR)

G. Varner and L. Ruckman NIM A602 (2009) 438-445.

#### Simulation includes detector response



J-F Genat, G. Varner, F. Tang, H. Frisch NIM A**607 (2009) 387-393**.

### And now: high space-time Resolution



In a number of communities (future particle/astroparticle detectors, PET medical imaging, etc.) there is growing interest in detectors capable of operating at the picosecond timing and $\mu$m spatial resolution limit (for light, 1 ps = 300 $\mu$m).



**Front-End Electronics** 

Fast signal collection x-ray detectors



constructed

### What the detector sees

(Figure axis: normalized intensity)

#### •Source SR wavefront amplitudes:

 $\begin{bmatrix} A_{\sigma} \\ A_{\pi} \end{bmatrix} = \frac{\sqrt{3}}{2\pi} \gamma \frac{\omega}{\omega_{c}} \left( 1 + X^{2} \right) \left( -i \right) \begin{bmatrix} K_{2/3}(\eta) \\ \frac{iX}{\sqrt{1+X^{2}}} K_{1/3}(\eta) \end{bmatrix},$ 

 $X = \gamma \psi$ 

K.J. Kim, AIP Conf. Proc. 184 (1989). J.D. Jackson, "Classical Electrodynamics," (Second Edition), John Wiley & Sons, New York (1975).

•Kirchhoff integral over mask (+ detector response)  $\rightarrow$  Detected pattern:  $A_{\sigma,\pi} (Detector) = \frac{iA_{\sigma,\pi} (Source)}{\lambda} \times \int_{mask} \frac{t(y_m)}{r_1 r_2} e^{i\frac{2\pi}{\lambda}(r_1+r_2)} \left(\frac{\cos\theta_1 + \cos\theta_2}{2}\right) dy_m$ 

where



Measured slow-scan detector image (red) at CesrTA, used to validate simulation (blue)

- t(y<sub>m</sub>) is the complex transmission of the mask element at y<sub>m</sub>.
- Sum intensities of each polarization and wavelength component.
- Sum weighted set of detector images from point sources.
  - The source beam is considered to be a vertical distribution of point sources.
  - Can also be applied to sources with non-zero angular dispersion and longitudinal extent, for more accurate simulation of emittance and source-depth effects.
  - For machines under consideration here these effects are small, so for computational speed we restrict ourselves to 1-D vertical distributions.

## Overview

| X-ray Source Bend Par. | S-LER    | S-HER (BS2E.82) | Units |
|------------------------|----------|-----------------|-------|
| ε<sub>x</sub>          | 3.20E-09 | 4.60E-09        | m     |
| κ                      | 0.27%    | 0.24%           |       |
| ε<sub>y</sub>          | 8.64E-12 | 1.10E-11        | m     |
| β                      | 50.0     | 11.5            | m     |
| σ<sub>y</sub>          | 20.8     | 11.3            | μm    |
| Beam Energy           | 4        | 7               | GeV   |
| Effective length      | 0.89     | 5.9             | m     |
| Bend angle            | 28.0     | 55.7            | mrad  |
| ρ                     | 31.7     | 105.9           | m     |
| Critical Energy       | 4.4      | 7.1             | keV   |
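
The rows of this table are linked by a few standard relations, and a short check makes that visible. The sketch below assumes the two emittance rows are horizontal and vertical emittance related by the coupling row, that the beam size is sqrt(emittance x beta), that the bend radius is effective length divided by bend angle, and that the critical energy follows the usual electron-ring approximation E_c [keV] ≈ 2.218 E³ [GeV] / ρ [m]. The dictionary keys and the `derived` helper are illustrative names, not part of the original slides.

```python
import math

# Values copied from the table above; key names are illustrative.
RINGS = {
    "S-LER": dict(eps_x_m=3.20e-9, coupling=0.0027, beta_m=50.0,
                  energy_gev=4.0, length_m=0.89, angle_mrad=28.0),
    "S-HER": dict(eps_x_m=4.60e-9, coupling=0.0024, beta_m=11.5,
                  energy_gev=7.0, length_m=5.9, angle_mrad=55.7),
}


def derived(p):
    """Return (eps_y [m], sigma_y [um], rho [m], E_crit [keV]) for one ring."""
    eps_y = p["coupling"] * p["eps_x_m"]              # assumed: eps_y = kappa * eps_x
    sigma_um = math.sqrt(eps_y * p["beta_m"]) * 1e6   # beam size at the source point
    rho_m = p["length_m"] / (p["angle_mrad"] * 1e-3)  # bend radius = length / angle
    e_crit_kev = 2.218 * p["energy_gev"] ** 3 / rho_m # standard approximation
    return eps_y, sigma_um, rho_m, e_crit_kev


for name, params in RINGS.items():
    eps_y, sigma_um, rho_m, e_crit = derived(params)
    print(f"{name}: eps_y={eps_y:.3g} m, sigma={sigma_um:.1f} um, "
          f"rho={rho_m:.1f} m, E_c={e_crit:.2f} keV")
```

Under these assumptions the derived values agree with the tabulated ones to within a few percent; the critical energy comes out slightly above the table entries, which appear to be truncated.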

#### 59-element Uniformly Redundant Array mask pattern



Simulated detector response for various beam sizes at SuperKEKB LER

#### Coded Aperture Mask:

- In-hand :
  - High-power, 59-element, 10 μm/element URA
  - 10 μm Au mask on 625 μm Si substrate
- Under development:
  - 20 μm Au mask on 500 μm CVD diamond (monocrystalline) substrate
    - Substrates manufactured.
    - New pattern being designed for improved resolution (E. Mulyani)

#### Detector:

- 64-pixel (Phase 1), later 128-pixel, 50 μm pitch linear array
- InGaAs detectors in hand (same type as used at CesrTA)
- Deep Si detectors in development for better detection efficiency at high energy (SLAC)

## First Light (accelerator started Feb.)

(Screenshot, timestamped 06/01/2016 19:31:13)



Bunches down to RF bucket spacing [508.9MHz]

### First generation readout





## **TOP Electronics - HW**

 Front-end modules consist of 5 PCBs, each with a Zynq (FPGA + Processor):



SCROD

## Firmware ("Production") – Data Path

#### **Carrier ASIC Control**

- Continuous sampling during digitization.
- Global synchronization scheme.
- 32-sample readout per trigger.
- Multi-hit capable.

#### **SCROD data collection**

- FPGA monitors ASICs, can mask if trouble.
- FPGA builds complete event packets.
- 1x DMA x-fer/event w/standard Xilinx blocks.
- Processor does pedestal correction, feature extraction.



### **Key Remaining Items**

- Complete thermo-mechanics
- RF signal chain
  - Amplifier gain, bandwidth, noise
  - Stability and dynamic range
  - EMI Immunity
- Carrier modifications
  - Wiring modifications
  - Simplified sampling FW, mini-packets
- SCROD FW modifications
  - Streamlined mini-packets ped subtraction
  - Feature extraction
  - 100kHz \* 128 channels \* 2 Bytes (~30MBytes/s)
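
The data-rate figure in the last bullet is a straight product, sketched here as a sanity check. The function name is hypothetical; the inputs are the numbers quoted in the bullet.

```python
def readout_rate_bytes_per_s(trigger_hz, channels, bytes_per_channel):
    """Raw payload rate, ignoring packet headers and protocol overhead."""
    return trigger_hz * channels * bytes_per_channel


rate = readout_rate_bytes_per_s(100e3, 128, 2)
print(f"{rate / 1e6:.1f} MB/s")
```

The raw product is 25.6 MB/s, so the quoted ~30 MBytes/s leaves some room for overhead.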

## Back-up slides



### **Constraint 1: Analog Bandwidth**

Difficult to couple in Large BW (C is deadly)




## Constraint 2: kTC Noise

### Want small storage C, but...
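
As a rough illustration of the trade-off, the thermal (kTC) noise sampled onto a capacitor is sqrt(kT/C). The sketch below assumes room temperature (300 K) and a handful of example capacitances that are not taken from the slides.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K (exact SI value)


def ktc_noise_vrms(c_farads, temp_k=300.0):
    """RMS thermal noise voltage sampled onto a capacitor of value c_farads."""
    return math.sqrt(K_BOLTZMANN * temp_k / c_farads)


# Example capacitances in fF (illustrative, not from the slides).
for c_ff in (10, 20, 50, 100):
    v = ktc_noise_vrms(c_ff * 1e-15)
    print(f"C = {c_ff:3d} fF -> kTC noise = {v * 1e3:.2f} mV rms")
```

Halving the capacitance raises the noise by a factor of sqrt(2), which is why a small storage capacitor helps bandwidth but costs dynamic range.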



### Constraint 3: Leakage Current

### Increase C or reduce conversion time << 1mV



Sample channel-channel variation ~  $fA \rightarrow nA$  leakage (250nm  $\rightarrow$  130nm)
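
A simple way to see why leakage matters is the droop on a held sample, ΔV = I·t/C. The sketch below uses a 20 fF storage capacitor and a 1 µs hold time as assumed example values; only the fA and nA leakage scales come from the text above.

```python
def droop_volts(i_leak_a, hold_time_s, c_farads):
    """Voltage lost from a held sample due to a constant leakage current."""
    return i_leak_a * hold_time_s / c_farads


def max_hold_time_s(i_leak_a, c_farads, max_droop_v=1e-3):
    """Longest hold time that keeps the droop below max_droop_v."""
    return max_droop_v * c_farads / i_leak_a


C_STORE = 20e-15   # assumed example value
T_HOLD = 1e-6      # assumed example value

for label, i_leak in (("1 fA", 1e-15), ("1 nA", 1e-9)):
    dv = droop_volts(i_leak, T_HOLD, C_STORE)
    t_max = max_hold_time_s(i_leak, C_STORE)
    print(f"{label}: droop = {dv:.3g} V in 1 us, "
          f"hold limit for 1 mV = {t_max:.3g} s")
```

With these assumed values, nA-scale leakage allows only tens of nanoseconds before the droop exceeds 1 mV, which is the motivation for a larger C or a faster conversion.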

### Timing optimization: ABW, SNR, sampling rate



## Outcome: Target Specifications (separate design study)

| Parameter                                                  | Minimum desired value           |
|------------------------------------------------------------|---------------------------------|
| Sampling frequency (ASIC)                                  | 20 GHz                          |
| Bandwidth (Detector and ASIC)                              | 3 GHz                           |
| Signal to Noise Ratio (Detector and ASIC)                  | 58dB (V <sub>pp</sub> =1 volts) |
| Velocity of Propagation (Transmission Line/<br>strip line) | 0.35c                           |
| Number of Bits of Resolution                               | 9.4 bit                         |
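
The SNR and resolution rows are related by the usual ideal-quantizer rule of thumb, ENOB ≈ (SNR − 1.76 dB) / 6.02 dB. The sketch below assumes that relation is what links the two rows; the slides do not state it explicitly.

```python
def enob_from_snr_db(snr_db):
    """Effective number of bits for a full-scale sine, ideal-quantizer formula."""
    return (snr_db - 1.76) / 6.02


print(f"58 dB -> {enob_from_snr_db(58.0):.2f} bits")
```

This gives about 9.3 bits for 58 dB, close to the 9.4 bit entry in the table.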

#### This is an ongoing study – evolving quickly

Take the PSEC4 design as a reference



## **Single Sampling Cell Coupling**



- Driver circuit
- Switch with n-p FET pair
- Sampling capacitor
- Comparator as load





- Check Csampling capacitance
- Identify Ron and Roff

### Pass Transistor (Switch) Resistance



• Ron = 2.4 kΩ @ 665 mVdc

• Roff is in the GΩ range

• The PFET and NFET are not matched and Ron varies considerably

## Small signal frequency response

(Figure: bandwidth [GHz] vs. Vdc [V] for LowZ and 50Z drive, each as ideal, par, and load&par; marked point X: 0.65, Y: 1.688)

- BWworst ≈ 2.3 GHz @ 665 mVdc, LowZ drive
- BWworst ≈ 1.7 GHz @ 665 mVdc, 50 Ω drive



• Isolation is over 60dB over all parameter space

## Snapshot

| Parameter               | Measured (worst cases)  | Requirement |
|-------------------------|-------------------------|-------------|
| Bandwidth (Single cell) | 1.7GHz @665mVdc @50Ω    | 3GHz        |
| Bandwidth (Multi cell)  | 1.0GHz @665mVdc @50Ω    | 3GHz        |
| SNR                     | 61.7 dB                 | 58dB        |
| ENOB                    | 9.8 bits (small region) | 9.4 bits    |

Things to improve:

- Reduce Ron variance over the dynamic range to reduce distortion and increase the ENOB
- Bandwidth dominated by Cin:
  - Reduce Cin or reshape the channel to increase the bandwidth (first pole)
  - Reduce Ron overall value to increase the bandwidth (second pole)
- Use differential configuration to reduce pedestal error and increase immunity to noise coupling and crosstalk

### IRS/TARGET family Single Channel

• Sampling: 128 (2x 64) separate transfer lanes

Recording in one set 64, transferring other ("ping-pong")

• Storage: 64 x 512 (512 = 8 \* 64)

• Wilkinson (32x2): 64 conv/channel



### First order packing density

Compact storage/comparator (Wilkinson ADC)


3 μm x 24 μm cell (13.9k cells/mm²)

- 10 samples: 30 μm x 24 μm
- 20 samples: 60 μm x 24 μm
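
The packing figures follow directly from the cell footprint. A minimal sketch, assuming cells tile the area with no routing overhead and that samples are laid out side by side along the 3 μm dimension (function names are illustrative):

```python
def cells_per_mm2(cell_w_um, cell_h_um):
    """Areal density of storage/comparator cells, ignoring routing overhead."""
    return 1e6 / (cell_w_um * cell_h_um)  # 1 mm^2 = 1e6 um^2


def pixel_footprint_um(n_samples, cell_w_um=3.0, cell_h_um=24.0):
    """Footprint of n_samples cells placed side by side along the width."""
    return (n_samples * cell_w_um, cell_h_um)


print(f"{cells_per_mm2(3.0, 24.0):.0f} cells/mm^2")
print(pixel_footprint_um(10), pixel_footprint_um(20))
```

A 3 μm x 24 μm cell gives roughly 13.9k cells per square millimetre, and 10 or 20 samples per pixel occupy 30 μm or 60 μm by 24 μm.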

Commensurate with TSV planar packing, OR

Knife-edge, thinned die [reticle limited width] stacking/bundling of readout (orthogonal to Detector array)



### **Future Plans**

- R&D Program toward needed readout
- PSEC5 ASIC
  - 256 $\rightarrow$ 32k sample storage
  - Work to optimize bandwidth, ENOB
  - Persistence effects
- RFpix ASIC
  - Push limits of ABW, timing
  - Below 100-200fs, direct spatial measurement becomes interesting
  - Many practical issues, but none fundamental (cf. 1 ps)
- Dedicated pixellated sampler
  - Prototype design rather straightforward; open question is how to connect to the detector (and which detector); funding limited

### Founding WFS ASIC References

- PSI activities (DRS)
  - IEEE/NSS 2008, TIPP09
  - http://midas.psi.ch/drs
- DAPNIA activities
  - MATDAQ: IEEE TNS 52-6:2853-2860,2005 / Patent WO022315
  - SAM: NIM A567 (2006) 21-26.
- Hawaii activities
  - STRAW: Proc. SPIE 4858-31, 2003.
  - PRO: JINST, Vol. 3, P12003 (2008).
  - LABRADOR: NIM A583 (2007) 447-460.
  - BLAB: NIM A591 (2008) 534-545; NIM A602 (2009) 438-445.
  - STURM: EPAC08-TUOCM02, June, 2008.

### SuperKEKB Estimated single-shot resolutions (SuperKEKB full current)

Beam to mask: 12 m



Energy (keV)

## Exploration of the space-time limit

- Sampling at high sampling rate and high bandwidth
- Resolve small distances

Current Goals: Spatial resolution of 10 µm in z and 20 µm in rφ

- In silicon, 10 µm in z corresponds to a timing resolution of about 100 fs
- 20 µm in rφ will depend on the SNR
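
The distance-to-time conversion behind the 100 fs figure can be sketched as follows. The propagation velocity of 0.35c is an assumption borrowed from the target-specification table elsewhere in these slides; the slides do not state which velocity was used for this particular estimate.

```python
C_LIGHT_M_PER_S = 299_792_458.0


def time_for_distance_fs(distance_um, velocity_fraction):
    """Propagation time in fs over distance_um at velocity_fraction * c."""
    return distance_um * 1e-6 / (velocity_fraction * C_LIGHT_M_PER_S) * 1e15


def distance_for_time_um(time_ps, velocity_fraction):
    """Distance in um covered in time_ps at velocity_fraction * c."""
    return time_ps * 1e-12 * velocity_fraction * C_LIGHT_M_PER_S * 1e6


print(f"10 um at 0.35c -> {time_for_distance_fs(10.0, 0.35):.1f} fs")
print(f"1 ps at c      -> {distance_for_time_um(1.0, 1.0):.1f} um")
```

At 0.35c, 10 µm corresponds to roughly 95 fs, consistent with the "about 100 fs" goal; at c, 1 ps corresponds to about 300 µm.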





Pixel detector (PXD) at SuperKEKB

## Simulated Performance vs. SNR

300MHz ABW, 5.9GSa/s



### **IRS Input Coupling**



- Input bandwidth depends on 2x terms
  - $f_{3dB}[\mathrm{input}] = [2\pi Z C_{tot}]^{-1}$
  - $f_{3dB}[\mathrm{storage}] = [2\pi R_{on} C_{store}]^{-1}$
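
Both terms are single-pole RC corner frequencies, so they can be evaluated with one helper. The component values below are borrowed from the PSEC4 analysis elsewhere in these slides (50 Ω drive, 2.2 pF total input capacitance, Ron = 2.4 kΩ, 20.3 fF sampling capacitor) purely as an illustration; they are not IRS values.

```python
import math


def f3db_hz(r_ohms, c_farads):
    """-3 dB corner frequency of a single-pole RC low-pass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


# Illustrative values taken from the PSEC4 slides, not from the IRS design.
f_input = f3db_hz(50.0, 2.2e-12)        # Z and C_tot
f_storage = f3db_hz(2.4e3, 20.3e-15)    # R_on and C_store

print(f"f3dB[input]   = {f_input / 1e9:.2f} GHz")
print(f"f3dB[storage] = {f_storage / 1e9:.2f} GHz")
```

With these numbers the input pole (about 1.4 GHz) sits below the storage pole (about 3.3 GHz), which illustrates why total input capacitance tends to dominate the achievable bandwidth.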

### Calibration and Sources of Timing Error

Contributions to timing resolution:

Voltage uncertainties



\*Diagram, formulas from Stefan Ritt

### Calibration and Sources of Timing Error



Time Difference Dependence on Signal-Noise Ratio (SNR)



$$\Delta u = 2mV$$
$$U = 1V$$
$$f_s = 26 \text{ GSPS}$$
$$f_{3dB} = 1.2\text{ GHz}$$

$$\frac{\Delta u}{U} \cdot \frac{1}{\sqrt{3f_s \cdot f_{3dB}}} \sim 200 \text{ fs}$$
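
Plugging the quoted values into the expression above reproduces the ~200 fs figure. The function and argument names are illustrative.

```python
import math


def timing_resolution_s(delta_u_v, u_v, f_sample_hz, f_3db_hz):
    """Timing estimate: (delta_u / U) / sqrt(3 * f_s * f_3dB)."""
    return (delta_u_v / u_v) / math.sqrt(3.0 * f_sample_hz * f_3db_hz)


dt = timing_resolution_s(delta_u_v=2e-3, u_v=1.0,
                         f_sample_hz=26e9, f_3db_hz=1.2e9)
print(f"{dt * 1e15:.0f} fs")
```

The result is about 207 fs. Because the estimate scales linearly with the noise-to-signal ratio and only as the inverse square root of the sampling rate and bandwidth, halving the noise gains more than doubling either frequency.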

#### Aperture stability is key

### **Space-Time relations**

#### **1ps = 300um (200um in stripline)**

#### **Space-Time correlation**



#### **Below 10um resolution, competitive & Prompt!**

## **PSEC4:** Sampling Analysis

#### Utilizing PSEC4's SCA as starting place

- Adjustable sampling rate between 4-15 GSPS
- 1.6 GHz bandwidth



## Equivalent Circuit



## Simulation Results: Bandwidth for worst case operating bias point

#### Whether the 1<sup>st</sup> switch is on or the last, Gain is the same



## Simulation Results: Group Delay

Group delay varies by ~25 ps depending on which switch is on, which puts a constraint on the sampling time window



## Simulation Results: Phase

• At higher frequencies, the phase vs. frequency behavior also depends on which switch is on



## Simulation Results: Capacitance

Capacitance is 2.2 pF and does not depend on which switch is on



### **PSEC4 Analysis: Single Sampling Cell**




#### **Structure & Layout**



#### Top view

#### Side view



## **Sampling Capacitor Spread**



 
| Num. of Samp. | MEAN     | STD     | MIN      | MAX      |
|---------------|----------|---------|----------|----------|
| 1000          | 20.27 fF | 1.89 fF | 14.86 fF | 26.24 fF |

Monte Carlo with process variation and mismatches shows a discrepancy between Csampling Schematic (13.5 fF) and Measured mean (20.27 fF).

The spread is about 1.9 fF, which puts the capacitor tolerance at about 9.3%.

### **Frequency Analysis**

#### **Performance: S(Z)-parameter**



The input impedance is high and it is capacitive.

## Input coupling analysis



The transfer function parts:

- Input parasitic capacitance of the transistor plus the capacitance of the transmission line section
- Series resistance of the transistor channel (Rds)
- Output capacitance, formed by the parasitic capacitance of the transistor, the sampling capacitor and the load capacitance



| Capacitance | Value [fF] |
|-------------|------------|
| Cin_open    | 8          |
| Csw_out     | 10         |
| Csamp       | 20.3       |
| Cload       | 13         |

## Small signal phase analysis



Group Delay with the load



#### Large group delay variation points to large distortion

## Large signal response (I)



 Full dynamic range at low frequency, compression appears when reaching the voltage threshold of the PN junctions at the drain/substrate barrier.



• Gain compression at lower and higher amplitudes

## Large signal analysis (II)

#### High frequency gain compression & distortion



Three regions of operation:

- Low distortion & High compression
- Moderate distortion & Moderate compression
- **High distortion & High**



### **Understanding signal response**




### Moderate distortion & Moderate compression



Resistance of the channel is varying

The bandwidth differs at each instantaneous value of the incident voltage waveform

-> In the frequency domain this gives rise to higher harmonics, which interfere constructively, increasing the overall signal amplitude but also the distortion



## Harmonic decomposition



- Constructive interference of odd harmonics and destructive interference of even harmonics at the peaks
- Constructive interference of second and third harmonics at zero crossing

#### **Frequency domain decomposition**



### **Noise and Distortion**



• Noise dominated by the ON resistance of the channel

 Total noise is around 0.29mV ± 0.01 mV

### Noise, distortion and dynamic range

Signal to Noise Ratio at full scale input (1Vpp)



• SNR is around 61.7dB ± 0.3 dB
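
The quoted SNR can be cross-checked against the total noise figure given earlier in these slides (0.29 mV ± 0.01 mV), assuming the full-scale input is a 1 Vpp sine wave so that its RMS value is Vpp / (2√2). The helper name is illustrative.

```python
import math


def snr_db(v_pp, noise_vrms):
    """SNR in dB of a full-scale sine (peak-to-peak v_pp) over RMS noise."""
    signal_vrms = v_pp / (2.0 * math.sqrt(2.0))
    return 20.0 * math.log10(signal_vrms / noise_vrms)


for noise_mv in (0.28, 0.29, 0.30):
    print(f"noise = {noise_mv:.2f} mV -> SNR = {snr_db(1.0, noise_mv * 1e-3):.2f} dB")
```

This gives about 61.7 dB at 0.29 mV, and the ±0.01 mV noise uncertainty maps onto roughly ±0.3 dB, matching the quoted value.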

### **Distortion analysis**



• Most of the distortion comes from the Ron variation over the input voltage range



### **Transient Response**



| it vac voltage | Acquisition time | Settling time |
|----------------|------------------|--------------|
| )mV            | 0.14ns           | 0.11ns       |
| )mV            | 0.68ns           | 0.11ns       |
| )mV            | 0.52ns           | 0.11ns       |

- Worst case window time is 0.8ns or 1.25GHz -> due to low bandwidth
- Best case is 0.25ns or 4GHz

15% backlash at 30mV forward transient

Pedestal error due to charge injection and transistor mismatch dominates