# mTC recommendation



## G. Varner June 22, 2014 (nihon)

# **Executive Summary**

- Each new generation of complex ASIC takes time to work out the wrinkles
- And much of the problem isn't the ASIC per se, but the accompanying circuitry, firmware and user knowledge
- The IRS family is unique:
  - Much deeper storage depth (worldwide unique)
  - Hardware timebase correction
  - On-chip, massively parallel ADC
  - Built-in, low-threshold triggering
- It has taken time, but we are finally there. Let's put this in perspective

# **Detector Instrumentation Evolution**



## WFS Calibration – a history

- Switched Capacitor Array waveform sampling has tremendous advantages in compactness, cost, power, cabling, etc.
- The "no free lunch" theorem applies: we have to learn how to operate/calibrate them, because of timing/voltage non-linearities





# ANITA: Engineering flight

- Sampling unstable
- Thermal, control loop problems (x2, /2 sampling on some ASICs)
- 200-250ps "asymptote"
- "Had" to make it work: alternatives weren't viable



Circa 2005

## ANITA: second pass – 80 – 150 ps



## ANITA: third pass – 30 (16) ps

#### "Phase 3"

#### **Ultrawide-band Interferometry**

- Interferometric technique applied by radio astronomers.
- They use a single narrow-band frequency.
- They are more interested in source imaging than in point-source direction reconstruction.



Produce Ultrawide-band Interferometric Images with ANITA





#### Laboratory Environment: real MCP-PMT Signals



#### Learning how to deal with









#### A comment on 3 phase

*Figure residue: overlays of earlier slides (ANITA engineering flight; WFS calibration history) and antenna-dependency plots (vertical angle dependency, TX down, off by 1 antenna), credited to Jiwoo Nam, UC Irvine.*

# Development Timeline (ASIC)

| ASIC        | Dates                   | Milestone(s)          | Hardware |
|-------------|-------------------------|-----------------------|----------|
| BLABx       | < Spring '11            | Prototyping           |          |
| IRS2        | Summer '11 – Winter '12 | FNAL beamtest         |          |
| IRS3/others | Spring '12 – Winter '12 | Semi-infinite reviews |          |
| IRS3B       | Spring '13 – Summer '13 | LEPS beamtest         |          |

# Development Timeline (critical)

| Firmware   | Dates                   | Milestone(s)          | Hardware |
|------------|-------------------------|-----------------------|----------|
|            | < Spring '11            | Prototyping           |          |
| v. 0       | Summer '11 – Winter '12 | FNAL beamtest         |          |
| v. 1       | Spring '12 – Winter '12 | Semi-infinite reviews |          |
| v. 2, v. 3 | Spring '13 – Summer '13 | LEPS beamtest         |          |

# Development Timeline (software tools)

| S/W    | Dates                   | Milestone(s)          | Hardware |
|--------|-------------------------|-----------------------|----------|
|        | < Spring '11            | Prototyping           |          |
| Ad hoc | Summer '11 – Winter '12 | FNAL beamtest         |          |
|        | Spring '12 – Winter '12 | Semi-infinite reviews |          |
| v. 0   | Spring '13 – Summer '13 | LEPS beamtest         |          |

-----

# Completion Timeline

| Dates                   | Milestone(s)          | Hardware |
|-------------------------|-----------------------|----------|
| < Spring '11            | Prototyping           |          |
| Summer '11 – Winter '12 | FNAL beamtest         |          |
| Spring '12 – Winter '12 | Semi-infinite reviews |          |
| Spring '13 – Summer '13 | LEPS beamtest         |          |
| Summer '14              | Final boardstack      |          |

Summer '14 annotations: v. 4, IRS3D/IRSX, v.1 RT recon. Phase labels on the slide: Phase 1, Phase 2.

# IRS3D/IRSX

• Baseline ASIC for production (~670 were fabricated in pre-production run [March 2014])



2.6M transistors, 7.7k resistors (DACs)

- High-speed, lower power/EMI LVDS outputs for fast, asynchronous signals
- Extended dynamic range comparator
- Lower-power Gray Code Counter and internal DLL demonstrated (TARGET7)
- IRS3D takes the internal improvements, but keeps simplified IRS3C user I/O

# **TSMC** Production Run

• Files at MOSIS, start once PO received (end June, 2014)



Reticle Layout

## **IRSX/3D Improvements over IRS3C/3B**

1. Improved Trigger Sensitivity
2. Timebase Servo-locking
3. dT hardware adjust
4. Improved linearity/dynamic range
5. Improved Wilkinson ADC

- = originally reported for TARGET7/X
- = demonstrated initially (TARGET7/X), detailed timing confirmed
- = LABRADOR4 also

## **Trigger Improvement (IRS3D)**

• A significant improvement for smaller pulses, where "first strike" initiation of the MCP charge development is delayed



16x gain doesn't improve further, as it is already at the signal-to-noise limit

### Improved Wilkinson "cross-feed"



## Result: visually nicer waveforms



## Of course, what we really care about ...

IRS3D Sine Measurement, random phase




# CAJIPCI upgrade

- If we start with a clean clock, we don't need/want the complexity of a clock jitter cleaner
- Use something more like the successful FTSW
- Compare/modify programming methodology



# Micro-TCA

- Based upon advanced TeleCom standard, but a light version, preferred by particle physics community
- Designed for intensive signal processing/handling
- Engineered from the start for extremely high reliability and performance

2U height, 19" rack-mount (\$3,750)



mTC Hub Controller (\$5,341)

CPU (Intel Xeon E3) (\$3,360)
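A quick tally of the three priced Micro-TCA items, as a sanity check on the cost discussion. Only the prices come from the slide; the dictionary labels and function name are illustrative, and the RTM (not yet selected) is left out.

```python
# Prices as listed on the slide (USD); the labels are illustrative.
MTCA_PARTS_USD = {
    "2U 19-inch rack-mount unit": 3750,
    "mTC hub controller": 5341,
    "CPU (Xeon E3)": 3360,
}


def total_cost(parts):
    """Sum the listed part prices; the RTM is excluded (not yet chosen)."""
    return sum(parts.values())


if __name__ == "__main__":
    print(f"Listed Micro-TCA parts total: {total_cost(MTCA_PARTS_USD):,} USD")
```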



# mTC Upgrade Schedule/Cost

- Production lot of packaged IRS3D should be available by ~mid-Sept (wafers back end of August)
- Example schedule showing Rev E Carrier Dev time

| IRS-based iTOP Readout               |             |      |      |      |     |      |      |      |     |
|--------------------------------------|-------------|------|------|------|-----|------|------|------|-----|
| Schedule to Completion               |             | 6/16 | 6/23 | 6/30 | 7/7 | 7/14 | 7/21 | 7/28 | 8/4 |
| Pre-Production Prototype Board Stack |             |      |      |      |     |      |      |      |     |
| Integration / Test                   |             |      |      |      |     |      |      |      |     |
| IDEV                                 | Ready       |      |      |      |     |      |      |      |     |
|                                      | Evaluation  |      |      |      |     |      |      |      |     |
|                                      | Ready       |      |      |      |     |      |      |      |     |
| SCROD Rev B                          | Design      |      |      |      |     |      |      |      |     |
|                                      | Fab/Assy    |      |      |      |     |      |      |      |     |
|                                      | Ready       |      |      |      |     |      |      |      |     |
| Carrier Rev E                        | 2-stage amp |      |      |      |     |      |      |      |     |
|                                      | Design      |      |      |      |     |      |      |      |     |
|                                      | Fab/Assy    |      |      |      |     |      |      |      |     |

**Costs:** 

- Carrier Fab: can get hard numbers from previous fab/assy runs
- Micro-TCA parts listed (still need to pick a RTM to handle the required # of fibers, but that is passive/inexpensive)
- CAJIPCI replacement: a few k\$ at most. Probably take the existing design, throw out the clock jitter cleaners, and put down a good clock source and low-jitter fanouts



- mTC Development Status
  - Stuck somewhere between Phase 1 and Phase 2
  - Hardware upgrades will short-circuit some of the development/education time required
  - Getting to Phase 3 will expedite getting the physics
- Specific Recommendations:
  - Upgrade Carrier cards to IRS3D and improved amplifier
  - Replace CAJIPCI with simplified module
  - Migrate from cPCI home-brew to enterprise micro-TCA DAQ platform

• <u>Schedule and costs look reasonable</u>. Leverage the <u>knowledge and experience gained</u>.

# Backup



#### **Calibration and Sources of Timing Error**



\*Diagram, formulas from Stefan Ritt

Contributions to timing resolution:

- Voltage uncertainties
- Timing uncertainties

Diagram labels: voltage noise $\Delta u$, signal height $U$, timing uncertainty $\Delta t$, rise time $t_r$ (\*diagram from Stefan Ritt).
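As a rough guide to how these quantities combine, a commonly quoted rule of thumb for waveform samplers (widely attributed to Stefan Ritt, and assumed here to be the relevant relation) is $\Delta t \approx (\Delta u / U)\, t_r / \sqrt{n}$, with $n = t_r f_s$ samples on the leading edge. The sketch below evaluates it. The function name, the sampling-rate symbol `f_s`, and the example numbers are illustrative assumptions, not measured values.

```python
import math


def timing_resolution(delta_u, u, t_r, f_s):
    """Rule-of-thumb timing resolution from voltage noise.

    delta_u : voltage noise (same units as u)
    u       : signal height
    t_r     : rise time in seconds
    f_s     : sampling rate in samples/second (assumed symbol)
    """
    n_samples = t_r * f_s  # number of samples on the leading edge
    return (delta_u / u) * t_r / math.sqrt(n_samples)


if __name__ == "__main__":
    # Illustrative numbers only: 1 mV noise, 100 mV pulse, 1 ns rise, 4 GSa/s
    dt = timing_resolution(1e-3, 100e-3, 1e-9, 4e9)
    print(f"estimated resolution: {dt * 1e12:.1f} ps")
```

The estimate covers only the voltage-noise term. Timing uncertainties in the sampling itself add to it.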

Of these contributions:

- Random: irreducible (without hardware redesign)
- Deterministic: in principle, can be calibrated away.

Let's talk about where the deterministic pieces come from and what is or is not being done about them right now, and what might be desirable or necessary in the future.

#### **Timing Uncertainties and Timing Calibration**

- Time interval between delay line stages has intrinsic variation.
- Not accounting for this properly causes significant



SLAC

## **IRSX Eval board**

 FMC test card format for Xilinx Zynq-7 (Zynq-706) Evaluation board



*Board photo labels: push-button, SMA clock P/N, transceivers.*



## **Cross-checking IRSX Improvements:**

(features largely vetted on other ASICs, of similar/identical DNA)

- Triggering
  - IRS3C has no gain in trigger path
  - Insufficient overdrive for small/fast+narrow MCP signals
  - IRSX adds selectable trigger gain path
- Improved dynamic range/linearity
  - Added 2<sup>nd</sup> stage, with tuning, to Wilkinson comparators, to extend dynamic range and reduce non-linearity
  - Modified Wilkinson registers for much lower power and critically reduced cross-talk

TARGET7 results courtesy Hiro Tajima (Nagoya) on triggering and Justin Vandenbroucke (Wisconsin) on improved dynamic range/linearity. Comparison with LAB4C ASIC provided by Hawaii ANITA3 collaborators

## Trigger Gain (x1 [IRS3C], x4, x16)

• Transfer slopes match simulated (designed) values well



• Comparator threshold "overdrive" improved by factors of 3.2 and 11.1. Full efficiency S-curves are reported in the next slide. These internal probes are important because the same measurement can't be done directly inside the IRSX, which doesn't have these test structures.

## Timebase servo-locking (DLL)



- Excellent stability visually on monTiming output
- VtrimT to fine-tune between coarse tap settings (scatter from linear is dT values)

(indirect "RCO" feedback mechanism injects asynchronous noise into timebase generator, degrading timing performance – so this is a <u>significant improvement</u>)

## Time base non-uniformity...







If this can be corrected in hardware, it reduces processing time dramatically, as this is the most computationally intensive aspect of "fast feature extraction"
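A minimal sketch of what the time-base correction involves when done in software, assuming per-sample widths (dT) have already been calibrated: build the cumulative time axis, then interpolate a threshold crossing on it. The names `sample_times` and `crossing_time` and the toy numbers are hypothetical. This is not the collaboration's feature-extraction code.

```python
from itertools import accumulate


def sample_times(dt_widths):
    """Time of each sample: cumulative sum of the preceding widths (first at 0)."""
    return [0.0] + list(accumulate(dt_widths[:-1]))


def crossing_time(times, volts, threshold):
    """Linearly interpolated time of the first rising threshold crossing, or None."""
    for i in range(1, len(volts)):
        v0, v1 = volts[i - 1], volts[i]
        if v0 < threshold <= v1:
            frac = (threshold - v0) / (v1 - v0)
            return times[i - 1] + frac * (times[i] - times[i - 1])
    return None


if __name__ == "__main__":
    # Toy example: nominal 250 ps steps with uneven widths (illustrative values)
    widths_ps = [230.0, 270.0, 250.0, 250.0]
    volts = [0.0, 0.0, 100.0, 100.0]
    t = sample_times(widths_ps)
    print(crossing_time(t, volts, 50.0))
```

With uniform 250 ps steps the same toy waveform would cross at 375 ps rather than 365 ps, which is the kind of shift the dT calibration removes.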

## dT Correction Demo



Samples follow each other across channels

Calibrate once; subsequent corrections are made in hardware

## Observed IRSX (IRS3D) noise



Non-gaussian distributions expected for small noise amplitude due to non-linearity in Gray-code least count

Take-away message: noise is comparable to, or better than, IRS3B/C, and was acquired while sampling continued to run

## Improved Linearity [TARGET7/X]





## Improved Residuals, repeatability

Note: IRS3D -- no comparator bias tuning yet done



~1% Integral deviation from 3<sup>rd</sup>-order over key sensitivity range

IRS3D Residual Difference, Sample 2 vs Sample 1



Shape is repeatable sample-to-sample (common lookup table, with only pedestal offset)

#### Improved dynamic range



 Could tune somewhat for desired range of operation in IRS3C, but still could not get much above about 2V (and large scatter in where comparators would stop working)

#### IRSX (IRS3D) 80 MHz sine response (2 adjacent samples)




## 2x Fast LVDS Serial: Write Address, Readout



## **IRS3D Eval board**

 Rather limited "universal eval" variant (Spartan-3 based);



## Readout ASIC status: Design completed/reviewed, in fabrication



- 8 channels per chip @ 2.7-4 GSa/s
- Samples stored, 12-bit digitized in groups of 64
- 32k samples per channel (8 us at 4 GSa/s)
- IRS3C\* (April 2013) usable for Belle II
- Increased performance margin ASICs in fab:
  - **IRSX** with high-speed serial interfaces
  - IRS3D with enhanced dynamic range, same I/O

\* **IRS3C** = **IRS3B** with low power-on current, ext. dynamic range







8mm

# Pre-production Board Stack

• Amplifier and calibration signal path

Typical raw single-pe PMT pulse: HVB @ -3200 V, 25 Ohm load, 20 GS/s (RTO1044), measured risetime 140 ps

*PMT gain* ~ $5 \times 10^5$

Typical amplified single-pe pulse: HVB @ -3200 V, voltage on 10 pF load (IRSX eq.), 20 GS/s (RTO1044), measured risetime 565 ps [NOTE: different event & channel]



## **Calibration requirements**

1. Subtract storage cell pedestal (avg. ~2000 ADC +/- 100's counts)
2. Linearity correction (optional)
3. Individual sample time offset correction

Three sets of calibration constants required:

- Sample pedestal values
  - (262144 samples/ASIC)
- Sample time widths
  - (128 values per ASIC)
- Timewalk correction
  - (~20 values per ASIC)
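The first calibration step can be sketched as follows, assuming pedestals are estimated as the per-cell mean over signal-free events. The function names, the `first_cell` argument and the toy ADC values are hypothetical. The linearity and timing corrections are not shown.

```python
def estimate_pedestals(pedestal_events):
    """Per-storage-cell mean over a list of signal-free waveforms (equal length)."""
    n_events = len(pedestal_events)
    n_cells = len(pedestal_events[0])
    return [sum(ev[c] for ev in pedestal_events) / n_events for c in range(n_cells)]


def subtract_pedestals(raw, pedestals, first_cell=0):
    """Subtract each cell's pedestal; first_cell maps sample 0 to its storage cell."""
    n_cells = len(pedestals)
    return [adc - pedestals[(first_cell + i) % n_cells] for i, adc in enumerate(raw)]


if __name__ == "__main__":
    # Toy 4-cell array with pedestals near 2000 ADC counts (illustrative values)
    ped_runs = [[2000, 2100, 1900, 2050], [2002, 2098, 1902, 2048]]
    peds = estimate_pedestals(ped_runs)
    print(subtract_pedestals([2101, 2199, 1901, 2049], peds))
```

Because the pedestal belongs to the storage cell rather than to the position in the readout window, the lookup has to be indexed by cell address. That is why the pedestal table is the largest of the three constant sets.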

#### Pulse Time Vs Sample Array Bin # (used to measure Sample-DTs)





#### Waveform Pedestal Correction


# Data Analysis in Hardware



with replayed data underway





- Sampling: 128 (2x 64) separate transfer lanes
  - Recording in one set of 64, transferring the other ("ping-pong")

- Storage: 64 x 512 (32k per ch.)
  - Wilkinson ADC (64 at once)
  - 64 conv/channel (512 in parallel)
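The buffer numbers quoted in this deck can be cross-checked with a few lines of arithmetic. The sketch assumes "32k" means 64 x 512 = 32768 cells per channel and uses the 8 channels per chip quoted for the readout ASIC. The constant and function names are illustrative.

```python
SAMPLES_PER_WINDOW = 64      # samples digitized together by the Wilkinson ADC
WINDOWS_PER_CHANNEL = 512    # 64-sample windows stored per channel
CHANNELS_PER_ASIC = 8        # channels per chip


def cells_per_channel():
    """Storage cells per channel (64 x 512)."""
    return SAMPLES_PER_WINDOW * WINDOWS_PER_CHANNEL


def buffer_depth_us(sampling_rate_gsa):
    """Time span held in one channel's storage array, in microseconds."""
    return cells_per_channel() / (sampling_rate_gsa * 1e3)


if __name__ == "__main__":
    print(cells_per_channel(), "cells per channel")
    print(cells_per_channel() * CHANNELS_PER_ASIC, "pedestal constants per ASIC")
    print(SAMPLES_PER_WINDOW * CHANNELS_PER_ASIC, "conversions in parallel")
    print(round(buffer_depth_us(4.0), 3), "us at 4 GSa/s")
```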



## **Readout Electronics -- requirements**

FPGA firmware consists of 3 parts:

1) ASIC/ADC driver (common)
2) Trigger/feature extract (subdet. specific)
3) Unified DAQ transport protocol

- Operate within Belle-II Trigger/DAQ environment
- >= 30 kHz L1 trig
- Gbps fiber Tx/Rx
- COPPER backend
- Timing trigger
- iTOP: 8k channels
- 16 iTOP modules
- 4x 128-channel SRM/iTOP module (64x total)

*Block diagram labels: Subdetector Readout Module (on or in detector); ASICs or ADC; FPGA; SuperKEKB RF clock; Giga-bit Fiber Transceiver Links; COPPER; FINESSE; Global Decision Logic; Clock/Event Timing Distribution.*

# Belle II back-end



- COPPER (COmmon Pipelined Platform for Electronics Readout)
- Used in Belle, J-PARC experiments
- FINESSE (Front-end Instrumentation Entity for Subdetector Specific Electronics)

# Belle II DAQ: Got fiber?





#### COPPER



#### Belle2link and "remote" FINESSE



- In the FPGA on the detector front-end card, a "virtual FINESSE" is implemented, and it talks with the "Belle2link transmitter core".
- In COPPER, a Belle2link receiver (HSLB) is implemented instead of a digitizer FINESSE, and is connected to the front-end card via optical fibers.
- The receiver "remote controls" the "virtual FINESSE" (slow control) and receives the data stream via optical fibers as if the remote FINESSE were implemented on the COPPER.

# Trigger/Timing Distribution (FTSW)

*Front-panel figure (20110805 version); port labels: JTAG, IN, O1, O3, O5, O7, O9, O11.*

#### Timing signals over CAT7 cables

7 ports, O1 to O7

- ACK: 254 Mbps serialized, unused
- TRG: 254 Mbps serialized
- RSV: pulled down to GND
- CLK: 127 MHz

#### JTAG signals over CAT7 cables

4 ports, O9 to O12



#### Monitoring signals over a CAT7 cable

AUX port






## High Level Trigger (HLT)

- Unit structure (O(10))
  - to reduce the number of output ports of the event builder
  - to keep up with the gradual luminosity increase
  - fault-tolerant: each unit is completely independent
- Based on the parallel processing technology developed for basf2



# Belle II Throughput

#### Estimated event size and bandwidth

Assumed L1 rate = 30kHz (maximum of average)

|      | #ch    | occ | #link | /link | FNS  | #CPR       | ch sz | ev sz | total | /CPR  |
|------|--------|-----|-------|-------|------|------------|-------|-------|-------|-------|
|      |        | [%] |       | [B/s] |      |            | [B]   | [B]   | [B/s] | [B/s] |
| PXD  | 8M     | 2   | 40    | 455M  | —    | —          | 4     | 800k  | 18.2G |       |
| SVD  | 243456 | 1.9 | 40    | 13.8M | HSLB | 40         | 4     | 18.5k | 555M  | 13.8M |
| CDC  | 14336  | 10  | 302   | 0.6M  | HSLB | 75         | 4     | 6k    | 175M  | 2.3M  |
| BPID | 8192   | 2.5 | 128   | 7.5M  | DSP  | 16         | 16    | 4k    | 120M  | 8M    |
| EPID | 65664  | 1.5 | 78    | 1.1M  | HSLB | 20         | 2.8   | 2.8k  | 84M   | 4.2M  |
| ECL  | 8736   | 33  | 52    | 7.7M  | HSLB | 26         | 4     | 12k   | 360M  | 15M   |
| BKLM | 19008  | 1   | 16    | 9.7M  | HSLB | 6          | 8     | 2k    | 60M   | 10M   |
| EKLM | 16800  | 2   | 66    | 19.5M | HSLB | to be fixed | 4    | 1.4k  | 42M   | 5.3M  |
| TRG  |        |     |       |       | HSLB | 10         |       |       |       |       |
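Most rows of the table are consistent with a simple model: event size = channels x occupancy x bytes per channel, and total bandwidth = event size x L1 rate. The sketch below checks the SVD row, taking the [%] column as occupancy. The function names are illustrative, and the model is an inference from the table's numbers rather than a stated definition (the PXD and CDC totals, for example, do not follow it exactly).

```python
L1_RATE_HZ = 30_000  # assumed L1 rate quoted on the slide


def event_size_bytes(n_channels, occupancy_percent, bytes_per_channel):
    """Average event size for one subdetector, in bytes."""
    return n_channels * occupancy_percent / 100.0 * bytes_per_channel


def bandwidth_bytes_per_s(event_size, l1_rate_hz=L1_RATE_HZ):
    """Total data rate for one subdetector at the given L1 rate."""
    return event_size * l1_rate_hz


if __name__ == "__main__":
    # SVD row: 243456 channels, 1.9 % occupancy, 4 B per channel
    svd_event = event_size_bytes(243456, 1.9, 4)
    svd_total = bandwidth_bytes_per_s(svd_event)
    print(f"SVD event size ~ {svd_event / 1e3:.1f} kB")
    print(f"SVD total      ~ {svd_total / 1e6:.0f} MB/s")
```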

