## AMchip architecture & design

Alberto Stabile - INFN Milano

## AMchip theoretical principle

### Associative Memory chip: AMchip

- Dedicated VLSI device with maximum parallelism
  - Each pattern has its own private comparator
- Track search during detector readout

## AMchip role inside FTK

- Main aim: pattern recognition of Super Strips!
- Count the number of matching layers (from 6 to 8 layers)
- Bit map of matching patterns
- Compatibility with the daisy chain of roads on the LAMB

## AMchip state-of-the-art

| Experiment | Version | Design approach | CMOS technology | Number of patterns | Number of layers | Working state |
|------------|----------|-----------------------|-----------------|--------------------|------------------|----------------------|
| CDF | 1 | Full custom | 700 nm | 0.128 kpat/chip | 6 | completed |
| CDF | 2 | FPGA | 350 nm | 0.128 kpat/chip | 6 | completed |
| CDF | 3 | STD cells | 180 nm | 5.0 kpat/chip | 6 | completed |
| ATLAS | 4 | Hybrid<sup>a</sup> | 65 nm | 8.0 kpat/chip | 8 | completed |
| ATLAS | mini@sic | Hybrid<sup>a</sup> | 65 nm | 0.128 kpat/chip | 4 | submitted yesterday! |
| ATLAS | 5 | Hybrid<sup>a</sup> | 65 nm | 64 kpat/chip | 8 | design |

<sup>a</sup> Both STD cell and full custom approaches have been used.

## AMchip04 results

### The chip is completely functional!
- Chip characterization has been performed using input stimuli generated by the C++ simulation code
  - We used the same stimuli as for the pre-fabrication digital simulation
- Expected output data (C++ simulations) have been compared with the measured data
  - NO ERRORS have been found
- Tests have been performed at different frequencies:
  - 12.5 MHz, 25 MHz, 50 MHz, and 100 MHz (FTK working frequency)
- In the future: possibility to select **more frequencies**, from 10 MHz to 120 MHz in 10 MHz steps (Alessandro Colombo's master thesis)

## AMchip04 power consumption

| Frequency | Supply current |
|-----------|----------------|
| No clock | 22 mA |
| 12.5 MHz | 28 mA |
| 25.0 MHz | 48 mA |
| 50.0 MHz | 101 mA |
| 100 MHz | 191 mA |

| Chip | Supply current |
|----------|----------------|
| AMchip03 | 1000 mA |
| AMchip04 | 191 mA |

## Characterisation set-up

## Mini@sic goals & description

- Test chip between the AMchip04 and the AMchip05
- New features:
  - Hits and roads are **serialized and de-serialized** inside the chip core
    - Silicon Creation® SerDes IP blocks have been used for this purpose
    - A Silicon Creation® LVDS pad has been used to bring an external LVDS clock into the chip
    - The LVDS pad needs a Band Gap Voltage Reference (also by Silicon Creation®)
  - Two different types of Associative Memory (AM) cells have been used
    - A new XOR+RAM cell has been used to reduce silicon area and power consumption
  - A more programmable Bit Line (BL) width mode
    - With the XOR+RAM cell it is possible to select the width of the hit buses: 30 bit or 15 bit
  - Built-In Self Test (BIST), used to stimulate the AM banks at the maximum working frequency through JTAG commands
    - PRBS generator to stimulate the banks
  - Majority circuit with the possibility to count one, or from 6 to 8, matching layers
- Inherited AMchip04 features:
  - Variable resolution "pattern-by-pattern" and "layer-by-layer"
  - Fischer tree for the readout bus
  - Road daisy chain compatibility
  - JTAG
- Dismissed features:
  - **Boundary chain** to test wire connections (this is only a test chip!)
  - Possibility to swap the chip bottom-up rev enable signal
- This chip is strongly pad-limited:
  - 2 mm × 2 mm silicon area
- 9 metallization layers
  - Reuse approach: only 6 layers have been used for the AM full custom block
- Cheap: 23 k€
  - IMEC offers mini@sic runs for prototyping small circuits and test chips!
- Inside the small core area we have placed:
  - JTAG, control logic, and AM banks
- The pad ring is split into analog and digital IO cells:
  - North side: digital and SERDES pads
  - West side: only digital pads
  - South side: digital and SERDES pads, analog pad for the Band Gap Voltage Reference
  - East side: clock analog LVDS pads and digital pads

## Mini@sic floorplan

- QFN64: 64 pins is the maximum number of pins we can use for mini@sic; IMEC sets this limit to avoid long bonding wires, which could be problematic for the serial links
- Triple bonding for VDDCORE, GND and VDDIO
- SerDes, LVDS pad and BGVR are biased with independent power supplies

## Mini@sic IPs

#### Full custom block list:

- No. 4 "TOP2" associative memory arrays of 32 patterns/block
- No. 4 "XORAM" associative memory arrays of 32 patterns/block
- No. 5 Silicon Creation® deserializers (4 for the hits and 1 for the roads)
- No. 1 Silicon Creation® serializer for the roads
- No. 1 Silicon Creation® LVDS pad, used for the clock
- No. 1 Silicon Creation® BGVR, used to bias the LVDS pad

## Mini@sic pinout configuration

#### Pin list:

- No. 5 digital pads for JTAG
  - TMS, TDI, TDO, TCK, TRST
- No. 2 digital pads for hold
  - pattin\_hold, pattout\_hold
- No. 1 digital pad for the xor PLL lock signal
- No. 1 digital pad for the INIT signal
- No. 12 pads for VSS (0 V core)
- No. 6 pads for VDD (1.2 V core)
- No. 6 pads for VDDIO (2.5 V IO)
- For each SerDes:
  - No. 1 VDDA analog pad
  - No. 3 VSSS analog pads
  - No. 1 VDDH digital pad
  - No. 2 LVDS pads (TX or RX)
- No. 5 LVDS pads for the clock:
  - No. 3 analog power supplies
  - No. 2 LVDS pads for the clk signal

## Mini@sic bonding diagram

## Mini@sic design methodology

- The entire chip has been designed with a hybrid approach:
  - The more repetitive regions have been designed with a full custom approach
  - The more complex logic has been designed with a standard cell approach
- To place and route the standard cells, we have used the Foundation Flow of Cadence Encounter

## Content Addressable Memory IPs

- The AM block used for the AMchip04: "TOP2"
  - Content Addressable Memory (CAM) cell, hybrid approach:
    - NAND cell: low power consumption BUT slow
    - NOR cell: fast BUT high power consumption

### NAND cell layout

### NOR cell layout

## "TOP2" layout of 32 patterns

## The new "XOR+RAM = XORAM"

### XORAM architecture

## XORAM block of 32 patterns

## XORAM simulations

## XORAM simulation results

| Corner | Delay time [ns] | Power cons. of 32 patterns [µA] | Power cons. of 8 kpat [mA] |
|--------------------|-----------------|---------------------------------|----------------------------|
| Typical | 1.33 | 111 | 28 |
| Slow | 3.55 | 92 | 23 |
| Fast | 0.66 | 156 | 39 |
| Worst '0' - Low T | 0.85 | 108 | 27 |
| Worst '0' - High T | 2.48 | 156 | 39 |
| Worst '1' - Low T | 0.79 | 84 | 21 |
| Worst '1' - High T | 2.15 | 132 | 33 |

| Parameters | Slow | Typical | Fast |
|------------------|--------|---------|--------|
| Temperature | 150 °C | 27 °C | -55 °C |
| Power supply | 0.8 V | 1.0 V | 1.2 V |
| Transistor model | ss | tt | ff |

## XORAM vs "TOP2" comparison

- The XORAM design halves the expected power consumption
  - "TOP2" expected power consumption: 80 mW
  - XORAM expected power consumption: 39 mW
- As the XORAM is merely combinational logic + flip-flops, its delay time is higher than that of "TOP2"
  - "TOP2" delay from CLK to OUT: 2.1 ns
  - XORAM delay from CLK to OUT: 3.6 ns

## Methodology to compare the different AM banks: TOP2 vs XORAM

- Turn off the "TOP2" blocks and measure the power consumption of the XORAM blocks, and vice versa
  - The AM bank blocks must be turned off completely!
- In the future: separate the power supplies to improve this comparison

## Encounter Foundation Flow

- VDD and GND connected in both M1 and M6
- VDD and GND connected only in M6
- Analog 2.5 V VDDH power connection for the LVDS pad

### Foundation flow results

- Routing DRC errors:
  - NO ERRORS!
  - The router is able to correctly interconnect all wires
- Liberty limitations:
  - Some Liberty files do not completely describe the digital logic functions, BUT only describe the timing delays
  - This is NOT problematic; HOWEVER, pay attention to the timing report results: some reported paths are not real and need to be set as false paths, while some real paths may exist but not be considered by Encounter

## The timing constraints

- CLOCK periods:
  - TCK: 25 ns
  - Master CLK (output of the LVDS pad): 8 ns
  - Recovered CLKs (output of the deserializers): 8 ns
- Encounter checks all setup and hold times in a Multi-Mode Multi-Corner (MMMC) analysis
- Parameters used:
  - Different block & transistor models (Liberty, CeltIC)
  - Different RC extractions of the parasitic nets at different values of resistance, capacitance, and temperature

## Encounter timing reports

timeDesign Summary

| Setup mode | all | reg2reg | in2reg | reg2out | in2out | clkgate |
|------------------|-------|---------|--------|---------|--------|---------|
| WNS (ns) | 0.128 | 0.128 | 2.193 | 5.270 | N/A | 3.299 |
| TNS (ns) | 0.000 | 0.000 | 0.000 | 0.000 | N/A | 0.000 |
| Violating paths | 0 | 0 | 0 | 0 | N/A | 0 |
| All paths | 12825 | 10927 | 3031 | 1 | N/A | 21 |

| DRVs | Real: Nr nets (terms) | Real: Worst Vio | Total: Nr nets (terms) |
|------------|-----------------------|-----------------|------------------------|
| max cap | 0 (0) | 0.000 | 0 (0) |
| max tran | 0 (0) | 0.000 | 1 (1) |
| max fanout | 2 (2) | -28 | 2 (2) |

Density: 100.046%

Total number of glitch violations: 0 ......
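
The timing summary above reports, for each path group, the worst slack (WNS), the total negative slack (TNS) and the number of violating paths. As a reading aid, here is a minimal Python sketch of how these three figures follow from per-path slack, assuming the usual setup-slack definition (required time minus arrival time). The `TimingPath` class, the helper functions and all path values are hypothetical illustrations; they are not Encounter commands and not data from the mini@sic reports.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class TimingPath:
    """One timing path (hypothetical example data, not from mini@sic)."""
    group: str          # path group, e.g. "reg2reg" or "in2reg"
    required_ns: float  # time by which the data must arrive at the endpoint
    arrival_ns: float   # time at which the data actually arrives

    @property
    def slack_ns(self) -> float:
        # Setup slack: required minus arrival; negative means a violation.
        return self.required_ns - self.arrival_ns


def clock_frequency_mhz(period_ns: float) -> float:
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


def summarize(paths: Iterable[TimingPath]) -> Tuple[float, float, int]:
    """Return (WNS, TNS, number of violating paths) for a non-empty set of paths."""
    slacks = [p.slack_ns for p in paths]
    if not slacks:
        raise ValueError("no timing paths given")
    wns = min(slacks)                            # worst (smallest) slack
    tns = sum(s for s in slacks if s < 0)        # only negative slacks contribute
    violating = sum(1 for s in slacks if s < 0)  # paths that miss timing
    return wns, tns, violating


def summarize_by_group(paths: Iterable[TimingPath]) -> Dict[str, Tuple[float, float, int]]:
    """Apply summarize() to each path group, mirroring the per-group report columns."""
    groups: Dict[str, List[TimingPath]] = {}
    for p in paths:
        groups.setdefault(p.group, []).append(p)
    return {name: summarize(members) for name, members in groups.items()}


if __name__ == "__main__":
    print(clock_frequency_mhz(25.0))  # TCK period from the constraints
    print(clock_frequency_mhz(8.0))   # master / recovered clock period
    # Hypothetical paths; the required time is taken as the full 8 ns period
    # for simplicity (ignoring setup time and clock skew).
    paths = [
        TimingPath("reg2reg", 8.0, 7.5),
        TimingPath("reg2reg", 8.0, 6.0),
        TimingPath("in2reg", 8.0, 8.25),
    ]
    print(summarize(paths))
    print(summarize_by_group(paths))
```

Run as a script, it prints 40.0 and 125.0 (the 25 ns TCK and the 8 ns master clock expressed in MHz), then (-0.25, -0.25, 1), because one hypothetical path misses timing by 0.25 ns, followed by the per-group breakdown. When no path violates, as in the summary above, TNS is 0 and WNS is simply the smallest positive slack. Hold slack uses the opposite sign convention (arrival minus required), which this sketch does not model.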
## Encounter timing reports

optDesign Final Summary

| Hold mode | all | reg2reg | in2reg | reg2out | in2out | clkgate |
|------------------|-------|---------|--------|---------|--------|---------|
| WNS (ns) | 0.000 | 0.000 | 0.000 | 13.926 | N/A | 0.052 |
| TNS (ns) | 0.000 | 0.000 | 0.000 | 0.000 | N/A | 0.000 |
| Violating paths | 0 | 0 | 0 | 0 | N/A | 0 |
| All paths | 12825 | 10927 | 3031 | 1 | N/A | 21 |

| DRVs | Real: Nr nets (terms) | Real: Worst Vio | Total: Nr nets (terms) |
|------------|-----------------------|-----------------|------------------------|
| max cap | 0 (0) | 0.000 | 0 (0) |
| max tran | 0 (0) | 0.000 | 1 (1) |
| max fanout | 2 (2) | -28 | 2 (2) |

Density: 100.046% ......

## DRC & LVS checks

- Design Rule Check (DRC) is clean for all checks
- Exceptions:
  - **PO.R.8**: this error is due to a long polysilicon wire not attached to a PN junction (in our case, because we do not have the GDS layout of the TSMC standard cells)
  - **M\*.DN.1**: this error is due to the metal density not meeting the rule
    - **IMEC** will fix this problem
  - **ESD.\***: this error is due to the fact that we do not have the GDS layout of the TSMC analog IO cells
- Layout Versus Schematic (LVS) is clean
  - We have used black boxes for the TSMC cells
  - Long and complex procedure!

## Submission: yesterday!

## AMchip05 goals & milestones

- In the next months: collect the test results of mini@sic
- Increase the number of patterns up to 64 kpatterns
- Continue studying techniques to decrease power consumption and silicon area, for both the full custom blocks and the standard cells
  - Possibly a multi-VDD core: 1.2 V for the SERDES and 1.0 V for the standard cells and AM blocks
- Consolidate the logic at the interface between the SERDES and the AM core
- Consolidate the communication protocol of the serial links

# Thank you for your attention