# A Low Power Capacitive Coupled Bus Interface Based on Pulsed Signaling

Jongsun Kim, Jung-Hwan Choi\*, Chang-Hyun Kim\*, M. Frank Chang, and Ingrid Verbauwhede

Electrical Engineering Department, University of California, Los Angeles; \*DRAM Design, Memory Division, Samsung Electronics, Hwasung, Republic of Korea

## Abstract

A synchronous pulsed signaling system using on-chip capacitive coupling for low-power high-speed buses has been implemented in a 0.10-µm CMOS DRAM process. We demonstrate 1.0-Gb/s/pair differential pulsed signaling over 10-cm PCB lines with an increased channel 3-dB bandwidth of 2.94GHz, obtained by blocking the driver/receiver capacitances on the bus and eliminating ESD structures. The system dissipates 1.92mW for both the driver and channel termination at 500MHz and 1.8-V supply, which is only 1/8 to 1/13 of the signaling power of conventional memory buses using SSTL-2 and RSL.

## Introduction

As the channel frequency and the number of wires increase in inter-chip communications, conventional voltage- or current-mode square wave signaling becomes more difficult and dissipates excessive interface power. The growth in signaling power, switching noise, and thermal design complexity is becoming an issue in battery-operated systems and even in power-rich multi-chip systems consisting of over 1000 signal lines. However, these power and signal integrity problems could be reduced dramatically by using the proposed capacitive coupled pulsed signaling.

Traditional capacitive coupling circuits for chip-to-chip communication have a relatively high *RC* time constant and often exhibit DC balancing problems (or zero wander effect). Thus these circuits require encoding of the input signal or the use of DC restoration with quantized feedback in the receiver. This limits the maximum data rate and increases the circuit overhead and power [1]. Capacitive coupling has also been applied in multi-chip modules (MCMs) on a common substrate DIP package [2] and in face-to-face chips, where the transmitter pad is coupled directly to the receiver pad to replace the conductive mechanical junction path and transfer signals [3-4]. However, these approaches either consume too much I/O power over limited bandwidth [1] or apply only to extremely short point-to-point interconnects [2-4], and are thus unsuitable for low-power parallel links longer than 1cm.

In this paper, we present a new high-speed, low-power, low-cost bus system based on synchronous differential pulsed signaling over conventional printed circuit board (PCB) lines with on-chip metal-insulator-metal (MIM) capacitors and standard wire-bond ball-grid-array (WBGA) packages. This capacitive coupled bus interface (CCBI) pulsed signaling circuit can eliminate the DC balancing problems without using data encoding or quantized feedback. Also, I/O interface power dissipation is drastically reduced as a result of pulsed signaling, which, unlike current/voltage mode square wave signaling, utilizes AC coupling with a low RC time constant and thus has no DC current component. This also minimizes the effect of intersymbol interference (ISI) and simultaneous switching Ldi/dt noise. Moreover, the input capacitance of the CCBI device is reduced to 0.6pF, which is only 30% of that of the typical current-mode signaling device (Cin=2pF) using the same WBGA package. This reduction is possible because the on-chip coupling capacitor decouples the driver/receiver circuit and the electrostatic discharge (ESD) protection structures can be eliminated. Therefore, the signal attenuation introduced by the device loading loss effect, which is dominated by the device input capacitance in a high-speed multi-drop bus [5], becomes much less severe than in conventional signaling schemes with very large driver/receiver and ESD capacitances. CCBI pulsed signaling is appropriate for multi-Gbps, low-power, synchronous parallel inter-chip interfaces (such as a memory bus) between periodically loaded chips on a PCB line with a length of up to a few tens of centimeters.





Fig. 1 Capacitive coupled bus interface (CCBI) using pulsed signaling

Fig. 2 (a) Modeling of capacitive coupled pulsed signaling on a PCB line (b) Equivalent differentiating circuit with a small RC (c) Signaling wave forms

## Low Power Pulsed Signaling for a Multi-drop Bus

Fig. 1 depicts the proposed architecture of CCBI (single-ended for simplicity) with four WBGA chips on a PCB line. The transmitter (Tx of chip1) and the receiver (Rx of chip4) are coupled to the PCB through an on-chip coupling capacitor $C_c$ and a pad at points C and D, respectively. The data channel is double parallel terminated by impedance matching resistors ($Z_0=50\Omega$) with Vterm=Vdd/2. Source synchronous clocking (with one round trip or two forwarded clocks) is used to remove the skew between the clock and the pulse. Fig. 2 (a) shows the equivalent model of this pulsed signaling on a 50-$\Omega$ FR4 PCB line. Since we use a terminated transmission line and a WBGA, a type of chip scale package (CSP), with low net parasitics of $R_{pk}=0.5\Omega$, $L_{pk}=1.6$nH, and $C_{pk}=0.2$pF, the equivalent circuit can serve as a differentiating circuit with a very small time constant RC, as shown in Fig. 2 (b). A step input voltage $V_A$ on node A from the full-swing driver results in a transient on node C, transforming a square wave into a short pulse with an approximate amplitude of $V_p = (Z_0/2 + R_{pk} + 2\pi f L_{pk})\,I_c \approx 0.6\,Z_0 C_c\,dV_A/dt$, where the induced small current, $I_c = C_c\,dV_A/dt$, is determined by the edge speed ($dt = T_d$) of the driver, and $f$ is the operating frequency. The transient decays very rapidly with a time constant $RC \approx 0.6\,Z_0(C_c+C_p+C_{pk})$, which is usually less than 100ps for $C_c=0.5$pF and $C_p=0.2$pF (bonding pad). The induced pulse with a short width ($T_w$, less than 10% of the cycle time) propagates over a PCB line with a flight time ($T_f$) as shown in Fig. 2 (c). To achieve better noise immunity, by rejecting common-mode disturbances as well as crosstalk from other noise sources, we use differential signaling in the test chip.
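As a rough numerical check of the two approximations above, the following Python sketch evaluates the time constant and the pulse amplitude from the component values quoted in the text. The driver edge time `T_EDGE_S` is not given in the paper; the 135ps used here is an assumed value, chosen only because it reproduces a pulse of about 200mV for a 1.8-V step.

```python
# Back-of-the-envelope evaluation of the differentiating-circuit estimates.
# Component values are taken from the text; T_EDGE_S is an ASSUMPTION.

Z0_OHM = 50.0        # characteristic impedance of the PCB line
CC_F = 0.5e-12       # on-chip coupling capacitor Cc
CP_F = 0.2e-12       # bonding pad capacitance Cp
CPK_F = 0.2e-12      # package capacitance Cpk
VDD_V = 1.8          # full-swing driver step
T_EDGE_S = 135e-12   # assumed driver edge time Td (not stated in the paper)


def time_constant(z0, cc, cp, cpk):
    """RC ~ 0.6 * Z0 * (Cc + Cp + Cpk)."""
    return 0.6 * z0 * (cc + cp + cpk)


def pulse_amplitude(z0, cc, dv, dt):
    """Vp ~ 0.6 * Z0 * Cc * dV_A/dt, with the edge approximated as a linear ramp."""
    return 0.6 * z0 * cc * dv / dt


tau_s = time_constant(Z0_OHM, CC_F, CP_F, CPK_F)
vp_v = pulse_amplitude(Z0_OHM, CC_F, VDD_V, T_EDGE_S)

print(f"RC = {tau_s * 1e12:.1f} ps")
print(f"Vp = {vp_v * 1e3:.0f} mV")
```

With these inputs the time constant comes out near 27ps, well under the 100ps bound stated above. The amplitude scales inversely with the edge time, so a slower driver edge yields a proportionally smaller pulse.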



## Transceiver Architecture and Synchronous Signaling

Fig. 3 shows the transmitter circuit, consisting of D flip-flops, a mux, and an output driver. The differential output driver consists of very small tri-state buffers with full-swing outputs (O/Ob) and a controlled slew rate. When it drives node A of *Cc1*, a small current signal is induced on the other side of *Cc1* and converted to a short voltage pulse, with a common mode of Vterm (=0.9V) and a peak amplitude of about +/-200mV. The pulses are synchronized and transferred in parallel with the external clock (ExCLK) without board-level skew by using the Tclk/Tclkb generated from the transmit DLL, as shown in the timing diagram of Fig. 4 (a). When the output driver is not active, it is turned off (en/enb=low/high) and the O/Ob nodes are precharged and equalized by signal Vinit to a voltage Vcon (=Vterm) to establish the common mode. Fig. 5 depicts the receiver circuit, which consists of a differential static pre-amplifier and two sense-amplifier-based registers (SAFF). Since the incoming signal is a short pulse with limited amplitude, a pre-amplifier, which is a static cross-coupled flip-flop, is required to sense and latch the pulse with a hysteresis effect for noise suppression. The synchronous timing diagram of the receiver is shown in Fig. 4 (b). The pulse signal arrives at the receiver in parallel with the ExCLK, and is recovered by the pre-amplifier with a data-to-q delay (td1). The center of this data (out/outb) window is phase locked with the Rclk, whose delay is compensated (td2=td1). The 90-degree shifted clock, Rclk', is generated from the receive DLL by synchronizing with the ExCLK. Finally, the differential output is amplified and latched by demultiplexing SAFFs. Although the pre-amplifier consumes a static current of about 1.5mA, this could be minimized by switching it off (en=low) when the receiver is not active.





Fig. 6 shows the simulation results of 1.0-Gbps/pair CCBI signaling using a 0.5pF *Cc* with two extra device loads on a shared 10-cm FR4 differential PCB line from C to D as depicted in Fig. 1. When the transmitter chip1, which is connected at point C/Cb, sends out 500MHz NRZ data Din, it is converted to short pulses with $V_p$ of +/-200mV and $T_w$ of 100ps at point C/Cb. The transmitted pulses then arrive at point D/Db with $V_p$ of +/-100mV after experiencing attenuation and dispersion (due to device losses, skin effect, and dielectric losses of the transmission line). These are then recovered as the differential small-swing out/outb and the full-swing Dout at the receiver chip4. When the differential pulses are in the null detecting range (between Vterm-60mV and Vterm+60mV, where Vterm=0.9V), the pre-amplifier maintains the previous value. To be sensed as a logical transition, a pulse must therefore rise above Vterm+60mV or fall below Vterm-60mV.
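The hold-unless-exceeded behavior of the pre-amplifier can be illustrated with a small behavioral model in Python. This is only a sketch under strong simplifications, not the circuit of Fig. 5: it treats one signal line, takes one sample per bit, models coupling as an ideal pulse on each data transition, and assumes the receiver starts in the same state as the transmitter. The function names are hypothetical.

```python
# Behavioral sketch of pulsed signaling with a hysteresis receiver.
# Single-ended, one sample per bit time; all names here are illustrative.

VTERM_V = 0.9        # common-mode (termination) voltage
NULL_BAND_V = 0.060  # null detecting range is Vterm +/- 60 mV
VP_RX_V = 0.100      # pulse amplitude at the receiver (point D in the simulation)


def nrz_to_pulses(bits, vp=VP_RX_V, initial=0):
    """Idealized capacitive coupling: a pulse appears only on a data transition."""
    prev = initial
    samples = []
    for bit in bits:
        if bit > prev:
            samples.append(VTERM_V + vp)   # rising edge gives a positive pulse
        elif bit < prev:
            samples.append(VTERM_V - vp)   # falling edge gives a negative pulse
        else:
            samples.append(VTERM_V)        # no transition, line rests at Vterm
        prev = bit
    return samples


def hysteresis_receiver(samples, initial=0):
    """Latch a new value only when the input leaves the null detecting range."""
    state = initial
    recovered = []
    for v in samples:
        if v > VTERM_V + NULL_BAND_V:
            state = 1
        elif v < VTERM_V - NULL_BAND_V:
            state = 0
        # otherwise the previous value is held
        recovered.append(state)
    return recovered


bits = [0, 1, 1, 0, 1, 0, 0, 1]
recovered = hysteresis_receiver(nrz_to_pulses(bits))
print(recovered == bits)
```

In this model, disturbances of up to 60mV around Vterm leave the latched value unchanged, which is the noise-suppression role the text attributes to the hysteresis.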



Fig. 6 Simulated 1.0-Gb/s/pair pulsed signaling on the 10-cm CCBI of Fig. 1



## Signal Integrity and Channel Power Dissipation

A conventional 2-PAM 4-drop bus operating over 800MHz experiences severe attenuation and becomes exceedingly difficult to use as frequency and device count are increased [5]. This is mainly because the device loading effect, which can be modeled with a series RLC network (e.g., Rin=7$\Omega$, Lin=2nH, Cin=2pF) for a typical current-mode signaling DRAM interface using a WBGA package, becomes a dominant loss factor in a multi-drop bus at high frequencies. Here the input capacitance Cin, which primarily dominates the input reactive impedance and attenuation up to a few GHz, is approximately composed of the output driver (0.6pF), receiver (0.2pF), ESD (0.6pF), bonding pad (0.2pF), package (0.2pF), and extra wire interconnect capacitances (0.2pF). In CCBI, however, the device input capacitance is dramatically reduced to 0.6pF, which is only 30% of that of the typical current-mode device (Cin=2pF) using the same WBGA: $C_{in}=C_p+C_{pk}+(C_cC_a)/(C_c+C_a)\approx 0.6$pF, where $C_c$=0.5pF~0.8pF, and $C_a$=0.25pF for the driver/receiver drain and gate capacitance. Cin of the CCBI circuit is primarily determined by the net package parasitics and the series combination of Cc and Ca. This is because Cc decouples the driver and receiver from the I/O pin, and thus the ESD protection circuits that usually add more than 0.5pF to the device I/O capacitance can be eliminated (since the MIM Cc blocks the DC current path). Therefore, multiple device loading losses are effectively decreased by moving the added poles to higher frequencies, and this results in lower signal attenuation and reflections. Fig. 7 shows the simulated transfer characteristics of 10-cm, 20-cm, and 30-cm CCBI channels with four WBGA device loads, indicating considerably improved 3-dB bandwidths of 2.94GHz, 1.67GHz, and 1.20GHz, respectively, for a data rate of up to 5Gb/s.
Although PCB skin effect and dielectric losses still exist, the signal integrity problems of the CCBI channel become much less severe than those of conventional current/voltage mode signaling schemes in high-speed buses. Moreover, CCBI carries no DC signal component and thus inherently minimizes the effect of ISI. Also, the CCBI channel is double parallel terminated and consumes less than 10% of I/O power with a much smaller and shorter signal swing. Together, these properties improve the signal integrity on a loaded bus and avoid system noise problems such as simultaneous switching of many pins (Ldi/dt noise), crosstalk, supply drop, and ground bounce. The reduced I/O power also greatly simplifies the thermal design and the requirements for packages with more than a few hundred pins.
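The input-capacitance budget above can be reproduced with a few lines of Python. This is a minimal sketch that uses only the component values quoted in the text; the helper names are illustrative.

```python
# Input capacitance budget: conventional current-mode device vs. CCBI.
# All capacitances in pF, taken from the text.

conventional_pf = {
    "output driver": 0.6,
    "receiver": 0.2,
    "ESD": 0.6,
    "bonding pad": 0.2,
    "package": 0.2,
    "wire interconnect": 0.2,
}


def series(c1, c2):
    """Equivalent capacitance of two capacitors in series."""
    return c1 * c2 / (c1 + c2)


def ccbi_cin(cc, ca=0.25, cp=0.2, cpk=0.2):
    """Cin = Cp + Cpk + (Cc * Ca) / (Cc + Ca)."""
    return cp + cpk + series(cc, ca)


cin_conventional = sum(conventional_pf.values())
cin_low = ccbi_cin(0.5)    # Cc at the low end of the quoted range
cin_high = ccbi_cin(0.8)   # Cc at the high end of the quoted range

print(f"conventional Cin = {cin_conventional:.2f} pF")
print(f"CCBI Cin = {cin_low:.2f} to {cin_high:.2f} pF")
print(f"ratio = {cin_high / cin_conventional:.0%}")
```

Because Ca is smaller than Cc, the series term stays below Ca itself, so increasing Cc from 0.5pF to 0.8pF changes Cin only slightly and both ends of the range round to 0.6pF.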



Fig. 8 Signaling power dissipation of RSL, SSTL-2, and pulsed signaling CCBI

Pulsed signaling using capacitive coupled interconnects can be a very effective approach to reducing the energy consumption of parallel bus systems. Fig. 8 shows the interface signaling power of conventional memory buses [6] (i.e., Rambus Signaling Levels (RSL) for 800-Mb/s/pin Direct RDRAM and Stub Series Terminated Logic (SSTL-2) for 266-Mb/s/pin DDR SDRAM) and of this 1.0-Gb/s/pair CCBI. For data patterns with a balanced stream of 1's and 0's, the RSL dissipates 28.6mW for the driver and 22.9mW for the termination with a 0.8-V swing by sinking 28.6mA (average power is 25.75mW). The SSTL-2 consumes 7.7mW for the driver, 4.9mW for the parallel termination, and 3.6mW for the series termination with a 0.7-V swing by sinking 14mA. In CCBI, the driver dissipates a maximum dynamic power of 0.8mW ($= C_c V_{dd}^2 f = 0.5\,\text{pF} \times (1.8\,\text{V})^2 / 2\,\text{ns}$) with an induced peak current of around 10mA. The termination dissipates only 0.16mW. Thus, the total channel power consumption is drastically reduced to a maximum of 1.92mW for a differential interface operating at 500MHz. Consequently, this pulsed signaling CCBI consumes only 1/8 to 1/13 of the signaling power of typical high-speed memory buses, which (dis)charge the whole line (SSTL-2) or drive current through the whole pulse width (RSL) [6].
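The arithmetic behind these figures can be laid out in a short Python sketch. It uses only the numbers quoted above. Summing the three SSTL-2 contributions into a single total for the comparison is an assumption made here, chosen because it reproduces the stated 1/8 ratio.

```python
# Signaling power comparison, all powers in mW unless noted.

CC_F = 0.5e-12     # coupling capacitor
VDD_V = 1.8        # supply voltage
CYCLE_S = 2e-9     # clock period at 500 MHz

# Maximum dynamic driver power, Cc * Vdd^2 * f, converted to mW.
driver_mw = CC_F * VDD_V**2 / CYCLE_S * 1e3

# Per-line CCBI figures as rounded in the text, doubled for a differential pair.
DRIVER_ROUNDED_MW = 0.8
TERMINATION_MW = 0.16
ccbi_total_mw = 2 * (DRIVER_ROUNDED_MW + TERMINATION_MW)

# RSL: average of driver and termination power for balanced data.
rsl_avg_mw = (28.6 + 22.9) / 2

# SSTL-2: driver + parallel termination + series termination (assumed total).
sstl2_total_mw = 7.7 + 4.9 + 3.6

print(f"CCBI driver (CV^2f) = {driver_mw:.2f} mW")
print(f"CCBI total per pair = {ccbi_total_mw:.2f} mW")
print(f"RSL / CCBI    = {rsl_avg_mw / ccbi_total_mw:.1f}")
print(f"SSTL-2 / CCBI = {sstl2_total_mw / ccbi_total_mw:.1f}")
```

The two ratios come out at roughly 13 and 8, matching the 1/8 to 1/13 range quoted for the comparison.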

## Measurement Results

The test transceiver chip in Fig. 9 was fabricated in a 0.10-µm CMOS DRAM technology and packaged in a WBGA. The transceiver active area is $330 \times 85\ \mu\text{m}^2$ and each 0.8-pF on-chip MIM capacitor occupies an area of $110 \times 110\ \mu\text{m}^2$. The coupling capacitors (Cc1, Cc2) were made slightly larger because we intentionally kept the ESD structures (=0.6pF), located between the pad and Cc, to create a more severe loading effect in this test chip. Fig. 10 shows the measurement setup for a 10-cm (Tx to Rx) FR4 PCB with double parallel terminated 50-$\Omega$ differential microstrip lines, with two WBGA chips mounted in a chip-on-board fashion. The measured 1.0-Gb/s single-ended pulse signal on the test board (near Rx) is shown in Fig. 11. The pulse has an amplitude (Vp) of 80mV and a width (Tw) of 200ps at 500MHz operating frequency, which shows the possibility of increasing the data rate up to 4Gb/s. Table I summarizes the performance of the CCBI pulsed signaling chipset.

## Conclusion

This paper demonstrates differential pulsed signaling over standard PCB lines as an effective solution for reducing the signaling power and improving the signal integrity in low-cost multi-drop buses. A 1.0-Gb/s/pair synchronous pulsed signaling system based on on-chip capacitive coupling for low-power high-speed parallel links (such as memory buses) has been implemented in a 0.10-µm CMOS DRAM process. The total signaling power has been reduced dramatically, to 1/8 to 1/13 of that of conventional memory buses using SSTL-2 and RSL. This results in lower ISI, simultaneous switching Ldi/dt noise, and crosstalk. The 3-dB bandwidth of a 4-drop 10-cm CCBI channel is increased up to 2.94GHz due to the reduced device input capacitance (70% reduction from 2pF to 0.6pF by decoupling the driver/receiver circuit and eliminating ESD structures), which significantly decreases the signal attenuation and reflections resulting from the multiple device loading losses at high frequencies.

## Acknowledgement

The authors would like to thank Prof. Behzad Razavi for valuable discussions, Daehe Jung, Youngsoo Son, Chankyoung Kim, Sungwoo Shin, and W. Hant for their support. This work was supported in part by NSF CCR-0098361.

## References

[1] Thaddeus J. Gabara, Wilhelm C. Fischer, "Capacitive Coupling and Quantized Feedback Applied to Conventional CMOS Technology," *IEEE J. Solid-State Circuits*, pp.419-427, March 1997.

[2] Stephen Mick, et al., "4 Gbps High-Density AC Coupled Interconnection," *IEEE Custom Integrated Circuits Conference*, pp.133-140, May 2002.

[3] Robert J. Drost, et al., "Proximity Communication," *IEEE Custom Integrated Circuits Conference*, pp.469-472, Sep. 2003.

[4] K. Kanda, D. Antono, K. Ishida, H. Kawaguchi, T. Kuroda, T. Sakurai, "1.27Gb/s/pin 3mW/pin Wireless Superconnect (WSC) Interface Scheme," *ISSCC Digest of Technical Papers*, pp.186-187, Feb. 2003.

[5] Jared L. Zerbe, et al., "1.6 Gb/s/pin 4-PAM Signaling and Circuits for a Multidrop Bus," *IEEE J. Solid-State Circuits*, pp.752-760, May 2001.

[6] Bruce M, Peter G, "Two High-Bandwidth Memory Bus Structures," *IEEE Design and Test of Computers*, Vol. 16, pp.42-52, Jan.-March 1999.



Fig. 9 Transceiver chip microphotograph



Fig. 10 Measurement test PC board



Fig. 11 Measured 1.0-Gb/s single-ended pulse signal on the test board

Table I. Performance Summary

| Parameter                       | Value                         |
|---------------------------------|-------------------------------|
| Supply Voltage                  | 1.8V                          |
| Technology                      | 0.10-µm CMOS DRAM process     |
| Data Rate                       | 1.0Gb/s/pair (at 500MHz)      |
| Power Dissipation (Termination) | 0.32 mW/pair                  |
| Power Dissipation (Transmitter) | 1.6 mW (differential driver)  |
| Power Dissipation (Receiver)    | 2.7 mW (static pre-amplifier) |
| Active Transceiver (Tx+Rx) Area | 330×85 µm<sup>2</sup>         |