# An 18-Gb/s Fully Integrated Optical Receiver With Adaptive Cascaded Equalizer

Quan Pan, Yipeng Wang, Student Member, IEEE, Yan Lu, Member, IEEE, and C. Patrick Yue, Fellow, IEEE

Abstract—An 18-Gb/s fully integrated optoelectronic integrated circuit for short-distance communications is realized in the TSMC 65-nm CMOS process. The system consists of a CMOS on-chip photodetector, an inverter-based cascode transimpedance amplifier, a DC offset cancellation buffer, a main amplifier, a three-stage tunable continuous-time linear equalizer, a two-stage modified limiting amplifier, a DC offset cancellation network, an adaptive equalization loop, a low dropout regulator, and a 50- $\Omega$  termination output buffer. The CMOS P-Well/Deep N-Well on-chip photodetector improves bandwidth and responsivity without technology modification. Moreover, the adaptive cascaded equalization further compensates for the limited bandwidth of the on-chip photodetector with a 5-10-dB/dec roll-up frequency response. The electrical measurement results show a transimpedance gain of 102 dB $\Omega$  and a bandwidth of 12.5 GHz. Furthermore, the optical measurement results demonstrate a fully integrated solution with (1) standard mode: data traffic of 9 Gb/s for  $2^{15}-1$ PRBS with 10<sup>-12</sup> BER, -4.2-dBm optical input sensitivity, and 5.33-pJ/b efficiency; (2) avalanche mode: data traffic of 18 Gb/s for  $2^{15}-1$  PRBS with  $10^{-12}$  BER, -4.9-dBm optical input sensitivity, and 2.7-pJ/b efficiency. The chip occupies a core area of 0.23 mm<sup>2</sup> and dissipates 48 mW from a 1/1.2-V voltage supply.

*Index Terms*—Optoelectronic integrated circuits, optical receivers, silicon photodetector, inverter-based cascode transimpedance amplifier, low dropout regulator, DC offset cancellation, continuous-time linear equalizer (CTLE), adaptive equalizer, limiting amplifier (LA).

## I. INTRODUCTION

FOR the past few decades, data traffic has boomed with the exponential growth of multimedia consumer applications. Optical interconnects have shown clear advantages over copper-based electrical interconnects in terms of cost, bandwidth, channel loss per kilometer, crosstalk, and electromagnetic interference (EMI). Krishnamoorthy *et al.* have shown that for a data rate over 10 Gb/s and length over 10 m (or

Manuscript received November 29, 2015; revised April 24, 2016; accepted May 24, 2016. Date of publication May 30, 2016; date of current version September 8, 2016. This work was supported in part by the Research Grants Council of the Hong Kong Special Administrative Region Government under the Theme-Based Research Scheme under Grant T23-612/12-R and the HKUST-Qualcomm Joint Innovation and Research Laboratory.

Q. Pan was with the Electronic and Computer Engineering Department of Hong Kong University of Science and Technology, Hong Kong, and now is with eTopus Technology Inc., Sunnyvale, CA USA (e-mail: panquan@connect.ust.hk).

Y. Lu was with the Electronic and Computer Engineering Department of Hong Kong University of Science and Technology, Hong Kong. He is now with the State-Key Lab of Analog and Mixed-Signal VLSI, University of Macau, Guangdong, China (e-mail: yanlu@umac.mo).

Y. Wang and C. P. Yue are with the Electronic and Computer Engineering Department of Hong Kong University of Science and Technology, Hong Kong (e-mail: ywangbp@ust.hk; eepatrick@ust.hk).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/JSTQE.2016.2574567

data rate-distance product >100 Gb/s-m), copper-based electrical cables become much less efficient than optical fibers, based on a comprehensive survey of the historical deployment of electrical vs. optical links [1].

For short-range (<100 m) optical links over 5 Gb/s, the 850-nm wavelength is a promising choice because the vertical-cavity surface-emitting laser (VCSEL) provides a cost-effective light source. Therefore, low-cost CMOS-based optical receivers with on-chip photodetectors (PDs) have been actively pursued in recent years to take advantage of the ability of silicon PN junctions to detect 850-nm light [2]-[12]. Compared to expensive hybrid implementations with III-V PDs, CMOS monolithic optoelectronic integrated circuits (OEICs) offer lower cost and eliminate package parasitics. However, the responsivity and the bandwidth of CMOS PDs are severely limited by several factors, including the shallow junction depth ($<1\ \mu$m) in standard CMOS, the relatively long absorption length of 850-nm light in silicon (>15 $\mu$m), and the large carrier transit time in the substrate absorption layer [3], [13], [14]. Typical plain CMOS NW/Psub PDs without spatially modulated light (SML) can achieve 0.2-A/W responsivity, but at the cost of a bandwidth of only 2 MHz. SML alternatives boost the bandwidth by cancelling out slow diffusion carriers, but their responsivities are halved since half of the PD is covered by dummy metals. Typical CMOS SML PDs can achieve 30-mA/W responsivity and 100-MHz bandwidth. CMOS PDs with process modification are presented in [12]. These PDs have lower parasitics and higher bandwidth, but at a much higher cost. Biasing CMOS PDs in the avalanche mode is a third option, but it introduces reliability issues to the whole system [2], [7].

To further enhance the data rate of an optical system with an on-chip PD, equalization techniques are required to boost the system's bandwidth. Both decision feedback equalizers (DFEs) and continuous-time linear equalizers (CTLEs) have been studied extensively in recent years. The DFE is well-suited for channels with a few dominant post-cursor intersymbol interference (ISI) taps [15]-[17], but the number of taps is limited by the feedback loop settling time, which strongly depends on the process. The main drawback of the CTLE is that it amplifies high-frequency noise, which introduces a sensitivity penalty. However, the CTLE is more suitable for OEICs integrated with CMOS PDs due to their lower bandwidth and poorer responsivity. There are two types of CTLEs: the passive CTLE [18] and the active CTLE [19]. The passive CTLE consumes only sub-1-mW power, but it introduces too much loss at low frequency. The active CTLE consumes more power than its passive alternative, but it has much smaller loss and is thus more suitable for achieving better input sensitivity. Besides this, with the



Fig. 1. The architecture of the proposed fully integrated optical receiver.

existing 65-nm CMOS process, the circuit bandwidth limitations cause significant design difficulties. Therefore, bandwidth enhancement techniques are utilized in this work, including series inductive peaking, parallel inductive peaking, and negative capacitance compensation (NCC).

Mainly due to the limited performance of CMOS PDs, it is still extremely challenging to meet the IEEE 802.3ba 40-Gb/s and 100-Gb/s Ethernet Standard (4 channels) with fully integrated OEICs. On the other hand, by using discrete high-performance PDs, OEICs are capable of delivering up to 30 Gb/s with appropriate equalization design [20], [21]. Until now, under a standard PD reverse bias ($V_{\rm PD}$) condition ($V_{\rm PD} \leq V_{\rm DD}$), the reported OEIC has a maximum data rate and power efficiency of 8.5 Gb/s and 5.53 pJ/bit, respectively [5]. Under an avalanche $V_{\rm PD}$ condition ($V_{\rm PD} > V_{\rm DD}$), the maximum data rate and efficiency are 12.5 Gb/s and 4.72 pJ/bit, respectively [2]. In order to achieve a higher data rate and better power efficiency, in this work, a CMOS P-Well/Deep N-Well (PW/DNW) PD with better bandwidth and responsivity is presented [22]. Based on the measured slow roll-off frequency response of the CMOS PD, an OEIC circuit with a 3-stage cascaded CTLE utilizing NCC and inductive shunt peaking at three different frequencies is presented to complementarily compensate for the limited bandwidth of the CMOS on-chip PD [22]. With both the PW/DNW PD and the cascaded CTLE, the proposed OEIC achieves new record data rates and efficiencies.

This paper provides more detailed descriptions of circuit-level analyses and system-level insights by integration and elaboration of the results in [13], [20]–[24]. This paper is organized as follows. Section II describes the system architecture of the proposed fully integrated OEIC receiver. Section III provides the design of the CMOS on-chip PW/DNW PD without any process modifications. Section IV presents circuit implementations of individual building blocks. Section V demonstrates the measurement results of both the PD and the optical receiver. Finally, a conclusion is drawn in Section VI.

## II. ARCHITECTURE OF THE PROPOSED OEIC RECEIVER

Fig. 1 illustrates the architecture of the proposed fully integrated OEIC receiver, which consists of a CMOS on-chip PW/DNW PD, a transimpedance amplifier (TIA), a DC offset cancellation buffer (DOCB), a main amplifier (MA), a 3-stage tunable CTLE, a 2-stage modified Cherry-Hooper limiting amplifier (LA), a DC offset cancellation network (DOCN),



Fig. 2. (a) Layout, and (b) cross-section views of the proposed CMOS PW/DNW PD.

an adaptive equalization loop (AEL) consisting of a low-pass filter (LPF) and a differential power detector (DPD), a low dropout regulator (LDO) for better power supply rejection (PSR) performance, and a  $50-\Omega$  termination output buffer.

To overcome the on-chip PD's capacitance, the TIA employs a push-pull cascode topology with shunt-shunt feedback, input series peaking, and NCC techniques to attain adequate transimpedance gain and sufficient bandwidth. Due to the pseudo-differential characteristics of the TIA, the DOCB is designed to cancel out the offset voltages from the PD and TIA; otherwise, the receiver would be saturated by further amplification. The tunable CTLEs are designed to compensate for the on-chip PD's limited optical frequency response and also for process, voltage, and temperature (PVT) variations. The MA stage between the DOCB and the CTLE is essential for the system's sensitivity performance since it can suppress the input-referred noise (IRN) from the following stages when the CTLE is enabled [20]. The two-stage LA provides sufficient bandwidth and power gain to offer a limiting function at the system's output. Finally, the DOCN eliminates the offset voltages from the MA, CTLE, and LA. The high-pass corner frequency is verified to be 15 kHz during the chip measurement, which avoids significant output droop even in the presence of consecutive '0' or '1' runs.
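The droop statement can be checked with a short calculation. The sketch below is an illustration only: it assumes the offset-cancellation path behaves as a first-order high-pass filter with the 15-kHz corner quoted above, and it uses a hypothetical run of 15 identical bits (the longest run of ones in a $2^{15}-1$ PRBS). The function name and the run length are assumptions introduced here.

```python
import math

def droop_fraction(f_hp_hz, run_bits, bit_rate_bps):
    """Fractional droop of a first-order high-pass filter over a run of identical bits."""
    run_time_s = run_bits / bit_rate_bps
    return 1.0 - math.exp(-2.0 * math.pi * f_hp_hz * run_time_s)

F_HP_HZ = 15e3   # high-pass corner frequency quoted in the text
RUN_BITS = 15    # assumed run length (longest run of ones in a 2^15-1 PRBS)

for rate_bps in (9e9, 18e9):  # data rates reported for the two PD modes
    droop = droop_fraction(F_HP_HZ, RUN_BITS, rate_bps)
    print(f"{rate_bps / 1e9:.0f} Gb/s: droop over {RUN_BITS} bits = {droop * 100:.4f} %")
```

Under these assumptions the droop stays far below 0.1% of the signal amplitude at both data rates, which is consistent with the claim that a 15-kHz corner avoids significant droop.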

## III. DESIGN OF THE P-WELL/DEEP N-WELL PHOTODETECTOR

The layout and cross-section views of the PW/DNW PD are depicted in Fig. 2. To reduce the extrinsic series resistance, P+ is added on top of the PW region. Meanwhile, to minimize the reflection of the incident light by the silicide layer and thus avoid responsivity degradation, one silicide blocking layer, the resist protection oxide, is used to keep P+ unsilicided, except under the metal contact area [13].

The presented PW/DNW PD has four main advantages. First, it does not use the P-sub as one junction terminal. This successfully reduces the slow diffusion current generated in the substrate and thus increases the PD's bandwidth. Second, the PW/DNW PD has a lighter doping concentration than the conventional P+/N-well (or PW/N+) PDs. Therefore, with the same  $V_{\rm PD}$ , more light can be absorbed in the wider depletion region. Third, it has a deeper physical junction by adopting the DNW layer (2  $\mu$ m). More electron-hole pairs can be generated due to the relatively long absorption length of 850-nm light in silicon. Moreover, by using the PW instead of



Fig. 3. Schematic of the inductive peaking inverter-based cascode TIA.

the P-sub, the PW/DNW PD is compatible with the optical receiver design when operating in the avalanche mode. The PW terminal can be freely connected to the receiver's TIA input (which is typically at half of $V_{\rm DD}$, around 0.5 V for the 65-nm design) while a high positive voltage is applied to the DNW terminal to obtain the required $V_{\rm PD}$. Conventional avalanche PDs (APDs) do not have this biasing freedom, since their P-sub must be biased with a large negative DC voltage and the N-terminal of the PD is connected to the TIA input. This large negative DC bias on the P-sub can give rise to stability issues in CMOS circuits [13].

Before designing the following optical receiver chip, one standalone PW/DNW PD was fabricated, measured and modeled. The details are shown in Section V.

## IV. CIRCUITS DESCRIPTION

## A. The Inductive Inverter-based Cascode TIA

The TIA is a critical building block of the optical receiver, since the receiver's sensitivity essentially depends on the input-node capacitance, the feedback resistor, and the transconductance of the input transistors. Inverter-based TIAs, which have been studied in past years, show better noise performance than simple common-source (CS) TIAs [8], [23], [25].

Given the same feedback resistor, to achieve higher amplifier gain and better input sensitivity (i.e., noise performance), input transistors with reasonable size and bias are required. A larger transistor size will help with the gain, but it will introduce more noise and gate-source capacitance directly to the input node. Due to the Miller effect, the impact of the gate-drain capacitance on input and output will also increase with the voltage gain. Together these two capacitances can dominate over the PD capacitance and limit the TIA's bandwidth.

Fig. 3 shows the schematic diagram of the proposed inductive peaking inverter-based cascode TIA. The cascode devices' gate bias ( $V_{B1}$  and  $V_{B2}$ ) are provided on chip to keep  $M_1-M_4$  in the saturation region over process corners.

The simplified low-frequency transimpedance gain $Z_T$ and 3-dB bandwidth $BW$ of the inductive peaking inverter-based cascode TIA are given by

$$Z_T = \frac{A_{\text{core}} \cdot R_f}{A_{\text{core}} + 1} \tag{1}$$



Fig. 4. (a) Simulated progressive TIA bandwidth enhancement and (b) TIA output noise.

$$A_{\text{core}} = (g_{m1} + g_{m4}) \left[ (g_{m2}r_{\text{ds, M1}}r_{\text{ds, M2}}) \,\|\, (g_{m3}r_{\text{ds, M3}}r_{\text{ds, M4}}) \right] \tag{2}$$

$$BW = \frac{A_{\text{core}} + 1}{2\pi R_f (C_{\text{gs, M1}} + C_{\text{gs, M4}} + C_{\text{gd, M1}} + C_{\text{gd, M4}})} \tag{3}$$

where  $A_{\text{core}}$  is the open-loop voltage gain. Compared to the conventional inverter-based TIA topology, its  $A_{\text{core}}$  is increased roughly by  $g_{m2}r_{\text{ds},M1}$  times.
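To make (1) and (3) concrete, the following sketch evaluates them numerically. All component values ($A_{\text{core}}$, $R_f$, and the four capacitances) are hypothetical placeholders chosen only to land in a plausible range; they are not design values from this work, and the function names are introduced here for illustration.

```python
import math

def tia_gain_ohm(a_core, r_f_ohm):
    """Low-frequency transimpedance gain, Eq. (1)."""
    return a_core * r_f_ohm / (a_core + 1.0)

def tia_bandwidth_hz(a_core, r_f_ohm, c_total_f):
    """3-dB bandwidth, Eq. (3); c_total_f is the sum of the four capacitances."""
    return (a_core + 1.0) / (2.0 * math.pi * r_f_ohm * c_total_f)

# Hypothetical values for illustration only (not taken from the paper).
A_CORE = 15.0                               # open-loop voltage gain
R_F_OHM = 850.0                             # feedback resistor
C_TOTAL_F = (60 + 120 + 20 + 40) * 1e-15    # Cgs,M1 + Cgs,M4 + Cgd,M1 + Cgd,M4

z_t = tia_gain_ohm(A_CORE, R_F_OHM)
z_t_db = 20.0 * math.log10(z_t)
bw_hz = tia_bandwidth_hz(A_CORE, R_F_OHM, C_TOTAL_F)
print(f"Z_T = {z_t:.1f} ohm ({z_t_db:.1f} dBohm), BW = {bw_hz / 1e9:.2f} GHz")
```

The sketch shows the trade-off in the two expressions: raising $A_{\text{core}}$ moves $Z_T$ toward $R_f$ and extends the bandwidth in proportion to $A_{\text{core}} + 1$, while any added input capacitance reduces the bandwidth directly.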

The simulated progressive TIA bandwidth enhancement is shown in Fig. 4(a). The improvement in the TIA open-loop gain by the cascode devices reduces TIA input impedance hence extends bandwidth. In addition, the presented TIA design incorporates series inductive peaking and NCC to extend its bandwidth. The series peaking inductor  $L_{\rm S}$  not only peaks the frequency response, increasing the TIA bandwidth, but also shapes the input referred current noise spectral density. The NCC circuit, consisting of cross-coupled transistors  $M_9 - M_{10}$ , capacitors  $C_1 - C_2$ , and two current sources  $I_{B1} - I_{B2}$ , is used to enhance the boosting at high frequency while maintaining the DC gain. Overall, this TIA attains 58-dB $\Omega$  gain with 12.7-GHz bandwidth while dissipating 11 mW.

While achieving substantial bandwidth enhancement, the noise contribution from all boosting techniques must be examined carefully. As shown in Fig. 4(b), the noise from



Fig. 5. Schematic of the DOC buffer.

cascode devices is degenerated by the transconductance devices and contributes only an extra 0.4 nV/$\sqrt{\text{Hz}}$ of output noise at low frequency. However, the large parasitic capacitance at the source of each cascode device attenuates the noise degeneration and makes the cascode devices the major noise contributor from 1 GHz to 10 GHz. The noise contribution from the series resistance of $L_S$ is negligible due to its high quality factor. In addition, the noise from the NCC is not significant because of the relatively low transconductance of $M_9$ and $M_{10}$.

Due to the pseudo-differential characteristic, the TIA is vulnerable to the noise from the power supply. Therefore, a fully integrated tri-loop LDO with ultra-fast response is required to supply the TIA and DOCB running above 10 Gb/s [24]. To ensure there is enough voltage headroom, the voltage supply of the LDO is designed to be 1.2 V.

## B. The DC Offset Buffer and DC Offset Network

The unbalanced current output of the differential photodetector can cause a large DC offset at the output of the inverter-based TIA. To alleviate the offset mismatch, a DOCB is inserted between the TIA and the MA, as shown in Fig. 5. It consists of two differential pairs and two sets of RC low-pass filters. The positive pair formed by $M_1$ and $M_4$ behaves as a conventional differential amplifier that provides a gain of $g_{m1} \cdot R_{D1}$. In contrast, the differential pair formed by $M_2$ and $M_3$ is reversely connected, producing a negative gain that cancels out the DC offset. The high-pass response is achieved by the RC low-pass filter at the input of the negative-gain pair. The low-frequency gain is designed to be -20 dB, while the mid-band gain remains positive to avoid degrading the SNR too much. The low cut-off frequency is set to 15 kHz by off-chip capacitors $C_1$ and $C_2$.

With the same principle, in order to remove the accumulated offset voltages from the MA, CTLE, and LA stages, a DOCN is utilized, which consists of RC low-pass filters and conventional differential amplifiers, as depicted in Fig. 5.

The accumulated offset voltages can be minimized by careful analog layout techniques. The differential paths should be fully symmetrical so that in post-layout simulation, even without the DOCN, the accumulated offset voltage of the six cascaded amplifiers is below 1 mV. However, the DOCN cannot be omitted since PVT variations are unpredictable.



Fig. 6. Schematic of the CTLE with both inductive peaking and NCC.

## C. Three-Stage Cascaded CTLE With Different Peaking Frequencies

The conventional CTLE filter has been well studied [26]. With capacitive degeneration, this CTLE circuit boosts high frequencies at the expense of DC gain. The transfer function is [26]

$$\frac{V_{\text{out}}}{V_{\text{in}}}\left(s\right) = \frac{g_{m1}R_D}{1 + \frac{g_{m1}R_S}{2}} \frac{1 + \frac{s}{\omega_{z1}}}{\left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)} \tag{4}$$

where $\omega_{z1} = 1/(R_S C_S)$, $\omega_{p1} = (1 + \frac{g_{m1}R_S}{2})/(R_S C_S)$, $\omega_{p2} = 1/(R_D C_L)$, and $g_{m1}$ is the transconductance of $M_1$. This topology suffers from limited bandwidth and consequently insufficient boosting at high frequencies. Therefore, inductive peaking was introduced into this topology as shown in Fig. 6(b). The transfer function is [26]

$$\frac{V_{\text{out}}}{V_{\text{in}}}\left(s\right) = \frac{g_{m1}R_D}{1 + \frac{g_{m1}R_S}{2}} \frac{\left(1 + \frac{s}{\omega_{z1}}\right)\left(1 + \frac{s}{\omega_{z2}}\right)}{\left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{2\zeta s}{\omega_n} + \frac{s^2}{\omega_n^2}\right)} \tag{5}$$

where  $\omega_{z1}$  and  $\omega_{p1}$  do not change,  $\omega_{z2} = 2\zeta\omega_n$ ,  $\zeta = (\frac{R_D}{2})\sqrt{C_L/L_P}$ , and  $\omega_n = 1/\sqrt{C_LL_P}$ . The second zero,  $\omega_{z2}$ , is created to cancel out the first pole,  $\omega_{p1}$ . Therefore, the gain boosting and phase compensation at higher frequencies are achieved.
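The effect of shunt peaking in (5) relative to (4) can be illustrated numerically. The sketch below evaluates both transfer functions, taking $\omega_{p1} = (1 + g_{m1}R_S/2)/(R_S C_S)$, the form consistent with the DC gain term of (4). Every component value is a hypothetical placeholder rather than a design value from this work, and the function names are introduced here for illustration.

```python
import math

def ctle_terms(gm, r_d, r_s, c_s, c_l):
    """DC gain and corner frequencies (rad/s) shared by Eqs. (4) and (5)."""
    dc_gain = gm * r_d / (1.0 + gm * r_s / 2.0)
    w_z1 = 1.0 / (r_s * c_s)
    w_p1 = (1.0 + gm * r_s / 2.0) / (r_s * c_s)
    w_p2 = 1.0 / (r_d * c_l)
    return dc_gain, w_z1, w_p1, w_p2

def h_conventional(f_hz, gm, r_d, r_s, c_s, c_l):
    """Capacitively degenerated CTLE with resistive load, Eq. (4)."""
    dc_gain, w_z1, w_p1, w_p2 = ctle_terms(gm, r_d, r_s, c_s, c_l)
    s = 2j * math.pi * f_hz
    return dc_gain * (1 + s / w_z1) / ((1 + s / w_p1) * (1 + s / w_p2))

def h_shunt_peaked(f_hz, gm, r_d, r_s, c_s, c_l, l_p):
    """CTLE with an inductor L_P in series with R_D, Eq. (5)."""
    dc_gain, w_z1, w_p1, _ = ctle_terms(gm, r_d, r_s, c_s, c_l)
    w_n = 1.0 / math.sqrt(c_l * l_p)
    zeta = (r_d / 2.0) * math.sqrt(c_l / l_p)
    w_z2 = 2.0 * zeta * w_n
    s = 2j * math.pi * f_hz
    num = (1 + s / w_z1) * (1 + s / w_z2)
    den = (1 + s / w_p1) * (1 + 2 * zeta * s / w_n + (s / w_n) ** 2)
    return dc_gain * num / den

# Hypothetical component values for illustration only.
P = dict(gm=20e-3, r_d=200.0, r_s=200.0, c_s=200e-15, c_l=50e-15)
L_P = 0.5e-9

for f_hz in (0.0, 5e9, 10e9):
    g4 = abs(h_conventional(f_hz, **P))
    g5 = abs(h_shunt_peaked(f_hz, l_p=L_P, **P))
    print(f"{f_hz / 1e9:4.1f} GHz: |H4| = {g4:.3f}, |H5| = {g5:.3f}")
```

Both expressions share the same DC gain, and the ideal boost $\omega_{p1}/\omega_{z1} = 1 + g_{m1}R_S/2$ is set by the degeneration network alone. With these placeholder values the shunt-peaked load holds up more of that boost near 10 GHz than the plain RC load does.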

In this work, for the first stage, the NCC circuit is used instead of an on-chip inductor to save chip area, as illustrated in Fig. 6. The second and third stages employ an on-chip inductor for shunt peaking, as a trade-off between chip area and power. Each CTLE stage also features a tunable RC source degeneration circuit to adjust the zeros and poles in the frequency response. The three stages are tuned simultaneously by a single control voltage ($V_{\rm CTLE}$) from an adaptive control loop.

As presented in Section V, the measured optical frequency response of the standalone PW/DNW PD test chip shows a slow roll-off with a slope of 5-10 dB/decade in the mid-band. Therefore, a circuit with a slow roll-up frequency response is required to complementarily compensate for the PD's loss. However, a single CTLE stage can only provide a 20-dB/dec roll-up slope, which starts from its first zero and reaches the peak value at the highest pole. In this work, a three-stage CTLE is presented with different



Fig. 7. Simulated frequency response of the proposed OEIC receiver front-end with the on-chip PD biased at 12.3 V  $V_{PD}$ .



Fig. 8. Schematic of the modified CH LA.

peaking frequencies at 2.5 GHz, 6 GHz, and 12 GHz. By interpolating the positions of the different zeros and poles, it achieves a slow roll-up frequency response of 5-10 dB/decade. Although multi-stage cascaded CTLEs have been presented by others [5], [25], they used the same peaking frequency in every stage and thus were unable to completely compensate for the PD's slow roll-off loss.

To achieve the co-simulation of on-chip PD and receiver circuits, PD circuit models are built from the measured PD frequency response under different bias conditions [13]. An equivalent PD model under 12.3-V  $V_{\rm PD}$  bias is adopted with a bandwidth of 1.1 GHz. In Fig. 7, the simulated frequency responses at three different CTLE stages' outputs and LA output are plotted. With the multi-stage cascaded peaking techniques, the receiver bandwidth is improved stage by stage and the final receiver output bandwidth is boosted to be 11.9 GHz.
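The way staggered peaking produces a slope well below 20 dB/decade can be sketched with a simplified model. The example below is not the circuit-level simulation of Fig. 7: it assumes each stage is a first-order zero/pole pair, places the poles at the stated peaking frequencies (2.5, 6, and 12 GHz) as a simplification, and uses hypothetical zero frequencies chosen only for illustration.

```python
import math

def stage_gain_db(f_hz, f_zero_hz, f_pole_hz):
    """Magnitude in dB of a first-order stage (1 + jf/fz) / (1 + jf/fp)."""
    num = 1.0 + (f_hz / f_zero_hz) ** 2
    den = 1.0 + (f_hz / f_pole_hz) ** 2
    return 10.0 * math.log10(num / den)

def cascade_gain_db(f_hz, stages):
    """Total gain in dB of cascaded stages given as (zero, pole) pairs."""
    return sum(stage_gain_db(f_hz, fz, fp) for fz, fp in stages)

def average_slope_db_per_decade(stages, f_lo_hz, f_hi_hz):
    """Average roll-up slope between two frequencies."""
    rise_db = cascade_gain_db(f_hi_hz, stages) - cascade_gain_db(f_lo_hz, stages)
    return rise_db / math.log10(f_hi_hz / f_lo_hz)

# Poles follow the stated peaking frequencies; zeros are hypothetical.
STAGGERED = [(1.2e9, 2.5e9), (3.5e9, 6.0e9), (7.5e9, 12.0e9)]

slope = average_slope_db_per_decade(STAGGERED, 1e9, 12e9)
boost_db = cascade_gain_db(12e9, STAGGERED)
print(f"average slope 1-12 GHz: {slope:.1f} dB/decade, boost at 12 GHz: {boost_db:.1f} dB")
```

Because each zero is followed by its own pole before the next zero takes over, the cascade rises in small steps, and its average slope under these assumptions is roughly half of the 20 dB/decade that a single zero would give.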

## D. Limiting Amplifier

The gain-bandwidth product of a conventional differential pair with resistive or inductive load is not large enough for broadband operation. Therefore, a modified Cherry-Hooper LA is adopted in this design. Fig. 8 shows the schematic of the LA. It is worth noting that the MA has the same schematic as the LA, except for the DOCN feedback differential input pairs. To improve the overall sensitivity, the MA stage is inserted between the



Fig. 9. (a) Schematic of the variable-gain LPF, and (b) simulated gain and bandwidth for variable-gain LPF.

DOC buffer and 3-stage CTLE. Otherwise, when the CTLE is enabled, the IRN of the whole system will deteriorate since there will not be enough gain to alleviate the high-frequency noise from the CTLE stages, seen from the OEIC inputs. The 2-Stage LA is allocated after the CTLE to achieve the overall gain requirement for the entire OEIC.

In total, the differential MA, CTLE, and LA stages provide 44-dB DC gain, and the offset cancellation capability is designed to be 30 dB, based on system optimization of gain, IRN, bandwidth, and power consumption. The optimization is performed with a Verilog-A model after each circuit block has been individually designed and its key parameters extracted.
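As a consistency check on the gain budget, the stated 58-dB$\Omega$ TIA gain and 44-dB post-TIA gain can be added in decibels and compared with the 102-dB$\Omega$ transimpedance gain reported in the abstract. The sketch assumes that the remaining blocks in the signal path (the DOCB and output buffer) contribute a net 0 dB, which is an assumption made here and not a figure stated in the text.

```python
TIA_GAIN_DBOHM = 58.0   # TIA transimpedance gain stated in Section IV-A
POST_GAIN_DB = 44.0     # combined MA + CTLE + LA DC gain stated above
OTHER_GAIN_DB = 0.0     # assumed net gain of DOCB and output buffer

def dbohm_to_ohm(gain_dbohm):
    """Convert a transimpedance gain from dBohm to ohms."""
    return 10.0 ** (gain_dbohm / 20.0)

total_dbohm = TIA_GAIN_DBOHM + POST_GAIN_DB + OTHER_GAIN_DB
total_ohm = dbohm_to_ohm(total_dbohm)
print(f"total transimpedance gain: {total_dbohm:.0f} dBohm = {total_ohm / 1e3:.1f} kohm")
```

Under this assumption the two stated gains account for the full 102 dB$\Omega$, i.e., roughly 126 k$\Omega$ of transimpedance.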

## E. Adaptive Equalization Loop

To accommodate and overcome different input data rates, channel losses, and PVT variations, an AEL has been utilized in this design. It consists of both a variable-gain LPF and a DPD, as seen in Fig. 1.

The conventional implementation of the adaptive loop usually employs both a low-pass and a high-pass filter to capture the power of the low and high spectral components and then compares the difference [18]. In this design, to save chip area and power, the high-pass path is eliminated by comparing the low-pass signal power directly with the full-pass power. Therefore, the fixed low-pass filter is revised to be a variable-gain one, with an amplification ratio set by the ratio of the fundamental frequency of the signal to the ideal low-pass corner frequency. The schematic of the variable-gain LPF is shown in Fig. 9(a). The diode-connected PMOS $M_8 - M_9$ and the cross-coupled pair $M_6 - M_7$ are adopted to fix the output nodes' common-mode voltage, eliminating the need for a common-mode feedback circuit. The NMOS transistor $M_5$ operating in the linear region is used as a variable source degeneration resistor to change the LPF gain ($A_{\rm LPF}$) and bandwidth ($f_{-3\,\text{dB\_LPF}}$). The required $A_{\rm LPF}$ and $f_{-3\,\text{dB\_LPF}}$ for different data rates are plotted in Fig. 9(b). The $A_{\rm LPF}$ and $f_{-3\,\text{dB\_LPF}}$ are also plotted as $V_{\rm LPF}$ changes from 0.4 to 1 V in 0.05-V steps.

Fig. 10 shows the schematic of the DPD. Each differential pair's  $(M_{1a} + M_{1b} \text{ or } M_{2a} + M_{2b})$  sources and drains are tied together; therefore, the odd harmonics of the output currents are cancelled, only leaving a DC term and an AC term that are proportional to the square of the input voltage amplitude.



Fig. 10. Schematic of the DPD.



Fig. 11. CoB testing fixture for (a) electrical measurement, and (b) optical measurement.

With these two source- and drain-connected differential pairs, a super-differential pair is achieved. The final output current, given below, is proportional to the difference between the squares of the two input voltage amplitudes [18]

$$I_{\rm out} = I_{\rm out2} - I_{\rm out1} = K \times \left( V_{\rm in1\_dm}^2 - V_{\rm in2\_dm}^2 \right). \tag{6}$$

A current-bleeding technique is used in the DPD circuits. The adaptive equalization control voltage is generated by integrating $I_{\rm out}$ over the loading capacitor $C_{\rm L}$. The parameter $K$ is the transfer characteristic of the DPD. In this work, $K$ is designed to be 4.5 mA/V<sup>2</sup> as a trade-off between equalization speed and power consumption.
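The relation in (6) and the integration over $C_{\rm L}$ can be illustrated with a small calculation. In the sketch below, $K$ is the stated 4.5 mA/V<sup>2</sup>, whereas the two input amplitudes and the value of $C_{\rm L}$ are hypothetical and serve only to show the orders of magnitude involved.

```python
K_DPD = 4.5e-3  # A/V^2, DPD transfer characteristic stated in the text

def dpd_output_current(v_in1_dm, v_in2_dm, k=K_DPD):
    """Eq. (6): output current from the two differential input amplitudes."""
    return k * (v_in1_dm ** 2 - v_in2_dm ** 2)

def control_voltage_slew(i_out, c_load_f):
    """Rate of change (V/s) of the control voltage when i_out charges C_L."""
    return i_out / c_load_f

# Hypothetical operating point for illustration only.
V_IN1 = 0.20     # V, differential amplitude at the first DPD input
V_IN2 = 0.18     # V, differential amplitude at the second DPD input
C_LOAD = 10e-12  # F, assumed loading capacitor

i_out = dpd_output_current(V_IN1, V_IN2)
slew = control_voltage_slew(i_out, C_LOAD)
print(f"I_out = {i_out * 1e6:.1f} uA, control slew = {slew / 1e6:.2f} V/us")
```

When the two amplitudes are equal the output current is zero and the control voltage holds its value, which is the settled condition of the loop. A larger $K$ makes the control voltage move faster for the same amplitude mismatch, reflecting the speed side of the trade-off mentioned above.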

## V. CHIP FABRICATION AND MEASURED RESULTS

The OEIC chips have been fabricated using 65-nm CMOS technology without any process modifications to enhance the performance of the on-chip PD. Fig. 11 shows the chip microphotograph, where the core chip occupies 0.23 mm<sup>2</sup> and consumes only 48 mW.

## A. PW/DNW PD Electrical Measurement

One standalone PW/DNW PD test chip is fabricated to characterize the performance of the PD itself. Using a 20-GHz network analyzer, the electrical reflection coefficients of the CMOS PD from 20 MHz to 10 GHz at 0.5-V $V_{\rm PD}$ are measured and plotted on a Smith chart in [13]. An equivalent electrical extrinsic sub-model (consisting of a variable resistor and a



Fig. 12. Measured bias dependency of the PD responsivity. Inset: Ratio of illumination to dark current vs.  $V_{\rm PD}$ .

capacitor) is proposed to mimic the measured S-Parameter, as shown in [13]. With the PD model, the co-simulation of PD and the following receiver circuits can be performed seamlessly. The measured PD capacitance is 480 fF under 0.5-V  $V_{\rm PD}$ . With the same method, under 12.3-V  $V_{\rm PD}$  the measured PD capacitance is 328 fF.

## B. PW/DNW PD Optical Measurement

As shown in Fig. 2(a), the 50-$\mu$m diameter octagonal PD layout is optimized for light exposure by commercial single-mode fibers or multi-mode fibers. Fig. 12 depicts the measured bias dependency of the PD responsivity. At low $V_{\rm PD}$, the measured output illumination current is 16 $\mu$A with a -5-dBm input power ($P_{\rm in}$), whereas the background dark current ranges from 100 to 200 pA. Both currents have a weak dependence on $V_{\rm PD}$ until the PD enters the avalanche mode at about 11 V. In the avalanche mode, both currents increase rapidly due to impact ionization and carrier multiplication under the high electric field in the depletion region. When $V_{\rm PD}$ is 0.5 V, within the supply of the 65-nm standard CMOS process, a responsivity of 51 mA/W is obtained. The peak responsivity is 1.03 A/W at the 12.8-V breakdown voltage. However, when the PD is biased at the breakdown voltage, the noisy dark current dominates the PD output current and closes the data eye completely. In this work, the optimal $V_{\rm PD}$ for the avalanche mode is 12.3 V, where a responsivity of 272 mA/W is achieved. $V_{\rm PD}$ is controlled manually by an external voltage supply. A feedback control loop will be designed to auto-tune $V_{\rm PD}$ in future work.
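The responsivity figure follows directly from the photocurrent and the optical power. The sketch below assumes the input power is -5 dBm and the photocurrent is 16 $\mu$A, and it also estimates the average photocurrent at the two sensitivity levels quoted in the abstract by applying the low-frequency responsivities. The helper names are introduced here, and the sensitivity estimate ignores the PD frequency response.

```python
def dbm_to_watt(power_dbm):
    """Convert optical power from dBm to watts."""
    return 1e-3 * 10.0 ** (power_dbm / 10.0)

def responsivity_a_per_w(photocurrent_a, power_dbm):
    """Responsivity as photocurrent divided by incident optical power."""
    return photocurrent_a / dbm_to_watt(power_dbm)

def photocurrent_a(resp_a_per_w, power_dbm):
    """Average photocurrent for a given responsivity and optical power."""
    return resp_a_per_w * dbm_to_watt(power_dbm)

# Low-bias measurement: 16 uA at an input power taken to be -5 dBm.
resp_std = responsivity_a_per_w(16e-6, -5.0)
print(f"standard-mode responsivity: {resp_std * 1e3:.1f} mA/W")

# Average photocurrent at the reported sensitivity levels.
i_std = photocurrent_a(51e-3, -4.2)    # standard mode, 51 mA/W
i_apd = photocurrent_a(272e-3, -4.9)   # avalanche mode, 272 mA/W
print(f"standard: {i_std * 1e6:.1f} uA, avalanche: {i_apd * 1e6:.1f} uA")
```

The first result rounds to the reported 51 mA/W. The second shows that, for a similar optical input power, the avalanche mode delivers several times more signal current to the TIA.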

Fig. 13 shows the measured optical frequency response under different bias conditions. The bandwidth of the CMOS PD has two limitations: its intrinsic optical bandwidth limitation and its extrinsic electrical bandwidth limitation. The plots are normalized to the low-frequency responsivity of 272 mA/W at 12.3-V $V_{\rm PD}$. The measured -3-dB bandwidth is 500 MHz and 1.1 GHz at 0.5-V and 12.3-V $V_{\rm PD}$, respectively. As shown, under both standard and avalanche modes, the optical frequency response has a slow roll-off characteristic of 5-10 dB/decade. It is worth mentioning that, although higher PD noise exists under avalanche mode, the OEIC still has better input sensitivity than in the standard



Fig. 13. Measured optical frequency response of the PW/DNW PD.

TABLE I Comparison with Recently Published CMOS PDs for  $V_{\rm PD} \leq$  Supply

| CMOS Tech              | [6]            | [11]           | [27]             | [28]             | This Work      |
|------------------------|----------------|----------------|------------------|------------------|----------------|
| Junction               | P+/N-well      | P-sub/N-well   | P-sub/N-well     | P-sub/N-well     | P-well/DN-well |
| Supply Voltage (V)     | 1.5            | 1.8            | 3.3              | 1.0              | 1.0            |
| $V_{\rm PD}$ (V)       | $\sim 0.5$     | < 1.8          | 2.3              | 0.3              | 0.5            |
| Meas. Resp. (mA/W)     | 50             | n/a            | 20               | n/a              | 51             |
| Meas. BW (MHz)         | 348            | n/a            | 1100             | 75-150           | 500            |
| $C_{\rm PD}$ (fF)      | 3,000          | 1,600          | 416              | 14,000           | 480            |
| Area ($\mu$m$^2$)      | $70 \times 70$ | $50 \times 50$ | $100 \times 100$ | $250 \times 250$ | $50 \times 50$ |



Fig. 14. Measured electrical frequency response with different CTLE settings (without on-chip PD).

mode. This is due to the responsivity improvement from 51 mA/W to 272 mA/W and the bandwidth increase from 500 MHz to 1.1 GHz. Table I summarizes the PD performance and compares it to recently published results.

## C. OEIC Circuit Electrical Measurement

To investigate the OEIC's circuit characteristics, a standalone OEIC chip without the on-chip PD is fabricated for electrical S-Parameter measurement, as shown in Fig. 11(a). Fig. 14 demonstrates the measured electrical frequency response of the fully integrated OEIC with different CTLE settings, which is tested by directly probing on the chip-on-board (CoB). When the CTLE is disabled, the system achieves a transimpedance gain of 102 dB $\Omega$  with a bandwidth of 12.5 GHz. When the CTLE is maximally enabled, it provides 33-dB adjustable gain



Fig. 15. Measured optical PRBS-15 data eyes at the maximum DR in standard mode (-4.2-dBm input optical power) and avalanche mode (-4.9-dBm optical power) with the CTLE enabled and disabled.

with a slow roll-up slope to compensate for the gradual roll-off slope of the on-chip PD. Therefore, the 3-stage cascaded CTLE provides a slow roll-up frequency response of 0-11-dB/decade to fully complementarily compensate for the limited response of the on-chip PD.

Besides this, the PSR of the tri-loop LDO up to 20 GHz is measured [24]. It confirms that the PSR is better than -21 dB at low frequencies and the worst case occurs at 5 MHz with a -12 dB rejection.

## D. OEIC Optical Measurement

To demonstrate the effectiveness of the 3-stage CTLE and AEL, Fig. 15 shows the optical eyes measured at the maximum data rate achieved with the CTLE disabled and enabled. Here, the maximum data rate is defined as the highest measured value given a $2^{15}-1$ pseudo-random binary sequence (PRBS) input and BER $< 10^{-12}$. As shown in the plots, with the CTLE disabled, the maximum data rate is improved from 3 Gb/s to 7 Gb/s by changing the bias condition from the 0.5-V standard mode to the 12.3-V avalanche mode. This is due to the higher responsivity and better bandwidth of the CMOS on-chip PD under avalanche mode. Moreover, when the CTLE and AEL are enabled, the maximum data rates are improved from 3 to 9 Gb/s and from 7 to 18 Gb/s under standard and avalanche mode, respectively. It is worth mentioning that the difference between manual and automatic equalization is small, verifying the effectiveness of the AEL. At high data rates, the receiver suffers from a significant amount of jitter due to the fluctuation in the mid-band shown in Fig. 14.
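The energy-efficiency figures quoted in the abstract follow from dividing the 48-mW power consumption by the maximum data rate in each mode. The short sketch below reproduces that arithmetic; the function name is introduced here for illustration.

```python
POWER_W = 48e-3  # total power consumption stated in the text

def energy_per_bit_pj(power_w, data_rate_bps):
    """Energy efficiency in pJ/bit: power divided by data rate."""
    return power_w / data_rate_bps * 1e12

eff_std = energy_per_bit_pj(POWER_W, 9e9)    # standard mode, 9 Gb/s
eff_apd = energy_per_bit_pj(POWER_W, 18e9)   # avalanche mode, 18 Gb/s
print(f"standard: {eff_std:.2f} pJ/b, avalanche: {eff_apd:.2f} pJ/b")
```

Since the power is the same in both modes, doubling the data rate from 9 to 18 Gb/s halves the energy per bit.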

The measured optical BER bathtub curves with different PD operation modes are depicted in Fig. 16. The measured optical BERs versus optical input power with different PD operation modes are shown in Fig. 17.
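To relate the optical sensitivities to the electrical signal seen by the TIA, the sketch below converts the sensitivity levels to photocurrent using the responsivities listed in Table II. It is a rough estimate only: it assumes that the quoted sensitivity is the average optical power and that the tabulated responsivity applies at that power level. The helper names are hypothetical.

```python
def dbm_to_mw(power_dbm: float) -> float:
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (power_dbm / 10)


def photocurrent_ua(power_dbm: float, responsivity_ma_per_w: float) -> float:
    """Average photocurrent in microamps.

    Units: mW * (mA/W) = 1e-3 W * 1e-3 A/W = 1e-6 A = uA.
    """
    return dbm_to_mw(power_dbm) * responsivity_ma_per_w


# Sensitivity and responsivity values taken from Table II.
i_std = photocurrent_ua(-4.2, 51.0)     # standard mode
i_aval = photocurrent_ua(-4.9, 272.0)   # avalanche mode

print(f"Standard mode:  {i_std:.1f} uA")
print(f"Avalanche mode: {i_aval:.1f} uA")
```

Under these assumptions, the avalanche mode delivers roughly 88 µA of average photocurrent at sensitivity, compared with about 19 µA in standard mode.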

Table II compares the measured performance to other 850-nm optical receivers with integrated PDs. This work achieves a higher data rate, better efficiency, and a higher figure-of-merit (FoM) [6].



Fig. 16. Measured BER bathtub curves with PD under different modes (-4-dBm optical power): (a) 0.5-V standard mode, and (b) 12.3-V avalanche mode.



Fig. 17. Measured BER versus optical input power.

 TABLE II

 COMPARISON TO PUBLISHED CMOS 850-nm OPTICAL RECEIVERS

| Parameter           | [2]           | [4]          | [5]         | [6]          | This work    | This work    |
|---------------------|---------------|--------------|-------------|--------------|--------------|--------------|
| Tech.               | 250-nm BiCMOS | 180-nm CMOS  | 130-nm CMOS | 130-nm CMOS  | 65-nm CMOS   | 65-nm CMOS   |
| PD Mode             | Avalan        | Avalan       | Std         | Std          | Std          | Avalan       |
| PRBS Type           | $2^{7}-1$     | $2^{31} - 1$ | $2^{7}-1$   | $2^{31} - 1$ | $2^{15} - 1$ | $2^{15} - 1$ |
| PD Bias (V)         | 12            | 14.2         | 0.5         | 1.2          | 0.5          | 12.3         |
| PD Area $(\mu m^2)$ | 100           | 2505         | 4900        | 3600         | 2071         | 2071         |
| Resp. (mA/W)        | 70            | 29           | 50          | 5            | 51           | 272          |
| PD BW (GHz)         | 5             | 6.9          | 0.348       | 0.5          | 0.5          | 1.1          |
| Supply (V)          | 2.5           | 1.8          | < 1.5       | 1.2          | 1.0/1.2#     | 1.0/1.2#     |
| Gain (dB $\Omega$ ) | 68.4##        | 88           | 120         | 105          | 102          | 102          |
| Power* (mW)         | 59            | $118^{*}$    | 47          | 74           | 48           | 48           |
| Max. DR (Gb/s)      | 12.5          | 10           | 8.5         | 4.5          | 9            | 18           |
| BER                 | $10^{-12}$    | $10^{-12}$   | $10^{-12}$  | $10^{-12}$   | $10^{-12}$   | $10^{-12}$   |
| Sens. (dBm)         | -7            | 6            | -3.2        | -3.4         | -4.2         | -4.9         |
| Eff. (pJ/bit)       | 4.72          | 11.8         | 5.53        | 16.44        | 5.33         | 2.7          |
| FoM**               | 2             | 42           | 242         | 55           | 471          | 1089         |

\# 1.2 V for the LDO and TIA supply.

\#\# Simulation result.

\* Not including output buffer.

\*\* FoM: Eq. 21 in [6].
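The energy-efficiency row of Table II follows directly from the power and maximum data rate rows. The short sketch below recomputes it as a cross-check; the dictionary layout and function name are illustrative, and the power and data-rate values are copied from the table.

```python
def energy_per_bit_pj(power_mw: float, rate_gbps: float) -> float:
    """Energy per bit in pJ/bit.

    Units: mW / (Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit = pJ/bit.
    """
    return power_mw / rate_gbps


# (power in mW, maximum data rate in Gb/s) from Table II
receivers = {
    "[2]": (59, 12.5),
    "[4]": (118, 10),
    "[5]": (47, 8.5),
    "[6]": (74, 4.5),
    "This work (Std)": (48, 9),
    "This work (Avalan)": (48, 18),
}

efficiency = {
    name: round(energy_per_bit_pj(p, r), 2) for name, (p, r) in receivers.items()
}
for name, eff in efficiency.items():
    print(f"{name:20s} {eff:6.2f} pJ/bit")
```

For this work, 48 mW at 9 Gb/s gives 5.33 pJ/bit and 48 mW at 18 Gb/s gives 2.67 pJ/bit, consistent with the values quoted in the abstract and conclusion.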

## VI. CONCLUSION

In summary, the 18-Gb/s OEIC measurement results demonstrate a fully integrated solution for short-range 850-nm optical communications. Under the standard mode (0.5-V PD bias  $V_{\rm PD}$ ), a record data rate of 9 Gb/s for  $2^{15}-1$  PRBS with  $10^{-12}$  BER, -4.2-dBm optical input sensitivity, and 5.33-pJ/bit efficiency is presented. Under the avalanche mode (12.3-V PD bias  $V_{\rm PD}$ ), a record data rate of 18 Gb/s for  $2^{15}-1$  PRBS with  $10^{-12}$  BER, -4.9-dBm optical input sensitivity, and 2.7-pJ/bit efficiency is exhibited. To achieve this performance, at the device level, a CMOS PW/DNW PD with improved bandwidth and responsivity is presented. At the circuit level, an inverter-based cascode TIA and multi-stage cascaded CTLEs with different peaking frequencies are presented to obtain higher bandwidth. With respect to the IEEE 802.3ba 100-Gb/s Ethernet standard, the limited performance of the CMOS PD remains the system bottleneck. Novel CMOS PDs or equalization techniques are desired in future work.

#### REFERENCES

- [1] A. V. Krishnamoorthy *et al.*, "Progress in low-power switched optical interconnects," *IEEE J. Sel. Topics Quantum Electron.*, vol. 17, no. 2, pp. 357–376, Mar./Apr. 2011.
- [2] J. S. Youn, M. J. Lee, K. Y. Park, H. Rucker, and W. Y. Choi, "An integrated 12.5-Gb/s optoelectronic receiver with a silicon avalanche photodetector in standard SiGe BiCMOS technology," *Opt. Express*, vol. 20, no. 27, pp. 28153–28162, Dec. 2012.
- [3] A. Carusone, H. Yasotharan, and T. Kao, "CMOS technology scaling considerations for multi-Gbps optical receivers with integrated photodetectors," *IEEE J. Solid-State Circuits*, vol. 46, no. 8, pp. 2856–2867, 2011.
- [4] S. H. Huang, W. Z. Chen, Y. W. Chang, and Y. T. Huang, "A 10-Gb/s OEIC with meshed spatially-modulated photo detector in 0.18-µm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 46, no. 5, pp. 1158–1169, May 2011.
- [5] D. Lee, J. Han, G. Han, and S. M. Park, "An 8.5-Gb/s fully integrated CMOS optoelectronic receiver using slope-detection adaptive equalizer," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2861–2873, Dec. 2010.
- [6] F. Tavernier and M. S. J. Steyaert, "High-speed optical receivers with integrated photodiode in 130 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 44, no. 10, pp. 2856–2867, Oct. 2009.
- [7] J. S. Youn, H. S. Kang, M. J. Lee, K. Y. Park, and W. Y. Choi, "High-speed CMOS integrated optical receiver with an avalanche photodetector," *IEEE Photon. Technol. Lett.*, vol. 21, no. 20, pp. 1553–1555, Oct. 2009.
- [8] T. Chalvatzis *et al.*, "Low-voltage topologies for 40-Gb/s circuits in nanoscale CMOS," *IEEE J. Solid-State Circuits*, vol. 42, no. 7, pp. 1564–1573, Jul. 2007.
- [9] W. Z. Chen *et al.*, "A 3.125 Gbps CMOS fully integrated optical receiver with adaptive analog equalizer," in *Proc. IEEE Asian Solid-State Circuits Conf. Tech. Paper*, Nov. 2007, pp. 396–399.
- [10] W. Z. Chen and S. H. Huang, "A 2.5 Gbps CMOS fully integrated optical receiver with lateral PIN detector," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2007, pp. 293–296.
- [11] S. Radovanovic, A. J. Annema, and B. Nauta, "A 3-Gb/s optical detector in standard CMOS for 850-nm optical communication," *IEEE J. Solid-State Circuits*, vol. 40, pp. 1706–1717, Aug. 2005.
- [12] H. Zimmermann and T. Heide, "A monolithically integrated 1-Gb/s optical receiver in 1-μm CMOS technology," *IEEE Photon. Technol. Lett.*, vol. 13, no. 7, pp. 711–713, Jul. 2001.
- [13] Q. Pan, Z. Hou, Y. Li, A. W. Poon, and C. P. Yue, "A 0.5-V P-Well/Deep N-Well photodetector in 65-nm CMOS for monolithic 850-nm optical receivers," *IEEE Photon. Technol. Lett.*, vol. 26, no. 12, pp. 1184–1187, Jun. 2014.
- [14] M. K. Lee, H. S. Kang, and W. Y. Choi, "Equivalent circuit model for Si avalanche photodetectors fabricated in standard CMOS process," *IEEE Electron Device Lett.*, vol. 29, no. 10, pp. 1115–1117, Oct. 2008.
- [15] C. Thakkar, N. Narevsky, C. D. Hull, and E. Alon, "A mixed-signal 32-coefficient RX-FFE 100-coefficient DFE for an 8 Gb/s 60 GHz receiver in 65 nm LP CMOS," in *Proc. IEEE Int. Solid-State Circ. Conf. Dig. Tech. Papers*, 2013, pp. 238–239.
- [16] M. H. Nazari and A. Emami-Neyestanak, "A 15-Gb/s 0.5-mW/Gbps two-tap DFE receiver with far-end crosstalk cancellation," *IEEE J. Solid-State Circuits*, vol. 47, no. 10, pp. 2420–2432, Oct. 2012.

- [17] T. O. Dickson, J. F. Bulzacchelli, and D. J. Friedman, "A 12-Gb/s 11-mW half-rate sampled 5-tap decision feedback equalizer with current-integrating summers in 45-nm SOI CMOS technology," *IEEE J. Solid-State Circuits*, vol. 44, no. 4, pp. 1298–1305, Apr. 2009.
- [18] D. Shin, J. E. Jang, F. O'Mahony, and C. P. Yue, "A 1-mW 12-Gb/s continuous-time adaptive passive equalizer in 90-nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.*, Jun. 2009, pp. 117–120.
- [19] H. Wang and J. Lee, "A 21-Gb/s 87-mW transceiver with FFE/DFE/Analog equalizer in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 909–919, Apr. 2010.
- [20] Q. Pan *et al.*, "A 41-mW 30-Gb/s CMOS optical receiver with digitally-tunable cascaded equalization," in *Proc. Eur. Solid-State Circuits Conf.*, Sep. 2014, pp. 127–130.
- [21] Q. Pan *et al.*, "A 30-Gb/s 1.37-pJ/b CMOS receiver for optical interconnects," *J. Lightwave Technol.*, vol. 33, no. 4, pp. 778–786, Feb. 2015.
- [22] Q. Pan *et al.*, "A 48-mW 18-Gb/s fully integrated CMOS optical receiver with photodetector and adaptive equalizer," in *Proc. Symp. VLSI Circuits Dig. Tech. Papers*, Jun. 2014, pp. 116–117.
- [23] Y. Wang, Y. Lu, Q. Pan, Z. Hou, L. Wu, W. H. Ki, and C. P. Yue, "A 3-mW 25-Gb/s CMOS transimpedance amplifier with fully integrated low-dropout regulator for 100GbE systems," in *Proc. IEEE Radio Frequency Integr. Circuits Symp. Dig. Papers*, Jun. 2014, pp. 275–278.
- [24] Y. Lu, W. H. Ki, and C. P. Yue, "A 0.65ns-response-time 3.01ps FOM fully-integrated low-dropout regulator with full-spectrum power-supply-rejection for wideband communication systems," in *Proc. IEEE Int. Solid-State Circ. Conf. Dig. Tech. Papers*, Feb. 2014, pp. 306–307.
- [25] S. M. Park and H. J. Yoo, "1.25-Gb/s regulated cascode CMOS transimpedance amplifier for gigabit Ethernet applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 112–121, Jan. 2004.
- [26] J. Lee, "A 20-Gb/s adaptive equalizer in 0.13-μm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 9, pp. 2058–2066, Sep. 2006.
- [27] M. Jutzi *et al.*, "2-Gb/s CMOS optical integrated receiver with a spatially modulated photodetector," *IEEE Photon. Technol. Lett.*, vol. 17, no. 6, pp. 1268–1270, Jun. 2005.
- [28] Y. Dong and K. Martin, "A monolithic 3.125 Gbps fiber optic receiver front-end for POF applications in 65 nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2011, pp. 1–4.



**Quan Pan** (S'08–M'14) received the B.S. degree in electrical engineering from the University of Science and Technology of China, Hefei, Anhui, China, in 2005 and the M.S. degree in 2008. He received the Ph.D. degree in electronics and computer engineering from the Hong Kong University of Science and Technology, Hong Kong, in 2014. He joined a U.S. Silicon Valley IC startup company, eTopus Technology, Inc., as Senior Staff Engineer. He was with MXTronics, Beijing, as an RFIC Engineer from 2008 to 2009, working on GPS receivers. His research interests include high-speed optical transceivers and wireless and wireline circuit design.



**Yipeng Wang** (S'13) received the B.S. degree from Xiamen University, Xiamen, China, in 2010, and the M.S. degree in electrical engineering in 2012 from the University of California, Santa Barbara, CA, USA. He is currently working toward the Ph.D. degree at the Hong Kong University of Science and Technology, Hong Kong, with research focusing on high-speed wireline communication circuits.



**Yan Lu** (S'12–M'14) received the B.Eng. and M.Sc. degrees in microelectronic engineering from the South China University of Technology, Guangzhou, China, in 2006 and 2009, respectively, and the Ph.D. degree in electronic and computer engineering from the Hong Kong University of Science and Technology, Hong Kong, in 2013.

Since July 2014, he has been an Assistant Professor in the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Zhuhai, China. His research interests include near-field coupled wireless power transfer, fully-integrated power converters, and low dropout regulators.



**C. Patrick Yue** (S'93–M'98–SM'05–F'14) received the B.S.E.E. degree (Hons.) from the University of Texas at Austin, Austin, TX, USA, in 1992, and the M.S. and Ph.D. degrees in electrical engineering from Stanford University, Stanford, CA, USA, in 1994 and 1998, respectively.

He is the Associate Provost for Knowledge Transfer, a Professor in the Electronic and Computer Engineering Department, and the Founding Director of the HKUST-Qualcomm Lab and the Center for Industry Engagement and Internship at HKUST. His current research interests include CMOS wireless and optical communication IC design, high-frequency device modeling, LED SoC for visible light communication and micro-display systems, and wireless power transfer for biomedical implants.