# A 0.0071-mm<sup>2</sup> 10.8ps<sub>pp</sub>-Jitter 4 to 10-Gb/s 5-Tap Current-Mode Transmitter Using a Hybrid Delay Line for Sub-1-UI Fractional De-Emphasis

Yong Chen, *Member*, *IEEE*, Pui-In Mak, *Fellow*, *IEEE*, Zunsong Yang, Chirn Chye Boon, *Senior Member*, *IEEE*, and Rui P. Martins, *Fellow*, *IEEE*

Abstract—This paper proposes an ultra-compact 4 to 10-Gb/s 5-tap current-mode transmitter that realizes sub-1-UI fractional de-emphasis (DE) using a hybrid delay line, which is alternately controlled by a voltage bias and a clock. The clocked 0.5-UI and 1-UI DEs scale with the data rate. The sub-1-UI DE provides wide tunability of the data amplitude and delay to compensate different channel losses between the $1^{st}$ and $2^{nd}$ Nyquist frequencies, while effectively compensating the high-frequency portion of the pseudo-random binary sequence (PRBS) spectrum for data-jitter improvement. Additional techniques are a two-step current-summing scheme (namely, two-step DE) in the data path, and active inductors in both the data and clock paths to enhance the internal bandwidth without the need for passive inductors. In addition, we present an analytical model for predicting data-dependent jitter (DDJ) based on a generic system's step response, derive the exact closed-form DDJ expression of DE, and verify its validity by means of circuit simulation. Prototyped in 65-nm CMOS technology, it achieves a figure-of-merit of 4.6 mW/Gb/s and an output jitter of 10.8 ps<sub>pp</sub> at 10 Gb/s under a PRBS $2^{31} - 1$ pattern. The data eyes measure 0.62-UI horizontal and 19.5% vertical openings after -20-dB channel loss. The die area is 0.0071 mm<sup>2</sup>.

*Index Terms*—Fractional de-emphasis (DE), CMOS, latch, active inductor (AI), bandwidth (BW) extension, data-dependent jitter (DDJ), current reuse, hybrid delay line, current-mode logic (CML), unit interval (UI), flip-flop (FF), pulse-width-modulated (PWM), current-mode transmitter, figure-of-merit (FOM).

Manuscript received December 26, 2018; revised April 27, 2019; accepted May 19, 2019. Date of publication June 12, 2019; date of current version September 27, 2019. This work was supported in part by the University of Macau under Grant MYRG2017-00167-AMSV and in part by Macau Science and Technology Development Fund (FDCT)—SKL Fund. This paper was recommended by Associate Editor M. Onabajo. (*Corresponding author: Yong Chen.*)

Y. Chen and Z. Yang are with the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China (e-mail: ychen@um.edu.mo).

P.-I. Mak is with the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China, and also with the Department of ECE, Faculty of Science and Technology, University of Macau, Macau 999078, China (e-mail: pimak@um.edu.mo).

C. C. Boon is with the Department of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.

R. P. Martins is with the State Key Laboratory of Analog and Mixed-Signal VLSI, University of Macau, Macau 999078, China, and also with the Department of ECE, Faculty of Science and Technology, University of Macau, Macau 999078, China, on leave from the Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal (e-mail: rmartins@um.edu.mo).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2019.2919623

#### I. INTRODUCTION

DE-EMPHASIS (DE) techniques have been widely explored for high-speed data transmission [1]–[3] to compensate the frequency-dependent channel loss. While the fully-clocked DE [4]–[11] can equalize the data eye by improving the vertical (VO) and horizontal (HO) openings, its improvement of data-dependent jitter (DDJ) is limited by the fixed delay $(t_d)$ of 1 unit interval (UI), for which the DDJ modeled with a 2-impulse-based function must exist. In particular, the clocked 1-UI DE [Fig. 1(a)] employs a flip-flop (FF) line to propagate the integer-delayed signals, which are then multiplied by adjustable coefficients and finally combined as the amplitude DE. Yet, the offered DDJ equalization effect is marginal. It is possible to add a phase DE [5] for DDJ suppression by varying the data transition time, but this demands extra circuitry (e.g., duty-cycle control (DCC) and delay generation) for DDJ compensation. The analog fractional DE [12], [13] uses asynchronous active delay elements [Fig. 1(b)] and enables tuning around the integer t<sub>d</sub>; however, each delay is tuned by a voltage bias without clocked control and does not scale with the data rate, so the DDJ compensation becomes susceptible to process variation. The time-based pulse-width-modulated (PWM) DE [14]–[19] is another option [Fig. 1(c)], realized by XOR-ing the PWM data with a variable clock generated by the DCC [14], [15] or the tunable delay [16]–[19]. Yet, penalized by the duty-cycle distortion of the clock phases, the even and odd data eyes are asymmetric. This issue remains unsolved in the DE using the integrated pulse width modulation (iPWM) [20].

In view of the above, we propose a sub-1-UI fractional DE technique using a hybrid delay line, applied to realize a 10-Gb/s 5-tap current-mode (CM) transmitter (TX) that can flexibly adjust the equalization effects in both amplitude and timing. Prototyped in 65-nm CMOS, the TX aided by a number of bandwidth (BW)-enhancement techniques achieves a better quality of the data eyes for both near- and far-end testing, while occupying a compact die area of 0.0071 mm<sup>2</sup>.

Section II presents the proposed sub-1-UI DE and its time- and frequency-domain characteristics. Section III discusses the DDJ analysis of the proposed DE. Section IV focuses on the complete sub-1-UI fractional DE TX and its verification. Section V summarizes the experimental results, and finally Section VI draws the conclusions.

1549-8328 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.



Fig. 1. Traditional and proposed DE techniques: (a) 1-UI clocked DE, (b) analog fractional DE, (c) time-based PWM DE, and (d) proposed sub-1-UI fractional DE. The latter aims to offer wide tunability of amplitude and delay to improve both DE and DDJ.

#### II. PROPOSED SUB-1-UI FRACTIONAL 5-TAP DE

In [21], the 0.5-UI DE is configured by quadrature half-rate clocking, whereas the 1-UI DE is configured by quadrature quarter-rate clocking plus differential half-rate clocking. Both involve complicated timing calibration to reduce the DDJ. The core of the proposed DE technique is a hybrid delay line [Fig. 1(d)], which alternates two tunable fractional-delay cells with two clocked fixed-delay cells to support the tunability of both amplitude and delay. A single FF implemented by two latches, L1 and L2, generates two clocked fixed delays of $t_{d2} = 0.5$ UI and $t_{d4} = 1$ UI, respectively. Interleaved with them, the two fractional delays are the variable $t_{d1} = t_d$ UI and $t_{d3} = 0.5 + t_d$ UI. Unless otherwise mentioned, we use $t_{d1} = 0.25$ UI and $t_{d3} = 0.75$ UI as the example of $t_d$ in Sections II and III. Unlike the analog fractional DE [12], [13] shown in Fig. 1(b), which is more sensitive to the data rate, we insert the fractional delay t<sub>d</sub> into different clocked latches [Fig. 1(d)], so that our solution upholds the scalability of the clocked DE in data-rate scaling. Thus, the clocked 0.5-UI and 1-UI DEs are insensitive to the process variation. All delayed signals are weighted $(\alpha_{1-4})$ as the post-cursor taps and summed with the main tap [x weighted by $\alpha_0$] to generate the desired time-domain behavior of the 5-tap DE at the output y [Fig. 1(d)],

$$y(t) = \alpha_0 x(t) - \sum_{i=1}^{4} \alpha_i x(t-t_{di}) \tag{1}$$

The corresponding frequency-domain behavior of the proposed sub-1-UI fractional 5-tap DE is written as,

$$Y(j\omega) = \left(\alpha_0 - \sum_{i=1}^4 \alpha_i e^{-j\omega \cdot t_{di}}\right) X(j\omega) \tag{2}$$

where $X(j\omega)$ and $Y(j\omega)$ denote the Fourier transforms of x(t) and y(t), respectively. We then obtain the transfer behavior of the proposed sub-1-UI fractional 5-tap DE, namely the transfer function in the frequency domain

$$H(j\omega) = \alpha_0 - \sum_{i=1}^4 \alpha_i e^{-j\omega \cdot t_{di}} \tag{3}$$

and the corresponding unit impulse response in the time domain

$$h(t) = \alpha_0 \sigma(t) - \sum_{i=1}^4 \alpha_i \sigma(t - t_{di}) \tag{4}$$

Because independently adjusting $\alpha_i$ and $t_{di}$ of each post-cursor tap makes the time- and frequency-domain behaviors of (1) complicated, our analysis is restricted to the main tap plus one post-cursor tap, with the coefficients of the other post-cursor taps set to zero. The analysis mainly aims to reveal the features of each post-cursor tap, e.g., DE effect and DDJ, in both the time and frequency domains. Therefore, the impulse response [Fig. 2(b)] is given by,

$$h_i(t) = \alpha_0 \sigma(t) - \alpha_i \sigma(t - t_{di}) \tag{5}$$

The integral of $h_i(t)$ is the step response $g_i(t)$ used for the DDJ calculation (Section III). When $\alpha_i$ is preset, decreasing $t_{di}$ of each post-cursor tap moves both the second term in (5) and the $\alpha_0$-to-$(\alpha_0-\alpha_1)$ transition at $t_{di}$ in the step response $g_i(t)$ [Fig. 2(d)] to the left. The pulse response $y_i(t)$ [Fig. 2(c)] can be derived as,

$$y_i(t) = x(t) * h_i(t) \tag{6}$$

Compared with the conventional time-based PWM pulse with only two discrete voltage levels  $\pm 1$  (i.e., rail-to-rail differential voltages),  $y_i(t)$  can be expressed as the partly amplitude-limited PWM pulse, in which the time-domain tunability occurs within two discrete voltage levels ( $\alpha_0 + \alpha_1$ ,  $\alpha_0 - \alpha_1$ ). The impact of these DEs on the amplitude is  $20\log[(\alpha_0 + \alpha_1)/(\alpha_0 - \alpha_1)]$ . Theoretically, it exhibits the same DDJ for each post-cursor tap, i.e. both fractional (0.25/0.5/0.75-UI) DE and 1-UI DE have the same rising and falling transitions.
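As a minimal time-domain sketch of the two-tap behavior described above, the snippet below applies $y(t) = \alpha_0 x(t) - \alpha_i x(t - t_{di})$ to an ideal NRZ waveform with levels $\pm 1$ and reports the resulting voltage levels and the DE in dB. The coefficients ($\alpha_0 = 1$, $\alpha_i = 0.3$), the bit pattern, the oversampling factor and the function names are illustrative assumptions, not values or code from this work.

```python
import numpy as np

def two_tap_de(bits, a0, ai, td_ui, n=100):
    """Apply y(t) = a0*x(t) - ai*x(t - td) to an ideal NRZ waveform.

    bits  : sequence of 0/1 values, mapped to -1/+1
    td_ui : post-cursor delay in UI (rounded to the sampling grid)
    n     : samples per UI (illustrative choice)
    """
    x = np.repeat(2.0 * np.asarray(bits, dtype=float) - 1.0, n)
    nd = int(round(td_ui * n))
    # Delayed copy; the waveform is assumed to hold its first level before t = 0.
    x_del = np.concatenate((np.full(nd, x[0]), x[:len(x) - nd]))
    return a0 * x - ai * x_del

def de_db(a0, ai):
    """DE amount 20*log10[(a0 + ai)/(a0 - ai)] in dB."""
    return 20.0 * np.log10((a0 + ai) / (a0 - ai))

# Hypothetical example: 0.25-UI post-cursor tap on the pattern 0011100.
y = two_tap_de([0, 0, 1, 1, 1, 0, 0], a0=1.0, ai=0.3, td_ui=0.25)
print("distinct output levels:", np.unique(np.round(y, 6)))
print(f"DE = {de_db(1.0, 0.3):.2f} dB")
```

After each transition the output sits at $\pm(\alpha_0 + \alpha_i)$ for $t_{di}$ and then relaxes to $\pm(\alpha_0 - \alpha_i)$, which is the partly amplitude-limited PWM shape discussed in the text.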

The aforesaid time-domain compression behavior [Fig. 2(b) and (d)] matches the frequency-domain spreading behavior of the transfer function [Fig. 2(e)], with its magnitude given by,

$$|H_i(jf)| = \sqrt{\alpha_0^2 + \alpha_i^2 - 2\alpha_0 \alpha_i \cos(2\pi f t_{di})} \tag{7}$$

When $t_{di}$ decreases, the first peak located at $f_{pi} = 1/(2t_{di})$ shifts to the right and amplifies the higher-frequency content of x(t). Fig. 3 depicts a theoretical illustration of (7). Each post-cursor tap $(\alpha_{1-4})$ can be designed to contribute to the low- and high-frequency gain with respect to the normalized frequency $f_n$. Moreover, a clocked 1-UI DE [Fig. 3(a)] can only peak at $f_n = 0.5$ (Nyquist), while a clocked 0.5-UI DE [Fig. 3(b)] can shift the peaking to $f_n = 1$ ($2^{nd}$ Nyquist). Besides, our sub-1-UI fractional DE [Fig. 3(c) and (d)] not only provides fine gain compensation between $f_n = 0.5$ and 1, but also extends the gain compensation beyond $f_n = 1$ ($3^{rd}$ and $4^{th}$ Nyquist zones). This broadband multi-tap scheme shortens the data transition time, thereby suppressing the DDJ effectively before and after passing through the channel, as evidenced by the VO and HO results to be presented in Section V.
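The peak location of (7) can be checked numerically. The sketch below evaluates $|H_i|$ over normalized frequency for the four tap delays and locates the first maximum, which should fall at $f_n = 1/(2t_{di})$ with $t_{di}$ in UI. The coefficient values and function names are illustrative assumptions.

```python
import numpy as np

def h_mag(fn, a0, ai, td_ui):
    """|H_i| of (7) versus normalized frequency fn = f*T_B, with td in UI."""
    return np.sqrt(a0**2 + ai**2
                   - 2.0 * a0 * ai * np.cos(2.0 * np.pi * fn * td_ui))

def first_peak(a0, ai, td_ui, npts=2001):
    """Locate the first maximum of (7) on 0 <= fn <= 1/td (one full period)."""
    fn = np.linspace(0.0, 1.0 / td_ui, npts)
    mag = h_mag(fn, a0, ai, td_ui)
    k = int(np.argmax(mag))
    return fn[k], mag[k]

# Hypothetical coefficients: alpha_0 = 1, alpha_i = 0.3.
for td_ui in (1.0, 0.75, 0.5, 0.25):
    fp, peak = first_peak(1.0, 0.3, td_ui)
    print(f"t_di = {td_ui:4.2f} UI: first peak at f_n = {fp:.3f}, |H| = {peak:.3f}")
```

The peak value equals $\alpha_0 + \alpha_i$ for every delay; only its position moves, which is the frequency-domain spreading that accompanies the time-domain compression.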



Fig. 2. Steps of fractional DE generation: the convolution of (a) the input pulse x(t) and (b) the impulse response $h_i(t)$ equals (c) the output $y_i(t)$. (d)–(e) Detailed time- and frequency-domain features. The normalized frequency $f_n = f \times T_B$ and $t_{d1}$ = 0.25 UI are preset. [T<sub>B</sub> denotes the data period.]



Fig. 3. (a) Fixed 1-UI DE and (b) 0.5-UI DE. Sub-1-UI fractional DEs (c) and (d) offer wide tunability and extend the gain peaking to higher frequencies beyond Nyquist. $\alpha_0 = 1$ is preset.

#### III. PROPOSED DDJ ANALYSIS

Before delving into the theoretical analysis in this Section, we remark that the data timing jitter (DTJ) [22] is separated into DDJ and random jitter (RJ) for simplicity. Since the magnitude of DTJ varies significantly (e.g., from $10^{-4}$ to 10 ps), a separating *Line J* is introduced to distinguish DDJ (above *Line J*) from RJ (below *Line J*). Observing the behavior of the DTJ reduction, a large DTJ (obvious 2-impulse-based DDJ) is gradually reduced to a small DTJ (i.e., the obvious 2-impulse-based DDJ disappears), and the small DTJ then converges to zero (i.e., RJ dominates). This *Line J* is therefore a conceptual reference rather than an actual absolute value, meaning that it can float up and down.

Fig. 4 shows the implementation of the sub-1-UI fractional 5-tap DE [Fig. 1(d)]. Its core is a hybrid delay line in the data path, driven alternately by the delay control signal (VDCTRL) and the clock signal (CK), through which the present signal x travels. The delayed signals become the four post-cursor voltage signals. They are amplified by 5 common-source amplifiers serving 1 main tap ($G_{m1}$) and 4 post-cursor taps ($G_{m2}$). Each post-cursor tap is switched on/off by a control signal, with the 1<sup>st</sup> post-cursor tap controlled by T<sub>C</sub>. Simultaneously, all the taps are summed in the CM domain. Thereafter, the 5-tap DE signal is combined at the output node y, which is BW-limited due to the one-pole response formed by the resistive load (50 $\Omega$) and parasitic capacitance (C<sub>po</sub>). To arrive at the DDJ at y, we mainly consider two nonideal factors: 1) the timing delay in the hybrid delay line, which will be discussed in Section IV-C; and 2) a 1<sup>st</sup>-order RC frequency response [H<sub>RC</sub>(j$\omega$)] with a 3-dB BW of $\omega_0$ and an associated time constant of $\tau_{RC} = 1/\omega_0$ for each tap in the data path.

Fig. 4. Implementation of the proposed fractional DE TX consisting of the hybrid delay line and output driver with 5 taps.

Considering our implementation (Fig. 4), we developed an analytical model [Fig. 5(a)]. Instead of the single pulse [Fig. 2(a)], the pseudo-random binary sequence (PRBS) input [Fig. 5(b)] is used in the following analysis. When $t_{d2} = 0.5$ UI is preset, the impulse response $h_2(t)$ is plotted in Fig. 5(c), and Fig. 5(d) shows the ideal pulse response $y_2(t)$ according to (6). Further, convolving $y_2(t)$ with the impulse response $h_{RC}(t)$ [Fig. 5(e)] results in the real output $y_{RC}(t)$ [Fig. 5(f)]. The bandlimiting effect of the 1<sup>st</sup>-order RC response shapes the progressive rise and fall of the rising and falling transitions, respectively, in the partly amplitude-limited PWM signal within 1 UI. Additionally, it determines the maximum amplitude ($\alpha_0 + \alpha_1$) reached in the time domain.

To reveal the one-pole impact on the DE equalization [Fig. 6(a)], we first convolve  $h_i(t)$  with  $h_{RC}(t)$ , and calculate its Laplace transform, and then obtain the overall transfer function  $H_{all}(j\omega)$  from x to  $y_{RC}$  in Fig. 5(a),

$$H_{all}(j\omega) = H_i(j\omega) \cdot H_{RC}(j\omega) \tag{8}$$

Its magnitude response entails the DE reduction and the movement of the peak location $f_{peak}$. $\beta = e^{-T_B/\tau_{RC}}$ defines the relationship between the time constant of the 1<sup>st</sup>-order RC response and the data rate. For an unlimited BW, $\beta$ approaches zero in Fig. 6(b) and (c), and the lossless DE peaks at $f_n$ = 2, 1, 0.67 and 0.5 for $t_{di}$ = 0.25, 0.5, 0.75 and 1 UI, respectively. For 0 < $\beta$ < 0.2, both DE and $f_{peak}$ drop faster when $t_{di}$ goes up. Yet, a 0.25-UI DE still provides the high-frequency amplification to improve the rising or falling edge. If $\beta$ increases further, the DE and $f_{peak}$ of the narrower fractional DE vanish first, owing to the wider BW it requires.
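To make the role of $\beta$ concrete, the following sketch evaluates the magnitude of (8) for one post-cursor tap, using $\omega_0 T_B = -\ln\beta$ from the definition of $\beta$, and reports the peak location and the peak-to-DC gain. It is a numerical illustration under assumed coefficients ($\alpha_0 = 1$, $\alpha_i = 0.3$), not a reproduction of Fig. 6; the function names are hypothetical.

```python
import numpy as np

def h_all_mag(fn, a0, ai, td_ui, beta):
    """|H_all| = |H_i| * |H_RC| versus normalized frequency fn = f*T_B."""
    w0_tb = -np.log(beta)            # omega_0*T_B, from beta = exp(-T_B/tau_RC)
    h_de = np.sqrt(a0**2 + ai**2
                   - 2.0 * a0 * ai * np.cos(2.0 * np.pi * fn * td_ui))
    h_rc = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * fn / w0_tb) ** 2)
    return h_de * h_rc

def peaking(a0, ai, td_ui, beta, npts=4001):
    """Return (f_peak, peak-to-DC gain in dB) searched on 0 <= fn <= 1/td.

    A result of (0.0, 0.0) means the response decreases monotonically,
    i.e., the DE peaking has vanished for this beta.
    """
    fn = np.linspace(0.0, 1.0 / td_ui, npts)
    mag = h_all_mag(fn, a0, ai, td_ui, beta)
    k = int(np.argmax(mag))
    return fn[k], 20.0 * np.log10(mag[k] / mag[0])

# Hypothetical sweep over beta for the four tap delays.
for beta in (1e-6, 0.05, 0.2, 0.5):
    row = ", ".join(f"{td:.2f} UI: f_peak={peaking(1.0, 0.3, td, beta)[0]:.2f}"
                    for td in (0.25, 0.5, 0.75, 1.0))
    print(f"beta = {beta:g} -> {row}")
```

In this model the shortest delay needs the widest bandwidth to keep its peak, so its peaking is the first to be lost as $\beta$ grows.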

Unlike the DDJ of signals (e.g., duobinary [23] and non-return-to-zero [22], [24], [25]) induced by unpredicted and complicated ISI effects, the DDJ of DE is mainly caused by a predictable ISI effect. Thus, we aim to derive a *closed-form* DDJ expression of DE based on the one-pole step response $[g_{RC}(t)]$. The overall step response of DE with a 1<sup>st</sup>-order RC response can be written as the integral of the inverse Laplace transform of (8),

$$g_{all}(t) = \alpha_0 g_{RC}(t) - \alpha_1 g_{RC}(t - t_{di}) \tag{9}$$

From another perspective, we can obtain $g_{all}(t)$ by deriving the step responses of different taps in the data path and combining them. The Appendix details this method.

As depicted in Fig. 7(a), the DDJ, i.e., $\Delta t = (t_2 - t_1)T_B$, is generated by the deviation between the zero-crossings of two step responses [e.g., $g_1(t)$ and $g_2(t)$] with different initial values (e.g., $-A_0$ and $-A_1$) at t = 0. As the observation time approaches t<sub>di</sub>, both step responses rise and converge to the maximum value of A<sub>2</sub> at t<sub>di</sub>, namely, the initial deviation $(A_1 - A_0)$ on the waveform reduces to zero at $t_{di}$. Between $t_{di}$ and 1 UI, $g_1(t)$ and $g_2(t)$ slowly fall together with the same one-pole step response. With the fixed 1-UI DE, the DDJ reaches its maximum value under $t_{d4} = 1$ UI, at which moment $-A_1$ equals the minimum value $(-A_2)$ of the eye diagram [Fig. 7(a)]. As t<sub>di</sub> decreases, $-A_2$ at t<sub>di</sub> shifts to the upper left and $-A_1$ moves upward at t = 1 UI, reducing the initial deviation. This means $g_2(t)$ stays close to $g_1(t)$ within (0, $t_{di}$). When $t_{di}$ approaches $t_4$, the large DTJ decreases gradually (namely, the obvious 2-impulse-based DDJ vanishes gradually), implying that RJ will dominate the performance after *Line J* (see Fig. 8 later). If $t_{di}$ is reduced further, RJ will drop. When $t_{di} = t_3$, RJ ideally disappears, indicating that there is no DE. With t<sub>di</sub> fixed, DDJ increases with the increment of DE. The eye diagram in Fig. 7(a) can be unfolded to the time-domain waveform in Fig. 7(b). Herein, $g_{all1}(t)$ and $g_{all2}(t)$ will be calculated in the Appendix. The two figures have the same y-axis scale, so the boundary mechanism of DDJ can be illustrated visually. Preceding consecutive ZEROs, generally more than two, leave enough time before t = 0 to stabilize $-A_0 = -(\alpha_0 - \alpha_1)$. In Case I, the 0-to-1 transition appears at t = 0 and, correspondingly, $g_1(t)$ in the second term of (A3) in the Appendix begins to increase gradually. A similar behavior repeats for a sequence of "...110...". Yet, the single ZERO pulse (i.e., a bit series of "...101...") as Case II results in $g_2(t)$ in the fourth term of (A5), whose ascent is determined by the initial value $-A_1$ at n UI and the BW of the 1<sup>st</sup>-order RC response. By solving $g_1(t) = 0$ and $g_2(t) = 0$, the two time instants $t_1$ and $t_2$ are obtained and $\Delta t$ is written as,

$$\Delta t = (t_2 - t_1) T_B = \frac{1}{\omega_0} \ln \left[ 1 + \left( \frac{\alpha_1}{\alpha_0} e^{\omega_0 t_{di} T_B} - 1 \right) \cdot e^{-\omega_0 T_B} \right] \tag{10}$$

Interestingly, we can directly extract two general step responses, namely,  $g_1(t)$  with  $A_{ZS} = A_m$  and  $A_{ZI} = -A_0$ , and  $g_2(t)$  with  $A_{ZS} = A_m$  and  $A_{ZI} = -A_1$ , as shown in Fig. 7(a). Thus, we can rewrite  $\Delta t$  as,

$$\Delta t = \frac{1}{\omega_0} ln \left[ \frac{A_m + A_1}{A_m + A_0} \right] \tag{11}$$



Fig. 5. (a) Proposed analytical model of DDJ. (b) PRBS input x(t). (c) Ideal impulse response  $h_2(t)$ . (d) Ideal output  $y_2(t)$ . (e) 1<sup>st</sup>-order RC impulse response  $h_{RC}(t)$ . (f) Real output  $y_{RC}(t)$ .  $t_{d2} = 0.5$  UI is preset.



Fig. 6. (a) Effect of  $1^{st}$ -order RC response. (b) DE and (c) peaking location  $f_{peak}$  versus  $\beta$ .



Fig. 7. Conceptual (a) eye diagram and (b) time-domain waveform used for the DDJ analysis.  $A_m = \alpha_0 + \alpha_1$  and  $A_0 = \alpha_0 - \alpha_1$  correspond to DE's effect in Fig. 2.

where  $A_1$  is the only variable to be solved. Furthermore,  $\Delta t$  is related to the initial value  $A_1$  of  $g_2(t)$ , determined by  $t_{di}$ . Based on the third term of (A3),  $A_1$  will become,

$$A_{1} = (\alpha_{0} - \alpha_{1}) - 2e^{-\omega_{0}T_{B}} \left(\alpha_{0} - \alpha_{1}e^{\omega_{0}t_{di}T_{B}}\right) \tag{12}$$
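The closed-form results (10)–(12) can be cross-checked with a small behavioral simulation. The sketch below passes two NRZ patterns (Case I, a long run of ZEROs followed by a 0-to-1 transition; Case II, a single ZERO between ONEs) through a two-tap DE and a one-pole low-pass filter, measures the level at the edge and the zero-crossing time, and compares them with (12) and (10). All numerical values ($\alpha_0 = 1$, $\alpha_1 = 0.3$, 10 Gb/s, $\omega_0 = 2\pi \times 12$ GHz) and all function names are illustrative assumptions rather than design values from this work.

```python
import numpy as np

def ddj_eq10(a0, a1, td_ui, w0, tb):
    """Closed-form DDJ of (10); td_ui is the tap delay normalized to T_B."""
    x = (a1 / a0 * np.exp(w0 * td_ui * tb) - 1.0) * np.exp(-w0 * tb)
    return np.log(1.0 + x) / w0

def a1_eq12(a0, a1, td_ui, w0, tb):
    """Initial value A_1 of (12)."""
    return (a0 - a1) - 2.0 * np.exp(-w0 * tb) * (a0 - a1 * np.exp(w0 * td_ui * tb))

def simulate_edge(bits, a0, a1, td_ui, w0, tb, n=2000):
    """Return (level at the last 0-to-1 edge, zero-crossing time after it)."""
    dt = tb / n
    x = np.repeat(2.0 * np.asarray(bits, dtype=float) - 1.0, n)   # NRZ, +/-1
    nd = int(round(td_ui * n))
    x_del = np.concatenate((np.full(nd, x[0]), x[:len(x) - nd]))
    u = a0 * x - a1 * x_del                                       # two-tap DE
    k = 1.0 - np.exp(-w0 * dt)     # exact update for a piecewise-constant input
    y = np.empty_like(u)
    state = u[0]                   # output assumed settled before the pattern
    for i, ui in enumerate(u):
        y[i] = state
        state += k * (ui - state)
    edge = (len(bits) - 2) * n     # both patterns below end with "...011"
    i = edge + int(np.argmax(y[edge:] >= 0.0))
    frac = -y[i - 1] / (y[i] - y[i - 1])                          # linear interpolation
    return y[edge], (i - 1 + frac - edge) * dt

A0_, A1_, TB, W0 = 1.0, 0.3, 100e-12, 2.0 * np.pi * 12e9          # assumed values
CASE_I = [0] * 8 + [1, 1]          # consecutive ZEROs, then 0-to-1
CASE_II = [1] * 8 + [0] + [1, 1]   # single ZERO pulse "...101..."

for td_ui in (0.25, 0.5, 0.75, 1.0):
    lvl1, t1 = simulate_edge(CASE_I, A0_, A1_, td_ui, W0, TB)
    lvl2, t2 = simulate_edge(CASE_II, A0_, A1_, td_ui, W0, TB)
    print(f"t_di = {td_ui:.2f} UI: simulated {1e12 * (t2 - t1):.4f} ps, "
          f"(10) gives {1e12 * ddj_eq10(A0_, A1_, td_ui, W0, TB):.4f} ps, "
          f"-level at edge {-lvl2:.5f} vs (12) {a1_eq12(A0_, A1_, td_ui, W0, TB):.5f}")
```

Substituting (12) into (11) with $A_m = \alpha_0 + \alpha_1$ and $A_0 = \alpha_0 - \alpha_1$ gives $(A_m + A_1)/(A_m + A_0) = 1 + (\frac{\alpha_1}{\alpha_0} e^{\omega_0 t_{di} T_B} - 1)e^{-\omega_0 T_B}$, which is the argument of the logarithm in (10), so the two expressions agree exactly for the one-pole model.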

Fig. 8. (a) Calculated DDJs on a linear axis and (b) their close-up view on a log axis across data rate (DR). $t_{d1} = 0.25$ UI is preset. (c) Calculated DDJs on a linear axis and (d) their close-up view on a log axis across one post-cursor tap time delay. Note that *Line J* is a relative separating line.

Plotted in Fig. 8, (11) is an accurate simplification of (10) and provides purely theoretical intuition, but it is only suitable for the 1<sup>st</sup>-order RC response. Fig. 8(a) and (b) show the calculated DDJs and their close-up view, respectively, under four time delays covering the 6-to-14-Gb/s data rate. We can exactly predict the constant obvious 2-impulse-based DDJ of the integer DE regardless of the data rate. Upon traveling through the fractional DEs (e.g., $t_{d3} \rightarrow t_{d2} \rightarrow t_{d1}$), the obvious 2-impulse-based DDJ decreases by degrees, and RJ decreases and dominates [Fig. 8(b)]; finally, there is no jitter in theory. Regrettably, we cannot fix the borderline (*Line J*) between DDJ and RJ, and both purely calculated jitters are separated at $t_{d1} = 0.25$ UI and $t_{d2} = 0.5$ UI, with the former much smaller than the latter. For different data rates, the calculated DDJ goes up almost exponentially with the post-cursor tap time delay and converges to the point "A" at the 1-UI time delay, as highlighted in Fig. 8(c) and (d). Also, the DDJ for the fractional DE increases along with the data rate. If the BW is sufficiently large and fixed, $A_1$ is infinitely close to $A_m$ at $t_{di} = 1$ UI. Thus, equation (10) can be simplified as,

$$\Delta t = \frac{1}{\omega_0} \ln\left(1 + \frac{\alpha_1}{\alpha_0} - e^{-\omega_0 T_B}\right) \approx \frac{1}{\omega_0} \ln\left(1 + \frac{\alpha_1}{\alpha_0}\right) \tag{13}$$

Since $e^{-\omega_0 T_B}$ is always less than 0.0046 across the 6-to-14-Gb/s data rate, the resulting approximation error is extremely small; therefore, equation (13) is independent of the data rate.
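A short numerical check of this approximation is sketched below. Since $\omega_0$ is not stated at this point, the sketch infers the smallest $\omega_0$ that satisfies the quoted bound at the highest data rate (shortest $T_B$), and then compares the exact and approximate forms of (13). The ratio $\alpha_1/\alpha_0 = 0.3$ is an illustrative assumption, and the inferred bandwidth is only what the bound implies, not a reported design value.

```python
import math

RATES = [6e9, 8e9, 10e9, 12e9, 14e9]    # data rates in b/s
BOUND = 0.0046                          # quoted bound on exp(-w0*T_B)

# Smallest w0 (rad/s) such that exp(-w0*T_B) <= BOUND at the highest rate.
w0_min = -math.log(BOUND) * max(RATES)

def ddj_1ui_exact(ratio, w0, tb):
    """Left-hand form of (13) with ratio = alpha_1/alpha_0."""
    return math.log(1.0 + ratio - math.exp(-w0 * tb)) / w0

def ddj_1ui_approx(ratio, w0):
    """Right-hand (data-rate-independent) form of (13)."""
    return math.log(1.0 + ratio) / w0

ratio = 0.3                             # hypothetical alpha_1/alpha_0
print(f"implied bandwidth >= {w0_min / (2.0 * math.pi) / 1e9:.2f} GHz")
rel_err = []
for dr in RATES:
    exact = ddj_1ui_exact(ratio, w0_min, 1.0 / dr)
    approx = ddj_1ui_approx(ratio, w0_min)
    rel_err.append(abs(approx - exact) / approx)
    print(f"{dr / 1e9:4.0f} Gb/s: exact {exact * 1e12:.3f} ps, "
          f"approx {approx * 1e12:.3f} ps")
```

Under these assumptions the two forms differ by roughly a percent at the highest rate and by far less at lower rates, consistent with treating the 1-UI DDJ as constant over the data rate.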

#### IV. TX IMPLEMENTATION

#### A. TX Architecture

To improve the internal BW in the data path, the summing of all taps (Fig. 4) is realized in the CM domain in two steps (Fig. 9), which we call the "two-step DE" technique. The first step sums the 4 post-cursor taps at a low-impedance node provided by a current-reuse tunable active inductor (AI) [26]–[28], which also enhances the signal swing before delivering it to the output driver. The 4-bit post-cursor tap control signals, $T_F$, $T_E$, $T_D$, and $T_C$, define the 16 operation modes, involving TX without DE (S0) and with DE (S1-S15), as detailed in Table I. The second step sums I<sub>main</sub> (main tap) and the tunable I<sub>tap</sub> (sum of the 4 post-cursor taps), generating D<sub>out</sub> at the 50-$\Omega$ load. As such, the speed limit of the power-hungry output driver (i.e., G<sub>m3</sub> and G<sub>m4</sub>) can be partially shifted to the low-power pre-amplifiers (i.e., $G_{m2} = G_{m4}/4$), enhancing the internal BW, where we tune $G_{m4}$ to target $G_{m4}$ = G<sub>m3</sub>/2. The AIs in both the main-tap and post-cursor-tap paths further boost the BW. By separately optimizing them and adjusting $V_{L,SET}$, we can balance the transmission delay between the main-tap and post-cursor-tap paths. The clock buffer is embedded with a grounded AI [23], [26], [28] for gain peaking (a simulated 3.1-dB gain at 10 GHz in the employed 65-nm CMOS process). FF<sub>1</sub> synchronizes the input $D_{in}(t)$ while generating the main-tap signal x(t). Its function is similar to the synchronization at the output of the preceding multiplexer. Here, the overall hybrid delay line can also be viewed as an FF<sub>2</sub> driven by a clock buffer, into which we insert two tunable delay cells controlled by VDCTRL, which is generated by an off-chip bias.
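The 16 operation modes follow directly from reading the mode index as a 4-bit binary word with $T_F$ as the most significant bit and $T_C$ as the least significant bit, as in Table I. The sketch below decodes a mode into its tap controls and lists the nominal delays of the active taps. The pairing of each control with a delay ($T_C$: 0.25 UI, $T_D$: 0.5 UI, $T_E$: 0.75 UI, $T_F$: 1 UI) is inferred from the text, the two fractional values are only the example setting used in the analysis (they depend on VDCTRL), and the function names are hypothetical.

```python
# Nominal tap delays in UI; the fractional entries are example values only.
NOMINAL_DELAY_UI = {"T_C": 0.25, "T_D": 0.5, "T_E": 0.75, "T_F": 1.0}

def tap_controls(mode):
    """Decode mode S<mode> (0..15) into the 4-bit tap controls, T_F as MSB."""
    if not 0 <= mode <= 15:
        raise ValueError("mode must be in 0..15")
    names = ("T_F", "T_E", "T_D", "T_C")
    return {name: (mode >> (3 - k)) & 1 for k, name in enumerate(names)}

def active_delays(mode):
    """Nominal delays (UI) of the post-cursor taps enabled in mode S<mode>."""
    ctrl = tap_controls(mode)
    return sorted(NOMINAL_DELAY_UI[name] for name, bit in ctrl.items() if bit)

for mode in range(16):
    print(f"S{mode:<2d} {tap_controls(mode)} active delays: {active_delays(mode)}")
```

Modes S1, S2, S4 and S8 each enable exactly one post-cursor tap, which is why they are the natural choices when a single tap is studied in isolation.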

#### B. Tunable Delay Cell

Fig. 9. Proposed complete sub-1-UI fractional DE TX. We use CM and AI techniques to extend the internal BW of the data and clock paths.

TABLE I

Details of the operation mode in the proposed sub-1-UI fractional DE TX (4-bit binary tap control).

| Tap Control | S0 | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 |
|----------------|----|----|----|----|----|----|----|----|----|----|-----|-----|-----|-----|-----|-----|
| T<sub>F</sub> | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| T<sub>E</sub> | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| T<sub>D</sub> | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 |
| T<sub>C</sub> | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 |

Fig. 10. (a) Schematic of the tunable delay stage (and latch with different $S_{p,n}$). (b) Simulated results versus VDCTRL.

The proposed tunable delay stage and the current-mode logic (CML) clocked latch share a similar topology to match their layout, but the two circuits differ in $S_{p,n}$ [Fig. 10(a)]. Boosting VDCTRL steers more $I_{DCn}$ from $I_{DCb}$, increasing the transconductance $(g_{m3})$ of $M_3$ and the time delay $t_d \approx 2 \ln 2 \cdot C_L \cdot [R_D \parallel (-1/g_{m3})]$. Here, $R_D$ and $C_L$ are the resistive and parasitic capacitive loads, respectively. Due to the channel-length modulation of the bias current, $I_{DCb}$ slightly rises as VDCTRL goes up. Each tunable delay block in the hybrid delay line (Fig. 9), cascading two such delay stages [Fig. 10(a)], achieves a sufficient tunable delay range (i.e., Delay<sub>max</sub>/Delay<sub>min</sub> $\approx$ 2.5x) with only a 14.4% total current variation, which keeps the amplitude stable in Fig. 10(b).

For comparison, directly changing the load resistance [12] incurs a 107.8% current variation for a tunable delay range of $\sim$3.5x.
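The dependence of the stage delay on $g_{m3}$ can be illustrated with a few lines of code. The sketch reads the bracketed term of the delay expression as the parallel combination of $R_D$ and the negative resistance $-1/g_{m3}$, i.e., $R_D/(1 - g_{m3}R_D)$, which is an interpretation on our part. The component values are placeholders chosen only to show the trend and are not taken from the design.

```python
import math

def stage_delay(c_load, r_d, g_m3):
    """t_d ~ 2*ln2*C_L*[R_D || (-1/g_m3)], assuming a parallel combination.

    R_D || (-1/g_m3) = R_D / (1 - g_m3*R_D); the expression is only
    meaningful while g_m3*R_D < 1 (beyond that the stage would latch).
    """
    loop = g_m3 * r_d
    if loop >= 1.0:
        raise ValueError("g_m3 * R_D must stay below 1 for this model")
    r_eff = r_d / (1.0 - loop)
    return 2.0 * math.log(2.0) * c_load * r_eff

# Placeholder values: C_L = 20 fF, R_D = 200 ohm, g_m3 swept from 0 to 2.5 mS.
C_L, R_D = 20e-15, 200.0
for g_m3 in (0.0, 0.5e-3, 1.0e-3, 1.5e-3, 2.0e-3, 2.5e-3):
    print(f"g_m3 = {g_m3 * 1e3:.1f} mS -> t_d = {stage_delay(C_L, R_D, g_m3) * 1e12:.2f} ps")
```

In this model the delay grows monotonically with $g_{m3}$ while the load resistor, and hence the output swing set by the tail current, is left unchanged, in line with the stable amplitude noted above.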

#### C. Timing Delay Consideration

Due to the ideal retiming of the negative edge-triggered FF<sub>1</sub>, as displayed in Fig. 11(a), $t_{d_p5} = T_B/2$ (between $x_D$ and x) and $t_{d_1} = T_B$ (between $x_F$ and x) are fixed by the positive level-sensitive latch $L_{21}$ and the negative level-sensitive latch $L_{22}$, respectively. $T_B$ denotes the data period. Both tunable delay cells provide $t_{d1}$, which equals $t_{d_p25}$ between $x_C$ and x, and $t_{d1}$ plus $T_B/2$ results in $t_{d_p75}$ between $x_E$ and x. The hybrid delay line based on the clocked latch + FF has to consider the clock-to-Q ($t_{cqx}$) delays in the data-clocked path [Fig. 11(a)]. Fig. 11(a) gives the timing parameters, where $t_{cq21}$, $t_{cq22}$ and $t_{cq1}$ denote the clock-to-Q delays of $L_{21}$, $L_{22}$ and FF<sub>1</sub>, respectively. Correspondingly, Fig. 11(b) shows the timing diagram of Fig. 11(a). FF<sub>1</sub> and FF<sub>2</sub> share the same CK. $t_{d_p25}$ is only determined by VDCTRL, independent of CK. However, the other relative timing delays (e.g., $t_{d_p5}$, $t_{d_p75}$ and $t_{d_p1}$) are related to CK in Fig. 9(b). In the equations in Fig. 11(b), we extract the relative timing delay errors $t_{ae1} = t_{cq21} - t_{cq1}$ and $t_{ae2} = t_{cq22} - t_{cq1}$, normalized to T<sub>B</sub> and plotted in Fig. 11(c) based on circuit simulation. Interestingly, the two curves ($t_{ae1}$ and $t_{ae2}$) cross zero on the vertical axis in the upward direction as the data rate goes up. This happens because the presence of the two VDCTRL-controlled delay blocks slightly changes the internal operating points in the hybrid delay line and thus incurs different clock-to-Q delays among FF<sub>1</sub>, $L_{21}$ and $L_{22}$. Also, the parasitics in the data path contribute to this phenomenon as the data rate increases.

Fig. 11. (a) Effect of timing parameters in the data-clocked path. (b) Detailed timing diagram and (c) simulated relative timing delay error versus data rate.

#### D. Simulation Results

To study the DDJ of the proposed fractional DE, a jitter-free PRBS of $2^7-1$ length as the input (D<sub>in</sub>) passes through the complete fractional DE TX (Fig. 9) controlled by the jitter-free CK, and D<sub>out</sub> then outputs the 10-Gb/s DE eyes [Fig. 12(a)] with only one post-cursor tap activated. The last 1-to-0 transition follows the consecutive bit string of "1". Its zero-crossing time t<sub>1</sub> determines the minimum boundary of DDJ. A bit string of "010", namely a single digital-one pulse adjacent to the last 1-to-0 transition, has a zero-crossing time t<sub>2</sub> that estimates the maximum boundary of DDJ. The DE under S1 mode exhibits a partially established response, because the insufficient BW in the data path results in DE reduction [Fig. 12(a)]. Conversely, it shows a smaller jitter (e.g., $\Delta t_{p25} = 0.25$ ps) due to a sufficient roll-off time as its zero-crossing time $t_2$ approaches $t_1$ [Fig. 12(b)]. For S2 mode, we obtain the expected DE of 5.48 dB without sacrificing the jitter performance (i.e., $\Delta t_{p5} = \Delta t_{p25}$). Both cases elicit RJ with a Gaussian distribution [Fig. 12(c)]. When entering S4 mode and then moving from S4 to S8 mode, the DE still remains, but t<sub>2</sub> increases because the 1-to-0 transition departs from that of the minimum boundary [Fig. 12(d)]. Fig. 12(e) shows the simulated jitter of $\Delta t_{p75} = 0.713$ ps (S4) and $\Delta t_1 = 3.55$ ps (S8). Therefore, the obvious 2-impulse-based DDJ [Fig. 12(f)] appears and rises significantly. Based on our simulation with ideal data and clock inputs, the main contributions to RJ can be divided into three parts. First, the noise of the sub-blocks in the data path is converted to RJ. The second is the output random jitter from the clock buffer in the clock path [23]. The last is the use of an AI in the data path, which induces group-delay distortion that further generates RJ [25]. Thus, as suggested by our simulations, a 0.5-ps jitter is set as the separating *Line J* between RJ and DDJ in Figs. 13 and 14, which is a relative value.
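Because the DDJ boundaries are set by specific bit patterns (a long run of identical bits versus an isolated single bit), it helps to see how often such patterns occur in a $2^7-1$ PRBS. The sketch below generates one period with a 7-bit linear-feedback shift register and counts the longest runs and the isolated pulses. The generator polynomial is not stated in the text; the commonly used $x^7 + x^6 + 1$ form is assumed here, and the function names are hypothetical.

```python
def prbs7(seed=0x7F, length=127):
    """One or more periods of a 2^7-1 PRBS (assumed polynomial x^7 + x^6 + 1)."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a nonzero 7-bit value")
    state, out = seed, []
    for _ in range(length):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of stages 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out

def longest_run(bits, value):
    """Longest cyclic run of `value` (assumes the period is not all `value`)."""
    best = cur = 0
    for b in bits + bits:                             # doubled to handle wrap-around
        cur = cur + 1 if b == value else 0
        best = max(best, cur)
    return best

def count_isolated(bits, value):
    """Number of single-bit pulses of `value` in one cyclic period."""
    n = len(bits)
    return sum(1 for i in range(n)
               if bits[i] == value
               and bits[i - 1] != value
               and bits[(i + 1) % n] != value)

seq = prbs7()
print("ones per period:", sum(seq))
print("longest run of 1s / 0s:", longest_run(seq, 1), "/", longest_run(seq, 0))
print("isolated '010' / '101' pulses:", count_isolated(seq, 1), "/", count_isolated(seq, 0))
```

Both the settled case (long runs) and the single-pulse case therefore appear within one period, so a $2^7-1$ pattern exercises the two zero-crossing boundaries t<sub>1</sub> and t<sub>2</sub>.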

Based on (10), we consider the timing error in Fig. 11 and replace $t_{d3}$ by $t_{d3} + t_{ae1}$ and $t_{d4}$ by $t_{d4} + t_{ae2}$. Fig. 13 illustrates the calculated and simulated DDJs under different DE modes covering the 6-to-14-Gb/s data rate. As plotted in Fig. 13, the calculated DDJ rises with the simulated DDJ, and their gap is mainly due to the RJ. Compared to the purely theoretical calculations based on (10) with the separated RJs under S1 and S2 modes in Fig. 8(b), those in Fig. 13(b) based on the complete schematic (Fig. 9) basically overlap due to the transistor-level nonideal factors (e.g., noise and phase distortion), but the latter is significantly greater than the former. Increasing the time delay at 10 Gb/s from 0.75 to 1 UI results in a larger 2-impulse-based DDJ, as shown in Fig. 12(e), which always holds over the data rate of 6-to-14 Gb/s (Fig. 13). For a 0.75-UI DE, if the BW is extended so that there is enough time for the roll-off process, the 2-impulse-based DDJ at the 10+ Gb/s data rate may decrease below 0.5 ps. If the BW becomes narrow, the 2-impulse-based DDJ at <10-Gb/s data rate goes up beyond 0.5 ps.

Fig. 12. (a) Simulated eye diagram, (b) zoomed-in eye diagram at the rising and falling edges, and (c) jitter histogram under $S_1$ and $S_2$ modes. (d) Simulated eye diagram, (e) zoomed-in eye diagram at the rising and falling edges, and (f) jitter histogram under $S_4$ and $S_8$ modes. The input pattern is a 10-Gb/s $2^7$-1 PRBS.

Fig. 13. (a) Calculated and simulated DDJs on a linear axis and (b) their close-up view on a log axis under different operation modes (S2, S4 and S8).

Fig. 14. (a) Simulated DDJs on a linear axis and (b) their close-up view on a log axis under different operation modes (S1, S2, S4 and S8) and process corners: Fast-Fast (FF), Slow-Slow (SS), Typical-Typical (TT), Fast-NMOS-Slow-PMOS (FS) and Slow-NMOS-Fast-PMOS (SF).



Fig. 15. Die photo and zoomed-in layout of the fabricated fractional DE TX in 65-nm CMOS.



Fig. 16. Measured channel responses of 3 PCB-based transmission lines with different length.



Fig. 17. Measured 4-Gb/s eyes at  $D_{out}$  verifying the sub-1-UI tunability.

We perform simulations under different data rates and process corners to verify the robustness of the complete fractional DE TX (Fig. 9). From Fig. 14, the DDJs of the 1-UI DE (S8 mode) are larger than those of the sub-1-UI DE at all data rates. The smaller RJs under S1 mode (6-to-14 Gb/s) and S4 mode (6-to-14 Gb/s) spread out, being sensitive to the process parameters. In contrast, the larger DDJs under S8 mode (6-to-14 Gb/s) cluster together and are insensitive to the process parameters. In S4 mode, the RJ is below 0.5 ps at 6 Gb/s, and the DDJ then appears gradually as the data rate increases from 8 to 14 Gb/s. The limited BW in different data



Fig. 18. Measured data eyes at 4 Gb/s under 16 modes (S0-S15) of the sub-1-UI fractional DE.



Fig. 19. Measured (a) rising/falling time and (b) jitter at  $D_{out}$  with and without sub-1-UI DE.

paths dominates the small increment of the DDJ (Fig. 14) at high data rates and different process corners.

#### V. EXPERIMENTAL RESULTS

The fractional DE TX prototyped in 65-nm CMOS uses dual supplies (1.2 and 1.5 V). The die size is just 0.0071 mm<sup>2</sup> (Fig. 15). At 10 Gb/s, 57% of the total power consumption (46 mW) is due to the CM circuitry (pre-amplifiers and output driver). The hybrid delay line and clock buffer consume another 34% and 10%, respectively. For the different fractional DE modes, the total power of the former increases from 15.54 mW (S0 mode) to 29.4 mW after turning on all taps (S15 mode).
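The figure-of-merit reported for this work follows directly from the numbers above. The Python sketch below reproduces that arithmetic and converts the quoted percentage shares into approximate per-block power; the function names are illustrative only, and the shares are the rounded values from the text (they add up to 101%).

```python
# Values quoted in the text for 10-Gb/s operation.
TOTAL_POWER_MW = 46.0
DATA_RATE_GBPS = 10.0

# Rounded percentage shares as quoted; rounding makes them sum to 101%.
SHARES_PERCENT = {
    "CM circuitry (pre-amplifiers and output driver)": 57,
    "hybrid delay line": 34,
    "clock buffer": 10,
}


def fom_mw_per_gbps(power_mw, rate_gbps):
    """Energy-efficiency FOM: total power divided by data rate."""
    return power_mw / rate_gbps


def block_power_mw(total_mw, shares_percent):
    """Approximate per-block power implied by the rounded shares."""
    return {name: total_mw * pct / 100.0 for name, pct in shares_percent.items()}


fom = fom_mw_per_gbps(TOTAL_POWER_MW, DATA_RATE_GBPS)
blocks = block_power_mw(TOTAL_POWER_MW, SHARES_PERCENT)

print(f"FOM = {fom:.2f} mW/Gb/s")
for name, power in blocks.items():
    print(f"{name}: about {power:.1f} mW")
```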

Fig. 16 presents the testing scheme, in which  $D_{out}$  of the DE TX is either observed directly as a near-end output without channel loss, or passed through PCB traces with different losses (e.g., CH1 and CH2 for 10-Gb/s testing, CH3 for 6.5-Gb/s testing) for far-end testing. Both test cases were performed under a  $2^{31}$ -1 PRBS pattern generated by a J-BERT 4903B.

| Parameters | This Work | This Work | [10] | [11] | [13] | [8] | [17] | [18] | [20] |
|---|---|---|---|---|---|---|---|---|---|
| CMOS Technology | 65nm | 65nm | 65nm | 40nm | 28nm | 90nm | 90nm | 90nm | 65nm |
| Equalization Techniques | Tunable Sub-1-UI Fractional 5-tap DE (CM) | Tunable Sub-1-UI Fractional 5-tap DE (CM) | 1-UI 3-tap DE (CM) | 0.5-UI+IIR 3-tap DE (CM) | Sub-1-UI Fractional 3-tap DE (VM) | Hybrid VM TX with CM DE | PWM (CM) | Time-Based PWM (VM) | iPWM |
| Channel Loss (dB) | 20 | 21.6 | 24 | 20 | 3 cm PCB trace | 4 | 33 | 28 | 19 |
| Data Rate (Gb/s) | 10 | 6.5 | 6.5 | 10 | 28 | 6 | 5 | 5 | 16 |
| PRBS Signal | 2<sup>31</sup>-1 | 2<sup>31</sup>-1 | 2<sup>31</sup>-1 | 2<sup>7</sup>-1 | 2<sup>7</sup>-1 | 2<sup>15</sup>-1 | 2<sup>7</sup>-1 | 2<sup>7</sup>-1 | 2<sup>7</sup>-1 |
| TX Pk-to-Pk Jitter (ps) w/o PE | 10.8 | 10.2 | 35.8 | N/A | 45 * | 60.7 | N/A | N/A | N/A |
| TX Pk-to-Pk Jitter (ps) w/ PE | 8.4 | 8.3 | N/A | N/A | 35.7 * | 51.7 | N/A | N/A | N/A |
| TX Vertical Opening (%) w/o PE | 75.4 | 81.5 | 48.5 | N/A | 20 ** | 42.3 | N/A | N/A | N/A |
| TX Horizontal Opening (UI) w/o PE | 0.89 | 0.94 | 0.7 | N/A | 0.63 * | 0.64 | N/A | N/A | N/A |
| TX Vertical Opening (%) w/ PE & after channel | 19.5 | 15.7 | N/A | 18.1 | 13 ** | 65.2 | 33 | 8.89 | 7 |
| TX Horizontal Opening (UI) w/ PE & after channel | 0.62 | 0.67 | N/A | 0.52 | 0.52 * | 0.69 | 0.75 | 0.3 | 0.256 |
| Even and Odd Eyes | Symmetry | Symmetry | Symmetry | Symmetry | Symmetry | Symmetry | Asymmetry | Asymmetry | Asymmetry |
| Supply Voltage (V) | 1.2/1.5 | 1.1/1.5 | N/A | 0.9/1.8 | N/A | 1.2 | 1.2 | 1.0/1.1/1.25 | 0.9/1.0/1.1 |
| Active Area (mm<sup>2</sup>) | 0.0071 | 0.0071 | 0.7 * | 0.22 | 0.048 | 0.035 | 0.0225 * | 0.13 | 0.056 |
| FOM (mW/Gb/s) | 4.6 | 6.62 | N/A | 12.5 | 1.59 | 1.26 | 22 | 3.1 | 2.82 |

TABLE II Chip summary and benchmark with the state-of-the-art DE TXs.

\* Estimated from plots. \*\* PAM-4 output.



Fig. 20. Measured jitter of all sub-1-UI DE modes (S1-S7) is better than that of 1-UI DE mode (S8) from 4 to 10 Gb/s.

#### A. Near-End Testing

To clearly illustrate the tunability of the sub-1-UI DE in amplitude and time, the eye diagrams at  $D_{out}$  under tuning of 1 and 2 post-cursor taps are captured at a lower data rate of 4 Gb/s (Fig. 17), showing the fine tunability of DE by VDCTRL. The tunability has no effect on DTJ. To validate the effectiveness of the two-step DE embedding 4 post-cursor taps, we measured the data eyes (Fig. 18) at 4 Gb/s under 16 modes (S0-S15) of the sub-1-UI fractional DE. The DE variation with 2 taps (i.e., 1 main tap plus 1 post-cursor tap corresponding to 0.25, 0.5, 0.75 or 1 UI) is consistent with that depicted in Fig. 12(a). When more post-cursor taps are turned on, the DE magnitude increases, indicating improved loss compensation.



Fig. 21. Measured HO and VO at 10 Gb/s at different DE modes (S3 to S11) under different channel losses of -12 (CH1) and -20 (CH2) dB. Measured symmetric even and odd data eyes.

As the sub-1-UI DE can compensate more of the high-frequency portion of the PRBS spectrum, the measured data rising or falling time at  $D_{out}$  shortens by 46.8% [Fig. 19(a)] when comparing our sub-1-UI DE (S1 mode) with no DE (S0 mode). Also, the resultant Pk-to-Pk jitter improves from 10.8 to 8.4 ps [Fig. 19(b)]. Importantly, the obvious 2-impulse-based DDJ of the 1-UI DE can be eliminated by the proposed sub-1-UI DE from 4 to 8 Gb/s (and is at least suppressed at 10 Gb/s), as shown in Fig. 20. Comparing with



Fig. 22. (a) DTJ and (b) DE at different DE modes under the simulation at 10 Gb/s and measurement at 4 Gb/s.



Fig. 23. Illustration of different compensations under different DE modes.

the calculated and simulated jitters in Fig. 13, the measured DTJs result from four more factors: 1) the data from the PRBS generator has a Pk-to-Pk jitter of ~5.5 ps [26]; 2) the off-chip clock provided by a signal generator has an rms jitter of <300 fs; 3) noise couples from the supply and ground as well as the bias voltage; and 4) the delay-and-BW mismatches between the main-tap and post-cursor paths worsen the DTJ, resulting in the obvious 2-impulse-based DDJ, especially at higher data rates.

#### B. Far-End Testing

The HO and VO after the channel measurement at 10 Gb/s (Fig. 21, left) show a similar performance between the pure 1-UI DE and the sub-1-UI DE when the channel loss is small (-12 dB). Herein, we properly tune VDCTRL to optimize the eye quality (i.e.,  $t_d < 0.25$  UI). Yet, when the channel loss increases to -20 dB, only the sub-1-UI DE can recover a HO of 0.62 UI and a VO of 19.5%, while the eyes of the 1-UI DE are completely closed. This means that the sub-1-UI DE can effectively reduce DTJ in addition to compensating the loss. Additionally, we can confirm symmetric even and odd data eyes (Fig. 21, right). Further, all data eyes are closed when DE is off.

#### C. Summary and Comparison

Based on the above analysis and verification, we summarize the useful insights on the fractional DE as follows. Generally, both simulation and measurement under the S8 mode show the largest DTJ [Fig. 22(a)] and the strong DE effect at Nyquist [Fig. 22(b)]. Entering the fractional modes (S2 and S4) involving the clocked circuits, the compensation effects at different peak frequencies remain, but the DTJ reduces significantly, due to the high-frequency extension with the same DE (Fig. 23). Under S1 mode, the DTJ is small but the DE effect weakens; however, this mode only employs the tunable analog delay cell. Thus, the DE modes in "Region I" can extend the internal BW of the pre-driving stage to improve the DTJ, and we utilize the DE modes in "Region II" at the final output driver to compensate the external channel loss. Interestingly, the fractional DE can be viewed as an equivalent BW-extension technique owing to the occurrence of a zero in (2), although its effect is attenuated by the actual pole-induced roll-off. It results from an inverse-phase two-path combination similar to [29], and differs from the conventional AI-based or inductive BW-extension techniques. We favor a flexible combination of the sub-1-UI DE modes to deliver high-quality eyes over the entire data path.
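To make the link between the tap delay and the compensation peak concrete, the Python sketch below evaluates a generic two-tap DE response $H(f) = \alpha_0 - \alpha_1 e^{-j2\pi f t_d}$. This is a textbook two-path model adopted here as an assumption and is not necessarily identical to (2); the tap weights are hypothetical. In this model the magnitude peaks at $f = 1/(2t_d)$, so a 1-UI delay peaks at the Nyquist frequency and a 0.5-UI delay peaks at twice that frequency, while the DE depth depends only on the tap ratio.

```python
import cmath
import math

T_B = 100e-12  # bit period at 10 Gb/s


def de_response(f_hz, alpha0, alpha1, t_d):
    """Two-tap DE response: main path minus a delayed, scaled copy."""
    return alpha0 - alpha1 * cmath.exp(-2j * math.pi * f_hz * t_d)


def peak_frequency_hz(t_d):
    """Frequency where the delayed path adds in phase (first maximum)."""
    return 1.0 / (2.0 * t_d)


def de_depth_db(alpha0, alpha1):
    """Ratio of peak gain (alpha0 + alpha1) to DC gain (alpha0 - alpha1)."""
    return 20.0 * math.log10((alpha0 + alpha1) / (alpha0 - alpha1))


def tap_ratio_for_depth(depth_db):
    """Tap ratio alpha1/alpha0 that yields a given DE depth in this model."""
    linear = 10.0 ** (depth_db / 20.0)
    return (linear - 1.0) / (linear + 1.0)


# Hypothetical tap ratio that would give the 5.48-dB DE quoted for S2 mode.
ratio = tap_ratio_for_depth(5.48)

for t_di in (0.25, 0.5, 0.75, 1.0):
    f_peak = peak_frequency_hz(t_di * T_B)
    gain = abs(de_response(f_peak, 1.0, ratio, t_di * T_B))
    print(f"t_d = {t_di:.2f} UI: peak at {f_peak / 1e9:.2f} GHz, |H| = {gain:.3f}")
```

In a real TX the channel and the internal poles roll off the response well before the higher peak frequencies, which is the pole-induced attenuation mentioned above.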

Benchmarking with the state-of-the-art [8], [10], [11], [13], [17], [18] and [20] at different data rates in Table II, our TX incorporating the sub-1-UI fractional DE and CM-based AI techniques results in a better area efficiency (>3.1x) due to the CML-based DE topology, while achieving wider tunability of amplitude and delay to suppress the Pk-to-Pk jitter by >3.5x when compared with [8], [10] and [13]. Although [18] and [20] have better figures-of-merit (FOMs), they primarily suffer from asymmetric data eyes, and their results were measured with only the 2<sup>7</sup>-1 PRBS signal. The proposed fractional DE TX could further replace the CML-based delay stage and latch by their dynamic counterparts to improve the FOM significantly. Our fractional DE technique also suits pulse-amplitude-modulation-4 (PAM-4) and duobinary signals to improve the jitter performance and equalization quality.
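The area-efficiency margin can be checked against the active-area row of Table II. The Python sketch below simply divides each reported area by ours; note that the entries for [10] and [17] are marked in the table as estimated from plots, so the resulting ratios are approximate.

```python
# Active areas from Table II in mm^2 ([10] and [17] are estimated from plots).
ACTIVE_AREA_MM2 = {
    "This work": 0.0071,
    "[10]": 0.7,
    "[11]": 0.22,
    "[13]": 0.048,
    "[8]": 0.035,
    "[17]": 0.0225,
    "[18]": 0.13,
    "[20]": 0.056,
}


def area_ratios(areas, ours="This work"):
    """Area of each benchmark design relative to this work."""
    base = areas[ours]
    return {ref: area / base for ref, area in areas.items() if ref != ours}


ratios = area_ratios(ACTIVE_AREA_MM2)
closest_ref = min(ratios, key=ratios.get)  # most compact competitor

for ref, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{ref}: {ratio:.1f}x larger")
print(f"Smallest margin is against {closest_ref}: {ratios[closest_ref]:.2f}x")
```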

#### VI. CONCLUSIONS

This paper reported an area-efficient sub-1-UI fractional DE technique based on the tunable and clocked hybrid delay line to effectively compensate the channel loss and improve the jitter performance. The latter can be calculated by a closed-form expression. Fabricated in 65-nm CMOS, the 0.0071-mm<sup>2</sup> 5-tap current-mode TX employing the two-step DE obtains better data-eye opening at both near- and far-end verification, while keeping the even and odd eyes symmetric.

#### Appendix

This Appendix gives the calculation of two step responses,  $g_{all1}(t)$  and  $g_{all2}(t)$ , corresponding to the minimum and maximum boundaries of DDJ shown in Fig. 7, respectively. We use the general step function  $x_g(t) = A_{ZI}u(-t) + A_{ZS}u(t)$  as the input signal for the 1<sup>st</sup>-order RC response, where u(t) is the unit step function.  $g_{RC}(t)$  is the total step response, which can be decomposed into the zero-state response  $g_{RCZS}(t)$  and the zero-input response  $g_{RCZI}(t)$  as given by,

$$g_{RC}(t) = \underbrace{A_{ZS}\left(1 - e^{-\omega_0 t}\right)u(t)}_{g_{RCZS}(t)} + \underbrace{A_{ZI}e^{-\omega_0 t}u(t)}_{g_{RCZI}(t)}$$
(A1)

We first find the step response  $g_{RC1}(t)$  for  $x_{01}(t)$  with  $A_{ZI} = -1$  and  $A_{ZS} = 1$ , as shown in Fig. 24.

$$g_{RC1}(t) = -1 + 2(1 - e^{-\omega_0 t})u(t)$$
(A2)
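As a quick numerical check, the Python sketch below evaluates (A1) with  $A_{ZI} = -1$  and  $A_{ZS} = 1$  and compares it with (A2). The pole frequency is a hypothetical value chosen only for illustration, and for  $t < 0$  the sketch assumes that the output has already settled to  $A_{ZI}$, which is what (A2) implies.

```python
import math

OMEGA0 = 2 * math.pi * 7e9  # hypothetical pole frequency in rad/s (not from the paper)


def g_rc(t, a_zi, a_zs, omega0=OMEGA0):
    """First-order RC response to x_g(t) = a_zi*u(-t) + a_zs*u(t), as in (A1)."""
    if t < 0:
        return a_zi  # assumed settled at the initial level before the step
    decay = math.exp(-omega0 * t)
    return a_zs * (1.0 - decay) + a_zi * decay  # zero-state + zero-input parts


def g_rc1(t, omega0=OMEGA0):
    """Closed form (A2) for the 0-to-1 transition x_01(t)."""
    if t < 0:
        return -1.0
    return -1.0 + 2.0 * (1.0 - math.exp(-omega0 * t))


# Zero crossing of (A2): 1 - 2*exp(-omega0*t) = 0  =>  t = ln(2)/omega0.
T_CROSS = math.log(2.0) / OMEGA0

for t in (-10e-12, 0.0, 5e-12, 50e-12, 200e-12):
    print(f"t = {t * 1e12:7.1f} ps: (A1) = {g_rc(t, -1.0, 1.0):+.6f}, (A2) = {g_rc1(t):+.6f}")
print(f"zero crossing at {T_CROSS * 1e12:.2f} ps")
```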



Fig. 24. Calculation of the  $1^{st}$ -order RC step response under different inputs: (a)  $x_{01}(t)$  for Case I and (b)  $x_{101}(t)$  for Case II.

By manipulating (9) and using (A2), the overall response  $g_{all1}(t)$  across three time intervals in Case I [Fig. 7(b)] leads to,

$$g_{all1}(t) = \begin{cases} -(\alpha_0 - \alpha_1), & t < 0\\ \underline{\alpha_0 + \alpha_1 - 2\alpha_0 e^{-\omega_0 t}}, & 0 < t < t_{di} T_B \\ \underline{\alpha_0 - \alpha_1 - 2\alpha_0 e^{-\omega_0 t} + 2\alpha_1 e^{-\omega_0 (t - t_{di} T_B)}}_{\text{To calculate} A_1}, & t_{di} T_B < t \end{cases}$$
(A3)
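The branches of (A3) can be cross-checked numerically. The Python sketch below assumes that the overall response is the two-tap superposition  $\alpha_0 g_{RC1}(t) - \alpha_1 g_{RC1}(t - t_{di}T_B)$, which reproduces each branch of (A3) term by term; the pole frequency and tap weights are hypothetical. It also solves the middle branch for its zero crossing, which is valid only while that crossing falls inside  $0 < t < t_{di}T_B$.

```python
import math

T_B = 100e-12               # bit period at 10 Gb/s
OMEGA0 = 2 * math.pi * 7e9  # hypothetical pole frequency in rad/s
ALPHA0, ALPHA1 = 1.0, 0.3   # hypothetical main and post-cursor tap weights
T_DI = 0.5                  # fractional delay in UI
T_D = T_DI * T_B


def g_rc1(t):
    """First-order RC response to the 0-to-1 transition, as in (A2)."""
    if t < 0:
        return -1.0
    return -1.0 + 2.0 * (1.0 - math.exp(-OMEGA0 * t))


def g_all1_superposition(t):
    """Assumed two-tap combination of the main and delayed paths."""
    return ALPHA0 * g_rc1(t) - ALPHA1 * g_rc1(t - T_D)


def g_all1_closed_form(t):
    """Piecewise expression (A3)."""
    if t < 0:
        return -(ALPHA0 - ALPHA1)
    if t < T_D:
        return ALPHA0 + ALPHA1 - 2.0 * ALPHA0 * math.exp(-OMEGA0 * t)
    return (ALPHA0 - ALPHA1 - 2.0 * ALPHA0 * math.exp(-OMEGA0 * t)
            + 2.0 * ALPHA1 * math.exp(-OMEGA0 * (t - T_D)))


def t1_zero_crossing():
    """Zero crossing of the middle branch of (A3)."""
    t1 = math.log(2.0 * ALPHA0 / (ALPHA0 + ALPHA1)) / OMEGA0
    if not 0.0 < t1 < T_D:
        raise ValueError("crossing lies outside the middle interval")
    return t1


t1 = t1_zero_crossing()
print(f"t1 = {t1 * 1e12:.3f} ps, g_all1(t1) = {g_all1_closed_form(t1):.2e}")
```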

In Case II, a bit series of "...101..." as input  $x_{101}(t)$  can be divided into cascades of "10" and "01", in which the zero occupies a period of T<sub>B</sub>. The above analysis can be repeated for the 1<sup>st</sup>-order RC step response  $g_{RC2}(t)$  for  $x_{101}(t)$  (Fig. 24), expressed by,

$$g_{RC2}(t) = 1 - 2\left(1 - e^{-\omega_0(t+T_B)}\right)u(t+T_B) + 2(1 - e^{-\omega_0 t})u(t)$$
(A4)

By substituting (A4) into (9), the overall response  $g_{all2}(t)$  covering five time intervals [Fig. 7(b)] becomes,

$$g_{all2}(t) = \begin{cases} \alpha_0 - \alpha_1, & t < -T_B \\ -(\alpha_0 + \alpha_1) + 2\alpha_0 e^{-\omega_0(t+T_B)}, & -T_B < t < -(1 - t_{di}) T_B \\ -(\alpha_0 - \alpha_1) + 2\left(\alpha_0 - \alpha_1 e^{\omega_0 t_{di} T_B}\right) e^{-\omega_0(t+T_B)}, & -(1 - t_{di}) T_B < t < 0 \\ \underbrace{(\alpha_0 + \alpha_1) - 2\left[\alpha_0 - \left(\alpha_0 - \alpha_1 e^{\omega_0 t_{di} T_B}\right) e^{-\omega_0 T_B}\right] e^{-\omega_0 t}}_{g_2(t)}, & 0 < t < t_{di} T_B \\ (\alpha_0 - \alpha_1) - 2\left(1 - e^{-\omega_0 T_B}\right)\left(\alpha_0 - \alpha_1 e^{\omega_0 t_{di} T_B}\right) e^{-\omega_0 t}, & t_{di} T_B < t \end{cases}$$
(A5)
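The same numerical cross-check applies to (A5). The Python sketch below again assumes the two-tap superposition  $\alpha_0 g_{RC2}(t) - \alpha_1 g_{RC2}(t - t_{di}T_B)$  with hypothetical pole frequency and tap weights, compares it with the five branches of (A5), and then solves the  $g_2(t)$  branch for its zero crossing  $t_2$. The spread between  $t_2$  and the Case-I crossing  $t_1$  illustrates how a DDJ estimate follows from the two step responses; it is a sketch of the idea and not a reproduction of (10).

```python
import math

T_B = 100e-12               # bit period at 10 Gb/s
OMEGA0 = 2 * math.pi * 7e9  # hypothetical pole frequency in rad/s
ALPHA0, ALPHA1 = 1.0, 0.3   # hypothetical main and post-cursor tap weights
T_DI = 0.5                  # fractional delay in UI
T_D = T_DI * T_B


def g_rc2(t):
    """First-order RC response to the "101" pattern, as in (A4)."""
    out = 1.0
    if t >= -T_B:
        out -= 2.0 * (1.0 - math.exp(-OMEGA0 * (t + T_B)))
    if t >= 0:
        out += 2.0 * (1.0 - math.exp(-OMEGA0 * t))
    return out


def g_all2_superposition(t):
    """Assumed two-tap combination of the main and delayed paths."""
    return ALPHA0 * g_rc2(t) - ALPHA1 * g_rc2(t - T_D)


def g_all2_closed_form(t):
    """Piecewise expression (A5)."""
    boost = ALPHA0 - ALPHA1 * math.exp(OMEGA0 * T_D)
    if t < -T_B:
        return ALPHA0 - ALPHA1
    if t < -(1.0 - T_DI) * T_B:
        return -(ALPHA0 + ALPHA1) + 2.0 * ALPHA0 * math.exp(-OMEGA0 * (t + T_B))
    if t < 0:
        return -(ALPHA0 - ALPHA1) + 2.0 * boost * math.exp(-OMEGA0 * (t + T_B))
    if t < T_D:
        k = ALPHA0 - boost * math.exp(-OMEGA0 * T_B)
        return (ALPHA0 + ALPHA1) - 2.0 * k * math.exp(-OMEGA0 * t)  # g_2(t)
    return (ALPHA0 - ALPHA1) - 2.0 * (1.0 - math.exp(-OMEGA0 * T_B)) * boost * math.exp(-OMEGA0 * t)


def zero_crossings():
    """Return (t1, t2): crossings of the Case-I middle branch and of g_2(t)."""
    t1 = math.log(2.0 * ALPHA0 / (ALPHA0 + ALPHA1)) / OMEGA0
    boost = ALPHA0 - ALPHA1 * math.exp(OMEGA0 * T_D)
    k = ALPHA0 - boost * math.exp(-OMEGA0 * T_B)
    t2 = math.log(2.0 * k / (ALPHA0 + ALPHA1)) / OMEGA0
    return t1, t2


t1, t2 = zero_crossings()
print(f"t1 = {t1 * 1e12:.3f} ps, t2 = {t2 * 1e12:.3f} ps, spread = {abs(t2 - t1) * 1e12:.3f} ps")
```

Both crossings must lie inside  $0 < t < t_{di}T_B$  for the branches used here to apply; other parameter choices may require solving a different branch.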

#### REFERENCES

- [1] J. Fan, X. Ye, J. Kim, B. Archambeault, and A. Orlandi, "Signal integrity design for high-speed digital circuits: Progress and directions," *IEEE Trans. Electromagn. Compat.*, vol. 52, no. 2, pp. 392–400, May 2010.
- [2] T.-L. Wu, F. Buesink, and F. Canavero, "Overview of signal integrity and EMC design technologies on PCB: Fundamentals and latest progress," *IEEE Trans. Electromagn. Compat.*, vol. 55, no. 4, pp. 624–638, Aug. 2013.
- [3] Y. Chen, P.-I. Mak, L. Zhang, and Y. Wang, "A 0.002-mm<sup>2</sup> 6.4-mW 10-Gb/s full-rate direct DFE receiver with 59.6% horizontal eye opening under 23.3-dB channel loss at Nyquist frequency," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 12, pp. 3107–3117, Dec. 2014.

- [4] J. F. Bulzacchelli *et al.*, "A 10-Gb/s 5-tap DFE/4-tap FFE transceiver in 90-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2885–2900, Dec. 2006.
- [5] J. F. Buckwalter, M. Meghelli, D. J. Friedman, and A. Hajimiri, "Phase and amplitude pre-emphasis techniques for low-power serial links," *IEEE J. Solid-State Circuits*, vol. 41, no. 6, pp. 1391–1399, Jun. 2006.
- [6] Y. Chen, P.-I. Mak, L. Zhang, H. Qian, and Y. Wang, "Pre-emphasis transmitter (0.007 mm<sup>2</sup>, 8 Gbit/s, 0–14 dB) with improved data zero-crossing accuracy in 65 nm CMOS," *Electron. Lett.*, vol. 49, no. 15, pp. 929–930, May 2013.
- [7] S.-Y. Kao and S.-I. Liu, "A 1.62/2.7-Gb/s adaptive transmitter with two-tap preemphasis using a propagation-time detector," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 57, no. 3, pp. 178–182, Mar. 2010.
- [8] Y.-H. Song and S. Palermo, "A 6-Gbit/s hybrid voltage-mode transmitter with current-mode equalization in 90-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 8, pp. 491–495, Aug. 2012.
- [9] W. Bae, G.-S. Jeong, and D.-K. Jeong, "A 1-pJ/bit, 10-Gb/s/ch forwarded-clock transmitter using a resistive feedback inverter-based driver in 65-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 12, pp. 1106–1110, Dec. 2016.
- [10] M. Hekmat *et al.*, "23.3 A 6Gb/s 3-tap FFE transmitter and 5-tap DFE receiver in 65 nm/0.18 μm CMOS for next-generation 8K displays," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Jan./Feb. 2016, pp. 402–403.
- [11] H. Cirit and M. J. Loinaz, "A 10Gb/s half-UI IIR-tap transmitter in 40nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf.*, Feb. 2011, pp. 448–450.
- [12] M. Bichan and A. C. Carusone, "A 6.5 Gb/s backplane transmitter with 6-tap FIR equalizer and variable tap spacing," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2008, pp. 611–614.
- [13] H. Ju, M.-C. Choi, G.-S. Jeong, W. Bae, and D.-K. Jeong, "A 28 Gb/s 1.6 pJ/b PAM-4 transmitter using fractionally spaced 3-Tap FFE and G<sub>m</sub>-Regulated resistive-feedback driver," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 64, no. 12, pp. 1377–1381, Dec. 2017.
- [14] H. Cheng and A. C. Carusone, "A 32/16 Gb/s 4/2-PAM transmitter with PWM pre-emphasis and 1.2 Vpp per side output swing in 0.13μm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf.*, Sep. 2008, pp. 635–638.
- [15] H. Cheng, F. A. Musa, and A. C. Carusone, "A 32/16-Gb/s dual-mode pulsewidth modulation pre-emphasis (PWM-PE) transmitter with 30-dB loss compensation using a high-speed CML design methodology," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 8, pp. 1794–1806, Aug. 2009.
- [16] J.-R. Schrader, E. A. M. Klumperink, J. L. Visschers, and B. Nauta, "Wireline equalization using pulse-width modulation," in *Proc. Custom Integr. Circuits Conf.*, Sep. 2006, pp. 591–598.
- [17] J.-R. Schrader, E. A. M. Klumperink, J. L. Visschers, and B. Nauta, "Pulse-width modulation pre-emphasis applied in a wireline transmitter, achieving 33 dB loss compensation at 5-Gb/s in 0.13-μm CMOS," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 990–999, Apr. 2006.
- [18] S. Saxena, R. K. Nandwana, and P. K. Hanumolu, "A 5 Gb/s energy-efficient voltage-mode transmitter using time-based deemphasis," *IEEE J. Solid-State Circuits*, vol. 49, no. 8, pp. 1827–1836, Aug. 2014.
- [19] W.-J. Su and S.-I. Liu, "A 5 Gb/s voltage-mode transmitter using adaptive time-based de-emphasis," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 64, no. 4, pp. 959–968, Apr. 2017.
- [20] A. Ramachandran, A. Natarajan, and T. Anand, "29.4 A 16Gb/s 3.6 pJ/b wireline transceiver with phase domain equalization scheme: Integrated pulse width modulation (iPWM) in 65 nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2017, pp. 488–489.
- [21] T. O. Dickson, H. A. Ainspan, and M. Meghelli, "6.5 A 1.8pJ/b 56Gb/s PAM-4 transmitter with fractionally spaced FFE in 14nm CMOS," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2017, pp. 118–119.
- [22] B. Analui, J. F. Buckwalter, and A. Hajimiri, "Data-dependent jitter in serial communications," *IEEE Trans. Microw. Theory Techn.*, vol. 53, no. 11, pp. 3388–3397, Nov. 2005.
- [23] Y. Chen, P.-I. Mak, C. C. Boon, and R. P. Martins, "A 36-Gb/s 1.3-mW/Gb/s duobinary-signal transmitter exploiting power-efficient cross-quadrature clocking multiplexers with maximized timing margin," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 65, no. 9, pp. 3014–3026, Sep. 2018.
- [24] J. Buckwalter, B. Analui, and A. Hajimiri, "Predicting data-dependent jitter," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 51, no. 9, pp. 453–457, Sep. 2004.

- [25] Y. Chen, P.-I. Mak, H. Yu, C. C. Boon, and R. P. Martins, "An area-efficient and tunable bandwidth-extension technique for a wideband CMOS amplifier handling 50+ Gb/s signaling," *IEEE Trans. Microw. Theory Techn.*, vol. 65, no. 12, pp. 4960–4975, Dec. 2017.
- [26] Y. Chen, P.-I. Mak, and Y. Wang, "A highly-scalable analog equalizer using a tunable and current-reusable active inductor for 10-Gb/s I/O links," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 5, pp. 978–982, May 2015.
- [27] A. Pirola, A. Liscidini, and R. Castello, "Current-mode, WCDMA channel filter with in-band noise shaping," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1770–1780, Sep. 2010.
- [28] Y. Chen, P.-I. Mak, C. C. Boon, and R. P. Martins, "A 27-Gb/s time-interleaved duobinary transmitter achieving 1.44-mW/Gb/s FOM in 65-nm CMOS," *IEEE Microw. Wireless Compon. Lett.*, vol. 27, no. 9, pp. 839–841, Sep. 2017.
- [29] M. Erett *et al.*, "A 126mW 56Gb/s NRZ wireline transceiver for synchronous short-reach applications in 16 nm FinFET," in *Proc. IEEE Int. Solid-State Circuits Conf. (ISSCC)*, Feb. 2018, pp. 274–276.



**Yong Chen** (S'10–M'11) received the B.Eng. degree in electronic and information engineering from the Communication University of China (CUC), Beijing, China, in 2005, and the Ph.D. Engineering degree in microelectronics and solid-state electronics from the Institute of Microelectronics, Chinese Academy of Sciences (IMECAS), Beijing, in 2010.

He is currently an Assistant Professor with the State Key Laboratory of Analog and Mixed-Signal VLSI (AMSV), University of Macau, Macao, China. His research interests include analog/biomedical detection and RF integrated circuits, mm-wave system and circuits, and high-speed on-chip and chip-to-chip electrical/optical interconnects.



**Pui-In Mak** (S'00–M'08–SM'11–F'19) received the Ph.D. degree from the University of Macau (UM), Macau, China, in 2006.

He is currently a Full Professor with the Faculty of Science and Technology–ECE, UM, and the Associate Director (Research) of the State Key Laboratory of Analog and Mixed-Signal VLSI, UM. His research interests include analog and radio-frequency (RF) circuits and systems for wireless and multidisciplinary innovations.

Prof. Mak has been an elected Overseas Expert of the Chinese Academy of Sciences since 2018. He is a fellow of the U.K. Institution of Engineering and Technology (IET) for contributions to engineering research, education, and services, since 2018, and a fellow of the IEEE for contributions to radio-frequency and analog circuits since 2019. He has served as the Chair of the Distinguished Lecturer Program since 2018. He was a Member of the Board-of-Governors of the IEEE Circuits and Systems Society (CASS) from 2009 to 2011. He was the Distinguished Lecturer of both the IEEE Circuits and Systems Society from 2014 to 2015 and the IEEE Solid-State Circuits Society from 2017 to 2018. He was the TPC Vice Chair of ASP-DAC in 2016. His involvements with the IEEE include serving as an Editorial Board Member of the IEEE Press from 2014 to 2016, a Senior Editor of the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS from 2014 to 2015, and an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I from 2010 to 2011 and from 2014 to 2015, and of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2010 to 2013. He has been an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS since 2018 and the IEEE SOLID-STATE CIRCUITS LETTERS since 2018.



**Zunsong Yang** received the B.E. degree in microelectronics from Qingdao University, Qingdao, China, in 2014, and the M.S. degree in electronics and communications engineering from the Institute of Microelectronics, Chinese Academy of Sciences (IMECAS), Beijing, China, in 2017. He is currently pursuing the Ph.D. degree with the University of Macau (UM), Macao, China.

His current research interests include high-speed IC circuits, specializing in the phase-locked loops and wireline transmitters.



**Chirn Chye Boon** (M'09–SM'10) received the B.E. (Hons.) and Ph.D. degrees in electrical engineering from Nanyang Technological University (NTU), Singapore, in 2000 and 2004, respectively.

He was with Advanced RFIC, NTU, where he was a Senior Engineer. Since 2005, he has been with NTU, where he is currently an Associate Professor. He is involved in radio frequency and mm-wave circuits and systems design for biomedical and communications applications. He has conceptualized, designed, and verified 80 silicon circuits/chips for biomedical and communication applications. Since 2010, he has been the Program Director of RF and mm-wave research in the S\$50 million research center of excellence, VIRTUS, NTU. He has authored over 100 refereed publications in the fields of RF and mm-waves. He has authored the book *Design of CMOS RF Integrated Circuits and Systems* (2010).



**Rui P. Martins** (M'88–SM'99–F'08) was born in 1957. He received the bachelor's, the master's, the Ph.D., and the Habilitation degrees for Full-Professor in electrical engineering and computers from the Department of Electrical and Computer Engineering, Instituto Superior Técnico (IST), TU of Lisbon, Portugal, in 1980, 1985, 1992, and 2001, respectively.

He has been with the Department of Electrical and Computer Engineering (DECE)/IST, TU of Lisbon, since 1980. Since 1992, he has been on leave from IST, TU of Lisbon (now University of Lisbon since 2013). He is also with the Department of Electrical and Computer Engineering, Faculty of Science and Technology (FST), University of Macau (UM), Macao, China, where he has been a Chair-Professor since 2013. In FST, he was the Dean of the Faculty from 1994 to 1997, and he has been the Vice-Rector of the University of Macau since 1997. Since 2008, after the reform of the UM Charter, he was nominated after open international recruitment, and reappointed (in 2013), as the Vice-Rector (Research) until 2018. Within the scope of his teaching and research activities, he has taught 21 bachelor's and master's courses, and he has supervised (or co-supervised) 40 theses (19 Ph.D. and 21 master's theses) with UM. He has co-authored six books and nine book chapters. He holds 18 patents, USA (16) and Taiwan (2). He has authored or coauthored 377 papers in scientific journals (111) and in conference proceedings (266), and other 60 academic works, in a total of 470 publications. He was the Co-Founder of Chipidea Microelectronics (Macao) [now Synopsys] in 2001/2002. He created the Analog and Mixed-Signal VLSI Research Laboratory, UM, in 2003, elevated in 2011 to the State Key Laboratory of China (the first in Engineering in Macao), being its Founding Director.

Dr. Martins was the Founding Chair of the IEEE Macau Section from 2003 to 2005, the IEEE Macau Joint-Chapter on Circuits and Systems (CAS)/Communications (COM) from 2005 to 2008, and the World Chapter of the Year of the IEEE CAS Society (CASS) in 2009. He was the General Chair of the 2008 IEEE Asia-Pacific Conference on CAS-APCCAS 2008 and the Vice-President for Region 10 (Asia, Australia, and Pacific) of the IEEE CASS from 2009 to 2011. Since 2011, he has been the Vice-President of (World) Regional Activities and Membership of the IEEE CASS from 2012 to 2013. He was a member of the IEEE CASS Fellow Evaluation Committee in 2013 and 2014, and the CAS Society Representative of the Nominating Committee, for the election in 2014, the Division I (CASS/EDS/SSCS)-Director of the IEEE. He was the General Chair of the ACM/IEEE Asia South Pacific Design Automation Conference-ASP-DAC 2016. He was a Nominations Committee Member in 2016. He is currently the Chair of the IEEE Fellow Evaluation Committee (class of 2018), both of the IEEE CASS. He was a recipient of two government decorations: the Medal of Professional Merit from Macao Government (Portuguese Administration) in 1999 and the Honorary Title of Value from Macao SAR Government (Chinese Administration) in 2001. In 2010, he was elected unanimously as the Corresponding Member of the Portuguese Academy of Sciences (in Lisbon), being the only Portuguese Academician living in Asia. He was an Associate Editor of the IEEE TRANSACTIONS ON CAS II: EXPRESS BRIEFS from 2010 to 2013, nominated as the Best Associate Editor of TCAS II from 2012 to 2013.