# A 40-Gb/s PAM-4 Transmitter Using a 0.16-pJ/bit SST-CML-Hybrid (SCH) Output Driver and a Hybrid-Path 3-Tap FFE Scheme in 28-nm CMOS

Chao Fan, Student Member, IEEE, Wei-Han Yu, Student Member, IEEE, Pui-In Mak, Fellow, IEEE, and Rui P. Martins, Fellow, IEEE

Abstract—This paper proposes an SST-CML-Hybrid (SCH) output driver, and its corresponding hybrid-path feed-forward equalization (FFE) scheme, to enhance the energy efficiency of a PAM-4 transmitter (TX). Specifically, the SCH driver features one SST branch + one CML branch to co-synthesize the PAM-4 data, substantially reducing the signaling power, switching power and equalization power. The PAM-4 TX further integrates a half-rate serializer with 4-bit 3-tap FFE, duty-cycle correction circuits and a T-coil output matching network. Prototyped in 28-nm CMOS, the PAM-4 TX achieves a broadband return loss < -10 dB up to 50 GHz, and occupies a compact die area of $0.0345 \text{ mm}^2$. Operating at 40 Gb/s and at a 0.9-V supply, the TX dissipates 19.5 mW, of which 6.4 mW is due to the SCH driver. The corresponding energy efficiencies are 0.16 and 0.5 pJ/bit for the SCH driver and TX, respectively; both compare favorably with the prior art.

*Index Terms*—CMOS, current-mode-logic (CML) driver, feed-forward equalization (FFE), four-level pulse-amplitude modulation (PAM-4), source-series-terminated (SST) driver, SST-CML-Hybrid (SCH) driver, transmitter (TX).

#### I. INTRODUCTION

MASSIVE data traffic in the communication infrastructure such as cloud servers unceasingly pushes the speed of serial links. The four-level pulse-amplitude modulation (PAM-4) signaling exhibits a higher spectral efficiency than its non-return-to-zero (NRZ) counterpart. The bandwidth advantage of PAM-4 data offers the prospect of improving both the data rate and energy efficiency of advanced I/O interfaces

Manuscript received April 3, 2019; revised June 23, 2019 and July 15, 2019; accepted August 15, 2019. Date of publication September 12, 2019; date of current version December 6, 2019. This work was supported in part by the University of Macau Fund under Grant MYRG2017-00223-AMSV, in part by the Macao Science and Technology Development Fund (FDCT) - SKL Fund, and in part by the China Ministry of Science and Technology under Grant EF001/AMSV/2018/MOST. This article was recommended by Associate Editor D. Zito. (*Corresponding author: Pui-In Mak.*)

C. Fan, W.-H. Yu, and P.-I. Mak are with the State-Key Laboratory of Analog and Mixed-Signal VLSI and Faculty of Science and Technology, Department of ECE, University of Macau, Macao, China (e-mail: pimak@umac.mo).

R. P. Martins is with the State-Key Laboratory of Analog and Mixed-Signal VLSI and Faculty of Science and Technology, Department of ECE, University of Macau, Macao, China, and also with the Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal (e-mail: rmartins@umac.mo).

Color versions of one or more of the figures in this article are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2019.2936226

under the CEI-56G or IEEE802.3bs standard [1]. Still, it is challenging to preserve the signal integrity of PAM-4 signaling at tens of gigabits per second. For a PAM-4 transmitter (TX), the output driver normally dominates the overall power budget, as it has to deliver a high-swing voltage output with low distortion while preserving back-resistance termination (i.e. the impedance looking back into the TX), which should match the channel impedance (e.g. 50 $\Omega$). Meanwhile, to compensate for the frequency-dependent channel loss, the output driver has to adopt feed-forward equalization (FFE). The total power consumption of an output driver can be decomposed into three parts: signaling power P<sub>sig</sub> (static power overhead for symbol level generation), switching power P<sub>sw</sub> (dynamic power consumption for inverter-based SST cells) and FFE power P<sub>EQ</sub>. The current-mode-logic (CML) and source-series-terminated (SST) drivers are the common topologies. The SST driver, also known as voltage-mode (VM) driver [2]–[18], can be four times more power-efficient than its CML counterpart in terms of P<sub>sig</sub> [19]–[26], while offering an output swing up to rail-to-rail differentially with high linearity. Nevertheless, when the SST driver is configured into multiple segments to support FFE, substantial parasitic capacitance is incurred at the terminated node, penalizing P<sub>sw</sub> especially at high data rates. Also, the TX has to keep a constant back-resistance termination. P<sub>EQ</sub> rises with large FFE settings as well, since the de-emphasis paths operate between the supply rails [4]. The data-dependent current injected into the supply rails (i.e. voltage ripples) affects the data-dependent jitter performance of the TX [6]. The typical VM PAM-4 TX exacerbates such drawbacks owing to the dual-SST topology of its output driver.

There are other SST driver topologies that pursue a better energy efficiency, and their emphasis is on a constant $P_{EQ}$ [3]–[7]. The designs in [3]–[5] incorporate shunt slices between the differential outputs to realize a constant impedance, while mitigating the data-dependent supply current fluctuation when embedding the FFE. Yet, the large-sized switches in the shunt slices lead to an extra pre-driver power penalty. [5] adopts smaller switches to improve the pre-driver energy efficiency with an extra shunt transistor, but this shunt slice replica calls

1549-8328 © 2019 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.



TABLE I Comparison of Different PAM-4 Output Drivers

Table notes: $\alpha$ is the equalization tap coefficient; RLM = $3V_{min} / V_{pp}$.

for a calibration loop to correct the shunt slice impedance. [6] proposes finite-impulse-response (FIR) compensation circuits to uphold a constant current drawn from the supply, avoiding the perturbation in the regulator voltage. Regrettably, this approach leads to an extra 15% power penalty. In [7], the VM TX with current-mode equalization decouples the impedance matching of the SST driver from the equalization, which simplifies the segmented SST drivers and eliminates P<sub>EQ</sub>. Another alternative [16], [18] is to employ pulse-width modulation (PWM) for channel loss compensation, which avoids amplitude modulation for FFE. These VM architectures mainly focus on P<sub>EQ</sub> reduction for NRZ signaling, but are not suitable for PAM-4 modulation. [10] shows a hybrid-mode PAM-4 driver that enhances the output swing by adding CML branches. Nevertheless, constrained by the dual-SST topology, none of the existing works is capable of alleviating the fundamental P<sub>sig</sub> and P<sub>sw</sub> penalty. To surmount such power bottlenecks, this work proposes an SST-CML-Hybrid (SCH) PAM-4 driver, and its corresponding hybrid-path FFE scheme, to merge the merits of SST and CML drivers into a high-speed PAM-4 TX for a better energy efficiency.

After the Introduction, Section II overviews the existing and proposed PAM-4 driver topologies. Section III details our hybrid-path FFE scheme, hybrid PAM-4 FFE encoder, and derives analytically the equalization power  $P_{EQ}$ . Section IV presents the TX architecture and details its implementation. Section V reports the experimental results. Finally, we draw the conclusions in Section VI.

#### **II. PAM-4 DRIVER TOPOLOGIES**

This Section focuses on the power consumption of four types of PAM-4 TX drivers: CML, SST, output-swing-enhanced SST and the proposed SCH. The linearity of a PAM-4 TX is characterized by the ratio of level mismatch (RLM), which is defined as three times the smallest PAM-4 eye height ($V_{min}$) divided by the output peak-to-peak eye height ($V_{pp}$) [1]. We also analyze the output swing and RLM of each topology. Table I gives a comparative summary of the following discussion.

# A. CML PAM-4 Driver

The left column of Table I reports the CML PAM-4 driver [20]–[24], in which the tail current source in the MSB path is 2x-upscaled with respect to the LSB path to obtain four equally-spaced PAM-4 symbol levels. The loading resistor $R_T$ directly serves for impedance matching with the channel impedance $R_L$. The FFE is implemented by splitting the tail current source into several independently-controlled parallel segments according to the FIR filter settings. The total current drawn from the supply ($V_{DD}$) remains constant, and there is no extra FFE power $P_{EQ}$ [4]. The output swing is set by the CML tail current source, and the voltage efficiency is limited by the saturation voltage ($V_{tail}$) of the tail current source. It can be shown that the voltage efficiency of the CML driver is given by $3I_SR_T/V_{DD}$. The RLM of the CML driver can be limited by the distortion that occurs when the tail transistor enters the triode region, or by a current mismatch between the MSB and LSB segments. Even with an elevated $V_{DD}$, the recent CML PAM-4 drivers still show a limited RLM value (observed from their measurement results) [21]–[23].

## B. Dual-SST & Output-Swing Enhancement PAM-4 Driver

The VM PAM-4 TX (Table I, middle column) comprises the dual-SST drivers fed by the MSB and LSB input data to encode the PAM-4 symbol levels [10]–[14]. Typically, the equivalent impedance in the MSB path is $1.5R_L$ (75 $\Omega$), and the value in the LSB path is doubled to $3R_L$ (150 $\Omega$), yielding $1.5R_L||3R_L = R_L = 50 \Omega$ for back-resistance termination. The average signaling power is $(13V_{DD}^2/36R_L)$ if the four symbol levels appear with uniform probabilities, which is theoretically $\sim 4x$ lower than its CML counterpart. Still, the SST driver can suffer from a large P<sub>sw</sub> and P<sub>EQ</sub> penalty at a high data rate, offsetting the advantage in P<sub>sig</sub>. By labeling C<sub>L</sub> and C<sub>M</sub> as the parasitic capacitances at the switching-FET terminated nodes, $P_{sw}$ goes up since $P_{sw} = f(C_L + C_M)V_{DD}^2$, where f denotes the operating frequency at which C<sub>L</sub> and C<sub>M</sub> charge between 0 and V<sub>DD</sub>. Also, the dual de-emphasis paths induce substantial FFE power P<sub>EQ</sub> (to be elaborated in Section III), where $\alpha$ is the tap weight coefficient,

$$P_{EQ} = \frac{V_{DD}^2}{R_L} \frac{13 + 10\alpha - 10\alpha^2}{36} \tag{1}$$

The SST driver ideally can achieve an output swing up to rail-to-rail differentially, and its linearity is inherently better than the CML driver for its VM operation. The on-state resistance of the switching-FET ($R_{on}$) governs the output impedance of the SST driver, in combination with the series-termination resistor ($R_T$). Since $R_{on}$ is nonlinear and more sensitive to PVT variations, $R_T$ should dominate the output impedance for a better signal linearity. The RLM can be >96% with $R_T/R_{on} > 1$ under PVT variations according to [10].

The SST PAM-4 driver with output-swing enhancement is reported in [10]. It features auxiliary dual-CML branches at the output node to boost the output swing (thereby eye opening). The voltage efficiency is enhanced by 30% at the expense of extra V<sub>DD</sub>I<sub>S</sub> power consumption. Thus, despite its high output swing, its energy efficiency is limited to 2.6 pJ/bit. Also, under a 1-V V<sub>DD</sub>, the CML branch leads to RLM distortion. Thus, a replica-based bias calibration loop is entailed to keep the RLM >94%, which raises the power and area overheads.

# C. Proposed SCH PAM-4 Driver

The above discussion shows that the SST PAM-4 driver has the advantage of low $P_{sig}$, high linearity and voltage efficiency, but the CML driver is valuable for its zero $P_{sw}$ and P<sub>EQ</sub>. To merge their benefits, we propose an SCH PAM-4 driver and its corresponding hybrid-path FFE scheme (to be detailed in the next section) to realize an energy-efficient PAM-4 TX. As shown in Table I (right column), by adding the shunt resistor $(2 \times 1.5 R_L)$ between the differential outputs, our SCH driver features one SST branch + one CML branch to co-synthesize the PAM-4 data. The impedance of the SST branch is 3R<sub>L</sub>. Together with the shunt resistor 1.5R<sub>L</sub>, back-resistance termination can be achieved. The current flowing in the CML branch is set to $V_{DD}/3R_L$ for four equally-spaced symbol levels. When the driver is transmitting data (MSB, LSB) = (0,1) or (1,0), the CML branch is disabled, and the inner two symbol levels are obtained. The signaling power consumption is $5V_{DD}^2/36R_L$. On the other hand, when the CML part is activated, the SCH driver is delivering the data (MSB, LSB) = (0,0) or (1,1) and the signaling power overhead is $15V_{DD}^2/36R_L$. If the four symbol levels occur with uniform probabilities, the signaling power consumption is $10V_{DD}^2/36R_L$ on average, which offers $\sim 23\%$ reduction of P<sub>sig</sub> when compared with the typical dual-SST PAM-4 driver [11]–[14].

Fig. 1. Simulated power consumption of the typical "Dual-SST" PAM-4 driver (Table I, middle column, without output-swing enhancement) and the proposed SCH PAM-4 driver (Table I, right column).

With only one SST branch fed by the MSB data, we eliminate the parasitic capacitance of the SST branch dedicated to the LSB data, thereby reducing $P_{sw}$ to $2fC_M V_{DD}^2$. Moreover, the added CML branch only draws a constant current of $I_{inj} = V_{DD}/3R_L$ when employed, which minimizes $P_{sw}$ at high data rates. On the other hand, the SCH PAM-4 driver involves only one de-emphasis path in the SST branch, while the CML branch upholds a constant current when embedding FFE. Thus, P<sub>EQ</sub> is also held back under large FFE settings. For an equalization tap coefficient $\alpha$ of 0.25 (i.e. 6-dB equalization), our SCH PAM-4 driver together with the hybrid-path FFE scheme can save up to $\sim 28\%$ of P<sub>EQ</sub> when compared with the typical dual-SST PAM-4 driver. Assuming that the parasitic capacitances $C_L$ and $C_M$ each have a value of 50 fF, we simulate the power consumption of the two driver topologies against the data rate as shown in Fig. 1, without FFE. Our SCH PAM-4 driver offers not only a 23% reduction of P<sub>sig</sub>, but also a lower P<sub>sw</sub> (shaded area), especially at high data rates.
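The signaling-power figures quoted in this subsection can be cross-checked with exact arithmetic. The sketch below is only a bookkeeping aid: the function and constant names are hypothetical, power is normalized to $V_{DD}^2/R_L$, and the per-symbol values are taken directly from the text rather than re-derived from a circuit model.

```python
from fractions import Fraction

# Signaling power in units of V_DD^2 / R_L, as quoted in the text.
P_SIG_DUAL_SST = Fraction(13, 36)    # dual-SST average over four symbols
P_SIG_SCH_INNER = Fraction(5, 36)    # (MSB, LSB) = (0,1) or (1,0): CML branch off
P_SIG_SCH_OUTER = Fraction(15, 36)   # (MSB, LSB) = (0,0) or (1,1): CML branch on


def parallel(a, b):
    """Equivalent resistance of two resistors in parallel."""
    return a * b / (a + b)


def sch_average_signaling_power():
    """Average over four equiprobable symbols: two inner and two outer levels."""
    return (2 * P_SIG_SCH_INNER + 2 * P_SIG_SCH_OUTER) / 4


def psig_reduction():
    """Fractional P_sig saving of the SCH driver versus the dual-SST driver."""
    return 1 - sch_average_signaling_power() / P_SIG_DUAL_SST


if __name__ == "__main__":
    r_l = Fraction(1)  # normalize R_L to 1
    print("3R_L || 1.5R_L =", parallel(3 * r_l, Fraction(3, 2) * r_l), "R_L")
    print("SCH average P_sig =", sch_average_signaling_power(), "V_DD^2/R_L")
    print("P_sig reduction = %.1f%%" % (100 * float(psig_reduction())))
```

The parallel combination confirms that the $3R_L$ SST branch and the $1.5R_L$ shunt resistor together present $R_L$, and the saving evaluates to 3/13, i.e. the roughly 23% stated above.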



Fig. 2. PAM-4 FFE: (a) Conventional dual-path scheme. (b) Proposed hybrid-path scheme.

# III. PROPOSED SCH PAM-4 DRIVER WITH EQUALIZATION

# A. Hybrid-Path FFE Scheme and Implementation of the SCH Driver

The frequency-dependent channel loss is compensated by the de-emphasis-based FFE at the TX side in the form of an FIR filter. For the typical dual-path 3-tap PAM-4 FFE scheme [Fig. 2(a)], the MSB and LSB data streams are transmitted through two identical FIR filters. The two filtered data streams are summed up at the output node and the weight is set to 2:1. Thus, the output  $Y_{PAM-4}$  (k) can be expressed as (2), where  $\alpha_i$  denotes the tap weight coefficients.

$$Y_{PAM-4}(k) = \frac{2}{3} \left[ \alpha_0 M(k) - \alpha_{-1} M(k+1) - \alpha_1 M(k-1) \right] + \frac{1}{3} \left[ \alpha_0 L(k) - \alpha_{-1} L(k+1) - \alpha_1 L(k-1) \right] \tag{2}$$
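As a minimal sketch of how (2) behaves, the snippet below evaluates the dual-path FIR sum directly. It assumes the MSB and LSB symbols are mapped to -1/+1 (a mapping chosen here for illustration, not stated in the text), and all function names are hypothetical.

```python
from fractions import Fraction


def y_pam4(m, l, k, a_pre, a_main, a_post):
    """Evaluate Eq. (2) at symbol index k.

    m, l: sequences of MSB / LSB symbols, each -1 or +1 (assumed mapping).
    a_pre, a_main, a_post: tap weights alpha_{-1}, alpha_0, alpha_1.
    """
    def fir(x):
        return a_main * x[k] - a_pre * x[k + 1] - a_post * x[k - 1]

    # MSB and LSB paths are summed with a 2:1 weight.
    return Fraction(2, 3) * fir(m) + Fraction(1, 3) * fir(l)


def levels_without_ffe():
    """Four PAM-4 levels when only the main tap is active."""
    out = set()
    for m0 in (-1, 1):
        for l0 in (-1, 1):
            out.add(y_pam4([0, m0, 0], [0, l0, 0], 1,
                           Fraction(0), Fraction(1), Fraction(0)))
    return sorted(out)
```

With only the main tap enabled the four levels are equally spaced at -1, -1/3, 1/3 and 1. With a main tap of 3/4 and a post tap of 1/4, a run of identical symbols is de-emphasized to half amplitude while a transition keeps the full amplitude, which is the 6-dB case discussed in the text.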

Fig. 2(b) presents the proposed hybrid-path 3-tap PAM-4 FFE scheme suitable for the SCH driver topology, in which the FFE is implemented by the independent SST and CML branches simultaneously. The SST branch is fed by the weighted MSB data directly, while the CML branch is fed by the weighted hybrid data H<sub>i</sub>, encoded from the MSB and LSB data by the hybrid PAM-4 FFE encoder. The tap weights in the two separate branches remain identical.

Fig. 3 shows the configuration of our SCH driver with 3-tap FFE that includes a total number of n (n = i+k+j) SCH driver segments in parallel, where i, k, j segments are linearly related to the tap coefficients $\alpha_{-1}$, $\alpha_0$, $\alpha_1$, attributed to pre, main and post taps, respectively. The FFE resolution determines the number of SCH segments. Thus, the FFE provided for channel loss compensation is about $-20\log[(k-i-j)/n]$ at the Nyquist frequency. In the SST branch, the main-tap segments are routed with the incoming data M<sub>0</sub>, while the pre-/post-tap segments are fed with the inverted anticipated/subsequent data M<sub>-1</sub> and M<sub>1</sub>. The impedance of a single SST segment is set to 3nR<sub>L</sub> to secure impedance matching. Similarly, the CML segments dedicated to different taps are driven by the corresponding hybrid data H<sub>i</sub> produced by one set of hybrid FFE encoders, and the current injection in a single CML segment is set to V<sub>DD</sub>/3nR<sub>L</sub>. The parasitic capacitance introduced by the segmented CML branch is ~50 fF, close to that of the SST branch, and will be compensated by the T-coil network (to be detailed later).

Fig. 3. Proposed SCH output driver with 3-tap FFE.
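The boost expression $-20\log[(k-i-j)/n]$ is easy to evaluate for a given segment allocation. The helper below is a sketch with a hypothetical function name; the segment counts used in the usage note are illustrative allocations, not measured settings.

```python
import math


def ffe_boost_db(i, k, j):
    """Equalization at Nyquist, -20*log10((k - i - j) / n), with n = i + k + j.

    i, k, j: number of segments assigned to the pre, main and post taps.
    """
    n = i + k + j
    ratio = (k - i - j) / n
    if ratio <= 0:
        raise ValueError("main tap must outweigh the pre/post taps")
    return -20 * math.log10(ratio)
```

For instance, a single post tap with a quarter of the segments (`ffe_boost_db(0, 3, 1)`) gives about 6 dB, and assigning 15 of 50 segments to one tap (`ffe_boost_db(0, 35, 15)`) gives roughly 8 dB.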

# B. Hybrid PAM-4 FFE Encoder

For PAM-4 modulation with equalization, n-tap FFE results in 4<sup>n</sup> possible separated symbol levels [27]. The 3-tap FFE encoder for hybrid data H<sub>i</sub> is designed by studying and summarizing the 64 possible combinations of 3 sequential PAM-4 data (i.e. M<sub>i</sub> and L<sub>i</sub>). Table II presents part of the output $Y_{PAM-4}$ obtained from (2). The coefficients $\alpha_i$ of Y<sub>PAM-4</sub> determine the total current injection for the different input data sequences. If the coefficient of $\alpha_i$ is $\pm 1$, the corresponding H<sub>iP</sub> or H<sub>iN</sub> in the CML branch is switched on. Otherwise, H<sub>iP</sub> and H<sub>iN</sub> are both switched off and no current is injected into the output node. From the 64 input data sequences and the corresponding encoded hybrid data $H_{iP}$ and $H_{iN}$, the 3-tap hybrid PAM-4 FFE encoder reduces to a simple AND gate and NOR gate per tap. With 20-Gb/s input data and a 0.9-V supply voltage, the PAM-4 FFE logic consumes <0.1 mW according to the simulation results. Similarly, by repeating the encoder logic in the pre and post taps with the corresponding input data, we can develop a hybrid PAM-4 FFE encoder with more taps.
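The AND/NOR encoder can be sketched in a few lines and checked against the examples in Table II. The polarity used here is inferred from that table: for the pre and post taps, H<sub>iP</sub> follows AND and H<sub>iN</sub> follows NOR, while the main tap is mirrored. Table II contains no main-tap (1,1) entry, so treating H<sub>0N</sub> as AND is an assumption made by symmetry, and the function names are hypothetical.

```python
def encode_tap(m, l, is_main):
    """Return (H_P, H_N) for one tap from its MSB/LSB bits (0 or 1).

    Pre/post taps: H_P = AND(m, l), H_N = NOR(m, l), as inferred from Table II.
    Main tap: mirrored. H_0N = AND is an assumption (case not listed in Table II).
    """
    both_high = m & l          # AND gate
    both_low = 1 - (m | l)     # NOR gate
    return (both_low, both_high) if is_main else (both_high, both_low)


def encode_3tap(m_bits, l_bits):
    """m_bits, l_bits: (pre, main, post) bit triples. Returns (H_P, H_N) triples."""
    pairs = [encode_tap(m, l, idx == 1)
             for idx, (m, l) in enumerate(zip(m_bits, l_bits))]
    h_p = tuple(p for p, _ in pairs)
    h_n = tuple(n for _, n in pairs)
    return h_p, h_n
```

Whenever the MSB and LSB of a tap differ, both outputs are 0 and the CML segment injects no current, which corresponds to the inner symbol levels generated by the SST branch alone.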

# C. FFE Power $P_{EQ}$ Analysis

Herein we conduct the analysis of  $P_{EQ}$  for the typical and proposed PAM-4 drivers. The pre-/post-tap coefficient  $\alpha$  is set to m/n by assigning m units to the pre/post taps among the n SST segments; with the rest routed to the main tap. Using the Thévenin equivalent circuit from Fig. 4, we can derive the formula of  $P_{EQ}$  versus  $\alpha$ .

TABLE II Examples of Hybrid PAM-4 FFE Encoder (8 Encoding Examples of the Proposed Hybrid-Path 3-Tap PAM-4 FFE)

| Signal | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| MSB input: M<sub>-1</sub>M<sub>0</sub>M<sub>1</sub> | 1 0 1 | 1 0 1 | 0 0 1 | 1 0 0 | 0 0 1 | 1 0 0 | 0 0 1 | 0 0 0 |
| LSB input: L<sub>-1</sub>L<sub>0</sub>L<sub>1</sub> | 1 0 1 | 0 0 1 | 0 0 1 | 1 0 1 | 0 0 0 | 0 0 0 | 1 1 0 | 0 1 0 |
| Output: Y<sub>PAM-4</sub> | $\alpha_{-1}+\alpha_0+\alpha_1$ | $\frac{1}{3}\alpha_{-1}+\alpha_0+\alpha_1$ | $-\alpha_{-1}+\alpha_0+\alpha_1$ | $\alpha_{-1}+\alpha_0-\frac{1}{3}\alpha_1$ | $-\alpha_{-1}+\alpha_0+\frac{1}{3}\alpha_1$ | $\frac{1}{3}\alpha_{-1}+\alpha_0-\alpha_1$ | $\frac{1}{3}(-\alpha_{-1}+\alpha_0+\alpha_1)$ | $-\alpha_{-1}+\frac{1}{3}\alpha_0-\alpha_1$ |
| Current injection | $I_{inj}$ | $(\alpha_0+\alpha_1)I_{inj}$ | $(-\alpha_{-1}+\alpha_0+\alpha_1)I_{inj}$ | $(\alpha_{-1}+\alpha_0)I_{inj}$ | $(-\alpha_{-1}+\alpha_0)I_{inj}$ | $(\alpha_0-\alpha_1)I_{inj}$ | 0 | $(-\alpha_{-1}-\alpha_1)I_{inj}$ |
| Encoded output: H<sub>-1P</sub>H<sub>0P</sub>H<sub>1P</sub> | 1 1 1 | 0 1 1 | 0 1 1 | 1 1 0 | 0 1 0 | 0 1 0 | 0 0 0 | 0 0 0 |
| Encoded output: H<sub>-1N</sub>H<sub>0N</sub>H<sub>1N</sub> | 0 0 0 | 0 0 0 | 1 0 0 | 0 0 0 | 1 0 0 | 0 0 1 | 0 0 0 | 1 0 1 |

[3-tap PAM-4 FFE encoder schematic: inputs L<sub>i</sub> and M<sub>i</sub>, outputs H<sub>iP</sub> and H<sub>iN</sub>]



Fig. 4. Thévenin equivalent circuit of the output driver.



Fig. 5. Conceptual equivalent circuit of the dual-SST PAM-4 driver with FFE (a) Data pattern (0,0) or (1,1). (b) Data pattern (0,1) or (1,0).

1) Dual-SST PAM-4 Driver: Depending on the input data sequence, the Thévenin equivalent circuit of the dual-SST PAM-4 driver with FFE can have two topologies (Fig. 5), where $G_0$ denotes the 50-$\Omega$ load conductance. Fig. 5(a) shows the topology for the input data patterns (0,0) or (1,1) and Fig. 5(b) the topology for the input data patterns (0,1) or (1,0). The Thévenin equivalent conductance $G_{eq}$ is $G_0/2$. With the load conductance $G_L = G_0/2$, the differential output swing can be expressed as,

$$V_{OUT} = V_{eq}\frac{G_{eq}}{G_{eq} + G_L} = \frac{V_{eq}}{2} \tag{3}$$

In Fig. 5(a), the Thévenin equivalent voltage $V_{eq}$ is given by,

$$V_{eq} = V_P - V_N = V_{DD} (1 - 2\alpha) \tag{4}$$

Therefore $V_{OUT}$ is given by,

$$V_{OUT} = \frac{V_{eq}}{2} = \frac{V_{DD}}{2} (1 - 2\alpha) \tag{5}$$

Considering the output common mode fixed at $V_{DD}/2$, the voltages at the differential output nodes will be,

$$V_{OUT}^{+} = \frac{V_{DD}}{2} + \frac{V_{OUT}}{2} \tag{6}$$

$$V_{OUT}^{-} = \frac{V_{DD}}{2} - \frac{V_{OUT}}{2} \tag{7}$$

Applying the results above, the  $P_{EQ}$  of dual-SST PAM-4 driver with symbol levels (0,0) or (1,1) can be,

$$P_{EQ,SST(a)} = V_{DD}G_0\left[(V_{DD} - V_{OUT}^+)(1 - \alpha) + (V_{DD} - V_{OUT}^-)\alpha\right] = G_0 V_{DD}^2 \frac{1 + 4\alpha - 4\alpha^2}{4} \tag{8}$$
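For readers who want the intermediate algebra, the step from (5)–(7) to the closed form of (8) can be written out as follows. This is only a worked expansion of the equations already given, with no additional assumptions.

$$\begin{aligned}
V_{OUT}^{+} &= \frac{V_{DD}}{2} + \frac{V_{DD}}{4}(1-2\alpha) = \frac{V_{DD}}{4}(3-2\alpha), \\
V_{OUT}^{-} &= \frac{V_{DD}}{2} - \frac{V_{DD}}{4}(1-2\alpha) = \frac{V_{DD}}{4}(1+2\alpha), \\
P_{EQ,SST(a)} &= V_{DD}G_0\left[\frac{V_{DD}}{4}(1+2\alpha)(1-\alpha) + \frac{V_{DD}}{4}(3-2\alpha)\alpha\right] \\
&= \frac{G_0V_{DD}^2}{4}\left[(1+\alpha-2\alpha^2) + (3\alpha-2\alpha^2)\right] = G_0V_{DD}^2\,\frac{1+4\alpha-4\alpha^2}{4}.
\end{aligned}$$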

The same derivation can be applied to the  $P_{EQ}$  evaluation in Fig. 5(b), which will become,

$$P_{EQ,SST(b)} = G_0 V_{DD}^2 \frac{17 + 4\alpha - 4\alpha^2}{36} \tag{9}$$

The signaling power without FFE is simply $P_{DC,SST} = 13G_0V_{DD}^2/18$. The circuit topology varies with respect to the input data sequence. If the PAM-4 symbol levels appear with equal probabilities and FFE is active in half of the cases, the total power consumption $P_{EQ,SST}$ becomes,

$$P_{EQ,SST} = \frac{1}{4} \left( P_{EQ,SST(a)} + P_{EQ,SST(b)} + P_{DC,SST} \right) = G_0 V_{DD}^2 \frac{13 + 10\alpha - 10\alpha^2}{36} \tag{10}$$



Fig. 6. Conceptual equivalent circuit of the SCH PAM-4 driver with FFE (a) Data pattern (0,1) or (1,0). (b) Data pattern (0,0) or (1,1).



Fig. 7. Simulated and calculated driver power consumption as a function of the equalization coefficient  $\alpha$ .

2) SCH PAM-4 Driver: The foregoing  $P_{EQ}$  analysis can be extended to the proposed SCH PAM-4 driver, with the corresponding equivalent circuit illustrated in Fig. 6. Because of the conductance  $G_0/3$  in the SST branch, the Thévenin equivalent  $G_{eq}$  is equal to  $G_0/6$ . With the shunt resistor  $G_0/3$  between the differential output and the load conductance  $G_L = 5G_0/6$ , the differential output will be,

$$V_{OUT} = V_{eq}\frac{G_{eq}}{G_{eq} + G_L} = \frac{V_{eq}}{6} \tag{11}$$

From Fig. 6(a), the $P_{EQ}$ of the SCH driver with the symbol levels (0,1) or (1,0) is,

$$P_{EQ,SCH(a)} = G_0 V_{DD}^2 \frac{5 + 4\alpha - 4\alpha^2}{36} \tag{12}$$

and from Fig. 6(b), with a constant current in the CML branch of $G_0V_{DD}/3$, the $P_{EQ}$ of the SCH driver with the symbol levels (0,0) or (1,1) will be,

$$P_{EQ,SCH(b)} = G_0 V_{DD}^2 \frac{3 + 10\alpha - 8\alpha^2}{36} + \frac{G_0 V_{DD}^2}{3} \tag{13}$$

The signaling power without FFE is $P_{DC,SCH} = 10G_0V_{DD}^2/18$, leading to a total FFE power consumption of the SCH PAM-4 driver $P_{EQ,SCH}$,

$$P_{EQ,SCH} = \frac{1}{4} \left( P_{EQ,SCH(a)} + P_{EQ,SCH(b)} + P_{DC,SCH} \right) = G_0 V_{DD}^2 \frac{20 + 7\alpha - 6\alpha^2}{72} \tag{14}$$

We carried out simulations with 20-Gb/s input data streams at a 0.9-V $V_{DD}$. Fig. 7 plots the derived formula for $P_{EQ}$, which is consistent with the simulated power consumption as a function of the equalization coefficient. When increasing the FFE settings, $P_{EQ}$ is held back for the proposed SCH PAM-4 driver. As an example, with a tap coefficient $\alpha$ of 0.25 (6-dB equalization), our SCH driver and its corresponding hybrid-path FFE scheme together can save up to 28% of $P_{EQ}$ when compared with the conventional dual-SST PAM-4 driver.
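The closed-form expressions (10) and (14) can be compared numerically. The sketch below evaluates both in units of $G_0V_{DD}^2$ with exact fractions and also rebuilds (14) from (12), (13) and $P_{DC,SCH}$ as a consistency check; the function names are hypothetical.

```python
from fractions import Fraction


def p_eq_dual_sst(alpha):
    """Eq. (10), in units of G_0 * V_DD^2."""
    return (13 + 10 * alpha - 10 * alpha**2) / Fraction(36)


def p_eq_sch(alpha):
    """Eq. (14), in units of G_0 * V_DD^2."""
    return (20 + 7 * alpha - 6 * alpha**2) / Fraction(72)


def p_eq_sch_from_parts(alpha):
    """Average of Eq. (12), Eq. (13) and P_DC,SCH, weighted as in Eq. (14)."""
    part_a = (5 + 4 * alpha - 4 * alpha**2) / Fraction(36)
    part_b = (3 + 10 * alpha - 8 * alpha**2) / Fraction(36) + Fraction(1, 3)
    p_dc = Fraction(10, 18)
    return (part_a + part_b + p_dc) / 4


def saving(alpha):
    """Fractional P_EQ saving of the SCH driver versus the dual-SST driver."""
    return 1 - p_eq_sch(alpha) / p_eq_dual_sst(alpha)
```

Evaluating `saving(Fraction(1, 4))` gives 67/238, about 28%, and `saving(Fraction(0))` gives 3/13, about 23%, in line with the figures quoted in the text.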

#### IV. PAM-4 TX IMPLEMENTATION

Fig. 8 depicts the block diagram of the proposed PAM-4 TX. A resistor-feedback AC-coupled inverter converts the incoming differential clocks into rail-to-rail CMOS levels. We insert cross-coupled inverters between the differential clock buffer chains to sharpen the clock transition edges. Since the half-rate TX is sensitive to clock duty-cycle distortion, the clock distribution for data serialization is aided by duty-cycle correction (DCC) circuits. Based on the hybrid-path FFE scheme, we employ a 3-tap FFE with the proposed SCH PAM-4 driver for channel loss compensation. The shunt resistors $(2 \times 1.5R_L)$ together with the SST resistance $(3R_L)$ provide back-resistance termination, and the shunt resistors are binary-weighted. We detail the circuit implementation of the key building blocks below.

#### A. SCH PAM-4 Driver

Fig. 9 details the proposed SCH PAM-4 driver segmented into 50 sub-units to support the 3-tap FFE with a tap resolution of $\sim 0.02$. Each pre/post tap consists of 15 segments (i.e. equalization is up to 8 dB), with the rest dedicated to the main tap. The segments in the pre/post tap are binary-weighted and the tap-control word $C_i[0:3]$ routed to the multiplexers is for FFE tap-coefficient adjustment. The SST segments dedicated to the main tap are programmable from 12 to 27 segments through the binary-weighted impedance control word C<sub>imp</sub>[0:3], which can be slightly altered for impedance calibration. The NAND and NOR logic gates preceding the SST branch determine whether a segment is active or in the high-impedance mode, which provides the segment tunability [6]. Taking a zoomed-in view of a pre-tap segment as an example, our SCH driver core consists of a separate SST branch and CML branch. For a 0.9-V V<sub>DD</sub>, we set the total current in the CML branch to 6 mA for four equally-spaced symbol levels. On the other hand, we set the current source in the relevant segment to $m \times 120 \ \mu A$, and the impedance of the relevant SST branch to $(50/m) \times 3R_L$, where m denotes the weight of the segment. To realize the tap coefficient adjustment, a set of 2:1 multiplexers switched by the tap-control bit $C_{-1}[0]$ selects whether the main-tap data $(M_0/H_0)$ or the inverted pre-tap data $(M_{-1}/H_{-1})$ is fed to the SCH driver. The binary-weighted post-tap segments are configured in the same way with the input data $M_0/H_0$, $M_1/H_1$ and tap-control word $C_1[0:3]$.
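The segment sizing quoted above follows from $I_{inj} = V_{DD}/3R_L$ and the 50-unit segmentation. The snippet below reproduces that arithmetic as a sanity check; the constant and function names are hypothetical, and $R_L$ = 50 $\Omega$ is taken from the Introduction.

```python
V_DD = 0.9          # supply voltage in volts (from the text)
R_L = 50.0          # channel impedance in ohms (from the text)
N_SEGMENTS = 50     # number of SCH sub-units (from the text)
TAP_WEIGHTS = [1, 2, 4, 8]  # binary weights behind a 4-bit tap-control word


def total_cml_current():
    """I_inj = V_DD / (3 * R_L) for four equally spaced symbol levels."""
    return V_DD / (3 * R_L)


def segment_current(m):
    """Current source of a segment with weight m."""
    return m * total_cml_current() / N_SEGMENTS


def segment_sst_impedance(m):
    """SST impedance of a segment with weight m: (50 / m) * 3 * R_L."""
    return (N_SEGMENTS / m) * 3 * R_L


if __name__ == "__main__":
    print("total CML current (mA):", total_cml_current() * 1e3)
    print("unit segment current (uA):", segment_current(1) * 1e6)
    print("unit SST impedance (ohm):", segment_sst_impedance(1))
    print("segments per pre/post tap:", sum(TAP_WEIGHTS))
    print("tap resolution:", 1 / N_SEGMENTS)
```

With all 50 unit segments in parallel, the SST branch presents $3R_L$ = 150 $\Omega$, and the four binary weights add up to the 15 segments available to each pre/post tap.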

Fig. 8. Block diagram of the proposed hybrid-mode PAM-4 TX.

Fig. 9. Implementation details of the SCH driver with 3-tap FFE.

The SST branch is configured with a shared termination resistor to halve the parasitic capacitance $C_T$ at the termination node. Less parasitic capacitance brings not only lower SST switching power $P_{sw}$, but also a shorter transition time, yielding a larger eye opening. The impedance of the switching FET $R_{on}$ is more susceptible to PVT variations than the termination resistor $R_T$. The return-loss variation across PVT grows with large $R_{on}$. Yet, a large switching FET with small $R_{on}$ increases the pre-driver loading. There is thus a trade-off between the return-loss variation and the pre-driver power consumption. If driving a load capacitance of $C_0$, the power dissipation of the pre-driver chain will become,

$$P_{pre-driver} = \sum_{n=0}^{\infty} \frac{fC_0 V_{DD}^2}{k^n} = \frac{k}{k-1} fC_0 V_{DD}^2 \qquad (15)$$



Fig. 10. Simulated return loss variation and pre-driver power consumption as a function of  $R_T/R_{\rm on}.$ 

where k denotes the fan-out of the pre-driver chain and f is the clock frequency. A fan-out of 4 (FO4) is common to enhance the energy efficiency, but suffers from jitter amplification at high data rates due to the limited bandwidth [28]. A fan-out less than 2 is a better choice here for high-speed data transmission. To gain more insight, Fig. 10 presents the simulated return-loss variation and pre-driver power consumption as a function of $R_T/R_{on}$. The simulation results indicate that $R_T/R_{on} > 4$ is recommended to keep the return-loss variation small (<5%) [11].
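The geometric series in (15) makes the power cost of a small fan-out explicit. The sketch below evaluates the multiplier $k/(k-1)$ on $fC_0V_{DD}^2$ and compares it with a finite partial sum; the function names are hypothetical and the infinite-chain idealization is that of (15).

```python
def predriver_factor(k):
    """Closed form of sum_{n>=0} k**(-n) = k / (k - 1), valid for k > 1."""
    if k <= 1:
        raise ValueError("fan-out must exceed 1 for the series to converge")
    return k / (k - 1)


def predriver_factor_partial(k, stages):
    """Partial sum of the series over a finite number of pre-driver stages."""
    return sum(k ** (-n) for n in range(stages))


if __name__ == "__main__":
    for fan_out in (4, 2, 1.5):
        print(fan_out, predriver_factor(fan_out))
```

Moving from a fan-out of 4 to a fan-out of 2 raises the multiplier from 4/3 to 2, and a fan-out of 1.5 raises it to 3, which illustrates why the bandwidth gained from a small fan-out comes at the expense of pre-driver power.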

The linearity of the PAM-4 TX is characterized by the RLM. A dual-SST PAM-4 driver can slightly alter the MSB:LSB ratio from a fixed 2:1 to (2-x):(1-y) for RLM improvement, which leads to complicated segment control logic [14]. Our SCH PAM-4 driver generates the symbol levels in two ways: the symbol levels $\pm V_{DD}/6$ for the middle eye and the symbol levels $\pm (V_{DD}/6 + R_L I_{inj})$ for the top/bottom eyes. The latter allows RLM improvement by optimizing I<sub>inj</sub> in the CML branch. For example, to account for 5% RLM variation, $I_{inj}$ should cover $37V_{DD}/120R_L$ to $43V_{DD}/120R_L$. Using a bias circuit that generates I<sub>inj</sub> proportional to V<sub>DD</sub>, the RLM is kept relatively constant with respect to V<sub>DD</sub> variation. The data-dependent drain voltage of the current mirror causes current variation in the CML branch, which leads to RLM distortion. Care must therefore be taken in the implementation of the current source. To ensure accurate current replication with limited voltage headroom, a low-voltage cascode current mirror is employed [29]. From simulations, the current variation in a CML segment is <2.5% across process corners.
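The dependence of the RLM on $I_{inj}$ can be illustrated with the ideal symbol levels given above. The sketch below is an idealized model that ignores driver nonlinearity; it expresses $I_{inj}$ in units of $V_{DD}/R_L$, and the function name is hypothetical.

```python
from fractions import Fraction


def rlm(i_inj_norm):
    """RLM = 3 * V_min / V_pp for the ideal SCH symbol levels.

    i_inj_norm: I_inj in units of V_DD / R_L. The levels, in units of V_DD,
    are +/-1/6 and +/-(1/6 + i_inj_norm), as given in the text.
    """
    inner = Fraction(1, 6)
    outer = inner + i_inj_norm
    levels = [-outer, -inner, inner, outer]
    eyes = [b - a for a, b in zip(levels, levels[1:])]  # three eye heights
    return 3 * min(eyes) / (levels[-1] - levels[0])
```

At the nominal $I_{inj} = V_{DD}/3R_L$ the three eyes are equal and the RLM is 1. At the two ends of the range quoted in the text, $37V_{DD}/120R_L$ and $43V_{DD}/120R_L$, this ideal model gives 37/38 and 20/21, i.e. an RLM between roughly 95% and 97%.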

# B. TX Data Path

The TX data path includes an on-chip $2^7-1$ PRBS generator providing 8-bit data streams in which the odd bits D<sub>O</sub> serve as LSB data, and the even bits D<sub>E</sub> as MSB data. The parallel 4-bit input data in each of the two data paths undergo 4:2 serialization, followed by a 3-tap generator (pre, main and post). Generating symbol-spaced data for the FFE at full rate incurs a larger power penalty, especially in slow process corners. As shown in Fig. 11, after retiming, the six half-rate data streams for the pre, main and post taps are fed to the 2-to-1 MUX clocked by the DCD-free half-rate clock to generate the 3-tap full-rate FFE data. We optimize the clock path to ensure the sampling time at the center of



Fig. 11. 3-tap FFE generator with a half-rate 2:1 MUX.



Fig. 12. Simulated eye diagrams at different  $\Delta T$  (a) With 10-ps delay. (b) With 39-ps delay. (c) With 23-ps delay correction and (d) Eye misalignment versus delay.

the incoming data eye to minimize ISI. In the hybrid PAM-4 solution, the full-rate data streams arrive at the SCH driver core via two paths. The 1-UI-spaced LSB and MSB data streams are fed to the hybrid PAM-4 FFE encoder to generate the hybrid data H<sub>i</sub>, which controls the CML branch and co-generates the top/bottom eyes with the SST branch. The filtered MSB data is routed to the SST branch for the middle eye. A concern might arise about the timing mismatch between the two paths, which may cause misalignment between the top/bottom eyes and the middle eye. We have optimized the delay of the pre-drivers between the tap generator and the SST branch to match the timing of the PAM-4 FFE encoder. Under PVT variation, the delays of the two paths vary simultaneously in the same direction, and no eye misalignment is observed in either simulation or measurement, suggesting that no background adjustment is needed. We intentionally swept the pre-driver delay to assess the possible eye misalignment. As depicted in Fig. 12(a) and (b), the simulation results show that a 10-ps (39-ps) delay results in a 10% (15%) eye misalignment. In Fig. 12(c), a 23-ps delay eliminates the eye misalignment. A bathtub curve can be simulated to evaluate the eye misalignment versus the delay setting. From Fig. 12(d), the simulation result suggests that a negligible eye misalignment (<1%) is achieved with a delay



Fig. 13. (a) Implementation of the T-coil. (b) Simulated insertion gain with and without the output network.

between 15 and 28 ps. Further simulations have been carried out to evaluate the pre-driver delay over PVT variations. In the FF (SS) corner, the delay is about 19.8 ps (26.9 ps). Monte-Carlo simulations with >1000 runs show that the $\pm 3\sigma$ delay lies between 22.3 and 24.7 ps.
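The timing margin implied by these numbers can be tabulated with a short Python sketch. It simply compares the quoted corner and Monte-Carlo delays against the 15-to-28-ps window from Fig. 12(d); the helper name `margin_ps` and the choice of reporting the distance to the nearest window edge are illustrative assumptions.

```python
# Delay window with <1% eye misalignment, as read from Fig. 12(d).
WINDOW_PS = (15.0, 28.0)


def margin_ps(delay_ps: float, window=WINDOW_PS) -> float:
    """Distance in ps to the nearest window edge; negative if outside the window."""
    lo, hi = window
    return min(delay_ps - lo, hi - delay_ps)


# Delays quoted in the text (ps).
cases = {
    "nominal, Fig. 12(c)": 23.0,
    "FF corner": 19.8,
    "SS corner": 26.9,
    "-3 sigma": 22.3,
    "+3 sigma": 24.7,
}

for name, delay in cases.items():
    print(f"{name:>20s}: delay = {delay:4.1f} ps, margin = {margin_ps(delay):.1f} ps")
```

All quoted delays fall inside the window, with the SS corner being the closest to an edge (about 1.1 ps from the 28-ps limit), whereas the 10-ps and 39-ps sweep points of Fig. 12(a) and (b) fall outside it.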

## C. Output Matching Network

The large parasitic capacitances attributed to the SCH segments (~100 fF), the ESD protection (~100 fF) and the pad (~80 fF) impair the TX bandwidth. The asymmetric T-coil has been widely used to extend the output bandwidth and ensure broadband impedance matching in high-speed SerDes TXs [8], [10], [11]. Fig. 13(a) details our T-coil, which occupies a footprint of 40 × 40 $\mu$m<sup>2</sup> with inductances L<sub>p</sub> and L<sub>s</sub> of 200 and 180 pH, respectively. The self-resonant frequency (SRF) is >60 GHz, and the ESR is absorbed as part of the SST termination resistance and provides ESD protection. From simulation, the T-coil extends the bandwidth by 2.3x, from 10.6 to 24.7 GHz [Fig. 13(b)].

# V. EXPERIMENTAL RESULTS

The proposed PAM-4 TX prototyped in 28-nm CMOS occupies a compact die area of 0.0345 mm<sup>2</sup> [Fig. 14(a)]. We conducted all measurements at a 0.9-V V<sub>DD</sub> with the test setup depicted in Fig. 14(b). We captured the eye diagrams with the Keysight MSOV134A oscilloscope, and measured the return loss with the Keysight N5247A network analyzer. A LabJack is used for chip programming through an on-chip SPI interface. The prototype is mounted on a PCB and characterized on a high-speed probe station. The TX is first tested at 36 Gb/s, where the insertion loss of the test setup, associated with the probe, cable, connectors and DC block, is 3.4 dB at the 9-GHz Nyquist frequency (Fig. 8). To compensate for it, the FFE is enabled with the equalization tap coefficients [$\alpha_{-1} = -0.04$, $\alpha_0 = 0.84$, $\alpha_1 = -0.12$]. Fig. 15(a) shows the measured eye diagram with FFE. The TX delivers a PAM-4 eye amplitude of $\sim 600$ mV, which corresponds to a maximum of $0.9V_{ppd}$ when the FFE is disabled. The PAM-4 data eye is $\sim 150$ mV-vertical and $\sim 0.52$ UI-horizontal. At this data rate, the chip draws 17.9 mW and the SCH driver core dissipates 6.12 mW. When the data rate is pushed to 40 Gb/s, the PAM-4 eye opening is still 60 mV-vertical and 0.32 UI-horizontal with FFE [Fig. 15(b)]. Table III outlines the TX power breakdown. In the TX data



Fig. 14. (a) Chip photograph of the fabricated PAM-4 TX in 28-nm CMOS. (b) Test setup.



Fig. 15. Measured TX output eye diagrams. (a) 36 Gb/s with FFE. (b) 40 Gb/s with FFE. (c) 36 Gb/s with channel, without FFE. (d) 36 Gb/s with channel, with FFE.

path, the FFE encoder and pre-driver contribute low power consumption of 0.9 and 1.1 mW, respectively. With the FFE enabled, the TX draws 19.5 mW at 40 Gb/s, corresponding to a 0.49 pJ/bit energy efficiency.
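The energy-efficiency and Nyquist-frequency figures quoted above follow from simple arithmetic, sketched below in Python. The function names are hypothetical; the only inputs are the measured powers and data rates reported in the text, plus the fact that PAM-4 carries 2 bits per symbol.

```python
def nyquist_ghz(data_rate_gbps: float, bits_per_symbol: int = 2) -> float:
    """Nyquist frequency in GHz: half the symbol rate (PAM-4 carries 2 bits/symbol)."""
    baud_gbd = data_rate_gbps / bits_per_symbol
    return baud_gbd / 2.0


def energy_pj_per_bit(power_mw: float, data_rate_gbps: float) -> float:
    """Energy per bit in pJ: mW / (Gb/s) = 1e-3 W / 1e9 b/s = 1e-12 J/bit."""
    return power_mw / data_rate_gbps


# Measured operating points: (data rate in Gb/s, TX power in mW, driver power in mW).
for rate, p_tx, p_drv in ((36, 17.9, 6.12), (40, 19.5, 6.4)):
    print(
        f"{rate} Gb/s: Nyquist = {nyquist_ghz(rate):.0f} GHz, "
        f"TX = {energy_pj_per_bit(p_tx, rate):.2f} pJ/bit, "
        f"driver = {energy_pj_per_bit(p_drv, rate):.2f} pJ/bit"
    )
```

At 40 Gb/s this reproduces the 0.49 pJ/bit (TX) and 0.16 pJ/bit (SCH driver) entries, and at 36 Gb/s the 9-GHz Nyquist frequency used for the loss measurement.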

To assess the equalization capability, our TX drives a 36-inch test cable, and the corresponding measured channel loss at the 9-GHz Nyquist frequency is 6.9 dB. Fig. 15(c) shows the eye diagram captured after the channel with the FFE turned off, where the eye is completely closed. As depicted in Fig. 15(d), after applying the FFE setting [$\alpha_{-1} = -0.08$, $\alpha_0 = 0.74$, $\alpha_1 = -0.18$], the eye is reopened to 68 mV-vertical and 0.38 UI-horizontal.
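To relate the tap settings to the channel loss, the Python sketch below treats the 3-tap FFE as a symbol-spaced FIR filter and compares its gain at the Nyquist frequency with its gain at DC. This is the textbook FIR view and is an assumption made here for illustration; the function names are hypothetical and the sketch does not model the hybrid-path implementation.

```python
import math


def ffe_response(pre: float, main: float, post: float):
    """Return (|H| at DC, |H| at Nyquist) for y[n] = pre*x[n+1] + main*x[n] + post*x[n-1]."""
    dc = abs(pre + main + post)      # all symbols equal
    nyq = abs(-pre + main - post)    # symbols alternate in sign every UI
    return dc, nyq


def ffe_boost_db(pre: float, main: float, post: float) -> float:
    """High-frequency boost of the FFE in dB (Nyquist gain relative to DC gain)."""
    dc, nyq = ffe_response(pre, main, post)
    return 20.0 * math.log10(nyq / dc)


def apply_ffe(symbols, pre: float, main: float, post: float):
    """Apply the 3-tap FFE to a symbol sequence, zero-padded at both ends."""
    padded = [0.0] + list(symbols) + [0.0]
    return [
        pre * padded[n + 1] + main * padded[n] + post * padded[n - 1]
        for n in range(1, len(padded) - 1)
    ]


# Tap settings quoted in the text: test setup only, and with the 36-inch cable.
for name, taps in (("setup only", (-0.04, 0.84, -0.12)), ("36-inch cable", (-0.08, 0.74, -0.18))):
    print(f"{name}: sum|taps| = {sum(abs(t) for t in taps):.2f}, boost = {ffe_boost_db(*taps):.2f} dB")
```

Under this simplified view, both settings are peak-normalized (the absolute tap values sum to 1), and the boosts evaluate to about 3.35 dB and 6.4 dB, close to the measured losses of 3.4 dB and 6.9 dB at 9 GHz.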

Fig. 16 shows the measured return loss at the TX differential output, which is <-10 dB up to 50 GHz. The asymmetric T-coil

|                              | This Work                |        | [10]                 | [11]           | [12]                  | [13]         | [30]         |
|------------------------------|--------------------------|--------|----------------------|----------------|-----------------------|--------------|--------------|
| CMOS Technology              | 28nm<br>Bulk             |        | 28nm<br>SOI          | 14nm<br>FinFET | 14nm<br>FinFET        | 65nm<br>Bulk | 65nm<br>Bulk |
| PAM-4 Output Driver Topology | SST-CML-<br>Hybrid (SCH) |        | SST                  | SST            | SST                   | SST          | LVDS         |
| Output Swing w/o FFE (V)     | 0.9                      |        | 1.3                  | 0.9            | 0.9                   | 1.2          | N/A          |
| TX FFE                       | 3-tap                    |        | 4-tap                | No EQ          | 3-tap                 | 2-tap        | 2-tap        |
| Data Rate (Gb/s)             | 36                       | 40     | 45                   | 16-40          | 56                    | 25           | 32           |
| Loss @ Nyquist (dB)          | 3.4                      | 3.6    | 6                    | 4              | 6                     | 6.8          | 6            |
| Vertical Eye (mV)            | 150                      | 60     | 80 <sup>&</sup>      | 61 <sup>#</sup>   | 65 <sup>&</sup>       | 25           | 78           |
| Horizontal Eye (UI)          | 0.52                     | 0.32   | 0.4 <sup>&</sup>     | 0.65 <sup>#</sup> | 0.45 <sup>&</sup>     | 0.4          | 0.6          |
| Power <sup>*</sup> (mW)      | 17.9                     | 19.5 <sup>^</sup> | 120       | 167.5          | 100.7                 | 50           | 53           |
| Energy Eff. (pJ/bit)         | 0.5                      | 0.49   | 2.6                  | 4.19           | 1.8                   | 2            | 1.66         |
| Driver Power (mW)            | 6.12                     | 6.4    | 50                   | N/A            | 15.2                  | N/A          | 12           |
| Driver Eff. (pJ/bit)         | 0.17                     | 0.16   | 1.11                 | N/A            | 0.27                  | N/A          | 0.37         |
| RLM                          | 98%                      |        | >94%                 | N/A            | N/A                   | N/A          | N/A          |
| Die Area (mm²)               | 0.0345                   |        | 0.28                 | 0.0279         | 0.035                 | 0.083        | 0.028        |

TABLE III TRANSMITTER PERFORMANCE COMPARISON

<sup>\*</sup> Excluding the PLL power. <sup>&</sup> Estimated from the measurement result. <sup>#</sup> With software-based CTLE at scope. <sup>^</sup> Power breakdown (SST: 17.1%; CML: 15.9%; Clock distribution & DCC: 20.4%; TX data path & FIR: 46.6%).



Fig. 16. Measured TX differential return loss.



Fig. 17. RLM measurement results at 40 Gb/s.

ensures broadband impedance matching. The result meets the CEI-56G-PAM-4 standard with adequate margin. Fig. 17 plots the measured PAM-4 TX RLM at 40 Gb/s, in which the output contains 10 symbols, each lasting 16 UI [1]. With a constant current bias for the CML branches, the measured RLM without FFE is 98%, satisfying the specification of 92%. Although the top and bottom eyes are bound together, since $I_{inj}$ in the CML driver generates them simultaneously, no asymmetric eyes are measured, as the differential outputs can cancel the possible nonideal reflection and settling issues.

Table III summarizes the measurement results and compares this work with the prior art. The PLL power consumption has been excluded. The proposed SCH driver has achieved an improved energy efficiency (0.16 pJ/bit) under a similar output swing [11]–[13], [30]. Regardless of the current injection, and after normalizing to the same swing [10], our SCH driver is still favored for its lower power consumption due to the reduction of  $P_{sig}$ ,  $P_{sw}$  and  $P_{EQ}$ .
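The power breakdown given in the Table III footnote can be cross-checked against the reported driver power with a few lines of Python. The dictionary keys and the helper name `block_power_mw` are illustrative; the percentages and the 19.5-mW total at 40 Gb/s are taken from the table.

```python
TOTAL_MW = 19.5  # TX power at 40 Gb/s, excluding the PLL

# Percentage breakdown from the Table III footnote.
BREAKDOWN_PCT = {
    "SST": 17.1,
    "CML": 15.9,
    "Clock distribution & DCC": 20.4,
    "TX data path & FIR": 46.6,
}


def block_power_mw(total_mw: float, breakdown_pct: dict) -> dict:
    """Convert a percentage breakdown into absolute power per block (mW)."""
    return {name: total_mw * pct / 100.0 for name, pct in breakdown_pct.items()}


powers = block_power_mw(TOTAL_MW, BREAKDOWN_PCT)
for name, p in powers.items():
    print(f"{name:>26s}: {p:.2f} mW")

# The SCH driver is the sum of its SST and CML branches.
driver_mw = powers["SST"] + powers["CML"]
print(f"SCH driver (SST + CML): {driver_mw:.2f} mW")
```

The SST and CML shares together amount to about 6.4 mW, consistent with the driver power listed for 40 Gb/s.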

Since this work mainly focuses on the validation of the SCH PAM-4 driver, the data serialization ratio is limited to 8, which is less than the 16 in [11] and the 32 in [12]. Thus, the relaxed clock-load effect leads to lower power dissipation in the clock distribution and data path. With the improved energy efficiency of our SCH PAM-4 driver, the entire TX reaches an energy efficiency of 0.5 pJ/bit at 40 Gb/s. Although the data rate of our TX prototype does not meet the speed requirement of the CEI-56G standards, our speed limit is not
dominated by our SCH PAM-4 driver topology. Simulations suggest that the data rate of our TX can be further improved by optimizing the fan-out of the pre-driver and widening the output bandwidth.

# VI. CONCLUSIONS

The typical SST driver features low signaling power and high linearity, but has the drawbacks of substantial SST switching power and equalization power. The typical CML driver has the benefits of no switching power and constant power consumption with the FFE embedded. Our proposed SCH driver merges the merits of the SST and CML drivers to realize an energy-efficient PAM-4 TX. Thanks to the SCH driver, the PAM-4 signaling power is reduced by 23%, mitigating the large percentage of the SST switching power. Together with our hybrid-path FFE scheme, the equalization power does not rise significantly with large FFE settings. Validated in 28-nm CMOS and clocked at 40 Gb/s, our SCH driver with a 3-tap FFE achieves an energy efficiency of 0.16 pJ/bit, and the entire TX exhibits an energy efficiency of 0.5 pJ/bit.

#### REFERENCES

- [1] IEEE P802.3bs 400 GbE Task Force. Accessed: Mar. 2015. [Online]. Available: http://www.ieee802.org/3/bs/
- [2] H. Hatamkhani, K.-L. J. Wong, R. Drost, and C.-K. K. Yang, "A 10-mW 3.6-Gbps I/O transmitter," in *Symp. VLSI Circuits Dig. Tech. Papers* (VLSI), Jun. 2003, pp. 97–98.
- [3] W. D. Dettloff *et al.*, "A 32 mW 7.4 Gb/s protocol-agile source-series-terminated transmitter in 45 nm CMOS SOI," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2010, pp. 370–371.
- [4] Y. Lu, K. Jung, Y. Hidaka, and E. Alon, "Design and analysis of energy-efficient reconfigurable pre-emphasis voltage-mode transmitter," *IEEE J. Solid-State Circuits*, vol. 48, no. 8, pp. 1898–1909, Aug. 2013.
- [5] N. Kocaman et al., "A 3.8 mW/Gbps quad-channel 8.5–13 Gbps serial link with a 5 tap DFE and a 4 tap transmit FFE in 28 nm CMOS," *IEEE J. Solid-State Circuits*, vol. 51, no. 4, pp. 881–892, Apr. 2016.
- [6] K. L. Chan et al., "A 32.75-Gb/s voltage-mode transmitter with three-tap FFE in 16-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 10, pp. 2663–2678, Oct. 2017.
- [7] Y.-H. Song and S. Palermo, "A 6-Gbit/s hybrid voltage-mode transmitter with current-mode equalization in 90-nm CMOS," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 59, no. 8, pp. 491–495, Aug. 2012.
- [8] M. Kossel et al., "A T-coil-enhanced 8.5 Gb/s high-swing SST transmitter in 65 nm bulk CMOS with <-16 dB return loss over 10 GHz bandwidth," *IEEE J. Solid-State Circuits*, vol. 43, no. 12, pp. 2905–2920, Dec. 2008.
- [9] S. Yuan, L. Wu, Z. Wang, X. Zheng, C. Zhang, and Z. Wang, "A 70 mW 25 Gb/s quarter-rate SerDes transmitter and receiver chipset with 40 dB of equalization in 65 nm CMOS technology," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 63, no. 7, pp. 939–949, Jul. 2016.
- [10] M. Bassi, F. Radice, M. Bruccoleri, S. Erba, and A. Mazzanti, "A high-swing 45 Gb/s hybrid voltage and current-mode PAM-4 transmitter in 28 nm CMOS FDSOI," *IEEE J. Solid-State Circuits*, vol. 51, no. 11, pp. 2702–2715, Nov. 2016.
- [11] J. Kim et al., "A 16-to-40 Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [12] T. O. Dickson, H. A. Ainspan, and M. Meghelli, "A 1.8 pJ/b 56 Gb/s PAM-4 transmitter with fractionally spaced FFE in 14 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2017, pp. 118–119.
- [13] B. Hu, Y. Du, R. Huang, J. Lee, Y.-K. Chen, and M.-C. F. Chang, "A capacitor-DAC-based technique for pre-emphasis-enabled multilevel transmitters," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 64, no. 9, pp. 1012–1016, Sep. 2017.
- [14] P. Upadhyaya et al., "A fully adaptive 19-to-56 Gb/s PAM-4 wireline transceiver with a configurable ADC in 16 nm FinFET," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2018, pp. 108–110.

- [15] S. Kim, Y. Jeong, M. Lee, K.-W. Kwon, and J.-H. Chun, "A 5.2-Gb/s low-swing voltage-mode transmitter with an AC-/DC-coupled equalizer and a voltage offset generator," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 1, pp. 213–225, Jan. 2014.
- [16] A. Ramachandran, A. Natarajan, and T. Anand, "Line coding techniques for channel equalization: Integrated pulse-width modulation and consecutive digit chopping," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 3, pp. 1192–1204, Mar. 2019.
- [17] B. Song, K. Kim, J. Lee, J. Chung, Y. Choi, and J. Burm, "A 13.5-mW 10-Gb/s 4-PAM serial link transmitter in 0.13-μm CMOS technology," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 61, no. 9, pp. 646–650, Sep. 2014.
- [18] S. Saxena, R. K. Nandwana, and P. K. Hanumolu, "A 5 Gb/s energy-efficient voltage-mode transmitter using time-based de-emphasis," *IEEE J. Solid-State Circuits*, vol. 49, no. 8, pp. 1827–1836, Aug. 2014.
- [19] A. Nazemi et al., "A 36 Gb/s PAM4 transmitter using an 8b 18 GS/s DAC in 28 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [20] Y. Frans et al., "A 56-Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 52, no. 4, pp. 1101–1110, Apr. 2017.
- [21] J. Lee, P.-C. Chiang, P.-J. Peng, L.-Y. Chen, and C.-C. Weng, "Design of 56 Gb/s NRZ and PAM4 SerDes transceivers in CMOS technologies," *IEEE J. Solid-State Circuits*, vol. 50, no. 9, pp. 2061–2073, Sep. 2015.
- [22] H. Cheng, F. A. Musa, and A. C. Carusone, "A 32/16-Gb/s dual-mode pulsewidth modulation pre-emphasis (PWM-PE) transmitter with 30-dB loss compensation using a high-speed CML design methodology," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 8, pp. 1794–1806, Aug. 2009.
- [23] J. Lee, M.-S. Chen, and H.-D. Wang, "Design and comparison of three 20-Gb/s backplane transceivers for duobinary, PAM4, and NRZ data," *IEEE J. Solid-State Circuits*, vol. 43, no. 9, pp. 2120–2133, Sep. 2008.
- [24] Y. Chang, A. Manian, L. Kong, and B. Razavi, "An 80-Gb/s 44-mW wireline PAM4 transmitter," *IEEE J. Solid-State Circuits*, vol. 53, no. 8, pp. 2214–2226, Aug. 2018.
- [25] K. Huang, Z. Wang, X. Zheng, C. Zhang, and Z. Wang, "A 80 mW 40 Gb/s transmitter with automatic serializing time window search and 2-tap pre-emphasis in 65 nm CMOS technology," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 5, pp. 1441–1450, May 2015.
- [26] H. Wang and J. Lee, "A 21-Gb/s 87-mW transceiver with FFE/DFE/analog equalizer in 65-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 4, pp. 909–920, Apr. 2010.
- [27] H. Zhang, B. Jiao, Y. Liao, and G. Zhang, "PAM-4 signaling for 56G serial link applications—A tutorial," DesignCon, Santa Clara, CA, USA, Tech. Rep., 2016.
- [28] B. Casper and F. O'Mahony, "Clocking analysis, implementation and measurement techniques for high-speed data links—A tutorial," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 56, no. 1, pp. 17–39, Jan. 2009.
- [29] O. Charlon and W. Redman-White, "Ultra high-compliance CMOS current mirrors for low voltage charge pumps and references," in *Proc. IEEE Eur. Solid-State Circuits Int. Conf. (ESSCIRC)*, Sep. 2004, pp. 227–230.
- [30] L. Tang, W. Gai, L. Shi, X. Xiang, K. Sheng, and A. He, "A 32 Gb/s 133 mW PAM-4 transceiver with DFE based on adaptive clock phase and threshold voltage in 65 nm CMOS," in *IEEE ISSCC Dig. Tech. Papers*, Feb. 2018, pp. 114–116.



**Chao Fan** (S'18) received the B.Sc. and M.Sc. degrees (Hons.) in integrated circuit design and integration system from Xidian University, Xi'an, China, in 2011 and 2014, respectively. He is currently pursuing the Ph.D. degree with the Faculty of Science and Technology and the State Key Laboratory of Analog and Mixed-Signal VLSI, Department of Electrical and Computer Engineering, University of Macau, Macao, China.

From 2014 to 2015, he was an Analog IC Design Engineer with the Xi'an Microelectronic Technology
Institute, developing high-performance power management IC. His current research interests include RF/mm-wave integrated circuits and high-speed interface circuits. He was a recipient of the first prize of Academic Scholarship in 2012 and 2013, respectively.



**Wei-Han Yu** (S'09) received the B.Sc. and M.Sc. degrees in electrical and electronics engineering from the University of Macau (UM), Macao, China, in 2010 and 2012, respectively, and the Ph.D. degree from the State-Key Laboratory of Analog and Mixed-Signal VLSI and Faculty of Science and Technology, Department of Electronic and Computer Engineering, UM, in 2018. He is currently a Lecturer (Macao Fellow) in microelectronics with UM and a Visiting Scholar with Stanford University. His current research interests include RF and mm-wave transmitter, power amplifier, digital predistortion, and EM modeling for next-generation mobile communications. He was a recipient of the IEEE ISSCC Student Travel Grant Award and the FDCT Science and Technology Postgraduate Student Award in 2016, and the IEEE SSCS Predoctoral Achievement Award in 2018.



**Pui-In Mak** (S'00–M'08–SM'11–F'19) received the Ph.D. degree from the University of Macau (UM), Macao, China, in 2006.

He is currently a Full Professor with the UM Faculty of Science and Technology – ECE and an Associate Director (Research) with the UM State Key Laboratory of Analog and Mixed-Signal VLSI. His research interests are on analog and radiofrequency (RF) circuits and systems for wireless and multidisciplinary innovations.

Dr. Mak was an Editorial Board Member of IEEE Press from 2014 to 2016 and a member of Board-of-Governors of IEEE Circuits and Systems Society from 2009 to 2011. He is a Fellow of the IET. He co-received the DAC/ISSCC Student Paper Award in 2005, CASS Outstanding Young Author Award in 2010, National Scientific and Technological Progress Award in 2011, Best Associate Editor of IEEE Transactions on Circuits and Systems II from 2012 to 2013, A-SSCC Distinguished Design Award in 2015, and ISSCC Silkroad Award in 2016. He is/was the TPC Vice Co-Chair of ASP-DAC in 2016 and a TPC Member of A-SSCC from 2013 to 2016, ESSCIRC from 2016 to 2017, and ISSCC from 2017 to 2019. He has been the Chair of the Distinguished Lecturer Program of the IEEE Circuits and Systems Society in 2018. He was a Senior Editor of the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS from 2014 to 2015 and an Associate Editor of the IEEE JOURNAL OF SOLID-STATE CIRCUITS since 2018, the IEEE SOLID-STATE CIRCUITS LETTERS since 2017, the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I from 2010 to 2011 and from 2014 to 2015, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2010 to 2013. He was a Distinguished Lecturer of the IEEE Circuits and Systems Society from 2014 to 2015 and the IEEE Solid-State Circuits Society from 2017 to 2018. In 2005, he was decorated with the Honorary Title of Value for Scientific Merits by the Macau Government. He has been an Overseas Expert of the Chinese Academy of Sciences since 2018.



**Rui P. Martins** (M'88–SM'99–F'08) was born in April 1957. He received the bachelor's, master's, and Ph.D. degrees, and the Habilitation degree, for Full-Professor in Electrical Engineering and Computers, from the Department of Electrical and Computer Engineering (DECE), Instituto Superior Técnico (IST), University of Lisbon, Portugal, in 1980, 1985, 1992, and 2001, respectively.

He has been with the DECE/IST, University of Lisbon since October 1980. Since 1992, he has been on leave from the University of Lisbon and from the
DECE, Faculty of Science and Technology (FST), University of Macau (UM), Macao, China, where he is the Chair-Professor since August 2013. He was the Dean of the Faculty of FST from 1994 to 1997, and has been the Vice-Rector of UM since 1997. From September 2008 to August 2018, he was the Vice-Rector (Research) and from September 2018 to August 2023, he is the Vice-Rector (Global Affairs). Within the scope of his teaching and research activities, he has taught 21 bachelor and master courses and, in UM, he has supervised (or co-supervised) 46 theses, 25 Ph.D., and 21 Masters. He coauthored seven books and 11 book chapters, holds 33 Patents, USA (30) and Taiwan (3), published 497 papers, in scientific journals (184) and in conference proceedings (313), and other 64 academic works, in a total of 612 publications. He founded the Analog and Mixed-Signal VLSI Research Laboratory, UM, in 2003, elevated in January 2011 to the State Key Laboratory (SKLAB) of China (the 1<sup>st</sup> in Engineering in Macao), being its Founding Director. He was the Founding Chair of UMTEC (UM company) from January 2009 to March 2019, supporting the incubation and creation in 2018 of Digifluidic, the first UM Spin-Off, whose CEO is a SKLAB Ph.D. graduate. He was also a Co-Founder of The Chipidea Microelectronics, Macao (now Synopsys-Macao) in 2001/2002.

Dr. Martins was the Founding Chair of the IEEE Macau Section from 2003 to 2005 and the IEEE Macau Joint-Chapter on Circuits And Systems (CAS)/Communications (COM) from 2005 to 2008 [2009 World Chapter of the Year of IEEE CAS Society (CASS)], the General Chair of IEEE Asia-Pacific Conference on CAS - APCCAS 2008, the Vice-President (VP) of Region 10 (Asia, Australia, and Pacific) from 2009 to 2011 and the VP of World Regional Activities and Membership of IEEE CASS from 2012 to 2013, an Associate-Editor of IEEE TRANSACTIONS ON CAS II: EXPRESS BRIEFS from 2010 to 2013, and a nominated Best Associate Editor from 2012 to 2013. He was also a member of the IEEE CASS Fellow Evaluation Committee in 2013, 2014, 2018 - Chair, and 2019, the IEEE Nominating Committee of Division I Director (CASS/EDS/SSCS) in 2014, and the IEEE CASS Nominations Committee from 2016 to 2017. He was the General Chair of ACM/IEEE Asia South Pacific Design Automation Conference - ASP-DAC in 2016, receiving the IEEE Council on Electronic Design Automation (CEDA) Outstanding Service Award in 2016. He was also the Vice-President from 2005 to 2014 and the President from 2014 to 2017 of the Association of Portuguese Speaking Universities (AULP), and received two Macao Government decorations: the Medal of Professional Merit (Portuguese-1999) and the Honorary Title of Value (Chinese-2001). In July 2010, he was elected, unanimously, as Corresponding Member of the Lisbon Academy of Sciences, being the only Portuguese Academician living in Asia.