# A Hardware-Efficient Feedback Polynomial Topology for DPD Linearization of Power Amplifiers: Theory and FPGA Validation

Chak-Fong Cheang, Student Member, IEEE, Pui-In Mak, Senior Member, IEEE, and Rui P. Martins, Fellow, IEEE

Abstract—This paper describes a hardware-efficient feedback polynomial topology for digital predistortion (DPD) linearization of power amplifiers. Unlike existing pruned Volterra-series DPD linearization, which compensates the nonlinearities in parallel, our topology tailors a feedback memory block such that the nonlinearities and memory effects can be constructed separately, minimizing the running complexity while significantly reducing the size of the coefficients extractor. Yet, the coefficients of the feedback memory block cannot be extracted in the direct form. To surmount this, a design methodology is developed with the aid of the complexity-reduced Volterra-series model. Also, it is known that least-squares estimation can extract the coefficients of the digital predistorter, but its pseudo-inverse operation between the inputs and outputs involves heavy matrix multiplications and division. With a computational complexity of $O(N^3)$, the coefficients extractor can hardly be implemented efficiently in a field-programmable gate array (FPGA). Here, we propose a division-free line-search-based recursive least-squares algorithm for adaptive linear and nonlinear coefficient estimation, relaxing the computational complexity to $O(N)$ and supporting adaptive estimation in the FPGA. Our DPD experiments demonstrate both identification and predistortion procedures fully implemented in the FPGA. The measured error vector magnitude is reduced from 10.1% to <3.2%, and the adjacent channel leakage ratio (ACLR) is improved from -28.4 to -46.1 dBc, for a 20-MHz 64-QAM orthogonal frequency division multiplexing signal. For carrier-aggregation signals, the ACLR is improved from -35.8 to -45.3 dBc.

*Index Terms*—Carrier-aggregation, digital predistortion (DPD), field-programmable gate array (FPGA), identification, power amplifier (PA), recursive least square (RLS).

Manuscript received September 1, 2017; revised November 28, 2017 and December 24, 2017; accepted December 27, 2017. Date of publication January 12, 2018; date of current version August 3, 2018. This work was supported in part by the Macao Science and Technology Development Fund-SKL Fund, and in part by the University of Macau under Grant MYRG2017-00223-AMSV. This paper was recommended by Associate Editor G. Masera. (*Corresponding author: Pui-In Mak.*)

C.-F. Cheang and P.-I. Mak are with the State-Key Laboratory of Analog and Mixed-Signal VLSI, Faculty of Science and Technology, Department of Electrical and Computer Engineering, University of Macau, Macau, China (e-mail: pimak@umac.mo).

R. P. Martins is with the State-Key Laboratory of Analog and Mixed-Signal VLSI, Faculty of Science and Technology, Department of Electrical and Computer Engineering, University of Macau, Macau, China, on leave from the Instituto Superior Técnico, Universidade de Lisboa, 1649-004 Lisboa, Portugal (e-mail: rmartins@umac.mo).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCSI.2017.2788082

# I. INTRODUCTION

MULTICARRIER signals such as orthogonal frequency-division multiplexing (OFDM) have been extensively utilized in modern wireless transceivers to enhance the throughput over a limited frequency resource. Yet, such modulated signals vary rapidly in time with a non-constant envelope, resulting in a high peak-to-average-power ratio (PAPR). To enhance the power efficiency of the transmitter path, it is preferable to operate the power amplifiers (PAs) closer to the compression region at the expense of some linearity. Together with the frequency-dependent memory effects (e.g., impedance variation in the matching network), this makes cost-effective techniques that can address the asymmetrical spectral regrowth and error vector magnitude (EVM) worth exploring. Digital predistortion (DPD) is promising as it can benefit from the high capacity of digital computation in the baseband. DPDs and joint DPDs have been proposed to compensate the modulator and PA distortions with or without memory effects [1]-[21]. Most DPDs involve two procedures: identification and predistortion. To suit field-programmable gate array (FPGA) implementation, in the literature only predistortion is implemented inside the digital baseband, as the implementation complexity of *identification* is enlarged under direct- and indirect-learning algorithms. Comparatively, the complexity of *predistortion* is directly related to its basis construction and the number of coefficients in the models. The complexity and efficiency of different DPDs and joint DPDs in terms of floating-point operations (FLOPs) have been studied in [22]-[24]. Indirect-learning DPDs can accurately compensate the PA nonlinearities with memory effects; examples are the Wiener model [1], generalized Hammerstein [2] and pruned Volterra-series [3]–[7]. Different dynamic orders of the nonlinear memory effect are predistorted with these models, showing promising performance in handling the PA nonlinearities and memory effects.
Nevertheless, the coefficients of indirect-learning algorithms can hardly be extracted adaptively using general least-squares (LS) estimation, which entails matrix multiplications and divisions. An external digital signal processing unit is required to implement the matrix division, with a large computational complexity and power consumption. Furthermore, the LS operation becomes more complex when more coefficients are included in different models. Thus, the coefficients extraction can hardly be implemented online for resource-limited applications if considering the large array size. Recent interest has therefore shifted to complexity-efficient coefficients extraction in the FPGA.

1549-8328 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.

In contrast to the pruned Volterra-based predistorter, the adaptive direct-learning DPDs have applied the memoryless model as the basic constructor, compensating the PA nonlinearities alone with a look-up table (LUT) [8]-[10] or a polynomial [11]. The memoryless model can be updated adaptively in real time, as it involves low-complexity training that can be implemented inside the FPGA. Yet, the memory effects are evidently not compensated, as such models only track the memoryless amplitude-to-amplitude modulation (AM/AM) and amplitude-to-phase modulation (AM/PM) curves of the PA. As a result, the asymmetrical spectral regrowth remains unsolved, and the EVM is only partially compensated with the memoryless DPD. The NARMA-based predistorter compensates the nonlinearities and memory effect using multiple LUTs [12], leaving the memoryless nonlinearities to be compensated by one LUT. The nonlinear memory effect is compensated independently with other LUTs that describe different nonlinear infinite-impulse-response filters in parallel. The DDR-based predistorter [13] allows a low-cost FPGA realization using a combination of LUTs and a multiplier-reducing technique. The parameter estimator in [13]–[15] has to be implemented in a digital signal processor with floating-point calculation, since the matrix size of the parameter estimator has to be enlarged. Instead, with the concept of the two-block model, an additional memory-effect block can be cascaded with the memoryless model. Separate coefficients of the FIR filter [16]–[18] can be trained by a recursive algorithm to compensate those nonlinear memory effects. The dimension of the nonlinear memory effects can be determined by combining different pruned Volterra-series models such as the Hammerstein [14] or memory polynomial [16]-[18]. Yet, the number of estimated coefficients is $P \times M$, where $P$ is the nonlinear order and $M$ is the memory length. The estimated matrix size of the recursive least square (RLS) core is $[P \times M, P \times M]$, which is still too large to be realized in the FPGA and is comparable to that of the indirect-learning DPDs. Unlike [16]-[18], the multiple-LUT model [19] constructs the memory polynomial with $M$ paralleled memoryless polynomials, such that different LUTs can be estimated adaptively with the least-mean-square (LMS) algorithm. Only a moderate memory effect can be compensated efficiently, as it is based on the memory polynomial with low complexity-efficiency [19].

This paper reports an adaptive hardware-efficient feedback polynomial topology that can compensate the PA nonlinearities with memory effects. By inheriting the properties of the complexity-reduced Volterra-series (CRV) model [7], the proposed model also retains the large dispersal properties of the CRV basis construction. Both identification and predistortion procedures are completed within the same loop, with a better balance of complexity-efficiency in the FPGA. Nonlinearities and memory effects are estimated separately to reduce the size of the matrix in the same estimation core. To save multiplications and divisions inside the core, the coefficients are estimated with a line-search-based RLS algorithm [25]. Also, no phase-shift adjuster with coordinate rotational digital computer (CORDIC) blocks is entailed for estimation and compensation, as this operation is embedded inside the coefficients after cross-correlation. Nonlinearities, memory effects and cross-intermodulation, which is generated by the distortions between two OFDM signals with a 20-MHz spacing, can also be compensated by the proposed DPD, whether for single-carrier or carrier-aggregation OFDM signals.

# II. PROPOSED ADAPTIVE IDENTIFIER AND PREDISTORTER

This section describes the relationship between the Volterra-series model and the low-pass equivalent feedback topology model. From the feedback topology model, separate descriptions of the memoryless nonlinearities and nonlinear memory effects are given. Also, the structure of the predistorter is addressed.

# A. Characteristic of Volterra-Series and Its Pruned Models

The baseband-equivalent Volterra-series model is a general approach for modeling and compensating PA nonlinearities and its memory effects [1], [26] with multi-dimensional convolutions [1], which is given by

$$y(n) = \sum_{p=1}^{P} \sum_{m_1=0}^{M} \sum_{m_2=m_1}^{M} \cdots \sum_{m_p=m_{p-1}}^{M} \sum_{m_{p+1}=0}^{M} \sum_{m_{p+2}=m_{p+1}}^{M} \cdots \sum_{m_{2p-1}=m_{2p-2}}^{M} a_{2p-1} (m_1, m_2, \cdots, m_{2p-1}) \times \prod_{q_1=1}^{p} x (n - m_{q_1}) \prod_{q_2=p+1}^{2p-1} x^* (n - m_{q_2}) \tag{1}$$

where $x(n)$ and $y(n)$ are the input and output of the PA, respectively; $P$ is the nonlinear order; $M$ is the considered memory depth; and $a_{2p-1}(\cdot)$ is the $p$th-order kernel of the Volterra series. $m_{q_1}$ and $m_{q_2}$ represent the memory effect of the PA. The term $\prod_{q_1=1}^{p} x(n-m_{q_1}) \prod_{q_2=p+1}^{2p-1} x^*(n-m_{q_2})$ represents the odd-order nonlinearity. Only the harmonics located around the transmitted center frequency $f_{LO}$ are considered in (1); thus, only the terms $|x|^{2p} x$ remain. However, the number of coefficients of (1) increases dramatically as the considered nonlinearities and memory effects increase. Pruned models have been proposed to reduce the computational complexity in [2]-[7]. The PA distortions can be characterized through the AM/AM and AM/PM characteristics of the PA model. The memory-related distortions, which arise from the interaction of nonlinearities and memory effect, can be observed as the dispersion around the AM/AM and AM/PM curves of the memoryless nonlinearities. Overall, the Volterra series and its related pruned models [2]-[7] share the same construction: a combination of memoryless nonlinearities and memory-related distortions models the PA nonlinearities and memory effect, i.e., a memoryless nonlinearity block plus a nonlinear memory block. It can be expressed as,

$$y(n) = G(x(n)) + \sum_{p=1}^{P} \sum_{m_1=1}^{M} \sum_{m_2=m_1}^{M} \cdots \sum_{m_p=m_{p-1}}^{M} \sum_{m_{p+1}=1}^{M} \sum_{m_{p+2}=m_{p+1}}^{M} \cdots \sum_{m_{2p-1}=m_{2p-2}}^{M} b_{2p-1} (m_1, m_2, \cdots, m_{2p-1}) \times \prod_{q_1=1}^{p} x (n - m_{q_1}) \prod_{q_2=p+1}^{2p-1} x^* (n - m_{q_2}) \tag{2}$$

where $b_{2p-1}(\cdot)$ are the coefficients of the pruned Volterra-series model and the first term $G(x)$ is the memoryless nonlinear model, which can be expressed as,

$$G(x(n)) = \sum_{p=1}^{P} c_p x(n) |x(n)|^{p-1}, \tag{3}$$

where $c_p$ are the coefficients of the memoryless polynomial. The second term of (2), $\sum_{p=1}^{P} \sum_{m_1=1}^{M} \cdots \sum_{m_{2p-1}=m_{2p-2}}^{M} b_{2p-1}(m_1, m_2, \ldots, m_{2p-1}) \prod_{q_1=1}^{p} x (n - m_{q_1}) \prod_{q_2=p+1}^{2p-1} x^* (n - m_{q_2})$, represents the interaction between the nonlinearities and memory effects under different basis constructions. Different dynamic orders and nonlinear memory effects are involved in different pruning techniques to describe the nonlinear memory. The complexity-efficiency tradeoff can be addressed by the disposal property on basis construction [23]. Most existing structures involve heavy efforts to estimate the interaction between the nonlinearities and memory effect by building the basis constructions in parallel; this increases not only the running complexity, but also the size of the estimating matrix [3]–[7].
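As a minimal illustration of the memoryless model in (3), the sketch below evaluates $G(x)$ for one complex baseband sample and the resulting AM/AM gain. The function names and the coefficient values are assumptions chosen for illustration only; they are not taken from the measured PA.

```python
def memoryless_poly(x, c):
    """Evaluate G(x) = sum_{p=1}^{P} c_p * x * |x|^(p-1) for one complex sample.

    c[0] holds c_1, c[1] holds c_2, and so on (hypothetical layout).
    """
    mag = abs(x)
    return sum(cp * x * mag ** k for k, cp in enumerate(c))


def am_am_gain(mag, c):
    """Small-signal-normalized gain |G(x)| / |x| at input magnitude mag > 0."""
    return abs(memoryless_poly(complex(mag, 0.0), c)) / mag


# Illustrative coefficients: unit linear gain with mild third-order compression.
c_example = [1.0, 0.0, -0.1]
sample_out = memoryless_poly(0.5 + 0.5j, c_example)
gain_at_full_scale = am_am_gain(1.0, c_example)
```

With these assumed coefficients the gain drops from 1 at small inputs to about 0.9 at unit magnitude, which is the kind of compression the AM/AM curve captures.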

# B. Proposed Feedback Topology Predistorter - fCRV

Under limited multipliers, the memoryless predistorter can compensate the PA nonlinearities alone. In fact, memory effects are dispersed around the memoryless nonlinearities. Following the concept of the two-block model [16]-[18], in which the memoryless nonlinearities and memory effects are *separately* compensated, the memory effects can be compensated by cascading an auxiliary block. This separation allows modeling, or predistortion, of the PA nonlinearities and memory effects with substantial order reduction. Note that those memory effects are commonly formulated similarly to [3]-[7]. The resultant running complexity of the memory part and the estimated matrix are still enlarged, and are comparable with those of the pruned Volterra-series DPDs. Here, the feedback topology is applied as the memory-effect block, as shown in Fig. 1. The input $u(t)$ includes the nonlinear memory and is passed through the memoryless nonlinear model to form the output $y(t)$ [26], which can be expressed as,

$$y_{RF}(t) = G\left(x_{RF}(t) + \sum_{j=1}^{M} \gamma_j y_{RF}(t-j)\right), \quad (4)$$

where $\gamma_j$ is the coefficient of the $j$th delay of the feedback FIR filter. This predistorter provides the interaction of nonlinearities and memory effect within the feedback memory block, minimizing the running complexity and basis construction. The comparison of different basis constructions and their running complexity is summarized in [23]. The basis construction of the proposed predistorter in (4) requires $3 + (P - 1)$ FLOPs (the number of floating-point operations), which is the same as the memoryless DPD and memory polynomial DPD. Thus, the running complexity of the feedback-topology predistorter, $(5P + 4) + (4M + 2)$, is similar to that of the memoryless DPD. However, the coefficients of $G(\cdot)$ and $\gamma_j$ are difficult to estimate and decompose in the direct form, especially the feedback filter coefficients. The proposed *feedback complexity-reduced Volterra-series predistorter* (fCRV) provides the methodology to estimate the feedback coefficients of the feedback topology.

Fig. 1. The feedback topology PA behavioral modeling [23].

Fig. 2. (a) Block diagram of the proposed DPD based on the feedback topology with a training algorithm core  $RLS_{DCD}$ . (b) The block diagram of the  $RLS_{DCD}$  core. The switches are controlled by *MODE* in (a) to train the parameters of the memoryless nonlinearities and feedback FIR filter.
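The feedback model in (4) can be sketched in sampled form as follows. This is only an illustrative simulation under stated assumptions: the baseband polynomial of (3) is used for $G(\cdot)$, delays are whole samples, and the names `memoryless_poly` and `feedback_pa` as well as the numeric values are hypothetical.

```python
def memoryless_poly(u, c):
    """G(u) = sum_p c_p * u * |u|^(p-1); c[0] holds c_1 (assumed layout)."""
    mag = abs(u)
    return sum(cp * u * mag ** k for k, cp in enumerate(c))


def feedback_pa(x, c, gamma):
    """Sampled form of (4): y(n) = G(x(n) + sum_j gamma_j * y(n - j)).

    gamma[0] holds gamma_1. Samples before n = 0 are taken as zero.
    """
    y = []
    for n, xn in enumerate(x):
        # Feedback FIR over the already computed outputs y(n-1), ..., y(n-M).
        fb = sum(g * y[n - j] for j, g in enumerate(gamma, start=1) if n - j >= 0)
        y.append(memoryless_poly(xn + fb, c))
    return y


# With a purely linear G and one feedback tap, an impulse decays geometrically.
impulse = feedback_pa([1.0, 0.0, 0.0], c=[1.0], gamma=[0.5])
```

Only one evaluation of $G(\cdot)$ and one $M$-tap filter are needed per sample, which is the structural reason the running complexity stays close to that of a memoryless model.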

fCRV consists of a memoryless DPD $G^{-1}(x)$ with an auxiliary memory compensation block based on the feedback filter. Fig. 2(a) shows the block diagram of the proposed DPD, where the output $y(n)$ is fed back to train the coefficients of the memoryless DPD $d_i$, where $i = 1, 3, 5, \ldots$, and the feedback FIR filter $\gamma_j$ separately with the input $u(n)$ through the training algorithm *RLS*<sub>DCD</sub>. Thus, two types of coefficients are estimated here: 1) the memoryless DPD; 2) the memory compensation part with the feedback filter. Which of the two is trained is determined by the training setting switch *MODE*. The flow chart of the proposed predistorter is depicted in Fig. 3. First, the parameters of the memoryless predistorter are identified while the auxiliary memory block is opened. Second, the memoryless predistorted data is applied to the linearization test, and the linearized data is captured to identify the coefficients of the feedback filter. Finally, the memory-included predistorted data is constructed and applied.

Fig. 3. Flow chart of the proposed DPD. It separates the memoryless nonlinearities and memory effect compensation into two steps.

The memoryless predistorter $G^{-1}(x)$ is first assumed to be estimated accurately through the pseudoinverse to track the AM/AM and AM/PM characteristics. It is the inverse of $G(x)$ and is expressed as $G^{-1}(x) = d_1x + d_3x^3 + d_5x^5 + d_7x^7 + d_9x^9 + \cdots$. Note that this expression is applied in the derivation later.

For further analysis on the auxiliary memory compensation block, the extraction of feedback filter coefficients is illustrated by starting from the PA model in (4) with the help of CRV simplifications. The feedback path ( $j \ge 1$ ) can be expressed as,

$$\sum_{j=1}^{M} \gamma_{j} y_{RF} (t-j) = \gamma_{j} \sum_{p=1}^{P} c_{p} x_{RF}^{p} (t-j) + \gamma_{j} \sum_{p=1}^{P} c_{p} \sum_{k=1}^{p} \left\{ \binom{p}{k} x_{RF}^{p-k} (t-j) \left[ \sum_{m=1}^{M} \gamma_{m} y_{RF} (t-j-m) \right]^{k} \right\} \tag{5}$$

Note that the second term of (5) represents the multiple cycles through the feedback path. The factor $\gamma_j^a \gamma_m^b$ becomes negligible given that $|\gamma_j| < 1$ and the total order $a+b > 1$. Thus, $y_{RF}(t-j) \cong G(x_{RF}(t-j))$ [7] is assumed. Substituting (3) into (4), $y_{RF}(t)$ is expressed as,

$$y_{RF}(t) = \sum_{i=1}^{P} c_i \left( x_{RF}(t) + \sum_{j=1}^{M} \gamma_j G \left( x_{RF}(t-j) \right) \right)^i. \tag{6}$$

The binomial expansion of (6) can further be expressed as,

$$y_{RF}(t) = \sum_{i=1}^{P} \sum_{k=0}^{i} \left\{ c_i \begin{pmatrix} i \\ k \end{pmatrix} x_{RF}^{i-k}(t) \left[ \sum_{j=1}^{M} \gamma_j G \left( x_{RF} \left( t - j \right) \right) \right]^k \right\},$$
(7)

where $\binom{i}{k}$ is the binomial coefficient in the expansion of the $i$th power of the sum of $x_{RF}(t)$ and $\sum_{j=1}^{M} \gamma_j G(x_{RF}(t-j))$. By extracting the $k = 0$ and $k = 1$ terms from (7) and considering that the higher powers of $\gamma_j$ are negligible ($|\gamma_j| < 1$), $y_{RF}(t)$ is simplified as,

$$y_{RF}(t) \approx G(x_{RF}(t)) + \sum_{i=1}^{P} i c_i x_{RF}^{i-1}(t) \left[ \sum_{j=1}^{M} \gamma_j \sum_{k=1}^{P} c_k x_{RF}^k(t-j) \right]. \tag{8}$$

And, with the index $i$ shifted to start from 0,

$$y_{RF}(t) = G\left(x_{RF}(t)\right) + \sum_{i=0}^{P-1} \sum_{j=1}^{M} \sum_{k=1}^{P} (i+1) c_{i+1} \gamma_j c_k x_{RF}^i(t) x_{RF}^k(t-j). \tag{9}$$

In (9), the feedback topology has the same basis construction as the Volterra series and its related pruned models in (2). Also, the structure of the feedback topology in (9) shares the same basis construction with CRV [7], which has the largest disposal property in PA behavioral modeling [23]. The interaction between the nonlinearities and memory effects is captured by the second term in (9). The second term of (8) can be separated into two factors. The first factor, $\sum_{i=1}^{P} ic_i x_{RF}^{i-1}(t)$, is the derivative of $G(x)$, denoted $G'(x)$, and the second factor, $\sum_{j=1}^{M} \gamma_j \sum_{k=1}^{P} c_k x_{RF}^k(t-j)$, is the nonlinear FIR filter response to the $j$th delayed $G(x)$. Thus, (8) can be expressed as,

$$y_{RF}(t) = G(x_{RF}(t)) + G'(x_{RF}(t)) \left[ \sum_{j=1}^{M} \gamma_j \sum_{k=1}^{P} c_k x_{RF}^k(t-j) \right].$$
(10)
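The quality of the first-order form in (10) can be checked numerically against the exact feedback recursion in (4). The sketch below is a hedged illustration using real-valued samples and the RF-form polynomial $G(x) = \sum_k c_k x^k$; the coefficient values, the single feedback tap and the test sequence are assumptions, not values from the paper.

```python
def G(x, c):
    """RF-form polynomial G(x) = sum_{k=1}^{P} c_k x^k; c[0] holds c_1."""
    return sum(ck * x ** k for k, ck in enumerate(c, start=1))


def dG(x, c):
    """Derivative G'(x) = sum_{k=1}^{P} k c_k x^(k-1)."""
    return sum(k * ck * x ** (k - 1) for k, ck in enumerate(c, start=1))


def exact_feedback(x, c, gamma):
    """Exact recursion of (4): y(n) = G(x(n) + sum_j gamma_j y(n-j))."""
    y = []
    for n, xn in enumerate(x):
        fb = sum(g * y[n - j] for j, g in enumerate(gamma, start=1) if n - j >= 0)
        y.append(G(xn + fb, c))
    return y


def first_order(x, c, gamma):
    """Approximation of (10): G(x(n)) + G'(x(n)) * sum_j gamma_j G(x(n-j))."""
    out = []
    for n, xn in enumerate(x):
        fir = sum(g * G(x[n - j], c) for j, g in enumerate(gamma, start=1) if n - j >= 0)
        out.append(G(xn, c) + dG(xn, c) * fir)
    return out


c = [1.0, 0.0, -0.1]      # assumed memoryless coefficients
gamma = [0.02]            # assumed small feedback tap, |gamma_1| << 1
x = [0.5, -0.3, 0.8, 0.2]
max_err = max(abs(a - b) for a, b in
              zip(exact_feedback(x, c, gamma), first_order(x, c, gamma)))
```

For this assumed tap size the mismatch is second order in $\gamma_j$, well below the first-order memory term itself, which is consistent with neglecting the higher powers of $\gamma_j$.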

To estimate the coefficients of the memory compensation block, the memoryless predistorter $G^{-1}(x)$ is first applied to (10) to compensate the PA memoryless nonlinearities. The linearized $y_{DPD_1}(t)$ is then captured, which can be expressed as,

$$y_{DPD_{1}}(t) = G\left(G^{-1}(x_{RF})\right) + G'\left(G^{-1}(x_{RF})\right) \left[\sum_{j=1}^{M} \gamma_{j} \sum_{k=1}^{P} c_{k} \left[G^{-1}(x_{RF}(t-j))\right]^{k}\right].$$
(11)

In (11), the derivative term $G'(G^{-1}(x_{RF}))$ can be expressed as,

$$G'\left(G^{-1}\left(x_{RF}\right)\right) = \frac{1}{\left[G^{-1}\left(x_{RF}\right)\right]'}.$$
 (12)
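The identity in (12) is the inverse-function rule. As a short worked step, assuming $G$ is differentiable and invertible over the operating range, differentiating both sides of $G(G^{-1}(x)) = x$ with the chain rule gives

$$G'\left(G^{-1}(x)\right) \left[G^{-1}(x)\right]' = 1 \quad \Rightarrow \quad G'\left(G^{-1}(x)\right) = \frac{1}{\left[G^{-1}(x)\right]'}.$$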

Furthermore, $\sum_{j=1}^{M} \gamma_j \sum_{k=1}^{P} c_k \left[ G^{-1} \left( x_{RF} \left( t - j \right) \right) \right]^k$ in (11) is simplified to a linear FIR filter response after $G^{-1}(x_{RF})$ is applied, which is expressed as,

$$\left[\sum_{j=1}^{M} \gamma_j \sum_{k=1}^{P} c_k \left[ G^{-1} \left( x_{RF} \left( t - j \right) \right) \right]^k \right] = \sum_{j=1}^{M} \gamma_j x_{RF} \left( t - j \right).$$
(13)

Substituting (12) and (13) into (11), $y_{DPD_1}(t)$ is linearized except for the remaining nonlinear FIR filter response, which is expressed as,

$$y_{DPD_1}(t) = x_{RF}(t) + \frac{1}{\left[G^{-1}(x)\right]'} \sum_{j=1}^{M} \gamma_j x_{RF}(t-j). \quad (14)$$

To further derive the FIR filter coefficients $\gamma_j$, which describe the nonlinear memory effect, the error term $err(t) = y_{DPD_1}(t) - x(t)$ is substituted into (14),

$$\sum_{j=1}^{M} \gamma_j x_{RF} (t-j) = \left(y_{DPD_1} (t) - x_{RF} (t)\right) \left(d_1 + 3d_3 x_{RF}^2 (t) + 5d_5 x_{RF}^4 (t) + \dots + p d_p x_{RF}^{p-1} (t)\right). \tag{15}$$
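The right-hand side of (15) is the signal that the linear FIR taps $\gamma_j$ are fitted to. A minimal real-valued sketch of forming it is given below; the dictionary layout for the odd-order coefficients $d_p$, the function names and the sample values are assumptions made for illustration.

```python
def ginv_derivative(x, d):
    """[G^{-1}(x)]' = d_1 + 3 d_3 x^2 + 5 d_5 x^4 + ...

    d maps each odd order p to d_p, e.g. {1: d_1, 3: d_3} (assumed layout).
    """
    return sum(p * dp * x ** (p - 1) for p, dp in d.items())


def fir_target(y_dpd1, x, d):
    """Right-hand side of (15): (y_DPD1 - x) * [G^{-1}(x)]'."""
    return (y_dpd1 - x) * ginv_derivative(x, d)


d = {1: 1.0, 3: 0.2}                 # illustrative predistorter coefficients
target = fir_target(0.52, 0.5, d)    # residual 0.02 scaled by 1.15
```

Because the target depends only on the already trained $d_p$ and on captured samples, the remaining fit for $\gamma_j$ is linear in the unknowns.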

To digitally estimate the coefficients in the FPGA, the feedback topology is converted to the baseband model, in which $2x_{RF}(t) = x(n) e^{j\omega_c t} + x^*(n) e^{-j\omega_c t}$. As the RF bandwidth of interest for DPD is around $\omega_c$, the baseband-equivalent expression of (14) is then simplified as,

$$\begin{aligned}
\sum_{j=1}^{M} \gamma_{j} x (n - j) ={}& d_{1} y_{DPD_{1}}(n) + \frac{3}{8} d_{3} \left[ x^{2}(n) y_{DPD_{1}}^{*}(n) + 2 |x(n)|^{2} y_{DPD_{1}}(n) \right] + \cdots \\
&+ \frac{p}{2^{p}} d_{p} \left[ \binom{p-1}{(p+1)/2} |x(n)|^{p-3} x^{2}(n) y_{DPD_{1}}^{*}(n) + \binom{p-1}{(p-1)/2} |x(n)|^{p-1} y_{DPD_{1}}(n) \right] - G^{-1}(n).
\end{aligned} \tag{16}$$

where $(\cdot)^*$ denotes the complex conjugate; $\binom{p-1}{(p-1)/2}$ and $\binom{p-1}{(p+1)/2}$ are the binomial coefficients from expanding the $(p-1)$th power of the complex-valued representation of $x(t)$. To summarize the involved equations from (10) to (16), Fig. 2(b) shows the block diagram of the $RLS_{\text{DCD}}$. When the coefficients of the memoryless model are trained, $uvec = [u(n), u(n)|u(n)|^2, \ldots, u(n)|u(n)|^{(p-1)/2}]^{\text{T}}$ is defined as the nonlinear input vector, whereas $uvec = [u(n), u(n-1), \ldots, u(n-M)]^{\text{T}}$ is the delayed tapped input when the linear filter coefficients are trained. *MODE* is the control signal of the switches for the different coefficient trainings. $G_{\text{err}}$ is defined to represent the right-hand operations in (16). The FIR filter path is first opened to estimate $d_i$ of $G^{-1}(x)$ alone with the feedback signal $y(n)$ and input $x(n)$. After $d_i$ is estimated, *MODE* is switched to activate the memoryless predistorter and start estimating $\gamma_j$ by capturing the nonlinear-predistorted $y_{\text{DPD}_1}(n)$ through $y(n)$. To further reduce the number of multiplications, each fractional factor of $G_{\text{err}}$ in (16) is replaced by $2^{-K}$, where $K = 1, 3, 5, \ldots$, such that only bit shifts are applied to $d_i$. The estimation error of the coefficients is ~1% in the simulation.

Fig. 4. The AM/AM curve of memoryless DPD and feedback path block, which shows that the feedback path gain is much less than 1. Thus, the stability of the system is guaranteed under the sufficient stability condition.

# C. Stability of fCRV

With bounded input-output norms, the stability of a nonlinear system can be determined by the small-gain theorem [27], [28] and the Nyquist criterion. For the feedback system depicted in Fig. 2(a), $f(\cdot)$ and $g(\cdot)$ describe the forward and feedback gain blocks, respectively. As a sufficient condition for the finite-gain Lyapunov stability of the feedback connection [28], the circle criterion is considered in order to obtain a sufficient condition for L<sup>2</sup> stability [28]. For finite-energy signals, the input-output inequality on the 2-norm of (8) can be expressed as

$$\|y\|_{2} \leq \frac{\Gamma\{G(x)\}}{1 - \Gamma\{G'(x)\} \sum_{j=1}^{M} \Gamma\{\gamma_{j}\}} \|x\|_{2}$$

where  $\Gamma \{\cdot\}$  is the norm gain of the function. Thus, the L<sup>2</sup> stability is guaranteed if the inequality

$$\Gamma\left\{G'(x)\right\}\sum_{j=1}^{M}\Gamma\left\{\gamma_{j}\right\}<1$$
(17)

is satisfied [28]. The simulated AM/AM curve is plotted in Fig. 4, which shows a feedback path gain much smaller than 1; thus, the proposed predistorter is stable. From the practical viewpoint of modeling/DPD, the PA nonlinearities are mainly provided by the memoryless nonlinearities block in Fig. 2(a). The FIR feedback path acts as an auxiliary block that describes the nonlinear memory effect. The feedback gain is much smaller than the memoryless nonlinearities gain. The stability of the system is therefore guaranteed under the sufficient stability condition.

Fig. 5. DPD implementation in the FPGA with total processing time = $225\ \mu$s. A band-pass FIR filter eliminates the out-of-band distortion of the feedback signal at IF. The feedback signal is time-aligned precisely to the input u(n) via cross-correlation to reduce the phase shift for the RLS<sub>DCD</sub> estimator. After training the coefficients, the parameters are sent to the predistorter for the output of DPD.
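The sufficient condition in (17) can be checked numerically once coefficient estimates are available. The sketch below is a hedged illustration: it approximates $\Gamma\{G'(x)\}$ by the largest $|G'(x)|$ on a grid over an assumed input range and $\Gamma\{\gamma_j\}$ by $|\gamma_j|$. Both approximations, the function name and the numeric values are assumptions, not part of the original analysis.

```python
def loop_gain_bound(c, gamma, x_max=1.0, steps=1000):
    """Estimate Gamma{G'(x)} * sum_j Gamma{gamma_j} for the check in (17).

    c[0] holds c_1 of the RF-form polynomial G(x) = sum_k c_k x^k.
    The gain of G' is approximated by max |G'(x)| for |x| <= x_max.
    """
    def dG(x):
        return sum(k * ck * x ** (k - 1) for k, ck in enumerate(c, start=1))

    grid = (x_max * i / steps for i in range(-steps, steps + 1))
    g_prime_gain = max(abs(dG(x)) for x in grid)
    return g_prime_gain * sum(abs(g) for g in gamma)


# Illustrative values: mild compression and two small feedback taps.
bound = loop_gain_bound([1.0, 0.0, -0.1], [0.05, 0.02])
is_stable = bound < 1.0
```

A bound well below 1, as in this assumed example, mirrors the observation that the feedback path gain is much smaller than the memoryless gain.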

# D. Coefficients Estimator

Considering the multi-linear properties of the memoryless predistorter and the linear FIR filter structure for $\gamma_j$, the LS algorithm can be used for extracting such parameters. Separate parameter sets $W_{nl}$ and $W_l$ can be constructed, containing the coefficients of the memoryless predistorter and the linear feedback memory, respectively. The matrix form of the parameters estimation is given by,

$$\mathbf{\Phi}(n) \mathbf{W}(n) = \mathbf{z}(n) \tag{18}$$

where $\mathbf{\Phi}(n)$ is an $N \times L$ correlation matrix of the input, constructed by the linear and nonlinear approaches; $\mathbf{W}(n)$ is the $N \times 1$ vector of estimated coefficients; and $\mathbf{z}(n)$ is the expected output vector, which is an $N \times 1$ cross-correlation between tap inputs. $N$ is the number of estimated coefficients and $L$ is the total length of the captured data. The optimized $\mathbf{W}(n)$ can be estimated directly with an inverse matrix via the LS algorithm. For the coefficients of the memoryless predistorter $d_i$, $\mathbf{\Phi}(n)$ is constructed by including the product terms $y[n]|y[n]|^{p-1}$, for $n = 1$ to $L$. The modeling error vector $\mathbf{e} = [e[M+1], \ldots, e[L]]^T$ is defined, where $e[n] = y[n] - u[n]$. Thus, the LS solution of the parameters estimation of the separated sets can be extracted by minimizing the cost function $J(\boldsymbol{\varphi}) = \mathbf{e}^H \mathbf{e}$, where the expected output vector is $\mathbf{z} = [u[n], \ldots, u[n-L]]^T$, $[\cdot]^H$ represents the Hermitian transpose and $[\cdot]^T$ represents the transpose operation. The solution is

$$\hat{\mathbf{W}}_{nl} = \left(\boldsymbol{\Phi}^{H}\boldsymbol{\Phi}\right)^{-1}\boldsymbol{\Phi}^{H}\mathbf{z}$$

For the FIR filter $\gamma_j$, the LS solution is thus represented by,

$$\hat{\mathbf{W}}_{l} = \left(\mathbf{Y}_{l}^{H}\mathbf{Y}_{l}\right)^{-1}\mathbf{Y}_{l}^{H}\mathbf{z}$$

where $\mathbf{Y}_l$ is constructed by the delayed tapped vector $[y[n], \ldots, y[n-M]]^T$. The computational complexity of the LS algorithm grows as $O(N^2L)$ and matrix division is included, which is not realistic for training the coefficients adaptively. To minimize the complexity, the details of the training core are described next.
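For reference, the batch LS solution $\hat{\mathbf{W}}_{nl} = (\mathbf{\Phi}^H\mathbf{\Phi})^{-1}\mathbf{\Phi}^H\mathbf{z}$ can be sketched in plain Python as below. This is an illustrative sketch only: the basis is limited to the orders 1 and 3, the data are synthetic, and the helper names `solve` and `ls_fit` are hypothetical. It also shows why the approach is costly, since every update forms and solves the full normal equations.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small complex system."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    w = [0j] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))
        w[i] = s / M[i][i]
    return w


def ls_fit(y, u, orders=(1, 3)):
    """Solve (Phi^H Phi) w = Phi^H z with basis terms y * |y|^(p-1)."""
    phi = [[yn * abs(yn) ** (p - 1) for p in orders] for yn in y]
    n = len(orders)
    A = [[sum(r[i].conjugate() * r[j] for r in phi) for j in range(n)]
         for i in range(n)]
    b = [sum(r[i].conjugate() * un for r, un in zip(phi, u)) for i in range(n)]
    return solve(A, b)


# Synthetic check: build u from known coefficients and recover them.
d_true = [1.1 - 0.05j, 0.3 + 0.1j]
y = [0.1 + 0.2j, -0.4 + 0.3j, 0.7 - 0.1j, 0.2 - 0.6j, -0.5 - 0.5j]
u = [d_true[0] * yn + d_true[1] * yn * abs(yn) ** 2 for yn in y]
d_hat = ls_fit(y, u)
```

Forming $\mathbf{\Phi}^H\mathbf{\Phi}$ costs on the order of $N^2L$ multiplications, and the elimination step needs divisions, which is the motivation for a recursive, division-free estimator.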

# III. FPGA IMPLEMENTATION

In the literature, the coefficients of memory-included predistorters are mostly estimated offline with an external digital processor or hosted in a PC, since the number of estimated coefficients is enlarged. Here, the size of the identification core is reduced dramatically due to the separation of nonlinearities and memory effect inside fCRV. Fig. 5 depicts the block diagram of the FPGA realization. Both identification and predistortion are implemented with limited resources to demonstrate the ability of an adaptive approach, apt for handling different variations of the PA. In our design, the baseband signal is obtained by performing an integer down-sampling from IF. The raised-cosine FIR filter, cross-correlation block and adaptive complexity-efficient coefficients estimator core RLS<sub>DCD</sub> are detailed next. Note that the accuracy is highly related to the number of bits used at each stage.

# A. Estimator With a Line-Searched-Based RLS Algorithm

Static parameter estimation, such as LS, is commonly used to estimate the coefficients of the nonlinear model. Yet, considering that the coefficients have to be trained adaptively, the matrix multiplication and division are costly for the FPGA. Thus, a learning-based algorithm, such as the LMS algorithm [10] or the RLS algorithm [29], is chosen as the estimator. Instead of using the Kalman filter approach [29] as the RLS algorithm, a division-free line-search-based RLS algorithm [25], known as dichotomous-coordinate-descent RLS ($RLS_{DCD}$), is extended to operate in both linear and nonlinear coefficient estimation modes; it reduces the number of multipliers and divisions and the complexity. Fig. 6(a) shows the block diagram of the entire $RLS_{DCD}$. The $RLS_{DCD}$ estimation is then given by,

$$\begin{aligned}
\mathbf{\Phi}(n) &= \lambda \mathbf{\Phi}(n-1) + \mathbf{v}(n) \mathbf{v}^{H}(n) \\
\boldsymbol{\beta}(n) &= \lambda \boldsymbol{\beta}(n-1) + \mathbf{v}(n) e^{*}(n) \\
e(n) &= \zeta(n) - \mathbf{W}^{H}(n-1) \mathbf{v}(n) \\
\mathbf{\Phi}(n) \Delta \mathbf{W}(n) &= \boldsymbol{\beta}(n) \\
\mathbf{W}(n) &= \mathbf{W}(n-1) + \Delta \mathbf{W}(n)
\end{aligned} \tag{19}$$

where  $\lambda$  is the forgetting factor  $0 < \lambda < 1$ , which gives the exponential weight to the previous samples;  $\zeta(n)$  is the desired output of the estimator;  $\beta(n)$  is defined as the residual term after estimating  $\Delta \mathbf{W}(n)$  in each cycle.  $\boldsymbol{\beta}(n)$  is recursively updated  $\mathbf{z}(n)$  of (19), which is calculated by incorporating the previous  $\beta$  (*n* - 1) and the loop error e(n), which is expressed as  $\beta(n) = \mathbf{z}(n) - \Phi(n) \mathbf{W}(n)$  and  $\mathbf{v}(n)$  is the tapped input of u(n) with linear and nonlinear approaches, which are determined by the switch MODE in Fig. 2. For estimating coefficients  $d_i$ ,  $\mathbf{v}(n)$  is constructed by a nonlinear approach, in which  $\mathbf{v}(n) = [u(n)u(n)|u(n)|^2 \dots u(n)|u(n)|^{(p-1)/2}]^T$ , while for estimating  $\gamma_i$  with switching MODE,  $\mathbf{v}(n)$  is represented as  $\mathbf{v}(n) = [u(n)u(n-1)\dots u(n-M)]^{\mathrm{T}}$ . The computational complexity of (19) is decreased to  $O(N^2)$  when compared to LS and traditional RLS. Yet, the size of this approach is limited by the correlation matrix  $\Phi(n)$ , which is constructed by multiplying the vector  $\mathbf{v}(n)$  with its Hermitian  $\mathbf{v}^{\mathrm{H}}(n)$ , in each cycle, and the matrix division is still involved to extract the parameters vector  $\mathbf{W}(n)$ . In each time instance n, the RLS<sub>DCD</sub>



Fig. 6. Details of the training core RLS<sub>DCD</sub>. (a) Block diagram of RLS<sub>DCD</sub> with the update of W(n) in (20); (b) flowchart of the leading DCD algorithm; (c) lower-triangular formulation of $\Phi(n)$ in (20) with 16-bit resolution; the part marked in red must be multiplied in each cycle.

abandons the concept of finding W(n) from the direct inverse in (19). Instead, $\Delta \mathbf{W}(n)$ is extracted recursively from $\mathbf{\Phi}(n)$ and $\beta(n)$ in (19). Note that solving for $\Delta \mathbf{W}(n)$ in (19) directly would still involve an inverse matrix. With the help of the line-search-based method, $\Delta \mathbf{W}(n)$ is instead solved iteratively by choosing an updating direction for the vector $\boldsymbol{\beta}(n)$ and a step size $\alpha$, where $\alpha$ is an *m*-bit resolution factor of $\Delta \mathbf{W}(n)$. Fig. 6(b) illustrates the flowchart of the leading DCD core. In the DCD algorithm [25], the updating directions are chosen as Euclidean coordinates with the leading index. This leading index of the Euclidean coordinates of $\mathbf{r}(n)$ is determined by searching for the position k of the element with the maximum amplitude, where k indicates the row considered in a single loop. The matrix-vector calculation of $\mathbf{r}(n)$ is thereby reduced to a vector-scalar calculation with $r_k$ to minimize the hardware usage. Thus, the coefficients of different nonlinear orders can be trained separately while estimating $\Delta \mathbf{W}(n)$, with complexity O(N). Note that s indicates whether the real or the imaginary part of $r_k$ is used, and is determined while updating the direction. The accuracy of W(n) is determined by the step size $\alpha$ of the inner loop and the residual vector $\mathbf{r}(n)$. The direction of $\mathbf{r}(n)$ is recursively adjusted with its corresponding $\mathbf{R}^{(k)}$ to estimate $\Delta \mathbf{W}(n)$ until $\mathbf{r}(n)$ is sufficiently small or the number of iterations reaches the limit $\eta$. Here, the resolutions of $\Delta \mathbf{W}(n)$ and $\mathbf{r}(n)$ are 16 and 28 bits, respectively, and $\eta = 50$ is chosen for the leading DCD. The complexity of the leading DCD core is 3N multiplications per sample, which is slightly higher than



Fig. 7. The rate of convergence of LMS, RLS and RLS<sub>DCD</sub>.

LMS with 2N multiplications. Nevertheless, the training core combines the advantage of fast convergence with low computational complexity: the LMS algorithm provides low-complexity training, with O(N) complexity, whereas the RLS algorithm provides fast convergence with accurate parameter estimation. Fig. 7 compares the rate of convergence of LMS, traditional RLS and



Fig. 8. Implementation of the distributed arithmetic (DA) FIR filter.

line-search-based RLS. In the simulation, with the nonlinear approach included, the convergence rates of the traditional RLS and the line-search-based RLS are close ($\sim$150 iterations), while LMS needs 100 times more iterations. In addition, LMS retains a steady-state error after training, which decreases the training accuracy. To further minimize the number of multipliers, Fig. 6(c) depicts the matrix $\Phi(n)$ to illustrate the implementation. Only the lower triangular part of $\Phi(n)$ is considered, owing to the symmetry of the correlation matrix. For estimating $d_i$, RLS<sub>DCD</sub> operates in the nonlinear mode. All terms of $\Phi(n)$ are directly related to $|u(n)|^2$, such that only the 1<sup>st</sup> term requires a complex multiplication; the others can be calculated as real multiplications in the linear memory. In summary, the 1<sup>st</sup> column and row marked in red are calculated in each cycle to construct $\Phi(n)$. For estimating $\gamma_i$, only the 1<sup>st</sup> column of $\Phi(n)$ marked in red is calculated in each cycle, since the other columns can be copied from the time-delayed matrix $\Phi(n-1)$ of the previous step. Overall, the identification complexity is O(N) with no division involved. The parameter estimation time is $\sim$180 $\mu$s under an 80-MHz clock. The running complexity of the proposed predistorter is (5P + 4) + (4M + 2) FLOPs, which is only slightly larger than that of the memoryless model with (5P + 4) FLOPs. When compared with the CRV, which uses an indirect learning algorithm (with pseudoinverse), the identification complexity is decreased here from $O(N^2L)$ to O(N), and the running complexity is relaxed from $5P + 2PM + 4(P^2 + 1)M$ FLOPs to (5P + 4) + (4M + 2) FLOPs.
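The recursion in (19) with a leading-DCD inner solver can be sketched in software as follows. This is a minimal floating-point illustration, not the fixed-point FPGA core: the function names, the amplitude range `H`, the regularization `delta`, the forgetting factor value and the test signal are all assumptions made for the example, while `Mb = 16` and `Nu = 50` mirror the 16-bit resolution and the iteration limit quoted above. The residual left by the inner solver is carried over to the next sample in place of $\beta(n-1)$.

```python
import random


def leading_dcd(Phi, beta, H=1.0, Mb=16, Nu=50):
    """Approximately solve Phi * dW = beta by leading-coordinate descent.

    Returns the increment dW and the residual r = beta - Phi * dW.
    H (amplitude range), Mb (bits) and Nu (iteration cap) are illustrative.
    """
    N = len(beta)
    dW = [0j] * N
    r = list(beta)
    alpha, m = H / 2.0, 1
    for _ in range(Nu):
        # Leading index: largest real or imaginary residual component.
        k, s, val = 0, 1, 0.0
        for i, ri in enumerate(r):
            if abs(ri.real) > abs(val):
                k, s, val = i, 1, ri.real
            if abs(ri.imag) > abs(val):
                k, s, val = i, 1j, ri.imag
        # Halve the step (a bit shift in hardware) until it lowers the cost.
        while m <= Mb and abs(val) <= 0.5 * alpha * Phi[k][k].real:
            m += 1
            alpha *= 0.5
        if m > Mb:
            break
        step = s * (alpha if val > 0 else -alpha)
        dW[k] += step
        for i in range(N):
            r[i] -= step * Phi[i][k]  # only column k of Phi is touched
    return dW, r


def rls_dcd(regressors, desired, lam=0.99, delta=1e-2):
    """Run the recursion of (19), solving for dW with leading_dcd."""
    N = len(regressors[0])
    Phi = [[complex(delta if i == j else 0.0) for j in range(N)] for i in range(N)]
    W = [0j] * N
    r = [0j] * N
    for v, zeta in zip(regressors, desired):
        for i in range(N):
            for j in range(N):
                Phi[i][j] = lam * Phi[i][j] + v[i] * v[j].conjugate()
        e = zeta - sum(w.conjugate() * x for w, x in zip(W, v))
        beta = [lam * ri + vi * e.conjugate() for ri, vi in zip(r, v)]
        dW, r = leading_dcd(Phi, beta)
        W = [w + d for w, d in zip(W, dW)]
    return W


def demo(num_samples=2000, seed=0):
    """Linear-mode example: identify three hypothetical taps from noise-free data."""
    rng = random.Random(seed)
    w_true = [0.8 - 0.3j, 0.25 + 0.1j, -0.05 + 0.02j]
    u = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_samples + 2)]
    regs = [[u[n], u[n - 1], u[n - 2]] for n in range(2, num_samples + 2)]
    des = [sum(w.conjugate() * x for w, x in zip(w_true, v)) for v in regs]
    return rls_dcd(regs, des), w_true


W_est, W_ref = demo()
print("largest coefficient error:", max(abs(a - b) for a, b in zip(W_est, W_ref)))
```

The sketch forms the full matrix $\Phi(n)$ for readability; the lower-triangular and shifted-column savings described above are hardware optimizations that it does not attempt to reproduce.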

#### B. Signal Processing Before the Estimator

A 21-tap lowpass raised-cosine FIR filter is designed to limit the out-of-band distortion of the feedback path for the OFDM signal with an 80-MHz signal bandwidth. In general, the raised-cosine filter is designed at baseband to avoid inter-symbol interference (ISI) after filtering. As the received signal is down-converted to IF in the measurement, a raised-cosine filter frequency-shifted from baseband to IF is used to filter the signal at IF. Yet, precise coefficients and calculations are required to minimize the ISI. The conventional FIR filter consists of multiplication and addition units. High-resolution multiply-and-accumulate (MAC) blocks are required to compute the inner product of the input and filter coefficients

TABLE I THE LUT-CORRESPONDING 4-TAP FIR FILTER

| $x_b(q)$ | $x_b(q-1)$ | $x_b(q-2)$ | $x_b(q-3)$ | LUT Coefficients |
|----------|------------|------------|------------|------------------|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | $h(0)$ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| $b_3$ | $b_2$ | $b_1$ | $b_0$ | $h(0) \cdot b_0 + h(1) \cdot b_1 + h(2) \cdot b_2 + h(3) \cdot b_3$ |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 1 | 1 | 1 | 1 | $h(0) + h(1) + h(2) + h(3)$ |


vector under low throughput of implementation. A distributed arithmetic (DA) FIR filter [31] is developed to conserve the MAC blocks by a bit-serial rearrangement through a series of LUTs and adders. The DA FIR filter output  $y_{\text{FIR}}(n)$  can be expressed as,

$$y_{FIR}(n) = -2^{B} x_{B}(n) h(n) + \sum_{b=0}^{B-1} 2^{b} \sum_{q=1}^{Q-1} h(q) x_{b}(q), \quad (20)$$

where B is the number of bits of x(n); h(n) are the filter coefficients; and $x_{\rm b}(n)$ is the *b*-th bit of x(n). In (20), the inner product of x(n) and h(n) in conventional FIR filtering is treated as a distributed bitwise input of x(n) weighted by its filter coefficients h(n). The FIR filter output is thus transformed from a direct inner product to a combined sum of the bitwise input with its h(n). Under this property, the results of the inner product are stored in the LUT by pre-calculating all the sums of h(n) according to the address, which is the binary value of $x_b(n)$. The LUT originally requires $2^{Q}$ entries for a *Q*-tap filter. With an add-on bit-shifter and adder, it can be decomposed into sub-LUTs, each addressed by the input bits of 4 taps, to minimize the use of registers [30]. Fig. 8 details the implementation of the DA FIR filter, where its coefficients are decomposed into Q/4 phases ($Q1 \sim Q5$ in this work) to build the sub-LUTs. Table I exhibits the corresponding LUT output of Q1. The LUTs require $(Q/4) \times 2^4$ memory elements in total. Considering the symmetry of the designed FIR filter, the rightmost bits of the shift registers of x(q) and x(Q-1-q) are added to form the address of the LUT. The corresponding outputs of the LUTs Q1 to Q5 are then added. The input shift register is shifted into the LUT in every clock cycle. The output $y_{\text{FIR}}(n)$ is generated by accumulating the LUT outputs over *B* consecutive cycles. The sign control changes the addition to a subtraction for the sign bits included in (20). Thus, the DA FIR filter is implemented with (Q/4) 16-word sub-LUTs, (Q/4) shift-adds and a pipelined shift-adder. Note that the number of clock cycles used by the DA FIR filter is related to the bit resolution of x(n).
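The bit-serial LUT evaluation described above can be sketched as follows. This is an illustrative software model under stated assumptions, not the implementation of Fig. 8: it uses standard two's-complement weighting (the sign bit is subtracted), integer coefficients, a hypothetical 8-tap symmetric filter, and it does not exploit the coefficient symmetry. The function names and the mapping of address bit `i` to tap `i` are choices made for the example.

```python
def build_sub_lut(h4):
    """16-entry sub-LUT: entry `addr` is the sum of the taps whose address bit is 1."""
    return [sum(h for i, h in enumerate(h4) if (addr >> i) & 1) for addr in range(16)]


def da_fir_output(window, h, bits):
    """One FIR output sample computed by distributed arithmetic.

    window[q] is the signed integer sample paired with coefficient h[q];
    every sample must fit in `bits` two's-complement bits.
    """
    assert len(window) == len(h)
    pad = (-len(h)) % 4                       # pad to a multiple of 4 taps
    h = list(h) + [0] * pad
    window = list(window) + [0] * pad
    luts = [build_sub_lut(h[g:g + 4]) for g in range(0, len(h), 4)]
    mask = (1 << bits) - 1
    words = [x & mask for x in window]        # two's-complement bit patterns
    acc = 0
    for b in range(bits):                     # one bit plane per clock cycle
        partial = 0
        for g, lut in enumerate(luts):
            addr = 0
            for i in range(4):
                addr |= ((words[4 * g + i] >> b) & 1) << i
            partial += lut[addr]              # LUT lookup replaces 4 multiplies
        if b == bits - 1:
            acc -= partial << b               # sign control: subtract the sign plane
        else:
            acc += partial << b               # shift-and-add accumulation
    return acc


# Hypothetical symmetric 8-tap filter and 12-bit input window
h_example = [3, -7, 12, 25, 25, 12, -7, 3]
x_window = [100, -2048, 2047, -1, 0, 5, -300, 77]
print(da_fir_output(x_window, h_example, 12))
print(sum(a * b for a, b in zip(x_window, h_example)))  # direct inner product
```

With integer coefficients the two printed values agree exactly, which is the property that lets the sub-LUTs and a shift-adder replace the MAC blocks.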

Then the filtered output $y_{FIR}(n)$ enters the cross-correlation block (xcorr) in Fig. 5, which estimates and adjusts the loop delay between the filtered feedback signal $y_{FIR}(n)$ and its original input u(n). The loop delay adjustment is easily completed, as it is related to the address of the FIFO. The loop delay is estimated from the maximum amplitude of the cross-correlation between the two signals. To minimize the hardware, an alternative method simplifies the convolution process by matching the trend (shape) of the envelopes of u(n) and $y_{\rm FIR}(n-\tau)$. Multipliers are replaced by XOR gates performing 1-bit multiplication. A bit counter counts the number of 1's from the pipelined XOR gates to calculate the cross-correlation. The loop delay $\tau$ is finally determined by the peak value in the register array. The implementation of (21) is detailed in [19], which obtains the absolute value of a complex number with the "alpha max plus beta min" algorithm [32]. The choices of $\alpha$ and $\beta$ are a tradeoff between acceptable accuracy and complexity. Here, $\alpha = 1$ and $\beta = 0.25$ are chosen, which requires only a bitwise shifter and 2 multiplexers.

In the literature, the digital IF is down-converted to the baseband with a digital down-converter (DDC). However, two precise delay-locked loops (DLLs) are entailed to achieve a minimum phase shift. DLLs are a limited resource inside a low-cost FPGA, and are generally provided to tune the phase difference between the system and ADC clocks. In our design, the baseband signal is obtained by down-sampling by $N_{\rm down}$ to match the sampling frequency of u(n). The concept is based on sub-sampling, which can be thought of as mixing the IF signal with the sampling frequency and its harmonics. Thus, the IF signal can be down-mixed to DC without any distortions. The ratio between the sampling frequency and the bandwidth of the baseband signal should be an integer in order to fold correctly. Thus, the 24-bit DDC is omitted from the design. Table II summarizes the FPGA utilization, considering all the parameters mentioned in Section III with a 12-bit feedback ADC, P = 11 and M = 5. The processing time of the FIR filter, cross-correlation and down-sampling is $\sim 90 \ \mu s$ under a 160-MHz clock rate. Thus, $y_D(n)$ and u(n) are input to the RLS<sub>DCD</sub> estimator for further training in Fig. 5.
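The envelope-trend matching used for loop-delay estimation can be sketched as follows. This is a simplified software model under stated assumptions, not the pipelined circuit of [19]: each envelope is reduced to one bit per sample (rising or not), XOR flags the disagreeing bits, and the delay with the most agreements is returned, which is one plausible reading of the 1-bit correlation peak. The function names, window handling and test signal are hypothetical; only $\alpha = 1$ and $\beta = 0.25$ come from the text.

```python
import random


def alpha_max_beta_min(z, alpha=1.0, beta=0.25):
    """Approximate |z| without a square root or multiplier (beta = 1/4 is a shift)."""
    a, b = abs(z.real), abs(z.imag)
    return alpha * max(a, b) + beta * min(a, b)


def trend_bits(x):
    """1 where the approximate envelope rises from one sample to the next, else 0."""
    env = [alpha_max_beta_min(z) for z in x]
    return [1 if env[i + 1] > env[i] else 0 for i in range(len(env) - 1)]


def estimate_loop_delay(u, y, max_delay):
    """Return the delay (in samples) at which the envelope trends of u and y agree best."""
    bu, by = trend_bits(u), trend_bits(y)
    window = min(len(bu), len(by) - max_delay)  # same window length for every delay
    scores = []
    for tau in range(max_delay + 1):
        # XOR marks disagreeing trend bits; a bit counter sums them
        mismatches = sum(bu[i] ^ by[i + tau] for i in range(window))
        scores.append(window - mismatches)
    return max(range(max_delay + 1), key=scores.__getitem__)


# Hypothetical check: y is u attenuated by 0.5 and delayed by 7 samples
rng = random.Random(3)
u = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(400)]
y = [0j] * 7 + [0.5 * z for z in u]
print(estimate_loop_delay(u, y, max_delay=16))
```

Because only comparisons, XORs and a counter are needed, no multiplier is spent on the correlation, which is the motivation given above. Note that the magnitude approximation is not rotation invariant, so a phase offset in the loop would perturb some trend bits; the example avoids this by applying a real gain only.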

# IV. DISCUSSION AND MEASUREMENT RESULTS

The running complexity and estimated order of a predistorter allow a fair comparison between different DPDs. Table III benchmarks this work with the existing pruned Volterra-series DPDs. The running complexity counts the number of multiplications and additions applied to each input sample. Because the predistorter must operate at a high rate to preserve the same throughput, it must be kept as simple

TABLE II FPGA UTILIZATION SUMMARY

| Module              | Slice<br>Registers | Slice<br>LUTs | Block<br>RAM/FIFO | Internal<br>18-bit<br>Multipliers |
|---------------------|--------------------|---------------|-------------------|-----------------------------------|
| Estimator           | 29480              | 10196         | 0                 | 345                               |
| Predistorter        | 1482               | 1570          | 0                 | 39                                |
| BPF                 | 2097               | 1381          | 229376            | 0                                 |
| Loop Delay Adjuster | 1002               | 32            | 282120            | 0                                 |
| Downsampling        | 192                | 128           | 28672             | 0                                 |
| ADC Interfaces      | 8041               | 5464          | 3860147           | 0                                 |
| Total Used          | 42294              | 18770         | 4398515           | 384                               |
| Utilization         | 73%                | 32%           | 66%               | 100%                              |

TABLE III THE COMPLEXITY AND ESTIMATED ORDER OF DIFFERENT DPDs

| DPD Method     | Running Complexity<br>(FLOPs)                                              | Estimated Order                                                           |
|----------------|----------------------------------------------------------------------------|---------------------------------------------------------------------------|
| Memoryless     | 5P+4                                                                       | (P+1)/2                                                                   |
| MP             | P+4(M+1)(P+1)                                                              | (M+1)(P+1)/2                                                              |
| GMP            | P+(P-3)G<br>+4(M+1)[(P+1)+G(P-1)]<br>+M(M+1)[4(P-1)+(P-3)/2]               | (M+1)[(P+1)/2+G(P-1)/2]<br>+[M(M+1)/2](P-1)/2                             |
| DDR2           | 23+14M+(M+1)(P-1)(6M+5)<br>+(P-3)(13M<sup>2</sup>+7M)<br>+3(P-5)M(M+1)/2   | (M+1)(P+1)/2+<br>(M<sup>2</sup>+M+M(M+1)/2)(P-1)/2<br>(M(M+1)/2)(P-3)/2   |
| CRV            | 5P+2PM+4(P<sup>2</sup>+1)M                                                 | (P+1)/2+(P<sup>2</sup>+1)M/2                                              |
| Proposed fCRV* | (5P+4) + (4M+2)                                                            | (P+1)/2 *                                                                 |

\* separated estimation on memory and polynomial coefficients in fCRV

as possible. The running complexity of the proposed fCRV compares favorably with those of the other pruned Volterra-series-based DPDs. For the estimated order, fCRV is the same as the memoryless DPD; both are much simpler than the pruned Volterra-series DPDs. Thus, both the predistortion and the online identification of the proposed DPD are hardware-efficient in the FPGA.
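To make the comparison in Table III concrete, the closed-form expressions can be evaluated for specific orders. The sketch below transcribes the memoryless, MP, CRV and fCRV rows only (GMP and DDR2 are left out for brevity); the function names and the method labels used as dictionary keys are illustrative. For P = 9 and M = 2 it reproduces the matrix sizes N = 5 and N = 87 quoted in the measurement discussion, and N = 128 for P = 11 and M = 2.

```python
def running_flops(method, P, M=0):
    """Running complexity in FLOPs per sample, transcribed from Table III."""
    formulas = {
        "memoryless": lambda: 5 * P + 4,
        "MP": lambda: P + 4 * (M + 1) * (P + 1),
        "CRV": lambda: 5 * P + 2 * P * M + 4 * (P ** 2 + 1) * M,
        "fCRV": lambda: (5 * P + 4) + (4 * M + 2),
    }
    return formulas[method]()


def estimated_order(method, P, M=0):
    """Number of coefficients estimated at once (P is odd, so the divisions are exact)."""
    formulas = {
        "memoryless": lambda: (P + 1) // 2,
        "MP": lambda: (M + 1) * (P + 1) // 2,
        "CRV": lambda: (P + 1) // 2 + (P ** 2 + 1) * M // 2,
        "fCRV": lambda: (P + 1) // 2,  # memory taps are estimated separately
    }
    return formulas[method]()


for name in ("memoryless", "MP", "CRV", "fCRV"):
    print(name, running_flops(name, P=9, M=3), estimated_order(name, P=9, M=2))
```

With P = 9 and M = 3, the fCRV needs 63 FLOPs per sample against 49 for the memoryless model and 1083 for the CRV, which is the gap the table summarizes symbolically.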

#### A. Measurement Setup

Fig. 9 shows the measurement setup. An Agilent E4438C vector signal generator (VSG) is employed to generate an RF modulated signal from the digital baseband. Two commercial PAs, the MAX2242 from Maxim Integrated and the ZHL1724 from MiniCircuits, are chosen as the devices under test (DUTs). An RF-to-digital receiver board, the Texas Instruments (TI) TSW1266, with a 614.4-MSPS sampling rate and a 12-bit ADC, captures the DUT's input and output for further signal processing in the FPGA. A 10-MHz trigger signal is synchronized with the VSG in order to minimize the frequency offset. We employ a 20-dB attenuator (RADIALL R413820000) to lower the power of the PA output to meet the ADC's requirement, with an acceptable 8-dB increase of the noise floor. The output spectrum



Fig. 9. Measurement setup for the proposed DPD.

can be observed from the Agilent PXA N9030A spectrum analyzer or the graphical user interface (GUI) of the data-capturing board, the TI TSW1400. In the digital part, the TSW1400 is chosen as the data capturing and sending center; it consists of Altera's FPGA EP4SGX70HF35C3, on-board DDR2 RAM, a Serial Peripheral Interface (SPI) bus and a high-speed mezzanine card (HSMC) bus. The digitized IF signal from the ADC, centered at 153.2 MHz, passes through the HSMC bus to the TSW1400 and is stored in the on-board RAM. Furthermore, the legacy FIFO is used for reading the ADC's digitized data, as the RAM operates at $\sim 1.2$ GHz. The data can be sent to the GUI through the SPI interface or written to the FIFO for processing. The band-pass-filtered, cross-correlated and down-sampled outputs are input to the estimator and predistorter for parameter estimation and predistortion. The predistorted data is extracted as a CSV file through the SPI interface and sent to the VSG.

The test data is a 20-MHz 64-QAM OFDM signal with an 8.53-dB PAPR, 52 active subcarriers out of 64, 312.5-kHz subcarrier spacing and 4x oversampling. This test signal is modulated onto a 2.44-GHz carrier, and is measured as having a native EVM of 1.2% and an ACLR of (-50.8 dBc, -50.7 dBc). The other test data is the carrier aggregation of two 5-MHz 64-QAM OFDM signals, with 8.53- and 8.91-dB PAPR, 52 active subcarriers out of 64, 312.5-kHz subcarrier spacing and 8x oversampling. The separation between the two OFDM signals is 20 MHz. For the evaluation metrics, the EVM reflects the symbol accuracy, and the ACLR reflects the spectral purity.
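For reference, the waveform metrics quoted throughout this section can be computed from complex baseband samples as sketched below. The definitions are common textbook forms and are an assumption here, since the exact normalizations used by the measurement instruments are not stated: the EVM is the RMS error normalized to the RMS reference power, and the NMSE is the same ratio expressed in dB. The function names are illustrative.

```python
import math


def papr_db(x):
    """Peak-to-average power ratio of a complex waveform, in dB."""
    power = [abs(z) ** 2 for z in x]
    return 10.0 * math.log10(max(power) / (sum(power) / len(power)))


def evm_percent(received, reference):
    """RMS error vector magnitude in percent, normalized to the reference power."""
    err = sum(abs(r - s) ** 2 for r, s in zip(received, reference))
    ref = sum(abs(s) ** 2 for s in reference)
    return 100.0 * math.sqrt(err / ref)


def nmse_db(measured, reference):
    """Normalized mean square error between two waveforms, in dB."""
    err = sum(abs(m - s) ** 2 for m, s in zip(measured, reference))
    ref = sum(abs(s) ** 2 for s in reference)
    return 10.0 * math.log10(err / ref)


# Hypothetical QPSK-like symbols with a 10% gain error
ref_symbols = [1 + 0j, 1j, -1 + 0j, -1j]
rx_symbols = [1.1 * s for s in ref_symbols]
print(papr_db(ref_symbols), evm_percent(rx_symbols, ref_symbols), nmse_db(rx_symbols, ref_symbols))
```

Under these definitions an EVM of 3.2% corresponds to an error power about 29.9 dB below the reference, which gives a feel for how the EVM and NMSE columns of the comparison tables relate, although the tabulated NMSE is computed on waveforms rather than demodulated symbols.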

#### B. DPD on Maxim 2242 PA

This DUT operates between 2.4 and 2.5 GHz, with a power gain of 28.5 dB and an output power of +22.5 dBm. The output power is relatively small compared with that of the PAs used in base stations. Since the input power of the DUT is pushed to a relatively small back-off, severe nonlinearities and memory effects are generated. Fig. 10 plots the measured ACLR and EVM performance, with and without the proposed DPD, under different average output powers. The ACLR improves from -28.4 to -46.1 dBc. The EVM improves from 10.1% to 3.2%. At an average output power > 12 dBm, the DUT operates inside the saturation region with



Fig. 10. Measured (a) ACLR and (b) EVM versus average output power, with (black) and without (grey) DPD.

TABLE IV PERFORMANCE COMPARISON USING THE MAX2242 PA

| DPD Method     | Case I ACLR (dBc) | Case I EVM (%) | Case I NMSE (dB) | Case II ACLR (dBc) | Case II EVM (%) | Case II NMSE (dB) |
|----------------|-------------------|----------------|------------------|--------------------|-----------------|-------------------|
| Memoryless     | -46.2/-45.4       | 3.4            | -35.2            | -38.4/-37.6        | 3.9             | -27.2             |
| GMP            | -50.6/-50.5       | 2.0            | -38.7            | -46.6/-46.8        | 3.0             | -29.7             |
| DDR2           | -50.8/-50.1       | 1.9            | -37.5            | -46.0/-46.1        | 3.4             | -29.0             |
| CRV            | -50.8/-50.7       | 1.8            | -38.5            | -46.3/-46.5        | 3.1             | -29.7             |
| fCRV           | -50.7/-50.5       | 1.8            | -38.5            | -46.5/-46.6        | 3.1             | -29.5             |
| Proposed fCRV* | -50.8/-50.7       | 1.9            | -38.2            | -46.0/-46.1        | 3.2             | -29.1             |

Only fCRV\* uses $RLS_{DCD}$ to estimate the coefficients, with complexity O(N), while the others use LS estimation with complexity $O(N^2L)$.

severe nonlinearities and memory effects. Thus, two power levels, at 9-dB and 7.5-dB relative back-off from the saturation power, are chosen for further discussion.

These two scenarios correspond to medium and severe RF nonlinearities and memory effects, with the 1<sup>st</sup> ACLRs for the lower and upper bands being (-33.7, -34.9) dBc at P<sub>in</sub> = -12 dBm and (-27.1, -28.5) dBc at P<sub>in</sub> = -10 dBm, respectively. Table IV gives the performance comparison between the



Fig. 11. Measured (a) AM/AM and (b) AM/PM characteristics of the Max2242 PA at a 9-dB backoff from the saturation power with (black) and without (blue) DPD.



Fig. 12. Measured spectra plots at a 9-dB backoff from the saturation power.

memoryless DPD, GMP, DDR, CRV and the fCRV. The configuration of each pruned DPD is determined in simulation by searching over different P and M using the Qhull algorithm [Qhull 2012, Available: http://www.qhull.org]. Note that the coefficients of GMP, DDR and CRV can only be estimated offline with matrix inversion, while the coefficients of fCRV are estimated with both the inverse matrix and the adaptive RLS<sub>DCD</sub>. With a 9-dB relative back-off from the saturation power (Case I), P = 9 is chosen for both methods, while M = 2 is found by sweeping M from 1 to 4 for GMP (with G swept from 0 to 2), DDR, CRV and fCRV. Fig. 11 plots the measured AM/AM and AM/PM characteristics with and without our DPD. The nonlinearities of the PA can be compensated by the memoryless DPD. However, moderate memory effects cause an asymmetric ACLR, as shown in Fig. 12. Under the same parameter estimation with the indirect learning algorithm, the identification complexity is dominated by the pseudo-inverse operation on the matrix, which is $O(N^2L)$. Assuming that the number of samples L for the identification is fixed, the matrix size of the memoryless DPD and fCRV (N = 5) is much smaller than that of CRV (N = 87). Further, with identification using $RLS_{DCD}$, the complexity is O(N), such that fCRV is complexity-efficient. The ACLR of fCRV improves from <-40.1 to <-50.8 dBc. The EVM improves



Fig. 13. Measured constellation diagrams without and with DPD at a 9-dB back-off from the saturation power.



Fig. 14. Measured (a) AM/AM and (b) AM/PM characteristics of the Max2242 PA at a 7.5-dB backoff from the saturation power with (black) and without (blue) DPD.

from 5.5% to 1.9% (Fig. 13). The running complexity of the fCRV is slightly larger than that of a memoryless DPD owing to the feedback FIR filter involved. For severe nonlinearities with memory effects, at a 7.5-dB relative back-off from the saturation power (Case II), the DUT is swept into the saturation region. Fig. 14 plots the measured AM/AM and AM/PM characteristics; the severe nonlinearities and memory effects can still be suppressed effectively by the proposed DPD with P = 9 and M = 3 for CRV and fCRV. Fig. 15 illustrates the output spectra of the different DPDs. The interaction between the nonlinearities and memory effects cannot be compensated by the memoryless DPD, as its structure compensates the nonlinearities alone. Comparatively, with M changed from 2 to 3, DDR and GMP have a performance similar to that of CRV (ACLR of -46.6 dBc and $\sim 3.1\%$ EVM). From the complexity point of view, the dispersal properties of the CRV show the ability of the diagonal dispersal compensation. The proposed DPD exhibits a performance close to that of the CRV while using RLS<sub>DCD</sub>. The EVM improves from 10.1% to 3.2% (Fig. 16). The EVM performance starts to degrade as the signal swings into the saturated region.

#### C. DPD on MiniCircuits ZHL1724 PA

This DUT operates between 1.7 and 2.4 GHz, with a power gain of 36 dB and an output power of +26 dBm at the 1-dB compression point. The non-flat gain response leads to an asymmetrical memory effect within the signal bandwidth (20 MHz). The measured ACLR and EVM performances, with and without the proposed DPD, under different average



Fig. 15. Measured spectra plots at a 7.5-dB backoff from the saturation power.



Fig. 16. Measured constellation diagram (a) without and (b) with DPD at a 7.5-dB back-off from saturation power.



Fig. 17. Measured (a) ACLR and (b) EVM, versus average output power, with (black) and without (grey) DPD of MiniCircuits ZHL1724 PA.

output power are plotted in Fig. 17. The ACLR improves from -28.4 to -44.2 dBc, and the EVM improves from 10.1% to 3.2%, with P = 9 and M = 3 chosen for both fCRV and CRV. The EVM degrades slightly once a high-PAPR input signal is swept into the saturation region. In Table V, the DUT is operated at the same back-off input power (-15 dBm), and three input signals with PAPRs varying from 8.5 to 9.5 dB are applied. The corresponding average output power is $\sim 20$ dBm. The ACLR and EVM performance without DPD varies dramatically in this region. After applying the DPD, the ACLR and EVM performances are maintained across the different input signals.

TABLE V PERFORMANCE COMPARISON USING THE ZHL1724 PA

| Case     | Metric     | Memoryless  | GMP         | DDR2        | CRV         | Proposed fCRV |
|----------|------------|-------------|-------------|-------------|-------------|---------------|
| Case I   | ACLR (dBc) | -41.8/-41.3 | -46.2/-45.8 | -46.3/-45.9 | -45.8/-45.9 | -45.4/-45.6   |
|          | EVM (%)    | 3.2         | 2.5         | 2.6         | 2.3         | 2.4           |
|          | NMSE (dB)  | -23.2       | -29.9       | -29.6       | -30.7       | -30.1         |
| Case II  | ACLR (dBc) | -40.5/-39.8 | -45.5/-45.6 | -45.6/-45.3 | -45.5/-45.4 | -45.1/-45.0   |
|          | EVM (%)    | 3.8         | 2.8         | 2.8         | 2.7         | 2.9           |
|          | NMSE (dB)  | -22.0       | -29.7       | -29.7       | -29.8       | -29.4         |
| Case III | ACLR (dBc) | -39.5/-39.1 | -44.6/-44.6 | -44.3/-44.1 | -44.5/-44.4 | -44.2/-44.3   |
|          | EVM (%)    | 4.7         | 3.2         | 3.6         | 3.2         | 3.3           |
|          | NMSE (dB)  | -21.3       | -29.2       | -28.9       | -29.4       | -29.1         |



Fig. 18. Measured spectra plots at an 18.5-dBm output power of two-channel carrier aggregation.

For the carrier aggregation of 2 signals with 18.5- and 19.0-dBm output power, the measured output power is less than that of the single-carrier signal as the PAPR goes up from 8.63 to 9.81 dB. Compared with the single-carrier signal, cross-intermodulation terms between the two carriers are generated by the nonlinear distortion. Fig. 18 and Fig. 19 plot the output spectra when applying the proposed DPD. At the 18.5-dBm output power (Fig. 18), all the DPDs perform similarly in eliminating the in-band distortion and the cross-intermodulation terms, with the ACLR improving from (-38.4, -37.6) to (-41.0, -41.3) dBc, where P = 9 and M = 1 are chosen after sweeping the parameters. At the 19-dBm output power (Fig. 19), the ACLR of fCRV improves from (-35.8, -36.0) to (-45.3, -45.2) dBc with P = 11 and M = 2. Comparatively, the memoryless DPD



Fig. 19. Measured spectra plots at a 19.0-dBm output power of two-channel carrier aggregation.

can only suppress the nonlinearities of the PA distortion, with the ACLR improving to (-42.5, -43.9) dBc. The interaction between the cross-intermodulation and memory effects cannot be eliminated. The ACLR of CRV improves to (-43.5, -43.4) dBc with P = 11 and M = 2 chosen after sweeping. The performance degradation of CRV is attributed to the inaccuracy of the LS estimation due to the enlarged estimated matrix with N = 128. Overall, the performance of fCRV is comparable to that of CRV with a complexity-efficient estimated matrix.

# V. CONCLUSION

A feedback polynomial topology for DPD linearization of PAs has been introduced. The nonlinearities and memory effects are constructed separately to minimize the size of the estimated matrix. The division-free and complexity-efficient $RLS_{DCD}$ core has been implemented to estimate the coefficients of both the nonlinear and linear approaches adaptively, with a relaxed computational complexity of O(N). As a result, both the identification and predistortion procedures are implementation-friendly and have the potential to estimate the coefficients adaptively in the FPGA. The measured ACLR and EVM performances of the proposed DPD have been validated with single-carrier and carrier-aggregation OFDM signals.

## REFERENCES

- [1] M. Schetzen, *The Volterra and Wiener Theories of Nonlinear Systems*. Hoboken, NJ, USA: Wiley, 1980.
- [2] L. Anttila, P. Handel, and M. Valkama, "Joint mitigation of power amplifier and I/Q modulator impairments in broadband direct-conversion transmitters," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 4, pp. 730–739, Apr. 2010.
- [3] D. Mirri, G. Iuculano, F. Filicori, G. Pasini, G. Vannini, and G. P. Gabriella, "A modified Volterra series approach for nonlinear dynamic systems modeling," *IEEE Trans. Circuits Syst. I, Fundam. Theory Appl.*, vol. 49, no. 8, pp. 1118–1128, Aug. 2002.
- [4] D. R. Morgan, Z. Ma, J. Kim, M. G. Zierdt, and J. Pastalan, "A generalized memory polynomial model for digital predistortion of RF power amplifiers," *IEEE Trans. Signal Process.*, vol. 54, no. 10, pp. 3852–3860, Oct. 2006.

- [5] A. Zhu, P. J. Draxler, J. J. Yan, T. J. Brazil, D. F. Kimball, and P. M. Asbeck, "Open-loop digital predistorter for RF power amplifiers using dynamic deviation reduction-based Volterra series," *IEEE Trans. Microw. Theory Techn.*, vol. 56, no. 7, pp. 1524–1534, Jul. 2008.
- [6] X. Yu and H. Jiang, "Digital predistortion using adaptive basis functions," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 12, pp. 3317–3327, Dec. 2013.
- [7] F. Mkadem, M. C. Fares, S. Boumaiza, and J. Wood, "Complexity-reduced Volterra series model for power amplifier digital predistortion," *Analog Integr. Circuits Signal Process.*, vol. 79, no. 2, pp. 331–343, Feb. 2014.
- [8] Y. Y. Woo et al., "Adaptive digital feedback predistortion technique for linearizing power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 55, no. 5, pp. 932–940, May 2007.
- [9] C. D. Presti, D. F. Kimball, and P. M. Asbeck, "Closed-loop digital predistortion system with fast real-time adaptation applied to a handset WCDMA PA module," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 3, pp. 604–618, Mar. 2012.
- [10] Y.-J. Liu, B. Lu, T. Cao, B.-H. Zhou, J. Zhou, and Y.-N. Liu, "On the robustness of look-up table digital predistortion in the presence of loop delay error," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 59, no. 10, pp. 2432–2442, Oct. 2012.
- [11] S. P. Stapleton and F. C. Costescu, "An adaptive predistorter for a power amplifier based on adjacent channel emissions," *IEEE Trans. Veh. Technol.*, vol. 39, no. 4, pp. 374–382, Nov. 1990.
- [12] P. L. Gilabert, G. Montoro, and E. Bertran, "FPGA implementation of a real-time NARMA-based digital adaptive predistorter," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 58, no. 7, pp. 402–406, Jul. 2011.
- [13] L. Guan and A. Zhu, "Low-cost FPGA implementation of Volterra series-based digital predistorter for RF power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 4, pp. 866–872, Apr. 2010.
- [14] S. Chen, "An efficient predistorter design for compensating nonlinear memory high power amplifiers," *IEEE Trans. Broadcast.*, vol. 57, no. 4, pp. 856–865, Dec. 2011.
- [15] H. Qian, H. Huang, and S. Yao, "A general adaptive digital predistortion architecture for stand-alone RF power amplifiers," *IEEE Trans. Broadcast.*, vol. 59, no. 3, pp. 528–538, Sep. 2013.
- [16] J. Kim, Y. Y. Woo, J. Moon, and B. Kim, "A new wideband adaptive digital predistortion technique employing feedback linearization," *IEEE Trans. Microw. Theory Techn.*, vol. 56, no. 2, pp. 385–392, Feb. 2008.
- [17] J. Moon and B. Kim, "Wideband digital feedback predistortion employing segmented memory compensation for linearization of Doherty amplifier," in *Proc. Eur. Microw. Conf.*, Sep. 2010, pp. 727–730.
- [18] O. Hammi, A. Kwan, S. Bensmida, K. A. Morris, and F. M. Ghannouchi, "A digital predistortion system with extended correction bandwidth with application to LTE-A nonlinear power amplifiers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 61, no. 12, pp. 3487–3495, Dec. 2014.
- [19] Y. Ma, Y. Yamao, Y. Akaiwa, and C. Yu, "FPGA implementation of adaptive digital predistorter with fast convergence rate and low complexity for multi-channel transmitters," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 11, pp. 3961–3973, Nov. 2013.
- [20] L. Ding, Z. Ma, D. R. Morgan, M. Zierdt, and G. T. Zhou, "Compensation of frequency-dependent gain/phase imbalance in predistortion linearization systems," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 55, no. 1, pp. 390–397, Feb. 2008.
- [21] C.-F. Cheang, K.-F. Un, W.-H. Yu, P.-I. Mak, and R. P. Martins, "A combinatorial impairment-compensation digital predistorter for a sub-GHz IEEE 802.11af-WLAN CMOS transmitter covering a 10x-wide RF bandwidth," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 4, pp. 1025–1032, Apr. 2015.
- [22] A. S. Tehrani, H. Cao, T. Eriksson, M. Isaksson, and C. Fager, "A comparative analysis of the complexity/accuracy tradeoff in power amplifier behavioral models," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 6, pp. 1510–1520, Jun. 2010.
- [23] Y. Li, C.-F. Cheang, P.-I. Mak, and R. P. Martins, "The dispersal analysis on basis construction of digital predistortion techniques for power amplifiers," *Analog Integr. Circuits Signal Process.*, vol. 86, no. 1, pp. 77–86, Jan. 2016.
- [24] Y. Li, C.-F. Cheang, P.-I. Mak, and R. P. Martins, "Joint-digital-predistortion for wireless transmitter's I/Q imbalance and PA nonlinearities using an asymmetrical complexity-reduced Volterra series model," *Analog Integr. Circuits Signal Process.*, vol. 87, no. 1, pp. 35–47, Mar. 2016.
- [25] Y. V. Zakharov, G. P. White, and J. Liu, "Low-complexity RLS algorithms using dichotomous coordinate descent iterations," *IEEE Trans. Signal Process.*, vol. 56, no. 7, pp. 3150–3161, Jul. 2008.

- [26] T. R. Cunha, J. C. Pedro, and E. G. Lima, "Low-pass equivalent feedback topology for power amplifier modeling," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2008, pp. 1445–1448.
- [27] P. L. Gilabert, G. Montoro, and A. Cesari, "A recursive digital predistorter for linearizing RF power amplifiers with memory effects," in *Proc. Asia–Pacific Microw. Conf.*, Dec. 2006, pp. 1043–1047.
- [28] C. A. Desoer and M. Vidyasagar, *Feedback Systems: Input-Output Properties*. New York, NY, USA: Academic, 1975.
- [29] S. S. Haykin, *Adaptive Filter Theory*, 4th ed. Englewood Cliffs, NJ, USA: Prentice-Hall, 2001.
- [30] A. Peled and B. Liu, "A new hardware realization of digital filters," *IEEE Trans. Acoust., Speech, Signal Process.*, vol. ASSP-22, no. 6, pp. 456–462, Dec. 1974.
- [31] P. K. Meher, S. Chandrasekaran, and A. Amira, "FPGA realization of FIR filters by efficient and flexible systolization using distributed arithmetic," *IEEE Trans. Signal Process.*, vol. 56, no. 7, pp. 2009–2017, Jul. 2008.
- [32] R. G. Lyons, *Understanding Digital Signal Processing*, 2nd ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2004, ch. 13.2.



**Chak-Fong Cheang** (S'13) received the B.Sc. and M.Sc. degrees from the Department of Engineering Science, National Cheng Kung University, Tainan, Taiwan, in 2008 and 2010, respectively, and the Ph.D. degree from the State-Key Laboratory of Analog and Mixed-Signal VLSI, Faculty of Science and Technology (ECE), University of Macau, Macau, China, in 2017. His research interests are digital predistortion, digital mitigation of RF impairments, and field-programmable gate-array-based embedded signal processing.



**Pui-In Mak** (S'00–M'08–SM'11) received the Ph.D. degree from the University of Macau (UM), Macau, China, in 2006. He is currently a Professor with the UM Faculty of Science and Technology, Department of Electrical Communication Engineering, and an Associate Director (Research) with the UM State-Key Laboratory of Analog and Mixed-Signal VLSI. His research interests are on analog and radio-frequency circuits and systems for wireless, and biomedical and physical chemistry applications.

He was an Editorial Board Member of IEEE Press from 2014 to 2016, a Senior Editor of the IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS from 2014 to 2015, an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I from 2010 to 2011 and from 2014 to 2015, and of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2010 to 2011 and from 2012 to 2013, and a Guest Editor of the IEEE RFIC VIRTUAL JOURNAL in 2014 and the IEEE JOURNAL OF SOLID-STATE CIRCUITS in 2018. He was a member of the Board of Governors of the IEEE Circuits and Systems Society from 2009 to 2011, and a Distinguished Lecturer of the IEEE Circuits and Systems Society from 2014 to 2015 and the Solid-State Circuits Society from 2017 to 2018. He has served as a TPC Member of ISSCC (2016), ESSCIRC (from 2016 to 2017), and A-SSCC (from 2013 to 2016).

He was a co-recipient of numerous merit paper awards: ISSCC'16, A-SSCC'15, ASQED'13, APCCAS'08, DAC/ISSCC'05, MWSCAS'04 and ASICON'03, the IEEE CASS Outstanding Young Author Award 2010 and the IEEE CASS Chapter-of-the-Year Award 2009, and the Best Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II from 2012 to 2013.



**Rui P. Martins** (M'88–SM'99–F'08) was born in 1957. He received the bachelor's (five years), master's, and Ph.D. degrees, and the Habilitation for Full-Professor in electrical engineering and computers from the Department of Electrical and Computer Engineering, Instituto Superior Técnico (IST), University of Lisbon, Portugal, in 1980, 1985, 1992, and 2001, respectively. He has been with the Department of Electrical and Computer Engineering, IST, University of Lisbon, since 1980. Since 1992, he has been on leave from IST, University of Lisbon and is also with the Department of Electrical and Computer Engineering, Faculty of Science and Technology (FST), University of Macau (UM), Macau, China, where he is currently a Chair-Professor. In FST, he was the Dean of the Faculty from 1994 to 1997 and he has been a Vice-Rector of the University of Macau since 1997. Since 2008, after the reform of the UM Charter, he has been nominated after open international recruitment, and reappointed in 2013, as the Vice-Rector (Research) until 2018. Within the scope of his teaching and research activities, he has taught 21 bachelor and master courses in UM, and has supervised (or co-supervised) 40 theses, Ph.D. (19), and master's (21). He has co-authored: six books and nine book chapters; 18 patents, USA (16) and Taiwan (2); 377 papers, in scientific journals (111) and in conference proceedings (266); and 60 other academic works, in a total of 470 publications. He was a Co-Founder of Chipidea Microelectronics (Macao) [now Synopsys] in 2001/2002, and created in 2003 the Analog and Mixed-Signal VLSI Research Laboratory of UM, elevated in 2011 to State Key Laboratory of China (the first in Engineering in Macao), being its Founding Director.

Dr. Martins was a Founding Chairman of the IEEE Macau Section from 2003 to 2005 and the IEEE Macau Joint-Chapter on Circuits and Systems (CAS)/Communications (COM) from 2005 to 2008 [2009 World Chapter of the Year of the IEEE Circuits and Systems Society (CASS)]. He was a General Chair of the 2008 IEEE Asia-Pacific Conference on CAS (APCCAS'2008), and a Vice-President for Region 10 (Asia, Australia and the Pacific) of IEEE CASS from 2009 to 2011. He was then the Vice-President (World) Regional Activities and Membership of IEEE CASS from 2012 to 2013, and an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II: Express Briefs from 2010 to 2013, nominated as Best Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II for 2012 to 2013. He was a member of the IEEE CASS Fellow Evaluation Committee from 2013 to 2014, and the CAS Society representative in the Nominating Committee for the election in 2014 of the Division I (CASS/EDS/SSCS) Director of the IEEE. He was the General Chair of the ACM/IEEE Asia South Pacific Design Automation Conference (ASP-DAC'2016). He was a Nominations Committee Member in 2016 and is currently the Chair of the IEEE Fellow Evaluation Committee (class of 2018), both of IEEE CASS. He was a recipient of two government decorations: the Medal of Professional Merit from Macao Government (Portuguese Administration) in 1999, and the Honorary Title of Value from Macao SAR Government (Chinese Administration) in 2001. In 2010, he was elected, unanimously, as a Corresponding Member of the Portuguese Academy of Sciences, Lisbon, being the only Portuguese Academician living in Asia.