ISCAS 2000 - IEEE International Symposium on Circuits and Systems, May 28-31, 2000, Geneva, Switzerland

# A LINEAR-PHASE HALFBAND SC VIDEO INTERPOLATION FILTER WITH COEFFICIENT-SHARING AND SPREAD-REDUCTION

Seng-Pan U<sup>1</sup>, R.P.Martins<sup>1</sup>, J.E.Franca<sup>2</sup>

1 - Faculty of Science and Technology, University of Macau, P.O. Box 3001, Macau, China. E-mail: fstspu@umac.mo (1' - on leave from IST, E-mail: rtorpm@umac.mo)

## ABSTRACT

This paper proposes a 4-fold multistage Switched-Capacitor (SC) interpolation filter with a 5 MHz passband and a 54 MHz output sampling rate for NTSC/PAL digital video signal processing systems. The circuit implements impulse-sampled halfband interpolation with 23-tap and 7-tap FIR filtering in the 1<sup>st</sup> and 2<sup>nd</sup> stages, respectively, to achieve a linear-phase response. A novel area-efficient technique combining symmetrical-coefficient-sharing and spread-reduction is proposed for this transversal SC circuit, which embeds minimized mismatch-free analog delay lines with accurate, wideband gain- and offset-compensation. The filter is designed in 0.35 $\mu$m CMOS technology with the speed of the analog components optimized, and is expected to occupy about 2 mm<sup>2</sup> of active area and consume 90 mW from a 3.0 V supply.

## 1. INTRODUCTION

Video restitution post-filtering is required in digital video encoding, which is found in a growing range of consumer and professional video applications such as DVD players, TV outputs in DVD-equipped PCs, PC multimedia video editing systems, digital set-top boxes, digital still cameras, video phones, and studio and broadcast video systems [1-5]. In such systems, the digital YCrCb (4:2:2) 8- or 16-bit component video streams are converted into standard analog composite (NTSC or PAL) or S-video outputs, and a post anti-imaging filter is needed to smooth the DAC outputs by attenuating the images that arise from the sampling process inherent in digitizing analog video. In many cases, however, such analog video filters have been discrete or passive, because high-order monolithic continuous-time (C-T) filters with phase equalization are difficult to realize and require complex trimming to meet strict specifications such as the CCIR-601 digital video standard [6-7]. Digital multirate schemes can relax the specifications of the C-T filters, but at the price of a faster encoder datapath and DAC, which consume more power and silicon area, and the C-T filters are still usually realized off-chip [2,5]. A more economical alternative, the analog multirate technique [8-9], relaxes both the DSP/DAC speed and the C-T filter specifications by inserting a Switched-Capacitor (SC) interpolation filter between the DAC and the post C-T filter. This approach is effective because the analog components of the SC interpolator operate at the lower input sampling rate when efficient impulse-sampled polyphase structures are employed [10].

Digital video standards recommend that the filtering of 4.2-MHz and 5-MHz bandwidth NTSC and PAL signals, respectively, should achieve an equi-ripple gain ($< \pm 0.25$ dB) and a linear-phase characteristic ($< \pm 10$ ns group delay variation) in the passband, with typically 40 dB of stopband attenuation at a sampling rate of 13.5 MHz. Such a strict phase specification, which is specifically required for video filtering, makes a sampled-data linear-phase Finite Impulse Response (FIR) realization very attractive.

2 - Integrated Circuits and Systems Group, Instituto Superior Técnico (IST), Av. Rovisco Pais, 1, 1096 Lisboa Codex, Portugal. E-mail: franca@gcsi.ist.utl.pt

This paper proposes an FIR SC multirate video interpolation circuit with a 23-tap first stage followed by a 7-tap second stage of halfband filtering, which together increase the sampling rate from 13.5 MHz to 54 MHz. The circuit is designed in AMS 0.35 $\mu$m CMOS technology using a novel halfband polyphase architecture, efficient symmetrical-coefficient-sharing and spread-reduction, and Gain- and Offset-Compensation (GOC) achieved by Predictive Correlated-Double Sampling (P-CDS).

## 2. CIRCUIT DESIGN

### 2.1 Half-Band Polyphase Structure

Various approaches to the realization of SC FIR filters have been proposed [11-15]. However, they either require a large number of active elements, i.e. OTAs or buffers, or are not suitable for high-frequency implementation. More importantly, even though they can be operated at the higher output rate of 54 MHz for interpolation, doing so not only increases the speed requirement of the OTAs but also introduces an extra roll-off of 2.1 dB at 5 MHz in the passband. This roll-off is caused by the additional sample-and-hold (S/H) effect at the 13.5 MHz input sampling rate, which is inherent in analog interpolation, so extra effort in sin(x)/x correction is always mandatory. The impulse-sampled ADB polyphase interpolation [10] is therefore an optimum technique: it eliminates this input-rate S/H effect and allows the OTAs to operate at the lower input sampling rate, which is especially appropriate for high-speed video applications. To optimize power and silicon area with respect to the required OTAs, SC branches, capacitance spread and capacitor area, multistage interpolation with halfband filtering [10] is adopted. One main advantage of a halfband filter is that almost half of its impulse response coefficients are zero, which reduces not only the number of SC branches and the capacitance area but also the sensitivity problem, a critical limitation of long-tap analog transversal filters.
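The 2.1 dB figure quoted above can be checked with a short calculation. The sketch below assumes an ideal full-period sample-and-hold, whose magnitude response is sin(x)/x with x = πf/f<sub>hold</sub>; the function and variable names are illustrative only and do not come from the paper.

```python
import math


def sample_hold_droop_db(f_hz: float, hold_rate_hz: float) -> float:
    """Gain in dB of an ideal full-period sample-and-hold at frequency f_hz."""
    x = math.pi * f_hz / hold_rate_hz
    return 20.0 * math.log10(math.sin(x) / x)


PASSBAND_EDGE_HZ = 5e6

# Hold effect at the input rate (what a non-impulse-sampled interpolator sees)
droop_at_input_rate = sample_hold_droop_db(PASSBAND_EDGE_HZ, 13.5e6)
# Hold effect at the final output rate only
droop_at_output_rate = sample_hold_droop_db(PASSBAND_EDGE_HZ, 54e6)

print(f"hold at 13.5 MHz: {droop_at_input_rate:.2f} dB at 5 MHz")
print(f"hold at 54 MHz:   {droop_at_output_rate:.2f} dB at 5 MHz")
```

Under this idealized model the droop at 5 MHz is roughly 2.1 dB when the hold occurs at 13.5 MHz, and only about 0.1 dB when it occurs at 54 MHz, which is consistent with the motivation for impulse-sampled interpolation.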

The overall interpolation circuit contains two stages: the first is a 23-tap halfband interpolator that increases the sampling rate from 13.5 MHz to 27 MHz, and the second is a 7-tap halfband filter with a final output sampling rate of 54 MHz. For clarity, simplified circuit diagrams based on the halfband impulse-sampled ADB polyphase structure are shown in Fig.1(a) and (b) for the $1^{st}$ and $2^{nd}$ stages, respectively, although the circuits are implemented in fully-differential form.
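The behaviour of a halfband polyphase interpolator can be illustrated numerically. The following is a minimal sketch, not the circuit of Fig.1: the 23 taps are a Hamming-windowed sinc chosen only for illustration (the paper's actual coefficients are not given here), and the helper names are hypothetical. It shows that almost half of the taps vanish, that one polyphase branch reduces to a single tap, and that filtering at the input rate gives the same samples as zero-stuffing followed by filtering at the output rate.

```python
import numpy as np


def halfband_taps(num_taps: int = 23) -> np.ndarray:
    """Hamming-windowed halfband lowpass (illustrative taps, not the paper's)."""
    if num_taps % 4 != 3:
        raise ValueError("use num_taps = 4k + 3 so that the end taps are non-zero")
    offset = np.arange(num_taps) - (num_taps - 1) // 2
    h = 0.5 * np.sinc(offset / 2.0) * np.hamming(num_taps)
    h[(offset % 2 == 0) & (offset != 0)] = 0.0  # force the exact halfband zeros
    return h


def interpolate_direct(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Zero-stuff by 2, then filter at the output rate."""
    stuffed = np.zeros(2 * len(x))
    stuffed[::2] = x
    return np.convolve(stuffed, h)[: 2 * len(x)]


def interpolate_polyphase(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Filter at the input rate with two sub-filters, then interleave."""
    y = np.empty(2 * len(x))
    y[0::2] = np.convolve(x, h[0::2])[: len(x)]  # polyphase filter 0: even taps
    y[1::2] = np.convolve(x, h[1::2])[: len(x)]  # polyphase filter 1: odd taps
    return y


h = halfband_taps(23)
x = np.random.default_rng(0).standard_normal(64)
y_direct = interpolate_direct(x, h)
y_poly = interpolate_polyphase(x, h)

print("non-zero taps:", np.count_nonzero(h), "of", len(h))
print("non-zero taps in polyphase filter 1:", np.count_nonzero(h[1::2]))
print("structures agree:", np.allclose(y_direct, y_poly))
```

With the indexing used here, polyphase filter 1 contains only the middle tap $h_{11}$, so it is a pure delay, while polyphase filter 0 carries the twelve even-indexed taps.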

### 2.2 Coefficient-Sharing

One advantage of linear-phase FIR filtering in implementation is that symmetrical coefficients can be shared. Unlike in digital signal processing, this has not previously been achievable in pure SC circuits (except in semi-analog SC FIR structures [15], which are not suitable for high-speed filtering) because analog summing cannot be performed by a single SC branch. As both positive and negative outputs are available in fully-differential structures, we propose a simple solution for sharing coefficients: the negative-version output of one delay stage is subtracted from the positive-version output of another delay stage that has the same tap weight. Such subtraction can be implemented by the SC branch in either Fig.2(a) or (b). The latter is preferred because it is insensitive to parasitics, whereas the former is not, and because it can also eliminate signal-dependent charge injection and clock feedthrough errors by using bottom-plate sampling, which is not applicable to the former. Note that the latter subtracts the two inputs in two consecutive phases, and its charge transfer can be described as

$$\Delta Q(z) = C \Big( V_{in1}(z) z^{-1/2} + V_{in2}(z) \Big)$$
(1)

where the extra $z^{-1/2}$ delay can be embedded in the delay line. Such sharing leads not only to a substantial reduction in total capacitor area, which normally dominates the total chip area, but also, importantly, to a large improvement in capacitance ratio matching and sensitivity. Moreover, it reduces the total capacitive loading and increases the feedback factor of the output accumulator OTA, which in turn improves the achievable speed of the summing circuit or decreases the required power consumption.

As shown in Fig.1(a), $h_4 \& h_{18}$, $h_6 \& h_{16}$, $h_8 \& h_{14}$, and $h_{10} \& h_{12}$ have been shared. After a trade-off between the number of delay blocks and sensitivity, taking into account the coefficient-sharing, spread-reduction and gain- and offset-compensation (discussed later), $h_0 \& h_{22}$ and $h_2 \& h_{20}$ have not been shared, and some are even implemented by parallel SC branches. This is because they have the smallest values and, in particular, the lowest sensitivity of all the coefficients, and thus have no big impact on silicon area and sensitivity. Furthermore, the sole middle coefficient $h_{11}$ in polyphase filter 1, which is the most sensitive one and especially affects the stopband of a halfband filter, has been set to unity and is obtained directly from the mismatch-free delay line to eliminate capacitance ratio deviation.
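The arithmetic behind coefficient-sharing can be sketched in software. The example below is a behavioural model only: a plain addition stands in for the differential subtraction performed by the SC branch, the 11-tap symmetric filter is a hypothetical example rather than the paper's 23-tap design, and all names are illustrative. Each pair of equal taps is applied once to the sum of its two delayed inputs, and the result matches ordinary FIR filtering.

```python
import numpy as np


def fir_direct(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Reference transversal filter: one weight per tap."""
    return np.convolve(x, h)[: len(x)]


def fir_shared(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Symmetric FIR in which each pair h[k] == h[N-1-k] uses a single weight."""
    n_taps = len(h)
    assert np.allclose(h, h[::-1]), "sharing requires symmetric coefficients"
    padded = np.concatenate([np.zeros(n_taps - 1), x])

    def delayed(k: int) -> np.ndarray:
        """Return x[n - k] for n = 0 .. len(x) - 1 (zero initial conditions)."""
        start = n_taps - 1 - k
        return padded[start : start + len(x)]

    y = np.zeros(len(x))
    for k in range(n_taps // 2):
        if h[k] != 0.0:  # zero-valued halfband taps need no branch at all
            y += h[k] * (delayed(k) + delayed(n_taps - 1 - k))
    if n_taps % 2 == 1:
        y += h[n_taps // 2] * delayed(n_taps // 2)  # unshared middle tap
    return y


# Hypothetical symmetric halfband-like taps, for illustration only
h_example = np.array([1, 0, -3, 0, 10, 16, 10, 0, -3, 0, 1]) / 32.0
x_example = np.random.default_rng(1).standard_normal(50)

y_ref = fir_direct(x_example, h_example)
y_shared = fir_shared(x_example, h_example)
print("shared structure matches direct form:", np.allclose(y_ref, y_shared))
```

In this example the six non-zero outer taps are realized with three weights, which mirrors the reduction in SC branches and capacitor area that sharing is intended to provide.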

### 2.3 Spread-Reduction

Another critical problem of analog SC FIR filters is the capacitance spread, which is normally very large, especially for narrowband filtering. In this design, the spread has been reduced from the original 1100 for a single-stage realization to only 114 for a 2-stage realization, and then to 57 by using the impulse-sampled interpolation technique (division by the interpolation factor L). However, with a unit capacitance of 200 fF the largest capacitor would still be about 11.4 pF, which is impractical at such a high frequency. The T-network scheme [16] can reduce the spread, but at the expense of requiring higher DC amplifier gain and suffering from parasitics, which cannot be neglected, especially here where low-gain OTAs are used. Therefore, an efficient technique for further spread-reduction is proposed here, as follows. All coefficients are divided into Group A: $h_0(h_{22}) - h_6(h_{16})$ and Group B: $h_8(h_{14}) - h_{10}(h_{12})$, which are normalized separately with their own summing capacitors $C_{sumA}$ and $C_{sumB}$, and whose charge transfers take place in the successive phases A and B, respectively. The output summed in phase A is transferred to $C_{sumB}$ by an extra capacitor $C_{uB}$ in phase B, together with the charges from the Group B capacitors. Choosing an integer ratio between $C_{uB}$ and $C_{sumB}$ for good matching keeps the accuracy of the overall capacitance ratios equivalent to that of the original. In this realization the spread can be reduced to only 8, although 10.3 is finally adopted to increase the feedback factor, which relaxes the OTA speed, with dynamic range scaling for the two phase outputs. About 72% reduction in total capacitance area is thus achieved by the spread-reduction scheme, and a further saving of more than 30% is obtained by the coefficient-sharing technique.
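The effect of the two-group normalization can be illustrated with a small calculation. This is a sketch under stated assumptions: the tap magnitudes and the transfer gain $C_{uB}/C_{sumB} = 1/16$ are made-up values, not the paper's, spread is taken as the largest-to-smallest capacitor ratio within each summing group, and only the 200 fF unit capacitance and the spread of 57 come from the text.

```python
import numpy as np

UNIT_CAP_F = 200e-15  # unit capacitance quoted in the text


def largest_cap_f(spread: float, unit_cap_f: float = UNIT_CAP_F) -> float:
    """Largest capacitor when the smallest one equals the unit capacitance."""
    return spread * unit_cap_f


def cap_spread(caps) -> float:
    """Ratio of the largest to the smallest capacitor in one summing group."""
    caps = np.asarray(caps, dtype=float)
    return float(caps.max() / caps.min())


# Hypothetical tap magnitudes (NOT the paper's coefficients)
group_a = np.array([0.002, 0.006, 0.015, 0.035])  # small outer taps
group_b = np.array([0.08, 0.32])                  # large inner taps

# One summing capacitor normalized to 1: every tap is C_k / C_sum
spread_single = cap_spread(np.concatenate([group_a, group_b, [1.0]]))

# Two groups: Group A is summed on C_sumA = 1, then passed to C_sumB = 1
# through C_uB, so its taps are scaled up by 1 / (C_uB / C_sumB)
transfer_gain = 1.0 / 16.0  # assumed integer ratio C_sumB / C_uB = 16
caps_a = group_a / transfer_gain
spread_a = cap_spread(np.concatenate([caps_a, [1.0]]))
spread_b = cap_spread(np.concatenate([group_b, [transfer_gain, 1.0]]))
spread_two_group = max(spread_a, spread_b)

# The overall tap weights seen at the output are unchanged
effective_a = caps_a * transfer_gain

print(f"largest capacitor at spread 57: {largest_cap_f(57) * 1e12:.1f} pF")
print(f"single-group spread: {spread_single:.2f}")
print(f"two-group spread:    {spread_two_group:.2f}")
```

For these illustrative values the spread falls from 500 to about 31 while the effective Group A weights are preserved; the actual figures in the design (57 reduced to 8, with 10.3 adopted) depend on the real coefficients and capacitor choices.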

### 2.4 Mismatch-, Gain- & Offset-Compensation

The upper circuit in Fig.1(a) or (b) forms a serial analog delay line, which is another limitation of analog transversal filters, because errors such as OTA finite gain and offset, capacitance mismatch and noise accumulate as the analog signal propagates through the delay line. Hence, in this design, the number of delay stages in the 1st stage has been reduced from the original 11 to 5 by taking advantage of the halfband architecture [10] and employing parallel SC propagation techniques, at the price of more clock phases, which can be simply generated by digital logic. In addition, predictive correlated-double-sampling with mismatch-free techniques [17-18], which achieves an accurate wideband compensation of finite-gain and offset errors as well as flicker noise, is employed in the delay line. This not only reduces the fixed-pattern noise that arises from OTA DC offset and mismatch in the parallel paths of the polyphase structure, but also relaxes the difficulty of designing OTAs that are both high-gain and high-speed, thus allowing the use of a very simple single-stage low-gain architecture that fully exploits its high-frequency capability. For simplicity, only the novel half-period GOC SC mismatch-free delay circuit (single-ended version) is shown in Fig.1(c), where sampling and prediction both take place in phase A and the compensated output is generated in phase B. The delay circuits with $z^{-3/2}$, $z^{-2}$ and $z^{-5/2}$ can be easily obtained from [18].

The output accumulation of polyphase filter 0 is implemented with the Same-Sample Correction (SSC) property for gain- and offset-compensation [18]. Thus, in the 1<sup>st</sup> stage, only the accumulator OTA operates at the output sampling rate, which does not add much cost in silicon area and power dissipation, and all other 6 OTAs operate at the lower input sampling rate (full output period settling). However, since the 2<sup>nd</sup> stage needs to implement only 4 non-zero coefficients with a maximum spread of 11.4, the high speed of the OTA (at 54 MHz) is more critical than area. Hence, a predictive gain- and offset-sharing technique is not adopted, so that the accumulator OTA is allowed to settle over a full output sampling period (1/54 MHz).

## 3. CIRCUIT IMPLEMENTATION

Taking advantage of the low gain requirement, the OTA is simply a differential amplifier with a telescopic-cascode input stage but a non-cascode PMOS active load. This architecture consumes less static power and has inherently lower noise because of the minimum number of current legs (2 legs) and of noise-contributing devices (4 devices), respectively. More importantly, it is very fast: firstly, there are only NMOS devices in the signal path; secondly, the input cascode reduces the Miller effect of the input transistors; and thirdly, the length of the transistors, especially those of the input differential pair, can be reduced thanks to the gain-compensation and flicker-noise suppression provided by CDS. A satisfactory gain (100-500), which depends simply on the transconductance of the differential pair and the output resistance of the PMOS active load, is easily achieved thanks to the cascode input. Furthermore, the output swing (maximum of $2(V_{DD} - 4V_{dsat})$) is enlarged by the non-cascode load when compared with a typical telescopic OTA. Miller-effect cancellation transistors [19] and internal biasing for the cascode transistors [20] are also employed, and the output common-mode voltage is controlled by a dynamic SC common-mode feedback circuit.

In P-CDS circuits, the speed required of the OTA differs between the predictive and the compensated-output phases, and the latter always dominates because of the small feedback factor when the compensated virtual ground is generated by the error-correction capacitor Ce. To obtain a reasonable feedback factor in both phases together with good gain-compensation accuracy, a compromise ratio between the input parasitic capacitance of the OTA and Ce is especially important; a ratio of around 1 to 0.6 has been adopted in the circuit, depending on the situation. In worst-case HSPICE simulations, the faster OTA, required in the 2<sup>nd</sup> stage and in the output accumulator of the first stage, achieves a 430 MHz open-loop unity-gain frequency, 51 dB gain and 70° phase margin for a capacitive load of 6.5 pF, and a 17 ns settling time (0.1%) for a 1 Vpp output step with an equivalent 6.5 pF load and a worst-case feedback factor. Its power consumption is about 12.5 mW from a 3 V supply. The slower OTA used in the delay line of the 1st stage exhibits a 176 MHz unity-gain frequency, 51 dB gain and 75° phase margin for a 6.5 pF load. Its simulated closed-loop 0.1% settling time is about 25 ns with a power of 6 mW.
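The simulated settling times can be compared with the time available in each clock window. The sketch below is a simple budget check, not a circuit simulation. It assumes that the faster OTA has one full 54 MHz period to settle, as stated for the 2<sup>nd</sup>-stage accumulator, and that the slower delay-line OTA has a window of one 27 MHz period (half of the 13.5 MHz input period); the second window is an interpretation of the text rather than a figure given in the paper.

```python
def period_ns(rate_hz: float) -> float:
    """Clock period in nanoseconds for a given sampling rate."""
    return 1e9 / rate_hz


FAST_OTA_SETTLING_NS = 17.0  # 0.1% settling time, from the HSPICE results
SLOW_OTA_SETTLING_NS = 25.0  # 0.1% settling time, from the HSPICE results

fast_window_ns = period_ns(54e6)  # full output period of the 2nd stage
slow_window_ns = period_ns(27e6)  # assumed window for the delay-line OTAs

fast_margin_ns = fast_window_ns - FAST_OTA_SETTLING_NS
slow_margin_ns = slow_window_ns - SLOW_OTA_SETTLING_NS

print(f"fast OTA: window {fast_window_ns:.2f} ns, margin {fast_margin_ns:.2f} ns")
print(f"slow OTA: window {slow_window_ns:.2f} ns, margin {slow_margin_ns:.2f} ns")
```

Under these assumptions the faster OTA fits within the 18.5 ns output period with about 1.5 ns to spare, which illustrates why a full-period settling window matters at 54 MHz.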

To reduce signal-dependent charge injection and clock feedthrough, bottom-plate sampling with a fully-differential technique is used, in which all the switches near the virtual ground node of the OTA are turned off first (for simplicity, not shown in the circuit diagrams). To reduce circuit complexity and simplify the required clock phases, only NMOS switches are used in this circuit. This is possible because the common-mode voltage and the required differential output are 1.1 V and 1.6 Vp-p, respectively, so the on-resistance nonlinearity of CMOS and NMOS switches is similar within the resulting signal range (0.7-1.5 V) for a 3 V supply. In addition, dummy switches are used for the switches of the GOC error-correction capacitors and the output sampling capacitors. Switch widths range from 3 $\mu$m to 30 $\mu$m to accommodate the different loading conditions.

The circuit is designed in AMS 0.35 $\mu$m double-poly CMOS and is expected to occupy an active chip area smaller than 2 mm<sup>2</sup> and to consume a total DC power of around 90 mW (not including clock generation).

## 4. SIMULATION RESULTS

The amplitude response of the SC interpolator, illustrated in Fig.3, was simulated together with non-ideal effects such as finite DC gain (300), finite bandwidth (one-pole OTA model including input and output parasitics), top-plate (10%) and bottom-plate (30%) parasitics in all capacitors, and the corresponding switch on-resistance. The unwanted image bands located at 13.5 MHz and 40.5 MHz, and at 27 MHz, are attenuated by the $1^{st}$ and $2^{nd}$ stage, respectively. Monte-Carlo sensitivity simulation verifies that the circuit achieves the desired stopband loss (-40 dB) with a 0.3% standard deviation of the Gaussian capacitor-ratio random variables. The passband deviation of less than 0.3 dB eliminates the need for extra sin(x)/x compensation at the higher output sampling rate. Moreover, the FFT of the output signals shows that the noise tones due to OTA DC offset (20 mV) are below -50 dB.
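A Monte-Carlo sensitivity analysis of this kind can be sketched as follows. This is a minimal behavioural model, not a reproduction of the reported simulation: the 23 taps are an illustrative Hamming-windowed halfband design rather than the paper's coefficients, each tap receives an independent Gaussian relative error with 0.3% standard deviation, the middle tap is left unperturbed on the assumption that it comes from the mismatch-free delay line, and the stopband is taken as 8.5 MHz to 13.5 MHz at the 27 MHz output rate of the first stage. All names and the trial count are hypothetical.

```python
import numpy as np

FS_OUT_HZ = 27e6     # output rate of the first stage
SIGMA_RATIO = 0.003  # 0.3% standard deviation of capacitor ratios
N_TRIALS = 200       # hypothetical number of Monte-Carlo runs


def halfband_taps(num_taps: int = 23) -> np.ndarray:
    """Hamming-windowed halfband lowpass (illustrative taps, not the paper's)."""
    offset = np.arange(num_taps) - (num_taps - 1) // 2
    h = 0.5 * np.sinc(offset / 2.0) * np.hamming(num_taps)
    h[(offset % 2 == 0) & (offset != 0)] = 0.0
    return h


def response(taps: np.ndarray, freqs_hz: np.ndarray, fs_hz: float) -> np.ndarray:
    """Complex frequency response; taps may be one filter or a stack of filters."""
    k = np.arange(taps.shape[-1])
    basis = np.exp(-2j * np.pi * np.outer(k, freqs_hz) / fs_hz)
    return taps @ basis


rng = np.random.default_rng(0)
h_nominal = halfband_taps()
mid = len(h_nominal) // 2

# Independent relative errors per tap; the unity middle tap has no ratio error
ratio_error = rng.normal(0.0, SIGMA_RATIO, size=(N_TRIALS, len(h_nominal)))
ratio_error[:, mid] = 0.0
h_trials = h_nominal * (1.0 + ratio_error)

stopband_hz = np.linspace(8.5e6, 13.5e6, 101)
dc_gain = np.abs(response(h_nominal, np.array([0.0]), FS_OUT_HZ))[0]
nominal_stop = response(h_nominal, stopband_hz, FS_OUT_HZ)
trial_stop = response(h_trials, stopband_hz, FS_OUT_HZ)

# Worst stopband level of each trial, relative to the nominal DC gain
worst_db = 20.0 * np.log10(np.abs(trial_stop).max(axis=1) / dc_gain)

print(f"nominal worst stopband level: "
      f"{20.0 * np.log10(np.abs(nominal_stop).max() / dc_gain):.1f} dB")
print(f"Monte-Carlo worst case over {N_TRIALS} trials: {worst_db.max():.1f} dB")
```

The printed levels depend on the illustrative taps and should not be compared directly with the -40 dB result reported above; the sketch only shows how the spread of stopband attenuation under random ratio errors can be evaluated.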

## 5. CONCLUSIONS

An optimized design and implementation of a 2-stage 4-fold SC interpolation filter, which increases the sampling rate from 13.5 MHz to 54 MHz for NTSC/PAL digital video encoder post-processing, has been presented. It achieves a halfband FIR linear-phase response in accordance with digital video standards. Various techniques have been employed to overcome several difficulties of analog FIR filtering and to obtain an efficient solution in terms of power and silicon area, including the impulse-sampled ADB polyphase halfband interpolation structure, novel symmetrical-coefficient-sharing, spread-reduction, and wideband gain- and offset-compensation with a mismatch-free delay line. Simple low-gain, low-power but high-speed OTAs have also been implemented. The overall circuit will require less than 2 mm<sup>2</sup> of active chip area, with a total DC power consumption of around 90 mW from a 3 V supply, in 0.35 $\mu$m CMOS.

## 6. REFERENCES

- [1] J.Adélaide, et al, "Communication in a Single-Chip MPEG2 A/V/G Decoder for Digital Set-Top Box Application," *Proc. of ESSCIRC '96*, pp.348-351, Sep.1996.
- [2] T.Cummins, B.Murray, C.Prendergast, "A PAL/NTSC Digital Video Encoder on 0.6 μm CMOS with 66 dB typical SNR, 0.4% Differential Gain, and 0.2° Differential Phase," *IEEE J. of Solid-State Circuits*, Vol.32, No.7, pp.1091-1100, Jul.1997.
- [3] S.G.Smith, et al, "A Single-Chip CMOS 306×244-Pixel NTSC Video Camera and a Descendant Coprocessor Device," *IEEE J. of Solid-State Circuits*, Vol.33, pp.2104-11, Dec.1998.
- [4] M.Harrand, et al, "A Single-Chip CIF 30Hz H261, H263, and H263+Video Encoder/decoder with Embedded Display Controller," *ISSCC Digest of Technical Papers*, pp.16-19, Feb. 1999.
- [5] Analog Devices Inc. "ADV7175A Video Encoder Data Sheet,"1998.
- [6] I.Bezzam, C.Vinn, R.Rao, "A Fully-Integrated Continuous-Time Programmable CCIR 601 Video Filter," ISSCC Digest of Technical Papers, pp.296-297, Feb.1995.
- [7] Sang-Soo Lee, C.A.Laber, "A BiCMOS Continuous-Time Filter for Video Signal Processing Applications," *IEEE J. of Solid-State Circuits*, Vol.33, No.9, pp.1373-1381, Sep.1998.
- [8] P.Senn, M.S.Tawfik, "Concepts for The Restitution of Video Signals Using MOS Analog Circuits," in Proc. of ISCAS '88, pp. 1935-38, 1988.
- [9] J.E.Franca, A.Petraglia, S.K.Mitra, "Multirate Analog-Digital Systems for Signal Processing and Conversion," in *Proc. of The IEEE*, Vol.85, No.2, pp.242-262, Feb. 1997.
- [10] Seng-Pan U, R.P.Martins, J.E.Franca, "A Novel Half-Band SC Architecture for Effective Analog Impulse Sampled Interpolation," in *Proc. of IEEE ICECS* '98, Portugal, Sep. 1998.
- [11] Veong-Sheng Lee, K.W.Martin, "A Switched-Capacitor Realization of Multiple FIR Filters on a Single Chip," *IEEE J. of Solid-State Circuits*, Vol.23, No.2, pp.536-542, Apr.1988.
- [12] G.Fischer, "Analog FIR Filters by Switched-Capacitor Techniques," *IEEE Trans. Circuits Syst.*, Vol.CAS-37, pp.808-814, Jun. 1990.
- [13] H.Iwakura, "Realization of Tapped Delay Lines Using Switched-Capacitor LDI Ladders and Application to FIR Filter Design," *IEEE Trans. Circuits Syst.-II*, Vol.40, No.12, pp.794-797, Dec.1993.
- [14] B.C.Rothenberg, S.H.Lewis, P.J.Hurst, "A 20-Msample/s Switched-Capacitor Finite-Impulse-Response Filter Using a Transposed Structure," *IEEE J. of Solid-State Circuits*, Vol.30, No.12, pp.1350-1356, Dec.1995.
- [15] Qiuting Huang, "Mixed Analog/Digital, FIR/IIR Realization of a Linear-Phase Lowpass Filter," *IEEE J. of Solid-State Circuits*, Vol.31, No.9, pp.1347-1350, Sep.1996.

- [16] W.M.C.Sansen, P.M.V.Peteghem, "An Area-Efficient Approach to the design of Very-Large Time Constants in Switched-Capacitor Integrators," *IEEE J. of Solid-State Circuits*, Vol.SC-19, No.5, pp.772-780, Oct. 1984.
- [17] C.C.Enz, G.C.Temes, "Circuit Techniques for Reducing the Effects of Op-Amp Imperfections: Autozeroing, Correlated Double Sampling, and Chopper Stabilization," *Proc. of The IEEE*, Vol.84, No.11, pp.1584-1614, Nov.1996.
- [18] Seng-Pan U, R.P.Martins, J.E.Franca, "High Performance Multirate SC Circuits With Predictive Correlated Double Sampling Technique," in *Proc. of IEEE ISCAS* '99, pp.II-77-80, May.1999.
- [19] K.Matsui, et al, "CMOS Video Filters Using Switched Capacitor 14-MHz Circuits," *IEEE J. of Solid-State Circuits*, Vol.SC-20, No.6, pp.1096-1102, Dec.1985.
- [20] K.A.Nishimura, P.R.Gray, "A Monolithic Analog Video Comb Filter in 1.2-µm CMOS," *IEEE J. of Solid-State Circuits*, Vol.28, No.12, pp.1331-1339, Dec.1993.



Fig.1 Simplified 2-Stage Halfband SC Video Interpolator Schematics (a) 1<sup>st</sup>-Stage (b) 2<sup>nd</sup>-Stage (c) GOC Mismatch-Free Half-Period Delay Circuit (d) Overall Clock Phases

