# A Digital LDO With Co-SA Logics and TSPC Dynamic Latches for Fast Transient Response

Lei Zhao, Yan Lu, *Senior Member, IEEE*, and Rui P. Martins, *Fellow, IEEE*

Abstract—This letter presents a coarse-fine dual-loop digital low-dropout regulator (DLDO), with combined synchronous and asynchronous logics, designed and measured in a 28-nm bulk CMOS. We adopt a react-then-write two-step logic in the coarse loop for faster transient response. To further shorten the loop latency, we employ true single-phase clock dynamic latches in the coarse loop, and a self-biased continuous-time comparator for voltage droop detection. The proposed DLDO architecture achieves an FoM of 0.59 ps, with a load range of 5–25 mA under a 600-mV supply.

*Index Terms*—Continuous-time comparator, digital low-dropout regulator (DLDO), dynamic logic, transient response, true single-phase clock (TSPC) latch.

## I. INTRODUCTION

A LOW-DROPOUT regulator (LDO) provides optimal supply voltages for the subsystems in a system-on-a-chip. With the supply voltage scaling down to the near- or subthreshold region, it is challenging for an analog LDO to have sufficient loop gain, especially under process, voltage, and temperature variations. Recently, digital LDOs have been widely investigated for their capability to operate under a low supply voltage, using synchronous or asynchronous control methods for V<sub>OUT</sub> regulation [1]-[14]. The conventional synchronous digital LDO (DLDO) [1] simply consists of a comparator, a bidirectional shift register (SR), and a power switch array. The comparator senses the output voltage and determines the shift direction of the SR to turn on/off the power switches, thus regulating the output voltage. A higher clock frequency improves the transient speed but consumes more power, so coarse-fine tuning with adaptive clock frequency exhibits a good tradeoff between speed and power [2]-[4]. However, it is hard for the D-flip-flops (DFFs) and the clocked comparator to operate at a very high clock frequency with a low supply voltage, for example, 1 GHz at 0.6 V. Therefore, additional circuit techniques, such as the recursive binary search [5], time-based control [6], analog-assisted control [7], or N-type power switches [8], are used to further improve the transient response of synchronous DLDOs. On the other hand, in the conventional asynchronous DLDO [9]-[11], as shown in Fig. 1, a local pulse generated from the prestage comparator output triggers the next-stage operation. However, the pulse (comparator output) width or even the amplitude shrinks as the output voltage V<sub>OUT</sub> recovers. Then, the later

Manuscript received August 28, 2018; revised October 19, 2018 and November 14, 2018; accepted November 26, 2018. Date of publication December 5, 2018; date of current version December 18, 2018. This paper was approved by Associate Editor Mingoo Seok. This work was supported in part by the Macau Science and Technology Development Fund (FDCT) SKL fund, and in part by the Research Committee of University of Macau under Grant MYRG2017-00037-AMSV. (*Corresponding author: Yan Lu.*)

L. Zhao and Y. Lu are with the State Key Laboratory of Analog and Mixed-Signal VLSI and FST-ECE, University of Macau, Macau, China (e-mail: yanlu@umac.mo).

R. P. Martins is with the State Key Laboratory of Analog and Mixed-Signal VLSI and FST-ECE, University of Macau, Macau, China, on leave from the Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal.

Digital Object Identifier 10.1109/LSSC.2018.2885217



Fig. 1. Block diagram of a conventional asynchronous DLDO.



Fig. 2. Block diagram and operation principle of the proposed DLDO.

asynchronous stages can hardly be driven by the weak pulse when $V_{\text{OUT}}$ is close to the reference voltage $V_{\text{REF}}$, so the output voltage may not reach the desired value. Alternatively, event-driven architectures [12], [13] achieve a fast response with multibit asynchronous $V_{\text{OUT}}$ monitoring and a proportional-integral control algorithm. A hybrid synchronous–asynchronous architecture with load event-based feedforward triggering [14] further improves the transient response. In this design, we combine synchronous, asynchronous, and dynamic logics with a proposed react-then-write (RW) two-step logic for faster transient response, as described next.

## II. CIRCUIT IMPLEMENTATION

Fig. 2 shows the block diagram and the basic operation principle of the proposed dual-loop combined synchronous and asynchronous (Co-SA) DLDO with the RW two-step logic. Our target is to mitigate the voltage droop through asynchronous droop detection and dynamic asynchronous power switch activation. Similar to previous coarse-fine tuning control, both the coarse and the fine loops have a synchronous bidirectional SR to hold the control data. The coarse and fine control loops have 16 and 32 unary bits, respectively,

2573-9603 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications\_standards/publications/rights/index.html for more information.



Fig. 3. (a) Block diagram of the coarse loop SR unit cell. (b) Schematic of the TSPC latch.



Fig. 4. (a) Schematic of the select signal and internal clock generator for the RW logics; and the four timing diagrams, where (b) $V_{OUT}$ drops and recovers across $V_{REFL}$ at CLK = "1"; (c) $V_{OUT}$ drops and recovers across $V_{REFL}$ at CLK = "0"; (d) $V_{OUT}$ drops across $V_{REFL}$ at CLK = "1" and recovers above $V_{REFL}$ at CLK = "0"; and (e) $V_{OUT}$ drops across $V_{REFL}$ at CLK = "0" and recovers above $V_{REFL}$ at CLK = "1".



Fig. 5. Schematic of the continuous-time comparator.

with a power switch size ratio of 32:1. In addition, we add asynchronous operation to the coarse loop.
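To make the array sizing concrete, the following minimal Python sketch tallies the total switch strength selected by the two unary codes. It assumes that the 32:1 ratio means one coarse switch is 32 times the size of one fine switch, and it measures strength in units of one fine switch. The names `unary_code` and `array_strength` are hypothetical and introduced here only for illustration.

```python
COARSE_BITS = 16   # unary bits in the coarse loop (from the text)
FINE_BITS = 32     # unary bits in the fine loop (from the text)
SIZE_RATIO = 32    # assumption: one coarse switch = 32 fine switches


def unary_code(n_on: int, n_bits: int) -> list[int]:
    """Thermometer code with n_on leading ones, as held by a shift register."""
    if not 0 <= n_on <= n_bits:
        raise ValueError("n_on must lie between 0 and n_bits")
    return [1] * n_on + [0] * (n_bits - n_on)


def array_strength(coarse_on: int, fine_on: int) -> int:
    """Total turned-on switch strength, in units of one fine switch."""
    coarse = sum(unary_code(coarse_on, COARSE_BITS))
    fine = sum(unary_code(fine_on, FINE_BITS))
    return coarse * SIZE_RATIO + fine


if __name__ == "__main__":
    # All switches on, then one coarse step versus the full fine range.
    print(array_strength(COARSE_BITS, FINE_BITS))
    print(array_strength(1, 0), array_strength(0, FINE_BITS))
```

Under this reading of the ratio, the full fine range covers exactly one coarse step, which is what lets the fine loop interpolate between adjacent coarse codes.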

The operation principle is described as follows. In the steady-state and during the heavy-to-light load transients, only the synchronous logics operate with a slow (10-MHz) clock to maintain the loop



Fig. 6. Schematic of the clocked comparator.



Fig. 7. (a) Simulation result for coarse loop output bits during load transient, and the zoomed-in details of the outputs of the (b) TSPC latches and (c) standard DFFs.

stability. As shown in Fig. 2, there are two clocked comparators for the synchronous logics, and one continuous-time comparator for the asynchronous fast reactions. For the light-to-heavy load transients, first, the self-biased continuous-time comparator [15] detects the fast-varying load transients when $V_{\text{OUT}}$ drops across $V_{\text{REFL}}$. Then, the control signal EN<sub>T</sub> is passed to all of the true single-phase clock (TSPC) dynamic latches [16], which pass the ON signal to the next TSPC latch and turn on the power switches sequentially, in one direction only. In our asynchronous design, unlike the conventional pulse-triggered asynchronous DLDO [9]-[11], a level signal (EN<sub>T</sub>), instead of a prestage regenerated pulse, enables the shift turn-on operation of the power switches. Therefore, the asynchronous signal strength in our design does not degrade when $V_{\text{OUT}}$ approaches $V_{\text{REFL}}$. When $V_{\text{OUT}}$ recovers above $V_{\text{REFL}}$, the outputs of the TSPC latches are frozen and written into the standard DFFs in the coarse loop, and the standard DFFs take over the control of the power switches. When $V_{\text{OUT}}$ exceeds $V_{\text{REFH}}$, the standard DFFs in the coarse loop gradually turn off the power switches until $V_{\text{OUT}}$ goes back into the dead-zone. When $V_{\text{OUT}}$ is in the dead-zone, only the fine loop operates.
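The division of labor among the three comparators can be summarized by a small decision function. The Python sketch below is a simplification under stated assumptions: it ignores the write-back hand-off between the TSPC latches and the DFFs, it treats a voltage exactly on a threshold as inside the dead-zone, and the default thresholds are the $V_{\text{REFL}}$ = 540 mV and $V_{\text{REFH}}$ = 570 mV values used for the Fig. 7 simulation. The function name and the returned labels are hypothetical.

```python
def active_control(v_out_mv: int, v_refl_mv: int = 540, v_refh_mv: int = 570) -> str:
    """Return which part of the controller reacts for a given output voltage."""
    if v_out_mv < v_refl_mv:
        # Droop: the continuous-time comparator enables the TSPC latch chain.
        return "tspc_turn_on"
    if v_out_mv > v_refh_mv:
        # Overshoot: the clocked coarse DFFs turn switches off step by step.
        return "coarse_dff_turn_off"
    # Dead-zone between the two thresholds: only the fine loop regulates.
    return "fine_loop_only"


if __name__ == "__main__":
    for v in (500, 550, 590):
        print(v, active_control(v))
```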

Fig. 3(a) illustrates the unit cell of the coarse loop SR (SR<sub>Coarse</sub>). In each unit, the outputs of a standard DFF and a TSPC latch are connected through a selector, composed of two tri-state inverters, to control the same power switch. Only one output is valid at a time. Fig. 3(b) shows the schematic of the TSPC latch. When the EN signal is low, the input signal D passes to the output Q. When EN is high, the output holds its previous value on



Fig. 8. Chip micrograph.



Fig. 9. Measured transient response with $V_{\rm IN} = 600$ mV and $V_{\rm REF} = 550$ mV.



Fig. 10. Measured load regulation.

its parasitic capacitors. This dynamic data needs to be held for at most one (10-MHz) clock period, that is, 100 ns, before it is written into the DFFs. According to the simulation, the leakage does not become a problem until the clock frequency is lower than 1 MHz. Owing to the simple hardware and dynamic operation, the delay between adjacent TSPC latches is effectively shortened, with reduced power consumption.
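The latch behavior described above (transparent when EN is low, holding when EN is high) and the one-directional turn-on ripple can be captured in a short behavioral model. This Python sketch is not a circuit-level description: it assumes that the first latch sees a constant ON input, that each call to `step_chain` represents one latch delay, and that leakage is ignored. The names `LevelLatch` and `step_chain` are hypothetical.

```python
class LevelLatch:
    """Behavioral level-sensitive latch: transparent at EN = 0, holds at EN = 1."""

    def __init__(self, q: int = 0) -> None:
        self.q = q

    def update(self, d: int, en: int) -> int:
        if en == 0:
            self.q = d  # transparent: D passes to Q
        return self.q   # EN = 1: Q keeps its stored value


def step_chain(q: list[int], en: int) -> list[int]:
    """Advance the latch chain by one latch delay.

    While EN = 0, each latch turns on once its predecessor is on, so the
    ON state ripples forward by one position per step. While EN = 1 the
    chain is frozen.
    """
    if en == 1:
        return list(q)
    nxt = list(q)
    for i in range(len(q)):
        prev_on = 1 if i == 0 else q[i - 1]  # assumption: first input tied ON
        if prev_on:
            nxt[i] = 1
    return nxt


if __name__ == "__main__":
    state = [1, 1, 0, 0]          # initial data loaded from the coarse DFFs
    state = step_chain(state, 0)  # droop detected: one more switch turns on
    print(state)
    print(step_chain(state, 1))   # output recovered: chain holds its data
```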

Fig. 4(a) shows the select signal and internal clock generator used to coordinate the RW operation between the TSPC latches and the DFFs. The load transient may happen at any random time instant, and can be summarized into four cases, as shown in Fig. 4(b)–(e). Fig. 4(b) exhibits the case when $V_{OUT}$ drops and recovers across $V_{REFL}$ while CLK = "1". Fig. 4(c) displays the case when $V_{OUT}$ drops and recovers across $V_{REFL}$ while CLK = "0". Fig. 4(d) presents the case when $V_{OUT}$ drops across $V_{REFL}$ at CLK = "1" and recovers above $V_{REFL}$ at CLK = "0". Fig. 4(e) plots



Fig. 11. Measured current efficiency.

the case when $V_{OUT}$ drops across $V_{REFL}$ at CLK = "0" and recovers at CLK = "1". In all four cases, at the time instant $t_1$, $V_{OUT}$ drops across $V_{REFL}$ and EN<sub>T</sub> becomes "0" to enable all the TSPC latches. Meanwhile, the initial data of the standard DFFs in the coarse loop are written into the TSPC latches. Subsequently, the EN<sub>TRI</sub> signal selects the TSPC output to control the coarse power switches. At the instant $t_2$, $V_{OUT}$ recovers above $V_{REFL}$ and EN<sub>T</sub> steps from "0" to "1". This EN<sub>T</sub> rising edge triggers a DFF to determine whether CLK = "0" or "1" at $t_2$. If CLK = "1", the CLK signal is directly ANDed with EN<sub>T</sub> to obtain CLK<sub>1</sub>. If CLK = "0", CLK is inverted first and then ANDed with EN<sub>T</sub>. In this way, a rising edge is generated on CLK<sub>1</sub> as soon as $V_{OUT}$ recovers above $V_{REFL}$. This CLK<sub>1</sub> rising edge triggers the standard DFFs in the coarse loop to update their outputs to match the TSPC latch outputs. Between $t_2$ and $t_3$, the TSPC latches hold their output data. At $t_3$, the next rising edge of CLK<sub>1</sub>, which is synthesized from either the rising or the falling edge of CLK, sets EN<sub>TRI</sub> to "1". The DFFs are then in control of the power switches, and the coarse loop ends its reaction.
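The CLK<sub>1</sub> synthesis can be checked with a few lines of Python. The sketch below is a discrete-time behavioral model, not the gate-level circuit of Fig. 4(a): waveforms are lists of 0/1 samples, the droop is assumed to be already in progress at index 0 (EN<sub>T</sub> = 0) and to end at index `t2`, and the function names are hypothetical.

```python
def clk1(clk: int, en_t: int, clk_sampled: int) -> int:
    """CLK1 = EN_T AND (CLK or its inverse), chosen by the CLK value sampled at t2."""
    phase = clk if clk_sampled else 1 - clk
    return en_t & phase


def simulate(clk_wave: list[int], t2: int) -> list[int]:
    """CLK1 waveform for a droop that ends (EN_T rises) at sample index t2."""
    clk_sampled = clk_wave[t2]  # the DFF clocked by the EN_T rising edge
    out = []
    for t, clk in enumerate(clk_wave):
        en_t = 1 if t >= t2 else 0
        out.append(clk1(clk, en_t, clk_sampled))
    return out


def rising_edges(wave: list[int]) -> list[int]:
    """Indices at which the waveform goes from 0 to 1."""
    return [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]


if __name__ == "__main__":
    clk_high_at_t2 = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
    clk_low_at_t2 = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
    print(rising_edges(simulate(clk_high_at_t2, 2)))
    print(rising_edges(simulate(clk_low_at_t2, 2)))
```

In both example waveforms the first CLK<sub>1</sub> rising edge coincides with `t2` (the write into the DFFs) and the second one plays the role of $t_3$. It falls on a CLK rising edge in the first case and on a CLK falling edge in the second.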

In our design, the comparator array for the output voltage sensing includes a continuous-time comparator for the voltage droop detection, and two clocked comparators for steady-state operation and overshoot detection. Figs. 5 and 6 present the schematics of the employed continuous-time comparator [15] and clocked comparator, respectively. The continuous-time comparator consists of a self-biased first stage and a complementary self-biased second stage. It compares $V_{\text{OUT}}$ with $V_{\text{REFL}}$ and defines the value of EN<sub>T</sub>, which is directly propagated to the TSPC latches. The clocked comparators, based on the strong-arm latch topology, compare $V_{\text{OUT}}$ with either $V_{\text{REF}}$ or $V_{\text{REFH}}$, specifying the shift direction for the standard DFF-based SRs.

Fig. 7 shows the simulated waveforms of $V_{OUT}$ and the coarse loop output bits from either the TSPC latches or the standard DFFs, with $V_{\text{IN}} = 600 \text{ mV}$, $V_{\text{REF}} = 550 \text{ mV}$, $V_{\text{REFH}} = 570 \text{ mV}$, $V_{\text{REFL}} = 540 \text{ mV}$, and 5–25 mA load changes. Fig. 7(b) shows that the time interval for the TSPC latches to shift one bit is typically 180 ps, which is equivalent to the response speed of a synchronous DLDO with a 5.5-GHz clock. We also observe a voltage overshoot right after the undershoot recovery, which is caused by the increased loop gain at a high equivalent clock frequency, that is, by reduced stability [3]. This problem can be solved by setting another lower boundary $V_{\text{REFL},2}$ (for example, $V_{\text{REFL},2} = 520 \text{ mV}$) to allow the TSPC latches to be disabled earlier. In that case, one more continuous-time comparator is required. In addition, to avoid the TSPC latches being enabled more than once in one synchronous clock period, a one-shot logic can be applied [2]. Fig. 7(c) shows that the voltage overshoot recovers slowly with the coarse loop SR driven by a 10-MHz clock.
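As a quick check of the equivalence quoted above, the per-bit delay can be converted into a clock rate. This is only a back-of-the-envelope sketch: $t_{\text{bit}}$, $f_{\text{eq}}$, and $T_{\text{CLK}}$ are symbols introduced here for illustration, and the comparison with the 10-MHz clock assumes one shift per clock period in the synchronous case.

$$
f_{\text{eq}} = \frac{1}{t_{\text{bit}}} = \frac{1}{180\ \text{ps}} \approx 5.56\ \text{GHz}, \qquad \frac{T_{\text{CLK}}}{t_{\text{bit}}} = \frac{100\ \text{ns}}{180\ \text{ps}} \approx 556
$$

The first result is consistent with the roughly 5.5-GHz figure, and the second indicates that each asynchronous shift is about 556 times faster than a shift of the 10-MHz synchronous SR.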

TABLE I. Performance Comparison With the State-of-the-Art

|                                             | This work     | [5] 2018        | [8] 2018       | [6] 2018       | [14] 2018       | [10] 2017      | [12] 2017      | [3] 2016       |
|---------------------------------------------|---------------|-----------------|----------------|----------------|-----------------|----------------|----------------|----------------|
| Process                                     | 28 nm         | 65 nm           | 28 nm          | 65 nm          | 65 nm           | 65 nm          | 65 nm          | 130 nm         |
| Area [mm<sup>2</sup>]                       | 0.019         | 0.0023          | 0.0055         | 0.0374         | 0.012           | 0.158          | 0.029          | 0.355          |
| Control                                     | SR+TSPC       | SAR             | SR+NAP         | VCO+ADC        | Async.+Sync.    | Async.         | Event-driven   | SR             |
| V<sub>IN</sub> [V]                          | 0.6–0.65      | 0.5–1           | 0.4–0.55       | 0.6–1.2        | 0.5–1           | 0.6–1          | 0.5–1          | 0.5–1.2        |
| V<sub>OUT</sub> [V]                         | 0.55–0.6      | 0.3–0.45        | 0.35–0.5       | 0.4–1.1        | 0.35–0.95       | 0.55–0.95      | 0.45–0.95      | 0.45–1.14      |
| Max. I<sub>LOAD</sub> [mA]                  | 25            | 2               | 20             | 100            | 2.8             | 500            | 3.5            | 4.6            |
| C<sub>OUT</sub> [nF]                        | 0.15          | 0.4             | 0.024          | 0.04           | 0.1             | 1.5            | 0.4            | 1              |
| Min. I<sub>Q</sub> [µA]                     | 28            | 14              | 0.81           | 100            | 45.2            | 300            | 12.5           | 24             |
| T<sub>EDGE</sub> [ns]                       | 10            | < 1             | 3              | 800            | < 0.1           | 2              | N/A            | N/A            |
| ΔV<sub>OUT</sub> @ ΔI<sub>LOAD</sub>        | 56 mV @ 20 mA | 40 mV @ 1.06 mA | 117 mV @ 20 mA | 108 mV @ 50 mA | 46 mV @ 1.76 mA | 50 mV @ 100 mA | 40 mV @ 0.4 mA | 90 mV @ 1.4 mA |
| Peak current efficiency [%]                 | 99.96         | 99.8            | N/A            | 99.5           | 98.4            | N/A            | 96.3           | 98.3           |
| FOM* [ps]                                   | 0.59          | 199             | 0.0057         | 1.38           | 67.1            | 2.3            | 1250           | 1102           |
| FOM** [V]                                   | 0.0078        | N/A             | 0.00014        | 1.728          | N/A             | 0.003          | N/A            | N/A            |

\* $\text{FOM} = \frac{C_{OUT} \times \Delta V_{OUT}}{\Delta I_{LOAD}} \times \frac{I_Q}{\Delta I_{LOAD}}$; \*\* $\text{FOM} = K \times \frac{\Delta V_{OUT} \times I_Q}{\Delta I_{LOAD}}$ [17], where $K = \frac{T_{EDGE}}{T_{NORMAL}}$ and $T_{NORMAL} = 0.1$ ns.
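The two figures of merit for this work can be reproduced from the table entries with the short Python sketch below. It takes the capacitance in the first formula to be the output capacitor C<sub>OUT</sub> listed in Table I; the function names are hypothetical, and the inputs are the "This work" column (0.15 nF, 56 mV, 20 mA, 28 µA, 10 ns).

```python
def fom_speed_ps(c_out_f: float, dv_out_v: float, di_load_a: float, i_q_a: float) -> float:
    """FOM* = (C * dV_OUT / dI_LOAD) * (I_Q / dI_LOAD), returned in picoseconds."""
    return c_out_f * dv_out_v / di_load_a * (i_q_a / di_load_a) * 1e12


def fom_edge_v(dv_out_v: float, di_load_a: float, i_q_a: float,
               t_edge_ns: float, t_normal_ns: float = 0.1) -> float:
    """FOM** = K * dV_OUT * I_Q / dI_LOAD with K = T_EDGE / T_NORMAL, in volts."""
    k = t_edge_ns / t_normal_ns
    return k * dv_out_v * i_q_a / di_load_a


if __name__ == "__main__":
    # "This work" column of Table I, converted to SI units.
    c_out, dv, di, i_q, t_edge = 0.15e-9, 56e-3, 20e-3, 28e-6, 10.0
    print(round(fom_speed_ps(c_out, dv, di, i_q), 2))   # 0.59 (ps)
    print(round(fom_edge_v(dv, di, i_q, t_edge), 4))    # 0.0078 (V)
```

Because FOM\*\* scales with the load edge time through $K$, designs measured with slower edges are penalized, which is why the two metrics can rank the same designs differently.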

## III. MEASUREMENT RESULTS

Fig. 8 shows the chip micrograph of the proposed DLDO, fabricated in a 28-nm bulk CMOS process. The active area is 0.019 mm<sup>2</sup>, including the power switch arrays, control logics, and a 150-pF load capacitor. Fig. 9 plots the measured transient response with a 600-mV input, a 550-mV reference, and a 10-MHz external clock. When the off-chip load current changes between 5 mA and 25 mA with a 10-ns edge time, the measured voltage undershoot and overshoot are 56 mV and 45 mV, respectively.

Fig. 10 shows the measured load regulation for different input voltages. For $V_{\rm IN}$ higher than 600 mV, the output current can be as high as 25 mA. When $V_{\rm IN} = 575$ mV, the maximum current drops to 20 mA. In high-$V_{\rm IN}$ or light-load conditions, the limit cycle oscillation introduces large output ripples, which affect the measured output accuracy. The measured quiescent current in steady state is 28 $\mu$A with a 600-mV input. In addition, Fig. 11 shows that the measured current efficiencies are higher than 99% over a wide load current range. Table I compares the performance of the proposed Co-SA DLDO with that of prior DLDO designs.

## IV. CONCLUSION

When the supply voltage goes down to the near-/subthreshold region for energy-efficient computing, static logics may not be able to operate under an ultrahigh-frequency clock. Therefore, based on an RW two-step logic concept, we demonstrated a digital LDO that uses dynamic logics and a continuous-time comparator for a fast response to voltage droops, while the synchronous logics only deal with the steady-state and voltage overshoot cases.

## REFERENCES

- [1] Y. Okuma *et al.*, "0.5-V input digital LDO with 98.7% current efficiency and 2.7-μA quiescent current in 65nm CMOS," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2010, pp. 1–4.
- [2] M. Huang, Y. Lu, S.-W. Sin, U. Seng-Pan, and R. P. Martins, "A fully integrated digital LDO with coarse–fine-tuning and burst-mode operation," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 63, no. 7, pp. 683–687, Jul. 2016.
- [3] S. B. Nasir, S. Gangopadhyay, and A. Raychowdhury, "All-digital low-dropout regulator with adaptive control and reduced dynamic stability for digital load circuits," *IEEE Trans. Power Electron.*, vol. 31, no. 12, pp. 8293–8302, Dec. 2016.
- [4] Y.-J. Lee *et al.*, "A 200-mA digital low drop-out regulator with coarse-fine dual loop in mobile application processor," *IEEE J. Solid-State Circuits*, vol. 52, no. 1, pp. 64–76, Jan. 2017.
- [5] L. G. Salem, J. Warchall, and P. P. Mercier, "A successive approximation recursive digital low-dropout voltage regulator with PD compensation and sub-LSB duty control," *IEEE J. Solid-State Circuits*, vol. 53, no. 1, pp. 35–49, Jan. 2018.
- [6] S. Kundu, M. Liu, R. Wong, S.-J. Wen, and C. H. Kim, "A fully integrated 40pF output capacitor beat-frequency-quantizer-based digital LDO with built-in adaptive sampling and active voltage positioning," in *IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, Feb. 2018, pp. 308–309.
- [7] M. Huang, Y. Lu, U. Seng-Pan, and R. P. Martins, "An analog-assisted tri-loop digital low-dropout regulator," *IEEE J. Solid-State Circuits*, vol. 53, no. 1, pp. 20–34, Jan. 2018.
- [8] X. Ma *et al.*, "A 0.4V 430nA quiescent current NMOS digital LDO with NAND-based analog-assisted loop in 28nm CMOS," in *IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, Feb. 2018, pp. 306–307.
- [9] Y.-H. Lee *et al.*, "A low quiescent current asynchronous digital-LDO with PLL-modulated fast-DVS power management in 40 nm SoC for MIPS performance improvement," *IEEE J. Solid-State Circuits*, vol. 48, no. 4, pp. 1018–1030, Apr. 2013.
- [10] F. Yang and P. K. T. Mok, "A nanosecond-transient fine-grained digital LDO with multi-step switching scheme and asynchronous adaptive pipeline control," *IEEE J. Solid-State Circuits*, vol. 52, no. 9, pp. 2463–2474, Sep. 2017.
- [11] Y. Huang, Y. Lu, F. Maloberti, and R. P. Martins, "A dual-loop digital LDO regulator with asynchronous-flash binary coarse tuning," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2018, pp. 1–5.
- [12] D. Kim and M. Seok, "A fully integrated digital low-dropout regulator based on event-driven explicit time-coding architecture," *IEEE J. Solid-State Circuits*, vol. 52, no. 11, pp. 3071–3080, Nov. 2017.
- [13] D. Kim, J. Kim, H. Ham, and M. Seok, "20.6 A 0.5V-VIN 1.44mA-class event-driven digital LDO with a fully integrated 100pF output capacitor," in *IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers*, Feb. 2017, pp. 346–347.
- [14] S. J. Kim *et al.*, "A 67.1-ps FOM, 0.5-V-hybrid digital LDO with asynchronous feedforward control via slope detection and synchronous PI with state-based hysteresis clock switching," *IEEE Solid-State Circuits Lett.*, vol. 1, no. 5, pp. 130–133, May 2018.
- [15] S. J. Kim, D. Kim, and M. Seok, "Comparative study and optimization of synchronous and asynchronous comparators at near-threshold voltages," in *Proc. IEEE/ACM Int. Symp. Low Power Electron. Design*, Jul. 2017, pp. 1–6.
- [16] J. Yuan and C. Svensson, "High-speed CMOS circuit technique," *IEEE J. Solid-State Circuits*, vol. 24, no. 1, pp. 62–70, Feb. 1989.
- [17] J. Guo and K. N. Leung, "A 6-μW chip-area-efficient output-capacitorless LDO in 90-nm CMOS technology," *IEEE J. Solid-State Circuits*, vol. 45, no. 9, pp. 1896–1905, Sep. 2010.