Asynchronous reset synchronization and distribution – challenges and solutions

jatxl3 2021-01-16

Lack of coordination between asynchronous resets and synchronous logic clocks leads to intermittent failures on power up. In this series of articles, we discuss the requirements and challenges of asynchronous reset and explore advanced solutions for ASIC and FPGA designs.

Asynchronous resets are traditionally employed in VLSI designs to bring synchronous circuitry to a known state after power up. The release of an asynchronous reset must be coordinated with the clock of the synchronous logic to eliminate synchronization failures caused by possible contention between the reset and the clock. A lack of such coordination leads to intermittent failures on power up. The problem is exacerbated in large designs with multiple clock domains. In addition to the synchronization issues, distributing an asynchronous reset to millions of flip-flops is challenging: it calls for techniques similar to CTS (Clock Tree Synthesis) and requires similar area and routing resources.

The requirements and challenges of asynchronous reset are reviewed, focusing on synchronization and distribution issues. The drawbacks of classic solutions for reset synchronization (reset tree source synchronization) and distribution (reset tree synthesis) are discussed. Advanced solutions for faster and simpler timing convergence and more reliable reset synchronization and distribution are presented. Different approaches for ASIC versus FPGA designs are detailed.

Part 1 (this article) describes the issues surrounding asynchronous resets and outlines approaches for resolving those issues. Part 2 discusses additional solutions for correct asynchronous reset in ASIC and FPGA. Some useful special cases are discussed in Part 3.

1. Asynchronous reset challenges

A reset function is normally included in digital VLSI designs in order to bring the logic to a known state. Reset is mostly required for the control logic and may be eliminated from the data path logic, reducing logic area. Reset may be either synchronous or asynchronous relative to the clock signal.

Synchronous reset requires an active clock, incurs a latency of one or more clock cycles and may impact the timing of the data paths. On the other hand, synchronous resets are deterministic and do not cause metastability.

Asynchronous reset does not require an active clock to bring flip-flops to a known state, has a lower latency than a synchronous reset and can exploit special flip-flop input pins that do not affect data path timing. However, asynchronous resets have a number of drawbacks:

They may cause metastability in flip-flops, leading to non-deterministic behavior.
Asynchronous resets must be made directly accessible (controllable during test) to enable DFT.
Asynchronous resets may cause reliability problems in rad-hard applications, since they are susceptible to Single Event Transient (SET) phenomena [1].

Leaving aside the discussion of which type of reset is better [2], in this article we focus on issues and solutions related to asynchronous resets. Some of the techniques presented here are applicable to both asynchronous and synchronous resets.

Asynchronous resets are widely employed in digital designs. Typical drivers of asynchronous resets are external ports driven by power-supply status circuits (RC circuits, watchdog devices), manual reset buttons and external masters such as microprocessors.

In many cases asynchronous resets can be replaced by synchronous ones, but in some situations asynchronous reset functionality is compulsory. One example is a synchronous design that has no active clock at power up (the clock is either unstable or gated for power reduction) but requires a known state at its external interfaces. Another example is a low-power design that must minimize power during power up and therefore has no active clocks.

Using an asynchronous reset is not straightforward. Although the relative timing between clock and reset can be ignored during reset assertion, the reset release must be synchronized to the clock; leaving the release edge unsynchronized may lead to metastability. Figure 1 shows an active-high asynchronous reset. The reset assertion (a) affects the flip-flop output Q within a deterministically bounded time (the propagation delay T_R-pd), regardless of the clock signal CLK. During reset release (b), setup and hold timing conditions must be satisfied for the RST port relative to the clock port CLK. A violation of these conditions for the RST port (known as reset recovery and removal timing) may cause the flip-flop to become metastable and settle to an unknown state, causing a design failure. This situation is similar to a violation of the setup and hold conditions for the flip-flop data port D.
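The recovery and removal conditions can be made concrete with a small Python sketch. It classifies a single reset release edge against one clock edge at a flip-flop. The function name and the timing values are hypothetical and are given in arbitrary time units. Real recovery and removal limits come from the cell library or the FPGA timing model.

```python
def classify_release(t_release: float, t_clk_edge: float,
                     t_recovery: float, t_removal: float) -> str:
    """Classify a reset release edge relative to one active clock edge.

    Hypothetical helper. The release is safe if it occurs at least
    t_recovery before the clock edge (the flip-flop leaves reset on this
    edge) or at least t_removal after it (the flip-flop leaves reset on the
    next edge). Anything in between violates recovery/removal timing and
    may drive the flip-flop into metastability.
    """
    if t_release <= t_clk_edge - t_recovery:
        return "leaves reset on this edge"
    if t_release >= t_clk_edge + t_removal:
        return "leaves reset on next edge"
    return "recovery/removal violation"


# Example with hypothetical numbers (ns): clock edge at 10.0,
# recovery time 0.2, removal time 0.1.
for t in (9.5, 10.05, 10.5):
    print(t, classify_release(t, 10.0, 0.2, 0.1))
```

The middle case falls inside the window between recovery and removal. It corresponds to the timing violation of Figure 1(b).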

Figure 1: Active high asynchronous reset assertion and release. (a) An asynchronous reset assertion (b) An asynchronous reset release with timing violation. (Source: vSync Circuits)

In addition, in large designs the skew inside the reset and clock distribution networks can be significant. This skew comes from design effects (unequal wire length, unequal load, IR drop) and from process variations (buffers and wires). The relationship between reset and clock arrival times may therefore vary between flip-flops [2], and different parts of the design may leave the reset state on different clock cycles, violating the required functionality. An example is shown in Figure 2: the release edge of RESET arrives at flip-flops Q0 and Q1 on different clock cycles, so the two flip-flops are not released from their reset states concurrently.
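The effect of reset and clock skew can be illustrated with a small Python sketch. For each flip-flop, it computes the index of the first local clock edge that arrives after the reset release. Edge k at every flip-flop corresponds to the same source clock edge, delayed by that flip-flop's clock arrival time. The arrival times are hypothetical, and recovery/removal margins are ignored for simplicity.

```python
import math


def release_edge_index(reset_arrival: float, clock_arrival: float,
                       period: float) -> int:
    """Index of the first local clock edge that sees the reset released.

    Hypothetical helper. Edges at this flip-flop occur at
    clock_arrival + k * period for k = 0, 1, 2, ...; an edge coinciding
    exactly with the release is conservatively counted as too early.
    """
    k = math.floor((reset_arrival - clock_arrival) / period) + 1
    return max(k, 0)


PERIOD = 5.0  # ns, hypothetical
# (reset arrival, clock arrival) per flip-flop, hypothetical values in ns.
arrivals = {"Q0": (4.9, 0.5), "Q1": (5.2, 0.1)}
for ff, (rst, clk) in arrivals.items():
    print(ff, "leaves reset on clock edge", release_edge_index(rst, clk, PERIOD))
```

The reset arrivals differ by only 0.3 ns and the clock arrivals by 0.4 ns, yet Q0 and Q1 leave reset one cycle apart. This is the non-concurrent release shown in Figure 2.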

Figure 2: Reset and Clock skew in large designs (Source: vSync Circuits)

To avoid these problems, the release of an asynchronous reset must be synchronized to the target clock. Classic reset synchronization is performed by dedicated reset synchronizers placed at the root of the reset distribution network. Several such synchronizers are shown in Figure 3.

Figure 3: Asynchronous reset synchronizers: (a), (b) “trailing-edge” synchronizer; (c), (d) “vdd-based” synchronizer; (e) reset synchronizer operation; (f) reset release timing path (Source: vSync Circuits)

In the “trailing-edge” synchronizers shown in Figure 3a and Figure 3b, the incoming asynchronous reset RSTI is connected to the synchronizer output RSTO through combinational logic (OR and NAND gate examples are shown). RSTO is therefore asserted asynchronously as soon as RSTI is asserted, and the RSTO assertion does not depend on the clock. Note that in the synchronizer of Figure 3a both RSTI and RSTO are active high, whereas in the synchronizer of Figure 3b the input RSTI_N is active low and RSTO is active high. On the asynchronous release of RSTI, the output RSTO is kept asserted until the two-flop synchronizer (F0, F1) has synchronized the RSTI release. RSTO is then released synchronously, satisfying setup and hold conditions towards the flip-flops further down the reset distribution network.
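The behavior of the trailing-edge synchronizer of Figure 3a can be sketched as a cycle-level Python model. This is a simplified illustration, not a timing-accurate simulation. F0 metastability is not modeled, and None stands for an unknown (X) flip-flop state. Because F0 and F1 have no reset pins, the model also shows why a running clock is needed while RSTI is asserted.

```python
from typing import List, Optional


class TrailingEdgeSync:
    """Cycle-level sketch of the active-high trailing-edge synchronizer.

    F0 samples RSTI, F1 samples F0, and RSTO = RSTI OR F1 (the OR gate of
    Figure 3a). F0/F1 have no reset pins, so they power up unknown (None).
    """

    def __init__(self) -> None:
        self.f0: Optional[int] = None
        self.f1: Optional[int] = None

    def rsto(self, rsti: int) -> Optional[int]:
        # Combinational path: RSTI assertion reaches RSTO without a clock.
        if rsti == 1:
            return 1
        return self.f1  # None means RSTO is unknown (X)

    def clock_edge(self, rsti: int) -> None:
        # F0 and F1 form a shift register clocked on the same edge.
        self.f0, self.f1 = rsti, self.f0


def rsto_after_release(edges_while_asserted: int,
                       edges_after_release: int) -> List[Optional[int]]:
    """RSTO right after RSTI is released, then after each later clock edge."""
    sync = TrailingEdgeSync()
    for _ in range(edges_while_asserted):
        sync.clock_edge(1)
    trace = [sync.rsto(0)]
    for _ in range(edges_after_release):
        sync.clock_edge(0)
        trace.append(sync.rsto(0))
    return trace


print(rsto_after_release(3, 3))  # clock ran during reset
print(rsto_after_release(0, 3))  # no clock during reset
```

When the clock runs during reset, RSTO stays asserted and is released on the second clock edge after the RSTI release. Without clock edges during reset, RSTO is unknown immediately after the release. This illustrates why this synchronizer needs a stable clock before RSTI is released.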

The operation of the reset synchronizer is shown in the waveform diagram of Figure 3e. Although the release is synchronous, its latency can vary by one clock cycle due to possible metastability of flip-flop F0. The number of flip-flops in a synchronizer should be set according to an MTBF [4] computation. However, because RSTI toggles so rarely, two flip-flops provide a satisfactory MTBF in most cases.
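The MTBF argument can be quantified with the textbook synchronizer estimate MTBF = exp(S/τ) / (T_W · f_CLK · f_D). Here S is the time allowed for metastability resolution, τ and T_W are flip-flop parameters, f_CLK is the sampling clock rate and f_D is the rate of input transitions. The Python sketch below uses hypothetical parameter values, not data for any real process. It only shows how a very low f_D, such as one reset release per hour, inflates the MTBF compared with a busy data signal.

```python
import math


def synchronizer_mtbf(s: float, tau: float, t_w: float,
                      f_clk: float, f_d: float) -> float:
    """Textbook MTBF estimate (seconds) for a flip-flop synchronizer.

    s: resolution time available before the next stage samples (s)
    tau: metastability resolution time constant (s)
    t_w: metastability window (s)
    f_clk: clock frequency (Hz); f_d: input transition rate (Hz)
    """
    return math.exp(s / tau) / (t_w * f_clk * f_d)


# Hypothetical values: 500 MHz clock, ~0.9 ns left for resolution.
S, TAU, T_W, F_CLK = 0.9e-9, 20e-12, 20e-12, 500e6

mtbf_data = synchronizer_mtbf(S, TAU, T_W, F_CLK, f_d=100e6)        # busy data signal
mtbf_reset = synchronizer_mtbf(S, TAU, T_W, F_CLK, f_d=1 / 3600.0)  # one release per hour
print(f"data: {mtbf_data:.3e} s, reset: {mtbf_reset:.3e} s")
```

Because f_D appears only in the denominator, the reset MTBF exceeds the data MTBF by the ratio of the two transition rates.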

Figure 3c and Figure 3d show another common flavor of asynchronous reset synchronizer. This “vdd-based” synchronizer employs flip-flops with an asynchronous reset/set port, whereas the trailing-edge synchronizer uses simple D flip-flops without RST/SET ports. On RSTI assertion (Figure 3c), the synchronizer output RSTO_N (active low) is asserted asynchronously, regardless of clock activity. On RSTI release, the VDD signal (“1”) connected to the D port of flip-flop F0 is synchronized. F0 may become metastable. However, the input of F1 (the output of F0) is still a stable “0” when the first clock edge samples it, so F1 is not subject to metastability. The constant “1” input thus passes through a two-flip-flop synchronizer, resulting in a synchronous release of RSTO_N.
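A matching cycle-level Python sketch of the vdd-based synchronizer shows that both assertion and release work even if the clock starts only after the RSTI release. This is also a simplified model. RSTI is assumed to be active high, F0 metastability is not modeled, and None stands for an unknown power-up state.

```python
from typing import Optional


class VddBasedSync:
    """Cycle-level sketch of the vdd-based reset synchronizer.

    F0 and F1 are cleared asynchronously while RSTI is asserted (assumed
    active high). F0.D is tied to "1" (VDD), F1.D = F0.Q, RSTO_N = F1.Q.
    """

    def __init__(self) -> None:
        self.rsti = 0
        self.f0: Optional[int] = None  # unknown power-up state
        self.f1: Optional[int] = None

    def drive_rsti(self, value: int) -> None:
        self.rsti = value
        if value:
            # Asynchronous clear: no clock edge is needed.
            self.f0 = self.f1 = 0

    def clock_edge(self) -> None:
        if not self.rsti:
            # F0 samples the constant 1; F1 samples the old F0 value.
            self.f0, self.f1 = 1, self.f0

    @property
    def rsto_n(self) -> Optional[int]:
        return self.f1


sync = VddBasedSync()
sync.drive_rsti(1)           # reset asserted with no clock running
trace = [sync.rsto_n]
sync.drive_rsti(0)           # RSTI released, still no clock
trace.append(sync.rsto_n)
for _ in range(3):           # the clock starts only now
    sync.clock_edge()
    trace.append(sync.rsto_n)
print(trace)
```

On the first edge, F1 samples the stable 0 left by the asynchronous clear. On the second edge, RSTO_N is deasserted synchronously.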

The vdd-based synchronizer has an advantage over the trailing-edge one: it can operate with no clock at all during reset, so the clock may start only after the RSTI release. The trailing-edge synchronizer requires a stable clock (for at least a few cycles) before the RSTI release; otherwise its internal flip-flops are not initialized.

Figure 3f shows the timing path related to reset release between the synchronizer flip-flop F1 and a target application flip-flop F2. Since F1 and F2 reside in the same clock domain, the path T_R should be optimized according to standard STA rules. It should be shorter than the clock cycle and should satisfy the setup and hold (recovery and removal) conditions of all destination flip-flops, such as F2.

Denote the reset distribution network latency as T_R, the clock cycle as T_CLK and the reset setup (recovery) time of the destination flip-flop as T_SU. The design should then satisfy the following constraint. For simplicity, the F1 propagation delay is assumed to be included in T_R, and clock skew is neglected.

$$ T_{CLK} \geq T_R + T_{SU} \qquad (1) $$
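Rearranging (1) gives the highest clock frequency that a given reset network latency allows. The numbers in the worked example below are hypothetical and serve only to illustrate the arithmetic.

$$ f_{CLK} = \frac{1}{T_{CLK}} \leq \frac{1}{T_R + T_{SU}} $$

For example, with T_R = 1.2 ns and T_SU = 0.1 ns:

$$ f_{CLK} \leq \frac{1}{1.2\,\text{ns} + 0.1\,\text{ns}} = \frac{1}{1.3\,\text{ns}} \approx 769\,\text{MHz} $$

Conversely, at a 1 GHz clock (T_CLK = 1 ns), the same destination flip-flop would leave at most T_R = 0.9 ns for the entire reset distribution network.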

Evidently, timing convergence for reset distribution networks is challenging in the following cases:

Large reset distribution network. When the number of flip-flops inside a clock domain is large, the reset distribution network latency T_R becomes high. It may even exceed a single clock cycle, violating constraint (1).
Fast clock rate. When a fast clock is employed, the clock cycle T_CLK becomes short, challenging constraint (1).
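The growth of T_R with the number of flip-flops can be estimated with a deliberately crude Python model. The model assumes a balanced buffer tree in which every driver (the synchronizer flip-flop F1 or a buffer) drives at most a fixed number of loads. Each buffer level adds a fixed delay, and wire delay is ignored. The fanout limit and delays below are hypothetical. The sketch only shows the trend that constraint (1) must absorb.

```python
def reset_tree_latency(num_flops: int, max_fanout: int,
                       buffer_delay: float, clk_to_q: float) -> float:
    """Estimate T_R (ns) for a balanced buffer tree (crude, wire delay ignored)."""
    levels = 0
    reach = max_fanout  # loads that F1 can drive directly
    while reach < num_flops:
        reach *= max_fanout  # one more buffer level multiplies the fanout
        levels += 1
    return clk_to_q + levels * buffer_delay


def meets_constraint_1(t_clk: float, t_r: float, t_su: float) -> bool:
    """Constraint (1): T_CLK >= T_R + T_SU."""
    return t_clk >= t_r + t_su


# Hypothetical parameters: fanout 8, 0.15 ns per buffer level, 0.2 ns clk-to-Q.
for n in (1_000, 1_000_000):
    t_r = reset_tree_latency(n, 8, 0.15, 0.2)
    ok = meets_constraint_1(t_clk=1.0, t_r=t_r, t_su=0.05)  # 1 GHz clock
    print(f"{n} flops: T_R = {t_r:.2f} ns, meets (1) at 1 GHz: {ok}")
```

Under these assumptions, a thousand-flop domain meets (1) at 1 GHz. A million-flop domain needs three more buffer levels and fails, which illustrates both bullet points above.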

Modern high-performance designs, which have a large number of flip-flops and operate at high frequencies, require special solutions for handling reset distribution. A straightforward optimization according to (1) calls for Clock Tree Synthesis (CTS)-like algorithms. The main difference between CTS and reset tree synthesis is that a reset tree has no low-skew requirement, as long as constraint (1) is satisfied. Nevertheless, in an ASIC design this approach produces a high-fanout net consisting of many large buffers, and in an FPGA design it consumes multiple global net resources. Such large capacitive networks also draw a high switching current when the reset toggles, requiring additional power resources. The asynchronous reset is rarely used (most often once per power up). Dedicating high-fanout and global nets to it is therefore an unacceptable investment of power, ASIC area or FPGA routing resources, and EDA runtime.
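The area cost of such a reset tree can be estimated in the same spirit. The Python sketch below counts the buffers of a balanced fanout tree under hypothetical assumptions: a uniform fanout limit and ideal balancing. For N flip-flops and a fanout limit k, the count approaches N/(k − 1). A million-flop domain with k = 8 therefore needs roughly 143,000 buffers for a net that toggles about once per power up.

```python
def buffer_tree_size(num_flops: int, max_fanout: int) -> int:
    """Number of buffers in a balanced fanout tree driving num_flops loads.

    The tree is built bottom-up: each level needs ceil(loads / max_fanout)
    drivers, until the synchronizer flip-flop can drive the top level itself.
    """
    total = 0
    loads = num_flops
    while loads > max_fanout:
        drivers = -(-loads // max_fanout)  # ceiling division
        total += drivers
        loads = drivers
    return total


print(buffer_tree_size(1_000_000, 8))  # hypothetical fanout limit of 8
```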

The problem is exacerbated for large designs, where the reset synchronizer is clocked by a clock signal derived from the top of the clock tree, while the rest of the design is clocked by clock tree branches [2]. In this situation, a precise post-layout STA, taking into account clock tree delays, is required to match the delay from the reset synchronizer to the rest of the logic.
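When clock skew cannot be neglected, constraint (1) can be extended with the clock skew δ = t_CLK,F2 − t_CLK,F1. This is the difference between the clock arrival times at the destination flip-flop F2 and at the synchronizer flip-flop F1. A removal (hold) check with removal time T_REM is added as well. Both δ and T_REM are symbols introduced here for illustration. With δ = 0, the first inequality reduces to (1).

$$ T_{R,\max} + T_{SU} \leq T_{CLK} + \delta $$

$$ T_{R,\min} \geq T_{REM} + \delta $$

If the synchronizer is clocked from the top of the clock tree, δ approximates the clock insertion delay to the destination branches. The recovery check is relaxed by δ. However, the minimum reset path delay must then exceed that insertion delay plus T_REM, which is why accurate post-layout clock tree delays are needed.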

To meet timing on high-fanout nets, synthesis tools tend to duplicate the source flip-flop of the timing path, so that each duplicate drives a reduced fanout. While this approach is functionally correct for regular synchronous logic, it can be disastrous for an asynchronous reset network. Duplicating the last stage of a reset synchronizer breaks the fanout-of-one requirement of the two-flop synchronization scheme, reducing its reliability.
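A check for this hazard can be sketched on a netlist connectivity map. In the Python sketch below, the map, the cell names and the helper are all hypothetical; a real flow would query its synthesis or lint tool instead. The point is that once the tool duplicates F1, the first stage F0 drives two loads. The extra routing on F0's output eats into the resolution time, and the two copies may resolve a metastable F0 output differently.

```python
from typing import Dict, List

# Hypothetical connectivity map: cell name -> cells driven by its output.
Netlist = Dict[str, List[str]]


def fanout_of_one_violations(fanout: Netlist,
                             first_stages: List[str]) -> List[str]:
    """Return synchronizer first-stage flops that do not drive exactly one load."""
    return [f0 for f0 in first_stages if len(fanout.get(f0, [])) != 1]


before = {"F0": ["F1"], "F1": ["BUF_A", "BUF_B"]}
# After the tool duplicates the last stage F1 to halve its fanout:
after = {"F0": ["F1", "F1_dup"], "F1": ["BUF_A"], "F1_dup": ["BUF_B"]}

print(fanout_of_one_violations(before, ["F0"]))
print(fanout_of_one_violations(after, ["F0"]))
```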

Duplicating the entire reset synchronizer, which happens when the fanout-of-one requirement is specified to the synthesis tool, may lead to a synchronization failure due to the reconvergent path problem. A duplicated reset synchronizer is shown in Figure 4a. The global reset net is divided into two sub-networks, and timing convergence is performed separately for the {F1, F2} and {F1d, F3} paths. Figure 4b illustrates the reconvergence problem. The asynchronous input RSTI is synchronized by two different synchronizers, each of which incurs a random latency. Thus, even though RSTI changes simultaneously at both synchronizer inputs, the outputs RSTO and RSTOd may toggle one clock cycle apart, leading to a non-concurrent reset release for flip-flops F2 and F3.
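The reconvergence problem can be reproduced with a small Monte Carlo sketch in Python. For illustration, it assumes a worst case in which RSTI is released close to a clock edge. Each synchronizer's F0 then independently captures the release on the first or second edge with equal probability. The release latency of each synchronizer is therefore 2 or 3 clock cycles.

```python
import random
from typing import List


def sync_latency(rng: random.Random) -> int:
    """Clock edges until a two-flop synchronizer output follows its input.

    Assumption: F0 resolves to the new value on the first edge with
    probability 0.5 (worst case); F1 then follows one edge later.
    """
    return 2 if rng.random() < 0.5 else 3


def release_offsets(trials: int, seed: int = 1) -> List[int]:
    """Cycle offset between RSTO and RSTOd for the duplicated synchronizers."""
    rng = random.Random(seed)
    return [abs(sync_latency(rng) - sync_latency(rng)) for _ in range(trials)]


offsets = release_offsets(10_000)
print("fraction of non-concurrent releases:", sum(offsets) / len(offsets))
```

Under this worst-case assumption, about half of the power-ups release F2 and F3 one cycle apart. A single, unduplicated synchronizer would never produce such an offset.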

Figure 4: Asynchronous reset reconvergence problem for a straightforward duplication of reset synchronizers: (a) Duplicated reset synchronizers for fanout reduction; (b) Behavior on the reset release due to reconvergence path – FF2 and FF3 are not concurrently released (Source: vSync Circuits)

The reset release timing problem for large distribution networks is common to both asynchronous and synchronous reset schemes, and can be handled similarly, as shown in this paper.

In a multiple clock domain design, the asynchronous reset should be synchronized separately for each clock domain, as shown in Figure 5. Different clock domains contain different numbers of flip-flops, so their reset distribution network latencies are unequal, incurring even higher reset skew than in a single clock domain. Moreover, each reset synchronizer adds its own non-deterministic delay (related to its local clock). This makes a concurrent reset release of the entire Multiple Clock Domain (MCD) design impractical. Instead, a reset release order can be defined to ensure correct functional operation. For instance, module M1 should always be released from reset after module M2. After its reset release, M1 starts sending data to M2, and M2 must be ready to receive the data.
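A reset release order such as the M1/M2 example can be derived mechanically from the module dependencies. The Python sketch below uses the standard-library graphlib.TopologicalSorter (Python 3.9+). The dependency map is a hypothetical encoding of the example. The sketch only computes an order; it does not describe how the sequencing is implemented in hardware.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical map: module -> modules that must leave reset before it.
# M1 sends data to M2 after reset, so M2 must be released first.
release_after: Dict[str, Set[str]] = {"M1": {"M2"}, "M2": set()}


def reset_release_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return a release order honoring all dependencies.

    Raises graphlib.CycleError if the requirements are contradictory.
    """
    return list(TopologicalSorter(deps).static_order())


print(reset_release_order(release_after))
```

Contradictory requirements, for example two modules that each must wait for the other, raise CycleError. This flags a design that cannot be sequenced at all.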

Figure 5: Reset Synchronization in Multiple-Clock Domain (MCD) design (Source: vSync Circuits)

With these problems in mind, Part 2 discusses additional solutions for correct asynchronous reset in ASIC and FPGA designs, and Part 3 covers some useful special cases.

References

[1] G. Wirth, F. L. Kastensmidt and I. Ribeiro, “Single Event Transients in Logic Circuits – Load and Propagation Induced Pulse Broadening,” IEEE Transactions on Nuclear Science, 55(6), 2928–2935, 2008.
[2] C. E. Cummings, D. Mills and S. Golson, “Asynchronous & Synchronous Reset Design Techniques – Part Deux,” SNUG, 2003.
[3] W. J. Dally and J. W. Poulton, Digital Systems Engineering, Cambridge University Press, 1998.
[4] C. Dike and E. Burton, “Miller and noise effects in a synchronizing flip-flop,” IEEE Journal of Solid-State Circuits, 34(6), 849–855, 1999.
[5] vSync Circuits, Vincent Platform, http:///products
[6] Altera, Quartus II, www.altera.com
[7] Altera, Quartus II Handbook Volume 1: Design and Synthesis, pp. 11-19 – 11-29, December 15, 2014.
[8] K. Chapman, “Get Smart About Reset: Think Local, Not Global,” Xilinx, WP272 (v1.0.1), 2008.
[9] K. Chapman, “Get your Priorities Right – Make your Design Up to 50% Smaller,” Xilinx, WP275 (v1.0.1), 2007.
[10] K. Chapman, “That Dangerous Asynchronous Reset! External Antenna – Need for de-bouncer,” Xilinx PLD Blog, 2008.
[11] Xilinx, XST User Guide for Virtex-6, Spartan-6, and 7 Series Devices, UG687 (v13.1), pp. 50, 95, 128, 2011.
[12] Y. Halmut, “RESET architecture in Altera FPGAs: utilization effects,” private communication, RAD, 2016.
[13] C. Kwok, P. Viswanathan and P. Yeung, “Addressing the Challenges of Reset Verification in SoC Designs,” DVCon, 2015.

Rostislav (Reuven) Dobkin received his PhD degree in electrical engineering from the Technion, Israel Institute of Technology. Reuven is a co-founder and CTO of vSync Circuits LTD. (2010), a VLSI CAD company. In parallel, Reuven serves as a lecturer at the Technion. Reuven has held management positions in radiation-hardened VLSI technology for space applications, in communications chip development, and in research in C4I systems, signal processing, software systems engineering and VLSI. Reuven serves as a reviewer for numerous VLSI journals and conferences. His research interests are VLSI architectures, asynchronous logic, synchronization, GALS systems, SoC, NoC, many-core processors and parallel architectures.