Volume 3, Number 2, Spring 2003

Efficiency of A Self-Timed Ripple Adder

Feodor Vainstein, Ph.D.
Professor of Computer Engineering
GTREP & ECE
Georgia Institute of Technology
6001 Chatham Center Dr., Suite 350
Savannah, GA 31405, USA
Tel: 912.790.3440, Fax: 912.651.7279
Email: fvainste@gtrep.gatech.edu, vainstei@ece.gatech.edu

Lev B. Levitin, Ph.D.
Distinguished Professor of Engineering Science
Department of Electrical and Computer Engineering
Boston University
Boston, Massachusetts, >USA
Tel: 617.353.4607
Fax: 617.353.6440
Email: levitin@bu.edu

Abstract

An expression for the average carry propagation delay in a ripple carry adder is obtained which is exact up to terms of the order . The case of several adders working in parallel is also considered.

Introduction

Self-timed circuit techniques were first investigated in the 1950s by Huffman and Muller, and continued in the 1960s by Miller. However, clocked circuits were more economical in their use of components and were universally adopted. The advent of very large scale integration in the 1980s changed the balance and the availability of millions of devices on a chip means that clock skew is a more serious problem and also that the cost of components is less of an obstacle. This sounds like a panacea to designers of today's large-scale circuits but it requires a totally new methodology for design. Various problems of self-timed design discussed in the following publications [1-16]

A synchronous self-timed system consists of a number of functional units equipped by completion detectors. Initially, with a new clock pulse, completion its are cleared by the completion detectors circuitry when the operation being performed by units is completed. An AND gate is used to signal to the clock generator that all units completed their operations and a new clock pulse can be issued.

With this approach, the throughput of a system becomes considerably higher compared with the systems using traditional approach, when the rate of clock pulses is determined by the longest possible propagation delay in the functional units. In this paper we evaluate the potentials in speeding up for the case when we have only one functional unit that is a ripple carry adder. A more general case of m ripple carry adders operating in parallel is also considered.

An estimate of the average carry propagation delay was first obtained in [17]. A tighter upper bound was obtained in [18]. This paper presents a much simpler analytical approach that yield an expression for the average delay which is asymptotically exact with the remainder of is the number of full adders in the ripple adder.

By definition, the propagation delay in a combinational circuit is the duration of the time interval between the moment when all the input signals are stabilized and the moment when all output signals are stabilized.

Propagation delay depends on both the present inputs and on the inputs immediately proceeding to the present ones. Often the term “propagation delay of a circuit” is used to denote the upper bound of propagation delays for all possible inputs.

Denote by the maximum propagation delay in one full adder. It is clear that in the worst case the propagation delay of n-bit carry ripple adder is equal to . This value is used to determine the clock frequency of a conventional (not self-timed) system. In this paper we will show that in the average case the propagation delay is much smaller than . In fact it is equal to , where is a constant approximately equal to 0.668.

It follows from this result that a 64-bit ripple carry adder can work on average almost 9 times faster if the self-timed approach is used.

Stabilization time determined by stoppers and runners

Consider a ripple carry adder that consists of n full adders. We assume that the input bits

are independent random variables taking on values of 0 and 1 with equal probabilities. At time , the output carry for some full adders will stabilize. In particular, if then =1 in and will not change thereafter unless a new set of inputs is applied.

Similarly c then =0 in and will not change thereafter unless a new set of inputs is applied.

Hence if, for example, for every , then the carry propagation delay will be less than or equal to and all the outputs of the adder will be stable at time .

In fact, the actual time depends on the values of carries that appeared as a result of computation with inputs immediately preceding the present ones. In the most unfavorable case, however, the carry propagation delay is determined by the maximum number of consecutive bits of the input for which . Let us call input values “stoppers”, and values “runners”. Obviously, for completely random inputs, stoppers and runners occur independently with equal probabilities. Denote stoppers by 0 and runners by 1. Then the total stabilization time , where is the random variable that is the maximum length of a run of consecutive 1’s in a random sequence of 0’s and 1’s.

The average propagation delay in a ripple carry adder.

As known in previous section, the propagation delay in a ripple carry n-bit adder is a linear function of the maximum length of a run of 1’s in a random sequence of zeros and ones of length n, where the bits take on values of 0 and 1 with equal probabilities.

Thus the problem is reduced to the following: find the distribution of the maximum length of a run of 1’s in a random sequence of zeros and ones.

Denote bythe number of such sequences where the maximum length of a run of 1’s is smaller .

The cumulative distribution function (cdf) of the maximum length of a run is given by

(1)

It is easy to see that satisfies the following recurrence equation:

(2)

with the boundary condition

(3)

It is well known that for large the solution of equation (2) is asymptotically equal to

(4)

where is the largest real root of the characteristic equation:

(5)

Substituting into the bounding condition (3), we obtain:

(6)

Hence,

(7)

The largest root of (5) can be expressed asymptotically for , which gives a good approximation of :

(8)

Hence, asymptotically,

(9)

Thus, from (1) and (9) the cdf of is

(10)

The expected value is given by

(11)

For large , the function changes very rapidly from 0 to 1 in a narrow region. Therefore, a good estimate of is given by , where is the root of the equation:

(12)

Solving (12), we obtain:

(13)

Note that for large ,

(14)

Expression (14) allows us to obtain an accurate evaluation of using (11).

(15)

where is a constant

Finally, we obtain that the average propagation delay in a ripple carry adder is approximately

(16)

Let us consider now another version of the problem. Suppose, we have such -bit adders working in parallel. Then the delay is equal to the maximum delay among all adders. The latter is proportional to the maximum length of the runs of in all m adders. The cumulative distribution function of is

(22)

As a result, the expected value of is:

(23)

Conclusion
The results obtained above give the exact value of the average carry propagation delay up to terms of order for randomly distributed input values. The important fact is that, because of the narrowness of the delay distribution, the time variations of the delay are rather limited. In the case of m adders working in parallel, the average delay increases quite modestly by .

It is interesting to compare the results obtained under the assumption random input statistics with empirical data. There exists experimental evidence [19] that data arithmetic operations carry propagation chains can be very long, exceeding . One possible explanation of this effect is that very often in such computations a small number (that has a long run of 0’s starting with the leftmost digit) is subtracted from another small number, which creates a long sequence of runners. If this is the case the situation can be corrected by special means with small hardware overhead.

References

Simon Moore, Ross Anderson, Paul Cunningham, Robert Mullins, George Taylor, Improving Smart Card Security using Self-timed Circuits, Eighth International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2002
Simon Moore, George Taylor, Robert Mullins, Peter Robinson, Point to Point GALS Iterconnect, Eighth International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2002
S.W. Moore, Protecting Consumer Security Devices --- The Next 10 Years, published by Springer in LNCS 2140, Cannes, September 2001.
G S Taylor, S W Moore, P Robinson, An on-chip dynamically recalibrated delay line for embedded self-timed systems, Sixth International Symposium on Advanced Research in Asynchronous Circuits and Systems, 2000.
S W Moore, R Anderson, M Kuhn, Self-timed Technology to Reduce Smartcard Fraud, ACiD-WG Workshop, Grenoble, 2000.
S.W. Moore, G.S. Taylor, P.A. Cunningham, R.D. Mullins and P.Robinson, Self Calibrating Clocks for Globally Asynchronous Locally Synchronous Systems, International Conference on Computer Design, Austin Texas, September 2000.
G S Taylor, S W Moore, P Robinson, Frequency Locked Loops: Adding Some Asynchronous Circuits to Synchronous Circuits, 7th UK Async. Forum, 1999
P Cunningham, G S Taylor, P Robinson, S W Moore, Cheating Safely in Asynchronous Design, 7th UK Async. Forum, 1999
S P Wilcox, Synthesis of Asynchronous Circuits, Technical Report 468, Computer Laboratory, University of Cambridge, 1999
S P Wilcox, G S Taylor, S W Moore, P Robinson, Designing a 16-bit ALU for an Embedded Processor, 6th UK Async. Forum, 1999
S W Moore, P Robinson, Rapid Prototyping of Self-timed Circuits, ICCD, 1998.
S W Moore, P Robinson, S Wilcox, Engineering Arbiters for FPGAs, 4th UK Async. Forum, 1998.
S W Moore, P Robinson, Geometry Planning for Self-Timed ASIC Design, ACiD-WG Workshop, Turin, 1998.
S P Wilcox, S W Moore, P Robinson, Flow Table Synthesis, ACiD-WG Workshop, Turin, 1998.
S W Moore, P Robinson, S Wilcox, RED Flip-Flops and Hybrid Completion Detection, ACiD-WG Workshop, University of Groningen technical report CSN 9602, 1997.
S W Moore, P Robinson, S Wilcox, Rotary Pipeline Processors, Special issue of IEE Computers and Digital Techniques on Asynchronous Processors, 1996.
17. A. W. Burks, H. H. Goldstine, and J. von Neumann, “Preliminary discussion of the logical design of an electronic computing instrument”, in Collected Works of John von Neumann”, A. H. Taub, ed., vol.5, New York, Macmillan, 1963, pp.34-79
18. B. E. Briley, ;Some new results on average worst case carry’, IEEE Trans. on Computers, vol C-22, No 5, May 1973, pp. 459-463.
19. K. Y. Yun, P. A. Beerel, V. Vakilotjar, A. E. Dooply, J. Arceo, “The design and verification of a high-performance low-control-overhead asynchronous differential equation solver”, IEEE Conference, 1997, pp. 140-153