# Implementation of High Performance Multiplier with Adaptive Hold Logic

## C. Teja Kumar<sup>1</sup>, K. Sravan Kumar<sup>2</sup>

<sup>1</sup>M.Tech Student, <sup>2</sup>Lecturer, Department of Electronics & Communications Engineering, Jawaharlal Nehru Technological University College of Engineering, Anantapuramu.


ABSTRACT: Arithmetic multipliers are among the most complex arithmetic circuits, and the performance of many systems depends directly on the performance of the multiplier. When a p-channel MOS (pMOS) transistor is under negative bias (-Vdd), it suffers from negative bias temperature instability (NBTI), which shifts its threshold voltage and thereby reduces the speed of the multiplier. A similar phenomenon, positive bias temperature instability (PBTI), occurs when an n-channel MOS (nMOS) transistor is under positive bias. Both effects slow down the transistors in the multiplier, and in the long term the whole system may fail because of timing violations. Hence, it is necessary to design multipliers that keep their performance in current technologies. We implement a high-performance multiplier using an adaptive hold logic (AHL) circuit, which lets a column-bypassing multiplier operate with variable latency. Experimental results show that the proposed 16-bit adaptive column-bypassing multiplier with a carry look-ahead adder improves performance (reduces delay) by up to 44.8% and 6.9% compared with the conventional column-bypassing multiplier without and with AHL, respectively, in 90 nm technology.

Keywords—PBTI, bypass multiplier, AHL, variable latency design

## I. Introduction

Multiplication is one of the basic arithmetic operations and is used in many applications such as filtering, radar, and microprocessors. Modern ICs require low-power, high-performance designs, and multipliers are typically among the critical units in these circuits. Multiplication performed by digital multipliers is relatively slow and complex, and the overall speed of most of these applications is dominated by the speed of the multiplier.

Bias temperature instability (BTI) is a major cause of transistor aging: it reduces transistor speed by shifting the threshold voltage and decreasing the drain current and carrier mobility of a MOS device. Negative bias temperature instability (NBTI) occurs when a pMOS transistor is under negative gate bias, and it increases the threshold voltage of the transistor [2].

The underlying mechanism is the interaction of inversion-layer holes with the Si-H bonds formed at the silicon/oxide interface during oxidation; the broken bonds release H or H<sub>2</sub> molecules. When these molecules diffuse away, interface traps are left between the silicon and the gate oxide. These interface traps shift the threshold voltage, which degrades the operating speed of the circuit [3].

A similar effect on nMOS transistors, called positive bias temperature instability (PBTI), occurs when an nMOS transistor is under positive bias. Compared with NBTI, PBTI has a much smaller effect on oxide/poly-gate devices, so it is generally ignored. When the bias is removed, a recovery process partially reverses the NBTI effect; however, this recovery does not remove all the interface traps generated during stress, so the threshold voltage still increases significantly over time [2]. Therefore, it is necessary to have a multiplier that maintains high performance as it ages.

The remainder of this paper is organized as follows. Section II describes existing multiplier architectures. Section III presents the proposed AHL bypassing multipliers with and without a carry look-ahead adder. Section IV presents the implementation and results of the existing and proposed methods. Finally, Section V concludes the paper.

## **II. Existing Multiplier Architectures**

Every gate in a VLSI chip has its own propagation delay, and these delays limit the speed of the circuit. Conventional circuits use the maximum combinational path delay as the clock period so that every operation completes correctly. In most cases, however, the probability that the critical path is activated is low, so using the maximum path delay as the clock period for non-critical paths wastes considerable time. Variable-latency design reduces this timing waste by dividing the paths of the circuit into two groups:

1. Short paths
2. Long paths

Short paths take only one cycle to finish, whereas long paths take two cycles [4]. For a given timing constraint T (the clock period), a path is a long path if its delay is greater than or equal to T; otherwise it is a short path [4]. Because multipliers consume a large share of power in many computations, bypassing multipliers are preferred to reduce both delay and switching power. In a bypassing multiplier, delay and switching power depend mainly on the values of the selection (operand) bits: if a selection bit is zero, all adders in the corresponding column or row are disabled.
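The classification rule can be sketched in a few lines of Python. This is a minimal illustration, assuming that a path counts as long when its delay is greater than or equal to the cycle period T (as stated above) and that every long path fits within two cycles. The names `classify_path` and `average_latency_ns` and the example delays are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathClass:
    is_long: bool  # True if the path is given the two-cycle slot
    cycles: int    # clock cycles allotted to the path


def classify_path(delay_ns: float, period_ns: float) -> PathClass:
    """Classify one path against the timing constraint T = period_ns."""
    if delay_ns > 2 * period_ns:
        # Assumption of this sketch: every long path fits in two cycles.
        raise ValueError("path delay exceeds two clock periods")
    if delay_ns >= period_ns:
        return PathClass(is_long=True, cycles=2)
    return PathClass(is_long=False, cycles=1)


def average_latency_ns(delays_ns: list[float], period_ns: float) -> float:
    """Average latency when each listed path is equally likely to be activated."""
    total_cycles = sum(classify_path(d, period_ns).cycles for d in delays_ns)
    return total_cycles * period_ns / len(delays_ns)


# Hypothetical path delays in ns; a fixed-latency design would clock every
# operation at the worst case of 9.0 ns.
delays = [3.0, 4.0, 7.5, 9.0]
print(average_latency_ns(delays, period_ns=5.0))  # 7.5
```

With these made-up delays, the variable-latency scheme averages 7.5 ns per operation instead of the 9.0 ns worst case, which is the timing waste the technique targets.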

#### 1. Array Multiplier:

The array multiplier (AM) has a simple and regular structure based on addition and shifting. Each partial product is formed by multiplying (ANDing) a multiplicand bit with the corresponding multiplier bit; the partial products are shifted according to their bit positions and then added. The additions use conventional carry propagation, and N - 1 rows of adders are needed for an N-bit multiplication.
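The partial-product formation and shift-and-add procedure can be sketched at the word level as follows. This is an illustrative model only; the function names are hypothetical, and a hardware AM performs the additions with rows of full adders rather than integer additions.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """Return one shifted partial-product row per multiplier bit b_j."""
    rows = []
    for j in range(n):
        b_j = (b >> j) & 1
        row = 0
        for i in range(n):
            a_i = (a >> i) & 1
            # Bit a_i AND b_j has weight 2^(i + j): it is shifted by its bit order.
            row |= (a_i & b_j) << (i + j)
        rows.append(row)
    return rows


def shift_add_multiply(a: int, b: int, n: int) -> int:
    """Add the N partial-product rows using N - 1 additions (one per adder row)."""
    rows = partial_products(a, b, n)
    product = rows[0]
    for row in rows[1:]:
        product += row
    return product


print(shift_add_multiply(0b1001, 0b1111, 4))  # 135
```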



Fig 1. Array Multiplier

The AM is a fast parallel multiplier; its multiplication procedure is shown in Fig 1. Each full adder (FA) in the array has three inputs and two outputs:

1. Sum

2. Carry

The sum bit goes to the FA directly below, and the carry bit goes to the lower-left FA. All FAs in the AM are active regardless of the operand values.
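A bit-level Python sketch of this sum-down, carry-to-lower-left arrangement is shown below, followed by a final ripple-carry stage. It is a simplified model under stated assumptions: every cell is modeled as a full adder (a real array uses half adders where an input is always 0), and the helper names are hypothetical. The returned FA count is the same for every operand pair, which reflects the statement that all FAs in the AM are active.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """One-bit full adder: three inputs, two outputs (sum, carry)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)
    return s, cout


def carry_save_array_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """n x n array multiplier sketch; returns (product, FA evaluations)."""
    width = 2 * n
    sums = [0] * width      # sums[k]: sum bit of weight 2^k from the row above
    carries = [0] * width   # carries[k]: carry bit of weight 2^k from the row above
    for i in range(n):
        sums[i] = ((a >> i) & 1) & (b & 1)  # first partial-product row
    fa_evals = 0
    for j in range(1, n):                   # one FA row per multiplier bit b_j
        new_sums = sums[:]                  # weights below j are already final
        new_carries = [0] * width
        for k in range(j, j + n):           # FA at weight k adds a_(k-j) * b_j
            pp = ((a >> (k - j)) & 1) & ((b >> j) & 1)
            s, c = full_adder(sums[k], carries[k], pp)
            new_sums[k] = s                 # sum goes straight down
            new_carries[k + 1] = c          # carry goes to the lower-left FA
            fa_evals += 1
        sums, carries = new_sums, new_carries
    # Final ripple-carry stage merges the remaining sum and carry bits.
    product, c = 0, 0
    for k in range(width):
        s, c = full_adder(sums[k], carries[k], c)
        product |= s << k
    return product, fa_evals


print(carry_save_array_multiply(0b1001, 0b1111, 4))  # (135, 12)
```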

#### 2. Column-Bypass Multiplier:

The column-bypassing multiplier is an extension of the array multiplier. The low-power column-bypassing multiplier (CLB) design reduces both power and delay. Whenever a bit of the multiplicand is zero, none of the FAs in the corresponding column are activated. To support this, each FA is augmented with one multiplexer and two tri-state buffers, as shown in Fig 2. The multiplicand bit a<sub>i</sub> serves as the select signal of the multiplexer that chooses the FA output, and as the enable signal of the tri-state buffers that disconnect the FA input path. If a<sub>i</sub> is 0, the FA inputs are disabled and the sum bit from the upper FA is passed directly to the sum output of the current FA; because the FA does not switch, the switching power of the multiplier is reduced. If a<sub>i</sub> is 1, the FA operates normally.



Fig 2. Column Bypass Multiplier

For example, consider 1001 × 1111, where the multiplicand is a = 1001, so a<sub>1</sub> = a<sub>2</sub> = 0. In every FA of the two diagonals (columns) corresponding to a<sub>1</sub> and a<sub>2</sub>, two of the three input bits are zero: the carry output of the upper-right FA and the partial product a<sub>i</sub>b<sub>j</sub>. Because both of these inputs are zero, the sum output of each such FA is simply equal to its third input, the sum output of the upper FA [5].
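The bypass behavior and the example above can be checked with a small Python model. This is a sketch under stated assumptions: the array is the same sum-down, carry-to-lower-left structure as the AM, a "column" is the diagonal of FAs fed by one multiplicand bit a<sub>i</sub>, and the multiplexer plus tri-state buffers are modeled by skipping the FA and forwarding the upper sum. The function names are hypothetical. The `assert` inside the loop encodes the claim that the carry entering a bypassed FA is always 0.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """One-bit full adder returning (sum, carry)."""
    s = x ^ y ^ cin
    return s, (x & y) | (x & cin) | (y & cin)


def column_bypass_multiply(a: int, b: int, n: int) -> tuple[int, int, int]:
    """Column-bypassing array sketch; returns (product, active FAs, bypassed FAs)."""
    width = 2 * n
    sums = [0] * width
    carries = [0] * width
    for i in range(n):
        sums[i] = ((a >> i) & 1) & (b & 1)
    active = bypassed = 0
    for j in range(1, n):
        new_sums = sums[:]
        new_carries = [0] * width
        for k in range(j, j + n):
            i = k - j                         # this FA belongs to column a_i
            a_i = (a >> i) & 1
            if a_i == 0:
                # Bypass: inputs held by tri-state buffers, mux forwards the upper sum.
                assert carries[k] == 0        # carry from the upper-right FA is 0
                new_sums[k] = sums[k]
                bypassed += 1
            else:
                s, c = full_adder(sums[k], carries[k], a_i & ((b >> j) & 1))
                new_sums[k] = s
                new_carries[k + 1] = c
                active += 1
        sums, carries = new_sums, new_carries
    product, c = 0, 0
    for k in range(width):                    # final ripple-carry stage
        s, c = full_adder(sums[k], carries[k], c)
        product |= s << k
    return product, active, bypassed


print(column_bypass_multiply(0b1001, 0b1111, 4))  # (135, 6, 6)
```

For 1001 × 1111, the six FAs in the a<sub>1</sub> and a<sub>2</sub> columns are bypassed while the product remains correct.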

## **III. Proposed Method**

The high-performance multiplier is obtained by combining the adaptive hold logic (AHL) technique with the column-bypassing multiplier (ACLB). The architecture consists of two m-bit inputs (m is a positive integer), one column-bypassing multiplier, 1-bit Razor flip-flops at the multiplier output, and the AHL circuit. The input of the AHL circuit is the multiplicand (md) for column bypassing, and the Razor flip-flops detect whether a given input pattern causes a timing violation. The operation of each block of the AHL multiplier is shown in the accompanying figure.





The output of the bypassing multiplier is fed to the Razor flip-flops. Each 1-bit Razor flip-flop contains a main flip-flop clocked by the normal clock, a shadow latch clocked by a delayed clock, a multiplexer, and an XOR gate.

#### 1. Column-Bypassing with carry look ahead:



Fig 4. Column bypass with carry look-ahead adder

The last-stage ripple-carry adder of the bypassing multiplier is replaced with a carry look-ahead adder (CLA), giving the CLB-CLA design, which performs better in our results. Using carry-generate and carry-propagate signals, the carries of the intermediate stages are computed directly from the operand bits and the input carry instead of rippling through every stage; hence the name carry look-ahead. The generate signal of a stage produces a carry regardless of that stage's incoming carry, while the propagate signal passes an incoming carry on to the next stage. The circuit of a 4-bit CLA is shown in Fig 5.



Fig 5. 4-bit CLA adder

The propagate and generate terms are obtained from the input bits A and B. The sum and carry bits are then computed from these propagate and generate terms, without waiting for carries to ripple from the lower stages:

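$$
G_j = A_j \cdot B_j, \qquad P_j = A_j \oplus B_j, \qquad S_j = P_j \oplus C_j, \qquad C_0 = C_{in}
$$

$$
\begin{aligned}
C_1 &= G_0 + P_0 C_{in} \\
C_2 &= G_1 + P_1 G_0 + P_1 P_0 C_{in} \\
C_3 &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{in} \\
C_4 &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{in}
\end{aligned}
$$

Here juxtaposition (or ·) denotes AND, + denotes OR, and ⊕ denotes XOR.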
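The 4-bit CLA equations can be transcribed directly into Python to check them against ordinary addition. This is a behavioral sketch, not a gate-level netlist; the function name `cla4` is hypothetical.

```python
def cla4(a: int, b: int, cin: int) -> tuple[int, int]:
    """4-bit carry look-ahead adder; returns (4-bit sum, carry out C4)."""
    g = [((a >> j) & 1) & ((b >> j) & 1) for j in range(4)]  # G_j = A_j AND B_j
    p = [((a >> j) & 1) ^ ((b >> j) & 1) for j in range(4)]  # P_j = A_j XOR B_j
    c = [cin, 0, 0, 0, 0]                                     # C_0 = C_in
    # Every carry depends only on G, P and C_in, never on a rippled carry.
    c[1] = g[0] | (p[0] & cin)
    c[2] = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c[3] = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    c[4] = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
            | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))
    s = 0
    for j in range(4):
        s |= (p[j] ^ c[j]) << j                               # S_j = P_j XOR C_j
    return s, c[4]


print(cla4(0b1011, 0b0110, 0))  # (1, 1): 11 + 6 = 17 = 0b1_0001
```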

#### 2. Razor flip flop:

The delayed clock lags the normal clock, so the shadow latch samples the multiplier output later than the main flip-flop does. If the latch output differs from the flip-flop output, the path delay of the current operation has exceeded the clock period and the main flip-flop has captured an incorrect value. The Razor flip-flop then asserts the error signal, which tells the system to re-execute the operation and notifies the AHL circuit that an error has occurred. The main purpose of the Razor flip-flop is to check whether an operation that was predicted to take one clock cycle actually completes within one cycle [6]. If it does not, the operation is re-executed over two clock cycles, whose total period is greater than or equal to the maximum critical-path delay.
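The detection principle can be modeled behaviorally in Python. This sketch assumes that the result becomes valid `arrival_ns` after the launching clock edge, that the delayed clock gives the shadow latch `clock_delay_ns` of extra time, and that the result always arrives before the shadow latch samples (so the shadow value is correct). The names are hypothetical, and the multiplexer's recovery path is not modeled because this design re-executes the operation instead.

```python
from dataclasses import dataclass


@dataclass
class RazorSample:
    main_q: int    # captured by the main flip-flop on the normal clock edge
    shadow_q: int  # captured by the shadow latch on the delayed clock
    error: int     # 1 when the two disagree, i.e. a timing violation


def razor_sample(prev_value: int, new_value: int, arrival_ns: float,
                 period_ns: float, clock_delay_ns: float) -> RazorSample:
    """Behavioral model of a multi-bit Razor register at one clock edge."""
    if arrival_ns > period_ns + clock_delay_ns:
        raise ValueError("result arrives after the shadow latch samples")
    # The main flip-flop sees the new value only if it arrived before the edge.
    main_q = new_value if arrival_ns <= period_ns else prev_value
    # Assumption: the delayed clock always gives the shadow latch the correct value.
    shadow_q = new_value
    # Per-bit XOR of main and shadow values, reduced to one error flag.
    error = int((main_q ^ shadow_q) != 0)
    return RazorSample(main_q, shadow_q, error)


print(razor_sample(0, 135, arrival_ns=6.0, period_ns=5.0, clock_delay_ns=2.0))
# RazorSample(main_q=0, shadow_q=135, error=1)
```

A late result whose value happens to equal the previous output raises no error, because the main flip-flop still holds the correct value in that case.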



Fig 6. Razor Flip Flop

#### 3. Adaptive Hold Logic Technique:

The adaptive hold logic (AHL) circuit is used with variable-latency multipliers; its internal circuit is shown in the accompanying diagram. It consists of two decision blocks, one D flip-flop, a multiplexer, and an OR gate. If the cycle period is too short, the bypassing multiplier cannot complete some operations in time, which causes timing violations. The Razor flip-flops detect these violations and assert the error signal at the output. If errors occur frequently, the circuit has suffered significant timing degradation due to aging.

When an input pattern reaches the AHL, the two decision blocks each decide whether the pattern needs one or two clock cycles to complete, and both send their outputs to the multiplexer, which selects one of them based on the Razor flip-flop results. The multiplexer output is ORed with the Q̄ output of the D flip-flop, and the result drives the D input of the flip-flop. When the input pattern can complete in one cycle, the multiplexer output is 1 and the !(gating) signal becomes 1, so the input flip-flops latch a new input pattern in the next cycle. When the multiplexer output is 0, meaning the input pattern requires two cycles, the OR gate also outputs 0 to the D flip-flop; the !(gating) signal then becomes 0 and disables the clock of the input flip-flops for the next cycle.
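The AHL control path can be sketched as a cycle-level Python model. Several details here are assumptions rather than facts from the text: each decision block is modeled as a threshold on the number of zeros in the multiplicand (more zeros means more bypassed columns and a shorter delay), the thresholds are hypothetical, the multiplexer select is an `aged` flag standing in for an aging indicator derived from frequent Razor errors, and the D flip-flop's Q output is taken as the !(gating) signal. All function and class names are hypothetical.

```python
def count_zeros(md: int, m: int) -> int:
    """Number of 0 bits in the m-bit multiplicand md."""
    return m - bin(md & ((1 << m) - 1)).count("1")


def ahl_predict(md: int, m: int, aged: bool,
                fresh_threshold: int, aged_threshold: int) -> int:
    """Multiplexer output: 1 = one-cycle pattern, 0 = two-cycle pattern."""
    zeros = count_zeros(md, m)
    block1 = int(zeros >= fresh_threshold)  # decision block for the fresh circuit
    block2 = int(zeros >= aged_threshold)   # stricter decision block after aging
    return block2 if aged else block1       # multiplexer (assumed aging-indicator select)


class GatingFlipFlop:
    """D flip-flop whose Q output is assumed to be the !(gating) signal."""

    def __init__(self) -> None:
        self.q = 1  # input flip-flops start enabled

    def clock(self, mux_out: int) -> int:
        d = mux_out | (1 - self.q)  # OR gate: multiplexer output OR Q-bar
        self.q = d
        return self.q               # 1: input flip-flops latch a new pattern


def cycles_for_patterns(mds, m, aged, fresh_threshold, aged_threshold) -> int:
    """Count the cycles needed to feed every multiplicand through the multiplier."""
    ff = GatingFlipFlop()
    cycles, idx = 0, 0
    while idx < len(mds):
        mux_out = ahl_predict(mds[idx], m, aged, fresh_threshold, aged_threshold)
        cycles += 1
        if ff.clock(mux_out):  # !(gating) = 1: accept the next pattern
            idx += 1
    return cycles


patterns = [0b1001, 0b1111, 0b0000]
print(cycles_for_patterns(patterns, 4, aged=False, fresh_threshold=2, aged_threshold=3))  # 4
```

Feeding Q̄ back through the OR gate limits the hold to a single extra cycle: after one gated cycle Q̄ is 1, so the flip-flop re-enables the input clock even though the held pattern's prediction is still 0.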





When an input pattern arrives, the column-bypassing multiplier and the AHL circuit operate simultaneously. Based on the number of zeros in the multiplicand (md), the AHL circuit predicts whether the input pattern requires one or two clock cycles. If the pattern requires two cycles, the AHL outputs 0 to disable the clock of the input flip-flops; for a normal one-cycle multiplication it outputs 1. When the bypassing multiplication finishes, the result is passed to the Razor flip-flops, which detect any path-delay timing violation. A timing violation means that one cycle was not enough for the current input pattern and that the multiplication needs two cycles to finish correctly. The Razor flip-flops then raise an error to notify the system that the current operation must be re-executed over two cycles to produce the correct result. Such re-executions add extra cycles and thus increase the average latency of the system. However, the AHL circuit correctly predicts whether an input pattern needs one or two cycles in almost all cases; only the few patterns that the AHL misjudges cause timing violations.
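The latency cost of mispredictions can be estimated with a short Python sketch. It assumes, as a simplification, that a mispredicted pattern spends one failed cycle and is then re-executed over two cycles (three cycles in total), that a pattern predicted as two-cycle always takes two cycles, and that every pattern finishes within two cycles. The function name and the example predictions and delays are hypothetical.

```python
def ahl_average_latency_ns(predicted_one_cycle, delays_ns, period_ns):
    """Return (average latency in ns, number of Razor-detected errors)."""
    if len(predicted_one_cycle) != len(delays_ns):
        raise ValueError("one prediction per input pattern is required")
    total_cycles, errors = 0, 0
    for one_cycle, delay in zip(predicted_one_cycle, delays_ns):
        if delay > 2 * period_ns:
            raise ValueError("every pattern must finish within two cycles")
        if not one_cycle:
            total_cycles += 2          # AHL held the inputs for a second cycle
        elif delay <= period_ns:
            total_cycles += 1          # correct one-cycle prediction
        else:
            total_cycles += 1 + 2      # failed cycle, then two-cycle re-execution
            errors += 1
    return total_cycles * period_ns / len(delays_ns), errors


# Hypothetical: the second pattern is wrongly predicted as a one-cycle pattern.
print(ahl_average_latency_ns([True, True, False, True], [4.0, 6.0, 8.0, 3.0], 5.0))
# (8.75, 1)
```

Each misprediction costs one more cycle than a correct two-cycle prediction, which is why an accurate AHL keeps the average latency low.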

## **IV. SIMULATION RESULTS**

The adaptive column-bypassing multiplier is described in Verilog and simulated in Cadence using 90 nm technology to evaluate its performance, power, and area. Fig 10 shows the waveforms of the 16-bit column-bypassing multiplier, which disables an entire column whenever the corresponding multiplicand bit is 0. Fig 13 shows the waveforms of the 16-bit column-bypassing multiplier with AHL: the first input pattern has few 0s in the multiplicand, so it takes two clock cycles to execute, whereas the next input pattern has more 0s in the multiplicand and takes only one clock cycle to complete.

Table 1 compares the area, power, and delay of the different multipliers. The proposed adaptive multiplier with a carry look-ahead adder (ACLB-CLA) reduces delay by 6.9% and 44.8% compared with the CLB multiplier with and without AHL, respectively. The delay and area comparisons are shown in Chart 1 and Chart 2, respectively. The normal CLB occupies only 2 µm² more area than the CLB-CLA. The proposed ACLB-CLA saves 9.4% of area and 4.9% of total power compared with the adaptive CLB without a CLA adder (ACLB).



Fig 8. Block Diagram of 16\*16 Column Bypass



Fig 9. Schematic of 16\*16 Column bypass



Fig 10. Waveforms of Fixed 16\*16 Column Bypass



Fig 11. Block diagram of Column Bypass with AHL



Fig 12. Schematic of 16\*16 Column bypass with CLA



Fig 13. Waveforms of Column Bypass with AHL

| Factors | Existing: 16-bit CLB | Existing: 16-bit ACLB | Proposed: 16-bit CLB-CLA | Proposed: 16-bit ACLB-CLA | Change (%), ACLB-CLA over CLB | Change (%), ACLB-CLA over ACLB |
|---|---|---|---|---|---|---|
| Area (µm²) | 11134 | 17191 | 11132 | 15580 | 40 | -9.4 |
| Power (µW) | 60 | 438.01 | 148.5 | 416.4 | 593 | -4.9 |
| Delay (ns) | 16.82 | 9.98 | 11.13 | 9.29 | -44.8 | -6.9 |

Table 1. Area, power, and delay comparison of the multipliers (negative changes denote reductions)
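The percentage comparisons quoted in the text can be reproduced from the raw entries of Table 1 with a few lines of Python. The helper name `percent_change` is hypothetical; the values are copied from the table.

```python
def percent_change(old: float, new: float) -> float:
    """Signed change from old to new in percent; negative means a reduction."""
    return (new - old) / old * 100.0


# Values copied from Table 1 (16-bit designs).
delay_ns = {"CLB": 16.82, "ACLB": 9.98, "ACLB-CLA": 9.29}
area_um2 = {"ACLB": 17191, "ACLB-CLA": 15580}
power_uw = {"ACLB": 438.01, "ACLB-CLA": 416.4}

print(round(percent_change(delay_ns["CLB"], delay_ns["ACLB-CLA"]), 1))   # -44.8
print(round(percent_change(delay_ns["ACLB"], delay_ns["ACLB-CLA"]), 1))  # -6.9
print(round(percent_change(area_um2["ACLB"], area_um2["ACLB-CLA"]), 1))  # -9.4
print(round(percent_change(power_uw["ACLB"], power_uw["ACLB-CLA"]), 1))  # -4.9
```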




## REFERENCES

[1] I. C. Lin, Y. M. Yang, and Y. H. Cho, "Aging-aware reliable multiplier design with adaptive hold logic," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2015.

[2] A. Calimera, M. Poncino, and E. Macii, "Design technique for NBTI-tolerant power-gating architecture," IEEE Transactions on Circuits and Systems, vol. 59, no. 4, pp. 249-253, 2012.

[3] T. Kim and Y. Lee, "A fine-grained technique of NBTI-aware voltage scaling and body biasing for standard cell based designs," in Proc. ASP-DAC, pp. 603-608, 2011.

[4] M. Olivieri, "Design of synchronous and asynchronous variable-latency pipelined multipliers," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2001.

[5] S. J. Wang, Y. N. Lin, and M. C. Wen, "Low-power parallel multipliers with column bypass," in Proc. IEEE ISCAS, 2005.

[6] D. Ernst et al., "Razor: A low-power pipeline based on circuit-level timing speculation," in Proc. 36th Annu. IEEE/ACM Int. Symp. Microarchitecture (MICRO), pp. 7-18, 2003.



Chart 1. Delay comparison of 16-bit multipliers

Chart 2. Area comparison of 16-bit multipliers

## V. CONCLUSION

This paper proposed a high-performance multiplier using the AHL technique. The design is based on variable-latency operation, and the AHL circuit allows it to maintain reliable performance under NBTI and PBTI. Because these bias temperature effects increase transistor delay, circuit delay and performance degradation become increasingly significant as the circuit ages. The proposed AHL multiplier reduces the errors caused by such timing violations. The proposed adaptive multiplier with a carry look-ahead adder improves performance (reduces delay) by up to 6.9% and 44.8% compared with the normal CLB multiplier with and without AHL, respectively.