# JETIR.ORG ISSN: 2349-5162 | ESTD Year : 2014 | Monthly Issue JOURNAL OF EMERGING TECHNOLOGIES AND INNOVATIVE RESEARCH (JETIR)

An International Scholarly Open Access, Peer-reviewed, Refereed Journal

# Minimize Delay and High Speed Complex Booth Multiplier using Carry Select Adder with BEC

# <sup>1</sup>Upendra Singh Sengar, <sup>2</sup>Dr. Shalini Sahay

M. Tech. Scholar, Department of Electronics and Communication Engineering, SIRT, Bhopal<sup>1</sup> Associate Professor, Department of Electronics and Communication Engineering, SIRT, Bhopal<sup>2</sup>

*Abstract:* The multiplier is one of the most essential and fundamental components in many multimedia and Digital Signal Processing (DSP) systems. Multipliers play a very important role since they can greatly influence the performance and the power dissipation of the system. Thus, for better performance of such systems, efficient realization of the multiplier is crucial. Many DSP applications employ fixed-point arithmetic where n-bit signals are multiplied by n-bit coefficients. To avoid unbounded growth in the word size, the 2n-bit products obtained must be quantized to n bits. Also, many multimedia systems maintain a fixed format and tolerate some loss in accuracy, where precision may be compromised to improve performance parameters, viz. speed, area and power dissipation. In this paper, a complex multiplier is designed with the help of a radix-4 Booth multiplier and a carry select adder (CSLA) with a binary to excess-1 converter (BEC). The 64-bit complex multiplier is simulated in Xilinx software and evaluated in terms of the number of look-up tables, the number of flip-flops used and the maximum combinational path delay.

Keywords - Complex Multiplier, Radix-4, Carry Select Adder, Binary Excess-1 Converter

## I. INTRODUCTION

Multipliers are either sequential or combinational logic circuits. High speed parallel multipliers, which are completely combinational circuits, are employed in many applications. All the sources of power consumption, namely switching power, short circuit power and leakage power, exist in parallel multipliers. As multipliers occupy a larger chip area with higher transistor density, they tend to have more leakage power dissipation [1]. Of the three steps of the multiplication process, i.e. generation of the partial products, accumulation of the partial products and final carry propagate addition, the partial product accumulation is the most intensive task and it decides the overall delay, area and power consumption of parallel multipliers. Increased device density results in high switching activity in a smaller chip area, leading to a higher concentration of power dissipation. With increasingly portable systems and technology scaling, exponentially increasing device counts make power management more difficult. Reducing the supply voltage in anticipation of a quadratic reduction in power is not matched by a viable proportional reduction of the device threshold voltage [2, 3]. This leads to leakage current components increasing by leaps and bounds. Hence, multipliers are one of the main contributors to power dissipation and area in DSP systems. More importantly, multipliers are usually placed in the critical path of such systems.

Multiplication is one of the most common operations in many digital signal processing (DSP) applications, such as the Fast Fourier transform (FFT), digital filtering and wavelet transforms. To avoid bit-width growth, the multiplier outputs need to be truncated or rounded to a certain width. For n-bit inputs, conventional fixed-width multipliers perform the overall partial product summation before rounding or truncating the results to n bits. In such post-truncated multipliers, since all the adder cells are used to compute the 2n-bit product, the outputs are more accurate. However, this kind of multiplier incurs area overhead and high power dissipation [4]. To overcome these issues, direct-truncated multipliers can be employed. In direct-truncated multipliers, the least significant half of the partial product bits is simply eliminated by removing the adder cells which compute the least significant n bits of the 2n-bit output. Since half of the adder cells are removed, area and power dissipation can be reduced by approximately 50%, and the critical path delay is also reduced. However, large truncation errors are introduced. In order to realize a low-complexity, low-truncation-error fixed-width multiplier, a compensation bias is estimated from the truncated part and is added to the retained adder cells, i.e., the adder cells which add the most significant bits of the partial products [5].
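The difference between post-truncation and direct truncation can be illustrated with a small behavioral sketch. It assumes, for simplicity, an unsigned n x n array multiplier in which each partial product bit is an AND of one bit of each operand; the function names are hypothetical and are not taken from the design in this paper.

```python
def partial_product_bits(x: int, y: int, n: int):
    """Yield (weight, bit) for every AND term of an unsigned n x n array multiplier."""
    for i in range(n):
        for j in range(n):
            yield i + j, ((x >> i) & 1) & ((y >> j) & 1)


def post_truncated(x: int, y: int, n: int) -> int:
    """Sum all partial product bits, then keep the upper n bits of the 2n-bit product."""
    full = sum(bit << w for w, bit in partial_product_bits(x, y, n))
    return full >> n


def direct_truncated(x: int, y: int, n: int) -> int:
    """Drop every partial product bit in the n least significant columns before summing."""
    kept = sum(bit << w for w, bit in partial_product_bits(x, y, n) if w >= n)
    return kept >> n


def max_truncation_error(n: int) -> int:
    """Largest gap between the two schemes over all unsigned n-bit operand pairs."""
    return max(
        post_truncated(x, y, n) - direct_truncated(x, y, n)
        for x in range(1 << n)
        for y in range(1 << n)
    )


# Example: 15 x 15 with n = 4 gives 14 after post-truncation but only 11
# after direct truncation, because the carries from the dropped columns are lost.
print(post_truncated(15, 15, 4), direct_truncated(15, 15, 4))
```

The gap between the two results is what a compensation bias tries to estimate without keeping the removed adder cells.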

## II. BOOTH MULTIPLIER

For achieving high performance, modified Booth encoding (MBE) is popularly used in many parallel multipliers to reduce the number of partial products by a factor of 2 through multiplier recoding. Fixed-width Booth multipliers achieve high speed and high accuracy as a result of the reduced number of partial products owing to Booth encoding. Since fewer partial products are truncated, the accuracy obtained is also high in comparison with that of array multipliers. This section discusses the various techniques adopted to obtain the compensation bias in the design of fixed-width Booth multipliers. Two L-bit inputs, X (multiplicand) and Y (multiplier), are represented in two's complement as given by

$$X = -x_{L-1}2^{L-1} + \sum_{i=0}^{L-2} x_i 2^i$$

$$Y = -y_{L-1}2^{L-1} + \sum_{i=0}^{L-2} y_i 2^i$$
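As a worked check of the two's complement representation above, the following minimal sketch evaluates the weighted sum directly. The helper name and the LSB-first bit ordering are assumptions made only for this illustration.

```python
def twos_complement_value(bits):
    """Evaluate an L-bit two's complement word.

    bits[i] holds x_i, least significant bit first, so bits[-1] is the
    sign bit x_{L-1}, which carries the negative weight -2^(L-1).
    """
    L = len(bits)
    magnitude = sum(bits[i] << i for i in range(L - 1))
    return -bits[L - 1] * (1 << (L - 1)) + magnitude


print(twos_complement_value([1, 1, 1, 1]))  # 4-bit word 1111 -> -1
print(twos_complement_value([0, 1, 1, 0]))  # 4-bit word 0110 -> 6
```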

The Booth multiplier has been widely used for high performance signed multiplication. In Booth multipliers, the multiplier operand is encoded, which reduces the number of partial products. Since the number of partial products is reduced, area and power dissipation are also reduced [6, 7].

A multiplier employing the radix-4 encoding algorithm, also known as the modified Booth multiplier, is very efficient due to the ease of partial product generation, whereas the radix-8 Booth multiplier is slow due to the complexity of generating the odd multiples of the multiplicand. In order to address this issue, approximate radix-8 Booth multipliers have been proposed for low power and high performance. An approximate 2-bit adder is designed which incurs small area, reduced power and a short critical path delay. This 2-bit adder is employed for adding the least significant section when generating the triple multiplicand, which avoids carry propagation. A hybrid radix-4/radix-8 architecture targeted at high bit-width multipliers has also been proposed. It is a method to trade off speed and power dissipation in two's complement signed multiplication. The hybrid architecture uses a combination of modified Booth radix-4 and radix-8 encoding.

The hybrid radix-4/radix-8 architecture mitigates the delay penalty associated with the generation of odd multiples of the multiplicand (3 times the multiplicand) for radix-8 by using the additional parallelism of radix-4/radix-8 encoding, and hence combines the speed advantage of the radix-4 multiplier with the reduced power dissipation of the radix-8 multiplier. An earlier hybrid architecture uses separate radix-4 and radix-8 Booth encoders which are operated regardless of the input pattern, resulting in power and area overhead. Though power consumption was reduced in comparison with the radix-4 architecture, it incurred a longer critical path delay. Due to this factor, its energy efficiency was not improved over the conventional radix-4 or radix-8 architecture. This issue has been resolved by operating the hybrid architecture in radix-8 mode for 56% of the input cases for low power, and reverting to radix-4 mode for the 44% of slower input cases for high speed. A mode detection circuit has been devised which derives the mode signal from the input operand, determines the radix mode before multiplication and adaptively operates the Booth encoder and the Wallace tree.

## III. COMPLEX MULTIPLIER

Suppose A and B are two complex numbers; their product P is then given by

$$A = A_r + jA_i$$

$$B = B_r + jB_i$$

$$P = A \times B$$

$$P = A_r \times B_r - A_i \times B_i + j(A_r \times B_i + A_i \times B_r)$$

$$P_r = A_r \times B_r - A_i \times B_i$$

$$P_i = A_r \times B_i + A_i \times B_r$$

where $P_r$ and $P_i$ represent the real and imaginary parts of the output of the complex multiplier, $A_r$ and $A_i$ represent the real and imaginary parts of the first input of the complex multiplier, and $B_r$ and $B_i$ represent the real and imaginary parts of the second input.

The complex multiplier built from four Vedic multipliers is shown in Figure 1. The four Vedic multipliers can be reduced to three, as shown in Figure 2, by rewriting the products as follows:

$$P_r = A_r \times B_r - A_i \times B_i = A_r (B_r + B_i) - B_i (A_r + A_i)$$

$$P_i = A_r \times B_i + A_i \times B_r = A_r (B_r + B_i) + B_r (A_i - A_r)$$
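The reduction from four real multiplications to three can be checked numerically. The sketch below assumes the identity in which the shared term $A_r (B_r + B_i)$ is computed once, with $B_i (A_r + A_i)$ subtracted for the real part and $B_r (A_i - A_r)$ added for the imaginary part; plain Python integers stand in for the hardware multipliers, and the function names are illustrative only.

```python
def complex_mul_4(ar: int, ai: int, br: int, bi: int):
    """Direct form: four real multiplications."""
    pr = ar * br - ai * bi
    pi = ar * bi + ai * br
    return pr, pi


def complex_mul_3(ar: int, ai: int, br: int, bi: int):
    """Shared-term form: three real multiplications and extra additions."""
    shared = ar * (br + bi)        # multiplication 1, used by both outputs
    pr = shared - bi * (ar + ai)   # multiplication 2
    pi = shared + br * (ai - ar)   # multiplication 3
    return pr, pi


# (3 + 4j) * (5 + 6j) = -9 + 38j in both forms
print(complex_mul_4(3, 4, 5, 6), complex_mul_3(3, 4, 5, 6))
```

The three-multiplier form trades one multiplier for additional adders, so its benefit depends on the relative cost of the two blocks in the target technology.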

JETIR2305A18 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org




Figure 1: Block Diagram of Complex Multiplier for four Vedic Multiplier



Figure 2: Block Diagram of Complex Multiplier for three Vedic Multiplier

## IV. PROPOSED METHODOLOGY

To further decrease the number of partial products, algorithms with a higher radix are used. In the radix-4 algorithm, the multiplier bits are grouped such that each group consists of 3 bits, as listed in Table 1. Each group overlaps the previous one: the MSB of one group becomes the LSB of the next group, which then adds two new bits. The number of groups formed depends on the number of multiplier bits. By applying this algorithm, the number of partial product rows to be accumulated is reduced from n in the radix-2 algorithm to n/2 in the radix-4 algorithm. The grouping of multiplier bits for an 8-bit multiplication is shown in Figure 3.



Figure 3: Grouping of multiplier bits in Radix-4 Booth algorithm

For an 8-bit multiplier, the number of groups formed using the radix-4 Booth algorithm is four. Compared to the radix-2 Booth algorithm, the number of partial products obtained in the radix-4 Booth algorithm is half, because for an 8-bit multiplier the radix-2 algorithm produces eight partial products. The truth table and the respective operations are given in Table 1. Similarly, when the radix-8 Booth algorithm is applied to an 8-bit multiplier, each group consists of four bits and the number of groups formed is 3. For 8x8 multiplication, radix-4 uses four stages to compute the final product and the radix-8 Booth algorithm uses three stages. In this work, the radix-4 Booth algorithm is used for 8x8 multiplication because the radix-4 encoding style uses fewer components.

| B<sub>i+1</sub> | B<sub>i</sub> | B<sub>i-1</sub> | Operation | Y<sub>i+1</sub> | Y<sub>i</sub> | Y<sub>i-1</sub> |
|------|----|-----------------|-----------|------|----|-----|
| 0    | 0  | 0               | +0        | 0    | 0  | 0   |
| 0    | 0  | 1               | +A        | 0    | 1  | 0   |
| 0    | 1  | 0               | +A        | 0    | 1  | 0   |
| 0    | 1  | 1               | +2A       | 0    | 0  | 1   |
| 1    | 0  | 0               | -2A       | 1    | 0  | 1   |
| 1    | 0  | 1               | -A        | 1    | 1  | 0   |
| 1    | 1  | 0               | -A        | 1    | 1  | 0   |
| 1    | 1  | 1               | -0        | 1    | 0  | 0   |

Table 1: Truth Table for Radix-4 Booth algorithm
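The radix-4 grouping and the operations of Table 1 can be sketched in a few lines of Python. This is a behavioral model only, assuming an even bit width n, a two's complement multiplier operand and an implicit bit of 0 to the right of the LSB; each 3-bit group maps to the digit $-2B_{i+1} + B_i + B_{i-1}$, which takes values in {-2, -1, 0, +1, +2}. The function names are hypothetical.

```python
def booth_radix4_digits(y: int, n: int):
    """Recode an n-bit two's complement multiplier into n/2 radix-4 Booth digits.

    Digits are returned least significant first; each one selects
    0, +A, +2A, -A or -2A as in the truth table.
    """
    y_bits = y & ((1 << n) - 1)   # two's complement bit pattern of y
    digits = []
    prev = 0                      # implicit bit to the right of the LSB
    for i in range(0, n, 2):
        b_i = (y_bits >> i) & 1
        b_i1 = (y_bits >> (i + 1)) & 1
        digits.append(-2 * b_i1 + b_i + prev)
        prev = b_i1               # MSB of this group is the LSB of the next
    return digits


def booth_multiply(a: int, y: int, n: int) -> int:
    """Accumulate the n/2 partial products, each weighted by a power of 4."""
    return sum(d * a * (4 ** k) for k, d in enumerate(booth_radix4_digits(y, n)))


print(booth_radix4_digits(-3, 8))   # [1, -1, 0, 0], i.e. 1 - 4 = -3
print(booth_multiply(25, -3, 8))    # -75
```

For n = 8 the recoding yields four digits, matching the four partial product rows stated for the 8-bit case.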

#### CSLA WITH BEC

The Binary to Excess-1 Converter (BEC) replaces the ripple carry adder (RCA) with Cin=1, in order to reduce the area and power consumption of the regular CSLA. The modified 16-bit CSLA using BEC is shown in Figure 2. The structure is again divided into five groups with different bit sizes of RCA and BEC. Group 2 of the modified 16-bit CSLA is shown in Figure 4.
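A behavioral sketch of the BEC-based carry select principle is given below. It is an assumption-laden model rather than the circuit of this paper: the group sizes (2, 2, 3, 4, 5) are one possible five-group split of 16 bits and are not stated in the text, and each ripple carry adder is modeled with integer addition. In every group after the first, the RCA computes the result for Cin=0, the BEC derives the Cin=1 result by adding one, and the incoming carry selects between them.

```python
def bec(value: int, width: int) -> int:
    """Binary to excess-1 conversion: value + 1, kept to `width` bits."""
    return (value + 1) & ((1 << width) - 1)


def csla_bec_add(a: int, b: int, groups=(2, 2, 3, 4, 5)):
    """Add two unsigned sum(groups)-bit numbers; returns (sum, carry_out)."""
    result, carry, shift = 0, 0, 0
    for index, k in enumerate(groups):
        mask = (1 << k) - 1
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        rca0 = ga + gb                    # RCA with Cin = 0: k sum bits plus carry
        if index == 0:
            total = rca0                  # first group has no incoming carry
        else:
            rca1 = bec(rca0, k + 1)       # BEC replaces the RCA with Cin = 1
            total = rca1 if carry else rca0   # multiplexer driven by the carry
        result |= (total & mask) << shift
        carry = total >> k
        shift += k
    return result, carry


print(csla_bec_add(0xFFFF, 0x0001))  # (0, 1): the carry ripples through all groups
print(csla_bec_add(0x1234, 0x0FF0))  # (8740, 0), i.e. 0x2224
```

The BEC operates on k+1 bits because the Cin=0 result of a k-bit group already includes its carry-out bit.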



#### • Binary to Excess-1 Converter (BEC)

Booth encoding is a technique to reduce the number of partial products in an n-bit multiplier. The binary to excess-1 converter is used to reduce the area and power consumption of the CSLA. Figure 5 shows the basic structure of a 3-bit BEC. The function table and the Boolean expressions of the 3-bit BEC are as follows:



Figure 5: 3-bit Binary to Excess-1 Converter (BEC)

| Binary[2:0] | Excess-1[2:0] |
|-------------|----------------|
| 000         | 001            |
| 001         | 010            |
| 010         | 011            |
| 011         | 100            |
| 100         | 101            |
| 101         | 110            |
| 110         | 111            |
| 111         | 000            |

$$X_0 = \overline{B_0}$$

$$X_1 = B_0 \oplus B_1$$

$$X_2 = B_2 \oplus (B_0 \cdot B_1)$$
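The gate-level behavior of the 3-bit BEC can be checked against its function table with a short sketch. It assumes the usual gate equations for this converter, namely X0 = NOT B0, X1 = B0 XOR B1 and X2 = B2 XOR (B0 AND B1); the function name is illustrative.

```python
def bec3(b: int) -> int:
    """3-bit binary to excess-1 converter built from NOT, XOR and AND gates."""
    b0, b1, b2 = b & 1, (b >> 1) & 1, (b >> 2) & 1
    x0 = b0 ^ 1            # inverter on the LSB
    x1 = b0 ^ b1           # toggles when the LSB is 1
    x2 = b2 ^ (b0 & b1)    # toggles when both lower bits are 1
    return (x2 << 2) | (x1 << 1) | x0


# Print the full function table: 000 -> 001, ..., 111 -> 000
for value in range(8):
    print(f"{value:03b} -> {bec3(value):03b}")
```

Every input maps to its successor modulo 8, which is the property the CSLA relies on to obtain the Cin=1 result without a second ripple carry adder.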

## V. SIMULATION RESULTS

This work carries out an extensive literature survey on fixed-width multipliers to explore the various existing techniques and the design options available. The performance of fixed-width multipliers is analyzed in terms of error metrics, which include maximum absolute error, average error and mean-square error. The performance metrics also include the conventional parameters such as area, power and speed. An intensive mathematical analysis of fixed-width multipliers is carried out to derive the compensation bias systematically from theoretical computation instead of time-consuming exhaustive simulation.



Figure 6: Technology Schematic View of Radix-4 Complex 64-bit Multiplier using CSLA with BEC



Figure 7: Register Transfer Level of Radix-4 Complex 64-bit Multiplier using CSLA with BEC

```
Device utilization summary:
 Selected Device : 7vx1140tflg1930-2
 Slice Logic Utilization:
  Number of Slice LUTs:                 20354  out of  712000     2%
     Number used as Logic:              20354  out of  712000     2%
 Slice Logic Distribution:
  Number of LUT Flip Flop pairs used:   20354
    Number with an unused Flip Flop:    20354  out of   20354   100%
    Number with an unused LUT:              0  out of   20354     0%
    Number of fully used LUT-FF pairs:      0  out of   20354     0%
    Number of unique control sets:          0
 IO Utilization:
  Number of IOs:                          512
  Number of bonded IOBs:                  512  out of    1100    46%
 Specific Feature Utilization:
  Number of DSP48E1s:                     768  out of    3360    22%
 Timing Summary:
 ---------------
 Speed Grade: -2
    Minimum period: No path found
    Minimum input arrival time before clock: No path found
    Maximum output required time after clock: 34.275ns
    Maximum combinational path delay: 38.461ns
```

Figure 8: Device Utilization Summary of Radix-4 Complex 64-bit Multiplier using CSLA with BEC

## VI. CONCLUSION

The Booth multiplication algorithm, or Booth algorithm, is named after its inventor Andrew Donald Booth. It is a method of multiplying binary numbers in two's complement notation, in which multiplication is performed with repeated addition operations. The algorithm was later modified, giving the modified Booth algorithm. The main objective of this research paper is to design an architecture for a radix-4 complex multiplier based on the radix-4 Booth multiplier, rectifying the problems in the existing method and improving the speed by using the carry select adder (CSLA) with binary excess-1 converter (BEC). The radix-4 Booth multiplier is normally used for higher bit length applications, while an ordinary multiplier is good for lower order bits. These two methods are combined to produce a high speed multiplier for higher bit length applications. The problem of the existing architecture is reduced by removing bits from the remainders. The proposed design is implemented in Xilinx software with the Virtex-7 device family.

## REFERENCES

- [1] Nikhil Advaith Gudala, Trond Ytterdal, John J. Lee and Maher Rizkalla, "Implementation of High Speed and Low Power Carry Select Adder with BEC", International Midwest Symposium on Circuits and Systems (MWSCAS), IEEE 2021.
- [2] S. Venkatachalam, E. Adams, H. J. Lee and S.-B. Ko, "Design and analysis of area and power efficient approximate booth multipliers", IEEE Trans. Comput., vol. 68, no. 11, pp. 1697-1703, Nov. 2019.
- [3] D. Kalaiyarasi and M. Saraswathi, "Design of an Efficient High Speed Radix-4 Booth Multiplier for both Signed and Unsigned Numbers", 4th International Conference on Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), IEEE 2018.
- [4] Elisardo Antelo, Paolo Montuschi, Alberto Nannarelli, "Improved 64-bit Radix-16 Booth Multiplier Based on Partial Product Array Height Reduction", IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 64, no. 2, February 2017.
- [5] T. Adiono, H. Herdian, S. Harimurti, "Full-Custom Design Implementation of Serial Radix-4 8-bit Booth Multiplier", The 2nd International Conference on Electrical Engineering and Computer Science (ICEECS 2016), 2016.
- [6] H. Jiang, J. Han, F. Qiao, F. Lombardi, "Approximate Radix-8 Booth Multipliers for Low-Power and High-Performance Operation", IEEE Transactions on Computers, vol. 65, no. 8, pp. 2638-2644, Aug. 2016.
- [7] Honglan Jiang, Jie Han, Fei Qiao, Fabrizio Lombardi, "Approximate Radix-8 Booth Multipliers for Low-Power and High-Performance Operation", IEEE Transactions on Computers, vol. 65, no. 8, pp. 2638-2644, Aug. 2016.
- [8] A. Rama Vasantha M. Tech, M. Sai Satya Sri, "Design and Implementation of FPGA Radix-4 Booth Multiplication Algorithm", International Journal of Research in Computer and Communication Technology, vol. 3, no. 9, September 2014.
- [9] Ravi H Bailmare, S. J. Honale and Pravin V Kinge, "Design and Implementation of Adaptive FIR Filter using Systolic Architecture", International Journal of Current Engineering and Technology, vol. 4, no. 3, June 2014.
- [10] M. Usha, R. Ramadoss, "An Efficient Adaptive FIR Filter Based on Distributed Arithmetic", International Journal of Engineering Science Invention, vol. 3, issue 4, pp. 15-20, April 2014.
- [11] Sukhmeet Kaur, Suman, Manpreet Singh Manna, "Implementation of Modified Booth Algorithm (Radix 4) and its Comparison with Booth Algorithm (Radix-2)", Advance in Electronic and Electric Engineering, vol. 3, no. 6, pp. 683-690, 2013, ISSN 2231-1297.
- [12] Sang Yoon Park and Pramod Kumar Meher, "Low power, High-throughput and Low-Area Adaptive FIR Filter Based on Distributed Arithmetic", IEEE Transactions on Circuits and Systems-II, vol. 60, no. 6, pp. 346-350, 2013.
- [13] Basant K. Mohanty and Pramod Kumar Meher, "A High-Performance Energy-efficient Architecture for FIR Adaptive Filter Based on New Distributed Arithmetic Formulation of Block LMS Algorithm", IEEE Transactions on Signal Processing, vol. 61, no. 4, February 2013.
- [14] Pallavi Saxena, Urvashi Purohit, Priyanka Joshi, "Analysis of Low Power, Area Efficient and High Speed Fast Adder", International Journal of Advanced Research in Computer and Communication Engineering, vol. 2, issue 9, September 2013.
- [15] A. Fathi, S. Azizian, R. Fathi, H.G. Tamar, "Low latency glitch-free booth encoder-decoder for high speed multipliers", IEICE Electronics Express, vol. 9, no. 16, pp. 1335-1341, 2012.

