# Implementation of Power Area Efficient 32-Bit Approximate Multiplier with Improved Accuracy

Sonu Pawar<sup>1</sup>, Prof. Manish Gupta<sup>2</sup>

<sup>1</sup>M.Tech Scholar, <sup>2</sup> Assistant Professor

<sup>1,2</sup>Department of Electronics & Communication Engineering,

<sup>1,2</sup> SCOPE College of Engineering, Bhopal (M.P.), India.

**Abstract**: Approximate computing has received significant attention as a promising strategy to enhance the performance of multiplication. Arithmetic operations such as multiplication, addition and subtraction are an important part of digital circuits and determine the computation speed of a processor. This paper presents a 32-bit approximate multiplier with high speed and low delay for advanced digital signal processing. Previously, such a multiplier was designed at 16 bits for various applications. The research work focuses on hardware-level approximation by introducing the partial product perforation technique and a Dadda multiplier for designing approximate multiplication circuits. Xilinx ISE 14.7 is used for implementation, with Verilog as the programming language.

*Index Terms* – Approximate, Dadda, Multiplier, Verilog, Speed.

#### I. INTRODUCTION

Different computer arithmetic techniques can be used to implement a digital multiplier. Most of these techniques involve computing a set of partial products and then summing the partial products together. Arithmetic operations such as addition, subtraction, multiplication and division are a significant part of digital circuits and determine the computation speed of the processor.

In applications such as multimedia signal processing and data mining, which can tolerate error, exact computing units are not always necessary. They can be replaced with their approximate counterparts. Research on approximate computing for error-tolerant applications is on the rise. Adders and multipliers form the key components in these applications. Approximate full adders have been proposed at transistor level and used in digital signal processing applications.

To reduce the hardware complexity of multipliers, truncation is widely used in fixed-width multiplier designs. A constant or variable correction term is then added to compensate for the quantization error introduced by the truncated part. Approximation techniques in multipliers focus on the accumulation of partial products, which is crucial in terms of power consumption. In the broken array multiplier, the least significant bits of the inputs are truncated while forming the partial products to reduce hardware complexity.

An efficient multiplier should have the following characteristics:

**Accuracy:** A good multiplier should give the correct result.

**Speed:** The multiplier should perform its operation at high speed.

**Area:** The multiplier should occupy a small number of slices and LUTs.

**Power:** The multiplier should consume less power.



Figure 1: Types of Multiplier

Approximate computing has emerged as a potential solution for the design of energy-efficient digital systems [1]. Applications such as multimedia, recognition and data mining are inherently error-tolerant and do not require a perfect accuracy in computation. For these applications, approximate circuits may play an important role as a promising alternative for reducing area, power and delay in digital systems that can tolerate some loss of accuracy, thereby achieving better performance in energy efficiency. As one of the key components in arithmetic circuits, adders have been extensively studied for approximate implementation.

#### II. BACKGROUND

There are many multipliers designed by researchers. The basic multiplication algorithm is discussed here.

The multiplication algorithm for an N bit multiplicand by N bit multiplier is shown below:

$$Y = Y_{n-1} Y_{n-2} \ldots Y_2 Y_1 Y_0 \quad \text{(multiplicand)}$$

$$X = X_{n-1} X_{n-2} \ldots X_2 X_1 X_0 \quad \text{(multiplier)}$$

AND gates are used to generate the partial products (PP). If the multiplicand is N bits and the multiplier is M bits, then there are N\*M partial products. The way the partial products are generated or summed is what distinguishes the architectures of the various multipliers.

Multiplication of binary numbers can be decomposed into additions. Consider the multiplication of two 8-bit numbers A and B to produce the 16-bit product P.

The equation for the product is:

$$P(m+n) = A(m)B(n) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}$$
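The double sum can be checked numerically. The following is a minimal Python sketch, not the paper's Verilog; the function name `partial_product_sum` is hypothetical and unsigned operands are assumed. Each inner term models one AND gate output `a_i & b_j` weighted by `2^(i+j)`.

```python
def partial_product_sum(a: int, b: int, m: int, n: int) -> int:
    """Sum a_i * b_j * 2^(i+j) over all m*n partial-product bits.

    a is treated as an unsigned m-bit value and b as an unsigned n-bit value.
    """
    total = 0
    for i in range(m):
        a_i = (a >> i) & 1            # bit i of operand A
        for j in range(n):
            b_j = (b >> j) & 1        # bit j of operand B
            # One AND gate per bit pair, weighted by its column position.
            total += (a_i & b_j) << (i + j)
    return total


if __name__ == "__main__":
    # Two 8-bit operands produce a product that fits in 16 bits.
    print(partial_product_sum(0xD3, 0xCE, 8, 8) == 0xD3 * 0xCE)
```

For 8-bit operands the loops visit 8 × 8 = 64 bit pairs, which corresponds to the N\*M partial product bits mentioned earlier.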

- If the LSB of the multiplier is '1', add the multiplicand to an accumulator.
- Shift the multiplier one bit to the right and the multiplicand one bit to the left.
- Stop when all bits of the multiplier are zero.
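The three steps above can be sketched in a few lines of Python. This is an illustrative software model under the assumption of non-negative integer operands, not the hardware described in the paper; the name `shift_and_add` is hypothetical.

```python
def shift_and_add(multiplicand: int, multiplier: int) -> int:
    """Multiply two non-negative integers with the shift-and-add steps."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles non-negative operands only")
    accumulator = 0
    while multiplier != 0:            # stop when all multiplier bits are zero
        if multiplier & 1:            # LSB of the multiplier is 1
            accumulator += multiplicand
        multiplier >>= 1              # shift the multiplier one bit right
        multiplicand <<= 1            # shift the multiplicand one bit left
    return accumulator


if __name__ == "__main__":
    print(hex(shift_and_add(0xD3, 0xCE)))
```

The loop runs once per significant bit of the multiplier, which is why a purely sequential multiplier of this kind is slow compared with the parallel schemes discussed later.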

Hardware implementation of digital signal processing (DSP) algorithms and multimedia applications in technologies such as field programmable gate arrays (FPGAs) and digital signal processors requires a large number of multipliers. Often, the overall performance of the design is limited by constraints on the speed, energy consumption, and area requirements of the available multiplier architectures. This is especially true for applications centered on modern handheld multimedia devices, where physical size, chip area and power are tightly constrained. Therefore, research has focused on the development of efficient, advanced multiplier techniques to support these demanding applications.

#### III. PROPOSED METHODOLOGY

The proposed 32-bit approximate multiplier is designed according to the following flow chart.



Figure 2: Flow Chart

Figure 2 shows the proposed flow chart. According to this flow, the proposed approximate multiplier is designed and implemented using the following sub-modules:

- Carry Save Adder
- Dadda Multiplier
- Compressor
- Full Adder
- Half Adder

#### A. Dadda Multiplier

In a popular multiplication scheme, the array multiplier, the summation proceeds in a more regular but slower manner to obtain the sum of the partial products. With this scheme, only one row of bits in the matrix is eliminated at each stage of the summation. In a parallel multiplier the partial products are generated using an array of AND gates. The main problem is the summation of the partial products, and it is the time taken to perform this summation that determines the maximum speed at which a multiplier can operate. The Dadda scheme essentially minimizes the number of adder stages required to perform the summation of partial products. This is achieved by using full and half adders to reduce the number of rows in the matrix of bits at each summation stage. Dadda multipliers are a refinement of the parallel multipliers presented by Wallace. A Dadda multiplier consists of three stages. The partial product matrix is formed in the first stage by N<sup>2</sup> AND gates. In the second stage, the partial product matrix is reduced to a height of two. In the third stage, the two remaining rows are added to produce the final product. Dadda replaced the Wallace pseudo adders with parallel (n, m) counters. A parallel (n, m) counter is a circuit which has n inputs and produces m outputs that give a binary count of the ONEs present at the inputs. A full adder is an implementation of a (3, 2) counter, which takes 3 inputs and produces 2 outputs. Likewise, a half adder is an implementation of a (2, 2) counter, which takes 2 inputs and produces 2 outputs.
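To make the counters and the reduction stages concrete, here is a small Python sketch. It models the (2, 2) and (3, 2) counters as Boolean functions and lists the target matrix heights per reduction stage. The height rule used (start at 2 and repeatedly take the floor of 1.5 times the previous height) is the standard Dadda sequence; that the proposed design follows exactly this sequence is an assumption, and all function names are hypothetical.

```python
from itertools import product


def half_adder(a: int, b: int) -> tuple[int, int]:
    """(2, 2) counter: returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """(3, 2) counter: returns (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def dadda_stage_heights(n_bits: int) -> list[int]:
    """Target heights of the partial product matrix, first stage to last.

    Assumes the standard Dadda sequence 2, 3, 4, 6, 9, 13, ... and keeps
    every height that is strictly below the operand width.
    """
    heights = [2]
    while heights[-1] * 3 // 2 < n_bits:
        heights.append(heights[-1] * 3 // 2)
    return heights[::-1]


if __name__ == "__main__":
    # Each counter output encodes the number of ONEs at its inputs.
    for a, b, c in product((0, 1), repeat=3):
        s, carry = full_adder(a, b, c)
        assert s + 2 * carry == a + b + c
    print(dadda_stage_heights(32))  # [28, 19, 13, 9, 6, 4, 3, 2]
```

Under this assumption a 32-bit multiplier needs eight reduction stages before the matrix reaches a height of two.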



Figure 3: Dot diagram of proposed 32-bit dadda multiplier

Figure 3 shows the dot diagram of the proposed 32-bit Dadda multiplier [1]. The multiplier involves the generation of partial products, a partial product reduction tree and, finally, a vector-merge addition that produces the final product from the sum and carry rows generated by the reduction tree. The second step consumes the most power. In this work, approximation is applied in the reduction tree stage.
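The text does not spell out the approximation itself. As one illustration only, the partial product perforation technique named in the abstract can be modeled in software: `k` successive partial product rows, starting at row `j`, are simply not generated, which is equivalent to zeroing bits `j` to `j+k-1` of the multiplier operand. The function name and the parameters `j` and `k` are assumptions for this sketch and are not values reported by the paper.

```python
def perforated_multiply(a: int, b: int, j: int, k: int, n_bits: int = 32) -> int:
    """Approximate a*b by omitting k partial product rows starting at row j.

    Row i of the partial product matrix is a * b_i * 2^i, so skipping rows
    j .. j+k-1 is the same as clearing those bits of b before multiplying.
    Operands are treated as unsigned n_bits-wide values.
    """
    word_mask = (1 << n_bits) - 1
    skipped = ((1 << k) - 1) << j          # bits of b whose rows are omitted
    kept_b = (b & word_mask) & ~skipped
    return (a & word_mask) * kept_b


def relative_error(a: int, b: int, j: int, k: int) -> float:
    """Relative error of the perforated product against the exact product."""
    exact = a * b
    if exact == 0:
        return 0.0
    return abs(exact - perforated_multiply(a, b, j, k)) / exact


if __name__ == "__main__":
    a, b = 0xD3, 0xCE
    print(perforated_multiply(a, b, j=0, k=2), a * b)
    print(relative_error(a, b, j=0, k=2))
```

Omitting low-order rows keeps the error small relative to the product while removing the adders that those rows would feed in the reduction tree, which is the trade-off this kind of approximation aims for.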



Figure 4: Top level View

Figure 4 shows the top-level view of the proposed 32-bit approximate multiplier. Input 'a' is 32 bits wide and input 'b' is 32 bits wide. The output 'c' of this multiplier is 64 bits wide. In a digital multiplier, the output width is the sum of the input widths.
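As a short worked check of the 64-bit output width (an added illustration that assumes unsigned operands), the largest product of two 32-bit inputs is

$$(2^{32}-1)^2 = 2^{64} - 2^{33} + 1 < 2^{64},$$

so 64 bits are always sufficient. Since this value is also larger than $2^{63}$, the most significant output bit can be set, and none of the 64 bits is redundant.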

#### IV. RESULT AND DISCUSSION

The proposed 32-bit approximate multiplier is implemented in Xilinx ISE 14.7 using the Verilog language. The ISim simulator is used for simulation and for validating the results with a test bench. The behavioral modeling style is used to develop the proposed algorithm. The Artix-7 family is used as the target device for implementation.

**Binary Number Input-1**

Input (a) = 100000000

Input (b) = 011101010



Figure 5: Result validation in Test Bench-1

The simulation is carried out using the ISim simulator. Figure 5 shows the test bench window. Here 'a' and 'b' are 32-bit inputs and 'c' is the 64-bit output. The values of 'a' and 'b' are given above. The output 'c' is the product of 'a' and 'b'.

Output (c) = 011101010000000000


**Hexadecimal Number Input-2**



Figure 6: Result validation in Test Bench-2

Figure 6 shows the test bench window. Here 'a' and 'b' are 32-bit hexadecimal inputs and 'c' is the 64-bit hexadecimal output. The value of 'a' is d3 and the value of 'b' is ce. The output 'c' is the product of 'a' and 'b', so the value of 'c' is a9ca.
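The hexadecimal test vector can be cross-checked against an exact software product. The helper below is a hypothetical reference model in Python, not part of the paper's test bench.

```python
def exact_product_hex(a_hex: str, b_hex: str) -> str:
    """Return the exact product of two hexadecimal operands as lowercase hex."""
    return format(int(a_hex, 16) * int(b_hex, 16), "x")


if __name__ == "__main__":
    # d3 (211) times ce (206) is 43466, which is a9ca in hexadecimal.
    print(exact_product_hex("d3", "ce"))  # a9ca
```

For this input pair the reported output equals the exact product, so this particular vector shows no approximation error.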

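| Sr No. | Parameters                | Proposed Work                 |
|--------|---------------------------|-------------------------------|
| 1      | Type of Multiplier        | 32-bit Approximate Multiplier |
| 2      | Area                      | 15%                           |
| 3      | Delay                     | 7.311 ns                      |
| 4      | Accuracy rate             | 97%                           |
| 5      | Simulation Time           | 19.00 s                       |
| 6      | Power                     | 0.082 mW                      |
| 7      | PDP (Power delay product) | 599                           |
| 8      | Frequency                 | 136.7 MHz                     |
| 9      | Throughput                | 8.7 GHz                       |
| 10     | Memory                    | 4625720 kilobytes             |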

Table 1: Simulation Parameter
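Two of the entries in Table 1 follow from the reported delay and power. The Python sketch below shows the arithmetic. The table gives PDP without a unit; the value 599 matches power times delay when expressed in femtojoules, so femtojoules are assumed here, and the frequency is assumed to be the reciprocal of the delay. Function names are hypothetical.

```python
def power_delay_product_fj(power_mw: float, delay_ns: float) -> float:
    """Power delay product in femtojoules (mW * ns = pJ, and 1 pJ = 1000 fJ)."""
    return power_mw * delay_ns * 1000.0


def max_frequency_mhz(delay_ns: float) -> float:
    """Maximum clock frequency in MHz, taken as the reciprocal of the delay."""
    return 1000.0 / delay_ns


if __name__ == "__main__":
    # Reported values for the proposed multiplier: 0.082 mW and 7.311 ns.
    print(round(power_delay_product_fj(0.082, 7.311), 1))
    print(round(max_frequency_mhz(7.311), 2))
```

With the reported inputs, the PDP evaluates to about 599.5 fJ and the frequency to about 136.78 MHz, which agrees with the table entries up to rounding.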

Table 2: Comparison with Previous and proposed work


| Sr No. | Parameters                | Previous work             | Proposed work          |
|--------|---------------------------|---------------------------|------------------------|
| 1      | Type of Multiplier        | 16-bit Multiplier         | 32-bit Multiplier      |
| 2      | Area                      | 3319.20 µm<sup>2</sup>    | 1500 µm<sup>2</sup>    |
| 3      | Delay                     | 6 ns                      | 7.311 ns               |
| 4      | Power                     | 0.112 mW                  | 0.082 mW               |
| 5      | PDP (Power delay product) | 727                       | 599                    |

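Therefore, the proposed 32-bit approximate multiplier gives better results in terms of area, power and PDP, while its delay is 7.311 ns against 6 ns for the previous 16-bit design. It is therefore suited to designs that need low area and low power at comparable speed.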



Figure 7: Comparison of area and PDP

Figure 7 shows the comparison of the area used and the power delay product values. The graph also makes clear that the proposed results are better than the existing results.



Figure 8: Comparison of delay and power

Figure 8 shows the comparison of delay and power values. It is clear from the graph that the proposed approach saves power; the delay shown is for the 32-bit approximate multiplier.

The proposed multiplier is designed for 32-bit approximate multiplication, while in previous work it was designed only for 16 bits. It can therefore be said that the proposed 32-bit multiplier is an advance over the previous multiplier in terms of the calculated parameters.

#### V. CONCLUSION

The multiplier is an important part of arithmetic processors, and current mobile and DSP applications need ICs that operate at high speed with low power consumption. This paper presents an efficient 32-bit approximate multiplier. It is implemented and its results are validated successfully using Xilinx software. Parameters such as power, area, latency, throughput, frequency and power delay product are calculated to identify improved architectures for high-speed applications. The proposed multiplier designs can be used in applications with negligible loss in output quality while saving significant power and area. Binary or hexadecimal 32-bit inputs are applied, and the 32-bit multiplier produces the resulting 64-bit product. The proposed multiplier saves 85% area and consumes 0.082 mW of power.

#### REFERENCES

- [1]. H. Saadat, H. Bokhari and S. Parameswaran, "Minimally Biased Multipliers for Approximate Integer and Floating-Point Multiplication," in *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 37, no. 11, pp. 2623-2635, Nov. 2018.
- [2]. V. Mrazek, Z. Vasicek, L. Sekanina, H. Jiang and J. Han, "Scalable Construction of Approximate Multipliers With Formally Guaranteed Worst Case Error," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 26, no. 11, pp. 2572-2576, Nov. 2018.
- [3]. S. Venkatachalam and S. Ko, "Design of Power and Area Efficient Approximate Multipliers," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 5, pp. 1782-1786, May 2017.

- [4]. S. Mazahir, O. Hasan, R. Hafiz and M. Shafique, "Probabilistic Error Analysis of Approximate Recursive Multipliers," in *IEEE Transactions on Computers*, vol. 66, no. 11, pp. 1982-1990, 1 Nov. 2017.
- [5]. A. Bonetti, A. Teman, P. Flatresse and A. Burg, "Multipliers-Driven Perturbation of Coefficients for Low-Power Operation in Reconfigurable FIR Filters," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 9, pp. 2388-2400, Sept. 2017.
- [6]. R. Zendegani, M. Kamal, M. Bahadori, A. Afzali-Kusha and M. Pedram, "RoBA Multiplier: A Rounding-Based Approximate Multiplier for High-Speed yet Energy-Efficient Digital Signal Processing," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 25, no. 2, pp. 393-401, Feb. 2017.
- [7]. A. Mokhtari, W. Shi, Q. Ling and A. Ribeiro, "DQM: Decentralized Quadratically Approximated Alternating Direction Method of Multipliers," in *IEEE Transactions on Signal Processing*, vol. 64, no. 19, pp. 5158-5173, 1 Oct. 2016.
- [8]. G. Zervakis, K. Tsoumanis, S. Xydis, D. Soudris and K. Pekmestzi, "Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 24, no. 10, pp. 3105-3117, Oct. 2016.
- [9]. S. Saito, K. Oishi and T. Furukawa, "Convolutive Blind Source Separation Using an Iterative Least-Squares Algorithm for Non-Orthogonal Approximate Joint Diagonalization," in *IEEE/ACM Transactions on Audio, Speech, and Language Processing*, vol. 23, no. 12, pp. 2434-2448, Dec. 2015.
- [10]. B. Shao and P. Li, "Array-Based Approximate Arithmetic Computing: A General Model and Applications to Multiplier and Squarer Design," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 62, no. 4, pp. 1081-1090, April 2015.
- [11]. Z.-Q. Lü and X. An, "Non-conforming finite element tearing and interconnecting method with one Lagrange multiplier for solving large-scale electromagnetic problems," in *IET Microwaves, Antennas & Propagation*, vol. 8, no. 10, pp. 730-735, 15 July 2014.
- [12]. U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera and A. Edirisuriya, "Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 61, no. 6, pp. 1727-1740, June 2014.
- [13]. D. De Caro, N. Petra, A. G. M. Strollo, F. Tessitore and E. Napoli, "Fixed-Width Multipliers and Multipliers-Accumulators With Min-Max Approximation Error," in *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 60, no. 9, pp. 2375-2388, Sept. 2013.
- [14]. S. S. Roy, C. Rebeiro and D. Mukhopadhyay, "Theoretical Modeling of Elliptic Curve Scalar Multiplier on LUT-Based FPGAs for Area and Speed," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 21, no. 5, pp. 901-909, May 2013.
- [15]. J. Wang, S. Kuang and S. Liang, "High-Accuracy Fixed-Width Modified Booth Multipliers for Lossy Applications," in *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 19, no. 1, pp. 52-60, Jan. 2011.