# IMPLEMENTATION AND PERFORMANCE COMPARISON OF CMOS AND GNRFET BASED ADVANCED MULTIPLIER DESIGNS

Ganesh Kumar M. T<sup>1</sup>, Sandeep Malik<sup>2</sup>, Madan H. R<sup>3</sup>, Dr. Ravish Aradhya H.V<sup>4</sup>

<sup>1</sup>Research Scholar, <sup>2</sup>Assistant Professor, <sup>3</sup>Project Manager, <sup>4</sup>Professor

<sup>1</sup>Department of Electronics and Communication Engineering, Raffles University, Neermana, Alwar (Raj)-301 705, India

# Abstract:

MOSFET scaling theory is primarily responsible for ICs with smaller area, optimized power and higher speed. Scaling sustained Moore's law until the channel size reached the nanometer (nm) regime. Scaling transistors below 45 nm, however, increases power dissipation because of leakage current. This has led to a search for alternatives to traditional CMOS technology. Graphene nanoribbon FETs (GNRFETs) are identified as a potential alternative to CMOS technology, and GNRFET based designs are found to be the most viable in terms of power consumption and package area. This work presents the design and analysis of CMOS and GNRFET based modified Booth and modified Wallace tree multipliers.

The performance of a CPU depends mainly on the performance of the multipliers in its processing unit. As transistors scale, advanced devices such as FinFETs, CNTFETs and GNRFETs promise better solutions. In this work, 8-bit modified Booth and modified Wallace tree multipliers are implemented in CMOS and GNRFET technologies. The results are analyzed and compared on the key performance parameters: power dissipation, propagation delay and power-delay product. The results infer that the GNRFET based modified Wallace tree multiplier offers better performance than the other multiplier designs.

Keywords - Booth Algorithm, Wallace-tree multiplier, Figure of Merit, GNRFETs, Optimum multiplier.

# I. INTRODUCTION

After chip area, power dissipation is one of the most important design objectives for integrated circuits in embedded applications<sup>[1]</sup>. Power dissipation is directly proportional to the number of switching operations in an electronic system, and switching activity is generally high in a signal processing unit. The main building block of a signal processing circuit is the multiplier unit. A high-speed, low-power multiplier is desirable for any signal processor, since speed and throughput are always concerns of a signal processing system. With the rapid growth of portable electronic systems such as laptops, calculators and mobile phones, low-power circuits have become very important <sup>[2, 3]</sup>. Designing low-power, high-throughput circuitry is a challenge for the VLSI designer. For real-time signal processing, a high-speed, high-throughput multiplier unit is key to achieving a high-performance signal processing system<sup>[4]</sup>. Multipliers dissipate significant power in the computational unit of an electronic system.

# 1.1 Basics of Multipliers

Multiplication, at its simplest, is an abbreviated process of adding a number to itself a specified number of times. A number (the multiplicand) is added to itself as many times as specified by another number (the multiplier) to form the result (the product). In hardware, multiplication involves two basic operations: the generation of the partial products and their accumulation. There are therefore two main ways to speed up multiplication: reduce the number of partial products, or accelerate their accumulation. A smaller number of partial products also reduces complexity and, as a result, the time needed to accumulate them <sup>[5]</sup>. Both approaches can be applied simultaneously.
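The two operations described above can be illustrated with a minimal Python sketch. It assumes unsigned operands and an 8-bit multiplier; the function names `partial_products` and `shift_add_multiply` are hypothetical and are not taken from the designs in this paper.

```python
def partial_products(multiplicand, multiplier, width=8):
    """Return one shifted partial product per multiplier bit (unsigned operands)."""
    rows = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # A set bit contributes the multiplicand shifted left by i; a clear bit contributes 0.
        rows.append((multiplicand << i) if bit else 0)
    return rows


def shift_add_multiply(multiplicand, multiplier, width=8):
    """Accumulate the partial products to form the product."""
    product = 0
    for row in partial_products(multiplicand, multiplier, width):
        product += row
    return product


if __name__ == "__main__":
    print(partial_products(13, 11))
    print(shift_add_multiply(13, 11))
```

An 8-bit multiplier always yields eight rows in this scheme, which is why reducing the row count or accumulating the rows faster are the two available optimizations.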

Multiplication hardware often consumes more time and area than other arithmetic operations such as addition and subtraction. A signal processing unit uses the multiplier as a basic building block, and the algorithms it runs are often multiplication-intensive <sup>[6, 7]</sup>. Multiplication-based operations appear in most signal processing applications, such as convolution, the Fast Fourier Transform (FFT) and filtering, and in the Central Processing Unit (CPU) of microprocessors and microcontrollers.

# II. GNR FIELD EFFECT TRANSISTOR (GNRFET)

The channel material in a graphene transistor can be bilayer graphene, a micron-wide graphene sheet or a graphene nanoribbon (GNR), which gives a wide variety of graphene transistors. Graphene transistors that use a GNR as the channel material are termed GNR Field Effect Transistors (GNRFETs). GNRFETs have two variants: Schottky barrier (SB) type and Metal Oxide Semiconductor (MOS) type <sup>[10]</sup>. The former uses metal contacts and a graphene channel, while in the latter the reservoirs are doped with acceptor or donor impurities. SB-type GNRFETs have a lower $I_{on}/I_{off}$ ratio than MOS-type GNRFETs, which makes the MOS type superior in device performance. Depending on the doping material, there are two kinds of MOS-type GNRFETs: N-type and P-type. N-type and P-type GNRFETs are obtained by doping with donor and acceptor impurities respectively <sup>[8]</sup>.

#### www.jetir.org (ISSN-2349-5162)



Fig 2.1 - (a) Top view of GNRFET (b) side view of GNRFET<sup>[16]</sup>

The structure of a MOS-type GNRFET is shown in Fig 2.1. Within a single GNRFET, multiple graphene ribbons with armchair chirality are connected in parallel to increase the drive strength. The GNR sections under the gate are un-doped, while those between the wide contacts and the gate are heavily doped with a doping fraction $f_{dop}$. The un-doped (intrinsic) part is called the channel, and the doped parts are called reservoirs. The channel length is $L_{ch}$, the reservoir length is $L_{res}$, the ribbon width is $W_{ch}$, the gate width is $W_{gate}$ and the ribbon spacing is $2W_{sp}$ <sup>[9]</sup>.

To reduce the number of metal-graphene contacts, multiple layers of metal are used on top of the single graphene layer. Gates are located on the first metal layer, while drains, channels and sources are on the graphene layer. Logic gates are connected to each other on the metal layer, and connections within each logic gate are made on the graphene layer without vias. Both zigzag and armchair GNRs are good conductors above 20 nm width and can therefore be used as local interconnect for routing on the graphene layer. Since the input to a logic gate is on the metal layer and the output (drain/source) is on the graphene layer, metal-graphene vias cannot be avoided entirely <sup>[10]</sup>.

## III. MODIFIED BOOTH MULTIPLIER

The modified Booth recoding algorithm is the most frequently used method for generating partial products. It reduces the number of partial products that must be compressed in the carry-save adder tree, which increases speed. This Booth-MacSorley algorithm is often simply called the Booth algorithm. Its two-bit recoding scans a triplet of bits at a time and reduces the number of partial products by roughly 50%. Two-bit recoding means that the multiplier is divided into groups of two bits, and the algorithm is applied to each group. The Booth algorithm is implemented in two steps: Booth encoding and Booth selecting. The Booth encoding step generates one of five values (-2, -1, 0, +1, +2) from three adjacent bits. The Booth selector then generates a partial product bit from the encoder's output signals <sup>[11]</sup>.

One advantage of the Booth multiplier is that it reduces the number of partial products, which is why it is used extensively in multipliers with long operands <sup>[12]</sup>. Its main disadvantage is the complexity of the circuit that generates a partial product bit in the Booth encoding.
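The encoding step can be sketched in Python as follows. This is a behavioural illustration of standard radix-4 (two-bit) Booth recoding under the assumption of two's-complement operands; it is not the gate-level encoder or selector used in this work, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier, width=8):
    """Recode a two's-complement multiplier into width/2 digits in {-2, -1, 0, 1, 2}."""
    assert width % 2 == 0, "radix-4 recoding assumes an even bit width"
    bits = multiplier & ((1 << width) - 1)  # two's-complement bit pattern
    padded = bits << 1                      # append the implicit bit y[-1] = 0
    digits = []
    for i in range(0, width, 2):
        triplet = (padded >> i) & 0b111     # bits y[i+1], y[i], y[i-1]
        y_prev = triplet & 1
        y_low = (triplet >> 1) & 1
        y_high = (triplet >> 2) & 1
        digits.append(-2 * y_high + y_low + y_prev)
    return digits


def booth_multiply(multiplicand, multiplier, width=8):
    """Sum the recoded partial products; digit k carries a weight of 4**k."""
    digits = booth_radix4_digits(multiplier, width)
    return sum(d * multiplicand * (4 ** k) for k, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(11))
    print(booth_multiply(13, 11))
```

For an 8-bit multiplier the recoding produces four digits instead of eight rows, which corresponds to the roughly 50% reduction mentioned above. In hardware, the multiples -2, -1, +1 and +2 of the multiplicand are obtained by shifting and complementing rather than by multiplication.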

# 3.1 Modified Booth Algorithm using pipeline technique



Fig. 3.1: Modified Booth Algorithm using pipeline technique

Pipelining is an effective technique for binary multiplication with the modified Booth multiplier <sup>[13]</sup>. The block diagram of the pipelined modified Booth multiplier is shown in Fig. 3.1. The design increases computing speed by combining parallel processing and pipelining. Data flows synchronously across the array between neighbors, usually with different data flowing in different directions.

# IV. MODIFIED WALLACE TREE MULTIPLIER

The main objective of the modified Wallace tree multiplier design is to decrease the propagation delay and minimize the chip area. A Kogge-Stone adder (KSA) is used instead of a carry look-ahead adder (CLA) in the modified Wallace tree multiplier. The KSA is used in most high-performance adders because it offers very low propagation delay. Using a KSA instead of a CLA is intended to enhance the speed and minimize the area of the multiplier design. Parallel prefix adders are used to speed up binary addition and are more flexible. The structure of the Wallace tree multiplier is shown in Fig. 4.1.



Fig. 4.1: Modified Wallace-Tree Multiplier
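As a rough behavioural illustration of Wallace-style reduction, the Python sketch below compresses the partial product rows three at a time with carry-save (3:2) addition until two rows remain, and then adds those two rows. It assumes unsigned operands, applies the compressor to whole rows rather than to individual bit columns, and uses ordinary integer addition as a stand-in for the final KSA stage; all names are hypothetical and none of this is the transistor-level design evaluated in this work.

```python
def carry_save_add(a, b, c):
    """3:2 compressor applied to whole rows: three rows in, a sum row and a carry row out."""
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (a & c) | (b & c)) << 1
    return sum_row, carry_row


def wallace_reduce(rows):
    """Compress rows three at a time until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3   # number of rows that form complete triples
        next_rows = []
        for i in range(0, full, 3):
            next_rows.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])      # leftover rows pass to the next level unchanged
        rows = next_rows
    return rows


def wallace_multiply(multiplicand, multiplier, width=8):
    """Unsigned multiply: generate rows, reduce them, then perform one final addition."""
    rows = [(multiplicand << i) if (multiplier >> i) & 1 else 0 for i in range(width)]
    return sum(wallace_reduce(rows))       # final two-operand add (a KSA in the modified design)


if __name__ == "__main__":
    print(wallace_multiply(13, 11))
```

Each carry-save level preserves the total of the rows while avoiding carry propagation, so only the final two-operand addition has a carry chain. That final addition is the step the KSA is meant to accelerate.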

The three main stages of any parallel prefix adder are:

1. Pre-processing stage
2. Carry generation network
3. Post-processing stage

In the pre-processing stage, the propagate and generate signals are computed for each bit. Equations 4.1 and 4.2 describe this stage:

$$P_i = A_i \oplus B_i \qquad \text{(Eq. 4.1)}$$

$$G_i = A_i \bullet B_i \qquad \text{(Eq. 4.2)}$$

In the carry generation network, the carry corresponding to each bit is computed. The carry computation is split into smaller pieces that are evaluated concurrently. Equations 4.3 and 4.4 describe this stage:

$$CP_{i:j} = P_{i:k+1} \text{ and } P_{k:j} \qquad \text{(Eq. 4.3)}$$

$$CG_{i:j} = G_{i:k+1} \text{ or } (P_{i:k+1} \text{ and } G_{k:j}) \qquad \text{(Eq. 4.4)}$$

In the post-processing stage, the final sum bits are computed, as described by equations 4.5 and 4.6:

$$C_{i-1} = (P_i \bullet C_{in}) + G_i \qquad \text{(Eq. 4.5)}$$

$$S_i = P_i \oplus C_{i-1} \qquad \text{(Eq. 4.6)}$$

The parallel prefix adder consists of black cells and gray cells. The gray cell is used to generate the left signal and the black cell (BC) is used to generate the ordered pair <sup>[14]</sup>. The Kogge-Stone adder (KSA) is used in high-performance computational systems. It is one of the high-performance adders and is a derived form of the carry look-ahead adder.

## 4.1 Kogge-Stone Adder (KSA)

The KSA is a parallel prefix form of the carry look-ahead adder. It generates the carries in $O(\log n)$ time, is widely considered the fastest adder and is widely used in industry for high-performance arithmetic circuits <sup>[15]</sup>. In a KSA, the carries are computed quickly by evaluating them in parallel, at the cost of increased area. The structure of the KSA is shown in Fig. 4.2.



Fig. 4.2: Structure of a Kogge-Stone Adder
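The three stages of the parallel prefix adder can be sketched behaviourally in Python. The sketch below follows the usual Kogge-Stone formulation, assumes unsigned operands and a carry-in of zero, and processes all bit positions at once using word-wide bitwise operations; the function name `kogge_stone_add` and its parameters are hypothetical, and the carry indexing follows the common convention (the carry into bit i) rather than any particular cell-level layout.

```python
def kogge_stone_add(a, b, width=8):
    """Behavioural Kogge-Stone addition; returns (sum, carry_out). Assumes carry-in = 0."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask

    # Pre-processing: per-bit propagate and generate signals.
    p = a ^ b
    g = a & b

    # Carry generation network: combine (G, P) pairs over spans of 1, 2, 4, ... bits.
    group_g, group_p = g, p
    dist = 1
    while dist < width:
        # Both right-hand sides are evaluated with the old group_g and group_p.
        group_g, group_p = (
            group_g | (group_p & (group_g << dist)),
            group_p & (group_p << dist),
        )
        dist <<= 1

    # After the network, bit i of group_g is the carry out of bit i.
    carries_in = (group_g << 1) & mask     # carry into each bit position
    carry_out = (group_g >> (width - 1)) & 1

    # Post-processing: each sum bit is the propagate bit XOR the incoming carry.
    total = p ^ carries_in
    return total, carry_out


if __name__ == "__main__":
    print(kogge_stone_add(200, 100))
```

The loop runs once per doubling of the span, so an 8-bit addition needs three levels. This is the logarithmic depth referred to in the text, obtained by computing every group signal at every level, which reflects the area cost.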

# V. RESULTS AND DISCUSSIONS

CMOS and GNRFET based 8-bit multipliers using modified Booth algorithm and modified Wallace tree algorithm are designed and implemented in the proposed work. The important performance parameters namely the total power dissipation, critical propagation delay and Power Delay Product are observed and recorded for both the designs.

#### 5.1 Modified Booth Multiplier

8-bit modified Booth multipliers are designed and simulated using CMOS and GNRFETs. The power dissipation of the designed modified Booth multipliers is tabulated in Table 5.1, and the graphical representation of the results is shown in Fig. 5.1.





The critical propagation delay of the modified Booth multipliers is analyzed and tabulated in Table 5.2. The graphical representation of the results is shown in Fig. 5.2.



Table 5.2: Propagation Delay (s)

| Architecture              | Technology | Delay (s) |
|---------------------------|------------|-----------|
| Modified Booth multiplier | CMOS       | 3.725E-10 |
|                           | GNRFET     | 6.91E-09  |

Fig. 5.2: Propagation delay (s)


The key performance parameter, the Power Delay Product (PDP), is analyzed and the results are tabulated in Table 5.3. The graphical representation of the results is shown in Fig. 5.3.



Table 5.3: Power Delay Product (Ws)

| Architecture              | Technology | PDP (Ws) |
|---------------------------|------------|----------|
| Modified Booth multiplier | CMOS       | 7.13E-14 |
|                           | GNRFET     | 5.09E-15 |

Fig. 5.3: Power Delay Product (Ws)
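The relationship between the tabulated quantities can be checked with a short Python sketch. It assumes the usual definition PDP = power × delay and uses only the values from Tables 5.2 and 5.3. The power figures it prints are implied by that assumption; they are not the values reported in Table 5.1. The dictionary and function names are hypothetical.

```python
# Values copied from Table 5.2 (delay, s) and Table 5.3 (PDP, Ws) for the modified Booth multiplier.
DELAY_S = {"CMOS": 3.725e-10, "GNRFET": 6.91e-09}
PDP_WS = {"CMOS": 7.13e-14, "GNRFET": 5.09e-15}


def implied_power_w(pdp_ws, delay_s):
    """Power implied by PDP = power * delay (an assumed definition, not a reported value)."""
    return pdp_ws / delay_s


implied = {tech: implied_power_w(PDP_WS[tech], DELAY_S[tech]) for tech in PDP_WS}
pdp_ratio = PDP_WS["CMOS"] / PDP_WS["GNRFET"]

if __name__ == "__main__":
    for tech, power in implied.items():
        print(f"{tech}: implied power = {power:.3e} W")
    print(f"CMOS PDP / GNRFET PDP = {pdp_ratio:.1f}")
```

Under this assumption, the GNRFET design has the longer delay but a much lower implied power, which is how it reaches the lower PDP in Table 5.3.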

#### 5.2 Modified Wallace tree multiplier

The power dissipation of the designed modified Wallace tree multipliers is tabulated in Table 5.4, and the graphical representation of the same is shown in Fig. 5.4.



Fig. 5.4: Power Dissipation (W)

The propagation delay of the modified Wallace tree multipliers is calculated and tabulated in Table 5.5. The graphical representation of the results is shown in Fig. 5.5.



Table 5.5: Propagation delay (s)

| Architecture                     | Technology | Delay (s)  |
|----------------------------------|------------|------------|
| Modified Wallace Tree Multiplier | CMOS       | 4.3586E-11 |
|                                  | GNRFET     | 1.2002E-08 |

Fig. 5.5: Propagation delay (s)

Another key performance parameter, the Power Delay Product (PDP) of the CMOS and GNRFET based Wallace tree multipliers, is analyzed and tabulated in Table 5.6. The graphical representation of the results is shown in Fig. 5.6.



Table 5.6: Power Delay Product (Ws)

| Architecture                     | Technology | PDP (Ws)  |
|----------------------------------|------------|-----------|
| Modified Wallace Tree Multiplier | CMOS       | 7.6E-15   |
|                                  | GNRFET     | 6.251E-15 |

Fig. 5.6: Power Delay Product (Ws)

# VI. CONCLUSION AND FUTURE ENHANCEMENTS

GNRFET based multipliers dissipate less power than CMOS based multipliers because of graphene characteristics such as low gate capacitance, tunneling current and low $V_{dd}$. The CMOS based designs serve as a reference against which the performance parameters of the multipliers are compared. CMOS based multipliers employ the traditional switching operation with a constant power supply and hence dissipate more power than GNRFET based multipliers.

The obtained results clearly imply that GNRFET based multipliers dissipate less power than the conventional CMOS multiplier designs. From the obtained results, it is clear that the modified Booth multiplier dissipates the least power when compared to the modified Wallace tree multiplier design, whereas the modified Wallace tree multiplier offers the least propagation delay as compared to the modified Booth multiplier design.

The GNRFET based modified Wallace tree multiplier design offers the least PDP as compared to the modified Booth multiplier design.

# REFERENCES

[1] Philip Teichmann, 'Fundamentals of Adiabatic Logic', *Adiabatic Logic- Future trend and System Level Perspective*, Springer Series in Advanced Microelectronics 34, pp. 5-23.

[2] H. V. Ravish Aradhya, H R Madan, Megaraj T Mahadikar, R Muniraj, M S Suraj, Mohammed Moiz, "Design and performance comparison of adiabatic 8-bit multipliers", 2016 IEEE Distributed Computing, VLSI, Electrical Circuits and Robotics (DISCOVER-2016), 13-14 August 2016, Bengaluru, pp. 141 – 147.

[3] H. V. Ravish Aradhya, Megaraj T Mahadikar, R Muniraj, M S Suraj, Mohammed Moiz, H R Madan, "Design, analysis and performance comparison of GNRFET based adiabatic 8-bit ALU", IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT - 2016), 20-21 May 2016, Bengaluru, pp. 1584 – 1588.

[4] Lipiansky, E. "A Simple CPU Design", Wiley-IEEE press, ISBN: 9781118414552, 2013.

[5] T. Arunachalam and S. Kirubaveni, "Analysis of High Speed Multipliers", International conference on Communication and Signal Processing, April 3-5, 2013, India.

[6] Moore, Gordon. "Progress in Digital Integrated Electronics", IEEE, IEDM Tech Digest, pp.11-13, 1975.

[7] V. Tiwari, D. Singh, S. Rajgopal, G. Mehta, R. Patel, F. Baez, "Reducing power in high performance microprocessors", Proceedings of the 35th Annual Design Automation Conference, San Francisco, CA, pp. 732–737, 1998.

[8] Y.-W. Son, M. L. Cohen, and S. G. Louie, "Energy gaps in graphene nanoribbons", Physical Review Letters, 2006.

[9] Ying-Yu Chen, Amit Sangai, Morteza Gholipour and Deming Chen, "Graphene Nano-Ribbon Field-Effect Transistors as Future Low-Power Devices", IEEE-2013, Symposium on Low Power Electronics and Design, 978-1-4799-1235-3/13/2013 IEEE.

[10] Hsin-Lei Lin, "Design of a Novel Radix – 4 Booth Multiplier", the 2004 IEEE Asia – Pacific Conference on Circuit and Systems, December 2005.

[11] W. C. Yeh and C.-W. Jen, "High-speed Booth encoded parallel multiplier design", IEEE Trans. Comput., vol. 49, no. 7, pp. 692–701, Jul. 2000.

[12] Hsin-Lei Lin, Robert C. Chang, Ming-Tsai Chan, "Design of a Novel Radix-4 Booth Multiplier", IEEE Asia-Pacific Conference on Circuits and Systems, Vol.2, pp. 837-840, 2004.

[13] Jin-Hao Tu and Lan-Da Van, "Power-Efficient Pipelined Reconfigurable Fixed-Width Baugh-Wooley Multipliers", *IEEE Transactions on Computers*, vol. 58, no. 10, October 2009.

[14] Meenali Janveja, Vandana Niranjan, "High performance Wallace tree multiplier using improved adder", ICTACT Journal on Microelectronics, Volume 03, Issue 01, pp. 370-374, April 2017.

[15] Vishnupriya.A and Sudarmani.R, "Efficient Serial Multiplier Design using Ripple Counters, Kogge-Stone Adder and Full Adder", *International Journal of Computer Applications* (0975 – 8887), Volume 67– No.6, April 2013.