# A Comprehensive Survey on Packet and Circuit Switched NoC Routers

<sup>1</sup>Liyaqat Nazir, <sup>2</sup> Mohd Asifuddola

<sup>1</sup>Lecturer, <sup>2</sup>Assistant Professor, <sup>1,2</sup>Electronics and Communication Engineering, <sup>1</sup>University of Kashmir, Srinagar, India, <sup>2</sup>MANU Hyderabad, India

*Abstract*: This study presents a comprehensive survey that offers insight into network-on-chip (NoC) design. The article provides a detailed comparison of the performance parameters adopted in the design of NoC router microarchitectures. It focuses on two broad classes of router, namely packet-switched and circuit-switched routers, and reports their respective parameters to help a designer make a better choice when carrying out topology-level design.

## Index Terms – Network-on-chip, Virtual channels.

# I. INTRODUCTION

Traditionally, design space exploration for network-on-chip based systems has focused on the computational aspects of the problem at hand. However, as the number of components on a single chip and their performance continue to increase, the design of the communication architecture plays a major role in defining the area, performance, and energy consumption of the overall system. Furthermore, with technology scaling, global interconnects cause severe on-chip synchronization errors, unpredictable delays, and high power consumption [1] [2]. To mitigate these effects, the network-on-chip (NoC) approach emerged as a promising alternative to classical bus-based and point-to-point (P2P) communication architectures [3]. The NoC paradigm provides a scalable solution to the tough integration challenge of modern SoCs by applying well-established networking principles at the silicon chip level, after suitably adapting them to silicon chip characteristics and to application demands [4]. The general block diagram of a NoC-based system is illustrated in Figure 3.1.

Although applying the seminal ideas of networking technology to the chip-level interconnect problem has been shown to be adequate for current systems, the complexity of future computing platforms demands new architectures that go beyond physical-level requirements and also deliver high performance, quality of service, and dynamic adaptivity at minimum energy and area overhead [5]. NoC communication architectures have evolved significantly over the last decade in key areas, starting from topology and routing algorithms [6], [7], [8], [9] and extending to router and network interface microarchitecture. Topologies and routing algorithms are usually defined in a generic manner, and the same is true of router microarchitecture. The general router design involves assembling hardware blocks from a component library of varying granularity and complexity. Although such an approach has so far provided efficient architectures, its efficiency is limited by the efficiency of the independent blocks, the designer's ability to deliver efficient compositions, and the depth of the design space exploration.

There is a strong need for a unified, customizable model that covers all microarchitectural alternatives, such as control and data path pipelines [10] [8], speculation [11], buffering architecture [12], [13], [14], and allocation policies. Deriving such a model would allow rapid, safe architectural changes in the search for globally optimal architectures, bridging the gap between architecture exploration, microarchitecture fine-tuning, and physical implementation.

# II. NETWORK-ON-CHIP ROUTER

The heart of an on-chip network is the router, which undertakes the crucial task of steering and coordinating the data flow. The architecture employed by conventional NoC routers [1] is illustrated in Figure 1(a). Router operation revolves around two fundamental regimes: (a) the datapath, and (b) the associated control logic. The datapath consists of a number of input and output channels that facilitate packet switching and traversal. In general, the router has Q input and Q output channels (or ports). In most implementations, Q = 5: four inputs from the four cardinal directions (North, East, South, and West) and one from the local Processing Element (PE) attached to the NoC router. To minimize router complexity and traffic congestion, NoC routers are usually assumed to connect to a single PE. The input/output channels may consist of unidirectional, bidirectional, or even serial links. Buffering within a network router is necessary because of congestion, output link contention, and intra-router processing.



## III. NOC ROUTER BLOCKS

This section describes the blocks of a network-on-chip router. The router microarchitecture is broadly composed of two types of circuit blocks: control path blocks and data path blocks. The control path consists of blocks such as the routing computation and allocation units, whereas the data path comprises the buffering and storage units, channel pipeline blocks, crossbar unit, and virtual channel units. Figure 2 shows the architecture of a generic virtual channel router. The router has five input ports and five output ports and supports two virtual channels (VCs) per port. It comprises the routing computation (RC) unit, the virtual channel allocation (VA) unit, the switch allocation (SA) unit, and the crossbar (XB), which connects the input and output ports of the router.



The detailed discussion of each block is presented as follows:

- Input Buffering Unit: In a general NoC router, buffers are organized as multiple fixed-length queues (as shown in Figure 2). Each of these queues is termed a virtual channel. All virtual channels share a physical channel. Each new incoming flit is stored in the VC buffer designated by its VC identifier.
- Routing Path Computation Logic: The routing path computation logic computes the output port from the destination information in the head flit, using a routing algorithm. In a generic NoC router, the routing computation logic is used to compute the output port of the downstream router. Each input port has its own routing computation unit. The fundamental logic blocks required to implement a routing logic unit are comparators.
- Virtual Channel Allocation Unit: The virtual channel allocation unit allocates an unused VC to a new packet. It operates only on the head flit. Figure 2 shows the block diagram of the virtual channel allocation unit in a generic 2-stage router. In the first stage of this separable virtual channel allocator, every input VC with a head flit arbitrates for an empty VC in the downstream router. In the second stage, head flits that have been allocated to the same VC at the downstream router compete with each other.
- Switch Allocation Unit: The switch allocation unit determines which input VC of an input port can transmit a flit through the crossbar in the next cycle. Figure 5 shows the microarchitecture of the switch allocation unit in a generic 2-stage router. Two switch allocators are implemented: one handles speculative requests (from packets that are also requesting a VC) and the other handles non-speculative requests. Non-speculative requests have higher priority than speculative requests. The red rectangular box shows the diagram of a separable switch allocator. The first stage selects an input VC from each input port. The second stage arbitrates for an output port.
- Crossbar: The crossbar connects the input and output ports. Figure 6 shows the block diagram of the crossbar in a generic NoC router, where Q is the number of ports. Each output port has an associated multiplexer. The control signal comes from the result of SA.
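
The blocks listed above can be mimicked with a small behavioral model. The following Python sketch is illustrative only and makes several assumptions that the survey does not state: dimension-order (XY) routing stands in for the routing algorithm, the arbiters are round-robin, the separable switch allocator is input-first, and speculation and VC allocation are left out. All names (`route_xy`, `RoundRobinArbiter`, `SeparableSwitchAllocator`, `crossbar`) are hypothetical.

```python
# Port order is an assumption: four cardinal directions plus the local PE.
PORTS = ["N", "E", "S", "W", "L"]


def route_xy(cur, dst):
    """Routing computation built only from coordinate comparisons.

    XY routing and the convention that y grows towards North are
    assumptions made for this sketch.
    """
    (cx, cy), (dx, dy) = cur, dst
    if dx > cx:
        return "E"
    if dx < cx:
        return "W"
    if dy > cy:
        return "N"
    if dy < cy:
        return "S"
    return "L"  # packet has reached its destination router


class RoundRobinArbiter:
    """Grants one of n requesters, rotating priority after each grant."""

    def __init__(self, n):
        self.n = n
        self.last = n - 1  # requester 0 has highest priority initially

    def grant(self, requests):
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None  # nobody requested


class SeparableSwitchAllocator:
    """Stage 1 picks one VC per input port; stage 2 picks one input per output."""

    def __init__(self, num_ports, num_vcs):
        self.num_ports = num_ports
        self.num_vcs = num_vcs
        self.input_arbs = [RoundRobinArbiter(num_vcs) for _ in range(num_ports)]
        self.output_arbs = [RoundRobinArbiter(num_ports) for _ in range(num_ports)]

    def allocate(self, requests):
        """requests[p][v] is the output port index wanted by VC v of input p, or None.

        Returns {output_port: (input_port, vc)} for the granted requests.
        """
        winners = {}  # input port -> (vc, requested output port)
        for p in range(self.num_ports):
            wants = [requests[p][v] is not None for v in range(self.num_vcs)]
            v = self.input_arbs[p].grant(wants)
            if v is not None:
                winners[p] = (v, requests[p][v])
        grants = {}
        for out in range(self.num_ports):
            wants = [p in winners and winners[p][1] == out
                     for p in range(self.num_ports)]
            p = self.output_arbs[out].grant(wants)
            if p is not None:
                grants[out] = (p, winners[p][0])
        return grants


def crossbar(grants, flits):
    """One multiplexer per output port; the SA grants act as select signals."""
    return {out: flits[p][v] for out, (p, v) in grants.items()}
```

In this sketch, a stage-1 winner that loses in stage 2 still advances its input arbiter pointer, which is a simplification of how hardware allocators usually update priority. When two input ports keep requesting the same output, the round-robin output arbiter alternates the grant between them on successive calls to `allocate`.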

## IV. SWITCHING CLASSIFICATION IN NOC ROUTERS

Messages are data that have to be sent from a sender to a receiver through a network. Messages can be transformed into packets by encapsulating all or part of each message with network control information. Alternatively, messages can be sent after a connection has been established between the sender and the receiver. These are the two basic modes of message transmission in networks: packet switching and circuit switching, respectively.

Wormhole packet switching is the switching mode most commonly employed in NoCs [15]. Packet-switched networks often allow for high aggregate system bandwidth, as each packet can be distributed across a subset of network nodes at any given instant [16]. However, such networks generally require congestion control and packet processing, including buffers to queue packets that are awaiting the availability of routing resources. Correct buffer sizing is fundamental to optimizing NoC performance: small buffers increase network congestion, while large buffers increase area and latency overhead. This switching mode supports best-effort (BE) services well [17] and is efficient for traffic with short and frequent packets. HERMES [18], Xpipes [19], MANGO [20], and SoCIN [21] are examples of NoCs employing wormhole packet switching.

The other switching mode employed in NoCs is circuit switching. It can provide guaranteed throughput and latency bounds for individual packets, since an exclusive path is allocated to data transfers between the sender and the receiver. The buffering requirement is typically a single register instead of a FIFO buffer, because once the circuit is established the NoC acts like a pipeline in which each router is a stage. The disadvantages of circuit switching are channel bandwidth underutilization when data is transmitted at lower rates and the setup latency needed to establish a circuit, which depends on the traffic in the path during circuit establishment. This switching mode is more efficient for traffic with long packets transmitted at high rates and with requirements for throughput and latency guarantees. Representative circuit-switching NoCs are Æthereal [22], SoCBUS [23], and Octagon [24]. Æthereal employs circuit switching only for traffic with QoS requirements, while BE traffic uses wormhole packet switching. Table I summarizes the main advantages and disadvantages of circuit switching and wormhole packet switching.

Table I. Advantages and disadvantages of NoC switching modes.

| Switching mode     | Advantages | Disadvantages |
|--------------------|------------|---------------|
| Circuit switching  | - Guaranteed throughput and latency<br>- Single register instead of FIFO buffers | - Static path reservation and possibly wasted bandwidth |
| Wormhole switching | - Shared NoC resources, enabling multiple flows to be distributed simultaneously across routers | - Under heavy traffic, flits may block a large number of routers<br>- Wasted bandwidth when the traffic initiator rate is slower than the channel rate |
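
The trade-off between the two switching modes can be made more concrete with a first-order, contention-free latency estimate. The expressions below are a textbook-style sketch and are not taken from the surveyed papers. The symbols are assumptions introduced here: $H$ is the hop count, $t_r$ the per-hop delay of a packet-switched router, $t_c$ the per-hop delay of an established circuit (with $t_c \le t_r$), $L$ the packet length in bits, $b$ the channel bandwidth, $T_{\text{setup}}$ the circuit setup latency, and $N$ the number of packets sent over one circuit.

$$
T_{\text{wormhole}} \approx H\,t_r + \frac{L}{b},
\qquad
T_{\text{circuit}} \approx \frac{T_{\text{setup}}}{N} + H\,t_c + \frac{L}{b}
$$

Under these assumptions, circuit switching gives the lower per-packet latency when $T_{\text{setup}}/N < H\,(t_r - t_c)$, that is, when the setup cost is amortized over enough transfers on the same circuit. The estimate ignores contention, which in practice affects both the wormhole latency and the setup latency.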

### © 2019 JETIR June 2019, Volume 6, Issue 6


## V. RESULTS AND DISCUSSION

There are many ways to measure and present the performance of a NoC-based system; however, the standard metrics are throughput, latency, and area. While these names sound generic, their exact definitions depend strongly on the measurement setup.

Power analysis determines the amount of on-chip power dissipated during operation. For high-throughput NoC systems, it is more appropriate to quantify power efficiency through energy analysis. To compare and contrast different NoC architectures, a standard set of performance metrics can be used. Another parameter of interest is throughput, the rate at which packets are delivered by the network for a particular traffic pattern. It is measured by counting the packets that arrive at their destinations over a time interval for each flow (source-destination pair) in the traffic pattern and computing from these flow rates the fraction of the traffic pattern delivered. Throughput, or accepted traffic, is to be contrasted with demand, or offered traffic, which is the rate at which packets are generated by the packet source. Area, on the other hand, may be defined as the amount of logic resources needed to implement a NoC architecture. Table 2 compares the parameters of interest reported by various authors for packet-switched router microarchitectures, and Table 3 does the same for circuit-switched NoC router microarchitectures.
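
The throughput measurement described above can be sketched in a few lines of Python. This is a minimal illustration, not a method taken from the surveyed papers: the function name `throughput_fraction` and its arguments are hypothetical, and taking the minimum over flows as "the fraction of the traffic pattern delivered" is an assumed convention.

```python
def throughput_fraction(delivered, offered, interval):
    """Fraction of the offered traffic pattern that the network delivers.

    delivered: {(src, dst): packets received during the interval}
    offered:   {(src, dst): offered rate in packets per cycle}
    interval:  measurement window in cycles

    Assumption: the pattern counts as delivered only to the extent that
    its worst-served flow is, so the minimum over flows is returned.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    fractions = []
    for flow, demand in offered.items():
        if demand <= 0:
            continue  # a flow with no demand cannot limit throughput
        accepted = delivered.get(flow, 0) / interval  # accepted traffic rate
        # Cap at 1.0: packets draining from buffers can briefly exceed demand.
        fractions.append(min(accepted / demand, 1.0))
    return min(fractions) if fractions else 0.0
```

Reporting the minimum rather than the average means that a single starved flow lowers the reported throughput, which is one reason the exact measurement setup has to be stated alongside any throughput figure.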

| Authors                  | Channel type | Topological connections | Routing algorithm    | Throughput      | Timing delay (ns) | Power (mW)  |
|--------------------------|--------------|-------------------------|----------------------|-----------------|-------------------|-------------|
| David A. et al. [25]     | PS           | 4x4                     | custom               | NA              | 1.33/cycle        | 1.08        |
| Mehdi et al. [26]        | PS           | 6x6                     | encoding scheme algo | 32 bit/0.3 ns   | 0.3               | 254 for DSB |
| Rohit et al. [27]        | PS           | 8x8                     | WXY routing          | 1 Gbps          | 1                 | NA          |
| Mohammad et al. [28]     | PS           | 4x3                     | RACO-CAR routing     | 1.879 Gbs/node  | 2                 | 51          |
| En-Jui Chang et al. [29] | PS           | 16x16                   | RoShQ5               | 0.362 f/cycle   | 1                 | 58.26       |
| Anh T. Tran et al. [30]  | PS           | 2D mesh                 | PAA routing          | 0.02 Gps        | 0.015             | 108.2       |
| Jiang et al. [31]        |              |                         |                      |                 |                   |             |

Table 2: Comparison of performance parameters in various packet switched NoC routers

\*PS = Packet switching, CS = Circuit switching, NA = Not available.

| Authors                    | Channel type | Target device | Resources used   | Throughput   | Frequency (MHz) |
|----------------------------|--------------|---------------|------------------|--------------|-----------------|
| Matheus et al. [32]        | CS           | V5            | 0.35 % total LUT | NA           | NA              |
| Lu Wang et al. [33]        | CS           | ASIC          | 80000 µm²        | 0.4 F/c/node | NA              |
| Bouraoui Chemli et al. [34] | CS          | V6            | 7847             | NA           | NA              |
| Yahia Salah et al. [35]    | CS           | S3            | 713 slices       | 0.7          | 134             |
| Yahia Salah et al. [36]    | CS           | V2            | 291 slices       | 0.35         | 164             |
| Jawwad Latif et al. [37]   | CS           | NA            | NA               | 1.14         | NA              |

Table 3: Comparison of performance parameters in various circuit switched NoC routers

\*PS = Packet switching, CS = Circuit switching, NA = Not available.

#### VI. CONCLUSION

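This paper provides a comprehensive introduction to network-on-chip routers. It describes the main router blocks, contrasts packet and circuit switching, and compares the performance parameters reported for both classes of router. The survey is intended to help NoC designers adopt a suitable microarchitecture when designing a NoC-based topology.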

#### REFERENCES

- [1] K. Lee, S.-J. Lee, and H.-J. Yoo, "Low-power network-on-chip for high-performance SoC design," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 2, pp. 148–160, 2006.
- [2] P. A. Beerel and M. E. Roncken, "Low power and energy efficient asynchronous design," Journal of Low Power Electronics, vol. 3, no. 3, pp. 234–253, 2007.
- [3] J. Duato, S. Yalamanchili, and L. M. Ni, Interconnection networks: an engineering approach. Morgan Kaufmann, 2003.
- [4] J. Handy, "NoC interconnect improves SoC economics," Objective Analysis Semiconductor Market Research, 2011.
- [5] G. De Micheli, C. Seiculescu, S. Murali, L. Benini, F. Angiolini, and A. Pullini, "Networks on chips: From research to products," in Design Automation Conference (DAC), 2010 47th ACM/IEEE. IEEE, 2010, pp. 300–305.
- [6] D. Seo, A. Ali, W.-T. Lim, N. Rafique, and M. Thottethodi, "Near-optimal worst-case throughput routing for two-dimensional mesh networks," in ACM SIGARCH Computer Architecture News, vol. 33, no. 2. IEEE Computer Society, 2005, pp. 432–443.
- [7] R. Kumar, D. M. Tullsen, N. P. Jouppi, and P. Ranganathan, "Heterogeneous chip multiprocessors," Computer, vol. 38, no. 11, pp. 32–38, 2005.
- [8] J. Kim, J. Balfour, and W. Dally, "Flattened butterfly topology for on-chip networks," in Microarchitecture, 2007. MICRO 2007. 40th Annual IEEE/ACM International Symposium on. IEEE, 2007, pp. 172–182.
- [9] S. Ma, N. E. Jerger, and Z. Wang, "Whole packet forwarding: Efficient design of fully adaptive routing algorithms for networks-on-chip," in High Performance Computer Architecture (HPCA), 2012 IEEE 18th International Symposium on. IEEE, 2012, pp. 1–12.
- [10] J. Kim, "Low-cost router microarchitecture for on-chip networks," in Proceedings of the 42nd annual IEEE/ACM international symposium on microarchitecture. ACM, 2009, pp. 255–266.

- [11] R. Mullins, A. West, and S. Moore, "Low-latency virtual-channel routers for on-chip networks," in ACM SIGARCH Computer Architecture News, vol. 32, no. 2. IEEE Computer Society, 2004, p. 188.
- [12] G. Michelogiannakis and W. J. Dally, "Router designs for elastic buffer on-chip networks," in High Performance Computing Networking, Storage and Analysis, Proceedings of the Conference on. IEEE, 2009, pp. 1–10.
- [13] S. M. Hassan and S. Yalamanchili, "Centralized buffer router: A low latency, low power router for high radix nocs," in Networks on Chip (NoCS), 2013 Seventh IEEE/ACM International Symposium on. IEEE, 2013, pp. 1–8.
- [14] D. U. Becker, N. Jiang, G. Michelogiannakis, and W. J. Dally, "Adaptive backpressure: Efficient buffer management for on-chip networks," in Computer Design (ICCD), 2012 IEEE 30th International Conference on. IEEE, 2012, pp. 419–426.
- [15] Dehyadgari, M.; Nickray, M.; Afzali-kusha, A.; Navabi, Z. "A new protocol stack model for network on chip". In: IEEE Computer Society Annual Symposium on Emerging VLSI Technologies and Architectures, 2006. 3 pp.
- [16] Bjerregaard, T.; Mahadevan, S. "A survey of research and practices of Network-on-chip". ACM Computing Surveys, 38(1), 2006, pp. 1-51.
- [16] Hilton, C.; Nelson, B. "PNoC: a flexible circuit-switched NoC for FPGA-based systems". IEEE Proceedings on Computers and Digital Techniques, 153(3), May 2006, pp. 181-188.
- [17] Jantsch, A.; Tenhunen, H. "Networks on Chip". Kluwer Academic Publishers, 2003, 303p.
- [18] Moraes, F.; Calazans, N.; Mello, A.; Möller, L.; Ost, L. "HERMES: an Infrastructure for Low Area Overhead Packet-switching Networks on Chip". Integration the VLSI Journal, 38(1), Oct. 2004, pp. 69-93.
- [19] Benini, L.; Bertozzi, D. "Xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip". IEEE Circuits and Systems Magazine, Second Quarter, 2004, pp. 18-31.
- [20] Bjerregaard, T.; Sparso, J. "A Router Architecture for Connection-Oriented Service Guarantees in the MANGO Clockless Network-on-Chip". In: Proceedings of the Design, Automation and Test in Europe, DATE'05, 2005, pp. 1226-1231.
- [21] Zeferino, C. A.; Susin, A. A. "SoCIN: a Parametric and Scalable Network-on-Chip". In: 16th Symposium on Integrated Circuits and Systems Design, SBCCI'03, 2003, pp. 169-174.
- [22] Goossens, K.; Dielissen, J.; Radulescu, A. "Æthereal network-on-chip: concepts, architectures, and implementations". IEEE Design & Test of Computers, 22(5), Sept.-Oct. 2005, pp. 414-421.
- [23] Wiklund, D.; Liu, D. "SoCBUS: Switched Network-on-Chip for Hard Real Time Embedded Systems". In: Proceedings of the 17th IEEE International Parallel and Distributed Processing Symposium, Apr. 2003, pp. 113-116.
- [24] Karim, F.; Nguyen, A.; Dey, S. "An Interconnect Architecture for Networking Systems on Chips". IEEE Micro, 22(5), Sept.-Oct. 2002, pp. 36-45.
- [25] David Atienza, Federico Angiolini, Srinivasan Murali, Antonio Pullini, Luca Benini and Giovanni De Micheli, Network-on-Chip Design and Synthesis Outlook, Integration, The VLSI Journal, vol. 41, pp. 340–359, (2008).
- [26] Mehdi Modarressi, Arash Tavakkol and Hamid Sarbazi-Azad, Virtual Point-to-Point Connections for NoCs, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 6, June (2010).
- [27] Rohit Sunkam Ramanujam, Vassos Soteriou, Bill Lin and Li-Shiuan Peh, Extending the Effective Throughput of NoCs with Distributed Shared-Buffer Routers, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, no. 4, April (2011).
- [28] Mohammad Abdullah Al Faruque, Thomas Ebi and Jörg Henkel, AdNoC: Runtime Adaptive Network-on-Chip Architecture, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, February (2012).
- [29] En-Jui Chang, Hsien-Kai Hsin, Chih-Hao Chao, Shu-Yen Lin and An-Yeu (Andy) Wu, Regional ACO-Based Cascaded Adaptive Routing for Traffic Balancing in Mesh-Based Network-on-Chip Systems, 10.1109/TC.2013.2296032, IEEE Transactions on Computers, (2013).
- [30] Anh T. Tran and Bevan M. Baas, Achieving High-Performance On-Chip Networks With Shared-Buffer Routers, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 6, June (2014).
- [31] Guoyue Jiang, Zhaolin Li, Fang Wang and Shaojun Wei, A Low-Latency and Low-Power Hybrid Scheme for on-Chip Networks. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 4, (2015).
- [32] Moreira, Matheus T., et al. "BaBaNoC: an asynchronous network-on-chip described in Balsa." 2013 International Symposium on Rapid System Prototyping (RSP). IEEE, 2013.
- [33] Wang, Lu, et al. "A high performance reliable NoC router." Integration 58 (2017): 583-592.
- [34] Chemli, Bouraoui, and Abdelkrim Zitouni. "A turn model based router design for 3D network on chip." World Applied Sciences Journal 32.8 (2014): 1499-1505.
- [35] Salah, Yahia, et al. "Cost/performance evaluation for a 3D symmetric NoC router." International Image Processing, Applications and Systems Conference. IEEE, 2014.
- [36] Salah, Yahia, and Rached Tourki. "Design and FPGA implementation of a QoS router for networks-on-chip." 2011 3rd International Conference on Next Generation Networks and Services (NGNS). IEEE, 2011.
- [37] Latif, Jawwad, Hassan Nazeer Chaudhry, and Sadia Azam. "Design Trade off and Performance Analysis of Router Architectures in Network-on-Chip." Procedia Computer Science 56 (2015): 421-426.