# 3-D SPIDERGON: 3-D TOPOLOGY OF DELAY OPTIMIZATION FOR NETWORKS-ON-CHIP

Zhou Lei<sup>1,2</sup>, Wu Ning<sup>1</sup>, Ge Fen<sup>1</sup>

(1. College of Electric and Information Engineering, NUAA, 29 Yudao Street, Nanjing, 210016, P. R. China;2. College of Information Engineering, Yangzhou University, Yangzhou, 225009, P. R. China)

Abstract: A 3-D topology architecture based on Spidergon and its generation method are proposed. Aiming at establishing relationships between the topology architecture and the latency, the 3-D topology latency model based on prototype is proposed, and then the optimization topology structure with minimum latency is determined based on it. Meanwhile, in accordance with the structure, the adaptive routing algorithm is designed. The algorithm sets longitudinal direction priority to adaptively searching the equivalent minimum path between the source nodes and the destination nodes in order to increase network throughput. Simulation shows that in case of approximate saturation network, compared with the same scale 3-D mesh structure, 3-D Spidergon has 17% less latency and 16.7% more network throughput.

Key words: network-on-chip(NoC); topology; Spidergon; routing algorithmCLC number: TP302Document code:AArticle ID:1005-1120(2011)04-0372-07

### INTRODUCTION

The rapid scaling technology driving integration into the deep sub-micron is facing critical challenges such as floorplan, heat dissipation etc, especially interconnect<sup>[1]</sup>. As a promising solution to the complex communication problems, the network-on-chip (NoC) architecture is proposed and studied widely in recent years. In fact, tremendous examples have been made on the design of 2-D NoC architecture, both on regular applications and application-specified architecture for custom<sup>[2-4]</sup>. However, the 3-D integration technology increasingly offers unique opportunities for onchip interconnect design.

3-D stacking technology stacks multiple silicon layers on top of each other, and connects them through vertical wires<sup>[5]</sup>. The 3-D architecture has a number of advantages comparing with the traditional 2-D design<sup>[6]</sup>: (1)Shorter global interconnects; (2) Lower wire delay; (3) Lower interconnect power consumption due to wirelength reduction; (4) Higher packing density and smaller footprint; (5)Support for the implementation of hybrid topology chips. In this context, several 3-D designs have been appeared recently. Addo-Quaye<sup>[7]</sup> presented an algorithm for the thermal-aware mapping and placement of 3-D NoC including regular mesh topologies. Li et al<sup>[8]</sup> proposed a similar 3-D NoC topology using a buss structure for communicating among processing elements (PEs) located on different physical planes. However, the recent study of 3-D topology focus on the 3-D mesh like topology, which limits the choice of 3-D floor planning of cores, traffic requirements, accurate power, and delay models for 3-D wiring.

This paper proposes a 3-D topology based on Spidergon which called 3-D Spidergon. The Spidergon topology is used in each horizon layer, and

Received date: 2010-12-29; revision received date: 2011-02-25

Foundation items: Supported by the National Nature Science Foundation of China(61076019); the Aviation Science Foundation(20115552031); the Science and Technology Support Program of Jiangsu Province(BE2010003).

E-mail:tomcat800607@126.com

the routers located on adjacent physical layers are connected with vertical link. And, the relevancy of the number of layers and the average hops in the 3-D Spidergon NoC are analyzed, and a topology-reconstruction algorithm is presented to minimize the delay of topology. Furthermore, a routing algorithm applied to this topology is presented. A key idea in the routing algorithm is routing messages in vertical directions firstly, and when messages reach the destination layer, choosing the output port adaptively.

#### **1 3-D SPIDERGON LAYOUT**

Topology specifies the structure in which routers connect IP cores together. In 2-D NoC, the study of topology focuses on mesh, torus, tree, and butterfly recently. Spidergon scheme is similar to the ring structure shown in Fig. 1(a), where each router is connected to the routers in three directions: clockwise, counter-clockwise and opposite. Spidergon presents many valuable characteristics as shown in below: First, the vertex symmetry of structure provides convenience in design, so we can use the homogeneous routers in network and place PE in any location of the network without considering the influence of performance affected by position; Second, the regularity makes it adapt to various scales of interconnect and reduce the complexity of floorplan; Third, low node degree and good network diameter make it possible that reducing the delay by simplifying the structure and routing rules of router.

However, the objective of Spidergon topology addresses the demand of limited scope of nodes in NoC implementation for the network diameter is linear increasing according to the number of nodes. To overcome this shortcoming, the whole network is divided into several same sub networks and stacked on the top of each other. Alternatively, the routers should add two ports to connect the additional neighboring routers located on the adjacent layers as shown in Fig. 1(b). It's obvious that delays can be substantially reduced by shortening the network diameter. For example, assuming the nodes in a network is 64, we can calculate that the network diameter is 16 if using 2-D Spidergon. While the network is divided into four layers and stacked as we noted earlier, the network diameter is reduced to 7. Furthermore, the wires connecting to nodes in opposite position are typically prolonged with the scale of network, and the wire delay also can be reduced by shrinking the scale of network in each layer. The relationship between number of layers and zero-load latency is discussed under the condition of certain nodes and the method for optimizing structure is presented.



Fig. 1 Spidergon and 3-D Spidergon topology

The zero-load network latency is widely used as a performance metric in traditional interconnection networks. The zero-load latency of a network is the latency where only one packet traverses the network. The zero-load latency of NoC with wormhole switching is<sup>[9]</sup>

$$T_{\text{delay}} = \text{hops} \cdot t_{\text{r}} + t_{\text{c}} + L_{\text{p}}/b$$
 (1)

where the first term represents the routing delay, hops is the average number of routers which packet traverses,  $t_r$  the router delay;  $t_c$  is the propagation delay along the wires of the communication channel; the third term is the serialization delay of packet. In particular, the delay of the communication channel  $t_c$  can described as

 $t_{\rm c} = t_{\rm v} \cdot {\rm hops}_{3-{\rm D}} + t_{\rm h} \cdot {\rm hops}_{2-{\rm D}}$  (2) where  $t_{\rm v}$  and  $t_{\rm h}$  are delays of vertical and horizontal channels respectively. The parameter hops can be determined by the structure of network primarily.

Assuming the node number of network is N,

the number of planes in vertical direction is n, and the number of nodes in horizon direction is m. Note that N is not always equal to m \* n, for Spidergon is symmetrical in structure, we must guarantee m is even. If the conditions are not met we should add redundant nodes to ensure the number of nodes in each layer is even. The average hops in 3-D Spidergon are

hops<sub>3-D</sub> = 
$$\frac{\sum_{i=1}^{m * n} \sum_{j=1}^{m * n} d_{ij}}{m * n}$$
 (3)

where  $d_{ij}$  expresses the distance between node iand j. Actually, hops between two nodes are composed of two parts: The hops in horizontal layer and in vertical layer. The average hops in 3-D Spidergon can be described as follows

hops = 
$$\frac{3(2(p+1)^2 - 1)n + (n^2 - 1)m}{3(mn-1)}$$
  
 $m = 4p + 2$  (4a)  
hops =  $\frac{3(2p^2 + 2p - 1)n + (n^2 - 1)m}{3(mn-1)}$   
 $m = 4p$  (4b)

where p represents the positive integer. If the nodes in horizontal layer are integral multiple of 4, the average hops are calculated by Eq. (4a), otherwise by Eq. (4b). From Eqs. (4a,4b), it is obviously that the number of planes n is the only argument of function to average hops of network for m can be expressed by n if N is constant. In order to minimize the average hops, it is necessary to find the best n to fit this requirement. It is a typical NP problem. As the search space is limited, we use full scale search algorithm to find out the best n. The minimal Spidergon topology is composed of four nodes, so we define the bounds of searching from 1 to  $\lfloor N/4 \rfloor$ , where  $\lfloor \cdot \rfloor$ expresses round-off the numbers. Firstly, the algorithm calculates the node number m in per horizontal layer according to n; Then it achieves the result of average hops by Eq. (4), which executes with constraints repeatedly until all the possible nis traversed; Lastly it chooses the n, which corresponds to the minimal average hops, as the final result. The time complexity of algorithm is O(n). Table 1 shows the optimal results of 3-D Spidergon when node number is 64, 72, 128, and 256. All experimental results are obtained in accepted time (<5 min) on a 3 GHz Intel P4 processor machine with 512 MB memory.

Table 1 Optimal results of 3-D Spidergon

| Node   | Louron | Node number | Average hops |  |
|--------|--------|-------------|--------------|--|
| number | Layer  | per layer   |              |  |
| 64     | 4      | 16          | 3.746        |  |
| 72     | 6      | 12          | 3.915        |  |
| 128    | 8      | 16          | 5.102        |  |
| 256    | 10     | 26          | 7.057        |  |

### 2 ADAPTIVE ROUTING ALGO-RITHM

In this section the adaptive routing algorithm in 3-D Spidergon is presented. Besides of the topology structure, NoC network latency is also affected by its routing algorithms. Compared with deterministic routing, the adaptive routing is more complex in implementation, but it has advantages in decreasing delay and relieving local congestion. Our objective is to design adaptive routing algorithm, which is based on the idea of aFirst and aLast algorithms<sup>[10]</sup> applied to Spidergon.

Both aFirst and aLast algorithms are minimal source routing whose behaviors are depicted in Fig. 2. In the aFirst algorithm the across channel can be used only as a first step. If the destination node is on the source half of the ring composing the Spidergon, the packet is forwarded along the clockwise (CW) or counter-clockwise (CCW) direction according to the minimal path restriction. If the packet destination is on the opposite side of the ring, the packet is first forwarded along the across channel and from there, along the ring towards its final destination. In contrast, a packet can traverse an across link only as last step in the aLast algorithm.

aFirst and aLast algorithms belong to deterministic routing, and the effects of them are equivalent. But neither of them considers the effect of the local congestion. In order to solve the problem, the routing algorithm applied to 3-D

OP)



Fig. 2 aFirst and aLast routing algorithms

Spidergon chooses aFirst or aLast routing adaptively according to the link utilization in horizontal layer. Similar to the routers in Spidergon, a router in 3-D Spidergon has three channels in CW, CCW and opposite direction (OP) respectively in horizontal layer. In addition, there are two more channels connected to neighbors in vertical direction. In routing algorithm, the packet is firstly transferred to destination layer in vertical direction determinedly and then routes in horizontal layer adaptively.

The proposed routing algorithm is composed of the following steps. Firstly the nodes in network are sequenced by coordinates (i, j) where *i* means the sequence in horizon layer and *j* the sequence in the third dimension. Note that the nodes in horizon layer are sequenced in CW. Then router  $R_{ij}$  transfers packets according to the distance between destination and itself. If the distance in vertical direction is not equal to zero, the packets will be sent to ports in vertical direction simply, otherwise the port is selected by minimal path policy in horizon layer. The distance between them can be expressed as dist =  $((i_{dest} - i_{src}) + m) \mod m$  (5)

In fact, dist presents the horizon distance between destination node and  $R_{ij}$  in CW. In brief, dist is classified into five sections to discuss the choice of output as below:

dist = (0, m/4], output = CW dist = (m/4, m/2), output = select (CCW, OP) dist = m/2, output = OP dist = (m/2, 3m/4), output = select (CW,

dist = [3m/4, m), output = CCW

There is only one shortest path in routing path when dist is less than m/4, more than 3m/4or equal to m/2, so the algorithm chooses the fixed port as output. Otherwise, there are two equivalent shortest paths aFirst and aLast. Then the algorithm picks output by select (•) function. The function checks the buffer levels of the ports, and marks the lower as output port. If the buffer levels of the ports are equivalent, the function picks the output port randomly.

An available adaptive routing algorithm should ensure freedom from deadlock, here we use channel dependency graph(CDG) to verify it. In CDG, nodes represent channels of the network and edges the possible depencies generated by packets traversing nodes. The algorithm routes determinedly when dist is less than m/4, more than 3m/4 or equal to m/2, otherwise there' re two possible paths. If the across channel is idle, it can be used as the first step so that in CDG they have no incoming edges. Instead the packets will be transferred along the ring channels until dist equals m/2, and they must use across channel as last step, hence their relative nodes in CDG have no outgoing arrows. In fact, these two situations are equivalent to aFirst and aLast which do not generate any dangerous dependency. As the links on the ring do not generate any cycle of dependencies, it can be proved that the routing algorithm satisfies the condition of deadlock avoidance. The pseudo-code of routing algorithm is shown in Fig. 3.

#### **3 EXPERIMENTAL RESULTS**

In this section, the performance of 3-D Spi-

| Algorithm for 3-D Spidergon router:                     |  |  |  |  |  |  |
|---------------------------------------------------------|--|--|--|--|--|--|
| Inputs:Coordinates of current node (Icurr,Jcurr)        |  |  |  |  |  |  |
| and destination node (Xdest,Ydest),                     |  |  |  |  |  |  |
| Output: Selected output channel                         |  |  |  |  |  |  |
| Procedure:                                              |  |  |  |  |  |  |
| Ioffset:= $((\text{Idest} - \text{Icurr}) + m) \mod m;$ |  |  |  |  |  |  |
| Joffset:= Jdest -Jcurr;                                 |  |  |  |  |  |  |
| if (Joffset < 0) then Channel:= $Y$ -;                  |  |  |  |  |  |  |
| else if (Joffset > 0) then Channel:= $Y + ;$            |  |  |  |  |  |  |
| else if (Joffset $== 0$ ) then {                        |  |  |  |  |  |  |
| if (Ioffset >0 and Ioffset $\leq m/4$ )                 |  |  |  |  |  |  |
| then Channel:= CW;                                      |  |  |  |  |  |  |
| else if (Ioffset $> m/4$ and Ioffset $< m/2$ )          |  |  |  |  |  |  |
| then Channel:=select(CCW,OP);                           |  |  |  |  |  |  |
| else if (Ioffset $>/2$ and Ioffset $< 3m/4$ )           |  |  |  |  |  |  |
| then Channel:=select(CW,OP);                            |  |  |  |  |  |  |
| else if(Ioffset $> 3m/4$ and Ioffset $< m$ )            |  |  |  |  |  |  |
| then Channel:=CCW;                                      |  |  |  |  |  |  |
| else if(Ioffset = $= m/2$ )                             |  |  |  |  |  |  |
| then Channel:=OP;                                       |  |  |  |  |  |  |
| else if(Ioffset = $= 0$ )                               |  |  |  |  |  |  |
| then Channel:=Internal:}                                |  |  |  |  |  |  |

Fig. 3 Pseudo-code of routing algorithm

dergon is evaluated by Noxim<sup>[11]</sup>, a flit-accurate simulator developed in SystemC. The 3-D Spidergon, 3-D mesh and Spidergon topology with 64 nodes are constructed respectively. A packet size is randomly distributed between two and eight flits, so router uses wormhole switching and is allocated eight flits of input buffer. The maximum bandwidth of each link is set to one flit per cycle.

Experimental results of the topology in comparison with results using different nodes and injection rates are shown in Table 2. For each parameter, the average delay, throughput and total energy are reported. Experiments show that the system average number of hops has a similar proportional relationship to both the delay time and system power consumption. If the average number of hops increases, the delay time and the power consumption can increase accordingly. Especially in the case of the same network size, the different hierarchical classification will bring delay and power consumption changes. Apparently due to the decrease of the average number of hops, the packet transmission delay is cut down and the routing unit number of data path is reduced. Therefore the power consumption of the routing unit will be reduced, which is consistent with theoretical analysis of relationship between average number of hops and system delay.

In the experiments, the average delay results are used as the index to measure the performances of topologies. The average delay is related to the injection rate and the traffic pattern. The performances of network under different message generation rates are evaluated and reported in Table 2.

| Bench | Com  | Layer | Injection | Average | Average  | Average    | Max   | Energy - / I |
|-------|------|-------|-----------|---------|----------|------------|-------|--------------|
|       | Core |       | rate      | hops    | delay    | throughput | delay | Energy/µJ    |
| B1    | 36   | 3     | 0.1       | 2.886   | 10.477 7 | 0.119 0    | 59    | 382.390      |
| B2    | 36   | 3     | 0.2       | 2.886   | 14.665 1 | 0.208 2    | 137   | 704.959      |
| B3    | 64   | 4     | 0.1       | 3.746   | 13.154 9 | 0.121 2    | 73    | 836.393      |
| B4    | 64   | 4     | 0.2       | 3.746   | 19.695 9 | 0.209 9    | 152   | 1 557.370    |
| B5    | 80   | 5     | 0.1       | 4.088   | 14.307 3 | 0.122 3    | 78    | 1 129.210    |
| B6    | 80   | 5     | 0.2       | 4.088   | 22.574 0 | 0.2107     | 157   | 2 104.000    |

Table 2 3-D Spidergon experimental results

The average delay and throughput results obtained by 3-D Spidergon, 3-D mesh and Spidergon with different injection rates under homogeneous source and destination distribution are shown in Fig. 4. All nodes behave like sources and can be addressed as destination for packets with uniform probability distribution. The results show that the bigger the injection rate is, the worse the performance of the average latency is in same topology. When the injection rate of Spidergon exceeds 0.15, the throughput stays steady and the average delay increases drastically. It illuminates that destination node saturation is obtained. Similarly, when the injection rate exceeds 0.3, 3-D mesh also obtains saturation. But the throughput of 3-D Spidergon is 19% larger than that of 3-D mesh with this injection rate, and 3-D Spidergon still cannot obtain saturation for the throughput



Fig. 4 Comparisons of average delay and throughput under homogeneous distribution

is still increasing slightly.

Fig. 5 shows the throughput and average delay results obtained by three topologies with different hot-spot nodes under hot-spot destination. When hot-spot destination is present in the system, one single destination node receives most packets from other nodes. Assume that the probability of single hot-spot destination node chosen by source nodes is 30% in the experiments. The results show that 3-D Spidergon outperforms 3-D mesh and Spidergon, and scales better when injection rate is high. Under the condition, the average delays of hot-spot node in 3-D Spidergon and 3-D mesh are obviously smaller than that of Spidergon. Both delays of 3-D topologies are close, but the delay of 3-D Spidergon is still 17%smaller than that of 3-D mesh. In particular, the latency of 3-D mesh and 3-D Spdiergon may be different for the location of the destination node changes, but the performance of 3-D Spdiergon is still better than other two topologies.



Fig. 5 Comparisons of average delay and throughput under hot-spot distribution

#### **4** CONCLUSION

In this paper, a 3-D Spidergon based on Spidergon topology is proposed. To minimize the average hops of network, the number of layer is optimized. An adaptive routing algorithm applied to this topology is designed, and the router selects the output ports according to the location between source and destination node pairs. The average delay of 3-D Spidergon compared with other topologies under uniform random distribution and hot-spot distribution is analyzed in detail. Experimental results show that 3-D Spidergon can provide better performance than 3-D mesh and Spidergon in the same size under different traffic patterns.

#### References :

 Benini L, De Micheli G. Networks on chip: A new SoC paradigm[J]. IEEE Comput, 2002, 31(1): 70-78.

- [2] Taylor M B, Kim J, Miller J, et al. The RAW microprocessor: A computational fabric for software circuits and general-purpose programs[J]. IEEE Micro,2002, 22(6): 25-35.
- [3] Sankaralingam K, Nagarajan R, Liu Haiming, et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture [C]//ISCA. [S. l.]: IEEE,2003,23(6):46-57.
- [4] Hu J, Marculescu R. Energy-aware mapping for tile-based NoC architectures under performance constraints [C]//ASP-DAC. Kitakyushu, Japan: [s.n.],2003:233-239.
- [5] Gutmann R J, Liu J Q, Kwon Y, et al. Three-dimensional (3-D) ICs: A technology platform for integrated systems and opportunities for new polymeric adhesives[C]//Proc Conf Polymers Adhesives Microelectron Photon. Potsdam, Germany: IEEE, 2001: 173-180.
- [6] Ababei C, Feng Y, Goplen B, et al. Placement and routing in 3-D integrated circuits[J]. IEEE Design &. Test, 2005, 22: 520-531.

- [7] Addo-Quaye C. Thermal-aware mapping and placement for 3-D NoC designs[C]//Proc IEEE Int Syston-Chip Conf. Herdon, VA, USA:[s.n.], 2005:25-28.
- [8] Li F, Nicopoulos C, Richardson T, et al. Design and management of 3-D chip multiprocessors using network-in-memory[C]//Proc IEEE Int Symp Comput Arch. Washington D C, USA: IEEE, 2006: 130-142.
- [9] Dally W J, Towles B. Principles and practices of interconnection networks[M]. San Mateo, CA: Morgan Kaufmann, 2004.
- [10] Concer N, Iamundo S, Bononi L. aEqualized: A novel routing algorithm for the spidergon network on chip design[C]//Automation & Test in Europe Conference & Exhibition. Nice, France: [s. n. ], 2009: 749-754.
- [11] Palesi M, Patti D, Fazzino F. Noxim: Network-onchip simulator [EB/OL]. http://sourceforge.net/ projectsnoxim, 2008/2010-10.

## 3-D Spidergon:一种延时优化的通用三维片上网络拓扑生成方法

周磊<sup>1,2</sup> 吴宁<sup>1</sup> 葛芬<sup>1</sup>

(1. 南京航空航天大学电子信息工程学院,南京,210016,中国;2. 扬州大学信息工程学院,扬州,225009,中国)

摘要:提出一种基于Spidergon的通用三维拓扑结构及其拓 扑生成方法。该方法在三维拓扑结构原型基础上,通过该 拓扑的延时模型建立拓扑结构和延时时间的关系,并以此 确定最小化延时时间条件下的拓扑结构。同时设计了针对 该结构的自适应路由算法。该算法以纵向路由为优先方 向,通过自适应寻找源节点和目的节点的等效最短路径提 高网络吞吐量。仿真结果表明,同等规模的 3-D Spidergon 与 3-D mesh 结构相比,在网络近似饱和的情况下,该拓扑 的延时时间比 3-D mesh 低 17%,吞吐量高 16.7%。

关键词:片上网络; 拓扑; Spidergon; 路由算法 中图分类号:TP302

(Executive editor: Zhang Huangqun)