#### Minimal-Power, Delay-Balanced Smart Repeaters for Interconnects in the Nanometer Regime

<u>Roshan Weerasekera</u>, Dinesh Pamunuwa\*, Li-Rong Zheng, Hannu Tenhunen *roshan@imit.kth.se* 



Dept. of Electronic, Computer and Software Systems, KTH School of Information and Communication Technology, 164 40 Kista, Sweden

\*Centre for Microsystem Engineering, Lancaster University, UK

## **Challenges in Global Communication...**

- Shrinking feature size
- Increasing die sizes
- Scaling of supply voltage
- Increasing interconnect density
- Fast clock rates

#### **Global wires suffer from**

- Delay Problems
- Power Problems
- Reliability Problems



Source: SIA Roadmap 1999

### **Interconnect Capacitance**

Interconnects in deep sub-micron technologies are typically very lossy, so RC delay dominates. To keep resistance to a minimum, the aspect ratio (height/width) of wires is kept high.



The majority of interconnect capacitance is sidewall (coupling) capacitance, and hence crosstalk becomes more important.

#### Effects of crosstalk

1. Couples noise onto the victim wire
2. Affects signal speed (crosstalk-induced delay, or dynamic delay)

### **Effective Capacitance depends on Activity**

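The effective interconnect capacitance can be written as:

$$C_{eff} = C_s + SF \cdot C_c$$

where $C_s$ is the capacitance to ground, $C_c$ is the coupling capacitance to the neighbouring wire, and the switch factor $SF$ ranges from 0 to 2 depending on the relative switching activity of the neighbour.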
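As a quick numerical illustration of the formula above, the sketch below evaluates $C_{eff}$ for the three delay-based switch factors used in the switching table further on (0 for same-direction switching, 1 for a quiet neighbour, 2 for opposite-direction switching). The capacitance values are hypothetical, chosen with $C_c = 4C_s$, for which the spread about the quiet-neighbour value works out to $\pm C_c/(C_s + C_c) = \pm 80\%$.

```python
def c_eff(c_s: float, c_c: float, sf: float) -> float:
    """Effective capacitance C_eff = C_s + SF * C_c, with switch factor 0 <= SF <= 2."""
    if not 0.0 <= sf <= 2.0:
        raise ValueError("switch factor must lie between 0 and 2")
    return c_s + sf * c_c


# Hypothetical capacitances in arbitrary units (assumption: coupling is 4x ground).
C_S, C_C = 1.0, 4.0

best = c_eff(C_S, C_C, 0.0)     # neighbour switches in the same direction
nominal = c_eff(C_S, C_C, 1.0)  # neighbour quiet
worst = c_eff(C_S, C_C, 2.0)    # neighbour switches in the opposite direction

# Relative deviation of the worst case from the quiet-neighbour value.
spread = (worst - nominal) / nominal
print(best, nominal, worst, spread)
```

The best case deviates from the nominal value by the same fraction in the opposite direction, which is why the variation is symmetric.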





[Figure: effective capacitance for grid spacings of 1x and 2x, showing a variation of ±80%]

## **Repeater Insertion for Delay Reduction**

A widely used method for delay reduction...





- Adverse effect of repeater insertion: increased power consumption.
- 50% of the dynamic power consumption of a microprocessor is due to interconnects.
  - Within 5 years, interconnect power is projected to reach 80% of the total.
  - Global signal lines account for 34% of this power.

#### Other ways to reduce delay...

- Alternatives to repeater insertion
  - Transient Sensitive Trigger, Charge Recycling Technique, Booster, TAGS receiver, Aggressor-Aware repeater
  - These use skewed inverters, consume more energy, and occupy a large area
- Error-control or transition coding
  - Complex CODEC circuitry causes additional delay and consumes more power
- Introducing additional delay between wires to reduce dynamic delay
  - Dissipates more power for transitions in the same direction
- Advantages of the repeater circuit proposed in this work:
  - Energy saving
  - Delay equalization
  - A clear design methodology similar to traditional repeater insertion
  - Only a minor increase in circuit complexity

# **Switching Activities for Two Coupled Nets**

Sixteen possible switching patterns can be identified; they are categorized into the five groups below:

| Group | Case | Wire $i$ | Wire $j$ | Delay-based switch factor $\lambda^d$ | Power-based switch factor $\lambda^p$ | Energy for wire $i$, traditional driver $(\times \frac{1}{2} V_{dd}^2)$ | Energy for wire $i$, smart driver $(\times \frac{1}{2} V_{dd}^2)$ |
|-------|------|----------|----------|------|------|------|------|
| 1     | 1    | ↓        | ↓        | 0    | 0.25 | $C_{w\_trad} + 0.25C_c$ | $C_{w\_smart} + 0.25C_c$ |
|       | 2    | ↑        | ↑        | 0    | 0.25 | $C_{w\_trad} + 0.25C_c$ | $C_{w\_smart} + 0.25C_c$ |
| 2     | 3    | 0        | 0        | n.a. | n.a. | 0 | 0 |
|       | 4    | 0        | 1        | n.a. | n.a. | 0 | 0 |
|       | 5    | 1        | 0        | n.a. | n.a. | 0 | 0 |
|       | 6    | 1        | 1        | n.a. | n.a. | 0 | 0 |
| 3     | 7    | 0        | ↑        | 1    | 1    | 0 | 0 |
|       | 8    | ↑        | 0        | 1    | 1    | $C_{w\_trad} + C_c$ | $C_{w\_smart} + C_c$ |
|       | 9    | 0        | ↓        | 1    | 1    | 0 | 0 |
|       | 10   | ↓        | 0        | 1    | 1    | $C_{w\_trad} + C_c$ | $C_{w\_smart} + C_c$ |
| 4     | 11   | 1        | ↑        | 1    | 0    | 0 | 0 |
|       | 12   | ↑        | 1        | 1    | 0    | $C_{w\_trad}$ | $C_{w\_smart}$ |
|       | 13   | 1        | ↓        | 1    | 0    | 0 | 0 |
|       | 14   | ↓        | 1        | 1    | 0    | $C_{w\_trad}$ | $C_{w\_smart}$ |
| 5     | 15   | ↑        | ↓        | 2    | 1.75 | $C_{w\_trad} + 1.75C_c$ | $C_{w\_trad} + 1.75C_c$ |
|       | 16   | ↓        | ↑        | 2    | 1.75 | $C_{w\_trad} + 1.75C_c$ | $C_{w\_trad} + 1.75C_c$ |

↑ / ↓: rising / falling transition; 0 / 1: wire quiet at logic 0 / 1.

- To ensure error-free operation, timing constraints have to be satisfied for switching patterns 15 and 16, the worst-case patterns.
- However, these patterns account for only 2 of the 16 possible patterns.
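The grouping in the table can be reproduced by enumerating every pair of wire events. In this sketch (our own encoding, not taken from the slides) each wire's activity over one cycle is a `(previous, next)` logic-value pair, and the hypothetical `classify` function assigns the group and delay-based switch factor; the count confirms that the worst-case group 5 covers 2 of the 16 patterns.

```python
from collections import Counter
from itertools import product

# A wire's activity over one cycle as a (previous, next) logic-value pair.
EVENTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def classify(wire_i, wire_j):
    """Return (group, delay switch factor) for a pair of wire events.

    Groups follow the table: 1 same-direction switching, 2 both quiet,
    3 one wire switches while the other is quiet at 0, 4 one wire switches
    while the other is quiet at 1, 5 opposite-direction switching.
    """
    di = wire_i[1] - wire_i[0]  # +1 rising, -1 falling, 0 quiet
    dj = wire_j[1] - wire_j[0]
    if di == 0 and dj == 0:
        return 2, None  # no transition: switch factor not applicable
    if di == dj:
        return 1, 0
    if di == -dj:
        return 5, 2
    # Exactly one wire switches; find the logic level of the quiet one.
    quiet_level = wire_j[0] if dj == 0 else wire_i[0]
    return (3 if quiet_level == 0 else 4), 1


groups = Counter(classify(a, b)[0] for a, b in product(EVENTS, repeat=2))
worst_fraction = groups[5] / sum(groups.values())
print(sorted(groups.items()), worst_fraction)
```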

## **Our Adaptive Smart Repeater Concept**

- With a traditional repeater, the drive strength is static, so the delay varies with the switching pattern.
- In our work, the drive strength is dynamically altered depending on the relative bit pattern.



- For the worst-case pattern the drive strength is large; for the best-case pattern it is smaller.
- This results in less variation of delay!

## **Energy Saving with the SMART driver**

- The energy dissipated per cycle depends on whether switching transitions occur and on the relative switching pattern. A switching transition is a probabilistic event.
- The average energy dissipation for wire $i$ with a traditional repeater is:

$$E_{avg}^{trad}(i) = 0.5V_{DD}^{2} \left[ p_{s,s} \left( C_{w_{trad}} + 0.25C_{c} \right) + p_{e,1} C_{w_{trad}} + p_{e,0} \left( C_{w_{trad}} + C_{c} \right) + p_{o,o} \left( C_{w_{trad}} + 1.75C_{c} \right) \right]$$

where $p_{x,y}$ is the probability that wires *i* and *j* switch as defined below: (s,s) - both wires switch in the same direction; (e,0) - wire *i* switches up or down while wire *j* is quiet at 0; (e,1) - wire *i* switches up or down while wire *j* is quiet at 1; (o,o) - both wires switch in opposite directions.

So,

$$E_{avg}^{trad}(i) = 0.5V_{DD}^{2}\left[\left(p_{s,s} + p_{e,1} + p_{e,0} + p_{o,o}\right)C_{w_{trad}} + \left(0.25p_{s,s} + p_{e,0} + 1.75p_{o,o}\right)C_{c}\right]$$

• The average energy dissipation for wire *i* with the SMART driver is:

$$E_{avg}^{smart}(i) = 0.5V_{DD}^{2} \left[ \left( p_{s,s} + p_{e,1} + p_{e,0} \right) C_{w_{smart}} + p_{o,o} C_{w_{trad}} + \left( 0.25 p_{s,s} + p_{e,0} + 1.75 p_{o,o} \right) C_c \right]$$
  
Hence,

$$\Delta E = 0.5 V_{DD}^{2} (p_{s,s} + p_{e,1} + p_{e,0}) (C_{w_{trad}} - C_{w_{smart}})$$
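The two averages and their difference can be checked numerically. The sketch below evaluates $E_{avg}^{trad}$ and $E_{avg}^{smart}$ directly from the probability-weighted sums above and compares their difference with the closed form for $\Delta E$. The probabilities and capacitances are hypothetical values in arbitrary units, and the function name `avg_energy` is illustrative.

```python
def avg_energy(p, c_w_trad, c_w_smart, c_c, vdd, smart):
    """Average energy per cycle for wire i (probability-weighted sum).

    p maps the pattern classes "ss", "e0", "e1", "oo" to probabilities.
    With the smart driver, only opposite-direction switching ("oo")
    sees the full traditional load c_w_trad.
    """
    c_w = c_w_smart if smart else c_w_trad
    wire_term = (p["ss"] + p["e1"] + p["e0"]) * c_w + p["oo"] * c_w_trad
    coupling_term = (0.25 * p["ss"] + p["e0"] + 1.75 * p["oo"]) * c_c
    return 0.5 * vdd ** 2 * (wire_term + coupling_term)


# Hypothetical inputs (arbitrary units).
p = {"ss": 0.125, "e0": 0.125, "e1": 0.125, "oo": 0.125}
C_W_TRAD, C_W_SMART, C_C, VDD = 10.0, 8.0, 4.0, 1.0

e_trad = avg_energy(p, C_W_TRAD, C_W_SMART, C_C, VDD, smart=False)
e_smart = avg_energy(p, C_W_TRAD, C_W_SMART, C_C, VDD, smart=True)

# Closed form: Delta E = 0.5 V_DD^2 (p_ss + p_e1 + p_e0)(C_w_trad - C_w_smart)
delta_closed = 0.5 * VDD ** 2 * (p["ss"] + p["e1"] + p["e0"]) * (C_W_TRAD - C_W_SMART)
print(e_trad, e_smart, e_trad - e_smart, delta_closed)
```

The coupling term is identical for both drivers, which is why it cancels in $\Delta E$.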


• Substituting $C_{w_{trad}} = C_s + H_t (C_{gmin} + C_{dmin})$ and $C_{w_{smart}} = C_s + H_t C_{dmin} + H_m C_{gmin}$, where $H_t = H_m + H_a$, we get:

$$\Delta E = \frac{H_a}{2} (p_{s,s} + p_{e,0} + p_{e,1}) C_{gmin} V_{DD}^{2}$$

If the switching events are random and uniformly distributed, with no correlation between neighbouring lines, then $p_{s,s} = p_{e,0} = p_{e,1} = 2/16$ and

$$\Delta E_{avg} = \frac{3}{16} H_a C_{gmin} V_{DD}^{2}$$

• This saving is obtained with the same worst-case delay as a traditional driver.
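Under the uniform, uncorrelated assumption, the factor $3/16$ can be checked by counting patterns: each of the 16 patterns has probability $1/16$, and only patterns in which wire $i$ switches fall into the classes (s,s), (e,0), (e,1) and (o,o). The sketch uses exact fractions; the `(previous, next)` encoding and the function `pattern_class` are our own illustrative choices.

```python
from fractions import Fraction
from itertools import product

# A wire's activity over one cycle as a (previous, next) logic-value pair.
EVENTS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def pattern_class(wire_i, wire_j):
    """Class of an event pair in the energy formulas, or None if wire i is quiet."""
    di = wire_i[1] - wire_i[0]
    dj = wire_j[1] - wire_j[0]
    if di == 0:
        return None  # wire i does not switch: nothing attributed to wire i
    if dj == 0:
        return "e0" if wire_j[0] == 0 else "e1"
    return "ss" if di == dj else "oo"


counts = {"ss": 0, "e0": 0, "e1": 0, "oo": 0}
for a, b in product(EVENTS, repeat=2):
    cls = pattern_class(a, b)
    if cls is not None:
        counts[cls] += 1

# Uniform, uncorrelated activity: every pattern has probability 1/16.
p = {name: Fraction(n, 16) for name, n in counts.items()}
# Coefficient of H_a * C_gmin * V_DD^2 in Delta E = (H_a / 2)(p_ss + p_e0 + p_e1) C_gmin V_DD^2.
coeff = Fraction(1, 2) * (p["ss"] + p["e0"] + p["e1"])
print(counts, coeff)
```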

# **Design Methodology: Delay Modeling**

- Repeater modeling: Bakoglu and Meindl (1985) used a linearized repeater model consisting of a resistor and a capacitor that scale linearly with repeater size.
- We also include the drain diffusion capacitance.



- Switch S is controlled by a logic circuit that detects the switching pattern.
  - For the worst case, S is closed.
  - For the best case, S is open.
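A behavioral sketch of the rule that the control logic has to implement is shown below. It is an assumption on our part, read off the energy column of the switching table: with the smart driver only the opposite-direction patterns (group 5) see the full load $C_{w\_trad}$, so S is closed only for those. The function name `switch_s_closed` is hypothetical and says nothing about how the actual logic circuit is built.

```python
from itertools import product


def switch_s_closed(i_prev: int, i_next: int, j_prev: int, j_next: int) -> bool:
    """Hypothetical control rule for switch S on the repeater of wire i.

    S is closed (assistant driver active) only when wires i and j switch in
    opposite directions, i.e. the worst-case pattern. Otherwise S is open
    and only the main driver switches.
    """
    di = i_next - i_prev  # +1 rising, -1 falling, 0 quiet
    dj = j_next - j_prev
    return di != 0 and di == -dj


# Count how many of the 16 two-wire patterns close S.
events = [(0, 0), (0, 1), (1, 0), (1, 1)]
closed = sum(switch_s_closed(*a, *b) for a, b in product(events, repeat=2))
print(closed)
```

Note that the rule needs the next value of the neighbouring wire as well as its current value, so the detection logic must see the inputs of both adjacent repeaters.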




## **Design Methodology**

• Delay analysis with both drivers switching (for the worst-case switching pattern)



• The 50% delay for the wire is therefore:

$$T_{MA} = k \left\{ 0.7R_d \left( C_d + \frac{C_w}{k} + C_g \right) + 0.7 \frac{R_w}{k} C_g + 0.4 \frac{R_w}{k} \frac{C_w}{k} \right\}$$

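where

$$R_d = \frac{R_{dmin}}{H_m} \parallel \frac{R_{dmin}}{H_a} = \frac{R_{dmin}}{H_m + H_a}, \quad C_g = C_{gmin}(H_m + H_a), \quad C_d = C_{dmin}(H_m + H_a), \quad C_w = C_s + 2C_c$$

Here $H_m$ and $H_a$ are the sizes of the main and assistant drivers relative to a minimum-sized driver, $R_w$ and $C_w$ are the total wire resistance and capacitance, and $k$ is the number of repeaters.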


So, $T_{MA}$ can be simplified to

$$T_{MA} = 0.7k(t_{Dout} + t_{Din}) + \frac{0.7(t_{DWs} + 2t_{DWc})}{(H_m + H_a)} + 0.7t_{WD}(H_m + H_a) + 0.4\frac{(t_{Ws} + 2t_{Wc})}{k}$$

where

 $t_{Dout} = R_{dmin}C_{dmin}, t_{DWs} = R_{dmin}C_s, t_{DWc} = R_{dmin}C_c, t_{Din} = R_{dmin}C_{gmin}, t_{WD} = R_wC_{gmin}, t_{Ws} = R_wC_s \text{ and } t_{Wc} = R_wC_c$ 

Similarly, when the assistant driver is quiet (S open) and the switching pattern has delay-based switch factor $\lambda$,

$$T_{M} = 0.7k \left[ t_{Dout} \left( 1 + \frac{H_{a}}{H_{m}} \right) + t_{Din} \right] + 0.7H_{m}t_{WD} + \frac{0.7(t_{DWs} + \lambda t_{DWc})}{H_{m}} + 0.4\frac{t_{Ws} + \lambda t_{Wc}}{k}$$
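The two delay expressions translate directly into code, which is convenient for the sizing steps that follow. The sketch below implements $T_{MA}$ and $T_M$ as written above; the `TimeConstants` container and its values are hypothetical (arbitrary units) and serve only to exercise the functions. A useful sanity check is that with $H_a = 0$ and $\lambda = 2$ the two expressions coincide.

```python
from dataclasses import dataclass


@dataclass
class TimeConstants:
    """Time constants defined on the slide (same arbitrary time unit)."""
    t_dout: float  # R_dmin * C_dmin
    t_din: float   # R_dmin * C_gmin
    t_dws: float   # R_dmin * C_s
    t_dwc: float   # R_dmin * C_c
    t_wd: float    # R_w * C_gmin
    t_ws: float    # R_w * C_s
    t_wc: float    # R_w * C_c


def t_ma(tc: TimeConstants, k: float, h_m: float, h_a: float) -> float:
    """50% delay with main and assistant drivers switching (worst case, lambda = 2)."""
    h = h_m + h_a
    return (0.7 * k * (tc.t_dout + tc.t_din)
            + 0.7 * (tc.t_dws + 2 * tc.t_dwc) / h
            + 0.7 * tc.t_wd * h
            + 0.4 * (tc.t_ws + 2 * tc.t_wc) / k)


def t_m(tc: TimeConstants, k: float, h_m: float, h_a: float, lam: float) -> float:
    """50% delay with only the main driver switching, delay switch factor lam."""
    return (0.7 * k * (tc.t_dout * (1 + h_a / h_m) + tc.t_din)
            + 0.7 * h_m * tc.t_wd
            + 0.7 * (tc.t_dws + lam * tc.t_dwc) / h_m
            + 0.4 * (tc.t_ws + lam * tc.t_wc) / k)


# Hypothetical time constants, for illustration only.
tc = TimeConstants(t_dout=1.0, t_din=1.0, t_dws=50.0, t_dwc=100.0,
                   t_wd=0.01, t_ws=40.0, t_wc=80.0)
print(t_ma(tc, 4, 105, 68), t_m(tc, 4, 105, 68, lam=1))
```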



20 February, 2006

## **Optimum Buffer Sizes**

◆ We obtain the optimal $(H_m + H_a)$ and $k$ by differentiating $T_{MA}$ with respect to each and equating the derivatives to zero:

$$H_{m,opt} + H_{a,opt} = \sqrt{\frac{t_{DWs} + 2t_{DWc}}{t_{WD}}} \qquad \qquad k_{opt} = \sqrt{\frac{0.4(t_{Ws} + 2t_{Wc})}{0.7(t_{Dout} + t_{Din})}}$$
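The closed-form optimum can be checked against the delay expression itself. The sketch below computes $(H_m + H_a)_{opt}$ and $k_{opt}$ for hypothetical time constants and confirms that perturbing either value by 1% does not reduce $T_{MA}$; in an actual design $k$ would then be rounded to an integer number of repeaters.

```python
import math

# Hypothetical time constants (arbitrary units), for illustration only.
TC = dict(t_dout=1.0, t_din=1.0, t_dws=50.0, t_dwc=100.0,
          t_wd=0.01, t_ws=40.0, t_wc=80.0)


def t_ma(k, h, t_dout, t_din, t_dws, t_dwc, t_wd, t_ws, t_wc):
    """Worst-case delay T_MA as a function of k and total size h = H_m + H_a."""
    return (0.7 * k * (t_dout + t_din) + 0.7 * (t_dws + 2 * t_dwc) / h
            + 0.7 * t_wd * h + 0.4 * (t_ws + 2 * t_wc) / k)


def optimum(t_dout, t_din, t_dws, t_dwc, t_wd, t_ws, t_wc):
    """Closed-form (H_m + H_a)_opt and k_opt from dT_MA/dh = 0 and dT_MA/dk = 0."""
    h_opt = math.sqrt((t_dws + 2 * t_dwc) / t_wd)
    k_opt = math.sqrt(0.4 * (t_ws + 2 * t_wc) / (0.7 * (t_dout + t_din)))
    return h_opt, k_opt


h_opt, k_opt = optimum(**TC)
best = t_ma(k_opt, h_opt, **TC)
# Small perturbations of h or k around the optimum should not reduce the delay.
neighbours = [t_ma(k_opt * sk, h_opt * sh, **TC)
              for sk in (0.99, 1.0, 1.01) for sh in (0.99, 1.0, 1.01)]
print(round(h_opt, 1), round(k_opt, 2), min(neighbours) >= best)
```

Because $T_{MA}$ separates into a term in $k$ alone and a term in $H_m + H_a$ alone, each of the form $a x + b/x$, the two optima can be found independently.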

• To find $H_m$ and $H_a$ separately, $T_M$ is differentiated with respect to $H_m$ and the derivative is equated to zero:

$$\frac{\partial T_{M}}{\partial H_m} = 0.7\left[t_{WD} - \frac{k\,t_{Dout}H_a + t_{DWs} + \lambda t_{DWc}}{H_m^{2}}\right] = 0$$

• Substituting $H_a = H_t - H_m$ gives a quadratic in $H_m$. So, with
$$\mu = \frac{4H_t kt_{Dout}}{t_{WD}} + \left(\frac{kt_{Dout}}{t_{WD}}\right)^2$$
$$H_a = H_t + \frac{kt_{Dout}}{2t_{WD}} - \sqrt{\frac{\mu}{4} + \frac{t_{DWs} + \lambda t_{DWc}}{t_{WD}}}$$
$$H_m = \sqrt{\frac{\mu}{4} + \frac{t_{DWs} + \lambda t_{DWc}}{t_{WD}}} - \frac{kt_{Dout}}{2t_{WD}}$$
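The split between $H_m$ and $H_a$ can be evaluated and checked the same way. The sketch below computes both sizes from the closed-form expressions for $H_t = 173$ and $k = 4$ (the values in the results table) and a hypothetical switch factor and time constants, then verifies that they sum to $H_t$ and that $\partial T_M / \partial H_m$, taken with $H_a$ held fixed and evaluated at $H_a = H_t - H_m$, vanishes.

```python
import math


def split_sizes(h_t, k, lam, t_dout, t_dws, t_dwc, t_wd):
    """H_m and H_a from the closed-form expressions, for a fixed total size h_t."""
    mu = 4 * h_t * k * t_dout / t_wd + (k * t_dout / t_wd) ** 2
    root = math.sqrt(mu / 4 + (t_dws + lam * t_dwc) / t_wd)
    h_m = root - k * t_dout / (2 * t_wd)
    h_a = h_t + k * t_dout / (2 * t_wd) - root
    return h_m, h_a


def dtm_dhm(h_m, h_a, k, lam, t_dout, t_dws, t_dwc, t_wd):
    """Partial derivative of T_M with respect to H_m, holding H_a fixed."""
    return 0.7 * (t_wd - (k * t_dout * h_a + t_dws + lam * t_dwc) / h_m ** 2)


# Hypothetical parameters (arbitrary units); H_t = 173 and k = 4 as in the table.
params = dict(k=4, lam=1, t_dout=1.0, t_dws=50.0, t_dwc=100.0, t_wd=0.01)
h_m, h_a = split_sizes(173, **params)
print(round(h_m, 1), round(h_a, 1), dtm_dhm(h_m, h_a, **params))
```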

## **Delay Balancing – the key technique**

- SMART driver saves energy by reducing the capacitive load for certain switching combinations.
- The driver is intentionally slower for the switching combinations that present a lower capacitive load, which reduces jitter.

[Figure panel: with low $C_{eff}$]

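• Evaluate $H_a$ such that $T_{MA} - T_M = 0$ at a fixed total size $H_t = H_m + H_a$; this gives a quadratic in the delay-balanced assistant size $H_{aDB}$: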

$$AH_{aDB}^2 + BH_{aDB} + C = 0$$

$$A = 0.7t_{WD}$$

$$B = 0.7\left[kt_{Dout} - t_{WD}H_t + \frac{t_{DWs} + 2t_{DWc}}{H_t}\right] + \frac{0.4(2-\lambda)t_{Wc}}{k}$$

$$C = -(2-\lambda)\left[0.7t_{DWc} + \frac{0.4H_t t_{Wc}}{k}\right]$$
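The delay-balanced size can be computed by solving this quadratic and checked by substituting the root back into both delay expressions. In the sketch below the coefficients are obtained by multiplying $T_{MA} - T_M = 0$ by $H_m$ and substituting $H_m = H_t - H_a$; because $A > 0$ and $C < 0$ for $\lambda < 2$, the roots have opposite signs and the positive one is taken. The time constants and the function name `balanced_assistant_size` are hypothetical.

```python
import math

# Hypothetical time constants (arbitrary units), for illustration only.
TC = dict(t_dout=1.0, t_din=1.0, t_dws=50.0, t_dwc=100.0,
          t_wd=0.01, t_ws=40.0, t_wc=80.0)


def t_ma(k, h_t, tc):
    """Worst-case delay with both drivers switching (total size h_t, lambda = 2)."""
    return (0.7 * k * (tc["t_dout"] + tc["t_din"])
            + 0.7 * (tc["t_dws"] + 2 * tc["t_dwc"]) / h_t
            + 0.7 * tc["t_wd"] * h_t + 0.4 * (tc["t_ws"] + 2 * tc["t_wc"]) / k)


def t_m(k, h_m, h_a, lam, tc):
    """Delay with only the main driver switching, delay switch factor lam."""
    return (0.7 * k * (tc["t_dout"] * (1 + h_a / h_m) + tc["t_din"])
            + 0.7 * h_m * tc["t_wd"]
            + 0.7 * (tc["t_dws"] + lam * tc["t_dwc"]) / h_m
            + 0.4 * (tc["t_ws"] + lam * tc["t_wc"]) / k)


def balanced_assistant_size(k, h_t, lam, tc):
    """Positive root of A*H_a^2 + B*H_a + C = 0, i.e. T_MA = T_M at fixed h_t."""
    a = 0.7 * tc["t_wd"]
    b = (0.7 * (k * tc["t_dout"] - tc["t_wd"] * h_t
                + (tc["t_dws"] + 2 * tc["t_dwc"]) / h_t)
         + 0.4 * (2 - lam) * tc["t_wc"] / k)
    c = -(2 - lam) * (0.7 * tc["t_dwc"] + 0.4 * h_t * tc["t_wc"] / k)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


K, H_T, LAM = 4, 173, 1
h_a = balanced_assistant_size(K, H_T, LAM, TC)
h_m = H_T - h_a
print(round(h_a, 1), t_ma(K, H_T, TC), t_m(K, h_m, h_a, LAM, TC))
```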

[Figure panel: with high $C_{eff}$]

| Case           | $k$ | $H_t$ | $H_m$ | $H_a$ | $\Delta T$ (jitter reduction) | $\Delta E$ (energy saving) |
|----------------|-----|-------|-------|-------|-------------------------------|----------------------------|
| Strategy One   | 4   | 173   | 105   | 68    | 3.1%                          | 10.9%                      |
| Strategy Two   | 4   | 173   | 142   | 31    | -1.6%                         | 5.0%                       |
| Delay Balanced | 4   | 173   | 75    | 97    | 18.7%                         | 15.6%                      |
| Traditional    | 4   | 173   | N.A.  | N.A.  | N.A.                          | N.A.                       |

Theoretically, the delay-balanced smart driver saves 15% of the energy and reduces jitter by 18%.

## **Propagation Delay for Different Patterns**



## **Propagation Delay with Delay Balancing**



## **In progress - Circuit Level Implementation**



#### Thank you!