# Delay Study of Virtex-2, Virtex-4 and Spartan-3E Based Truncated Multipliers

## Mohammed H. Al Mijalli

King Saud University, College of Applied Medical Sciences, Biomedical Technology Department, Riyadh 11433, Saudi Arabia

#### Summary

Medical imaging technology requires efficient real-time algorithms. The aim of this paper is to present a delay study of Field Programmable Gate Array (FPGA) based truncated multipliers implemented on Spartan-3E, Virtex-2 and Virtex-4 FPGAs using the Very high speed integrated circuit Hardware Description Language (VHDL). The delays were analyzed with the Statistical Package for Social Science (SPSS) software: a one-way analysis of variance (ANOVA) followed by post hoc Tukey's test at a .05 significance level was used to compare the FPGA devices. Multiple comparison tests revealed significant differences between the FPGA devices at the 95% confidence level. In all three FPGA devices, the mean latency increases as the size of the truncated multiplier increases.

#### Key words:

Digital Signal Processing (DSP), Field Programmable Gate Array (FPGA), Spartan-3E, Truncated Multiplier, Virtex-2, Virtex-4, VHDL.

## 1. Introduction

Efficient realization of multiplication on Field Programmable Gate Arrays (FPGAs) can be attributed to digital signal processing (DSP) blocks, including small embedded multipliers, which suit specific applications in terms of speed, power dissipation and area.

A full-width digital $n \times n$ multiplier computes the $2n$-bit output as a weighted sum of partial products [1]. The contribution of the least-significant columns of the product matrix to the final result becomes negligible if the product is truncated to $n$ bits. Truncated multipliers therefore do not form all of the least-significant columns in the partial-product matrix. As more columns are eliminated, the area and power consumption of the arithmetic unit are significantly reduced, and the delay also decreases.

The disadvantage of truncated multipliers is that the unformed columns of partial-product bits introduce errors into the computation [2-14].

In return, truncation reduces the area and power consumption of the multiplier [15].
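As a minimal illustration of the column-dropping idea described above, the following Python sketch models an unsigned $n \times n$ multiplier that skips every partial-product bit in the `dropped_columns` least-significant columns. The function name, the unsigned operands and the absence of any error-compensation term are assumptions made for illustration; the cited designs may add correction terms that this sketch omits.

```python
def truncated_multiply(a: int, b: int, n: int, dropped_columns: int) -> int:
    """Sum the partial products of an unsigned n x n multiply,
    skipping every bit that falls in a column below dropped_columns."""
    total = 0
    for i in range(n):                  # bit position in operand a
        for j in range(n):              # bit position in operand b
            column = i + j              # this partial-product bit has weight 2**column
            if column < dropped_columns:
                continue                # bit is never formed in a truncated multiplier
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            total += bit << column
    return total


a, b, n = 13, 11, 4
full = truncated_multiply(a, b, n, 0)   # no columns dropped: exact product
trunc = truncated_multiply(a, b, n, n)  # drop the n least-significant columns
print(full, trunc, full - trunc)        # 143 112 31
```

The difference between the two results is the truncation error caused by the unformed columns; it is never negative in this unsigned model because only non-negative terms are omitted.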

An earlier latency study by Al Mijalli [16] showed that, for truncated multipliers based on the Spartan-3AN FPGA, the mean delay time increases as the size of the truncated multiplier increases.

The purpose of this study is to present a statistical evaluation of truncated-multiplier delays in Spartan-3E, Virtex-2 and Virtex-4 FPGA devices using one-way analysis of variance (ANOVA) and post hoc Tukey's test in the Statistical Package for Social Science (SPSS) software.

## 2. Architecture platform

FPGAs are an ideal platform for realizing highly computational and extremely parallel architectures. This section briefly introduces the Spartan-3, Virtex-2 and Virtex-4 FPGAs from Xilinx.

#### 2.1. Spartan-3 FPGAs

The Spartan-3 FPGA family [17] is specifically designed to meet the needs of high-volume, low-unit-cost electronic systems. The Spartan-3 family consists of eight members offering densities ranging from 50,000 to five million system gates. It comprises five fundamental programmable functional elements: configurable logic blocks (CLBs), input/output blocks (IOBs), Block RAMs, dedicated 18×18 multipliers and digital clock managers (DCMs). The Spartan-3 FPGA family includes the Spartan-3L, Spartan-3E, Spartan-3A, Spartan-3A DSP, Spartan-3AN and extended Spartan-3A FPGAs. The Spartan-3E is used as the target technology in this paper.

#### 2.2. Virtex-2 FPGAs

The Virtex-2 FPGA family [18] is a platform developed for high performance from low-density to high-density designs that are based on IP cores and customized modules. Combining a wide variety of flexible features and a large range of densities up to 10 million system gates, the Virtex-2 family enhances programmable logic design capabilities. The Virtex-2 family comprises 11 members, ranging from 40K to 8M system gates.

The Virtex-2 FPGA family consists of four major elements: Configurable Logic Blocks (CLBs), Block SelectRAM, 18×18-bit dedicated multipliers and Digital Clock Managers (DCMs). The Virtex-2 architecture is optimized for high speed with low power consumption.

Manuscript received July 5, 2011

Manuscript revised July 20, 2011

#### 2.3. Virtex-4 FPGAs

The Virtex-4 FPGA [19] consists of three platform families, i.e., LX, SX and FX. Virtex-4 hard-IP core blocks include the IBM PowerPC (PPC) 405 32-bit reduced instruction set computer (RISC) processor, tri-mode Ethernet media access controllers (MACs), 622 Mbps to 6.5 Gbps serial transceivers, dedicated DSP slices and high-speed clock management circuitry. Virtex-4 devices consume approximately 50% of the power of the respective Virtex-2 Pro devices, owing to static and dynamic power reduction enabled by triple-oxide technology and by reduced core voltage and capacitance, respectively. The Virtex-4 FPGA family comprises CLBs, Block RAMs, XtremeDSP slices and DCMs.

## 3. FPGA design and implementation results

The truncated $4\times4$, $6\times6$, $8\times8$, and $12\times12$-bit multipliers are designed in VHDL and implemented in Xilinx Spartan-3E XC3S100E (package: vq100, speed grade: -5), Virtex-2 XC2V40 (package: fg256, speed grade: -6) and Virtex-4 XC4VLX40 (package: ff668, speed grade: -12) FPGAs using the Xilinx ISE 9.2i design tool [20].

A one-way ANOVA is applied to find out the effect of different multipliers on the mean delay time for Spartan-3E, Virtex-2 and Virtex-4 FPGAs.

Table 1 summarizes the latency statistics of the truncated multipliers for the Spartan-3E, Virtex-2 and Virtex-4 FPGAs. The mean values for the $4\times4$, $6\times6$, $8\times8$ and $12\times12$ multipliers in Virtex-2 and Virtex-4 are close to each other, whereas those for the Spartan-3E FPGA are higher.

In all three FPGA devices, the mean latency increases as the size of the truncated multiplier increases. A similar result was obtained in [16], where the mean delay time also increased with the size of the multiplier.

Figure 1 shows the average value of mean delay time for the $4 \times 4$, $6 \times 6$, $8 \times 8$ and $12 \times 12$ truncated multipliers for the three devices. The average mean delay time for Virtex-2 is lower than that of the Spartan-3E and Virtex-4 FPGAs. Table 2 shows the results of the Tukey HSD multiple comparisons for the Spartan-3E, Virtex-2 and Virtex-4 FPGAs. ANOVA is applied to compare the mean delay time of the three devices using four multipliers: $4\times4$, $6\times6$, $8\times8$, and $12\times12$. There is a statistically significant difference at the 0.05 level in delay time among the three devices [F (2, 57) = 7.302, p = .002]. Post hoc Tukey HSD multiple comparison tests at the 0.05 significance level indicate that the mean delay time for Spartan-3E (Mean = 13.2, Standard Deviation = 3.5) is significantly different from that of the other two devices, Virtex-4 (Mean = 10.76, Standard Deviation = 1.78) and Virtex-2 (Mean = 10.29, Standard Deviation = 2.26). However, there is no statistically significant difference between the mean delay times of the Virtex-2 and Virtex-4 FPGAs.
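The one-way ANOVA F statistic reported above is the ratio of the between-group mean square to the within-group mean square. A minimal Python sketch of that computation follows; the function name and the toy samples are hypothetical and are not the paper's measurements, and the p-value that SPSS reports is omitted here.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of each observation around its own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data only (hypothetical values, not delays from this study)
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]]
print(one_way_anova(toy))  # (3.0, 2, 6)
```

In this formula, the degrees of freedom (2, 57) quoted in the text correspond to k = 3 devices and N = 60 delay samples in total.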

Table 1: Statistics of latency in truncated multipliers for FPGAs

| FPGA<br>Devices | Bit Width | Mean<br>(ns) | Std. Deviation<br>(ns) | Std. Error<br>(ns) |
|-----------------|-----------|--------------|------------------------|--------------------|
| Spartan-3E      | 4×4       | 8.84         | 0.1140                 | 0.0510             |
|                 | 6×6       | 11.76        | 1.0877                 | 0.4864             |
|                 | 8×8       | 14.08        | 0.5541                 | 0.2478             |
|                 | 12×12     | 18.12        | 0.4764                 | 0.2131             |
| Virtex-2        | 4×4       | 8.12         | 0.4764                 | 0.2131             |
|                 | 6×6       | 8.92         | 0.3899                 | 0.1744             |
|                 | 8×8       | 10.04        | 1.4029                 | 0.6274             |
|                 | 12×12     | 13.42        | 0.4550                 | 0.2035             |
| Virtex-4        | 4×4       | 8.62         | 0.4604                 | 0.2059             |
|                 | 6×6       | 10.36        | 0.4393                 | 0.1965             |
|                 | 8×8       | 10.76        | 0.5899                 | 0.2638             |
|                 | 12×12     | 13.3         | 0.5568                 | 0.2490             |
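
The last numeric column of Table 1 is numerically consistent with the standard error of the mean, $SE = SD/\sqrt{n}$, for $n = 5$ samples per configuration. The sample count is inferred from the tabulated values and is not stated in the text, so the Python check below should be read as a sketch under that assumption.

```python
import math

# (device, bit width, std. deviation in ns, last-column value in ns), copied from Table 1
ROWS = [
    ("Spartan-3E", "4x4",   0.1140, 0.0510),
    ("Spartan-3E", "6x6",   1.0877, 0.4864),
    ("Spartan-3E", "8x8",   0.5541, 0.2478),
    ("Spartan-3E", "12x12", 0.4764, 0.2131),
    ("Virtex-2",   "4x4",   0.4764, 0.2131),
    ("Virtex-2",   "6x6",   0.3899, 0.1744),
    ("Virtex-2",   "8x8",   1.4029, 0.6274),
    ("Virtex-2",   "12x12", 0.4550, 0.2035),
    ("Virtex-4",   "4x4",   0.4604, 0.2059),
    ("Virtex-4",   "6x6",   0.4393, 0.1965),
    ("Virtex-4",   "8x8",   0.5899, 0.2638),
    ("Virtex-4",   "12x12", 0.5568, 0.2490),
]
ASSUMED_N = 5  # assumption: inferred from the numbers, not stated in the paper


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample standard deviation sd."""
    return sd / math.sqrt(n)


# Largest gap between SD / sqrt(n) and the tabulated value over all rows
max_gap = max(abs(standard_error(sd, ASSUMED_N) - tabulated)
              for _, _, sd, tabulated in ROWS)
print(f"largest mismatch: {max_gap:.5f} ns")
```

Every row agrees to within the four-decimal rounding of the table, which is also consistent with 3 devices × 4 bit widths × 5 samples = 60 observations and the 57 within-group degrees of freedom quoted for the ANOVA.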

## 4. Conclusion

In this paper we have presented a comparative study of latency in Spartan-3E, Virtex-2 and Virtex-4 FPGA devices using statistical analysis. The truncated $4\times4$, $6\times6$, $8\times8$, and $12\times12$-bit multipliers were designed in VHDL and implemented in Xilinx FPGA devices.

The one-way ANOVA method followed by post hoc Tukey's test in the SPSS software, at a .05 significance level, was used to compare the FPGA devices. Multiple comparison tests revealed that the mean delay of the Spartan-3E differs significantly from those of the Virtex-2 and Virtex-4 at the 95% confidence level, whereas the difference between Virtex-2 and Virtex-4 is not significant. In all three FPGA devices, the mean latency increases as the size of the truncated multiplier increases.

#### Acknowledgement

The author acknowledges the assistance and the financial support provided by the Cornea Research Chair, College of Applied Medical Sciences, King Saud University.



Fig. 1: The average of mean delay time for 4×4, 6×6, 8×8 and 12×12 truncated multipliers for Spartan-3E, Virtex-2 and Virtex-4 FPGA devices

| (I)<br>Devices | (J)<br>Devices | Mean Difference<br>(I-J) | Std. Error | Sig. | 95% Confidence Interval<br>Lower Bound | 95% Confidence Interval<br>Upper Bound |
|----------------|----------------|--------------------------|------------|------|----------------------------------------|----------------------------------------|
| Spartan-3E     | Virtex-2       | 2.90500*                 | .83216     | .003 | .9025                                  | 4.9075                                 |
|                | Virtex-4       | 2.44000*                 | .83216     | .013 | .4375                                  | 4.4425                                 |
| Virtex-2       | Spartan-3E     | -2.90500*                | .83216     | .003 | -4.9075                                | -.9025                                 |
|                | Virtex-4       | -.46500                  | .83216     | .842 | -2.4675                                | 1.5375                                 |
| Virtex-4       | Spartan-3E     | -2.44000*                | .83216     | .013 | -4.4425                                | -.4375                                 |
|                | Virtex-2       | .46500                   | .83216     | .842 | -1.5375                                | 2.4675                                 |

Table 2: The results of multiple comparisons of delay time (ns) for four multipliers using the Tukey's HSD post-hoc test

\* The mean difference is significant at the 0.05 level.
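
Each 95% confidence interval in Table 2 is centred on its mean difference, and a pair is flagged as significant exactly when the interval excludes zero. The Python sketch below checks this for the three distinct device pairs; the variable names are illustrative, and the half-width it prints is derived from the tabulated bounds rather than recomputed from the studentized range distribution.

```python
# (pair, mean difference, lower bound, upper bound, Sig.), copied from Table 2
PAIRS = [
    ("Spartan-3E vs Virtex-2", 2.905, 0.9025, 4.9075, 0.003),
    ("Spartan-3E vs Virtex-4", 2.440, 0.4375, 4.4425, 0.013),
    ("Virtex-4 vs Virtex-2",   0.465, -1.5375, 2.4675, 0.842),
]
ALPHA = 0.05


def excludes_zero(lower: float, upper: float) -> bool:
    """True when the confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


for name, diff, lower, upper, sig in PAIRS:
    half_width = (upper - lower) / 2
    centred = abs((lower + upper) / 2 - diff) < 1e-9
    print(f"{name}: half-width {half_width:.4f} ns, centred on difference: {centred}, "
          f"CI excludes 0: {excludes_zero(lower, upper)}, Sig. < {ALPHA}: {sig < ALPHA}")
```

All three intervals share the same half-width because the standard error is identical for every pair, so significance depends only on the size of the mean difference.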

# References

- [1] C.R. Baugh and B.A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm", IEEE Transactions on Computers, 1973, Vol. C-22, No. 12, pp. 1045-1047.
- [2] J.M. Jou, S.R. Kuang and R.D. Chen, "Design of Low-Error Fixed-Width Multipliers for DSP Applications", IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, 1999, Vol. 46, No. 6, pp. 836-842.
- [3] S.S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-Efficient Multipliers for Digital Signal Processing Applications", IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, 1996, Vol. 43, No. 2, pp. 90-95.
- [4] E. J. King and E. E. Swartzlander, Jr., "Data-Dependent Truncation Scheme for Parallel Multipliers", In Proc. 31<sup>st</sup> Asilomar Conference on Signals, Systems, and Computers, 1997, Vol. 2, pp. 1178-1182.

- [5] S.R. Kuang and J.P. Wang, "Low-error configurable truncated multipliers for multiply-accumulate applications", Electronics Letters, 2006, Vol. 42, No. 16, pp. 904-905.
- [6] Y.C. Lim, "Single-Precision Multiplier with Reduced Circuit Complexity for Signal Processing Applications", IEEE Transactions on Computers, 1992, Vol. 41, No. 10, pp. 1333-1336.
- [7] R. Michard, A. Tisserand, and N. Veyrat-Charvillon, "Carry Prediction and Selection for Truncated Multiplication", In Proc. Workshop on Signal Processing Systems, 2006, pp. 339-344.
- [8] M. H. Rais, "FPGA design and implementation of fixed width standard and truncated 6×6-bit multipliers: A comparative study", in Proc. of the 4<sup>th</sup> IEEE International Design and Test Workshop, IEEE Xplore Press, 2009, pp. 1-4.
- [9] M.H. Rais, "Efficient hardware realization of truncated multipliers using FPGA", International Journal of Engineering and Applied Sciences, 2009, Vol. 5, No. 2, pp. 124-128.

- [10] M.H. Rais, "Hardware design and implementation of fixed width standard and truncated 4×4, 6×6, 8×8 and 12×12 bit multipliers using FPGA", In Proc. AIP conference, 2010, Vol. 1239, No. 1, pp. 192-196.
- [11] M.H. Rais, "Hardware implementation of truncated multipliers using Spartan-3AN, Virtex-4 and Virtex-5 devices", American Journal of Engineering and Applied Sciences, 2010, Vol. 3, No. 1, pp. 201-206.
- [12] M.H. Rais, B. M. Al-Harthi, S. I. Al-Askar, and F. K. Al-Hussein, "Design and Field Programmable Gate Array Implementation of Basic Building Blocks for Power-Efficient Baugh-Wooley Multipliers", American Journal of Engineering and Applied Sciences, 2010, Vol. 3, No. 2, pp. 307-311.
- [13] M.H. Rais and M.H. Al Mijalli, "FPGA based fixed width 4×4, 6×6, 8×8, and 12×12 bit multipliers using Spartan-3AN", International Journal of Computer Science and Network Security, 2011, Vol.11, No. 2, pp. 61-68.
- [14] M.H. Rais and M.H. Al Mijalli, "Field programmable gate arrays based realization of truncated multipliers", American Journal of Applied Sciences, 2011, Vol.8, No. 7, pp. 681-684.
- [15] V. Garofalo, N. Petra, D. DeCaro, A.G.M. Strollo, and E. Napoli, "Low error truncated multipliers for DSP applications", in Proc. of the 15<sup>th</sup> IEEE International Conference on Electronics, Circuits and Systems, 2008, pp. 29-32.
- [16] M.H. Al Mijalli, "Spartan-3AN field programmable gate arrays truncated multipliers delay study", American Journal of Applied Sciences, 2011, Vol. 8, No. 6, pp. 554-557.
- [17] Xilinx, Spartan-3 FPGA family datasheet, 2008.
- [18] Xilinx, Virtex-2 FPGA family overview, 2007.
- [19] Xilinx, Virtex-4 FPGA family overview, 2007.
- [20] Xilinx, ISE 9.2i design tool, 2007.

**Mohammed H. Al Mijalli** received the Ph.D. degree in Bioengineering from Strathclyde University, Glasgow, UK. He is an Associate Professor at the Department of Biomedical Technology, King Saud University, Riyadh, Saudi Arabia. His major interests include Biomedical Instrumentation, FPGA and Medical Imaging.