# Virtex-5 FPGA Based Braun's Multipliers

Muhammad H. Rais<sup>1</sup> and Mohammed H. Al Mijalli<sup>2</sup>

<sup>1</sup>King Saud University, College of Applied Medical Sciences, Cornea Research Chair, Riyadh 11433, Saudi Arabia

<sup>2</sup>King Saud University, College of Applied Medical Sciences, Biomedical Technology Department, Riyadh 11433, Saudi Arabia

#### **Summary**

Fast Fourier transform (FFT) and finite impulse response (FIR) filtering are examples of digital signal processing (DSP) applications that require high execution speed. Parallel array multipliers implemented on field programmable gate arrays (FPGAs) can meet this speed requirement. This paper reports the Virtex-5 FPGA resource utilization of 4×4, 6×6, 8×8 and 12×12 bit Braun's multipliers. A one-way analysis of variance (ANOVA) followed by a post hoc Tukey's test was performed with the Statistical Package for the Social Sciences (SPSS). Its purpose was to determine whether delay time differs significantly among the 4×4, 6×6, 8×8 and 12×12 bit Braun's multipliers. The ANOVA and Tukey HSD multiple comparisons at the .05 significance level show that the mean delay times of all four Braun's multipliers implemented on Virtex-5 differ significantly from one another. The mean delay time, standard deviation and standard error of the mean are also presented.

#### Key words:

Braun's multipliers, Digital Signal Processing (DSP), Field Programmable Gate Array (FPGA), Virtex-5, VHDL.

## 1. Introduction

Real-time medical imaging technologies demand high execution speed because they involve intensive multiplication. These technologies include computed tomography (CT), magnetic resonance (MR) imaging, ultrasound imaging and capsule endoscopy. Real-time imaging also demands hardware that can be reconfigured as well as implemented. Digital signal processing (DSP) algorithms can be implemented in application specific integrated circuits (ASICs). However, ASICs are costly, and the algorithms must be verified and optimized before realization.

Contemporary field programmable gate arrays (FPGAs) have emerged as a platform for the efficient hardware implementation of such complex, computation-intensive algorithms. Many researchers have reported low-power multiplier designs [1-11]. The purpose of this paper is to present 4×4, 6×6, 8×8 and 12×12 bit Braun's multipliers [12] and their resource utilization on a Virtex-5 FPGA. A one-way analysis of variance (ANOVA) and a post hoc Tukey's test were performed with the Statistical Package for the Social Sciences (SPSS). They were used to determine whether delay time differs significantly among the 4×4, 6×6, 8×8 and 12×12 bit Braun's multipliers.

## 2. Architecture platform

An FPGA is an integrated circuit, similar to an ASIC, that is designed to be configured by the customer or designer after manufacturing. The FPGA configuration is specified using a hardware description language (HDL). FPGAs can be used to implement any logic function that an ASIC can perform. FPGAs can be reconfigured after shipping and cost less than an ASIC design, which makes them ideal candidates for many applications.

An FPGA contains programmable logic components, called logic blocks, and a hierarchy of reconfigurable interconnects that allow the blocks to be wired together. In FPGAs, the configurable logic blocks (CLBs) contain the programmable logic. Each CLB contains RAM, in the form of look-up tables, for creating combinational logic functions. CLBs also contain memory elements and multiplexers. The memory elements, such as flip-flops, serve as clocked storage. The multiplexers route logic within the block and to and from external resources. FPGAs originally began as competitors to complex programmable logic devices (CPLDs) and competed in a similar space, that of glue logic for printed circuit boards.

The inherent parallelism of the logic resources on an FPGA allows considerable computational throughput even at low clock rates. The flexibility of the FPGA allows even higher performance by trading off precision and range in the number format for an increased number of parallel arithmetic units. This has driven a new type of processing, called reconfigurable computing, in which time-intensive tasks are offloaded from software to FPGAs. FPGAs combine the speed of hardware with the flexibility of software. Three main factors play an important role in FPGA-based design:

- the targeted FPGA architecture;
- the electronic design automation (EDA) tools;
- the design techniques employed at the algorithmic level using HDL.

In FPGAs, the choice of the optimum multiplier involves three key factors: area, propagation delay and reconfiguration time [13]. This section briefly introduces the Virtex-5 FPGA from Xilinx.

### 2.1. Virtex-5 FPGAs

The Virtex-5 devices [14] are a programmable alternative to custom ASIC technology. The Virtex-5 LX platform also contains many hard-IP system-level blocks, including:

- block RAM/first-in first-out (FIFO) memory;
- second-generation 25×18 DSP slices;
- SelectIO technology with built-in digitally controlled impedance;
- ChipSync source-synchronous interface blocks;
- enhanced clock management tiles with integrated digital clock manager (DCM) and phase-locked loop (PLL) clock generators;
- advanced configuration options.

The advanced DSP48E slices available in Virtex-5 FPGAs help accelerate computation-intensive DSP and image processing algorithms. These slices can operate at a maximum frequency of 550 MHz and draw only 1.38 mW of power at 100 MHz.

### 2.2. Braun's multiplier

Braun's multiplier is an $n \times m$ bit parallel array multiplier, generally known as a carry-save multiplier. It is constructed with $m \times (n-1)$ adders and $m \times n$ AND gates. Braun's multiplier suffers from glitching, which is caused by the ripple-carry adder in the last stage of the multiplier.
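The following Python sketch models an unsigned Braun array at the bit level to make the structure concrete. Here $Y$ is taken as the $m$-bit operand and $X$ as the $n$-bit operand. The model has three parts:

- $m \times n$ AND gates form the partial products.
- $m-1$ carry-save rows of $n-1$ full adders each pass their carries to the next row instead of propagating them.
- A final $(n-1)$-bit ripple-carry adder merges the remaining sum and carry bits.

The function names (`full_adder`, `braun_multiply`) and the row orientation are illustrative assumptions, not the VHDL design used in this paper. The model also counts gates, so the $m \times (n-1)$ adder and $m \times n$ AND-gate totals can be checked.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def braun_multiply(y, x, m, n):
    """Bit-level sketch of an unsigned m-by-n Braun (carry-save array) multiplier.

    Returns (product, number_of_full_adders, number_of_and_gates).
    """
    y_bits = [(y >> i) & 1 for i in range(m)]
    x_bits = [(x >> k) & 1 for k in range(n)]
    # m*n AND gates: pp[i][k] = Y_i AND X_k, with weight 2**(i + k)
    pp = [[y_bits[i] & x_bits[k] for k in range(n)] for i in range(m)]
    and_gates = m * n
    adders = 0
    p = [0] * (m + n)

    # Row 0 needs no adders; s[k] has weight 2**(i + k), c[k] has weight 2**(i + k + 1)
    s = pp[0][:]
    c = [0] * n
    p[0] = s[0]

    # Carry-save rows 1..m-1: n-1 full adders each, carries go to the next row
    for i in range(1, m):
        new_s, new_c = [0] * n, [0] * n
        for k in range(n - 1):
            new_s[k], new_c[k] = full_adder(pp[i][k], s[k + 1], c[k])
            adders += 1
        new_s[n - 1] = pp[i][n - 1]  # top partial-product bit passes straight through
        s, c = new_s, new_c
        p[i] = s[0]

    # Last stage: (n-1)-bit ripple-carry adder merges the sum and carry vectors
    carry = 0
    for k in range(n - 1):
        p[m + k], carry = full_adder(s[k + 1], c[k], carry)
        adders += 1
    p[m + n - 1] = carry

    product = sum(bit << pos for pos, bit in enumerate(p))
    return product, adders, and_gates
```

For example, `braun_multiply(7, 7, 3, 3)` returns `(49, 6, 9)`: six full adders ($3 \times 2$) and nine AND gates. The carry that ripples through the final stage is the only carry propagation in the array.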

#### 2.2.1. Mathematical basis of Braun's multiplier

Consider a generic $m$ by $n$ multiplication of an unsigned $m$-bit number $Y = Y_{m-1} \dots Y_0$ and an unsigned $n$-bit number $X = X_{n-1} \dots X_0$:

$$Y = \sum_{i=0}^{m-1} Y_i 2^i$$

$$X = \sum_{i=0}^{n-1} X_i 2^i$$

The product $P = P_{m+n-1} \dots P_1 P_0$, which results from multiplying the multiplicand $Y$ by the multiplier $X$, can be written as follows:

$$P = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} (Y_i \cdot X_j) 2^{i+j}$$
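Each term $Y_i \cdot X_j$ is the output of a single AND gate weighted by $2^{i+j}$, so the double sum can be checked exhaustively in a few lines of Python. The function name `partial_product_sum` is hypothetical. This is a sketch of the equation only, not of the VHDL design used in this paper.

```python
def partial_product_sum(y, x, m, n):
    """Evaluate P = sum_i sum_j (Y_i AND X_j) * 2**(i + j) for unsigned Y (m bits), X (n bits)."""
    total = 0
    for i in range(m):
        y_i = (y >> i) & 1
        for j in range(n):
            x_j = (x >> j) & 1
            total += (y_i & x_j) << (i + j)  # AND-gate output shifted to weight 2**(i + j)
    return total


# Exhaustive check for the 4x4 case: the sum equals Y*X and fits in m + n = 8 bits.
for y in range(16):
    for x in range(16):
        p = partial_product_sum(y, x, 4, 4)
        assert p == y * x and p < 2 ** 8
```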

## 3. FPGA design and implementation results

The standard 4×4, 6×6, 8×8, and 12×12-bit Braun's multipliers were designed in VHDL. They were implemented on a Xilinx Virtex-5 XC5VLX50 FPGA (package: ff676, speed grade: -3) using the Xilinx ISE 9.2i design tool [15]. A one-way ANOVA is applied to determine the effect of multiplier size on the mean delay time for the Virtex-5 FPGA device.

Table 1 lists the mean delay time, standard deviation and standard error of the mean for the Braun's multipliers on the Virtex-5 FPGA. Fig. 1 shows the mean delay time of the four Braun's multipliers and clearly indicates that the mean delay time increases with multiplier size. The same trend has been reported for truncated multipliers [16-18].

Table 2 summarizes the Virtex-5 FPGA resource utilization for the standard 4×4, 6×6, 8×8, and 12×12-bit Braun's multipliers.

A one-way ANOVA on the Virtex-5 FPGA device was carried out in SPSS using the 4×4, 6×6, 8×8, and 12×12 multipliers. Table 3 shows the post hoc comparisons that follow it. There is a statistically significant difference in delay time among the multipliers at the .05 level [F(3, 16) = 730.622, p < .001].

The mean delay times were then compared pairwise with post hoc Tukey HSD multiple comparison tests at the .05 significance level. The mean delay time of the 4×4 multiplier (Mean = 6.58 ns, Standard Deviation = 0.26 ns) differs significantly from each of the other three:

- 6×6 multiplier: Mean = 8.14 ns, Standard Deviation = 0.29 ns;
- 8×8 multiplier: Mean = 8.60 ns, Standard Deviation = 0.16 ns;
- 12×12 multiplier: Mean = 12.78 ns, Standard Deviation = 0.13 ns.

Table 3 further shows that every other pair of multipliers also differs significantly. The smallest difference is between the 6×6 and 8×8 multipliers (0.46 ns, p = .020). Hence, the ANOVA and Tukey HSD multiple comparisons at the .05 significance level indicate that the mean delay times of all four Braun's multipliers implemented on Virtex-5 differ significantly from one another.
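The F statistic can be approximately reproduced from the summary statistics in Table 1 alone. The Python sketch below assumes five delay measurements per multiplier. This sample size is an inference: with four groups, the reported error degrees of freedom of 16 implies $20 - 4 = 16$, i.e. $n = 5$ per group. The function name is hypothetical, and this is not how SPSS computes the test internally.

```python
def one_way_anova_from_summary(means, sds, n):
    """One-way ANOVA F statistic from group means and SDs, assuming equal group size n."""
    k = len(means)
    grand_mean = sum(means) / k  # simple average is valid because all groups have size n
    ss_between = n * sum((mu - grand_mean) ** 2 for mu in means)
    ss_within = (n - 1) * sum(sd ** 2 for sd in sds)
    df_between = k - 1
    df_within = k * (n - 1)
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


# Table 1 values (ns) for the 4x4, 6x6, 8x8 and 12x12 multipliers
means = [6.58, 8.14, 8.60, 12.78]
sds = [0.2588, 0.2881, 0.1581, 0.1304]
f_stat, df_b, df_w, ms_w = one_way_anova_from_summary(means, sds, n=5)
print(f"F({df_b}, {df_w}) = {f_stat:.1f}, MS_within = {ms_w:.6f} ns^2")
```

This prints F(3, 16) = 730.7. The small gap from the SPSS value of 730.622 comes from the rounded standard deviations in Table 1.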

Table 1: Statistics of mean delay time in Braun's multipliers for Virtex-5 FPGA

| FPGA Device | Bit<br>Width | Mean<br>(ns) | Std.<br>Deviation<br>(ns) | Std. Error of<br>Mean (ns) |
|-------------|--------------|--------------|---------------------------|----------------------------|
| Virtex-5    | 4×4          | 6.58         | 0.2588                    | 0.1158                     |
|             | 6×6          | 8.14         | 0.2881                    | 0.1288                     |
|             | 8×8          | 8.60         | 0.1581                    | 0.0707                     |
|             | 12×12        | 12.78        | 0.1304                    | 0.0583                     |
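The last column of Table 1 follows from the standard deviation through $\mathrm{SE} = \mathrm{SD}/\sqrt{n}$. The short Python sketch below recomputes it under the same assumption of $n = 5$ measurements per multiplier, which is inferred from the reported F(3, 16) rather than stated in the paper. The results agree with Table 1 to within the last reported digit. The 4×4 value comes out as 0.1157 because the tabulated standard deviation is itself rounded.

```python
import math

# (mean, standard deviation) in ns for each bit width, taken from Table 1
table1 = {
    "4x4": (6.58, 0.2588),
    "6x6": (8.14, 0.2881),
    "8x8": (8.60, 0.1581),
    "12x12": (12.78, 0.1304),
}
N_RUNS = 5  # assumed sample size per multiplier, inferred from F(3, 16)


def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


for width, (mean, sd) in table1.items():
    print(f"{width:>5}: mean = {mean:.2f} ns, SEM = {standard_error(sd, N_RUNS):.4f} ns")
```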



Fig. 1: Mean delay time for the Braun's multipliers

## 4. Conclusion

In this paper, we have presented 4×4, 6×6, 8×8, and 12×12-bit Braun's multiplier designs written in VHDL and implemented on a Xilinx Virtex-5 FPGA. We studied their latency using a one-way ANOVA followed by a post hoc Tukey's test at the .05 significance level. The multiple comparison tests revealed that the differences between the Virtex-5 FPGA based multipliers are significant at the 95% confidence level. In addition, the mean latency of the Braun's multipliers on Virtex-5 increases as the multiplier size increases.

## Acknowledgement

The authors acknowledge the assistance and the financial support provided by the Cornea Research Chair, College of Applied Medical Sciences, King Saud University.

## References

- [1] C.R. Baugh and B.A. Wooley, "A Two's Complement Parallel Array Multiplication Algorithm", IEEE Transactions on Computers, 1973, Vol. C-22, No. 12, pp. 1045-1047.
- [2] J.M. Jou, S.R. Kuang and R.D. Chen, "Design of Low-Error Fixed-Width Multipliers for DSP Applications", IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, 1999, Vol. 46, No. 6, pp. 836-842.
- [3] S.S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-Efficient Multipliers for Digital Signal Processing Applications", IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, 1996, Vol. 43, No. 2, pp. 90-95.
- [4] M.H. Rais, "FPGA design and implementation of fixed width standard and truncated 6×6-bit multipliers: A comparative study", in Proc. of the 4<sup>th</sup> IEEE International Design and Test Workshop, IEEE Xplore Press, 2009, pp. 1-4.
- [5] M.H. Rais, "Efficient hardware realization of truncated multipliers using FPGA", International Journal of Applied Science, Engineering and Technology, 2009, Vol. 5, No. 2, pp. 124-128.
- [6] M.H. Rais, "Hardware design and implementation of fixed width standard and truncated 4×4, 6×6, 8×8 and 12×12 bit multipliers using FPGA", in Proc. AIP Conference, 2010, Vol. 1239, pp. 192-196.
- [7] M.H. Rais, "Hardware implementation of truncated multipliers using Spartan-3AN, Virtex-4 and Virtex-5 devices", American Journal of Engineering and Applied Sciences, 2010, Vol. 3, No. 1, pp. 201-206.
- [8] M.H. Rais, B. M. Al-Harthi, S. I. Al-Askar, and F. K. Al-Hussein, "Design and Field Programmable Gate Array Implementation of Basic Building Blocks for Power-Efficient Baugh-Wooley Multipliers", American Journal of Engineering and Applied Sciences, 2010, Vol. 3, No. 2, pp. 307-311.
- [9] M.H. Rais and M.H. Al Mijalli, "FPGA based fixed width 4×4, 6×6, 8×8, and 12×12 bit multipliers using Spartan-3AN", International Journal of Computer Science and Network Security, 2011, Vol. 11, No. 2, pp. 61-68.
- [10] M.H. Rais and M.H. Al Mijalli, "Field programmable gate arrays based realization of truncated multipliers", American Journal of Applied Sciences, 2011, Vol. 8, No. 7, pp. 681-684.
- [11] M.H. Rais and M.H. Al Mijalli, "Reconfigurable design and implementation of standard and truncated multipliers using Spartan-3AN, Spartan-3E, Virtex-2 and Virtex-4 FPGAs", European Journal of Scientific Research, 2011 (Accepted for Publication).
- [12] K. S. Yeo and K. Roy, Low Voltage, Low Power Subsystems. McGraw-Hill Professional, 2005.
- [13] C. Maxfield, The Design Warrior's Guide to FPGAs: Devices, Tools and Flows. Newnes Publishers, MA, 2004.
- [14] Xilinx, Virtex-5 FPGA family overview, 2007.
- [15] Xilinx, ISE 9.2i design tool, 2007.
- [16] M.H. Al Mijalli, "Spartan-3AN field programmable gate arrays truncated multipliers delay study", American Journal of Applied Sciences, 2011, Vol. 8, No. 6, pp. 554-557.
- [17] M.H. Al Mijalli, "Delay study of Virtex-2, Virtex-4 and Spartan-3E based truncated multipliers", International Journal of Computer Science and Network Security, 2011, Vol.11, No. 7, pp. 68-71.
- [18] M.H. Al Mijalli, "FPGA based truncated multipliers: A study of latency in FPGAs devices", European Journal of Scientific Research, 2011 (Accepted for Publication).

**Muhammad H. Rais** received the Ph.D. degree in Electronics Engineering from the University of Western Australia in 2000. He is a Researcher at the Cornea Research Chair, College of Applied Medical Sciences, King Saud University, Riyadh, Saudi Arabia. His research interests include microelectronics, logic design, FPGAs, VHDL, medical imaging, and the characterization and modeling of semiconductor devices.

**Mohammed H. Al Mijalli** received the Ph.D. degree in Bioengineering from Strathclyde University, Glasgow, UK. He is an Associate Professor in the Department of Biomedical Technology, King Saud University, Riyadh, Saudi Arabia. His research interests include biomedical instrumentation, FPGAs and medical imaging.

Table 2: FPGA resource utilization for standard Braun's multipliers on Virtex-5 XC5VLX50 (package: ff676, speed grade: -3)

| Bit<br>Width | Braun's<br>Multipliers | Slice<br>LUTs<br>(28800) | Occupied<br>Slices<br>(7200) | Bonded<br>IOBs<br>(440) | Total Equivalent<br>Gate Count | Average<br>Connection Delay<br>(ns) | Maximum Pin Delay<br>(ns) |
|--------------|------------------------|--------------------------|------------------------------|-------------------------|--------------------------------|-------------------------------------|---------------------------|
| 4×4          | Standard               | 22                       | 11                           | 16                      | 154                            | 0.887                               | 2.103                     |
| 6×6          | Standard               | 43                       | 19                           | 24                      | 301                            | 0.885                               | 1.795                     |
| 8×8          | Standard               | 81                       | 29                           | 32                      | 567                            | 0.857                               | 1.733                     |
| 12×12        | Standard               | 202                      | 96                           | 48                      | 1414                           | 1.074                               | 2.834                     |
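The device capacities in the header of Table 2 make it easy to express each count as a fraction of the XC5VLX50. The Python sketch below does this arithmetic and also checks that the bonded IOB count equals $4n$ for an $n \times n$ multiplier ($2n$ operand pins plus a $2n$-bit product). The percentages are derived from the table, not additional measurements. The dictionary layout and function name are illustrative choices.

```python
# Device capacities from the Table 2 header (XC5VLX50) and per-design counts from Table 2
CAPACITY = {"LUTs": 28800, "Slices": 7200, "IOBs": 440}
table2 = {
    4: {"LUTs": 22, "Slices": 11, "IOBs": 16},
    6: {"LUTs": 43, "Slices": 19, "IOBs": 24},
    8: {"LUTs": 81, "Slices": 29, "IOBs": 32},
    12: {"LUTs": 202, "Slices": 96, "IOBs": 48},
}


def utilization_percent(used, capacity):
    """Share of a device resource consumed by one design, in percent."""
    return 100.0 * used / capacity


for n, used in table2.items():
    # n x n multiplier: 2n input pins (X and Y) plus a 2n-bit product P
    assert used["IOBs"] == 4 * n
    pct = {res: utilization_percent(count, CAPACITY[res]) for res, count in used.items()}
    print(f"{n}x{n}: " + ", ".join(f"{res} {p:.2f}%" for res, p in pct.items()))
```

Even the 12×12 design occupies well under 2% of the LUTs and slices. The I/O pins are the most heavily used resource, at about 11% of the bonded IOBs.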

Table 3: The results of multiple comparisons of delay time (ns) for four Braun's multipliers using the Tukey's HSD post-hoc test

| (I) Multiplier | (J) Multiplier | Mean Difference<br>(I-J) (ns) | Std. Error<br>(ns) | Sig. | 95% CI<br>Lower Bound | 95% CI<br>Upper Bound |
|----------------|----------------|-------------------------------|--------------------|------|-----------------------|-----------------------|
| 4×4            | 6×6            | -1.56000*                     | .13856             | .000 | -1.9564               | -1.1636               |
|                | 8×8            | -2.02000*                     | .13856             | .000 | -2.4164               | -1.6236               |
|                | 12×12          | -6.20000*                     | .13856             | .000 | -6.5964               | -5.8036               |
| 6×6            | 4×4            | 1.56000*                      | .13856             | .000 | 1.1636                | 1.9564                |
|                | 8×8            | -.46000*                      | .13856             | .020 | -.8564                | -.0636                |
|                | 12×12          | -4.64000*                     | .13856             | .000 | -5.0364               | -4.2436               |
| 8×8            | 4×4            | 2.02000*                      | .13856             | .000 | 1.6236                | 2.4164                |
|                | 6×6            | .46000*                       | .13856             | .020 | .0636                 | .8564                 |
|                | 12×12          | -4.18000*                     | .13856             | .000 | -4.5764               | -3.7836               |
| 12×12          | 4×4            | 6.20000*                      | .13856             | .000 | 5.8036                | 6.5964                |
|                | 6×6            | 4.64000*                      | .13856             | .000 | 4.2436                | 5.0364                |
|                | 8×8            | 4.18000*                      | .13856             | .000 | 3.7836                | 4.5764                |

<sup>\*</sup> The mean difference is significant at the .05 level.
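The "Std. Error" column and the confidence bounds of the Tukey HSD comparisons follow from a few quantities:

- the within-group mean square recomputed from Table 1 ($\mathrm{MS}_W \approx 0.047995$ ns²);
- $n = 5$ runs per multiplier, inferred from F(3, 16);
- the tabulated 0.05 critical value of the studentized range for 4 groups and 16 error degrees of freedom, $q \approx 4.046$.

All three are assumptions in the Python sketch below, which is not the SPSS procedure itself. The function name `tukey_pair` is hypothetical.

```python
import math

# Assumed inputs: MS_within recomputed from Table 1, n = 5 per group (from F(3, 16)),
# and the tabulated studentized-range critical value q(0.05; k = 4, df = 16).
MS_WITHIN = 0.047995  # ns^2
N_PER_GROUP = 5
Q_CRIT = 4.046


def tukey_pair(mean_i, mean_j, ms_within=MS_WITHIN, n=N_PER_GROUP, q=Q_CRIT):
    """Mean difference, its standard error, and the Tukey HSD 95% confidence interval."""
    diff = mean_i - mean_j
    std_error = math.sqrt(2 * ms_within / n)  # value reported in the "Std. Error" column
    half_width = q * math.sqrt(ms_within / n)  # equals q * std_error / sqrt(2)
    return diff, std_error, (diff - half_width, diff + half_width)


diff, se, (lo, hi) = tukey_pair(8.14, 8.60)  # 6x6 versus 8x8, means from Table 1
print(f"diff = {diff:.2f} ns, SE = {se:.5f} ns, 95% CI = ({lo:.4f}, {hi:.4f})")
```

With equal group sizes, every pair shares the same standard error (0.13856 ns) and the same interval half-width (about 0.3964 ns). This is why each row of Table 3 is its mean difference plus or minus a constant. The 6×6 versus 8×8 interval (-0.8564, -0.0636) excludes zero, so that difference is significant as well.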