## **Research Paper**

Engineering



## A New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm

\* R. Santhosh Kumar \*\* P. Kalpana Reddy

\* Master of Technology (ECE), H.No: 12-11-1899/28, Ambernagar, Hyderabad, Andhra Pradesh.

\*\* Associate Professor and Head of the Department, Progressive Engineering College

#### ABSTRACT

This paper presents pipelined architectures of high-speed modified Booth multipliers for VLSI systems. The proposed multiplier circuits are based on the modified Booth algorithm and on the pipeline technique, which is the most widely used way to accelerate multiplication. To implement optimally pipelined multipliers, we conducted many kinds of experiments. The speed of the multipliers is greatly improved by properly deciding the number of pipeline stages and the positions at which the pipeline registers are inserted. We described the proposed modified Booth multiplier circuits in Verilog HDL and synthesized the gate-level circuits. The resulting multiplier circuits show better performance than others. Since the proposed multipliers operate in the GHz range, they can be used in systems requiring very high performance.

Keywords: modified Booth algorithm, multiplier, pipeline, high-speed

#### INTRODUCTION

Very-large-scale integration (VLSI) is the process of creating integrated circuits by combining thousands of transistors on a single chip. VLSI began in the 1970s, when complex semiconductor and communication technologies were being developed. The microprocessor is a VLSI device. The first semiconductor chips held two transistors each. Subsequent advances added more and more transistors and, as a consequence, more individual functions or systems were integrated over time. The first integrated circuits held only a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors, making it possible to fabricate one or more logic gates on a single device. This level is now known retrospectively as small-scale integration (SSI). Improvements in technique led to devices with hundreds of logic gates, known as medium-scale integration (MSI). Further improvements led to large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past this mark, and today's microprocessors have many millions of gates and billions of individual transistors.

At one time, there was an effort to name and calibrate various levels of large-scale integration above VLSI. Terms like ultra-large-scale integration (ULSI) were used. But the huge number of gates and transistors available on common devices has rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of integration are no longer in widespread use.

#### Challenges in VLSI System:

As microprocessors become more complex due to technology scaling, microprocessor designers have encountered several challenges that force them to think beyond the design plane. These challenges are:

- Process variation: As photolithography techniques tend closer to the fundamental laws of optics, achieving high accuracy in doping concentrations and etched wires is becoming more difficult and prone to errors due to variation. Designers now must simulate across multiple fabrication process corners before a chip is certified ready for production.

- Stricter design rules: Due to lithography and etch issues with scaling, design rules for layout have become increasingly stringent. Designers must keep ever more of these rules in mind while laying out custom circuits. The overhead for custom design is now reaching a tipping point, with many design houses opting to switch to electronic design automation (EDA) tools to automate their design process.

- Timing/design closure: As clock frequencies tend to scale up, designers are finding it more difficult to distribute and maintain low clock skew between these high-frequency clocks across the entire chip. This has led to a rising interest in multicore and multiprocessor architectures, since an overall speedup can be obtained by lowering the clock frequency and distributing processing.

- First-pass success: As die sizes shrink (due to scaling) and wafer sizes go up (to lower manufacturing costs), the number of dies per wafer increases, and the complexity of making suitable photomasks goes up rapidly. A mask set for a modern technology can cost several million dollars. This non-recurring expense deters the old iterative philosophy involving several "spin-cycles" to find errors in silicon, and encourages first-pass silicon success.

Digital signal processing (DSP) is one of the core technologies in multimedia and communication systems. Many application systems based on DSP, especially the recent next-generation optical communication systems, require extremely fast processing of a huge amount of digital data. Most DSP applications, such as the fast Fourier transform (FFT), require additions and multiplications. Since the multipliers have a significant impact on the performance of the entire system, many high-performance algorithms and architectures have been proposed to accelerate multiplication.

Various multiplication algorithms, such as Booth, modified Booth, Braun and Baugh-Wooley, have been proposed. The modified Booth algorithm reduces the number of partial products to be generated and is known as one of the fastest multiplication algorithms. Much research has been pursued on multiplier architectures, including array, parallel and pipelined multipliers, and pipelining is the most widely used technique to reduce the propagation delays of digital circuits. In this paper, we propose high-speed multipliers with very deep pipelines based on the modified Booth algorithm. We determined the number of pipeline stages and the positions at which the pipeline registers are inserted by exhaustive experiments.

#### ALGORITHM OF THE MODIFIED BOOTH MULTIPLIER:

Multiplication consists of three steps: 1) generating the partial products; 2) adding the generated partial products until only two rows remain; 3) computing the final multiplication result by adding the last two rows. The modified Booth algorithm halves the number of partial products generated in the first step. We used a previously proposed modified Booth encoding (MBE) scheme, which is known as the most efficient Booth encoding and decoding scheme. To multiply X by Y using the modified Booth algorithm, the bits of Y are first grouped into overlapping groups of three bits, and each group is encoded into one of $\{-2, -1, 0, 1, 2\}$. Table I shows the rules to generate the encoded signals by the MBE scheme, and Fig. 1 (a) shows the corresponding logic diagram. The Booth decoder generates the partial products using the encoded signals, as shown in Fig. 1 (b).
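The grouping and encoding rule can be illustrated with a short behavioral sketch in Python. It is not the Verilog HDL design described in this paper: the function names `booth_digits` and `booth_multiply` are hypothetical, the operands are assumed to be signed two's-complement values with an even bit width, and the partial products are simply summed rather than reduced by a Wallace tree.

```python
def booth_digits(y: int, n_bits: int) -> list[int]:
    """Recode an n_bits two's-complement multiplier into radix-4 Booth digits.

    Each digit is taken from three overlapping bits (y[2j+1], y[2j], y[2j-1])
    and lies in {-2, -1, 0, 1, 2}. Digit j carries the weight 4**j.
    """
    if n_bits % 2:
        raise ValueError("n_bits must be even in this sketch")
    y &= (1 << n_bits) - 1  # keep only the two's-complement bit pattern
    # bits[0] is the implicit y[-1] = 0; bits[k + 1] is y[k]
    bits = [0] + [(y >> k) & 1 for k in range(n_bits)]
    digits = []
    for i in range(0, n_bits, 2):
        y_prev, y_mid, y_high = bits[i], bits[i + 1], bits[i + 2]
        digits.append(y_prev + y_mid - 2 * y_high)
    return digits


def booth_multiply(x: int, y: int, n_bits: int) -> int:
    """Multiply x by y using n_bits / 2 Booth-encoded partial products."""
    partial_products = [d * x * 4 ** j for j, d in enumerate(booth_digits(y, n_bits))]
    return sum(partial_products)
```

For example, `booth_digits(7, 4)` returns `[-1, 2]`, because 7 = -1 + 2 × 4, so a 4-bit multiplier needs only two partial products instead of four.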



(a) Booth encoder

(b) Booth decoder

Fig. 1 Encoder and decoder for MBE scheme

#### ARCHITECTURE OF THE MODIFIED BOOTH MULTIPLIER:

Fig. 2 shows the architecture of the commonly used modified Booth multiplier. The inputs of the multiplier are the multiplicand X and the multiplier Y. The Booth encoder encodes input Y and derives the encoded signals as shown in Fig. 1 (a). The Booth decoder generates the partial products according to the logic diagram in Fig. 1 (b), using the encoded signals and the other input X. The Wallace tree adds the generated partial products until only two rows remain. These last two rows are then added by the carry look-ahead adder (CLA) to generate the final multiplication result.



Fig. 2 Architecture of the modified Booth multiplier
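The reduction of the partial products to two rows can be sketched in Python as follows. This is only a behavioral illustration under stated assumptions: the function names are hypothetical, the rows are assumed to be non-negative integers (partial products already aligned and extended to the product width), the grouping of rows is one simple choice rather than the exact tree of Fig. 2, and the final `+` stands in for the carry look-ahead adder.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Apply a full adder to every bit position of three rows at once.

    Returns (sum_row, carry_row) with a + b + c == sum_row + carry_row.
    No carry propagates between bit positions inside this step.
    """
    sum_row = a ^ b ^ c
    carry_row = ((a & b) | (b & c) | (a & c)) << 1
    return sum_row, carry_row


def reduce_to_two_rows(rows: list[int]) -> tuple[int, int]:
    """Compress rows level by level, three rows into two, until two remain."""
    rows = list(rows)
    while len(rows) > 2:
        next_rows = []
        full = len(rows) - len(rows) % 3
        for i in range(0, full, 3):
            next_rows.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        next_rows.extend(rows[full:])  # leftover rows pass to the next level
        rows = next_rows
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def final_add(row_a: int, row_b: int) -> int:
    """Stand-in for the carry look-ahead adder: one carry-propagating add."""
    return row_a + row_b
```

The point of this arrangement is that carries are propagated only once, in the final addition, while every earlier level has the delay of a single full adder regardless of the operand width.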

#### PROPOSED PIPELINE ARCHITECTURE:

The pipeline technique is widely used to improve the performance of digital circuits. As the number of pipeline stages increases, the combinational path delay of each stage decreases, which allows a higher clock frequency at the cost of additional registers and latency. We investigated various pipeline schemes to find the optimum number of pipeline stages and the positions at which the pipeline registers should be inserted in order to obtain a high-speed modified Booth multiplier.
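The trade-off can be stated with the standard timing relation for a pipelined circuit. The symbols are assumptions introduced here for illustration and do not come from the experiments: $k$ is the number of stages, $t_i$ is the combinational delay of stage $i$, and $t_{reg}$ is the register overhead (clock-to-output delay plus setup time).

$$
T_{clk} \ge \max_{1 \le i \le k} t_i + t_{reg}, \qquad f_{max} = \frac{1}{T_{clk}}, \qquad \text{latency} = k \cdot T_{clk}
$$

If a total combinational delay $T_{comb}$ could be divided perfectly evenly, then $t_i = T_{comb}/k$ and $T_{clk} \approx T_{comb}/k + t_{reg}$. The slowest stage sets the clock period, which is why the positions of the pipeline registers matter as much as their number, and the fixed term $t_{reg}$ limits the benefit of very deep pipelines.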

A. Basic 3-stage Pipelined Multiplier: First, we partitioned the modified Booth multiplier into three pipeline stages according to the functionality of the circuit, as shown in Fig. 3. The delay of the critical path of the 3-stage pipelined multiplier is reduced by approximately half compared to the non-pipelined one.

B. Pipelines in Full Adders: As noted above, the delay of the basic 3-stage pipelined multiplier is about half that of the non-pipelined multiplier. By applying 3-stage pipelines in the Wallace tree, the delay of the critical path is halved again. In this case, the critical path lies in the full adders of the Wallace tree, so we inserted pipeline registers in the full adders too. Fig. 4 shows the architecture of the 2-stage pipelined full adder. Using this adder in the Wallace tree improves the overall performance further.



Fig. 3 Modified Booth multiplier with 3-stage pipelines
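The behavior of a 3-stage pipeline can be sketched with a cycle-level model in Python. This is an illustrative sketch, not the synthesized circuit: all names are hypothetical, the stage boundaries are assumed to follow the functional partition (partial-product generation, reduction to two rows, final addition), the reduction stage is replaced by a plain sum that yields two rows, and the result of the third stage is read directly rather than through an output register.

```python
def stage1_partial_products(x, y, n_bits=8):
    """Stage 1: Booth-encode y and decode it into signed partial products."""
    y &= (1 << n_bits) - 1
    prev, rows = 0, []
    for j in range(n_bits // 2):
        low = (y >> (2 * j)) & 1
        high = (y >> (2 * j + 1)) & 1
        rows.append((prev + low - 2 * high) * x * 4 ** j)
        prev = high
    return rows


def stage2_reduce(rows):
    """Stage 2: stand-in for the Wallace tree, returning two rows."""
    half = len(rows) // 2
    return sum(rows[:half]), sum(rows[half:])


def stage3_final_add(two_rows):
    """Stage 3: stand-in for the carry look-ahead adder."""
    return two_rows[0] + two_rows[1]


def run_pipeline(operands, n_bits=8):
    """Feed one (x, y) pair per clock cycle and collect the products."""
    reg1 = reg2 = None  # pipeline registers after stage 1 and stage 2
    results = []
    # Two extra empty cycles drain the operands still inside the pipeline.
    for pair in list(operands) + [None, None]:
        # Evaluate the last stage first so that every stage reads the value
        # stored in its input register during the previous cycle.
        if reg2 is not None:
            results.append(stage3_final_add(reg2))
        reg2 = stage2_reduce(reg1) if reg1 is not None else None
        reg1 = stage1_partial_products(*pair, n_bits) if pair is not None else None
    return results
```

With N operand pairs the loop runs N + 2 cycles: each product appears two cycles after its operands enter, but a new pair is accepted every cycle, so the throughput is set by the slowest stage rather than by the whole multiplier.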



Fig. 4 Full adder with 2-stage pipelines
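One possible way to split a full adder into two stages is sketched below in Python. The partition shown here is an assumption made for illustration and is not taken from Fig. 4: the first stage computes the propagate and generate signals, a register holds them together with the carry input, and the second stage forms the sum and carry outputs. The class name `PipelinedFullAdder` is hypothetical.

```python
class PipelinedFullAdder:
    """Behavioral model of a full adder with one internal pipeline register."""

    def __init__(self):
        self.reg = None  # holds (propagate, generate, carry_in) between stages

    def clock(self, a=None, b=None, cin=None):
        """Advance one clock cycle.

        Returns (sum, carry_out) for the inputs applied one cycle earlier,
        or None if the register held no valid data.
        """
        out = None
        if self.reg is not None:
            p, g, c = self.reg
            out = (p ^ c, g | (p & c))  # stage 2: sum and carry out
        # Stage 1: propagate = a XOR b, generate = a AND b
        self.reg = (a ^ b, a & b, cin) if a is not None else None
        return out
```

Each stage then contains roughly one level of logic, so the per-stage delay is shorter than that of an unpipelined full adder, at the cost of one extra cycle of latency per adder level in the Wallace tree.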

#### CONCLUSIONS:

Pipelining is the most widely used technique to improve the performance of digital circuits. We proposed high-speed modified Booth multipliers with very deep pipelines. The proposed multipliers consist of three modules: 1) the modified Booth encoder and decoder module to generate N/2 partial products; 2) the Wallace tree module to add the partial products until only two rows remain; 3) the carry look-ahead adder module for the final addition.

#### REFERENCES

[1] Wen-Chang Yeh and Chein-Wei Jen (2000), "High-speed Booth encoded parallel multiplier design," IEEE Trans. on Computers, vol. 49, issue 7, pp. 692-701.

[2] Shiann-Rong Kuang, Jiun-Ping Wang and Cang-Yuan Guo (2009), "Modified Booth multipliers with a regular partial product array," IEEE Trans. on Circuits and Systems, vol. 56, issue 5, pp. 404-408.

[3] Li-rong Wang, Shyh-Jye Jou and Chung-Len Lee (2008), "A well-structured modified Booth multiplier design," Proc. of IEEE VLSI-DAT, pp. 85-88.

[4] A. A. Khatibzadeh, K. Raahemifar and M. Ahmadi (2005), "A 1.8V 1.1GHz Novel Digital Multiplier," Proc. of IEEE CCECE, pp. 686-689.

[5] S. Hsu, V. Venkatraman, S. Mathew, H. Kaul, M. Anders, S. Dighe, W. Burleson and R. Krishnamurthy (2005), "A 2GHz 13.6mW 12x9b multiplier for energy efficient FFT accelerators," Proc. of IEEE ESSCIRC, pp. 199-202.