EECS 427 Winter 2003 Group 3 Chip - Portfolio

Electrical Engineering and Computer Science (EECS) 427 Winter 2003 Group 3 Project: A Simple 16-bit Digital Signal Processing (DSP) Chip

I designed this chip with four other people during the Winter 2003 semester of EECS 427, the introductory Very Large-Scale Integrated (VLSI) circuit design course at the University of Michigan College of Engineering. The course is open to graduate students and to undergraduate seniors, who can take it as a Major Design course. All five members of our group were undergraduate seniors.

The chip was designed over a four-month period (the length of a semester at the University of Michigan) using two design flows: full-custom design, and synthesis with automatic place-and-route (APR). The basic datapath (including the program counter, register file, arithmetic logic unit, and shifter) was created using the full-custom flow; the datapath's control unit and the multiplier-accumulator (MAC) unit were created using synthesis and APR. The on-chip memory and the bond pads were provided to us by the instructor. The baseline requirements included a 16-bit wide datapath with 2 stages of pipelining. Our additions to the baseline consisted mainly of a MAC unit with an 8-bit wide multiply and a 25-bit wide accumulator, as well as modifications to the input/output (I/O) handling. Adding the MAC and the specialized I/O also required us to add some assembly instructions to our instruction set architecture (ISA).

Chip Statistics
Function	A simple digital signal processing (DSP) chip
Application	Image compression and decompression
Process Technology	AMI® 0.5 µm (1 layer of polysilicon, 3 metal layers available)
Datapath	16 bits wide; Reduced Instruction Set Computer (RISC) architecture; 2-stage pipeline (Stage 1: Fetch and Decode; Stage 2: Execute, Data Memory, and Write-Back); program counter (PC), register file, arithmetic logic unit (ALU), shifter, and multiplier-accumulator (MAC)

Software
Design Flow	Task	Software Used
Full-Custom	Schematic Input	Mentor Graphics® Design Architect (DA)
Full-Custom	Functional Verification	Mentor Graphics® ModelSim
Full-Custom	Analog Simulation	Mentor Graphics® AccuSim
Full-Custom	Layout Creation	Mentor Graphics® IC Station
Full-Custom	Design-Rule Check (DRC) and Layout vs. Schematic (LVS)	Mentor Graphics® IC Station
Synthesis and APR	Functional Description	Verilog (Behavioral)
Synthesis and APR	Gate-Level Netlist and Schematic Generation	Synopsys® Design Analyzer (Design Compiler)
Synthesis and APR	Functional Verification	Mentor Graphics® ModelSim
Synthesis and APR	Layout Generation	Cadence® Silicon Ensemble
Synthesis and APR	DRC and LVS	Mentor Graphics® IC Station

Baseline Architecture

These parts of the chip were required of every group in EECS 427.

Instruction Set Architecture (ISA)

Reduced Instruction Set Computer (RISC)
The complete instruction set was provided as a separate PDF document.

Datapath

Register File

Dual simultaneous read, single write
Buffered control

Arithmetic Logic Unit (ALU)

EECS 427 gave us four adder styles to choose from for the ALU: a simple ripple-carry adder, a carry-lookahead adder, a square-root carry-select adder, or a logarithmic-lookahead (Brent-Kung) adder. Given our time constraints, our group chose the square-root carry-select adder with simple ripple-carry substages. Although it required a lot of layout area, it was also one of the faster adder types that we could implement. We split the 16 bits into substages of 2-2-3-4-5 bits to minimize the overall delay.
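
To illustrate the adder structure described above, here is a minimal behavioral sketch in Python. It is not our layout or our Verilog; the function names are made up for this example, and the assumption that the 2-bit substage sits at the least-significant end is mine. Each substage computes its sum twice with a ripple-carry adder (once for carry-in 0 and once for carry-in 1), and the incoming carry then selects one result. The widths grow toward the most-significant end so that each substage's ripple delay roughly matches the time it takes the select carry to arrive.

```python
STAGE_WIDTHS = [2, 2, 3, 4, 5]  # assumed least-significant stage first; sums to 16


def ripple_add(a, b, carry_in, width):
    """Ripple-carry add of two `width`-bit values; returns (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(width):
        abit = (a >> i) & 1
        bbit = (b >> i) & 1
        total |= (abit ^ bbit ^ carry) << i
        carry = (abit & bbit) | (carry & (abit ^ bbit))
    return total, carry


def carry_select_add(a, b, carry_in=0, widths=STAGE_WIDTHS):
    """Carry-select add over sum(widths) bits; returns (sum, carry_out)."""
    result = 0
    carry = carry_in
    offset = 0
    for width in widths:
        mask = (1 << width) - 1
        a_slice = (a >> offset) & mask
        b_slice = (b >> offset) & mask
        # In hardware both candidate sums are computed in parallel ...
        sum0, cout0 = ripple_add(a_slice, b_slice, 0, width)
        sum1, cout1 = ripple_add(a_slice, b_slice, 1, width)
        # ... and a multiplexer picks one once the real carry arrives.
        stage_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= stage_sum << offset
        offset += width
    return result, carry
```

For example, carry_select_add(0xFFFF, 0x0001) gives a sum of 0 with a carry-out of 1. The model only checks function; it says nothing about the actual gate delays.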

Shifter

We also had the option of implementing either a barrel shifter or a logarithmic (a.k.a. multiplexing) shifter. The group chose the barrel shifter because its layout is simpler and because a log shifter saves little, if any, time unless the datapath is wider than 16 bits.
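
The difference between the two shifter styles can be sketched in Python as follows. This is an illustration only: the function names are invented, and it assumes a logical left shift by 0 to 15 positions, which may not match every shift our chip supported. The log shifter passes the word through four multiplexer stages that shift by 1, 2, 4, and 8; the barrel shifter routes each output bit directly from the selected input bit in a single level.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def log_shift_left(value, amount):
    """Logarithmic shifter: log2(16) = 4 mux stages shifting by 1, 2, 4, 8."""
    value &= MASK
    for stage in range(4):
        if (amount >> stage) & 1:  # each bit of `amount` controls one stage
            value = (value << (1 << stage)) & MASK
    return value


def barrel_shift_left(value, amount):
    """Barrel shifter: every output bit is wired straight to one input bit."""
    value &= MASK
    result = 0
    for out_bit in range(WIDTH):
        src = out_bit - amount  # input bit that lands on this output
        if 0 <= src < WIDTH:
            result |= ((value >> src) & 1) << out_bit
    return result
```

Both functions return the same word for any shift amount from 0 to 15; they differ only in how the hardware they stand for is organized.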

Program Counter (PC)

For the PC, we had to choose between a counter-style PC and a register/adder combination. We chose the register/adder type because it could support branch-prediction logic if we were to implement it, and because we had already created a relatively "fast" adder for our ALU.
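
As a rough sketch of the register/adder idea, the next PC value is simply the current register contents plus whatever the adder is given. The Python below is illustrative only: the function name, the increment of 1 per instruction, and the signed PC-relative displacement are assumptions, not details taken from our instruction set.

```python
PC_MASK = 0xFFFF  # assumes a 16-bit program counter


def next_pc(pc, take_branch=False, displacement=0):
    """Register/adder PC: the adder adds either 1 or a branch displacement."""
    increment = displacement if take_branch else 1
    return (pc + increment) & PC_MASK
```

Because the same adder handles both cases, a branch target is available without separate load logic, which is what makes this style a natural fit for later adding branch prediction.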

Control Unit

The control unit was synthesized. The code describing the controller was written in behavioral Verilog. The schematic should have been described using structural Verilog, but a couple of our group members instead created the control unit schematic and schematic symbol by hand using DA. By the time we discovered that synthesis would have been much easier with structural Verilog, we were already significantly behind schedule, and most of the group members were unwilling to switch because they did not want the time spent hand-creating the schematic to go to waste.

Additions to Baseline

Our group added these parts specifically for the chip's target application.

Input/Output (I/O) Handling

Input

Our chip interfaces with an external serial-to-parallel register for input.
We added the LSR (Load from Serial-to-Parallel Register) instruction to our instruction set.
Our group decided to implement the chip's input handling using software polling instead of hardware interrupts, because the time constraints of the class left us no time to implement hardware interrupt instructions. However, we were aware that for a real DSP chip to perform the desired algorithm as fast as possible (i.e., in real time), hardware interrupts are usually the better choice.
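
The polling approach can be illustrated with a small Python simulation. Everything here is hypothetical: the class, the ready flag, the 16-bit width, and the most-significant-bit-first order are assumptions made for the example, since the actual handshake with the external register is not described above. The load method stands in for what an instruction like LSR would do.

```python
class SerialToParallelRegister:
    """Hypothetical model of the external input register (16 bits assumed)."""

    def __init__(self, width=16):
        self.width = width
        self.value = 0
        self.count = 0

    def shift_in(self, bit):
        """Shift one serial bit in, most-significant bit first (assumed)."""
        self.value = ((self.value << 1) | (bit & 1)) & ((1 << self.width) - 1)
        self.count += 1

    @property
    def ready(self):
        """True once a full word has been collected."""
        return self.count >= self.width

    def load(self):
        """Stand-in for LSR: read the collected word and reset the register."""
        word = self.value
        self.value = 0
        self.count = 0
        return word


def poll_for_word(register, serial_bits):
    """Busy-wait on the ready flag; one serial bit arrives per status check."""
    polls = 0
    bits = iter(serial_bits)
    while not register.ready:
        polls += 1
        register.shift_in(next(bits))
    return register.load(), polls
```

The cost of polling shows up in the polls count: the processor spends every one of those checks waiting instead of computing, which is the time a hardware interrupt would give back.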

Output

Our chip interfaces with an external parallel-to-serial register for output.
We added the SPARR (Store to PARallel-to-Serial Register) instruction to our instruction set.

Multiplier-Accumulator (MAC) Unit

8-bit wide multiply
25-bit wide accumulator (the most-significant bit is used purely for overflow detection)
We added the MAC instruction, which performs the multiply-and-accumulate operation.
We added the MAC2 instruction, which loads the lower 16 bits of the accumulator into a specified register.
We added the MAC3 instruction, which loads the upper 8 bits of the accumulator into a specified register. [Personally, I would have preferred more descriptive opcode names for MAC2 and MAC3, but I was working on layout at the time and was never asked for input on naming the instructions.]
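
The behavior of the three instructions can be sketched with a small Python model. This is a guess at the arithmetic rather than a copy of our Verilog: it assumes unsigned 8-bit operands, that the "upper 8 bits" read by MAC3 are accumulator bits 23 to 16, and that bit 24 simply carries out of the 24-bit sum. The class and method names are invented for the example.

```python
class Mac:
    """Behavioral model: 8-bit x 8-bit multiply into a 25-bit accumulator."""

    ACC_MASK = (1 << 25) - 1  # 24 data bits plus 1 overflow bit

    def __init__(self):
        self.acc = 0

    def mac(self, a, b):
        """MAC: acc += a * b, operands truncated to 8 bits (unsigned assumed)."""
        product = (a & 0xFF) * (b & 0xFF)
        self.acc = (self.acc + product) & self.ACC_MASK

    def mac2(self):
        """MAC2: return the lower 16 bits of the accumulator."""
        return self.acc & 0xFFFF

    def mac3(self):
        """MAC3: return the upper 8 data bits (bits 23..16, assumed)."""
        return (self.acc >> 16) & 0xFF

    @property
    def overflowed(self):
        """Bit 24 is set once the running sum no longer fits in 24 bits."""
        return bool((self.acc >> 24) & 1)
```

Under these assumptions, two MAC operations on 255 x 255 leave 130050 (0x01FC02) in the accumulator, so mac2() returns 0xFC02 and mac3() returns 0x01. Two reads are needed because the 24 data bits do not fit in one 16-bit register.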

Copyright (c) 2005 Jason Liu. All rights reserved.