Electrical Engineering and Computer Science (EECS) 427 Winter 2003 Group 3 Project: A Simple 16-bit Digital Signal Processing (DSP) Chip


This chip was designed by a five-person group, of which I was a member, during the Winter 2003 semester of EECS 427, the introductory Very Large-Scale Integrated (VLSI) circuit design course at the University of Michigan College of Engineering. The course is open to graduate students and to undergraduate seniors, who can take it as a Major Design course. All five members of our group were undergraduate seniors.

The chip was designed over a four-month period (the length of a semester at the University of Michigan) using a combination of full-custom design and synthesis with automatic place-and-route (APR). The basic datapath (program counter, register file, arithmetic logic unit, and shifter) was created using the full-custom design flow; the datapath's control unit and the multiplier-accumulator (MAC) unit were created using synthesis and APR. The on-chip memory and the bond pads were provided to us by the instructor. The baseline requirements included a 16-bit-wide datapath with 2 stages of pipelining. Our additions to the baseline consisted mainly of a MAC unit with an 8-bit-wide multiply and a 25-bit-wide accumulator, as well as modifications to the input/output (I/O) handling. Adding the MAC and the specialized I/O also required adding new assembly instructions to our instruction set architecture (ISA).

EECS 427 Winter 2003 Group 3 chip layout
Copyright © 2003 Jason Liu, Mark Kai Li, Mindy Simin Teo, Xiao Liang, Kaihua Zhang, and the University of Michigan.


Chip Statistics
Function: A simple digital signal processing (DSP) chip
Application: Can be used to perform image compression and decompression
Process Technology: AMI® 0.5 µm (1 layer of polysilicon, 3 metal layers available)
Datapath:
  • 16 bits wide
  • Reduced Instruction Set Computer (RISC) Architecture
  • 2-stage pipeline (Stage 1: Fetch and Decode; Stage 2: Execute, Data Memory, and Write-Back)
  • Program counter (PC), register file, arithmetic logic unit (ALU), shifter, and multiplier-accumulator (MAC)
Software (Design Flow, Task: Software Used)
Full-Custom
  • Schematic Input: Mentor Graphics® Design Architect (DA)
  • Functional Verification: Mentor Graphics® ModelSim
  • Analog Simulation: Mentor Graphics® Accusim
  • Layout Creation: Mentor Graphics® IC Station
  • Design-Rule Check (DRC) and Layout vs. Schematic (LVS): Mentor Graphics® IC Station
Synthesis and APR
  • Functional Description: Verilog (Behavioral)
  • Gate-Level Netlist and Schematic Generation: Synopsys® Design Analyzer (Design Compiler)
  • Functional Verification: Mentor Graphics® ModelSim
  • Layout Generation: Cadence® Silicon Ensemble
  • DRC and LVS: Mentor Graphics® IC Station


Baseline Architecture
These parts of the chip were required of all groups in EECS 427.
Instruction Set Architecture (ISA)
Datapath
Register File
  • Dual simultaneous read, single write
  • Buffered control
Arithmetic Logic Unit (ALU)
For the ALU, we had to choose an adder implementation. EECS 427 offered four options: a simple ripple-carry adder, a carry-lookahead adder, a square-root carry-select adder, or a logarithmic-lookahead (Brent-Kung) adder. Given our time constraints, our group chose the square-root carry-select adder with simple ripple-carry substages. Although it required a lot of layout area, it was one of the faster adder types that we could implement. We split the 16 bits into substages of 2-2-3-4-5 bits to minimize the overall delay.
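To make the carry-select idea concrete, here is a minimal behavioral sketch in Python (not the chip's actual schematic or layout). It assumes the 2-2-3-4-5 split is ordered from the least-significant block upward, and the names `BLOCK_WIDTHS`, `ripple_add`, and `carry_select_add` are hypothetical. Each block computes its sum for both possible carry-in values, and the real carry-in then selects one of the two results.

```python
# Behavioral sketch of a 16-bit carry-select adder with ripple-carry substages.
# Assumption: blocks are listed least-significant first (2 + 2 + 3 + 4 + 5 = 16).
BLOCK_WIDTHS = (2, 2, 3, 4, 5)


def ripple_add(a, b, carry_in, width):
    """Add two width-bit values one bit at a time; return (sum, carry_out)."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i          # full-adder sum bit
        carry = (x & y) | (carry & (x ^ y))    # full-adder carry-out
    return total, carry


def carry_select_add(a, b, carry_in=0):
    """Add two 16-bit values block by block; return (sum, carry_out)."""
    result = 0
    carry = carry_in
    shift = 0
    for width in BLOCK_WIDTHS:
        mask = (1 << width) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # In hardware these two ripple adders run in parallel, so neither
        # has to wait for the carry from the previous block.
        sum0, cout0 = ripple_add(a_blk, b_blk, 0, width)
        sum1, cout1 = ripple_add(a_blk, b_blk, 1, width)
        # The incoming carry acts as the select line of a multiplexer.
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)
        result |= blk_sum << shift
        shift += width
    return result, carry
```

Growing the block widths toward the most-significant end is what gives the "square-root" delay behavior: a longer block has more time to finish its two candidate sums while the select carry is still propagating through the shorter blocks below it. The duplicated adders in each block are also why this style costs more area than a plain ripple-carry adder.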
Shifter
We also had the option of implementing either a barrel shifter or a logarithmic (a.k.a. multiplexing) shifter. The group chose the barrel shifter for simplicity of layout, and because a log shifter offers little, if any, speed advantage unless the datapath is wider than 16 bits.
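The difference between the two shifter styles can be illustrated with a small behavioral sketch in Python. It assumes a logical left shift on a 16-bit word (the shift types our chip supported are not listed here), and the function names are hypothetical. The barrel version decodes the shift amount into one-hot select lines so that data passes through a single switching level; the logarithmic version chains four stages that shift by 1, 2, 4, and 8.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1


def barrel_shift_left(value, amount):
    """Barrel style: one select line per shift amount, exactly one active."""
    select = [int(k == amount) for k in range(WIDTH)]  # one-hot decode
    result = 0
    for k, enabled in enumerate(select):
        if enabled:
            # The data crosses a single level of switches on its way out.
            result |= (value << k) & MASK
    return result


def log_shift_left(value, amount):
    """Logarithmic style: log2(16) = 4 cascaded stages of 1, 2, 4, 8."""
    for stage in range(4):
        if (amount >> stage) & 1:
            # Each stage either shifts by its fixed distance or passes through.
            value = (value << (1 << stage)) & MASK
    return value
```

Both functions return the same result for every shift amount from 0 to 15; they differ only in how the hardware would be organized. The barrel style needs a decoder and 16 select lines but has a very regular layout, while the logarithmic style uses the binary shift amount directly at the cost of four stages in series.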
Program Counter (PC)
For the PC, we had to choose between a counter-style PC and a register/adder combination. We elected to use the register/adder type because it could support branch-prediction logic if we chose to implement it, and because we had already created a relatively "fast" adder for our ALU.
Control Unit
The control unit was synthesized from a description written in behavioral Verilog. The schematic should have been described using structural Verilog, but a couple of our group members instead created the control unit schematic and schematic symbol by hand using DA. By the time we discovered that synthesis would have been much easier with structural Verilog, we were already significantly behind schedule, and most of the group members were unwilling to switch because they did not want the time already spent hand-creating the schematic to go to waste.
Additions to Baseline
These parts were added by our group specifically for the application that our chip was supposed to perform.
Input/Output (I/O) Handling
Input
  • Our chip interfaces with an external serial-to-parallel register for input.
  • Addition of the LSR (Load from Serial-to-Parallel Register) instruction to our instruction set.
  • Our group decided to implement our chip's input handling using software polling instead of hardware interrupts, because the time constraints of our class left no time to implement hardware interrupt support. However, we were aware that for a real DSP chip to run its algorithm as fast as possible (i.e., in real time), hardware interrupts are usually the better choice.
Output
  • Our chip interfaces with an external parallel-to-serial register for output.
  • Addition of the SPARR (Store to PARallel-to-Serial Register) instruction to our instruction set.
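The software-polling approach to input described above can be sketched in Python as follows. This is only an illustration of the general technique, not our chip's actual interface: the `SerialToParallelRegister` class, its `ready()` check, and the `poll_for_words` loop are all hypothetical, and how the real external register signaled that a word was available is not described here.

```python
class SerialToParallelRegister:
    """Hypothetical model of an external register that fills one bit per clock."""

    def __init__(self, words, bits_per_word=16):
        self._words = list(words)
        self._bits_per_word = bits_per_word
        self._bits_shifted = 0

    def clock(self):
        """Shift in one serial bit, if there is still data to receive."""
        if self._words and self._bits_shifted < self._bits_per_word:
            self._bits_shifted += 1

    def ready(self):
        """True once a full word has been assembled."""
        return self._bits_shifted == self._bits_per_word

    def read(self):
        """Return the assembled word (the role LSR plays on the chip)."""
        self._bits_shifted = 0
        return self._words.pop(0) & 0xFFFF


def poll_for_words(device, count, max_polls=10_000):
    """Spin until `count` words arrive or the poll budget runs out."""
    received = []
    for _ in range(max_polls):
        if len(received) == count:
            break
        device.clock()              # time passes while the processor spins
        if device.ready():          # the polling check
            received.append(device.read())
    return received
```

The loop shows the cost of polling: the processor spends most of its iterations checking a register that is not ready yet. With hardware interrupts, that time could go to useful computation instead, which is why interrupts are usually preferred for real-time DSP work.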
Multiplier-Accumulator (MAC) Unit
  • 8-bit wide multiply
  • 25-bit wide accumulator (the most-significant bit is used purely for overflow detection)
  • Addition of the MAC instruction to the instruction set. This assembly instruction performs the multiply-and-accumulate operation.
  • Addition of the MAC2 instruction to the instruction set. This assembly instruction loads the lower 16 bits from the accumulator into a specified register.
  • Addition of the MAC3 instruction to the instruction set. This assembly instruction loads the upper 8 bits from the accumulator into a specified register. [Personally, I would have liked more descriptive opcode names for both the MAC2 and MAC3 instructions, but I was working on layout at the time, and was never asked for input as to what to name the instructions.]
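As a rough illustration of how the three MAC instructions fit together, here is a behavioral sketch in Python. It rests on several assumptions that are not stated above: the multiply is treated as unsigned 8-bit by 8-bit, MAC3 is taken to return accumulator bits 23 through 16, and bit 24 is modeled simply as the top bit of a 25-bit register. The `MacUnit` class and its method names are hypothetical.

```python
ACC_BITS = 25
ACC_MASK = (1 << ACC_BITS) - 1


class MacUnit:
    """Behavioral model of an 8-bit multiplier feeding a 25-bit accumulator."""

    def __init__(self):
        self.acc = 0

    def mac(self, a, b):
        """MAC: multiply two 8-bit operands and add the product to the accumulator."""
        product = (a & 0xFF) * (b & 0xFF)   # assumed unsigned, at most 16 bits
        self.acc = (self.acc + product) & ACC_MASK

    def mac2(self):
        """MAC2: return the lower 16 bits of the accumulator."""
        return self.acc & 0xFFFF

    def mac3(self):
        """MAC3: return the upper 8 data bits (assumed to be bits 23..16)."""
        return (self.acc >> 16) & 0xFF

    def overflow(self):
        """Bit 24 is set once the 24 data bits have overflowed."""
        return (self.acc >> 24) & 1
```

Under these assumptions, a program reads a full result by issuing MAC2 and MAC3 into two 16-bit registers after a run of MAC instructions. The 24 data bits leave 8 bits of headroom above a single 16-bit product, so several hundred products can be accumulated before the overflow bit is set.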