For EECS 470, the goal of the class was to build an out-of-order (OoO) processor. For my team's project, we created an OoO N-way superscalar processor based on the Intel P6 microarchitecture (N-way meaning a parameterizable number of ways). The processor included all the required features of an OoO processor, including a reorder buffer, reservation station, map table, instruction fetch, and instruction decode. We also added advanced features, including the N-way superscalar design, a branch predictor, a speculative load/store queue, and a non-blocking I-cache and D-cache. The entire project was coded from scratch in SystemVerilog, synthesized with Synopsys Design Compiler (DC), and validated with Synopsys VCS.
Our design methodology focused on making modules as general and as parameterized as possible. This included variable sizes for the reorder buffer, reservation station, branch predictor, instruction queue, load/store queue, and caches. It also included options for the associativity of the caches and the number of superscalar ways. During the project, we also spent time investigating other features such as a return address stack and static vs. dynamic reservation station allocation. We simulated our processor to measure CPI and minimum clock period over a variety of parameters. Our final results showed that a 2-way superscalar configuration with a relatively small load/store queue provided the best performance, due to the limited instruction-level parallelism in the course's benchmarks and the exponential increase in complexity with larger load/store queue sizes.
The capstone project of EECS 312 was to minimize the Energy Delay Product (EDP) of a 3x8 decoder using a 65 nm process. This project involved creating a design for the decoder at the schematic level, hand calculating the optimal sizings, and then using the calculations as a guide for simulating the optimal sizings. The specifications for the project also required a certain propagation delay, rise time, and fall time be met. In this project, we used Cadence's Virtuoso Suite for the schematic creation and circuit simulation.
In our design, the main strategies employed were gate input ordering, input profiling, transmission gating, and parametric analysis. For gate input ordering, we placed the earliest-arriving inputs on the transistors farthest from the gate's output. This let those transistors begin charging or discharging the internal capacitances of the gate before the later inputs arrived, which reduced the overall delay of the gate. Second, we profiled the provided input set to see the frequency of certain input transitions. For the uncommon cases, we were able to size down the relevant transistors and reduce the associated node capacitances. One of the main optimizations we added to our design was inserting transmission gates, controlled by the "enable" signal, on the inputs. By doing so, we cut off the inputs while the decoder was disabled and effectively eliminated the dynamic power associated with switching events during that time. Finally, using our calculations as a starting point, we ran multiple parametric simulations on the sizing of gates to determine the impact on the overall power consumption and the worst-case delays.
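To make the parametric analysis step concrete, here is a minimal sketch of picking the lowest-EDP sizing that still meets a delay spec. It is not our actual flow (that ran in Cadence Virtuoso); the `SizingResult` fields, the `best_sizing` helper, and every number below are hypothetical placeholders, not measurements from the project.

```python
from dataclasses import dataclass


@dataclass
class SizingResult:
    """One simulated sizing point (all field names are hypothetical)."""
    width_scale: float  # transistor width multiplier that was swept
    energy_j: float     # energy per operation, in joules
    delay_s: float      # worst-case propagation delay, in seconds


def edp(result):
    """Energy-delay product: energy multiplied by delay."""
    return result.energy_j * result.delay_s


def best_sizing(results, max_delay_s):
    """Return the lowest-EDP point that meets the delay spec, or None."""
    feasible = [r for r in results if r.delay_s <= max_delay_s]
    return min(feasible, key=edp, default=None)


# Placeholder sweep for illustration only; these are NOT project measurements.
sweep = [
    SizingResult(1.0, 2.0e-13, 9.0e-10),  # too slow for the spec below
    SizingResult(1.5, 2.6e-13, 6.0e-10),
    SizingResult(2.0, 3.6e-13, 5.0e-10),
]
choice = best_sizing(sweep, max_delay_s=8.0e-10)
```

With these placeholder values, the smallest sizing is rejected for missing the delay spec, and the middle point wins because upsizing further costs more energy than it saves in delay.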
Overall, our team did well and finished with one of the best EDPs in the class.
One of my personal favorite projects was my project for ENGR 100 - The Floppy Player. The project was to create a simple music composition suite that plays back using the stepper motors of up to 8 floppy disk drives. The composition suite was written entirely in an assembly language for a softcore FPGA processor. The floppy drive stepper motors were driven by a hardware-based controller written in Verilog, which used frequency modulation to produce each pitch. The composition suite was a simple interface that allowed users to click to place notes on a music staff and compose up to 4 bars of music. The device also supported downloading MIDI music files for playback on the drives, and the controller supported multi-track songs (i.e., multiple drives playing different parts simultaneously).
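As a rough illustration of how a MIDI note can be turned into a step rate for a drive, the sketch below uses the standard equal-temperament formula (A4 = MIDI note 69 = 440 Hz). It assumes one step pulse per period of the tone, and the `clock_hz` parameter and both function names are hypothetical; the real controller was Verilog hardware, not Python.

```python
def midi_note_to_hz(note):
    """Equal-temperament frequency of a MIDI note number (note 69 is A4 at 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def step_period_ticks(note, clock_hz):
    """Clock ticks between step pulses so the motor steps at the note's frequency.

    clock_hz is a hypothetical controller clock rate, not the project's actual value.
    """
    return round(clock_hz / midi_note_to_hz(note))
```

For example, with an assumed 1 MHz controller clock, `step_period_ticks(69, 1_000_000)` gives 2273 ticks between pulses for A4.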
This was a large project that gave my team and me exposure to many different technical areas and concepts. Some areas included:
- Event-driven scheduling in order to play multiple different parts on the floppy drive
- I/O controllers in order to interact with the floppy drives
- I/O drivers to interact with all of the connected devices (e.g. computer monitor, USB mouse, floppy drives)
- Binary data encoding and manipulation in order to store and read music in the MIDI format
- Serial communication to download music from a PC
- Timing error mitigation in order to prevent music desynchronization
- Multi-word multiplication and division algorithms to support 32-bit operations on a 16-bit device
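To give a flavor of the multi-word arithmetic in the last item, here is a minimal sketch of a schoolbook 32x32-bit multiply built from 16-bit words. It assumes a 16x16 to 32-bit multiply primitive (modeled by the hypothetical `mul16`), which our device may or may not have had; the original was written in assembly, so this Python version only shows the idea.

```python
MASK16 = 0xFFFF


def mul16(a, b):
    """Model of an assumed 16x16 -> 32-bit multiply, returned as (high, low) words."""
    product = (a & MASK16) * (b & MASK16)
    return (product >> 16) & MASK16, product & MASK16


def mul32(a_hi, a_lo, b_hi, b_lo):
    """Schoolbook 32x32 -> 64-bit multiply using only 16-bit words.

    Returns four 16-bit words, most significant first.
    """
    words = [0, 0, 0, 0]  # words[0] is the least significant word
    a = [a_lo, a_hi]
    b = [b_lo, b_hi]
    for i in range(2):
        carry = 0
        for j in range(2):
            hi, lo = mul16(a[i], b[j])
            # On real 16-bit hardware this sum would be done with add-with-carry.
            total = words[i + j] + lo + carry
            words[i + j] = total & MASK16
            carry = hi + (total >> 16)  # always fits in 16 bits
        words[i + 2] = carry
    return tuple(reversed(words))
```

For example, `mul32(0xFFFF, 0xFFFF, 0xFFFF, 0xFFFF)` returns `(0xFFFF, 0xFFFE, 0x0000, 0x0001)`, which is 0xFFFFFFFF squared.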
I had a great time in this class, and I feel like it was one of the deciding factors in my choice to study computer hardware. You can check out a short demo of our project below! (Apologies for the vertical video; I was not the one filming.)
One of my first design projects as an undergrad was to design a PID-based controller to control the angular velocity of a tabletop satellite (tablesat). The goal of the project was to accelerate as quickly as possible from rest to the target angular velocity. This was an interesting project, especially since we had not yet been exposed to any control theory. Our team first collected data on the system to model the friction on the pivot, and then manually tuned the controller parameters to achieve a short settling time while limiting the overshoot to a reasonable amount (<15%). The project also proved to be an exercise in familiarizing ourselves with network protocols, because we had to SSH tunnel through one server into another to control the tablesat, and then use SCP or SFTP between two other servers to move data files around.
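For readers who have not seen a PID loop, here is a minimal sketch of one driving a simple spinning body with viscous friction. The plant model, the `inertia` and `friction` values, and the gains are all assumptions chosen for illustration; they are not the tablesat's parameters or our tuned values.

```python
class PID:
    """Textbook discrete PID controller."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        # No derivative term on the first sample, since there is no previous error yet.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid, target, steps=5000, dt=0.01, inertia=1.0, friction=0.5):
    """Euler-integrate inertia * d(omega)/dt = torque - friction * omega from rest.

    Returns (final angular velocity, peak angular velocity).
    """
    omega = 0.0
    peak = 0.0
    for _ in range(steps):
        torque = pid.update(target, omega, dt)
        omega += dt * (torque - friction * omega) / inertia
        peak = max(peak, omega)
    return omega, peak
```

Calling `simulate(PID(2.0, 1.0, 0.05), target=1.0)` returns the final and peak velocities, so the overshoot can be checked against a limit such as 15% by comparing `peak` to `1.15 * target`.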
While we weren't evaluated against our peers, our team did fairly well and scored 100% on the assignment.
Below is a short list of other notable projects I've done:
- Created an integrated CPU-GPU system designed for smart watches. This was a mix of layout by hand and synthesis/automated place and route. I'll have a full writeup of this project when I get a chance. (EECS 427)
- Created a 5-stage in-order RISC processor in SystemVerilog using the Alpha ISA. (EECS 470)
- Created a simulator in C to perform a cycle-accurate simulation of a 5-stage RISC in-order pipeline with forwarding. (EECS 370)
- Created an L1 cache miss simulator in C to monitor hit/miss patterns. (EECS 370)
- Wrote an assembler to convert a simple ISA from assembly to machine code. (EECS 370)
- Implemented a compiler pass for LLVM that dynamically counts the number of instructions. (EECS 583)
- Implemented an LLVM compiler pass for speculative loop-invariant code motion (similar to loop-invariant code motion, except that loads and stores that may alias are also hoisted, and fix-up code runs at runtime if an alias actually occurred). (EECS 583)
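As a small taste of what a cache miss simulator like the EECS 370 one does, here is a minimal sketch that counts hits and misses for a direct-mapped cache over a trace of byte addresses. The direct-mapped organization, the default sizes, and the function name are assumptions made for brevity; the original was written in C and its configuration is not described here.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=16):
    """Count hits and misses for a direct-mapped cache over a byte-address trace."""
    tags = [None] * num_lines  # None marks an invalid (empty) line
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size  # which memory block the address falls in
        index = block % num_lines   # which cache line that block maps to
        tag = block // num_lines    # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag       # fill the line, evicting any previous block
    return hits, misses
```

For the trace `[0, 4, 16, 64, 0, 20]` with the default sizes, this returns `(2, 4)`: address 64 evicts the block holding address 0, so the second access to 0 is a conflict miss.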