Project 3: Cache and Cache Controller

 

In this project you will use Verilog to implement a data cache and its controller for a single-cycle processor implementation.

Description

You are given a processor core module that has every component except the data memory. You are also provided with a main memory module, which has an n-cycle latency. Design and implement a cached data memory subsystem for the processor. The cache is direct-mapped and write-through.

 

The following figure shows the interconnection between the processor core (hereafter, the processor) and its cached data memory subsystem (hereafter, the data memory). When executing a load or store instruction, the processor asserts the read or write signal. The data memory returns the data on a cache read hit and asserts the stall signal otherwise. The processor stalls while the stall signal is asserted. The flush signal flushes the contents of the cache (in the implementation, it resets the valid bits).

 

 

The data memory consists of the following sub-modules: (1) the cache controller with the cache tag array, (2) the cache data array, and (3) the main memory. You will implement the first two modules and integrate all three into a data memory module. Note that in the diagram the address is decomposed into tag, index, and offset before it is sent to the cache controller and data array. In this project, the cache block size is fixed at 16 bytes (128 bits). The data input and output to the processor are 32 bits wide.
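As a rough illustration of the address decomposition described above, the sketch below splits a 32-bit byte address for 16-byte blocks. The module name, the port names, and the INDEX_BITS parameter are assumptions made for this example; the handout does not state the number of cache blocks, and the names in the provided package may differ.

```verilog
// Sketch only: split a 32-bit byte address for a direct-mapped cache
// with 16-byte blocks. INDEX_BITS is an assumed parameter; the handout
// does not give the number of cache blocks.
module addr_split_sketch #(
    parameter INDEX_BITS = 4                      // assumption: 16 blocks
) (
    input  wire [31:0]               addr,
    output wire [27-INDEX_BITS:0]    tag,         // remaining upper bits
    output wire [INDEX_BITS-1:0]     index,       // selects the cache block
    output wire [1:0]                word_offset  // selects 1 of 4 words
);
    // addr[1:0] is the byte within a word and is ignored here
    // (assumption: all accesses are word-aligned).
    assign word_offset = addr[3:2];
    assign index       = addr[4 +: INDEX_BITS];
    assign tag         = addr[31 : 4 + INDEX_BITS];
endmodule
```

With a 16-byte block, the low 4 address bits form the block offset, of which the upper 2 bits pick the 32-bit word. The tag width then follows as 28 - INDEX_BITS.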

 

 

The data array stores cached memory blocks. On a cache hit, the index selects a cache data block and the offset selects a word (4 bytes) from the block. The output to the processor is through rdata. A 128-bit data path from main memory to the data array is used to fill the data array on cache misses.
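The behavior of the data array can be sketched with a plain register array, as below. This is not the Quartus RAM-based structure the package may expect (it ignores fast_clock and the read enable), and the names mem_block, wdata, word_offset, and INDEX_BITS are assumptions. It also assumes that word 0 occupies bits [31:0] of a block, which the handout does not specify.

```verilog
// Sketch only: behavioral data array with 128-bit blocks.
// Assumptions: register-based storage, word 0 in bits [31:0],
// hypothetical port names.
module data_array_sketch #(
    parameter INDEX_BITS = 4                      // assumption: 16 blocks
) (
    input  wire                  clock,
    input  wire                  refill,       // load a whole block from memory
    input  wire                  update,       // write one word from the processor
    input  wire [INDEX_BITS-1:0] index,
    input  wire [1:0]            word_offset,
    input  wire [31:0]           wdata,
    input  wire [127:0]          mem_block,
    output wire [31:0]           rdata
);
    reg [127:0] blocks [0:(1 << INDEX_BITS) - 1];

    wire [127:0] current = blocks[index];

    // Block contents with the selected word replaced by wdata.
    reg [127:0] merged;
    always @(*) begin
        merged = current;
        merged[word_offset * 32 +: 32] = wdata;
    end

    always @(posedge clock) begin
        if (refill)
            blocks[index] <= mem_block;
        else if (update)
            blocks[index] <= merged;
    end

    // During a refill, forward the word straight from the memory block
    // so the processor sees the data in the same cycle.
    wire [127:0] source = refill ? mem_block : current;
    assign rdata = source[word_offset * 32 +: 32];
endmodule
```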

Besides its connection to the data array, the main memory has a 32-bit address input and a 32-bit write data input from the processor, and a 32-bit read data output. From the cache controller it accepts the read and write signals. Any memory access finishes in n cycles; n = 3 in the default configuration. Here is a waveform showing the timing of the signals for memory accesses. The memory module de-asserts the ready signal while a memory access is in progress, asserts it when the access finishes, and keeps it asserted afterwards. The inputs to main memory are registered, but the output is unregistered.

The cache controller is the central part of the design. It provides all the signals to the processor, the data array, and main memory. Internally it has a finite state machine that sets the signals at the right times. It also includes the tag array and valid bits.
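A minimal sketch of the tag array and valid bits inside the controller might look like the following. The module and port names, the INDEX_BITS parameter, and the choice to update the tag on refill are assumptions for illustration only.

```verilog
// Sketch only: tag array, valid bits, and hit detection.
// Assumptions: 16 blocks by default, hypothetical port names.
module tag_check_sketch #(
    parameter INDEX_BITS = 4,
    parameter TAG_BITS   = 28 - INDEX_BITS
) (
    input  wire                  clock,
    input  wire                  flush,    // clear every valid bit
    input  wire                  refill,   // record the tag of the block being filled
    input  wire [TAG_BITS-1:0]   tag,
    input  wire [INDEX_BITS-1:0] index,
    output wire                  hit
);
    localparam NUM_BLOCKS = 1 << INDEX_BITS;

    reg [TAG_BITS-1:0]   tags  [0:NUM_BLOCKS-1];
    reg [NUM_BLOCKS-1:0] valid;

    always @(posedge clock) begin
        if (flush)
            valid <= {NUM_BLOCKS{1'b0}};
        else if (refill) begin
            tags[index]  <= tag;
            valid[index] <= 1'b1;
        end
    end

    // A hit needs a valid entry whose stored tag matches the address tag.
    assign hit = valid[index] && (tags[index] == tag);
endmodule
```

In simulation the valid bits are unknown until flush has been asserted once, so a test would normally start with a flush.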

Here is an example of how a cache read miss can be processed. Suppose that at cycle 0 the processor executes a load instruction. The controller receives the tag and index, uses the index to read the stored tag, and compares the two tags. It then determines that the memory reference misses in the cache and asserts the stall signal by the end of cycle 0 (it keeps the stall signal asserted until the main memory access finishes). Meanwhile, it asserts the mem_read signal. The main memory latches mem_read and addr at the end of cycle 0 and accesses the data during cycles 1 to n. By the end of cycle n, the valid 128-bit data block is returned. The controller asserts refill, and the data array accepts the memory data. The controller de-asserts stall, and the processor finishes executing the load instruction. Note that while the processor is stalled none of its outputs change, so the index to the data array remains valid. The data array also forwards the 32-bit word in the block (selected by offset) to the processor.

On a cache read hit, the controller simply asserts read to the data array, and the data array returns the read data to the processor.

For simplicity, the cache stalls the processor for n cycles on every write, i.e., it has no write buffer. On a write hit, the cache controller must also assert update so that the data array accepts the write data from the processor.
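The read-miss, read-hit, and write behavior described above can be sketched as a small finite state machine. This is one possible arrangement, not the required design. It assumes a synchronous reset input, a hit input computed from the tag array, that ready is low while the memory is busy and goes high in the cycle the data is valid, and that a write miss does not allocate a block. The names reset, array_read, and mem_write are hypothetical.

```verilog
// Sketch only: controller FSM for a blocking, write-through cache.
// Assumptions: synchronous reset, ready is low during the access and
// high in the cycle it completes, no allocation on write misses.
module cache_fsm_sketch (
    input  wire clock,
    input  wire reset,       // assumption: not named in the handout
    input  wire read,
    input  wire write,
    input  wire hit,
    input  wire ready,
    output reg  stall,
    output reg  mem_read,
    output reg  mem_write,   // hypothetical name
    output reg  array_read,  // hypothetical name for "read" to the data array
    output reg  refill,
    output reg  update
);
    localparam IDLE = 2'd0, READ_WAIT = 2'd1, WRITE_WAIT = 2'd2;

    reg [1:0] state, next_state;

    always @(posedge clock)
        state <= reset ? IDLE : next_state;

    always @(*) begin
        // Defaults: nothing asserted, stay in the current state.
        next_state = state;
        stall      = 1'b0;
        mem_read   = 1'b0;
        mem_write  = 1'b0;
        array_read = 1'b0;
        refill     = 1'b0;
        update     = 1'b0;

        case (state)
            IDLE: begin
                if (read && hit) begin
                    array_read = 1'b1;            // read hit: no stall
                end else if (read) begin
                    stall      = 1'b1;            // read miss: fetch the block
                    mem_read   = 1'b1;
                    next_state = READ_WAIT;
                end else if (write) begin
                    stall      = 1'b1;            // every write goes to memory
                    mem_write  = 1'b1;
                    next_state = WRITE_WAIT;
                end
            end
            READ_WAIT: begin
                stall  = !ready;
                refill = ready;                   // fill the data array when data is valid
                if (ready) next_state = IDLE;
            end
            WRITE_WAIT: begin
                stall  = !ready;
                update = ready && hit;            // keep the cached copy consistent
                if (ready) next_state = IDLE;
            end
            default: next_state = IDLE;
        endcase
    end
endmodule
```

Because the processor holds its outputs while stalled, hit stays stable in the wait states. If the provided memory module drives ready with different timing, the wait states would need to be adjusted to match the waveform.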

Preparation

Download the package dcache.zip from WebCT, which contains the following modules:

Programming

Develop the following Verilog modules:

Notes:

Testing

Use the testing examples in the package to test each of your modules. You may need to change the names of variables (some of the names in the above diagram are shortened). Test your modules one by one, creating a separate project for each. You may use the existing waveform files (cache_ctrl.vwf, data_mem.vwf, and cpu.vwf).

You also need to check the contents of logic memories after the simulation of cpu.v.

Compiling cpu.v is slower than in Project 2 but still acceptable.

Note: Because of the complexity of RAM access in Quartus, you need to provide two clock signals, "clock" and "fast_clock", with fast_clock running at twice the frequency of clock. (fast_clock is used for RAM access.)
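If you simulate with a Verilog testbench rather than the waveform files, the two clocks could be generated as in the sketch below. The periods, the phase relationship, the simulation length, and the testbench name are arbitrary assumptions; the only property taken from the note above is that fast_clock runs at twice the frequency of clock.

```verilog
// Sketch only: simulation-time generation of clock and fast_clock.
// Assumptions: 20 ns and 10 ns periods, both starting low.
`timescale 1ns/1ps
module clock_gen_tb;
    reg clock      = 1'b0;
    reg fast_clock = 1'b0;

    always #10 clock      = ~clock;       // 20 ns period
    always #5  fast_clock = ~fast_clock;  // 10 ns period, twice the frequency

    initial #200 $finish;                 // stop after a few cycles
endmodule
```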

Further Discussions

Discuss the following questions with your partner:

Project Report

Your report may include the following contents:

  1. Introduction: This is your understanding of cache design (for blocking cache).
  2. Design Description: Give an overview of the whole design and details of the modules you complete.
  3. Component Testing: Describe each testing example and discuss why the results are correct.
  4. Overall Testing: Describe the testing result of cpu.v. Discuss why the result is correct.
  5. Further discussions: as described above.
  6. Summary: Give a summary of this project.
  7. Comments: Please give (1) your view of the learning through the project; (2) any suggestions you might have to improve learning in this class, and (3) any additional comments.

Submission

Create a package that contains ONLY the four Verilog files (.v), all waveform files (.vwf), project files (.quartus), and a project report (see above). Please remove files generated by Quartus. The report should be in PDF or Word format. Each group may submit one package and should include the names of all members in the report.

The project is due at 11:55pm on Monday, December 1. A 10% penalty applies to submissions that are less than two days late.

Grading

Final Points