Many low-end FPGA development boards use SDR-SDRAM as off-chip memory, but DDR-SDRAM (DDR1) is larger and less expensive than SDR-SDRAM. And like SDR-SDRAM, DDR1 can also be directly driven by common IO pins of low-end FPGAs. I write a soft core DDR1 controller with a slave AXI4 interface for user. The features of this controller are:
* **Self-test demo** : Through the DDR1 controller, write data into DDR1, then read it out, and compare whether the read data is consistent with the written data.
* **UART DDR1 read/write demo** : Convert UART commands into AXI4 bus actions to read and write DDR1 through the DDR1 controller.
In addition, since the interface timing of each generation of DDR-SDRAM (such as DDR3, DDR2, DDR1) is similar, this repository can also facilitate those who are familiar with Verilog to learn the DDR-SDRAM interface.
For FPGA selection, only an FPGA with a sufficient number of common IOs can drive DDR1. The IO level standard of DDR1 is often SSTL-2, which is compatible with 2.5V LVTTL or 2.5V LVCMOS, so the power supply of the corresponding FPGA IO bank should be 2.5V, and should be configured as 2.5V LVTTL or 2.5V LVCMOS in the FPGA development software .
For demonstration, I used the cheapest FPGA of Altera Cyclone IV (model: EP4CE6E22C8N) and MICRON's 64MB DDR1 (model MT46V64M8TG-6T) to design a small board. All the demo project of this repository can run on this board directly. If you want to use DDR1 in your own PCB design, just refer to this board's design.
See board schematic [PCB/sch.pdf](./PCB/sch.pdf) and manufacturing file [PCB/gerber.zip](./PCB/gerber.zip) . It is a double-layer board, and it is not necessary to pay attention to impedance matching like DDR2 and DDR3, because the operating frequency of the circuit is 75MHz on both sides, which is not particularly high. Just pay attention to keeping the distance between FPGA and DDR as close as possible, and the wiring as short as possible. For example, I put the DDR1 chip on the opposite side of the FPGA chip to keep the wiring short.
See [RTL/ddr_sdram_ctrl.v](./RTL/ddr_sdram_ctrl.v) for the DDR1 controller code, which can automatically initialize DDR1 and refresh it regularly. The module has a AXI4 slave interface through which reads and writes to DDR1 can be accomplished. This section introduce in detail how to use this module.
| READ_BUFFER | | 0 or 1 | 1 | If it is set to 0, there will be no read data buffer in the DDR1 controller, and the read data will not wait for the AXI4 host to accept it, that is, the rvalid signal will not wait for the rready signal, and will be directly poured out at the highest rate. At this time, AXI4 is not Complete, but can reduce the latency of readout. If it is set to 1, there will be a large enough read data buffer in the DDR1 controller, and the rvalid signal will shake hands with the rready signal to confirm that the AXI4 host is ready before reading the data. |
| BA_BITS | Width | 1~4 | 2 | The width of DDR BANK ADDRESS (ddr_ba) is specified. Regular DDR1 BANK ADDRESS is 2bit, so this parameter is usually fixed to the default value and does not need to be changed. |
| ROW_BITS | Width | 1~15 | 13 | It specifies the width of the DDR ROW ADDRESS, and also determines the width of the address line pins (ddr_a) of the DDR1 chip. This parameter depends on the selection of the DDR1 chip. For example, MT46V64M8 has 8192 COLs per bank, considering 2^11=8192, this parameter should be 13. Similarly, for MT46V128M8, this parameter should be 14 |
| COL_BITS | Width | 1~14 | 11 | Specifies the width of the DDR COL ADDRESS. This parameter depends on the selection of the DDR1 chip. For example, MT46V64M8 has 2048 COLs per ROW. Considering 2^11=2048, this parameter should be 11. Similarly, for MT46V32M16, this parameter should be 10. |
| DQ_LEVEL | Width | 0~7 | 1 | The data bit width of the DDR1 chip is specified. For a chip with a bit width of x4 (such as MT46V64M4), this parameter should be 0; for a chip with a bit width of x8 (such as MT46V64M8), this parameter should be 1; for a chip with a bit width of x16 ( For example, MT46V64M16), this parameter should be 2; for the case of bit width x32 (such as two pieces of MT46V64M16 extended bit width), this parameter should be 3; and so on. |
| tREFC | Timing | 1\~1023 | 256 | The controller will refresh DDR1 periodically, and this parameter specifies the refresh interval of DDR1. See [Timing Parameters](#Timing Parameters) for details. |
| tW2I | Timing | 1\~255 | 7 | This parameter specifies the interval from the last write command of a write action to the activation command (ACT) of the next action. See [Timing Parameters](#Timing Parameters) for details. |
| tR2I | Timing | 1\~255 | 7 | This parameter specifies the interval from the last read command of a read action to the activation command (ACT) of the next action. See [Timing Parameters](#Timing Parameters) for details. |
`rstn_async` is a low-level reset signal and should be set high during normal operation. `drv_clk` is the driving clock, and its frequency is 4 times the user clock.
In this module, the driving clock `drv_clk` is divided by 4 to generate the DDR1 clock (`ddr_ck_p/ddr_ck_n`) and the AXI4 bus user clock (`clk`). This section describes how to determine the frequency of the drive clock `drv_clk`.
First, the clock frequency is limited by the DDR1 chip. Considering that all DDR1 interface frequencies are at least 75MHz, the lower limit of `drv_clk` is 75\*4=300MHz.
The upper limit of `drv_clk` also depends on the chip model of DDR1. For example, for MT46V64M8P-5B, check the chip manual, the maximum clock frequency of DDR1 with -5B suffix is 133MHz when CAS Latency (CL)=2, then the upper limit of `drv_clk` is 133 \*4=532MHz.
In addition, the upper limit of the clock frequency is also limited by the speed of the FPGA. Too high a clock frequency can easily lead to timing failure. This design fully considers the timing safety design, most registers work in the clk clock domain with a lower frequency; some registers work in a clock that is twice the frequency of the clk clock, and the combinational logic of the input port is very short; Under the `drv_clk` of the frequency, but the input port comes directly from the register output of the upper stage (no combinational logic). Therefore, even on the EP4CE6E22C8N with a very low speed class, the correct operation of the module is guaranteed at a drive clock of 300MHz.
It can be seen that the bit width of some signals of the DDR1 interface is related to the parameters, and the user needs to determine the module parameters according to the DDR1 chip part. For details, see [Bit Width Parameters](#Bit Width Parameters).
If you want to know the waveforms of the DDR1 interface during initialization, read/write, and refresh, please perform [RTL Simulation](#RTL Simulation).
The DDR1 controller provides clock and reset signals for the AXI4 slave interface, as follows. The AXI4 masters should use them as their clock and reset.
Following code are the AXI4 slave interfaces of this module, they are all synchronized with the rising edge of `clk` clock and should be connected to the AXI4 master inside the FPGA.
The bit width of the data (`wdata` and `rdata`) of the AXI4 bus is twice that of the DDR1 data bit width. For example, for MT46V64M8 whose bit width is 8bit, the bit width of `wdata` and `rdata` is 16bit.
The addresses (`awaddr` and `araddr`) of the AXI4 are byte addresses. The module will calculate the bit width of `awaddr` and `araddr` according to the parameters specified by the user = `BA_BITS+ROW_BITS+COL_BITS+DQ_LEVEL-1` . For example, for a 64MB MT46V64M8, the `awaddr/araddr` bit width is 26, where 2^26=64MB.
For each read and write access, the address value (`awaddr` and `araddr`) must be aligned with the entire word. For example, if the bit width of `wdata` and `rdata` is 16bit (2 bytes), then `awaddr` and `araddr` should be aligned by 2 bytes, that is, Divide by 2.
The AXI4 interface of this module does not support byte strobe write, there is no `wstrb` signal, and at least one entire data bit width is written each time. For example, if the bit width of `wdata` is 16, then at least 2 bytes are written at a time.
The AXI4 interface of this module supports burst reading and writing, and the burst length can take any value between 1\~256, that is, the value range of `awlen` and `arlen` is 0\~255. For example, in a write operation, `awlen=12`, the burst length is 13, and if the bit width of `wdata` is 16, then the burst write writes a total of 13\*2=26B.
Note that each burst read and write cannot exceed a row boundary within DDR1. For example, for MT46V64M8, each row has 8\*2^11/8 = 2048B, so each burst read and write cannot exceed the 2048B boundary.
* **Address channel handshake**: The AXI4 host pulls the `arvalid` signal high, indicating that it wants to initiate a write action. After a cycle in the figure, the DDR1 controller pulls the ready signal high, indicating that the address channel handshake is successful (before this , the DDR1 controller may be processing the last read or write action, or performing a refresh, so it has not pulled ready for the time being). At the same time as the handshake is successful, the DDR1 controller receives the address to be read (`araddr`) and the burst length (`arlen`). `arlen=8'd4` means that the burst length is 5.
* **Data transfer**: `rvalid` and `rready` on the AXI4 bus perform 5 handshakes, and read out 5 of the burst transfer. During handshake, if the AXI4 host is not ready to accept data, `rready` can be pulled low, and the DDR1 controller will keep the current data until the AXI4 host can accept the data (that is, pull `rready` high). When the last data is transferred, the DDR1 controller will pull `rlast` high.
> Note: When the module parameter `READ_BUFFER=0`, the module will save a BRAM resource and also reduce the delay between address channel handshake and data transfer. But the DDR1 controller ignores the `rready=0` condition and does not wait for the AXI4 host to be ready to accept data. This will destroy the completeness of the AXI4 protocol, but it may be useful in some simple situations.
Take [MICRON's DDR-SDRAM](https://www.micron.com/products/dram/ddr-sdram) series chips as an example, different chips have different ROW ADDRESS BITS, COL ADDRESS BITS and DATA BITS, which means that their bit width parameters are also different, as shown in the following table (note: these parameters can be found in the chip datasheet).
We know that DDR1 requires periodic refresh action, and `tREFC` specifies the refresh clock cycle interval (subject to clk). For example, if the user clock is 75MHz, according to the chip manual of MT46V64M8, it needs to be refreshed once at most 7.8125us. Considering that 75MHz * 7.8125us = 585.9, this parameter can be set to a value less than 585, such as `10'd512`.
`tW2I` specifies the minimum number of clock cycles (in clk) from the last write command of a write operation to the active command (ACT) of the next operation. The following figure shows a write operation on a DDR1 interface. The first rising edge of ddr_cas_n represents the end of the last write command of a write operation, and the second falling edge of ddr_ras_n represents the next operation (may be read, write, Refresh), there are 5 clock cycles between them, `tW2I` is used to specify the lower limit of the number of cycles. The default value of `tW2I` is `8'd7`, which is a conservative value compatible with most DDR1s. For different DDR1 chips, there are different shrinkage margins (see DDR1 chip datasheet for details).
`tR2I` specifies the minimum number of clock cycles (in `clk`) from the last read command of a read operation to the active command (ACT) of the next operation. The figure below shows a read operation on a DDR1 interface. The first rising edge of ddr_cas_n represents the end of the last read command of a read operation, and the second falling edge of ddr_ras_n represents the next operation (may be read, write, Refresh), there are 5 clock cycles between them, `tR2I` is used to specify the lower limit of the number of cycles. The default value of `tR2I` is `8'd7`, which is a conservative value compatible with most DDR1. For different DDR1 chips, there are different shrinkage margins (see DDR1 chip datasheet for details).
I provide a DDR1 read and write self-test project based on the [FPGA+DDR1 board](#Hardware Design Example) I designed. The project directory is [example-selftest](./example-selftest) , please open it with Quartus.
**Write**: After the project starts running, it will first write the entire DDR1 through AXI4, and directly write the address word to the corresponding address. For example, if the data width of AXI4 is 16bit, then write 0x0001 at address 0x000002. Write 0x3456 at address 0x123456.
**Read & Error Check**: After writing the entire DDR1, the project will repeatedly read the entire DDR1 round by round. error) to generate a high-level pulse, and the error count signal (error_cnt) +1. If the DDR1 configuration is correct, there should be no error signal. You can measure the pin corresponding to error_cnt, if it is 0 (all low), it means there is no error.
**SignalTap waveform capture**: This project contains a SignalTap file stp1.stp, which can be used to view the waveform on the DDR1 interface when the program is running. It triggers with error=1, so if there is no error in the read and write self-test, it will not be triggered. Because the project is reading DDR1 at any time, if you want to see the waveform on the DDR1 interface, just press the "Stop" button.
**Modify AXI4 burst length**: You can modify WBURST_LEN and RBURST_LEN in lines 82 and 83 of fpga_top.v to modify the write/read burst length during self-test. The self-test program only supports 2^n-1 This burst length, ie WBURST_LEN and RBURST_LEN must take values like 0, 1, 3, 7, 15, 31, ... (note that this is only a limitation of the self-test program I wrote, DDR1 controller supports 0 Any burst length between ~255.
I provide a UART read and write project based on the [FPGA+DDR1 board](#Hardware Design Example) I designed. In this project, you can read and write DDR1 with different burst lengths through UART commands. The project directory is [example-uart-read-write](./example-uart-read-write) , please open it with Quartus.
| example-uart-read-write/uart/uart2axi4.v | an AXI4 master, which can convert the commands received by UART RX into AXI4 read and write operations, and send the data read out by read operations through UART TX. see https://github.com/WangXuan95/Verilog-UART for detail. |
| example-uart-read-write/uart/uart_rx.v | called by uart2axi4.v |
| example-uart-read-write/uart/uart_tx.v | called by uart2axi4.v |
There is a CH340E chip (USB to UART) on the [FPGA+DDR1 board](#Hardware Design Example), so after plugging in the USB cable, you can see the serial port corresponding to the UART on the computer (you need to download and install the driver of CH341 on [www.wch. cn/product/CH340.html](http://www.wch.cn/product/CH340.html) first).
After the project is programed to FPGA, double-click to open a serial port tool **UartSession.exe** (it is in the [example-uart-read-write](./example-uart-read-write) directory), open the COM port corresponding to the board according to the prompt, and then type the following command + Enter, you can Write the 4 data 0x0123, 0x4567, 0x89ab, and 0xcdef into the starting address 0x12345. (A write operation with a burst length of 4 occurs on the AXI4 bus).
The effect is as shown in the figure below. The first 4 data (0123 4567 89ab cdef) are what we have written into DDR1, and the last 4 data are random data that comes with DDR1 after initialization.
The length of the write burst is how much data there is in the write command. For example, the burst length of the following write command is 10, and 10 pieces of data are written to the starting address 0x00000
The read command needs to specify the burst length. For example, the burst length of the following command is 30 (0x1e), and 30 data are read from the starting address 0x00000
The behavior of the simulation project is the same as that of the self-test program, axi_self_test_master.v, as the AXI4 host, writes regular data into DDR1, but not all of them, only the first 16KB of DDR1 (because the memory spave of the simulation model is limited), and then repeatedly read data round by round to compare whether there is unmatched data, if there is, generate a high level of one clock cycle on the `error` signal.
Before using iverilog for simulation, you need to install iverilog , see: [iverilog_usage](https://github.com/WangXuan95/WangXuan95/blob/main/iverilog_usage/iverilog_usage.md)
The default configuration parameters of the above simulation fit MT46V64M8, namely `ROW_BITS=13`, `COL_BITS=11`, and `DQ_BITS=8`. If you want to simulate other DDR1 chips, you need to modify them in tb_ddr_sdram_ctrl.v. For MICRON's DDR1 series, these parameters should be modified as follows: