mirror of https://github.com/corundum/corundum.git
synced 2025-01-16 08:12:53 +08:00

Update readme

parent 1bee717bc8
commit 52ba4c40e2

211 README.md
@@ -4,32 +4,21 @@

 GitHub repository: https://github.com/corundum/corundum

+GitHub wiki: https://github.com/corundum/corundum/wiki
+
 Google group: https://groups.google.com/d/forum/corundum-nic

+Slack workspace: https://join.slack.com/t/corundumworkspace/shared_invite/zt-tj5azsbm-V9LV8L7ugSRDBpe2JiPKMA
+
 ## Introduction

-Corundum is an open-source, high-performance FPGA-based NIC. Features include
-a high performance datapath, 10G/25G/100G Ethernet, PCI express gen 3, a
-custom, high performance, tightly-integrated PCIe DMA engine, many (1000+)
-transmit, receive, completion, and event queues, scatter/gather DMA, MSI
-interrupts, multiple interfaces, multiple ports per interface, per-port
-transmit scheduling including high precision TDMA, flow hashing, RSS, checksum
-offloading, and native IEEE 1588 PTP timestamping. A Linux driver is included
-that integrates with the Linux networking stack. Development and debugging is
-facilitated by an extensive simulation framework that covers the entire system
-from a simulation model of the driver and PCI express interface on one side to
-the Ethernet interfaces on the other side.
+Corundum is an open-source, high-performance FPGA-based NIC. Features include a high performance datapath, 10G/25G/100G Ethernet, PCI express gen 3, a custom, high performance, tightly-integrated PCIe DMA engine, many (1000+) transmit, receive, completion, and event queues, scatter/gather DMA, MSI interrupts, multiple interfaces, multiple ports per interface, per-port transmit scheduling including high precision TDMA, flow hashing, RSS, checksum offloading, and native IEEE 1588 PTP timestamping. A Linux driver is included that integrates with the Linux networking stack. Development and debugging is facilitated by an extensive simulation framework that covers the entire system from a simulation model of the driver and PCI express interface on one side to the Ethernet interfaces on the other side.

-Corundum has several unique architectural features. First, transmit, receive,
-completion, and event queue states are stored efficiently in block RAM or
-ultra RAM, enabling support for thousands of individually-controllable
-queues. These queues are associated with interfaces, and each interface can
-have multiple ports, each with its own independent scheduler. This enables
-extremely fine-grained control over packet transmission. Coupled with PTP time
-synchronization, this enables high precision TDMA.
+Corundum has several unique architectural features. First, transmit, receive, completion, and event queue states are stored efficiently in block RAM or ultra RAM, enabling support for thousands of individually-controllable queues. These queues are associated with interfaces, and each interface can have multiple ports, each with its own independent scheduler. This enables extremely fine-grained control over packet transmission. Coupled with PTP time synchronization, this enables high precision TDMA.

-Corundum currently supports Xilinx Virtex 7, UltraScale, and UltraScale+ series
-devices. Designs are included for the following FPGA boards:
+Corundum also provides an application section for implementing custom logic. The application section has a dedicated PCIe BAR for control and a number of interfaces that provide access to the core datapath and DMA infrastructure.
+
+Corundum currently supports Xilinx Virtex 7, UltraScale, and UltraScale+ series devices. Designs are included for the following FPGA boards:

 * Alpha Data ADM-PCIE-9V3 (Xilinx Virtex UltraScale+ XCVU3P)
 * Exablaze ExaNIC X10 (Xilinx Kintex UltraScale XCKU035)
@@ -45,11 +34,9 @@ devices. Designs are included for the following FPGA boards:
 * Xilinx VCU1525 (Xilinx Virtex UltraScale+ XCVU9P)
 * Xilinx ZCU106 (Xilinx Zynq UltraScale+ XCZU7EV)

-For operation at 10G and 25G, Corundum uses the open source 10G/25G MAC and
-PHY modules from the verilog-ethernet repository, no extra licenses are
-required. However, it is possible to use other MAC and/or PHY modules.
-Operation at 100G currently requires using the Xilinx CMAC core with RS-FEC
-enabled, which is covered by the free CMAC license on Xilinx UltraScale+ parts.
+For operation at 10G and 25G, Corundum uses the open source 10G/25G MAC and PHY modules from the verilog-ethernet repository, no extra licenses are required. However, it is possible to use other MAC and/or PHY modules.
+
+Operation at 100G on Xilinx UltraScale+ devices currently requires using the Xilinx CMAC core with RS-FEC enabled, which is covered by the free CMAC license.

 ## Documentation

@@ -63,132 +50,177 @@ Block diagram of the Corundum NIC. PCIe HIP: PCIe hard IP core; AXIL M: AXI lite

 #### `cmac_pad` module

-Frame pad module for 512 bit 100G CMAC TX interface. Zero pads transmit
-frames to minimum 64 bytes.
+Frame pad module for 512 bit 100G CMAC TX interface. Zero pads transmit frames to minimum 64 bytes.

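Note on `cmac_pad`: the zero padding described above can be illustrated with a few lines of Python. This is only a behavioral sketch of "zero pad to a minimum of 64 bytes"; the function name `pad_frame` and the constant `MIN_FRAME_LEN` are made up for the example, and the real module works on a 512 bit streaming interface, not on whole byte strings.

```python
MIN_FRAME_LEN = 64  # minimum length quoted in the module description


def pad_frame(frame: bytes, min_len: int = MIN_FRAME_LEN) -> bytes:
    """Return the frame with zero bytes appended until it is min_len long.

    Frames that already meet the minimum are returned unchanged.
    (Hypothetical helper, not part of the Corundum sources.)
    """
    if len(frame) >= min_len:
        return frame
    return frame + bytes(min_len - len(frame))


if __name__ == "__main__":
    short = bytes(range(20))
    print(len(short), "->", len(pad_frame(short)))
```

Appending zeros at the tail leaves the original frame bytes untouched, which is the only property the description states.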
 #### `cpl_op_mux` module

-Completion operation multiplexer module. Merges completion write operations
-from different sources to enable sharing a single `cpl_write` module instance.
+Completion operation multiplexer module. Merges completion write operations from different sources to enable sharing a single `cpl_write` module instance.

 #### `cpl_queue_manager` module

-Completion queue manager module. Stores device to host queue state in block
-RAM or ultra RAM.
+Completion queue manager module. Stores device to host queue state in block RAM or ultra RAM.

 #### `cpl_write` module

-Completion write module. Responsible for enqueuing completion and event
-records into the completion queue managers and writing records into host
-memory via DMA.
+Completion write module. Responsible for enqueuing completion and event records into the completion queue managers and writing records into host memory via DMA.

 #### `desc_fetch` module

-Descriptor fetch module. Responsible for dequeuing descriptors from the queue
-managers and reading descriptors from host memory via DMA.
+Descriptor fetch module. Responsible for dequeuing descriptors from the queue managers and reading descriptors from host memory via DMA.

 #### `desc_op_mux` module

-Descriptor operation multiplexer module. Merges descriptor fetch operations
-from different sources to enable sharing a single `desc_fetch` module instance.
+Descriptor operation multiplexer module. Merges descriptor fetch operations from different sources to enable sharing a single `desc_fetch` module instance.

 #### `event_mux` module

 Event mux module. Enables multiple event sources to feed the same event queue.

+#### `mqnic_core` module
+
+Core module. Contains the interfaces, asynchronous FIFOs, PTP subsystem, statistics collection subsystem, and application block.
+
+#### `mqnic_core_pcie` module
+
+Core module for a PCIe host interface. Wraps `mqnic_core` along with generic PCIe interface components, including DMA engine and AXI lite masters.
+
+#### `mqnic_core_pcie_us` module
+
+Core module for a PCIe host interface on Xilinx 7-series, UltraScale, and UltraScale+. Wraps `mqnic_core_pcie` along with FPGA-specific interface logic.
+
 #### `mqnic_interface` module

 Interface module. Contains the event queues, interface queues, and ports.

 #### `mqnic_port` module

-Port module. Contains the transmit and receive datapath components, including
-transmit and receive engines and checksum and hash offloading.
+Port module. Contains the transmit and receive datapath components, including transmit and receive engines and checksum and hash offloading.
+
+#### `mqnic_ptp` module
+
+PTP subsystem. Contains one `mqnic_ptp_clock` instance and a parametrizable number of `mqnic_ptp_perout` instances.
+
+#### `mqnic_ptp_clock` module
+
+PTP clock module. Contains an instance of `ptp_clock` with a register interface.
+
+#### `mqnic_ptp_perout` module
+
+PTP period output module. Contains an instance of `ptp_perout` with a register interface.
+
+#### `mqnic_tx_scheduler_block_rr` module
+
+Transmit scheduler block with round-robin transmit scheduler and register interface.
+
+#### `mqnic_tx_scheduler_block_rr_tdma` module
+
+Transmit scheduler block with round-robin transmit scheduler, TDMA scheduler, TDMA scheduler controller, and register interface.

 #### `queue_manager` module

-Queue manager module. Stores host to device queue state in block RAM or ultra
-RAM.
+Queue manager module. Stores host to device queue state in block RAM or ultra RAM.

 #### `rx_checksum` module

-Receive checksum computation module. Computes 16 bit checksum of Ethernet
-frame payload to aid in IP checksum offloading.
+Receive checksum computation module. Computes 16 bit checksum of Ethernet frame payload to aid in IP checksum offloading.

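Note on `rx_checksum`: the README does not spell out the arithmetic, so the following is a sketch under assumptions. It assumes the "16 bit checksum" is the ones' complement sum used by the Internet checksum (RFC 1071), that the payload starts after a 14 byte Ethernet header, and that the raw (uncomplemented) sum is what gets reported. The names `ones_complement_sum16`, `rx_payload_checksum` and `ETH_HEADER_LEN` are invented for the example.

```python
ETH_HEADER_LEN = 14  # assumption: untagged Ethernet header (dst, src, ethertype)


def ones_complement_sum16(data: bytes) -> int:
    """Ones' complement sum of big-endian 16-bit words (RFC 1071 style).

    An odd trailing byte is treated as the high byte of a final word.
    """
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def rx_payload_checksum(frame: bytes) -> int:
    """Checksum of everything after the assumed Ethernet header."""
    return ones_complement_sum16(frame[ETH_HEADER_LEN:])
```

With a sum like this available per packet, software can subtract the contribution of the headers it does not want covered instead of re-reading the whole payload, which is one common way such a value aids checksum offloading.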
 #### `rx_engine` module

-Receive engine. Manages receive datapath operations including descriptor
-dequeue and fetch via DMA, packet reception, data writeback via DMA, and
-completion enqueue and writeback via DMA. Handles PTP timestamps for
-inclusion in completion records.
+Receive engine. Manages receive datapath operations including descriptor dequeue and fetch via DMA, packet reception, data writeback via DMA, and completion enqueue and writeback via DMA. Handles PTP timestamps for inclusion in completion records.

 #### `rx_hash` module

-Receive hash computation module. Extracts IP addresses and ports from packet
-headers and computes 32 bit Toeplitz flow hash.
+Receive hash computation module. Extracts IP addresses and ports from packet headers and computes 32 bit Toeplitz flow hash.
+
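Note on `rx_hash`: a 32 bit Toeplitz hash XORs together a sliding 32 bit window of a secret key, one window per set bit of the input. The sketch below shows that generic construction in Python. The key `EXAMPLE_KEY` is an arbitrary placeholder, and the input layout (source address, destination address, source port, destination port) is the usual RSS convention, assumed here and not taken from the Corundum sources.

```python
import ipaddress
import struct

# Arbitrary 40-byte placeholder key for illustration only.
EXAMPLE_KEY = bytes(range(1, 41))


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Generic 32-bit Toeplitz hash of data under key (bits taken MSB first)."""
    if len(key) < len(data) + 4:
        raise ValueError("key must be at least len(data) + 4 bytes long")
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # XOR in the 32-bit key window that starts at key bit i.
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_tuple_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Pack an IPv4 flow tuple in the assumed RSS input order."""
    return (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!HH", src_port, dst_port)
    )


if __name__ == "__main__":
    tuple_bytes = flow_tuple_bytes("192.0.2.1", "198.51.100.2", 1234, 80)
    print(hex(toeplitz_hash(EXAMPLE_KEY, tuple_bytes)))
```

Because the hash is linear over XOR, hardware can compute it with a tree of XOR gates per input word, which is presumably part of why it is a popular choice for flow hashing and RSS.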
+#### `stats_collect` module
+
+Statistics collector module. Parametrizable number of increment inputs, single AXI stream output for accumulated counts.
+
+#### `stats_counter` module
+
+Statistics counter module. Receives increments over AXI stream and accumulates them in block RAM, which is accessible via AXI lite.
+
+#### `stats_dma_if_pcie` module
+
+Collects DMA-related statistics for `dma_if_pcie` module, including operation latency.
+
+#### `stats_dma_if_latency` module
+
+DMA latency measurement module.
+
+#### `stats_pcie_if` module
+
+Collects TLP-level statistics for the generic PCIe interface.
+
+#### `stats_pcie_tlp` module
+
+Extracts TLP-level statistics for the generic PCIe interface (single channel).

 #### `tdma_ber_ch` module

-TDMA bit error ratio (BER) test channel module. Controls PRBS logic in
-Ethernet PHY and accumulates bit errors. Can be configured to bin error
-counts by TDMA timeslot.
+TDMA bit error ratio (BER) test channel module. Controls PRBS logic in Ethernet PHY and accumulates bit errors. Can be configured to bin error counts by TDMA timeslot.

 #### `tdma_ber` module

-TDMA bit error ratio (BER) test module. Wrapper for a tdma_scheduler and
-multiple instances of `tdma_ber_ch`.
+TDMA bit error ratio (BER) test module. Wrapper for a tdma_scheduler and multiple instances of `tdma_ber_ch`.

 #### `tdma_scheduler` module

-TDMA scheduler module. Generates TDMA timeslot index and timing signals from
-PTP time.
+TDMA scheduler module. Generates TDMA timeslot index and timing signals from PTP time.

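Note on `tdma_scheduler`: the README only says that a timeslot index and timing signals are derived from PTP time. One plausible way to do that, sketched here purely as an assumption, is to describe the schedule by a start time, a schedule period, a timeslot period and an active period, all in nanoseconds. The class `TdmaSchedule`, its field names and `timeslot_at` are hypothetical and do not come from the Corundum register map.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TdmaSchedule:
    """Hypothetical schedule description; all values in nanoseconds."""
    start_ns: int
    schedule_period_ns: int
    timeslot_period_ns: int
    active_period_ns: int


def timeslot_at(sched: TdmaSchedule, ptp_time_ns: int) -> Optional[Tuple[int, bool]]:
    """Return (timeslot index, active flag) for a PTP time.

    Returns None before the schedule start. The schedule repeats every
    schedule_period_ns; each slot is active for the first active_period_ns
    of its timeslot_period_ns.
    """
    if ptp_time_ns < sched.start_ns:
        return None
    offset = (ptp_time_ns - sched.start_ns) % sched.schedule_period_ns
    index = offset // sched.timeslot_period_ns
    active = (offset % sched.timeslot_period_ns) < sched.active_period_ns
    return index, active


if __name__ == "__main__":
    sched = TdmaSchedule(start_ns=1000, schedule_period_ns=400,
                         timeslot_period_ns=100, active_period_ns=80)
    for t in (999, 1000, 1090, 1250, 1400):
        print(t, timeslot_at(sched, t))
```

Deriving the slot from absolute time rather than from a free-running counter means that two NICs with synchronized PTP clocks would agree on the current timeslot without exchanging any messages.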
 #### `tx_checksum` module

-Transmit checksum computation and insertion module. Computes 16 bit checksum
-of frame data with specified start offset, then inserts computed checksum at
-the specified position.
+Transmit checksum computation and insertion module. Computes 16 bit checksum of frame data with specified start offset, then inserts computed checksum at the specified position.

 #### `tx_engine` module

-Transmit engine. Manages transmit datapath operations including descriptor
-dequeue and fetch via DMA, packet data fetch via DMA, packet transmission, and
-completion enqueue and writeback via DMA. Handles PTP timestamps for
-inclusion in completion records.
+Transmit engine. Manages transmit datapath operations including descriptor dequeue and fetch via DMA, packet data fetch via DMA, packet transmission, and completion enqueue and writeback via DMA. Handles PTP timestamps for inclusion in completion records.

 #### `tx_scheduler_ctrl_tdma` module

-TDMA transmit scheduler control module. Controls queues in a transmit
-scheduler based on PTP time, via a `tdma_scheduler` instance.
+TDMA transmit scheduler control module. Controls queues in a transmit scheduler based on PTP time, via a `tdma_scheduler` instance.

 #### `tx_scheduler_rr` module

-Round-robin transmit scheduler. Determines which queues from which to send
-packets.
+Round-robin transmit scheduler. Determines which queues from which to send packets.

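Note on `tx_scheduler_rr`: the general idea of round-robin scheduling across many queues can be sketched as a FIFO of queue indices that currently have work. The class below is a software illustration of that idea only; the name `RoundRobinScheduler` and its methods are invented, and the hardware scheduler tracks considerably more state than this.

```python
from collections import deque
from typing import Deque, Dict, Optional


class RoundRobinScheduler:
    """Toy round-robin scheduler over queue indices (illustrative only)."""

    def __init__(self) -> None:
        self._ready: Deque[int] = deque()   # queues with pending packets, in service order
        self._pending: Dict[int, int] = {}  # queue index -> packets waiting

    def enqueue(self, queue: int, packets: int = 1) -> None:
        """Record new packets on a queue; idle queues join the back of the line."""
        if self._pending.get(queue, 0) == 0:
            self._ready.append(queue)
        self._pending[queue] = self._pending.get(queue, 0) + packets

    def next_queue(self) -> Optional[int]:
        """Pick the next queue to send one packet from, or None if all are idle."""
        if not self._ready:
            return None
        queue = self._ready.popleft()
        self._pending[queue] -= 1
        if self._pending[queue] > 0:
            # Still has work: go to the back so other queues get a turn first.
            self._ready.append(queue)
        return queue
```

Sending one packet per visit and re-queuing at the tail keeps any single busy queue from starving the others, which is the defining property of round robin.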
 ### Source Files

 cmac_pad.v : Pad frames to 64 bytes for CMAC TX
 cpl_op_mux.v : Completion operation mux
 cpl_queue_manager.v : Completion queue manager
 cpl_write.v : Completion write module
 desc_fetch.v : Descriptor fetch module
 desc_op_mux.v : Descriptor operation mux
 event_mux.v : Event mux
 event_queue.v : Event queue
-mqnic_interface.v : Interface
-mqnic_port.v : Port
-queue_manager.v : Queue manager
-rx_checksum.v : Receive checksum offload
-rx_engine.v : Receive engine
-rx_hash.v : Receive hashing module
-tdma_ber_ch.v : TDMA BER channel
-tdma_ber.v : TDMA BER
-tdma_scheduler.v : TDMA scheduler
-tx_checksum.v : Transmit checksum offload
-tx_engine.v : Transmit engine
-tx_scheduler_ctrl_tdma.v : TDMA transmit scheduler controller
-tx_scheduler_rr.v : Round robin transmit scheduler
+mqnic_core.v : Core logic
+mqnic_core_pcie.v : Core logic for PCIe
+mqnic_core_pcie_us.v : Core logic for PCIe (UltraScale)
+mqnic_interface.v : Interface
+mqnic_port.v : Port
+mqnic_ptp.v : PTP subsystem
+mqnic_ptp_clock.v : PTP clock wrapper
+mqnic_ptp_perout.v : PTP period output wrapper
+mqnic_tx_scheduler_block_rr.v : Scheduler block (round-robin)
+mqnic_tx_scheduler_block_rr_tdma.v : Scheduler block (round-robin TDMA)
+queue_manager.v : Queue manager
+rx_checksum.v : Receive checksum offload
+rx_engine.v : Receive engine
+rx_hash.v : Receive hashing module
+stats_collect.v : Statistics collector
+stats_counter.v : Statistics counter
+stats_dma_if_pcie.v : DMA interface statistics
+stats_dma_latency.v : DMA latency measurement
+stats_pcie_if.v : PCIe interface statistics
+stats_pcie_tlp.v : PCIe TLP statistics
+tdma_ber_ch.v : TDMA BER channel
+tdma_ber.v : TDMA BER
+tdma_scheduler.v : TDMA scheduler
+tx_checksum.v : Transmit checksum offload
+tx_engine.v : Transmit engine
+tx_scheduler_ctrl_tdma.v : TDMA transmit scheduler controller
+tx_scheduler_rr.v : Round robin transmit scheduler

 ## Testing

@@ -201,6 +233,7 @@ Running the included testbenches requires [cocotb](https://github.com/cocotb/coc
 - J. A. Forencich, *System-Level Considerations for Optical Switching in Data Center Networks*, [Paper](https://escholarship.org/uc/item/3mc9070t)

 ## Citation
+
 If you use Corundum in your project please cite one of the following papers
 and/or link to the github project:
