-The current testbench has a big pause between frames, whereas the chip might push out back to back frames with only a single cycle pause between frames. It seems possible that the old logic would have been a problem, since there two incorrect states that took 2 cycles to settle. This would not have been a problem with bursting or frames with many nops between. Let's see....
-The correct way to verify this is to 1.) Improve TX to make performance as good as on the chip (less stalls) 2.) Create a testbench witht the chip reference code.
-In the meantime, we compile and pray...
- reduced frame fanout, removed clock gater in erx_io (improves speed path)
- driving constants on "wid signals" (proper)
- making lock signal 1 bit wide to remove warning
- moved backed to BUFIO for IDDR blocks
- Using the BUFIO makes another clock domain....FPGAs apparently hate clock domain crossings, avoid them at all cost.
- Now moving back to having on high speed clock domain for logic and DDR blocks, take care of IO alignment in software for TX and RX
- Also, fixed the io_wait path with logic...not sure what I was thinking there. Logic was trivial. The way it was,the io path was going straight into the FIFO as a wait.
- reset was broken!
- need to account for wait
- merging read/write wait for simplicity, otherwise you would need to reset the packets to figure out if it's a read or write transaction...and I don't want to reset every packet throughout the pipe.
- holding rx in reset state until tx is done
- removing reset from all pipeline registers
- removing reset from oddr/iddr
- the idea is to keep things quiet not to block in lots of places. The only real block needed is in the FIFO to keep "noise" from propagating past the link. The link should be kept in a safe reset state until the rx fram is stable and the clock is running so that the pipe can be cleaned out.
- Making all resets async since we cannot guarantee that we have a clock coming in from RX. This is needed due to the way we use a PLL for alignment. If we would have used a free running local clock this would have been different, but this would have required a FIFO for synchronization betwen the rx and rxdiv4 clock.
- Moving the clock block into the RX for modularity
- Making a specil rx soft reset (driven from sys_clk domain)
- Still there is a POR_reset so the link should wake up ok
- This is DEFINITELY the way to do things, sweep the delays and find the right value. No f'ing way to get these stupid FPGAs to work otherwise with the ridiculuosly over margined PVT nubmers they are running through the STAs. I understand they want to make the design bullet proof, but as a result designers are wasting countless hours overoptimzinng designs and being clever. So much performance is left on the table for expert users.
- Lesson: I/O design should be "self syncrhonizing". Only contraints in the design should be create_clk
- Made RX clock async, too tricky to guarantee that there clock is there. No way to do this if the clock sources are actually independent for RX/TX!
-When a read response is detected, there should be no spurious transactions to the RD/WR request fifos.
-Move the "filter" backt to the erx_protocol block
-Removed the remap bypass signal (was hacky)
-Passes simulations again..
-In the default mode we now have 7 input clocks to basic elink
-This is too many, need to simplify, not reasonable!
-But with all the knobs on the MMCM, performance will be great...
-WIP on bursting...
-moving to "real" Xilinx PLL instantiation
-one PLL for CCLK one for LCLK
-removing clock dividers, can't work at speed, put inside model
-configuration needs to be done differently
-removing pll_bypass signal, can't work with logic
-clocks should be done with hard macro primitives (no logic)
-Removing enable from ISERDES, not healthy
-Moving all logic to protocol block. (this is an IO block)
-Removing tow redundant pipeline stages (check this??)