- After finding the bug in the reference model and wasting countless hours going back and forth with FPGA timing optimization and bug tweaks, I realized that the design was fundementally broken. The decision to use two clock domains (high speed) and low speed was correct from the beginning. The FPGA is dreadfully slow, (you definitely don't want to do much logic at 300MHz...), but the handoff between tclk and tclk_div4 was too complicated. The puzzle of having to respond to wait quickly, covering the corner cases, and meeting timing was just too ugly.
- The "new" design goes back to the method of using the high speed logic only for doing a "dumb" parallel to serial converter and preparing all the necessary signals in the low speed domain.
- This feel A LOT cleaner and the it already passes basic tests with the chip reference and the loopback after less than 3 hours of redesign work!
- The TX meets timing but there is still some work to do with wait pushback testing.
- The logic was a mess, causing me to go around in circles for days. In the end, by adding a missing sync circuit (duh!) between the fast and slow clock to align the edges and removing a redundant pipeline stage ("double") the nasty logic just fell away. Looks good now.
-Write bursts mostly works and design looks clean.
-one bug left to fix on streams of writes...
- Using the BUFIO makes another clock domain....FPGAs apparently hate clock domain crossings, avoid them at all cost.
- Now moving back to having on high speed clock domain for logic and DDR blocks, take care of IO alignment in software for TX and RX
- Also, fixed the io_wait path with logic...not sure what I was thinking there. Logic was trivial. The way it was,the io path was going straight into the FIFO as a wait.
- Using new packet interface
- Adding active signal, indicating that link is ready. This way you don't need to guess when the link is ready (no magic constants)
- Removed register on por reset input to get rid of x on startup.
- holding rx in reset state until tx is done
- removing reset from all pipeline registers
- removing reset from oddr/iddr
- the idea is to keep things quiet not to block in lots of places. The only real block needed is in the FIFO to keep "noise" from propagating past the link. The link should be kept in a safe reset state until the rx fram is stable and the clock is running so that the pipe can be cleaned out.
- more modular
- two bits cominng from sys_clk elink config domain
- drives the tx and rx from top level elink
- from software you would probably write 2'b11 to reset both at same time
* Seems like a useless feature. Why autogenerate the transactions at the transmit side. This should always be done at the receive side to minimize bits moving across the link. Can't really see a use for it anymore so I am removing it.
* If you want to hack the design to reduce latency, you can always grab the raw etx_core and drive signals directly through write port.
* May consider adding a fourth port to etx to allow bypassing the link interfac?
* Add an ifdef to bypass the fifos?
-In the default mode we now have 7 input clocks to basic elink
-This is too many, need to simplify, not reasonable!
-But with all the knobs on the MMCM, performance will be great...
-WIP on bursting...
-There is now only one clock domain crossing involved with the reg files/ All clock domain crossings use the same 104 bit wide async fifo! If it's broken we are screwed, if it works we are perfect!
-Configuration can be done from host through txwr/txrd path of any register
-The RX IO pins can only access the RX side of the design
-Access without symmetry was awkward, now we can reach regs from TX or RX side
-Removes a special path for mailbox (came for free)
-At the same time reduced clock complexity (one clock for system!!)
-Moved mailbox to top level
-Changed main clock to "sys_clk" for all
-A read on transmit side initiates timer
-Timer is reset when read request comes back
(assumption last read requests starts timer..last return stops it)
-pll bypass for clocks (customer request)
-adding dividers on all clocks (tx/cclk)
-adding reset block (clearer)
-using commong clock_divider block
-need to clean up divider block later today, slightly broken:-)
-added register read/write properly
-removed redundant wrapper layers in maxi/saxi
-changed over to "emesh" interface from packet 103 bit data
-cleaned up maxi
-cleaned up saxi
-removed redundant signals in elink interface (user,lock,..)
-added wrapper to fifo (to carry emesh interface through)
Now comes the fun part of testing
Our "standard packet" order should be followed everywhere to ease verification and integration (standards are good fir reuse...):
[0]=access
[1]=write
[3:2]=datamode
[7:4]=ctrlmode
[39:8]=dstaddr
[71:40]=data
[103:72]=upper-data (or srcaddr)