- Removed the cfgif block, too confusing. There is a good lesson here. Probably the n'th time I that I have been overzealous about reuse. When you end up adding a parameter to a block that duplicates the logic 2X it's always better to create two separate blocks...
- Changed the register access interface to packet format
- Change the priority on the etx_arbiter to pick read responses first
- Removed redundant signals
- Took away the read resonse bypass on remap in tx for now..
- Removed defparams (convention)
- Unified wait signal on tx
- Fixed cfg wait
-
- After finding the bug in the reference model and wasting countless hours going back and forth with FPGA timing optimization and bug tweaks, I realized that the design was fundementally broken. The decision to use two clock domains (high speed) and low speed was correct from the beginning. The FPGA is dreadfully slow, (you definitely don't want to do much logic at 300MHz...), but the handoff between tclk and tclk_div4 was too complicated. The puzzle of having to respond to wait quickly, covering the corner cases, and meeting timing was just too ugly.
- The "new" design goes back to the method of using the high speed logic only for doing a "dumb" parallel to serial converter and preparing all the necessary signals in the low speed domain.
- This feel A LOT cleaner and the it already passes basic tests with the chip reference and the loopback after less than 3 hours of redesign work!
- The TX meets timing but there is still some work to do with wait pushback testing.
- Clearing the "done" register with tx_burst. Kind of makes sense logically since while we are in burst mode we are not done.
- Still not 100% happy with this circuit, but there arent' a lot of lines of code left...
- But elink now passes 500 random burst transactions!!!
- Adding transaction counter to speed up debugging
- Clearing access signal on wait ("bubble")
- Adding back special propagation when there is a wait after io_wait.
-Solved a speed path in synchronizing the wait signal, had to use the first edge signal fo the IO and the lclk_div4 for the core logic. It seems that the FPGA has a really hard time mixing clock domains, the routing delay between domains explodes
-Put in some special case logic for edge cases, like when there is a wait coming in from the IO and there is a wait from the IO. In that case, the packet gets sampled by the IO and not by the current logic.
-This needs to be cleaned up eventually, not clean enough but it's good enough for now.
- The burst signal was going fro lclk_div4 domain straight into the io high speed domain. There is quite a bit of logic on this signal. Instead of starting with false paths or multi cycle paths with firstedge, I changed the pipeline.
- The logic was a mess, causing me to go around in circles for days. In the end, by adding a missing sync circuit (duh!) between the fast and slow clock to align the edges and removing a redundant pipeline stage ("double") the nasty logic just fell away. Looks good now.
-Write bursts mostly works and design looks clean.
-one bug left to fix on streams of writes...
- The burst signal needs to be pipelined like everything else (0th order..)
- Don't look at write signal when pushing back wait...WILL GO BACK AND REVISIT THIS ONE LATER.
- Yeah, burst write test now passes!!!!
- Old design was not workable with bursting and long waits. The wait signal needs to be very carfully handled since it's asynchronous to the clock.
-The TX needs to be stopped quickly so the sync needs to be done at the high speed clock, not at div4 clock
-Since there are synchronizers here, there should be only one point of sync. This is not completely the case still, but I think??? it should be safe by constructiona at this point.
-bursting working at this point for writes!!!!!
- Using the BUFIO makes another clock domain....FPGAs apparently hate clock domain crossings, avoid them at all cost.
- Now moving back to having on high speed clock domain for logic and DDR blocks, take care of IO alignment in software for TX and RX
- Also, fixed the io_wait path with logic...not sure what I was thinking there. Logic was trivial. The way it was,the io path was going straight into the FIFO as a wait.
- Using new packet interface
- Adding active signal, indicating that link is ready. This way you don't need to guess when the link is ready (no magic constants)
- Removed register on por reset input to get rid of x on startup.
- holding rx in reset state until tx is done
- removing reset from all pipeline registers
- removing reset from oddr/iddr
- the idea is to keep things quiet not to block in lots of places. The only real block needed is in the FIFO to keep "noise" from propagating past the link. The link should be kept in a safe reset state until the rx fram is stable and the clock is running so that the pipe can be cleaned out.
-Removing the wait signal from the pipeline
-Assumption is that the prog_full is used on fifo, allowing two entries
to be captured in fifo.
-May revisit this at some time...
-In the default mode we now have 7 input clocks to basic elink
-This is too many, need to simplify, not reasonable!
-But with all the knobs on the MMCM, performance will be great...
-WIP on bursting...
-Separate waits for rd/wr wait
-Adding wait to protocol block as well
-io_wait always goes through
-using active frame signal to select/clear data for output
-pin driven testmode driven..b/c fpga designers often don't like software
-and because it's really convenient, press a push button and see a pattern appear
-removed protocol description, goes in README.md, there should only be one source for documentation
-shortened signal names for ecfg
-changed to "clk" input now that everything is single clock
-added register read/write properly
-removed redundant wrapper layers in maxi/saxi
-changed over to "emesh" interface from packet 103 bit data
-cleaned up maxi
-cleaned up saxi
-removed redundant signals in elink interface (user,lock,..)
-added wrapper to fifo (to carry emesh interface through)
Now comes the fun part of testing