release v1.0

2025-01-28 21:12:53 +08:00 · 2023-03-03 20:48:37 +08:00 · 2023-03-03 20:48:37 +08:00 · a03807ee47
commit a03807ee47
15 changed files with 2994 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,2 @@
+**/vivado
+**/quartus
--- a/README.md
+++ b/README.md
@@ -0,0 +1,296 @@
+![Language](https://img.shields.io/badge/语言-systemverilog_(IEEE1800_2005)-CAD09D.svg) ![Simulation](https://img.shields.io/badge/仿真-iverilog-green.svg) ![Deployment](https://img.shields.io/badge/部署-quartus-blue.svg) ![Deployment](https://img.shields.io/badge/部署-vivado-FF1010.svg)
+
+Chinese | [English](#en)
+
+FPGA JPEG-LS image compressor
+===========================
+
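+An **FPGA**-based streaming **JPEG-LS** image compressor. Features:
+
+* Compresses **8-bit** grayscale images.
+* Optional **lossless mode**, i.e. NEAR=0.
+* Optional **lossy mode**, with NEAR adjustable from 1 to 7.
+* Image width range is [5,16384]; image height range is [1,16384].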
+* 极简流式输入输出。
+
+
+
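+# Background
+
+**JPEG-LS** (abbreviated **JLS**) is a lossless/lossy image compression algorithm. Its lossless compression ratio is excellent, better than that of Lossless-JPEG, Lossless-JPEG2000, Lossless-JPEG-XR, FELICS and others. **JPEG-LS** controls distortion through the maximum allowed difference between each pixel before and after compression (the **NEAR** value): **NEAR=0** in lossless mode; **NEAR>0** in lossy mode, where a larger **NEAR** gives more distortion and a higher compression ratio. JPEG-LS compressed images use the file extension .**jls**.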
+
+
+
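+# Usage
+
+[**jls_encoder.sv**](./RTL/jls_encoder.sv) in the RTL directory is the JPEG-LS compression module that users instantiate. It takes raw image pixels as input and outputs a JPEG-LS compressed stream.
+
+## Module parameter
+
+**jls_encoder** has only one parameter: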
+
+```verilog
+parameter logic [2:0] NEAR
+```
+
+It sets the **NEAR** value: 3'd0 selects lossless mode; 3'd1 to 3'd7 select lossy mode.
+
+## Module signals
+
+The input and output signals of **jls_encoder** are described in the table below.
+
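+| Signal | Full name | Direction | Width | Description |
+| :---: | :---: | :---: | :---: | :--- |
+| rstn | synchronous reset | input | 1bit | If rstn=0 at a rising edge of clk, the module is reset; keep rstn=1 in normal use. |
+| clk | clock | input | 1bit | Clock; all signals should be aligned to the rising edge of clk. |
+| i_sof | start of image | input | 1bit | To input a new image, hold i_sof=1 for at least 368 clock cycles. |
+| i_w | image width - 1 | input | 14bit | For example, if the image width is 1920, set i_w to 14'd1919. Must remain valid while i_sof=1. |
+| i_h | image height - 1 | input | 14bit | For example, if the image height is 1080, set i_h to 14'd1079. Must remain valid while i_sof=1. |
+| i_e | input pixel valid | input | 1bit | When i_e=1, a pixel must be presented on i_x. |
+| i_x | input pixel | input | 8bit | The pixel value range is 8'd0 to 8'd255. |
+| o_e | output valid | output | 1bit | When o_e=1, output stream data is present on o_data. |
+| o_data | output stream data | output | 16bit | Big-endian: o_data[15:8] comes first, o_data[7:0] second. |
+| o_last | end of output stream | output | 1bit | If o_last=1 while o_e=1, this is the last data word of an image's output stream. |
+
+> Note: i_w must not be less than 14'd4.
+
+## Input image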
+
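+The operating flow of the **jls_encoder** module is:
+
+1. **Reset** (optional): hold rstn=0 for at least **1 cycle** to reset, then keep rstn=1 during normal operation. Reset is not actually required (rstn may be tied to 1).
+2. **Start**: hold i_sof=1 for **at least 368 cycles** while presenting the image width and height on i_w and i_h; i_w and i_h must remain valid for the whole time i_sof=1.
+3. **Input**: drive i_e and i_x to input all pixels of the image from left to right and top to bottom. When i_e=1, i_x is taken as one pixel.
+4. **Idle between images**: after all pixels have been input, stay idle (i.e. i_sof=0, i_e=0) for **at least 16 cycles**, then go back to step 2 to start the next image.
+
+Any number of idle bubbles (i.e. i_sof=0, i_e=0) may be inserted between i_sof=1 and the first i_e=1, and between consecutive i_e=1 cycles. This means pixels can be input intermittently (although the highest throughput is only reached when no bubbles are inserted).
+
+The figure below shows the input timing for compressing 2 images (// denotes several omitted cycles, X denotes don't care). Image 1 has 1 bubble inserted after its first pixel, while image 2 has 1 bubble inserted after i_sof=1. Note that the **idle between images** must be at least **16 cycles**.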
+
+               __    __//  __    __    __    __   //_    __    //    __    __//  __    __    __    //    __
+    clk    \__/  \__/  //_/  \__/  \__/  \__/  \__// \__/  \__///\__/  \__/  //_/  \__/  \__/  \__///\__/  \_
+                _______//________                 //           //     _______//________            //
+    i_sof  ____/       //        \________________//___________//____/       //        \___________//________
+                _______//________                 //           //     _______//________            //
+    i_w    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
+                _______//________                 //           //     _______//________            //
+    i_h    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
+                       //         _____       ____//_____      //            //               _____//____
+    i_e    ____________//________/     \_____/    //     \_____//____________//______________/     //    \___
+                       //         _____       ____//_____      //            //               _____//____
+    i_x    XXXXXXXXXXXX//XXXXXXXXX_____XXXXXXX____//_____XXXXXX//XXXXXXXXXXXX//XXXXXXXXXXXXXXX_____//____XXXX
+    
+    Phase:      |  start image 1   |     input image 1      | idle (>=16) |   start image 2   |     input image 2
+
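+## Output compressed stream
+
+While pixels are being input, **jls_encoder** simultaneously outputs the compressed **JPEG-LS stream**, which forms the complete content of a .jls file (including the file header and trailer). When o_e=1, o_data carries a valid output word. o_data is big-endian: o_data[15:8] comes earlier in the stream than o_data[7:0]. o_last=1 marks the last word of each image's output stream, indicating that the compressed stream of that image has ended.
+
+
+
+# Simulation
+
+All simulation files are in the SIM directory, including:
+
+* tb_jls_encoder.sv: the testbench for jls_encoder. It feeds the uncompressed .pgm images in a specified folder into jls_encoder one by one for compression, then saves the output of jls_encoder to .jls files.
+* tb_jls_encoder_run_iverilog.bat: contains the commands that run the iverilog simulation.
+* images: a folder containing several .pgm image files. The .pgm format stores uncompressed (raw-pixel) 8-bit grayscale images; they can be opened with Photoshop or a Linux image viewer (the Windows image viewer cannot open them).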
+
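+> The .pgm file format is very simple: a short header (magic number, image width and height, maximum gray value) followed directly by all the raw pixels of the image. I chose .pgm as the simulation input because only a little testbench code is needed to parse it and send its pixels to jls_encoder. However, you do not need to care about the pgm format, because jls_encoder itself has nothing to do with it: it only accepts raw image pixels as input. Just focus on the simulation waveform and on how the image pixels are fed into jls_encoder.
+
+Before simulating with iverilog, you need to install iverilog; see: [iverilog_usage](https://github.com/WangXuan95/WangXuan95/blob/main/iverilog_usage/iverilog_usage.md)
+
+Then double-click tb_jls_encoder_run_iverilog.bat to run the simulation, which takes more than ten minutes.
+
+After the simulation finishes, several .jls files appear in the folder; these are the compressed image files. The simulation also produces a waveform file, dump.vcd, which you can open with gtkwave to view the waveforms.
+
+You can also modify some simulation parameters:
+
+- Modify the macro **NEAR** in tb_jls_encoder.sv to change the compression ratio.
+- Modify the macro **BUBBLE_CONTROL** in tb_jls_encoder.sv to set how many bubbles are inserted between adjacent input pixels:
+  - When **BUBBLE_CONTROL=0**, no bubbles are inserted.
+  - When **BUBBLE_CONTROL>0**, **BUBBLE_CONTROL** bubbles are inserted.
+  - When **BUBBLE_CONTROL<0**, a random number of bubbles between **0** and **-BUBBLE_CONTROL** is inserted each time.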
+
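+> This library has been verified by comparing its results on several hundred images under different NEAR and BUBBLE_CONTROL values, which gives strong confidence that it is free of bugs. (The automated verification code is not included in the repository.)
+
+## Viewing the compression results
+
+Because **JPEG-LS** is a niche, specialized format, most image viewers cannot open .jls files.
+
+You can try [this website](https://filext.com/file-extension/JLS) to view .jls files (although it is often unavailable).
+
+If the website is unavailable, you can use the decompressor decoder.exe that I provide to decompress the .jls file back to a .pgm file and view that instead. Run the following command in CMD in the SIM directory: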
+
+```powershell
+.\decoder.exe <JLS_FILE_NAME> <PGM_FILE_NAME>
+```
+
+For example:
+
+```powershell
+.\decoder.exe test000.jls tmp.pgm
+```
+
+> Note: decoder.exe is compiled from the C source code provided by UBC: http://www.stat.columbia.edu/~jakulin/jpeg-ls/mirror.htm
+
+
+
+
+
+# FPGA deployment
+
+On Xilinx Artix-7 xc7a35tcsg324-2, the synthesis and implementation results are as follows.
+
+|    LUT     |    FF    |                  BRAM                  | Max clock freq. |
+| :--------: | :------: | :------------------------------------: | :-------------: |
+| 2347 (11%) | 932 (2%) | 9 x RAMB18 (9%), equivalent to 144Kbit |     35 MHz      |
+
+At 35 MHz, the compression throughput is 35 Mpixel/s, which corresponds to a frame rate of 16.8 fps for 1920x1080 images.
+
+
+
+<span id="en">FPGA JPEG-LS image compressor</span>
+===========================
+
+**FPGA**-based streaming **JPEG-LS** image compressor. Features:
+
+* Compresses **8-bit** grayscale images.
+* Supports **lossless mode**, i.e. NEAR=0.
+* Supports **lossy mode**, with NEAR adjustable from 1 to 7.
+* Image width range is [5,16384]; image height range is [1,16384].
+* Minimalist streaming input and output.
+
+
+
+# Background
+
+**JPEG-LS** (abbreviated **JLS**) is a lossless/lossy image compression algorithm. Its lossless compression ratio is better than that of Lossless-JPEG, Lossless-JPEG2000, Lossless-JPEG-XR and FELICS. **JPEG-LS** controls distortion through the maximum allowed difference between each pixel before and after compression (the **NEAR** value): **NEAR=0** is the lossless mode; **NEAR>0** is the lossy mode, where a larger **NEAR** gives more distortion and a higher compression ratio. JPEG-LS compressed images use the file extension .**jls**.
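+
+For reference, the NEAR bound can be written as below, where $x$ is an original pixel, $\hat{x}$ its reconstructed value, $P_x$ the (bias-corrected) prediction and $e = x - P_x$ the prediction error. This is the usual JPEG-LS (ITU-T T.87) residual quantization, and its rounding matches `func_errval_quantize` in [jls_encoder.sv](./RTL/jls_encoder.sv); the modulo reduction of the error range is left out of this sketch.
+
+$$
+|x - \hat{x}| \le \text{NEAR}
+$$
+
+$$
+Q(e) =
+\begin{cases}
+\left\lfloor \dfrac{\text{NEAR} + e}{2\,\text{NEAR} + 1} \right\rfloor, & e \ge 0 \\
+-\left\lfloor \dfrac{\text{NEAR} - e}{2\,\text{NEAR} + 1} \right\rfloor, & e < 0
+\end{cases}
+\qquad
+\hat{x} = \operatorname{clip}_{[0,255]}\!\big(P_x + Q(e)\,(2\,\text{NEAR} + 1)\big)
+$$
+
+With NEAR=0 this reduces to $Q(e) = e$ and $\hat{x} = x$, i.e. lossless coding. With NEAR>0, each quantized value covers $2\,\text{NEAR}+1$ consecutive error values, so fewer distinct residuals need to be coded.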
+
+
+
+# Module Usage
+
+[**jls_encoder.sv**](./RTL/jls_encoder.sv) in the [RTL](./RTL) directory is the JPEG-LS compression module for FPGA users to instantiate. It takes raw image pixels as input and outputs a JPEG-LS compressed stream.
+
+## Module parameter
+
+**jls_encoder** has only one parameter:
+
+```verilog
+parameter logic [2:0] NEAR
+```
+
+which sets the NEAR value of the JPEG-LS algorithm: 3'd0 selects lossless mode; 3'd1 to 3'd7 select lossy mode (the default in jls_encoder.sv is 3'd1).
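+
+To make the effect of NEAR concrete, the sketch below recomputes the constants that jls_encoder.sv derives from it for 8-bit pixels: the gradient-quantization thresholds (`P_T1`, `P_T2`, `P_T3`) and the size of the quantized error range (`P_QBETA`). The formulas are copied from the module's localparams. The `near_constants` module itself is only an illustration and is not part of the repository.
+
+```systemverilog
+// Illustration only (not in the repository): recompute the NEAR-derived
+// constants of jls_encoder.sv for 8-bit pixels, one entry per NEAR value.
+module near_constants;
+    int T1 [8], T2 [8], T3 [8], QBETA [8];   // indexed by NEAR (0..7)
+
+    initial begin
+        for (int n = 0; n < 8; n++) begin
+            T1[n]    = 3  + 3 * n;                   // same formula as P_T1
+            T2[n]    = 7  + 5 * n;                   // same formula as P_T2
+            T3[n]    = 21 + 7 * n;                   // same formula as P_T3
+            QBETA[n] = (256 + 4 * n) / (2 * n + 1);  // P_QBETA; integer division truncates
+            $display("NEAR=%0d  T1=%0d  T2=%0d  T3=%0d  QBETA=%0d",
+                     n, T1[n], T2[n], T3[n], QBETA[n]);
+        end
+    end
+endmodule
+```
+
+Running it (for example with `iverilog -g2012`) prints one line per NEAR value; NEAR=1, for instance, gives T1=6, T2=12, T3=28 and QBETA=86.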
+
+## Module Interface
+
+The input and output signals of **jls_encoder** are described in the following table.
+
+| Signal |       Name        | Direction | Width | Description                                                  |
+| :----: | :---------------: | :-------: | :---: | :----------------------------------------------------------- |
+|  rstn  | synchronous reset |    in     | 1bit  | If rstn=0 at a rising edge of clk, the module is reset; keep rstn=1 in normal use. |
+|  clk   |       clock       |    in     | 1bit  | All signals should be aligned to the rising edge of clk.     |
+| i_sof  |  start of frame   |    in     | 1bit  | To input a new image, hold i_sof=1 for at least 368 clock cycles. |
+|  i_w   |      width-1      |    in     | 14bit | For example, if the image width is 1920, set i_w to 14'd1919. Must remain valid while i_sof=1. |
+|  i_h   |     height-1      |    in     | 14bit | For example, if the image height is 1080, set i_h to 14'd1079. Must remain valid while i_sof=1. |
+|  i_e   |    input valid    |    in     | 1bit  | i_e=1 indicates that a valid input pixel is on i_x.          |
+|  i_x   |    input pixel    |    in     | 8bit  | The pixel value range is 8'd0 to 8'd255.                     |
+|  o_e   |   output valid    |    out    | 1bit  | o_e=1 indicates that valid data is on o_data.                |
+| o_data |    output data    |    out    | 16bit | Big-endian: o_data[15:8] comes first in the stream, o_data[7:0] second. |
+| o_last |    output last    |    out    | 1bit  | If o_last=1 while o_e=1, this is the last data word of an image's output stream. |
+
+> Note: i_w must not be less than 14'd4.
+
+## Input pixels
+
+The operation flow of the **jls_encoder** module is:
+
+1. **Reset** (optional): set `rstn=0` for at least **1 cycle** to reset, then keep `rstn=1` during normal operation. Reset is not actually required (`rstn` may be tied to 1).
+2. **Start**: keep `i_sof=1` for **at least 368 cycles** while presenting the image width and height on `i_w` and `i_h`; `i_w` and `i_h` must remain valid for the whole time `i_sof=1`.
+3. **Input**: drive `i_e` and `i_x` to input all pixels of the image from left to right and top to bottom. When `i_e=1`, `i_x` is taken as one pixel.
+4. **Idle between images**: after all pixels have been input, stay idle (i.e. `i_sof=0`, `i_e=0`) for **at least 16 cycles**, then go back to step 2 to start the next image.
+
+Any number of idle bubbles (i.e. `i_sof=0`, `i_e=0`) can be inserted between `i_sof=1` and the first `i_e=1`, and between consecutive `i_e=1` cycles, so pixels can be input intermittently (maximum throughput is reached only when no bubbles are inserted).
+
+The following figure shows the input timing for compressing 2 images (`//` denotes several omitted cycles, X denotes don't care). Image 1 has 1 bubble inserted after its first pixel, while image 2 has 1 bubble inserted after `i_sof=1`. Note that the **idle between images** must be at least **16 cycles**.
+
+               __    __//  __    __    __    __   //_    __    //    __    __//  __    __    __    //    __
+    clk    \__/  \__/  //_/  \__/  \__/  \__/  \__// \__/  \__///\__/  \__/  //_/  \__/  \__/  \__///\__/  \_
+                _______//________                 //           //     _______//________            //
+    i_sof  ____/       //        \________________//___________//____/       //        \___________//________
+                _______//________                 //           //     _______//________            //
+    i_w    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
+                _______//________                 //           //     _______//________            //
+    i_h    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
+                       //         _____       ____//_____      //            //               _____//____
+    i_e    ____________//________/     \_____/    //     \_____//____________//______________/     //    \___
+                       //         _____       ____//_____      //            //               _____//____
+    i_x    XXXXXXXXXXXX//XXXXXXXXX_____XXXXXXX____//_____XXXXXX//XXXXXXXXXXXX//XXXXXXXXXXXXXXX_____//____XXXX
+    
+    Phase:      |  start image 1   |     input image 1      | idle (>=16) |   start image 2   |     input image 2
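+
+The steps above can be sketched as a small SystemVerilog stimulus process. This is an illustration under stated assumptions, not code from the repository: the 8x2 image, its pixel values and the `send_image` task name are made up, no bubbles are inserted, and the `jls_encoder` instance is left as a comment so the block simulates on its own. The two counters record what the encoder would sample at each rising edge.
+
+```systemverilog
+// Sketch: stimulus for one image following steps 1-4 (made-up 8x2 image, no bubbles).
+module jls_input_sketch;
+    localparam int W = 8, H = 2;                    // hypothetical image size (width >= 5)
+
+    logic        clk = 1'b0, rstn = 1'b0;
+    logic        i_sof = 1'b0, i_e = 1'b0;
+    logic [13:0] i_w = '0, i_h = '0;
+    logic [ 7:0] i_x = '0;
+    logic        done = 1'b0;
+    int          sof_cycles = 0, pixels_sent = 0;  // what the encoder would sample
+
+    // jls_encoder #(.NEAR(3'd1)) u_enc (.rstn(rstn), .clk(clk), .i_sof(i_sof),
+    //     .i_w(i_w), .i_h(i_h), .i_e(i_e), .i_x(i_x), .o_e(), .o_data(), .o_last());
+
+    initial while (!done) #5 clk = ~clk;            // clock runs until the sketch is done
+
+    always @(posedge clk) begin                     // count cycles seen at rising edges
+        if (i_sof) sof_cycles  <= sof_cycles + 1;
+        if (i_e)   pixels_sent <= pixels_sent + 1;
+    end
+
+    // Hypothetical task: start, input and idle phases for one w x h image.
+    // It must be called right after a rising edge of clk.
+    task automatic send_image(input int w, input int h);
+        i_sof <= 1'b1;                              // step 2: start; size fields hold size-1
+        i_w   <= w - 1;
+        i_h   <= h - 1;
+        repeat (368) @(posedge clk);                // i_sof=1 for 368 rising edges
+        i_sof <= 1'b0;
+        for (int y = 0; y < h; y++)                 // step 3: left to right, top to bottom
+            for (int x = 0; x < w; x++) begin
+                i_e <= 1'b1;
+                i_x <= x + 16 * y;                  // placeholder pixel value
+                @(posedge clk);
+            end
+        i_e <= 1'b0;
+        repeat (16) @(posedge clk);                 // step 4: at least 16 idle cycles
+    endtask
+
+    initial begin
+        repeat (2) @(posedge clk);                  // step 1 (optional): 2 cycles of reset
+        rstn <= 1'b1;
+        send_image(W, H);
+        done = 1'b1;
+    end
+endmodule
+```
+
+Driving the inputs with non-blocking assignments right after a rising edge keeps them stable until the next edge, which is what "aligned to the rising edge of clk" asks for. To insert bubbles, hold `i_e` at 0 for a few cycles inside the pixel loop.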
+
+## Output JLS stream
+
+While pixels are being input, **jls_encoder** simultaneously outputs the compressed **JPEG-LS stream**, which forms the complete content of a .jls file (including the file header and trailer). When `o_e=1`, `o_data` carries a valid output word. `o_data` is big-endian: `o_data[15:8]` comes earlier in the stream than `o_data[7:0]`. `o_last=1` marks the last word of each image's compressed stream.
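+
+On the capture side, each word received with `o_e=1` contributes two bytes to the .jls file, high byte first. The sketch below only shows that reordering. The words it feeds in are made up, except that FF D8 and FF D9 are the SOI and EOI markers that JPEG-family files start and end with. `capture` is a hypothetical helper that a real testbench would call at every rising edge of clk where `o_e=1`, stopping after `o_last=1`.
+
+```systemverilog
+// Sketch: reorder 16-bit o_data words into .jls file byte order (made-up words).
+module jls_capture_sketch;
+    logic [7:0] file_bytes [0:255];                 // captured bytes, in file order
+    int         nbytes   = 0;
+    logic       finished = 1'b0;                    // set once o_last has been seen
+
+    // Hypothetical helper: call once per rising edge of clk where o_e=1.
+    task automatic capture(input logic [15:0] o_data, input logic o_last);
+        file_bytes[nbytes]     = o_data[15:8];      // big-endian: high byte first
+        file_bytes[nbytes + 1] = o_data[7:0];
+        nbytes   = nbytes + 2;
+        finished = o_last;
+    endtask
+
+    initial begin
+        capture(16'hFFD8, 1'b0);                    // SOI marker
+        capture(16'h1234, 1'b0);                    // made-up payload word
+        capture(16'hFFD9, 1'b1);                    // EOI marker, treated as the last word here
+        for (int k = 0; k < nbytes; k++)
+            $write("%h ", file_bytes[k]);           // print the bytes in file order
+        $write("\n");
+    end
+endmodule
+```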
+
+
+
+# RTL Simulation
+
+Simulation related files are in the [SIM](./SIM) directory, including:
+
+* [tb_jls_encoder.sv](./SIM) is the testbench for jls_encoder. It feeds the uncompressed .pgm images in a specified folder into jls_encoder one by one for compression, then saves the output of jls_encoder to .jls files.
+* [tb_jls_encoder_run_iverilog.bat](./SIM) is a script containing the iverilog simulation commands.
+* The [images](./SIM) folder contains several .pgm image files. The .pgm format stores uncompressed (raw-pixel) 8-bit grayscale images, which can be opened with Photoshop or a Linux image viewer (the Windows image viewer cannot open them).
+
+> The .pgm file format is very simple: a short header (magic number, image width and height, maximum gray value) followed directly by all the raw pixels of the image. I chose .pgm as the simulation input because only a little testbench code is needed to parse it and send its pixels to jls_encoder. However, you can ignore the pgm format, since jls_encoder itself has nothing to do with it: it only accepts raw image pixels as input. Just focus on the simulation waveform and on how the image pixels are fed into jls_encoder.
+
+Before simulating with iverilog, you need to install iverilog; see: [iverilog_usage](https://github.com/WangXuan95/WangXuan95/blob/main/iverilog_usage/iverilog_usage.md)
+
+Then double-click tb_jls_encoder_run_iverilog.bat to run the simulation, which takes more than 10 minutes to complete.
+
+After the simulation finishes, several .jls files appear in the folder; these are the compressed image files. The simulation also produces a waveform file, dump.vcd, which you can open with gtkwave to view the waveforms.
+
+You can also modify some simulation parameters:
+
+- Modify the macro **NEAR** in tb_jls_encoder.sv to change the compression ratio.
+- Modify the macro **BUBBLE_CONTROL** in tb_jls_encoder.sv to determine how many bubbles to insert between adjacent input pixels:
+  - When **BUBBLE_CONTROL=0**, no bubbles are inserted.
+  - When **BUBBLE_CONTROL>0**, **BUBBLE_CONTROL** bubbles are inserted.
+  - When **BUBBLE_CONTROL<0**, a random number of bubbles between **0** and **-BUBBLE_CONTROL** is inserted each time.
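+
+One way to read these three BUBBLE_CONTROL rules is the helper below. It is an assumption, not the code in tb_jls_encoder.sv: `bubbles_for` is a hypothetical name, the value is passed as a function argument rather than a macro, and `$random` is just one possible source of randomness.
+
+```systemverilog
+// Hypothetical reading of the BUBBLE_CONTROL rules (not copied from tb_jls_encoder.sv).
+module bubble_sketch;
+    // Number of idle cycles to insert before the next pixel.
+    function automatic int bubbles_for(input int bubble_control);
+        if (bubble_control >= 0)
+            return bubble_control;                         // 0: none, >0: fixed count
+        return $unsigned($random) % (1 - bubble_control);  // <0: random in 0..-bubble_control
+    endfunction
+
+    int fixed_0, fixed_3, n;
+    int rand_min = 1000, rand_max = -1;
+
+    initial begin
+        fixed_0 = bubbles_for(0);
+        fixed_3 = bubbles_for(3);
+        for (int k = 0; k < 1000; k++) begin            // sample BUBBLE_CONTROL = -4
+            n = bubbles_for(-4);
+            if (n < rand_min) rand_min = n;
+            if (n > rand_max) rand_max = n;
+        end
+        $display("0 -> %0d, 3 -> %0d, -4 -> %0d..%0d", fixed_0, fixed_3, rand_min, rand_max);
+    end
+endmodule
+```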
+
+
+
+## View compressed JLS file
+
+Because **JPEG-LS** is a niche, specialized format, most image viewers cannot open .jls files.
+
+You can try [this site](https://filext.com/file-extension/JLS) to view .jls files (though this site is sometimes unavailable).
+
+If the website doesn't work, you can use the decompressor [decoder.exe](./SIM) that I provide to decompress the .jls file back to a .pgm file and view that instead. Run the following command in CMD in the [SIM](./SIM) directory:
+
+```powershell
+.\decoder.exe <JLS_FILE_NAME> <PGM_FILE_NAME>
+```
+
+For example:
+
+```powershell
+.\decoder.exe test000.jls tmp.pgm
+```
+
+> Note: decoder.exe is compiled from the C source code provided by UBC: http://www.stat.columbia.edu/~jakulin/jpeg-ls/mirror.htm
+
+
+
+# FPGA Deployment
+
+On Xilinx Artix-7 xc7a35tcsg324-2, the synthesized and implemented results are as follows.
+
+|    LUT     |    FF    |              BRAM              | Max Clock freq. |
+| :--------: | :------: | :----------------------------: | :-------------: |
+| 2347 (11%) | 932 (2%) | 9 x RAMB18 (9%), total 144Kbit |     35 MHz      |
+
+At 35MHz, the image compression performance is 35 Mpixel/s, which means the compression frame rate for 1920x1080 images is 16.8fps.
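+
+The frame rate follows directly from one pixel per clock. As a rough check, assuming no bubbles and ignoring the per-image overhead (at least 368 `i_sof` cycles plus 16 idle cycles):
+
+$$
+\frac{35 \times 10^{6}\ \text{pixel/s}}{1920 \times 1080\ \text{pixel/frame}} = \frac{35\,000\,000}{2\,073\,600} \approx 16.88\ \text{frame/s}
+$$
+
+Adding those 384 overhead cycles per frame changes the result by less than 0.01 fps.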
+
--- a/RTL/jls_encoder.sv
+++ b/RTL/jls_encoder.sv
@@ -0,0 +1,903 @@
+
+//--------------------------------------------------------------------------------------------------------
+// Module  : jls_encoder
+// Type    : synthesizable, IP's top
+// Standard: SystemVerilog 2005 (IEEE1800-2005)
+// Function: JPEG-LS image compressor
+//--------------------------------------------------------------------------------------------------------
+
+module jls_encoder #(
+    parameter [2:0] NEAR = 3'd1
+) (
+    input  wire        rstn,
+    input  wire        clk,
+    input  wire        i_sof,   // start of image
+    input  wire [13:0] i_w,     // image_width-1 , range: 4~16383, that is, image_width range: 5~16384
+    input  wire [13:0] i_h,     // image_height-1, range: 0~16382, that is, image_height range: 1~16383
+    input  wire        i_e,     // input pixel enable
+    input  wire [ 7:0] i_x,     // input pixel
+    output wire        o_e,     // output data enable
+    output wire [15:0] o_data,  // output data
+    output wire        o_last   // indicates the last output data of an image
+);
+
+//---------------------------------------------------------------------------------------------------------------------------
+// local parameters
+//---------------------------------------------------------------------------------------------------------------------------
+wire [3:0] P_QBPPS [8];
+assign P_QBPPS[0] = 4'd8;
+assign P_QBPPS[1] = 4'd7;
+assign P_QBPPS[2] = 4'd6;
+assign P_QBPPS[3] = 4'd6;
+assign P_QBPPS[4] = 4'd5;
+assign P_QBPPS[5] = 4'd5;
+assign P_QBPPS[6] = 4'd5;
+assign P_QBPPS[7] = 4'd5;
+
+localparam logic               P_LOSSY     = NEAR != '0;
+localparam logic  signed [8:0] P_NEAR      = $signed({6'd0, NEAR});
+localparam logic  signed [8:0] P_T1        = $signed(9'd3) + $signed(9'd3) * P_NEAR;
+localparam logic  signed [8:0] P_T2        = $signed(9'd7) + $signed(9'd5) * P_NEAR;
+localparam logic  signed [8:0] P_T3        = $signed(9'd21)+ $signed(9'd7) * P_NEAR;
+localparam logic  signed [9:0] P_QUANT     = {P_NEAR, 1'b1};
+localparam logic  signed [9:0] P_QBETA     = $signed(10'd256 + {5'd0,NEAR,2'd0}) / P_QUANT;
+localparam logic  signed [9:0] P_QBETAHALF = (P_QBETA+$signed(10'd1)) / $signed(10'd2);
+wire        [3:0] P_QBPP      = P_QBPPS[NEAR];
+wire        [4:0] P_LIMIT     = 5'd31 - {1'b0, P_QBPP};
+localparam logic        [12:0] P_AINIT     = (NEAR=='0) ? 13'd4 : 13'd2;
+
+wire [3:0] J [32];
+assign J[ 0] = 4'd0;
+assign J[ 1] = 4'd0;
+assign J[ 2] = 4'd0;
+assign J[ 3] = 4'd0;
+assign J[ 4] = 4'd1;
+assign J[ 5] = 4'd1;
+assign J[ 6] = 4'd1;
+assign J[ 7] = 4'd1;
+assign J[ 8] = 4'd2;
+assign J[ 9] = 4'd2;
+assign J[10] = 4'd2;
+assign J[11] = 4'd2;
+assign J[12] = 4'd3;
+assign J[13] = 4'd3;
+assign J[14] = 4'd3;
+assign J[15] = 4'd3;
+assign J[16] = 4'd4;
+assign J[17] = 4'd4;
+assign J[18] = 4'd5;
+assign J[19] = 4'd5;
+assign J[20] = 4'd6;
+assign J[21] = 4'd6;
+assign J[22] = 4'd7;
+assign J[23] = 4'd7;
+assign J[24] = 4'd8;
+assign J[25] = 4'd9;
+assign J[26] = 4'd10;
+assign J[27] = 4'd11;
+assign J[28] = 4'd12;
+assign J[29] = 4'd13;
+assign J[30] = 4'd14;
+assign J[31] = 4'd15;
+
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: is_near
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic func_is_near(input [7:0] x1, input [7:0] x2);
+    logic signed [8:0] ex1, ex2;
+    ex1 = $signed({1'b0,x1});
+    ex2 = $signed({1'b0,x2});
+    return ex1 - ex2 <= P_NEAR && ex2 - ex1 <= P_NEAR;
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: predictor (get_px)
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic [7:0] func_predictor(input [7:0] a, input [7:0] b, input [7:0] c);
+    if( c>=a && c>=b )
+        return a>b ? b : a;
+    else if( c<=a && c<=b )
+        return a>b ? a : b;
+    else
+        return a - c + b;
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: q_quantize
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic signed [3:0] func_q_quantize(input [7:0] x1, input [7:0] x2);
+    logic signed [8:0] delta;
+    delta = $signed({1'b0,x1}) - $signed({1'b0,x2});
+    if     (delta <= -P_T3 )
+        return -$signed(4'd4);
+    else if(delta <= -P_T2 )
+        return -$signed(4'd3);
+    else if(delta <= -P_T1 )
+        return -$signed(4'd2);
+    else if(delta <  -P_NEAR )
+        return -$signed(4'd1);
+    else if(delta <=  P_NEAR )
+        return  $signed(4'd0);
+    else if(delta <   P_T1 )
+        return  $signed(4'd1);
+    else if(delta <   P_T2 )
+        return  $signed(4'd2);
+    else if(delta <   P_T3 )
+        return  $signed(4'd3);
+    else
+        return  $signed(4'd4);
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: get_q (part 1), qp1 = 81*Q(d-b) + 9*Q(b-c)
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic signed [9:0] func_get_qp1(input [7:0] c, input [7:0] b, input [7:0] d);
+    return $signed(10'd81) * func_q_quantize(d,b) + $signed(10'd9) * func_q_quantize(b,c);
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: get_q (part 2), get sign(qs) and abs(qs), where qs = qp1 + Q(c-a)
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic [9:0] func_get_q(input signed [9:0] qp1, input [7:0] c, input [7:0] a);
+    logic signed [9:0] qs;
+    logic              s;
+    logic        [8:0] q;
+    qs = qp1 + func_q_quantize(c,a);
+    s = qs[9];
+    q = s ? (~qs[8:0]+9'd1) : qs[8:0];
+    return {s, q};
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: clip
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic [7:0] func_clip(input signed [9:0] val);
+    if( val > $signed(10'd255) )
+        return 8'd255;
+    else if( val < $signed(10'd0) )
+        return 8'd0;
+    else
+        return val[7:0];
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: errval_quantize
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic signed [9:0] func_errval_quantize(input signed [9:0] err);
+    if(err[9])
+        return -( (P_NEAR - err) / P_QUANT );
+    else
+        return    (P_NEAR + err) / P_QUANT;
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: modrange
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic signed [9:0] func_modrange(input signed [9:0] val);
+    logic signed [9:0] new_val;
+    new_val = val;
+    if( new_val[9] )
+        new_val += P_QBETA;
+    if( new_val >= P_QBETAHALF )
+        new_val -= P_QBETA;
+    return new_val;
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: get k
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic [3:0] func_get_k(input [12:0] A, input [6:0] N, input rt);
+    logic [18:0] Nt, At;
+    logic [ 3:0] k;
+    Nt = {12'h0, N};
+    At = { 6'h0, A};
+    k = 4'd0;
+    if(rt)
+        At += {13'd0, N[6:1]};
+    for(int ii=0; ii<13; ii++)
+        if((Nt<<ii) < At)
+            k++;
+    return k;
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: B update for run mode
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic [6:0] B_update(input reset, input [6:0] B, input errm0);
+    B_update = B;
+    if(errm0)
+        B_update ++;
+    if(reset)
+        B_update >>>= 1;
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: C, B update for regular mode
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic [14:0] C_B_update(input reset, input [6:0] N, input signed [7:0] C, input signed [6:0] B, input signed [9:0] err);
+    logic signed [9:0] Bt;
+    logic signed [7:0] Ct;
+    Bt = B;
+    Ct = C;
+    Bt += err * P_QUANT;
+    if(reset)
+        Bt >>>= 1;
+    if( Bt <= -$signed({3'd0,N}) ) begin
+        Bt += $signed({3'd0,N});
+        if( Bt <= -$signed({3'd0,N}) )
+            Bt = -$signed({3'd0,N}-10'd1);
+        if( Ct != $signed(8'd128) )
+            Ct--;
+    end else if( Bt > $signed(10'd0) ) begin
+        Bt -= $signed({3'd0,N});
+        if( Bt > $signed(10'd0) )
+            Bt = $signed(10'd0);
+        if( Ct != $signed(8'd127) )
+            Ct++;
+    end
+    return {Ct, Bt[6:0]};
+endfunction
+
+
+//---------------------------------------------------------------------------------------------------------------------------
+// function: A update
+//---------------------------------------------------------------------------------------------------------------------------
+function automatic logic [12:0] A_update(input reset, input [12:0] A, input [9:0] inc);
+    A_update = A + {3'd0, inc};
+    if(reset)
+        A_update >>>= 1;
+endfunction
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// context memories
+//-------------------------------------------------------------------------------------------------------------------
+reg        [ 5:0] Nram [366];
+reg        [12:0] Aram [366];
+reg signed [ 6:0] Bram [366];
+reg signed [ 7:0] Cram [1:364];
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage a: generate ii, jj
+//-------------------------------------------------------------------------------------------------------------------
+reg        a_sof;
+reg        a_e;
+reg [ 7:0] a_x;
+reg [13:0] a_w;
+reg [13:0] a_h;
+reg [13:0] a_wl;
+reg [13:0] a_hl;
+reg [13:0] a_ii;
+reg [14:0] a_jj;
+
+always @ (posedge clk)
+    if(~rstn) begin
+        {a_sof, a_e, a_x, a_w, a_h, a_wl, a_hl, a_ii, a_jj} <= '0;
+    end else begin
+        a_sof <= i_sof;
+        a_e <= i_e;
+        a_x <= i_x;
+        a_w <= i_w;
+        a_h <= i_h;
+        if(a_sof) begin
+            a_wl <= (a_w<14'd4 ? 14'd4 : a_w);
+            a_hl <= a_h;
+            a_ii <= '0;
+            a_jj <= '0;
+        end else if(a_e) begin
+            if(a_ii < a_wl)
+                a_ii <= a_ii + 14'd1;
+            else begin
+                a_ii <= '0;
+                if(a_jj <= {1'b0,a_hl})
+                    a_jj <= a_jj + 15'd1;
+            end
+        end
+    end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage b: generate fc, lc, nfr
+//-------------------------------------------------------------------------------------------------------------------
+reg        b_sof;
+reg        b_e;
+reg        b_fc;
+reg        b_lc;
+reg        b_fr;
+reg        b_eof;
+reg [13:0] b_ii;
+reg [ 7:0] b_x;
+
+always @ (posedge clk) begin
+    b_sof <= a_sof & rstn;
+    if(~rstn | a_sof) begin
+        {b_e, b_fc, b_lc, b_fr, b_eof, b_ii, b_x} <= '0;
+    end else begin
+        b_e <= a_e & (a_jj <= {1'b0,a_hl});
+        b_fc <= a_e & (a_ii == '0);
+        b_lc <= a_e & (a_ii == a_wl);
+        b_fr <= a_e & (a_jj == '0);
+        b_eof <= a_jj > {1'b0,a_hl};
+        b_ii <= a_ii;
+        b_x <= a_x;
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage c: maintain linebuffer, generate context pixels: b, c, d , where d is not valid in case of fr and lc
+//-------------------------------------------------------------------------------------------------------------------
+reg        c_sof;
+reg        c_e;
+reg        c_fc;
+reg        c_lc;
+reg        c_fr;
+reg        c_eof;
+reg [13:0] c_ii;
+reg [ 7:0] c_x;
+reg [ 7:0] c_b;
+reg [ 7:0] c_bt;
+reg [ 7:0] c_c;
+reg [ 7:0] c_d;
+
+always @ (posedge clk) begin
+    c_sof <= b_sof & rstn;
+    if(~rstn | b_sof) begin
+        {c_e,c_fc,c_lc,c_fr,c_eof,c_ii,c_x,c_b,c_bt,c_c} <= '0;
+    end else begin
+        c_e <= b_e;
+        c_fc <= b_fc;
+        c_lc <= b_lc;
+        c_fr <= b_fr;
+        c_eof <= b_eof;
+        c_ii <= b_ii;
+        if(b_e) begin
+            c_x <= b_x;
+            c_b <= b_fr ? '0 : c_d;
+            if(b_fr) begin
+                c_bt <= '0;
+                c_c <= '0;
+            end else if(b_fc) begin
+                c_bt <= c_d;
+                c_c <= c_bt;
+            end else
+                c_c <= c_b;
+        end
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage d: fix context pixel d (locally) in case fr and lc, get q part-1 (qp1)
+//-------------------------------------------------------------------------------------------------------------------
+reg        d_sof;
+reg        d_e;
+reg        d_fc;
+reg        d_lc;
+reg        d_eof;
+reg [13:0] d_ii;
+reg [ 7:0] d_x;
+reg [ 7:0] d_b;
+reg [ 7:0] d_c;
+reg signed [9:0] d_qp1;
+
+always @ (posedge clk) begin
+    d_sof <= c_sof & rstn;
+    if(~rstn | c_sof) begin
+        {d_e, d_fc, d_lc, d_eof, d_ii, d_x, d_b, d_c, d_qp1} <= '0;
+    end else begin
+        logic [7:0] d;
+        d_e <= c_e;
+        d_fc <= c_fc;
+        d_lc <= c_lc;
+        d_eof <= c_eof;
+        d_ii <= c_ii;
+        d_x <= c_x;
+        d_b <= c_b;
+        d_c <= c_c;
+        d = c_fr ? '0 : (c_lc ? c_b : c_d);
+        d_qp1 <= func_get_qp1(c_c, c_b, d);
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage e: get errval, Rx reconstruct loop, N, B, C update
+//-------------------------------------------------------------------------------------------------------------------
+reg              e_sof;
+reg              e_e;
+reg              e_fc;
+reg              e_lc;
+reg              e_eof;
+reg       [13:0] e_ii;
+reg              e_runi;
+reg              e_rune;
+reg              e_2BleN;
+reg        [7:0] e_x;
+reg        [8:0] e_q;
+reg              e_rt;
+reg signed [9:0] e_err;
+reg        [6:0] e_No;
+reg              e_write_C, e_write_en;
+reg        [5:0] e_Nn;
+reg signed [7:0] e_Cn;
+reg signed [6:0] e_Bn;
+
+always @ (posedge clk) begin
+    e_sof <= d_sof & rstn;
+    e_2BleN <= 1'b0;
+    {e_write_C, e_Cn, e_write_en, e_Bn, e_Nn} <= '0;
+    if(~rstn | d_sof) begin
+        {e_e, e_fc, e_lc, e_eof, e_ii, e_runi, e_rune, e_x, e_q, e_rt, e_err, e_No} <= '0;
+    end else begin
+        logic        [7:0] a;
+        logic              s;
+        logic        [8:0] q;
+        logic              rt;
+        logic              runi;
+        logic              rune;
+        logic signed [7:0] Co;
+        logic        [6:0] No, Nn;
+        logic signed [6:0] Bo;
+        logic signed [9:0] px;
+        logic signed [9:0] err;
+        a = d_fc ? d_b : e_x;
+        rt = 1'b0;
+        rune = 1'b0;
+        No = '0;
+        err = '0;
+        {s, q} = func_get_q(d_qp1, d_c, a);
+        Co = (e_write_C & e_q==q) ? e_Cn : Cram[q];
+        runi = ~d_fc & e_runi | (q == 9'd0);
+        if(runi) begin
+            runi = func_is_near(d_x, a);
+            rune = ~runi;
+        end
+        if(d_e) begin
+            if(runi) begin
+                e_x <= P_LOSSY ? a : d_x;
+            end else begin
+                if(rune) begin
+                    rt = func_is_near(d_b, a);
+                    s = {1'b0,a} > ({1'b0,d_b} + {6'd0,NEAR}) ? 1'b1 : 1'b0;
+                    q = rt ? 9'd365 : 9'd0;
+                    px = rt ? a : d_b;
+                end else begin
+                    px[9:8] = 2'b00;
+                    px[7:0] = func_clip( $signed({2'h0, func_predictor(a,d_b,d_c)}) + ( s ? -$signed({Co[7],Co[7],Co}) : $signed({Co[7],Co[7],Co}) ) );
+                end
+                err = s ? px - $signed({2'd0, d_x}) : $signed({2'd0, d_x}) - px;
+                err = func_errval_quantize(err);
+                e_x <= P_LOSSY ? func_clip( px + ( s ? -(P_QUANT*err) : P_QUANT*err ) ) : d_x;
+                err = func_modrange(err);
+                No = ((e_write_en & e_q==q) ? e_Nn : Nram[q]) + 7'd1;
+                Nn = No;
+                if(No[6]) Nn >>>= 1;
+                e_Nn <= Nn[5:0];
+                Bo = (e_write_en & e_q==q) ? e_Bn : Bram[q];
+                e_write_en <= 1'b1;
+                if(rune) begin
+                    e_Bn <= B_update(No[6], Bo, err<$signed(10'd0));
+                    e_2BleN <= $signed({Bo,1'b0}) < $signed({1'b0,No});
+                end else begin
+                    e_write_C <= 1'b1;
+                    {e_Cn, e_Bn} <= C_B_update(No[6], Nn+7'd1, Co, Bo, err);
+                    e_2BleN <= $signed({Bo,1'b0}) <= -$signed({1'b0,No});
+                end
+            end
+            e_runi <= runi;
+        end
+        e_e <= d_e;
+        e_fc <= d_fc;
+        e_lc <= d_lc;
+        e_eof <= d_eof;
+        e_ii <= d_ii;
+        e_rune <= d_e & rune;
+        e_q <= q;
+        e_rt <= rt;
+        e_err <= err;
+        e_No <= No;
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage f: write Cram, Bram, Nram
+//-------------------------------------------------------------------------------------------------------------------
+reg [8:0] NBC_init_addr;
+always @ (posedge clk)
+    NBC_init_addr <= e_sof ? NBC_init_addr + (NBC_init_addr < 9'd366 ? 9'd1 : 9'd0) : 9'd0;
+
+always @ (posedge clk)
+    if(e_sof | e_write_en) begin
+        Nram[e_write_en ? e_q : NBC_init_addr] <= e_Nn;
+        Bram[e_write_en ? e_q : NBC_init_addr] <= e_Bn;
+    end
+
+always @ (posedge clk)
+    if(e_sof | e_write_C) begin
+        Cram[e_write_C ? e_q : NBC_init_addr] <= e_Cn;
+    end
+
+
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage ef: read Aram, buffer registers
+//-------------------------------------------------------------------------------------------------------------------
+reg              ef_sof;
+reg              ef_e;
+reg              ef_fc;
+reg              ef_lc;
+reg              ef_eof;
+reg              ef_runi;
+reg              ef_rune;
+reg              ef_2BleN;
+reg        [8:0] ef_q;
+reg              ef_rt;
+reg signed [9:0] ef_err;
+reg        [6:0] ef_No;
+reg       [12:0] ef_Ao;
+reg              ef_write_en;
+
+always @ (posedge clk) begin
+    ef_sof <= e_sof & rstn;
+    if(~rstn | e_sof) begin
+        {ef_e, ef_fc, ef_lc, ef_eof, ef_runi, ef_rune, ef_2BleN, ef_q, ef_rt, ef_err, ef_No, ef_write_en} <= '0;
+    end else begin
+        ef_e <= e_e;
+        ef_fc <= e_fc;
+        ef_lc <= e_lc;
+        ef_eof <= e_eof;
+        ef_runi <= e_runi & e_e;
+        ef_rune <= e_rune;
+        ef_2BleN <= e_2BleN;
+        ef_q <= e_q;
+        ef_rt <= e_rt;
+        ef_err <= e_err;
+        ef_No <= e_No;
+        ef_write_en <= e_write_en;
+    end
+end
+
+always @ (posedge clk)
+    ef_Ao <= Aram[e_q];
+
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage f: process run, calculate merrval and k, A update
+//-------------------------------------------------------------------------------------------------------------------
+reg               f_sof;
+reg               f_e;
+reg               f_eof;
+reg               f_runi;
+reg               f_rune;
+reg        [ 9:0] f_merr;
+reg        [ 3:0] f_k;
+reg        [15:0] f_rc;
+reg        [ 4:0] f_ri;
+reg        [ 1:0] f_on;
+reg        [15:0] f_cb;
+reg        [ 4:0] f_cn;    // in range of 0~16
+reg        [ 4:0] f_limit;
+reg  [8:0] f_q;
+reg        f_write_en;
+reg [12:0] f_An;
+
+reg  [8:0] g_q;
+reg        g_write_en;
+reg [12:0] g_An;
+always @ (posedge clk)
+    if(~rstn | f_sof)
+        {g_q, g_write_en, g_An} <= '0;
+    else
+        {g_q, g_write_en, g_An} <= {f_q, f_write_en, f_An};
+
+
+always @ (posedge clk) begin
+    f_sof <= ef_sof & rstn;
+    f_limit <= P_LIMIT;
+    f_An <= P_AINIT;
+    if(~rstn | ef_sof) begin
+        {f_e, f_eof, f_runi, f_rune, f_merr, f_k, f_rc, f_ri, f_on, f_cb, f_cn, f_q, f_write_en} <= '0;
+    end else begin
+        logic [ 1:0] on;
+        logic [15:0] rc;
+        logic [ 4:0] ri;
+        logic [12:0] Ao;
+        logic [ 3:0] k;
+        logic [ 9:0] abserr;
+        logic [ 9:0] merr, Ainc;
+        logic        map;
+        on = '0;
+        rc = (ef_fc|~ef_runi) ? '0 : f_rc;
+        ri = f_ri;
+        Ao = (f_write_en & f_q==ef_q) ? f_An : (g_write_en & g_q==ef_q) ? g_An : ef_Ao;
+        abserr = ef_err<$signed(10'd0) ? $unsigned(-ef_err) : $unsigned(ef_err);
+        merr='0;
+        Ainc='0;
+        f_write_en <= ef_write_en;
+        k = func_get_k(Ao, ef_No, ef_rt);
+        f_cb <= ef_fc ? '0 : f_rc;
+        f_cn <= {1'b0,J[ri]} + 5'd1;
+        if(ef_runi) begin
+            rc ++;
+            if(rc >= (16'd1<<J[ri])) begin
+                on++;
+                rc -= (16'd1<<J[ri]);
+                if(ri < 5'd31) ri ++;
+            end
+            if(ef_lc & (rc > 16'd0))
+                on++;
+        end else if(ef_rune) begin
+            f_limit <= P_LIMIT - 5'd1 - {1'b0,J[ri]};
+            if(ri > '0) ri --;
+            map = ~( (ef_err=='0) | ( (ef_err>$signed(10'd0)) ^ (k==4'd0 & ef_2BleN) ) );
+            merr = (abserr<<1) - {9'd0,ef_rt} - {9'd0,map};
+            Ainc = ((merr + {9'd0,~ef_rt}) >> 1);
+        end else begin
+            map = (~P_LOSSY) & (k==4'd0) & ef_2BleN;
+            if(ef_err < $signed(10'd0))
+                merr = (abserr<<1) - 10'd1 - {9'd0,map};
+            else
+                merr = (abserr<<1) + {9'd0,map};
+            Ainc = (ef_err < $signed(10'd0)) ? $unsigned(-ef_err) : $unsigned(ef_err);
+        end
+        if(ef_e) begin
+            f_rc <= rc;
+            f_ri <= ri;
+        end
+        f_An <= A_update(ef_No[6], Ao, Ainc);
+        f_e <= ef_e;
+        f_eof <= ef_eof;
+        f_runi <= ef_runi;
+        f_rune <= ef_rune;
+        f_merr <= merr;
+        f_k <= k;
+        f_on <= on;
+        f_q <= ef_q;
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage g: write Aram
+//-------------------------------------------------------------------------------------------------------------------
+reg [8:0] A_init_addr;
+always @ (posedge clk)
+    A_init_addr <= f_sof ? A_init_addr + (A_init_addr < 9'd366 ? 9'd1 : 9'd0) : 9'd0;
+    
+always @ (posedge clk)
+    if(f_sof | f_write_en)
+        Aram[f_write_en ? f_q : A_init_addr] <= f_An;
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage g: golomb coding parser
+//-------------------------------------------------------------------------------------------------------------------
+reg         g_sof;
+reg         g_e;
+reg         g_eof;
+reg         g_runi;
+reg  [ 1:0] g_on;   // in range of 0~2
+reg  [15:0] g_cb;   
+reg  [ 4:0] g_cn;   // in range of 0~16
+reg  [ 4:0] g_zn;   // in range of 0~27
+reg  [ 9:0] g_db;
+reg  [ 3:0] g_dn;   // in range of 0~13
+
+always @ (posedge clk) begin
+    g_sof <= f_sof & rstn;
+    if(~rstn | f_sof) begin
+        {g_e, g_eof, g_runi, g_on, g_cb, g_cn, g_zn, g_db, g_dn} <= '0;
+    end else begin
+        logic [9:0] merr_sk;
+        merr_sk = f_merr >> f_k;
+        g_e <= f_e;
+        g_eof <= f_eof;
+        g_runi <= f_runi;
+        g_on <= f_on;
+        g_cb <= f_rune ? f_cb : '0;
+        g_cn <= f_rune ? f_cn : '0;
+        if(merr_sk < f_limit) begin 
+            g_zn <= merr_sk[4:0];
+            g_db <= f_merr & ~(10'h3ff<<f_k);
+            g_dn <= f_k;
+        end else begin
+            g_zn <= f_limit[4:0];
+            g_db <= (f_merr-10'd1) & ~(10'h3ff<<P_QBPP);
+            g_dn <= P_QBPP;
+        end
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage h: golomb coding bits merge
+//-------------------------------------------------------------------------------------------------------------------
+reg        h_sof;
+reg        h_eof;
+reg [56:0] h_bb;  // max 57 bits
+reg [ 5:0] h_bn;  // in range of 0~57
+
+always @ (posedge clk) begin
+    h_sof <= g_sof & rstn;
+    {h_bb, h_bn} <= '0;
+    if(~rstn | g_sof) begin
+        h_eof <= 1'b0;
+    end else begin
+        h_eof <= g_eof;
+        if(g_e) begin
+            if(g_runi) begin
+                if(g_on==2'd1)
+                    h_bb[56] <= 1'b1;
+                else if(g_on==2'd2)
+                    h_bb[56:55] <= 2'b11;
+                h_bn <= {4'h0, g_on};
+            end else begin
+                h_bb <= ( {41'h0,g_cb} << (6'd57-g_cn) ) | ( 57'd1 << (6'd56-g_cn-g_zn) ) | ( {47'h0,g_db} << (6'd56-g_cn-g_zn-g_dn) );
+                h_bn <= {1'b0,g_cn} + {1'b0,g_zn} + 6'd1 + {2'b0,g_dn};
+            end
+        end
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage j: jls stream generate
+//-------------------------------------------------------------------------------------------------------------------
+reg        j_sof;
+reg        j_eof;
+reg        j_e;
+reg [15:0] j_data;
+reg[247:0] j_bbuf;
+reg [ 7:0] j_bcnt;
+
+always @ (posedge clk) begin
+    j_sof <= h_sof & rstn;
+    {j_e, j_data} <= '0;
+    if(~rstn | h_sof) begin
+        {j_eof, j_bbuf, j_bcnt} <= '0;
+    end else begin
+        logic [247:0] bbuf;
+        logic [  7:0] bcnt;
+        bbuf = j_bbuf | ({h_bb,191'h0} >> j_bcnt);
+        bcnt = j_bcnt + {2'd0,h_bn};
+        if(bcnt >= 8'd16) begin
+            j_e <= 1'b1;
+            j_data[15:8] <= bbuf[247:240];
+            if(bbuf[247:240] == '1) begin
+                bbuf = {1'h0, bbuf[239:0], 7'h0};
+                bcnt -= 8'd7;
+            end else begin
+                bbuf = {      bbuf[239:0], 8'h0};
+                bcnt -= 8'd8;
+            end
+            j_data[ 7:0] <= bbuf[247:240];
+            if(bbuf[247:240] == '1) begin
+                bbuf = {1'h0, bbuf[239:0], 7'h0};
+                bcnt -= 8'd7;
+            end else begin
+                bbuf = {      bbuf[239:0], 8'h0};
+                bcnt -= 8'd8;
+            end
+        end else if(h_eof && bcnt > 8'd0) begin
+            j_e <= 1'b1;
+            j_data[15:8] <= bbuf[247:240];
+            if(bbuf[247:240] == '1)
+                j_data[ 7:0] <= {1'b0,bbuf[239:233]};
+            else
+                j_data[ 7:0] <= bbuf[239:232];
+            bbuf = '0;
+            bcnt = 8'd0;
+        end
+        j_bbuf <= bbuf;
+        j_bcnt <= bcnt;
+        j_eof <= h_eof;
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// make .jls file header and footer
+//-------------------------------------------------------------------------------------------------------------------
+reg [15:0] jls_wl, jls_hl;
+wire[15:0] jls_header [13];
+assign jls_header[0] = 16'hFFD8;
+assign jls_header[1] = 16'h00FF;
+assign jls_header[2] = 16'hF700;
+assign jls_header[3] = 16'h0B08;
+assign jls_header[4] = jls_hl;
+assign jls_header[5] = jls_wl;
+assign jls_header[6] = 16'h0101;
+assign jls_header[7] = 16'h1100;
+assign jls_header[8] = 16'hFFDA;
+assign jls_header[9] = 16'h0008;
+assign jls_header[10]= 16'h0101;
+assign jls_header[11]= {13'b0,NEAR};
+assign jls_header[12]= 16'h0000;
+wire[15:0] jls_footer = 16'hFFD9;
+always @ (posedge clk)
+    if(~rstn) begin
+        jls_wl <= '0;
+        jls_hl <= '0;
+    end else begin
+        jls_wl <= {2'd0,a_wl} + 16'd1;
+        jls_hl <= {2'd0,a_hl} + 16'd1;
+    end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// pipeline stage k: add .jls file header and footer
+//-------------------------------------------------------------------------------------------------------------------
+reg  [3:0] k_header_i;
+reg        k_footer_i;
+reg        k_last;
+reg        k_e;
+reg [15:0] k_data;
+
+always @ (posedge clk) begin
+    k_last <= 1'b0;
+    k_e <= 1'b0;
+    k_data <= '0;
+    if(j_sof) begin
+        k_footer_i <= '0;
+        if(k_header_i < 4'd13) begin
+            k_e <= 1'b1;
+            k_data <= jls_header[k_header_i];
+            k_header_i <= k_header_i + 4'd1;
+        end
+    end else if(j_e) begin
+        k_header_i <= '0;
+        k_footer_i <= '0;
+        k_e <= 1'b1;
+        k_data <= j_data;
+    end else if(j_eof) begin
+        k_header_i <= '0;
+        k_footer_i <= 1'b1;
+        if(~k_footer_i) begin
+            k_last <= 1'b1;
+            k_e <= 1'b1;
+            k_data <= jls_footer;
+        end
+    end else begin
+        k_header_i <= '0;
+        k_footer_i <= '0;
+    end
+end
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// linebuffer for context pixels
+//-------------------------------------------------------------------------------------------------------------------
+reg [7:0] linebuffer [1<<14];
+always @ (posedge clk)  // line buffer read
+    c_d <= linebuffer[a_ii];
+always @ (posedge clk)  // line buffer write
+    if(e_e) linebuffer[e_ii] <= e_x;
+
+
+//-------------------------------------------------------------------------------------------------------------------
+// output signal
+//-------------------------------------------------------------------------------------------------------------------
+assign o_last = k_last;
+assign o_e = k_e;
+assign o_data = k_data;
+
+
+endmodule
+
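The tail of the pipeline (stages g, h and j) turns each mapped error value into bits and then into bytes. The Python below is a minimal, hedged software model written from reading that RTL. It is not part of the repository and does not reproduce per-cycle timing or the 248-bit buffer.

* `golomb_bits` follows stages g and h. Its `limit` and `qbpp` arguments stand in for `f_limit` and `P_QBPP`, whose values are defined elsewhere in jls_encoder.sv. It omits the run-interruption prefix (`g_cb`/`g_cn`) and the run-mode `1` bits (`g_on`).
* `pack_bytes` follows stage j. It includes the 0 bit stuffed after every 0xFF byte and the final flush to a whole 16-bit word.

```python
def golomb_bits(merr: int, k: int, limit: int, qbpp: int) -> str:
    """Codeword for one mapped error value, as a string of '0'/'1'.

    Modeled on stages g/h: zn zeros, a terminating '1', then dn low bits.
    When merr >> k reaches `limit`, the escape form is used instead:
    `limit` zeros, '1', then merr-1 in qbpp bits.
    """
    high = merr >> k
    if high < limit:
        zn, db, dn = high, merr & ((1 << k) - 1), k
    else:
        zn, db, dn = limit, (merr - 1) & ((1 << qbpp) - 1), qbpp
    low = format(db, f"0{dn}b") if dn > 0 else ""
    return "0" * zn + "1" + low


def pack_bytes(bits: str) -> bytes:
    """Pack a bit string MSB-first the way stage j does.

    After every 0xFF byte the next byte carries only 7 data bits and its
    MSB is a stuffed 0. A stuffed bit pending after a final 0xFF is still
    emitted, and the output is zero-padded to whole 16-bit words (o_data).
    """
    out = bytearray()
    pos = 0
    prev_ff = False
    while pos < len(bits):
        n = 7 if prev_ff else 8
        chunk = bits[pos:pos + n].ljust(n, "0")  # zero-pad a partial last chunk
        pos += n
        out.append(int(chunk, 2))  # a 7-bit chunk leaves the MSB at 0
        prev_ff = out[-1] == 0xFF
    if prev_ff:
        out.append(0x00)  # pending stuffed bit, flushed as a zero byte
    if len(out) % 2:
        out.append(0x00)  # pad to a whole 16-bit output word
    return bytes(out)


if __name__ == "__main__":
    # limit=23 and qbpp=8 are example values, not taken from the RTL parameters.
    bits = golomb_bits(5, 1, limit=23, qbpp=8) + golomb_bits(0, 0, limit=23, qbpp=8)
    print(bits, pack_bytes(bits).hex())  # prints: 00111 3800
```

The stuffed bit is modeled as occupying bit space, as in the RTL, where `bcnt` drops by only 7 after a 0xFF byte. That is why a stream ending in 0xFF still gets one more zero byte.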
--- a/SIM/decoder.exe
+++ b/SIM/decoder.exe
--- a/SIM/images/test000.pgm
+++ b/SIM/images/test000.pgm
@ -0,0 +1,5 @@
+P5
+5
+1
+255
+<EFBFBD>e<>J
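test000.pgm above is a 5x1 binary PGM (P5) whose width and height sit on separate lines. `load_img` in the testbench accepts that layout as well as "width height" on one line, and it requires a maximum value of 255. It reads each header line with `$fgets`, so `#` comment lines are not supported.

A minimal Python sketch for producing extra test images that `load_img` should accept follows. `make_pgm` is a hypothetical helper. The size check uses the ranges the README gives for jls_encoder (width 5 to 16384, height 1 to 16384). Note that the testbench's `img` array holds at most 4096*4096 pixels.

```python
def make_pgm(width: int, height: int, pixels: bytes) -> bytes:
    """Build a binary (P5) PGM with maxval 255 and no comment lines,
    matching the header layout the testbench's load_img parses."""
    if not (5 <= width <= 16384 and 1 <= height <= 16384):
        raise ValueError("size outside the range the README gives for jls_encoder")
    if len(pixels) != width * height:
        raise ValueError("need exactly width*height 8-bit pixels")
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(pixels)
```

For example, writing `make_pgm(64, 8, data)` to `images/test009.pgm` (a hypothetical name following `FILE_NAME_FORMAT`) adds a ninth input image. `FILE_NO_FINAL` would then need to be raised to 9.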
--- a/SIM/images/test001.pgm
+++ b/SIM/images/test001.pgm
--- a/SIM/images/test002.pgm
+++ b/SIM/images/test002.pgm
--- a/SIM/images/test003.pgm
+++ b/SIM/images/test003.pgm
--- a/SIM/images/test004.pgm
+++ b/SIM/images/test004.pgm
--- a/SIM/images/test005.pgm
+++ b/SIM/images/test005.pgm
--- a/SIM/images/test006.pgm
+++ b/SIM/images/test006.pgm
--- a/SIM/images/test007.pgm
+++ b/SIM/images/test007.pgm
--- a/SIM/images/test008.pgm
+++ b/SIM/images/test008.pgm
--- a/SIM/tb_jls_encoder.sv
+++ b/SIM/tb_jls_encoder.sv
@ -0,0 +1,228 @@
+
+//--------------------------------------------------------------------------------------------------------
+// Module  : tb_jls_encoder
+// Type    : simulation, top
+// Standard: SystemVerilog 2005 (IEEE1800-2005)
+// Function: testbench for jls_encoder:
+//           loads .pgm files (uncompressed images) and pushes their pixels into jls_encoder,
+//           then collects the output JPEG-LS streams and writes them to .jls files (JPEG-LS images).
+//--------------------------------------------------------------------------------------------------------
+
+`timescale 1ps/1ps
+
+`define NEAR 1     // NEAR can be 0~7
+
+`define FILE_NO_FIRST 1   // first input file name is test001.pgm
+`define FILE_NO_FINAL 8   // final input file name is test008.pgm
+
+
+// number of bubbles (idle cycles with i_e=0) inserted between pixels
+//    when = 0, do not insert bubbles
+//    when > 0, insert BUBBLE_CONTROL bubbles
+//    when < 0, insert a random number of 0~(-BUBBLE_CONTROL) bubbles
+`define BUBBLE_CONTROL -2
+
+
+
+
+// the input and output file names' format
+`define FILE_NAME_FORMAT  "test%03d"
+
+// input file (uncompressed .pgm file) directory
+`define INPUT_PGM_DIR     "./images"
+
+// output file (compressed .jls file) directory
+`define OUTPUT_JLS_DIR    "./"
+
+
+module tb_jls_encoder ();
+
+initial $dumpvars(1, tb_jls_encoder);
+
+// -------------------------------------------------------------------------------------------------------------------
+//   generate clock and reset
+// -------------------------------------------------------------------------------------------------------------------
+reg      rstn = 1'b0;
+reg       clk = 1'b0;
+always #50000 clk = ~clk;  // 10MHz
+initial begin repeat(4) @(posedge clk); rstn<=1'b1; end
+
+
+// -------------------------------------------------------------------------------------------------------------------
+//   signals for jls_encoder_i module
+// -------------------------------------------------------------------------------------------------------------------
+reg        i_sof = '0;
+reg [13:0] i_w = '0;
+reg [13:0] i_h = '0;
+reg        i_e = '0;
+reg [ 7:0] i_x = '0;
+wire       o_e;
+wire[15:0] o_data;
+wire       o_last;
+
+
+
+logic [7:0] img [4096*4096];
+int w = 0, h = 0;
+
+task automatic load_img(input logic [256*8:1] fname);
+    int linelen, depth=0, scanf_num;
+    logic [256*8-1:0] line;
+    int fp = $fopen(fname, "rb");
+    if(fp==0) begin
+        $display("*** error: could not open file %s", fname);
+        $finish;
+    end
+    linelen = $fgets(line, fp);
+    if(line[8*(linelen-2)+:16] != 16'h5035) begin
+        $display("*** error: the first line must be P5");
+        $fclose(fp);
+        $finish;
+    end
+    scanf_num = $fgets(line, fp);
+    scanf_num = $sscanf(line, "%d%d", w, h);
+    if(scanf_num == 1) begin
+        scanf_num = $fgets(line, fp);
+        scanf_num = $sscanf(line, "%d", h);
+    end
+    scanf_num = $fgets(line, fp);
+    scanf_num = $sscanf(line, "%d", depth);
+    if(depth!=255) begin
+        $display("*** error: images depth must be 255");
+        $fclose(fp);
+        $finish;
+    end
+    for(int i=0; i<h*w; i++)
+        img[i] = $fgetc(fp);
+    $fclose(fp);
+endtask
+
+
+// -------------------------------------------------------------------------------------------------------------------
+//   task: feed image pixels to jls_encoder_i module
+//   inputs:
+//         w, h : image width and height (module-level variables set by load_img, not task arguments)
+//         bubble_control : number of bubbles inserted between pixels
+//              when = 0, do not insert bubbles
+//              when > 0, insert bubble_control bubbles
+//              when < 0, insert a random number of 0~(-bubble_control) bubbles
+// -------------------------------------------------------------------------------------------------------------------
+task automatic feed_img(input int bubble_control);
+    int num_bubble;
+    
+    // start feeding an image by asserting i_sof for 368 cycles
+    repeat(368) begin
+        @(posedge clk)
+        i_sof <= 1'b1;
+        i_w <= w - 1;
+        i_h <= h - 1;
+        {i_e, i_x} <= '0;
+    end
+    
+    // for all pixels of the image
+    for(int i=0; i<h*w; i++) begin
+        
+        // calculate how many bubbles to insert
+        if(bubble_control<0) begin
+            num_bubble = $random % (1-bubble_control);
+            if(num_bubble<0)
+                num_bubble = -num_bubble;
+        end else begin
+            num_bubble = bubble_control;
+        end
+        
+        // insert bubbles
+        repeat(num_bubble) @(posedge clk) {i_sof, i_w, i_h, i_e, i_x} <= '0;
+        
+        // assert i_e to input a pixel
+        @(posedge clk)
+        {i_sof, i_w, i_h} <= '0;
+        i_e <= 1'b1;
+        i_x <= img[i];
+    end
+    
+    // 16 cycles idle between images
+    repeat(16) @(posedge clk) {i_sof, i_w, i_h, i_e, i_x} <= '0;
+endtask
+
+
+// -------------------------------------------------------------------------------------------------------------------
+//   jls_encoder_i module
+// -------------------------------------------------------------------------------------------------------------------
+jls_encoder #(
+    .NEAR     ( `NEAR     )
+) jls_encoder_i (
+    .rstn     ( rstn      ),
+    .clk      ( clk       ),
+    .i_sof    ( i_sof     ),
+    .i_w      ( i_w       ),
+    .i_h      ( i_h       ),
+    .i_e      ( i_e       ),
+    .i_x      ( i_x       ),
+    .o_e      ( o_e       ),
+    .o_data   ( o_data    ),
+    .o_last   ( o_last    )
+);
+
+
+
+// -------------------------------------------------------------------------------------------------------------------
+//  read images, feed them to jls_encoder_i module 
+// -------------------------------------------------------------------------------------------------------------------
+int file_no;    // file number
+initial begin
+    logic [256*8:1] input_file_name;
+    logic [256*8:1] input_file_format;
+    $sformat(input_file_format , "%s\\%s.pgm",  `INPUT_PGM_DIR, `FILE_NAME_FORMAT);
+    
+    while(~rstn) @ (posedge clk);
+
+    for(file_no=`FILE_NO_FIRST; file_no<=`FILE_NO_FINAL; file_no=file_no+1) begin
+        $sformat(input_file_name, input_file_format , file_no);
+
+        load_img(input_file_name);
+        $display("%s (%5dx%5d)", input_file_name, w, h);
+
+        if( w < 5 || w > 16384 || h < 1 || h > 16384 )         // image size not supported
+            $display("  *** image size not supported ***");
+        else
+            feed_img(`BUBBLE_CONTROL);
+    end
+    
+    repeat(100) @(posedge clk);
+
+    $finish;
+end
+
+
+// -------------------------------------------------------------------------------------------------------------------
+//  write output stream to .jls files 
+// -------------------------------------------------------------------------------------------------------------------
+logic [256*8:1] output_file_format;
+initial $sformat(output_file_format, "%s\\%s.jls", `OUTPUT_JLS_DIR, `FILE_NAME_FORMAT);
+logic [256*8:1] output_file_name;
+int opened = 0;
+int jls_file = 0;
+
+always @ (posedge clk)
+    if(o_e) begin
+        // the first data of an output stream, open a new file.
+        if(opened == 0) begin
+            opened = 1;
+            $sformat(output_file_name, output_file_format, file_no);
+            jls_file = $fopen(output_file_name , "wb");
+        end
+        
+        // write data to file.
+        if(opened != 0 && jls_file != 0)
+            $fwrite(jls_file, "%c%c", o_data[15:8], o_data[7:0]);
+        
+        // if it is the last data of an output stream, close the file.
+        if(o_last) begin
+            opened = 0;
+            $fclose(jls_file);
+        end
+    end
+
+endmodule
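Each .jls file written above should start with the 13 header words of `jls_header` in jls_encoder.sv and end with the `jls_footer` word 0xFFD9. The `$fwrite` call writes every word high byte first.

Read as bytes, the header words form these pieces:

* SOI (FF D8).
* One 0x00 byte that pads the 25 bytes of marker segments to a whole number of 16-bit words.
* An SOF55 segment: FF F7, length 11, 8-bit precision, height, width, one component with H/V = 0x11.
* An SOS segment: FF DA, length 8, one component, NEAR, ILV = 0.

The Python sketch below rebuilds those 26 bytes for a given width, height and NEAR, and checks an output file against them. `expected_header` and `check_jls` are illustrative names, not part of the repository.

```python
import struct


def expected_header(width: int, height: int, near: int) -> bytes:
    """The 13 16-bit words of jls_header, big-endian (o_data[15:8] first)."""
    words = [
        0xFFD8,          # SOI
        0x00FF, 0xF700,  # 0x00 pad byte, SOF55 marker FF F7, high byte of Lf
        0x0B08,          # Lf = 11, P = 8 bits per sample
        height, width,   # Y, X
        0x0101, 0x1100,  # Nf = 1, C1 = 1, H1V1 = 0x11, Tq1 = 0
        0xFFDA, 0x0008,  # SOS marker, Ls = 8
        0x0101,          # Ns = 1, C1 = 1
        near & 0x7,      # Tm1 = 0, NEAR (3-bit parameter in the RTL)
        0x0000,          # ILV = 0, Al/Ah = 0
    ]
    return struct.pack(">13H", *words)


def check_jls(data: bytes, width: int, height: int, near: int) -> bool:
    """True if `data` has the encoder's header, ends with EOI and is word-aligned."""
    return (
        len(data) % 2 == 0
        and data.startswith(expected_header(width, height, near))
        and data.endswith(b"\xff\xd9")
    )
```

For example, `check_jls(open("test001.jls", "rb").read(), w, h, 1)`, with `w` and `h` taken from test001.pgm and NEAR matching the `` `NEAR `` define, is a quick sanity check on a simulation run before handing the file to a decoder.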
--- a/SIM/tb_jls_encoder_run_iverilog.bat
+++ b/SIM/tb_jls_encoder_run_iverilog.bat
@ -0,0 +1,5 @@
+del sim.out dump.vcd
+iverilog  -g2005-sv  -o sim.out  tb_jls_encoder.sv  ../RTL/jls_encoder.sv
+vvp -n sim.out
+del sim.out
+pause
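The batch file above compiles the testbench and the encoder with Icarus Verilog in SystemVerilog-2005 mode (`-g2005-sv`) and runs the result with `vvp -n`. A hedged Python equivalent for hosts without `cmd.exe` is sketched below. It runs the same two commands from the SIM directory and assumes `iverilog` and `vvp` are on PATH.

The testbench builds its file names with `\\` separators (`"%s\\%s.pgm"`, `"%s\\%s.jls"`), which act as directory separators only on Windows. On other hosts those format strings would also need `/`.

```python
import os
import shutil
import subprocess
import sys

SIM_OUT = "sim.out"

# Same commands as tb_jls_encoder_run_iverilog.bat, as argument lists.
COMMANDS = [
    ["iverilog", "-g2005-sv", "-o", SIM_OUT, "tb_jls_encoder.sv", "../RTL/jls_encoder.sv"],
    ["vvp", "-n", SIM_OUT],
]


def run_sim() -> int:
    """Run the simulation from the SIM directory; return a process exit code."""
    for tool in ("iverilog", "vvp"):
        if shutil.which(tool) is None:
            print(f"{tool} not found on PATH", file=sys.stderr)
            return 1
    for old in (SIM_OUT, "dump.vcd"):  # like 'del sim.out dump.vcd'
        if os.path.exists(old):
            os.remove(old)
    for cmd in COMMANDS:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    os.remove(SIM_OUT)  # like the final 'del sim.out'
    return 0
```

Calling `sys.exit(run_sim())` from a small script in SIM replaces the final `pause` with a normal exit code, which suits scripted or CI runs.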