change to Verilog2001

2025-01-17 20:42:53 +08:00 · 2023-06-03 21:15:55 +08:00 · 2023-06-03 21:15:55 +08:00 · 47df5b6219
commit 47df5b6219
parent a03807ee47
5 changed files with 459 additions and 365 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,3 @@
 **/vivado
 **/quartus
+FPGA_jls_encoder_test
--- a/README.md
+++ b/README.md
@@ -1,173 +1,33 @@
-![language](https://img.shields.io/badge/语言-systemverilog_(IEEE1800_2005)-CAD09D.svg) ![simulation](https://img.shields.io/badge/仿真-iverilog-green.svg) ![deployment](https://img.shields.io/badge/部署-quartus-blue.svg) ![deployment](https://img.shields.io/badge/部署-vivado-FF1010.svg)
-
-Chinese | [English](#en)
-
-FPGA JPEG-LS image compressor
-===========================
-
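-**FPGA** based streaming **JPEG-LS** image compressor, features:
-
-* For compressing **8bit** grayscale images.
-* Optional **lossless mode**, i.e. NEAR=0.
-* Optional **lossy mode**, with NEAR adjustable from 1 to 7.
-* Image width range is [5,16384], and height range is [1,16384].
-* Minimalist streaming input and output.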
-
-
-
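-# Background
-
-**JPEG-LS** (abbreviated as **JLS**) is a lossless/lossy image compression algorithm whose lossless mode achieves an excellent compression ratio, better than Lossless-JPEG, Lossless-JPEG2000, Lossless-JPEG-XR, FELICS, etc. **JPEG-LS** uses the maximum difference between the pixels before and after compression (the **NEAR** value) to control distortion: **NEAR=0** in lossless mode and **NEAR>0** in lossy mode, where a larger **NEAR** gives more distortion and a higher compression ratio. The file extension of a **JPEG-LS** compressed image is .**jls** .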
-
-
-
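-# Module Usage
-
-[**jls_encoder.sv**](./RTL/jls_encoder.sv) in the RTL directory is the JPEG-LS compression module for users to instantiate; it takes raw image pixels as input and outputs a JPEG-LS compressed stream.
-
-## Module Parameter
-
-**jls_encoder** has only one parameter: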
-
-```verilog
-parameter logic [2:0] NEAR
-```
-
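-It sets the **NEAR** value: 3'd0 selects lossless mode, and 3'd1~3'd7 select lossy mode.
-
-## Module Signals
-
-The input and output signals of **jls_encoder** are described in the following table.
-
-| Signal | Full name | Direction | Width | Description |
-| :---: | :---: | :---: | :---: | :--- |
-| rstn | synchronous reset | input | 1bit | If rstn=0 at a rising edge of clk, the module is reset; keep rstn=1 in normal use. |
-| clk | clock | input | 1bit | Clock; all signals should be aligned to the rising edge of clk. |
-| i_sof | start of image | input | 1bit | To input a new image, hold i_sof=1 for at least 368 clock cycles. |
-| i_w | image width - 1 | input | 14bit | For example, for an image width of 1920, set i_w to 14'd1919. Must stay valid while i_sof=1. |
-| i_h | image height - 1 | input | 14bit | For example, for an image height of 1080, set i_h to 14'd1079. Must stay valid while i_sof=1. |
-| i_e | input pixel valid | input | 1bit | When i_e=1, a pixel must be presented on i_x. |
-| i_x | input pixel | input | 8bit | Pixel value range is 8'd0 ~ 8'd255. |
-| o_e | output valid | output | 1bit | When o_e=1, output stream data is present on o_data. |
-| o_data | output stream data | output | 16bit | Big-endian: o_data[15:8] comes first, o_data[7:0] second. |
-| o_last | end of output stream | output | 1bit | If o_last=1 while o_e=1, this is the last data of an image's output stream. |
-
-> Note: i_w must not be less than 14'd4.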
-
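-## Input Image
-
-The operation flow of the **jls_encoder module** is:
-
-1. **Reset** (optional): set rstn=0 for at least **1 cycle** to reset, and keep rstn=1 afterwards during normal operation. Resetting is actually optional (i.e., rstn can be held at 1).
-2. **Start**: hold i_sof=1 for **at least 368 cycles** while presenting the image width and height on i_w and i_h; i_w and i_h must stay valid throughout i_sof=1.
-3. **Input**: drive i_e and i_x to input all pixels of the image, from left to right and top to bottom. When i_e=1, i_x is taken as one input pixel.
-4. **Inter-image idle**: after all pixels have been input, stay idle for **at least 16 cycles** (i.e., i_sof=0, i_e=0) before going back to step 2 to start the next image.
-
-Any number of idle bubbles (i.e., i_sof=0, i_e=0) can be inserted between i_sof=1 and the first i_e=1, and between successive i_e=1 cycles, so pixels can be input intermittently (of course, the highest performance is reached only when no bubbles are inserted).
-
-The following figure shows the input timing for compressing 2 images (// means some cycles are omitted, X means don't care). Image 1 has 1 bubble inserted after its first pixel, while image 2 has 1 bubble inserted after i_sof=1. Note that the **inter-image idle** must be at least **16 cycles**.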
-
-               __    __//  __    __    __    __   //_    __    //    __    __//  __    __    __    //    __
-    clk    \__/  \__/  //_/  \__/  \__/  \__/  \__// \__/  \__///\__/  \__/  //_/  \__/  \__/  \__///\__/  \_
-                _______//________                 //           //     _______//________            //
-    i_sof  ____/       //        \________________//___________//____/       //        \___________//________
-                _______//________                 //           //     _______//________            //
-    i_w    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
-                _______//________                 //           //     _______//________            //
-    i_h    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
-                       //         _____       ____//_____      //            //               _____//____
-    i_e    ____________//________/     \_____/    //     \_____//____________//______________/     //    \___
-                       //         _____       ____//_____      //            //               _____//____
-    i_x    XXXXXXXXXXXX//XXXXXXXXX_____XXXXXXX____//_____XXXXXX//XXXXXXXXXXXX//XXXXXXXXXXXXXXX_____//____XXXX
-    
    phase:     |  start image 1  |    input image 1      | idle gap  |  start image 2  |   input image 2
-
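-## Output Compressed Stream
-
-During the input, **jls_encoder** also outputs the compressed **JPEG-LS stream**, which constitutes the content of a complete .jls file (including the file header and trailer). When o_e=1, o_data is a valid output data word. o_data follows big-endian order: o_data[15:8] comes earlier in the stream and o_data[7:0] later. When the last data of an image's output stream is reached, o_last=1 marks the end of that image's compressed stream.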
-
-
-
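-# RTL Simulation
-
-All simulation-related files are in the SIM directory, including:
-
-* tb_jls_encoder.sv is the testbench for jls_encoder. It feeds the uncompressed .pgm images in a specified folder to jls_encoder in batch, and saves jls_encoder's output to .jls files.
-* tb_jls_encoder_run_iverilog.bat contains the commands for running the iverilog simulation.
-* The images folder contains several .pgm image files. The .pgm format stores uncompressed (i.e., raw-pixel) 8-bit grayscale images, which can be opened with Photoshop or a Linux image viewer (the Windows image viewer cannot open them).
-
-> The .pgm file format is very simple: a file header gives the image width and height, followed directly by all raw pixels of the image. I chose .pgm as the simulation input because only a little testbench code is needed to parse it and send its pixels to jls_encoder. However, you don't need to care about the pgm format, since jls_encoder itself has nothing to do with it; it only needs the raw pixels of the image as input. Just focus on the simulation waveform, i.e., on how the image pixels are sent into jls_encoder.
-
-Before simulating with iverilog, you need to install iverilog, see: [iverilog_usage](https://github.com/WangXuan95/WangXuan95/blob/main/iverilog_usage/iverilog_usage.md)
-
-Then double-click tb_jls_encoder_run_iverilog.bat to run the simulation, which takes more than ten minutes.
-
-After the simulation finishes, you will see several .jls files generated in the folder; these are the compressed image files. The simulation also produces a waveform file, dump.vcd, which you can open with gtkwave to view the waveform.
-
-In addition, you can modify some simulation parameters:
-
- Modify the macro **NEAR** in tb_jls_encoder.sv to change the compression ratio.
- Modify the macro **BUBBLE_CONTROL** in tb_jls_encoder.sv to decide how many bubbles are inserted between adjacent input pixels:
-  - When **BUBBLE_CONTROL=0**, no bubbles are inserted.
-  - When **BUBBLE_CONTROL>0**, **BUBBLE_CONTROL** bubbles are inserted.
-  - When **BUBBLE_CONTROL<0**, a random number of **0~(-BUBBLE_CONTROL)** bubbles is inserted each time.
-
-> With various NEAR and BUBBLE_CONTROL values, this repo has been verified by comparing its results on hundreds of images, which gives high confidence that it is bug-free (the automated verification code is not included).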
-
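-## View Compressed Results
-
-Because **JPEG-LS** is relatively niche and specialized, most image viewers cannot open .jls files.
-
-You can try [this website](https://filext.com/file-extension/JLS) to view .jls files (although the site is often unavailable).
-
-If the website is unavailable, you can use the provided decoder, decoder.exe, to decompress the file back to .pgm and view that instead. Run the following command in CMD in the SIM directory: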
-
-```powershell
-.\decoder.exe <JLS_FILE_NAME> <PGM_FILE_NAME>
-```
-
-For example:
-
-```powershell
-.\decoder.exe test000.jls tmp.pgm
-```
-
-> Note: decoder.exe is compiled from the C language source code provided by UBC: http://www.stat.columbia.edu/~jakulin/jpeg-ls/mirror.htm
-
-
-
-
-
-# FPGA Deployment
-
-On Xilinx Artix-7 xc7a35tcsg324-2, the synthesis and implementation results are as follows.
-
-|    LUT     |    FF    |              BRAM              | Max clock frequency |
-| :--------: | :------: | :----------------------------: | :----------: |
-| 2347 (11%) | 932 (2%) | 9 RAMB18 (9%), equivalent to 144 Kbit |    35 MHz    |
-
-At 35 MHz, the image compression throughput is 35 Mpixel/s, which gives a frame rate of 16.8 fps for 1920x1080 images.
+![语言](https://img.shields.io/badge/语言-verilog_(IEEE1364_2001)-9A90FD.svg) ![仿真](https://img.shields.io/badge/仿真-iverilog-green.svg) ![部署](https://img.shields.io/badge/部署-quartus-blue.svg) ![部署](https://img.shields.io/badge/部署-vivado-FF1010.svg)

+[English](#en) | [Chinese](#cn)

+　

 <span id="en">FPGA JPEG-LS image compressor</span>
 ===========================

 **FPGA** based streaming **JPEG-LS** image compressor, features:

+* Pure Verilog design, compatible with various FPGA platforms.
 * For compressing **8bit** grayscale images.
 * Support **lossless mode**, i.e. NEAR=0 .
 * Support **lossy mode**, NEAR=1~7 adjustable.
 * The value range of image width is [5,16384], and the value range of height is [1,16384].
-* Minimalist streaming input and output.
-
+* Simple streaming input and output.

+　

 # Background

-**JPEG-LS** (abbreviated as **JLS**) is a lossless/lossy image compression algorithm whose lossless compression ratio is better than that of JPEG2000 and JPEG-XR. **JPEG-LS** uses the maximum difference between the pixels before and after compression (the **NEAR** value) to control distortion: **NEAR=0** is the lossless mode, and **NEAR>0** is the lossy mode, where a larger **NEAR** gives more distortion and a higher compression ratio. The file extension of a **JPEG-LS** compressed image is .**jls** .
+**JPEG-LS** (**JLS**) is a lossless/lossy image compression algorithm whose lossless compression ratio is better than that of PNG, Lossless-JPEG2000, Lossless-WEBP, Lossless-HEIF, etc. **JPEG-LS** uses the maximum difference between the pixels before and after compression (the **NEAR** value) to control distortion: **NEAR=0** is the lossless mode, and **NEAR>0** is the lossy mode, where a larger **NEAR** gives more distortion and a higher compression ratio. The file extension of a **JPEG-LS** compressed image is .**jls** .
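For example, with NEAR=3 a pixel of value 100 may be reconstructed as any value from 97 to 103, while NEAR=0 requires an exact match. Below is a minimal Verilog-2001 sketch of a testbench-side check of this bound, comparing an original pixel with its decoded counterpart. The module name `near_check` and its ports are hypothetical and not part of this repo (the encoder RTL has a similar internal function, `func_is_near`).

```verilog
// Hypothetical testbench-side checker, not part of this repo:
// ok = 1 when the decoded pixel is within NEAR of the original pixel.
module near_check (
    input  wire [7:0] orig,   // original pixel fed to jls_encoder
    input  wire [7:0] dec,    // the same pixel after decoding the .jls stream
    input  wire [2:0] near,   // NEAR value used for compression
    output wire       ok
);
    // absolute difference, computed with unsigned arithmetic only
    wire [7:0] diff = (orig > dec) ? (orig - dec) : (dec - orig);

    assign ok = (diff <= {5'd0, near});
endmodule
```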

+JPEG-LS has two generations:

+- JPEG-LS baseline (ITU-T T.87): "JPEG-LS" refers to JPEG-LS baseline by default. **This repo implements a JPEG-LS baseline encoder**. If you are interested in a software version of the JPEG-LS baseline encoder, see https://github.com/WangXuan95/JPEG-LS (C language)
+- JPEG-LS extension (ITU-T T.870): its compression ratio is higher than that of JPEG-LS baseline, but it is very rarely used (no code for it can be found online). **This repo is not about JPEG-LS extension**. However, I have implemented JPEG-LS extension in C, see https://github.com/WangXuan95/JPEG-LS_extension
+
+　

 # Module Usage

@@ -234,7 +94,7 @@ The following figure shows the input timing diagram of compressing 2 images (//r

 During the input, **jls_encoder** also outputs a compressed **JPEG-LS stream**, which constitutes the content of a complete .jls file (including the file header and trailer). When `o_e=1`, `o_data` is a valid output data word. `o_data` follows big-endian order: `o_data[15:8]` comes earlier in the stream than `o_data[7:0]`. When `o_e=1` and `o_last=1`, `o_data` is the last data word of the current image's compressed stream.

-
+　

 # RTL Simulation

@@ -260,7 +120,7 @@ In addition, you can also modify some simulation parameters:
 - When **BUBBLE_CONTROL>0**, insert **BUBBLE_CONTROL** bubbles.
  - When **BUBBLE_CONTROL<0**, insert random **0~(-BUBBLE_CONTROL)** bubbles each time.

-
+　

 ## View compressed JLS file

@@ -282,7 +142,7 @@ For example:

 > Note: decoder.exe is compiled from the C language source code provided by UBC : http://www.stat.columbia.edu/~jakulin/jpeg-ls/mirror.htm

-
+　

 # FPGA Deployment

@@ -294,3 +154,174 @@ On Xilinx Artix-7 xc7a35tcsg324-2, the synthesized and implemented results are a

 At 35MHz, the image compression performance is 35 Mpixel/s, which means the compression frame rate for 1920x1080 images is 16.8fps.

+　
+
+# Reference
+
+- ITU-T T.87 : Information technology – Lossless and near-lossless compression of continuous-tone still images – Baseline : https://www.itu.int/rec/T-REC-T.87/en
+- UBC's JPEG-LS baseline Public Domain Code : http://www.stat.columbia.edu/~jakulin/jpeg-ls/mirror.htm
+- Simple JPEG-LS baseline encoder in C language : https://github.com/WangXuan95/JPEG-LS 
+
+　
+
+　
+
+　
+
+<span id="cn">FPGA JPEG-LS image compressor</span>
+===========================
+
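+**FPGA** based streaming **JPEG-LS** image compressor, features:
+
+* Pure Verilog design, deployable on various FPGA models.
+* For compressing **8bit** grayscale images.
+* Optional **lossless mode**, i.e. NEAR=0.
+* Optional **lossy mode**, with NEAR adjustable from 1 to 7.
+* Image width range is [5,16384], and height range is [1,16384].
+* Minimalist streaming input and output.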
+
+　
+
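+# Background
+
+**JPEG-LS** (abbreviated as **JLS**) is a lossless/lossy image compression algorithm whose lossless mode achieves an excellent compression ratio, better than PNG, Lossless-JPEG2000, Lossless-WEBP, Lossless-HEIF, etc. **JPEG-LS** uses the maximum difference between the pixels before and after compression (the **NEAR** value) to control distortion: **NEAR=0** in lossless mode and **NEAR>0** in lossy mode, where a larger **NEAR** gives more distortion and a higher compression ratio. The file extension of a **JPEG-LS** compressed image is .**jls** .
+
+JPEG-LS has two generations:
+
+- JPEG-LS baseline (ITU-T T.87): "JPEG-LS" usually means JPEG-LS baseline. **This repo implements a JPEG-LS baseline encoder**. If you are interested in a software version of the JPEG-LS baseline encoder, see https://github.com/WangXuan95/JPEG-LS (C language)
+- JPEG-LS extension (ITU-T T.870): its compression ratio is higher than that of JPEG-LS baseline, but it is very rarely used (no code for it can be found online). **This repo has nothing to do with JPEG-LS extension!** However, I have implemented JPEG-LS extension in C according to ITU-T T.870, see https://github.com/WangXuan95/JPEG-LS_extension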
+
+　
+
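+# Module Usage
+
+[**jls_encoder.sv**](./RTL/jls_encoder.sv) in the RTL directory is the JPEG-LS compression module for users to instantiate; it takes raw image pixels as input and outputs a JPEG-LS compressed stream.
+
+## Module Parameter
+
+**jls_encoder** has only one parameter: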
+
+```verilog
+parameter [2:0] NEAR
+```
+
+It sets the **NEAR** value: 3'd0 selects lossless mode, and 3'd1~3'd7 select lossy mode.
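To illustrate what NEAR controls inside the encoder, the sketch below restates the NEAR-dependent constants from the `localparam` block of jls_encoder.sv: the gradient thresholds T1/T2/T3, the error quantization step 2*NEAR+1, the number of quantized error values, and the initial value of the A counters. The module name `near_params` and its ports are hypothetical; only the formulas are taken from the RTL (the QBPP/LIMIT table is left out).

```verilog
// Hypothetical illustration, not part of this repo: the NEAR-dependent
// constants of jls_encoder, with formulas copied from its localparam block.
module near_params #(
    parameter [2:0] NEAR = 3'd0
) (
    output wire [ 8:0] t1, t2, t3,  // local-gradient quantization thresholds
    output wire [ 9:0] quant,       // error quantization step, 2*NEAR+1
    output wire [ 9:0] qbeta,       // number of quantized error values
    output wire [ 9:0] qbetahalf,   // half of qbeta, rounded up
    output wire [12:0] ainit        // initial value of the A counters
);
    localparam integer N     = NEAR;
    localparam integer QUANT = 2 * N + 1;
    localparam integer QBETA = (256 + 4 * N) / QUANT;

    assign t1        = 3  + 3 * N;
    assign t2        = 7  + 5 * N;
    assign t3        = 21 + 7 * N;
    assign quant     = QUANT;
    assign qbeta     = QBETA;
    assign qbetahalf = (QBETA + 1) / 2;
    assign ainit     = (N == 0) ? 13'd4 : 13'd2;
endmodule
```

For NEAR=0 this gives T1=3, T2=7, T3=21, a step of 1 and 256 error values; for NEAR=1 it gives T1=6, T2=12, T3=28, a step of 3 and 86 error values.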
+
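+## Module Signals
+
+The input and output signals of **jls_encoder** are described in the following table.
+
+| Signal | Full name | Direction | Width | Description |
+| :---: | :---: | :---: | :---: | :--- |
+| rstn | synchronous reset | input | 1bit | If rstn=0 at a rising edge of clk, the module is reset; keep rstn=1 in normal use. |
+| clk | clock | input | 1bit | Clock; all signals should be aligned to the rising edge of clk. |
+| i_sof | start of image | input | 1bit | To input a new image, hold i_sof=1 for at least 368 clock cycles. |
+| i_w | image width - 1 | input | 14bit | For example, for an image width of 1920, set i_w to 14'd1919. Must stay valid while i_sof=1. |
+| i_h | image height - 1 | input | 14bit | For example, for an image height of 1080, set i_h to 14'd1079. Must stay valid while i_sof=1. |
+| i_e | input pixel valid | input | 1bit | When i_e=1, a pixel must be presented on i_x. |
+| i_x | input pixel | input | 8bit | Pixel value range is 8'd0 ~ 8'd255. |
+| o_e | output valid | output | 1bit | When o_e=1, output stream data is present on o_data. |
+| o_data | output stream data | output | 16bit | Big-endian: o_data[15:8] comes first, o_data[7:0] second. |
+| o_last | end of output stream | output | 1bit | If o_last=1 while o_e=1, this is the last data of an image's output stream. |
+
+> Note: i_w must not be less than 14'd4.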
+
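+## Input Image
+
+The operation flow of the **jls_encoder module** is:
+
+1. **Reset** (optional): set rstn=0 for at least **1 cycle** to reset, and keep rstn=1 afterwards during normal operation. Resetting is actually optional (i.e., rstn can be held at 1).
+2. **Start**: hold i_sof=1 for **at least 368 cycles** while presenting the image width and height on i_w and i_h; i_w and i_h must stay valid throughout i_sof=1.
+3. **Input**: drive i_e and i_x to input all pixels of the image, from left to right and top to bottom. When i_e=1, i_x is taken as one input pixel.
+4. **Inter-image idle**: after all pixels have been input, stay idle for **at least 16 cycles** (i.e., i_sof=0, i_e=0) before going back to step 2 to start the next image.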
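The four steps above map naturally onto a small stimulus generator. The following is a minimal Verilog-2001 sketch, assuming one fixed image size per instance and no bubbles between pixels. The module name `jls_frame_driver`, its `start`/`busy` handshake and the column-plus-row pixel pattern are hypothetical and not taken from tb_jls_encoder.sv; its `i_*` outputs are meant to be wired to the same-named inputs of jls_encoder.

```verilog
// Hypothetical stimulus generator, not part of this repo. It drives one
// W x H frame following steps 2 to 4 of the input procedure, with no bubbles
// between pixels and a hypothetical test pattern (column + row) as pixel data.
module jls_frame_driver #(
    parameter integer W           = 8,    // image width, must be >= 5
    parameter integer H           = 4,    // image height, must be >= 1
    parameter integer SOF_CYCLES  = 368,  // how long i_sof stays 1
    parameter integer IDLE_CYCLES = 16    // idle cycles after the last pixel
) (
    input  wire        rstn,
    input  wire        clk,
    input  wire        start,   // request a frame; sampled only while busy=0
    output reg         i_sof,
    output wire [13:0] i_w,
    output wire [13:0] i_h,
    output reg         i_e,
    output reg  [ 7:0] i_x,
    output wire        busy
);
    localparam [1:0] S_IDLE = 2'd0, S_SOF = 2'd1, S_PIX = 2'd2, S_GAP = 2'd3;

    reg [ 1:0] state;
    reg [31:0] cnt;        // cycle counter used in S_SOF and S_GAP
    reg [13:0] col, row;   // raster position of the next pixel

    assign i_w  = W - 1;   // the encoder expects "width - 1"
    assign i_h  = H - 1;   // and "height - 1", held constant here
    assign busy = (state != S_IDLE);

    always @ (posedge clk) begin
        if (~rstn) begin
            state <= S_IDLE;
            cnt   <= 0;
            {col, row, i_sof, i_e, i_x} <= 0;
        end else begin
            i_sof <= 1'b0;     // default: an idle bubble
            i_e   <= 1'b0;
            case (state)
            S_IDLE: if (start) begin
                state <= S_SOF;
                cnt   <= 0;
                i_sof <= 1'b1;
            end
            S_SOF: if (cnt < SOF_CYCLES - 1) begin
                cnt   <= cnt + 1;
                i_sof <= 1'b1;
            end else begin
                state <= S_PIX;  // leaves one (allowed) bubble before the first pixel
                col   <= 0;
                row   <= 0;
            end
            S_PIX: begin
                i_e <= 1'b1;
                i_x <= col[7:0] + row[7:0];
                if (col == W - 1) begin
                    col <= 0;
                    row <= row + 14'd1;
                    if (row == H - 1) begin
                        state <= S_GAP;
                        cnt   <= 0;
                    end
                end else
                    col <= col + 14'd1;
            end
            S_GAP: if (cnt == IDLE_CYCLES - 1)
                state <= S_IDLE;
            else
                cnt <= cnt + 1;
            endcase
        end
    end
endmodule
```

Step 1 (reset) is left to the surrounding testbench, and a new `start` is only accepted once `busy` has dropped, which is after at least IDLE_CYCLES idle cycles.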
+
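+Any number of idle bubbles (i.e., i_sof=0, i_e=0) can be inserted between i_sof=1 and the first i_e=1, and between successive i_e=1 cycles, so pixels can be input intermittently (of course, the highest performance is reached only when no bubbles are inserted).
+
+The following figure shows the input timing for compressing 2 images (// means some cycles are omitted, X means don't care). Image 1 has 1 bubble inserted after its first pixel, while image 2 has 1 bubble inserted after i_sof=1. Note that the **inter-image idle** must be at least **16 cycles**.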
+
+               __    __//  __    __    __    __   //_    __    //    __    __//  __    __    __    //    __
+    clk    \__/  \__/  //_/  \__/  \__/  \__/  \__// \__/  \__///\__/  \__/  //_/  \__/  \__/  \__///\__/  \_
+                _______//________                 //           //     _______//________            //
+    i_sof  ____/       //        \________________//___________//____/       //        \___________//________
+                _______//________                 //           //     _______//________            //
+    i_w    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
+                _______//________                 //           //     _______//________            //
+    i_h    XXXXX_______//________XXXXXXXXXXXXXXXXX//XXXXXXXXXXX//XXXXX_______//________XXXXXXXXXXXX//XXXXXXXX
+                       //         _____       ____//_____      //            //               _____//____
+    i_e    ____________//________/     \_____/    //     \_____//____________//______________/     //    \___
+                       //         _____       ____//_____      //            //               _____//____
+    i_x    XXXXXXXXXXXX//XXXXXXXXX_____XXXXXXX____//_____XXXXXX//XXXXXXXXXXXX//XXXXXXXXXXXXXXX_____//____XXXX
+    
    phase:     |  start image 1  |    input image 1      | idle gap  |  start image 2  |   input image 2
+
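+## Output Compressed Stream
+
+During the input, **jls_encoder** also outputs the compressed **JPEG-LS stream**, which constitutes the content of a complete .jls file (including the file header and trailer). When o_e=1, o_data is a valid output data word. o_data follows big-endian order: o_data[15:8] comes earlier in the stream and o_data[7:0] later. When the last data of an image's output stream is reached, o_last=1 marks the end of that image's compressed stream.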
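On the receiving side, each valid word contributes two bytes to the .jls file, high byte first. The sketch below is a hypothetical monitor (the module name `jls_stream_monitor` and all of its outputs are assumptions, not part of this repo) that counts the bytes of each compressed stream and checks that its first word is 16'hFFD8, assuming the stream begins with the JPEG SOI marker (bytes FF D8) as JPEG-family files do.

```verilog
// Hypothetical output monitor, not part of this repo. Each valid o_data word
// is treated as two stream bytes, o_data[15:8] first, then o_data[7:0].
module jls_stream_monitor (
    input  wire        rstn,
    input  wire        clk,
    input  wire        o_e,
    input  wire [15:0] o_data,
    input  wire        o_last,
    output reg  [31:0] frame_bytes,  // byte count of the last completed stream
    output reg         soi_ok,       // 1 if that stream began with bytes FF D8
    output reg         frame_done    // one-cycle pulse after o_last is seen
);
    reg [31:0] byte_cnt;   // bytes seen so far in the current stream
    reg        first_ok;   // first word of the current stream was 16'hFFD8

    always @ (posedge clk) begin
        frame_done <= 1'b0;
        if (~rstn) begin
            byte_cnt    <= 0;
            first_ok    <= 1'b0;
            frame_bytes <= 0;
            soi_ok      <= 1'b0;
        end else if (o_e) begin
            if (byte_cnt == 0)
                first_ok <= (o_data == 16'hFFD8);
            if (o_last) begin
                frame_bytes <= byte_cnt + 2;
                soi_ok      <= (byte_cnt == 0) ? (o_data == 16'hFFD8) : first_ok;
                frame_done  <= 1'b1;
                byte_cnt    <= 0;   // the next word starts a new image's stream
            end else
                byte_cnt <= byte_cnt + 2;
        end
    end
endmodule
```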
+
+　
+
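+# RTL Simulation
+
+All simulation-related files are in the SIM directory, including:
+
+* tb_jls_encoder.sv is the testbench for jls_encoder. It feeds the uncompressed .pgm images in a specified folder to jls_encoder in batch, and saves jls_encoder's output to .jls files.
+* tb_jls_encoder_run_iverilog.bat contains the commands for running the iverilog simulation.
+* The images folder contains several .pgm image files. The .pgm format stores uncompressed (i.e., raw-pixel) 8-bit grayscale images, which can be opened with Photoshop or a Linux image viewer (the Windows image viewer cannot open them).
+
+> The .pgm file format is very simple: a file header gives the image width and height, followed directly by all raw pixels of the image. I chose .pgm as the simulation input because only a little testbench code is needed to parse it and send its pixels to jls_encoder. However, you don't need to care about the pgm format, since jls_encoder itself has nothing to do with it; it only needs the raw pixels of the image as input. Just focus on the simulation waveform, i.e., on how the image pixels are sent into jls_encoder.
+
+Before simulating with iverilog, you need to install iverilog, see: [iverilog_usage](https://github.com/WangXuan95/WangXuan95/blob/main/iverilog_usage/iverilog_usage.md)
+
+Then double-click tb_jls_encoder_run_iverilog.bat to run the simulation, which takes more than ten minutes.
+
+After the simulation finishes, you will see several .jls files generated in the folder; these are the compressed image files. The simulation also produces a waveform file, dump.vcd, which you can open with gtkwave to view the waveform.
+
+In addition, you can modify some simulation parameters: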
+
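+- Modify the macro **NEAR** in tb_jls_encoder.sv to change the compression ratio.
+- Modify the macro **BUBBLE_CONTROL** in tb_jls_encoder.sv to decide how many bubbles are inserted between adjacent input pixels:
+  - When **BUBBLE_CONTROL=0**, no bubbles are inserted.
+  - When **BUBBLE_CONTROL>0**, **BUBBLE_CONTROL** bubbles are inserted.
+  - When **BUBBLE_CONTROL<0**, a random number of **0~(-BUBBLE_CONTROL)** bubbles is inserted each time.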
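The three BUBBLE_CONTROL cases can be folded into a single expression. This is a hypothetical combinational helper, not code from tb_jls_encoder.sv; it assumes the random range is inclusive (0 to -BUBBLE_CONTROL) and leaves the random source to the caller, e.g. `{$random}`.

```verilog
// Hypothetical helper, not taken from tb_jls_encoder.sv: the number of idle
// bubbles (i_sof=0, i_e=0) to insert after a pixel for a given BUBBLE_CONTROL.
module bubble_count (
    input  wire signed [31:0] bubble_control,  // value of the BUBBLE_CONTROL macro
    input  wire        [31:0] rnd,             // random source, e.g. {$random}; used only when < 0
    output wire        [31:0] count
);
    assign count = (bubble_control == 0) ? 32'd0                     // no bubbles
                 : (bubble_control >  0) ? bubble_control            // fixed count
                 :  rnd % (32'd1 - bubble_control);                  // 0 .. -bubble_control
endmodule
```

Passing the random value in as a port keeps the helper purely combinational, so it can be checked with fixed values as well as with `$random`.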
+
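+> With various NEAR and BUBBLE_CONTROL values, this repo has been verified by comparing its results on hundreds of images, which gives high confidence that it is bug-free (the automated verification code is not included).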
+
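+## View Compressed Results
+
+Because **JPEG-LS** is relatively niche and specialized, most image viewers cannot open .jls files.
+
+You can try [this website](https://filext.com/file-extension/JLS) to view .jls files (although the site is often unavailable).
+
+If the website is unavailable, you can use the provided decoder, decoder.exe, to decompress the file back to .pgm and view that instead. Run the following command in CMD in the SIM directory: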
+
+```powershell
+.\decoder.exe <JLS_FILE_NAME> <PGM_FILE_NAME>
+```
+
+For example:
+
+```powershell
+.\decoder.exe test000.jls tmp.pgm
+```
+
+> Note: decoder.exe is compiled from the C language source code provided by UBC: http://www.stat.columbia.edu/~jakulin/jpeg-ls/mirror.htm
+
+　
+
+# FPGA Deployment
+
+On Xilinx Artix-7 xc7a35tcsg324-2, the synthesis and implementation results are as follows.
+
+|    LUT     |    FF    |              BRAM              | Max clock frequency |
+| :--------: | :------: | :----------------------------: | :----------: |
+| 2347 (11%) | 932 (2%) | 9 RAMB18 (9%), equivalent to 144 Kbit |    35 MHz    |
+
+At 35 MHz, the image compression throughput is 35 Mpixel/s, which gives a frame rate of 16.8 fps for 1920x1080 images.
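The quoted frame rate follows from one pixel per clock (no bubbles). A quick check, where the second line also counts the minimum per-frame overhead from the input procedure (368 cycles of i_sof=1 plus 16 idle cycles):

$$
\begin{aligned}
\text{fps} &= \frac{f_{\mathrm{clk}}}{W \times H} = \frac{35 \times 10^{6}}{1920 \times 1080} = \frac{35\,000\,000}{2\,073\,600} \approx 16.88 \\
\text{fps}_{\text{with overhead}} &= \frac{35 \times 10^{6}}{368 + 1920 \times 1080 + 16} = \frac{35\,000\,000}{2\,073\,984} \approx 16.88
\end{aligned}
$$

Both come out at about 16.88 fps, so the 16.8 fps figure is rounded down and the per-frame overhead is negligible at this resolution.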
+
+　
+
+# Related Links
+
+- ITU-T T.87 : Information technology – Lossless and near-lossless compression of continuous-tone still images – Baseline : https://www.itu.int/rec/T-REC-T.87/en
+- UBC's JPEG-LS baseline Public Domain Code : http://www.stat.columbia.edu/~jakulin/jpeg-ls/mirror.htm
+- Simple JPEG-LS baseline encoder in C language : https://github.com/WangXuan95/JPEG-LS 
--- a/RTL/jls_encoder.sv
+++ b/RTL/jls_encoder.sv
@@ -2,12 +2,12 @@
 //--------------------------------------------------------------------------------------------------------
 // Module  : jls_encoder
 // Type    : synthesizable, IP's top
-// Standard: SystemVerilog 2005 (IEEE1800-2005)
+// Standard: Verilog 2001 (IEEE1364-2001)
 // Function: JPEG-LS image compressor
 //--------------------------------------------------------------------------------------------------------

 module jls_encoder #(
-    parameter [2:0] NEAR = 3'd1
+    parameter   [ 2:0] NEAR = 3'd0
 ) (
    input  wire        rstn,
    input  wire        clk,
@@ -21,10 +21,12 @@ module jls_encoder #(
   output wire        o_last   // indicate the last output data of an image
 );

+
+
 //---------------------------------------------------------------------------------------------------------------------------
 // local parameters
 //---------------------------------------------------------------------------------------------------------------------------
-wire [3:0] P_QBPPS [8];
+wire [3:0] P_QBPPS [0:7];
 assign P_QBPPS[0] = 4'd8;
 assign P_QBPPS[1] = 4'd7;
 assign P_QBPPS[2] = 4'd6;
@@ -34,19 +36,19 @@ assign P_QBPPS[5] = 4'd5;
 assign P_QBPPS[6] = 4'd5;
 assign P_QBPPS[7] = 4'd5;

-localparam logic               P_LOSSY     = NEAR != '0;
-localparam logic  signed [8:0] P_NEAR      = $signed({6'd0, NEAR});
-localparam logic  signed [8:0] P_T1        = $signed(9'd3) + $signed(9'd3) * P_NEAR;
-localparam logic  signed [8:0] P_T2        = $signed(9'd7) + $signed(9'd5) * P_NEAR;
-localparam logic  signed [8:0] P_T3        = $signed(9'd21)+ $signed(9'd7) * P_NEAR;
-localparam logic  signed [9:0] P_QUANT     = {P_NEAR, 1'b1};
-localparam logic  signed [9:0] P_QBETA     = $signed(10'd256 + {5'd0,NEAR,2'd0}) / P_QUANT;
-localparam logic  signed [9:0] P_QBETAHALF = (P_QBETA+$signed(10'd1)) / $signed(10'd2);
-wire        [3:0] P_QBPP      = P_QBPPS[NEAR];
-wire        [4:0] P_LIMIT     = 5'd31 - {1'b0, P_QBPP};
-localparam logic        [12:0] P_AINIT     = (NEAR=='0) ? 13'd4 : 13'd2;
+localparam                     P_LOSSY     = (NEAR != 3'd0);
+localparam        signed [8:0] P_NEAR      = $signed({6'd0, NEAR});
+localparam        signed [8:0] P_T1        = $signed(9'd3) + $signed(9'd3) * P_NEAR;
+localparam        signed [8:0] P_T2        = $signed(9'd7) + $signed(9'd5) * P_NEAR;
+localparam        signed [8:0] P_T3        = $signed(9'd21)+ $signed(9'd7) * P_NEAR;
+localparam        signed [9:0] P_QUANT     = {P_NEAR, 1'b1};
+localparam        signed [9:0] P_QBETA     = $signed(10'd256 + {5'd0,NEAR,2'd0}) / P_QUANT;
+localparam        signed [9:0] P_QBETAHALF = (P_QBETA+$signed(10'd1)) / $signed(10'd2);
+wire                     [3:0] P_QBPP      = P_QBPPS[NEAR];
+wire                     [4:0] P_LIMIT     = 5'd31 - {1'b0, P_QBPP};
+localparam              [12:0] P_AINIT     = (NEAR == 3'd0) ? 13'd4 : 13'd2;

-wire [3:0] J [32];
+wire [3:0] J [0:31];
 assign J[ 0] = 4'd0;
 assign J[ 1] = 4'd0;
 assign J[ 2] = 4'd0;
@@ -85,188 +87,232 @@ assign J[31] = 4'd15;
 //---------------------------------------------------------------------------------------------------------------------------
 // function: is_near
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic func_is_near(input [7:0] x1, input [7:0] x2);
-    logic signed [8:0] ex1, ex2;
+function       [0:0] func_is_near;
+    input      [7:0] x1, x2;
+    reg signed [8:0] ex1, ex2;
+begin
    ex1 = $signed({1'b0,x1});
    ex2 = $signed({1'b0,x2});
-    return ex1 - ex2 <= P_NEAR && ex2 - ex1 <= P_NEAR;
+    func_is_near = ((ex1 - ex2 <= P_NEAR) && (ex2 - ex1 <= P_NEAR));
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: predictor (get_px)
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic [7:0] func_predictor(input [7:0] a, input [7:0] b, input [7:0] c);
+function  [7:0] func_predictor;
+    input [7:0] a, b, c;
+begin
    if( c>=a && c>=b )
-        return a>b ? b : a;
+        func_predictor = (a>b) ? b : a;
    else if( c<=a && c<=b )
-        return a>b ? a : b;
+        func_predictor = (a>b) ? a : b;
    else
-        return a - c + b;
+        func_predictor = a - c + b;
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: q_quantize
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic signed [3:0] func_q_quantize(input [7:0] x1, input [7:0] x2);
-    logic signed [8:0] delta;
+function signed [3:0] func_q_quantize;
+    input       [7:0] x1, x2;
+    reg signed  [8:0] delta;
+begin
    delta = $signed({1'b0,x1}) - $signed({1'b0,x2});
    if     (delta <= -P_T3 )
-        return -$signed(4'd4);
+        func_q_quantize = -$signed(4'd4);
    else if(delta <= -P_T2 )
-        return -$signed(4'd3);
+        func_q_quantize = -$signed(4'd3);
    else if(delta <= -P_T1 )
-        return -$signed(4'd2);
+        func_q_quantize = -$signed(4'd2);
    else if(delta <  -P_NEAR )
-        return -$signed(4'd1);
+        func_q_quantize = -$signed(4'd1);
    else if(delta <=  P_NEAR )
-        return  $signed(4'd0);
+        func_q_quantize =  $signed(4'd0);
    else if(delta <   P_T1 )
-        return  $signed(4'd1);
+        func_q_quantize =  $signed(4'd1);
    else if(delta <   P_T2 )
-        return  $signed(4'd2);
+        func_q_quantize =  $signed(4'd2);
    else if(delta <   P_T3 )
-        return  $signed(4'd3);
+        func_q_quantize =  $signed(4'd3);
    else
-        return  $signed(4'd4);
+        func_q_quantize =  $signed(4'd4);
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: get_q (part 1), qp1 = 81*Q(d-b) + 9*Q(b-c)
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic signed [9:0] func_get_qp1(input [7:0] c, input [7:0] b, input [7:0] d);
-    return $signed(10'd81) * func_q_quantize(d,b) + $signed(10'd9) * func_q_quantize(b,c);
+function signed [9:0] func_get_qp1;
+    input       [7:0] c, b, d;
+begin
+    func_get_qp1 = $signed(10'd81) * func_q_quantize(d,b) + $signed(10'd9) * func_q_quantize(b,c);
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: get_q (part 2), get sign(qs) and abs(qs), where qs = qp1 + Q(c-a)
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic [9:0] func_get_q(input signed [9:0] qp1, input [7:0] c, input [7:0] a);
-    logic signed [9:0] qs;
-    logic              s;
-    logic        [8:0] q;
+function         [9:0] func_get_q;
+    input signed [9:0] qp1;
+    input        [7:0] c, a;
+    reg   signed [9:0] qs;
+    reg                s;
+    reg          [8:0] q;
+begin
    qs = qp1 + func_q_quantize(c,a);
    s = qs[9];
    q = s ? (~qs[8:0]+9'd1) : qs[8:0];
-    return {s, q};
+    func_get_q = {s, q};
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: clip
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic [7:0] func_clip(input signed [9:0] val);
+function         [7:0] func_clip;
+    input signed [9:0] val;
+begin
    if( val > $signed(10'd255) )
-        return 8'd255;
+        func_clip = 8'd255;
    else if( val < $signed(10'd0) )
-        return 8'd0;
+        func_clip = 8'd0;
    else
-        return val[7:0];
+        func_clip = val[7:0];
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: errval_quantize
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic signed [9:0] func_errval_quantize(input signed [9:0] err);
+function  signed [9:0] func_errval_quantize;
+    input signed [9:0] err;
+begin
    if(err[9])
-        return -( (P_NEAR - err) / P_QUANT );
+        func_errval_quantize = -( (P_NEAR - err) / P_QUANT );
    else
-        return    (P_NEAR + err) / P_QUANT;
+        func_errval_quantize =    (P_NEAR + err) / P_QUANT;
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: modrange
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic signed [9:0] func_modrange(input signed [9:0] val);
-    logic signed [9:0] new_val;
-    new_val = val;
-    if( new_val[9] )
-        new_val += P_QBETA;
-    if( new_val >= P_QBETAHALF )
-        new_val -= P_QBETA;
-    return new_val;
+function  signed [9:0] func_modrange;
+    input signed [9:0] val;
+begin
+    func_modrange = val;
+    if( func_modrange[9] )
+        func_modrange = func_modrange + P_QBETA;
+    if( func_modrange >= P_QBETAHALF )
+        func_modrange = func_modrange - P_QBETA;
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: get k
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic [3:0] func_get_k(input [12:0] A, input [6:0] N, input rt);
-    logic [18:0] Nt, At;
-    logic [ 3:0] k;
+function  [ 3:0] func_get_k;
+    input [12:0] A;
+    input [ 6:0] N;
+    input        rt;
+    reg   [18:0] Nt, At;
+    reg   [ 3:0] ii;
+begin
    Nt = {12'h0, N};
    At = { 6'h0, A};
-    k = 4'd0;
-    if(rt)
-        At += {13'd0, N[6:1]};
-    for(int ii=0; ii<13; ii++)
+    func_get_k = 4'd0;
+    if (rt)
+        At = At + {13'd0, N[6:1]};
+    for (ii=4'd0; ii<4'd13; ii=ii+4'd1)
        if((Nt<<ii) < At)
-            k++;
-    return k;
+            func_get_k = func_get_k + 4'd1;
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: B update for run mode
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic [6:0] B_update(input reset, input [6:0] B, input errm0);
+function  [6:0] B_update;
+    input       reset;
+    input [6:0] B;
+    input       errm0;
+begin
    B_update = B;
-    if(errm0)
-        B_update ++;
-    if(reset)
-        B_update >>>= 1;
+    if (errm0)
+        B_update = B_update + 7'd1;
+    if (reset)
+        B_update = B_update >>> 1;
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: C, B update for regular mode
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic [14:0] C_B_update(input reset, input [6:0] N, input signed [7:0] C, input signed [6:0] B, input signed [9:0] err);
-    logic signed [9:0] Bt;
-    logic signed [7:0] Ct;
+function        [14:0] C_B_update;
+    input              reset;
+    input        [6:0] N;
+    input signed [7:0] C;
+    input signed [6:0] B;
+    input signed [9:0] err;
+    reg   signed [9:0] Bt;
+    reg   signed [7:0] Ct;
+begin
    Bt = B;
    Ct = C;
-    Bt += err * P_QUANT;
+    Bt = Bt + (err * P_QUANT);
    if(reset)
-        Bt >>>= 1;
+        Bt = Bt >>> 1;
    if( Bt <= -$signed({3'd0,N}) ) begin
-        Bt += $signed({3'd0,N});
+        Bt = Bt + $signed({3'd0,N});
        if( Bt <= -$signed({3'd0,N}) )
            Bt = -$signed({3'd0,N}-10'd1);
        if( Ct != $signed(8'd128) )
-            Ct--;
+            Ct = Ct - 8'd1;
    end else if( Bt > $signed(10'd0) ) begin
-        Bt -= $signed({3'd0,N});
+        Bt = Bt - $signed({3'd0,N});
        if( Bt > $signed(10'd0) )
            Bt = $signed(10'd0);
        if( Ct != $signed(8'd127) )
-            Ct++;
+            Ct = Ct + 8'd1;
    end
-    return {Ct, Bt[6:0]};
+    C_B_update = {Ct, Bt[6:0]};
+end
 endfunction


 //---------------------------------------------------------------------------------------------------------------------------
 // function: A update
 //---------------------------------------------------------------------------------------------------------------------------
-function automatic logic [12:0] A_update(input reset, input [12:0] A, input [9:0] inc);
+function  [12:0] A_update;
+    input        reset;
+    input [12:0] A;
+    input [ 9:0] inc;
+begin
    A_update = A + {3'd0, inc};
    if(reset)
-        A_update >>>= 1;
+        A_update = A_update >>> 1;
+end
 endfunction


 //-------------------------------------------------------------------------------------------------------------------
 // context memories
 //-------------------------------------------------------------------------------------------------------------------
-reg        [ 5:0] Nram [366];
-reg        [12:0] Aram [366];
-reg signed [ 6:0] Bram [366];
+reg        [ 5:0] Nram [0:365];
+reg        [12:0] Aram [0:365];
+reg signed [ 6:0] Bram [0:365];
 reg signed [ 7:0] Cram [1:364];


@@ -285,7 +331,7 @@ reg [14:0] a_jj;

 always @ (posedge clk)
    if(~rstn) begin
-        {a_sof, a_e, a_x, a_w, a_h, a_wl, a_hl, a_ii, a_jj} <= '0;
+        {a_sof, a_e, a_x, a_w, a_h, a_wl, a_hl, a_ii, a_jj} <= 0;
    end else begin
        a_sof <= i_sof;
        a_e <= i_e;
@@ -295,13 +341,13 @@ always @ (posedge clk)
        if(a_sof) begin
            a_wl <= (a_w<14'd4 ? 14'd4 : a_w);
            a_hl <= a_h;
-            a_ii <= '0;
-            a_jj <= '0;
+            a_ii <= 14'd0;
+            a_jj <= 15'd0;
        end else if(a_e) begin
            if(a_ii < a_wl)
                a_ii <= a_ii + 14'd1;
            else begin
-                a_ii <= '0;
+                a_ii <= 14'd0;
                if(a_jj <= {1'b0,a_hl})
                    a_jj <= a_jj + 15'd1;
            end
@@ -324,12 +370,12 @@ reg [ 7:0] b_x;
 always @ (posedge clk) begin
    b_sof <= a_sof & rstn;
    if(~rstn | a_sof) begin
-        {b_e, b_fc, b_lc, b_fr, b_eof, b_ii, b_x} <= '0;
+        {b_e, b_fc, b_lc, b_fr, b_eof, b_ii, b_x} <= 0;
    end else begin
        b_e <= a_e & (a_jj <= {1'b0,a_hl});
-        b_fc <= a_e & (a_ii == '0);
+        b_fc <= a_e & (a_ii == 14'd0);
        b_lc <= a_e & (a_ii == a_wl);
-        b_fr <= a_e & (a_jj == '0);
+        b_fr <= a_e & (a_jj == 15'd0);
        b_eof <= a_jj > {1'b0,a_hl};
        b_ii <= a_ii;
        b_x <= a_x;
@@ -356,7 +402,7 @@ reg [ 7:0] c_d;
 always @ (posedge clk) begin
    c_sof <= b_sof & rstn;
    if(~rstn | b_sof) begin
-        {c_e,c_fc,c_lc,c_fr,c_eof,c_ii,c_x,c_b,c_bt,c_c} <= '0;
+        {c_e,c_fc,c_lc,c_fr,c_eof,c_ii,c_x,c_b,c_bt,c_c} <= 0;
    end else begin
        c_e <= b_e;
        c_fc <= b_fc;
@@ -366,10 +412,10 @@ always @ (posedge clk) begin
        c_ii <= b_ii;
        if(b_e) begin
            c_x <= b_x;
-            c_b <= b_fr ? '0 : c_d;
+            c_b <= b_fr ? 8'd0 : c_d;
            if(b_fr) begin
-                c_bt <= '0;
-                c_c <= '0;
+                c_bt <= 8'd0;
+                c_c <= 8'd0;
            end else if(b_fc) begin
                c_bt <= c_d;
                c_c <= c_bt;
@@ -394,12 +440,13 @@ reg [ 7:0] d_b;
 reg [ 7:0] d_c;
 reg signed [9:0] d_qp1;

+wire [7:0] c_w_d = c_fr ? 8'd0 : (c_lc ? c_b : c_d);
+
 always @ (posedge clk) begin
    d_sof <= c_sof & rstn;
    if(~rstn | c_sof) begin
-        {d_e, d_fc, d_lc, d_eof, d_ii, d_x, d_b, d_c, d_qp1} <= '0;
+        {d_e, d_fc, d_lc, d_eof, d_ii, d_x, d_b, d_c, d_qp1} <= 0;
    end else begin
-        logic [7:0] d;
        d_e <= c_e;
        d_fc <= c_fc;
        d_lc <= c_lc;
@@ -408,8 +455,7 @@ always @ (posedge clk) begin
        d_x <= c_x;
        d_b <= c_b;
        d_c <= c_c;
-        d = c_fr ? '0 : (c_lc ? c_b : c_d);
-        d_qp1 <= func_get_qp1(c_c, c_b, d);
+        d_qp1 <= func_get_qp1(c_c, c_b, c_w_d);
    end
 end

@@ -436,48 +482,49 @@ reg        [5:0] e_Nn;
 reg signed [7:0] e_Cn;
 reg signed [6:0] e_Bn;

+wire       [7:0] d_w_a = d_fc ? d_b : e_x;
+
+reg              s;           // not a real register
+reg        [8:0] q;           // not a real register
+reg              rt;          // not a real register
+reg              runi;        // not a real register
+reg              rune;        // not a real register
+reg signed [7:0] Co;          // not a real register
+reg        [6:0] No, Nn;      // not a real register
+reg signed [6:0] Bo;          // not a real register
+reg signed [9:0] px;          // not a real register
+reg signed [9:0] err;         // not a real register
+
 always @ (posedge clk) begin
    e_sof <= d_sof & rstn;
    e_2BleN <= 1'b0;
-    {e_write_C, e_Cn, e_write_en, e_Bn, e_Nn} <= '0;
+    {e_write_C, e_Cn, e_write_en, e_Bn, e_Nn} <= 0;
    if(~rstn | d_sof) begin
-        {e_e, e_fc, e_lc, e_eof, e_ii, e_runi, e_rune, e_x, e_q, e_rt, e_err, e_No} <= '0;
+        {e_e, e_fc, e_lc, e_eof, e_ii, e_runi, e_rune, e_x, e_q, e_rt, e_err, e_No} <= 0;
    end else begin
-        logic        [7:0] a;
-        logic              s;
-        logic        [8:0] q;
-        logic              rt;
-        logic              runi;
-        logic              rune;
-        logic signed [7:0] Co;
-        logic        [6:0] No, Nn;
-        logic signed [6:0] Bo;
-        logic signed [9:0] px;
-        logic signed [9:0] err;
-        a = d_fc ? d_b : e_x;
        rt = 1'b0;
        rune = 1'b0;
-        No = '0;
-        err = '0;
-        {s, q} = func_get_q(d_qp1, d_c, a);
+        No = 0;
+        err = 0;
+        {s, q} = func_get_q(d_qp1, d_c, d_w_a);
        Co = (e_write_C & e_q==q) ? e_Cn : Cram[q];
        runi = ~d_fc & e_runi | (q == 9'd0);
        if(runi) begin
-            runi = func_is_near(d_x, a);
+            runi = func_is_near(d_x, d_w_a);
            rune = ~runi;
        end
        if(d_e) begin
            if(runi) begin
-                e_x <= P_LOSSY ? a : d_x;
+                e_x <= P_LOSSY ? d_w_a : d_x;
            end else begin
                if(rune) begin
-                    rt = func_is_near(d_b, a);
-                    s = {1'b0,a} > ({1'b0,d_b} + {6'd0,NEAR}) ? 1'b1 : 1'b0;
+                    rt = func_is_near(d_b, d_w_a);
+                    s = {1'b0,d_w_a} > ({1'b0,d_b} + {6'd0,NEAR}) ? 1'b1 : 1'b0;
                    q = rt ? 9'd365 : 9'd0;
-                    px = rt ? a : d_b;
+                    px = rt ? d_w_a : d_b;
                end else begin
                    px[9:8] = 2'b00;
-                    px[7:0] = func_clip( $signed({2'h0, func_predictor(a,d_b,d_c)}) + ( s ? -$signed({Co[7],Co[7],Co}) : $signed({Co[7],Co[7],Co}) ) );
+                    px[7:0] = func_clip( $signed({2'h0, func_predictor(d_w_a, d_b, d_c)}) + ( s ? -$signed({Co[7],Co[7],Co}) : $signed({Co[7],Co[7],Co}) ) );
                end
                err = s ? px - $signed({2'd0, d_x}) : $signed({2'd0, d_x}) - px;
                err = func_errval_quantize(err);
@@ -485,7 +532,7 @@ always @ (posedge clk) begin
                err = func_modrange(err);
                No = ((e_write_en & e_q==q) ? e_Nn : Nram[q]) + 7'd1;
                Nn = No;
-                if(No[6]) Nn >>>= 1;
+                if(No[6]) Nn = Nn >>> 1;
                e_Nn <= Nn[5:0];
                Bo = (e_write_en & e_q==q) ? e_Bn : Bram[q];
                e_write_en <= 1'b1;
@@ -518,6 +565,7 @@ end
 // pipeline stage f: write Cram, Bram, Nram
 //-------------------------------------------------------------------------------------------------------------------
 reg [8:0] NBC_init_addr;
+
 always @ (posedge clk)
    NBC_init_addr <= e_sof ? NBC_init_addr + (NBC_init_addr < 9'd366 ? 9'd1 : 9'd0) : 9'd0;

@@ -556,7 +604,7 @@ reg              ef_write_en;
 always @ (posedge clk) begin
    ef_sof <= e_sof & rstn;
    if(~rstn | e_sof) begin
-        {ef_e, ef_fc, ef_lc, ef_eof, ef_runi, ef_rune, ef_2BleN, ef_q, ef_rt, ef_err, ef_No, ef_write_en} <= '0;
+        {ef_e, ef_fc, ef_lc, ef_eof, ef_runi, ef_rune, ef_2BleN, ef_q, ef_rt, ef_err, ef_No, ef_write_en} <= 0;
    end else begin
        ef_e <= e_e;
        ef_fc <= e_fc;
@@ -603,50 +651,50 @@ reg        g_write_en;
 reg [12:0] g_An;
 always @ (posedge clk)
    if(~rstn | f_sof)
-        {g_q, g_write_en, g_An} <= '0;
+        {g_q, g_write_en, g_An} <= 0;
    else
        {g_q, g_write_en, g_An} <= {f_q, f_write_en, f_An};

+reg [ 1:0] on;         // not a real register
+reg [15:0] rc;         // not a real register
+reg [ 4:0] ri;         // not a real register
+reg [12:0] Ao;         // not a real register
+reg [ 3:0] k;          // not a real register
+reg [ 9:0] abserr;     // not a real register
+reg [ 9:0] merr, Ainc; // not a real register
+reg        map;        // not a real register

 always @ (posedge clk) begin
    f_sof <= ef_sof & rstn;
    f_limit <= P_LIMIT;
    f_An <= P_AINIT;
    if(~rstn | ef_sof) begin
-        {f_e, f_eof, f_runi, f_rune, f_merr, f_k, f_rc, f_ri, f_on, f_cb, f_cn, f_q, f_write_en} <= '0;
+        {f_e, f_eof, f_runi, f_rune, f_merr, f_k, f_rc, f_ri, f_on, f_cb, f_cn, f_q, f_write_en} <= 0;
    end else begin
-        logic [ 1:0] on;
-        logic [15:0] rc;
-        logic [ 4:0] ri;
-        logic [12:0] Ao;
-        logic [ 3:0] k;
-        logic [ 9:0] abserr;
-        logic [ 9:0] merr, Ainc;
-        logic        map;
-        on = '0;
-        rc = (ef_fc|~ef_runi) ? '0 : f_rc;
+        on = 2'd0;
+        rc = (ef_fc|~ef_runi) ? 16'd0 : f_rc;
        ri = f_ri;
        Ao = (f_write_en & f_q==ef_q) ? f_An : (g_write_en & g_q==ef_q) ? g_An : ef_Ao;
        abserr = ef_err<$signed(10'd0) ? $unsigned(-ef_err) : $unsigned(ef_err);
-        merr='0;
-        Ainc='0;
+        merr = 10'd0;
+        Ainc = 10'd0;
        f_write_en <= ef_write_en;
        k = func_get_k(Ao, ef_No, ef_rt);
-        f_cb <= ef_fc ? '0 : f_rc;
+        f_cb <= ef_fc ? 16'd0 : f_rc;
        f_cn <= {1'b0,J[ri]} + 5'd1;
        if(ef_runi) begin
-            rc ++;
+            rc = rc + 16'd1;
            if(rc >= (16'd1<<J[ri])) begin
-                on++;
-                rc -= (16'd1<<J[ri]);
-                if(ri < 5'd31) ri ++;
+                on = on + 2'd1;
+                rc = rc - (16'd1<<J[ri]);
+                if(ri < 5'd31) ri = ri + 5'd1;
            end
            if(ef_lc & (rc > 16'd0))
-                on++;
+                on = on + 2'd1;
        end else if(ef_rune) begin
            f_limit <= P_LIMIT - 5'd1 - {1'b0,J[ri]};
-            if(ri > '0) ri --;
-            map = ~( (ef_err=='0) | ( (ef_err>$signed(10'd0)) ^ (k==4'd0 & ef_2BleN) ) );
+            if (ri > 5'd0) ri = ri - 5'd1;
+            map = ~( (ef_err==10'd0) | ( (ef_err>$signed(10'd0)) ^ (k==4'd0 & ef_2BleN) ) );
            merr = (abserr<<1) - {9'd0,ef_rt} - {9'd0,map};
            Ainc = ((merr + {9'd0,~ef_rt}) >> 1);
        end else begin
@@ -700,21 +748,21 @@ reg  [ 4:0] g_zn;   // in range of 0~27
 reg  [ 9:0] g_db;
 reg  [ 3:0] g_dn;   // in range of 0~13

+wire [ 9:0] f_w_merr_sk = (f_merr >> f_k);
+
 always @ (posedge clk) begin
    g_sof <= f_sof & rstn;
    if(~rstn | f_sof) begin
-        {g_e, g_eof, g_runi, g_on, g_cb, g_cn, g_zn, g_db, g_dn} <= '0;
+        {g_e, g_eof, g_runi, g_on, g_cb, g_cn, g_zn, g_db, g_dn} <= 0;
    end else begin
-        logic [9:0] merr_sk;
-        merr_sk = f_merr >> f_k;
        g_e <= f_e;
        g_eof <= f_eof;
        g_runi <= f_runi;
        g_on <= f_on;
-        g_cb <= f_rune ? f_cb : '0;
-        g_cn <= f_rune ? f_cn : '0;
-        if(merr_sk < f_limit) begin 
-            g_zn <= merr_sk[4:0];
+        g_cb <= f_rune ? f_cb : 16'd0;
+        g_cn <= f_rune ? f_cn : 5'd0;
+        if(f_w_merr_sk < f_limit) begin 
+            g_zn <= f_w_merr_sk[4:0];
            g_db <= f_merr & ~(10'h3ff<<f_k);
            g_dn <= f_k;
        end else begin
@@ -736,7 +784,8 @@ reg [ 5:0] h_bn;  // in range of 0~57

 always @ (posedge clk) begin
    h_sof <= g_sof & rstn;
-    {h_bb, h_bn} <= '0;
+    h_bb <= 57'd0;
+    h_bn <= 6'd0;
    if(~rstn | g_sof) begin
        h_eof <= 1'b0;
    end else begin
@@ -767,42 +816,46 @@ reg [15:0] j_data;
 reg[247:0] j_bbuf;
 reg [ 7:0] j_bcnt;

+reg [247:0] bbuf;         // not a real register
+reg [  7:0] bcnt;         // not a real register
+
 always @ (posedge clk) begin
    j_sof <= h_sof & rstn;
-    {j_e, j_data} <= '0;
+    j_e    <= 1'b0;
+    j_data <= 16'h0;
    if(~rstn | h_sof) begin
-        {j_eof, j_bbuf, j_bcnt} <= '0;
+        j_eof  <= 1'b0;
+        j_bbuf <= 248'd0;
+        j_bcnt <= 8'h0;
    end else begin
-        logic [247:0] bbuf;
-        logic [  7:0] bcnt;
        bbuf = j_bbuf | ({h_bb,191'h0} >> j_bcnt);
        bcnt = j_bcnt + {2'd0,h_bn};
        if(bcnt >= 8'd16) begin
            j_e <= 1'b1;
            j_data[15:8] <= bbuf[247:240];
-            if(bbuf[247:240] == '1) begin
+            if(bbuf[247:240] == 8'hFF) begin
                bbuf = {1'h0, bbuf[239:0], 7'h0};
-                bcnt -= 8'd7;
+                bcnt = bcnt - 8'd7;
            end else begin
                bbuf = {      bbuf[239:0], 8'h0};
-                bcnt -= 8'd8;
+                bcnt = bcnt - 8'd8;
            end
            j_data[ 7:0] <= bbuf[247:240];
-            if(bbuf[247:240] == '1) begin
+            if(bbuf[247:240] == 8'hFF) begin
                bbuf = {1'h0, bbuf[239:0], 7'h0};
-                bcnt -= 8'd7;
+                bcnt = bcnt - 8'd7;
            end else begin
                bbuf = {      bbuf[239:0], 8'h0};
-                bcnt -= 8'd8;
+                bcnt = bcnt - 8'd8;
            end
        end else if(h_eof && bcnt > 8'd0) begin
            j_e <= 1'b1;
            j_data[15:8] <= bbuf[247:240];
-            if(bbuf[247:240] == '1)
+            if(bbuf[247:240] == 8'hFF)
                j_data[ 7:0] <= {1'b0,bbuf[239:233]};
            else
                j_data[ 7:0] <= bbuf[239:232];
-            bbuf = '0;
+            bbuf = 248'd0;
            bcnt = 8'd0;
        end
        j_bbuf <= bbuf;
@@ -816,7 +869,7 @@ end
 // make .jls file header and footer
 //-------------------------------------------------------------------------------------------------------------------
 reg [15:0] jls_wl, jls_hl;
-wire[15:0] jls_header [13];
+wire[15:0] jls_header [0:12];
 assign jls_header[0] = 16'hFFD8;
 assign jls_header[1] = 16'h00FF;
 assign jls_header[2] = 16'hF700;
@@ -831,10 +884,11 @@ assign jls_header[10]= 16'h0101;
 assign jls_header[11]= {13'b0,NEAR};
 assign jls_header[12]= 16'h0000;
 wire[15:0] jls_footer = 16'hFFD9;
+
 always @ (posedge clk)
    if(~rstn) begin
-        jls_wl <= '0;
-        jls_hl <= '0;
+        jls_wl <= 16'd0;
+        jls_hl <= 16'd0;
    end else begin
        jls_wl <= {2'd0,a_wl} + 16'd1;
        jls_hl <= {2'd0,a_hl} + 16'd1;
@@ -853,21 +907,21 @@ reg [15:0] k_data;
 always @ (posedge clk) begin
    k_last <= 1'b0;
    k_e <= 1'b0;
-    k_data <= '0;
+    k_data <= 16'd0;
    if(j_sof) begin
-        k_footer_i <= '0;
+        k_footer_i <= 1'b0;
        if(k_header_i < 4'd13) begin
            k_e <= 1'b1;
            k_data <= jls_header[k_header_i];
            k_header_i <= k_header_i + 4'd1;
        end
    end else if(j_e) begin
-        k_header_i <= '0;
-        k_footer_i <= '0;
+        k_header_i <= 4'd0;
+        k_footer_i <= 1'b0;
        k_e <= 1'b1;
        k_data <= j_data;
    end else if(j_eof) begin
-        k_header_i <= '0;
+        k_header_i <= 4'd0;
        k_footer_i <= 1'b1;
        if(~k_footer_i) begin
            k_last <= 1'b1;
@@ -875,8 +929,8 @@ always @ (posedge clk) begin
            k_data <= jls_footer;
        end
    end else begin
-        k_header_i <= '0;
-        k_footer_i <= '0;
+        k_header_i <= 4'd0;
+        k_footer_i <= 1'b0;
    end
 end

@@ -884,7 +938,7 @@ end
 //-------------------------------------------------------------------------------------------------------------------
 // linebuffer for context pixels
 //-------------------------------------------------------------------------------------------------------------------
-reg [7:0] linebuffer [1<<14];
+reg [7:0] linebuffer [0:(1<<14)-1];
 always @ (posedge clk)  // line buffer read
    c_d <= linebuffer[a_ii];
 always @ (posedge clk)  // line buffer write
--- a/SIM/tb_jls_encoder.sv
+++ b/SIM/tb_jls_encoder.sv
@@ -10,10 +10,10 @@

 `timescale 1ps/1ps

-`define NEAR 1     // NEAR can be 0~7
+`define NEAR          1    // NEAR can be 0~7

-`define FILE_NO_FIRST 1   // first input file name is test000.pgm
-`define FILE_NO_FINAL 8   // final input file name is test000.pgm
+`define FILE_NO_FIRST 1    // first input file name is test001.pgm
+`define FILE_NO_FINAL 8    // final input file name is test008.pgm


 // bubble numbers that insert between pixels
@@ -51,30 +51,33 @@ initial begin repeat(4) @(posedge clk); rstn<=1'b1; end
 // -------------------------------------------------------------------------------------------------------------------
 //   signals for jls_encoder_i module
 // -------------------------------------------------------------------------------------------------------------------
-reg        i_sof = '0;
-reg [13:0] i_w = '0;
-reg [13:0] i_h = '0;
-reg        i_e = '0;
-reg [ 7:0] i_x = '0;
+reg        i_sof = 0;
+reg [13:0] i_w = 0;
+reg [13:0] i_h = 0;
+reg        i_e = 0;
+reg [ 7:0] i_x = 0;
 wire       o_e;
 wire[15:0] o_data;
 wire       o_last;



-logic [7:0] img [4096*4096];
-int w = 0, h = 0;
+reg [7:0] img [4096*4096-1:0];
+integer w = 0, h = 0;

-task automatic load_img(input logic [256*8:1] fname);
-    int linelen, depth=0, scanf_num;
-    logic [256*8-1:0] line;
-    int fp = $fopen(fname, "rb");
-    if(fp==0) begin
+task load_img;
+    input [256*8:1] fname;
+    reg [256*8-1:0] line;
+    integer linelen, depth, scanf_num, fp, i;
+begin
+    depth = 0;
+    fp = $fopen(fname, "rb");
+    if (fp==0) begin
        $display("*** error: could not open file %s", fname);
        $finish;
    end
    linelen = $fgets(line, fp);
-    if(line[8*(linelen-2)+:16] != 16'h5035) begin
+    if (line[8*(linelen-2)+:16] != 16'h5035) begin
        $display("*** error: the first line must be P5");
        $fclose(fp);
        $finish;
@@ -87,17 +90,19 @@ task automatic load_img(input logic [256*8:1] fname);
    end
    scanf_num = $fgets(line, fp);
    scanf_num = $sscanf(line, "%d", depth);
-    if(depth!=255) begin
+    if (depth!=255) begin
        $display("*** error: images depth must be 255");
        $fclose(fp);
        $finish;
    end
-    for(int i=0; i<h*w; i++)
+    for (i=0; i<h*w; i=i+1)
        img[i] = $fgetc(fp);
    $fclose(fp);
+end
 endtask


+
 // -------------------------------------------------------------------------------------------------------------------
 //   task: feed image pixels to jls_encoder_i module
 //   arguments:
@@ -108,20 +113,21 @@ endtask
 //              when > 0, insert bubble_control bubbles
 //              when < 0, insert random 0~bubble_control bubbles
 // -------------------------------------------------------------------------------------------------------------------
-task automatic feed_img(input int bubble_control);
-    int num_bubble;
-    
+task feed_img;
+    input integer bubble_control;
+    integer       num_bubble, i;
+begin
    // start feeding a image by assert i_sof for 368 cycles
    repeat(368) begin
        @(posedge clk)
        i_sof <= 1'b1;
        i_w <= w - 1;
        i_h <= h - 1;
-        {i_e, i_x} <= '0;
+        {i_e, i_x} <= 0;
    end
    
    // for all pixels of the image
-    for(int i=0; i<h*w; i++) begin
+    for(i=0; i<h*w; i=i+1) begin
        
        // calculate how many bubbles to insert
        if(bubble_control<0) begin
@@ -133,17 +139,18 @@ task automatic feed_img(input int bubble_control);
        end
        
        // insert bubbles
-        repeat(num_bubble) @(posedge clk) {i_sof, i_w, i_h, i_e, i_x} <= '0;
+        repeat(num_bubble) @(posedge clk) {i_sof, i_w, i_h, i_e, i_x} <= 0;
        
        // assert i_e to input a pixel
        @(posedge clk)
-        {i_sof, i_w, i_h} <= '0;
+        {i_sof, i_w, i_h} <= 0;
        i_e <= 1'b1;
        i_x <= img[i];
    end
    
    // 16 cycles idle between images
-    repeat(16) @(posedge clk) {i_sof, i_w, i_h, i_e, i_x} <= '0;
+    repeat(16) @(posedge clk) {i_sof, i_w, i_h, i_e, i_x} <= 0;
+end
 endtask


@@ -170,10 +177,11 @@ jls_encoder #(
 // -------------------------------------------------------------------------------------------------------------------
 //  read images, feed them to jls_encoder_i module 
 // -------------------------------------------------------------------------------------------------------------------
-int file_no;    // file number
+integer file_no;    // file number
+reg [256*8:1] input_file_name;
+reg [256*8:1] input_file_format;
+
 initial begin
-    logic [256*8:1] input_file_name;
-    logic [256*8:1] input_file_format;
    $sformat(input_file_format , "%s\\%s.pgm",  `INPUT_PGM_DIR, `FILE_NAME_FORMAT);
    
    while(~rstn) @ (posedge clk);
@@ -182,7 +190,7 @@ initial begin
        $sformat(input_file_name, input_file_format , file_no);

        load_img(input_file_name);
-        $display("%s (%5dx%5d)", input_file_name, w, h);
+        $display("%100s (%5dx%5d)", input_file_name, w, h);

        if( w < 5 || w > 16384 || h < 1 || h > 16383 )         // image size not supported
            $display("  *** image size not supported ***");
@@ -202,8 +210,8 @@ end
-logic [256*8:1] output_file_format;
+reg [256*8:1] output_file_format;
 initial $sformat(output_file_format, "%s\\%s.jls", `OUTPUT_JLS_DIR, `FILE_NAME_FORMAT);
-logic [256*8:1] output_file_name;
+reg [256*8:1] output_file_name;
-int opened = 0;
-int jls_file = 0;
+integer opened = 0;
+integer jls_file = 0;

 always @ (posedge clk)
    if(o_e) begin
--- a/SIM/tb_jls_encoder_run_iverilog.bat
+++ b/SIM/tb_jls_encoder_run_iverilog.bat
@@ -1,5 +1,5 @@
 del sim.out dump.vcd
-iverilog  -g2005-sv  -o sim.out  tb_jls_encoder.sv  ../RTL/jls_encoder.sv
+iverilog  -g2001  -o sim.out  tb_jls_encoder.v  ../RTL/jls_encoder.v
 vvp -n sim.out
 del sim.out
 pause