change to Verilog2001

2025-01-13 20:22:52 +08:00 · 2023-06-07 20:54:14 +08:00 · 2023-06-07 20:54:14 +08:00 · c2a5cc679b
commit c2a5cc679b
parent ae9531b1cb
10 changed files with 580 additions and 495 deletions
--- a/README.md
+++ b/README.md
@@ -1,8 +1,168 @@
-![语言](https://img.shields.io/badge/语言-systemverilog_(IEEE1800_2005)-CAD09D.svg) ![仿真](https://img.shields.io/badge/仿真-iverilog-green.svg) ![部署](https://img.shields.io/badge/部署-quartus-blue.svg) ![部署](https://img.shields.io/badge/部署-vivado-FF1010.svg)
+![语言](https://img.shields.io/badge/语言-verilog_(IEEE1364_2001)-9A90FD.svg) ![仿真](https://img.shields.io/badge/仿真-iverilog-green.svg) ![部署](https://img.shields.io/badge/部署-quartus-blue.svg) ![部署](https://img.shields.io/badge/部署-vivado-FF1010.svg)

-中文 | [English](#en)
+[English](#en) | [中文](#cn)

-Hard-PNG
+　
+
+<span id="en">Hard-PNG</span>
+===========================
+
+An FPGA-based streaming **png** image decoder: it takes a png stream as input and outputs the original pixels.
+
+* Supports image widths below 4000 pixels; the height is unlimited.
+* **Supports all color types**: Grayscale, Grayscale+A, RGB, Indexed RGB, and RGB+A.
+* Only 8-bit depth is supported (in practice, most png images are 8-bit).
+
+| ![diagram](./figures/diagram.png)  |
+| :--------------------------------: |
+| **Figure1** : diagram of Hard-PNG. |
+
+　
+
+# Background
+
+**png** is the second most common image compression format after **jpg**.
+
+png image files use the **.png** file extension.
+
+Take [SIM/test_image/img01.png](./SIM/test_image) in this repository as an example: it contains 98 bytes, which together are called the png stream. These bytes can be viewed with [WinHex](http://www.x-ways.net/winhex/) (on Windows) or the `hexdump` command (on Linux):
+
+```
+0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, ...... , 0xAE, 0x42, 0x60, 0x82
+```
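The bytes above follow the standard png layout: an 8-byte signature (`0x89 0x50 0x4E 0x47 0x0D 0x0A 0x1A 0x0A`) followed by chunks, each consisting of a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 computed over the type and data. The trailing `0xAE, 0x42, 0x60, 0x82` is the CRC of the empty `IEND` chunk that ends every png stream. A minimal Python sketch that walks these chunks is shown below; `list_chunks` is a hypothetical helper for inspecting streams in software, not part of this repository.

```python
import struct
import zlib

# The 8-byte signature that every png stream starts with (PNG specification).
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def list_chunks(stream: bytes):
    """Return (type, length, crc_ok) for every chunk of a png byte stream."""
    if stream[:8] != PNG_SIGNATURE:
        raise ValueError("not a png stream: bad signature")
    chunks = []
    pos = 8
    while pos + 12 <= len(stream):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", stream[pos:pos + 4])
        if pos + 12 + length > len(stream):
            raise ValueError("truncated chunk")
        ctype = stream[pos + 4:pos + 8]
        data = stream[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack(">I", stream[pos + 8 + length:pos + 12 + length])
        # The CRC covers the chunk type and data, not the length field.
        chunks.append((ctype.decode("ascii"), length, zlib.crc32(ctype + data) == crc))
        pos += 12 + length
        if ctype == b"IEND":
            break
    return chunks
```

For example, `with open("SIM/test_image/img01.png", "rb") as f: print(list_chunks(f.read()))` lists the chunks of the test image, ending with `IEND`.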
+
+Decoding the png stream produces the original pixels. This small image has only 4 columns and 2 rows, 8 pixels in total. Their hexadecimal values are listed below, where R, G, B, and A denote the red, green, blue, and transparency (alpha) channels of each pixel.
+
+|       |      Column 1       |      Column 2       |      Column 3       |      Column 4       |
+| :---: | :-----------------: | :-----------------: | :-----------------: | :-----------------: |
+| Row 1 | R:FF G:F2 B:00 A:FF | R:ED G:1C B:24 A:FF | R:00 G:00 B:00 A:FF | R:3F G:48 B:CC A:FF |
+| Row 2 | R:7F G:7F B:7F A:FF | R:ED G:1C B:24 A:FF | R:FF G:FF B:FF A:FF | R:FF G:AE B:C9 A:FF |
+
+　
+
+# Hard-PNG Usage
+
+[hard_png.v](./RTL) in the [RTL](./RTL) directory is a module that accepts a png stream and outputs the decompressed original pixels. Its ports are shown in **Figure2**.
+
+| ![interface](./figures/interface.png) |
+| :--------------------------------: |
+|  **Figure2** : ports of hard_png.  |
+
+## Input png stream
+
+Using the hard_png module is straightforward. Taking [SIM/test_image/img01.png](./SIM/test_image) as an example again (see **Figure3**): before inputting a png stream, a high-level pulse at least one clock cycle wide must be applied to `istart`. The png stream is then input byte by byte through the `ivalid` and `ibyte` signals; for this image, all 98 bytes of the png stream must be fed to hard_png one at a time.
+
+`ivalid` and `iready` form a handshake pair: `ivalid=1` indicates that the user wants to send a byte to hard_png, and `iready=1` indicates that hard_png is ready to accept a byte. A byte on `ibyte` is accepted only in a cycle where `ivalid` and `iready` are both 1.
+
+|    ![input waveform](./figures/wave1.png)     |
+| :---------------------------------------: |
+| **Figure3** : input waveform of hard_png. |
+
+After one png image has been fully input, the next png image can be input immediately or at any later time: pulse `istart` again, then input the next png stream.
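The handshake rule can be modeled cycle by cycle: a byte moves only in a cycle where both `ivalid` and `iready` are 1, so a byte that is not accepted has to stay on `ibyte` and be offered again later. The Python sketch below is an illustrative software model, not the RTL; the per-cycle signal lists are hypothetical inputs, and the `istart` pulse that precedes the stream is not modeled.

```python
from typing import List, Sequence, Tuple


def transfer(stream: bytes,
             ivalid: Sequence[int],
             iready: Sequence[int]) -> List[Tuple[int, int]]:
    """Return (cycle, byte) for every byte that crosses the ivalid/iready handshake.

    ivalid[c] and iready[c] are the signal levels in clock cycle c. The byte
    currently driven on ibyte is consumed only when both are 1; otherwise it
    stays on ibyte and is offered again in a later cycle.
    """
    accepted = []
    idx = 0  # index of the stream byte currently driven on ibyte
    for cycle, (valid, ready) in enumerate(zip(ivalid, iready)):
        if idx < len(stream) and valid and ready:
            accepted.append((cycle, stream[idx]))
            idx += 1
    return accepted


if __name__ == "__main__":
    # First three bytes of img01.png; hard_png is not ready in cycle 0,
    # and the sender has nothing to send in cycle 2.
    print(transfer(bytes([0x89, 0x50, 0x4E]), [1, 1, 0, 1, 1], [0, 1, 1, 1, 1]))
```

In this run, `0x89` waits one cycle for `iready`, and cycle 2 transfers nothing because `ivalid` is 0.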
+
+## Output image information and pixels
+
+While the png stream is being input, the decompression result of the image (its basic information and original pixels) is output from the module, as shown in **Figure4**. First, `ostart` is pulsed high for one cycle, and at the same time `colortype`, `width`, and `height` become valid, where:
+
+- `width`, `height` are the width and height of the image.
+- `colortype` is the color type of the png image, with the meaning in the table below.
+
+| colortype |     3'd0      |    3'd1     |     3'd2      |  3'd3   |     3'd4      |
+| :-------: | :-----------: | :---------: | :-----------: | :-----: | :-----------: |
+|  meaning  |   grayscale   | grayscale+A |      RGB      |  RGB+A  |  indexed RGB  |
+|  remark   | R=G=B, A=0xFF |   R=G=B≠A   | R≠G≠B, A=0xFF | R≠G≠B≠A | R≠G≠B, A=0xFF |
+
+After that, each cycle with `ovalid=1` outputs one pixel, whose R, G, B, and A channels appear on `opixelr`, `opixelg`, `opixelb`, and `opixela`, respectively.
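These colortype codes are hard_png's own numbering and differ from the colour-type values stored in a png file's IHDR chunk (0 = grayscale, 2 = truecolour, 3 = indexed, 4 = grayscale+alpha, 6 = truecolour+alpha, per the PNG specification). The Python sketch below maps between the two and expands one pixel's 8-bit samples into the (R, G, B, A) values described by the remark row. It is a reference model written from the table, not code from this repository, and all names in it are hypothetical.

```python
# hard_png colortype codes, as listed in the table above.
HARD_PNG_COLORTYPE = {0: "grayscale", 1: "grayscale+A", 2: "RGB", 3: "RGB+A", 4: "indexed RGB"}

# PNG IHDR colour-type values mapped to the hard_png codes above
# (derived from the table: e.g. PNG type 6, truecolour+alpha, is hard_png 3).
IHDR_TO_HARD_PNG = {0: 0, 4: 1, 2: 2, 6: 3, 3: 4}


def to_rgba(colortype, samples, palette=None):
    """Expand one pixel's 8-bit samples to (R, G, B, A) following the remark row."""
    if colortype == 0:                     # grayscale: R=G=B, A=0xFF
        (y,) = samples
        return (y, y, y, 0xFF)
    if colortype == 1:                     # grayscale+A: R=G=B, A from the stream
        y, a = samples
        return (y, y, y, a)
    if colortype == 2:                     # RGB: A=0xFF
        r, g, b = samples
        return (r, g, b, 0xFF)
    if colortype == 3:                     # RGB+A: all four channels from the stream
        r, g, b, a = samples
        return (r, g, b, a)
    if colortype == 4:                     # indexed RGB: palette lookup, A=0xFF
        if palette is None:
            raise ValueError("indexed RGB needs a palette")
        (index,) = samples
        r, g, b = palette[index]
        return (r, g, b, 0xFF)
    raise ValueError(f"unknown colortype {colortype}")
```

Following the A=0xFF remark in the table, the indexed-RGB branch of this sketch ignores any per-palette transparency.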
+
+|     ![output waveform](./figures/wave2.png)     |
+| :----------------------------------------: |
+| **Figure4** : output waveform of hard_png. |
+
+　
+
+# RTL Simulation
+
+Simulation related files are in the [SIM](./SIM) folder, where:
+
+- [test_image](./SIM/test_image) provides 14 png image files of different sizes and color types.
+- tb_hard_png.v is the testbench; it decodes these images in sequence and writes the results (raw pixels) to txt files.
+- tb_hard_png_run_iverilog.bat is the script that runs the iverilog simulation.
+- validation.py (a Python script) compares the simulation output with the result of software png decoding to verify correctness.
+
+Before simulating, you need to install iverilog; see [iverilog_usage](https://github.com/WangXuan95/WangXuan95/blob/main/iverilog_usage/iverilog_usage.md).
+
+Then double-click tb_hard_png_run_iverilog.bat to run the simulation, which takes about 30 minutes. It can be stopped early, but the generated waveform will then be incomplete.
+
+After the simulation finishes, open the generated dump.vcd file to view the waveform.
+
+In addition, each png image produces a corresponding .txt file containing its decoding result. For example, img01.png produces out01.txt, which contains the 8 decoded pixel values:
+
+```
+decode result:  colortype:3  width:4  height:2
+fff200ff ed1c24ff 000000ff 3f48ccff 7f7f7fff ed1c24ff ffffffff ffaec9ff 
+```
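To post-process these files, a small parser is enough. The Python sketch below assumes the format shown above: a header line with `colortype`, `width`, and `height`, followed by whitespace-separated 8-digit hex words, one per pixel and packed as RRGGBBAA (the first word, `fff200ff`, matches R:FF G:F2 B:00 A:FF in the pixel table). Judging by out01.txt, pixels appear in row-major order. `parse_decode_result` is a hypothetical helper, not part of the repository.

```python
import re


def parse_decode_result(text):
    """Parse a result file like out01.txt into (colortype, width, height, pixels).

    pixels is a list of (R, G, B, A) tuples in the order they appear in the file.
    """
    m = re.search(r"colortype:(\d+)\s+width:(\d+)\s+height:(\d+)", text)
    if m is None:
        raise ValueError("missing 'decode result' header")
    colortype, width, height = (int(g) for g in m.groups())
    pixels = []
    for word in text[m.end():].split():
        if len(word) != 8:
            raise ValueError(f"unexpected pixel word {word!r}")
        # Split RRGGBBAA into four bytes.
        pixels.append(tuple(int(word[i:i + 2], 16) for i in range(0, 8, 2)))
    if len(pixels) != width * height:
        raise ValueError(f"expected {width * height} pixels, got {len(pixels)}")
    return colortype, width, height, pixels
```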
+
+## Correctness verification
+
+To verify that the decompression results are correct, I provide a Python script, [validation.py](./SIM), which decodes the .png file in software and compares every pixel with the .txt file generated by the simulation. If all pixels match, the validation passes.
+
+To run validation.py, you need Python 3 with the [numpy](https://pypi.org/project/numpy/) and [PIL](https://pypi.org/project/Pillow/) (Pillow) libraries.
+
+Then run validation.py with the following command:
+
+```
+python validation.py test_image/img03.png out03.txt
+```
+
+This command checks whether every pixel in `out03.txt` matches `test_image/img03.png`. The following output indicates that the validation passed:
+
+```
+size1= (400, 4)
+size2= (400, 4)
+total 400 pixels validation successful!!
+```
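The comparison itself is a per-pixel equality check. The Python sketch below illustrates the idea; it is not the repository's validation.py, and both function names are hypothetical. The comparison needs only the standard library; Pillow is needed only to decode the reference png, via `Image.open(...).convert("RGBA")`.

```python
def load_expected_rgba(png_path):
    """Decode a png in software with Pillow and return its pixels as RGBA tuples."""
    from PIL import Image  # requires Pillow (pip install Pillow)
    with Image.open(png_path) as img:
        return list(img.convert("RGBA").getdata())


def compare_pixels(expected, actual):
    """Return the indices of pixels that differ between two RGBA pixel lists."""
    if len(expected) != len(actual):
        raise ValueError(f"pixel count mismatch: {len(expected)} vs {len(actual)}")
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if tuple(e) != tuple(a)]
```

An empty result from `compare_pixels(load_expected_rgba("test_image/img03.png"), simulated_pixels)` means that every pixel matches, where `simulated_pixels` stands for the RGBA tuples read from the corresponding simulation .txt file.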
+
+　
+
+# FPGA Deployment
+
+## FPGA resource usage
+
+|           FPGA chip            |  Logic   | Logic (%) |    BRAM    | BRAM (%) | Max. clock frequency (at timing closure) |
+| :----------------------------: | :------: | :-------: | :--------: | :------: | :--------------------------------------: |
+|     Xilinx Artix-7 XC7A35T     | 2662×LUT |    13%    | 22×BRAM36K |   44%    |                 66.6 MHz                 |
+| Altera Cyclone IV EP4CE40F23C6 | 5277×LE  |    13%    |  427kbit   |   37%    |                  56 MHz                  |
+
+## Performance
+
+At a clock frequency of 50 MHz, the decoding time of each image can be calculated from the number of clock cycles it consumes in simulation (time = cycle count / 50 MHz).
+
+The table below shows the results for some of the provided test files.
+
+| png file  | color type  | image size | pixel count | png stream size (bytes) | cycle count | time  |
+| :-------: | :---------: | :--------: | :---------: | :-------------: | :---------: | :---: |
+| img05.png |     RGB     |  300x256   |    76800    |      96536      |   1105702   | 23ms  |
+| img06.png |  Grayscale  |  300x263   |    78900    |      37283      |   395335    |  8ms  |
+| img09.png |    RGBA     |  300x263   |    78900    |     125218      |   1382303   | 28ms  |
+| img10.png | Indexed RGB |  631x742   |   468202    |     193489      |   2374224   | 48ms  |
+| img14.png | Indexed RGB | 1920x1080  |   2073600   |     818885      |  10177644   | 204ms |
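The time column follows from time = cycle count / clock frequency. The Python sketch below reproduces that arithmetic for the table rows and also derives throughput figures that the table does not list (cycles per input byte, cycles per output pixel, megapixels per second). The 50 MHz clock is the table's assumption, and `decode_stats` is a hypothetical helper.

```python
CLOCK_HZ = 50_000_000  # 50 MHz, the clock frequency assumed for the table


def decode_stats(pixels: int, stream_bytes: int, cycles: int,
                 clock_hz: int = CLOCK_HZ) -> dict:
    """Derive decoding time and throughput from a simulated cycle count."""
    seconds = cycles / clock_hz
    return {
        "time_ms": seconds * 1e3,
        "cycles_per_byte": cycles / stream_bytes,   # cost per input png byte
        "cycles_per_pixel": cycles / pixels,        # cost per output pixel
        "mpixels_per_s": pixels / seconds / 1e6,
    }


# Rows copied from the table above: (file, pixel count, png stream size, cycle count)
ROWS = [
    ("img05.png", 76800, 96536, 1105702),
    ("img06.png", 78900, 37283, 395335),
    ("img09.png", 78900, 125218, 1382303),
    ("img10.png", 468202, 193489, 2374224),
    ("img14.png", 2073600, 818885, 10177644),
]

if __name__ == "__main__":
    for name, px, nbytes, cyc in ROWS:
        s = decode_stats(px, nbytes, cyc)
        print(f"{name}: {s['time_ms']:.1f} ms, {s['cycles_per_byte']:.2f} cycles/byte, "
              f"{s['mpixels_per_s']:.2f} Mpixel/s")
```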
+
+　
+
+
+# Reference
+
+* [upng](https://github.com/elanthis/upng): A lightweight C language png decoding library.
+* [TinyPNG](https://tinypng.com/): A lossy compression tool using png's indexed RGB.
+* [PNG Specification](https://www.w3.org/TR/REC-png.pdf).
+
+　
+
+　
+
+　
+
+　
+
+<span id="cn">Hard-PNG</span>
 ===========================

 An FPGA-based streaming **png** image decoder: it takes a png stream as input and outputs the original pixels.
@@ -15,29 +175,29 @@ Hard-PNG
 | :----: |
 | **Figure1** : block diagram of Hard-PNG |

-
+　

 # Background

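-png is the second most common image compression format after jpg. png supports a transparency channel (A channel), lossless compression, and indexed RGB (palette-based lossy compression). On colorful digital photos, png achieves only a 1x to 4x compression ratio; on synthetic images (such as graphic designs), it can achieve more than 10x.
+png is the second most common image compression format after jpg. png supports a transparency channel (A channel), lossless compression, and indexed RGB (palette-based lossy compression). png image files use the .png extension.

-png image files use the .png extension. Take SIM/test_image/img01.png in this repository as an example: it contains 98 bytes, and these 98 bytes are called the png stream. We can view them with [WinHex](http://www.x-ways.net/winhex/):
+Take SIM/test_image/img01.png in this repository as an example: it contains 98 bytes, and these 98 bytes are called the png stream. We can view them with [WinHex](http://www.x-ways.net/winhex/) (on Windows) or the hexdump command (on Linux):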

 ```
 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, ...... , 0xAE, 0x42, 0x60, 0x82
 ```
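-Decoding this png stream produces the original pixels. This is a small image with only 4 columns and 2 rows, 8 pixels in total; their hexadecimal values are listed in the table below, where R, G, B, and A denote the red, green, blue, and transparency channels of each pixel.
+Decoding this png stream produces the original pixels. The image has only 4 columns and 2 rows, 8 pixels in total; their hexadecimal values are listed in the table below, where R, G, B, and A denote the red, green, blue, and transparency channels of each pixel.

 |          | Column 1 | Column 2 | Column 3 | Column 4 |
 | :---:    | :---: | :---: | :---: | :---: |
 | Row 1 | R:FF G:F2 B:00 A:FF | R:ED G:1C B:24 A:FF | R:00 G:00 B:00 A:FF | R:3F G:48 B:CC A:FF |
 | Row 2 | R:7F G:7F B:7F A:FF | R:ED G:1C B:24 A:FF | R:FF G:FF B:FF A:FF | R:FF G:AE B:C9 A:FF |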

-
+　

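 # Using Hard-PNG

-hard_png.sv in the RTL directory is a module that takes a png stream as input and outputs the decompressed pixels. Its ports are shown in **Figure2**.
+hard_png.v in the RTL directory is a module that takes a png stream as input and outputs the decompressed pixels. Its ports are shown in **Figure2**.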

 | ![接口图](./figures/interface.png) |
 | :----: |
@@ -71,14 +231,14 @@ hard_png is simple to use. Taking the image SIM/test_image/img01.png as
 | :----: |
 | **Figure4** : output waveform of hard_png |

-
+　

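 # Simulation

 Simulation-related files are in the SIM folder, where:

 - test_image provides 14 png image files of different sizes and color types.
- tb_hard_png.sv is the testbench; it decodes these images in sequence and writes the results (raw pixels) to txt files.
+- tb_hard_png.v is the testbench; it decodes these images in sequence and writes the results (raw pixels) to txt files.
 - tb_hard_png_run_iverilog.bat contains the commands that run the iverilog simulation.
 - validation.py (Python script) compares the simulation output with the result of software png decoding to verify correctness.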

@@ -116,30 +276,30 @@ size2= (400, 4)
 total 400 pixels validation successful!!
 ```

-
+　

 # Deployment

 ## FPGA resource usage

-|           FPGA part            | LUT  | LUT(%) |  FF  | FF(%) | Logic | Logic(%) |  BRAM   | BRAM(%) |
-| :----------------------------: | :--: | :----: | :--: | :---: | :---: | :------: | :-----: | :-----: |
-|     Xilinx Artix-7 XC7A35T     | 2581 |  13%   | 2253 |  5%   |   -   |    -     | 792kbit |   44%   |
-| Altera Cyclone IV EP4CE40F23C6 |  -   |   -    |  -   |   -   | 4682  |   11%    | 427kbit |   37%   |
+|           FPGA part            |  Logic   | Logic (%) |    BRAM    | BRAM (%) | Max. frequency (just meeting timing closure) |
+| :----------------------------: | :------: | :-------: | :--------: | :------: | :---------------------: |
+|     Xilinx Artix-7 XC7A35T     | 2662×LUT |    13%    | 22×BRAM36K |   44%    |        66.6 MHz         |
+| Altera Cyclone IV EP4CE40F23C6 | 5277×LE  |    13%    |  427kbit   |   37%    |         56 MHz          |

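 ## Performance

-hard_png was deployed on Altera Cyclone IV EP4CE40F23C6 with a 50 MHz clock (just meeting timing closure). The decoding performance can be calculated from the number of clock cycles each image consumes in simulation; examples are shown in the table below.
+When running at 50 MHz, the decoding performance can be calculated from the number of clock cycles each image consumes in simulation. The table below shows examples for some of the provided test files.

 | File name | Color type | Image size | Pixel count | png stream size (bytes) | Clock cycles | Time |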
-| :-----------: | :----------: | :----------: | :--------------: | :---------------: | :---------------: | ------------- |
+| :-----------: | :----------: | :----------: | :--------------: | :---------------: | :---------------: | :-----------: |
 | img05.png | RGB | 300x256 | 76800 | 96536 | 1105702 | 23ms |
 | img06.png | Grayscale | 300x263 | 78900 | 37283 | 395335 | 8ms |
 | img09.png | RGBA | 300x263 | 78900 | 125218 | 1382303 | 28ms |
 |   img10.png   |   Indexed RGB    |   631x742    | 468202 |      193489      |     2374224 | 48ms |
 | img14.png |     Indexed RGB |  1920x1080  |  2073600  |      818885      |    10177644 | 204ms |

-
+　


 # References
@@ -150,150 +310,3 @@ total 400 pixels validation successful!!



-
-
-<span id="en">Hard-PNG</span>
-===========================
-
-FPGA-based streaming **png** image decoder, input png stream, output original pixels.
-
-* Support image width less than 4000, height unlimited.
-* **Supports all color types** : Grayscale, Grayscale+A, RGB, Indexed RGB, and RGB+A.
-* Only 8bit depth is supported (actually most png images are 8bit depth).
-
-|     ![diagram](./figures/diagram.png)     |
-| :---------------------------------------: |
-| **Figure1** : Hard-PNG schematic diagram. |
-
-
-
-# Background
-
-**png** is the second most common compressed image format after **jpg** . png supports transparency channel (A channel), lossless compression, and indexed RGB (palette-based lossy compression). In colorful digital photos, png can only get 1 to 4 times the lossless compression ratio. In synthetic images (such as graphic design), png can achieve more than 10 times the lossless compression ratio.
-
-png image files have the **.png** suffix name. Take [SIM/test_image/img01.png](./SIM/test_image) in this repository as an example, it contains 98 bytes, which are called png stream. We can use [WinHex software](http://www.x-ways.net/winhex/) to view these bytes:
-
-```
-0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, ...... , 0xAE, 0x42, 0x60, 0x82
-```
-
-After the png stream is decompressed, the original pixels will be generated. This is a small image with only 4 columns and 2 rows, and a total of 8 pixels. The hexadecimal representation of these pixels is as follows. where R, G, B, A represent the red, green, blue and transparent channels of the pixel, respectively.
-
-|       |      Column 1       |      Column 2       |      Column 3       |      Column 4       |
-| :--: | :-----------------: | :-----------------: | :-----------------: | :-----------------: |
-| Row 1 | R:FF G:F2 B:00 A:FF | R:ED G:1C B:24 A:FF | R:00 G:00 B:00 A:FF | R:3F G:48 B:CC A:FF |
-| Row 2 | R:7F G:7F B:7F A:FF | R:ED G:1C B:24 A:FF | R:FF G:FF B:FF A:FF | R:FF G:AE B:CC A:FF |
-
-
-
-# Hard-PNG Usage
-
-[hard_png.sv](./RTL) in [RTL](./RTL) directory is a module that can input png stream and output decompressed original pixels. Its interface is shown in **Figure2**.
-
-|  ![interface](./figures/interface.png)  |
-| :----------------------------------: |
-| **Figure2** : interface of hard_png. |
-
-## Input png stream
-
-It's easy to use hard_png module. Take the image [SIM/test_image/img01.png](./SIM/test_image) as an example again, such as **Figure3**, before inputting the png stream, a high level pulse must be generated on `istart` (with a width of at least one clock cycle), and then input the png stream through `ivalid` and `ibyte` signals (the png stream of this image has 98 bytes, these 98 bytes must be input to hard_png one by one), among which `ivalid` and `iready` constitutes handshake signals: `ivalid=1` indicates that the user wants to send a byte to hard_png. `iready=1` indicates that hard_png is ready to accept a byte. Only when `ivalid` and `iready` both = 1 at the same time, the handshake is successful, and `ibyte` is successfully input into hard_png.
-
-|    ![input waveform](./figures/wave1.png)     |
-| :---------------------------------------: |
-| **Figure3** : input waveform of hard_png. |
-
-When it finish to input one png image, the next png image can be input immediately or later (that is, pulse the `istart` again, and then input the next png stream).
-
-## Output image information and pixels
-
-At the same time of inputting the png stream, the decompression result of this image (including the basic information of this image and the original pixels) will be output from the module, as shown in **Figure4**, first of all, `ostart` signal will appear A high-level pulse for one cycle, and `colortype`, `width`, and `height` will be valid simutinously, where:
-
- `width`, `height` are the width and height of the image.
- `colortype` is the color type of the png image, with the meaning in the table below.
-
-| colortype |     3'd0      |    3'd1     |     3'd2      |  3'd3   |     3‘d4      |
-| :-------: | :-----------: | :---------: | :-----------: | :-----: | :-----------: |
-|  meaning  |   grayscale   | grayscale+A |      RGB      |  RGB+A  |  indexed RGB  |
-|  remark   | R=G=B，A=0xFF |   R=G=B≠A   | R≠G≠B，A=0xFF | R≠G≠B≠A | R≠G≠B，A=0xFF |
-
-Then, `ovalid=1` means that there is a pixel output in this cycle, meanwhile, the R, G, B, A channels of this pixel will appear on `opixelr`, `opixelg`, `opixelb`, and `opixela` signals respectively.
-
-|     ![output waveform](./figures/wave2.png)     |
-| :----------------------------------------: |
-| **Figure4** : output waveform of hard_png. |
-
-
-
-# RTL Simulation
-
-Simulation related files are in the [SIM](./SIM) folder, where:
-
- 14 png image files of different sizes and different color types are provided in [test_image](./SIM/test_image) folder.
- tb_hard_png.sv is the testbench code that compresses these images in sequence and writes the result (raw pixels) to txt files.
- tb_hard_png_run_iverilog.bat is the command script to run iverilog simulation.
- validation.py (Python code) compares the simulation output with the result of the software png decoding to verify the correctness.
-
-Before using iverilog for simulation, you need to install iverilog , see: [iverilog_usage](https://github.com/WangXuan95/WangXuan95/blob/main/iverilog_usage/iverilog_usage.md)
-
-Then double-click tb_hard_png_run_iverilog.bat to run the simulation, which will run for about 30 minutes (it can be forced to close halfway, but the generated simulation waveform is incomplete).
-
-After the simulation runs, you can open the generated dump.vcd file to view the waveform.
-
-In addition, each png image will generate a corresponding .txt file, which contains the decoding result. For example, img01.png generates out01.txt, which contains the decoded 8 pixel values:
-
-```
-decode result:  colortype:3  width:4  height:2
-fff200ff ed1c24ff 000000ff 3f48ccff 7f7f7fff ed1c24ff ffffffff ffaec9ff 
-```
-
-## Correctness verification
-
-In order to verify that the decompression results are correct, I provide a Python program [validation.py](./SIM), which can decompress the .png file and compares it with each pixel in the .txt file generated by the simulation. If the comparison results are the same, the validation passed.
-
-In order to run validation.py , you need to install Python3 and its [numpy](https://pypi.org/project/numpy/) and [PIL](https://pypi.org/project/Pillow/) libraries.
-
-Then, run validation.py by this command:
-
-```
-python validation.py test_image/img03.png out03.txt
-```
-
-The meaning of this command is: Compare each pixel in [out03.txt]() to see if it matches [test_image/img03.png](). The print is as follows (indicating that the verification passed):
-
-```
-size1= (400, 4)
-size2= (400, 4)
-total 400 pixels validation successful!!
-```
-
-
-
-# FPGA Deployment
-
-## FPGA resource usage
-
-|           FPGA part            | LUT  | LUT(%) |  FF  | FF(%) | Logic | Logic(%) |  BRAM   | BRAM(%) |
-| :----------------------------: | :--: | :----: | :--: | :---: | :---: | :------: | :-----: | :-----: |
-|     Xilinx Artix-7 XC7A35T     | 2581 |  13%   | 2253 |  5%   |   -   |    -     | 792kbit |   44%   |
-| Altera Cyclone IV EP4CE40F23C6 |  -   |   -    |  -   |   -   | 4682  |   11%    | 427kbit |   37%   |
-
-## Performance
-
-I deploy hard_png on Altera Cyclone IV EP4CE40F23C6 and get clock frequency = 50MHz (just reach timing closure). According to the number of clock cycles consumed by each image during simulation, we can calculate the performance as shown in the following table.
-
-| png file  | color type  | image size | pixel count | png stream size | cycle count | time  |
-| :-------: | :---------: | :--------: | :---------: | :-------------: | :---------: | ----- |
-| img05.png |     RGB     |  300x256   |    76800    |      96536      |   1105702   | 23ms  |
-| img06.png |  Grayscale  |  300x263   |    78900    |      37283      |   395335    | 8ms   |
-| img09.png |    RGBA     |  300x263   |    78900    |     125218      |   1382303   | 28ms  |
-| img10.png | Indexed RGB |  631x742   |   468202    |     193489      |   2374224   | 48ms  |
-| img14.png | Indexed RGB | 1920x1080  |   2073600   |     818885      |  10177644   | 204ms |
-
-
-
-
-# Reference
-
-* [upng](https://github.com/elanthis/upng): A lightweight C language png decoding library.
-* [TinyPNG](https://tinypng.com/): A lossy compression tool using png's indexed RGB.
-* [PNG Specification](https://www.w3.org/TR/REC-png.pdf).
--- a/RTL/hard_png.sv
+++ b/RTL/hard_png.sv
--- a/RTL/huffman_builder.sv
+++ b/RTL/huffman_builder.sv
@@ -2,7 +2,7 @@
 //--------------------------------------------------------------------------------------------------------
 // Module  : huffman_builder
 // Type    : synthesizable, IP's sub module
-// Standard: SystemVerilog 2005 (IEEE1800-2005)
+// Standard: Verilog 2001 (IEEE1364-2001)
 //--------------------------------------------------------------------------------------------------------

 module huffman_builder #(
@@ -18,10 +18,17 @@ module huffman_builder #(
    rdaddr, rddata
 );

-function automatic integer clogb2(input integer val);
+
+
+function  integer clogb2;
+    input integer val;
+//function automatic integer clogb2(input integer val);
    integer valtmp;
+begin
    valtmp = val;
-    for(clogb2=0; valtmp>0; clogb2=clogb2+1) valtmp = valtmp>>1;
+    for (clogb2=0; valtmp>0; clogb2=clogb2+1)
+        valtmp = valtmp>>1;
+end
 endfunction
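In this Verilog-2001 version, `clogb2` shifts `val` right until it reaches 0 and counts the shifts. Despite its name, it therefore returns the number of bits needed to represent `val` (floor(log2(val)) + 1 for val > 0), not ceil(log2(val)): for example, clogb2(8) = 4 while ceil(log2(8)) = 3. A vector declared `[clogb2(X)-1:0]` can thus hold every value from 0 to X, which is presumably why addresses up to `2*NUMCODES-1` are declared with `clogb2(2*NUMCODES-1)`. The Python model below mirrors the same loop for checking widths offline; it is an illustration, not part of the repository.

```python
def clogb2(val: int) -> int:
    """Python mirror of the Verilog clogb2 loop: count right-shifts until val is 0."""
    n = 0
    while val > 0:
        val >>= 1
        n += 1
    return n


def vector_width(max_value: int) -> int:
    """Width of a Verilog vector declared [clogb2(max_value)-1:0] (hypothetical helper)."""
    return clogb2(max_value)
```

For non-negative integers this equals Python's built-in `int.bit_length()`.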

 input                               rstn;
@@ -46,46 +53,48 @@ wire                                done;
 wire   [clogb2(2*NUMCODES-1)-1:0]   rdaddr;
 reg    [            OUTWIDTH-1:0]   rddata;

-reg    [clogb2(NUMCODES)-1:0] blcount  [BITLENGTH];
-reg    [   (1<<CODEBITS)-1:0] nextcode [BITLENGTH+1];
+reg    [clogb2(NUMCODES)-1:0] blcount  [0 : BITLENGTH-1];
+reg    [   (1<<CODEBITS)-1:0] nextcode [0 : BITLENGTH];

-initial for(int i=0; i< BITLENGTH; i++)  blcount[i] = '0;
-initial for(int i=0; i<=BITLENGTH; i++) nextcode[i] = '0;
+integer i;
+
+initial for(i=0; i< BITLENGTH; i=i+1)  blcount[i] = 0;
+initial for(i=0; i<=BITLENGTH; i=i+1) nextcode[i] = 0;

 reg  clear_tree2d = 1'b0;
 reg  build_tree2d = 1'b0;
-reg  [clogb2(BITLENGTH)-1:0] idx = '0;
-reg  [clogb2(2*NUMCODES-1)-1:0] clearidx = '0;
-reg  [ clogb2(NUMCODES)-1:0] nn='0, nnn, lnn='0;
-reg  [CODEBITS-1:0] ii='0, lii='0;
-reg  [CODEBITS-1:0] blenn, blen = '0;
+reg  [clogb2(BITLENGTH)-1:0] idx = 0;
+reg  [clogb2(2*NUMCODES-1)-1:0] clearidx = 0;
+reg  [ clogb2(NUMCODES)-1:0] nn=0, nnn, lnn=0;
+reg  [CODEBITS-1:0] ii=0, lii=0;
+reg  [CODEBITS-1:0] blenn, blen = 0;
 wire [(1<<CODEBITS)-1:0] tree1d = nextcode[blen];
 wire                     islast = (blen==0 || ii==0);
-reg  [clogb2(2*NUMCODES-1)-1:0] nodefilled = '0;
-reg  [clogb2(2*NUMCODES-1)-1:0] ntreepos, treepos='0;
+reg  [clogb2(2*NUMCODES-1)-1:0] nodefilled = 0;
+reg  [clogb2(2*NUMCODES-1)-1:0] ntreepos, treepos=0;
 wire [clogb2(2*NUMCODES-1)-1:0] ntpos= {ntreepos[clogb2(2*NUMCODES-1)-2:0], tree1d[ii]};
-reg  [clogb2(2*NUMCODES-1)-1:0] tpos = '0;
+reg  [clogb2(2*NUMCODES-1)-1:0] tpos = 0;
 reg         rdfilled;
 reg         valid = 1'b0;
-wire [OUTWIDTH-1:0] wrtree2d = (lii==0) ? lnn : nodefilled + (clogb2(2*NUMCODES-1))'(NUMCODES);
+wire [OUTWIDTH-1:0] wrtree2d = (lii==0) ? lnn : (nodefilled + NUMCODES);
 reg  alldone = 1'b0;

 assign done = alldone & run;

 always @ (posedge clk or negedge rstn)
    if(~rstn) begin
-        valid <= '0;
-        treepos <= '0;
-        tpos <= '0;
-        lii <= '0;
-        lnn <= '0;
+        valid <= 0;
+        treepos <= 0;
+        tpos <= 0;
+        lii <= 0;
+        lnn <= 0;
    end else begin
        if(istart) begin
-            valid <= '0;
-            treepos <= '0;
-            tpos <= '0;
-            lii <= '0;
-            lnn <= '0;
+            valid <= 0;
+            treepos <= 0;
+            tpos <= 0;
+            lii <= 0;
+            lnn <= 0;
        end else begin
            valid <= build_tree2d & nn<NUMCODES & blen>0;
            treepos <= ntreepos;
@@ -97,131 +106,139 @@ always @ (posedge clk or negedge rstn)

 always @ (posedge clk or negedge rstn)
    if(~rstn)
-        blen <= '0;
+        blen <= 0;
    else begin
        if(istart)
-            blen <= '0;
+            blen <= 0;
        else if(islast)
            blen <= blenn;
    end

 always @ (posedge clk or negedge rstn)
    if(~rstn) begin
-        for(int i=0; i<BITLENGTH; i++)
-            blcount[i] <= '0;
+        for(i=0; i<BITLENGTH; i=i+1)
+            blcount[i] <= 0;
    end else begin
        if(istart | done) begin
-            for(int i=0; i<BITLENGTH; i++)
-                blcount[i] <= '0;
+            for(i=0; i<BITLENGTH; i=i+1)
+                blcount[i] <= 0;
        end else begin
            if(wren && wrdata<BITLENGTH)
-                blcount[wrdata] <= blcount[wrdata] + (clogb2(NUMCODES))'(1);
+                blcount[wrdata] <= blcount[wrdata] + 1;
        end
    end

-always_comb
+always @ (*)
    if(build_tree2d)
-        nnn = (nn<NUMCODES && islast) ? nn + (clogb2(NUMCODES))'(1) : nn;
+        nnn = (nn<NUMCODES && islast) ? (nn + 1) : nn;
+    else if (idx<BITLENGTH)
+        nnn = 64'hFFFF_FFFF_FFFF_FFFF;
    else
-        nnn = (idx<BITLENGTH) ? '1 : '0;
+        nnn = 0;
        
 always @ (posedge clk or negedge rstn)
    if(~rstn)
-        nn <= '0;
-    else
-        nn <= istart ? '0 : nnn;
-
+        nn <= 0;
+    else begin
+        if (istart)
+            nn <= 0;
+        else
+            nn <= nnn;
+    end
+    
 always @ (posedge clk or negedge rstn)
    if(~rstn) begin
-        for(int i=0; i<=BITLENGTH; i++) nextcode[i] <= '0;
+        for(i=0; i<=BITLENGTH; i=i+1) nextcode[i] <= 0;
        alldone <= 1'b0;
-        ii <= '0;
-        idx <= '0;
+        ii <= 0;
+        idx <= 0;
        build_tree2d <= 1'b0;
-        clearidx <= '0;
+        clearidx <= 0;
        clear_tree2d <= 1'b0;
    end else begin
-        nextcode[0] <= '0;
+        nextcode[0] <= 0;
        alldone <= 1'b0;
        if(istart | ~run) begin
-            if(istart) for(int i=0; i<=BITLENGTH; i++) nextcode[i] <= '0;
-            ii <= '0;
-            idx <= '0;
+            if(istart) for(i=0; i<=BITLENGTH; i=i+1) nextcode[i] <= 0;
+            ii <= 0;
+            idx <= 0;
            build_tree2d <= 1'b0;
-            clearidx <= '0;
+            clearidx <= 0;
            clear_tree2d <= 1'b0;
        end else if(run) begin
            if(~clear_tree2d) begin
-                if( clearidx >= (clogb2(2*NUMCODES-1))'(2*NUMCODES-1) )
+                if ( clearidx >= (2*NUMCODES-1) )
                    clear_tree2d <= 1'b1;
-                clearidx <= clearidx + (clogb2(2*NUMCODES-1))'(1);
+                clearidx <= clearidx + 1;
            end else if(build_tree2d) begin
                if(nn < NUMCODES) begin
                    if(islast) begin
-                        ii <= blenn - (CODEBITS)'(1);
+                        ii <= blenn - 1;
                        if(blen>0)
-                            nextcode[blen] <= tree1d + (1<<CODEBITS)'(1);
+                            nextcode[blen] <= tree1d + 1;
                    end else
-                        ii <= ii - (CODEBITS)'(1);
+                        ii <= ii - 1;
                end else
                    alldone <= 1'b1;
            end else begin
                if(idx<BITLENGTH) begin
-                    idx <= idx + (clogb2(BITLENGTH))'(1);
-                    nextcode[idx+1] <= ( ( nextcode[idx] + ((1<<CODEBITS)'(blcount[idx])) ) << 1 );
+                    idx <= idx + 1;
+                    nextcode[idx+1] <= ( ( nextcode[idx] + blcount[idx] ) << 1 );
                end else begin
-                    ii <= blen - (CODEBITS)'(1);
+                    ii <= blen - 1;
                    build_tree2d <= 1'b1;
                end
            end
        end
    end

-always_comb
+always @ (*)
    if(~run)
        ntreepos = 0;
    else if(valid) begin
        if(~rdfilled)
-            ntreepos = (clogb2(2*NUMCODES-1))'(rddata) - (clogb2(2*NUMCODES-1))'(NUMCODES);
+            ntreepos = rddata - NUMCODES;
+        else if (lii==0)
+            ntreepos = 0;
        else
-            ntreepos = (lii==0) ? '0 : nodefilled;
+            ntreepos = nodefilled;
    end else
        ntreepos = treepos;
    
 always @ (posedge clk or negedge rstn)
    if(~rstn) begin
-        nodefilled <= '0;
+        nodefilled <= 0;
    end else begin
        if(istart)
-            nodefilled <= '0;
+            nodefilled <= 0;
        else if(~run)
-            nodefilled <=              (clogb2(2*NUMCODES-1))'(1);
+            nodefilled <=              1;
        else if(valid & rdfilled & lii>0)
-            nodefilled <= nodefilled + (clogb2(2*NUMCODES-1))'(1);
+            nodefilled <= nodefilled + 1;
    end



-reg [CODEBITS-1:0] mem_huffman_bitlens [NUMCODES];
+reg [CODEBITS-1:0] mem_huffman_bitlens [0 : NUMCODES-1];

 always @ (posedge clk)
    if(wren)
        mem_huffman_bitlens[wraddr] <= wrdata;

-wire [clogb2(NUMCODES-1)-1:0] mem_rdaddr = (clogb2(NUMCODES-1))'(nnn) + (clogb2(NUMCODES-1))'(1);
+wire [clogb2(NUMCODES-1)-1:0] mem_rdaddr = nnn + 1;

 always @ (posedge clk)
    blenn <= mem_huffman_bitlens[mem_rdaddr];



-reg [OUTWIDTH:0] mem_tree2d [2*NUMCODES];
+reg [OUTWIDTH:0] mem_tree2d [0 : 2*NUMCODES-1];

 always @ (posedge clk)
    if( ~clear_tree2d | (valid & rdfilled) )
-        mem_tree2d[ (clogb2(2*NUMCODES-1))'(~clear_tree2d ? clearidx : tpos ) ] <= ~clear_tree2d ? {1'b1, (OUTWIDTH)'(0)} : {1'b0, wrtree2d};
+        mem_tree2d[ (~clear_tree2d ? clearidx : tpos ) ] <= ~clear_tree2d ? {1'b1, {(OUTWIDTH){1'b0}}} : {1'b0, wrtree2d};

 always @ (posedge clk)
-    {rdfilled, rddata} <= mem_tree2d[ (clogb2(2*NUMCODES-1))'(alldone ? rdaddr : ntpos ) ];
+    {rdfilled, rddata} <= mem_tree2d[ (alldone ? rdaddr : ntpos ) ];

 endmodule
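The `nextcode` recurrence in this module (`nextcode[idx+1] <= (nextcode[idx] + blcount[idx]) << 1`) is the canonical-Huffman step described in RFC 1951 (DEFLATE, section 3.2.2), and `wrtree2d` stores either a symbol (leaf) or `nodefilled + NUMCODES` (internal node). The Python sketch below is a software model of that scheme, written after the RFC algorithm and the flat `tree2d` layout used by upng (listed under Reference). It is not a transcription of the RTL, and the function names are hypothetical; it follows the RFC in setting `bl_count[0] = 0`.

```python
def canonical_codes(bitlens):
    """Assign canonical Huffman codes from code lengths (RFC 1951, section 3.2.2)."""
    max_bits = max(bitlens)
    bl_count = [0] * (max_bits + 1)
    for length in bitlens:
        bl_count[length] += 1
    bl_count[0] = 0  # as in RFC 1951: unused (length-0) symbols get no code
    next_code = [0] * (max_bits + 1)
    code = 0
    for bits in range(1, max_bits + 1):
        # Same recurrence as nextcode[idx+1] = (nextcode[idx] + blcount[idx]) << 1
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    codes = [0] * len(bitlens)
    for n, length in enumerate(bitlens):
        if length:
            codes[n] = next_code[length]
            next_code[length] += 1
    return codes


def build_tree2d(bitlens):
    """Build a flat tree table: entry 2*pos+bit holds a symbol (< numcodes) or,
    for an internal node, node_index + numcodes. None marks an unfilled entry."""
    numcodes = len(bitlens)
    codes = canonical_codes(bitlens)
    tree2d = [None] * (2 * numcodes)
    nodefilled = 0
    for n, length in enumerate(bitlens):
        treepos = 0
        for i in range(length):
            bit = (codes[n] >> (length - 1 - i)) & 1   # walk the code MSB first
            slot = 2 * treepos + bit
            if tree2d[slot] is None:
                if i + 1 == length:
                    tree2d[slot] = n                       # leaf: the symbol itself
                else:
                    nodefilled += 1
                    tree2d[slot] = nodefilled + numcodes   # new internal node
                    treepos = nodefilled
            elif tree2d[slot] >= numcodes and i + 1 < length:
                treepos = tree2d[slot] - numcodes          # follow an existing node
            else:
                raise ValueError("code lengths do not form a prefix code")
    return tree2d
```

With the RFC 1951 example lengths `[3, 3, 3, 3, 3, 2, 4, 4]` (symbols A to H), `canonical_codes` yields F = `00`, A = `010`, and H = `1111`, matching the RFC.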
--- a/RTL/huffman_decoder.sv
+++ b/RTL/huffman_decoder.sv
@@ -2,7 +2,7 @@
 //--------------------------------------------------------------------------------------------------------
 // Module  : huffman_decoder
 // Type    : synthesizable, IP's sub module
-// Standard: SystemVerilog 2005 (IEEE1800-2005)
+// Standard: Verilog 2001 (IEEE1364-2001)
 //--------------------------------------------------------------------------------------------------------

 module huffman_decoder #(
@@ -10,18 +10,24 @@ module huffman_decoder #(
    parameter    OUTWIDTH = 10
 )(
    rstn, clk,
-    istart,
-    ien, ibit,
+    istart, ien, ibit,
    oen, ocode,
    rdaddr, rddata
 );

-function automatic integer clogb2(input integer val);
+
+function  integer clogb2;
+    input integer val;
+//function automatic integer clogb2(input integer val);
    integer valtmp;
+begin
    valtmp = val;
-    for(clogb2=0; valtmp>0; clogb2=clogb2+1) valtmp = valtmp>>1;
+    for (clogb2=0; valtmp>0; clogb2=clogb2+1)
+        valtmp = valtmp>>1;
+end
 endfunction

+
 input                               rstn, clk;
 input                               istart, ien, ibit;
 output                              oen;
@@ -32,37 +38,41 @@ input   [            OUTWIDTH-1:0]  rddata;
 wire                                rstn, clk;
 wire                                istart, ien, ibit;
 reg                                 oen = 1'b0;
-reg    [            OUTWIDTH-1:0]   ocode = '0;
+reg    [            OUTWIDTH-1:0]   ocode = 0;
 wire   [clogb2(2*NUMCODES-1)-1:0]   rdaddr;
 wire   [            OUTWIDTH-1:0]   rddata;

-reg    [clogb2(2*NUMCODES-1)-2:0]   tpos = '0;
+reg    [clogb2(2*NUMCODES-1)-2:0]   tpos = 0;
 wire   [clogb2(2*NUMCODES-1)-2:0]   ntpos;
 reg                                 ienl = 1'b0;

 assign rdaddr = {ntpos, ibit};

-assign ntpos = ienl ? (clogb2(2*NUMCODES-1)-1)'(rddata<(OUTWIDTH)'(NUMCODES) ? '0 : rddata-(OUTWIDTH)'(NUMCODES)) : tpos;
+assign ntpos = ienl ? ((rddata<NUMCODES) ? 0 : (rddata-NUMCODES)) : tpos;

 always @ (posedge clk or negedge rstn)
    if(~rstn)
-        ienl <= '0;
+        ienl <= 1'b0;
    else
-        ienl <= istart ? '0 : ien;
+        ienl <= istart ? 1'b0 : ien;

 always @ (posedge clk or negedge rstn)
    if(~rstn)
-        tpos <= '0;
-    else
-        tpos <= istart ? '0 : ntpos;
+        tpos <= 0;
+    else begin
+        if (istart)
+            tpos <= 0;
+        else
+            tpos <= ntpos;
+    end

-always_comb
+always @ (*)
    if(ienl && rddata<NUMCODES) begin
        oen   = 1'b1;
        ocode = rddata;
    end else begin
        oen   = 1'b0;
-        ocode = '0;
+        ocode = 0;
    end

 endmodule
--- a/SIM/tb_hard_png.sv
+++ b/SIM/tb_hard_png.sv
@@ -2,7 +2,7 @@
 //--------------------------------------------------------------------------------------------------------
 // Module  : tb_hard_png
 // Type    : simulation, top
-// Standard: SystemVerilog 2005 (IEEE1800-2005)
+// Standard: Verilog 2001 (IEEE1364-2001)
 // Function: testbench for hard_png
 //--------------------------------------------------------------------------------------------------------

@@ -18,7 +18,7 @@

 module tb_hard_png ();

-initial $dumpvars(0, tb_hard_png);
+initial $dumpvars(1, tb_hard_png);


 reg rstn = 1'b0;
@@ -28,10 +28,10 @@ initial begin repeat(4) @(posedge clk); rstn<=1'b1; end



-reg          istart = '0;
+reg          istart = 1'b0;
 reg          ivalid = 1'b0;
 wire         iready;
-reg  [ 7:0]  ibyte  = '0;
+reg  [ 7:0]  ibyte  = 0;

 wire         ostart;
 wire [ 2:0]  colortype;
@@ -66,14 +66,14 @@ hard_png hard_png_i (



-int fptxt = 0, fppng = 0;
+integer fptxt = 0, fppng = 0;
 reg [256*8:1] fname_png;
 reg [256*8:1] fname_txt;
-int png_no = 0;
-int txt_no = 0;
-
-int cyccnt = 0;
-int bytecnt = 0;
+integer png_no = 0;
+integer txt_no = 0;
+integer ii;
+integer cyccnt = 0;
+integer bytecnt = 0;

 initial begin
    while(~rstn) @(posedge clk);
@@ -105,9 +105,9 @@ initial begin
                end
                if( ivalid & iready ) begin
                    ibyte <= $fgetc(fppng);
-                    bytecnt++;
+                    bytecnt = bytecnt + 1;
                end
-                cyccnt++;
+                cyccnt = cyccnt + 1;
            end
            ivalid <= 1'b0;
            
@@ -131,7 +131,7 @@ initial begin
                $finish;
            end
            
-            for(int ii=0; ii<width*height; ii++) begin
+            for(ii=0; ii<width*height; ii=ii+1) begin
                @ (posedge clk);
                while(~ovalid) @ (posedge clk);
                $fwrite(fptxt, "%02x%02x%02x%02x ", opixelr, opixelg, opixelb, opixela);
--- a/SIM/tb_hard_png_run_iverilog.bat
+++ b/SIM/tb_hard_png_run_iverilog.bat
@@ -1,5 +1,5 @@
 del sim.out dump.vcd
-iverilog  -g2005-sv  -o sim.out  tb_hard_png.sv  ../RTL/hard_png.sv  ../RTL/huffman_builder.sv  ../RTL/huffman_decoder.sv
+iverilog  -g2001  -o sim.out  tb_hard_png.v  ../RTL/hard_png.v  ../RTL/huffman_builder.v  ../RTL/huffman_decoder.v
 vvp -n sim.out
 del sim.out
 pause
--- a/figures/diagram.png
+++ b/figures/diagram.png
--- a/figures/interface.png
+++ b/figures/interface.png
--- a/figures/wave1.png
+++ b/figures/wave1.png
--- a/figures/wave2.png
+++ b/figures/wave2.png