Abstract
The 8-bit floating-point (FP8) data format has been increasingly adopted in neural network (NN) computations due to its superior dynamic range compared to traditional INT8. However, FP8 multiplication, a core operation in NNs, still incurs significant power consumption. To address this issue, this paper presents an FPGA-based approximate multiplier design for FP8. First, we conduct a bit-level analysis of the approximation method. Based on this analysis, we implement a fine-grained optimized design on mainstream FPGAs (AMD and Altera) using primitives and templates combined with physical layout constraints. We then evaluate and analyze the accuracy and resource utilization of the FP8 approximate multiplier. The results indicate that, compared to previous FPGA-based 8-bit designs, our design achieves the lowest LUT consumption. Finally, we integrate the design into the inference phase of a representative NN model, demonstrating its excellent power efficiency. To the best of our knowledge, this is the first FPGA-based FP8 approximate multiplier design, and it can serve as a benchmark for future designs and comparisons of FPGA-based low-precision floating-point approximate multipliers. The code for this work is available in our GitLab repository.
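The abstract leaves the approximation method itself unspecified; purely as illustration, the C sketch below shows one well-known bit-level approximation for floating-point multiplication, Mitchell-style addition of the exponent-and-mantissa bit patterns, applied to a hypothetical FP8 E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits). The format choice, the helper names (`fp8_to_double`, `fp8_approx_mul`), and the omission of zero/subnormal/overflow handling are all assumptions of this sketch, not the paper's design.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed FP8 E4M3 layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits. */
#define FP8_BIAS      7
#define FP8_MANT_BITS 3

/* Decode a normal E4M3 value to double (zero and subnormals omitted for brevity). */
static double fp8_to_double(uint8_t x) {
    int sign = (x >> 7) & 1;
    int exp  = (x >> FP8_MANT_BITS) & 0xF;
    int mant = x & 0x7;
    double v = (1.0 + mant / 8.0) * ldexp(1.0, exp - FP8_BIAS);
    return sign ? -v : v;
}

/* Mitchell-style approximate multiply: the mantissa field linearly
 * approximates log2(1 + m/8), so adding the exponent-and-mantissa bit
 * patterns approximates multiplication in the log domain. Subtracting
 * the bias (shifted into exponent position) removes the doubled bias.
 * No overflow/underflow or special-value handling in this sketch. */
static uint8_t fp8_approx_mul(uint8_t a, uint8_t b) {
    uint8_t sign = (a ^ b) & 0x80;
    uint8_t sum  = (uint8_t)((a & 0x7F) + (b & 0x7F) - (FP8_BIAS << FP8_MANT_BITS));
    return sign | (sum & 0x7F);
}

int main(void) {
    uint8_t a = 0x3C, b = 0x3C;  /* each encodes 1.5 in the assumed layout */
    uint8_t p = fp8_approx_mul(a, b);
    printf("approx: %g  exact: %g\n",
           fp8_to_double(p), fp8_to_double(a) * fp8_to_double(b));
    return 0;
}
```

For 1.5 × 1.5 this sketch returns 2.0 against an exact 2.25, consistent with the roughly 11% worst-case relative error of Mitchell's approximation; the appeal of this family of techniques in hardware is that the multiply collapses to a narrow integer addition, which is what makes LUT and power savings plausible on FPGAs.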
| Original language | English |
| --- | --- |
| Title | Proceedings - 2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2025 |
| Publisher | IEEE |
| Pages | 227-235 |
| Number of pages | 9 |
| Electronic ISBN | 9798331502812 |
| DOIs | |
| Status | Published - 28 May 2025 |
Publication series

| Name | Proceedings - 2025 IEEE 33rd Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2025 |
| --- | --- |
Bibliographical note

Publisher Copyright: © 2025 IEEE.