Abstract
In this paper, we propose a novel architecture for sparse matrix multiplication that is suited to implementation in specialized environments such as dataflow engines. To avoid streaming the same elements from the host more than once, the architecture buffers elements of the input stream in on-chip memory, organized as pages. For sparse matrices, only the pages that contain non-zero elements are processed. The blocks of the architecture can be replicated to parallelize the computation. We implemented the architecture on a Maxeler dataflow engine based on a Virtex 5 FPGA and present the implementation results.
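The page-based scheme can be illustrated with a small software model. The Python sketch below is an assumption-laden illustration, not the hardware design from the paper. It assumes square pages of a hypothetical size `p`, matrices held as dense Python lists with explicit zeros, and hypothetical helper names `split_into_pages` and `paged_matmul`. Each input is read once and buffered as pages, all-zero pages are dropped, and only pairs of non-zero pages contribute to the product.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
PageIndex = Tuple[int, int]  # (page row, page column)


def split_into_pages(m: Matrix, p: int) -> Dict[PageIndex, Matrix]:
    """Buffer a matrix as p x p pages, keeping only pages with a non-zero element.

    Models a single pass over the input stream into on-chip memory (assumption).
    Edge pages are zero-padded when the dimensions are not multiples of p.
    """
    rows, cols = len(m), len(m[0])
    pages: Dict[PageIndex, Matrix] = {}
    for pi in range(0, rows, p):
        for pj in range(0, cols, p):
            block = [[m[i][j] if i < rows and j < cols else 0.0
                      for j in range(pj, pj + p)]
                     for i in range(pi, pi + p)]
            if any(v != 0 for row in block for v in row):
                pages[(pi // p, pj // p)] = block  # all-zero pages are never stored
    return pages


def paged_matmul(a: Matrix, b: Matrix, p: int) -> Matrix:
    """Compute C = A x B, multiplying only pairs of non-zero pages (hypothetical model)."""
    n, k, m = len(a), len(b), len(b[0])
    if len(a[0]) != k:
        raise ValueError("inner dimensions do not match")
    a_pages = split_into_pages(a, p)  # each input is buffered exactly once
    b_pages = split_into_pages(b, p)
    c = [[0.0] * m for _ in range(n)]
    page_cols_b = (m + p - 1) // p
    for (ai, ak), a_blk in a_pages.items():
        for bj in range(page_cols_b):
            b_blk = b_pages.get((ak, bj))
            if b_blk is None:
                continue  # zero page of B: it cannot contribute to C
            for i in range(p):
                gi = ai * p + i
                if gi >= n:
                    break  # padding rows of an edge page
                for j in range(p):
                    gj = bj * p + j
                    if gj >= m:
                        break  # padding columns of an edge page
                    # Padded entries are zero, so summing over all p terms is safe.
                    c[gi][gj] += sum(a_blk[i][t] * b_blk[t][j] for t in range(p))
    return c
```

For example, with `p = 2`, a 4 x 4 block-diagonal `A` has only two non-zero pages, so only two page products are computed. The iterations of the outer loop touch disjoint page pairs and only meet when accumulating into `C`; that independence is one plausible way replicated blocks could share the work, although the sketch itself runs sequentially.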