文摘
We have developed a new FFT algorithm for superscalar and very long instruction word (VLIW) processors. By taking advantage of the data independence between butterflies, as well as within each butterfly, and by utilizing a superscalar or VLIW microprocessor's ability to perform multiple operations concurrently, our FFT algorithm speeds up the computation time by a factor of the number of operations that can be issued in a single cycle. Our algorithm computes multiple independent butterflies in parallel by changing the execution order of operations within the butterfly tight loop. We implemented this algorithm on a high-end, commercially available programmable processor and achieved the execution time of 75 ms and 50 ms for complex-valued and real-valued 512 × 512 FFT, respectively. This programmable approach on a set of next-generation microprocessors via the new FFT algorithm provides flexibility and adaptability to new applications and changing requirements, which is a definite advantage over specialized hardware-based solutions.