a) Perform the first dimension of processing on data stored in the fast dimension of SDRAM
b) Reorganize (corner-turn) the data so that the second dimension of processing is in the fast dimension of SDRAM
c) Perform the second dimension of processing on data stored in the fast dimension of SDRAM
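The three steps above can be sketched on a flat, row-major buffer in which each row lies along the fast dimension. This is a minimal illustration that assumes a 2-D DFT as the workload. The names `dft`, `process_rows`, `corner_turn` and `two_d_with_corner_turn` are hypothetical, and a naive O(n²) DFT stands in for whatever per-vector algorithm (e.g. an FFT) a real library would run. It models only the data movement, not SDRAM timing.

```python
import cmath


def dft(x):
    """Naive O(n^2) 1-D DFT; a stand-in for the real per-vector algorithm."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


def process_rows(buf, rows, cols):
    """Apply the 1-D algorithm to every row of a row-major buffer (unit stride)."""
    out = [0j] * (rows * cols)
    for r in range(rows):
        start = r * cols
        out[start:start + cols] = dft(buf[start:start + cols])
    return out


def corner_turn(buf, rows, cols):
    """Transpose a row-major rows x cols buffer into a row-major cols x rows buffer."""
    return [buf[r * cols + c] for c in range(cols) for r in range(rows)]


def two_d_with_corner_turn(buf, rows, cols):
    step_a = process_rows(buf, rows, cols)     # a) first dimension, along the fast axis
    step_b = corner_turn(step_a, rows, cols)   # b) corner-turn: a full copy of the data
    step_c = process_rows(step_b, cols, rows)  # c) second dimension, now along the fast axis
    return step_c  # result is stored transposed: element (u, v) is at index v * rows + u


print(corner_turn([1, 2, 3, 4, 5, 6], 2, 3))  # [1, 4, 2, 5, 3, 6]
```

Step b touches every element once more, and that extra full pass over memory is the overhead that the rest of this section aims to remove.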
Shared memory architectures, on the other hand, offer the potential to nearly eliminate the corner-turn step in this process, substantially reducing real-time execution time. To obtain this reduction, the algorithm library that runs on the shared memory architecture must be able to execute algorithms efficiently on strided data sets.
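The strided alternative can be sketched by letting the second-dimension pass read and write each column in place with a stride equal to the row length, so no reorganized copy is ever made. This is a hypothetical Python illustration (`dft_strided` and `two_d_strided` are invented names; the naive DFT again stands in for an FFT). It shows only the addressing, which is loosely analogous to how VSIPL views describe data by offset, stride and length. It says nothing about speed: on real hardware, the cost of strided access is what the optimized library and sustained SDRAM transfers have to hide.

```python
import cmath


def dft_strided(src, src_off, src_stride, dst, dst_off, dst_stride, n):
    """Naive 1-D DFT that reads n inputs and writes n outputs at arbitrary strides.

    Input element k is src[src_off + k * src_stride]; output element m is
    written to dst[dst_off + m * dst_stride].
    """
    for m in range(n):
        acc = 0j
        for k in range(n):
            acc += src[src_off + k * src_stride] * cmath.exp(-2j * cmath.pi * k * m / n)
        dst[dst_off + m * dst_stride] = acc


def two_d_strided(buf, rows, cols):
    """2-D DFT of a row-major rows x cols buffer with no corner-turn copy."""
    tmp = [0j] * (rows * cols)
    out = [0j] * (rows * cols)
    # First dimension: each row is contiguous, so stride = 1.
    for r in range(rows):
        dft_strided(buf, r * cols, 1, tmp, r * cols, 1, cols)
    # Second dimension: each column is read and written directly with
    # stride = cols, so the data is never reorganized.
    for c in range(cols):
        dft_strided(tmp, c, cols, out, c, cols, rows)
    return out  # still row-major rows x cols: element (u, v) is at index u * cols + v
```

Unlike the three-step version, the output keeps the original row-major layout, and the only change between the two passes is the stride argument.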
The DNA-CS VQG4 quad Power PC board uses a shared memory architecture and has a complementary "Core" VSIPL Library implementation available. This VSIPL implementation has been optimized for both strided and unstrided data sets to obtain the highest possible performance for either data form. This paper describes the processing strategy and provides real-time execution estimates based on the DNA-CS VSIPL Library performance and on the VQG4's ability to sustain data transfers with SDRAM, which together make the corner-turn transparent.
For this data set size and computational requirement, 88% of the corner-turn penalty is eliminated. Each data array size and processing requirement will yield different results. Most importantly, the combination of shared memory and optimized libraries that support data striding can virtually eliminate the corner-turn overhead from real-time processing estimates.
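One way to read the 88% figure, assuming the corner-turn penalty is measured as added execution time: let $T_{\mathrm{CT}}$ be the time that the explicit corner-turn (step b) adds in the conventional flow, and $T_{\mathrm{res}}$ the overhead that remains with shared memory and strided processing. Both symbols are introduced here for illustration and do not come from the paper.

```latex
\eta = 1 - \frac{T_{\mathrm{res}}}{T_{\mathrm{CT}}}, \qquad
\eta = 0.88 \;\Longrightarrow\; T_{\mathrm{res}} = 0.12\,T_{\mathrm{CT}}
```

Under this reading, whatever the absolute corner-turn time is for a given array size, only 12% of it still appears in the real-time estimate.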