Twiddle factors for both DIT
and DIF can be found by looking
at Fig.|3
.ul
backwards and reversing the
arrows.
Fig.|5 shows an algorithm
for avoiding bit-reversal, at the
cost of doing a "not-in-place" algorithm.
Only the DIT version is shown but
DIF can easily be done by
placing twiddles after the nodes.
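.pg
As a rough illustration of trading in-place operation for naturally
ordered output, the sketch below uses the Stockham autosort arrangement.
This is an assumption made for illustration: Fig.|5 is not reproduced here
and its flow graph may differ.
The names stockham_fft, src, dst, half and stride are hypothetical.
The twiddles are placed after the nodes, that is, the DIF form.

```python
import cmath

def stockham_fft(x):
    """Radix-2 Stockham autosort FFT sketch; len(x) must be a power of two.

    Output is in natural order, at the cost of a second array of n words.
    """
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    src = list(x)        # input of the current stage
    dst = [0j] * n       # output of the current stage (the extra memory)
    half = n // 2        # butterflies per sub-transform in this stage
    stride = 1           # number of interleaved sub-transforms
    while half >= 1:
        for j in range(half):
            # Twiddle applied after the node (DIF form).
            w = cmath.exp(-2j * cmath.pi * j / (2 * half))
            for k in range(stride):
                a = src[k + stride * j]
                b = src[k + stride * (j + half)]
                dst[k + 2 * stride * j] = a + b
                dst[k + 2 * stride * j + stride] = (a - b) * w
        src, dst = dst, src   # ping-pong between the two arrays
        half //= 2
        stride *= 2
    return src
```

Because every stage reads one array and writes the other, no register is
overwritten before it has been read, and no separate bit-reversal pass is needed.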
.pg
Fig.|6 shows how an FFT algorithm
can be designed which makes use of
a high speed scratch memory.
We will discuss the hardware
implications of this structure later
but for now we point
out that nodes can be paired
and two butterflies done on
4 input samples so that these
samples pass through 2 FFT
stages before the next 4 samples are
handled.
For example, we enter
samples 0 and 8 as a
pair into node 0 of stage
0, and samples 4 and 12 into
node 4 of stage 0.
After doing these 2 butterflies we
proceed to node 0, stage 1 and node 4,
stage 1, winding up in the same
4 registers 0, 4, 8 and 12.
The paths we have followed
are indicated by the crosses.
In the same way, we
can enter registers 1, 5, 9 and
13 and again proceed through
2 stages.
In this way, half as
many memory cycles are
needed, provided that the
arithmetic element can handle
4 samples, rather than the usual 2.
In this particular version, bit
reversal takes place since the
algorithm can be thought of as
"in-place, 2 stages at a time".
.pg
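The pairing of nodes described for Fig.|6 can be sketched as follows.
This is a minimal illustration under stated assumptions, not a transcription
of the figure: it assumes a DIF flow graph with naturally ordered input and
twiddles after the nodes, and a transform length that is a power of 4.
The names fft_two_stages_per_pass, butterfly, twiddle and quartets are hypothetical.

```python
import cmath

def twiddle(k, m):
    """Return W_m^k = exp(-2*pi*i*k/m)."""
    return cmath.exp(-2j * cmath.pi * k / m)

def butterfly(x, p, q, w):
    """DIF butterfly on registers p and q, twiddle applied after the node."""
    x[p], x[q] = x[p] + x[q], (x[p] - x[q]) * w

def bit_reverse(i, bits):
    """Reverse the low `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_two_stages_per_pass(x):
    """In-place FFT that carries 4 registers through 2 stages per fetch.

    Returns (registers, quartets); the registers hold the transform in
    bit-reversed order, and quartets lists the 4 registers fetched together.
    """
    n = len(x)
    stages = n.bit_length() - 1
    assert n >= 1 and n == 1 << stages and stages % 2 == 0, "need n = 4**k"
    x = list(x)
    quartets = []
    span = n // 2                      # butterfly span of the first stage of the pair
    while span >= 2:
        half = span // 2               # butterfly span of the second stage
        for base in range(0, n, 2 * span):
            for j in range(half):
                a, b = base + j, base + j + half
                c, d = a + span, b + span
                quartets.append((a, b, c, d))
                # First stage of the pair: two butterflies of span `span`.
                butterfly(x, a, c, twiddle(j, 2 * span))
                butterfly(x, b, d, twiddle(j + half, 2 * span))
                # Second stage: two butterflies of span `half`, same 4 registers.
                butterfly(x, a, b, twiddle(j, span))
                butterfly(x, c, d, twiddle(j, span))
        span //= 4
    return x, quartets
```

For 16 points the first two quartets are registers (0, 4, 8, 12) and (1, 5, 9, 13),
and each sample is fetched twice in total rather than four times.
Frequency k is found in register bit_reverse(k, 4).
.pg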
Fig.|7 shows how bit-reversal
can be avoided.
In this 16 point algorithm
the first 2 stages are done
as in Fig.|1, namely, straightforward
in-place operation.
From then on, we enter 4
samples into our arithmetic element
to do two butterflies and
permute the results as shown
in the last two stages.
Notice that we are violating
our rule not to assign numbers
to registers, the purpose being to trace
through the indexing.
Since our register numbering
emerges bit-reversed, it follows that
the output samples must be
normally ordered (after all, in a completely
in-place algorithm, the
register numbering is untouched
throughout and the result emerges
bit-reversed).
.pg
Fig.|8 is another "two butterflies
at a time" algorithm, but for
a different purpose than Fig.|7.
In Fig.|8 we arrange the
memory registers so as to
be able to read (or write) 2 complex
words at a time.
Thus, for example, samples 0 and
8 are entered into the butterfly in
parallel, saving a memory cycle.
The table in the right
hand corner shows the desired
matching up of samples as the FFT
progresses; in order to achieve
this match so that parallelism
can be maintained, permutation of 4
output points at a time
must be performed.
For example, the samples 0,8 and 4, 12