Fast Fourier Transform in C++ using KFR
In this article I’ll show you how to use Fast Fourier Transform in Digital Signal Processing and how to apply forward and inverse FFT on complex and real data using the KFR framework.
Introduction
Fast Fourier Transform (FFT) can be used to perform:
 Convolution (including convolution reverberation)
 Crosscorrelation and autocorrelation
 Applying large FIR filters
 Sample rate conversion
 Spectrum visualization
 Large integer multiplication
 and many other algorithms
Often FFT is the most efficient way to perform each of these algorithms.
Moreover, KFR has one of the most efficient FFT implementation, so you can get a great performance boost of your DSP applications using KFR’s FFT for all of these algorithms.
Differences from other implementations
KFR implementation of the FFT:
 is fully optimized for X86, X8664, ARM and AARCH64 processors
 uses vector intrinsics (if available for cpu)
 supports both single and double precision
 can cache internal data between calls to speed up plan creation
 can do forward and inverse FFT without a need to create two plans
 can be used for complextocomplex, realtocomplex and complextoreal transforms
 doesn’t require computing FFT at runtime and measuring their execution time to find an optimal configuration
 has special implementations for FFT sizes up to 256
 has no external dependencies
 is threadsafe
 is written in modern C++
 is headeronly
 is open source (GPL license)
Getting started
For the list of supported compilers and OSes, see the latest README: https://github.com/kfrlib/kfr/blob/master/README.md

Download the latest version of KFR:

Add the
include
folder to your include path. 
Insert the following line at the beginning of your project’s source code:
#include <kfr/dft.hpp>
KFR is headeronly, so no need to compile any source files,
create static library and pass it to linker.
The only thing you must to do is to unsure that kfr
folder from include
directory is available for inclusion.
Quick example
Complex input
Let’s apply the FFT to the complex input data:
using namespace kfr;
// prepare data (0, 1, 2, ..., 255)
univector<complex<double>, 256> data = cexp(
linspace(0, c_pi<double, 2>, 256) * make_complex(0, 1));
univector<complex<double>, 256> freq;
// do the forward FFT
freq = dft(data);
To perform the inverse FFT do the following:
data = idft(freq);
But as a result, we will get an input scaled by size (256 in this example): 0, 256, 512, ...
This is a property of FFT.
To get a desired result we have to divide result by size:
data = idft(freq) / data.size();
Real input
Frequency data are stored in CCS format as stated in the table below.
Note, that second half of an output is not stored because of symmetry.
index  real  imaginary 

0  frequency[0].real() (DC offset)  always 0 because of real input 
1  frequency[1].real()  frequency[1].imag() 
2  frequency[2].real()  frequency[2].imag() 
…  …  … 
N/21  frequency[N/21].real()  frequency[N/21].imag() 
N/2  frequency[N/2].real() (Nyquist frequency)  always 0 because of real input 
The size of the output data is equal to size/2+1
using namespace kfr;
// prepare data (0, 1, 2, ..., 255)
univector<double, 256> data = counter();
univector<complex<double>, 129> freq; // data.size() / 2 + 1
// do the forward FFT
freq = realdft(data);
For the inverse FFT, you have to prepare frequency data in the CSS format as well.
To be sure that you will get pure real output, you must ensure that freq[0] and freq[N/2] are real numbers.
using namespace kfr;
data = irealdft(freq);
// or to get the properly scaled result:
data = irealdft(freq) / data.size();
Creating FFT plan
Implementation of FFT requires twiddle coefficients to be prepared before actual processing occurs. If FFT will be performed more than once, then it makes sense to store the coefficients and reuse it every time.
FFT Plan caching
If you are using dft
, idft
, realdft
or irealdft
functions,
all plans will be kept in memory, so the next call to these functions
will reuse the saved data.
You can manually get the plan from the cache (or create a new if it doesn’t exist in the cache):
dft_plan_ptr<T> dft = dft_cache::instance().get(ctype<T>, size);
All functions related to the FFT caching are protected with mutex and are thread safe
ctype
is a special template variable intended
just to pass a type to the function.
dft_plan_ptr
is an alias for std::shared_ptr<const dft_plan<T>>
To clear all saved plans and free memory, call the clear
function:
dft_cache::instance().clear();
As of version 1.3.0, functions
convolve
andcorrelate
use the cache internally.
Examples using dft_plan
You can create a plan manually without using the cache to get more control over it.
dft_plan
class is intended to store all data needed for FFT.
using namespace kfr;
template <typename T> // data type, float or double
struct dft_plan
{
dft_plan(size_t size);
void execute(complex<T>* out,
const complex<T>* in,
u8* temp,
bool inverse = false) const;
void execute(univector<complex<T>, N>& out,
const univector<const complex<T>, N>& in,
univector<u8, N2>& temp,
bool inverse = false) const;
...
};
dft_plan
has constructor that takes FFT size as an argument.
Input and output arguments for calls to the execute
function
must have types complex<T>*
or univector<complex<T>, N>
.
You can create a plan and use it as many times as needed. After creating a plan, all access to it is threadsafe, because executing the FFT doesn’t modify internal data, so you can share this plan across threads in your application without protecting the plan with locks and mutexes.
using namespace kfr;
dft_plan<double> plan(1024);
univector<complex<double>, 1024> in;
univector<complex<double>, 1024> out;
// here fill `in` array with our data (samples)
univector<u8> temp(plan.temp_size);
plan.execute(out, in, temp, false); // direct FFT
// `out` now contains frequencies which have to be processed
plan.execute(in, out, temp, true); // inverse FFT
// `in` now contains processed data (samples)
...
// process new data
plan.execute(out, in, temp, false); // direct FFT
Real to complex and complex to real transforms using dft_plan_real
using namespace kfr;
dft_plan_real<double> plan(1024); // dft_plan_real for real transform
univector<double, 1024> in;
univector<complex<double>, 1024> out;
// here fill `in` array with our data (samples)
univector<u8> temp(plan.temp_size);
plan.execute(out, in, temp); // direct FFT
// `out` now contains frequencies which have to be processed
plan.execute(in, out, temp); // inverse FFT
// `in` now contains processed data (samples)
Conclusion
As you can see, using the Fast Fourier Transform is not so hard thing with KFR. Forward and inverse, complex and real transforms can be performed in one line of code with great performance and flexibility.
Moreover, the speed of KFR implementation of the FFT is in many cases higher than the speed of the most known and used libraries written in C, C++ or any other language.