The Math Behind Flash’s FFT Results

I’ve been playing with the HYPE framework recently, and I noticed that they use a SoundAnalyzer class to wrap the Flash Player’s native SoundMixer.computeSpectrum method. This method’s FFT mode is known to have some problems (not really, but more on that later), and as it turns out I’ve dealt with this some already. So here it is (after sitting in my drafts for longer than I care to admit), the Math behind Flash’s FFT!

First, let’s take a look at some of the raw data from the computeSpectrum method. When I originally intended to write this post I took the opportunity to capture aggregate results over a large number of frequency sweeps using an AIR application, and I’ve posted some of the data for you here.

index floor peak ceiling
0 2 6 23995
1 2 61 20591
2 9 103 23995
3 2 139 20591
4 14 187 23995
5 7 218 20591
253 947 10901 23950
254 217 10947 23995
255 947 10990 23950

As I alluded to earlier, the long known problem with computeSpectrum‘s FFT is that the majority of frequencies seem to be skewed toward the left side – but what they don’t mention in the docs is that it’s because the FFT results are still distributed linearly. Yep, even though the floor and ceiling values in my data were mostly useless (perhaps due to harmonics), you can see a rough pattern in the peak frequencies – an average linear distribution of about 43.07451!

…okay, now let me back up a second to explain that. The way that we naturally understand sound is logarithmic. In musical terms this means that an octave below the note A-440 (440 is the frequency) is A-220, but an octave above A-440 is actually A-880 (not 660!). Audibly we think of the distance between A-440 and A-220 to be the same as between A-440 and A-880, but when displayed in a linear distribution (like our FFT results) A-880 is twice the distance from A-440 as A-220 – and when you use this linear distribution to create visualizations, it doesn’t match our natural perception of the sound. So, just in case any of that got really confusing, I created a little Flash application to illustrate the issue better: FFTMath.

It’s important to note that this isn’t a flaw in Flash’s FFT results. FFT calculations are supposed to return linear frequency distributions, but this raw data needs to be redistributed into logarithmically spaced frequency bands before being used for visualizations. Before redistributing our linear FFT results though, we’re going to need some math to explain them.

After lots of searching on my magic number and a little bit of luck, I came across this amazing blog post: http://code.compartmental.net/2007/03/21/fft-averages/ – and Eureka!

Each point i in the FFT represents a frequency band centered on the frequency i/1024 * 44100 whose bandwidth is 2/1024 * 22050 = 43.0664062 Hz…

Alright math club, here’s the good part. Based on the math from this awesome blog post and the results from my testing, it looks like Flash uses a sample size of 1024 and a default sample rate of 44100 resulting in frequency bands of (2/1024) * (44100/2) or 43.0664062 Hz. Even though an FFT with 1024 samples should return 513 valid results below the Nyquist frequency (sampleRate/2), Flash further clips the data to 256 results for a top frequency of around 11,025 Hz – still much higher than needed for most visualizations.

To review:

frequency = i/1024 * 44100
bandwidth = (i==0) ? 1/1024 * 22050 : 2/1024 * 22050;

This information let’s us understand the results we’re getting from computeSpectrum. In my next post we’ll use this information to redistribute the data and create more accurate frequency visualizations in Flash.

Download: FrequencyAnalyzer.zip (2.37 MB)

7 Responses to “The Math Behind Flash’s FFT Results”

  1. Branden Hall says:

    Great info Ben, you beat me to the write up! Honestly, I hadn’t even done all of the math yet – the values used in HYPE were obtained experimentally rather than computationally. I just did a similar sweep across the frequencies, noted where each index peaked, and then divided up the data for use in the getOctave method.

  2. makc says:

    That’s an interesting point, however I think you will experience data loss after such redistribution. Values on upper side will be compressed out, and values on lower side will be sparse. So unless you also increase number of results, you will have mostly interpolated (non-original) values. Not sure how much worse this makes the result.

  3. Branden Hall says:

    makc – that’s exactly why I implemented a getOctave method rather than getFrequency. At the lowest end of the scale it’s 1 value to an octave, at the highest it’s something like 92 – so in HYPE I just take the peak value within the range that defines the octave.

  4. Ben says:

    makc, Branden – You’re right, redistributing to 256 values creates a fidelity problem. However, there is enough data for most use cases – a standard 31 band frequency display for example. You can also boost the fidelity of the lower frequencies by using a smaller sample rate. I’ll try to address these in my next post.

  5. doug says:

    Oh my god… how many years have i been searching for this!

    Absolutely top notch thank you so much ben.

  6. [...] Ben Stucki has started to attempt to explain his findings in respect of better understanding ComputeSpectrum Ben Stucki – The Math Behind Flash FFT Results [...]

  7. [...] my last post I wrote about The Math Behind Flash’s FFT Results and discussed the need to transform the default linear values returned by computeSpectrum into [...]