For the past eight months I've been adding HE-AACv1 and HE-AACv2 support to the FFmpeg AAC decoder (ffaacdec), and that work is now complete. Rob Swain wrote much of the HE-AACv1 decoder and deserves his share of the credit.
Right now the de facto standard AAC decoder is libfaad2, which has supported HE-AAC(v2) decoding for quite some time. Faad has a few problems, though. Libfaad2 is pure GPL, which is less than ideal for some applications, whereas ffaacdec is 100% LGPL. (If you are taking advantage of this licensing distinction, please consider donating to fund further work.) Libfaad2 also reinvents the wheel for several standard bitstream and signal processing blocks. As Ronald says (paraphrased): FFmpeg already has very fast wheels. Reusing them makes our decoder much faster than libfaad2 on floating-point systems (like my Intel Core2).
HE-AACv1 (also known as aacPlus or AAC+) combines MPEG-4 AAC-LC (which FFmpeg has supported natively since 2008) with a tool called Spectral Band Replication (SBR). SBR generates high-frequency signal components from the low-frequency components, guided by side information in the bitstream. It runs each AAC frame through a 32-band analysis QMF, giving a joint time-frequency representation of 32 bands by 32 time slots. SBR then generates 32 new high-frequency bands and transforms the combined result back to the time domain.
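To make the idea of regenerating high bands from low bands concrete, here is a heavily simplified C sketch for a single QMF time slot. It assumes the crossover sits exactly at band 32 and replaces everything real SBR does (inverse filtering, noise floor, added sinusoids, time/frequency envelope grids) with a plain per-band gain. The names `sbr_copy_up` and `qmf_cplx` are hypothetical, not FFmpeg's. Each pass over the source range counts as one patch.

```c
/* Simplified SBR "copy-up" for one QMF time slot (hypothetical sketch). */
#define SBR_LOW_BANDS 32 /* bands produced by the 32-band analysis QMF */
#define SBR_ALL_BANDS 64 /* low bands plus 32 regenerated high bands   */

typedef struct {
    float re, im;
} qmf_cplx;

/* Fill bands [32, 64) of X by repeatedly copying the source range
 * [src_start, 32) upward; each pass over the source range is one patch.
 * gain[] (one entry per high band) stands in for the envelope data that
 * real SBR decodes from the bitstream. Returns the number of patches
 * used, or 0 if src_start is out of range. */
static int sbr_copy_up(qmf_cplx X[SBR_ALL_BANDS], int src_start,
                       const float gain[SBR_ALL_BANDS - SBR_LOW_BANDS])
{
    int dst = SBR_LOW_BANDS;
    int patches = 0;

    if (src_start < 0 || src_start >= SBR_LOW_BANDS)
        return 0;

    while (dst < SBR_ALL_BANDS) {
        /* One patch: walk the source bands once, or until the top is full. */
        for (int src = src_start; src < SBR_LOW_BANDS && dst < SBR_ALL_BANDS;
             src++, dst++) {
            float g = gain[dst - SBR_LOW_BANDS];
            X[dst].re = g * X[src].re;
            X[dst].im = g * X[src].im;
        }
        patches++;
    }
    return patches;
}
```

With `src_start = 16` each patch is 16 bands wide, so two patches fill bands 32 to 63; with `src_start = 0` a single patch suffices.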
HE-AACv2 (also known as aacPlusv2 or eAAC+) combines mono HE-AAC with a tool called Parametric Stereo (PS). PS takes the mono SBR QMF output and generates left and right SBR QMF outputs. It does this by further splitting the QMF output into 71 or 91 bands and using a few very compact parameters to mix the input signal with a decorrelated version of itself. The parameters are interchannel intensity difference (IID), interchannel coherence (ICC), and phase rotations (IPD/OPD, the interchannel and overall phase differences). The ICC mode also selects which mixing mode is used. PS also comes in a baseline flavor, which always uses 71 bands, always uses mixing mode A, and does not use IPD/OPD.
A few implementation notes:
- Currently the FFmpeg decoder supports unrestricted PS. It can emulate baseline behavior if the PS_BASELINE define in aacps.c is set to 1, although this does not fully optimize for the baseline case.
- Mainstream encoders seem to stick to the baseline feature set, especially with regard to IPD/OPD.
- The SBR filterbank has been optimized on top of a DCT-IV, which in turn is implemented on top of the FFmpeg half IMDCT. This is based on a paper by Han-Wen Hsu et al. Please note that many of the final equations presented in the paper are obviously incorrect, but if you follow their derivation you should be able to arrive at the correct equations.
- The PS filterbank remains (largely) unoptimized.
- Coding Technologies (creators of SBR and PS) had an HE-AAC(v2) decoder test package focused mainly on signaling the presence of the SBR and PS extensions. It disappeared after Dolby acquired CT, but it is mirrored in the MPlayer samples archive.
- The SBR spec limits the number of patches to 5, yet a stream in the CT package uses six. This leads me to believe that some version of the CT encoder didn't strictly enforce that limit, so decoders should tolerate streams that exceed it.
- The reference software and the spec seem to disagree about when to apply IPD/OPD smoothing. The spec says to apply it only when IPD/OPD are enabled; the reference software applies it constantly. These may seem equivalent, but because of the 3-tap smoother the reference decoder also applies it to the envelopes past the end of the IPD/OPD-enabled envelopes. For now FFmpeg follows the spec here.
Related future work:
- MPEG Surround is a more flexible multichannel coding scheme based on PS.
- Allow the user to optionally decode without SBR and/or PS. In addition, 3GPP specifies an SBR-domain downmix that should be used when a mono channel configuration is requested.
- Add support for less popular AAC variants to the decoder, including LTP, LD, and 960-sample frames.
- Improve the current experimental FFmpeg AAC-LC encoder, which is buggy and suboptimal.