Spectral Hole: ffmpeg


2011-01-13

FFmpeg AAC Decoder Todo List

Since I have resigned as the FFmpeg AAC maintainer, here is my remaining decoder todo list for anyone interested in hacking on the decoder:

Add gapless and preroll support. This requires some libavformat integration, but the lossless codecs must at least have some sort of gapless support. Support for the iTunSMPB tag is a must.
Optimize the SBR windowing. The actual transform should be fairly optimal. See the implementation notes from my first blog post.
Optimize the parametric stereo filterbank. There is a lot of redundant math in there.
Add fixed point support.
Consider a more conservative approach to SBR and PS detection. (If there is no signaling and the sample rate is less than or equal to 24 kHz, assume SBR. If SBR is used and the stream is mono, assume PS.)
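
The detection rule in the last item is simple enough to state as code. This is a minimal sketch, not FFmpeg code: the `aac_signaling` struct, its fields, the `signaling` tri-state, and the `assume_sbr()`/`assume_ps()` helpers are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical tri-state for explicit SBR/PS signaling in the stream. */
enum signaling { SIG_NONE, SIG_PRESENT, SIG_ABSENT };

/* Hypothetical summary of what the container and AudioSpecificConfig told us. */
struct aac_signaling {
    enum signaling sbr;   /* explicit SBR signaling, if any */
    enum signaling ps;    /* explicit PS signaling, if any */
    int core_sample_rate; /* AAC core sample rate in Hz */
    int channels;         /* channel count of the core stream */
};

/* With no SBR signaling, assume SBR only for core rates of 24 kHz or less. */
static bool assume_sbr(const struct aac_signaling *s)
{
    if (s->sbr != SIG_NONE)
        return s->sbr == SIG_PRESENT;
    return s->core_sample_rate <= 24000;
}

/* With no PS signaling, assume PS only for mono streams that use SBR. */
static bool assume_ps(const struct aac_signaling *s)
{
    if (s->ps != SIG_NONE)
        return s->ps == SIG_PRESENT;
    return assume_sbr(s) && s->channels == 1;
}
```

Under this rule an unsignaled 22050 Hz mono stream gets both SBR and PS, while an unsignaled 44100 Hz stereo stream gets neither.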

2010-09-20

Recently I wrote about the AAC channel model. Since then I have tested a variety of real-world broken files (synthetically recreated) with the reference decoder, faad, WinAMP, iTunes, and the Microsoft Windows 7 decoder. I've placed the results in the multimedia wiki; I can't seem to get the CSS right to display them here.

Looking at the results from the other decoders, I managed to fix bad_concat with a new approach in FFmpeg that also removes the special hacks for streams like elem_id0. FFmpeg now treats streams with PCEs according to the letter of the channel configuration in the PCE. For streams lacking a PCE, element instance tags are now ignored and channels are treated purely positionally.
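
One way to read "treated purely positionally" for a stream that only has an indexed channel configuration: the k-th element of a given type fills the k-th slot of that type in the configuration, whatever its element instance tag says. The sketch below is hypothetical (the `layouts` table, `map_element_positionally()`, and the per-frame `seen` counters are not FFmpeg's implementation); the element sequences for configurations 1 to 7 follow the standard's channel configuration table.

```c
/* Channel-carrying syntactic elements. */
enum elem_type { ELEM_SCE, ELEM_CPE, ELEM_LFE };

/* Element sequence for each indexed channel configuration (1-7). */
static const enum elem_type layouts[8][5] = {
    [1] = { ELEM_SCE },
    [2] = { ELEM_CPE },
    [3] = { ELEM_SCE, ELEM_CPE },
    [4] = { ELEM_SCE, ELEM_CPE, ELEM_SCE },
    [5] = { ELEM_SCE, ELEM_CPE, ELEM_CPE },
    [6] = { ELEM_SCE, ELEM_CPE, ELEM_CPE, ELEM_LFE },
    [7] = { ELEM_SCE, ELEM_CPE, ELEM_CPE, ELEM_CPE, ELEM_LFE },
};
static const int layout_len[8] = { 0, 1, 1, 2, 3, 3, 4, 5 };

/* Return the layout slot for the next element of `type`, or -1 if the
   configuration has no room for it. `seen` holds per-type counts for the
   current frame and must be zeroed at the start of each frame. */
static int map_element_positionally(int config, enum elem_type type,
                                    int instance_tag, int seen[3])
{
    int wanted = seen[type]; /* how many of this type came before */
    (void)instance_tag;      /* deliberately ignored */
    if (config < 1 || config > 7)
        return -1;
    for (int i = 0; i < layout_len[config]; i++) {
        if (layouts[config][i] != type)
            continue;
        if (wanted-- == 0) {
            seen[type]++;
            return i;
        }
    }
    return -1;
}
```

So in a 5.1 (configuration 6) frame, an SCE tagged 5 still lands in slot 0, and two CPEs tagged 3 and 0 land in slots 1 and 2 in the order they appear.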

One other thing worth noting is that I couldn't make iTunes produce Parametric Stereo effects on any of the seven signaling streams from Coding Technologies (CT). Supposedly iTunes has supported PS since version 9.2. Perhaps this is due to the use of pure upsampling SBR.

2010-08-31

Why you don't want to build your Chromium packages against a system copy of FFmpeg

Why you don't want to build your Chromium packages against a system copy of FFmpeg:

Chromium's internal FFmpeg copy is based on FFmpeg-mt. FFmpeg-mt is an experimental multithreaded branch of FFmpeg. It hasn't been deemed ready to merge into FFmpeg proper. Chromium only uses a subset of FFmpeg functionality and is thus less likely to experience regressions from using FFmpeg-mt. FFmpeg-mt is, however, much faster than vanilla FFmpeg on multicore systems.
Chromium's FFmpeg is heavily patched. Some of the patches allow for a checked get_bits() implementation. Better input buffer checking without the overhead of a checked get_bits() will probably require a major bump in libavcodec. Some simply remove code that happens to be dead in Chromium's FFmpeg but is useful in general. Some seem to have gotten stuck in patch hell.
Chromium makes use of new FFmpeg features very quickly. Chromium first used av_register_protocol2() just one month after it was added to FFmpeg. If you build Chromium against the system FFmpeg, either video playback breaks or the browser update is blocked until a system-wide FFmpeg update lands.
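
To make the checked get_bits() trade-off concrete, here is a hypothetical bounds-checked bit reader (the `checked_bits` struct and `cb_*` functions are illustrative, not FFmpeg's `GetBitContext` API). Every read pays for an end-of-buffer test, which is the per-call overhead an unchecked reader avoids by relying on padding at the end of the input buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader that never reads past the buffer. */
struct checked_bits {
    const uint8_t *buf;
    size_t size_in_bits;
    size_t index;   /* next bit to read */
    int overread;   /* set once a read runs past the end */
};

static void cb_init(struct checked_bits *gb, const uint8_t *buf, size_t size)
{
    gb->buf = buf;
    gb->size_in_bits = size * 8;
    gb->index = 0;
    gb->overread = 0;
}

/* Read n bits (0 <= n <= 32). Bits beyond the end read as zero and set
   the overread flag instead of touching memory past the buffer. */
static uint32_t cb_get_bits(struct checked_bits *gb, int n)
{
    uint32_t v = 0;
    for (int i = 0; i < n; i++) {
        uint32_t bit = 0;
        if (gb->index < gb->size_in_bits) {
            bit = (gb->buf[gb->index >> 3] >> (7 - (gb->index & 7))) & 1;
            gb->index++;
        } else {
            gb->overread = 1;
        }
        v = (v << 1) | bit;
    }
    return v;
}
```

A decoder built on this can parse a whole element and check `overread` once afterwards, rejecting the frame if it is set.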

If you are concerned about shipping FFmpeg, you can build Chromium against its internal libffmpegsumo and just not ship libffmpegsumo. Using the version number it is easy for a third party to reconstruct a libffmpegsumo that matches your Chromium build.

2010-07-11

AAC Verification

For a long time I've been using my ugly homespun aac-conf-tools to verify FFmpeg's decoder against the MPEG reference decoder over the ISO test vectors. This approach has one huge problem: it requires the non-free, unportable, and hard-to-build ISO reference software.

Luckily Mans Rullgard has come to my rescue and added off-by-one testing to FATE. This allows us to compare FFmpeg's output to predecoded streams. While migrating to this method, it seemed worthwhile to use the output streams provided by ISO rather than decoding ideal output on my system with FFmpeg or the reference decoder. In particular, I don't trust the sloppy reference code on a modern compiler.
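
Assuming "off-by-one" here means that each decoded 16-bit sample may differ from the stored reference by at most one LSB, the check amounts to the loop below. This is a hypothetical sketch, not FATE's actual comparison code; `first_mismatch()` and its `tolerance` parameter are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Return the index of the first sample that differs from the reference by
   more than `tolerance`, or -1 if every sample is within tolerance. */
static long first_mismatch(const int16_t *out, const int16_t *ref,
                           size_t n, int tolerance)
{
    for (size_t i = 0; i < n; i++) {
        int diff = abs((int)out[i] - (int)ref[i]);
        if (diff > tolerance)
            return (long)i;
    }
    return -1;
}
```

With a tolerance of 1, rounding differences between platforms pass, while a genuine decoding error still shows up at the first bad sample.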

However, this has caused several problems. Most importantly, it appears that the output for the al##/am## series starts 2048 samples late compared to the reference decoder. For now I've generated silence (for streams that open with silence) or decoded the first 2048 samples with the reference decoder (for streams that don't) and prepended it to those streams as appropriate.

Another problem is PNS (perceptual noise substitution), a tool added in MPEG-4 AAC. Instead of coding the noisy parts of the signal coefficient by coefficient, the encoder describes the noise (its energy per band) and the decoder regenerates it. FFmpeg generates the noise with a different RNG than the reference decoder, so our results differ from the reference rendering but still fall within the requirements for conformance.

For the time being I've added five tests to try to cover the bulk of AAC features. Let's look at the tests individually.

fate-aac-al04_44: This test covers AAC-LC mono at 44100 Hz with the following bitstream features: program config element, data stream element, pulse data, TNS, and window shape switching.
fate-aac-al07_96: This test covers AAC-LC 5.1 at 96000 Hz with the following bitstream features: program config element, intensity stereo, mid/side stereo, TNS, and dependent coupling.
fate-aac-am00_88: This test covers AAC-Main mono at 88200 Hz with the following bitstream features: program config element, window shape switching, and backwards prediction.
fate-aac-al_sbr_hq_cm_48_2: This test covers AAC-LC stereo at 24000 Hz + SBR with the following bitstream features: indexed channel configuration, mid/side stereo, TNS, window shape switching, pure upsampling SBR, and upsampled SBR synthesis.
fate-aac-al_sbr_ps_06_ur: This test covers AAC-LC mono at 16000 Hz + SBR + PS with the following bitstream features: program config element, window shape switching, pure upsampling SBR, upsampled SBR synthesis, PS IID data, PS ICC data, PS mixing mode A, and PS iid-/icc-mode.

Still missing are syntax element order switching, PNS (explained above), non-meaningful window transitions (which FFmpeg handles differently than the spec does), independent coupling, downsampled SBR, the detailed SBR tool tests, PS IPD/OPD, PS mixing mode B, other sampling frequencies including 7350 Hz (missing from the conformance suite), and HE-AAC signaling (the CT suite has its own problems). In addition, the unofficial extensions FFmpeg supports, such as relaxed channel ordering and 6 patches in SBR, are also not covered.

2010-06-27

Adding HE-AAC support to ffaacdec

Over the past eight months I've been working on adding HE-AACv1 and HE-AACv2 support to the FFmpeg AAC decoder (ffaacdec). This work is now complete. Rob Swain wrote much of the HE-AACv1 decoder and deserves his share of the credit for it.

Right now the de facto standard AAC decoder is libfaad2, which has supported decoding HE-AAC(v2) for quite some time. Faad has a few problems, though. Libfaad2 is pure GPL, which is less than ideal for some applications. Ffaacdec is 100% LGPL. (If you are taking advantage of this licensing distinction, please consider donating to fund further work.) Libfaad2 also reinvents the wheel in regard to several standard bitstream and signal processing blocks. As Ronald says (paraphrased): FFmpeg already has very fast wheels. This makes our decoder much faster on floating-point systems (like my Intel Core2).

HE-AACv1 (also known as aacPlus or AAC+) is the combination of MPEG-4 AAC-LC (which FFmpeg has supported natively since 2008) with a tool called Spectral Band Replication (SBR). SBR generates high-frequency signal components based on the low-frequency components and bitstream guidance. SBR uses a 32-band, 32-time-slot analysis QMF to generate a joint time-frequency representation of each AAC frame. It then generates 32 new high-frequency bands and transforms the result back to the time domain.
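
For the usual 1024-sample AAC-LC frame and upsampled SBR, the frame bookkeeping works out as below. The 64-band synthesis QMF is the standard dual-rate case; 960-sample frames and downsampled SBR give different numbers.

```latex
\begin{aligned}
N_{\mathrm{core}} &= 1024 && \text{core samples per frame} \\
L &= N_{\mathrm{core}} / 32 = 32 && \text{analysis QMF time slots} \\
N_{\mathrm{out}} &= 64 \cdot L = 2048 && \text{output samples (64-band synthesis QMF)} \\
f_{s,\mathrm{out}} &= 2 f_{s,\mathrm{core}} && \text{e.g. } 24000\ \mathrm{Hz} \to 48000\ \mathrm{Hz}
\end{aligned}
```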

HE-AACv2 (also known as aacPlusv2 or eAAC+) is the combination of mono HE-AAC with a tool called Parametric Stereo (PS). PS takes the mono SBR QMF output and generates left and right SBR QMF outputs. It does this by further splitting the QMF into 71 or 91 bands and using a few very small parameters to mix the input signal with a decorrelated variant of itself. The parameters it uses are interchannel intensity difference (IID), interchannel coherence (ICC), and phase rotations (IPD/OPD). The ICC mode also selects the mixing mode. PS also comes in a baseline flavor which always uses 71 bands, always uses mixing mode A, and does not use IPD/OPD.

A few implementation notes:

Right now the FFmpeg decoder supports unrestricted PS. It can emulate baseline behavior by setting the PS_BASELINE define to 1 in aacps.c. This does not, however, fully optimize for the baseline case.
It seems that mainstream encoders focus on the baseline feature set, especially in regard to IPD/OPD.
The SBR filterbank has been optimized on top of a DCT-IV, which in turn is implemented on top of the FFmpeg half IMDCT. This is based on a paper by Han-Wen Hsu, et al. Note that many of the final equations presented in the paper are obviously incorrect, but if you follow their derivation you should be able to arrive at the correct equations.
The PS filterbank remains (largely) unoptimized.
Coding Technologies (creators of SBR and PS) had an HE-AAC(v2) decoder test package focused mainly on signaling the presence of the SBR and PS extensions. While it disappeared after Dolby acquired CT, it is mirrored in the mplayer samples archive.
The SBR spec says the maximum number of patches allowed is 5, but the stream in the CT package has six. This leads me to believe that some version of the CT encoder didn't strictly enforce that limit, so decoders should be similarly lenient.
The reference software and the spec seem to disagree about when to apply IPD/OPD smoothing. The spec says to apply it only when IPD/OPD are enabled. The reference software applies it constantly. These may seem equivalent, but because of the 3-tap smoother the reference decoder actually applies smoothing to envelopes past the end of the IPD/OPD-enabled envelopes. For now FFmpeg follows the spec here.

Related future work:

MPEG Surround is a more flexible multichannel coding scheme based on PS.
Allow the user to optionally decode without SBR and/or PS. In addition, 3GPP specifies an SBR-domain downmix that should be used if a mono channel configuration is requested.
Add support for less popular AAC variants to the decoder including LTP, LD, and 960.
The current experimental FFmpeg AAC-LC encoder is buggy and suboptimal.
