Spectral Hole: 2010-07

As has been well established, the new international gaming sensation StarCraft 2 uses the Theora codec to compress its prerendered video content. I've selected a few stills from one of the cinematics to look at the quality of the result. Maybe they just didn't use a high enough bitrate, but these stills look subpar to me (though much better than the Smacker cutscenes from StarCraft 1).

Input #0, ogg, from 'cinematic_thedream.ogv':
  Duration: 00:02:45.70, start: 0.000000, bitrate: 3984 kb/s
    Stream #0.0: Data: skeleton
    Stream #0.1: Video: theora, yuv420p, 1280x720, 24 fps, 24 tbr, 24 tbn, 24 tbc
    Stream #0.2: Audio: vorbis, 44100 Hz, stereo, s16, 160 kb/s
    Metadata:
      ENCODER         : ffmpeg2theora-0.24
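To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures from the stream dump above. It assumes that ffmpeg's kb/s means 1000 bits per second and that everything that isn't audio is video (container overhead is ignored), so treat the results as approximations.

```python
# Figures copied from the ffmpeg stream dump above.
DURATION_S = 2 * 60 + 45.70      # 00:02:45.70
TOTAL_KBPS = 3984                # overall bitrate, kb/s
AUDIO_KBPS = 160                 # Vorbis stream, kb/s
WIDTH, HEIGHT, FPS = 1280, 720, 24


def file_size_mb(total_kbps, duration_s):
    """Approximate file size in megabytes (10**6 bytes)."""
    return total_kbps * 1000 * duration_s / 8 / 1e6


def video_bits_per_pixel(total_kbps, audio_kbps, width, height, fps):
    """Rough video bits per pixel, assuming all non-audio bits are video."""
    video_bps = (total_kbps - audio_kbps) * 1000
    return video_bps / (width * height * fps)


size_mb = file_size_mb(TOTAL_KBPS, DURATION_S)
bpp = video_bits_per_pixel(TOTAL_KBPS, AUDIO_KBPS, WIDTH, HEIGHT, FPS)
print(f"approx. size: {size_mb:.1f} MB")
print(f"approx. video bits per pixel: {bpp:.3f}")
```

That works out to roughly 82.5 MB for this one cinematic and about 0.17 bits per pixel for the video.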

The thumbnails link to full size stills.

Frame 484
This looks very blocky to me.

Frame 548
There seems to be ringing around Kerrigan's body, especially her legs.

Frame 1072
More blocking.

I'm not saying that these cutscenes are necessarily representative of Theora's top quality. I merely think we should take the quality of the result into consideration when scoring this as a victory for Theora. Perhaps the cutscenes should have been encoded at a higher quality at the expense of releasing on Blu-ray or multiple DVDs, or some more content should have been pushed off the disc into the release-day patch. If they were absolutely stuck with this amount of space for cutscenes, I would have gladly paid a few extra cents for H.264 cutscenes.

For a long time I've been using my ugly homespun aac-conf-tools to verify FFmpeg's AAC decoder against the MPEG reference decoder over the ISO test vectors. This approach has one huge problem: it requires the non-free, unportable, and hard-to-build ISO reference software.

Luckily Mans Rullgard has come to my rescue and added off-by-one testing to FATE. This allows us to compare FFmpeg's output to predecoded streams. While migrating to this method it seemed worthwhile to use the output streams provided by ISO rather than decode ideal output on my system with FFmpeg or the reference decoder. In particular I don't trust the sloppy reference code on a modern compiler.
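To make the idea of an off-by-one comparison concrete, here is a minimal Python sketch. It is not FATE's actual implementation; the function names are made up for illustration, and it assumes both streams are raw signed 16-bit little-endian PCM of equal length.

```python
import struct


def max_abs_diff(pcm_a: bytes, pcm_b: bytes) -> int:
    """Largest absolute sample difference between two s16le PCM buffers."""
    if len(pcm_a) != len(pcm_b) or len(pcm_a) % 2:
        raise ValueError("buffers must have equal, even length")
    count = len(pcm_a) // 2
    a = struct.unpack(f"<{count}h", pcm_a)
    b = struct.unpack(f"<{count}h", pcm_b)
    return max((abs(x - y) for x, y in zip(a, b)), default=0)


def within_one(pcm_a: bytes, pcm_b: bytes) -> bool:
    """True if no sample differs by more than one LSB."""
    return max_abs_diff(pcm_a, pcm_b) <= 1


# Toy data standing in for a predecoded reference and a decoder's output.
reference = struct.pack("<4h", 0, 100, -100, 32767)
close_enough = struct.pack("<4h", 1, 99, -100, 32766)
too_far = struct.pack("<4h", 0, 100, -98, 32767)

print(within_one(reference, close_enough))
print(within_one(reference, too_far))
```

The first comparison passes because every sample is within one step of the reference; the second fails because one sample is off by two.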

However, this has caused several problems. Most importantly, it appears that the output for the al##/am## series starts 2048 samples late compared to the reference decoder. For now I've generated silence (for streams that open with silence) or decoded the first 2048 samples with the reference decoder (for streams that don't) and prepended it to those streams as appropriate.

PNS is another issue. The PNS (perceptual noise substitution) tool added in MPEG-4 AAC takes noisy parts of the signal and, instead of coding them directly, describes the noise so that the decoder can regenerate it. FFmpeg uses an RNG to generate the noise that is different from the reference decoder's, so our results differ from the reference rendering but still fall within the requirements for conformance.
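As a toy illustration of why PNS output can't be compared sample by sample, here is a small Python sketch. It is not the spec's or FFmpeg's algorithm: `pns_band` and `target_energy` are hypothetical names, and a real decoder derives the band energy from the bitstream. The only point is that two different noise generators give different samples with the same energy.

```python
import math
import random


def pns_band(rng: random.Random, band_len: int, target_energy: float) -> list:
    """Fill one band with random noise scaled to the requested energy."""
    noise = [rng.uniform(-1.0, 1.0) for _ in range(band_len)]
    energy = sum(x * x for x in noise)
    if energy == 0.0:
        return [0.0] * band_len
    scale = math.sqrt(target_energy / energy)
    return [x * scale for x in noise]


def band_energy(band) -> float:
    """Sum of squared coefficients in a band."""
    return sum(x * x for x in band)


# Two differently seeded generators stand in for two decoders' RNGs.
band_a = pns_band(random.Random(1), 16, 4.0)
band_b = pns_band(random.Random(2), 16, 4.0)

print(band_a != band_b)
print(math.isclose(band_energy(band_a), band_energy(band_b)))
```

Both bands carry the same energy, but a direct sample comparison between them would fail.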

For the time being I've added five tests to try to cover the bulk of AAC features. Let's look at the tests individually.

fate-aac-al04_44: This test covers AAC-LC mono at 44100 Hz with the following bitstream features: program config element, data stream element, pulse data, TNS, and window shape switching.
fate-aac-al07_96: This test covers AAC-LC 5.1 at 96000 Hz with the following bitstream features: program config element, intensity stereo, mid/side stereo, TNS, and dependent coupling.
fate-aac-am00_88: This test covers AAC-Main mono at 88200 Hz with the following bitstream features: program config element, window shape switching, and backwards prediction.
fate-aac-al_sbr_hq_cm_48_2: This test covers AAC-LC stereo at 24000 Hz + SBR with the following bitstream features: indexed channel configuration, mid/side stereo, TNS, window shape switching, pure upsampling SBR, and upsampled SBR synthesis.
fate-aac-al_sbr_ps_06_ur: This test covers AAC-LC mono at 16000 Hz + SBR + PS with the following bitstream features: program config element, window shape switching, pure upsampling SBR, upsampled SBR synthesis, PS IID data, PS ICC data, PS mixing mode A, and PS iid-/icc-mode.

Things that are missing are syntax element order switching, PNS (explained above), non-meaningful window transitions (which FFmpeg handles differently than the spec does), independent coupling, downsampled SBR, the detailed SBR tool tests, PS IPD/OPD, PS mixing mode B, other sampling frequencies including 7350 Hz (missing from the conformance suite), and HE-AAC signaling (the CT suite has its own problems). In addition, the unofficial extensions FFmpeg supports, like relaxed channel ordering and 6 patches in SBR, are also missing.


2010-07-28

StarCraft 2 Cutscenes

2010-07-11

AAC Verification
