Spectral Hole

2012-05-02

High Priority Libav Projects

After reading Kostya's High Priority Libav Projects, I thought I'd make my own list:

Tools:

avprobe compact output: I'd like to see compact tabular output for avprobe -show_packets that can be easily and quickly glanced over by a human.
avprobe machine readable output: I'd like to see avprobe export data in a standard format well understood by other tools, like JSON, rather than the current pseudo-ini format.
avconv audio enhancements: avconv should skip resampling and remixing when decoding to null.

Filters:

Audio filtering system: I'd like to see an audio filtering system that does remixing, resampling, nyquist commands, sound touch effects, auto gain control, arbitrary FIR and IIR filtering.

Testing:

probe-test: Each year dozens of changes are made to the probing functions, fixing handfuls of files while breaking others. This happens particularly often in the four way war between AC3, MP3, MPEG-PS, and MPEG-TS. There should be a comprehensive probe test to make sure these tweaks don't break other files. Every new tweak should be accompanied by a new file in the probe test that didn't work before the tweak and does work after.
Regular automated fuzzing: ClusterFuzz, GCI, and a variety of other projects have found lots of crashers and hangs in libav that have been fixed. We should do large scale automated fuzzing on a regular basis, similar to fate, to find when new code and changes are making the code more vulnerable.
Better code coverage by fate: Our non-error path code coverage is really abysmal.

Encoders:

AAC Encoder: Libav has a not so great AAC encoder and library wrappers to several not so great AAC encoder libraries. We need some way to provide high quality AAC encoding support to the user. At this point the situation is so bad I think that standardized wrappers to proprietary vendor encoders and to encoder SDKs would be a huge step forward.

Decoders:

Improved VC-1 decoding: I've had the misfortune of seeing dozens and dozens of Windows Media files that are identified as corrupt by Expression Encoder and show noticeable artifacting with Libav, but play perfectly in WMP.
GoToMeeting: I'd love to see decoders for GoToMeeting 2, 3, and 4. G2M4 doesn't even play nice with the MPlayer DLL loader.
Apple Intermediate Codec: While it is becoming less relevant year over year, AIC based QuickTime intermediate files continue to pop up regularly.
MP3Pro: While many multimedia hackers claim to never have seen MP3Pro files in the wild, a reasonable number of 22050 Hz MP3s featuring wide band content are in fact MP3Pro. I'll admit a bit of my interest in MP3Pro is academic curiosity about the history of SBR.
VoxWare Metasound: Once upon a time this codec did seem to have some popularity.
Theora cropping: Modern Theora streams seem to have the annoying habit of centering the picture when its size is not a multiple of 16 rather than placing the picture in the top left. It would be nice if we supported this cropping scheme.
Opus: Opus is the most baller emerging audio codec of this era. I'm a little bit jealous of the GSoC student working on it.
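
The centered layout described in the Theora cropping item can be sketched in a few lines. This is only an illustration of the arithmetic: CropRect, align16, and centered_crop are hypothetical names, not Libav or libtheora API, and the sketch assumes the offsets are measured from the top left of the coded frame.

```c
#include <assert.h>

/* Hypothetical description of where the visible picture sits
 * inside the coded frame. */
typedef struct {
    int coded_w, coded_h; /* coded size, padded up to a multiple of 16 */
    int off_x, off_y;     /* top-left corner of the visible picture */
} CropRect;

/* Round a dimension up to the next multiple of 16. */
static int align16(int v)
{
    return (v + 15) & ~15;
}

/* Centered placement: split the padding evenly on both sides.
 * Top-left placement would simply use off_x = off_y = 0. */
static CropRect centered_crop(int pic_w, int pic_h)
{
    CropRect r;
    assert(pic_w > 0 && pic_h > 0);
    r.coded_w = align16(pic_w);
    r.coded_h = align16(pic_h);
    r.off_x   = (r.coded_w - pic_w) / 2;
    r.off_y   = (r.coded_h - pic_h) / 2;
    return r;
}
```

For a 1920x1080 picture this gives a 1920x1088 coded frame with a vertical offset of 4, where a decoder that assumes top-left placement would crop from offset 0.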

Demuxers:

Multistream PCM in MOV and MXF: Many professional MOV (and MXF) files use multiple mono PCM tracks for multichannel audio rather than a single multichannel track. It would be nice if we detected and supported these files as first class citizens.
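
As a rough sketch of what treating such files as first class citizens could mean on the sample level, the function below merges several mono tracks into one interleaved multichannel buffer. The name interleave_mono_s16 is hypothetical, and the sketch assumes signed 16-bit samples and tracks of equal length. It says nothing about how a demuxer would detect the tracks or assign a channel layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave n_tracks mono buffers, each holding n_samples samples,
 * into out, which must have room for n_tracks * n_samples samples.
 * Track k becomes channel k of the output. */
static void interleave_mono_s16(const int16_t *const *tracks,
                                size_t n_tracks,
                                size_t n_samples,
                                int16_t *out)
{
    assert(tracks != NULL && out != NULL);
    for (size_t i = 0; i < n_samples; i++)
        for (size_t ch = 0; ch < n_tracks; ch++)
            out[i * n_tracks + ch] = tracks[ch][i];
}
```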

Muxers:

Low overhead MPEG-TS output for HLS streaming.
Better MP4 profile support in iods: The MP4 muxer should be able to tell what profile and level to put into the iods atom automatically when encoding AAC and H.264.

Some things that I'm less excited about

VP4 and VP7: These codecs never seemed to hit critical mass. MPEG-2 and H.264 will be around forever regardless of what the freetard detractors say. No one seems to miss VP4 and 7 except for On2 enthusiasts and Skype packet snoopers.
More game codecs: Game codecs seem to be less robust than general purpose codecs: I'd rather not expand the attack surface to support a handful of files used by single games. Popular game codecs like Smacker and Bink are fine, as are codecs used by popular moddable games. Between Bink and Xiph codecs, we seem to cover prerendered assets for most current mainstream games.

2011-09-15

Why an Open Source ProRes Decoder Matters

ProRes 422 is Apple's lossy, high quality video compression format. It is the native format of Apple's popular Final Cut Pro video editing software. ProRes is also the format requested for HD Television and HD and SD Film to be delivered to the iTunes Store.

Apple favors its own OS X when releasing ProRes. Apple updated the ProRes decoder for Mac OS X in June 2011, and Final Cut Pro X, available only for Macintosh, provides a 64-bit ProRes implementation. The Windows download is 32-bit only and stuck in 2008 (not to mention the sorry state of QuickTime for Windows in general), and there is no ProRes decoder for Linux despite the growing cloud transcoding industry. Some Linux users have even resorted to loading a Windows DLL into MPlayer and MEncoder to handle ProRes content. There do appear to be a small handful of third party products that by hook or by crook include ProRes support.

Still, Apple products and tools are the only first class citizens of the ProRes ecosystem, or were until now. Today an intrepid hacker released an open source reverse engineered ProRes decoder for FFmpeg. Now everyone trying to ingest ProRes files is on an equal footing. The iTunes Store's unfair ingestion advantage over Hulu, Netflix, YouTube, and the cloud transcoding industry is over. The days of loading a 32-bit Windows DLL are over. ProRes can now be decoded on ARM, in a 64-bit process, or on any platform with a C compiler. Hopefully someday this decoder will be extended into an encoder as well, to end Apple's advantage in authoring ProRes and not just consuming it.

These opinions are strictly mine and not those of my employer (YouTube).

2011-01-13

FFmpeg AAC Decoder Todo List

Seeing as I resigned from being the FFmpeg AAC maintainer, here is my remaining decoder todo list for those interested in hacking on the decoder:

Add gapless and preroll support. This requires some libavformat integration but the lossless codecs must at least have some sort of gapless support. Support for the iTunSMPB tag is a must.
Optimize the SBR windowing. The actual transform should be fairly optimal. See the implementation notes from my first blog post.
Optimize the parametric stereo filterbank. There is a lot of redundant math in there.
Add fixed point support.
Consider a more conservative approach to SBR and PS detection. (If there is no signaling and the sample rate is less than or equal to 24 kHz assume SBR. If SBR is used and the stream is mono assume PS).
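
The detection rule in the last item can be sketched as a pair of small decision functions. This is an illustration only: Signal, StreamInfo, assume_sbr, and assume_ps are hypothetical names that do not come from the FFmpeg decoder, and honoring explicit signaling before falling back to the heuristic is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tri-state: was the tool signaled explicitly, and how? */
typedef enum { SIG_NONE = -1, SIG_OFF = 0, SIG_ON = 1 } Signal;

/* Hypothetical summary of what the stream configuration says. */
typedef struct {
    int    sample_rate; /* core AAC sample rate in Hz */
    int    channels;    /* number of coded channels */
    Signal sbr;         /* explicit SBR signaling, if any */
    Signal ps;          /* explicit PS signaling, if any */
} StreamInfo;

/* No signaling and a sample rate of 24 kHz or less: assume SBR. */
static bool assume_sbr(const StreamInfo *s)
{
    assert(s != NULL);
    if (s->sbr != SIG_NONE)
        return s->sbr == SIG_ON;
    return s->sample_rate <= 24000;
}

/* SBR in use and a mono stream: assume PS. */
static bool assume_ps(const StreamInfo *s)
{
    assert(s != NULL);
    if (!assume_sbr(s))
        return false;
    if (s->ps != SIG_NONE)
        return s->ps == SIG_ON;
    return s->channels == 1;
}
```

With these rules a 22050 Hz mono stream without signaling is treated as SBR plus PS, while a 44100 Hz stream without signaling is treated as plain AAC.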

2010-12-19

Android's Stagefright AAC Encoder or Reference Code by Any Other Name Would Smell as Sweet

Android and AAC

As readers of this blog know, AAC encoding holds a spot close to my heart. I'm responsible for getting the worst AAC encoder of all time included into FFmpeg and was never successful at fixing it.

The recent Android 2.3 "Gingerbread" release contains an Apache licensed AAC encoder as part of Android's new stagefright media library. I was excited that the free software community may finally have a good free AAC encoder. Unfortunately the stagefright AAC encoder seems to be little more than an optimized version of 3GPP's 26.411 fixed point reference encoder.

Origins

Looking at the source tree there is an immediate resemblance between the 3GPP code and the stagefright code:

$ ls 26411-900/26411-900-ANSI-C_source_code/3GPP_enhanced_aacPlus_etsiopsrc_200907/ETSI_aacPlusenc/etsiop_fastaacenc/src
aac_ram.c       bitenc.c        interface.c          psy_main.c   stat_bits.h
aac_ram.h       bitenc.h        interface.h          psy_main.h   stprepro.c
aac_rom.c       block_switch.c  line_pe.c            qc_data.h    stprepro.h
aac_rom.h       block_switch.h  line_pe.h            qc_main.c    tns.c
aacenc.c        channel_map.c   ms_stereo.c          qc_main.h    tns.h
adj_thr.c       channel_map.h   ms_stereo.h          quantize.c   tns_func.h
adj_thr.h       dyn_bits.c      pre_echo_control.c   quantize.h   tns_param.c
adj_thr_data.h  dyn_bits.h      pre_echo_control.h   sf_estim.c   tns_param.h
band_nrg.c      fft.c           psy_configuration.c  sf_estim.h   transform.c
band_nrg.h      fft.h           psy_configuration.h  spreading.c  transform.h
bit_cnt.c       grp_data.c      psy_const.h          spreading.h
bit_cnt.h       grp_data.h      psy_data.h           stat_bits.c

$ ls base/media/libstagefright/codecs/aacenc/src/
aac_rom.c      bitbuffer.c     line_pe.c            quantize.c
aacenc.c       bitenc.c        memalign.c           sf_estim.c
aacenc_core.c  block_switch.c  ms_stereo.c          spreading.c
adj_thr.c      channel_map.c   pre_echo_control.c   stat_bits.c
asm/           dyn_bits.c      psy_configuration.c  tns.c
band_nrg.c     grp_data.c      psy_main.c           transform.c
bit_cnt.c      interface.c     qc_main.c

As you can see, almost all the files in stagefright aacenc are named identically to files in 26.403-v9.0.0. Still, both are fixed point AAC encoders, and thus it is reasonable to expect similar names. Let's use Warren Toomy's Ctcompare tool to compare content.

./ctcompare -r 3gpp.ctf stagefright.ctf
5473  libstagefright/codecs/aacenc/src/aac_rom.c:1507-2262  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:701-1459
2270  libstagefright/codecs/aacenc/src/aac_rom.c:1044-1338  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:293-587
523  libstagefright/codecs/aacenc/basic_op/oper_32b.c:270-345  ETSI_aacPlusenc/etsiop_ffrlib/src/transcendent_enc.c:13-85
523  libstagefright/codecs/aacenc/basic_op/oper_32b.c:270-345  ETSI_aacPlusdec/etsiop_ffrlib/src/transcendent_enc.c:13-85
491  libstagefright/codecs/aacenc/src/bitenc.c:398-540  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:407-553
279  libstagefright/codecs/aacenc/src/aac_rom.c:1368-1403  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:593-628
243  libstagefright/codecs/aacenc/inc/aac_rom.h:65-95  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.h:38-69
218  libstagefright/codecs/aacenc/inc/bit_cnt.h:28-106  ETSI_aacPlusenc/etsiop_fastaacenc/src/bit_cnt.h:9-87
210  libstagefright/codecs/aacenc/src/aac_rom.c:2260-2347  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:1572-1659
199  libstagefright/codecs/aacenc/basic_op/oper_32b.c:42-179  ETSI_aacPlusenc/etsioplib/oper_32b.c:45-182
199  libstagefright/codecs/aacenc/basic_op/oper_32b.c:42-179  ETSI_aacPlusdec/etsioplib/oper_32b.c:45-182
196  libstagefright/codecs/aacenc/src/sf_estim.c:831-881  ETSI_aacPlusenc/etsiop_fastaacenc/src/sf_estim.c:768-809
194  libstagefright/codecs/aacenc/inc/tns.h:31-108  ETSI_aacPlusenc/etsiop_fastaacenc/src/tns.h:12-89
190  libstagefright/codecs/aacenc/src/psy_main.c:424-451  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:387-414
165  libstagefright/codecs/aacenc/inc/qc_data.h:23-92  ETSI_aacPlusenc/etsiop_fastaacenc/src/qc_data.h:4-73
157  libstagefright/codecs/aacenc/inc/tns_func.h:31-75  ETSI_aacPlusenc/etsiop_fastaacenc/src/tns_func.h:10-54
157  libstagefright/codecs/aacenc/src/tns.c:64-96  ETSI_aacPlusenc/etsiop_fastaacenc/src/tns.c:34-65
156  libstagefright/codecs/aacenc/inc/dyn_bits.h:23-80  ETSI_aacPlusenc/etsiop_fastaacenc/src/dyn_bits.h:4-61
147  libstagefright/codecs/aacenc/src/dyn_bits.c:305-351  ETSI_aacPlusenc/etsiop_fastaacenc/src/dyn_bits.c:319-363
145  libstagefright/codecs/aacenc/src/psy_main.c:607-656  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:559-600
139  libstagefright/codecs/aacenc/src/psy_main.c:397-414  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:359-376
131  libstagefright/codecs/aacenc/src/psy_main.c:381-397  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:342-358
130  libstagefright/codecs/aacenc/src/aac_rom.c:1393-1401  ETSI_aacPlusdec/etsiop_aacdec/src/aac_rom.c:928-936
128  libstagefright/codecs/aacenc/src/block_switch.c:81-110  ETSI_aacPlusenc/etsiop_fastaacenc/src/block_switch.c:53-75
126  libstagefright/codecs/aacenc/src/psy_main.c:42-78  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:24-60
123  libstagefright/codecs/aacenc/src/stat_bits.c:28-55  ETSI_aacPlusenc/etsiop_fastaacenc/src/stat_bits.c:10-32
122  libstagefright/codecs/aacenc/inc/psy_const.h:28-76  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_const.h:9-55
119  libstagefright/codecs/aacenc/src/bitenc.c:639-664  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:632-657
119  libstagefright/codecs/aacenc/src/bitenc.c:337-396  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:339-403
118  libstagefright/codecs/aacenc/src/block_switch.c:33-77  ETSI_aacPlusenc/etsiop_fastaacenc/src/block_switch.c:14-49
113  libstagefright/codecs/aacenc/inc/psy_data.h:23-66  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_data.h:4-47
110  libstagefright/codecs/aacenc/src/bitenc.c:620-639  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:613-632
109  libstagefright/codecs/aacenc/src/psy_main.c:382-393  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:361-372
109  libstagefright/codecs/aacenc/src/psy_main.c:399-410  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:343-354
108  libstagefright/codecs/aacenc/inc/qc_data.h:93-136  ETSI_aacPlusenc/etsiop_fastaacenc/src/qc_data.h:73-116
105  libstagefright/codecs/aacenc/basic_op/typedef.h:23-63  ETSI_aacPlusdec/etsioplib/typedef.h:15-55
105  libstagefright/codecs/aacenc/basic_op/typedef.h:23-63  ETSI_aacPlusenc/etsioplib/typedef.h:15-55
104  libstagefright/codecs/aacenc/inc/adj_thr_data.h:30-69  ETSI_aacPlusenc/etsiop_fastaacenc/src/adj_thr_data.h:13-52
103  libstagefright/codecs/aacenc/src/aac_rom.c:1489-1491  ETSI_aacPlusdec/etsiop_aacdec/src/aac_rom.c:72-79
102  libstagefright/codecs/aacenc/inc/block_switch.h:30-62  ETSI_aacPlusenc/etsiop_fastaacenc/src/block_switch.h:12-44
...

The top four matches, in aac_rom.c and oper_32b.c, are all data tables. Since both are fixed point AAC encoders, a lot of the data may be required by both, but I'd expect to see some tables wind up in a different order or slightly modified to be more useful to a particular implementation. Instead the files have large sections of identical data and identical comments. In fact the first match in both files opens with:

/*
  these tables are used only for counting and 
  are stored in packed format
*/

This is followed by a series of tables with identical names and formatting down to the spaces. Ignoring these tables, bitenc.c, tns*, psy_main.c, and sf_estim.c all have huge portions of identical code and comments.

108  ../base/media/libstagefright/codecs/aacenc/inc/qc_data.h:93-136  ../26411-900/26411-900-ANSI-C_source_code/3GPP_enhanced_aacPlus_etsiopsrc_200907/ETSI_aacPlusenc/etsiop_fastaacenc/src/qc_data.h:73-116
  Word16          staticBitsUsed; /* for verification purposes */
  Word16          dynBitsUsed;    /* for verification purposes */
  Word16          pe;
  Word16          ancBitsUsed;
  Word16          fillBits;
} QC_OUT_ELEMENT;

typedef struct
{
  QC_OUT_CHANNEL  qcChannel[MAX_CHANNELS];
  QC_OUT_ELEMENT  qcElement;
  Word16          totStaticBitsUsed; /* for verification purposes */
  Word16          totDynBitsUsed;    /* for verification purposes */
  Word16          totAncBitsUsed;    /* for verification purposes */
  Word16          totFillBits;
  Word16          alignBits;
  Word16          bitResTot;
  Word16          averageBitsTot;
} QC_OUT;

typedef struct {
  Word32 chBitrate;
  Word16 averageBits;               /* brutto -> look ancillary.h */
  Word16 maxBits;
  Word16 bitResLevel;
  Word16 maxBitResBits;
  Word16 relativeBits;            /* Bits relative to total Bits scaled down by 2 */
} ELEMENT_BITS;

typedef struct
{
  /* this is basically struct QC_INIT */
  Word16 averageBitsTot;
  Word16 maxBitsTot;
  Word16 globStatBits;
  Word16 nChannels;
  Word16 bitResTot;

  Word16 maxBitFac;

  PADDING   padding;

  ELEMENT_BITS  elementBits;
  ADJ_THR_STATE adjThr;
=====================================
  Word16          staticBitsUsed; /* for verification purposes */
  Word16          dynBitsUsed;    /* for verification purposes */
  Word16          pe;
  Word16          ancBitsUsed;
  Word16          fillBits;
} QC_OUT_ELEMENT;

typedef struct
{
  QC_OUT_CHANNEL  qcChannel[MAX_CHANNELS];
  QC_OUT_ELEMENT  qcElement;
  Word16          totStaticBitsUsed; /* for verification purposes */
  Word16          totDynBitsUsed;    /* for verification purposes */
  Word16          totAncBitsUsed;    /* for verification purposes */
  Word16          totFillBits;
  Word16          alignBits;
  Word16          bitResTot;
  Word16          averageBitsTot;
} QC_OUT;

typedef struct {
  Word32 chBitrate;
  Word16 averageBits;               /* brutto -> look ancillary.h */
  Word16 maxBits;
  Word16 bitResLevel;
  Word16 maxBitResBits;
  Word16 relativeBits;            /* Bits relative to total Bits scaled down by 2 */
} ELEMENT_BITS;

typedef struct
{
  /* this is basically struct QC_INIT */
  Word16 averageBitsTot;
  Word16 maxBitsTot;
  Word16 globStatBits;
  Word16 nChannels;
  Word16 bitResTot;

  Word16 maxBitFac;

  PADDING   padding;

  ELEMENT_BITS  elementBits;
  ADJ_THR_STATE adjThr;

Right there we can see large runs of identical declarations, down to the typedef'd names and comments. As far as actual code goes, and not just declarations, consider these three functions from stagefright's tns.c:

/**
*
* function name: TnsDetect
* description:  Calculate TNS filter and decide on TNS usage 
* returns:  0 if success
*
*/
Word32 TnsDetect(TNS_DATA* tnsData,        /*!< tns data structure (modified) */
                 TNS_CONFIG tC,            /*!< tns config structure */
                 Word32* pScratchTns,      /*!< pointer to scratch space */
                 const Word16 sfbOffset[], /*!< scalefactor size and table */
                 Word32* spectrum,         /*!< spectral data */
                 Word16 subBlockNumber,    /*!< subblock num */
                 Word16 blockType,         /*!< blocktype (long or short) */
                 Word32 * sfbEnergy)       /*!< sfb-wise energy */
{

  Word32  predictionGain;
  Word32  temp;
  Word32* pWork32 = &pScratchTns[subBlockNumber >> 8];
  Word16* pWeightedSpectrum = (Word16 *)&pScratchTns[subBlockNumber >> 8];

                                                                                                    
  if (tC.tnsActive) {
    CalcWeightedSpectrum(spectrum,
                         pWeightedSpectrum,
                         sfbEnergy,
                         sfbOffset,
                         tC.lpcStartLine,
                         tC.lpcStopLine,
                         tC.lpcStartBand,
                         tC.lpcStopBand,
                         pWork32);

    temp = blockType - SHORT_WINDOW;                                                          
    if ( temp != 0 ) {
        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        tC.lpcStopLine - tC.lpcStartLine,
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsLong.subBlockInfo.parcor);


        temp = predictionGain - tC.threshold;                                                  
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 1;                                      
        }
        else {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                      
        }

        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = predictionGain;                      
    }
    else{

        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        tC.lpcStopLine - tC.lpcStartLine,
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor);

        temp = predictionGain - tC.threshold;                                                 
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 1;                     
        }
        else {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                     
        }

        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = predictionGain;     
    }

  }
  else{

    temp = blockType - SHORT_WINDOW;                                                          
    if ( temp != 0 ) {
        tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                        
        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = 0;                                   
    }
    else {
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                       
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = 0;                  
    }
  }

  return(0);
}


/*****************************************************************************
*
* function name: TnsSync
* description: update tns parameter
*
*****************************************************************************/
void TnsSync(TNS_DATA *tnsDataDest,
             const TNS_DATA *tnsDataSrc,
             const TNS_CONFIG tC,
             const Word16 subBlockNumber,
             const Word16 blockType)
{
   TNS_SUBBLOCK_INFO *sbInfoDest;
   const TNS_SUBBLOCK_INFO *sbInfoSrc;
   Word32 i, temp;

   temp =  blockType - SHORT_WINDOW;                                                           
   if ( temp != 0 ) {
      sbInfoDest = &tnsDataDest->dataRaw.tnsLong.subBlockInfo;                                      
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsLong.subBlockInfo;                                       
   }
   else {
      sbInfoDest = &tnsDataDest->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                     
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                      
   }

   if (100*abs_s(sbInfoDest->predictionGain - sbInfoSrc->predictionGain) <
       (3 * sbInfoDest->predictionGain)) {
      sbInfoDest->tnsActive = sbInfoSrc->tnsActive;                                                 
      for ( i=0; i< tC.maxOrder; i++) {
        sbInfoDest->parcor[i] = sbInfoSrc->parcor[i];                                               
      }
   }
}

/*****************************************************************************
*
* function name: TnsEncode
* description: do TNS filtering
* returns:     0 if success
*
*****************************************************************************/
Word16 TnsEncode(TNS_INFO* tnsInfo,     /*!< tns info structure (modified) */
                 TNS_DATA* tnsData,     /*!< tns data structure (modified) */
                 Word16 numOfSfb,       /*!< number of scale factor bands */
                 TNS_CONFIG tC,         /*!< tns config structure */
                 Word16 lowPassLine,    /*!< lowpass line */
                 Word32* spectrum,      /*!< spectral data (modified) */
                 Word16 subBlockNumber, /*!< subblock num */
                 Word16 blockType)      /*!< blocktype (long or short) */
{
  Word32 i;
  Word32 temp_s;
  Word32 temp;
  TNS_SUBBLOCK_INFO *psubBlockInfo;

  temp_s = blockType - SHORT_WINDOW;                                                             
  if ( temp_s != 0) {                                                                               
    psubBlockInfo = &tnsData->dataRaw.tnsLong.subBlockInfo;
 if (psubBlockInfo->tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                       
      return(0);
    }
    else {

      Parcor2Index(psubBlockInfo->parcor,
                   tnsInfo->coef,
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(tnsInfo->coef,
                   psubBlockInfo->parcor,
                   tC.maxOrder,
                   tC.coefRes);

      for (i=tC.maxOrder - 1; i>=0; i--)  {
        temp = psubBlockInfo->parcor[i] - TNS_PARCOR_THRESH;         
        if ( temp > 0 )
          break;
        temp = psubBlockInfo->parcor[i] + TNS_PARCOR_THRESH;         
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = i + 1;                                                    


      tnsInfo->tnsActive[subBlockNumber] = 1;                                                       
      for (i=subBlockNumber+1; i<TRANS_FAC; i++) {
        tnsInfo->tnsActive[i] = 0;
      }
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                                
      tnsInfo->length[subBlockNumber] = numOfSfb - tC.tnsStartBand;                                 


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]),
                            (min(tC.tnsStopLine,lowPassLine) - tC.tnsStartLine),
                            psubBlockInfo->parcor,
                            tnsInfo->order[subBlockNumber],
                            &(spectrum[tC.tnsStartLine]));

    }
  }     /* if (blockType!=SHORT_WINDOW) */
  else /*short block*/ {                                                                            
    psubBlockInfo = &tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber];
 if (psubBlockInfo->tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                       
      return(0);
    }
    else {

      Parcor2Index(psubBlockInfo->parcor,
                   &tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(&tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   psubBlockInfo->parcor,
                   tC.maxOrder,
                   tC.coefRes);
      for (i=(tC.maxOrder - 1); i>=0; i--)  {
        temp = psubBlockInfo->parcor[i] - TNS_PARCOR_THRESH;    
         if ( temp > 0 )
          break;

        temp = psubBlockInfo->parcor[i] + TNS_PARCOR_THRESH;    
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = i + 1;                                                    

      tnsInfo->tnsActive[subBlockNumber] = 1;                                                       
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                                
      tnsInfo->length[subBlockNumber] = numOfSfb - tC.tnsStartBand;                             


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]), (tC.tnsStopLine - tC.tnsStartLine),
                 psubBlockInfo->parcor,
                 tnsInfo->order[subBlockNumber],
                 &(spectrum[tC.tnsStartLine]));

    }
  }

  return(0);
}

Here are the same three functions, in the same order and with nearly identical implementations, in the 3GPP code:

/*!
  \brief   Calculate TNS filter and decide on TNS usage 
  \return  zero
*/
Word32 TnsDetect(TNS_DATA* tnsData,        /*!< tns data structure (modified) */
                 TNS_CONFIG tC,            /*!< tns config structure */
                 Word32* pScratchTns,      /*!< pointer to scratch space */
                 const Word16 sfbOffset[], /*!< scalefactor size and table */
                 Word32* spectrum,         /*!< spectral data */
                 Word16 subBlockNumber,    /*!< subblock num */
                 Word16 blockType,         /*!< blocktype (long or short) */
                 Word32 * sfbEnergy)       /*!< sfb-wise energy */
{

  Word16  predictionGain;
  Word16  temp;
  Word32* pWork32 = &pScratchTns[mult(subBlockNumber,FRAME_LEN_SHORT)];
  Word16* pWeightedSpectrum = (Word16 *)&pScratchTns[mult(subBlockNumber,FRAME_LEN_SHORT)];

                                                                                                   test();
  if (tC.tnsActive) {
    CalcWeightedSpectrum(spectrum,
                         pWeightedSpectrum,
                         sfbEnergy,
                         sfbOffset,
                         tC.lpcStartLine,
                         tC.lpcStopLine,
                         tC.lpcStartBand,
                         tC.lpcStopBand,
                         pWork32);

    temp = sub( blockType, SHORT_WINDOW );                                                         test();
    if ( temp != 0 ) {
        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        sub(tC.lpcStopLine,tC.lpcStartLine),
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsLong.subBlockInfo.parcor);


        temp = sub( predictionGain, tC.threshold );                                                test(); 
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 1;                                     move16();
        }
        else {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                     move16();
        }

        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = predictionGain;                     move16();
    }
    else{

        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        sub(tC.lpcStopLine, tC.lpcStartLine),
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor);

        temp = sub( predictionGain, tC.threshold );                                                test();
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 1;                    move16();
        }
        else {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                    move16();
        }

        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = predictionGain;    move16();
    }

  }
  else{

    temp = sub( blockType, SHORT_WINDOW );                                                         test();
    if ( temp != 0 ) {
        tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                       move16();
        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = 0;                                  move16();
    }
    else {
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                      move16();
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = 0;                 move16();
    }
  }

  return(0);
}


/*****************************************************************************
    functionname: TnsSync
    description:

*****************************************************************************/
void TnsSync(TNS_DATA *tnsDataDest,
             const TNS_DATA *tnsDataSrc,
             const TNS_CONFIG tC,
             const Word16 subBlockNumber,
             const Word16 blockType)
{
   TNS_SUBBLOCK_INFO *sbInfoDest;
   const TNS_SUBBLOCK_INFO *sbInfoSrc;
   Word16 i, temp;

   temp = sub( blockType, SHORT_WINDOW );                                                          test();
   if ( temp != 0 ) {
      sbInfoDest = &tnsDataDest->dataRaw.tnsLong.subBlockInfo;                                     move32();
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsLong.subBlockInfo;                                      move32();
   }
   else {
      sbInfoDest = &tnsDataDest->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                    move32();
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                     move32();
   }

   if (100*abs(sbInfoDest->predictionGain - sbInfoSrc->predictionGain) <
       (3 * sbInfoDest->predictionGain)) {
      sbInfoDest->tnsActive = sbInfoSrc->tnsActive;                                                move16();
      for ( i=0; i< tC.maxOrder; i++) {
        sbInfoDest->parcor[i] = sbInfoSrc->parcor[i];                                              move32();
      }
   }
}


/*!
  \brief    do TNS filtering
  \return   zero
*/
Word16 TnsEncode(TNS_INFO* tnsInfo,     /*!< tns info structure (modified) */
                 TNS_DATA* tnsData,     /*!< tns data structure (modified) */
                 Word16 numOfSfb,       /*!< number of scale factor bands */
                 TNS_CONFIG tC,         /*!< tns config structure */
                 Word16 lowPassLine,    /*!< lowpass line */
                 Word32* spectrum,      /*!< spectral data (modified) */
                 Word16 subBlockNumber, /*!< subblock num */
                 Word16 blockType)      /*!< blocktype (long or short) */
{
  Word16 i;
  Word16 temp_s;
  Word32 temp;

  temp_s = sub(blockType,SHORT_WINDOW);                                                            test();
  if ( temp_s != 0) {                                                                              test();
    if (tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                      move16();
      return(0);
    }
    else {

      Parcor2Index(tnsData->dataRaw.tnsLong.subBlockInfo.parcor,
                   tnsInfo->coef,
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(tnsInfo->coef,
                   tnsData->dataRaw.tnsLong.subBlockInfo.parcor,
                   tC.maxOrder,
                   tC.coefRes);

      for (i=sub(tC.maxOrder,1); i>=0; i--)  {
        temp = L_sub( tnsData->dataRaw.tnsLong.subBlockInfo.parcor[i], TNS_PARCOR_THRESH );        test();
        if ( temp > 0 )
          break;
        temp = L_add( tnsData->dataRaw.tnsLong.subBlockInfo.parcor[i], TNS_PARCOR_THRESH );        test();
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = add(i,1);                                                   move16();


      tnsInfo->tnsActive[subBlockNumber] = 1;                                                      move16();
      for (i=add(subBlockNumber,1); i<TRANS_FAC; i++) {
        tnsInfo->tnsActive[i] = 0;                                                                 move16();
      }
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                               move16();
      tnsInfo->length[subBlockNumber] = sub(numOfSfb,tC.tnsStartBand);                             move16();   


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]),
                            sub(S_min(tC.tnsStopLine,lowPassLine),tC.tnsStartLine),
                            tnsData->dataRaw.tnsLong.subBlockInfo.parcor,
                            tnsInfo->order[subBlockNumber],
                            &(spectrum[tC.tnsStartLine]));

    }
  }     /* if (blockType!=SHORT_WINDOW) */
  else /*short block*/ {                                                                           test();
    if (tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                      move16();
      return(0);
    }
    else {

      Parcor2Index(tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor,
                   &tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(&tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor,
                   tC.maxOrder,
                   tC.coefRes);
      for (i=sub(tC.maxOrder,1); i>=0; i--)  {
        temp = L_sub( tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor[i], TNS_PARCOR_THRESH );   test();
         if ( temp > 0 )
          break;

        temp = L_add( tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor[i], TNS_PARCOR_THRESH );   test();
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = add(i,1);                                                   move16();

      tnsInfo->tnsActive[subBlockNumber] = 1;                                                      move16();
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                               move16();
      tnsInfo->length[subBlockNumber] = sub(numOfSfb, tC.tnsStartBand);                            move16();


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]), sub(tC.tnsStopLine,tC.tnsStartLine),
                 tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor,
                 tnsInfo->order[subBlockNumber],
                 &(spectrum[tC.tnsStartLine]));

    }
  }

  return(0);
}

I think at this point you'd have to be a crazy person not to see that the Stagefright AAC encoder is derived from 26.411.

Licensing

Nowhere do VisualOn and Android seem to acknowledge that their encoder is derived from the 3GPP reference encoder. The only copyright headers on the Stagefright encoder are:

/*
 ** Copyright 2003-2010, VisualOn, Inc.
 **
 ** Licensed under the Apache License, Version 2.0 (the "License");
 ** you may not use this file except in compliance with the License.
 ** You may obtain a copy of the License at
 **
 **     http://www.apache.org/licenses/LICENSE-2.0
 **
 ** Unless required by applicable law or agreed to in writing, software
 ** distributed under the License is distributed on an "AS IS" BASIS,
 ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 ** See the License for the specific language governing permissions and
 ** limitations under the License.
 */

The licensing of the 3GPP encoder is somewhat ambiguous on its own. There is no mention of copyright in the 3GPP bundle except for:

Copyright Notification
No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media

on the cover of the documentation, and this function in the decoder:

static void display_copyright_message(void)
{
  fprintf(stderr,"\n");
  fprintf(stderr,"*************************************************************\n");
  fprintf(stderr,"* Enhanced aacPlus 3GPP ETSI-op Reference Decoder\n");
  fprintf(stderr,"* Build %s, %s\n", __DATE__, __TIME__);
  fprintf(stderr,"*\n");
  fprintf(stderr,"*************************************************************\n\n");
}

It seems pretty clear to me that 3GPP intends for its reference code to be used, but the terms of such use are unknown. The 3GPP code was provided by a company called Coding Technologies. Coding Technologies has since been acquired by Dolby and is now called Dolby International. Dolby isn't the most open source friendly company out there.

Some 3GPP source files bear a similarity to MPEG reference source files, which carry the notice:

/**********************************************************************
 
SC 29 Software Copyright Licencing Disclaimer:

This software module was originally developed by


and edited by


in the course of development of the ISO/IEC 13818-7 and ISO/IEC 14496-3 
standards for reference purposes and its performance may not have been 
optimized. This software module is an implementation of one or more tools as 
specified by the ISO/IEC 13818-7 and ISO/IEC 14496-3 standards.
ISO/IEC gives users free license to this software module or modifications 
thereof for use in products claiming conformance to audiovisual and 
image-coding related ITU Recommendations and/or ISO/IEC International 
Standards. ISO/IEC gives users the same free license to this software module or 
modifications thereof for research purposes and further ISO/IEC standardisation.
Those intending to use this software module in products are advised that its 
use may infringe existing patents. ISO/IEC have no liability for use of this 
software module or modifications thereof. Copyright is not released for 
products that do not conform to audiovisual and image-coding related ITU 
Recommendations and/or ISO/IEC International Standards.
The original developer retains full right to modify and use the code for its 
own purpose, assign or donate the code to a third party and to inhibit third 
parties from using the code for products that do not conform to audiovisual and 
image-coding related ITU Recommendations and/or ISO/IEC International Standards.
This copyright notice must be included in all copies or derivative works.
Copyright (c) ISO/IEC 1997.
**********************************************************************/

"Copyright is not released for products that do not conform to audiovisual and image-coding related ITU Recommendations and/or ISO/IEC International Standards" is generally viewed as the problematic clause in this license. This clause was problematic for LAME before they rewrote the last of the dist10 reference code and has made FAAC undistributable. To put this code under an Apache license, you would need to track down the copyright holders named at the top and ask them to relicense. However, Dolby clearly can relicense the code it owns and the code CT owns without asking anyone.

Community

From a community standpoint, my mind was initially boggled as to why the documentation is marked Proprietary & Confidential. Line endings are mixed and there is plenty of trailing whitespace in the source. This sort of thing wouldn't fly in many large open source projects. Then I realized they simply bought an encoder from one of the many companies selling optimized multimedia reference code. It was probably cheaper than having employees write their own or port the reference code themselves. Still, the license on their documentation doesn't give me much faith in their attention to detail on such matters.

Maybe a community effort can build on top of this, like opencore-amr did with the AMR code from earlier Android releases, and we can finally have a decent OSS AAC encoder, but I'm not going to hold my breath. Punting code over a wall usually isn't a good strategy for building community.

2010-11-29

In Defense of Lossy+Lossless Hybrid Audio Codecs

There have been many complaints leveled against lossy+lossless hybrid audio codecs lately. A lossy+lossless codec is one that codes a lossy layer and has an optional enhancement layer to bring the quality up to lossless. Some examples of hybrid lossy+lossless audio formats are HD-AAC (MPEG-4 SLS), mp3HD, and WavPack. WavPack and MPEG-4 SLS can also run in pure lossless mode.

Lossy+lossless audio codecs do have many shortcomings. They require a canonical bit-exact version of the lossy decoder. This doesn't align well with modern popular pure lossy codecs. They (generally, WavPack does not) require a completely different coding scheme for the lossless enhancement layer that has similar complexity to a pure lossless coder on its own. They need to keep the lossless correction layers synchronized to the lossy file either on the file system or with special tools to attach and separate them. The sum filesize of both the lossy and lossless content is generally larger than what you would get using a pure lossless coder. All of these things certainly serve as barriers to adoption and good reasons to avoid lossy+lossless codecs should you not need their particular benefits.

The biggest benefit of lossy+lossless is that if you need to have a lossy copy as well as a lossless copy the sum space consumed will be less than having a lossy copy and a pure lossless copy (assuming you are using a good lossy+lossless codec).

The reason you might want a lossy copy rather than transcoding on the fly is for syncing a portable device where the whole library doesn't fit. In that case the user may swap music in and out on a regular basis. Transcoding 72 hours of music (72 hours * 128 kbps = 3.96 GB) well is a slow process, even on a fast machine.
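The size estimate above can be reproduced with a few lines of C. This is only a sketch: the function name `lossy_copy_bytes` and the `bits_per_kbit` parameter are made up for illustration, and the parameter exists because the result depends on whether a kilobit is taken as 1000 or 1024 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store `hours` of audio at a constant `kbps`.
 * `bits_per_kbit` selects the unit convention: 1000 (SI) or 1024 (binary).
 * Hypothetical helper, for illustration only. */
uint64_t lossy_copy_bytes(uint64_t hours, uint64_t kbps, uint64_t bits_per_kbit)
{
    assert(bits_per_kbit == 1000 || bits_per_kbit == 1024);
    /* seconds * bits per second, then bits to bytes */
    return hours * 3600u * kbps * bits_per_kbit / 8u;
}
```

With binary units, `lossy_copy_bytes(72, 128, 1024)` gives 4,246,732,800 bytes, and dividing by 2^30 lands at roughly 3.96. With SI units, the same library comes out at about 4.15 GB.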

I do recognize that this is a niche use case and that lossy+lossless adds a lot of complexity to the system. As storage on portables keeps getting larger and larger this advantage seems less and less important.

I think if anybody could have popularized such a codec it would have been Apple during the heyday of the iPod shuffle. iPods use a database with a syncing tool (iTunes) rather than USB mass storage making the single file solution an obvious choice. Remuxing on the fly is much faster than transcoding on the fly. But seeing as they haven't adopted lossy+lossless yet, I doubt they will bother with it at all.

2010-09-20

AAC Channel Model Revisited

Recently I wrote about the AAC channel model. Since then I tested a variety of real world broken files (synthetically recreated) with the reference decoder, faad, WinAMP, iTunes, and the Microsoft Windows 7 Decoder. I've placed the results in the multimedia wiki. I can't seem to get the CSS right to display it here.

Looking at the results of the other decoders I managed to fix bad_concat while removing special hacks for streams like elem_id0 with a new approach in FFmpeg. FFmpeg now treats streams with PCEs according to the letter of the channel configuration in the PCE. For streams lacking a PCE, element instance tags are now ignored and channels are treated purely positionally.

One other thing worth noting is that I couldn't make iTunes produce Parametric Stereo effects on any of the seven signalling streams from CT. Supposedly iTunes has supported PS since version 9.2. Perhaps it is due to the use of pure upsampling SBR.

AAC Bitstream Flaws Part 2: AAC-960, Zero Sized Sections, and ADIF

AAC-960

AAC has a 1024 MDCT samples per frame variant and a 960 MDCT samples per frame variant. You are probably using the 1024 samples per frame variant. Most applications including FFmpeg, WinAMP, and iTunes don't support AAC-960. However until very recently the spec required the AAC/HE-AAC/HE-AACv2 profiles to support both 960 and 1024 variants.

Hilariously nobody seemed to notice this issue until April 2008 when MPEG working document M15437: "proposed clarification for ISO/IEC 14496-4" was published.

Dolby explains the problem much better than I can in MPEG working document M15641: "further considerations regarding the 960 transform length in the AAC profile, the HE AAC profile and the HE AAC v2 profile":

The MPEG-4 AAC-LC AOT can operate in different flavours, either with a frame length of 1024 samples or with a frame length of 960 samples. The AAC profile, the High Efficiency AAC profile, and the High Efficiency AAC v2 profile do not impose any restriction on the frame length that is used.
Typically application standards which build on one of these profiles use only one frame length (either 1024 or 960) and do not require support for both. An incomplete list of application standards that build on these profiles can be found in the Annex.
Implementers of decoders that comply with these profiles were not able to test their implementations for conformance until early this year when conformance sequences for the AAC AOT with a frame length of 960 samples have been made available. Such test sequences are still missing for the SBR AOT and the PS AOT.
There are popular players around that do not support the 960 frame length. Examples of these would be winamp, iTunes or the iPod.
The code from 3GPP is a very popular reference implementation that has been used many times as a starting point for a port to embedded devices. This code does not support the 960 frame length. As a result embedded players based on this code lack support for the 960 frame length.
One important part of the MPEG-4 reference software (mp4mcdec) does not support the 960 frame length. Instead it disregards the frameLengthFlag and consequently plays back at wrong speed.
A profile defines a subset of features that gives content producers the certainty that their content will play on all devices that support the profile. On the other hand there is a high amount of devices that do not comply with the profile by not supporting the 960 frame length, which makes using the 960 frame length a bad choice for creating content.

The solution chosen was to partition the AAC/HE-AAC/HE-AACv2 profiles into regular (1024) and 960 variants. This was clearly the best solution. Still, the AAC-960 profile doesn't strike me as particularly useful. It makes short windows only 16 samples (2 ms at 8 kHz) shorter. It makes long windows 128 samples shorter (16 ms at 8 kHz), with each frame carrying 64 fewer samples; however, there are Low Delay AAC variants that attack this problem in a more comprehensive manner.
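The arithmetic behind these figures can be checked with a small sketch. It assumes the usual AAC MDCT layout, in which a long window spans twice the frame length and a short window is one eighth of a long one. All function names here are hypothetical.

```c
#include <assert.h>

/* Samples per frame: 1024 for regular AAC, 960 for the AAC-960 variant. */
int frame_len(int is_960) { return is_960 ? 960 : 1024; }

/* A long MDCT window covers two frames' worth of samples. */
int long_window_len(int is_960) { return 2 * frame_len(is_960); }

/* Eight short windows take the place of one long window. */
int short_window_len(int is_960) { return long_window_len(is_960) / 8; }

/* Duration of `samples` at `sample_rate_hz`, in microseconds. */
long samples_to_us(int samples, int sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (long)samples * 1000000L / sample_rate_hz;
}
```

Under these assumptions, the short window shrinks from 256 to 240 samples and the long window from 2048 to 1920. At 8 kHz that is 2 ms and 16 ms respectively, while the frame itself is 64 samples (8 ms) shorter.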

Zero Sized Sections

AAC scalefactor bands are sectioned into groups of adjacent bands that use the same Huffman codebook. Each section is written into the bitstream as a 4-bit field giving the literal codebook number, followed by the section length (3 bits for short windows and 5 bits for long windows, with an all-bits-set escape mechanism for longer sections). This is repeated until all the scalefactor bands up to the maximum scalefactor band have codebooks assigned. Sections with a size of zero are allowed. In theory you can have as many zero sized sections as you want (until you hit the frame size cap in bits). Now combine this with an all zero buffer (like a get_bits() implementation that returns zeros when the input buffer is exhausted) and you have a recipe for an infinite loop (thanks to the Google Chrome team for finding this).

Now this isn't the end of the world; a buffer exhaustion check can be added to the sectioning loop to fix this. But these zero sized sections serve no constructive purpose. Zero sized sections could have been forbidden. Better yet, the size minus one could have been coded, or the zero value could have been used for the escape mechanism.
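To make the failure mode concrete, here is a minimal sketch of a sectioning loop with the exhaustion check in place. It is not FFmpeg's actual code: `BitReader`, `read_bits`, and `parse_sections` are hypothetical names, and the sketch handles a single window group. The reader deliberately returns zeros past the end of the buffer, as described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy MSB-first bit reader; reads past the end yield zero bits. */
typedef struct {
    const uint8_t *buf;
    size_t size_bits;
    size_t pos_bits;
} BitReader;

unsigned read_bits(BitReader *br, int n)
{
    unsigned v = 0;
    while (n-- > 0) {
        unsigned bit = 0;
        if (br->pos_bits < br->size_bits)
            bit = (br->buf[br->pos_bits >> 3] >> (7 - (br->pos_bits & 7))) & 1u;
        br->pos_bits++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Assign a codebook to each of max_sfb bands in one window group.
 * Returns 0 on success, -1 on exhausted input or an overlong section. */
int parse_sections(BitReader *br, int is_short, int max_sfb, uint8_t *band_cb)
{
    const int len_bits = is_short ? 3 : 5;
    const unsigned esc = (1u << len_bits) - 1;  /* all bits set */
    int k = 0;

    assert(max_sfb >= 0);
    while (k < max_sfb) {
        unsigned cb, incr;
        int len = 0;

        /* The fix: without this check, zero sized sections read from an
         * exhausted buffer never advance k and the loop never ends. */
        if (br->pos_bits >= br->size_bits)
            return -1;

        cb = read_bits(br, 4);
        do {
            incr = read_bits(br, len_bits);
            len += (int)incr;
            if (len > max_sfb - k)
                return -1;
        } while (incr == esc);

        while (len-- > 0)
            band_cb[k++] = (uint8_t)cb;
    }
    return 0;
}
```

A zero sized section in the middle of otherwise valid data is still accepted here, since the syntax allows it. Only running off the end of the input turns into an error.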

ADIF

While not technically AAC, this encapsulation format does appear in the AAC specification as an annex and is AAC specific. It is formatted as a single header up front followed by raw concatenated variable length AAC frames with no additional framing information. People seem to like to use ADIF when it is not an appropriate choice. It has no error resilience and no seekability due to the way it is framed. Worse yet, the ADIF global header is not compatible with the MPEG-4 global header (Audio Specific Config), when the MPEG-4 global header could have been used here. That would have been a great way to test Audio Specific Config parsing without a full MPEG-4 demuxer. (I invented my own ADIF-like format for this very purpose.) At least the reason for the different header format is probably a legacy MPEG-2 issue.