2010-12-19

Android's Stagefright AAC Encoder or Reference Code by Any Other Name Would Smell as Sweet

Android and AAC

As readers of this blog know, AAC encoding holds a spot close to my heart. I'm responsible for getting the worst AAC encoder of all time included in FFmpeg and was never successful at fixing it.

The recent Android 2.3 "Gingerbread" release contains an Apache-licensed AAC encoder as part of Android's new stagefright media library. I was excited that the free software community might finally have a good free AAC encoder. Unfortunately, the stagefright AAC encoder seems to be little more than an optimized version of 3GPP's 26.411 fixed-point reference encoder.

Origins

Looking at the two source trees, there is an immediate resemblance between the 3GPP code and the stagefright code:

$ ls 26411-900/26411-900-ANSI-C_source_code/3GPP_enhanced_aacPlus_etsiopsrc_200907/ETSI_aacPlusenc/etsiop_fastaacenc/src
aac_ram.c       bitenc.c        interface.c          psy_main.c   stat_bits.h
aac_ram.h       bitenc.h        interface.h          psy_main.h   stprepro.c
aac_rom.c       block_switch.c  line_pe.c            qc_data.h    stprepro.h
aac_rom.h       block_switch.h  line_pe.h            qc_main.c    tns.c
aacenc.c        channel_map.c   ms_stereo.c          qc_main.h    tns.h
adj_thr.c       channel_map.h   ms_stereo.h          quantize.c   tns_func.h
adj_thr.h       dyn_bits.c      pre_echo_control.c   quantize.h   tns_param.c
adj_thr_data.h  dyn_bits.h      pre_echo_control.h   sf_estim.c   tns_param.h
band_nrg.c      fft.c           psy_configuration.c  sf_estim.h   transform.c
band_nrg.h      fft.h           psy_configuration.h  spreading.c  transform.h
bit_cnt.c       grp_data.c      psy_const.h          spreading.h
bit_cnt.h       grp_data.h      psy_data.h           stat_bits.c

$ ls base/media/libstagefright/codecs/aacenc/src/
aac_rom.c      bitbuffer.c     line_pe.c            quantize.c
aacenc.c       bitenc.c        memalign.c           sf_estim.c
aacenc_core.c  block_switch.c  ms_stereo.c          spreading.c
adj_thr.c      channel_map.c   pre_echo_control.c   stat_bits.c
asm/           dyn_bits.c      psy_configuration.c  tns.c
band_nrg.c     grp_data.c      psy_main.c           transform.c
bit_cnt.c      interface.c     qc_main.c

As you can see, almost all the files in the stagefright aacenc directory are named identically to files in 26.411 v9.0.0. Still, both are fixed-point AAC encoders, so it is reasonable to expect similar names. Let's use Warren Toomey's ctcompare tool to compare the contents.

$ ./ctcompare -r 3gpp.ctf stagefright.ctf
5473  libstagefright/codecs/aacenc/src/aac_rom.c:1507-2262  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:701-1459
2270  libstagefright/codecs/aacenc/src/aac_rom.c:1044-1338  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:293-587
523  libstagefright/codecs/aacenc/basic_op/oper_32b.c:270-345  ETSI_aacPlusenc/etsiop_ffrlib/src/transcendent_enc.c:13-85
523  libstagefright/codecs/aacenc/basic_op/oper_32b.c:270-345  ETSI_aacPlusdec/etsiop_ffrlib/src/transcendent_enc.c:13-85
491  libstagefright/codecs/aacenc/src/bitenc.c:398-540  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:407-553
279  libstagefright/codecs/aacenc/src/aac_rom.c:1368-1403  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:593-628
243  libstagefright/codecs/aacenc/inc/aac_rom.h:65-95  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.h:38-69
218  libstagefright/codecs/aacenc/inc/bit_cnt.h:28-106  ETSI_aacPlusenc/etsiop_fastaacenc/src/bit_cnt.h:9-87
210  libstagefright/codecs/aacenc/src/aac_rom.c:2260-2347  ETSI_aacPlusenc/etsiop_fastaacenc/src/aac_rom.c:1572-1659
199  libstagefright/codecs/aacenc/basic_op/oper_32b.c:42-179  ETSI_aacPlusenc/etsioplib/oper_32b.c:45-182
199  libstagefright/codecs/aacenc/basic_op/oper_32b.c:42-179  ETSI_aacPlusdec/etsioplib/oper_32b.c:45-182
196  libstagefright/codecs/aacenc/src/sf_estim.c:831-881  ETSI_aacPlusenc/etsiop_fastaacenc/src/sf_estim.c:768-809
194  libstagefright/codecs/aacenc/inc/tns.h:31-108  ETSI_aacPlusenc/etsiop_fastaacenc/src/tns.h:12-89
190  libstagefright/codecs/aacenc/src/psy_main.c:424-451  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:387-414
165  libstagefright/codecs/aacenc/inc/qc_data.h:23-92  ETSI_aacPlusenc/etsiop_fastaacenc/src/qc_data.h:4-73
157  libstagefright/codecs/aacenc/inc/tns_func.h:31-75  ETSI_aacPlusenc/etsiop_fastaacenc/src/tns_func.h:10-54
157  libstagefright/codecs/aacenc/src/tns.c:64-96  ETSI_aacPlusenc/etsiop_fastaacenc/src/tns.c:34-65
156  libstagefright/codecs/aacenc/inc/dyn_bits.h:23-80  ETSI_aacPlusenc/etsiop_fastaacenc/src/dyn_bits.h:4-61
147  libstagefright/codecs/aacenc/src/dyn_bits.c:305-351  ETSI_aacPlusenc/etsiop_fastaacenc/src/dyn_bits.c:319-363
145  libstagefright/codecs/aacenc/src/psy_main.c:607-656  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:559-600
139  libstagefright/codecs/aacenc/src/psy_main.c:397-414  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:359-376
131  libstagefright/codecs/aacenc/src/psy_main.c:381-397  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:342-358
130  libstagefright/codecs/aacenc/src/aac_rom.c:1393-1401  ETSI_aacPlusdec/etsiop_aacdec/src/aac_rom.c:928-936
128  libstagefright/codecs/aacenc/src/block_switch.c:81-110  ETSI_aacPlusenc/etsiop_fastaacenc/src/block_switch.c:53-75
126  libstagefright/codecs/aacenc/src/psy_main.c:42-78  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:24-60
123  libstagefright/codecs/aacenc/src/stat_bits.c:28-55  ETSI_aacPlusenc/etsiop_fastaacenc/src/stat_bits.c:10-32
122  libstagefright/codecs/aacenc/inc/psy_const.h:28-76  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_const.h:9-55
119  libstagefright/codecs/aacenc/src/bitenc.c:639-664  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:632-657
119  libstagefright/codecs/aacenc/src/bitenc.c:337-396  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:339-403
118  libstagefright/codecs/aacenc/src/block_switch.c:33-77  ETSI_aacPlusenc/etsiop_fastaacenc/src/block_switch.c:14-49
113  libstagefright/codecs/aacenc/inc/psy_data.h:23-66  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_data.h:4-47
110  libstagefright/codecs/aacenc/src/bitenc.c:620-639  ETSI_aacPlusenc/etsiop_fastaacenc/src/bitenc.c:613-632
109  libstagefright/codecs/aacenc/src/psy_main.c:382-393  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:361-372
109  libstagefright/codecs/aacenc/src/psy_main.c:399-410  ETSI_aacPlusenc/etsiop_fastaacenc/src/psy_main.c:343-354
108  libstagefright/codecs/aacenc/inc/qc_data.h:93-136  ETSI_aacPlusenc/etsiop_fastaacenc/src/qc_data.h:73-116
105  libstagefright/codecs/aacenc/basic_op/typedef.h:23-63  ETSI_aacPlusdec/etsioplib/typedef.h:15-55
105  libstagefright/codecs/aacenc/basic_op/typedef.h:23-63  ETSI_aacPlusenc/etsioplib/typedef.h:15-55
104  libstagefright/codecs/aacenc/inc/adj_thr_data.h:30-69  ETSI_aacPlusenc/etsiop_fastaacenc/src/adj_thr_data.h:13-52
103  libstagefright/codecs/aacenc/src/aac_rom.c:1489-1491  ETSI_aacPlusdec/etsiop_aacdec/src/aac_rom.c:72-79
102  libstagefright/codecs/aacenc/inc/block_switch.h:30-62  ETSI_aacPlusenc/etsiop_fastaacenc/src/block_switch.h:12-44
...
The top four matches, in aac_rom.c and oper_32b.c, are all data tables. Since both are fixed-point AAC encoders, a lot of the data may be required by both, but I'd expect some tables to wind up in a different order or slightly modified to better suit a particular implementation. Instead, the files have large sections of identical data and identical comments. In fact, the first match in both files opens with:
/*
  these tables are used only for counting and 
  are stored in packed format
*/

This is followed by a series of tables with identical names and formatting down to the spaces. Ignoring these tables, bitenc.c, tns*, psy_main.c, and sf_estim.c all have huge portions of identical code and comments.

108  ../base/media/libstagefright/codecs/aacenc/inc/qc_data.h:93-136  ../26411-900/26411-900-ANSI-C_source_code/3GPP_enhanced_aacPlus_etsiopsrc_200907/ETSI_aacPlusenc/etsiop_fastaacenc/src/qc_data.h:73-116
  Word16          staticBitsUsed; /* for verification purposes */
  Word16          dynBitsUsed;    /* for verification purposes */
  Word16          pe;
  Word16          ancBitsUsed;
  Word16          fillBits;
} QC_OUT_ELEMENT;

typedef struct
{
  QC_OUT_CHANNEL  qcChannel[MAX_CHANNELS];
  QC_OUT_ELEMENT  qcElement;
  Word16          totStaticBitsUsed; /* for verification purposes */
  Word16          totDynBitsUsed;    /* for verification purposes */
  Word16          totAncBitsUsed;    /* for verification purposes */
  Word16          totFillBits;
  Word16          alignBits;
  Word16          bitResTot;
  Word16          averageBitsTot;
} QC_OUT;

typedef struct {
  Word32 chBitrate;
  Word16 averageBits;               /* brutto -> look ancillary.h */
  Word16 maxBits;
  Word16 bitResLevel;
  Word16 maxBitResBits;
  Word16 relativeBits;            /* Bits relative to total Bits scaled down by 2 */
} ELEMENT_BITS;

typedef struct
{
  /* this is basically struct QC_INIT */
  Word16 averageBitsTot;
  Word16 maxBitsTot;
  Word16 globStatBits;
  Word16 nChannels;
  Word16 bitResTot;

  Word16 maxBitFac;

  PADDING   padding;

  ELEMENT_BITS  elementBits;
  ADJ_THR_STATE adjThr;
=====================================
  Word16          staticBitsUsed; /* for verification purposes */
  Word16          dynBitsUsed;    /* for verification purposes */
  Word16          pe;
  Word16          ancBitsUsed;
  Word16          fillBits;
} QC_OUT_ELEMENT;

typedef struct
{
  QC_OUT_CHANNEL  qcChannel[MAX_CHANNELS];
  QC_OUT_ELEMENT  qcElement;
  Word16          totStaticBitsUsed; /* for verification purposes */
  Word16          totDynBitsUsed;    /* for verification purposes */
  Word16          totAncBitsUsed;    /* for verification purposes */
  Word16          totFillBits;
  Word16          alignBits;
  Word16          bitResTot;
  Word16          averageBitsTot;
} QC_OUT;

typedef struct {
  Word32 chBitrate;
  Word16 averageBits;               /* brutto -> look ancillary.h */
  Word16 maxBits;
  Word16 bitResLevel;
  Word16 maxBitResBits;
  Word16 relativeBits;            /* Bits relative to total Bits scaled down by 2 */
} ELEMENT_BITS;

typedef struct
{
  /* this is basically struct QC_INIT */
  Word16 averageBitsTot;
  Word16 maxBitsTot;
  Word16 globStatBits;
  Word16 nChannels;
  Word16 bitResTot;

  Word16 maxBitFac;

  PADDING   padding;

  ELEMENT_BITS  elementBits;
  ADJ_THR_STATE adjThr;

Right there we can see large runs of identical declarations, down to the typedef'd names and comments. As for actual code rather than just declarations, consider these three functions from stagefright's tns.c:

/**
*
* function name: TnsDetect
* description:  Calculate TNS filter and decide on TNS usage 
* returns:  0 if success
*
*/
Word32 TnsDetect(TNS_DATA* tnsData,        /*!< tns data structure (modified) */
                 TNS_CONFIG tC,            /*!< tns config structure */
                 Word32* pScratchTns,      /*!< pointer to scratch space */
                 const Word16 sfbOffset[], /*!< scalefactor size and table */
                 Word32* spectrum,         /*!< spectral data */
                 Word16 subBlockNumber,    /*!< subblock num */
                 Word16 blockType,         /*!< blocktype (long or short) */
                 Word32 * sfbEnergy)       /*!< sfb-wise energy */
{

  Word32  predictionGain;
  Word32  temp;
  Word32* pWork32 = &pScratchTns[subBlockNumber >> 8];
  Word16* pWeightedSpectrum = (Word16 *)&pScratchTns[subBlockNumber >> 8];

                                                                                                    
  if (tC.tnsActive) {
    CalcWeightedSpectrum(spectrum,
                         pWeightedSpectrum,
                         sfbEnergy,
                         sfbOffset,
                         tC.lpcStartLine,
                         tC.lpcStopLine,
                         tC.lpcStartBand,
                         tC.lpcStopBand,
                         pWork32);

    temp = blockType - SHORT_WINDOW;                                                          
    if ( temp != 0 ) {
        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        tC.lpcStopLine - tC.lpcStartLine,
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsLong.subBlockInfo.parcor);


        temp = predictionGain - tC.threshold;                                                  
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 1;                                      
        }
        else {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                      
        }

        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = predictionGain;                      
    }
    else{

        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        tC.lpcStopLine - tC.lpcStartLine,
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor);

        temp = predictionGain - tC.threshold;                                                 
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 1;                     
        }
        else {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                     
        }

        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = predictionGain;     
    }

  }
  else{

    temp = blockType - SHORT_WINDOW;                                                          
    if ( temp != 0 ) {
        tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                        
        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = 0;                                   
    }
    else {
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                       
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = 0;                  
    }
  }

  return(0);
}


/*****************************************************************************
*
* function name: TnsSync
* description: update tns parameter
*
*****************************************************************************/
void TnsSync(TNS_DATA *tnsDataDest,
             const TNS_DATA *tnsDataSrc,
             const TNS_CONFIG tC,
             const Word16 subBlockNumber,
             const Word16 blockType)
{
   TNS_SUBBLOCK_INFO *sbInfoDest;
   const TNS_SUBBLOCK_INFO *sbInfoSrc;
   Word32 i, temp;

   temp =  blockType - SHORT_WINDOW;                                                           
   if ( temp != 0 ) {
      sbInfoDest = &tnsDataDest->dataRaw.tnsLong.subBlockInfo;                                      
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsLong.subBlockInfo;                                       
   }
   else {
      sbInfoDest = &tnsDataDest->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                     
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                      
   }

   if (100*abs_s(sbInfoDest->predictionGain - sbInfoSrc->predictionGain) <
       (3 * sbInfoDest->predictionGain)) {
      sbInfoDest->tnsActive = sbInfoSrc->tnsActive;                                                 
      for ( i=0; i< tC.maxOrder; i++) {
        sbInfoDest->parcor[i] = sbInfoSrc->parcor[i];                                               
      }
   }
}

/*****************************************************************************
*
* function name: TnsEncode
* description: do TNS filtering
* returns:     0 if success
*
*****************************************************************************/
Word16 TnsEncode(TNS_INFO* tnsInfo,     /*!< tns info structure (modified) */
                 TNS_DATA* tnsData,     /*!< tns data structure (modified) */
                 Word16 numOfSfb,       /*!< number of scale factor bands */
                 TNS_CONFIG tC,         /*!< tns config structure */
                 Word16 lowPassLine,    /*!< lowpass line */
                 Word32* spectrum,      /*!< spectral data (modified) */
                 Word16 subBlockNumber, /*!< subblock num */
                 Word16 blockType)      /*!< blocktype (long or short) */
{
  Word32 i;
  Word32 temp_s;
  Word32 temp;
  TNS_SUBBLOCK_INFO *psubBlockInfo;

  temp_s = blockType - SHORT_WINDOW;                                                             
  if ( temp_s != 0) {                                                                               
    psubBlockInfo = &tnsData->dataRaw.tnsLong.subBlockInfo;
 if (psubBlockInfo->tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                       
      return(0);
    }
    else {

      Parcor2Index(psubBlockInfo->parcor,
                   tnsInfo->coef,
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(tnsInfo->coef,
                   psubBlockInfo->parcor,
                   tC.maxOrder,
                   tC.coefRes);

      for (i=tC.maxOrder - 1; i>=0; i--)  {
        temp = psubBlockInfo->parcor[i] - TNS_PARCOR_THRESH;         
        if ( temp > 0 )
          break;
        temp = psubBlockInfo->parcor[i] + TNS_PARCOR_THRESH;         
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = i + 1;                                                    


      tnsInfo->tnsActive[subBlockNumber] = 1;                                                       
      for (i=subBlockNumber+1; i<TRANS_FAC; i++) {
        tnsInfo->tnsActive[i] = 0;
      }
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                                
      tnsInfo->length[subBlockNumber] = numOfSfb - tC.tnsStartBand;                                 


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]),
                            (min(tC.tnsStopLine,lowPassLine) - tC.tnsStartLine),
                            psubBlockInfo->parcor,
                            tnsInfo->order[subBlockNumber],
                            &(spectrum[tC.tnsStartLine]));

    }
  }     /* if (blockType!=SHORT_WINDOW) */
  else /*short block*/ {                                                                            
    psubBlockInfo = &tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber];
 if (psubBlockInfo->tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                       
      return(0);
    }
    else {

      Parcor2Index(psubBlockInfo->parcor,
                   &tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(&tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   psubBlockInfo->parcor,
                   tC.maxOrder,
                   tC.coefRes);
      for (i=(tC.maxOrder - 1); i>=0; i--)  {
        temp = psubBlockInfo->parcor[i] - TNS_PARCOR_THRESH;    
         if ( temp > 0 )
          break;

        temp = psubBlockInfo->parcor[i] + TNS_PARCOR_THRESH;    
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = i + 1;                                                    

      tnsInfo->tnsActive[subBlockNumber] = 1;                                                       
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                                
      tnsInfo->length[subBlockNumber] = numOfSfb - tC.tnsStartBand;                             


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]), (tC.tnsStopLine - tC.tnsStartLine),
                 psubBlockInfo->parcor,
                 tnsInfo->order[subBlockNumber],
                 &(spectrum[tC.tnsStartLine]));

    }
  }

  return(0);
}

Here are the same three functions in the same order with nearly identical implementation in the 3GPP code:

/*!
  \brief   Calculate TNS filter and decide on TNS usage 
  \return  zero
*/
Word32 TnsDetect(TNS_DATA* tnsData,        /*!< tns data structure (modified) */
                 TNS_CONFIG tC,            /*!< tns config structure */
                 Word32* pScratchTns,      /*!< pointer to scratch space */
                 const Word16 sfbOffset[], /*!< scalefactor size and table */
                 Word32* spectrum,         /*!< spectral data */
                 Word16 subBlockNumber,    /*!< subblock num */
                 Word16 blockType,         /*!< blocktype (long or short) */
                 Word32 * sfbEnergy)       /*!< sfb-wise energy */
{

  Word16  predictionGain;
  Word16  temp;
  Word32* pWork32 = &pScratchTns[mult(subBlockNumber,FRAME_LEN_SHORT)];
  Word16* pWeightedSpectrum = (Word16 *)&pScratchTns[mult(subBlockNumber,FRAME_LEN_SHORT)];

                                                                                                   test();
  if (tC.tnsActive) {
    CalcWeightedSpectrum(spectrum,
                         pWeightedSpectrum,
                         sfbEnergy,
                         sfbOffset,
                         tC.lpcStartLine,
                         tC.lpcStopLine,
                         tC.lpcStartBand,
                         tC.lpcStopBand,
                         pWork32);

    temp = sub( blockType, SHORT_WINDOW );                                                         test();
    if ( temp != 0 ) {
        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        sub(tC.lpcStopLine,tC.lpcStartLine),
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsLong.subBlockInfo.parcor);


        temp = sub( predictionGain, tC.threshold );                                                test(); 
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 1;                                     move16();
        }
        else {
          tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                     move16();
        }

        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = predictionGain;                     move16();
    }
    else{

        predictionGain = CalcTnsFilter( &pWeightedSpectrum[tC.lpcStartLine],
                                        tC.acfWindow,
                                        sub(tC.lpcStopLine, tC.lpcStartLine),
                                        tC.maxOrder,
                                        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor);

        temp = sub( predictionGain, tC.threshold );                                                test();
        if ( temp > 0 ) {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 1;                    move16();
        }
        else {
          tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                    move16();
        }

        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = predictionGain;    move16();
    }

  }
  else{

    temp = sub( blockType, SHORT_WINDOW );                                                         test();
    if ( temp != 0 ) {
        tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive = 0;                                       move16();
        tnsData->dataRaw.tnsLong.subBlockInfo.predictionGain = 0;                                  move16();
    }
    else {
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive = 0;                      move16();
        tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].predictionGain = 0;                 move16();
    }
  }

  return(0);
}


/*****************************************************************************
    functionname: TnsSync
    description:

*****************************************************************************/
void TnsSync(TNS_DATA *tnsDataDest,
             const TNS_DATA *tnsDataSrc,
             const TNS_CONFIG tC,
             const Word16 subBlockNumber,
             const Word16 blockType)
{
   TNS_SUBBLOCK_INFO *sbInfoDest;
   const TNS_SUBBLOCK_INFO *sbInfoSrc;
   Word16 i, temp;

   temp = sub( blockType, SHORT_WINDOW );                                                          test();
   if ( temp != 0 ) {
      sbInfoDest = &tnsDataDest->dataRaw.tnsLong.subBlockInfo;                                     move32();
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsLong.subBlockInfo;                                      move32();
   }
   else {
      sbInfoDest = &tnsDataDest->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                    move32();
      sbInfoSrc  = &tnsDataSrc->dataRaw.tnsShort.subBlockInfo[subBlockNumber];                     move32();
   }

   if (100*abs(sbInfoDest->predictionGain - sbInfoSrc->predictionGain) <
       (3 * sbInfoDest->predictionGain)) {
      sbInfoDest->tnsActive = sbInfoSrc->tnsActive;                                                move16();
      for ( i=0; i< tC.maxOrder; i++) {
        sbInfoDest->parcor[i] = sbInfoSrc->parcor[i];                                              move32();
      }
   }
}


/*!
  \brief    do TNS filtering
  \return   zero
*/
Word16 TnsEncode(TNS_INFO* tnsInfo,     /*!< tns info structure (modified) */
                 TNS_DATA* tnsData,     /*!< tns data structure (modified) */
                 Word16 numOfSfb,       /*!< number of scale factor bands */
                 TNS_CONFIG tC,         /*!< tns config structure */
                 Word16 lowPassLine,    /*!< lowpass line */
                 Word32* spectrum,      /*!< spectral data (modified) */
                 Word16 subBlockNumber, /*!< subblock num */
                 Word16 blockType)      /*!< blocktype (long or short) */
{
  Word16 i;
  Word16 temp_s;
  Word32 temp;

  temp_s = sub(blockType,SHORT_WINDOW);                                                            test();
  if ( temp_s != 0) {                                                                              test();
    if (tnsData->dataRaw.tnsLong.subBlockInfo.tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                      move16();
      return(0);
    }
    else {

      Parcor2Index(tnsData->dataRaw.tnsLong.subBlockInfo.parcor,
                   tnsInfo->coef,
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(tnsInfo->coef,
                   tnsData->dataRaw.tnsLong.subBlockInfo.parcor,
                   tC.maxOrder,
                   tC.coefRes);

      for (i=sub(tC.maxOrder,1); i>=0; i--)  {
        temp = L_sub( tnsData->dataRaw.tnsLong.subBlockInfo.parcor[i], TNS_PARCOR_THRESH );        test();
        if ( temp > 0 )
          break;
        temp = L_add( tnsData->dataRaw.tnsLong.subBlockInfo.parcor[i], TNS_PARCOR_THRESH );        test();
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = add(i,1);                                                   move16();


      tnsInfo->tnsActive[subBlockNumber] = 1;                                                      move16();
      for (i=add(subBlockNumber,1); i<TRANS_FAC; i++) {
        tnsInfo->tnsActive[i] = 0;                                                                 move16();
      }
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                               move16();
      tnsInfo->length[subBlockNumber] = sub(numOfSfb,tC.tnsStartBand);                             move16();   


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]),
                            sub(S_min(tC.tnsStopLine,lowPassLine),tC.tnsStartLine),
                            tnsData->dataRaw.tnsLong.subBlockInfo.parcor,
                            tnsInfo->order[subBlockNumber],
                            &(spectrum[tC.tnsStartLine]));

    }
  }     /* if (blockType!=SHORT_WINDOW) */
  else /*short block*/ {                                                                           test();
    if (tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].tnsActive == 0) {
      tnsInfo->tnsActive[subBlockNumber] = 0;                                                      move16();
      return(0);
    }
    else {

      Parcor2Index(tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor,
                   &tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   tC.maxOrder,
                   tC.coefRes);

      Index2Parcor(&tnsInfo->coef[subBlockNumber*TNS_MAX_ORDER_SHORT],
                   tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor,
                   tC.maxOrder,
                   tC.coefRes);
      for (i=sub(tC.maxOrder,1); i>=0; i--)  {
        temp = L_sub( tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor[i], TNS_PARCOR_THRESH );   test();
         if ( temp > 0 )
          break;

        temp = L_add( tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor[i], TNS_PARCOR_THRESH );   test();
        if ( temp < 0 )
          break;
      }
      tnsInfo->order[subBlockNumber] = add(i,1);                                                   move16();

      tnsInfo->tnsActive[subBlockNumber] = 1;                                                      move16();
      tnsInfo->coefRes[subBlockNumber] = tC.coefRes;                                               move16();
      tnsInfo->length[subBlockNumber] = sub(numOfSfb, tC.tnsStartBand);                            move16();


      AnalysisFilterLattice(&(spectrum[tC.tnsStartLine]), sub(tC.tnsStopLine,tC.tnsStartLine),
                 tnsData->dataRaw.tnsShort.subBlockInfo[subBlockNumber].parcor,
                 tnsInfo->order[subBlockNumber],
                 &(spectrum[tC.tnsStartLine]));

    }
  }

  return(0);
}

I think at this point you'd have to be a crazy person to think that the Stagefright AAC encoder is an independent implementation and not derived from 26.411.

Licensing

Nowhere do VisualOn and Android seem to acknowledge that their encoder is derived from the 3GPP reference encoder. The only copyright headers on the Stagefright encoder are:

/*
 ** Copyright 2003-2010, VisualOn, Inc.
 **
 ** Licensed under the Apache License, Version 2.0 (the "License");
 ** you may not use this file except in compliance with the License.
 ** You may obtain a copy of the License at
 **
 **     http://www.apache.org/licenses/LICENSE-2.0
 **
 ** Unless required by applicable law or agreed to in writing, software
 ** distributed under the License is distributed on an "AS IS" BASIS,
 ** WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 ** See the License for the specific language governing permissions and
 ** limitations under the License.
 */

The licensing of the 3GPP encoder is somewhat ambiguous on its own. There is no mention of copyright in the 3GPP bundle except for:

Copyright Notification
No part may be reproduced except as authorized by written permission.
The copyright and the foregoing restriction extend to reproduction in all media

on the cover of the documentation, and this function in the decoder:
static void display_copyright_message(void)
{
  fprintf(stderr,"\n");
  fprintf(stderr,"*************************************************************\n");
  fprintf(stderr,"* Enhanced aacPlus 3GPP ETSI-op Reference Decoder\n");
  fprintf(stderr,"* Build %s, %s\n", __DATE__, __TIME__);
  fprintf(stderr,"*\n");
  fprintf(stderr,"*************************************************************\n\n");
}

It seems pretty clear to me that 3GPP intends its reference code to be used, but the terms of such use are unknown. The 3GPP code was provided by a company called Coding Technologies. Coding Technologies has since been acquired by Dolby and is now called Dolby International. Dolby isn't the most open-source-friendly company out there.

Some 3GPP source files bear a similarity to MPEG reference source files, which carry the notice:

/**********************************************************************
 
SC 29 Software Copyright Licencing Disclaimer:

This software module was originally developed by


and edited by


in the course of development of the ISO/IEC 13818-7 and ISO/IEC 14496-3 
standards for reference purposes and its performance may not have been 
optimized. This software module is an implementation of one or more tools as 
specified by the ISO/IEC 13818-7 and ISO/IEC 14496-3 standards.
ISO/IEC gives users free license to this software module or modifications 
thereof for use in products claiming conformance to audiovisual and 
image-coding related ITU Recommendations and/or ISO/IEC International 
Standards. ISO/IEC gives users the same free license to this software module or 
modifications thereof for research purposes and further ISO/IEC standardisation.
Those intending to use this software module in products are advised that its 
use may infringe existing patents. ISO/IEC have no liability for use of this 
software module or modifications thereof. Copyright is not released for 
products that do not conform to audiovisual and image-coding related ITU 
Recommendations and/or ISO/IEC International Standards.
The original developer retains full right to modify and use the code for its 
own purpose, assign or donate the code to a third party and to inhibit third 
parties from using the code for products that do not conform to audiovisual and 
image-coding related ITU Recommendations and/or ISO/IEC International Standards.
This copyright notice must be included in all copies or derivative works.
Copyright (c) ISO/IEC 1997.
**********************************************************************/

"Copyright is not released for products that do not conform to audiovisual and image-coding related ITU Recommendations and/or ISO/IEC International Standards" is generally viewed as the problematic clause in this license. This clause was problematic for LAME before they rewrote the last of the dist10 reference code, and it has made FAAC undistributable. To put this code under an Apache license, you would need to track down the copyright holders named at the top and ask them to relicense. However, Dolby clearly can relicense the code it owns and the code Coding Technologies owns without asking anyone.

Community

From a community standpoint, I was initially baffled as to why the documentation is marked Proprietary & Confidential. Line endings are mixed and there is plenty of trailing whitespace in the source. This sort of thing wouldn't fly in many large open source projects. Then I realized they simply bought an encoder from one of the many companies selling optimized multimedia reference code. It was probably cheaper than having employees write their own or port the reference code themselves. Still, the license on their documentation doesn't give me much faith in their attention to detail on such matters.

Maybe a community effort can build on top of this, as opencore-amr did with the AMR code from earlier Android releases, and we can finally have a decent OSS AAC encoder, but I'm not going to hold my breath. Punting code over a wall usually isn't a good strategy for building community.

3 comments:

  1. Unfortunately this is starting to make its way into several Linux distributions, even ones who should know better.

    Debian:

    http://packages.debian.org/sid/libvo-amrwbenc0

    Ubuntu:

    http://packages.ubuntu.com/oneiric/libvo-amrwbenc0

    Linux Mint obviously since it just links to Ubuntu's repositories. Likely to be many Debian and Ubuntu based distros carrying this.

    Gentoo:

    http://gentoo-portage.com/media-libs/vo-aacenc

    Part of MPlayer:

    http://ffmpeg.mplayerhq.hu/doxygen/trunk/libvo-amrwbenc_8c.html

    Through FFMPEG:

    http://www.ffmpeg.org/doxygen/trunk/libvo-amrwbenc_8c.html

    So what do you have to do? Just plagiarize some source code and slap whatever license you want on it? That's not how copyright law works.

    I guarantee Fedora would never do something this dumb.

    (I also checked Trisquel and they don't have it, probably because AAC is patent encumbered even if the library was free software.)

  2. Hello,

    I've found out that, when I used a -cutoff [Hz] options in an ffmpeg native aac encoder, the average sound quality gets better in blind listening tests.
    ex: ffmpeg -i infile -strict experimental -cutoff 17000 -acodec aac -ab 128k outfile

    I wanted to help you improving your encoder, but the ffmpeg source code remains incomprehensible for me because of my lack of skill.
    Could you please make the cutoff feature default, just like most other aac encoders?

    My recommendation is:
    cutoff[kHz] = min(7+ab/12,15+ab/64,ar/2).
    ab is audio bitrate in kbps, ar is sampling frequency in kHz(such as 44.1, 48).

  3. Hello,

    I've found out that the ffmpeg's native aac encoder gets better when I used a -cutoff option in blind tests.
    example: ffmpeg -i infile -strict experimental -cutoff 17000 -acodec aac -ab 128k outfile

    I want to help you improve your aac encoder, but the whole source code remains incomprehensible for me, because of my lack of skill.

    Could you make the cutoff feature(LPF) default, just like most other aac encoders?

    My recommendation is:
    cutoff[kHz] = min(7+ab/12,15+ab/64,ar/2);
    ar is sampling frequency in kHz(like 44.1,48).
    ab is audio bitrate in kbps(64,128,192...).

    Your encoder currently(in rev.35051) wastes bits in frequencies even above 20kHz, where most humans cannot hear.

    Delete my comment if this is redundant.
