INFO.TXT for MPEG Audio Layer-3 Shareware Code

Version 1.48 - 19.July.94

This text is organized as a kind of Mini-FAQ (Frequently Asked 
Questions). It covers several topics:

1. ISO-MPEG Standard
2. MPEG Audio Codec Family ("Layer 1, 2, 3")
3. Layer-3 Products 

For further comments and questions regarding Layer-3, 
please contact:

layer3@iis.fhg.de

or

Fraunhofer-IIS, Erlangen, Germany, Fax: +49-9131-776-399

For further information about MPEG, you may also like to contact:

phade@cs.tu-berlin.de


1. ISO-MPEG Standard


Q: What is MPEG, exactly?
   
A: MPEG is the "Moving Picture Experts Group", working under the 
   joint direction of the International Organization for 
   Standardization (ISO) and the International Electrotechnical 
   Commission (IEC). This group works on standards for the coding of 
   moving pictures and associated audio.
   

Q: What is the status of MPEG's work, then? What about MPEG-1, -2, 
   and so on?
   
A: MPEG approaches the growing need for multimedia standards step-by-
   step. Today, three "phases" are defined:
   
   MPEG-1: "Coding of Moving Pictures and Associated Audio for 
           Digital Storage Media at up to about 1.5 MBit/s"  
 
   Status: International Standard IS-11172, completed in 10.92
   
   MPEG-2: "Generic Coding of Moving Pictures and Associated Audio"
   
   Status: Committee Draft CD 13818 as found in documents MPEG93 / 
           N601, N602, N603 (11.93)   

   MPEG-3: no longer exists (has been merged into MPEG-2)
   
   MPEG-4: "Very Low Bitrate Audio-Visual Coding"
   
   Status: Call for Proposals 11.94, Working Draft in 11.96 


Q: MPEG-1 is ready for use. What does the standard look like?

A: MPEG-1 consists of 4 parts:

   IS 11172-1: System
   describes synchronization and multiplexing of video and audio

   IS 11172-2: Video
   describes compression of non-interlaced video signals
   
   IS 11172-3: Audio
   describes compression of audio signals 
   
   CD 11172-4: Compliance Testing
   describes procedures for determining the characteristics of coded 
   bitstreams and the decoding process and for testing compliance 
   with the requirements stated in the other parts


Q: How do I get the MPEG documents?

A: You may order them from your national standards body.
   E.g., in Germany, please contact:
   DIN-Beuth Verlag, Auslandsnormen
   Mrs. Niehoff, Burggrafenstr. 6, D-10772 Berlin, Germany
   Phone: 030-2601-2757, Fax: 030-2601-1231


2. MPEG Audio Codec Family ("Layer 1, 2, 3")
   

Q: Talking about MPEG audio coding, I heard a lot about "Layer 1, 2 
   and 3". What does it mean, exactly?   

A: MPEG-1, IS 11172-3, describes the compression of audio signals 
   using high performance perceptual coding schemes. It specifies a 
   family of three audio coding schemes, simply called Layer-1,-2,-3, 
   with increasing encoder complexity and performance (sound quality 
   per bitrate). The three codecs are compatible in a hierarchical 
   way, i.e. a Layer-N decoder is able to decode bitstream data 
   encoded in Layer-N and all Layers below N (e.g., a Layer-3 
   decoder may accept Layer-1,-2 and -3, whereas a Layer-2 decoder 
   may accept only Layer-1 and -2.)


Q: So we have a family of three audio coding schemes. What does the 
   MPEG standard define, exactly?
   
A: For each Layer, the standard specifies the bitstream format and 
   the decoder. To allow for future improvements, it does *not* 
   specify the encoder, but an informative chapter gives an example
   for an encoder for each Layer.    


Q: What do the three audio Layers have in common?

A: All Layers use the same basic structure. The coding scheme can be  
   described as "perceptual noise shaping" or "perceptual subband / 
   transform coding". 

   The encoder analyzes the spectral components of the audio signal 
   by calculating a filterbank or transform and applies a 
   psychoacoustic model to estimate the just noticeable noise-
   level. In its quantization and coding stage, the encoder tries 
   to allocate the available number of data bits in a way to meet 
   both the bitrate and masking requirements.

   The decoder is much less complex. Its only task is to synthesize 
   an audio signal out of the coded spectral components.
   
   All Layers use the same analysis filterbank (polyphase with 32 
   subbands). Layer-3 adds an MDCT to increase the frequency 
   resolution.
   
   All Layers use the same "header information" in their bitstream, 
   to support the hierarchical structure of the standard.
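
   To illustrate the shared header, here is a minimal Python sketch 
   that reads the Layer and the sampling frequency from the first 
   four bytes of a frame. The field positions (12-bit syncword, ID 
   bit, 2-bit Layer code, 2-bit sampling frequency code) follow my 
   reading of ISO/IEC 11172-3 and are not taken from this text. The 
   function name parse_common_header is made up for this example.

```python
# Sketch only: decode the fields of an MPEG-1 audio frame header that are
# common to all three Layers. Bit positions are an assumption based on
# ISO/IEC 11172-3, not on this FAQ.

LAYER_BY_CODE = {0b11: 1, 0b10: 2, 0b01: 3}                      # 0b00 is reserved
SAMPLE_RATE_BY_CODE = {0b00: 44100, 0b01: 48000, 0b10: 32000}    # 0b11 is reserved


def parse_common_header(header: bytes) -> dict:
    """Return the Layer number and sampling frequency of one frame header."""
    if len(header) < 4:
        raise ValueError("need at least 4 header bytes")
    word = int.from_bytes(header[:4], "big")

    if (word >> 20) != 0xFFF:            # bits 31..20: syncword, all ones
        raise ValueError("syncword not found")
    if ((word >> 19) & 0b1) != 1:        # bit 19: ID, 1 means MPEG-1 audio
        raise ValueError("not an MPEG-1 audio header")

    layer_code = (word >> 17) & 0b11     # bits 18..17: Layer
    rate_code = (word >> 10) & 0b11      # bits 11..10: sampling frequency
    if layer_code not in LAYER_BY_CODE:
        raise ValueError("reserved Layer code")
    if rate_code not in SAMPLE_RATE_BY_CODE:
        raise ValueError("reserved sampling frequency code")

    return {
        "layer": LAYER_BY_CODE[layer_code],
        "sample_rate_hz": SAMPLE_RATE_BY_CODE[rate_code],
    }
```

   For example, the header bytes FF FB 90 00 decode as Layer 3 at 
   44100 Hz. Because the layout is the same for every Layer, a decoder 
   can tell from the header alone whether it is able to handle the 
   frame.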
   
   All Layers use a bitstream structure that contains parts that are 
   more sensitive to bit errors ("header", "bit allocation", 
   "scalefactors", "side information") and parts that are less 
   sensitive ("data of spectral components").  

   All Layers may use 32, 44.1 or 48 kHz sampling frequency.
   
   All Layers are allowed to work with similar bitrates:
   Layer-1: from 32 kbps to 448 kbps
   Layer-2: from 32 kbps to 384 kbps
   Layer-3: from 32 kbps to 320 kbps


Q: What are the main differences between the three Layers, from a 
   global view?

A: From Layer-1 to Layer-3,
   complexity increases (mainly true for the encoder),
   overall codec delay increases, and
   performance increases (sound quality per bitrate).


Q: Which Layer should I use for my application?

A: Good question. Of course, it depends on all your requirements. But 
   as a first approach, you should consider the available bitrate of 
   your application as the Layers have been designed to support 
   certain areas of bitrates most efficiently, i.e. with a minimum 
   drop of sound quality.

   Let us look a little closer at the strong domains of each Layer.
   The ISO target bitrates indicate the main areas of optimization 
   for each Layer.
    
   Layer-1: Its original ISO target bitrate was 192 kbps per audio 
   channel.

   Layer-1 is a simplified version of Layer-2. It is most useful for 
   "high" bitrates around or above 192 kbps per audio channel. A 
   version of Layer-1 is used as "PASC" with the DCC recorder.

   Layer-2: Its original ISO target bitrate was 128 kbps per audio 
   channel.  
   
   Layer-2 is identical to MUSICAM. It has been designed as a 
   trade-off between sound quality per bitrate and encoder 
   complexity. It is most useful for "medium" bitrates of 128 or 
   even 96 kbps per audio channel. The DAB (EU 147) proponents have 
   decided to use Layer-2 in the future Digital Audio Broadcasting 
   network.

   Layer-3: Its original ISO target bitrate was 64 kbps per audio 
   channel.   
   
   Layer-3 merges the best ideas of MUSICAM and ASPEC. It has been 
   designed for best performance at "low" bitrates around 64 kbps or 
   even below. The Layer-3 format specifies a set of advanced features
   that all address one goal: to preserve as much sound quality as 
   possible even at rather low bitrates. Today, Layer-3 is already in 
   use in various telecommunication networks (ISDN, satellite links, 
   and so on) and speech announcement systems. 
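
   As a first approach, the guidance above can be turned into a crude 
   rule of thumb. The following Python sketch is only an illustration: 
   the thresholds (192 and 96 kbps per channel) are my reading of the 
   "high", "medium" and "low" bitrate areas described above, and the 
   function name suggest_layer is hypothetical. It ignores all other 
   requirements such as delay or complexity.

```python
# Original ISO target bitrates per audio channel, as given in this FAQ (kbps).
ISO_TARGET_KBPS_PER_CHANNEL = {1: 192, 2: 128, 3: 64}


def suggest_layer(kbps_per_channel: float) -> int:
    """Suggest a Layer from the available bitrate per audio channel.

    Thresholds are assumptions derived from the text:
      around or above 192 kbps -> Layer-1
      128 or even 96 kbps      -> Layer-2
      around 64 kbps or below  -> Layer-3
    """
    if kbps_per_channel <= 0:
        raise ValueError("bitrate must be positive")
    if kbps_per_channel >= 192:
        return 1
    if kbps_per_channel >= 96:
        return 2
    return 3


if __name__ == "__main__":
    for rate in (192, 128, 96, 64, 56):
        layer = suggest_layer(rate)
        target = ISO_TARGET_KBPS_PER_CHANNEL[layer]
        print(f"{rate} kbps/channel: Layer-{layer} (ISO target {target} kbps)")
```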


Q: So you tell me to consider Layer-3 for my low bitrate 
   applications. I have seen equipment working with Layer-2 for low 
   bitrates, too. Why should I worry about Layer-3, then?
   
A: As I told you before, all Layers may be used for low bitrates. So 
   you may also apply Layer-2 for low bitrates (e.g. 64 kbps per 
   channel). But be careful! 
   
   Using Layer-3 for low bitrates means:
   
   - unrivalled sound quality at 64 kbps per channel or below
   - useful for mono as well as for stereo signals
   - full audio bandwidth at 64 or 56 kbps
  
   Furthermore, if you are willing to accept some limitations, 
   with Layer-3 you can get the same performance as with Layer-2,   
   but at a lower bitrate. 


Q: Tell me more about sound quality. How do you assess that?

A: Today, there is no alternative to expensive listening tests. 
   During the ISO-MPEG-1 process, three international listening tests 
   were performed with many trained listeners, supervised by Swedish 
   Radio. They took place in 7.90, 3.91 and 11.91. Another 
   international listening test was performed by CCIR, now ITU-R, in 
   1992.
   
   All these tests used the "triple stimulus, hidden reference" 
   method and the CCIR impairment scale to assess the audio quality.
   The listening sequence is "ABC", with A = original and B, C = the 
   original and the coded signal in random order. The listener has 
   to grade both B and C with a number between 1.0 and 5.0. The 
   meaning of these values is:
   
   5.0 = transparent (this should be the original signal)
   4.0 = perceptible, but not annoying (first differences noticeable)
   3.0 = slightly annoying   
   2.0 = annoying
   1.0 = very annoying
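
   The test procedure can be sketched in a few lines of Python. This 
   is only an illustration of the "ABC" idea described above, not the 
   software used in the actual tests; all names (make_trial, 
   coded_grade, nearest_anchor) are made up for this example.

```python
import random

# Anchor points of the CCIR impairment scale, as listed above.
IMPAIRMENT_SCALE = {
    5: "transparent",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}


def make_trial(rng: random.Random) -> dict:
    """A is always the known original; B and C hide original and coded signal."""
    pair = ["original", "coded"]
    rng.shuffle(pair)                      # random order, unknown to the listener
    return {"A": "original", "B": pair[0], "C": pair[1]}


def coded_grade(trial: dict, grade_b: float, grade_c: float) -> float:
    """Pick out the grade that the listener gave to the coded signal."""
    for grade in (grade_b, grade_c):
        if not 1.0 <= grade <= 5.0:
            raise ValueError("grades must lie between 1.0 and 5.0")
    return grade_b if trial["B"] == "coded" else grade_c


def nearest_anchor(grade: float) -> str:
    """Describe a grade by the nearest anchor point of the scale."""
    if not 1.0 <= grade <= 5.0:
        raise ValueError("grades must lie between 1.0 and 5.0")
    return IMPAIRMENT_SCALE[min(5, int(grade + 0.5))]
```

   A listener who cannot tell the two apart will often give the 
   hidden reference a grade below 5.0 as well, which is why both B 
   and C are graded.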

   With perceptual codecs (like MPEG audio), traditional 
   parameters (like SNR, THD+N, bandwidth) are largely useless for 
   judging sound quality. Fraunhofer-IIS is also working on 
   objective quality assessment tools, like the NMR meter 
   (Noise-to-Mask-Ratio). BTW: If you need more information about 
   NMR, please contact nmr@iis.fhg.de.


Q: Now that I know how to assess quality, come on, tell me the 
   results of these tests.
   
A: Well, for details you should study one of those AES papers listed 
   below. The main result is that for low bitrates (64 kbps per 
   channel), Layer-2 always scored between 2.1 and 2.6, whereas 
   Layer-3 scored between 3.6 and 3.8. 

   This is a significant increase in sound quality, indeed! 
   Furthermore, the selection process for critical sound material 
   showed that it was rather difficult to find worst-case material 
   for Layer-3 whereas it was not so hard to find such items for 
   Layer-2.  


Q: Someone claimed that some international working group on audio
   coding (TG10?) has concluded and that there was some trouble with
   Layer 3, specifically on male voice in the German language. Is
   that correct?

A: One moment, please. The former CCIR has changed its name to
   ITU-Radiocommunication (ITU-R). In 1992, they founded a test group
   called TG10-2 with the task of preparing the draft for a new
   recommendation for the use of low bitrate audio coding in digital
   sound broadcasting applications.

   This test group concluded its work in 10.93. The draft
   recommendation defines three fields of broadcast applications:

   a) distribution and contribution links
   (20 kHz bandwidth, no audible impairments with up to 5 cascaded
   codecs)

   Recommendation: Layer-2 with 180 kbps per channel (mono or
   one independently coded channel of a stereo-signal); for a single
   distribution link without cascading, Layer-2 with 120 kbps per
   channel

   b) emission
   (20 kHz bandwidth)

   Recommendation: Layer-2 with 128 kbps per channel (mono or
   one independently coded channel of a stereo-signal)

   c) commentary links
   (15 kHz bandwidth)

   Recommendation: Layer-3 with 60 kbps for monophonic and 120 kbps
   for stereophonic signals (applying joint-stereo coding)

   So these are the recommendations. And again, they fit nicely
   into the above mentioned application profile of MPEG audio: with 
   medium bitrates, Layer-2 performs well enough; with really 
   low bitrates, you need Layer-3.

   The recommendations are based on international listening and
   evaluation tests performed mainly in 1992.

   For contribution and distribution, Layer-2 was the only system
   that fulfilled the requirements.

   For emission, the codecs had to score at least 4.0 on the CCIR
   impairment scale, even for the most critical material. At 128 kbps
   per channel, AC-2, Layer-2 and Layer-3 fulfilled this requirement,
   and Layer-2 got the recommendation mainly because of its
   "commonality with the distribution and contribution application".

   Further tests for emission were performed at 192 kbps joint-stereo
   coding. Layer-3 clearly met the requirements, Layer-2 fulfilled
   them only marginally, with doubts remaining during further tests in
   1993. Result: *no* recommendation for 192 kbps joint-stereo.

   For commentary, the quality requirements were for speech
   to be equivalent to 14-bit linear PCM, and for music, some
   perceptible impairments were to be tolerated. In the 1992 test,
   Layer-3 was the only codec that fulfilled these requirements, and
   by a wide margin (e.g. overall monophonic, it scored 3.6 in
   contrast to Layer-2 at 2.05 - and for male German speech, it
   scored 4.4 in contrast to Layer-2 at 2.4). So there was simply no
   alternative to Layer-3.

   Further tests were conducted in 93 using headphones. They showed
   that Layer-3 with monophonic speech (the test item is German male
   voice) at 60 kbps did not fully meet the quality requirements.

   Layer-2 was not included in these tests as its low bitrate
   performance was clearly too poor right from the start. Therefore,
   the listeners had no "lower anchor" during the listening test (the
   codec that always gets the "1" and "2" scores) - a fact that
   certainly influences the absolute scoring. Funnily enough, the
   same speech signal has been tested in some previous sessions
   without complaints...

   The ITU decided to recommend Layer-3 and to include a temporary
   footnote that will be removed as soon as an improved Layer-3 codec
   fulfills their requirements completely, i.e. even with that well-
   known critical male German speech item (for many other speech
   items, Layer-3 has no trouble at all).


Q: OK, a Layer-2 codec at low bitrates may sound poor today, but 
   couldn't that be improved in the future? I guess you just told me 
   before that the encoder is not fixed in the standard.
   
A: Good thinking! As the sound quality mainly depends on the encoder 
   implementation, it is true that there is no such thing as a 
   "Layer-N" quality. So all we know for certain is the performance 
   of the reference codecs during the international tests. Who knows 
   what will happen in the future? What we do know now is:
   
   Today, Layer-3 already provides a sound quality that comes very 
   near to CD quality at 64 kbps per channel. Layer-2 is far away 
   from that.
   
   Tomorrow, both Layers may improve. Layer-2 has been designed as a 
   trade-off between quality and complexity, so the bitstream format 
   allows only limited innovations. In contrast, even the current
   reference Layer-3-codec exploits only a small part of the powerful 
   mechanisms inside the Layer-3 bitstream format.  
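
   To put the figure of 64 kbps per channel into perspective, the 
   following Python sketch compares it with uncompressed CD audio. 
   The CD parameters (16 bit linear PCM at 44.1 kHz) are an assumption 
   added here and are not stated in this text; the function names are 
   made up for this example.

```python
def pcm_kbps_per_channel(sample_rate_hz: int = 44100,
                         bits_per_sample: int = 16) -> float:
    """Bitrate of one channel of uncompressed linear PCM, in kbps."""
    return sample_rate_hz * bits_per_sample / 1000.0


def compression_ratio(coded_kbps_per_channel: float,
                      sample_rate_hz: int = 44100,
                      bits_per_sample: int = 16) -> float:
    """How many times smaller the coded stream is than the PCM source."""
    if coded_kbps_per_channel <= 0:
        raise ValueError("coded bitrate must be positive")
    return pcm_kbps_per_channel(sample_rate_hz, bits_per_sample) / coded_kbps_per_channel


if __name__ == "__main__":
    print(f"CD audio:  {pcm_kbps_per_channel():.1f} kbps per channel")
    for rate in (192, 128, 64):
        print(f"{rate:3d} kbps: ratio about {compression_ratio(rate):.1f} : 1")
```

   Under these assumptions one CD channel needs 705.6 kbps, so coding 
   at 64 kbps per channel corresponds to a reduction of about 11 : 1.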


Q: All in all, you sound as if anybody should use Layer-3 for low 
   bitrates. Why on earth do some vendors still offer only Layer-2 
   equipment for these applications?
   
A: Well, maybe because they started to design and develop their 
   system rather early, e.g. in 1990. As Layer-2 is identical to 
   MUSICAM, it has been available since summer 1990 at the latest. 
   In that year, Layer-3 development started; it was successfully 
   finished in spring 1992. So, for a certain time, vendors could 
   only exploit the existing part of the new MPEG standard.
   
   Now the situation has changed. All Layers are available, the 
   standard is completed, and new systems need not limit themselves, 
   but may capitalize on the full features of MPEG audio.


Q: What other topics do I have to keep in mind? Tell me about the 
   complexity of Layer-3.
   
A: Alright. First, we have to distinguish between decoder and encoder. 

   For a stereo Layer-3-decoder, our real-time implementations use 
   either one DSP32C (AT&T) or one DSP56002 (Mot). For an ASIC, 
   Intermetall (ITT) estimated an overhead of around 30 % chip area 
   for adding the necessary Layer-3 modules to a Layer-2-decoder. So 
   you need not worry too much about decoder complexity.

   For a stereo Layer-3-encoder achieving reference quality, our 
   current real-time implementations use two DSP32C and two DSP56002.
   But again: as more and more horsepower becomes available on one 
   chip, encoder complexity will matter less and less.


Q: And what about the codec delay?

A: Well, the standard gives some figures of the theoretical minimum 
   delay:
   Layer-1: 19 ms (<50 ms)
   Layer-2: 35 ms (100 ms)
   Layer-3: 59 ms (150 ms)
   The practical values are significantly above that. As they depend 
   on the implementation, exact figures are hard to give. So the 
   figures in brackets are just rough rule-of-thumb values.
   
   Yes, for some applications, a very short delay is of critical 
   importance. E.g. in a feedback link, a reporter can only talk 
   intelligibly if the overall delay is below around 10 ms. 
   If broadcasters want to apply MPEG audio coding, they have to use 
   "N-1" switches in the studio to overcome this problem (or 
   appropriate echo-cancellers) - or they have to forget about MPEG 
   altogether.
   
   But with most applications, these figures are small enough to 
   present no extra problem. At least, if one can accept a Layer-2 
   delay, one can most likely also accept the higher Layer-3 delay.
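
   The delay figures above can be checked against an application's 
   delay budget with a few lines of Python. This is just a sketch: the 
   tables repeat the numbers given above (with "<50 ms" taken as 50 ms 
   for Layer-1), and the function name layers_within_budget is made up 
   for this example.

```python
# Theoretical minimum delays from the standard, as quoted above (ms).
THEORETICAL_MIN_DELAY_MS = {1: 19, 2: 35, 3: 59}
# Rough rule-of-thumb values for practical codecs, as quoted above (ms).
# Layer-1 is given as "<50 ms"; 50 is used here as an upper bound.
ROUGH_PRACTICAL_DELAY_MS = {1: 50, 2: 100, 3: 150}


def layers_within_budget(budget_ms: float,
                         delays: dict = THEORETICAL_MIN_DELAY_MS) -> list:
    """Return the Layers whose delay does not exceed the given budget."""
    return [layer for layer, delay in sorted(delays.items()) if delay <= budget_ms]


if __name__ == "__main__":
    # Feedback link for a reporter: overall delay below around 10 ms.
    print(layers_within_budget(10))                              # no Layer fits
    # An application that tolerates a practical Layer-2 delay.
    print(layers_within_budget(100, ROUGH_PRACTICAL_DELAY_MS))
```

   Even the theoretical minimum of Layer-1 exceeds a 10 ms budget, 
   which is why feedback links need "N-1" switches or echo-cancellers 
   regardless of the Layer chosen.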


Q: Someone told me that, with Layer-3, the codec delay would depend 
   on the actual audio signal, varying over the time. Is this really 
   true? 

A: No. The codec delay does *not* depend on the audio signal.

   With all Layers, the delay depends on the actual implementation 
   used in a specific codec, so different codecs may have different 
   delays. Furthermore, the delay depends on the actual sample rate 
   and bitrate of your codec.   
   
   One of Layer-3's advanced unique features is the optional use of a 
   "bit reservoir". The bit reservoir is a buffer that is controlled 
   by the encoder. In "easy times", the encoder may fill this buffer 
   with data bits that are not required to meet the masking 
   requirements of the actual audio signal. In "hard times", the 
   encoder may use the saved data bits to meet peak bitrate demands.
   The buffer size of the bit reservoir adds to the codec delay. Its 
   value is a constant that is explicitly defined in the encoder. 
   
   So don't get confused. The codec delay does not change with the 
   music - that would really be a silly behaviour for an audio codec.
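
   The idea of the bit reservoir can be shown with a toy model in 
   Python. This is a strongly simplified sketch and not the mechanism 
   of the Layer-3 bitstream itself: it only tracks how many bits each 
   frame may spend. The names run_bit_reservoir, frame_budget and 
   reservoir_size are made up for this example.

```python
def run_bit_reservoir(demands, frame_budget, reservoir_size):
    """Toy model of a bit reservoir.

    demands        -- bits each frame would like to spend to meet masking
    frame_budget   -- bits that the constant bitrate provides per frame
    reservoir_size -- fixed capacity of the reservoir in bits (a constant,
                      which is why the codec delay stays constant too)

    Returns the number of bits actually granted to each frame.
    """
    reservoir = 0
    granted = []
    for demand in demands:
        available = frame_budget + reservoir      # this frame plus savings
        used = min(demand, available)
        granted.append(used)
        # Save what was not needed, but never more than the fixed capacity.
        reservoir = min(available - used, reservoir_size)
    return granted


if __name__ == "__main__":
    demands = [500, 500, 1500, 1500]              # two easy, two hard frames
    print(run_bit_reservoir(demands, 1000, 800))  # with reservoir
    print(run_bit_reservoir(demands, 1000, 0))    # without reservoir
```

   With the reservoir, the bits saved in the two easy frames let the 
   hard frames spend 1500 and 1300 bits instead of only 1000 each. The 
   reservoir size never changes while the codec runs, so the delay it 
   adds is fixed.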


Q: OK, I am hooked! Where can I find more technical information 
   about MPEG audio coding, especially about Layer-3?
   
A: Well, there is a variety of AES papers, e.g.

   K. Brandenburg, G. Stoll, ...: "The ISO/MPEG-Audio Codec: A 
   Generic Standard for Coding of High Quality Digital Audio", 92nd 
   AES Convention, Vienna 1992, preprint 3336
   
   E. Eberlein, H. Popp, ...: "Layer-3, a Flexible Coding Standard", 
   94th AES Convention, Berlin 1993, preprint 3493
   
   K. Brandenburg, G. Zimmer, ...: "Variable Data-Rate Recording on a 
   PC Using MPEG-Audio Layer-3", 95th AES Convention, New York 1993
   
   B. Grill, J. Herre, ...: "Improved MPEG-2 Audio Multi-Channel 
   Encoding", 96th AES Convention, Amsterdam 1994

   And for further information, please contact layer3@iis.fhg.de...


3. Layer-3 Products

This is a list of available Layer-3 products as of 1.1.94. 
For further information, please contact the companies directly.

3.1. Telecommunication Codecs

a) MusicTAXI Type 3
   The MusicTAXI is a real-time audio codec for the full-duplex 
   transmission of mono or stereo audio signals via ISDN. It supports 
   Layer-2 and -3. 
     Dialog 4 System Engineering GmbH
     Monreposstr. 57
     D-71634 Ludwigsburg, Germany
     Fax                     +49-7141-22667

b) MAGIC Series
   The Multi Audio-System with Groupable Interfaces and Codecs 
   supports Layer-2 and -3 as well as G.722 and G.711. Its 
   transmission procedures comply with H.221, H.242 or G.704. The 
   codec is a universal device useful in ISDN applications as well as 
   in satellite links, LAN or WAN networks or audio memory   
   installations.
     PKI Philips Kommunikations Industrie AG
     Thurn-und-Taxis-Str. 14
     D-90411 Nuernberg, Germany
     Fax             +49-911-526-6315

c) Zephyr Codec
   The Zephyr is a Layer-3 codec for the transmission of mono or 
   stereo audio signals via ISDN, Switch-56 or V.35-networks. It also 
   offers a G.722 feedback link.
     Telos Systems
     2101 Superior Avenue
     Cleveland, OH 44114, USA
     Fax             +1-216-241-4103

3.2. Speech Announcement System

a) DAS VIII HiFi
   This digital speech announcement system for mass transit 
   applications applies Layer-3 to use the ROM based speech memory 
   most efficiently. Moreover, the system offers an unrivalled sound 
   quality at a very competitive price. 
     Meister Electronic GmbH
     Koelner Str. 57
     D-51149 Koeln, Germany
     Fax                 +49-2203-12079

3.3. PC Boards

a) Layer-3 PC Board
   This full-size PC/AT ISA card is a real-time audio processing 
   board. It performs two-channel Layer-3 encoding and decoding, 
   depending on the software configuration. The board offers digital 
   audio interfaces (AES and IEC) and an additional X.21 interface 
   for the reduced data stream. The board is delivered with a library 
   of C drivers and a demo program.
     Audio Export Georg Neumann & Co. GmbH
     Badstr. 14
     D-74072 Heilbronn, Germany
     Fax                 +49-7131-68790

b) L3-PC-Card
   This PC-Card supports a real-time Layer-3 audio codec. It offers 
   digital audio interfaces (AES and IEC) and two additional X.21 
   interfaces for one or two reduced data streams. A decoder-only 
   PC card is also available.
     Dialog 4 System Engineering GmbH     
     Monreposstr. 57
     D-71634 Ludwigsburg, Germany
     Fax                     +49-7141-22667

3.4. ICs

a) ISO-MPEG Decoder Chip MASC 3500 
   This MPEG decoder chip offers the use of the full ISO-MPEG-audio 
   standard, i.e. Layer-1, -2, and -3. The ASIC is based on the MASC 
   DSP family (0.8 um) and comes in a small 68 pin PLCC package. 
   First samples will be available in 3.Q.94.
     ITT Intermetall GmbH
     Hans-Bunte-Str. 19
     D-79108 Freiburg, Germany
     Fax                     +49-761-517-880

3.5. Layer-3 Shareware

The layer 3 shareware is copyright Fraunhofer - IIS 1994

a) Shareware encoder/decoder for IBM PCs or Compatibles, version 1.00

 The programs are written for IBM PCs or compatibles with MS-DOS. 
 L3ENC.EXE and L3DEC.EXE should work on practically any PC with 386 
 type CPU or better. For the encoder, a 486DX33 or better is recommended.

 On a 486DX2/66 the performance of the software-only decoder is about
 33% of the performance necessary for real time audio processing. 
 The encoder needs about 14 minutes to encode a 1 minute audio data 
 file. These figures assume coding/decoding of stereo audio material 
 at 44.1 kHz.

b) Shareware encoder/decoder for Sun workstations, version 1.00
 
 The encoder takes about 5 minutes to encode 1 minute of stereo audio
 data on a SPARCstation 10. The decoder works in real time.
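
 To get a feeling for these figures, the following Python sketch 
 estimates processing times for a file of a given length. The speed 
 factors are derived from the numbers above (486DX2/66 and SPARCstation 
 10); the constant and function names are made up for this example.

```python
# Speed relative to real time (1.0 = real time), derived from the text:
PC_DECODER_SPEED = 0.33        # about 33 % of real time on a 486DX2/66
PC_ENCODER_SPEED = 1 / 14      # 14 minutes per minute of audio
SUN_ENCODER_SPEED = 1 / 5      # 5 minutes per minute of audio
SUN_DECODER_SPEED = 1.0        # works in real time


def processing_minutes(audio_minutes: float, speed: float) -> float:
    """Estimated processing time for a stereo file at 44.1 kHz."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return audio_minutes / speed


if __name__ == "__main__":
    length = 4.0  # a 4 minute piece of music
    print(f"PC encode:  {processing_minutes(length, PC_ENCODER_SPEED):.0f} min")
    print(f"PC decode:  {processing_minutes(length, PC_DECODER_SPEED):.1f} min")
    print(f"Sun encode: {processing_minutes(length, SUN_ENCODER_SPEED):.0f} min")
```

 So a 4 minute piece takes roughly 56 minutes to encode on the PC and 
 about 20 minutes on the Sun.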

Availability of the shareware packages:

-  via anonymous ftp from fhginfo.fhg.de (153.96.1.4)

 You may download our Layer-3 audio software package from the 
 directory /pub/layer3. You will find the following files:
 For IBM PCs:
   l3v100.txt     a short description of the files found in l3v100.zip
   l3v100.zip     encoder, decoder, documentation and a sample bitstream
   l3v100n.txt    a short description of the files found in l3v100n.zip
   l3v100n.zip    encoder, decoder and documentation (no bitstream)  
   bstr100.l3     a sample bitstream encoded with l3enc version 1.00
 For SUN workstations: 
   l3v100.sun.txt     short description of the files found in l3v100.sun.tar.gz
   l3v100.sun.tar.gz  encoder, decoder, documentation and a sample bitstream
   l3v100n.sun.txt    short description of the files found in l3v100n.sun.tar.gz
   l3v100n.sun.tar.gz encoder, decoder and documentation (no bitstream)
   bstr100.l3         sample bitstream encoded with version 1.00 of the encoder

-  via direct modem download (up to 14,400 bps)
                    
    Modem telephone number  : +49 911 9933662           Name: FHG
    Packet switching network: (0) 262 45 9110 10290     Name: FHG
    (For the telephone number, replace "+" with your appropriate
    international dial prefix, e.g. "011" for the USA.)
    Follow the menus as desired.

-  via shipment of diskette (only including registration)

 You may order a diskette directly from:

 Mailbox System Nuernberg (MSN)
 Hanft & Hartmann
 Innerer Kleinreuther Weg 21
 D-90408 Nuernberg
 Germany

 Please note: MSN will only ship a diskette once the registration 
 fee has been paid. The registration fee is 85 Deutsche Mark 
 (about 50 US$) (plus sales tax, if applicable) for one copy of the 
 package. The preferred method of payment is via credit card. Currently, 
 MSN accepts VISA and Master Card / Eurocard / Access credit cards. For
 details see the file REGISTER.TXT found in the shareware package.
 
 You may reach MSN also via Internet: msn@iis.fhg.de
                     or via Fax: +49 911 9933661
                     or via BBS: +49 911 9933662        Name: FHG
                     or via X25: 0262 45 9110 10290     Name: FHG
                     (e.g. in USA, please replace "+" with "011")

- via email

 You may get our shareware also by a direct request to msn@iis.fhg.de.
 In this case, the shareware is split into about 30 small uuencoded
 parts...
 
4. End of INFO.TXT