INFO.TXT for MPEG Audio Layer-3 Shareware Code Version 1.48 - 19.July.94

This text is organized as a kind of Mini-FAQ (Frequently Asked Questions). It covers several topics:

  1. ISO-MPEG Standard
  2. MPEG Audio Codec Family ("Layer 1, 2, 3")
  3. Layer-3 Products

For further comments and questions regarding Layer-3, please contact:
  layer3@iis.fhg.de
or
  Fraunhofer-IIS, Erlangen, Germany, Fax: +49-9131-776-399

For further information about MPEG, you may also contact:
  phade@cs.tu-berlin.de


1. ISO-MPEG Standard

Q: What is MPEG, exactly?

A: MPEG is the "Moving Picture Experts Group", working under the joint direction of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). This group works on standards for the coding of moving pictures and associated audio.

Q: What is the status of MPEG's work, then? What about MPEG-1, -2, and so on?

A: MPEG approaches the growing need for multimedia standards step by step. Today, three "phases" are defined:

  MPEG-1: "Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 MBit/s"
          Status: International Standard IS-11172, completed in 10.92

  MPEG-2: "Generic Coding of Moving Pictures and Associated Audio"
          Status: Committee Draft CD 13818 as found in documents MPEG93 / N601, N602, N603 (11.93)

  MPEG-3: no longer exists (has been merged into MPEG-2)

  MPEG-4: "Very Low Bitrate Audio-Visual Coding"
          Status: Call for Proposals 11.94, Working Draft in 11.96

Q: MPEG-1 is ready for use. What does the standard look like?
A: MPEG-1 consists of 4 parts:

  IS 11172-1: System
              describes synchronization and multiplexing of video and audio
  IS 11172-2: Video
              describes compression of non-interlaced video signals
  IS 11172-3: Audio
              describes compression of audio signals
  CD 11172-4: Compliance Testing
              describes procedures for determining the characteristics of coded bitstreams and the decoding process, and for testing compliance with the requirements stated in the other parts

Q: How do I get the MPEG documents?

A: You may order them from your national standards body. E.g., in Germany, please contact:

  DIN-Beuth Verlag, Auslandsnormen
  Mrs. Niehoff
  Burggrafenstr. 6
  D-10772 Berlin, Germany
  Phone: 030-2601-2757, Fax: 030-2601-1231


2. MPEG Audio Codec Family ("Layer 1, 2, 3")

Q: Talking about MPEG audio coding, I heard a lot about "Layer 1, 2 and 3". What does it mean, exactly?

A: MPEG-1, IS 11172-3, describes the compression of audio signals using high-performance perceptual coding schemes. It specifies a family of three audio coding schemes, simply called Layer-1, -2 and -3, with increasing encoder complexity and performance (sound quality per bitrate). The three codecs are compatible in a hierarchical way, i.e. a Layer-N decoder is able to decode bitstream data encoded in Layer-N and all Layers below N (e.g., a Layer-3 decoder may accept Layer-1, -2 and -3, whereas a Layer-2 decoder may accept only Layer-1 and -2).

Q: So we have a family of three audio coding schemes. What does the MPEG standard define, exactly?

A: For each Layer, the standard specifies the bitstream format and the decoder. To allow for future improvements, it does *not* specify the encoder, but an informative chapter gives an example encoder for each Layer.

Q: What do the three audio Layers have in common?

A: All Layers use the same basic structure. The coding scheme can be described as "perceptual noise shaping" or "perceptual subband / transform coding".
The encoder analyzes the spectral components of the audio signal by calculating a filterbank or transform, and applies a psychoacoustic model to estimate the just noticeable noise level. In its quantization and coding stage, the encoder tries to allocate the available data bits so as to meet both the bitrate and the masking requirements.

The decoder is much less complex. Its only task is to synthesize an audio signal from the coded spectral components.

All Layers use the same analysis filterbank (polyphase with 32 subbands). Layer-3 adds an MDCT transform to increase the frequency resolution.

All Layers use the same "header information" in their bitstream, to support the hierarchical structure of the standard.

All Layers use a bitstream structure that contains parts that are more sensitive to bit errors ("header", "bit allocation", "scalefactors", "side information") and parts that are less sensitive ("data of spectral components").

All Layers may use 32, 44.1 or 48 kHz sampling frequency.

All Layers are allowed to work with similar bitrates:

  Layer-1: from 32 kbps to 448 kbps
  Layer-2: from 32 kbps to 384 kbps
  Layer-3: from 32 kbps to 320 kbps

Q: What are the main differences between the three Layers, from a global view?

A: From Layer-1 to Layer-3, complexity increases (mainly true for the encoder), overall codec delay increases, and performance increases (sound quality per bitrate).

Q: Which Layer should I use for my application?

A: Good question. Of course, it depends on all your requirements. But as a first approach, you should consider the available bitrate of your application, as the Layers have been designed to support certain ranges of bitrates most efficiently, i.e. with a minimum drop in sound quality. Let us look a little closer at the strong domains of each Layer. The ISO target bitrates indicate the main areas of optimization for each Layer.

  Layer-1: Its original ISO target bitrate was 192 kbps per audio channel.
           Layer-1 is a simplified version of Layer-2. It is most useful at "high" bitrates around or above 192 kbps. A version of Layer-1 is used as "PASC" in the DCC recorder.

  Layer-2: Its original ISO target bitrate was 128 kbps per audio channel.
           Layer-2 is identical to MUSICAM. It has been designed as a trade-off between sound quality per bitrate and encoder complexity. It is most useful at "medium" bitrates of 128 or even 96 kbps per audio channel. The DAB (EU 147) proponents have decided to use Layer-2 in the future Digital Audio Broadcasting network.

  Layer-3: Its original ISO target bitrate was 64 kbps per audio channel.
           Layer-3 merges the best ideas of MUSICAM and ASPEC. It has been designed for best performance at "low" bitrates around 64 kbps or even below. The Layer-3 format specifies a set of advanced features that all address one goal: to preserve as much sound quality as possible even at rather low bitrates. Today, Layer-3 is already in use in various telecommunication networks (ISDN, satellite links, and so on) and speech announcement systems.

Q: So you tell me to consider Layer-3 for my low bitrate applications. I have seen equipment working with Layer-2 for low bitrates, too. Why should I worry about Layer-3, then?

A: As I told you before, all Layers may be used for low bitrates. So you may also apply Layer-2 at low bitrates (e.g. 64 kbps per channel). But be careful! Using Layer-3 for low bitrates means:

  - unrivalled sound quality at 64 kbps per channel or below
  - usefulness for mono as well as for stereo signals
  - full audio bandwidth at 64 or 56 kbps

Furthermore, if you are willing to accept some limitations, with Layer-3 you can get the same performance as with Layer-2, but at a lower bitrate.

Q: Tell me more about sound quality. How do you assess that?

A: Today, there is no alternative to expensive listening tests.
During the ISO-MPEG-1 process, 3 international listening tests were performed with many trained listeners, supervised by Swedish Radio. They took place in 7.90, 3.91 and 11.91. Another international listening test was performed by the CCIR, now ITU-R, in 92.

All these tests used the "triple stimulus, hidden reference" method and the CCIR impairment scale to assess the audio quality. The listening sequence is "ABC", with A = original and B/C = the original and the coded signal in random order; the listener has to grade both B and C with a number between 1.0 and 5.0. The meaning of these values is:

  5.0 = transparent (this should be the original signal)
  4.0 = perceptible, but not annoying (first differences noticeable)
  3.0 = slightly annoying
  2.0 = annoying
  1.0 = very annoying

With perceptual codecs (like MPEG audio), traditional parameters (like SNR, THD+N, bandwidth) are of little use. Fraunhofer-IIS is also working on objective quality assessment tools, like the NMR meter (Noise-to-Mask Ratio).

BTW: If you need more information about NMR, please contact nmr@iis.fhg.de.

Q: Now that I know how to assess quality, come on, tell me the results of these tests.

A: Well, for details you should study one of the AES papers listed below. The main result is that for low bitrates (64 kbps per channel), Layer-2 always scored between 2.1 and 2.6, whereas Layer-3 scored between 3.6 and 3.8. This is a significant increase in sound quality, indeed! Furthermore, the selection process for critical sound material showed that it was rather difficult to find worst-case material for Layer-3, whereas it was not so hard to find such items for Layer-2.

Q: Someone claimed that some international working group on audio coding (TG10?) has concluded its work and that there was some trouble with Layer-3, specifically on male voice in the German language. Is that correct?

A: One moment, please. The former CCIR has changed its name to ITU-R (ITU Radiocommunication Sector).
In 1992, it founded a test group called TG10-2 with the task of preparing the draft of a new recommendation for the use of low bitrate audio coding in digital sound broadcasting applications. This test group concluded its work in 10.93.

The draft recommendation defines three fields of broadcast applications:

  a) distribution and contribution links
     (20 kHz bandwidth, no audible impairments with up to 5 cascaded codecs)
     Recommendation: Layer-2 with 180 kbps per channel (mono or one independently coded channel of a stereo signal); for a single distribution link without cascading, Layer-2 with 120 kbps per channel

  b) emission
     (20 kHz bandwidth)
     Recommendation: Layer-2 with 128 kbps per channel (mono or one independently coded channel of a stereo signal)

  c) commentary links
     (15 kHz bandwidth)
     Recommendation: Layer-3 with 60 kbps for monophonic and 120 kbps for stereophonic signals (applying joint-stereo coding)

So these are the recommendations. And again, they fit nicely into the application profile of MPEG audio described above: at medium bitrates, Layer-2 performs satisfactorily; at really low bitrates, you need Layer-3.

The recommendations are based on international listening and evaluation tests performed mainly in 1992.

For contribution and distribution, Layer-2 was the only system that fulfilled the requirements.

For emission, the codecs had to score at least 4.0 on the CCIR impairment scale, even for the most critical material. At 128 kbps per channel, AC-2, Layer-2 and Layer-3 fulfilled this requirement, and Layer-2 got the recommendation mainly because of its "commonality with the distribution and contribution application". Further tests for emission were performed at 192 kbps with joint-stereo coding. Layer-3 clearly met the requirements, while Layer-2 fulfilled them only marginally, with doubts remaining during further tests in 1993. Result: *no* recommendation for 192 kbps joint-stereo.
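
The emission criterion described above (every item, including the most critical one, must score at least 4.0 on the CCIR impairment scale) can be illustrated with a few lines of Python. This is only a sketch: the function names, the choice to label an intermediate score with the nearest grade at or below it, and the per-item scores in the example are assumptions invented for illustration, not part of the CCIR/ITU-R procedure and not real test results.

```python
from typing import Dict

# Grades of the five-point CCIR impairment scale, as listed in this FAQ.
IMPAIRMENT_SCALE = {
    5: "transparent",
    4: "perceptible, but not annoying",
    3: "slightly annoying",
    2: "annoying",
    1: "very annoying",
}


def grade_label(score: float) -> str:
    """Label a score with the nearest scale grade at or below it.

    Rounding down is an illustrative choice, not part of the test method.
    """
    if not 1.0 <= score <= 5.0:
        raise ValueError(f"score {score} is outside the 1.0-5.0 scale")
    return IMPAIRMENT_SCALE[int(score)]


def meets_criterion(item_scores: Dict[str, float], threshold: float = 4.0) -> bool:
    """True only if every test item, including the most critical one,
    scores at least `threshold` (4.0 for the emission tests)."""
    return min(item_scores.values()) >= threshold


# Hypothetical per-item scores, invented for illustration (not test results).
scores = {"item A": 4.6, "item B": 4.2, "critical item": 3.9}
print(meets_criterion(scores))               # False: the critical item is below 4.0
print(grade_label(scores["critical item"]))  # slightly annoying
```

Taking the minimum over all items, rather than an average, mirrors the "even for the most critical material" wording: a single weak item is enough to fail.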
For commentary, the quality requirements were for speech to be equivalent to 14-bit linear PCM, while for music some perceptible impairments were to be tolerated. In the 1992 test, Layer-3 was clearly the only codec that fulfilled these requirements (e.g. for the monophonic items overall, it scored 3.6 in contrast to Layer-2 at 2.05, and for male German speech, it scored 4.4 in contrast to Layer-2 at 2.4). So there was simply no alternative to Layer-3.

Further tests were conducted in 93 using headphones. They showed that Layer-3 with monophonic speech (the test item is a German male voice) at 60 kbps did not fully meet the quality requirements. Layer-2 was not included in these tests, as its low bitrate performance had been clearly too poor right from the start. Therefore, the listeners had no "lower anchor" during the listening test (the codec that always gets the "1" and "2" scores), a fact that certainly influences the absolute scoring. Funnily enough, the same speech signal had been tested in some previous sessions without complaints...

The ITU decided to recommend Layer-3 and to include a temporary footnote that will be removed as soon as an improved Layer-3 codec fulfills their requirements completely, i.e. even with that well-known critical male German speech item (for many other speech items, Layer-3 has no trouble at all).

Q: OK, a Layer-2 codec at low bitrates may sound poor today, but couldn't that be improved in the future? I guess you just told me that the encoder is not fixed in the standard.

A: Good thinking! As the sound quality mainly depends on the encoder implementation, it is true that there is no such thing as a "Layer-N" quality. So all we know for certain is the performance of the reference codecs used in the international tests. Who knows what will happen in the future? What we do know now is: Today, Layer-3 already provides a sound quality that comes very near to CD quality at 64 kbps per channel. Layer-2 is far away from that.
Tomorrow, both Layers may improve. Layer-2 has been designed as a trade-off between quality and complexity, so its bitstream format allows only limited innovations. In contrast, even the current reference Layer-3 codec exploits only a small part of the powerful mechanisms inside the Layer-3 bitstream format.

Q: All in all, you sound as if everybody should use Layer-3 for low bitrates. Why on earth do some vendors still offer only Layer-2 equipment for these applications?

A: Well, maybe because they started to design and develop their systems rather early, e.g. in 1990. As Layer-2 is identical to MUSICAM, it has been available since summer 1990 at the latest. In that year, Layer-3 development started, and it was successfully completed in spring 1992. So, for a certain time, vendors could only exploit the existing part of the new MPEG standard. Now the situation has changed. All Layers are available, the standard is completed, and new systems need not limit themselves, but may capitalize on the full features of MPEG audio.

Q: What other topics do I have to keep in mind? Tell me about the complexity of Layer-3.

A: Alright. First, we have to distinguish between decoder and encoder.

For a stereo Layer-3 decoder, our real-time implementations use either one DSP32C (AT&T) or one DSP56002 (Motorola). For an ASIC, Intermetall (ITT) estimated an overhead of around 30 % chip area for adding the necessary Layer-3 modules to a Layer-2 decoder. So you need not worry too much about decoder complexity.

For a stereo Layer-3 encoder achieving reference quality, our current real-time implementations use two DSP32C and two DSP56002. But again: as more and more horsepower becomes available on one chip, encoder complexity will become less of an issue.

Q: And what about the codec delay?

A: Well, the standard gives some figures for the theoretical minimum delay:

  Layer-1: 19 ms (<50 ms)
  Layer-2: 35 ms (100 ms)
  Layer-3: 59 ms (150 ms)

The practical values are significantly above that.
As they depend on the implementation, exact figures are hard to give. So the figures in brackets are just rough rule-of-thumb values.

Yes, for some applications, a very short delay is of critical importance. E.g. in a feedback link, a reporter can only talk intelligibly if the overall delay is below around 10 ms. If broadcasters want to apply MPEG audio coding, they have to use "N-1" switches in the studio to overcome this problem (or appropriate echo cancellers), or they have to forget about MPEG altogether.

But with most applications, these figures are small enough to present no extra problem. At least, if one can accept a Layer-2 delay, one can most likely also accept the higher Layer-3 delay.

Q: Someone told me that, with Layer-3, the codec delay would depend on the actual audio signal, varying over time. Is this really true?

A: No. The codec delay does *not* depend on the audio signal. With all Layers, the delay depends on the actual implementation used in a specific codec, so different codecs may have different delays. Furthermore, the delay depends on the actual sample rate and bitrate of your codec.

One of Layer-3's unique advanced features is the optional use of a "bit reservoir". The bit reservoir is a buffer that is controlled by the encoder. In "easy times", the encoder may fill this buffer with data bits that are not required to meet the masking requirements of the actual audio signal. In "hard times", the encoder may use the saved data bits to meet peak bitrate demands.

The buffer size of the bit reservoir adds to the codec delay. Its value is a constant that is explicitly defined in the encoder.

So don't get confused. The codec delay does not change with the music; that would really be a silly behaviour for an audio codec.

Q: OK, I am hooked! Where can I find more technical information about MPEG audio coding, especially about Layer-3?

A: Well, there is a variety of AES papers, e.g.

  K. Brandenburg, G. Stoll, ...: "The ISO/MPEG-Audio Codec: A Generic Standard for Coding of High Quality Digital Audio", 92nd AES Convention, Vienna 1992, Preprint 3336

  E. Eberlein, H. Popp, ...: "Layer-3, a Flexible Coding Standard", 94th AES Convention, Berlin 1993, Preprint 3493

  K. Brandenburg, G. Zimmer, ...: "Variable Data-Rate Recording on a PC Using MPEG-Audio Layer-3", 95th AES Convention, New York 1993

  B. Grill, J. Herre, ...: "Improved MPEG-2 Audio Multi-Channel Encoding", 96th AES Convention, Amsterdam 1994

And for further information, please contact layer3@iis.fhg.de...


3. Layer-3 Products

This is a list of available Layer-3 products, as disclosed on 1.1.94. For further information, please contact the companies directly.

3.1. Telecommunication Codecs

a) MusicTAXI Type 3

The MusicTAXI is a real-time audio codec for the full-duplex transmission of mono or stereo audio signals via ISDN. It supports Layer-2 and -3.

  Dialog 4 System Engineering GmbH
  Monreposstr. 57
  D-71634 Ludwigsburg, Germany
  Fax +49-7141-22667

b) MAGIC Series

The Multi Audio-System with Groupable Interfaces and Codecs supports Layer-2 and -3 as well as G.722 and G.711. Its transmission procedures comply with H.221, H.242 or G.704. The codec is a universal device, useful in ISDN applications as well as in satellite links, LAN or WAN networks, or audio memory installations.

  PKI Philips Kommunikations Industrie AG
  Thurn-und-Taxis-Str. 14
  D-90411 Nuernberg, Germany
  Fax +49-911-526-6315

c) Zephyr Codec

The Zephyr is a Layer-3 codec for the transmission of mono or stereo audio signals via ISDN, Switch-56 or V.35 networks. It also offers a G.722 feedback link.

  Telos Systems
  2101 Superior Avenue
  Cleveland, OH 44114, USA
  Fax +1-216-241-4103

3.2. Speech Announcement System

a) DAS VIII HiFi

This digital speech announcement system for mass transit applications applies Layer-3 to use the ROM-based speech memory most efficiently. Moreover, the system offers unrivalled sound quality at a very competitive price.

  Meister Electronic GmbH
  Koelner Str. 57
  D-51149 Koeln, Germany
  Fax +49-2203-12079

3.3. PC Boards

a) Layer-3 PC Board

This full-size PC/AT ISA card is a real-time audio processing board. It performs two-channel Layer-3 encoding and decoding, depending on the software configuration. The board offers digital audio interfaces (AES and IEC) and an additional X.21 interface for the reduced data stream. The board is delivered with a library of C drivers and a demo program.

  Audio Export Georg Neumann & Co. GmbH
  Badstr. 14
  D-74072 Heilbronn, Germany
  Fax +49-7131-68790

b) L3-PC-Card

This PC card supports a real-time Layer-3 audio codec. It offers digital audio interfaces (AES and IEC) and two additional X.21 interfaces for one or two reduced data streams. A decoder-only PC card is also available.

  Dialog 4 System Engineering GmbH
  Monreposstr. 57
  D-71634 Ludwigsburg, Germany
  Fax +49-7141-22667

3.4. ICs

a) ISO-MPEG Decoder Chip MASC 3500

This MPEG decoder chip supports the full ISO-MPEG audio standard, i.e. Layer-1, -2 and -3. The ASIC is based on the MASC DSP family (0.8 um) and comes in a small 68-pin PLCC package. First samples will be available in 3.Q.94.

  ITT Intermetall GmbH
  Hans-Bunte-Str. 19
  D-79108 Freiburg, Germany
  Fax +49-761-517-880

3.5. Layer-3 Shareware

The Layer-3 shareware is copyright Fraunhofer-IIS 1994.

a) Shareware encoder/decoder for IBM PCs or compatibles, version 1.00

The programs are written for IBM PCs or compatibles running MS-DOS. L3ENC.EXE and L3DEC.EXE should work on practically any PC with a 386-type CPU or better. For the encoder, a 486DX33 or better is recommended. On a 486DX2/66, the software-only decoder reaches about 33% of the speed necessary for real-time audio processing. The encoder needs about 14 minutes to encode a 1-minute audio data file. These figures assume coding/decoding of stereo audio material at 44.1 kHz.
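
The PC speed figures above are easiest to compare as real-time factors (processing time divided by audio duration, where values above 1.0 mean slower than real time). The following Python sketch only does that arithmetic; the helper names are hypothetical, and the inputs are the approximate figures quoted in this section, so the results are rough as well.

```python
def real_time_factor(processing_minutes: float, audio_minutes: float) -> float:
    """Processing time divided by audio duration; 1.0 means exactly real time,
    values above 1.0 mean slower than real time."""
    return processing_minutes / audio_minutes


def minutes_at_fraction_of_real_time(audio_minutes: float, fraction: float) -> float:
    """Processing time for a program running at `fraction` of real-time speed
    (0.33 means it handles 0.33 minutes of audio per minute of wall-clock time)."""
    return audio_minutes / fraction


# Approximate PC figures from the text (stereo, 44.1 kHz).
encoder_rtf = real_time_factor(14.0, 1.0)                      # about 14 min per 1 min of audio
decoder_minutes = minutes_at_fraction_of_real_time(1.0, 0.33)  # about 33% of real-time speed

print(f"PC encoder: {encoder_rtf:.1f}x real time")                    # PC encoder: 14.0x real time
print(f"PC decoder: {decoder_minutes:.1f} min per minute of audio")   # PC decoder: 3.0 min per minute of audio
```

By the same arithmetic, a 4-minute stereo piece would take roughly 4 x 14 = 56 minutes to encode with the PC encoder, and about 12 minutes to decode on a 486DX2/66.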
b) Shareware encoder/decoder for Sun workstations, version 1.00

The encoder takes about 5 minutes to encode 1 minute of stereo audio data on a SPARCstation 10. The decoder works in real time.

Availability of the shareware packages:

- via anonymous ftp from fhginfo.fhg.de (153.96.1.4)

  You may download our Layer-3 audio software package from the directory /pub/layer3. You will find the following files:

  For IBM PCs:

    l3v100.txt          a short description of the files found in l3v100.zip
    l3v100.zip          encoder, decoder, documentation and a sample bitstream
    l3v100n.txt         a short description of the files found in l3v100n.zip
    l3v100n.zip         encoder, decoder and documentation (no bitstream)
    bstr100.l3          a sample bitstream encoded with l3enc version 1.00

  For Sun workstations:

    l3v100.sun.txt      short description of the files found in l3v100.sun.tar.gz
    l3v100.sun.tar.gz   encoder, decoder, documentation and a sample bitstream
    l3v100n.sun.txt     short description of the files found in l3v100n.sun.tar.gz
    l3v100n.sun.tar.gz  encoder, decoder and documentation (no bitstream)
    bstr100.l3          sample bitstream encoded with version 1.00 of the encoder

- via direct modem download (up to 14,400 bps)

  Modem telephone number: +49 911 9933662    Name: FHG
  Packet switching network: (0) 262 45 9110 10290    Name: FHG

  (For the telephone number, replace "+" with your appropriate international dial prefix, e.g. "011" for the USA.)

  Follow the menus as desired.

- via diskette shipment (only together with registration)

  You may order a diskette directly from:

    Mailbox System Nuernberg (MSN)
    Hanft & Hartmann
    Innerer Kleinreuther Weg 21
    D-90408 Nuernberg
    Germany

  Please note: MSN will only ship a diskette after the registration fee has been paid. The registration fee is 85 Deutsche Mark (about 50 US$) (plus sales tax, if applicable) for one copy of the package. The preferred method of payment is by credit card. Currently, MSN accepts VISA and Master Card / Eurocard / Access credit cards.
  For details, see the file REGISTER.TXT found in the shareware package.

  You can also reach MSN
    via Internet: msn@iis.fhg.de
    via Fax:      +49 911 9933661
    via BBS:      +49 911 9933662    Name: FHG
    via X.25:     0262 45 9110 10290    Name: FHG
  (e.g. in the USA, please replace "+" with "011")

- via email

  You may also get our shareware by a direct request to msn@iis.fhg.de. In this case, the shareware is split into about 30 small uuencoded parts...


4. End of INFO.TXT