INFO.TXT for MPEG Audio Layer-3 Shareware Code
Version 1.48 - 19.July.94
This text is organized as a kind of Mini-FAQ (Frequently Asked
Questions). It covers several topics:
1. ISO-MPEG Standard
2. MPEG Audio Codec Family ("Layer 1, 2, 3")
3. Layer-3 Products
For further comments and questions regarding Layer-3,
please contact:
layer3@iis.fhg.de
or
Fraunhofer-IIS, Erlangen, Germany, Fax: +49-9131-776-399
For further information about MPEG, you may also contact:
phade@cs.tu-berlin.de
1. ISO-MPEG Standard
Q: What is MPEG, exactly?
A: MPEG is the "Moving Picture Experts Group", working under the
joint direction of the International Organization for Standardization
(ISO) and the International Electrotechnical Commission (IEC). This
group works on standards for the coding of moving pictures and
associated audio.
Q: What is the status of MPEG's work, then? What about MPEG-1, -2,
and so on?
A: MPEG approaches the growing need for multimedia standards step-by-
step. Today, three "phases" are defined:
MPEG-1: "Coding of Moving Pictures and Associated Audio for
Digital Storage Media at up to about 1.5 MBit/s"
Status: International Standard IS-11172, completed in 10.92
MPEG-2: "Generic Coding of Moving Pictures and Associated Audio"
Status: Committee Draft CD 13818 as found in documents MPEG93 /
N601, N602, N603 (11.93)
MPEG-3: no longer exists (has been merged into MPEG-2)
MPEG-4: "Very Low Bitrate Audio-Visual Coding"
Status: Call for Proposals 11.94, Working Draft in 11.96
Q: MPEG-1 is ready-for-use. What does the standard look like?
A: MPEG-1 consists of 4 parts:
IS 11172-1: System
describes synchronization and multiplexing of video and audio
IS 11172-2: Video
describes compression of non-interlaced video signals
IS 11172-3: Audio
describes compression of audio signals
CD 11172-4: Compliance Testing
describes procedures for determining the characteristics of coded
bitstreams and the decoding process and for testing compliance
with the requirements stated in the other parts
Q: How do I get the MPEG documents?
A: You may order them from your national standards body.
E.g., in Germany, please contact:
DIN-Beuth Verlag, Auslandsnormen
Mrs. Niehoff, Burggrafenstr. 6, D-10772 Berlin, Germany
Phone: 030-2601-2757, Fax: 030-2601-1231
2. MPEG Audio Codec Family ("Layer 1, 2, 3")
Q: Talking about MPEG audio coding, I heard a lot about "Layer 1, 2
and 3". What does it mean, exactly?
A: MPEG-1, IS 11172-3, describes the compression of audio signals
using high performance perceptual coding schemes. It specifies a
family of three audio coding schemes, simply called Layer-1,-2,-3,
with increasing encoder complexity and performance (sound quality
per bitrate). The three codecs are compatible in a hierarchical
way, i.e. a Layer-N decoder is able to decode bitstream data
encoded in Layer-N and all Layers below N (e.g., a Layer-3
decoder may accept Layer-1,-2 and -3, whereas a Layer-2 decoder
may accept only Layer-1 and -2.)
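The hierarchical rule above can be written down in a few lines. This is only a minimal sketch; the function name can_decode is made up for illustration and does not come from the standard or from the shareware package.

```python
def can_decode(decoder_layer: int, stream_layer: int) -> bool:
    """Return True if a Layer-N decoder accepts a bitstream of the given Layer.

    Hypothetical helper: it only encodes the compatibility rule stated in
    the text (a Layer-N decoder handles Layer-N and all Layers below N).
    """
    for layer in (decoder_layer, stream_layer):
        if layer not in (1, 2, 3):
            raise ValueError(f"unknown Layer: {layer}")
    return stream_layer <= decoder_layer


# A Layer-3 decoder accepts everything; a Layer-2 decoder rejects Layer-3.
print(can_decode(3, 1), can_decode(2, 3))
```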
Q: So we have a family of three audio coding schemes. What does the
MPEG standard define, exactly?
A: For each Layer, the standard specifies the bitstream format and
the decoder. To allow for future improvements, it does *not*
specify the encoder, but an informative chapter gives an example
for an encoder for each Layer.
Q: What do the three audio Layers have in common?
A: All Layers use the same basic structure. The coding scheme can be
described as "perceptual noise shaping" or "perceptual subband /
transform coding".
The encoder analyzes the spectral components of the audio signal
by calculating a filterbank or transform and applies a
psychoacoustic model to estimate the just noticeable noise-
level. In its quantization and coding stage, the encoder tries
to allocate the available number of data bits in a way to meet
both the bitrate and masking requirements.
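To make the idea of "allocating bits to meet the masking requirements" more concrete, here is a toy sketch. It is NOT the encoder from the informative part of the standard. It assumes that every extra bit lowers the quantization noise of a band by about 6 dB, and it simply hands out bits one by one to the band whose noise is furthest above its masking threshold. All names and the sample levels are made up for illustration.

```python
def allocate_bits(signal_db, mask_db, total_bits, db_per_bit=6.02):
    """Greedy toy bit allocation over frequency bands.

    signal_db  : signal level per band (dB)
    mask_db    : masking threshold per band (dB), as a psychoacoustic
                 model might estimate it
    total_bits : bit budget available for this block
    Assumption: noise in a band sits about db_per_bit * bits below the
    signal level of that band.
    """
    bits = [0] * len(signal_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio per band; positive means audible noise.
        nmr = [s - db_per_bit * b - m
               for s, b, m in zip(signal_db, bits, mask_db)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:
            break  # all noise is already below the masking threshold
        bits[worst] += 1
    return bits


# Three bands: one loud, one moderate, one fully masked.
print(allocate_bits([60, 40, 20], [30, 30, 30], total_bits=10))
```

With a generous budget the loop stops early, once no band has audible noise left. With a tight budget the loudest unmasked band gets the bits first, which is the "bitrate versus masking" conflict mentioned above.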
The decoder is much less complex. Its only task is to synthesize
an audio signal out of the coded spectral components.
All Layers use the same analysis filterbank (polyphase with 32
subbands). Layer-3 adds an MDCT transform to increase the frequency
resolution.
All Layers use the same "header information" in their bitstream,
to support the hierarchical structure of the standard.
All Layers use a bitstream structure that contains parts that are
more sensitive to biterrors ("header", "bit allocation",
"scalefactors", "side information") and parts that are less
sensitive ("data of spectral components").
All Layers may use 32, 44.1 or 48 kHz sampling frequency.
All Layers are allowed to work with similar bitrates:
Layer-1: from 32 kbps to 448 kbps
Layer-2: from 32 kbps to 384 kbps
Layer-3: from 32 kbps to 320 kbps
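As a small illustration, the limits listed above fit into a lookup table. The sketch below only checks the ranges and sampling frequencies given in this text. Please note that the standard allows only certain discrete bitrate steps inside each range, which this check ignores. The names are made up for illustration.

```python
# Bitrate ranges (kbps) and sampling frequencies (Hz) as listed in the text.
BITRATE_RANGE_KBPS = {1: (32, 448), 2: (32, 384), 3: (32, 320)}
SAMPLING_RATES_HZ = (32000, 44100, 48000)


def is_allowed(layer: int, bitrate_kbps: int, sampling_rate_hz: int) -> bool:
    """Range check only; does not verify the discrete bitrate steps."""
    low, high = BITRATE_RANGE_KBPS[layer]
    return (low <= bitrate_kbps <= high
            and sampling_rate_hz in SAMPLING_RATES_HZ)


print(is_allowed(3, 128, 44100))  # inside the Layer-3 range
print(is_allowed(3, 384, 44100))  # above the Layer-3 maximum
```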
Q: What are the main differences between the three Layers, from a
global view?
A: From Layer-1 to Layer-3,
complexity increases (mainly true for the encoder),
overall codec delay increases, and
performance increases (sound quality per bitrate).
Q: Which Layer should I use for my application?
A: Good question. Of course, it depends on all your requirements. But
as a first approach, you should consider the available bitrate of
your application as the Layers have been designed to support
certain areas of bitrates most efficiently, i.e. with a minimum
drop of sound quality.
Let us look a little closer at the strong domains of each Layer.
The ISO target bitrates indicate the main areas of optimization
for each Layer.
Layer-1: Its original ISO target bitrate was 192 kbps per audio
channel.
Layer-1 is a simplified version of Layer-2. It is most useful at
"high" bitrates around or above 192 kbps. A version of Layer-1 is
used as "PASC" with the DCC recorder.
Layer-2: Its original ISO target bitrate was 128 kbps per audio
channel.
Layer-2 is identical with MUSICAM. It has been designed as a
trade-off between sound quality per bitrate and encoder complexity.
It is most useful at "medium" bitrates of 128 or even 96 kbps per
audio channel. The DAB (EU 147) proponents have
decided to use Layer-2 in the future Digital Audio Broadcasting
network.
Layer-3: Its original ISO target bitrate was 64 kbps per audio
channel.
Layer-3 merges the best ideas of MUSICAM and ASPEC. It has been
designed for best performance at "low" bitrates around 64 kbps or
even below. The Layer-3 format specifies a set of advanced features
that all address one goal: to preserve as much sound quality as
possible even at rather low bitrates. Today, Layer-3 is already in
use in various telecommunication networks (ISDN, satellite links,
and so on) and speech announcement systems.
Q: So you tell me to consider Layer-3 for my low bitrate
applications. I have seen equipment working with Layer-2 for low
bitrates, too. Why should I worry about Layer-3, then?
A: As I told you before, all Layers may be used for low bitrates. So
you may also apply Layer-2 for low bitrates (e.g. 64 kbps per
channel). But be careful!
Using Layer-3 for low bitrates means:
- unrivalled sound quality at 64 kbps per channel or below
- useful for mono as well as for stereo signals
- full audio bandwidth at 64 or 56 kbps
Furthermore, if you are willing to accept some limitations,
with Layer-3 you can get the same performance as with Layer-2,
but at a lower bitrate.
Q: Tell me more about sound quality. How do you assess that?
A: Today, there is no alternative to expensive listening tests.
During the ISO-MPEG-1 process, three international listening tests
were performed with many trained listeners, supervised by Swedish
Radio. They took place in 7.90, 3.91 and 11.91. Another
international listening test was performed by CCIR, now ITU-R, in
92.
All these tests used the "triple stimulus, hidden reference"
method and the CCIR impairment scale to assess the audio quality.
The listening sequence is "ABC", with A = original and B, C = the
original and the coded signal in random order. The listener has
to grade both B and C with a number between 1.0 and 5.0. The
meaning of these values is:
5.0 = transparent (this should be the original signal)
4.0 = perceptible, but not annoying (first differences noticeable)
3.0 = slightly annoying
2.0 = annoying
1.0 = very annoying
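For reading test results, it can help to map a score back to the nearest verbal grade of the scale above. The following is just a convenience sketch. Rounding to the nearest anchor is our assumption for illustration, not part of the CCIR method, and the names are made up.

```python
# Anchor points of the impairment scale as listed in the text.
IMPAIRMENT_SCALE = [
    (5.0, "transparent"),
    (4.0, "perceptible, but not annoying"),
    (3.0, "slightly annoying"),
    (2.0, "annoying"),
    (1.0, "very annoying"),
]


def describe(score: float) -> str:
    """Return the verbal grade of the anchor closest to the given score."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("scores range from 1.0 to 5.0")
    return min(IMPAIRMENT_SCALE, key=lambda anchor: abs(anchor[0] - score))[1]


print(describe(3.6))
print(describe(2.4))
```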
With perceptual codecs (like MPEG audio), traditional
parameters (like SNR, THD+N, bandwidth) are largely useless for
assessing sound quality.
Fraunhofer-IIS also works on objective quality assessment tools, like
the NMR meter (Noise-to-Mask-Ratio). BTW: If you need more
information about NMR, please contact nmr@iis.fhg.de.
Q: Now that I know how to assess quality, come on, tell me the
results of these tests.
A: Well, for details you should study one of those AES papers listed
below. The main result is that for low bitrates (64 kbps per
channel), Layer-2 always scored between 2.1 and 2.6, whereas
Layer-3 scored between 3.6 and 3.8.
This is a significant increase in sound quality, indeed!
Furthermore, the selection process for critical sound material
showed that it was rather difficult to find worst-case material
for Layer-3 whereas it was not so hard to find such items for
Layer-2.
Q: Someone claimed that some international working group on audio
coding (TG10?) has concluded and that there was some trouble with
Layer 3, specifically on male voice in the German language. Is
that correct?
A: One moment, please. The former CCIR has changed its name to
ITU-Radiocommunication. In 1992, they founded a test group called
TG10-2 with the task to prepare the draft for a new recommendation for
the use of low bitrate audio coding in digital sound broadcasting
applications.
This test group concluded its work in 10.93. The draft
recommendation defines three fields of broadcast applications:
a) distribution and contribution links
(20 kHz bandwidth, no audible impairments with up to 5 cascaded
codecs)
Recommendation: Layer-2 with 180 kbps per channel (mono or
one independently coded channel of a stereo-signal); for a single
distribution link without cascading, Layer-2 with 120 kbps per
channel
b) emission
(20 kHz bandwidth)
Recommendation: Layer-2 with 128 kbps per channel (mono or
one independently coded channel of a stereo-signal)
c) commentary links
(15 kHz bandwidth)
Recommendation: Layer-3 with 60 kbps for monophonic and 120 kbps
for stereophonic signals (applying joint-stereo coding)
So these are the recommendations. And again, it nicely fits
into the above mentioned application profile of MPEG audio: with
medium bitrates, Layer-2 performs satisfactorily; with really
low bitrates, you need Layer-3.
The recommendations are based on international listening and
evaluation tests performed mainly in 1992.
For contribution and distribution, Layer-2 was the only system
that fulfilled the requirements.
For emission, the codecs had to score at least 4.0 on the CCIR
impairment scale, even for the most critical material. At 128 kbps
per channel, AC-2, Layer-2 and Layer-3 fulfilled this requirement,
and Layer-2 got the recommendation mainly because of its
"commonality with the distribution and contribution application".
Further tests for emission were performed at 192 kbps joint-stereo
coding. Layer-3 clearly met the requirements, Layer-2 fulfilled
them only marginally, with doubts remaining during further tests in
1993. Result: *no* recommendation for 192 kbps joint-stereo.
For commentary, the quality requirements were for speech
to be equivalent to 14-bit linear PCM, and for music, some
perceptible impairments were to be tolerated. In the test in 92,
Layer-3 was the only codec that fulfilled these
requirements (e.g. overall monophonic, it scored 3.6 in contrast to
Layer-2 at 2.05 - and for male German speech, it scored 4.4 in
contrast to Layer-2 at 2.4). So there was simply no alternative to
Layer-3.
Further tests were conducted in 93 using headphones. They showed
that Layer-3 with monophonic speech (the test item is German male
voice) at 60 kbps did not fully meet the quality requirements.
Layer-2 was not included in these tests as its low bitrate
performance was clearly too poor right from the start. Therefore,
the listeners had no "lower anchor" during the listening test (the
codec that always gets the "1" and "2" scores) - a fact that
certainly influences the absolute scoring. Funnily enough, the
same speech signal has been tested in some previous sessions
without complaints...
The ITU decided to recommend Layer-3 and to include a temporary
footnote that will be removed as soon as an improved Layer-3 codec
fulfills their requirements completely, i.e. even with that well-
known critical male German speech item (for many other speech
items, Layer-3 has no trouble at all).
Q: OK, a Layer-2 codec at low bitrates may sound poor today, but
couldn't that be improved in the future? I guess you just told me
before that the encoder is not fixed in the standard.
A: Good thinking! As the sound quality mainly depends on the encoder
implementation, it is true that there is no such thing as a
"Layer-N" quality. So we definitely only know the performance of the
reference codecs during the international tests. Who knows what
will happen in the future? What we do know now, is:
Today, Layer-3 already provides a sound quality that comes very
near to CD quality at 64 kbps per channel. Layer-2 is far away
from that.
Tomorrow, both Layers may improve. Layer-2 has been designed as a
trade-off between quality and complexity, so the bitstream format
allows only limited innovations. In contrast, even the current
reference Layer-3-codec exploits only a small part of the powerful
mechanisms inside the Layer-3 bitstream format.
Q: All in all, you sound as if anybody should use Layer-3 for low
bitrates. Why on earth do some vendors still offer only Layer-2
equipment for these applications?
A: Well, maybe because they started to design and develop their
system rather early, e.g. in 1990. As Layer-2 is identical with
MUSICAM, it has been available since the summer of 90, at the latest. In
that year, Layer-3 development started and could be successfully
finished in spring 92. So, for a certain time, vendors could only
exploit the existing part of the new MPEG standard.
Now the situation has changed. All Layers are available, the
standard is completed, and new systems need not limit themselves,
but may capitalize on the full features of MPEG audio.
Q: What other topics do I have to keep in mind? Tell me about the
complexity of Layer-3.
A: Alright. First, we have to distinguish between decoder and encoder.
For a stereo Layer-3-decoder, our real-time implementations use
either one DSP32C (AT&T) or one DSP56002 (Mot). For an ASIC,
Intermetall (ITT) estimated an overhead of around 30 % chip area
for adding the necessary Layer-3 modules to a Layer-2-decoder. So
you need not worry too much about decoder complexity.
For a stereo Layer-3-encoder achieving reference quality, our
current real-time implementations use two DSP32C and two DSP56002.
But again: as more and more horsepower becomes available on one
chip, encoder complexity will matter less and less.
Q: And what about the codec delay?
A: Well, the standard gives some figures of the theoretical minimum
delay:
Layer-1: 19 ms (<50 ms)
Layer-2: 35 ms (100 ms)
Layer-3: 59 ms (150 ms)
The practical values are significantly above that. As they depend
on the implementation, exact figures are hard to give. So the
figures in brackets are just rough rule-of-thumb values.
Yes, for some applications, a very short delay is of critical
importance. E.g. in a feedback link, a reporter can only talk
intelligibly if the overall delay is below around 10 ms.
If broadcasters want to apply MPEG audio coding, they have to use
"N-1" switches in the studio to overcome this problem (or
appropriate echo-cancellers) - or they have to forget about MPEG
altogether.
But with most applications, these figures are small enough to
present no extra problem. At least, if one can accept a Layer-2
delay, one can most likely also accept the higher Layer-3 delay.
Q: Someone told me that, with Layer-3, the codec delay would depend
on the actual audio signal, varying over the time. Is this really
true?
A: No. The codec delay does *not* depend on the audio signal.
With all Layers, the delay depends on the actual implementation
used in a specific codec, so different codecs may have different
delays. Furthermore, the delay depends on the actual sample rate
and bitrate of your codec.
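One reason why the delay depends on the sample rate: a codec works frame by frame, and a frame covers a fixed number of samples. The frame sizes below (384 samples for Layer-1, 1152 for Layer-2 and -3) are taken from the MPEG-1 audio standard, not from this text. The frame duration is only one component of the total codec delay, so take this as a rough sketch with made-up names.

```python
# Samples per channel in one MPEG-1 audio frame (from the standard).
SAMPLES_PER_FRAME = {1: 384, 2: 1152, 3: 1152}


def frame_duration_ms(layer: int, sampling_rate_hz: int) -> float:
    """Time span covered by one frame, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME[layer] / sampling_rate_hz


for rate in (32000, 44100, 48000):
    print(rate, round(frame_duration_ms(3, rate), 2))
```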
One of Layer-3's advanced unique features is the optional use of a
"bit reservoir". The bit reservoir is a buffer that is controlled
by the encoder. In "easy times", the encoder may fill this buffer
with data bits that are not required to meet the masking
requirements of the actual audio signal. In "hard times", the
encoder may use the saved data bits to meet peak bitrate demands.
The buffer size of the bit reservoir adds to the codec delay. Its
value is a constant that is explicitly defined in the encoder.
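The bookkeeping behind the bit reservoir can be sketched as follows. This toy model only tracks how many bits are saved and spent per frame; it ignores all bitstream details of the real Layer-3 format. The function name, the parameter names and the numbers are made up for illustration.

```python
def run_reservoir(demands, bits_per_frame, reservoir_size):
    """Simulate a bit reservoir of fixed size.

    demands        : bits each frame would like to use
    bits_per_frame : bits the channel delivers per frame (constant bitrate)
    reservoir_size : maximum number of bits that can be saved
    Returns (bits granted per frame, reservoir level after each frame).
    """
    level = 0
    granted, levels = [], []
    for demand in demands:
        available = bits_per_frame + level      # new bits plus savings
        used = min(demand, available)           # "hard times" draw on savings
        # "Easy times": unused bits are saved, up to the fixed buffer size.
        level = min(available - used, reservoir_size)
        granted.append(used)
        levels.append(level)
    return granted, levels


granted, levels = run_reservoir([600, 700, 1500, 1200],
                                bits_per_frame=1000, reservoir_size=500)
print(granted)
print(levels)
```

Note that reservoir_size is a constant chosen up front. This is why the reservoir adds a fixed amount to the codec delay, no matter what the music does.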
So don't get confused. The codec delay does not change with the
music - that would really be a silly behaviour for an audio codec.
Q: OK, I am hooked! Where can I find more technical information
about MPEG audio coding, especially about Layer-3?
A: Well, there is a variety of AES papers, e.g.
K. Brandenburg, G. Stoll, ...: "The ISO/MPEG-Audio Codec: A
Generic Standard for Coding of High Quality Digital Audio", 92nd
AES, Vienna 1992, pp.3336
E. Eberlein, H. Popp, ...: "Layer-3, a Flexible Coding Standard",
94th AES, Berlin 93, pp.3493
K. Brandenburg, G. Zimmer, ...: "Variable Data-Rate Recording on a
PC Using MPEG-Audio Layer-3", 95th AES, New York 93
B. Grill, J. Herre,... : "Improved MPEG-2 Audio Multi-Channel
Encoding", 96th AES, Amsterdam 94
And for further information, please contact layer3@iis.fhg.de...
3. Layer-3 Products
This is a list of available Layer-3 products, as disclosed on 1.1.94.
For further information, please contact the companies directly.
3.1. Telecommunication Codecs
a) MusicTAXI Type 3
The MusicTAXI is a real-time audio codec for the full-duplex
transmission of mono or stereo audio signals via ISDN. It supports
Layer-2 and -3.
Dialog 4 System Engineering GmbH
Monreposstr. 57
D-71634 Ludwigsburg, Germany
Fax +49-7141-22667
b) MAGIC Series
The Multi Audio-System with Groupable Interfaces and Codecs
supports Layer-2 and -3 as well as G.722 and G.711. Its
transmission procedures comply with H.221, H.242 or G.704. The
codec is a universal device useful in ISDN applications as well as
in satellite links, LAN or WAN networks or audio memory
installations.
PKI Philips Kommunikations Industrie AG
Thurn-und-Taxis-Str. 14
D-90411 Nuernberg, Germany
Fax +49-911-526-6315
c) Zephyr Codec
The Zephyr is a Layer-3 codec for the transmission of mono or
stereo audio signals via ISDN, Switch-56 or V.35-networks. It also
offers a G.722 feedback link.
Telos Systems
2101 Superior Avenue
Cleveland, OH 44114, USA
Fax +1-216-241-4103
3.2. Speech Announcement System
a) DAS VIII HiFi
This digital speech announcement system for mass transit
applications applies Layer-3 to use the ROM based speech memory
most efficiently. Moreover, the system offers an unrivalled sound
quality at a very competitive price.
Meister Electronic GmbH
Koelner Str. 57
D-51149 Koeln, Germany
Fax +49-2203-12079
3.3. PC Boards
a) Layer-3 PC Board
This full-size PC/AT ISA card is a real-time audio processing
board. It performs two-channel Layer-3 encoding and decoding,
depending on the software configuration. The board offers digital
audio interfaces (AES and IEC) and an additional X.21 interface
for the reduced data stream. The board is delivered with a library
of C drivers and a demo program.
Audio Export Georg Neumann & Co. GmbH
Badstr. 14
D-74072 Heilbronn, Germany
Fax +49-7131-68790
b) L3-PC-Card
This PC-Card supports a real-time Layer-3 audio codec. It offers
digital audio interfaces (AES and IEC) and two additional X.21
interfaces for one or two reduced data streams. A decoder-only
PC card is also available.
Dialog 4 System Engineering GmbH
Monreposstr. 57
D-71634 Ludwigsburg, Germany
Fax +49-7141-22667
3.4. ICs
a) ISO-MPEG Decoder Chip MASC 3500
This MPEG decoder chip offers the use of the full ISO-MPEG-audio
standard, i.e. Layer-1, -2, and -3. The ASIC is based on the MASC
DSP family (.8 um) and comes in a small 68 pin PLCC package.
First samples will be available in 3.Q.94.
ITT Intermetall GmbH
Hans-Bunte-Str. 19
D-79108 Freiburg, Germany
Fax +49-761-517-880
3.5. Layer-3 Shareware
The layer 3 shareware is copyright Fraunhofer - IIS 1994
a) Shareware encoder/decoder for IBM PCs or Compatibles, version 1.00
The programs are written for IBM-PCs or Compatibles with MS-DOS.
L3ENC.EXE and L3DEC.EXE should work on practically any PC with 386
type CPU or better. For the encoder, a 486DX33 or better is recommended.
On a 486DX2/66 the performance of the software-only decoder is about
33% of the performance necessary for real time audio processing.
The encoder needs about 14 minutes to encode a 1 minute audio data
file. These figures assume coding/decoding of stereo audio material
at 44.1 kHz.
b) Shareware encoder/decoder for Sun workstations, version 1.00
The encoder takes about 5 minutes for encoding of 1 minute of stereo audio
data on a SPARC station 10. The decoder works in real time.
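To put these figures into perspective, here is a bit of arithmetic. The speed figures are the ones quoted above. The 16 bits per sample and the 128 kbps stereo bitrate (2 x 64 kbps) are our assumptions for the example, not properties of the shareware. The function names are made up.

```python
def processing_minutes(audio_minutes: float, realtime_fraction: float) -> float:
    """Time needed when the software runs at a fraction of real-time speed."""
    return audio_minutes / realtime_fraction


def realtime_fraction(audio_minutes: float, needed_minutes: float) -> float:
    """Speed relative to real time (1.0 means real time)."""
    return audio_minutes / needed_minutes


def pcm_bitrate_kbps(sampling_rate_hz: int, bits_per_sample: int,
                     channels: int) -> float:
    """Bitrate of uncompressed linear PCM audio."""
    return sampling_rate_hz * bits_per_sample * channels / 1000


# PC decoder at 33% of real time: about 3 minutes per minute of audio.
print(round(processing_minutes(1, 0.33), 2))
# PC encoder (14 min per minute) and Sun encoder (5 min per minute).
print(round(realtime_fraction(1, 14), 3), realtime_fraction(1, 5))
# Assumed 16-bit stereo PCM at 44.1 kHz versus an assumed 128 kbps stream.
pcm = pcm_bitrate_kbps(44100, 16, 2)
print(pcm, round(pcm / 128, 1))
```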
Availability of the shareware packages:
- via anonymous ftp from fhginfo.fhg.de (153.96.1.4)
You may download our Layer-3 audio software package from the
directory /pub/layer3. You will find the following files:
For IBM PCs:
l3v100.txt a short description of the files found in l3v100.zip
l3v100.zip encoder, decoder, documentation and a sample bitstream
l3v100n.txt a short description of the files found in l3v100n.zip
l3v100n.zip encoder, decoder and documentation (no bitstream)
bstr100.l3 a sample bitstream encoded with l3enc version 1.00
For SUN workstations:
l3v100.sun.txt short description of the files found in l3v100.sun.tar.gz
l3v100.sun.tar.gz encoder, decoder, documentation and a sample bitstream
l3v100n.sun.txt short description of the files found in l3v100n.sun.tar.gz
l3v100n.sun.tar.gz encoder, decoder and documentation (no bitstream)
bstr100.l3 sample bitstream encoded with version 1.00 of the encoder
- via direct modem download (up to 14,400 bps)
Modem telephone number : +49 911 9933662 Name: FHG
Packet switching network: (0) 262 45 9110 10290 Name: FHG
(For the telephone number, replace "+" with your appropriate
international dial prefix, e.g. "011" for the USA.)
Follow the menus as desired.
- via shipment of diskette (only including registration)
You may order a diskette directly from:
Mailbox System Nuernberg (MSN)
Hanft & Hartmann
Innerer Kleinreuther Weg 21
D-90408 Nuernberg
Germany
Please note: MSN will only ship a diskette after the registration
fee has been paid. The registration fee is 85 Deutsche Mark
(about 50 US$) (plus sales tax, if applicable) for one copy of the
package. The preferred method of payment is via credit card. Currently,
MSN accepts VISA, Master Card / Eurocard / Access credit cards. For
details see the file REGISTER.TXT found in the shareware package.
You may reach MSN also via Internet: msn@iis.fhg.de
or via Fax: +49 911 9933661
or via BBS: +49 911 9933662 Name: FHG
or via X25: 0262 45 9110 10290 Name: FHG
(e.g. in USA, please replace "+" with "011")
- via email
You may get our shareware also by a direct request to msn@iis.fhg.de.
In this case, the shareware is split into about 30 small uuencoded
parts...
4. End of INFO.TXT