Basics

Samplerate

Sample rate is the rate at which samples are read from your sound card during recording. It directly determines the achievable audio bandwidth: a sound file with a sample rate of 8 kHz cannot contain frequencies above 4 kHz. You should therefore always record at the highest sample rate that your sound card supports.
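The relation between sample rate and bandwidth is the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. A minimal Python sketch of this arithmetic follows; the function name is illustrative and not part of the encoder.

```python
def max_audio_bandwidth_hz(sample_rate_hz: float) -> float:
    """Return the highest frequency representable at the given sample rate."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    # Nyquist limit: half the sample rate.
    return sample_rate_hz / 2.0


if __name__ == "__main__":
    for rate in (8000, 22050, 44100):
        limit = max_audio_bandwidth_hz(rate)
        print(f"{rate} Hz sampling -> audio content up to {limit:.0f} Hz")
```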

The encoder reduces the sample rate of your audio data to match the audio quality of the bitstream it produces. This process is called downsampling.

Bitrate

The main parameter controlling the sound quality is the bitrate that the encoder runs at. In a nutshell, the higher the bitrate, the better the quality.

The bitrate of the encoder is linked to the samplerate that the encoded file will have. Usually, the encoder will choose the samplerate best suited for encoding at that bitrate. You can override this samplerate using the -esr switch (see section 2.2.1).

The bitrate of the bitstream output is selected via the -br switch. The bitrate is specified in bits/second. The bitrate is the total bitrate for all encoded channels, i.e. if you select -br 112000 and encode in stereo, both channels will be stuffed into one bitstream of 112000 bits/second.
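Because the bitrate is given in bits/second for all channels together, the approximate size of the encoded audio follows directly from the bitrate and the duration. The sketch below shows that arithmetic; it ignores any container overhead (for example a RIFF/WAVE wrapper), and the function name is illustrative.

```python
def encoded_size_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Approximate audio payload size: bits per second times seconds, in bytes."""
    if bitrate_bps <= 0 or duration_s < 0:
        raise ValueError("bitrate must be positive and duration non-negative")
    return bitrate_bps * duration_s / 8


if __name__ == "__main__":
    # One minute at -br 112000. The result is the same for mono and stereo,
    # because the bitrate covers all encoded channels together.
    print(encoded_size_bytes(112000, 60))
```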

The encoder supports bitrates of 8, 16, 18, 20, 24, 32, 40, 48, 56, 64, 96, 112, 128, 160, 192 and 256 kBit/s. While all of these can be used with mono signals, stereo works from 20 kBit/s on upwards.
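A script that drives the encoder may want to check a requested bitrate against this list before starting. The following Python sketch does so; it assumes that 1 kBit/s corresponds to 1000 bits/second (consistent with the -br 112000 example above), and the names used are illustrative.

```python
# Bitrates listed above, in kBit/s.
SUPPORTED_KBITS = (8, 16, 18, 20, 24, 32, 40, 48, 56, 64,
                   96, 112, 128, 160, 192, 256)
# Stereo is available from 20 kBit/s upwards.
MIN_STEREO_BPS = 20000


def is_supported(bitrate_bps: int, channels: int) -> bool:
    """Return True if the bitrate (in bits/second) is usable for 1 or 2 channels."""
    if channels not in (1, 2):
        raise ValueError("channels must be 1 (mono) or 2 (stereo)")
    if bitrate_bps not in {k * 1000 for k in SUPPORTED_KBITS}:
        return False
    if channels == 2 and bitrate_bps < MIN_STEREO_BPS:
        return False
    return True


if __name__ == "__main__":
    print(is_supported(112000, 2))  # listed bitrate, stereo allowed
    print(is_supported(18000, 2))   # listed bitrate, but below the stereo minimum
```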

Stereo mode

When encoding stereo, the stereo mode used depends on the bitrate of the encoder. MPEG Layer-3 defines four modes for stereo encoding.

dual channel

(also known as dual mono) In this mode, the encoder treats the two input channels as separate entities, assuming there is no similarity between the channels. This is appropriate if, for example, you have a bilingual signal where one channel contains a German speaker and the other an English speaker.

stereo

In this mode, as in dual channel above, the encoder makes no use of potentially existing correlations between the two input channels. It can, however, negotiate the bit demand between both channels, i.e. give one channel more bits if the other contains silence.

MS stereo

In this mode, the encoder will make use of a correlation between both channels. The signal will be matrixed into a sum (»mid«) and difference (»side«) signal. For quasi-mono signals, this will give a significant gain in encoding quality.

Unlike IS stereo (see below), this mode does not destroy phase information and can thus be used to encode Dolby ProLogic™ surround signals.
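The matrixing into mid and side signals can be sketched as follows. The 1/sqrt(2) scaling is the convention of the MPEG Layer-3 standard and makes the matrix its own inverse; how the encoder implements this internally is not described here, so treat the code as an illustration only. The function names are hypothetical.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def ms_encode(left, right):
    """Matrix left/right samples into mid (sum) and side (difference) signals."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover left/right samples from mid/side signals."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


if __name__ == "__main__":
    # A quasi-mono signal: both channels are almost identical.
    left = [0.50, -0.25, 0.75]
    right = [0.50, -0.25, 0.70]
    mid, side = ms_encode(left, right)
    print("mid: ", mid)
    print("side:", side)  # close to zero, so it needs few bits
```

For a quasi-mono signal the side signal carries very little energy, which is where the gain in encoding quality comes from.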

MS/IS stereo

In this mode, high-frequency parts of the signal will be downmixed to mono and transmitted together with direction information (which is basically a pan). This mode (called »intensity stereo«) will lose phase information and should not be used for high-quality encoding.

Table 2.1 gives an overview of which mode is used at which bitrate.

**Table 2.1:** different stereo modes
Bitrate range [bits/s]	Stereo mode
8000 - 18000	mono only
18000 - 96000	MS/IS stereo
96000 - 192000	MS stereo
192000 - 256000	stereo
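Table 2.1 can be expressed as a small lookup, sketched below in Python. The rows of the table share their boundary values (18000, 96000 and 192000); the sketch assumes that a boundary bitrate belongs to the lower row, which matches the statement that stereo works from 20 kBit/s upwards but is otherwise a guess. The names are illustrative.

```python
# (lowest bitrate, highest bitrate, stereo mode), all bitrates in bits/second.
STEREO_MODE_TABLE = (
    (8000, 18000, "mono only"),
    (18000, 96000, "MS/IS stereo"),
    (96000, 192000, "MS stereo"),
    (192000, 256000, "stereo"),
)


def stereo_mode_for(bitrate_bps: int) -> str:
    """Return the stereo mode Table 2.1 lists for the given bitrate."""
    # Rows are checked in order, so a shared boundary value resolves to the
    # lower row (assumption, see the text above).
    for low, high, mode in STEREO_MODE_TABLE:
        if low <= bitrate_bps <= high:
            return mode
    raise ValueError(f"bitrate {bitrate_bps} is outside Table 2.1")


if __name__ == "__main__":
    for br in (16000, 64000, 128000, 256000):
        print(br, "->", stereo_mode_for(br))
```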

Encoding speed

Several factors influence the speed of the encoder. They include:

Number of channels in the output signal. If your output signal has only one channel, the encoder will run at twice the speed compared to stereo encoding.
Output sample rate. If the encoder produces a file at 22.05 kHz (that is, a file that contains 22050 samples per second), it runs at twice the speed compared to producing twice the number of samples per second (i.e. a 44.1 kHz output).
Mismatch between input and output sample rate. If your input and output sample rates differ, the encoder has to run a resampling filter and will thus be slower. (Integer ratios between input and output sample rate perform slightly better than non-integer ratios, though.)
Time-domain bandlimiting. The encoder needs to band-limit the signal to compress it. By default, the encoder will use a high-quality time-domain filter to do this band-limiting. You can tell it to use a faster filter, possibly sacrificing some quality (see 2.2.2).
Full Huffman search and careful iteration. You can tell the encoder to try hard to do the best encoding possible, at the expense of a factor of up to three in running time (see 2.2.2).

Version V3.0 of the encoder reaches realtime speed on a Pentium 166 when encoding at 64 kBit/s, 22.05 kHz, stereo. On a SUN Sparc Ultra-1 (143 MHz) the performance is similar.

Input file specification

The encoder can read AIFF, AIFF-C, WAV/RIFF and raw PCM data files. While the first three only work from a file, plain PCM data can be fed into the encoder via a pipe. This is useful for live encoding (also known as streaming).

Input from file: filename

-if filename: tells the encoder the name of the file it reads its input from. If the file is a RIFF/WAVE file or an AIFF/AIFC file, the encoder will automatically adapt to the sound file format. For other formats or plain PCM data, see below.

Piping data into the encoder

-sti: tells the encoder to get its input from stdin rather than from a file. This only works when the input is plain PCM data (see below).

plain PCM data input

If the encoder gets its input as plain PCM data (or if it does not recognize the sound format by itself), you need to tell it all about the structure of the PCM stream, i.e. the number of bits per sample, the number of channels and the samplerate.

-iff fileformat

This is a string containing name=value pairs, separated by blanks. Table 2.2 lists the names and values that are possible here. For stereo files, the encoder assumes that the PCM data is interleaved and that the sample for the right channel follows that for the left channel.

**Table 2.2:** input file format specification
Name	Value(s)	Explanation
`sr`	any	The rate the PCM signal is sampled at [Hz]
`nc`	1, 2	The number of channels in the signal
`bps`	8, 16, 24, 32	The number of bits per sample
`little-endian`		The signal is little-endian (Intel format)
`big-endian`		The signal is big-endian (Motorola format)

As an example, -iff "nc=2 sr=44100 bps=16" would be used to read a 44.1 kHz stereo file with 16 bits per sample while -iff "nc=1 sr=8000 bps=8" would tell the encoder that the data is mono, sampled at 8 kHz with 8 bits per sample.
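If the format string is generated by a script, a small helper can assemble and check it against Table 2.2. The Python sketch below is one way to do this. It assumes that the byte-order names are given as bare words without a value, since Table 2.2 lists no value for them; the function name is illustrative.

```python
def build_iff(nc, sr, bps, byte_order=None):
    """Build the string passed to -iff from the names in Table 2.2."""
    if nc not in (1, 2):
        raise ValueError("nc must be 1 or 2")
    if bps not in (8, 16, 24, 32):
        raise ValueError("bps must be 8, 16, 24 or 32")
    if sr <= 0:
        raise ValueError("sr must be a positive rate in Hz")
    parts = [f"nc={nc}", f"sr={sr}", f"bps={bps}"]
    if byte_order is not None:
        if byte_order not in ("little-endian", "big-endian"):
            raise ValueError("unknown byte order")
        # Assumption: the byte order is a bare word, not a name=value pair.
        parts.append(byte_order)
    return " ".join(parts)


if __name__ == "__main__":
    print(build_iff(2, 44100, 16))
    print(build_iff(1, 8000, 8))
```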

Remember that this feature is only needed for input from files other than RIFF/WAV, AIFF and AIFC.
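The interleaved layout expected for plain stereo PCM (left sample first, then the right sample) can be produced as sketched below in Python. The sketch assumes signed 16-bit samples; the signedness is not stated in this section, and the function name is illustrative.

```python
import struct


def interleave_pcm16(left, right, little_endian=True):
    """Pack two channels of 16-bit samples as interleaved PCM: L, R, L, R, ..."""
    if len(left) != len(right):
        raise ValueError("both channels need the same number of samples")
    # "<h" is a little-endian signed 16-bit value, ">h" the big-endian form.
    fmt = "<h" if little_endian else ">h"
    out = bytearray()
    for l, r in zip(left, right):
        out += struct.pack(fmt, l)  # left channel sample first
        out += struct.pack(fmt, r)  # right channel sample follows
    return bytes(out)


if __name__ == "__main__":
    data = interleave_pcm16([1, 2], [-1, -2])
    print(data.hex())
```

Data produced this way would be described to the encoder with -iff "nc=2 sr=44100 bps=16" for a 44.1 kHz signal, adding the matching byte-order name from Table 2.2 if needed.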

Output file specification

On output, the encoder can be instructed to write a plain Layer-3 bitstream or a wave file containing the Layer-3 stream. These wave files can be played by the media control on a machine running Microsoft Windows that has the MPEG Layer-3 ACM codec installed (you can get one by installing Microsoft Netshow™, http://www.microsoft.com/netshow/). If the output is a plain Layer-3 stream, it can be piped into other applications. This is useful for live streaming.

-of filename: tells the encoder the filename of the file that the encoder will write the bitstream to. If the file does not exist, it is created; if it does exist, it will be overwritten.
-l3wav: tells the encoder to wrap the MPEG Layer-3 file into a Microsoft RIFF/WAVE file.

Streaming data out of the encoder

-sto: tells the encoder to write its output to stdout rather than to a file. This only works when the output is a raw Layer-3 bitstream (i.e. it does not work in conjunction with -l3wav).
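The constraints on the input and output switches can be checked before the encoder is started. The Python sketch below assembles an argument list from the switches described in this chapter (-br, -if, -iff, -sti, -of, -l3wav, -sto). The program name "encoder" is a placeholder, since the name of the executable is not given here, and the order of the switches is an assumption.

```python
def build_encoder_args(program, bitrate_bps, *, input_file=None,
                       output_file=None, iff=None, l3wav=False):
    """Assemble a command line; None for a file means stdin or stdout."""
    args = [program, "-br", str(bitrate_bps)]
    if input_file is None:
        # Piped input must be plain PCM, so its format has to be described.
        if iff is None:
            raise ValueError("-sti needs a format description for -iff")
        args.append("-sti")
    else:
        args += ["-if", input_file]
    if iff is not None:
        args += ["-iff", iff]
    if output_file is None:
        # Piped output must be a raw Layer-3 bitstream.
        if l3wav:
            raise ValueError("-sto cannot be combined with -l3wav")
        args.append("-sto")
    else:
        args += ["-of", output_file]
        if l3wav:
            args.append("-l3wav")
    return args


if __name__ == "__main__":
    # "encoder" is a hypothetical program name used for illustration only.
    print(build_encoder_args("encoder", 64000, iff="nc=2 sr=44100 bps=16"))
```

The resulting list can be handed to a process-spawning function such as Python's subprocess.Popen, with the PCM source connected to stdin and the consumer of the bitstream connected to stdout.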

layer3@iis.fhg.de, 03/98