The sample rate is the rate at which samples are read from your sound card
when you record. It is directly linked to the achievable audio bandwidth:
a sound file with a sample rate of 8 kHz does not contain
frequencies beyond 4 kHz. This means that you should always use the
highest sample rate that your sound card supports when you sample a
signal.
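The relation between sample rate and bandwidth stated above (8 kHz sampling, nothing beyond 4 kHz) is the half-the-sample-rate limit. A minimal sketch, where the function name is purely illustrative:

```python
def max_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can contain."""
    return sample_rate_hz / 2.0


# Print the usable audio bandwidth for a few common sample rates.
for rate in (8000, 22050, 44100):
    print(f"{rate} Hz sampling -> up to {max_frequency_hz(rate):.0f} Hz")
```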
The encoder changes the sample rate of your audio data to match it to
the audio quality of the bitstream produced by the encoder. This process
is called downsampling.
The main parameter controlling the sound quality is the
bitrate that the encoder runs at. In a nutshell, the higher
the bitrate, the better the quality.
The bitrate of the encoder is linked to the sample rate that the encoded
file will have. Usually, the encoder will choose the sample rate that
is best suited for encoding at that bitrate. You can override this
sample rate using the -esr switch (see section 2.2.1).
The bitrate of the bitstream output is selected via the -br
switch. The bitrate is specified in bits/second. The bitrate is the
total bitrate for all encoded channels, i.e. if you select
-br 112000 and encode in stereo, both channels
will be stuffed into one bitstream of 112000 bits/second.
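Because the value given to -br is the total for all channels, the size of the output follows directly from bitrate and duration. The sketch below works through the 112000 bits/second example; the function names are illustrative, and the per-channel figure is only an average, since the encoder may distribute bits unevenly between channels.

```python
def stream_size_bytes(bitrate_bps: int, duration_s: float) -> float:
    """Approximate size of the encoded bitstream in bytes."""
    return bitrate_bps * duration_s / 8


def average_share_bps(bitrate_bps: int, channels: int) -> float:
    """Average number of bits per second available to each channel."""
    return bitrate_bps / channels


# One minute of stereo audio encoded with -br 112000.
print(stream_size_bytes(112000, 60))   # bytes in the bitstream
print(average_share_bps(112000, 2))    # average bits/second per channel
```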
The encoder supports bitrates of 8, 16, 18, 20, 24, 32, 40, 48, 56, 64, 96,
112, 128, 160, 192 and 256 kBit/s. While all of these can be used with
mono signals, stereo works from 20 kBit/s on upwards.
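A small check of a requested bitrate against the list above could look like this. It is a sketch, not part of the encoder: it assumes that 1 kBit/s corresponds to 1000 bits/second (as suggested by -br 112000 for 112 kBit/s), and the names are hypothetical.

```python
# Bitrates listed in this section, in kBit/s.
SUPPORTED_KBIT = (8, 16, 18, 20, 24, 32, 40, 48, 56, 64, 96,
                  112, 128, 160, 192, 256)
MIN_STEREO_KBIT = 20  # stereo works from 20 kBit/s upwards


def is_supported(bitrate_bps: int, channels: int) -> bool:
    """Return True if the bitrate is listed for this number of channels."""
    if channels not in (1, 2) or bitrate_bps % 1000 != 0:
        return False
    kbit = bitrate_bps // 1000
    if kbit not in SUPPORTED_KBIT:
        return False
    # Every listed bitrate works in mono; stereo has a lower limit.
    return channels == 1 or kbit >= MIN_STEREO_KBIT


print(is_supported(16000, 1), is_supported(16000, 2))
```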
When encoding stereo, the bitrate of the encoder is linked to a
stereo mode. MPEG Layer-3 supports four modes for stereo encoding:
- dual channel (also known as dual mono): In this mode, the
encoder treats the two input channels as separate entities, assuming
there is no similarity between the channels. This would be appropriate
if you have, for example, a bilingual signal where one channel contains a
German speaker and the other an English speaker.
- stereo: In this mode, as in dual channel above, the encoder makes
no use of potentially existing correlations between the two input
channels. It can, however, negotiate the bit demand between both
channels, i.e. give one channel more bits if the other contains silence.
- MS stereo: In this mode, the encoder will make use of a correlation between
both channels. The signal will be matrixed into a sum (»mid«) and
difference (»side«) signal. For quasi-mono signals, this will give
a significant gain in encoding quality.
Unlike IS stereo (see below), this mode does not destroy phase information
and thus can be used to encode DOLBY ProLogic surround signals.
- MS/IS stereo: In this mode, high-frequency parts of the signal will be
downmixed to mono and transmitted with direction information (which is
basically a pan). This mode (called »intensity stereo«) will lose
phase information and should not be used for high-quality encoding.
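The mid/side matrixing described for MS stereo can be illustrated in a few lines. This is a sketch only: the text just says "sum" and "difference", and the 1/sqrt(2) scaling used here is an assumption chosen so that the transform preserves energy and is exactly invertible. For a quasi-mono signal almost all of the energy ends up in the mid signal, which is where the coding gain comes from.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)  # assumed normalisation, see lead-in


def ms_matrix(left, right):
    """Convert left/right samples into mid (sum) and side (difference)."""
    mid = [(l + r) * SCALE for l, r in zip(left, right)]
    side = [(l - r) * SCALE for l, r in zip(left, right)]
    return mid, side


def ms_unmatrix(mid, side):
    """Inverse transform: recover left/right from mid/side."""
    left = [(m + s) * SCALE for m, s in zip(mid, side)]
    right = [(m - s) * SCALE for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    return sum(v * v for v in samples)


# Quasi-mono test signal: the right channel is almost a copy of the left.
left = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(1000)]
right = [0.98 * v for v in left]

mid, side = ms_matrix(left, right)
side_fraction = energy(side) / (energy(mid) + energy(side))
print(f"fraction of energy in the side signal: {side_fraction:.6f}")
```

Since `ms_unmatrix` restores both channels, nothing, including phase, is lost by the matrixing itself.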
Table 2.1 shows which stereo mode will be used
for which bitrate.
Table 2.1: Different stereo modes
Bitrates (bits/second) | Stereo mode |
8000 - 18000 | mono only |
18000 - 96000 | MS/IS stereo |
96000 - 192000 | MS stereo |
192000 - 256000 | stereo |
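Table 2.1 can be turned into a simple lookup. Note that the ranges in the table share their end points (18000, 96000 and 192000 each appear twice), and the table does not say which range such a bitrate belongs to. The sketch below assumes that each range includes its upper bound, which at least agrees with the statement that stereo works from 20 kBit/s upwards; treat that boundary rule as a guess.

```python
from bisect import bisect_left

# Upper bound of each range in Table 2.1, in bits/second.
UPPER_BOUNDS = (18000, 96000, 192000, 256000)
MODES = ("mono only", "MS/IS stereo", "MS stereo", "stereo")


def default_stereo_mode(bitrate_bps: int) -> str:
    """Look up the stereo mode for a bitrate (boundary rule is assumed)."""
    if not 8000 <= bitrate_bps <= 256000:
        raise ValueError("bitrate outside the range covered by Table 2.1")
    # bisect_left returns the index of the first upper bound >= bitrate.
    return MODES[bisect_left(UPPER_BOUNDS, bitrate_bps)]


for rate in (16000, 64000, 128000, 256000):
    print(rate, "->", default_stereo_mode(rate))
```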
Several factors influence the speed of the encoder. They include:
- Number of channels in the output signal. If your output signal has
only one channel, the encoder will run at twice the speed compared to
stereo encoding.
- Output sample rate. If the encoder produces a file at 22.050 kHz
(that is, a file that contains 22050 samples per second), it runs at
twice the speed compared to one that produces twice the number of samples
per second (i.e. produces a 44.1 kHz output).
- Mismatch between input and output sample rate. If your input and
output sample rates differ, the encoder will have to run a resampling
filter and thus will be slower. (Integer ratios between input and output
sample rate perform slightly better than non-integer ratios, though).
- Time-domain bandlimiting. The encoder needs to band-limit the
signal to compress it. By default, the encoder will use a high-quality
time-domain filter to do this band-limiting. You can tell it to use a
faster filter, possibly sacrificing some quality (see section 2.2.2).
- Full Huffman search and careful iteration. You can tell the
encoder to try hard to do the best encoding possible, at the expense of
a factor of up to three in running time (see section 2.2.2).
Version V3.0 of the encoder reaches realtime speed on a Pentium 166 when
encoding at 64 kBit/s, 22.050 kHz, stereo. On a SUN Sparc Ultra-1 (143
MHz) the performance is similar.
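The factors that come with numbers in the list above (channel count, output sample rate, and the "up to three" cost of the full search) can be combined into a rough relative estimate. This is only a back-of-the-envelope model with hypothetical names: it assumes the factors simply multiply, and it leaves out resampling and filter choice, for which no figures are given.

```python
def relative_encoding_time(channels: int, output_rate_hz: int,
                           full_search: bool = False) -> float:
    """Estimated running time relative to stereo at 44.1 kHz (= 1.0)."""
    time = (channels / 2) * (output_rate_hz / 44100)
    if full_search:
        time *= 3.0  # worst case quoted for full search / careful iteration
    return time


print(relative_encoding_time(2, 44100))        # reference case
print(relative_encoding_time(1, 22050))        # mono at half the sample rate
print(relative_encoding_time(2, 22050, True))  # stereo, 22.05 kHz, full search
```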
The encoder can read AIFF, AIFF-C, WAV/RIFF and raw PCM data
files. While the first three only work from a file, plain PCM data can be
fed into the encoder via a pipe. This is useful for live encoding (also
known as streaming).
- -if filename: tells the encoder the name of the file it reads its
input from. If the file is a RIFF/WAVE
file or an AIFF/AIFC file, the encoder will automatically adapt to the
sound file format. For other formats or plain PCM data, see below.
- -sti: tells the encoder to get its input from stdin rather than
from a file. This only works when the input is plain PCM data (see below).
If the encoder gets its input as plain PCM data (or if it does not
recognize the sound format by itself), you need to tell it all
about the structure of the PCM stream, i.e. the number of bits per
sample, the number of channels and the sample rate.
- -iff fileformat: a string containing
name=value pairs, separated by blanks. Table 2.2 lists
the names and values that are possible here.
For stereo files, the encoder assumes that the PCM data is interleaved
and that the sample for the right channel follows that for the left channel.
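The interleaved layout (left sample first, then right) can be made concrete by splitting a raw PCM buffer into its two channels. A minimal sketch for 16 bits per sample; signed samples are an assumption here, since the text does not state signedness, and the function name is illustrative.

```python
import struct


def deinterleave_16bit_stereo(raw: bytes, little_endian: bool = True):
    """Split interleaved 16-bit stereo PCM into left and right sample lists."""
    frames = len(raw) // 4           # one frame = left + right, 2 bytes each
    order = "<" if little_endian else ">"
    # "h" is a signed 16-bit integer (assumed sample format).
    samples = struct.unpack(f"{order}{frames * 2}h", raw[:frames * 4])
    return list(samples[0::2]), list(samples[1::2])


# Three stereo frames: L, R, L, R, L, R.
raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)
left, right = deinterleave_16bit_stereo(raw)
print(left, right)
```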
Table 2.2: Input file format specification
Name | Value(s) | Explanation |
sr | any | The rate the PCM signal is sampled at [Hz] |
nc | 1, 2 | The number of channels in the signal |
bps | 8, 16, 24, 32 | The number of bits per sample |
little-endian | | The signal is little-endian (Intel format) |
big-endian | | The signal is big-endian (Motorola format) |
As an example, -iff "nc=2 sr=44100 bps=16" would
be used to read a 44.1 kHz stereo file with 16 bits per sample while
-iff "nc=1 sr=8000 bps=8" would tell the encoder
that the data is mono, sampled at 8 kHz with 8 bits per sample.
Remember that this feature is only needed for input from files other
than RIFF/WAV, AIFF and AIFC.
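If such format strings are generated by a script, a small helper can build them and reject values that Table 2.2 does not list. This is a sketch with a hypothetical function name; it covers only the sr, nc and bps names, because the examples above do not show how the endianness names are written inside the string.

```python
def iff_string(sample_rate_hz: int, channels: int, bits_per_sample: int) -> str:
    """Build the argument for -iff from the values allowed in Table 2.2."""
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    if channels not in (1, 2):
        raise ValueError("nc must be 1 or 2")
    if bits_per_sample not in (8, 16, 24, 32):
        raise ValueError("bps must be 8, 16, 24 or 32")
    return f"nc={channels} sr={sample_rate_hz} bps={bits_per_sample}"


print(iff_string(44100, 2, 16))
print(iff_string(8000, 1, 8))
```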
On output, the encoder can be instructed to write a plain Layer-3
bitstream or a wave file containing the Layer-3 stream. These wave files
can be played by the media control on a machine running under Microsoft
Windows that has the MPEG Layer-3 ACM codec installed (you can get one by
installing Microsoft Netshow, http://www.microsoft.com/netshow/).
If the output is a plain Layer-3 stream, it can be piped into other
applications. This is useful for live streaming.
- -of filename: tells the encoder
the name of the file that it will write the bitstream
to. If the file does not exist, it is created; if it does exist, it will
be overwritten.
- -l3wav: tells the encoder to wrap the
MPEG Layer-3 file into a Microsoft RIFF/WAVE file.
- -sto: tells the encoder to write its output to stdout
rather than to a file. This only works when the output is a raw Layer-3
bitstream (i.e. it does not work in conjunction with -l3wav).
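The input and output switches described in this section can be combined into one command line. The sketch below only assembles the argument list and enforces the two restrictions stated above (stdin input needs a PCM description, and -sto cannot be used with -l3wav). The executable name `mp3enc` and the order of the options are assumptions; check them against your installation.

```python
import shlex


def build_command(bitrate_bps, input_file=None, output_file=None,
                  pcm_format=None, l3wav=False, exe="mp3enc"):
    """Assemble an argument list from the switches in this section.

    exe is a hypothetical executable name; option order is an assumption.
    """
    args = [exe, "-br", str(bitrate_bps)]
    if input_file is None:
        # stdin can only carry plain PCM, so its format must be described.
        if pcm_format is None:
            raise ValueError("-sti needs a PCM description for -iff")
        args.append("-sti")
    else:
        args += ["-if", input_file]
    if pcm_format is not None:
        args += ["-iff", pcm_format]
    if output_file is None:
        if l3wav:
            raise ValueError("-sto cannot be combined with -l3wav")
        args.append("-sto")
    else:
        args += ["-of", output_file]
        if l3wav:
            args.append("-l3wav")
    return args


# Live encoding: plain PCM in on stdin, raw Layer-3 bitstream out on stdout.
live = build_command(64000, pcm_format="nc=2 sr=44100 bps=16")
print(shlex.join(live))
```

The resulting list can be handed to `subprocess.Popen` with `stdin` and `stdout` connected to pipes when the encoder should sit in the middle of a streaming chain.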
layer3@iis.fhg.de, 03/98