-
- For similar reasons, one MCU is also the best chunk size for the frequency
- coefficient quantization and dequantization steps.
-
- For downsampling and upsampling, the best chunk size is to have each call
- transform Vk sample rows from or to Vmax sample rows (Vk = this component's
- vertical sampling factor, Vmax = largest vertical sampling factor). There are
- eight such chunks in each MCU row. Using a whole MCU row as the chunk size
- would reduce function call overhead by a small fraction, but would imply more buffering
- to provide context for cross-pixel smoothing.
-
-
- *** Compression object structure ***
-
- I propose the following set of objects for the compressor. Here an "object"
- is the common interface for one or more modules having comparable functions.
-
- Most of these objects can be justified as information-hiding modules.
- I've indicated what information is private to each object/module.
-
- Note that in all cases, the caller of a method is expected to have allocated
- any storage needed for it to return its result. (Typically this storage can
- be re-used in successive calls, so malloc'ing and free'ing once per call is
- not reasonable.) Also, much of the context required (compression parameters,
- image size, etc) will be passed around in large common data structures, which
- aren't described here; see the header files. Notice that any object that
- might need to allocate working storage receives an "init" and a "term" call;
- "term" should be careful to free all allocated storage so that the JPEG system
- can be used multiple times during a program run. (For the same reason,
- depending on static initialization of variables is a no-no. The only
- exception to the free-all-allocated-storage rule is that storage allocated for
- the entire processing of an image need not be explicitly freed, since the
- memory manager's free_all cleanup will free it.)
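
As a concrete illustration of this calling convention, here is a minimal sketch in C
of what one of these "objects" might look like. The struct layout and names are
hypothetical, purely for illustration; the real headers define the actual method
structures.

    /*
     * Hypothetical sketch of the method-pointer convention described above.
     * An "object" is a struct of function pointers; a selector routine fills
     * in the pointers, and init/term calls bracket each use so that all
     * working storage gets released.  Names here are illustrative only.
     */
    struct example_object {
      void (*init) (void);            /* allocate working storage, if any */
      void (*process) (void *data);   /* the object's main method */
      void (*term) (void);            /* release everything init allocated */
    };

    /* Typical calling pattern in a pipeline controller: */
    void run_example (struct example_object *obj, void *data)
    {
      obj->init();
      obj->process(data);   /* caller has pre-allocated result storage */
      obj->term();          /* must free all storage so the JPEG code can
                             * be used again within the same program run */
    }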
-
- 1. Input file conversion to standardized form. This provides these methods:
- input_init: read the file header, report image size & component count.
- get_input_row: read one pixel row, return it in our standard format.
- input_term: finish up at the end.
- In implementations that support multiple input formats, input_init could
- set up an appropriate get_input_row method depending on the format it
- finds. Note that in most applications, the selection and opening of the
- input file will be under the control of the user interface module; and
- indeed the user interface may have already read the input header, so that
- all that input_init may have to do is return previously saved values. The
- behind-the-scenes interaction between this object and the user interface is
- not specified by this architecture.
- (Hides format of input image and mechanism used to read it. This code is
- likely to vary considerably from one implementation to another. Note that
- the color space and number of color components of the source are not hidden;
- but they are used only by the next object.)
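
For illustration, input_init's format dispatch might be sketched like this; the
struct members, sample type, and PPM reader are assumptions, not the actual
interface.

    #include <stdio.h>

    typedef unsigned char JSAMPLE;     /* assumed sample type */

    /* Hypothetical input object: input_init identifies the file format and
     * installs a matching row reader as the get_input_row method. */
    struct input_object {
      FILE *infile;
      long image_width, image_height;
      int num_components;
      void (*get_input_row) (struct input_object *self, JSAMPLE *row);
    };

    static void get_ppm_row (struct input_object *self, JSAMPLE *row)
    {
      /* one row of width * num_components samples, assumed interleaved;
       * the real standardized row format may well differ */
      fread(row, sizeof(JSAMPLE),
            (size_t) (self->image_width * self->num_components), self->infile);
    }

    void input_init (struct input_object *self)
    {
      /* Read enough of the header to identify the format and image size.
       * (If the user interface already parsed the header, this may just
       * return previously saved values.) */
      self->get_input_row = get_ppm_row;   /* e.g., when a raw PPM is found */
    }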
-
- 2. Gamma and color space conversion. This provides three methods:
- colorin_init: initialization.
- get_sample_rows: read, convert, and return a specified number of pixel
- rows (not more than remain in the picture).
- colorin_term: finish up at the end.
- The most efficient approach seems to be for this object to call
- get_input_row directly, rather than being passed the input data; that way,
- any intermediate storage required can be local to this object.
- (get_sample_rows might tell get_input_row to read directly into its own
- output area and then convert in place; or it may do something different.
- For example, conversion in place wouldn't work if it is changing the number
- of color components.) The output of this step is in the standardized
- sample array format shown previously.
- (Hides all knowledge of color space semantics and conversion. Remaining
- modules only need to know the number of JPEG components.)
-
- 3. Edge expansion: needs only a single method.
- edge_expand: Given an NxM sample array, expand to a desired size (a
- multiple of the MCU dimensions) by duplicating the last
- row or column. Repeat for each component.
- Expansion will occur in place, so the caller must have pre-allocated enough
- storage. (I'm assuming that it is easier and faster to do this expansion
- than it is to worry about boundary conditions in the next two steps.
- Notice that vertical expansion will occur only once, at the bottom of the
- picture, so only horizontal expansion by a few pixels is speed-critical.)
- (This doesn't really hide any information, so maybe it could be a simple
- subroutine instead of a method. Depends on whether we want to be able to
- use alternative, optimized methods.)
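
A minimal sketch of the in-place expansion, assuming each component is held as an
array of row pointers (the actual storage layout may differ):

    #include <string.h>

    typedef unsigned char JSAMPLE;     /* assumed sample type */

    /*
     * Expand one component's sample array in place from
     * input_cols x input_rows to output_cols x output_rows (each a multiple
     * of the MCU dimensions) by duplicating the last column and last row.
     * data[row][col] indexing is assumed; the caller has pre-allocated
     * output_rows row pointers, each holding output_cols samples.
     */
    void edge_expand (JSAMPLE **data,
                      long input_cols, long input_rows,
                      long output_cols, long output_rows)
    {
      long row, col;

      for (row = 0; row < input_rows; row++) {
        JSAMPLE edge = data[row][input_cols - 1];
        for (col = input_cols; col < output_cols; col++)
          data[row][col] = edge;               /* duplicate last real column */
      }
      for (row = input_rows; row < output_rows; row++)
        memcpy(data[row], data[input_rows - 1],
               (size_t) output_cols * sizeof(JSAMPLE));  /* duplicate last row */
    }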
-
- 4. Downsampling: this will be applied to one component at a time.
- downsample_init: initialize (precalculate convolution factors, for
- example). This will be called once per scan.
- downsample: Given a sample array, reduce it to a smaller number of
- samples using specified sampling factors.
- downsample_term: clean up at the end of a scan.
- If the current component has vertical sampling factor Vk and the largest
- sampling factor is Vmax, then the input is always Vmax sample rows (whose
- width is a multiple of Hmax) and the output is always Vk sample rows.
- Vmax additional rows above and below the nominal input rows are also passed
- for use by partial-pixel-averaging sampling methods. (Is this necessary?)
- At the top and bottom of the image, these extra rows are copies of the
- first or last actual input row.
- (This hides whether and how cross-pixel averaging occurs.)
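
As an illustration of the interface, here is the simplest interesting case sketched
in C: plain 2x2 averaging for a component with half the maximum sampling factors
(Vk = 1, Vmax = 2), ignoring the extra context rows that a smarter
partial-pixel-averaging method would use. Types and names are assumptions.

    typedef unsigned char JSAMPLE;     /* assumed sample type */

    /*
     * Downsample one component by 2:1 both horizontally and vertically,
     * averaging each 2x2 group of input samples.  Per the interface above,
     * the input is Vmax (= 2) sample rows and the output is Vk (= 1) row.
     * output_cols is the component's (already MCU-padded) output width.
     */
    void downsample_2x2 (JSAMPLE **input_rows, JSAMPLE *output_row,
                         long output_cols)
    {
      long outcol, incol;

      for (outcol = 0, incol = 0; outcol < output_cols; outcol++, incol += 2) {
        int sum = input_rows[0][incol] + input_rows[0][incol + 1]
                + input_rows[1][incol] + input_rows[1][incol + 1];
        output_row[outcol] = (JSAMPLE) ((sum + 2) >> 2);   /* rounded average */
      }
    }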
-
- 5. MCU extraction (creation of a single sequence of 8x8 sample blocks).
- extract_init: initialize as needed. This will be called once per scan.
- extract_MCUs: convert a sample array to a sequence of MCUs.
- extract_term: clean up at the end of a scan.
- Given one or more MCU rows worth of image data, extract sample blocks in the
- appropriate order; pass these off to subsequent steps one MCU at a time.
- The input must be a multiple of the MCU dimensions. It will probably be
- most convenient for the DCT transform, frequency quantization, and zigzag
- reordering of each block to be done as simple subroutines of this step.
- Once a transformed MCU has been completed, it'll be passed off to an output
- method, which is itself supplied as a parameter to extract_MCUs.
- That routine might either encode and output the MCU immediately, or buffer
- it up for later output if we want to do global optimization of the entropy
- encoding coefficients. Note: when outputting a noninterleaved file this
- object will be called separately for each component. Direct output could
- be done for the first component, but the others would have to be buffered.
- (Again, an object mainly on the grounds that multiple instantiations might
- be useful.)
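
The per-MCU hand-off might be sketched as follows; the block type, buffer size, and
parameter list are assumptions, but the essential point is that the consumer is
supplied as a method pointer.

    #define DCTSIZE2 64                       /* 8x8 block */
    typedef short JBLOCK[DCTSIZE2];           /* assumed coefficient block type */

    /* The consumer of finished MCUs: either an entropy encoder that outputs
     * immediately, or a routine that stores the MCU in a "big" array for a
     * later optimization pass. */
    typedef void (*MCU_output_method) (JBLOCK *mcu_blocks, int blocks_in_mcu);

    /*
     * Walk one MCU row's worth of sample data, form each MCU's blocks in
     * scan order, run DCT / quantization / zigzag on each block (elided),
     * and hand the finished MCU to the supplied output method.
     */
    void extract_MCUs (/* sample data and per-scan layout info elided */
                       int MCUs_per_row, int blocks_in_mcu,
                       MCU_output_method output_method)
    {
      JBLOCK mcu[10];                 /* assumed upper bound on blocks per MCU */
      int mcu_index, blk;

      for (mcu_index = 0; mcu_index < MCUs_per_row; mcu_index++) {
        for (blk = 0; blk < blocks_in_mcu; blk++) {
          /* gather the 8x8 sample block for this position, then
           * forward DCT, quantize, and zigzag into mcu[blk] ... */
        }
        (*output_method) (mcu, blocks_in_mcu);   /* encode now or buffer it */
      }
    }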
-
- 6. DCT transformation of each 8x8 block. This probably doesn't have to be a
- full-fledged method, but just a plain subroutine that will be called by MCU
- extraction. One 8x8 block will be processed per call.
-
- 7. Quantization scaling and zigzag reordering of the elements in each 8x8
- block. (This can probably be a plain subroutine called once per block by
- MCU extraction; hard to see a need for multiple instantiations here.)
-
- 8. Entropy encoding (Huffman or arithmetic).
- entropy_encode_init: prepare for one scan.
- entropy_encode: accepts an MCU's worth of quantized coefficients,
- encodes and outputs them.
- entropy_encode_term: finish up at end of a scan (dump any buffered
- bytes, for example).
- The data output by this module will be sent to the entropy_output method
- provided by the pipeline controller. (It will probably be worth using
- buffering to pass multiple bytes per call of the output method.) The
- output method could be just write_jpeg_data, but might also be a dummy
- routine that counts output bytes (for use during cut-and-try coefficient
- optimization).
- (This hides which entropy encoding method is in use.)
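
A sketch of the buffering and of the dummy byte-counting output method; the buffer
size and names are illustrative only.

    #include <stddef.h>

    /* Assumed signature of the entropy_output method supplied by the
     * pipeline controller. */
    typedef void (*entropy_output_method) (const char *buffer, size_t nbytes);

    /* Dummy output method for cut-and-try optimization: count bytes only. */
    static long trial_byte_count;

    void count_bytes (const char *buffer, size_t nbytes)
    {
      (void) buffer;                     /* contents don't matter here */
      trial_byte_count += (long) nbytes;
    }

    /* Inside the entropy encoder, bytes accumulate in a local buffer and
     * are flushed several at a time through the supplied method pointer: */
    #define OUTBUF_SIZE 512              /* illustrative buffer size */

    static char outbuf[OUTBUF_SIZE];
    static size_t outbuf_used;

    static void emit_byte (int val, entropy_output_method output)
    {
      outbuf[outbuf_used++] = (char) val;
      if (outbuf_used == OUTBUF_SIZE) {
        (*output) (outbuf, outbuf_used); /* many bytes per output call */
        outbuf_used = 0;
      }
    }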
-
- 9. JPEG file header construction. This will provide these methods:
- write_file_header: output the initial header.
- write_scan_header: output scan header (called once per component
- if noninterleaved mode).
- write_jpeg_data: the actual data output method for the preceding step.
- write_scan_trailer: finish up after one scan.
- write_file_trailer: finish up at end of file.
- Note that compressed data is passed to the write_jpeg_data method, in case
- a simple fwrite isn't appropriate for some reason.
- (This hides which variant JPEG file format is being written. Also, the
- actual mechanism for writing the file is private to this object and the
- user interface.)
-
- 10. Pipeline control. This object will provide the "main loop" that invokes
- all the pipeline objects. Note that we will need several different main
- loops depending on the situation (interleaved output or not, global
- optimization of encoding parameters or not, etc). This object will do
- most of the memory allocation, since it will provide the working buffers
- that are the inputs and outputs of the pipeline steps.
- (An object mostly to support multiple instantiations; however, overall
- memory management and sequencing of operations are known only here.)
-
- 11. Overall control. This module will provide at least two routines:
- jpeg_compress: the main entry point to the compressor.
- per_scan_method_selection: called by pipeline controllers for
- secondary method selection passes.
- jpeg_compress is invoked from the user interface after the UI has selected
- the input and output files and obtained values for all compression
- parameters that aren't dynamically determined. jpeg_compress performs
- basic initialization (e.g., calculating the size of MCUs), does the
- "global" method selection pass, and finally calls the selected pipeline
- control object. (Per-scan method selections will be invoked by the
- pipeline controller.)
- Note that jpeg_compress can't be a method since it is invoked prior to
- method selection.
-
- 12. User interface; this is the architecture's term for "the rest of the
- application program", i.e., that which invokes the JPEG compressor. In a
- standalone JPEG compression program the UI need be little more than a C
- main() routine and argument parsing code; but we can expect that the JPEG
- compressor may be incorporated into complex graphics applications, wherein
- the UI is much more complex. Much of the UI will need to be written
- afresh for each non-Unix-like platform the compressor is ported to.
- The UI is expected to supply input and output files and values for all
- non-automatically-chosen compression parameters. (Hence defaults are
- determined by the UI; we should provide helpful routines to fill in
- the recommended defaults.) The UI must also supply error handling
- routines and some mechanism for trace messages.
- (This module hides the user interface provided --- command line,
- interactive, etc. Except for error/message handling, the UI calls the
- portable JPEG code, not the other way around.)
-
- 13. (Optional) Compression parameter selection control.
- entropy_optimize: given an array of MCUs ready to be fed to entropy
- encoding, find optimal encoding parameters.
- The actual optimization algorithm ought to be separated out as an object,
- even though a special pipeline control method will be needed. (The
- pipeline controller only has to understand that the output of extract_MCUs
- must be built up as a virtual array rather than fed directly to entropy
- encoding and output. This pipeline behavior may also be useful for future
- implementation of hierarchical modes, etc.)
- To minimize the amount of control logic in the optimization module, the
- pipeline control doesn't actually hand over big-array pointers, but rather
- an "iterator": a function which knows how to scan the stored image.
- (This hides the details of the parameter optimization algorithm.)
-
- The present design doesn't allow for multiple passes at earlier points
- in the pipeline, but allowing that would only require providing some
- new pipeline control methods; nothing else need change.
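
The iterator hand-off might be sketched like this; the types, names, and calling
details are assumptions rather than the real interface.

    #define DCTSIZE2 64
    typedef short JBLOCK[DCTSIZE2];     /* assumed coefficient block type */

    /*
     * Hypothetical iterator handed to the optimization module by the
     * pipeline controller.  Each call to next_MCU returns the next MCU of
     * the stored image (in scan order), or NULL when the image is
     * exhausted; rewind starts another pass.  The optimizer never sees
     * the big-array machinery behind this.
     */
    struct MCU_iterator {
      JBLOCK * (*next_MCU) (struct MCU_iterator *self);
      void (*rewind) (struct MCU_iterator *self);
      void *state;            /* private to the pipeline controller */
    };

    void entropy_optimize (struct MCU_iterator *iter)
    {
      JBLOCK *mcu;

      /* First pass: gather symbol statistics. */
      while ((mcu = iter->next_MCU(iter)) != NULL) {
        /* ... tally coefficient symbols ... */
      }
      /* Build optimal encoding parameters from the statistics; the caller
       * then rewinds and performs the real encoding pass. */
    }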
-
- 14. A memory management object. This will provide methods to allocate "small"
- things and "big" things. Small things have to fit in memory and you get
- back direct pointers (this could be handled by direct calls to malloc, but
- it's cleaner not to assume malloc is the right routine). "Big" things
- mean buffered images for multiple passes, noninterleaved output, etc.
- In this case the memory management object will give you room for a few MCU
- rows and you have to ask for access to the next few; dumping and reloading
- in a temporary file will go on behind the scenes. (All big objects are
- image arrays containing either samples or coefficients, and will be
- scanned top-to-bottom some number of times, so we can apply this access
- model easily.) On a platform with virtual memory, the memory manager can
- treat small and big things alike: just malloc up enough virtual memory for
- the whole image, and let the operating system worry about swapping the
- image to disk.
-
- Most of the actual calls on the memory manager will be made from pipeline
- control objects; changing any data item from "small" to "big" status would
- require a new pipeline control object, since it will contain the logic to
- ask for a new chunk of a big thing. Thus, one way in which pipeline
- controllers will vary is in which structures they treat as big.
-
- The memory manager will need to be told roughly how much space is going to
- be requested overall, so that it can figure out how big a buffer is safe
- to allocate for a "big" object. (If it happens that you are dealing with
- a small image, you'd like to decide to keep it all in memory!) The most
- flexible way of doing this is to divide allocation of "big" objects into
- two steps. First, there will be one or more "request" calls that indicate
- the desired object sizes; then an "instantiate" call causes the memory
- manager to actually construct the objects. The instantiation must occur
- before the contents of any big object can be accessed.
-
- For 80x86 CPUs, we would like the code to be compilable under small or
- medium model, meaning that pointers are 16 bits unless explicitly declared
- FAR. Hence space allocated by the "small" allocator must fit into the
- 64Kb default data segment, along with stack space and global/static data.
- For normal JPEG operations we seem to need only about 32Kb of such space,
- so we are within the target (and have a reasonable slop for the needs of
- a surrounding application program). However, some color quantization
- algorithms need 64Kb or more of all-in-memory space in order to create
- color histograms. For this purpose, we will also support "medium" size
- things. These are semantically the same as "small" things but are
- referenced through FAR pointers.
-
- The following methods will be needed:
- alloc_small: allocate an object of given size; use for any random
- data that's not an image array.
- free_small: release same.
- alloc_medium: like alloc_small, but returns a FAR pointer. Use for
- any object bigger than a couple kilobytes.
- free_medium: release same.
- alloc_small_sarray: construct an all-in-memory image sample array.
- free_small_sarray: release same.
- alloc_small_barray,
- free_small_barray: ditto for block (coefficient) arrays.
- request_big_sarray: request a virtual image sample array. The size
- of the in-memory buffer will be determined by the
- memory manager, but it will always be a multiple
- of the passed-in MCU height.
- request_big_barray: ditto for block (coefficient) arrays.
- alloc_big_arrays: instantiate all the big arrays previously requested.
- This call will also pass some info about future
- memory demands, so that the memory manager can
- figure out how much space to leave unallocated.
- access_big_sarray: obtain access to a specified portion of a virtual
- image sample array.
- free_big_sarray: release a virtual sample array.
- access_big_barray,
- free_big_barray: ditto for block (coefficient) arrays.
- free_all: release any remaining storage. This is called
- before normal or error termination; the main reason
- why it must exist is to ensure that any temporary
- files will be deleted upon error termination.
-
- alloc_big_arrays will be called by the pipeline controller, which does
- most of the memory allocation anyway. The only reason for having separate
- request calls is to allow some of the other modules to get big arrays.
- The pipeline controller is required to give an upper bound on total future
- small-array requests, so that this space can be discounted. (A fairly
- conservative estimate will be adequate.) Future small-object requests
- aren't counted; the memory manager has to use a slop factor for those.
- 10K or so seems to be sufficient. (In an 80x86, small objects aren't an
- issue anyway, since they don't compete for far-heap space. "Medium"-size
- objects will have to be counted separately.)
-
- The distinction between sample and coefficient array routines is annoying,
- but it has to be maintained for machines in which "char *" is represented
- differently from "int *". On byte-addressable machines some of these
- methods could perhaps point to the same code.
-
- The array routines will operate on only 2-D arrays (one component at a
- time), since different components may require different-size arrays.
-
- (This object hides the knowledge of whether virtual memory is available,
- as well as the actual interface to OS and library support routines.)
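
To make the two-step request/instantiate protocol described above concrete, a
pipeline controller's usage might be sketched like this. The routine names come
from the list above, but every parameter list shown is a guess.

    typedef unsigned char JSAMPLE;                 /* assumed sample type */
    typedef struct big_sarray *big_sarray_ptr;     /* opaque to the caller */

    /* Guessed prototypes for the memory-manager entry points named above. */
    extern big_sarray_ptr request_big_sarray (long cols, long rows,
                                              long mcu_height);
    extern void alloc_big_arrays (long future_small_array_space);
    extern JSAMPLE **access_big_sarray (big_sarray_ptr img,
                                        long start_row, int writable);

    void example_controller (long width, long height, long mcu_height)
    {
      big_sarray_ptr whole_image;
      long row;

      /* Step 1: describe what will be needed ... */
      whole_image = request_big_sarray(width, height, mcu_height);
      /* ... possibly further request calls here ... */

      /* Step 2: let the memory manager size and build its buffers,
       * passing an upper bound on future small-array demands. */
      alloc_big_arrays(20000L);

      /* Only now may the contents be touched, a few MCU rows at a time;
       * any spilling to a temporary file happens behind the scenes. */
      for (row = 0; row < height; row += mcu_height) {
        JSAMPLE **strip = access_big_sarray(whole_image, row, 1);
        /* ... produce or consume the rows in this strip ... */
        (void) strip;                  /* keep the compiler quiet in this sketch */
      }
    }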
-
- Note that any given implementation will presumably contain only one
- instantiation of input file header reading, overall control, user interface,
- and memory management. Thus these could be called as simple subroutines,
- without bothering with an object indirection. This is essential for overall
- control (which has to initialize the object structure); for consistency we
- will impose objectness on the other three.
-
-
- *** Decompression object structure ***
-
- I propose the following set of objects for decompression. The general
- comments at the top of the compression object section also apply here.
-
- 1. JPEG file scanning. This will provide these methods:
- read_file_header: read the file header, determine which variant
- JPEG format is in use, read everything through SOF.
- read_scan_header: read scan header (up through SOS). This is called
- after read_file_header and again after each scan;
- it returns TRUE if it finds SOS, FALSE if EOI.
- read_jpeg_data: fetch data for entropy decoder.
- resync_to_restart: try to recover from bogus data (see below).
- read_scan_trailer: finish up after one scan, prepare for another call
- of read_scan_header (may be a no-op).
- read_file_trailer: finish up at end of file (probably a no-op).
- The entropy decoder must deal with restart markers, but all other JPEG
- marker types will be handled in this object; useful data from the markers
- will be extracted into data structures available to subsequent routines.
- Note that on exit from read_file_header, only the SOF-marker data should be
- assumed valid (image size, component IDs, sampling factors); other data
- such as Huffman tables may not appear until after the SOF. The overall
- image size and colorspace can be determined after read_file_header, but not
- whether or how the data is interleaved. (This hides which variant JPEG
- file format is being read. In particular, for JPEG-in-TIFF the read_header
- routines might not be scanning standard JPEG markers at all; they could
- extract the data from TIFF tags. The user interface will already have
- opened the input file and possibly read part of the header before
- read_file_header is called.)
-
- When reading a file with a nonzero restart interval, the entropy decoder
- expects to see a correct sequence of restart markers. In some cases, these
- markers may be synthesized by the file-format module (a TIFF reader might
- do so, for example, using tile boundary pointers to determine where the
- restart intervals fall). If the incoming data is corrupted, the entropy
- decoder will read as far as the next JPEG marker, which may or may not be
- the expected next restart marker. If it isn't, resync_to_restart is called
- to try to locate a good place to resume reading. We make this heuristic a
- file-format-dependent operation since some file formats may have special
- info that's not available to the entropy decoder (again, TIFF is an
- example). Note that resync_to_restart is NOT called at the end of a scan;
- it is read_scan_trailer's responsibility to resync there.
-
- NOTE: for JFIF/raw-JPEG file format, the read_jpeg_data routine is actually
- supplied by the user interface; the jrdjfif module uses read_jpeg_data
- internally to scan the input stream. This makes it possible for the user
- interface module to single-handedly implement special applications like
- reading from a non-stdio source. For JPEG-in-TIFF format, the need for
- random access will make it impossible for this to work; hence the TIFF
- header module will override the UI-supplied read_jpeg_data routine.
- Non-stdio input from a TIFF file will require extensive surgery to the TIFF
- header module, if indeed it is practical at all.
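
Putting the methods listed at the start of this item together, the per-scan driving
loop might be sketched as follows (the process_one_scan routine and the exact
calling sequence are assumptions):

    /* Rough calling skeleton for the file-scanning methods listed above. */
    typedef int boolean;
    #define TRUE 1
    #define FALSE 0

    extern void read_file_header (void);
    extern boolean read_scan_header (void);   /* TRUE: found SOS; FALSE: EOI */
    extern void read_scan_trailer (void);
    extern void read_file_trailer (void);
    extern void process_one_scan (void);      /* hypothetical per-scan pipeline */

    void decompress_all_scans (void)
    {
      read_file_header();                /* SOF data is now valid */
      while (read_scan_header()) {       /* once per scan, until EOI */
        process_one_scan();
        read_scan_trailer();
      }
      read_file_trailer();
    }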
-
- 2. Entropy (Huffman or arithmetic) decoding of the coefficient sequence.
- entropy_decode_init: prepare for one scan.
- entropy_decode: decodes and returns an MCU's worth of quantized
- coefficients per call.
- entropy_decode_term: finish up after a scan (may be a no-op).
- This will read raw data by calling the read_jpeg_data method (I don't see
- any reason to provide a further level of indirection).
- (This hides which entropy encoding method is in use.)
-
- 3. Quantization descaling and zigzag reordering of the elements in each 8x8
- block. This will be folded into entropy_decode for efficiency reasons:
- many of the coefficients are zeroes, and this can be exploited most easily
- within entropy_decode since the encoding explicitly skips zeroes.
-
- 4. MCU disassembly (conversion of a possibly interleaved sequence of 8x8
- blocks back to separate components in pixel map order).
- disassemble_init: initialize. This will be called once per scan.
- disassemble_MCU: Given an MCU's worth of dequantized blocks,
- distribute them into the proper locations in a
- coefficient image array.
- disassemble_term: clean up at the end of a scan.
- Probably this should be called once per MCU row and should call the
- entropy decoder repeatedly to obtain the row's data. The output is
- always a multiple of an MCU's dimensions.
- (An object on the grounds that multiple instantiations might be useful.)
-
- 5. Cross-block smoothing per JPEG section K.8 or a similar algorithm.
- smooth_coefficients: Given three block rows' worth of a single
- component, emit a smoothed equivalent of the
- middle row. The "above" and "below" pointers
- may be NULL if at top/bottom of image.
- The pipeline controller will do the necessary buffering to provide the
- above/below context. Smoothing will be optional since a good deal of
- extra memory is needed to buffer the additional block rows.
- (This object hides the details of the smoothing algorithm.)
-
- 6. Inverse DCT transformation of each 8x8 block.
- reverse_DCT: given an MCU row's worth of blocks, perform inverse
- DCT on each block and output the results into an array
- of samples.
- We put this method into the jdmcu module for symmetry with the division of
- labor in compression. Note that the actual IDCT code is a separate source
- file.
-
- 7. Upsampling and smoothing: this will be applied to one component at a
- time. Note that cross-pixel smoothing, which was a separate step in the
- prototype code, will now be performed simultaneously with expansion.
- upsample_init: initialize (precalculate convolution factors, for
- example). This will be called once per scan.
- upsample: Given a sample array, enlarge it by specified sampling
- factors.
- upsample_term: clean up at the end of a scan.
- If the current component has vertical sampling factor Vk and the largest
- sampling factor is Vmax, then the input is always Vk sample rows (whose
- width is a multiple of Hk) and the output is always Vmax sample rows.
- Vk additional rows above and below the nominal input rows are also passed
- for use in cross-pixel smoothing. At the top and bottom of the image,
- these extra rows are copies of the first or last actual input row.
- (This hides whether and how cross-pixel smoothing occurs.)
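
For illustration, the degenerate no-smoothing case, plain 2x2 pixel replication for
a component with Vk = 1 and Vmax = 2, might look like this; a smoothing
implementation would instead interpolate using the extra context rows. Types and
names are assumptions.

    typedef unsigned char JSAMPLE;     /* assumed sample type */

    /*
     * Upsample one component by 2:1 in both directions by pixel replication.
     * Per the interface above, the input is Vk (= 1) sample row and the
     * output is Vmax (= 2) sample rows; input_cols is the component's width.
     */
    void upsample_2x2 (JSAMPLE *input_row, JSAMPLE **output_rows,
                       long input_cols)
    {
      long incol, outcol;

      for (incol = 0, outcol = 0; incol < input_cols; incol++, outcol += 2) {
        JSAMPLE val = input_row[incol];
        output_rows[0][outcol]     = val;
        output_rows[0][outcol + 1] = val;
        output_rows[1][outcol]     = val;
        output_rows[1][outcol + 1] = val;
      }
    }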
-
- 8. Cropping to the original pixel dimensions (throwing away duplicated
- pixels at the edges). This won't be a separate object, just an
- adjustment of the nominal image size in the pipeline controller.
-
- 9. Color space reconversion and gamma adjustment.
- colorout_init: initialization. This will be passed the component
- data from read_file_header, and will determine the
- number of output components.
- color_convert: convert a specified number of pixel rows. Input and
- output are image arrays of same size but possibly
- different numbers of components.
- colorout_term: cleanup (probably a no-op except for memory dealloc).
- In practice it will usually be given an MCU row's worth of pixel rows, except
- at the bottom where a smaller number of rows may be left over. Note that
- this object works on all the components at once.
- When quantizing colors, color_convert may be applied to the colormap
- instead of actual pixel data. color_convert is called by the color
- quantizer in this case; the pipeline controller calls color_convert
- directly only when not quantizing.
- (Hides all knowledge of color space semantics and conversion. Remaining
- modules only need to know the number of JPEG and output components.)
-
- 10. Color quantization (used only if a colormapped output format is requested).
- We use two different strategies depending on whether one-pass (on-the-fly)
- or two-pass quantization is requested. Note that the two-pass interface
- is actually designed to let the quantizer make any number of passes.
- color_quant_init: initialization, allocate working memory. In 1-pass
- quantization, should call put_color_map.
- color_quantize: convert a specified number of pixel rows. Input
- and output are image arrays of same size, but input
- is N components per pixel and output is only one. (Used only
- in 1-pass quantization.)
- color_quant_prescan: prescan a specified number of pixel rows in
- 2-pass quantization.
- color_quant_doit: perform multi-pass color quantization. Input is a
- "big" sample image, output is via put_color_map and
- put_pixel_rows. (Used only in 2-pass quantization.)
- color_quant_term: cleanup (probably a no-op except for memory dealloc).
- The input to the color quantizer is always in the unconverted colorspace;
- its output colormap must be in the converted colorspace. The quantizer
- has the choice of which space to work in internally. It must call
- color_convert either on its input data or on the colormap it sends to the
- output module.
- For one-pass quantization the image is simply processed by color_quantize,
- a few rows at a time. For two-pass quantization, the pipeline controller
- accumulates the output of steps 1-8 into a "big" sample image. The
- color_quant_prescan method is invoked during this process so that the
- quantizer can accumulate statistics. (If the input file has multiple
- scans, the prescan may be done during the final scan or as a separate
- pass.) At the end of the image, color_quant_doit is called; it must
- create and output a colormap, then rescan the "big" image and pass mapped
- data to the output module. Additional scans of the image could be made
- before the output pass is done (in fact, prescan could be a no-op).
- As with entropy parameter optimization, the pipeline controller actually
- passes an iterator function rather than direct access to the big image.
- (Hides color quantization algorithm.)
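
The pipeline controller's side of the two-pass case might be sketched like this;
the iterator type and all of the signatures are guesses.

    typedef unsigned char JSAMPLE;     /* assumed sample type */
    struct image_iterator;             /* opaque handle owned by the controller */

    /* Guessed prototypes; next_buffered_strip is a hypothetical helper that
     * yields successive strips of the accumulated "big" sample image. */
    extern JSAMPLE **next_buffered_strip (long *num_rows);
    extern void color_quant_prescan (JSAMPLE **rows, long num_rows);
    extern void color_quant_doit (struct image_iterator *iter);

    void two_pass_quantize (struct image_iterator *iter)
    {
      JSAMPLE **strip;
      long nrows;

      /* Pass 1: as decompressed rows land in the "big" sample image,
       * hand each strip to the quantizer so it can gather statistics. */
      while ((strip = next_buffered_strip(&nrows)) != NULL)
        color_quant_prescan(strip, nrows);

      /* Pass 2 (and any further passes) are the quantizer's business: it
       * builds and outputs a colormap via put_color_map, then rescans the
       * big image through the iterator and delivers mapped pixels via
       * put_pixel_rows. */
      color_quant_doit(iter);
    }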
-
- 11. Writing of the desired image format.
- output_init: produce the file header given data from read_file_header.
- put_color_map: output colormap, if any (called by color quantizer).
- If used, must be called before any pixel data is output.
- put_pixel_rows: output image data in desired format.
- output_term: finish up at the end.
- The actual timing of I/O may differ from that suggested by the routine
- names; for instance, writing of the file header may be delayed until
- put_color_map time if the actual number of colors is needed in the header.
- Also, the colormap is available to put_pixel_rows and output_term as well
- as put_color_map.
- Note that whether colormapping is needed will be determined by the user
- interface object prior to method selection. In implementations that
- support multiple output formats, the actual output format will also be
- determined by the user interface.
- (Hides format of output image and mechanism used to write it. Note that
- several other objects know the color model used by the output format.
- The actual mechanism for writing the file is private to this object and
- the user interface.)
-
- 12. Pipeline control. This object will provide the "main loop" that invokes
- all the pipeline objects. Note that we will need several different main
- loops depending on the situation (interleaved input or not, whether to
- apply cross-block smoothing or not, etc). We may want to divvy up the
- pipeline controllers into two levels, one that retains control over the
- whole file and one that is invoked per scan.
- This object will do most of the memory allocation, since it will provide
- the working buffers that are the inputs and outputs of the pipeline steps.
- (An object mostly to support multiple instantiations; however, overall
- memory management and sequencing of operations are known only here.)
-
- 13. Overall control. This module will provide at least two routines:
- jpeg_decompress: the main entry point to the decompressor.
- per_scan_method_selection: called by pipeline controllers for
- secondary method selection passes.
- jpeg_decompress is invoked from the user interface after the UI has
- selected the input and output files and obtained values for all
- user-specified options (e.g., output file format, whether to do block
- smoothing). jpeg_decompress calls read_file_header, performs basic
- initialization (e.g., calculating the size of MCUs), does the "global"
- method selection pass, and finally calls the selected pipeline control
- object. (Per-scan method selections will be invoked by the pipeline
- controller.)
- Note that jpeg_decompress can't be a method since it is invoked prior to
- method selection.
-
- 14. User interface; this is the architecture's term for "the rest of the
- application program", i.e., that which invokes the JPEG decompressor.
- The UI is expected to supply input and output files and values for all
- operational parameters. The UI must also supply error handling routines.
- (This module hides the user interface provided --- command line,
- interactive, etc. Except for error handling, the UI calls the portable
- JPEG code, not the other way around.)
-
- 15. A memory management object. This will be identical to the memory
- management for compression (and will be the same code, in combined
- programs). See above for details.
-
-
- *** Initial method selection ***
-
- The main ugliness in this design is the portion of startup that will select
- which of several instantiations should be used for each of the objects. (For
- example, Huffman or arithmetic for entropy encoding; one of several pipeline
- controllers depending on interleaving, the size of the image, etc.) It's not
- really desirable to have a single chunk of code that knows the names of all
- the possible instantiations and the conditions under which to select each one.
-
- The best approach seems to be to provide a selector function for each object
- (group of related method calls). This function knows about each possible
- instantiation of its object and how to choose the right one; but it doesn't
- know about any other objects.
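
For example, the selector for the entropy-encoding object might be no more than the
following; all of the names are hypothetical.

    /* Hypothetical selector for the entropy-encoding object.  It alone knows
     * the available instantiations and how to pick one. */
    struct entropy_methods {
      void (*entropy_encode_init) (void);
      void (*entropy_encode) (void *mcu_data);
      void (*entropy_encode_term) (void);
    };

    extern void huff_encoder_select (struct entropy_methods *m);   /* assumed */
    extern void arith_encoder_select (struct entropy_methods *m);  /* assumed */

    void select_entropy_encoder (int arith_coding, struct entropy_methods *m)
    {
      if (arith_coding)
        arith_encoder_select(m);   /* fills m with arithmetic-coding methods */
      else
        huff_encoder_select(m);    /* fills m with Huffman methods */
    }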
-
- Note that there will be several rounds of method selection: at initial startup,
- after overall compression parameters are determined (after the file header is
- read, if decompressing), and once in preparation for each scan (this occurs
- more than once if the file is noninterleaved). Each object method will need
- to be clearly identified as to which round sets it up.
-
-
- *** Implications of DNL marker ***
-
- Some JPEG files may use a DNL marker to postpone definition of the image
- height (this would be useful for a fax-like scanner's output, for instance).
- In these files the SOF marker claims the image height is 0, and you only
- find out the true image height at the end of the first scan.
-
- We could handle these files as follows:
- 1. Upon seeing zero image height, replace it by 65535 (the maximum allowed).
- 2. When the DNL is found, update the image height in the global image
- descriptor.
- This implies that pipeline control objects must avoid making copies of the
- image height, and must re-test for termination after each MCU row. This is
- no big deal.
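
Sketched in code, scheme 1 amounts to something like this inside a pipeline
controller; the descriptor fields and routine names are made up for illustration.

    /* The image height lives in a shared descriptor that the marker reader
     * may update when it sees DNL, so the loop re-reads it every MCU row
     * instead of caching a private copy. */
    struct image_info {
      long image_height;      /* 65535 until a DNL marker says otherwise */
      long rows_per_mcu_row;
    };

    extern int decode_one_mcu_row (struct image_info *info);  /* hypothetical */

    void decode_scan (struct image_info *info)
    {
      long rows_done = 0;

      if (info->image_height == 0)
        info->image_height = 65535L;       /* maximum allowed; DNL will fix it */

      while (rows_done < info->image_height) {   /* re-test each time around  */
        decode_one_mcu_row(info);                /* may process a DNL marker  */
        rows_done += info->rows_per_mcu_row;     /* and shrink image_height   */
      }
    }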
-
- In situations where image-size data structures are allocated, this approach
- will result in very inefficient use of virtual memory or
- much-larger-than-necessary temporary files. This seems acceptable for
- something that probably won't be a mainstream usage. People might have to
- forgo use of memory-hogging options (such as two-pass color quantization or
- noninterleaved JPEG files) if they want efficient conversion of such files.
- (One could improve efficiency by demanding a user-supplied upper bound for the
- height, less than 65536; in most cases it could be much less.)
-
- Alternately, we could insist that DNL-using files be preprocessed by a
- separate program that reads ahead to the DNL, then goes back and fixes the SOF
- marker. This is a much simpler solution and is probably far more efficient.
- Even if one wants piped input, buffering the first scan of the JPEG file
- requires a much smaller temp file than is implied by the maximum-height method.
- For this approach we'd simply treat DNL as a no-op in the decompressor (at
- most, check that it matches the SOF image height).
-
- We will not worry about making the compressor capable of outputting DNL.
- Something similar to the first scheme above could be applied if anyone ever
- wants to make that work.
-
-
- *** Memory manager internal structure ***
-
- The memory manager contains the most potential for system dependencies.
- To isolate system dependencies as much as possible, we have broken the
- memory manager into two parts. There is a reasonably system-independent
- "front end" (jmemmgr.c) and a "back end" that contains only the code
- likely to change across systems. All of the memory management methods
- outlined above are implemented by the front end. The back end provides
- the following routines for use by the front end (none of these routines
- are known to the rest of the JPEG code):
-
- jmem_init, jmem_term system-dependent initialization/shutdown
-
- jget_small, jfree_small interface to malloc and free library routines
-
- jget_large, jfree_large interface to FAR malloc/free in MS-DOS machines;
- otherwise same as jget_small/jfree_small
-
- jmem_available estimate available memory
-
- jopen_backing_store create a backing-store object
-
- read_backing_store, manipulate a backing store object
- write_backing_store,
- close_backing_store
-
- On some systems there will be more than one type of backing-store object
- (specifically, in MS-DOS a backing store file might be an area of extended
- memory as well as a disk file). jopen_backing_store is responsible for
- choosing how to implement a given object. The read/write/close routines
- are method pointers in the structure that describes a given object; this
- lets them be different for different object types.
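
A backing-store object might be declared along these lines; the field names and
prototypes are guesses at what the real structure could contain.

    #include <stdio.h>

    /* Hypothetical layout of a backing-store object.  The read/write/close
     * entries are method pointers so that, e.g., an MS-DOS implementation can
     * plug in extended-memory routines while other systems use a temp file. */
    typedef struct backing_store_struct {
      void (*read_backing_store)  (struct backing_store_struct *self,
                                   void *buffer, long file_offset, long nbytes);
      void (*write_backing_store) (struct backing_store_struct *self,
                                   void *buffer, long file_offset, long nbytes);
      void (*close_backing_store) (struct backing_store_struct *self);

      FILE *temp_file;          /* used by the ordinary stdio implementation;
                                 * an extended-memory handle could live here
                                 * instead for other object types */
    } backing_store;

    /* jopen_backing_store decides which implementation to use and fills in
     * the three method pointers plus whatever private state it needs. */
    extern void jopen_backing_store (backing_store *info, long total_bytes);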
-
- It may be necessary to ensure that backing store objects are explicitly
- released upon abnormal program termination. (For example, MS-DOS won't free
- extended memory by itself.) To support this, we will expect the main program
- or surrounding application to arrange to call the free_all method upon
- abnormal termination; this may require a SIGINT signal handler, for instance.
- (We don't want to have the system-dependent module install its own signal
- handler, because that would pre-empt the surrounding application's ability
- to control signal handling.)
-
-
- *** Notes for MS-DOS implementors ***
-
- The standalone cjpeg and djpeg applications can be compiled in "small" memory
- model, at least at the moment; as the code grows we may be forced to switch to
- "medium" model. (Small = both code and data pointers are near by default;
- medium = far code pointers, near data pointers.) Medium model will slow down
- calls through method pointers, but I don't think this will amount to any
- significant speed penalty.
-
- When integrating the JPEG code into a larger application, it's a good idea to
- stay with a small-data-space model if possible. An 8K stack is much more than
- sufficient for the JPEG code, and its static data requirements are less than
- 1K. When executed, it will typically malloc about 10K-20K worth of near heap
- space (and lots of far heap, but that doesn't count in this calculation).
- This figure will vary depending on image size and other factors, but figuring
- 30K should be more than sufficient. Thus you have about 25K available for
- other modules' static data and near heap requirements before you need to go to
- a larger memory model. The C library's static data will account for several K
- of this, but that still leaves a good deal for your needs. (If you are tight
- on space, you could reduce JPEG_BUF_SIZE from 4K to 1K to save 3K of near heap
- space.)
-
- As the code is improved, we will endeavor to hold the near data requirements
- to the range given above. This does imply that certain data structures will
- be allocated as FAR although they would fit in near space if we assumed the
- JPEG code is stand-alone. (The LZW tables in jrdgif/jwrgif are examples.)
- To make an optimal implementation, you might want to move these structures
- back to near heap if you know there is sufficient space.
-
- FAR data space may also be a tight resource when you are dealing with large
- images. The most memory-intensive case is decompression with two-pass color
- quantization. This requires a 128Kb color histogram plus strip buffers
- amounting to about 150 bytes per column for typical sampling ratios (e.g., about
- 96000 bytes for a 640-pixel-wide image). You may not be able to process wide
- images if you have large data structures of your own.
-
-
- *** Potential optimizations ***
-
- For colormapped input formats it might be worthwhile to merge the input file
- reading and the colorspace conversion steps; in other words, do the colorspace
- conversion by hacking up the colormap before inputting the image body, rather
- than doing the conversion on each pixel independently. Not clear if this is
- worth the uglification involved. In the above design for the compressor, only
- the colorspace conversion step ever sees the output of get_input_row, so this
- sort of thing could be done via private agreement between those two modules.
-
- Level shift from 0..255 to -128..127 may be done either during colorspace
- conversion, or at the moment of converting an 8x8 sample block into the format
- used by the DCT step (which will be signed short or long int). This could be
- selectable by a compile-time flag, so that the intermediate steps can work on
- either signed or unsigned chars as samples, whichever is most easily handled
- by the platform. However, making sure that rounding is done right will be a
- lot easier if we can assume positive values. At the moment I think that
- benefit is worth the overhead of "& 0xFF" when reading out sample values on
- signed-char-only machines.
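
To make the trade-off concrete, here is roughly what the two readout styles look
like when a sample is loaded into the DCT workspace; the type names and the
CHAR_IS_UNSIGNED switch are illustrative assumptions.

    typedef unsigned char JSAMPLE;   /* may have to be plain char on some machines */
    typedef short DCTELEM;           /* assumed DCT workspace element type */
    #define CENTERJSAMPLE 128
    #define GETJSAMPLE(v)  ((int) (v) & 0xFF)   /* force the value into 0..255 */

    /* Level shift from 0..255 to -128..127, folded into the load of one
     * sample into the DCT workspace.  A sketch only; names are illustrative. */
    DCTELEM load_and_shift (JSAMPLE sample)
    {
    #ifdef CHAR_IS_UNSIGNED
      return (DCTELEM) sample - CENTERJSAMPLE;             /* no masking needed   */
    #else
      return (DCTELEM) GETJSAMPLE(sample) - CENTERJSAMPLE; /* "& 0xFF" overhead   */
    #endif
    }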
-