UNIX Power Tools

Chapter 24
Other Ways to Get Disk Space

24.8 Save Space: tar and compress a Directory Tree

In the UNIX filesystem, files are stored in blocks (52.9). Each nonempty file, no matter how small, takes at least one block. [2] A directory tree full of little files can fill up a lot of partly empty blocks. A big file is more efficient because it fills all (except possibly the last) of its blocks completely.

[2] Completely empty files (zero characters) don't take a block.
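
To see the effect on your own system, here is a rough sketch that compares the data stored in a batch of tiny files with the disk space they occupy. The scratch directory and file names are invented for the demo, and the disk figure depends on your filesystem (some newer filesystems store very small files without a full block), so no particular result is promised:

```shell
# Hypothetical demo: 100 ten-byte files in a scratch directory.
demo=$(mktemp -d)
i=1
while [ "$i" -le 100 ]; do
    printf '123456789\n' > "$demo/file$i"
    i=$((i + 1))
done

# Total bytes of data actually stored in the files (100 x 10).
data_bytes=$(cat "$demo"/file* | wc -c | tr -d ' ')

# Disk space the directory really occupies, in kilobytes.
disk_kb=$(du -sk "$demo" | awk '{print $1}')

echo "data: $data_bytes bytes; on disk: $disk_kb KB"
rm -rf "$demo"
```

On a filesystem with multi-kilobyte blocks, the second number is usually many times larger than the first.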

The tar (19.5) command can read lots of little files and put them into one big file. Later, when you need one of the little files, you can extract it from the tar archive. Seems like a good space-saving idea, doesn't it? But tar, which was really designed for magnetic tape archives, pads the end of each file with "garbage" characters to round it up to a whole number of 512-byte tar blocks. So, a big tar archive uses about as many blocks as the separate little files do.

Okay, then why am I writing this article? Because the gzip (24.7) utility can solve the problem. It squeezes files down - in particular, compression gets rid of repeated characters. Compressing a tar archive typically saves 50 percent or more.
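
A quick way to check the savings is to build a plain archive and a compressed copy of the same small tree and compare their sizes. This is only a sketch: the scratch directory, the tree directory, and the f1 ... f50 files are made up for the demo, and the exact byte counts will vary with your tar and gzip:

```shell
# Hypothetical scratch tree of 50 small files.
pad=$(mktemp -d)
mkdir "$pad/tree"
i=1
while [ "$i" -le 50 ]; do
    echo "small file $i" > "$pad/tree/f$i"
    i=$((i + 1))
done

# Write the archive outside the tree being archived, then compress a copy.
(cd "$pad/tree" && tar cf ../tree.tar .)
gzip -c "$pad/tree.tar" > "$pad/tree.tar.gz"

# Compare the size of the plain archive with the compressed one.
tar_bytes=$(wc -c < "$pad/tree.tar" | tr -d ' ')
gz_bytes=$(wc -c < "$pad/tree.tar.gz" | tr -d ' ')
echo "tar: $tar_bytes bytes; tar.gz: $gz_bytes bytes"
```

With files this small, almost all of the plain archive is headers and padding, so the compressed copy should be far smaller.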

Making a compressed archive of a directory and all of its subdirectories is easy: tar copies the whole tree when you give it the top directory name. Just be sure to save the archive in some directory that won't be copied - so tar won't try to archive its own archive! I usually put the archive in the parent directory. For example, to archive my directory named project, I'd use the commands below. If you work on a system that has 14-character filename length limits, be sure that the archive filename (here, project.tar.gz) won't be too long. The .tar.gz extension isn't required; it's just a convention. Watch carefully for errors before you remove the original directory:

% cd project
% tar clf - . | gzip > ../project.tar.gz
% cd ..
% rm -r project
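
If you'd rather not rely on spotting errors by eye, the steps above can be wrapped in a check that removes the directory only when the archive reads back cleanly. This is a sketch, not the only way to do it: the scratch directory and sample files are invented for the demo, gzip -dc stands in for gzcat, and I've left out the l option here because its meaning is not the same in every version of tar:

```shell
# Hypothetical scratch copy of a small "project" tree.
work=$(mktemp -d)
mkdir -p "$work/project/io"
echo 'int main(void) { return 0; }' > "$work/project/xcms.c"
echo 'input' > "$work/project/io/input.c"

# Archive the tree into the parent directory, as in the commands above.
cd "$work/project" || exit 1
tar cf - . | gzip > ../project.tar.gz
cd ..

# gzip -t checks the compressed data; tar tf checks that the result
# can be read back as a tar archive. Remove the original only if both pass.
if gzip -t project.tar.gz && gzip -dc project.tar.gz | tar tf - > /dev/null
then
    rm -r project
    status=removed
else
    status=kept
    echo "archive failed verification; project left in place" >&2
fi
echo "$status"
```

Listing an archive does not prove that every file in it is intact, so for anything precious it's still worth extracting a copy somewhere and comparing before you delete the original.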

The tar l (lowercase letter L) option will print messages if any of the files you're archiving have other hard links (18.4). If a lot of your files have other links, archiving the directory may not save much disk space - the other links will keep those files on the disk, even after your rm -r command.
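
You can also look for multiply linked files before you archive, using find's -links test. A minimal sketch, with a made-up scratch tree in which one file has a second link outside the directory:

```shell
# Hypothetical scratch tree: shared.c gets a second link outside "project".
top=$(mktemp -d)
mkdir "$top/project"
echo 'one link' > "$top/project/alone.c"
echo 'two links' > "$top/project/shared.c"
ln "$top/project/shared.c" "$top/shared-copy.c"

# -links +1 matches files whose link count is greater than one.
linked=$(cd "$top/project" && find . -type f -links +1 -print)
echo "$linked"    # prints ./shared.c
```

Any file this lists will stay on the disk after the rm -r unless its other links are removed too.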

Any time you want a list of the files in the archive, use tar t or tar tv:

% gzcat project.tar.gz | tar tvf - | more
rw-r--r--239/100    485 Oct  5 19:03 1991 ./Imakefile
rw-rw-r--239/100   4703 Oct  5 21:17 1991 ./scalefonts.c
rw-rw-r--239/100   3358 Oct  5 21:55 1991 ./xcms.c
rw-rw-r--239/100  12385 Oct  5 22:07 1991 ./io/input.c
rw-rw-r--239/100   7048 Oct  5 21:59 1991 ./io/output.c
   ...

To extract all the files from the archive, type:

% mkdir project
% cd project
% gzcat ../project.tar.gz | tar xf -

Of course, you don't have to extract the files into a directory named project. You can read the archive file from other directories, move it to other computers, and so on.

You can also extract just a few files and/or directories from the archive. Be sure to use exactly the name shown by the tar t command above. For instance, to restore the old subdirectory named project/io (and everything that was in it), you'd type:

% mkdir project
% cd project
% gzcat ../project.tar.gz | tar xf - ./io
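
Because the name has to match exactly, it can help to pull the name out of the archive listing and hand it straight to tar. The sketch below assumes a throwaway archive built on the spot (the scratch and restore directories and the sample files are invented) and uses gzip -dc in place of gzcat:

```shell
# Build a small hypothetical archive to experiment with.
scratch=$(mktemp -d)
mkdir -p "$scratch/project/io"
echo 'in' > "$scratch/project/io/input.c"
echo 'out' > "$scratch/project/io/output.c"
echo 'top' > "$scratch/project/xcms.c"
(cd "$scratch/project" && tar cf - . | gzip > ../project.tar.gz)

# List the archive and pick out the exact member name to restore.
member=$(gzip -dc "$scratch/project.tar.gz" | tar tf - | grep 'input\.c$')
echo "$member"    # prints ./io/input.c

# Extract just that member into a fresh directory.
mkdir "$scratch/restore"
(cd "$scratch/restore" && gzip -dc ../project.tar.gz | tar xf - "$member")
```

Note the leading ./ in the name: it is there because the archive was made with "." as the directory to copy, and tar wants the name spelled the same way when you extract.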

- JP

