This chapter describes the user interface of the path specifications that the Kpathsearch library implements.
Conceptually, there are three stages: looking in an externally-built database, generating a list of directories in which to search, and finally looking up files using that list. The sections below describe each of these in turn.
In the implementation, however, these stages are interleaved: directory lists are generated only as needed for a particular file lookup, and only if the file cannot be found in the pre-existing database; they are then cached for future lookups. (This is analogous to lazy evaluation in programming languages.) One consequence of the caching is that directories created during the run are not seen.
1.1 Filename database | Using an externally-built file to search.
1.2 Directory list generation | Specifying where to search.
1.3 File lookups | Finding files in directory lists.
1.1 Filename database
Kpathsearch goes to some lengths to minimize disk accesses for searches (see section Subdirectory problems). Nevertheless, at installations with hundreds of directories, doing a linear search of each directory for a given file can take some time, depending on the speed of the disk, whether it’s NFS-mounted, and so on.
Therefore, Kpathsearch can use an externally-built “database” that maps files to directories, thus avoiding the need to search the disk. By convention, the file is named ‘ls-R’, and is located at the root of the TeX installation hierarchy. Presently, one and only one ‘ls-R’ is read; its location is determined at compile-time.
You can build the file with the command ‘ls -R root-dir >ls-R’, if your ls produces the right output format (see the section below). GNU ls, for example, does output in this format. It is probably most useful to do this via cron, so changes in the installed files will be automatically reflected (albeit with some delay) in the database.
Because the database may be out-of-date for a particular run (if a font was just built with MakeTeXPK, for example), if a file is not found in the database, by default Kpathsearch goes ahead and searches the disk. If a given path element begins with ‘%%’, however, only the database will be searched; the disk is never searched. (If the database does not exist, nothing will be searched.) Because this can lead to great surprise on the part of users (“I see the font ‘foo.tfm’ when I do an ls; why can’t Dvips find it?”), I recommend using this only as a last resort.
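As a rough illustration of the fallback rule just described, here is a minimal Python sketch. The function name `lookup_in_element` and the representation of the database as a dictionary mapping each filename to a list of directories are assumptions made for this example; they are not part of the library.

```python
import os


def lookup_in_element(name, element, db):
    """Look for `name` in one path element; return its full path or None.

    `db` is a hypothetical in-memory form of ls-R: {filename: [directories]},
    or None if no database exists.
    """
    db_only = element.startswith("%%")
    directory = element[2:] if db_only else element

    # The database is always consulted first.
    if db is not None and directory in db.get(name, []):
        return os.path.join(directory, name)

    # A '%%' element never touches the disk, even if the database is missing.
    if db_only:
        return None

    # Default behaviour: the database may be stale, so fall back to the disk.
    candidate = os.path.join(directory, name)
    return candidate if os.path.exists(candidate) else None
```

With this sketch, a file created after the database was built is found through a plain element but not through the same element prefixed with ‘%%’, which is the surprise the text warns about.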
1.1.1 Database format | Precise details of the database.
1.1.1 Database format
The “database” read by Kpathsearch is a line-oriented file of plain text. The format is that generated by GNU (and perhaps other) ls programs given the ‘-R’ option, as follows.
Blank lines are ignored.
If a line begins with ‘/’ and ends with a colon, it’s the name of a directory.
All other lines name entries in the most recently seen directory. ‘/’s in such lines will yield possibly-strange results.
Files listed with no preceding directory are ignored.
For example, here’s the first few lines of ‘ls-R’ on my system:
    bibtex
    dvips
    fonts
    ini
    ls-R
    mf
    tex

    /usr/local/lib/texmf/bibtex:
    bib
    bst
    doc

    /usr/local/lib/texmf/bibtex/bib:
    asi.bib
    bibshare
    btxdoc.bib
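The four format rules above can be sketched as a small Python parser. This is only an illustration under stated assumptions: the name `parse_ls_r` and the returned dictionary shape are invented for the example and say nothing about how the library stores the database internally.

```python
from collections import defaultdict


def parse_ls_r(text):
    """Map each filename to the list of directories that contain it."""
    db = defaultdict(list)
    current_dir = None  # no directory line seen yet
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines are ignored
        if line.startswith("/") and line.endswith(":"):
            current_dir = line[:-1]  # a directory name, minus the colon
        elif current_dir is not None:
            db[line].append(current_dir)  # entry in the most recent directory
        # entries with no preceding directory are ignored
    return dict(db)


sample = """\
bibtex
dvips

/usr/local/lib/texmf/bibtex:
bib
bst

/usr/local/lib/texmf/bibtex/bib:
asi.bib
btxdoc.bib
"""
db = parse_ls_r(sample)
```

Here the leading ‘bibtex’ and ‘dvips’ lines are dropped because no directory line precedes them, while ‘asi.bib’ is recorded under ‘/usr/local/lib/texmf/bibtex/bib’.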
1.2 Directory list generation
Kpathsearch constructs a directory list from an environment variable var set by the user, (possibly) a setting from a configuration file, and a default path set at compile time. Each of these is a colon-separated list of directories. If var is set, its value is used; otherwise, if a config file defines a value, that value is used; otherwise, the compilation default is used. In any case, once the path specification to use is determined, its evaluation is independent of its source.
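The order of precedence can be written down in a few lines of Python. This is a sketch only: `choose_path_spec`, the `config` dictionary and the variable name ‘TEXFONTS’ used in the usage line are hypothetical, and treating a variable that is set but empty as “set” is an assumption.

```python
import os


def choose_path_spec(var, config, default, environ=None):
    """Pick the path specification: environment, then config file, then default."""
    environ = os.environ if environ is None else environ
    if var in environ:       # 1. environment variable set by the user
        return environ[var]
    if var in config:        # 2. value from a configuration file, if any
        return config[var]
    return default           # 3. compile-time default


spec = choose_path_spec("TEXFONTS", {"TEXFONTS": "/cfg/fonts"}, "/default/fonts",
                        environ={})
```

Whatever the source, the returned string is then expanded in the same way.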
The “colon” and “slash” mentioned below aren’t necessarily ‘:’ and ‘/’ on non-Unix systems; the library tries to adapt these characters to other operating systems’ conventions.
The following subsections explain the various kinds of expansion the path is subjected to. After expansion, nonexistent directories in the path are ignored.
1.2.1 Default expansion | Extra colons expand to the compilation default.
1.2.2 Tilde expansion | ~ and ~user expand to home directories.
1.2.3 Variable expansion | $foo and ${foo} expand to environment values.
1.2.4 Subdirectory expansion | a// and a//b recursively expand to subdirs.
1.2.5 Path specification example | An example.
1.2.1 Default expansion
If an environment variable or config file value has a leading or trailing or doubled colon, the default path is inserted at that point.
Putting an extra colon into the default value has unpredictable results, and may cause the program to crash, so installers beware.
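A minimal Python sketch of this rule follows. The function name `expand_default` is hypothetical, and the sketch assumes that every empty element (leading, trailing or doubled colon) receives the default; the library may treat multiple extra colons differently.

```python
def expand_default(spec, default):
    """Insert `default` wherever `spec` has a leading, trailing or doubled colon."""
    expanded = []
    for element in spec.split(":"):
        if element == "":
            # An empty element is what an extra colon leaves behind.
            expanded.extend(default.split(":"))
        else:
            expanded.append(element)
    return ":".join(expanded)


result = expand_default("~/fonts//:~karl/fonts:", "/usr/local/lib/texmf/fonts//")
```

The trailing colon in the usage line is replaced by the default, so `result` ends with ‘/usr/local/lib/texmf/fonts//’.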
1.2.2 Tilde expansion
A leading ‘~’ or ‘~user’ in a path component is replaced by the home directory of the current user or of user, respectively.
If user is invalid, or the home directory cannot be determined, Kpathsea uses ‘.’ instead.
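The behaviour can be approximated in Python as follows. This is a sketch for Unix-like systems only; the name `expand_tilde` is hypothetical, and taking the current user's home directory from the ‘HOME’ environment variable is an assumption about how it is determined.

```python
import os
import pwd  # Unix only


def expand_tilde(component, environ=None):
    """Replace a leading '~' or '~user' with a home directory, or '.' on failure."""
    environ = os.environ if environ is None else environ
    if not component.startswith("~"):
        return component  # only a leading tilde is special
    user, sep, rest = component[1:].partition("/")
    if user == "":
        home = environ.get("HOME", ".")       # current user (assumed: $HOME)
    else:
        try:
            home = pwd.getpwnam(user).pw_dir  # named user's home directory
        except KeyError:
            home = "."                        # invalid user: fall back to '.'
    return home + sep + rest


expanded = expand_tilde("~/fonts", {"HOME": "/home/me"})
```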
1.2.3 Variable expansion
A construct ‘$foo’ or ‘${foo}’ is replaced by the expansion of the environment variable ‘foo’. In the first case, the variable name consists of consecutive alphanumeric-or-underscore characters. In the second, the variable name consists of everything between the braces.
Remember to quote the ‘$’’s and braces as necessary for your shell.
Kpathsea sees only environment variables; shell variables that have not been exported cannot be seen.
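The two forms can be matched with a single regular expression, as in this Python sketch. The name `expand_vars` is hypothetical, and replacing an undefined variable with the empty string is an assumption; the text does not say what happens in that case.

```python
import re

# Either ${anything-but-a-closing-brace} or $ followed by alphanumerics/underscores.
_VAR = re.compile(r"\$(?:\{([^}]*)\}|([A-Za-z0-9_]+))")


def expand_vars(spec, environ):
    """Expand $foo and ${foo} in `spec` using the mapping `environ`."""
    def replace(match):
        name = match.group(1) if match.group(1) is not None else match.group(2)
        return environ.get(name, "")  # assumption: undefined expands to nothing
    return _VAR.sub(replace, spec)


expanded = expand_vars("$TEXMF/fonts:${HOME}/tex",
                       {"TEXMF": "/usr/lib/texmf", "HOME": "/home/me"})
```

Passing `os.environ` as the second argument gives the behaviour described above, since only exported variables appear there.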
1.2.4 Subdirectory expansion
If a component directory d contains ‘//’, all subdirectories of d are included in the path: first those subdirectories directly under d, then the subsubdirectories under those, and so on. At each level, the order in which the directories are searched is unspecified.
If you specify any filename components after the ‘//’, only subdirectories which have those components are included. For example, ‘/a//b’ would expand into directories ‘/a/1/b’, ‘/a/2/b’, ‘/a/1/1/b’, and so on, but not ‘/a/b/c’ or ‘/a/1’.
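A level-by-level expansion of this kind can be sketched in Python as below. Several points are assumptions of the sketch rather than documented behaviour: the name `expand_subdirs`, treating d itself as a candidate, skipping symbolic links (to avoid cycles), and sorting names within a level (the real order is unspecified). Only the first ‘//’ in the component is handled.

```python
import os


def expand_subdirs(component):
    """Expand one component containing '//' into a list of existing directories."""
    if "//" not in component:
        return [component]
    top, _, tail = component.partition("//")
    top = top or "/"
    tail = tail.lstrip("/")
    found = []
    level = [top]  # directories at the current depth
    while level:
        next_level = []
        for d in level:
            # With trailing components, keep d only if d/tail is a directory.
            candidate = os.path.join(d, tail) if tail else d
            if os.path.isdir(candidate):
                found.append(candidate)
            try:
                names = sorted(os.listdir(d))
            except OSError:
                continue  # unreadable or vanished directory
            for name in names:
                sub = os.path.join(d, name)
                if os.path.isdir(sub) and not os.path.islink(sub):
                    next_level.append(sub)
        level = next_level  # descend one level: breadth-first order
    return found
```

Given directories ‘/a/1/b’, ‘/a/2/b’, ‘/a/1/1/b’ and ‘/a/b/c’, expanding ‘/a//b’ with this sketch yields the first three (plus ‘/a/b’ itself, under the assumption stated above) but never ‘/a/b/c’ or ‘/a/1’.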
1.2.4.1 Subdirectory problems | If you have trouble with subdirectories.
1.2.4.1 Subdirectory problems
Perhaps the first problem is best put as a question-and-answer.
Question: I know all about slow starting TeX ‘:-)’. How do I organize the directory scheme to avoid the slowness, while at the same time enjoying a structured inputs directory?
(Naturally, this applies to any Kpathsea-using program, not just TeX.)
I will give the false Short Answer first, then the Real Explanation.
The Short Answer: in your equivalent of ‘/usr/local/lib/tex/macros//’ and ‘.../fonts//’, make each subdirectory contain either 1) only directories; or 2) only files.
As long as you do not have (literally) hundreds of subdirectories, this should cure the problem. It has in every case I have been told about.
The Real Explanation: the thing that makes TeX slow is calling stat (if you don’t know what stat(2) is, ignore this explanation) on “too many” pathnames, where “too many” is some nebulous number depending on things like whether the filesystem is NFS-mounted or not, whether it’s on a fast disk, whether your Fast File System implementation is really Fast, etc., etc.
(Side note: If you’re curious, you can find this number by writing a program that does nothing but read filenames (presumably from a file) and stat them, and see how many pathnames make the execution time noticeable. On the systems I use (Suns with an NFS-mounted directory, ISC 2.2.1 and a local directory), it’s several hundred, at least. On an NFS-mounted directory under Solaris 2.1, 150 is quite slow, according to ‘hammer@kis.uni-freiburg.de’.)
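A program of the kind the side note describes might look like this in Python. It is a minimal sketch: the function name `time_stats` is hypothetical, and the pathnames are passed as a list rather than read from a file.

```python
import os
import time


def time_stats(pathnames):
    """Return the seconds spent calling stat on every pathname in turn."""
    start = time.perf_counter()
    for name in pathnames:
        try:
            os.stat(name)
        except OSError:
            pass  # a failed stat still costs a system call, so it still counts
    return time.perf_counter() - start


elapsed = time_stats(["/", "/no/such/file"] * 100)
```

Feeding it progressively longer lists of names from the filesystem in question shows where the delay becomes noticeable; keep in mind that repeated runs may be served from the operating system's cache.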
Whether it’s directories or files that are being stat-ed is irrelevant (this is why the Short Answer is false). It’s sheer numbers that count.
If you think your directory structure is ok, and you’re still experiencing slowness, I advise running TeX (or whatever program) under a debugger, setting the bit DEBUG_STAT in the variable kpathsea_debug (see ‘debug.h’) to one and seeing exactly what is getting stat-ed. If only a few things are getting stat-ed, and TeX is still slow, tell me.
I should also mention “the trick”, which I stole from GNU find. (Matthew Farwell ‘<dylan@ibmpcug.co.uk>’ suggested it, and David MacKenzie ‘<djm@gnu.ai.mit.edu>’ implemented it, as far as I know.)
The trick is that in every real Unix implementation (that I know about) (as opposed to the POSIX specification), a directory which contains no subdirectories will have exactly two links (specifically, one each for ‘.’ and ‘..’). That is to say, the st_nlink field in the stat structure will be two. Thus, the path searching code doesn’t have to stat every entry in the bottom-level directories: it can check st_nlink, and if it’s two, it knows there are no subdirectories.
But if you have a directory that contains *one* subdirectory and five hundred files, st_nlink will be 3, and Kpathsea has to stat every one of those 501 entries. Therein lies slowness.
You can disable the trick by undefining UNIX_ST_LINK in ‘kpathsea/config.h’.
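The idea behind the trick can be shown in a few lines of Python. This is a sketch, not the library's code: `list_subdirs` and its `use_trick` parameter are hypothetical names, the parameter only loosely plays the role of the compile-time switch, and the link-count convention does not hold on every filesystem.

```python
import os


def list_subdirs(d, use_trick=True):
    """Return the subdirectories of d, skipping the scan when st_nlink is 2."""
    if use_trick and os.stat(d).st_nlink == 2:
        # Only '.' and '..' link here, so d has no true subdirectories
        # and none of its entries needs to be examined.
        return []
    subdirs = []
    for name in sorted(os.listdir(d)):
        path = os.path.join(d, name)
        if os.path.isdir(path):  # one stat per entry: the expensive part
            subdirs.append(path)
    return subdirs
```

A directory holding five hundred files and no subdirectories returns immediately, whereas adding a single subdirectory forces the loop over every entry.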
The subdirectory searching has one other known (and irreconcilable) deficiency, which is a consequence of the trick explained above. If a directory d being searched for subdirectories contains plain files and symbolic links to other directories, but no true subdirectories, d will be considered a leaf directory, i.e., the symbolic links will not be followed.
You can work around this problem by creating an empty dummy subdirectory in d; then d will no longer be a leaf, and the symlinks will be followed.
The directory immediately followed by the ‘//’, however, is always searched for subdirectories, even if it is a “leaf”. We do this since presumably you would not have asked for the directory to be searched for subdirectories if you didn’t want it to be.
1.2.5 Path specification example
For example, the following value for an environment variable says to search the following: the current user’s ‘fonts’ directory and all its subdirectories, then the directory ‘fonts’ in user ‘karl’’s home directory, and finally the system default directories specified at compilation time.
~/fonts//:~karl/fonts:
1.3 File lookups
Given the directory list generated from the rules in the previous section, looking up a file presents no problem at all: we just look in each directory in the list in turn, and return the first match found.
The only complication is if the filename is absolute or explicitly relative, i.e., (under Unix-like operating systems) starts with ‘/’ or ‘./’ or ‘../’. Then the library does not use the directory list at all. Instead, the file is simply searched for in the given directory.
In an attempt to speed lookups, the directory in which a file is found is floated to the top of the directory list. This helps in the common case of several files in the same directory being searched for.
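The lookup rules in this section fit in a short Python sketch. The name `find_file` is hypothetical, the Unix prefixes are hard-coded, and moving the successful directory to the front of the caller's list in place is just one way to model the “floating” described above.

```python
import os


def find_file(name, dirs):
    """Return the first match for `name`, or None; may reorder `dirs` in place."""
    # Absolute or explicitly relative names bypass the directory list.
    if name.startswith(("/", "./", "../")):
        return name if os.path.exists(name) else None
    for i, d in enumerate(dirs):
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            dirs.insert(0, dirs.pop(i))  # float the successful directory to the top
            return candidate
    return None
```

After a successful lookup in the last directory of a long list, the next file from the same directory is found on the first probe.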
This document was generated on January 15, 2023 using texi2html 5.0.