- Xref: bloom-picayune.mit.edu comp.lang.perl:14079 news.answers:4268
- Path: bloom-picayune.mit.edu!enterpoop.mit.edu!usc!cs.utexas.edu!uunet!olivea!pagesat!spssig.spss.com!news.oc.com!convex!tchrist
- From: tchrist@convex.COM (Tom Christiansen)
- Newsgroups: comp.lang.perl,news.answers
- Subject: comp.lang.perl FAQ (part 2 of 2)
- Message-ID: <1992Nov30.130440.11167@news.eng.convex.com>
- Date: 30 Nov 92 13:04:40 GMT
- Expires: Mon, 4 Jan 1993 12:00:00 GMT
- References: <1992Nov30.124619.8579@news.eng.convex.com>
- Sender: usenet@news.eng.convex.com (news access account)
- Reply-To: tchrist@convex.COM (Tom Christiansen)
- Followup-To: comp.lang.perl
- Organization: Convex Computer Corporation, Colorado Springs, CO
- Lines: 1348
- Approved: news-answers-request@MIT.Edu
- Originator: tchrist@pixel.convex.com
- Nntp-Posting-Host: pixel.convex.com
- X-Disclaimer: This message was written by a user at CONVEX Computer
- Corp. The opinions expressed are those of the user and
- not necessarily those of CONVEX.
-
- Archive-name: perl-faq/part2
- Version: $Id: perl-tech,v 1.2 92/11/30 05:22:44 tchrist Exp Locker: tchrist $
-
- This posting contains answers to the following technical questions
- regarding Perl:
-
- 2.1) What are all these $@*%<> signs and how do I know when to use them?
- 2.2) Why don't backticks work as they do in shells?
- 2.3) How come Perl operators have different precedence than C operators?
- 2.4) How come my converted awk/sed/sh script runs more slowly in Perl?
- 2.5) How can I call my system's unique C functions from Perl?
- 2.6) Where do I get the include files to do ioctl() or syscall()?
- 2.7) Why doesn't "local($foo) = <FILE>;" work right?
- 2.8) How can I detect keyboard input without reading it?
- 2.9) How can I make an array of arrays or other recursive data types?
- 2.10) How can I quote a variable to use in a regexp?
- 2.11) Why do setuid Perl scripts complain about kernel problems?
- 2.12) How do I open a pipe both to and from a command?
- 2.13) How can I change the first N letters of a string?
- 2.14) How can I manipulate fixed-record-length files?
- 2.15) How can I make a file handle local to a subroutine?
- 2.16) How can I extract just the unique elements of an array?
- 2.17) How can I call alarm() or usleep() from Perl?
- 2.18) How can I test whether an array contains a certain element?
- 2.19) How can I do an atexit() or setjmp()/longjmp() in Perl?
- 2.20) Why doesn't Perl interpret my octal data octally?
- 2.21) How do I sort an associative array by value instead of by key?
- 2.22) How can I capture STDERR from an external command?
- 2.23) Why doesn't open return an error when a pipe open fails?
- 2.24) How can I compare two date strings?
- 2.25) What's the fastest way to code up a given task in perl?
- 2.26) How can I know how many entries are in an associative array?
- 2.27) Why can't my perl program read from STDIN after I gave it ^D (EOF) ?
- 2.28) Do I always/never have to quote my strings or use semicolons?
- 2.29) How can I translate tildes in a filename?
- 2.30) How can I convert my shell script to Perl?
- 2.31) What is variable suicide and how can I prevent it?
- 2.32) Can I use Perl regular expressions to match balanced text?
- 2.33) Can I use Perl to run a telnet or ftp session?
- 2.34) What does "Malformed command links" mean?
- 2.35) How can I set up a footer format to be used with write()?
- 2.36) Why does my Perl program keep growing in size?
-
-
- 2.1) What are all these $@*%<> signs and how do I know when to use them?
-
- Those are type specifiers: $ for scalar values, @ for indexed arrays,
- and % for hashed arrays. The * means all types of that symbol name
- and is sometimes used like a pointer; the <> are used for inputting
- a record from a filehandle. See the question on arrays of arrays
- for more about Perl pointers.
-
- Always make sure to use a $ for single values and @ for multiple ones.
- Thus element 2 of the @foo array is accessed as $foo[2], not @foo[2],
- which is a list of length one (not a scalar), and is a fairly common
- novice mistake. Sometimes you can get by with @foo[2], but it's
- not really doing what you think it's doing for the reason you think
- it's doing it, which means one of these days, you'll shoot yourself
- in the foot; ponder for a moment what these will really do:
- @foo[0] = `cmd args`;
- @foo[2] = <FILE>;
- Just always say $foo[2] and you'll be happier.
-
- This may seem confusing, but try to think of it this way: you use the
- character of the type which you *want back*. You could use @foo[1..3] for
- a slice of three elements of @foo, or even @foo{A,B,C} for a slice of
- %foo. This is the same as using ($foo[1], $foo[2], $foo[3]) and
- ($foo{A}, $foo{B}, $foo{C}) respectively. In fact, you can even use
- lists to subscript arrays and pull out more lists, like @foo[@bar] or
- @foo{@bar}, where @bar is in both cases presumably a list of subscripts.
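
The difference between a single element and a slice can be seen in a short sketch. The contents of @foo, %foo and @bar here are made up purely for illustration:

```perl
@foo = ('zero', 'one', 'two', 'three', 'four');
%foo = ('A', 'apple', 'B', 'banana', 'C', 'cherry');

$elt    = $foo[2];              # one scalar: 'two'
@slice  = @foo[1..3];           # list of three: 'one', 'two', 'three'
@fruits = @foo{'A', 'B', 'C'};  # slice of %foo: the three values

@bar  = (4, 0);
@pick = @foo[@bar];             # subscript with a list: 'four', 'zero'

print "$elt\n";
print join(' ', @slice), "\n";
print join(' ', @fruits), "\n";
print join(' ', @pick), "\n";
```

In each case the leading character names what comes back: $ for one value, @ for a list of them.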
-
- While there are a few places where you don't actually need these type
- specifiers, except for files, you should always use them. Note that
- <FILE> is NOT the type specifier for files; it's the equivalent of awk's
- getline function, that is, it reads a line from the handle FILE. When
- doing open, close, and other operations besides the getline function on
- files, do NOT use the brackets.
-
- Beware of saying:
- $foo = BAR;
- which will be interpreted as
- $foo = 'BAR';
- and not as
- $foo = <BAR>;
- If you always quote your strings, you'll avoid this trap.
-
- Normally, files are manipulated something like this (with appropriate
- error checking added if it were production code):
-
- open (FILE, ">/tmp/foo.$$");
- print FILE "string\n";
- close FILE;
-
- If instead of a filehandle, you use a normal scalar variable with file
- manipulation functions, this is considered an indirect reference to a
- filehandle. For example,
-
- $foo = "TEST01";
- open($foo, "file");
-
- After the open, these two while loops are equivalent:
-
- while (<$foo>) {}
- while (<TEST01>) {}
-
- as are these two statements:
-
- close $foo;
- close TEST01;
-
- but NOT to this:
-
- while (<$TEST01>) {} # error
- ^
- ^ note spurious dollar sign
-
- This is another common novice mistake; often it's assumed that
-
- open($foo, "output.$$");
-
- will fill in the value of $foo, which was previously undefined.
- This just isn't so -- you must set $foo to be the name of a valid
- filehandle before you attempt to open it.
-
-
- 2.2) Why don't backticks work as they do in shells?
-
- Several reasons. One is that backticks do not interpolate within
- double quotes in Perl as they do in shells.
-
- Let's look at two common mistakes:
-
- $foo = "$bar is `wc $file`"; # WRONG
-
- This should have been:
-
- $foo = "$bar is " . `wc $file`;
-
- But you'll have an extra newline you might not expect. This
- does not work as expected:
-
- $back = `pwd`; chdir($somewhere); chdir($back); # WRONG
-
- Because backticks do not automatically eat trailing or embedded
- newlines. The chop() function will remove the last character from
- a string. This should have been:
-
- chop($back = `pwd`); chdir($somewhere); chdir($back);
-
- You should also be aware that while in the shells, embedding
- single quotes will protect variables, in Perl, you'll need
- to escape the dollar signs.
-
- Shell: foo=`cmd 'safe $dollar'`
- Perl: $foo=`cmd 'safe \$dollar'`;
-
-
- 2.3) How come Perl operators have different precedence than C operators?
-
- Actually, they don't; all C operators have the same precedence in Perl as
- they do in C. The problem is with a class of functions called list
- operators, e.g. print, chdir, exec, system, and so on. These are somewhat
- bizarre in that they have different precedence depending on whether you
- look on the left or right of them. Basically, they gobble up all things
- on their right. For example,
-
- unlink $foo, "bar", @names, "others";
-
- will unlink all those file names. A common mistake is to write:
-
- unlink "a_file" || die "snafu";
-
- The problem is that this gets interpreted as
-
- unlink("a_file" || die "snafu");
-
- To avoid this problem, you can always make them look like function calls
- or use an extra level of parentheses:
-
- (unlink "a_file") || die "snafu";
- unlink("a_file") || die "snafu";
-
- Sometimes you actually do care about the return value:
-
- unless ($io_ok = print("some", "list")) { }
-
- Yes, print() returns I/O success. That means
-
- $io_ok = print(2+4) * 5;
-
- returns 5 times whether printing (2+4) succeeded, and
- print(2+4) * 5;
- returns the same 5*io_success value and tosses it.
-
- See the Perl man page's section on Precedence for more gory details,
- and be sure to use the -w flag to catch things like this.
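
The unlink trap described above can be watched in action with a small sketch. The file name is hypothetical and is assumed not to exist, so unlink fails in both cases; $first and $second are just illustrative flags:

```perl
$file = "/tmp/no_such_file.$$";   # hypothetical name; assumed not to exist

# Parsed as unlink($file || ...): the right-hand side never runs,
# because the file name itself is a true value.
$first = 0;
unlink $file || ($first = 1);
print "without parens: failure noticed = $first\n";

# Written like a function call, the failure is noticed.
$second = 0;
unlink($file) || ($second = 1);
print "with parens:    failure noticed = $second\n";
```

Only the second form runs the right-hand side when unlink fails.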
-
-
- 2.4) How come my converted awk/sed/sh script runs more slowly in Perl?
-
- The natural way to program in those languages may not make for the fastest
- Perl code. Notably, the awk-to-perl translator produces sub-optimal code;
- see the a2p man page for tweaks you can make.
-
- Two of Perl's strongest points are its associative arrays and its regular
- expressions. They can dramatically speed up your code when applied
- properly. Recasting your code to use them can help a lot.
-
- How complex are your regexps? Deeply nested sub-expressions with {n,m} or
- * operators can take a very long time to compute. Don't use ()'s unless
- you really need them. Anchor your string to the front if you can.
-
- Something like this:
- next unless /^.*%.*$/;
- runs more slowly than the equivalent:
- next unless /%/;
-
- Note that this:
- next if /Mon/;
- next if /Tue/;
- next if /Wed/;
- next if /Thu/;
- next if /Fri/;
- runs faster than this:
- next if /Mon/ || /Tue/ || /Wed/ || /Thu/ || /Fri/;
- which in turn runs faster than this:
- next if /Mon|Tue|Wed|Thu|Fri/;
- which runs *much* faster than:
- next if /(Mon|Tue|Wed|Thu|Fri)/;
-
- There's no need to use /^.*foo.*$/ when /foo/ will do.
-
- Remember that a printf costs more than a simple print.
-
- Don't split() every line if you don't have to.
-
- Another thing to look at is your loops. Are you iterating through
- indexed arrays rather than just putting everything into a hashed
- array? For example,
-
- @list = ('abc', 'def', 'ghi', 'jkl', 'mno', 'pqr', 'stv');
-
- for $i ($[ .. $#list) {
- if ($pattern eq $list[$i]) { $found++; }
- }
-
- First of all, it would be faster to use Perl's foreach mechanism
- instead of using subscripts:
-
- foreach $elt (@list) {
- if ($pattern eq $elt) { $found++; }
- }
-
- Better yet, this could be sped up dramatically by placing the whole
- thing in an associative array like this:
-
- %list = ('abc', 1, 'def', 1, 'ghi', 1, 'jkl', 1,
- 'mno', 1, 'pqr', 1, 'stv', 1 );
- $found += $list{$pattern};
-
- (but put the %list assignment outside of your input loop.)
-
- You should also look at variables in regular expressions, since
- interpolating them is expensive. If the variable to be interpolated doesn't change over the
- life of the process, use the /o modifier to tell Perl to compile the
- regexp only once, like this:
-
- for $i (1..100) {
- if (/$foo/o) {
- &some_func($i);
- }
- }
-
- Finally, if you have a bunch of patterns in a list that you'd like to
- compare against, instead of doing this:
-
- @pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');
- foreach $pat (@pats) {
- if ( $name =~ /^$pat$/ ) {
- &some_func();
- last;
- }
- }
-
- build your code and then eval it, which will be much faster.
- For example:
-
- @pats = ('_get.*', 'bogus', '_read', '.*exit', '_write');
- $code = <<EOS;
- while (<>) {
- study;
- EOS
- foreach $pat (@pats) {
- $code .= <<EOS
- if ( /^$pat\$/ ) {
- &some_func();
- next;
- }
- EOS
- }
- $code .= "}\n";
- print $code if $debugging;
- eval $code;
-
-
-
- 2.5) How can I call my system's unique C functions from Perl?
-
- If these are system calls and you have the syscall() function, then
- you're probably in luck -- see the next question. For arbitrary
- library functions, it's not quite so straight-forward. While you
- can't have a C main and link in Perl routines, if you're
- determined, you can extend Perl by linking in your own C routines.
- See the usub/ subdirectory in the Perl distribution kit for an example
- of doing this to build a Perl that understands curses functions. It's
- neither particularly easy nor overly-documented, but it is feasible.
-
-
- 2.6) Where do I get the include files to do ioctl() or syscall()?
-
- These are generated from your system's C include files using the h2ph
- script (once called makelib) from the Perl source directory. This will
- make files containing subroutine definitions, like &SYS_getitimer, which
- you can use as arguments to your function.
-
- You might also look at the h2pl subdirectory in the Perl source for how to
- convert these to forms like $SYS_getitimer; there are both advantages and
- disadvantages to this. Read the notes in that directory for details.
-
- In both cases, you may well have to fiddle with it to make these work; it
- depends how funny-looking your system's C include files happen to be.
-
- If you're trying to get at C structures, then you should take a look
- at using c2ph, which uses debugger "stab" entries generated by your
- BSD or GNU C compiler to produce machine-independent perl definitions
- for the data structures. This allows you to avoid hardcoding
- structure layouts, types, padding, or sizes, greatly enhancing
- portability. c2ph comes with the perl distribution. On an SCO
- system, GCC only has COFF debugging support by default, so you'll have
- to build GCC 2.1 with DBX_DEBUGGING_INFO defined, and use -gstabs to
- get c2ph to work there.
-
- See the file /pub/perl/info/ch2ph on convex.com via anon ftp
- for more traps and tips on this process.
-
-
- 2.7) Why doesn't "local($foo) = <FILE>;" work right?
-
- Well, it does. The thing to remember is that local() provides an array
- context, and that the <FILE> syntax in an array context will read all the
- lines in a file. To work around this, use:
-
- local($foo);
- $foo = <FILE>;
-
- You can use the scalar() operator to cast the expression into a scalar
- context:
-
- local($foo) = scalar(<FILE>);
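
To see the effect of context, here is a minimal sketch that writes a three-line scratch file and then reads it both ways. The file name and variable names are arbitrary choices for this illustration:

```perl
$tmp = "/tmp/ctx_demo.$$";                 # scratch file, name is arbitrary
open(OUT, ">$tmp") || die "can't write $tmp: $!";
print OUT "first\n", "second\n", "third\n";
close(OUT);

open(FILE, $tmp) || die "can't read $tmp: $!";
local($list_ctx) = <FILE>;                 # array context: reads every line
$left_after_list = <FILE>;                 # nothing left, so undefined
close(FILE);

open(FILE, $tmp) || die "can't read $tmp: $!";
local($scalar_ctx) = scalar(<FILE>);       # scalar context: reads one line
$left_after_scalar = <FILE>;               # "second\n" is still there
close(FILE);
unlink($tmp);

print "array context kept:  $list_ctx";
print "scalar context kept: $scalar_ctx";
print "next line after scalar read: $left_after_scalar";
```

Both variables end up holding the first line, but in the array-context case the rest of the file has been read and thrown away.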
-
-
- 2.8) How can I detect keyboard input without reading it?
-
- You should check out the Frequently Asked Questions list in
- comp.unix.* for things like this: the answer is essentially the same.
- It's very system dependent. Here's one solution that works on BSD
- systems:
-
- sub key_ready {
- local($rin, $nfd);
- vec($rin, fileno(STDIN), 1) = 1;
- return $nfd = select($rin,undef,undef,0);
- }
-
- A closely related question is how to input a single character from the
- keyboard. Again, this is a system dependent operation. The following
- code may or may not help you:
-
- $BSD = -f '/vmunix';
- if ($BSD) {
- system "stty cbreak </dev/tty >/dev/tty 2>&1";
- }
- else {
- system "stty", '-icanon';
- system "stty", 'eol', "\001";
- }
-
- $key = getc(STDIN);
-
- if ($BSD) {
- system "stty -cbreak </dev/tty >/dev/tty 2>&1";
- }
- else {
- system "stty", 'icanon';
- system "stty", 'eol', '^@'; # ascii null
- }
- print "\n";
-
- You could also handle the stty operations yourself for speed if you're
- going to be doing a lot of them. This code works to toggle cbreak
- and echo modes on a BSD system:
-
- sub set_cbreak { # &set_cbreak(1) or &set_cbreak(0)
- local($on) = $_[0];
- local($sgttyb,@ary);
- require 'sys/ioctl.ph';
- $sgttyb_t = 'C4 S' unless $sgttyb_t; # c2ph: &sgttyb'typedef()
-
- ioctl(STDIN,&TIOCGETP,$sgttyb) || die "Can't ioctl TIOCGETP: $!";
-
- @ary = unpack($sgttyb_t,$sgttyb);
- if ($on) {
- $ary[4] |= &CBREAK;
- $ary[4] &= ~&ECHO;
- } else {
- $ary[4] &= ~&CBREAK;
- $ary[4] |= &ECHO;
- }
- $sgttyb = pack($sgttyb_t,@ary);
-
- ioctl(STDIN,&TIOCSETP,$sgttyb) || die "Can't ioctl TIOCSETP: $!";
- }
-
- Note that this is one of the few times you actually want to use the
- getc() function; it's in general way too expensive to call for normal
- I/O. Normally, you just use the <FILE> syntax, or perhaps the read()
- or sysread() functions.
-
- For perspectives on more portable solutions, use anon ftp to retrieve
- the file /pub/perl/info/keypress from convex.com.
-
-
- 2.9) How can I make an array of arrays or other recursive data types?
-
- Remember that Perl isn't about nested data structures (actually,
- perl0 .. perl4 weren't, but maybe perl5 will be, at least
- somewhat). It's about flat ones, so if you're trying to do this, you
- may be going about it the wrong way or using the wrong tools. You
- might try parallel arrays with common subscripts.
-
- But if you're bound and determined, you can use the multi-dimensional
- array emulation of $a{'x','y','z'}, or you can make an array of names
- of arrays and eval it.
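
The multi-dimensional emulation is just an ordinary associative array whose key is the subscripts joined with $;. A small sketch, with made-up keys and values and assuming $; still has its default value of "\034":

```perl
# Keys are joined with $; (by default "\034") into one ordinary key.
$a{'x','y','z'} = 'deep';
$a{1,2} = 'one-two';

print $a{'x','y','z'}, "\n";

# The same entry is reachable through the joined key.
$key = join($;, 'x', 'y', 'z');
print $a{$key}, "\n";

# Splitting a key on the separator recovers the individual subscripts.
foreach $k (sort keys %a) {
    @subs = split(/\034/, $k);     # \034 is the default value of $;
    print join(',', @subs), " => $a{$k}\n";
}
```

Since everything lives in one flat array, there is no cheap way to ask for "all of row x" short of scanning the keys.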
-
- For example, if @name contains a list of names of arrays, you can
- get at the j-th element of the i-th array like so:
-
- $ary = $name[$i];
- $val = eval("\$$ary" . "[\$j]");
-
- or in one line
-
- $val = eval "\$$name[$i][\$j]";
-
- You could also use the type-globbing syntax to make an array of *name
- values, which will be more efficient than eval. Here @name holds
- a list of pointers, which we'll have to dereference through a temporary
- variable.
-
- For example:
-
- { local(*ary) = $name[$i]; $val = $ary[$j]; }
-
- In fact, you can use this method to make arbitrarily nested data
- structures. You really have to want to do this kind of thing
- badly to go this far, however, as it is notationally cumbersome.
-
- Let's assume you just simply *have* to have an array of arrays of
- arrays. What you do is make an array of pointers to arrays of
- pointers, where pointers are *name values described above. You
- initialize the outermost array normally, and then you build up your
- pointers from there. For example:
-
- @w = ( 'ww' .. 'xx' );
- @x = ( 'xx' .. 'yy' );
- @y = ( 'yy' .. 'zz' );
- @z = ( 'zz' .. 'zzz' );
-
- @ww = reverse @w;
- @xx = reverse @x;
- @yy = reverse @y;
- @zz = reverse @z;
-
- Now make a couple of arrays of pointers to these:
-
- @A = ( *w, *x, *y, *z );
- @B = ( *ww, *xx, *yy, *zz );
-
- And finally make an array of pointers to these arrays:
-
- @AAA = ( *A, *B );
-
- To access an element, such as AAA[i][j][k], you must do this:
-
- local(*foo) = $AAA[$i];
- local(*bar) = $foo[$j];
- $answer = $bar[$k];
-
- Similar manipulations on associative arrays are also feasible.
-
- You could take a look at the recurse.pl package posted by Felix Lee
- <flee@cs.psu.edu>, which lets you simulate vectors and tables (lists and
- associative arrays) by using type glob references and some pretty serious
- wizardry.
-
- In C, you're used to creating recursive datatypes for operations
- like recursive descent parsing or tree traversal. In Perl, these
- algorithms are best implemented using associative arrays. Take an
- array called %parent, and build up pointers such that $parent{$person}
- is the name of that person's parent. Make sure you remember that
- $parent{'adam'} is 'adam'. :-) With a little care, this approach can
- be used to implement general graph traversal algorithms as well.
-
-
- 2.10) How can I quote a variable to use in a regexp?
-
- From the manual:
-
- $pattern =~ s/(\W)/\\$1/g;
-
- Now you can freely use /$pattern/ without fear of any unexpected
- meta-characters in it throwing off the search. If you don't know
- whether a pattern is valid or not, enclose it in an eval to avoid
- a fatal run-time error.
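
Both halves of that advice can be sketched together. The strings below are made up; the second pattern is deliberately malformed to show eval catching the error:

```perl
$string  = 'cost: $5.00 (approx)';
$pattern = '$5.00 (approx)';            # full of metacharacters

($quoted = $pattern) =~ s/(\W)/\\$1/g;  # backslash every non-word character
$literal_hit = ($string =~ /$quoted/) ? 1 : 0;

# A pattern of unknown validity: use it inside eval so that a
# syntax error sets $@ instead of killing the program.
$suspect = '(unbalanced';
$ok = eval { 'test' =~ /$suspect/; 1 };

print "literal match: $literal_hit\n";
print "bad pattern caught: $@" unless $ok;
```

Note that quoting and validating are separate concerns: a quoted pattern is always valid, so the eval is only needed when you want the user's metacharacters to keep their meaning.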
-
-
- 2.11) Why do setuid Perl scripts complain about kernel problems?
-
- This message:
-
- YOU HAVEN'T DISABLED SET-ID SCRIPTS IN THE KERNEL YET!
- FIX YOUR KERNEL, PUT A C WRAPPER AROUND THIS SCRIPT, OR USE -u AND UNDUMP!
-
- is triggered because setuid scripts are inherently insecure due to a
- kernel bug. If your system has fixed this bug, you can compile Perl
- so that it knows this. Otherwise, create a setuid C program that just
- execs Perl with the full name of the script.
-
-
- 2.12) How do I open a pipe both to and from a command?
-
- In general, this is a dangerous move because you can find yourself in a
- deadlock situation. It's better to put one end of the pipe to a file.
- For example:
-
- # first write some_cmd's input into a_file, then
- open(CMD, "some_cmd its_args < a_file |");
- while (<CMD>) { }   # handle each line of output in the loop body
-
- # or else the other way; run the cmd
- open(CMD, "| some_cmd its_args > a_file");
- while ($condition) {
- print CMD "some output\n";
- # other code deleted
- }
- close(CMD) || warn "cmd exited $?";
-
- # now read the file
- open(FILE,"a_file");
- while (<FILE>) { }   # handle each line of the file in the loop body
-
- If you have ptys, you could arrange to run the command on a pty and
- avoid the deadlock problem. See the chat2.pl package in the
- distributed library for ways to do this.
-
- At the risk of deadlock, it is theoretically possible to use a
- fork, two pipe calls, and an exec to manually set up the two-way
- pipe. (BSD systems may use socketpair() in place of the two pipes,
- but this is not as portable.) The open2 library function distributed
- with the current perl release will do this for you.
-
- It assumes it's going to talk to something like adb, both writing to
- it and reading from it. This is presumably safe because you "know"
- that commands like adb will read a line at a time and output a line at
- a time. Programs like sort that read their entire input stream first,
- however, are quite apt to cause deadlock.
-
-
- 2.13) How can I change the first N letters of a string?
-
- Remember that the substr() function produces an lvalue, that is, it may be
- assigned to. Therefore, to change the first character to an S, you could
- do this:
-
- substr($var,0,1) = 'S';
-
- This assumes that $[ is 0; for a library routine where you can't know $[,
- you should use this instead:
-
- substr($var,$[,1) = 'S';
-
- While it would be slower, you could in this case use a substitute:
-
- $var =~ s/^./S/;
-
- But this won't work if the string is empty or its first character is a
- newline, which "." will never match. So you could use this instead:
-
- $var =~ s/^[^\0]?/S/;
-
- To do things like translation of the first part of a string, use substr,
- as in:
-
- substr($var, $[, 10) =~ tr/a-z/A-Z/;
-
- If you don't know the length of what to translate, something like
- this works:
-
- /^(\S+)/ && substr($_,$[,length($1)) =~ tr/a-z/A-Z/;
-
- For some things it's convenient to use the /e switch of the
- substitute operator:
-
- s/^(\S+)/($tmp = $1) =~ tr#a-z#A-Z#, $tmp/e
-
- although in this case, it runs more slowly than does the previous example.
-
-
- 2.14) How can I manipulate fixed-record-length files?
-
- The most efficient way is using pack and unpack. This is faster than
- using substr. Here is a sample chunk of code to break up and put back
- together again some fixed-format input lines, in this case, from ps.
-
- # sample input line:
- # 15158 p5 T 0:00 perl /mnt/tchrist/scripts/now-what
- $ps_t = 'A6 A4 A7 A5 A*';
- open(PS, "ps|");
- $_ = <PS>; print;
- while (<PS>) {
- ($pid, $tt, $stat, $time, $command) = unpack($ps_t, $_);
- for $var ('pid', 'tt', 'stat', 'time', 'command' ) {
- print "$var: <", eval "\$$var", ">\n";
- }
- print 'line=', pack($ps_t, $pid, $tt, $stat, $time, $command), "\n";
- }
-
-
- 2.15) How can I make a file handle local to a subroutine?
-
- You must use the type-globbing *VAR notation. Here is some code to
- cat an include file, calling itself recursively on nested local
- include files (i.e. those with #include "file", not #include <file>):
-
- sub cat_include {
- local($name) = @_;
- local(*FILE);
- local($_);
-
- warn "<INCLUDING $name>\n";
- if (!open (FILE, $name)) {
- warn "can't open $name: $!\n";
- return;
- }
- while (<FILE>) {
- if (/^#\s*include "([^"]*)"/) {
- &cat_include($1);
- } else {
- print;
- }
- }
- close FILE;
- }
-
-
- 2.16) How can I extract just the unique elements of an array?
-
- There are several possible ways, depending on whether the
- array is ordered and you wish to preserve the ordering.
-
- a) If @in is sorted, and you want @out to be sorted:
-
- $prev = 'nonesuch';
- @out = grep($_ ne $prev && (($prev) = $_), @in);
-
- This is nice in that it doesn't use much extra memory,
- simulating uniq's behavior of removing only adjacent
- duplicates.
-
- b) If you don't know whether @in is sorted:
-
- undef %saw;
- @out = grep(!$saw{$_}++, @in);
-
- c) Like (b), but @in contains only small integers:
-
- @out = grep(!$saw[$_]++, @in);
-
- d) A way to do (b) without any loops or greps:
-
- undef %saw;
- @saw{@in} = ();
- @out = sort keys %saw; # remove sort if undesired
-
- e) Like (d), but @in contains only small positive integers:
-
- undef @ary;
- @ary[@in] = @in;
- @out = sort @ary;
-
-
- 2.17) How can I call alarm() or usleep() from Perl?
-
- alarm() is available as a built-in as of version 3.038. If you want finer
- granularity than 1 second (as usleep() provides) and have itimers and
- syscall() on your system, you can use the following. You could also
- use select().
-
- It takes a floating-point number representing how long to delay until
- you get the SIGALRM, and returns a floating-point number representing
- how much time was left in the old timer, if any. Note that the C
- function uses integers, but this one doesn't mind fractional numbers.
-
- # alarm; send me a SIGALRM in this many seconds (fractions ok)
- # tom christiansen <tchrist@convex.com>
- sub alarm {
- require 'syscall.ph';
- require 'sys/time.ph';
-
- local($ticks) = @_;
- local($in_timer,$out_timer);
- local($isecs, $iusecs, $secs, $usecs);
-
- local($itimer_t) = 'L4'; # should be &itimer'typedef()
-
- $secs = int($ticks);
- $usecs = ($ticks - $secs) * 1e6;
-
- $out_timer = pack($itimer_t,0,0,0,0);
- $in_timer = pack($itimer_t,0,0,$secs,$usecs);
-
- syscall(&SYS_setitimer, &ITIMER_REAL, $in_timer, $out_timer)
- && die "alarm: setitimer syscall failed: $!";
-
- ($isecs, $iusecs, $secs, $usecs) = unpack($itimer_t,$out_timer);
- return $secs + ($usecs/1e6);
- }
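
If all you need is a short delay rather than a signal, the select() alternative mentioned above is much shorter. A minimal sketch, assuming your system's select() accepts a fractional timeout; the name &nap is made up for this example:

```perl
# Sleep for a (possibly fractional) number of seconds using the
# four-argument select() with no file descriptors to watch.
sub nap {
    local($secs) = @_;
    select(undef, undef, undef, $secs);
}

$before = time;
&nap(0.25);
$elapsed = time - $before;      # whole seconds only, so usually 0 or 1
print "napped; whole seconds elapsed: $elapsed\n";
```

Unlike the itimer version, this blocks the caller and delivers no SIGALRM.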
-
-
- 2.18) How can I test whether an array contains a certain element?
-
- There are several ways to approach this. If you are going to make
- this query many times and the values are arbitrary strings, the
- fastest way is probably to invert the original array and keep an
- associative array lying about whose keys are the first array's values.
-
- @blues = ('turquoise', 'teal', 'lapis lazuli');
- undef %is_blue;
- for (@blues) { $is_blue{$_} = 1; }
-
- Now you can check whether $is_blue{$some_color}. It might have been
- a good idea to keep the blues all in an assoc array in the first place.
-
- If the values are all small integers, you could use a simple
- indexed array. This kind of an array will take up less space:
-
- @primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31);
- undef @is_tiny_prime;
- for (@primes) { $is_tiny_prime[$_] = 1; }
-
- Now you check whether $is_tiny_prime[$some_number].
-
- If the values in question are integers instead of strings,
- you can save quite a lot of space by using bit strings instead:
-
- @articles = ( 1..10, 150..2000, 2017 );
- undef $read;
- grep (vec($read,$_,1) = 1, @articles);
-
- Now check whether vec($read,$n,1) is true for some $n.
-
-
- 2.19) How can I do an atexit() or setjmp()/longjmp() in Perl?
-
- Perl's exception-handling mechanism is its eval operator. You
- can use eval as setjmp and die as longjmp. Here's an example
- of Larry's for timed-out input, which in C is often implemented
- using setjmp and longjmp:
-
- $SIG{ALRM} = 'TIMEOUT';
- sub TIMEOUT { die "restart input\n" }
-
- do { eval { &realcode } } while $@ =~ /^restart input/;
-
- sub realcode {
- alarm 15;
- $ans = <STDIN>;
- alarm 0;
- }
-
- Here's an example of Tom's for doing atexit() handling:
-
- sub atexit { push(@_exit_subs, @_) }
-
- sub _cleanup { unlink $tmp }
-
- &atexit('_cleanup');
-
- eval <<'End_Of_Eval'; $here = __LINE__;
- # as much code here as you want
- End_Of_Eval
-
- $oops = $@; # save error message
-
- # now call his stuff
- for (@_exit_subs) { &$_() }
-
- $oops && ($oops =~ s/\(eval\) line (\d+)/$0 .
- " line " . ($1+$here)/e, die $oops);
-
- You can register your own routines via the &atexit function now. You
- might also want to use the &realcode method of Larry's rather than
- embedding all your code in the here-is document. Make sure to leave
- via die rather than exit, or write your own &exit routine and call
- that instead. In general, it's better for nested routines to exit
- via die rather than exit for just this reason.
-
- Eval is also quite useful for testing for system dependent features,
- like symlinks, or using a user-input regexp that might otherwise
- blow up on you.
-
-
- 2.20) Why doesn't Perl interpret my octal data octally?
-
- Perl only understands octal and hex numbers as such when they occur
- as constants in your program. If they are read in from somewhere
- and assigned, then no automatic conversion takes place. You must
- explicitly use oct() or hex() if you want this kind of thing to happen.
- Actually, oct() knows to interpret both hex and octal numbers, while
- hex only converts hexadecimal ones. For example:
-
- {
- print "What mode would you like? ";
- $mode = <STDIN>;
- $mode = oct($mode);
- unless ($mode) {
- print "You can't really want mode 0!\n";
- redo;
- }
- chmod $mode, $file;
- }
-
- Without the octal conversion, a requested mode of 755 would turn
- into 01363, yielding bizarre file permissions of --wxrw--wt.
-
- If you want something that handles decimal, octal and hex input,
- you could follow the suggestion in the man page and use:
-
- $val = oct($val) if $val =~ /^0/;
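
A quick sketch of how the two approaches treat the same input strings. The sample inputs are arbitrary:

```perl
foreach $input ('755', '0755', '0x1ed', '42') {
    $always = oct($input);                  # treats every input as octal or hex
    $auto = $input;
    $auto = oct($auto) if $auto =~ /^0/;    # convert only with a leading 0
    printf "%-6s always=%-4d auto=%d\n", $input, $always, $auto;
    $seen_always{$input} = $always;
    $seen_auto{$input} = $auto;
}
```

With the leading-zero test, a plain 755 stays decimal, so use bare oct() when the input is known to be a file mode.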
-
- 2.21) How do I sort an associative array by value instead of by key?
-
- You have to declare a sort subroutine to do this. Let's assume
- you want an ASCII sort on the values of the associative array %ary.
- You could do so this way:
-
- foreach $key (sort by_value keys %ary) {
- print $key, '=', $ary{$key}, "\n";
- }
- sub by_value { $ary{$a} cmp $ary{$b}; }
-
- If you wanted a descending numeric sort, you could do this:
-
- sub by_value { $ary{$b} <=> $ary{$a}; }
-
- You can also inline your sort function, like this:
-
- foreach $key ( sort { $ary{$b} <=> $ary{$a} } keys %ary ) {
- print $key, '=', $ary{$key}, "\n";
- }
-
- If you wanted a function that didn't have the array name hard-wired
- into it, you could do this:
-
- foreach $key (&sort_by_value(*ary)) {
- print $key, '=', $ary{$key}, "\n";
- }
- sub sort_by_value {
- local(*x) = @_;
- sub _by_value { $x{$a} cmp $x{$b}; }
- sort _by_value keys %x;
- }
-
- If you want neither an alphabetic nor a numeric sort, then you'll
- have to code in your own logic instead of relying on the built-in
- signed comparison operators "cmp" and "<=>".
-
- Note that if you're sorting on just a part of the value, such as a
- piece you might extract via split, unpack, pattern-matching, or
- substr, then rather than performing that operation inside your sort
- routine on each call to it, it is significantly more efficient to
- build a parallel array of just those portions you're sorting on, sort
- the indices of this parallel array, and then to subscript your original
- array using the newly sorted indices. This method works on both
- regular and associative arrays, since both @ary[@idx] and @ary{@idx}
- make sense. See page 245 in the Camel Book on "Sorting an Array by a
- Computable Field" for a simple example of this.
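
The parallel-array technique might look something like this. It is only a sketch: the "name:uid" records are invented, and $[ is assumed to be 0:

```perl
# Hypothetical records: "name:uid"; we want them ordered by uid.
@ary = ('root:0', 'alice:211', 'daemon:1', 'bob:30');

# Extract the sort field once per record into a parallel array.
@uid = ();
foreach $rec (@ary) {
    push(@uid, (split(/:/, $rec))[1]);
}

# Sort the indices by comparing the pre-extracted fields,
# then use the sorted indices as a slice subscript.
sub by_uid { $uid[$a] <=> $uid[$b]; }
@idx    = sort by_uid 0 .. $#ary;
@sorted = @ary[@idx];

print join("\n", @sorted), "\n";
```

The split happens once per record instead of twice per comparison, which is where the savings come from.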
-
-
- 2.22) How can I capture STDERR from an external command?
-
- There are three basic ways of running external commands:
-
- system $cmd;
- $output = `$cmd`;
- open (PIPE, "cmd |");
-
- In the first case, both STDOUT and STDERR will go to the same place as
- the script's versions of these, unless redirected. You can always put
- them where you want them and then read them back when the system
- returns. In the second and third cases, you are reading the STDOUT
- *only* of your command. If you would like to have merged STDOUT and
- STDERR, you can use shell file-descriptor redirection to dup STDERR to
- STDOUT:
-
- $output = `$cmd 2>&1`;
- open (PIPE, "cmd 2>&1 |");
-
- Another possibility is to run STDERR into a file and read the file
- later, as in
-
- $output = `$cmd 2>some_file`;
- open (PIPE, "cmd 2>some_file |");
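-
- A small sketch of the run-it-into-a-file approach, assuming a Bourne
- shell is available as sh. The command string and the temporary
- file name /tmp/stderr.$$ are stand-ins invented for the example:

```perl
# Stand-in command that writes one line to each stream.
$cmd     = "sh -c 'echo to-stdout; echo to-stderr 1>&2'";
$errfile = "/tmp/stderr.$$";          # hypothetical scratch file

# Backticks collect STDOUT; the shell sends STDERR to the file.
$out = `$cmd 2>$errfile`;

# Read the saved STDERR back in, then clean up.
open(ERR, $errfile) || die "can't read $errfile: $!";
@err = <ERR>;
close(ERR);
unlink($errfile);

print "stdout: $out";
print "stderr: ", @err;
```

- Unlike the 2>&1 form, this keeps the two streams apart, at the cost
- of losing their relative ordering.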
-
- Here's a way to read from both of them and know which descriptor
- you got each line from. The trick is to pipe only STDERR through
- sed, which then marks each of its lines, and then sends that
- back into a merged STDOUT/STDERR stream, from which your Perl program
- then reads a line at a time:
-
- open (CMD,
- "3>&1 (cmd args 2>&1 1>&3 3>&- | sed 's/^/STDERR:/' 3>&-) 3>&- |");
-
- while (<CMD>) {
- if (s/^STDERR://) {
- print "line from stderr: ", $_;
- } else {
- print "line from stdout: ", $_;
- }
- }
-
- Be apprised that you *must* use Bourne shell redirection syntax
- here, not csh! In fact, you can't even do these things with csh.
- For details on how lucky you are that perl's system() and backtick
- and pipe opens all use Bourne shell, fetch the file from convex.com
- called /pub/csh.whynot -- and you'll be glad that perl's shell
- interface is the Bourne shell.
-
-
- 2.23) Why doesn't open return an error when a pipe open fails?
-
- These statements:
-
- open(TOPIPE, "|bogus_command") || die ...
- open(FROMPIPE, "bogus_command|") || die ...
-
- will not fail just for lack of the bogus_command. They'll only
- fail if the fork to run them fails, which is seldom the problem.
-
- If you're writing to the TOPIPE, you'll get a SIGPIPE if the child
- exits prematurely or doesn't run. If you are reading from the
- FROMPIPE, you need to check the close() to see what happened.
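-
- For the reading case, checking the close() might be sketched like
- this. The sh -c command is only a stand-in chosen so that the exit
- status is known in advance; it is not part of the original question:

```perl
# The quotes force perl to hand the command to the shell.
open(FROMPIPE, "sh -c 'echo partial output; exit 3' |")
    || die "can't fork: $!";

@lines = <FROMPIPE>;

# close() on a pipe waits for the child and sets $?.
$ok     = close(FROMPIPE);
$status = $? >> 8;                    # high byte holds the exit status

print "read ", scalar(@lines), " line(s)\n";
print "command failed with exit status $status\n" unless $ok;
```

- Note that the open itself succeeded here; only the close reveals
- that the command was unhappy.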
-
- If you want an answer sooner than pipe buffering might otherwise
- afford you, you can do something like this:
-
- $kid = open (PIPE, "bogus_command |"); # XXX: check defined($kid)
- (kill 0, $kid) || die "bogus_command failed";
-
- This works fine if bogus_command doesn't have shell metas in it, but
- if it does, the shell may well not have exited before the kill 0. You
- could always introduce a delay:
-
- $kid = open (PIPE, "bogus_command </dev/null |");
- sleep 1;
- (kill 0, $kid) || die "bogus_command failed";
-
- but this is sometimes undesirable, and in any event does not guarantee
- correct behavior. But it seems slightly better than nothing.
-
- Similar tricks can be played with writable pipes if you don't wish to
- catch the SIGPIPE.
-
-
- 2.24) How can I compare two date strings?
-
- If the dates are in an easily parsed, predetermined format, then you
- can break them up into their component parts and call &timelocal from
- the distributed perl library. If the date strings are in arbitrary
- formats, however, it's probably easier to use the getdate program
- from the Cnews distribution, since it accepts a wide variety of dates.
- Note that in either case the return values you will really be
- comparing will be the total time in seconds as returned by time().
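-
- For the easily parsed case, here is a minimal sketch. It assumes
- a perl 5 installation, where timelocal is provided by the
- Time::Local module; the to_epoch helper and the date format it
- accepts are hypothetical:

```perl
use Time::Local;

# Hypothetical helper: "YYYY-MM-DD HH:MM:SS" -> seconds since the epoch.
sub to_epoch {
    my($stamp) = @_;
    my($y, $mo, $d, $h, $mi, $s) =
        ($stamp =~ /^(\d{4})-(\d\d)-(\d\d) (\d\d):(\d\d):(\d\d)$/);
    die "unparseable date: $stamp\n" unless defined $y;
    # timelocal wants a 0-based month.
    return timelocal($s, $mi, $h, $d, $mo - 1, $y);
}

$first  = to_epoch("1992-11-30 13:04:40");
$second = to_epoch("1993-01-04 12:00:00");

# Once both are plain numbers, <=> or < does the comparison.
print "first is ", ($first < $second ? "earlier" : "not earlier"), "\n";
```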
-
- Here's a getdate function for perl that's not very efficient; you
- can do better than this by sending it many dates at once or modifying
- getdate to behave better on a pipe. Beware the hardcoded pathname.
-
- sub getdate {
- local($_) = shift;
-
- s/-(\d{4})$/+$1/ || s/\+(\d{4})$/-$1/;
- # getdate has broken timezone sign reversal!
-
- $_ = `/usr/local/lib/news/newsbin/getdate '$_'`;
- chop;
- $_;
- }
-
- Richard Ohnemus <rick@IMD.Sterling.COM> actually has a getdate.y
- for use with the Perl yacc. You can get this from ftp.sterling.com
- [192.124.9.1] in /local/perl-byacc1.8.1.tar.Z, or send the author
- mail for details.
-
-
- 2.25) What's the fastest way to code up a given task in perl?
-
- Because Perl so lends itself to a variety of different approaches
- for any given task, a common question is which is the fastest way
- to code a given task. Since some approaches can be dramatically
- more efficient than others, it's sometimes worth knowing which is
- best. Unfortunately, the implementation that first comes to mind,
- perhaps as a direct translation from C or the shell, often yields
- suboptimal performance. Not all approaches have the same results
- across different hardware and software platforms. Furthermore,
- legibility must sometimes be sacrificed for speed.
-
- While an experienced perl programmer can sometimes eye-ball the code
- and make an educated guess regarding which way would be fastest,
- surprises can still occur. So, in the spirit of perl programming
- being an empirical science, the best way to find out which of several
- different methods runs the fastest is simply to code them all up and
- time them. For example:
-
- $COUNT = 10_000; $| = 1;
-
- print "method 1: ";
-
- ($u, $s) = times;
- for ($i = 0; $i < $COUNT; $i++) {
- # code for method 1
- }
- ($nu, $ns) = times;
- printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);
-
- print "method 2: ";
-
- ($u, $s) = times;
- for ($i = 0; $i < $COUNT; $i++) {
- # code for method 2
- }
- ($nu, $ns) = times;
- printf "%8.4fu %8.4fs\n", ($nu - $u), ($ns - $s);
-
- For more specific tips, see the section on Efficiency in the
- ``Other Oddments'' chapter at the end of the Camel Book.
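-
- If you find yourself timing things often, the same pattern can be
- wrapped in a subroutine. This is only a sketch: time_it is a
- made-up name, it relies on perl 5 code references, and the two
- string-building methods are just placeholder candidates:

```perl
# Run $code $count times and report user and system CPU seconds.
sub time_it {
    my($label, $count, $code) = @_;
    my($u, $s) = times;
    my $i;
    for ($i = 0; $i < $count; $i++) {
        &$code();
    }
    my($nu, $ns) = times;
    printf "%-8s %8.4fu %8.4fs\n", $label, $nu - $u, $ns - $s;
    return ($nu - $u) + ($ns - $s);   # total CPU seconds consumed
}

@words = ('perl') x 100;

time_it('join',   10_000, sub { $str = join('', @words) });
time_it('concat', 10_000, sub {
    $str = '';
    foreach $w (@words) { $str .= $w }
});
```

- Both candidates pay the same subroutine-call overhead, so the
- comparison stays fair, though the absolute numbers are inflated.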
-
-
- 2.26) How can I know how many entries are in an associative array?
-
- While the number of elements in a @foobar array is simply @foobar when
- used in a scalar context, you can't figure out how many elements are in an
- associative array in an analogous fashion. That's because %foobar in
- a scalar context returns the ratio (as a string) of number of buckets
- filled versus the number allocated. For example, scalar(%ENV) might
- return "20/32". While perl could in theory keep a count, this would
- break down on associative arrays that have been bound to dbm files.
-
- However, while you can't get a count this way, one thing you *can* use
- it for is to determine whether there are any elements whatsoever in
- the array, since "if (%table)" is guaranteed to be false if nothing
- has ever been stored in it.
-
- So you either have to keep your own count around and increment
- it every time you store a new key in the array, or else do it
- on the fly when you really care, perhaps like this:
-
- $count++ while each %ENV;
-
- The preceding method will be faster than extracting the
- keys into a temporary array to count them.
-
- As of a very recent patch, you can say
-
- $count = keys %ENV;
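-
- Here is a small sketch putting the approaches side by side on a
- made-up %table. It assumes a perl recent enough that keys in a
- scalar context returns a count:

```perl
# Hypothetical sample data.
%table = ('red', 1, 'green', 2, 'blue', 3);

# Count on the fly.  The list assignment yields 2 per pair and 0
# at the end, so a key that is itself false cannot stop it early.
$by_each = 0;
$by_each++ while (($k, $v) = each %table);

# Ask keys for the count directly.
$by_keys = keys %table;

print "each: $by_each, keys: $by_keys\n";

# The emptiness test works on any associative array.
%empty = ();
print "table has entries\n" if %table;
print "empty has none\n"    unless %empty;
```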
-
-
-
- 2.27) Why can't my perl program read from STDIN after I gave it ^D (EOF) ?
-
- Because some stdio's set error and eof flags that need clearing.
-
- Try keeping around the seekpointer and go there, like this:
- $where = tell(LOG);
- seek(LOG, $where, 0);
-
- If that doesn't work, try seeking to a different part of the file and
- then back. If that doesn't work, try seeking to a different part of
- the file, reading something, and then seeking back. If that doesn't
- work, give up on your stdio package and use sysread. You can't call
- stdio's clearerr() from Perl, so if you get EINTR from a signal
- handler, you're out of luck. Best to just use sysread() from the
- start for the tty.
-
-
- 2.28) Do I always/never have to quote my strings or use semicolons?
-
- You don't have to quote strings that can't mean anything else
- in the language, like identifiers with any upper-case letters
- in them. Therefore, it's fine to do this:
-
- $SIG{INT} = Timeout_Routine;
- or
-
- @Days = (Sun, Mon, Tue, Wed, Thu, Fri, Sat, Sun);
-
- but you can't get away with this:
-
- $foo{while} = until;
-
- in place of
-
- $foo{'while'} = 'until';
-
- The requirements on semicolons have been increasingly relaxed. You no
- longer need one at the end of a block, but stylistically, you're
- better to use them if you don't put the curly brace on the same line:
-
- for (1..10) { print }
-
- is ok, as is
-
- @nlist = sort { $a <=> $b } @olist;
-
- but you probably shouldn't do this:
-
- for ($i = 0; $i < @a; $i++) {
- print "i is $i\n" # <-- oops!
- }
-
- because you might want to add lines later, and anyway,
- it looks funny. :-)
-
-
- 2.29) How can I translate tildes in a filename?
-
- Perl doesn't expand tildes -- the shell (ok, some shells) do.
- The classic request is to be able to do something like:
-
- open(FILE, "~/dir1/file1");
- open(FILE, "~tchrist/dir1/file1");
-
- which doesn't work. (And you don't know it, because you
- did a system call without an "|| die" clause! :-)
-
- If you *know* you're on a system with the csh, and you *know*
- that Larry hasn't internalized file globbing, then you could
- get away with
-
- $filename = <~tchrist/dir1/file1>;
-
- but that's pretty iffy.
-
- A better way is to do the translation yourself, as in:
-
- $filename =~ s#^~(\w+)(/.*)?$#(getpwnam($1))[7].$2#e;
-
- More robust and efficient versions that checked for error conditions,
- handled simple ~/blah notation, and cached lookups are all reasonable
- enhancements.
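-
- One possible shape for such an enhanced version is sketched below.
- It assumes perl 5 and a system where getpwnam and getpwuid work;
- the names expand_tilde and %Home_cache are invented here, and
- leaving an unknown ~user untouched is a design choice, not a rule:

```perl
%Home_cache = ();    # user name -> home directory, filled in lazily

sub expand_tilde {
    my($path) = @_;

    # "~" or "~/blah": the current user's home directory.
    if ($path =~ m{^~(/.*)?$}) {
        my $rest = defined($1) ? $1 : '';
        my $home = $ENV{'HOME'} || (getpwuid($<))[7];
        die "can't determine home directory\n" unless defined $home;
        return $home . $rest;
    }

    # "~user" or "~user/blah": look the user up, at most once.
    if ($path =~ m{^~(\w+)(/.*)?$}) {
        my($user, $rest) = ($1, defined($2) ? $2 : '');
        $Home_cache{$user} = (getpwnam($user))[7]
            unless exists $Home_cache{$user};
        return $path unless defined $Home_cache{$user};   # no such user
        return $Home_cache{$user} . $rest;
    }

    return $path;    # no leading tilde: nothing to do
}

print expand_tilde("~/dir1/file1"), "\n";
```

- You would then say open(FILE, &expand_tilde("~tchrist/dir1/file1"))
- with the usual "|| die" clause on the end.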
-
-
- 2.30) How can I convert my shell script to Perl?
-
- Larry's standard answer for this is to send your script to me (Tom
- Christiansen) with appropriate supplications and offerings. :-(
- That's because there's no automatic machine translator. Even if there
- were one, you wouldn't gain a lot, as most of the external programs would
- still get called. It's the same problem as blind translation into C:
- you're still apt to be bogged down by exec()s. You have to analyze
- the dataflow and algorithm and rethink it for optimal speedup. It's
- not uncommon to see one, two, or even three orders of magnitude of
- speed difference between the brute-force and the recoded approaches.
-
-
- 2.31) What is variable suicide and how can I prevent it?
-
- Variable suicide is a nasty side effect of dynamic scoping and
- the way variables are passed by reference. If you say
-
- $x = 17;
- &munge($x);
- sub munge {
- local($x);
- local($myvar) = $_[0];
- ...
- }
-
- Then you have just clobbered $_[0]! Why this is occurring
- is pretty heavy wizardry: the reference to $x stored in
- $_[0] was temporarily occluded by the previous local($x)
- statement (which, you'll recall, occurs at run-time, not
- compile-time). The workaround is simple, however: declare
- your formal parameters first:
-
- sub munge {
- local($myvar) = $_[0];
- local($x);
- ...
- }
-
- That doesn't help you if you're going to be trying to access
- @_ directly after the local()s. In this case, careful use
- of the package facility is your only recourse.
-
- Another manifestation of this problem occurs due to the
- magical nature of the index variable in a foreach() loop.
-
- @num = 0 .. 4;
- print "num begin @num\n";
- foreach $m (@num) { &ug }
- print "num finish @num\n";
- sub ug {
- local($m) = 42;
- print "m=$m $num[0],$num[1],$num[2],$num[3]\n";
- }
-
- Which prints out the mysterious:
-
- num begin 0 1 2 3 4
- m=42 42,1,2,3
- m=42 0,42,2,3
- m=42 0,1,42,3
- m=42 0,1,2,42
- m=42 0,1,2,3
- num finish 0 1 2 3 4
-
- What's happening here is that $m is an alias for each
- element of @num. Inside &ug, you temporarily change
- $m. Well, that means that you've also temporarily
- changed whatever $m is an alias to!! The only workaround
- is to be careful with global variables, using packages,
- and/or just be aware of this potential in foreach() loops.
-
-
- 2.32) Can I use Perl regular expressions to match balanced text?
-
- No, or at least, not by themselves.
-
- Regexps just aren't powerful enough. Although Perl's patterns aren't
- strictly regular because they support backreferences (the \1 notation), you
- still can't do it. You need to employ auxiliary logic. A simple
- approach would involve keeping a bit of state around, something
- vaguely like this (although we don't handle patterns on the same line):
-
- while(<>) {
- if (/pat1/) {
- if ($inpat++ > 0) { warn "already saw pat1" }
- next;
- }
- if (/pat2/) {
- if (--$inpat < 0) { warn "never saw pat1" }
- next;
- }
- }
-
- A rather more elaborate subroutine to pull out balanced and possibly
- nested single chars, like ` and ', { and }, or ( and ) can be found
- on convex.com in /pub/perl/scripts/pull_quotes.
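-
- To give the flavor of the auxiliary logic involved, here is a much
- simpler sketch (this is not the pull_quotes script; first_balanced
- is a made-up name, and perl 5 is assumed). It keeps a depth counter
- while walking the string one character at a time:

```perl
# Return the first balanced $open...$close group in $text,
# delimiters included, or undef if none is ever completed.
# $open and $close must be two different single characters.
sub first_balanced {
    my($text, $open, $close) = @_;
    my($depth, $start) = (0, -1);
    my $i;
    for ($i = 0; $i < length($text); $i++) {
        my $c = substr($text, $i, 1);
        if ($c eq $open) {
            $start = $i if $depth++ == 0;     # remember outermost opener
        } elsif ($c eq $close) {
            next if $depth == 0;              # stray closer: ignore it
            return substr($text, $start, $i - $start + 1)
                if --$depth == 0;             # back at top level: done
        }
    }
    return undef;
}

print first_balanced("f(a(b)c) + g(d)", "(", ")"), "\n";
```

- The counter is exactly the bit of state that a pattern by itself
- has no way to keep.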
-
-
- 2.33) Can I use Perl to run a telnet or ftp session?
-
- Sure, you can connect directly to them using sockets, or you can run a
- session on a pty. In either case, Randal's chat2 package, which is
- distributed with the perl source, will come in handy. It addresses
- much the same problem space as Don Libes's expect package does. Two
- examples of managing an ftp session using chat2 can be found on
- convex.com in /pub/perl/scripts/ftp-chat2.shar .
-
- Caveat lector: chat2 is documented only by example, may not run on
- System V systems, and is subtly machine dependent both in its ideas
- of networking and in pseudottys.
-
-
- 2.34) What does "Malformed command links" mean?
-
- This is a bug in 4.035. While in general it's merely a cosmetic
- problem, it often comanifests with a highly undesirable coredumping
- problem. Programs known to be affected by the fatal coredump include
- plum and pcops. Since perl5 is pretty much a total rewrite, we can
- count on it being fixed then, but if anyone tracks down the coredump
- problem before then, a significant portion of the Perl world would
- rejoice.
-
-
- 2.35) How can I set up a footer format to be used with write()?
-
- While the $^ variable contains the name of the current header format,
- there is no corresponding mechanism to automatically do the same thing
- for a footer. Not knowing how big a format is going to be until you
- evaluate it is one of the major problems.
-
- If you have a fixed-size footer, you can get footers by checking for
- lines left on page ($-) before each write, and printing the footer
- yourself if necessary.
-
- Another strategy is to open a pipe to yourself, using open(KID, "|-")
- and always write()ing to the KID, who then postprocesses its STDIN to
- rearrange headers and footers however you like. Not very convenient,
- but doable.
-
-
- 2.36) Why does my Perl program keep growing in size?
-
- While there may be a real memory leak in the Perl source code or even
- whichever malloc() you're using, common causes are incomplete eval()s
- or local()s in loops.
-
- An eval() which terminates in error due to a failed parsing
- will leave a bit of memory unusable.
-
- A local() inside a loop:
-
- for (1..100) {
- local(@array);
- }
-
- will build up 100 versions of @array before the loop is done.
- The work-around is:
-
- local(@array);
- for (1..100) {
- undef @array;
- }
-
- Larry reports that this behavior is fixed for perl5.
-
- --
- Tom Christiansen tchrist@convex.com convex!tchrist
-
- Miksch's Law:
- If a string has one end, then it has another end.
-
- It's all magic. :-) --Larry Wall in <7282@jpl-devvax.JPL.NASA.GOV>
-