CIS 2.55 - Notes 0009

November 25th, 2024

Notes 0009

Directory Methods

In Perl, we can manipulate directories in pretty much the same fashion as in the UNIX shell, using just about the same commands.

For example, the mkdir function creates a directory.

Here are some directory manipulation functions (directly from the perlfunc man pages):

Creating Directories

mkdir FILENAME,MASK
mkdir FILENAME

Creates the directory specified by FILENAME, with permissions specified by MASK (as modified by umask). If it succeeds it returns true, otherwise it returns false and sets $! (errno). If omitted, MASK defaults to 0777.

In general, it is better to create directories with permissive MASK, and let the user modify that with their umask, than it is to supply a restrictive MASK and give the user no way to be more permissive. The exceptions to this rule are when the file or directory should be kept private (mail files, for instance). The perlfunc(1) entry on umask discusses the choice of MASK in more detail.
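As a small illustration of the return value and $!, here is a sketch that creates a directory inside a temporary sandbox. The directory name "reports" and the use of File::Temp for the sandbox are assumptions made for the example, not part of the perlfunc text.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work inside a throwaway directory so the example leaves nothing behind.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/reports";    # hypothetical directory name

# Ask for permissive 0777; the process umask then trims the final mode.
mkdir($dir, 0777) or die "can't mkdir $dir: $!";
print "created $dir\n" if -d $dir;

# A second mkdir on the same path returns false and sets $!.
my $ok_again = mkdir($dir, 0777);
print "second mkdir failed: $!\n" unless $ok_again;
```

Checking the return value every time matters because mkdir does not die on its own when it fails.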

Removing Directories

rmdir FILENAME
rmdir

Deletes the directory specified by FILENAME if that directory is empty. If it succeeds it returns true, otherwise it returns false and sets $! (errno). If FILENAME is omitted, uses $_.

Be careful about using unlink to remove directories. Here are the unlink specs:

unlink LIST
unlink

Deletes a list of files. Returns the number of files successfully deleted.

$cnt = unlink 'a', 'b', 'c';

unlink @goners;

unlink <*.bak>;

Note: unlink will not delete directories unless you are superuser and the -U flag is supplied to Perl. Even if these conditions are met, be warned that unlinking a directory can inflict damage on your filesystem. Use rmdir instead.

If LIST is omitted, uses $_.
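To see how rmdir and unlink divide the work, the following sketch tries to remove a directory that still contains a file, then empties it and tries again. The names "scratch" and "note.txt" are made up for the example, and the sandbox comes from File::Temp.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/scratch";     # hypothetical directory name
my $file = "$dir/note.txt";     # hypothetical file name

mkdir($dir) or die "can't mkdir $dir: $!";
open(my $fh, '>', $file) or die "can't create $file: $!";
close $fh;

# rmdir refuses because the directory still holds a file.
my $removed_full = rmdir($dir);
print "rmdir failed as expected: $!\n" unless $removed_full;

# unlink handles the file; after that the directory is empty and rmdir succeeds.
unlink($file) or die "can't unlink $file: $!";
my $removed_empty = rmdir($dir);
print "removed $dir\n" if $removed_empty;
```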

Changing Directories (current directory)

chdir EXPR

Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to the directory specified by $ENV{HOME}, if set; if not, changes to the directory specified by $ENV{LOGDIR}. If neither is set, chdir does nothing. It returns true upon success, false otherwise. See the example under die.
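A minimal sketch of chdir in use, assuming the standard Cwd module to report the working directory and a File::Temp sandbox as the target (both are choices made for this example):

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start  = getcwd();                 # remember where we began
my $target = tempdir(CLEANUP => 1);    # sandbox directory to move into

chdir($target) or die "can't chdir to $target: $!";
print "now in ", getcwd(), "\n";

# Relative names now resolve against the new working directory.
mkdir('sub') or die "can't mkdir sub: $!";
my $made_here = -d "$target/sub";

# Go back so the rest of the program is unaffected.
chdir($start) or die "can't chdir back to $start: $!";
```

Returning to the starting directory is a habit worth keeping, since the working directory is shared by the whole process.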

Reading Directories

Before you can read a directory, you need to open it:

opendir DIRHANDLE,EXPR

Opens a directory named EXPR for processing by readdir, telldir, seekdir, rewinddir, and closedir. Returns true if successful. DIRHANDLEs have their own namespace separate from FILEHANDLEs.

Once you open it, you can read the DIRHANDLE to get contents of the directory:

readdir DIRHANDLE

Returns the next directory entry for a directory opened by opendir. If used in list context, returns all the rest of the entries in the directory. If there are no more entries, returns an undefined value in scalar context or a null list in list context.

If you're planning to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir there, the test would be applied to the wrong file.

opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!";
@dots = grep { /^\./ && -f "$some_dir/$_" } readdir(DIR);
closedir DIR;

Once you are done reading, close the directory with closedir.

closedir DIRHANDLE

Closes a directory opened by opendir and returns the success of that system call.

DIRHANDLE may be an expression whose value can be used as an indirect dirhandle, usually the real dirhandle name.
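Putting opendir, readdir and closedir together, here is a sketch that reads a directory one entry at a time in scalar context. It uses a lexical directory handle (my $dh) instead of the bareword DIR, which is a stylistic choice for this example; the file names are invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a small sandbox: two files and one subdirectory (names are made up).
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt)) {
    open(my $fh, '>', "$dir/$name") or die "can't create $dir/$name: $!";
    close $fh;
}
mkdir("$dir/sub") or die "can't mkdir $dir/sub: $!";

opendir(my $dh, $dir) or die "can't opendir $dir: $!";
my (@files, @dirs);
# Scalar context returns one entry per call; defined() keeps a file named "0" from ending the loop.
while (defined(my $entry = readdir $dh)) {
    next if $entry eq '.' or $entry eq '..';
    my $path = "$dir/$entry";    # prepend the directory before any file test
    if (-d $path) { push @dirs, $entry } else { push @files, $entry }
}
closedir $dh;

print "files: @{[ sort @files ]}\n";
print "dirs:  @{[ sort @dirs ]}\n";
```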

Alternative Method

There is also the filename globbing operator. The general format of it is:

@files = <*.xml>;

This stores the names of all XML files in the current directory in the @files array.

You should, however, avoid it, because it is easily confused with reading from file handles. For example, <$a> reads a line from the file handle referenced by $a, and <a> reads a line from the file handle named a, while <a*> looks for files whose names start with "a" in the current directory.

A clearer way to write it is with the glob function:

@files = glob("*.xml");

You can also use that in loops:

while(glob "*.xml"){
    print "xml file named: $_\n";
}

One caveat about glob: in old versions of Perl, where glob was carried out by an external shell, a filename containing whitespace such as a "\n" (newline) character could come back as two different names. Since Perl 5.6, glob is implemented by the File::Glob module, and the more common surprise is that the pattern itself is split on whitespace, so glob("*.c *.h") means two patterns. Such names are rare, but they do happen, and you're a lot better off using the opendir and readdir combination than glob (then again, if you need a quick fix without too much typing, glob is fine - There's More Than One Way To Do It).
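For comparison, this sketch lists the same XML files twice, once with glob and once with opendir and readdir. The file names and the File::Temp sandbox are assumptions for the example. Note one difference in behaviour: glob("*.xml") skips names that begin with a dot, while the regular expression used with readdir does not.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Sandbox with two XML files and one unrelated file (names are made up).
my $dir = tempdir(CLEANUP => 1);
for my $name ('one.xml', 'two.xml', 'notes.txt') {
    open(my $fh, '>', "$dir/$name") or die "can't create $dir/$name: $!";
    close $fh;
}

# glob matches relative to the current directory, so step into the sandbox first.
my $start = getcwd();
chdir($dir) or die "can't chdir to $dir: $!";
my @by_glob = sort glob("*.xml");
chdir($start) or die "can't chdir back to $start: $!";

# readdir needs no chdir; filter the entries with a regular expression instead.
opendir(my $dh, $dir) or die "can't opendir $dir: $!";
my @by_readdir = sort grep { /\.xml$/ } readdir $dh;
closedir $dh;

print "glob:    @by_glob\n";
print "readdir: @by_readdir\n";
```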

Recursion - Traversing Directories

You can traverse the directory structure fairly easily using a recursive subroutine:

sub trav {
   my $dir = shift;
   opendir DIR,$dir or die $!;
   my @entries = map { "$dir/$_" } grep { !/^\.$|^\.\.$/ } readdir DIR;
   closedir DIR;
   for (@entries) {
      print "file or directory: $_\n";
      trav($_) if -d;
   }
}

To traverse the current directory, you'd call it with:

trav(".");

(or with any other directory - the one you want to traverse). Note that you can have other things (subroutine calls) besides the print inside of that trav(). Actually a good exercise would be to modify this subroutine to also accept a reference to a subroutine as a parameter and call that subroutine on every directory. This would make the trav method a lot more reusable.
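One possible take on that exercise is sketched below. The name trav_with and the decision to call the subroutine on every entry (files as well as directories) are assumptions of this sketch, as is skipping symbolic links to avoid looping forever on a link that points back up the tree.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# trav_with is a hypothetical name; $visit is a code reference called once per entry.
sub trav_with {
    my ($dir, $visit) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @entries = map { "$dir/$_" } grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $path (@entries) {
        $visit->($path);
        # Recurse into real directories only, not into symlinks to directories.
        trav_with($path, $visit) if -d $path && !-l $path;
    }
}

# Sandbox: a/ contains b/, which contains f.txt (names are made up).
my $root = tempdir(CLEANUP => 1);
mkdir("$root/a")   or die "can't mkdir: $!";
mkdir("$root/a/b") or die "can't mkdir: $!";
open(my $fh, '>', "$root/a/b/f.txt") or die "can't create file: $!";
close $fh;

# Collect every visited path instead of printing inside the traversal.
my @seen;
trav_with($root, sub { push @seen, $_[0] });
print "$_\n" for @seen;
```

Because the caller supplies the action, the same traversal can print, count, or collect paths without being rewritten.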

Non-Recursive?

As you know (or should know), anything you can do recursively you can also do without recursion by keeping your own list of pending work (classically a stack). Here's the non-recursive implementation:

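sub trav {
   my @pending = @_;              # directories and files still to visit
   while (@pending) {
      my $f = shift @pending;     # take from the front of the list
      print "visit: $f\n";
      if(-d $f){
         opendir DIR,$f or die $!;
         push @pending,map { "$f/$_" } grep { !/^\.$|^\.\.$/ } readdir DIR;
         closedir DIR;            # directory handles are closed with closedir, not close
      }
   }
}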

You call this trav the same way as above. The order of the traversal is different, but otherwise the result is the same. The order differs because we are in effect using a "queue" rather than a "stack", which makes the traversal breadth-first.
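To make the queue versus stack distinction concrete, here is a sketch in which one flag decides whether pending entries are taken from the front (queue, breadth-first) or from the back (stack, depth-first). The helper name trav_order, the mode strings, and the directory names are all hypothetical; entries are sorted only so the order is predictable.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# trav_order is a hypothetical helper: 'queue' takes from the front, 'stack' from the back.
sub trav_order {
    my ($mode, @pending) = @_;
    my @visited;
    while (@pending) {
        my $f = $mode eq 'stack' ? pop(@pending) : shift(@pending);
        push @visited, $f;
        next unless -d $f && !-l $f;    # only real directories have children to add
        opendir(my $dh, $f) or die "can't opendir $f: $!";
        my @kids = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
        push @pending, map { "$f/$_" } sort @kids;
    }
    return @visited;
}

# Sandbox: a/ and b/ at the top, with deep/ inside b/ (names are made up).
my $root = tempdir(CLEANUP => 1);
mkdir("$root/a")      or die "can't mkdir: $!";
mkdir("$root/b")      or die "can't mkdir: $!";
mkdir("$root/b/deep") or die "can't mkdir: $!";

my @bfs = trav_order('queue', $root);
my @dfs = trav_order('stack', $root);
print "breadth-first:\n", map { "  $_\n" } @bfs;
print "depth-first:\n",   map { "  $_\n" } @dfs;
```

With the queue, both top-level directories are visited before b/deep; with the stack, b/deep is visited right after b and before a.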

© 2006, Particle