November 25th, 2024    

CIS 2.55

Notes 0009

Directory Methods

In Perl, we can manipulate directories in pretty much the same fashion as in the UNIX shell, using just about the same commands.

For example, a directory is created with the mkdir function.

Here are some directory manipulation functions (directly from the perlfunc man pages):

Creating Directories

mkdir FILENAME,MASK
mkdir FILENAME

Creates the directory specified by FILENAME, with permissions specified by MASK (as modified by umask). If it succeeds it returns true, otherwise it returns false and sets $! (errno). If omitted, MASK defaults to 0777.

In general, it is better to create directories with permissive MASK, and let the user modify that with their umask, than it is to supply a restrictive MASK and give the user no way to be more permissive. The exceptions to this rule are when the file or directory should be kept private (mail files, for instance). The perlfunc(1) entry on umask discusses the choice of MASK in more detail.
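As a minimal sketch of the advice above, the following creates one directory with a permissive MASK and one private directory with a restrictive MASK. The directory names and the scratch location (made with the core File::Temp module) are assumptions chosen only for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work inside a scratch directory that is removed when the script exits.
my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/reports";    # hypothetical directory name

# Permissive MASK; the user's umask decides the final permissions.
mkdir($dir, 0777) or die "can't mkdir $dir: $!";

# A second mkdir on the same path returns false and sets $!.
my $again = mkdir($dir, 0777);
print "second mkdir failed: $!\n" unless $again;

# A private directory: the restrictive MASK is chosen on purpose.
my $private = "$base/mail";    # hypothetical directory name
mkdir($private, 0700) or die "can't mkdir $private: $!";
```

Checking the return value every time is the important habit here; the MASK values themselves are a matter of policy.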

Removing Directories

rmdir FILENAME
rmdir

Deletes the directory specified by FILENAME if that directory is empty. If it succeeds it returns true, otherwise it returns false and sets $! (errno). If FILENAME is omitted, uses $_.
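The "if that directory is empty" condition is easy to overlook, so here is a small sketch of it. The file and directory names are hypothetical, and the scratch directory comes from the core File::Temp module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);
my $dir  = "$base/scratch";          # hypothetical directory name
mkdir($dir) or die "can't mkdir $dir: $!";

# Put one file in the directory so that it is not empty.
open(my $fh, '>', "$dir/note.txt") or die "can't create file: $!";
close $fh;

# rmdir refuses to remove a non-empty directory and sets $!.
my $first = rmdir $dir;
print "rmdir failed: $!\n" unless $first;

# Empty the directory first, then remove it.
unlink("$dir/note.txt") or die "can't unlink: $!";
my $second = rmdir $dir;
print "rmdir succeeded\n" if $second;
```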

Be wary of using unlink to remove directories. Here are the unlink specs:

unlink LIST
unlink

Deletes a list of files. Returns the number of files successfully deleted.

$cnt = unlink 'a', 'b', 'c';

unlink @goners;

unlink <*.bak>;

Note: unlink will not delete directories unless you are superuser and the -U flag is supplied to Perl. Even if these conditions are met, be warned that unlinking a directory can inflict damage on your filesystem. Use rmdir instead.

If LIST is omitted, uses $_.
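Because unlink returns a count rather than a simple true or false, it helps to compare that count with the number of files you asked for. A small sketch, with hypothetical file names created in a scratch directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base = tempdir(CLEANUP => 1);

# Create two of the three files; "c.bak" deliberately does not exist.
my @goners = map { "$base/$_.bak" } qw(a b c);
for my $file (@goners[0, 1]) {
    open(my $fh, '>', $file) or die "can't create $file: $!";
    close $fh;
}

# unlink returns how many files it actually deleted.
my $cnt = unlink @goners;
print "deleted $cnt of ", scalar(@goners), " files\n";
```

If you need to know which file failed, one option is to unlink the files one at a time and check each return value.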

Changing Directories (current directory)

chdir EXPR

Changes the working directory to EXPR, if possible. If EXPR is omitted, changes to the directory specified by $ENV{HOME}, if set; if not, changes to the directory specified by $ENV{LOGDIR}. If neither is set, chdir does nothing. It returns true upon success, false otherwise. See the example under die.
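A short sketch of checking chdir and returning to the starting directory afterwards. It assumes the core Cwd module for getcwd, which is not part of the perlfunc entry above, and uses a scratch directory as the target.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start  = getcwd();
my $target = tempdir(CLEANUP => 1);

# chdir returns false on failure, so check it like any other system call.
chdir($target) or die "can't chdir to $target: $!";
my $now = getcwd();
print "now in $now\n";

# Go back to where we started so later relative paths still work.
chdir($start) or die "can't chdir back to $start: $!";
my $back = getcwd();
```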

Reading Directories

Before you can read a directory, you need to open it:

opendir DIRHANDLE,EXPR

Opens a directory named EXPR for processing by readdir, telldir, seekdir, rewinddir, and closedir. Returns true if successful. DIRHANDLEs have their own namespace separate from FILEHANDLEs.

Once you open it, you can read the DIRHANDLE to get contents of the directory:

readdir DIRHANDLE

Returns the next directory entry for a directory opened by opendir. If used in list context, returns all the rest of the entries in the directory. If there are no more entries, returns an undefined value in scalar context or a null list in list context.

If you're planning to filetest the return values out of a readdir, you'd better prepend the directory in question. Otherwise, because we didn't chdir there, it would have been testing the wrong file.

opendir(DIR, $some_dir) || die "can't opendir $some_dir: $!";
@dots = grep { /^\./ && -f "$some_dir/$_" } readdir(DIR);
closedir DIR;

Once you are done reading, close the directory with closedir.

closedir DIRHANDLE

Closes a directory opened by opendir and returns the success of that system call.

DIRHANDLE may be an expression whose value can be used as an indirect dirhandle, usually the real dirhandle name.
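The perlfunc example above reads the whole directory in list context. As a complementary sketch, this one reads one entry at a time in scalar context. It uses a lexical variable ($dh) as the dirhandle instead of the bareword DIR, which is a substitution on my part, and the file names are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $some_dir = tempdir(CLEANUP => 1);
for my $name (qw(a.txt b.txt)) {          # hypothetical file names
    open(my $fh, '>', "$some_dir/$name") or die "can't create $name: $!";
    close $fh;
}
mkdir("$some_dir/sub") or die "can't mkdir: $!";

opendir(my $dh, $some_dir) or die "can't opendir $some_dir: $!";
my (@files, @dirs);
# Scalar context: one entry per call, undef once the directory is exhausted.
while (defined(my $entry = readdir $dh)) {
    next if $entry eq '.' or $entry eq '..';
    my $path = "$some_dir/$entry";        # prepend the directory before file tests
    if (-d $path) { push @dirs, $entry } else { push @files, $entry }
}
closedir $dh;

print "files: @{[sort @files]}\n";
print "dirs:  @dirs\n";
```

The explicit defined() test matters: a file named "0" is a false value, and without defined() the loop would stop early.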

Alternative Method

There is also the filename globbing operator. The general format of it is:

@files = <*.xml>;

This stores the names of all xml files in the current directory in the @files array.

You should, however, avoid it, because it is easily confused with reading from a file handle. For example, <$a> reads a line from the file handle referenced by $a, and <a> reads a line from the file handle named a, while <a.txt> or <$a[0]> is treated as a glob pattern.

A better way to express the glob is with the glob function:

@files = glob("*.xml");

You can also use that in loops:

while(glob "*.xml"){
    print "xml file named: $_\n";
}

One caveat with globs: in older versions of Perl (before the built-in File::Glob implementation arrived in Perl 5.6), glob was implemented through the shell, and a filename containing a "\n" (newline) character came back as two different names because the output was split on newlines. Such filenames are extremely rare, but they do happen, and you are a lot better off using the opendir and readdir combination than the glob. Then again, if you need a quick fix without too much typing, glob is fine: There's More Than One Way To Do It.
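To make the comparison concrete, here is a sketch that lists the same xml files twice, once with glob and once with opendir and readdir plus a regular expression. The file names are hypothetical, and the script changes into a scratch directory so that the pattern is relative, as in the examples above.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);
chdir($dir) or die "can't chdir to $dir: $!";

for my $name (qw(one.xml two.xml notes.txt)) {   # hypothetical file names
    open(my $fh, '>', $name) or die "can't create $name: $!";
    close $fh;
}

# glob in list context: all matches at once, relative to the current directory.
my @files = sort { $a cmp $b } glob("*.xml");
print "xml file named: $_\n" for @files;

# opendir/readdir with a regular expression finds the same names here.
opendir(my $dh, '.') or die "can't opendir: $!";
my @same = sort { $a cmp $b } grep { /\.xml\z/ } readdir $dh;
closedir $dh;

chdir($start) or die "can't chdir back: $!";
```

One difference to keep in mind: like the shell, the "*.xml" pattern skips names that start with a dot, while the regular expression above does not.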

Recursion - Traversing Directories

You can traverse the directory structure fairly easily using a recursive subroutine:

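sub trav {
   my $dir = shift;
   # A lexical handle gives each recursive call its own directory handle.
   opendir(my $dh, $dir) or die "can't opendir $dir: $!";
   # Skip "." and "..", and prepend $dir so that -d tests the right path.
   my @entries = map { "$dir/$_" } grep { $_ ne '.' && $_ ne '..' } readdir $dh;
   closedir $dh;
   for (@entries) {
      print "file or directory: $_\n";
      trav($_) if -d;
   }
}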

To traverse the current directory, you'd call it with:

trav(".");

(or with any other directory that you want to traverse). Note that the body of trav() is not limited to the print; it can do other work, such as calling other subroutines. A good exercise is to modify this subroutine so that it also accepts a reference to a subroutine as a parameter and calls that subroutine on every directory. This would make trav a lot more reusable.
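One possible shape for that exercise is sketched below; try it yourself first. The name trav_with, the argument order, and the symlink check are my assumptions, and this version calls the subroutine on every entry (files as well as directories), which is a slightly broader choice than the exercise asks for.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# trav_with(\&callback, $dir): hypothetical name, to keep it apart from trav.
sub trav_with {
    my ($visit, $dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @entries = map { "$dir/$_" } grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    for my $entry (@entries) {
        $visit->($entry);                       # the caller decides what to do
        # Assumption: do not follow symlinks, to avoid endless loops.
        trav_with($visit, $entry) if -d $entry && !-l $entry;
    }
}

# Usage: build a tiny tree (hypothetical names) and collect every path seen.
my $root = tempdir(CLEANUP => 1);
mkdir("$root/a")   or die "can't mkdir: $!";
mkdir("$root/a/b") or die "can't mkdir: $!";
open(my $fh, '>', "$root/a/b/f.txt") or die "can't create file: $!";
close $fh;

my @seen;
trav_with(sub { push @seen, $_[0] }, $root);
print "$_\n" for @seen;
```

Passing the callback as a code reference means the same traversal can print, count, delete, or collect without being rewritten.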

Non-Recursive?

As you know (or should know), anything you can do recursively you can also do without recursion by using a stack. Here's the non-recursive implementation:

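sub trav {
   my @queue = @_;                # work list of paths still to visit
   while (@queue) {
      my $f = shift @queue;       # take the oldest entry first
      print "visit: $f\n";
      if (-d $f) {
         opendir(my $dh, $f) or die "can't opendir $f: $!";
         push @queue, map { "$f/$_" } grep { $_ ne '.' && $_ ne '..' } readdir $dh;
         closedir $dh;            # closedir, not close, for directory handles
      }
   }
}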

You call this trav the same way as above (the order of the traversal is different, but otherwise it's the same). The order is different because, in effect, we're not using a "stack" but a "queue": new entries are added at the end and processed from the front, which makes our traversal breadth-first.
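For contrast, here is a sketch of the same idea with a real stack, which gives a depth-first order like the recursive version. The name trav_stack, the symlink check, and the choice to return the visit order instead of printing it are my assumptions, made so the order is easy to inspect.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# trav_stack is a hypothetical name; it returns the paths in visit order.
sub trav_stack {
    my @stack = @_;
    my @order;
    while (@stack) {
        my $f = pop @stack;                  # pop: last in, first out
        push @order, $f;
        next unless -d $f && !-l $f;         # assumption: do not follow symlinks
        opendir(my $dh, $f) or die "can't opendir $f: $!";
        # Reverse-sorted so that pop yields the children alphabetically.
        push @stack, map { "$f/$_" }
                     reverse sort grep { $_ ne '.' && $_ ne '..' } readdir $dh;
        closedir $dh;
    }
    return @order;
}

# Usage: a tiny tree with hypothetical names: a/ (containing x) and b.
my $root = tempdir(CLEANUP => 1);
mkdir("$root/a") or die "can't mkdir: $!";
for my $file ("$root/a/x", "$root/b") {
    open(my $fh, '>', $file) or die "can't create $file: $!";
    close $fh;
}

my @order = trav_stack($root);
print "visit: $_\n" for @order;
```

With this tree the stack version visits a/x before b, while a breadth-first traversal visits b before a/x.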

© 2006, Particle