About UNIX File Permissions

Note that "file" here is a generic term applying to ordinary files, directories, special device files, pipes, sockets, etc.

For an ordinary file, r, w, x stand for readable, writable, and executable, respectively. For a directory, they stand for readable, writable, and searchable, respectively.

If a directory is readable, you can list its contents -- i.e., the names of the files it contains, but not information about the individual files. You can think of a directory as having the following (much simplified) structure.

file1 - location of file1
file2 - location of file2
...

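So if you type "ls", you get the list of filenames. But if you try to get further information about the files, such as via "ls -l", the request fails unless the directory is also searchable (x), because that information is stored with each file and not in the directory itself.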

If a directory is writable, you can change the directory's contents (e.g., adding a file). But to do that you have to be able to "open up" the directory in order to create a new entry in the proper place. To do that the directory has to be searchable (x) (not just readable). Hence, "w" by itself is of no practical use.

If a directory is not readable, but searchable, we can still access a file in the directory by supplying the file's name. This is a common practice for letting others access a certain file of yours but not allowing them to look at the contents of your directory.

On Links

UNIX's file-system is basically tree-structured, but the invention of links (soft and hard links) has turned it into a general graph with the possibility of cycles.

A link is like a second name for a file, which can be placed in another directory. A hard link is indistinguishable from the original name of the file. Consider the following:

$ echo hello world >file
$ ln file file2

Both "file" and "file2" refer to the same physical file. "ls -l" would show that this physical file has two names (two hard links). "rm file" would delete only the link "file"; the file is still there and is pointed at by the remaining link, "file2".

A soft link is simply a (special) file containing a pathname. For example,

$ ln -s /usr/local/versions file3

creates a special link file called "file3" that contains the pathname "/usr/local/versions".

The pathname a soft link contains does not have to refer to an existing file.

Through linking, we can create cycles, as in the following.

$ ln -s .. mom

This one creates a link pointing back to the parent directory, hence a cycle.

A common use of links is to make directories or files with complicated pathnames available in the user's own directories. For example,

$ ln -s /usr/local/WWW/jdk1.1.7 JDK

File Descriptors and Inodes

Every process has a table of file descriptors, say fd[0], fd[1], ... A file descriptor contains a pointer pointing at a physical file (file, directory, device, socket, etc.).

By default, fd[0] points at the keyboard input device, fd[1] the monitor output device, and fd[2] also the monitor. They are generally known as the standard input (stdin), standard output (stdout), and standard error (stderr), respectively.

Through various system calls, a program can modify the contents of these default file descriptors to point at something else. For example, if we want the standard output to go to a file, we can modify fd[1] to point at the desired file.

The shell provides a simple mechanism for setting up the file descriptors of a program at the moment it is launched:

cat <file			# fd[0] connected to file
who >file			# fd[1] points at file instead of monitor
grep hello xyz 2>/dev/null	# fd[2] points at the garbage dump
a.out 2>file 1>&2		# both fd[1] and fd[2] go to file

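Every file has an inode (index node) containing various information (various dates, size, permissions, etc.) pertaining to the file as well as pointers to the data blocks making up the file. The file's name is not stored in the inode; names are kept in the directory entries that refer to the inode, which is why one file can have several names.

A file descriptor does not point directly at the physical file, but through a system-wide table of open files, at the inode of the file, as shown below.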

In this example, process X is trying to read something from file F which is connected to fd[4] of process X. The same file has also been opened by process Y through Y's fd[2] -- i.e., two processes sharing the same file.

Note also in the example above, the inode of file F has been brought into memory. This allows more efficient file operations on file F. If there is any change made to file F's inode during the process, the original copy on disk must be updated at the end (e.g., when the file is closed).

The following program is an example of file descriptor manipulation. It implements the ">" file redirection operator commonly found in shell languages. (Compile with g++.)


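#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main() {
	char line[100];

	// Create the file, or empty it if it already exists, as ">" does.
	int f = open("test", O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (f == -1) {
		perror("open");
		return 1;
	}

	dup2(f, 1);	// or ... close(1); dup(f);

	// Read one line from stdin (fgets replaces the unsafe gets)
	// and write it to the redirected stdout.
	if (fgets(line, sizeof line, stdin) != NULL) {
		line[strcspn(line, "\n")] = '\0';	// puts adds the newline back
		puts(line);
	}
	return 0;
}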

The open system call creates the file and assigns it the lowest available descriptor in the process's file descriptor table -- that is, file descriptor 3, because 0, 1, and 2 are already allocated by default. The system call dup2 then makes file descriptor 1 point to the same file as file descriptor 3. As a result, the output of puts goes to the file instead of the monitor that file descriptor 1 originally pointed to. Here is what happens in a picture:

The following one is slightly more elaborate; it implements a pipe between two programs, cat and wc. (Compile with gcc.)

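#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
	int fd[2];
	if (pipe(fd) == -1) {
		perror("pipe");
		return 1;
	}
	if (fork() == 0) {	// first child: writes into the pipe
		dup2(fd[1], 1);	// or ... close(1); dup(fd[1]);
		close(fd[0]);
		close(fd[1]);
		execlp("cat", "cat", (char *)NULL);
		perror("execlp cat");	// reached only if exec fails
		_exit(127);	// do not fall through into the parent's code
	}
	if (fork() == 0) {	// second child: reads from the pipe
		dup2(fd[0], 0);	// or ... close(0); dup(fd[0]);
		close(fd[0]);
		close(fd[1]);
		execlp("wc", "wc", "-c", (char *)NULL);
		perror("execlp wc");
		_exit(127);
	}
	// parent: close both ends so that wc sees end-of-file when cat exits
	close(fd[0]);
	close(fd[1]);
	while (wait(NULL) != -1)	// wait for both children
		;
	return 0;
}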

The pipe system call grabs the next two available file descriptors (3 and 4 here, since 0, 1, and 2 are already in use) and stores them in fd[0] and fd[1]. fd[0] is the read end of the pipe, where data comes out, and fd[1] is the write end, where data goes in.

When a child is forked, its parent's file descriptor table is copied to the child so that all the open files (including any pipes) are automatically pointed to by both processes.

A pipe is a FIFO type of file implemented as a memory object by the kernel.

Using dup2 and close, we can achieve the desired effect, as shown below.