
The Unix Model

Programs

Hello World

  • Makefile
  • hello_world.c

Startup Function

  • Makefile
  • startup.c

Environment List

  • Makefile
  • environ.c
  • environ2.c
  • usage.txt

Check Environment Variable

  • Makefile
  • check_env.c

Process IDs

  • Makefile
  • process_id.c

User Group IDs

  • Makefile
  • user_group_id.c

Passwd

  • Makefile
  • passwd.c

Group

  • Makefile
  • group.c

Makefile:

CC=gcc
CFLAGS=-pedantic -ansi -std=c89

all: group

group: group.c
	$(CC) $(CFLAGS) -o $@ $<

clean:

File

  • Makefile
  • file.c
  • typeof_file.c
  • usage.txt

Process Group

  • Makefile
  • process_group.c

Time of Day

  • Makefile
  • time_of_day.c

Open File

  • Makefile
  • open_file.c

Signal

  • Makefile
  • signal.c
  • signal2.c
  • usage.txt

Fork

  • Makefile
  • fork.c

Exec

  • Makefile
  • dummy.c
  • exec.c
  • usage.txt

Daemon

  • Makefile
  • daemon.c

Exercises

Question 2.1

If a process modifies an environment variable, by changing one of the strings pointed to by the environ pointer, what effect does this have on its parent process? What effect does this have on any child processes it invokes?

When a process modifies its environment, the change is not propagated to its parent. It only alters the process's own copy of the environment. A child process, however, receives a copy of its parent's environment at fork(2). So if a process modifies an environment variable before it forks, the child inherits the modified value. Children created before the modification are unaffected.


Question 2.2

What effect does the following have:

setuid(getuid());

(Hint: refer to BSD line printer spooler client in Section 13.3)

From the man-page for getuid(2):

The getuid() function returns the real user ID of the calling process. The real user ID is that of the user who has invoked the program. The getuid() function is always successful, and no return value is reserved to indicate an error.

Similarly, the man-page for setuid(2) states that:

The setuid() function sets the real and effective user IDs and the saved set-user-ID of the current process to the specified value. The setuid() function is permitted if the effective user ID is that of the super user, or if the specified user ID is the same as the effective user ID. If not, but the specified user ID is the same as the real user ID, setuid() will set the effective user ID to the real user ID.

In Chapter 13, this expression is used after obtaining a reserved port through tcp_open. Binding a reserved port requires superuser privileges. Either the user who invoked the program must be root, or the program must be set-user-ID root. In the latter case, once the socket is bound, setuid(getuid()) sets the real, effective, and saved set-user-IDs to the invoking user's real user ID. The process thereby gives up its root privileges permanently. It has no further need for them. It also can no longer open, as root, files that the user could not normally read.


Question 2.3

Both the functions getpwuid and getpwnam return a pointer to a structure that the function fills in. Where do you think this structure is stored? (Check the appropriate manual pages for your system.)

The DESCRIPTION section of the macOS manual page for these functions states:

These functions obtain information from opendirectoryd(8), including records in /etc/master.passwd which is described in master.passwd(5). Each entry in the database is defined by the structure passwd found in the include file <pwd.h>:

struct passwd {
    char    *pw_name;     /* user name */
    char    *pw_passwd;   /* encrypted password */
    uid_t   pw_uid;       /* user uid */
    gid_t   pw_gid;       /* user gid */
    time_t  pw_change;    /* password change time */
    char    *pw_class;    /* user access class */
    char    *pw_gecos;    /* Honeywell login info */
    char    *pw_dir;      /* home directory */
    char    *pw_shell;    /* default shell */
    time_t  pw_expire;    /* account expiration */
    int     pw_fields;    /* internal: fields filled in */
};

The functions getpwnam(), getpwuid(), and getpwuuid() search the password database for the given login name, user uid, or user uuid respectively, always returning the first one encountered.

On macOS, then, the opendirectoryd(8) daemon looks up the passwd entry corresponding to the given name or user ID.

On Linux there are two manual pages: one for the library function, getpwnam(3), and one containing the POSIX definition. Interested readers can look through the POSIX definition; here I'll discuss the library function.

The passwd structure is defined as:

struct passwd {
    char   *pw_name;       /* username */
    char   *pw_passwd;     /* user password */
    uid_t   pw_uid;        /* user ID */
    gid_t   pw_gid;        /* group ID */
    char   *pw_gecos;      /* user information */
    char   *pw_dir;        /* home directory */
    char   *pw_shell;      /* shell program */
};

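As for the question, the manual states that these functions return a pointer to a structure. That structure holds the fields of the matching record in the password database (e.g., the local password file /etc/passwd, NIS, or LDAP). Refer to passwd(5) for details of the password file format. The structure itself is not stored in the file. The RETURN VALUE section notes that the returned pointer may point to a static area. That area may be overwritten by subsequent calls to getpwent(3), getpwnam(), or getpwuid(), and the pointer must not be passed to free(3). The reentrant variants getpwnam_r() and getpwuid_r() instead fill in a structure and buffer supplied by the caller.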


Question 2.4

Some network servers compare a user's encrypted password with the pw_passwd field in the passwd structure. What happens when the server is running on a system that has a shadow password file?

On Linux, the shadow password file is /etc/shadow. It is a text file that, unlike /etc/passwd, is normally readable only by privileged users. It contains one entry per line, with fields separated by colons (:). The fields are, in order:

  1. login name
  2. encrypted password
  3. date of last password change
  4. minimum password age
  5. maximum password age
  6. password warning period
  7. password inactivity period
  8. account expiration date
  9. reserved field

The shadow(5) manual refers to crypt(3) for the format of the encrypted password field. crypt(3) hashes the password with a given salt, traditionally using DES, although glibc also supports other hashing methods. The file also carries account information that /etc/passwd lacks, such as password aging and account expiration, which gives greater control over access to the system. A server that consults the shadow file can therefore also check whether the user is still allowed to log in.

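In essence, the user's entry is first looked up in /etc/passwd. On Linux, if the password field there is 'x', the actual hashed password is stored in /etc/shadow. On a shadow system, then, pw_passwd holds only this placeholder. A server that compares the user's encrypted password against pw_passwd will never find a match, so every login fails. The server must instead fetch the hash from the shadow file, for example with getspnam(3). That in turn requires the privileges needed to read /etc/shadow.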


Question 2.5

Investigate the access system call (which we have not described here) on your Unix system. We'll use this system call in the remote shell server in Section 14.3. Write a similar function that uses the effective user ID and the effective group ID.

According to the access(2) manual page for Linux, this function checks whether the calling process can access the file pathname (the first argument). If pathname is a symbolic link, it is dereferenced. mode (the second argument) is either the value F_OK or a mask consisting of the bitwise OR of one or more of R_OK, W_OK, and X_OK. The check is done using the calling process's real UID and GID.

Cloning a system call exactly is not trivial, but we can still mimic the functionality of calls such as access(2) in user space. Here we rely on the stat(2) system call, which does some of the work for us:

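  1. As the question requires, the check uses the effective IDs. The kernel resolves the pathname for stat(2) with the process's effective user and group IDs. We then compare the owner, group, and permission bits that stat(2) returns against the effective user ID and effective group ID. By contrast, access(2) uses the real UID and real GID.
  2. We don't need to dereference symbolic links explicitly: stat(2) follows them automatically, unlike lstat(2).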

Apart from this, the program is very basic, but it gets the job done.


Question 2.6

If multiple processes are appending records to a file, is there any differences between having each process open the file with the O_APPEND flag, versus having each process issue an lseek to the end of the file before each write?

On Linux, the open(2) manual contains the following remarks on the O_APPEND flag:

O_APPEND
The file is opened in append mode. Before each write(2),
the file offset is positioned at the end of the file, as if
with lseek(2). The modification of the file offset and the
write operation are performed as a single atomic step.

O_APPEND may lead to corrupted files on NFS filesystems if
more than one process appends data to a file at once. This
is because NFS does not support appending to a file, so the
client kernel has to simulate it, which can't be done
without a race condition.

The NFS protocol cannot atomically read the current file size, seek to the end of the file, and write the new data, so appends over NFS are subject to a race condition. We won't consider the lseek(2) approach on NFS either. The race exists there too, and avoiding it requires extra handling such as file locking.

On a local file system, then, multiple processes can safely append to the same file when each opens it with O_APPEND. With lseek(2) followed by write(2), however, the two calls are not performed as one atomic step. The kernel's scheduler may preempt a process after its lseek(2) but before its write(2), and another writer can run in between. Sometimes the writers share a single file offset, for example threads using the same descriptor. In that case write(2) advances the shared offset, so no data is overwritten, but the appended data can still end up in an unexpected logical order. Consider a simple scenario:

  1. The data we expect in the file is "Pranav Joshi". Two threads, A and B, each write one word: thread A writes "Pranav " and thread B writes "Joshi".
  2. Thread A calls lseek(2) with SEEK_END. Before it can write anything, the kernel preempts it.
  3. Thread B runs. Suppose it performs both its lseek(2) and its write(2) within its time quantum. Its lseek(2) makes little difference, since it also seeks to the end. Its write(2) stores "Joshi" and advances the shared offset.
  4. Thread A resumes and writes its word at the updated offset, so the file contains "JoshiPranav ".

The problem is worse when each process opens the file itself, as the question describes, because each open(2) creates an independent file offset. Suppose the first process is preempted between its lseek(2) and its write(2), and the second process performs both calls in the meantime. The second write does not move the first process's offset. When the first process finally writes, it overwrites the data the second process wrote, because its offset was fixed before the context switch. With O_APPEND, the kernel positions every write at the current end of file, so no data is lost.


Question 2.7

Implement the sleep function using the alarm system call. Be sure to handle the case of an alarm that is already set. Do you need reliable signals to do this correctly?

Here, a reliable signal is one that is guaranteed to be delivered to the process. On older Unix systems signals were unreliable, which caused race conditions in which signals could be lost. An event could occur that generated a signal, but the process would never be notified. Consider the program fragment shown in the text:

int flag = 0;       /* global variable set when SIGINT occurs */
...
for (;;) {
    while (flag == 0) {
        pause();    /* wait for a signal to occur */
    }
    /* the signal has occurred, process it */
    ...
}

If a signal occurs after the test of the flag variable (in the while loop), but before the call to pause, the signal can be lost. A simple remedy is also shown:

int flag = 0;       /* global variable set when SIGINT occurs */

for (;;) {
    sigblock(sigmask(SIGINT));
    while (flag == 0) {
        sigpause(0);    /* wait for a signal to occur */
    }
    /* the signal has occurred, process it */
    ...
}

Calling sigblock before testing the flag variable keeps SIGINT blocked while the flag is examined, so the signal cannot be lost. sigpause(0) then atomically sets the signal mask to 0, which unblocks every signal including SIGINT. It then suspends the process until a signal is received. Because unblocking and waiting happen in one step, there is no window in which the signal can arrive unnoticed. The text compares the signal functions of BSD and System V; the fragment above uses the BSD version. System V offers equivalent ways to obtain reliable signals, but its functions operate on one signal at a time rather than on a signal mask.

My implementation uses the POSIX signal functions. As the sleep(3) manual states, sleep returns early if a signal arrives that is not ignored. In that case the return value is the number of seconds left unslept. We therefore don't need the BSD functions described above, but we do need the same reliable-signal technique. We simply block SIGALRM before calling alarm(2), so that the alarm cannot fire before the process starts waiting. sigsuspend(2) is used instead of sigpause(3); its behavior is described in the source file. For brevity, only SIGINT and SIGQUIT are handled as signals that may interrupt the sleep. If such a signal is received, the call to my_alarm returns the unslept time. Other signals whose default action terminates the process will still terminate it, in which case sigsuspend never returns.


Question 2.8

Why doesn't fork return the process ID of the parent to the child and return zero to the parent?

There are a couple of reasons for this design choice:

  1. The parent uses the wait family of functions to learn which child terminated, whether normally or because of a signal. A parent can have many children. If fork returned 0 to the parent instead of the child's process ID, the parent would have no way to tell which child it had just created.
  2. The child can obtain its parent's process ID with getppid(2) and its own with getpid(2). There is no system call that returns the process IDs of a process's children, so the return value of fork is the parent's only way to learn it.

Question 2.9

Implement the system function. (Hint: see Kernighan and Pike [1984].)

The manual for system(3) on Linux states that:

The system() library function behaves as if it used fork(2) to create a child process that executed the shell command specified in command using execl(3) as follows:

execl("/bin/sh", "sh", "-c", command, (char *) NULL);

system() returns after the command has been completed.

It further mentions that, during execution of the command, SIGCHLD is blocked and SIGINT and SIGQUIT are ignored in the calling process. If command is NULL, system() checks whether a shell (/bin/sh) is available. It returns a nonzero value if one is and 0 if not.

My implementation follows the manual closely, but it is not ideal. Signals need careful handling, and I have probably missed something. It does, however, work as the manual describes.


Question 2.10

If a process that is run by a shell sets its file access creation mask to zero, does this affect other processes that are run after it by the shell?

To answer this, we first need to understand how the shell executes a command. When a command is entered, the shell forks. The parent (the shell itself) waits, while the child executes the command. The NOTES section of the umask(2) manual on Linux states:

A child process created via fork(2) inherits its parent's umask. The umask is left unchanged by execve(2).

Suppose we write a program whose only purpose is to change the file access creation mask, as the question describes. Descriptors inherited across fork(2) refer to the same file table entries, so some state, such as the file offset, is shared between parent and child. The file access creation mask, by contrast, is simply copied into the child, so later changes made in the child are not reflected in the parent. Files created by this [child] process use the modified mask. When it terminates, the parent [shell] process resumes with its own mask unchanged. Processes the shell runs afterward therefore still inherit the shell's original file access creation mask.