Navigate by using the arrow keys or clicking.
Press "n" to see notes for slides which have them.
Press "t" to toggle between slide view and outline view.
A copy of this presentation can be downloaded from
http://www-personal.umich.edu/~markmont/shells.tar.bz2
Please do the following:
ssh login.itd.umich.edu
mkdir ~/shells
cd ~/shells
cp ~markmont/Public/shells/* .
Alternatively, to work on your local MacOS X or Linux system:
curl -O http://www-personal.umich.edu/~markmont/shells.tar.bz2
tar jxf shells.tar.bz2
cd shells
while ( 1 ) {
    print( "prompt: " );
    read( command );
    if ( done ) { exit(); }
    if ( built_in( command ) ) {
        do_built_in( command );
    } else {  /* it is an external command */
        fork();
        if ( this_process == child_process ) {
            exec( command );
        } else {  /* we're the parent (main shell) */
            wait_for_child_to_finish();
        }
    }
}
This is sometimes called "REPL", the Read-Evaluate-Print Loop:
In our example -- and in most command-oriented shells, as opposed to language interpreters such as Python -- the "print" step is implicit in the "evaluate" step: evaluating/executing the command causes output to be generated, which is automatically displayed.
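The loop above can be sketched in shell itself. This is a minimal toy, not how any real shell is implemented: the eval built-in stands in for the real shell's parse-and-execute step, and input comes from a here-document so the sketch is self-contained.

```shell
# A toy read-evaluate loop in POSIX sh. eval stands in for the real
# shell's parse-and-execute step.
while true ; do
    printf 'prompt: '
    read -r cmd || break          # end of input ends the loop, like fgets()
    [ "$cmd" = "exit" ] && break  # a minimal "exit" built-in
    eval "$cmd"                   # "evaluate"; any output appears automatically
done <<'EOF'
echo hello
exit
EOF
```

Note that the "print" step of the REPL is nowhere to be seen: evaluating `echo hello` produces the output directly.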
Understanding this code -- especially the fork() and exec() calls -- is important for demystifying how Unix shells work.
/*
 * A very simple Unix shell.
 *
 * To compile this program, run:
 *
 *     gcc -o simple-shell simple-shell.c
 *
 * This shell supports comments (lines whose very first character is '#'),
 * the echo built-in, the exit built-in, and absolute paths to external
 * executables (for example, "/bin/ls -ld /var/tmp"). However, it supports
 * nothing else, and is also extremely inflexible and unforgiving (for
 * example, extra whitespace usually results in an error).
 *
 * modified 18 may 2012 by cja: added shebang handling and prompt suppression
 */

#include <err.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define MAX_COMMAND_LENGTH 1024
#define MAX_COMMAND_ARGUMENTS 10

char command[ MAX_COMMAND_LENGTH ];
char *command_arg[ MAX_COMMAND_ARGUMENTS + 1 ];

int main( int argc, char **argv, char **envp ) {
    char *c, *line;
    pid_t child_pid;
    int child_status, result;
    unsigned int n;

    /* if a script file was given, read commands from it instead of stdin */
    if ( argc > 1 ) {
        if ( freopen( argv[1], "r", stdin ) == NULL ) {
            err( 1, "%s", argv[1] );
        }
    }

    while ( 1 ) {
        if ( argc < 2 ) {
            fputs( "> ", stdout );  /* display a prompt */
        }
        line = fgets( command, MAX_COMMAND_LENGTH, stdin );  /* read a line */
        if ( line == NULL ) { return 0; }  /* exit if there was no more input */

        /* remove the newline from the end of the line */
        for ( c = command ; *c != '\0' ; c++ ) {
            if ( *c == '\n' ) { *c = '\0'; break; }
        }

        /* If the command begins with a '#', it's a comment.  Go back
         * to the start of the loop. */
        if ( command[0] == '#' ) { continue; }

        /* built-in command: exit */
        if ( strcmp( command, "exit" ) == 0 ) { return 0; }

        /* built-in command: echo */
        if ( strncmp( command, "echo ", 5 ) == 0 ) {
            puts( command + 5 );
            continue;  /* go back to the top of the loop */
        }

        /* Split the command into an array of command line arguments.
         * This is needed because the Unix execve() system call takes an
         * array of arguments, which it passes to the main() function of
         * the program that it runs.
         */
        memset( command_arg, 0, sizeof( command_arg ) );
        command_arg[0] = strtok( command, " " );
        n = 1;
        while ( n <= MAX_COMMAND_ARGUMENTS ) {
            command_arg[n] = strtok( NULL, " " );
            if ( command_arg[n] == NULL ) { break; }
            n++;
        }
        if ( n > MAX_COMMAND_ARGUMENTS ) {
            puts( "Error: too many command line arguments" );
            continue;
        }

        /* create a child process to run the command for us: */
        child_pid = fork();
        if ( child_pid == -1 ) {
            perror( "fork failed" );
            return 0;
        }
        if ( child_pid == 0 ) {
            /* this code is run by the child process */
            /* run the command: */
            result = execve( command_arg[0], command_arg, envp );
            /* if we could not run it, complain and exit */
            perror( "exec failed" );
            return 255;
        } else {
            /* this code is run by the parent process */
            wait( &child_status );  /* wait for the child process to finish */
        }
    }
}
Things to notice in the code above: fgets() reads the input one line at a time; the exit and echo built-ins are handled by the shell process itself; and fork() and exec() (here, execve()) are used to run external commands.
Notice how argc, argv, and envp are given to us by the OS when we are run (as the arguments to our function main()), and how we provide them, in turn, to the commands we run (the arguments to execve()). This will be important later for understanding real-world shells.

execve() (and, in turn, main()) requires an array of individual arguments. This will also be important later for understanding real-world shells.

To try it out:

gcc -o simple-shell simple-shell.c
./simple-shell
echo "Hello"
/bin/ls -l
exit
cat example1
./example1
The Thompson shell (sh, now obsolete) led to the Bourne Shell (sh), which in turn led to the Bourne-Again Shell (bash).

The C Shell (csh) was designed to be more user-friendly than the Bourne Shell (especially adding many features to improve interactive use) and to have built-in commands whose style was closer to that of the C programming language. tcsh improved upon the command line editing, history, and autocompletion of the C Shell. However, because many features and much of the syntax in the C Shell are ad hoc, C Shell Programming is Considered Harmful.

Command line editing, autocompletion, history, ~ notation for home directories, and job control were all introduced by the C Shell, which accounts for its long-time popularity. A few of these features were later added to the Bourne Shell; all have been added to Bash.
Also see: https://en.wikipedia.org/wiki/C_shell
I strongly recommend that no one ever use any C Shell variant. It's bad for scripting, and if you use C Shell interactively, you'll have a harder time scripting for other shells.
Other Bourne-family shells include the Korn Shell (ksh), the Z Shell (zsh), and the Almquist Shell (ash, dash).
Set a variable with NAME=VALUE. VALUE has to be a single "word", so if it's not, enclose it in quotes:

favorite_food="apple pie"
echo Would you like some $favorite_food?
Remove a variable with the unset shell built-in:

unset favorite_food
To see all shell variables and their values, run the set built-in with no arguments.

PWD always holds the path of the current working directory. Bash uses PS1 to determine what the shell prompt should be.

Running a command named set just to see what variables are set doesn't make a lot of sense. The set built-in is used for at least four other things too, so it can be confusing.

PWD stands for "Print Working Directory", while PS1 stands for "Prompt String 1". For a list of all variables that are set or read by Bash, see the "Shell Variables" sub-section in the "Parameters" section of the Bash manual page.
Exported variables are copied into the environment of child processes (the envp argument of the execve() system call). By convention, people usually choose all-uppercase names for variables that they know they will be exporting, but this is not required:

export NAME

Or a variable can be set and exported in a single step:

export NAME=VALUE

To set an environment variable for a single command only, put the assignment on the same line, before the command:

NAME=VALUE command

For example, to run firefox so that firefox will look for shared libraries in /opt/lib just this one time, but nothing else will:

LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/lib" firefox
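A quick sketch of the one-shot form using a made-up variable name (GREETING is purely illustrative): the assignment is visible to the single child command but does not affect the shell itself.

```shell
# GREETING exists only in the environment of the one child command:
GREETING=hello sh -c 'echo "child sees: $GREETING"'
echo "parent sees: ${GREETING:-nothing}"   # the shell itself is unaffected
```

Running this prints "child sees: hello" followed by "parent sees: nothing".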
To see all exported (environment) variables, run printenv. Although many shells have a built-in command named printenv, Bash does not; /usr/bin/printenv is what actually runs.
Bash has a number of special variables known as "parameters".

$1, $2, and so on up through $9, continuing with ${10}, ${11}, and so on, are positional parameters. These are the arguments given to the shell (or shell script) when it was started, the arguments given to the function that is currently executing, or the arguments given to the set shell built-in.

$* and $@ are both all of the positional parameters (arguments) together, and $# is how many positional parameters there are.

$? is the status code returned by the most recently executed command.
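A short illustration (the fruit names are arbitrary values) using the set built-in to supply positional parameters:

```shell
set -- apple banana cherry   # load three positional parameters
echo "count:  $#"            # 3
echo "first:  $1"            # apple
false                        # a command that fails...
echo "status: $?"            # ...so $? is 1
```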
For a list of all parameters, see the "Parameters" section of the Bash manual page.
The difference between $* and $@ appears when double quotes are used: "$*" expands to "$1 $2 ..." (one word) while "$@" expands to "$1" "$2" ... (each argument is a separate word).
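A quick demonstration (the argument values are made up); note how "$@" preserves the argument containing a space as a single, separate word:

```shell
set -- "red apple" banana
printf '[%s]\n' "$*"   # one word:  [red apple banana]
printf '[%s]\n' "$@"   # two words: [red apple] then [banana]
```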
Parameter expansion: words beginning with $ or written as ${NAME} are replaced by their values. There are many special forms of parameter expansion -- for example, ${COLOR:-red} is replaced by the value of COLOR if the variable COLOR is set, or by the string "red" if COLOR is not set.

Command substitution: `command` or $(command) is replaced by the output obtained after running command.

Tilde expansion: ~ and ~username are replaced by the path to the home directory.

For a list of all of the special forms of parameter expansion, see the "Parameter Expansion" section of the Bash manual page.
Pathname expansion (globbing): * (matches zero or more characters), ? (matches exactly one character), and [...] (matches any character in the brackets) are substituted with matching file names or paths. If there are no matches, these words are left alone.

Brace expansion:

mkdir /home/{sue,david}

becomes

mkdir /home/sue /home/david
command > filename -- send command's stdout to filename, replacing its contents
command >> filename -- append command's stdout to filename
command < filename -- read command's stdin from filename
command1 | command2 -- connect command1's stdout to command2's stdin
A "file descriptor" is a number that describes a file that is open and available for reading or writing. Most Unix processes usually have three standard file descriptors when they start:
File descriptor | Description |
---|---|
0 | standard input (stdin) |
1 | standard output (stdout) |
2 | standard error (stderr) |
Which file descriptors actually exist and what files each one "points to" is inherited from whatever the process' parent set up for it -- or they are the same as what the parent process had if the parent process did not set up anything special for its child process.
After a process is started, it can open files of its own, each of which will get its own file descriptor, starting at the next unused number.
File descriptors can be combined with the I/O redirection operators by putting the number immediately in front of the redirection operator:
command 2> error-messages.log

This means "open the file error-messages.log for writing, assign it to file descriptor 2 (which is the standard file descriptor for Unix processes to use when they output error messages), and then run command".
command 1> filename

is the same as

command > filename

and

command 0< filename

is the same as

command < filename
A file descriptor can also be duplicated. The construct
command 2>&1
means "take the file that file descriptor 1 already points to, and let file descriptor 2 write to it".
This is useful for sending both stdout and stderr to the same file:
command > output.txt 2>&1
The last command above says "Open output.txt for writing and assign it to file descriptor 1 (where processes usually write their normal output). Then take the file that file descriptor 1 already points to and assign it to file descriptor 2 (where error messages will be written). Then run command
."
Note that all of this opening of files and assigning descriptors is done by the shell after the shell fork()
s but before it runs the external command -- so it all happens in the child process, and the parent process (the shell which continues to run) is not affected.
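Because redirections are processed left to right, the order of the operators matters. A sketch (the log file names are made up, and sh -c is used just to produce one line on each stream):

```shell
# Both streams into one file: stdout is redirected to the file first,
# then fd 2 is pointed at whatever fd 1 now points to (the file).
sh -c 'echo out; echo err 1>&2' > /tmp/both.log 2>&1

# Reversed order: fd 2 is duplicated while fd 1 still points at the
# terminal, so error messages do NOT end up in the file.
sh -c 'echo out; echo err 1>&2' 2>&1 > /tmp/stdout-only.log
```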
When you run a command, normally it runs in the "foreground", which means that you can interact with it -- see its input and output -- and the shell does not give you another prompt until the command finishes.
To start a command in the background, end the command line with an ampersand. The shell will respond with a job number in square brackets, followed by the process ID of the command:
$ script.sh &
[1] 9948
$
To see a list of all jobs started by the current shell, including their job numbers and current status, use the jobs
shell built-in.
If you start a job normally (in the foreground), you can suspend the job (temporarily cause it to stop running) and get a shell prompt by pressing Control-Z.
To resume running a suspended job in the background, run bg
. If there is more than one suspended job, specify the job number preceded by a percent sign (for example, "bg %2
").
To bring a job that is running in the background into the foreground, use the fg
built-in command, specifying the job number preceded by a percent sign, for example, fg %1.

Jobs are normally terminated when you log out. The nohup command can help prevent this, but better solutions include running the job via at or in a detached screen session.
Up to this point, we've seen quite a few characters that can trigger special behavior in a shell:
$ ~ ` { * ? [ # < > | &

And there are more that we have not talked about yet:

; ( \ ' "
But what if you don't want the shell's special behavior?
$ echo The cost is $1.00
The cost is .00
$
Recall that $1
normally stands for the first argument that was passed to the shell (or, inside a function, to the function) when the shell (or function) was invoked.
A backslash can be used to escape the character which follows it -- it tells the shell to treat the character literally.
$ echo The cost is \$1.00
The cost is $1.00
$
This includes using a backslash to escape a backslash: \\
A backslash can also be used to ignore a RETURN character to continue a single command onto a new line.
Single quotes can be used to remove the special meanings of all characters inside of them, including backslashes, except for the special meaning of single quotes. The following does not work:
echo 'Don\'t use contractions inside single quotes'
Instead, get a single quote by escaping it with a backslash outside of any other single quotes:
echo 'Don'\''t use contractions inside single quotes'
Double quotes are the most frequently used. They remove the special meaning of all characters inside of them except $, `, \, !, and ". This means that you can do variable interpolation and command substitution within double quotes, but most other special functionality is disabled.
Single and double quotes also keep the shell from splitting the line into words (or arguments) based on spaces; everything inside a pair of quotes is considered to be a single "word".
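A small comparison of the three cases (name is a throwaway variable for illustration):

```shell
name=world
echo "Hello, $name"    # double quotes: expansion happens -> Hello, world
echo 'Hello, $name'    # single quotes: everything is literal -> Hello, $name
echo Hello,     $name  # unquoted: the extra spaces are word breaks, so
                       # echo receives two words -> Hello, world
```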
command1 ; command2 ; command3

runs command1 followed by command2 followed by command3.
( cd ~markmont/Public ; tar cf - shells ) | \
    ( mkdir /var/tmp/$USER ; cd /var/tmp/$USER ; tar xf - )
The last example above is an old-fashioned way of copying a directory tree from one place to another. The two cd
commands are each in their own sub-shell and so don't affect each other, nor do they affect the current working directory of the parent process (the main shell). But each cd
command does affect the tar
command that follows it in the same sub-shell.
Unix programs return an 8-bit value when they exit to indicate whether they completed successfully or not. By convention, 0 means "everything is OK" while any other value (from 1 though 255) means "there was a problem". Programs can use different non-zero values to indicate what type of problem was encountered.
This leads to the odd situation for shells -- opposite from most programming languages -- that 0 means "true" and non-zero means "false".
$ /bin/true
$ echo $?
0
$ /bin/false
$ echo $?
1
$
/bin/true
is a program whose sole purpose is to complete successfully, while /bin/false
is a program whose sole purpose is to complete with a value that indicates failure.
You can have a shell return a value to its parent process by giving the value to the exit
built-in: for example, exit 0
to end the shell and indicate that everything completed OK.
The shell built-in test returns true (0) or false (1) based on some condition it is asked to check.
test -e /foo | Does the file exist? |
test 5 -eq 3 | Are two numbers equal? |
test "bob" == "bob" | Are two strings equal? |
test -z "$color" | Is a string zero-length? |
Many more examples of tests are listed in the "Conditional Expressions" section of the Bash manual page.
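Since test reports its answer through its exit status, pair it with $? to see the result:

```shell
test -d / ; echo $?      # 0: / is a directory (true)
test 5 -eq 3 ; echo $?   # 1: five is not equal to three (false)
test -z "" ; echo $?     # 0: the empty string has zero length (true)
```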
if command1 ; then command2 ; fi

if command1 ; then command2 ; else command3 ; fi

if command1 ; then
    command2a
    command2b
else
    command3a
    command3b
fi
Example:
if test -d /bin ; then echo "/bin is a directory, good." ; fi
"fi", being "if" spelled backwards, signals the end of the if
statement.
The test built-in command is also named [. And if the last argument to the test command is ], then the test command will ignore it. This allows things like the following to be written:
if [ $USER == "root" ] ; then
    echo All hail the mighty root\!
fi
The above does the exact same thing as
if test $USER == "root" ; then
    echo All hail the mighty root\!
fi
There is also a stand-alone program, /bin/test, that works the same way as the version that is built in to the shell. There's also a program named /bin/[ which is usually hard linked to /bin/test.
You can build more complex tests by using &&, ||, and !:
command1 && command2 | True if both command1 AND command2 succeed |
command1 || command2 | True if either command1 OR command2 succeeds |
! command1 | True if command1 fails |
Example:
if [ $USER != "root" ] && [ `date +%A` == "Friday" ] ; then
    echo "Let's head to the bar after work."
fi
&& and || both use short-circuiting; that is, they stop as soon as the overall result is known, instead of always running both commands: && skips the second command if the first one fails, and || skips the second command if the first one succeeds.
Because of this, a common idiom is to use
command1 && command2
as a shorthand for
if command1 ; then command2 ; fi
command1 || command2
is used as a shorthand for
if ! command1 ; then command2 ; fi
For example:
[ -f /etc/motd ] && cat /etc/motd
The first command tests to see if /etc/motd
is a file. If it is not -- perhaps because it doesn't exist, or if it is a directory instead -- then the test as a whole cannot possibly be true, so Bash does not bother executing the second command.
On the other hand, if /etc/motd
is a file, then the first half of the test has succeeded, and the shell executes the second command, displaying the contents of the file. This avoids an error message from cat
if the file doesn't exist, and it's shorter than an "if" statement. The value of the test as a whole is not used for anything.
Of course, an easier way to avoid an error message in this particular example is
cat /etc/motd 2> /dev/null
but the command1 && command2
idiom is used in a lot of places.
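A few more instances of the idiom (the directory names are made up):

```shell
[ -d /no/such/dir ] && echo "never printed"   # test fails, so echo is skipped
[ -d /no/such/dir ] || echo "missing"         # test fails, so echo runs
[ -d /tmp ] && echo "/tmp exists"             # test succeeds, so echo runs
```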
for NAME in VALUE ; do command ; done

command gets executed repeatedly, with the variable NAME set successively to each of the words in VALUE.
For example, the following will rename all files in the current directory whose names end in ".bak" to the same name ending with ".old" instead:
for filename in *.bak ; do
    without_extension=`basename $filename .bak`
    mv $filename $without_extension.old
done
If the if
statement ends with "fi", why doesn't the for
statement end with "rof"? It was probably introduced later when people knew better.
There's actually another form of the for statement that is like the for statement in C. We won't cover that here; refer to the Bash manual page.
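For reference, a sketch of the C-like form; it is a bash extension (not POSIX sh), so it is run under bash explicitly here:

```shell
bash -c '
total=0
for (( i = 1 ; i <= 4 ; i++ )) ; do   # C-style: init ; condition ; step
    total=$(( total + i ))
done
echo "1+2+3+4 = $total"               # prints: 1+2+3+4 = 10
'
```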
Other topics: case and while statements, IFS, eval, and shell arithmetic.

The case statement allows you to match a value against a list of patterns, doing something different depending on which pattern(s) match.
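A sketch of the case syntax (fruit and the patterns are illustrative values):

```shell
fruit="apple"
case "$fruit" in
    apple|pear) kind="pome"  ;;    # the first matching pattern wins
    cherry)     kind="drupe" ;;
    *)          kind="unknown" ;;  # * matches anything else
esac
echo "$fruit is a $kind"           # prints: apple is a pome
```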
Command line editing, autocompletion, and history are very useful for increasing productivity and reducing work when using the shell interactively.
Run "man bash", or see:

gzip -dc /usr/share/man/man1/bash.1.gz | \
    groff -man -Tps > bash.ps
unix-admins@umich.edu