
Using the shell bash

Input -- Other key codes -- Advanced features: Input and output, sequences of commands, variables and loops -- Automating tasks: Aliases, functions, shell scripts and other scripts -- Customising -- Further reading

Command line input

Today's bash has a very sophisticated input line (provided by the "readline" library). You can use many of the key codes of the editor emacs. The most important are: Ctrl-a beginning of line, Ctrl-e end of line, Alt-f word forward, Alt-b word backward, Ctrl-k kill to end of line, Insert or Ctrl-y paste ("yank") the last killed text, Alt-y replace it by the previously killed text, Ctrl-_ undo, Alt-d delete word, and Alt-Backspace backspace word. Alt-Backspace is problematic in X terminals; apparently not all of them pass keypresses on to the shell unchanged. By trying out the various xterm clones, one in which Alt-Backspace works can usually be found. On SuSE Linux 7.0, xterm itself does the job; on Mandrake 8.1, a program named aterm works; and on several Linux distributions and Cygwin, rxvt does it.

One key binding which differs from emacs is Ctrl-w. While in emacs it kills the region between a mark and the cursor, in the shell input line it kills the command-line parameter to the left of the cursor, ie everything until the preceding white space.

The readline input line can greatly reduce the amount of typing you have to do by automatically completing the word you are typing. It can complete commands (the first word in the line), file names in the current directory (all other words), shell variables (starting with "$"), user names (starting with "~") and host names in the local network (after "@"). Just type the first few characters of the name and press the Tab key. The word will be completed up to the point where the possible completions differ (if the first few characters did not determine it unambiguously). Press Tab again, and all possible completions will be listed.

This feature is partially defeated by Linux distributions' tendency to include ever more files and commands without tidying up old ones, so that an increasing proportion of all possible combinations of letters is available. But it still is a great help. When I create a new command (see below), I choose its name so that there are as few commands starting with the same letters as possible. Then I have to type just its first few characters and Tab to execute it.

It goes without saying that a modern command-line parser offers command history. By pressing the up and down arrow keys, you can navigate around the list of the commands you typed previously. Ctrl-r allows you to search backwards through the history. If you don't find what you are looking for, abort the search with Ctrl-g. When you press an arrow key or type another readline command, you will be dumped on the currently shown history line and are allowed to edit it. To get back to the end of the history if you did this in error, type Alt->. A command useful for executing several "historic" commands in a row is Ctrl-o. It executes the current command and immediately fetches the following one from history, so you can execute a sequence of history commands by pressing it several times.

If you want to do several things in a row to the same file, you don't have to go back in history and delete and retype the command; you can just type the new command and press Alt-. which inserts the previous command's last argument.

Other key commands and process management

The following key commands are strictly no business of bash, or even readline. But they are supported by most modern terminals (both xterms and consoles) and can therefore be used whenever you use the shell, so they fit in well at this point. Shift-PageUp scrolls back through the output on the current terminal, Shift-PageDown scrolls forward again, Ctrl-s stops output and Ctrl-q gets it going again. The last two are very useful when a program unexpectedly outputs large amounts of text. They work even in the terminal showing kernel messages during system startup. On the down side, they mask two none-too-frequently used commands of readline.

Other key codes allow you to control the execution of the program running in the terminal. (Here the shell is involved again.) Ctrl-z suspends execution and Ctrl-c interrupts the program irreversibly. Suspended programs can be continued with the fg command. Its name stands for "foreground" and indicates that the suspended command will be continued in the foreground, ie the shell will pause until it is finished (or stopped again). Only one program can run in the foreground at any time. However, multiple processes can be executed in the background, in parallel with the shell. You start a background process by putting an "&" after the command. The process should not output any text, or it will mess up your terminal; if it does, throw the output away by redirecting it to /dev/null. You can continue suspended processes in the background rather than the foreground with the command bg. There may be several suspended processes. You can get an overview of all processes started from a shell with the command jobs. These commands are described in the manual page of bash.
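
A typical session might look like this (xterm and sleep merely stand in for arbitrary programs):

xterm &> /dev/null &    # start a program in the background, discarding output
sleep 100               # runs in the foreground; suspend it with Ctrl-z
bg                      # let the suspended sleep continue in the background
jobs                    # list all jobs started from this shell
fg %1                   # bring job number 1 back into the foreground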

Finally, there is one last key shortcut which saves a lot of typing over the years. Ctrl-d is equivalent to the exit command, which quits the shell. It has to be typed on an otherwise empty input line, or bash will complain. (This exit key is pretty much standard for other programs parsing terminal input, too.)

Advanced shell features

Input and output

UNIX systems are full of small programs which do nothing much taken on their own. What really makes UNIX systems powerful is the shell's features for making programs work together. Probably the most important of these are "pipes". If you execute two commands in one line separated by "|", the output of the first is taken as input for the second. The second program has to read from standard input (ie from what you would type on the terminal) for this to work. For instance, you can count the number of files in a directory like this:

ls -1 | wc -l
ls lists the files (and subdirectories) in the current directory. With the option -1, it prints only one file per line. wc is a program which counts the characters, words and lines of a file or of its input (if, as here, no file is given). -l tells wc to output only the number of lines. Since there is one file per line, the number of lines equals the number of files. See here for more examples of pipes.

Then you might want to write the output of a program into a file, for instance when a compiler or similar program produces more output than your terminal's scroll buffer holds. You could do this by piping its output into tee, which writes its input both to the terminal and to a file (see here for an example; view its manual page for more info). But in fact there is an easier way: type "> outfile" after the command (or pipe), and its output (the output of the last program of the pipe) will be written to the file outfile. This is called redirection of the standard output channel. You can append output to a file with the operator ">>". Input redirection is also possible, with "< infile". It feeds the contents of a file into a program as though the user had typed them. But since most programs can be made to read from files anyway, this is not needed nearly as often as output redirection.
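
For illustration (file.tex is a made-up name, and latex stands for any talkative program):

latex file.tex > latex.log        # write the output into the file latex.log
latex file.tex >> latex.log       # append to latex.log instead of overwriting
latex file.tex | tee latex.log    # display the output and save it at once
wc -l < file.tex                  # feed the file to wc as standard input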

It is also possible to pipe the output of other processes into programs which read only from a file, not from the standard (terminal) input. You just have to substitute the expression "<( command; )" for the file name. This also allows you to use the output of several commands simultaneously as input for another program. For instance, you can compare the contents of two directories by typing:

diff <( ls dir1; ) <( ls dir2; )

We have just learnt how to connect input and output channels of two or more independent programs. However, sometimes a program has to work on the file names output by a different program, ie its command line ought to contain the output of the other program. This is achieved by putting the second program in the first's command line, enclosed in backticks "`". Alternatively, "$(...)" can be used. I often use this in connection with which, which gives the full path of an executable. For instance,

ls -l `which sh`
lists details of the executable of the sh shell. (Nowadays it is just a symbolic link to bash.) Similar constructs allow you to apply programs to executables whose path you don't know. Another application of this feature is to apply an operation to a list of files in a text file. I do this for my backups, like this:
tar cfz backup.tgz `cat backupfiles`
The program tar creates an archive file containing all the files and directories given on its command line after the archive name. cat just writes the contents of a file on the terminal, or in this case into tar's command line. So tar will archive all the files listed in "backupfiles". The file names may even contain wildcard characters since the shell performs file globbing on the output of cat before passing the result on to tar.

Sequences of commands

Now what if you want to do several things in a row? The usual way would be to enter one command at a time, pressing Return every time. But this is not really efficient if you have to repeat the sequence several times: you would have to go back in the history by several items and press Return a number of times - better than typing it all over again, but there is a better way still. You can type several commands to be executed successively in one line, separated by semicolons. For instance, to compile a LaTeX file and then create a PostScript file from it, you could type:

latex file.tex; dvips -o file.ps file.dvi
However, this will execute dvips even when an error occurs while LaTeX processes your file - quite unnecessarily in that case. There is a way to make the execution of the second command dependent on the success of the first: just type "&&" between the two commands. This is advisable in particular in constructs such as "cd directory && rm -rf *" (not "cd directory; rm -rf *"!). If the directory in question does not exist, the construct with the semicolon will delete everything in the current directory instead, as a colleague of mine learnt to his cost (good thing there was an automatic backup).

The operator "||" executes the second command only if the first one failed, but it is used much less.
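
Both operators in brief:

cd directory && rm -rf *                # delete only if the cd succeeded
latex file.tex || echo "LaTeX failed"   # report only if the first command failed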

You can group commands by enclosing them in braces. You have to type a semicolon after the last command of the group. This feature is used relatively rarely; it is needed mainly for putting a sequence of commands in the background (ie letting it execute without blocking the terminal). An example is the UNIX way of reminding yourself when your tea is ready:

{ sleep 4m; echo $'\a' "Your tea's ready!"; } &
sleep waits for the given number of seconds or minutes ("m"). echo outputs its command line. The "$'\a'" is a string containing the alarm bell character; the shell allows C-style backslash character codes in strings started with "$'". So the terminal will beep when 240 seconds have elapsed. For the experienced UNIX hacker, programs like teatime or teacooker are quite superfluous ;).

Variables

Variables are useful for storing values and strings. A shell variable doesn't have to be declared; it is created by assigning a value to it, for instance "x=1". If the value of the variable is to be passed on to programs started from the shell, you have to export it with "export x". You can do both in one go by typing "export x=1". Exporting variables is particularly necessary when setting environment variables with a special meaning, such as PRINTER (default printer), PATH (list of executable search paths) or LESS (default options for less).
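
For instance (the printer name is just an example):

x=1                  # create a shell variable
export x             # pass it on to programs started from this shell
export PRINTER=lp    # both in one go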

You obtain the content of variables by putting a dollar sign in front of the name. Displaying a variable's value is usually done with the command echo which just prints everything passed to it in its command line. The shell substitutes the value of the variable for its name, and echo prints it. For instance type "echo $x" to find out the value of x.

Lots of useful things can be done with variables. For instance, the construct "${texfile%.tex}" removes the extension ".tex" from the file name contained in texfile. By appending a different extension you can replace one extension by another. The similar construct "${texfile#pre}" removes the prefix "pre" from the beginning of the file name. The section about loops contains some examples of this. You can also do a complete search-and-replace operation on the variable contents: "${variable/search/replace}" replaces just the first match, "${variable//search/replace}" all matches. Note that the contents of the variable remain unchanged in all of these constructs; just the text substituted for them differs from the variable content in the said way.
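
For instance (texfile and its contents are made up):

texfile=report.tex
echo ${texfile%.tex}.ps       # prints "report.ps"
echo ${texfile#re}            # prints "port.tex"
echo ${texfile/tex/doc}       # prints "report.doc" (first match only)
echo ${texfile//e/E}          # prints "rEport.tEx" (all matches)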

There is one more similar construct: "${!varname}" gives you the contents of the variable whose name is contained in varname. This is sometimes useful for choosing between the contents of several variables without if clauses. The section about functions contains an example.

Loops

Probably the most useful loop construct for everyday life is the for loop. It executes a sequence of commands for every item in a list, usually a list of files. Here is an example which renames all files starting with "alpha" to files starting with "beta":

for i in alpha*; do mv -i "$i" "beta${i#alpha}"; done
The for is immediately followed by the name of the loop variable (without "$"). Then comes the word "in", followed by the list. In this case, the shell creates the list itself by expanding the file name containing the wildcard "*". After the following semicolon, the word "do" introduces the list of commands to be executed for each file. In this case there is only one mv command in the list. Its -i option makes it ask the user for confirmation before overwriting anything. The new name for each file is constructed using bash's features for manipulating variable contents. Multiple commands in the for loop would have to be separated by semicolons. After the semicolon after the last command, the for instruction has to be completed with the word "done".

To see what a for loop will do before actually executing it, you can put an "echo" after the "do". Since the program echo prints out all that comes after it, the commands which would otherwise be executed are printed out on the terminal.
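
Applied to the renaming loop above, this prints the mv commands instead of executing them:

for i in alpha*; do echo mv -i "$i" "beta${i#alpha}"; done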

If you want to use more complex shell features like pipes in commands inside a for loop, you can enclose the command sequence in braces; remember the semicolon before the closing brace. For instance, you could use yes to automatically refuse the overwriting requests of mv if you want no files to be overwritten:

for i in *.out.ps; do { yes n | mv -i "$i" "${i%.out.ps}.ps"; }; done
If you want to display all .dvi files in the current directory, you could type:
for i in *.dvi; do { xdvi "$i" & }; done

Making your life easier

By now you can probably guess that bash's command syntax is a whole programming language with variables, conditionals, loops, subroutines and more. Indeed it is. And if you give complex commands in a programming language, you do not want to retype them every time you need them, but to put them into a program and just call that program whenever you want to perform a certain task. That can in fact be done, and this is where working with a shell pays off most. You can use the command language you are using every day to automate complex tasks, thereby saving yourself a lot of trouble. bash provides three ways of abbreviating commands: aliases, functions and scripts. Here they are, in increasing order of complexity and power:

Aliases

Aliases are the easiest way to save yourself work. They are small and ubiquitous and most often used to set default options for programs. As the name indicates, they serve as names for a command line consisting of several words, by which they are replaced before execution. The best-known alias is ll, my own definition of which is the following:

alias ll="ls -a -l --color=none"

At their simplest, aliases are mere abbreviations. For instance, I define

alias e=emacs
and can then call the emacs editor by just typing "e". The double quotes are not necessary when you define an alias as just one word.

You can also use pipes and input/output redirection in aliases. I use that to discard the output of the Konqueror browser of KDE when I launch it from a terminal (and at the same time save myself some typing):

alias kq="konqueror &> /dev/null"

Since almost any command can be redefined as an alias, you might want to check what it really is. You do that with the shell-built-in command type <command>. It tells you whether the command is an alias, a function, a built-in command or a file, which may be a script or an executable. In the case of an alias or a function (see next section), the definition is listed; if it is a file, the full path is given. You can remove an alias with the command unalias.
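
For instance (the exact wording of the output may differ between versions):

type ll      # ll is aliased to `ls -a -l --color=none'
type cd      # cd is a shell builtin
type bash    # bash is /bin/bash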

If you want aliases to be available immediately after you log in, you have to put their definition into your bashrc.

Functions

Aliases are fine for abbreviating frequently-used commands, but they soon reach their limits. Imagine you want to create a command that creates a new directory and immediately descends into it, ie executes mkdir and cd with the same argument. An alias can't do that since it can't duplicate its command line argument to use it for both mkdir and cd.

The next more complex labour-saving feature of bash is functions. They are defined with the keyword "function", followed by the name of the function with two parentheses "()" and the definition of the function, which consists of a sequence of commands enclosed in braces. For instance, our create-and-descend-into-directory function could be defined in the following way:

function mkcd() { mkdir "$1" && cd "$1"; }
What is new compared to aliases is the special shell variable with the name "1", dereferenced by "$1". It is the first command-line parameter. You can see it is used twice, once for mkdir and once for cd. I decided to make the cd conditional on the success of the mkdir, because if the directory could not be created, it makes no sense to try to descend into it. Putting the $1 in double quotes does not affect the expansion of the variable; it only changes the function's behaviour if the argument contains whitespace or is empty.

In this case the function definition contained only one compound command, or "list". Note that even so there has to be a semicolon after it. By the way, you can also type a function in several lines. bash knows that it is not finished until the last closing brace and will interpret earlier Return keys as line feeds. If you do that, you don't have to type semicolons at the end of each line.

While we are at it, let's define the converse of mkcd, the function which ascends one level in the directory hierarchy and removes the directory which we come from. To do that, we need the shell variable OLDPWD and some knowledge of what we can do with variables:

function rmcd() { cd .. && rmdir "${OLDPWD##*/}"; }
This function also consists of only one (conditional) compound command. The variable OLDPWD contains the full path of the previous directory. The construct around it is similar to one explained in the section about variables, but using wildcard characters. It removes everything preceding and including the last slash, leaving only the name of the directory.

Here's one more example of variable trickery: The following function will print its n'th command line argument if its first argument is the number n.

function printn() { echo ${!1}; }
The construct "${!1}" gives the value of the variable whose name is contained in the variable "1". Since the numbers are the names of the variables containing the command line arguments, this is the n'th command line argument if $1 is a number.

You can list functions with the command type, as for aliases. Also like aliases, functions are available only after they have been defined in a specific shell session, so if you want to have them always, write their definition into the bashrc.

Shell scripts

Shell scripts are programs for the shell. They are text files written in the shell's language. To be used as commands, they have to have their "executable" flag set and be located in one of the directories of the executable search path variable, PATH. Also, it is safer to put "#!/bin/sh" into the first line. This tells the shell explicitly that this is a shell script, which may be necessary on older UNIXes. (On most Linux systems, /bin/sh is a symbolic link to the executable of the bash shell, which behaves slightly differently when called as sh.)
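
As an illustration, a hypothetical one-line script named hello could be set up like this (assuming ~/bin is one of the directories in PATH):

echo '#!/bin/sh'          >  hello
echo 'echo Hello, world!' >> hello
chmod +x hello            # set the executable flag
mv hello ~/bin            # put it into a directory listed in PATH
hello                     # from now on this works as a command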

Writing shell scripts is similar to writing a function definition (and of course to typing shell input). In fact shell scripts are much more widespread and well-known than functions. As was the case for functions, the variables "1", "2",... contain the command line arguments, "@" all arguments, and "#" their number. In addition, there is the variable with the name "0" which contains the name of the command used to call the script. This is the name of the script unless it was called by way of a symbolical link (see manual page of ln). ($0 also exists in functions, but always expands to "bash" there.)
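
A tiny made-up script to illustrate these variables:

#!/bin/sh
# print how the script was called
echo "This is $0, called with $# argument(s): $@"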

Whether to write a function or a script is mostly a consideration of size - scripts are separate files, while function definitions are kept in bash's memory. However, there is one small difference: functions are executed by the shell from which they were called, while scripts are executed in a subshell. This means that variable assignments, alias and function definitions and cd commands only take effect inside the script; the shell from which a script is executed remains unaffected. Therefore the functions mkcd and rmcd defined above could not have been implemented as scripts. However, such examples are rare, and it is usually better to write scripts.

In a way it is unnecessary to give examples of shell scripts, since everything on this page could occur in them. However, just for illustration, I'll offer you two scripts. The first will help you to write shell scripts of your own; it creates files with their executable flag set, containing a line "#!/bin/sh" and an empty line. The lazy way to write shell scripts is to create such a file and then edit it. To fully understand the script, you have to get the documentation on the built-in commands while, if and test (which is equivalent to "[ ... ]") with the help command, or better, read bash's manual page.
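
The script itself is linked from the original page rather than reproduced here, but a minimal sketch matching its description (using while, if and test) might look like this:

#!/bin/sh
# create empty shell scripts, ready for editing
while [ $# -gt 0 ]; do
    if [ -e "$1" ]; then
        echo "$0: $1 already exists, skipping"
    else
        echo '#!/bin/sh' > "$1"
        echo >> "$1"
        chmod +x "$1"
    fi
    shift
done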

The second example script moves files starting with one prefix to files starting with another prefix but with the same tail as the original file. It also includes some error handling: it prints out a short documentation if called without parameters, complains if there are no source files and makes sure it never overwrites anything. The character "#" marks the start of a comment in bash's command language (and also in other script languages).
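
Again the script is linked rather than reproduced; a sketch with the described behaviour might look like this:

#!/bin/sh
# rename all files starting with one prefix to the same names with another
if [ $# -ne 2 ]; then
    echo "usage: $0 <oldprefix> <newprefix>"
    exit 1
fi
for i in "$1"*; do
    if [ ! -e "$i" ]; then            # the wildcard matched nothing
        echo "$0: no files starting with \"$1\""
        exit 1
    fi
    yes n | mv -i "$i" "$2${i#$1}"    # refuse all overwriting requests
done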

If you want to write shell scripts, you will have to look at bash's manual page frequently. One useful command-line option of bash itself which is not actually documented there (but mentioned in passing several times) is -n. bash -n <script> reads a script without actually executing any commands, but prints all messages concerning errors in the script. A further useful feature of bash, to be used in shell scripts rather than to test them, is the variable $$. It represents the process ID of the shell which executes the script and is a neat way of generating names for temporary files which a script may have to create: as each shell script is executed by a different instance of the shell, temporary files with $$ in their name will be named differently by each concurrently running instance of the shell script.
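
A sketch of the technique (the file names are made up):

#!/bin/sh
# sort a file in place, using a temporary file unique to this process
tmpfile=/tmp/sortfile.$$
sort "$1" > "$tmpfile" && mv "$tmpfile" "$1"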

There is one more thing one should know about command execution in bash. We have seen that there are many ways of defining a command, and it is quite possible that commands of different types have the same name. So if, for instance, a function and an executable have the same name, which will be executed? bash behaves in a predictable way when executing a command line. First, aliases are expanded. The first word resulting from alias expansion is the command to be executed. bash searches first for a function of that name, then for a built-in command, then for a (script or executable) file. The paths contained in the PATH variable are searched in order, and the first path in which the file exists is used. This can be overridden by giving a path explicitly: then the command is immediately recognised as a file, and aliases and functions are not used (nor are shell builtins). To disable alias expansion and exclude functions from the search, you can use the command "command". It executes the command given after it on its command line, but without taking aliases and functions into account.

A file is first considered an executable. If execution fails because it is not in executable format, it is assumed to be a shell script, unless the first line starts with "#!". In the latter case, the rest of the line is interpreted as the file name (with path) of an interpreter which should be used to execute this script.

Scripts for other programs

We have learnt at the end of the previous section that bash treats script files whose first line starts with "#!" specially: The file name after these letters is executed with the name of the script as its argument. This offers the possibility to write scripts for interpreters other than bash. The first line which is needed only for finding the right interpreter is ignored by the interpreter itself because the character "#" starts a comment in most script languages. Interpreters I have used in this context include Perl, bc and Gnuplot.
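
For instance, a made-up Perl script might look like this (the interpreter path may differ on your system):

#!/usr/bin/perl
# print the doubles of all numbers given on the command line
foreach (@ARGV) { print 2 * $_, "\n"; }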

What if your favourite interpreter has a comment character different from "#"? Then it will choke on the first line starting with "#!" which names the interpreter executable. There is another way to write scripts for other programs, which also overcomes another drawback of some interpreters - their inability to pass command-line options on to their script. This way is called "here documents". They are a sort of input redirection and are invoked using the operator "<<". Within an ordinary shell script, this operator followed by a key word (usually "EOF") will make all following lines in the script be used as input for the command in its line. The end of the input for the program is indicated by a line containing only the key word. The great thing is that shell variables can be used in the input text. For instance, you could write a script for bc computing the square root of a number in this way:

#!/bin/sh

bc -l << EOF
sqrt($1)
EOF
The "$1" is replaced by the first command line argument of the script just as it would in a shell script. (To obtain a literal "$", you have to type "\$", to obtain a backslash, "\\".) There is another example of a here document in my section about bc.

Here documents also make it possible to put two scripts for different interpreters together in one file. I have used that to execute some calculation with Mathematica (in batch mode) and to plot the result with Gnuplot in one go. Schematically it works as follows:

#!/bin/sh

datafile=$1.dat
plotfile=$1.ps

math << EOF || { echo "Mathematica error! Aborting...";  exit 1; }
...
Calculations depending on $2, $3, ...
...
outtab = Table[<expression>, <range>];
Export["$datafile", outtab, "Table"]
EOF

gnuplot << EOF
set terminal postscript landscape
set output "$plotfile"
plot "$datafile" with lines
EOF
The line calling the Mathematica kernel shows how the usual shell constructs can be used on top of here documents. If the kernel returns an error, a message is printed and the script is aborted before calling the plot program. (This requires that math return a non-zero exit code on error and zero when all went well; most command-line programs under UNIX do that.) The words "$datafile" and "$plotfile" in the two here documents are replaced by the file names contained in the respective variables.

In the real script I wrote for doing some calculations and plots, I do some more preprocessing of arguments and options at the start. The resulting parameters for the calculations are saved in shell variables which can be used in the here document. The file names for the look-up table and the plot are automatically constructed from the parameters, so that each file's name reflects its content. Shell scripts combined with here documents really give you a powerful tool for using text-based programs.

Customising

The configuration file for the shell bash is called "bashrc". There is a global one in the directory where all global configuration files reside, /etc/bashrc. And in addition, there is one in each user's home directory named ".bashrc". (The dot in front usually hides it from directory listings so the user doesn't see all kinds of cryptic files.)

These files are not so much configuration files with a syntax of their own, but rather scripts in bash's language which are executed when bash starts up. Things to put into your personal bashrc include aliases and function definitions, user settings like the umask, and assignments to environment variables. Various programs use environment variables for default settings, eg PRINTER (the default printer), LESS (default options for less), CDR_DEVICE (default device for cdrecord) and so on.

Besides, there are a number of environment variables which influence the behaviour of bash itself. No doubt the most important one is PATH, which contains the paths to be searched for an executable, separated by colons. If you want to include an additional path, you should assign "PATH=/your/new/path:$PATH" so that the paths already contained in PATH are retained. The paths are searched in order, so an executable in one of the first paths may mask an eponymous one in a later one. The only antidote is giving the path explicitly. If you suspect the wrong program is called, use the which command.

An environment variable very useful for making yourself at home is PS1, the command prompt. It is a string containing special backslash characters which act as placeholders: "\h" for the host name, "\u" for the user name, "\w" for the current working directory, "\t" for the current time, "\#" for the command number and so on.
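
For example, the following definition (with made-up user and host names):

PS1='\u@\h:\w\$ '    # yields prompts like "jim@wombat:~/latex$ "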

Other variables affect command history. HISTSIZE is the number of commands remembered in the history. The variable HISTCONTROL allows you to stop repeated commands being entered into history ("ignoredups"), to exclude commands preceded by a space (to give you the choice; "ignorespace") or both ("ignoreboth").
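
For instance:

HISTSIZE=1000             # remember the last 1000 commands
HISTCONTROL=ignoreboth    # skip duplicates and space-prefixed commands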

A customising command usually put into the bashrc is the shell built-in command set. It allows you to choose between logical and physical representations of the current working directory, ie with or without symbolic links (set -P). The -b option enables immediate reporting of the exit status of a job. Further fine-tuning of bash's behaviour can be done with the command shopt.
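
For instance:

set -P                 # show physical paths, without symbolic links
set -b                 # report the status of background jobs immediately
shopt -s histappend    # one example shopt option: append to, rather than
                       # overwrite, the history file on exit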

When writing shell scripts, it is important to know that bash doesn't read the bashrc when it is called as sh. This is usually the desired behaviour since it speeds up script execution, but it means you can't use your personal aliases. You could of course change the first line of the script to "#!/bin/bash", but it is usually better to write self-contained scripts that do not depend on definitions in a particular bashrc.

If you made a change to your bashrc and want to test the result, use the source command. It makes bash read and interpret a file written in its command language. Unlike usual "executable" shell scripts, this file is not executed in a subshell but in the shell that executes the source command. It also does not need to have its executable flag set or contain "#!/bin/sh" in its first line. The source command can be abbreviated with a single dot ".".
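
For instance, after editing your bashrc:

source ~/.bashrc    # or, using the abbreviation:  . ~/.bashrc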

It has been mentioned above that bash uses the so-called "readline" library for its input line. This library has configuration files of its own. The global one is /etc/inputrc, the user-specific one .inputrc in the user's home directory. It can contain settings for various readline variables described in the manual page of bash. "set show-all-if-ambiguous on" makes possible completions be printed immediately rather than only after the second Tab. Setting visible-stats to on will mark completions according to their file type. Enabling print-completions-horizontally will make readline fill rows before columns when printing completions. Besides, the inputrc can contain customised key bindings, which may be bound to either readline commands or macros.
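
A sample .inputrc with the settings just mentioned plus one example key binding (the binding is only an illustration):

set show-all-if-ambiguous on
set visible-stats on
set print-completions-horizontally on
"\e[5~": history-search-backward    # make PageUp search the history for the
                                    # line beginning typed so far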

Further Reading

A brief description of built-in shell commands is obtained with the command "help <command>". If you need information beyond that (and beyond what is on this page), only bash's man page can help you. The best way to read it is the following:

  1. Open an xterm (or similar)
  2. Select the smallest font which doesn't strain your eyes
  3. If you use KDE or FVWM, click on the maximise button of the xterm window with the middle mouse button to maximise it vertically
  4. Type "man bash"
It really is worth it - all 5087 lines of it :)!
