
Advanced BASH Exercises

First published on: 1 January 2016

BASH is the most widely-used and widely-supported shell for Linux. There are other shells that are better than BASH in various ways, but we feel that none of these other shells are better enough to warrant replacing BASH as the de-facto standard when writing user shell scripts. NetBSD and Debian use a stripped-down shell as the default for system scripts, which start with #!/bin/sh. BASH is preferred for user scripts, which start with #! /usr/bin/env bash. BASH is installed by default on almost all Unix-based operating systems, and the majority of the world’s shell scripts are written in BASH. For this reason, we suggest that all of our developers learn standard shell, starting with BASH.

BASH is a domain-specific programming language that is well-suited to managing processes and files. That being said, the large number of special characters appropriated for process management, its text expansions, and its unusual syntax make BASH poorly-suited for general purpose programming. Accordingly, we think that BASH should only be used for scripts that are predominantly concerned with processes and files.

BASH is designed to be fast and convenient, and many commands must be typed frequently into the terminal. Thus, the design goals for BASH scripts (being terse and convenient) clash with those for code written in a general-purpose language (being general and readable). You can read some more discussion about this topic in our article on programming languages. It is worth keeping these design constraints in mind when learning BASH and writing scripts.

In this post, we provide a set of exercises that should help you solidify your knowledge of BASH. Note that these are NOT introductory level questions, and they assume that you are starting with a working knowledge of Linux and BASH. The questions focus on features of BASH that

Background Reading

Before attempting to answer the questions, we recommend that you read the following resources:

Problems

What is the difference between the .bashrc, .bash_profile, and .profile files?

The .bashrc file is sourced on startup for non-login, interactive shells.

The .bash_profile file is sourced for login, interactive shells.

The .profile file is sourced by login shells if there is no .bash_profile or .bash_login file.

You can read about the difference between the different ways of invoking shells on this stack overflow question.

It is a best practice to have a quiet .bashrc. Programs that tunnel over an SSH connection can be brittle with unexpected output. Since BASH sources .bashrc when it is started by the SSH daemon to run a remote command, and rsync implements its protocol over an SSH shell, rsync can fail when the destination system’s ~/.bashrc is noisy.

It is a common practice to have a relatively bare .bash_profile which:

  1. handles login-specific customizations
  2. sources the .bashrc.

Typically, any commands that you want sourced when you start a non-login interactive shell, you also want sourced when you start a login interactive shell, so by sourcing one file from the other you avoid code duplication.

Also note that the .profile file is used by the older sh command (which BASH can emulate).
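The layout described above can be sketched as follows. This is a minimal illustration rather than a recommended configuration: the files bashrc and bash_profile in a temporary directory, and the DEMO_HOME variable, are stand-ins for the real dotfiles so that the example does not touch your own setup.

```shell
#!/usr/bin/env bash
# Sketch only: demo files in a temporary directory stand in for ~/.bashrc
# and ~/.bash_profile. DEMO_HOME is a hypothetical variable for this demo.
demo_home=$(mktemp -d)

cat > "$demo_home/bashrc" <<'EOF'
# Stop here unless the shell is interactive ($- contains "i").
case $- in
  *i*) ;;
  *) return ;;
esac
echo "Welcome back!"   # noisy line, only reached by interactive shells
EOF

cat > "$demo_home/bash_profile" <<'EOF'
# Login-specific customizations would go here, then the rc file is reused.
[ -f "$DEMO_HOME/bashrc" ] && . "$DEMO_HOME/bashrc"
EOF

# Source the profile from a non-interactive shell: nothing should be printed.
noise=$(DEMO_HOME="$demo_home" bash -c '. "$DEMO_HOME/bash_profile"')
echo "output from the non-interactive shell: '${noise}'"
rm -rf "$demo_home"
```

The early return is one common way to keep the rc file quiet for tools such as rsync, since anything noisy is placed after the interactivity check.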


What is the difference between an environment variable and a variable and how are they related?

An environment variable is an operating-system construct (in Linux, an array of strings that by convention follow the form “key=value”).

A variable is a BASH construct.

A BASH variable stores a value, and has attributes. For example, attributes describe whether the variable is read-only, what its “type” is, whether it transforms its input to all uppercase, etc. There is also an attribute on each BASH variable indicating whether it should be exported to new processes. Variables that are marked for export will be turned into environment variables when new processes are executed. Furthermore, all of the environment variables that are present when BASH is started are turned into BASH variables marked for export, so the default behavior is to pass along all environment variables to child processes.
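A short sketch of the export attribute in action. The variable names demo_plain and demo_shared are made up for this illustration.

```shell
#!/usr/bin/env bash
demo_plain="shell only"          # ordinary BASH variable, not exported
export demo_shared="exported"    # BASH variable with the export attribute

# A child process only receives the exported variable in its environment.
child_view=$(bash -c 'echo "plain=${demo_plain-unset} shared=${demo_shared-unset}"')
echo "$child_view"               # → plain=unset shared=exported

# declare -p prints each variable together with its attributes
# (-x marks a variable for export).
declare -p demo_plain demo_shared
```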


What is a builtin command? Which of the following commands are builtins: cd, bash, echo, ls, exit, kill, [, [[? Why is it useful to know which commands are builtins?

A builtin command is a command that can be executed within the BASH process. cd, echo, exit, kill, and [ are builtins, while bash and ls are separate executables. [[ is actually a syntactical construct (a shell keyword) and is not a command. You can read about the difference between [ and [[ here.

Forking a new process is a relatively expensive operation, so shells will often re-implement common Unix commands internally to avoid the overhead of spawning new processes. Using builtin commands instead of separate processes can improve performance.
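One way to check how BASH classifies each of these words is the type builtin with its -t option. The sketch below assumes none of the names are shadowed by aliases or functions in the current shell.

```shell
#!/usr/bin/env bash
# type -t prints one word per name: alias, keyword, function, builtin, or file.
for cmd in cd bash echo ls exit kill '[' '[['; do
  printf '%-4s -> %s\n' "$cmd" "$(type -t "$cmd")"
done
```

External programs such as bash and ls are reported as file, which means running them requires a new process.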


Explain in detail how BASH determines what to execute when you run a simple command, such as find . -name '*.py'.

When you execute a simple command—the smallest executable unit defined by BASH—the first word is the command to be executed.

Commands can be

  1. the relative or absolute path to an executable file
  2. a function
  3. a builtin
  4. the name of an executable file that exists in one of the directories specified in the PATH variable.

Commands are resolved in this order. Note that BASH determines if a command is a relative or absolute path based on the presence of a “/” in the word. Thus, if you want to run an executable in the current directory, you must run ./myexecutable instead of myexecutable, as the latter form will resolve as an executable on the PATH.

So what is the PATH variable?

A little background first. It is very convenient to be able to run commands in the same manner regardless of the current working directory, because it is annoying and fragile to have to update relative paths when you move a script’s location, and absolute paths are very long and system specific. For this reason Linux provides a mechanism to search through a standard set of directories when finding an executable (e.g. “/bin”, “/sbin”, “/usr/bin”) [1]. The PATH environment variable is a colon-separated list of these standard directories. When searching for the executable, BASH will look through each directory in the PATH for an executable with the command name, stopping at the first match. Thus, the entries at the beginning of the PATH environment variable have higher precedence.

Note that BASH uses a cache to speed up how quickly it finds commands; this allows it to avoid performing a full PATH search every time a command is invoked. You can print out this cache using the hash builtin.

[1]: Although this mechanism is built into Linux (see the man page for “execvp”), it is also built into most shells.
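The resolution order can be observed with a small experiment. In this sketch, greet is a hypothetical command name and the temporary directory is created only for the demonstration.

```shell
#!/usr/bin/env bash
# Create a throwaway executable called "greet" and put its directory first on PATH.
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho "greet: executable on PATH"\n' > "$demo_dir/greet"
chmod +x "$demo_dir/greet"
PATH="$demo_dir:$PATH"

greet                              # no slash, no function, no builtin: found via PATH
from_path=$(greet)
hash -t greet                      # print the location BASH cached for greet

greet() { echo "greet: function"; }
from_function=$(greet)             # a function takes precedence over the PATH entry
from_slash=$("$demo_dir/greet")    # a word containing "/" is run as a path directly

unset -f greet
type -a echo                       # list every definition of echo that BASH knows about
rm -rf "$demo_dir"
```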


Briefly explain how BASH uses the PS1 and PS2 variables, and why they are useful.

PS1 and PS2 are used by bash to format the prompt that is displayed to the user when running BASH interactively. There are a variety of special characters that expand to things like the date, current working directory, the host computer’s name, etc. PS1 is the standard prompt, PS2 is the prompt that is displayed if BASH needs more input to complete a command (e.g. if you hit RETURN with a hanging parenthesis).
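As a small illustration, the following lines could go in a .bashrc. The particular layout is only an assumption chosen for the example.

```shell
# Illustrative prompt settings (the layout itself is just an example).
# \t = current time, \u = user name, \h = host name, \w = working directory,
# \$ = "#" for root and "$" for everyone else.
PS1='[\t] \u@\h:\w\$ '
# Shown while BASH waits for the rest of an incomplete command.
PS2='... '
```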


What is readline? What is the name of the file that allows you to customize its functionality?

Readline is the library used by BASH to read input from the user in interactive mode. Readline provides lots of shortcuts (Emacs style by default, but you can turn on Vi style as well) and functionality for inputting text. It can be customized using the .inputrc file.

One great advantage of using readline for capturing text is that users can apply consistent customizations to many different interactive tools. For example, if you define a readline shortcut in your .inputrc file, it will work when running BASH, Python, and MySQL because all of these tools use readline.
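To give a feel for the file format, the sketch below writes a small example configuration to a temporary file and prints it. The chosen settings are just examples, and the escape sequences for the arrow keys are the ones most terminals send, which is an assumption about your terminal.

```shell
#!/usr/bin/env bash
# Write an example readline configuration to a temporary file.
inputrc_demo=$(mktemp)
cat > "$inputrc_demo" <<'EOF'
# Use Vi-style key bindings instead of the Emacs-style default.
set editing-mode vi
# Up/Down arrows search history for lines starting with the text typed so far.
"\e[A": history-search-backward
"\e[B": history-search-forward
EOF
cat "$inputrc_demo"
```

Readline reads ~/.inputrc by default, so in practice these lines would live there. The INPUTRC environment variable can point readline at a different file.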


What is the difference between & and &&, | and ||?

The && and || operators are similar to short-circuiting logic operators in most programming languages; however, because the operands are commands, they act on whether the command was successful or not. For example, if you run:

a && b

b will only run if a was successful (the exit status of a was zero). Similarly,

c || d

will only execute d if c was not successful (the exit status of c was non-zero).

The | operator is used to pipe the standard output of one command into the standard input of the next command.

The & operator indicates that the previous command (or pipeline of commands) should be run in the background.

Thus, && and & are not related and || and | are not related.
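The four operators can be compared side by side. This sketch uses true and false as stand-ins for commands that succeed and fail.

```shell
#!/usr/bin/env bash
out=$(
  true  && echo "1: b ran because a succeeded"
  false && echo "2: never printed"
  false || echo "3: d ran because c failed"
  true  || echo "4: never printed"
  # | connects stdout of one command to stdin of the next.
  printf 'pear\napple\n' | sort | head -n 1
)
echo "$out"

# & starts the command in the background and returns immediately.
sleep 0.2 &
bg_pid=$!          # process ID of the most recent background job
wait "$bg_pid"     # block until it finishes; wait returns its exit status
bg_status=$?
echo "background job exited with status $bg_status"
```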


Is there a simple way to exclude a command from your BASH history (e.g. because you don’t want a password in your history file)?

Yes. Assuming the HISTCONTROL variable includes ignorespace (or is set to ignoreboth), prepending a command with a space will exclude it from your BASH history.
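A typical setting, which would normally go in your .bashrc, might look like this:

```shell
# ignorespace: do not save lines that begin with a space.
# ignoredups:  do not save a line identical to the previous history entry.
# ignoreboth:  shorthand for both of the above.
HISTCONTROL=ignoreboth
```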


Explain what the following exit statuses mean to BASH: 0, 3, 103, 203, and 303.

An exit status of

  • 0 indicates success
  • 3 and 103 both mean the process failed, although details as to why it failed are program-specific
  • 203 means the process was terminated by signal 75 (exit statuses above 128 mean 128 plus the signal number), although as of now, there is no meaning assigned to signal 75 (see manual for “signal”)
  • 303 is not a valid exit status.
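These rules can be checked from the shell itself. The sketch below uses subshells and a background sleep as stand-ins for real programs, and assumes SIGTERM is signal 15, as it is on Linux.

```shell
#!/usr/bin/env bash
(exit 3)
status_small=$?                  # 3: a program-specific failure

(exit 303)
status_wrapped=$?                # only the low 8 bits are kept: 303 % 256 = 47

sleep 30 &                       # a process we terminate with a signal
kill -TERM "$!"
wait "$!"
status_signal=$?                 # 128 + 15 (SIGTERM) = 143

# kill -l translates such a status back into a signal name.
signal_name=$(kill -l "$status_signal")
echo "$status_small $status_wrapped $status_signal $signal_name"
```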


What is wrong with the command git commit -m "Make `SomeClass` auditable", and what is a simple way to fix it?

Backticks will trigger command substitution, attempting to replace the text “SomeClass” with the output of running it as a command. To avoid this type of expansion, single quotes should be used. Single quotes and double quotes are both useful to avoid the need to escape special characters (such as spaces); however, single quotes will also disable expansions (such as command substitution).
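The difference between the two kinds of quotes can be seen without involving git at all. The variable names below are only for illustration.

```shell
#!/usr/bin/env bash
# Single quotes: backticks are kept literally.
single='Make `SomeClass` auditable'

# Double quotes: the backticks must be escaped to be kept literally.
escaped="Make \`SomeClass\` auditable"

# Double quotes without escaping: the backticks run a command.
substituted="The answer is `echo 42`"

echo "$single"
echo "$escaped"
echo "$substituted"
```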


Briefly describe what a signal and a trap are, and what the relevant BASH commands are to manipulate them.

A signal is a form of inter-process communication wherein one process can send a signal to another process. Signals contain no informational content besides a single number (usually between 1 and 128). A trap is a command (often a function call) that is executed when a process receives a specific signal. The bash kill command is used to send signals. The bash trap command is used to set traps.
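A minimal sketch of both commands working together. It assumes BASH 4 or later for the BASHPID variable, and the handler bodies are placeholders.

```shell
#!/usr/bin/env bash
got_usr1=no
trap 'got_usr1=yes' USR1      # run this command when SIGUSR1 arrives
kill -USR1 "$BASHPID"         # send SIGUSR1 to the current shell
echo "handler ran: $got_usr1"
trap - USR1                   # restore the default handling of SIGUSR1

# The special EXIT trap is not a real signal; it runs when the (sub)shell exits,
# which makes it a common place for cleanup code.
cleanup_log=$(trap 'echo "cleaning up"' EXIT; echo "working")
echo "$cleanup_log"
```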


How many processes are created by BASH when you execute: a | b | c && d || e | f? Explain your reasoning in detail.

It is not possible to know a-priori because

  1. We don’t know if any of the commands are functions
  2. We don’t know the exit status of the various commands.

If any of the commands were functions they could create zero or more processes during their execution.

Assuming that all of the commands are actual executables, then the number of processes created by BASH will depend on the exit status of the various commands. It is useful to group them by precedence, as such:

{ a | b | c; } && { d; } || { e | f; }
  • If c fails, 5 processes will be created (a, b, and c, then e and f, because d is skipped and the || branch runs)
  • If c and d succeed, 4 processes will be created
  • If c succeeds and d fails, 6 processes will be created.

Note that a or b failing will not cause the first pipeline to fail unless the pipefail option is enabled (which is often not a bad idea if you don’t want any of the commands in the pipe to fail silently).
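To see which commands actually run in each case, the list can be traced with stand-in functions. This is only a sketch: a through f are hypothetical functions that log their own name, so it shows which commands are executed rather than counting real processes.

```shell
#!/usr/bin/env bash
# trace C_STATUS D_STATUS: run the list with c and d returning the given
# statuses, then print the names of the commands that ran.
trace() {
  local c_status=$1 d_status=$2 log
  log=$(mktemp)
  a() { echo a >> "$log"; }
  b() { cat > /dev/null; echo b >> "$log"; }
  c() { cat > /dev/null; echo c >> "$log"; return "$c_status"; }
  d() { echo d >> "$log"; return "$d_status"; }
  e() { echo e >> "$log"; }
  f() { cat > /dev/null; echo f >> "$log"; }

  a | b | c && d || e | f

  # Pipeline members finish in no fixed order, so sort before printing.
  sort "$log" | tr -d '\n'
  echo
  rm -f "$log"
}

trace 1 0   # c fails:               a b c, then e f
trace 0 0   # c and d succeed:       a b c d
trace 0 1   # c succeeds, d fails:   a b c d, then e f
```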


What does set -e do, and why is it often a good idea to include in BASH scripts?

set -e will cause BASH to exit immediately if a command (or a pipeline, or list of commands, or a compound command) exits with a non-zero status code.

This is almost always a good shell option to set in BASH scripts, because it ensures that errors are explicitly handled. If this option is not set, it is exceedingly easy for errors to go unnoticed. For example, imagine the simple script:

  a
  b
  c

Usually c will depend on a or b executing properly; however, unless BASH is set to exit immediately, errors in a or b will not prevent c from running. The script could be altered as follows:

  a && \
  b && \
  c

However, it is not usually so straightforward to watch for exit statuses, and it is safer and more explicit to set -e at the top of the script.
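The effect is easy to observe by running the same tiny script with and without the option. Here false stands in for a failing command.

```shell
#!/usr/bin/env bash
script='false; echo "c ran"'

# Without set -e the failure is ignored and the echo still runs.
without_e=$(bash -c "$script")

# With set -e the shell exits at the failing command, using its status.
if with_e=$(bash -c "set -e; $script"); then
  with_e_status=0
else
  with_e_status=$?
fi

# A failure that is tested with || (or &&, if, while) does not trigger the exit.
handled=$(bash -c 'set -e; false || echo "handled"; echo "still running"')

echo "without set -e: '$without_e'"
echo "with set -e:    '$with_e' (status $with_e_status)"
echo "$handled"
```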


Describe the purpose of BASH’s expansion rules. List all the kinds of expansion BASH performs before executing a command, in order of how they are performed, and provide an example demonstrating the text before and after the expansion.

BASH is designed to be terse, so that command-line users who often need to type the same commands many times over don’t need to type more characters than necessary. In addition to providing a terse syntax that is especially well-suited for managing processes, BASH also provides several textual expansions that can greatly reduce the amount of typing necessary to execute certain commands. An expansion is a rule that transforms input text into other (usually longer) text, hence the name expansion.

The shell (e.g. BASH) is an appropriate place to provide text expansions, because if they were handled at the application layer, they would likely be subtly if not completely different between applications, and would be confusing for users.

There are certainly downsides to expansion. If you are unaware that expansion will occur on a given input, your input may be expanded into something undesirable. For this reason, it is good to know the different expansions that are performed by BASH so that you can escape expansions you don’t want it to perform.

Note that most shells have similar or identical expansions (e.g. zsh or fish).

In the order in which the expansions are performed by BASH, they are

  1. Alias Expansion

    If you have read the BASH man page, you may notice that it does not consider the “alias” system to be a type of expansion, however as far as we can tell, it is like all the other expansions in that it is typically used as a shortcut and it replaces text with (usually) more text.

    alias gc='git commit'
    gc -m "Add an alias for committing code"
    git commit -m "Add an alias for committing code"
    
  2. Brace Expansion

    Brace expansion is extremely useful because frequently we are passing in two variations on a long filename into a command.

    mv hello{,_kitty}.txt
    mv hello.txt hello_kitty.txt
    
    touch test_file_{000..100}.txt
    touch test_file_000.txt test_file_001.txt # ...
    
  3. Tilde Expansion

    cd ~/Desktop
    cd /home/david/Desktop
    
  4. Parameter and Variable Expansion

    world='New York'
    
    echo "Hello ${world}"
    echo "Hello New York"
    
  5. Command Substitution

    In command substitution, the standard output of a command replaces its invocation. There are two forms, $(command) and `command`; the $(command) syntax is preferred.

    . $(brew --prefix)/etc/bash_completion
    . /usr/local/etc/bash_completion
    
    echo "I am running `uname`"
    echo "I am running Darwin"
    
  6. Arithmetic Expansion

    BASH is not well-suited for general purpose programming; nevertheless, it is occasionally useful to perform some simple arithmetic.

    echo "2 + 2 = $((2 + 2))"
    echo "2 + 2 = 4"
    
  7. Process Substitution

    Not to be confused with command substitution, process substitution is useful when you are dealing with a program that can only take a file as an input or output. Process substitution creates a named pipe (sort of like a “fake temporary file”) which can be passed to the program.

    diff <(ls ./today) <(ls ./yesterday)
    diff /dev/fd/63 /dev/fd/62
    

    Note that the specific names used for the output of the two commands will vary each time. As an added bonus, the named pipe is destroyed once it is no longer being used.

  8. Word Splitting

    Word splitting isn’t useful in its own right, but rather is a sort of cleanup to make parameter expansion, command substitution, and arithmetic expansion act as expected. You can read more about word splitting here.

  9. Pathname Expansion

    Probably the most commonly used expansion is pathname expansion. Oftentimes you want to run a command against all files in a directory, or all files whose filenames match a certain pattern. Pathname expansion lets us avoid writing out each filename specifically. Note once again how convenient it is that this functionality is present at the shell layer instead of the application layer.

    touch a.py b.py c.yml ab.py
    
    wc *.py
    wc a.py ab.py b.py
    
    wc *b*.py
    wc ab.py b.py
    
  10. Job Specification

    Different jobs can run simultaneously in a single shell environment. The shell provides a user interface to one foreground job, while background jobs don’t have an interface. &, as seen above, runs a command in the background. You may not see it again until it finishes, when BASH politely informs you. However, if a background job opens stdin for input, BASH will suspend it until you deal with that program’s interface.

    sleep 45 &
    [1]
    

    You’ll get your prompt back as sleep 45 does important work behind the scenes.

    %1 will expand to specify the long-running job. We only care if sleep exits unsuccessfully, so the following syntax will wait for sleep, blocking the terminal:

    %1 || echo 'sleep failed!'
    

    By default, this expansion is not allowed in non-interactive mode; it must be explicitly enabled with set -m. This is non-standard, but unrelated to the Unofficial Bash Strict Mode.

NOTE: Expansions 4-7 occur at the same precedence level, and are evaluated from left-to-right in a single pass.
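The ordering has practical consequences. The sketch below shows two of them; the variable and function names are chosen only for the example.

```shell
#!/usr/bin/env bash
n=3
# Brace expansion runs before parameter expansion. When the braces are
# processed, {1..$n} is not a valid sequence, so it is left as literal text
# and only $n is replaced afterwards.
brace_first=$(echo {1..$n})          # → {1..3}
brace_literal=$(echo {1..3})         # → 1 2 3

# Word splitting applies to the results of unquoted expansions only.
names='New York'
count_words() { echo "$#"; }         # prints the number of arguments received
unquoted=$(count_words $names)       # → 2
quoted=$(count_words "$names")       # → 1

# Quoting also suppresses pathname expansion.
star=$(echo "*.py")                  # → *.py

echo "$brace_first | $brace_literal | $unquoted | $quoted | $star"
```

When a variable upper bound is needed, an arithmetic for loop such as for ((i = 1; i <= n; i++)) is one alternative to the brace sequence.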






