Advanced BASH Exercises

 January 01, 2016

Software

BASH is the most widely-used and widely-supported shell for Linux. There are other shells that are better than BASH in various ways, but we feel that none of these other shells are better enough to warrant replacing BASH as the de-facto standard when writing user shell scripts. NetBSD and Debian use a stripped-down shell as the default for system scripts, which start with #!/bin/sh. BASH is preferred for user scripts, which start with #! /usr/bin/env bash. BASH is installed by default on almost all Unix-based operating systems, and the majority of the world’s shell scripts are written in BASH. For this reason, we suggest that all of our developers learn standard shell, starting with BASH.

BASH scripts are a domain-specific programming language that is well-suited to managing processes and files. That being said, the large number of special characters appropriated for process management, its text expansions, and its unusual syntax make BASH poorly-suited for general purpose programming. Accordingly, we think that BASH should only be used for scripts that are predominantly concerned with processes and files.

BASH is designed to be fast and convenient, and many commands must be typed frequently into the terminal. Thus, the design goals for BASH scripts (being terse and convenient) clash with those for code written in a general-purpose language (being general and readable). You can read some more discussion about this topic in our article on programming languages. It is worth keeping these design constraints in mind when learning BASH and writing scripts.

In this post, we provide a set of exercises that should help you solidify your knowledge of BASH. Note that these are NOT introductory level questions, and they assume that you are starting with a working knowledge of Linux and BASH. The questions focus on features of BASH that

  • Commonly cause confusion
  • Are very useful to know
  • Are commonly encountered when reading or writing simple BASH scripts.

Background reading

Before attempting to answer the questions, we recommend that you read the following resources:

Problems

To learn as much as possible from these exercises, write your responses before revealing the provided answers. If any exercises seem irrelevant, you can skip them and instead write a justification as to why they are unimportant. These justifications will help us improve the lesson for future employees.

Exercise 1

What is the difference between the .bashrc, .bash_profile, and .profile files?

Answer

The .bashrc file is only sourced on startup for non-login, interactive shells.

The .bash_profile file is only sourced for login, interactive shells.

The .profile file is sourced by a login shell only if there is no .bash_profile or .bash_login file.

You can read about the different ways of invoking shells in this Stack Overflow question. Note that the exact order differs from the man page’s account; Peter Ward’s diagram is useful.

It is a best practice to have a quiet .bashrc. Programs that tunnel over an SSH connection can be brittle when they encounter unexpected output. When BASH is started by the SSH daemon to run a remote command, it sources .bashrc even though it has no controlling terminal. rsync implements its protocol over an SSH connection, so rsync will receive unexpected text when the destination system’s ~/.bashrc is noisy.

It is a common practice to have a relatively bare .bash_profile which:

  1. sources the .bashrc.
  2. handles login-specific customizations, especially non-standard BASH specifics, like completion.

Typically, any commands that you want to run when you start a non-login interactive shell you also want to run when you start a login interactive shell, so sourcing one file from the other avoids code duplication.

Also note that the .profile file is used by POSIX shell (which BASH can emulate).
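
A minimal sketch of a quiet .bashrc, assuming the file lives in the home directory. The is_interactive helper is a hypothetical name introduced here for illustration; it is not something BASH provides.

```bash
# ~/.bash_profile would contain little more than:
#     [ -f ~/.bashrc ] && . ~/.bashrc

# ~/.bashrc (sketch). is_interactive is a hypothetical helper name.
is_interactive() {
    case $- in
        *i*) return 0 ;;   # "i" appears in $- only for interactive shells
        *)   return 1 ;;
    esac
}

if is_interactive; then
    # Safe to print here: a person is looking at the terminal.
    echo "Welcome back, ${USER:-user}"
fi
# Everything after this point should stay silent so that rsync and
# similar tools never see unexpected output.
```

Checking $- is one common way to detect an interactive shell; it keeps the greeting out of sessions that are started only to run a remote command.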

Exercise 2

What is the difference between an environment variable and a variable and how are they related?

Answer

An environment variable is an operating-system construct (in Linux, an array of strings that by convention follow the form “key=value”).

A variable is a shell construct. Basic POSIX shell scripts rely on subshells to manipulate variable values. BASH has extended capabilities to manipulate variables without spawning subshells.

A BASH variable stores a value and has attributes. For example, attributes describe whether the variable is read-only, what its “type” is, whether it transforms its input to all uppercase, etc. There is also an attribute on BASH variables indicating whether they should be exported to new processes. Variables that are marked for export will be turned into environment variables when new processes are executed. Furthermore, all of the environment variables that are present when the shell is started are turned into BASH variables marked for export, so the default behavior is to pass along all environment variables to child processes.
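
The export attribute can be observed by asking a child process what it sees. This is a small sketch; the variable names plain_var, exported_var, and child_view are made up for the example.

```bash
plain_var="only in this shell"              # ordinary shell variable
export exported_var="visible to children"   # marked for export

# Single quotes delay expansion so that the child shell does the lookup.
child_view=$(bash -c 'echo "plain=[${plain_var}] exported=[${exported_var}]"')
echo "$child_view"   # plain=[] exported=[visible to children]

# declare -p shows the attributes; -x marks an exported variable.
declare -p plain_var exported_var
```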

Exercise 3

What is a builtin command? Which of the following commands are builtins: cd, bash, echo, ls, exit, kill, [, [[? Why is it useful to know which commands are builtins?

Answer

A builtin command is a command that BASH executes itself, within the shell process, rather than by running a separate program. cd, echo, exit, kill, and [ are builtins; bash and ls are external executables. Builtins take precedence over executables found through PATH resolution, so the presence of /usr/bin/echo and /usr/bin/[ is notable. which echo will report /usr/bin/echo, but issuing echo cakewalk won’t call /usr/bin/echo. Try it, then run hash to see if that’s true. Why do both exist? The standalone /usr/bin/echo and /usr/bin/[ let programs that do not go through BASH, and shells that lack these builtins, run the same commands. [[ is actually a syntactical construct (a shell keyword) and is not a command. You can read about the difference between [ and [[ here.

Forking a new process is a relatively expensive operation, so shells often re-implement common Unix commands internally to avoid the overhead of spawning new processes. Using builtin commands instead of separate processes can improve performance. Some commands, such as cd and exit, must be builtins because they change the state of the shell process itself.
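
One way to check the answer yourself is the type builtin. The loop below is a sketch; the results for ls and bash depend on what is installed on your system, so no output is shown for them.

```bash
for cmd in cd echo exit kill '[' '[[' ls bash; do
    # type -t prints one word: alias, keyword, function, builtin, or file
    printf '%-5s %s\n' "$cmd" "$(type -t "$cmd")"
done
```

In a stock BASH session, cd, echo, exit, kill, and [ are reported as builtin, while [[ is reported as keyword.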

Exercise 4

Explain in detail how BASH determines what to execute when you run a simple command, such as find . -name '*.py'.

Answer

When you execute a simple command—the smallest executable unit defined by BASH—the first word is the command to be executed.

Commands can be

  1. the relative or absolute path to an executable file
  2. a function
  3. a builtin
  4. the name of an executable file that exists in one of the directories specified in the PATH variable.

Commands are resolved in this order. Note that BASH determines if a command is a relative or absolute path based on the presence of a “/” in the word. Thus, if you want to run an executable in the current directory, you must run ./myexecutable instead of myexecutable, as the latter form will resolve as an executable on the PATH.

So what is the PATH variable?

A little background first. It is very convenient to be able to run commands in the same manner regardless of the current working directory, because it is annoying and fragile to have to update relative paths when you move a script’s location, and absolute paths are very long and system specific. For this reason Linux provides a mechanism to search through a standard set of directories when finding an executable (e.g. “/bin”, “/sbin”, “/usr/bin”) [1]. The PATH environment variable is a colon-separated list of these standard directories. When searching for the executable, BASH will look through each directory in the PATH for an executable with the command name, stopping at the first match. Thus, the entries at the beginning of the PATH environment variable have higher precedence.

Note that BASH uses a cache to speed up how quickly it finds commands; this allows it to avoid performing a full PATH search every time a command is invoked. You can print out this cache using the hash builtin.

[1]: Although this mechanism is built into Linux (see the man page for “execvp”), it is also built into most shells.
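
The resolution order can be observed by deliberately shadowing a builtin with a function. This is only a sketch: redefining echo is done here for demonstration and is not something to keep in a real script. The variable names are made up for the example.

```bash
# A function named echo shadows the builtin of the same name.
echo() { builtin echo "function: $*"; }

via_function=$(echo hi)          # function: hi
via_builtin=$(builtin echo hi)   # hi
via_command=$(command echo hi)   # hi (command skips functions only)

unset -f echo                    # remove the function again

# type -a lists every candidate BASH knows about for the name.
type -a echo
```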

Exercise 5

Briefly explain how BASH uses the PS1 and PS2 variables, and why they are useful.

Answer

PS1 and PS2 are used by BASH only to format the prompt that is displayed to the user when running BASH interactively. There are a variety of special characters that expand to things like the date, current working directory, the host computer’s name, etc. PS1 is the standard prompt; PS2 is the prompt that is displayed if BASH needs more input to complete a command (e.g. if you hit RETURN with a hanging parenthesis).

If PS1 is unset, then the shell is running non-interactively.
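
As an illustration, the following settings could go in a .bashrc. The particular layout of the prompt is just one possible choice, not a recommendation.

```bash
# \u user, \h host, \w working directory, \$ is "#" for root and "$" otherwise
PS1='\u@\h:\w\$ '

# Shown when a command is incomplete, e.g. after an unclosed quote
PS2='continue> '
```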

Exercise 6

What is readline? What is the name of the file that allows you to customize its functionality?

Answer

Readline is the library used by BASH to read input from the user in interactive mode. Readline provides lots of shortcuts (Emacs style by default, but you can turn on Vi style as well) and functionality for inputting text. It can be customized using the .inputrc file.

One great advantage of using readline for capturing text is that users can apply consistent customizations to many different interactive tools. For example, if you define a readline shortcut in your .inputrc file, it will work when running BASH, Python, and MySQL because all of these tools use readline.

Exercise 7

What is the difference between & and &&, | and ||?

Answer

The && and || operators are similar to short-circuiting logic operators in most programming languages; however, because the operands are commands, they act on whether the command was successful or not. For example, if you run:

a && b

b will only run if a was successful (the exit status of a was zero). Similarly,

c || d

will only execute d if c was not successful (the exit status of c was non-zero).

The | operator is used to pipe the standard output of one command into the standard input of the next command.

The & operator indicates that the previous command (or pipeline of commands) should be run in the background.

Thus, && and & are not related and || and | are not related.
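
All four operators can be seen side by side in a short sketch. The variable names are invented for the example.

```bash
true  && and_ran=yes        # right side runs because true exited with 0
false && and_skipped=yes    # right side is skipped
false || or_ran=yes         # right side runs because false exited non-zero

# | connects the stdout of each command to the stdin of the next.
first_sorted=$(printf 'b\na\n' | sort | head -n 1)   # a

# & starts the command in the background; $! holds its process ID.
sleep 1 &
bg_pid=$!
wait "$bg_pid"              # block until the background job finishes
bg_status=$?                # 0
```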

Exercise 8

Is there a simple way to exclude a command from your BASH history (e.g. because you don’t want a password in your history file)?

Answer

Yes. Assuming the HISTCONTROL variable contains ignorespace (or ignoreboth), prepending a command with a space will exclude it from your BASH history.
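
A one-line sketch of the setting, as it might appear in a .bashrc:

```bash
# ignorespace drops commands that start with a space;
# ignoreboth adds ignoredups (drop consecutive duplicates) as well.
HISTCONTROL=ignoreboth
```

With this in place, typing a space before a command keeps that command out of the history.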

Exercise 9

Explain what the following exit statuses mean to BASH: 0, 3, 103, 203, and 303.

Answer

An exit status of

  • 0 indicates success
  • 3 and 103 both mean the process failed, although details as to why it failed are program-specific
  • 203 means the process was terminated by signal 75 (203 - 128), although as of now, there is no meaning assigned to signal 75 (see manual for “signal”)
  • 303 is not a valid exit status. It would overflow to exit status 47 (303 - 256).

Traditionally, user-facing programs used exit status numbers between 10 and 31 to indicate failure.
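
Both the overflow and the 128-plus-signal convention can be reproduced with child shells. This sketch uses SIGTERM (signal 15) rather than the unassigned signal 75; the variable names are invented for the example.

```bash
wrapped=0
signaled=0

# exit keeps only the low 8 bits of its argument: 303 - 256 = 47
bash -c 'exit 303' || wrapped=$?

# The child shell sends SIGTERM to itself; BASH reports 128 + 15
bash -c 'kill -TERM $$' || signaled=$?

echo "wrapped=$wrapped signaled=$signaled"   # wrapped=47 signaled=143

# kill -l translates such a status back into a signal name
signal_name=$(kill -l "$signaled")           # TERM
```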

Exercise 10

What is wrong with the command git commit -m "Make `SomeClass` auditable", and what is a simple way to fix it?

Answer

Backticks will trigger command substitution, even inside double quotes, attempting to replace the text “SomeClass” with the output of running it as a command. To avoid this type of expansion, single quotes should be used: git commit -m 'Make `SomeClass` auditable'. Single quotes and double quotes are both useful to avoid the need to escape special characters (such as spaces); however, single quotes will also disable expansions (such as command substitution).
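
The two safe spellings of the message can be compared without involving git at all. The variable names here are made up for the example.

```bash
# Single quotes: nothing inside is expanded, so backticks stay literal.
single='Make `SomeClass` auditable'

# Double quotes also work if each backtick is escaped.
escaped="Make \`SomeClass\` auditable"

printf '%s\n' "$single"   # Make `SomeClass` auditable
```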

Exercise 11

Briefly describe what signals and traps are, and what the relevant shell commands are to manipulate them.

Answer

A signal is a form of POSIX inter-process communication (IPC) wherein one process can send a signal to another process. Those processes don’t need to be shell scripts: a Python process can receive a signal from a Java process. Signals carry no informational content besides a single number (usually between 1 and 128). A trap is a callback that is executed when a process receives a specific signal. The POSIX kill command is used to send signals. In POSIX shell and BASH, the trap command is used to set traps. The trap action is a string of shell commands, which can simply be a call to a function.
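
A short sketch of kill and trap working together, in which the shell signals itself. The handler name on_usr1 is hypothetical, and BASHPID assumes BASH 4 or newer.

```bash
received=no
on_usr1() { received=yes; }    # hypothetical handler name

trap on_usr1 USR1              # register the handler for SIGUSR1
kill -USR1 "$BASHPID"          # send SIGUSR1 to this very shell
echo "received=$received"      # received=yes

trap - USR1                    # restore the default disposition
```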

Exercise 12

How many processes are created by the shell when you execute: a | b | c && d || e | f? Explain your reasoning in detail.

Answer

It is not possible to know a priori because

  1. We don’t know if any of the commands are functions
  2. We don’t know the exit status of the various commands.

If any of the commands were functions they could create zero or more processes during their execution.

Assuming that all of the commands are actual executables, then the number of processes created by BASH will depend on the exit status of the various commands. It is useful to group them by precedence, as such:

{ a | b | c; } && { d; } || { e | f; }

  • If c fails, d is skipped but e | f still runs, so 5 processes will be created
  • If c and d succeed, 4 processes will be created
  • If c succeeds and d fails, 6 processes will be created.

Note that a or b failing will not cause the first pipeline to fail unless the pipefail option is enabled (which is often a good idea if you don’t want any of the commands in the pipe to fail silently).
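
The effect of pipefail can be seen with the true and false commands standing in for real pipeline stages. The variable names are invented for the example.

```bash
without=0
with=0

false | true || without=$?    # pipeline status is that of true: stays 0

set -o pipefail
false | true || with=$?       # now the failing false is reported: 1
set +o pipefail

echo "without=$without with=$with"   # without=0 with=1
```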

Exercise 13

What happens if you type “CTRL-Z” while waiting for the above command to complete?

Exercise 14

What happens to a background process when BASH closes?

Exercise 15

What is the difference between BASH options and shell options? Explain how you can view, set, and unset shell options. How about BASH options?

Exercise 16

What do the following shell options do: autocd, cdspell, cmdhist, globstar, histappend

Exercise 17

When is the BASH history written, and how can you manually tell it to update? (This can be useful when you want to have access to command history in another BASH session.)

Exercise 18

What does set -e do, and why is it often a good idea to include in shell scripts?

Answer

set -e will cause the shell to exit immediately if a command (or a pipeline, or list of commands, or a compound command) exits with a non-zero status code.

This is almost always a good shell option to set in shell scripts, because it ensures that errors are explicitly handled. If this option is not set, it is exceedingly easy for errors to go unnoticed. For example, imagine the simple script:

  a
  b
  c

Usually c will depend on a or b executing properly; however, unless the shell is set to exit immediately, errors in a or b will not prevent c from running. The script could be altered as follows:

  a && \
  b && \
  c

However, it is not usually so straightforward to watch for exit statuses, and it is safer and more explicit to just set -e at the top of the script.
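
The difference is easy to see by running the same three commands with and without set -e. In this sketch, echo a, false, and echo c stand in for the a, b, and c above; the variable names are made up for the example.

```bash
status_plain=0
status_strict=0
script='echo a; false; echo c'   # hypothetical three-step script

plain=$(bash -c "$script") || status_plain=$?
strict=$(bash -c "set -e; $script") || status_strict=$?

printf 'plain:  %s (status %d)\n' "${plain//$'\n'/ }" "$status_plain"
printf 'strict: %s (status %d)\n' "${strict//$'\n'/ }" "$status_strict"
# plain:  a c (status 0)
# strict: a (status 1)
```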

Exercise 19

Describe the purpose of BASH’s expansion rules. List all the kinds of expansion BASH performs before executing a command, in order of how they are performed, and provide an example demonstrating the text before and after the expansion.

Answer

BASH is designed to be terse, so that command-line users, who often need to type the same commands many times over, don’t need to type more characters than necessary. In addition to providing a terse syntax that is especially well-suited for managing processes, BASH also provides several textual expansions that can greatly reduce the amount of typing necessary to execute certain commands. An expansion is a rule that transforms input text into other (usually longer) text, hence the name expansion.

The shell is an appropriate place to provide text expansions, because if they were handled at the application layer, they would likely be subtly if not completely different between applications, and would be confusing for users.

There are certainly downsides to expansion. If you are unaware that expansion will occur on a given input, your input may be expanded into something undesirable. For this reason, it is good to know the different expansions that are performed by BASH so that you can escape expansions you don’t want it to perform.

Note that most shells have similar or identical expansions (e.g. zsh or fish).

In the order in which the expansions are performed by BASH, they are

  1. Alias Expansion

    Alias handling differs between shells, so BASH’s behavior may not match that of other shells. In BASH, aliases are expanded only in interactive shells unless the expand_aliases shell option is set.

    After variable value specification and redirect token detection, if the first word is an alias, then it’s substituted in place. Under the hood, the order matters because an alias may include a space. That space needs to be tokenized to separate the command from further arguments. Consequently, aliases to paths that contain a space don’t work as expected. Alternative shells reproduce this behaviour to follow BASH.

    alias gc='git commit'
    gc -m "Add an alias for committing code"
    git commit -m "Add an alias for committing code"
    
  2. Brace Expansion

    Brace expansion is extremely useful because frequently we are passing in two variations on a long filename into a command.

    mv hello{,_kitty}.txt
    mv hello.txt hello_kitty.txt
    
    touch test_file_{000..100}.txt
    touch test_file_000.txt test_file_001.txt # ...
    
  3. Tilde Expansion

    cd ~/Desktop
    cd /home/david/Desktop
    
  4. Parameter and Variable Expansion

    world='New York'
    
    echo "Hello ${world}"
    echo "Hello New York"
    
    Note that parameter substitution can have the side effect of changing values:
    
    echo ${I_was_unset='not anymore'}
    
  5. Command Substitution

    There are two forms of command substitution, in which the standard output of a command replaces its invocation.

    $( the better way )
    `backticks`
    
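    The former syntax is preferred because it nests cleanly. For example, $( printf '% 30s' "$( date )" ) returns the date padded to 30 characters. `printf '% 30s' `date`` does not, because the inner backticks end the outer substitution unless they are escaped.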
    
    . $(brew --prefix)/etc/bash_completion
    . /usr/local/etc/bash_completion
    
    echo "I am running `uname`"
    echo "I am running Darwin"
    
  6. Arithmetic Expansion

    Arithmetic expansion in BASH is confined to a specific double-paren notation. The original Bourne shell did not support it at all. Moreover, BASH can only operate on integers. Contrast this with general purpose languages, which emphasize the ease of numerical operations; this is one reason why BASH is avoided for general purpose use.

    echo "2 + 2 = $((2 + 2))"
    echo "2 + 2 = 4"
    

    Inside arithmetic substitution, variables do not need to be referenced with $.

    declare -i foo # foo is an integer, and presumed to be 0
    echo "foo used to be $((foo++))"
    # foo now has the integer value 1

  7. Process Substitution

    This is specific to BASH, and not found in POSIX. Not to be confused with command substitution, process substitution is useful when you are dealing with a program that can only take a file as an input or output. Process substitution creates a named pipe (sort of like a “fake temporary file”) which can be passed to the program.

    diff <(ls ./today) <(ls ./yesterday)
    diff /dev/fd/63 /dev/fd/62
    

    Note that the specific names used for the output of the two commands will vary each time. As an added bonus, the named pipe is destroyed once it is no longer being used.

  8. Word Splitting

    Word splitting isn’t useful in its own right, but rather is a sort of cleanup to make parameter expansion, command substitution, and arithmetic expansion act as expected. You can read more about word splitting here.

  9. Pathname Expansion

    Probably the most commonly used expansion is pathname expansion. Often you want to run a command against all files in a directory, or all files whose filenames match a certain pattern. Pathname expansion lets us avoid writing out each filename specifically. Note once again how convenient it is that this functionality is present at the shell layer instead of the application layer.

    touch a.py b.py c.yml ab.py
    
    wc *.py
    wc a.py ab.py b.py
    
    wc *b*.py
    wc ab.py b.py
    
  10. Job Specification

    Different jobs can run simultaneously in a single shell environment, as shown by jobs. The shell provides a user interface to one foreground job, while background jobs don’t have an interface. &, as seen above, runs a command in the background. You may not see it again until it finishes, when BASH politely informs you. However, if a background job opens stdin for input, BASH will suspend it until you deal with that program’s interface.

    sleep 45 &
    [1]
    

    You’ll get your prompt back as sleep 45 does important work behind the scenes.

    %1 will expand to specify the long-running job. We only care if sleep exits unsuccessfully, so the following syntax will wait for sleep, blocking the terminal:

    %1 || echo 'sleep failed!'
    

    By default, this expansion is not allowed in non-interactive mode; it can be enabled with set -m. This is non-standard and rarely useful. See the commands disown and wait for scripting.

NOTE: Expansions 4-7 occur at the same precedence level, and are evaluated from left-to-right in a single pass.
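
Several of the expansions listed above can be checked in one short script. This is a sketch rather than a complete tour: the variable names are made up, the zero-padded brace range assumes BASH 4 or newer, and the word-splitting lines assume the default IFS.

```bash
# Brace expansion happens early and needs no matching files.
braces=$(echo hello{,_kitty}.txt)     # hello.txt hello_kitty.txt
files=(test_file_{000..100}.txt)      # zero padding assumes BASH 4 or newer
echo "${#files[@]} names generated"   # 101 names generated

# Parameter expansion with = assigns a default as a side effect.
unset maybe_unset
: "${maybe_unset=not anymore}"        # maybe_unset now holds "not anymore"

# Arithmetic expansion: integers only, no $ needed on variable names.
unset foo
declare -i foo
before=$((foo++))                     # before is 0, foo is now 1

# Word splitting applies to unquoted expansions only.
words='a b  c'
set -- $words;   unquoted_count=$#    # 3
set -- "$words"; quoted_count=$#      # 1

# Process substitution hands diff two file-like paths.
if diff <(printf 'x\n') <(printf 'x\n') > /dev/null; then
    same=yes
else
    same=no
fi
```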
