BASH is the most widely used and widely supported shell for Linux. There are other shells that are better than BASH in various ways, but we feel that none of them is sufficiently better to warrant replacing BASH as the de facto standard when writing user shell scripts. NetBSD and Debian use a stripped-down shell as the default for system scripts, which start with #!/bin/sh. BASH is preferred for user scripts, which start with #! /usr/bin/env bash. BASH is installed by default on almost all Unix-based operating systems, and the majority of the world’s shell scripts are written in BASH. For this reason, we suggest that all of our developers learn standard shell, starting with BASH.
BASH scripts are a domain-specific programming language that is well-suited to managing processes and files. That being said, the large number of special characters appropriated for process management, its text expansions, and its unusual syntax make BASH poorly-suited for general purpose programming. Accordingly, we think that BASH should only be used for scripts that are predominantly concerned with processes and files.
BASH is designed to be fast and convenient, and many commands must be typed frequently into the terminal. Thus, the design goals for BASH scripts (being terse and convenient) clash with those for code written in a general-purpose language (being general and readable). You can read some more discussion about this topic in our article on programming languages. It is worth keeping these design constraints in mind when learning BASH and writing scripts.
In this post, we provide a set of exercises that should help you solidify your knowledge of BASH. Note that these are NOT introductory level questions, and they assume that you are starting with a working knowledge of Linux and BASH. The questions focus on features of BASH that:
commonly cause confusion
are very useful to know
are commonly encountered when reading or writing simple BASH scripts.
To learn as much as possible from these exercises, write your responses before revealing the provided answers. If any exercises seem irrelevant, you can skip them and instead write a justification as to why they are unimportant. These justifications will help us improve the lesson for future employees.
Exercise 1
What is the difference between the .bashrc, .bash_profile, and .profile files?
Answer
The .bashrc file is only sourced on startup for non-login, interactive shells.
The .bash_profile file is only sourced for login, interactive shells.
The .profile file is sourced by a login shell only if there is no .bash_profile or .bash_login file.
You can read about the different ways of invoking shells on this Stack Overflow question. Note that the exact order differs from the man page’s account; Peter Ward’s diagram is useful.
It is a best practice to have a quiet .bashrc. Programs that tunnel over an SSH connection can be brittle with unexpected output. When BASH detects that it was started by sshd to run a remote command, it sources .bashrc even though the shell is non-interactive and has no controlling terminal. rsync implements its protocol over an SSH shell, so rsync will receive unexpected text when the destination system’s ~/.bashrc is noisy.
It is a common practice to have a relatively bare .bash_profile which:
sources the .bashrc.
handles login-specific customizations, especially non-standard BASH specifics, like completion.
Typically, any commands that you want to be sourced when you start a non-login interactive shell, you also want to be sourced when you start a login interactive shell, so by sourcing the one from the other, you avoid code duplication.
Also note that the .profile file is used by POSIX shell (which BASH can emulate).
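The arrangement described above can be sketched as two short files. This is a minimal illustration under common conventions, not a prescribed layout; the interactive check using $- is one of several ways to keep a .bashrc quiet.

```bash
# ~/.bash_profile (read by login shells)
# Reuse everything from ~/.bashrc so both kinds of shell behave alike.
if [ -f "$HOME/.bashrc" ]; then
    . "$HOME/.bashrc"
fi
# Login-only customizations would follow here.

# ~/.bashrc (read by interactive non-login shells)
# Stop early when the shell is not interactive, so tools such as rsync
# never see unexpected output.
case $- in
    *i*) ;;        # interactive: carry on with the rest of the file
    *) return ;;   # non-interactive: stay silent
esac
```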
Exercise 2
What is the difference between an environment variable and a variable and how are they related?
Answer
An environment variable is an operating-system construct (in Linux, an array of strings that by convention follow the form “key=value”).
A variable is a shell construct. Basic POSIX shell scripts rely on subshells to manipulate variable values. BASH has extended capabilities to manipulate variables without spawning subshells.
A BASH variable stores a value, and has attributes. For example, attributes describe whether the variable is read-only, what its “type” is, whether it transforms its input to all uppercase, etc. There is also an attribute on BASH variables indicating whether they should be exported to new processes. Variables that are marked to be exported will be turned into environment variables when new processes are executed. Furthermore, all of the environment variables that are present when the shell is started, are turned into BASH variables marked for export, so the default behavior is to pass along all environment variables to child processes.
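A small sketch of the export attribute in action. The variable names plain and shared are made up for illustration; the child process is simply another bash started with -c.

```bash
plain='shell only'             # ordinary shell variable, not exported
export shared='passed along'   # variable carrying the export attribute

# A child process only sees variables that were marked for export.
child_plain=$(bash -c 'echo "${plain-unset}"')
child_shared=$(bash -c 'echo "${shared-unset}"')

echo "plain in child:  $child_plain"
echo "shared in child: $child_shared"
```

Running declare -p shared afterwards shows the -x attribute that marks the variable for export.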
Exercise 3
What is a builtin command? Which of the following commands are builtins: cd, bash, echo, ls, exit, kill, [, [[? Why is it useful to know which commands are builtins?
Answer
A builtin command is a command that is executed within the BASH process itself. cd, echo, exit, kill, and [ are builtins. Builtins take priority over executables found through PATH resolution, so the presence of /usr/bin/echo and /usr/bin/[ is notable. which echo will indicate that /usr/bin/echo is provided, but issuing echo cakewalk won’t call /usr/bin/echo. Try it, then run hash to see if that’s true. Why? Well, /usr/bin/echo and /usr/bin/[ are compatibility replacements for legacy systems.
[[ is actually a syntactical construct and is not a command. You can read about the difference between [ and [[ here.
Forking a new process is a relatively expensive operation, so shells will often re-implement common Unix commands inside themselves to avoid the overhead of spawning new processes. Using builtin commands instead of separate processes can improve performance.
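One way to check the answer yourself is the type builtin, which reports how BASH would interpret a word. The loop below is only a sketch; the result for ls can differ on systems where it is aliased.

```bash
# Report how BASH classifies each word from the exercise.
for cmd in cd echo exit kill '[' '[[' ls bash; do
    printf '%-5s %s\n' "$cmd" "$(type -t "$cmd")"
done

# List every definition of echo, in the order BASH would consider them.
type -a echo
```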
Exercise 4
Explain in detail how BASH determines what to execute when you run a simple command, such as find . -name '*.py'.
Answer
When you execute a simple command—the smallest executable unit defined by BASH—the first word is the command to be executed.
Commands can be
the relative or absolute path to an executable file
a function
a builtin
the name of an executable file that exists in one of the directories specified in the PATH variable.
Commands are resolved in this order. Note that BASH determines if a command is a relative or absolute path based on the presence of a “/” in the word. Thus, if you want to run an executable in the current directory, you must run ./myexecutable instead of myexecutable, as the latter form will resolve as an executable on the PATH.
So what is the PATH variable?
A little background first. It is very convenient to be able to run commands in the same manner regardless of the current working directory, because it is annoying and fragile to have to update relative paths when you move a script’s location, and absolute paths are very long and system specific. For this reason Linux provides a mechanism to search through a standard set of directories when finding an executable (e.g. “/bin”, “/sbin”, “/usr/bin”) [1]. The PATH environment variable is a colon-separated list of these standard directories. When searching for the executable, BASH will look through each directory in the PATH for an executable with the command name, stopping at the first match. Thus, the entries at the beginning of the PATH environment variable have higher precedence.
Note that BASH uses a cache to speed up how quickly it finds commands; this allows it to avoid performing a full PATH search every time a command is invoked. You can print out this cache using the hash builtin.
[1]: Although this mechanism is built into Linux (see the man page for “execvp”), it is also built into most shells.
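The precedence of functions over builtins can be observed with a throwaway function. This sketch temporarily shadows echo, which is done purely for illustration and is not something you would do in a real script.

```bash
before=$(type -t echo)       # echo resolves to the builtin

# Define a function with the same name; it now wins the lookup.
echo() { builtin echo "function saw: $*"; }
after=$(type -t echo)
output=$(echo hello)

unset -f echo                # remove the function again
echo "before=$before after=$after output=$output"

# Show the remaining candidates: the builtin first, then PATH entries.
type -a echo
```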
Exercise 5
Briefly explain how BASH uses the PS1 and PS2 variables, and why they are useful.
Answer
PS1 and PS2 are used by BASH only to format the prompt that is displayed to the user when running BASH interactively. There are a variety of special characters that expand to things like the date, current working directory, the host computer’s name, etc. PS1 is the standard prompt, PS2 is the prompt that is displayed if BASH needs more input to complete a command (e.g. if you hit RETURN with a hanging parenthesis).
If PS1 is unset, then the shell is running non-interactively.
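As a small illustration, the assignments below could go in a .bashrc. The particular layout is an arbitrary choice; \d, \u, \h, \w, and \$ are prompt escapes documented in the BASH manual.

```bash
# Primary prompt: [date] user@host:working-directory followed by $ (or # for root).
PS1='[\d] \u@\h:\w\$ '

# Continuation prompt shown while BASH waits for the rest of a command.
PS2='> '
```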
Exercise 6
What is readline? What is the name of the file that allows you to customize its functionality?
Answer
Readline is the library used by BASH to read input from the user in interactive mode. Readline provides lots of shortcuts (Emacs style by default, but you can turn on Vi style as well) and functionality for inputting text. It can be customized using the .inputrc file.
One great advantage to using readline for capturing text, is that users can apply consistent customizations to many different interactive tools. For example, if you define a readline shortcut in your .inputrc file, it will work when running BASH, Python, and MySQL because all of these tools use readline.
Exercise 7
What is the difference between & and &&, | and ||?
Answer
The && and || operators are similar to short-circuiting logic operators in most programming languages. However, because the operands are commands, they act on whether the command was successful or not. For example, if you run:
a && b
b will only run if a was successful (the exit status of a was zero). Similarly,
c || d
will only execute d if c was not successful (the exit status of c was non-zero).
The | operator is used to pipe the standard output of one command into the standard input of the next command.
The & operator indicates that the previous command (or pipeline of commands) should be run in the background.
Thus, && and & are not related and || and | are not related.
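The four operators can be seen side by side in a short sketch. The variable names are made up for the example.

```bash
# && runs the right-hand side only when the left-hand side succeeds.
true && and_result='ran'

# || runs the right-hand side only when the left-hand side fails.
false || or_result='ran'

# | connects stdout of one command to stdin of the next.
piped=$(printf 'b\na\n' | sort | head -n 1)

# & starts the command in the background; $! holds its process ID.
sleep 1 &
bg_pid=$!
wait "$bg_pid"
bg_status=$?

echo "$and_result $or_result $piped $bg_status"
```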
Exercise 8
Is there a simple way to exclude a command from your BASH history (e.g. because you don’t want a password in your history file)?
Answer
Yes. Assuming the HISTCONTROL variable is set appropriately (its value must include ignorespace or ignoreboth), prepending a command with a space will exclude it from your BASH history.
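A minimal sketch of the setting, typically placed in .bashrc. The command shown in the comment is a made-up example.

```bash
# Skip history entries for commands that begin with a space.
# (ignoreboth would additionally skip consecutive duplicates.)
HISTCONTROL=ignorespace

# In an interactive session, a command typed with a leading space,
# for example " export API_TOKEN=secret", is then left out of the history.
```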
Exercise 9
Explain what the following exit statuses mean to BASH: 0, 3, 103, 203, and 303.
Answer
An exit status of
0 indicates success
3 and 103 both mean the process failed, although details as to why it failed are program-specific
203 means the process was stopped due to signal 75, although as of now, there is no meaning assigned to signal 75 (see manual for “signal”)
303 is not a valid exit status. It would overflow to exit status 47.
Traditionally, user-facing programs used exit status numbers between 10 and 31 to indicate failure.
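These rules can be checked from the shell itself. The sketch below uses subshells to produce specific statuses, and SIGTERM (signal 15) stands in for the signal case because signal 75 cannot be sent on typical systems.

```bash
true;        ok=$?          # success
(exit 3);    small=$?       # program-specific failure
(exit 303);  wrapped=$?     # only the low 8 bits survive: 303 % 256

# A child that terminates itself with SIGTERM reports 128 + 15.
bash -c 'kill -TERM $$'
signalled=$?

echo "ok=$ok small=$small wrapped=$wrapped signalled=$signalled"
```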
Exercise 10
What is wrong with the command git commit -m "Make `SomeClass` auditable", and what is a simple way to fix it?
Answer
Backticks will trigger command substitution, attempting to replace the text “SomeClass” with the output of running it as a command. To avoid this type of expansion, single quotes should be used. Single quotes and double quotes are both useful to avoid the need to escape special characters (such as spaces), but single quotes will also disable expansions (such as command substitution).
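The two safe spellings can be compared directly. The sketch stores the messages in variables rather than calling git, so it can be run anywhere.

```bash
# Single quotes: nothing inside is expanded, backticks stay literal.
single='Make `SomeClass` auditable'

# Double quotes also work if each backtick is escaped.
escaped="Make \`SomeClass\` auditable"

echo "$single"
echo "$escaped"
```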
Exercise 11
Briefly describe what signals and traps are, and which shell commands are used to manipulate them.
Answer
A signal is a form of POSIX inter-process-communication (IPC) wherein one process can send a signal to another process. Those processes don’t need to be shell scripts. A Python process can receive a signal from a Java process. Signals contain no informational content besides a single number (usually between 1 and 128). A trap is a callback expression that is executed when a process receives a specific signal. The POSIX kill command is used to send signals. In POSIX shell and BASH, the trap command is used to set traps. The trap action is a string of shell code, which can simply be the name of a function to call.
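A minimal sketch of setting a trap and triggering it. The handler name on_usr1 is made up, and BASHPID assumes BASH 4 or later; it is used so the signal reaches the current shell even when the code runs inside a subshell.

```bash
# Handler: record that the signal arrived.
on_usr1() { received='USR1'; }

trap on_usr1 USR1          # install the trap
kill -USR1 "$BASHPID"      # send the signal to this very shell
trap - USR1                # restore the default action

echo "received: $received"
```

A common use of the same mechanism is trap 'rm -f "$tmpfile"' EXIT, which cleans up a temporary file however the script ends.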
Exercise 12
How many processes are created by the shell when you execute: a | b | c && d || e | f? Explain your reasoning in detail.
Answer
It is not possible to know a-priori because
We don’t know if any of the commands are functions
We don’t know the exit status of the various commands.
If any of the commands were functions they could create zero or more processes during their execution.
Assuming that all of the commands are actual executables, then the number of processes created by BASH will depend on the exit status of the various commands. It is useful to group them by precedence, as such:
{ a | b | c; } && { d; } || { e | f; }
If c fails, 5 processes will be created (d is skipped, but e | f still runs because the && list failed)
If c and d succeed, 4 processes will be created
If c succeeds and d fails, 6 processes will be created.
Note that a or b failing will not cause the first pipeline to fail unless the pipefail option is enabled (which is often a good idea if you don’t want any of the commands in the pipe to fail silently).
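The grouping can be explored with stand-in commands. In this sketch, run is a hypothetical helper that records its name in a temporary log and returns a chosen status, so counting log lines shows how many of a through f were started.

```bash
# Print how many of the six commands start, given the exit statuses of c and d.
count_started() {
    local c_status=$1 d_status=$2 log
    log=$(mktemp)
    # Stand-in command: log the name ($1), then return the requested status ($2).
    run() { echo "$1" >> "$log"; return "$2"; }

    run a 0 | run b 0 | run c "$c_status" && run d "$d_status" || run e 0 | run f 0

    grep -c '' "$log"      # number of commands that were started
    rm -f "$log"
}

echo "c fails:              $(count_started 1 0)"
echo "c and d succeed:      $(count_started 0 0)"
echo "c succeeds, d fails:  $(count_started 0 1)"
```

Because run is a function, the pipeline members here run in subshells rather than as separate executables, but the control flow is the same.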
Exercise 13
What happens if you type “CTRL-Z” while waiting for the above command to complete?
Exercise 14
What happens to a background process when BASH closes?
Exercise 15
What is the difference between BASH options and shell options? Explain how you can view, set, and unset shell options and BASH options.
Exercise 16
What do the following shell options do: autocd, cdspell, cmdhist, globstar, histappend?
Exercise 17
When is the BASH history written, and how can you manually tell it to update? (This can be useful when you want to have access to command history in another BASH session.)
Exercise 18
What does set -e do, and why is it often a good idea to include in shell scripts?
Answer
set -e will cause the shell to exit immediately if a command (or a pipeline, or list of commands, or a compound command) exits with a non-zero status code.
This is almost always a good shell option to set in shell scripts, because it ensures that errors are explicitly handled. If this option is not set, it is exceedingly easy for errors to go unnoticed. For example, imagine the simple script:
a
b
c
Usually c will depend on a or b executing properly; however, unless the shell is set to exit immediately, errors in a or b will not prevent c from running. The script could be altered as follows:
a && \
b && \
c
However, it is not usually so straightforward to watch for exit statuses, and it is safer and more explicit to just set -e at the top of the script.
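The effect is easy to see by running the same two commands with and without the option. This sketch uses child shells so that the calling shell is not affected.

```bash
# Without set -e the failure of false is ignored and echo still runs.
without=$(bash -c 'false; echo "still running"')

# With set -e the child shell exits as soon as false fails.
with=$(bash -c 'set -e; false; echo "still running"') || true

echo "without set -e: '$without'"
echo "with set -e:    '$with'"
```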
Exercise 19
Describe the purpose of BASH’s expansion rules. List all the kinds of expansion BASH performs before executing a command, in order of how they are performed, and provide an example demonstrating the text before and after the expansion.
Answer
BASH is designed to be terse, so that command-line users, who often need to type the same commands many times over, don’t need to type more characters than necessary. In addition to providing a terse syntax that is especially well-suited for managing processes, BASH also provides several textual expansions that can greatly reduce the amount of typing necessary to execute certain commands. An expansion is a rule that transforms input text into other (usually longer) text, hence the name expansion.
The shell is an appropriate place to provide text expansions, because if they were handled at the application layer, they would likely be subtly if not completely different between applications, and would be confusing for users.
There are certainly downsides to expansion. If you are unaware that expansion will occur on a given input, your input may be expanded into something undesirable. For this reason, it is good to know the different expansions that are performed by BASH so that you can escape expansions you don’t want it to perform.
Note that most shells have similar or identical expansions (e.g. zsh or fish).
In the order in which the expansions are performed by BASH, they are
Alias Expansion
Alias handling differs between shells, so BASH’s behavior may not match that of other shells. In BASH, aliases are not expanded in non-interactive shells (such as scripts) unless the expand_aliases shell option is set.
After variable value specification and redirect token detection, if the first word is an alias, then it’s substituted in place. Under the hood, the order matters because an alias may include a space. That space needs to be tokenized to separate the command from further arguments. Consequently, aliases to paths that contain a space don’t work as expected. Alternative shells reproduce this behaviour to follow BASH.
alias gc='git commit'
gc -m "Add an alias for committing code"
Brace Expansion
Brace expansion is extremely useful because frequently we are passing in two variations on a long filename into a command.
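For instance, a backup copy can be named without typing the path twice. The path below is made up, and echo is placed in front of cp so the sketch only prints the expanded command.

```bash
# {,.bak} expands to the empty string and to .bak, giving two filenames.
expanded=$(echo cp /etc/app/settings.conf{,.bak})
echo "$expanded"

# Sequences are also supported.
sequence=$(echo file{1..3}.txt)
echo "$sequence"
```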
Parameter Expansion
world='New York'
echo "Hello ${world}"
echo "Hello New York"
Note that parameter substitution can have the side effect of changing values
echo ${I_was_unset='not anymore'}
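The difference between a form with and without this side effect can be sketched as follows; the variable name is arbitrary.

```bash
unset name

# ${name:-word} substitutes a default but leaves name untouched.
first=${name:-fallback}
state_after_first=${name-unset}

# ${name:=word} substitutes the default and also assigns it to name.
second=${name:=assigned}

echo "$first / $state_after_first / $second / $name"
```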
Command Substitution
There are two forms of command substitution, in which the standard output of a command replaces its invocation:
$( the better way )
`backticks`
The former syntax is preferred because it nests cleanly. For example, $( printf '% 30s' $( date ) ) works as written, whereas `printf '% 30s' `date`` does not: the inner backticks end the outer substitution early unless they are escaped.
. $(brew --prefix)/etc/bash_completion
. /usr/local/etc/bash_completion
echo "I am running `uname`"
echo "I am running Darwin"
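Nesting is where the two forms differ most visibly. In this sketch the path is made up; dirname and basename only manipulate the string, so the file does not need to exist.

```bash
# $( ) nests without any extra escaping.
modern=$(basename "$(dirname /usr/local/bin/tool)")

# Backticks need the inner pair escaped with backslashes.
legacy=`basename \`dirname /usr/local/bin/tool\``

echo "$modern $legacy"
```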
Arithmetic Expansion
Arithmetic expansion in BASH is confined to a specific double-paren notation. It wasn’t supported in the original Bourne shell, although POSIX now specifies it. Moreover, BASH can only operate on integers. Contrast this with general purpose languages, which emphasize the ease of numerical operations; this is one reason BASH is avoided for general purpose use.
echo "2 + 2 = $((2 + 2))"
echo "2 + 2 = 4"
Inside arithmetic substitution, variables do not need to be referenced with $
declare -i foo # foo is an integer, and presumed to be 0
echo "foo used to be $((foo++))"
foo now has the integer value 1
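The same behavior, together with the integer-only limitation, can be sketched as:

```bash
unset foo
declare -i foo             # integer attribute; an unset integer evaluates to 0

before=$((foo++))          # post-increment yields the old value, then adds 1
quotient=$((7 / 2))        # integer division truncates
remainder=$((7 % 2))

echo "before=$before foo=$foo quotient=$quotient remainder=$remainder"
```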
Process Substitution
This is specific to BASH, and not found in POSIX. Not to be confused with command substitution, process substitution is useful when you are dealing with a program that can only take a file as an input or output. Process substitution creates a named pipe (sort of like a “fake temporary file”) which can be passed to the program.
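A short sketch of the idea: diff expects two filenames, and each <( ) supplies one. Echoing the substitutions shows the names that the commands actually receive, which on Linux typically look like /dev/fd/63.

```bash
# Print the file names that BASH passes in place of each <( ).
names=$(echo <(true) <(true))
echo "$names"

# Compare the output of two commands without creating temporary files.
if diff <(printf '3\n1\n2\n' | sort) <(printf '1\n2\n3\n') > /dev/null; then
    same='yes'
else
    same='no'
fi
echo "same: $same"
```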
Note that the specific names used for the output of the two commands will vary each time. As an added bonus, the named pipe is destroyed once it is no longer being used.
Word Splitting
Word splitting isn’t useful in its own right, but rather is a sort of cleanup to make parameter expansion, command substitution, and arithmetic expansion act as expected. You can read more about word splitting here.
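The effect is easiest to see by counting arguments. The helper count_args is made up for this sketch, and the default IFS is assumed.

```bash
# Print how many arguments were received.
count_args() { echo "$#"; }

files='one two  three'

unquoted=$(count_args $files)     # expansion is split on whitespace
quoted=$(count_args "$files")     # double quotes suppress word splitting

echo "unquoted=$unquoted quoted=$quoted"
```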
Pathname Expansion
Probably the most commonly used expansion is pathname expansion. Often times you want to run a command against all files in a directory, or all files whose filename match a certain pattern. Pathname expansion lets us avoid writing out each filename specifically. Note once again how convenient it is that this functionality is present at the shell layer instead of the application layer.
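A self-contained sketch using a temporary directory with made-up file names:

```bash
dir=$(mktemp -d)
touch "$dir/a.py" "$dir/b.py" "$dir/notes.txt"

# The unquoted pattern is replaced by the sorted list of matching paths.
matches=("$dir"/*.py)
count=${#matches[@]}

# Quoting the pattern prevents expansion, which is why find is given '*.py'.
quoted='*.py'

echo "matched $count files; quoted pattern stays $quoted"
rm -r "$dir"
```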
Different jobs can run simultaneously in a single shell environment, as shown by jobs. The shell provides a user interface to one foreground job, while background jobs don’t have an interface. &, as seen above, runs a command in the background. You may not see it again until it finishes, when BASH politely informs you. However, if a background job opens stdin for input, BASH will suspend it until you deal with that program’s interface.
sleep 45 &
[1]
You’ll get your prompt back as sleep 45 does important work behind the scenes.
%1 will expand to specify the long-running job. We only care if sleep exits unsuccessfully, so the following syntax will wait for sleep, blocking the terminal:
%1 || echo 'sleep failed!'
By default, this expansion is not allowed in non-interactive mode; it can be enabled with set -m. This is non-standard and rarely useful. See the commands disown and wait for scripting.
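In a script, the same check is usually written with $! and wait rather than a job specification. The sketch below uses a background subshell that deliberately exits with status 3.

```bash
# Start a background job that fails after a short delay.
(sleep 1; exit 3) &
job_pid=$!

# wait blocks until the job ends and returns the job's exit status.
if wait "$job_pid"; then
    result='succeeded'
else
    result="failed with status $?"
fi

echo "$result"
```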
NOTE: Expansions 4-7 occur at the same precedence level, and are evaluated from left-to-right in a single pass.