Refactoring a codebase means changing its internal structure without altering its observable behaviour. Refactoring is an essential tool for keeping an evolving codebase maintainable. This article is a commentary on a book chapter about refactoring code—Chapter 24 of Code Complete.
Intro
The advice in Code Complete has improved the quality of the code I write. I have referred back to it often, and each time I leave with useful insight.
The subtitle on the cover reads “A practical handbook of software construction”. The term “software construction” is an analogy to physical building construction, and it is used to distinguish the topic of the book from the other stages of software development. Implicit in this distinction is the concept that software development passes through discrete stages. This implicit assumption foreshadows my main qualm with the book: it was written at a time when the “waterfall model” of software development was mainstream, and its advice is often given with the waterfall model in mind. By “waterfall model”, I mean the idea that software goes through a series of discrete stages on its path from conception to deployment.
I don’t say this to discourage you from reading the book, however! Even in our “agile world” it is certainly worth the read! In fact, you will see that Steve McConnell was ahead of the times in that he was well aware of the waterfall method’s shortcomings.
My second, smaller qualm about the book is that its advice is focused on languages like C++ and Java, and it doesn’t address modern dynamic languages like Python and JavaScript.
Summary of the material
The chapter begins with some advice that is now so widely circulated that it almost seems not to be worth saying:
Myth: a well-managed software project conducts methodical requirements development and defines a stable list of the program’s responsibilities. Design follows requirements, and it is done carefully so that coding can proceed linearly, from start to finish, implying that most of the code can be written once, tested, and forgotten. According to the myth, the only time that the code is significantly modified is during the software-maintenance phase, something that happens only after the initial version of a system has been delivered.
Reality: code evolves substantially during its initial development. Many of the changes seen during initial coding are at least as dramatic as changes seen during maintenance.
Again, keep in mind Steve is preaching to a generation of programmers that used the waterfall method.
Another reality: modern development practices increase the potential for code changes during construction. In older life cycles, the focus—successful or not—was on avoiding code changes. More modern approaches move away from coding predictability. Current approaches are more code-centered, and over the life of a project, you can expect code to evolve more than ever.
Once you have accepted (as most everybody now does) that software will evolve over time, you now have a responsibility to consciously make the code better as it changes. This is especially difficult if you are not one of the original developers, for example if you are fixing a bug on a five-year-old codebase.
The Cardinal Rule of Software Evolution is that evolution should improve the internal quality of the program. The following sections describe how to accomplish this.
The key strategy in achieving The Cardinal Rule of Software Evolution is refactoring, which Martin Fowler defines as “a change made to the internal structure of the software to make it easier to understand and cheaper to modify without changing its observable behavior” (Martin Fowler 1999). The word “refactoring” in modern programming grew out of Larry Constantine’s original use of the word “factoring” in structured programming, which referred to decomposing a program into its constituent parts as much as possible (Yourdon and Constantine 1979).
From here, the chapter goes on to present a list of reasons to refactor and then a list of specific refactorings. The two lists overlap heavily, because the reason to refactor and the refactoring that addresses it are often closely related.
Here is a subset of the lists where the top items are “reasons to refactor”, and the sub-items are “specific refactors”:
- A routine is too long
  - Extract inline code into a routine
- A class has poor cohesion
  - Move a routine to another class
- A class interface does not provide a consistent level of abstraction
  - Convert one class into two
- A parameter list has too many parameters
  - Remove any unused parameters
  - Separate query operations from modification operations
- Something has a bad name
  - Replace a magic number with a named constant
  - Rename a variable with a clearer or more informative name
  - Replace an expression with a routine
  - Introduce an intermediate variable
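A few of the refactorings in this list can be illustrated with a small before-and-after sketch. The example is in Python rather than the book's C++ or Java, and every name and value in it (`invoice_total_before`, `TAX_RATE`, and so on) is hypothetical, invented for illustration. It applies "replace a magic number with a named constant", "extract inline code into a routine", and "introduce an intermediate variable", then checks that the observable behaviour is unchanged.

```python
# Hypothetical example: all names and values are made up for illustration.

def invoice_total_before(prices, quantities):
    # Before: one routine does everything, with unexplained literals.
    t = 0
    for p, q in zip(prices, quantities):
        t += p * q
    if t > 100:
        t = t - t * 0.1
    return round(t + t * 0.2, 2)


# After: magic numbers replaced with named constants.
BULK_DISCOUNT_THRESHOLD = 100
BULK_DISCOUNT_RATE = 0.1
TAX_RATE = 0.2


def subtotal(prices, quantities):
    # Extracted routine: the loop now has a name that says what it computes.
    total = 0
    for price, quantity in zip(prices, quantities):
        total += price * quantity
    return total


def apply_bulk_discount(amount):
    # Extracted routine: the discount rule lives in one place.
    if amount > BULK_DISCOUNT_THRESHOLD:
        return amount - amount * BULK_DISCOUNT_RATE
    return amount


def invoice_total_after(prices, quantities):
    # Intermediate variables name each step of the calculation.
    discounted = apply_bulk_discount(subtotal(prices, quantities))
    tax = discounted * TAX_RATE
    return round(discounted + tax, 2)


# A refactoring must not change observable behaviour, so both versions
# should agree on every input we try.
for prices, quantities in [([10.0, 20.0], [2, 5]), ([5.0], [3]), ([], [])]:
    assert invoice_total_before(prices, quantities) == invoice_total_after(prices, quantities)
```

The arithmetic is deliberately kept in the same order in both versions, so the two functions return identical results rather than merely close ones.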
The book chapter has many more points, and detailed advice next to each point. Here is a particularly great piece of advice.
A program contains code that seems like it might be needed someday: Programmers are notoriously bad at guessing what functionality might be needed someday. “Designing ahead” is subject to numerous predictable problems:
- Requirements for the “design ahead” code haven’t been fully developed, which means the programmer will likely guess wrong about those future requirements. The “code ahead” work will ultimately be thrown away.
- If the programmer’s guess about the future requirement is pretty close, the programmer still will not generally anticipate all the intricacies of the future requirement. These intricacies undermine the programmer’s basic design assumptions, which means the “design ahead” work will have to be thrown away.
- Future programmers who use the “design ahead” code don’t know that it was “design ahead” code, or they assume the code works better than it does. They assume that the code has been coded, tested, and reviewed to the same level as the other code. They waste a lot of time building code that uses the “design ahead” code, only to discover ultimately that the “design ahead” code doesn’t actually work.
- The additional “design ahead” code creates additional complexity, which calls for additional testing, additional defect correction, and so on. The overall effect is to slow down the project.
Experts agree that the best way to prepare for future requirements is not to write speculative code; it’s to make the currently required code as clear and straightforward as possible so that future programmers will know what it does and does not do and will make their changes accordingly (Fowler 1999, Beck 2000).
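As a rough illustration of this point, compare a routine that carries speculative options with one that does only what is currently required. The Python below is a hypothetical sketch: `export_report_speculative`, `export_csv`, and their parameters are invented for this example and come from neither the book nor any real library.

```python
import csv
import io


def export_csv(rows):
    # Currently required version: one clear job, no unused parameters.
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def export_report_speculative(rows, fmt="csv", compress=False, encrypt_key=None):
    # "Design ahead" version: extra options nobody has asked for yet.
    # Only the CSV path was ever written and tested; the rest is a guess
    # about future requirements.
    if fmt != "csv":
        raise NotImplementedError(f"format {fmt!r} is not supported yet")
    if compress or encrypt_key is not None:
        raise NotImplementedError("compression and encryption are not supported yet")
    return export_csv(rows)
```

A caller reading the second signature could reasonably assume that `compress=True` works. The signature of `export_csv` promises only what has actually been built.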
Finally, the chapter ends with some general advice about how to refactor. Some of the most valuable are:
Do refactorings one at a time: Some refactorings are more complicated than others. For all but the simplest refactorings, do the refactorings one at a time, recompiling and retesting after a refactoring before doing the next one.
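One way to make "retesting after a refactoring" cheap is to pin down the current behaviour with a small test before starting, then rerun it after every step. The sketch below uses Python's standard `unittest` module; `format_name` is a hypothetical routine standing in for whatever code is being refactored, and `run_behaviour_tests` is a helper invented for this example.

```python
import unittest


def format_name(first, last):
    # Routine under refactoring (hypothetical). Its internals may change
    # from step to step; the tests below fix its observable behaviour.
    first = first.strip()
    last = last.strip()
    return f"{last}, {first}"


class FormatNameBehaviour(unittest.TestCase):
    # Written before refactoring and rerun after each individual
    # refactoring, so a failure points at a single step.

    def test_strips_whitespace(self):
        self.assertEqual(format_name("  Ada ", " Lovelace"), "Lovelace, Ada")

    def test_plain_input(self):
        self.assertEqual(format_name("Grace", "Hopper"), "Hopper, Grace")


def run_behaviour_tests():
    # Run the suite programmatically and report whether it passed.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FormatNameBehaviour)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

If two refactorings are applied between test runs and a test fails, either one could be the cause. Running the suite after each step removes that ambiguity.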
I have seen first hand, many times, how true the following piece of advice is:
Programmers treat small changes casually. They don’t desk-check them, they don’t have others review them, and they sometimes don’t even run the code to verify that the fix works properly.
The moral is simple: treat simple changes as if they were complicated.
And another good one:
Sometimes code doesn’t need small changes—it needs to be tossed out so that you can start over. If you find yourself in a major refactoring session, ask yourself whether instead you should be redesigning and reimplementing that section of code from the ground up.
Here is one last bit of advice that I found particularly true:
The number of refactorings that would be beneficial to any specific program is essentially infinite. Refactoring is subject to the same law of diminishing returns as other programming activities, and the 80/20 rule applies. Spend your time on the 20 percent of the refactorings that provide 80 percent of the benefit.
Commentary
We believe that refactoring as you add new features or fix bugs is better than setting aside dedicated time for refactoring. We believe this for a few reasons:
- You have already spent the mental energy understanding the relevant code in order to do your current task, hence the cost of doing the refactor is lower.
- It will naturally help you focus on the 20% of the fixes that can make 80% of the difference. There is little reason to refactor code that is working and stable. In fact, doing so can often introduce new bugs! By refactoring as you go, you know that the code you are refactoring is not stable, because you are working on it! Hence, the refactor is more likely to be a useful one.
- Clients (and managers), in our experience, have a hard time understanding the importance of code quality, and thus will rarely allocate time specifically for refactoring.
We believe refactors should be small, and should be performed one at a time. Ideally, refactors go in commits of their own, with a commit message that marks them as a “refactor”. This informs code reviewers and people reviewing the commit history that the commit was not expected to alter the code’s behaviour. Mixing refactors with behavioural changes in a single commit is discouraged because it makes commits harder to read: it may not be obvious which changes are behavioural and which are structural.
We noticed that the lists of refactors often have opposing refactors adjacent to one another. For example, “Extract routine” and “Move a routine’s code inline”. We took this to mean that refactoring is a subtle art, and takes a good deal of subjective judgement to do well.
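To make the tension concrete, here is a small Python sketch in which each of the two opposing refactorings is arguably the right call. All names are hypothetical, and where to draw the line remains a matter of judgement.

```python
# Hypothetical names throughout; a sketch of two opposite refactorings.

# Extract routine: the condition is opaque, so giving it a name helps.
def pension_status_before(age, service_years):
    if (age >= 65 and service_years >= 10) or service_years >= 30:
        return "eligible"
    return "not eligible"


def has_earned_pension(age, service_years):
    reached_retirement_age = age >= 65 and service_years >= 10
    completed_full_career = service_years >= 30
    return reached_retirement_age or completed_full_career


def pension_status_after(age, service_years):
    return "eligible" if has_earned_pension(age, service_years) else "not eligible"


# Move a routine's code inline: the body is already as clear as the name,
# so the extra routine only adds indirection.
def more_than_five(count):
    return count > 5


def rating_before(late_deliveries):
    return 2 if more_than_five(late_deliveries) else 1


def rating_after(late_deliveries):
    return 2 if late_deliveries > 5 else 1
```

In the first case the extracted name says something the expression does not. In the second, `more_than_five` says nothing that `> 5` does not already say.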