Refactoring philosophy

Refactoring is unglamorous. Like changing a car’s oil and vacuuming its seats, refactoring is necessary to keep a codebase from becoming a broken-down, disgusting mess. This post describes my philosophy of when and how to refactor. The aim is to help you identify the most impactful refactors and complete them safely and effectively.

Why refactor?

Refactoring is an essential tactic for making new features easier to add and for maintaining the reliability and performance of software by combating software rot. Software rots over time as cruft from partially implemented or obsolete features builds up, previous design assumptions become invalid, and user needs, dependencies, and tradeoffs change. Cruft makes the code harder to understand when debugging or adding features. New features that violate old assumptions are difficult to add because they require extensive changes.

For example, the batch and streaming data processing engine I work on is littered with config fields and branches from an abandoned project to implement micro-batch support, a data processing model between batch and streaming. Anyone reading the code has to know that this feature was never finished and mentally filter out its branches, which makes the code harder to debug and modify. Refactoring the code by deleting the parts for the obsolete feature removes that burden.

Another example: the same string (e.g. “Liked Songs”) may be hardcoded in dozens of places in both the client and server code of a music app. When the product manager decides to change the text (e.g. to “Your Likes”), it takes extensive work to coordinate the client and server updates so that all occurrences switch atomically. Refactoring the code to define the text in a single place and pass it to every place that uses it, including the client, makes changing the text easier because every place it is displayed is guaranteed to be consistent and to switch over to the new text at the same time.
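Here is a minimal sketch of that single-definition idea. Everything in it is hypothetical and only for illustration: the constant name, the `playlist_header` function, and the `client_config` payload that I assume the client fetches from the server at startup.

```python
# Hypothetical server-side module: the only place the label text is defined.
LIKED_SONGS_LABEL = "Liked Songs"


def playlist_header() -> str:
    """Server-rendered text reuses the constant instead of a string literal."""
    return f"{LIKED_SONGS_LABEL} (updated daily)"


def client_config() -> dict:
    """Hypothetical payload the client fetches so it shows the server's label."""
    return {"labels": {"liked_songs": LIKED_SONGS_LABEL}}
```

With this shape, renaming the playlist means editing one line, and the client picks up the new text the next time it fetches its config.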

What makes refactoring difficult?

Refactoring is more likely to break software than adding new features is. Refactoring changes the implementation of existing features that users rely on, so any mistake in the refactor can break the feature for users. New features, in contrast, start out with no users. The common practice is to guard new features with a feature flag (e.g. if <enable shiny new feature> then <code for shiny new feature>). The feature starts out disabled for all users except the developer adding it. This means that when there is a bug in the new feature, only the developer is exposed, and they can fix it before enabling the feature more broadly. Additionally, the person refactoring an existing feature’s implementation may not be the original author and is likely to have incomplete knowledge of all the cases the code must handle. This makes it more likely they will break the app.
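To make the feature flag pattern concrete, here is a small sketch. The allow-list, the email addresses, and the function names are all made up for illustration; real systems usually read flags from a config service rather than a hardcoded set.

```python
# Hypothetical allow-list: the flag starts out enabled only for the developer.
SHINY_FEATURE_USERS = {"dev@example.com"}


def is_shiny_feature_enabled(user: str) -> bool:
    """Return True only for users the flag has been rolled out to."""
    return user in SHINY_FEATURE_USERS


def render_home(user: str) -> str:
    """Everyone outside the allow-list keeps getting the existing code path."""
    if is_shiny_feature_enabled(user):
        return "home page with shiny new feature"
    return "home page"
```

A refactor has no equivalent of that allow-list by default: once it ships, every user runs the new implementation.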

When to refactor?

The purpose of refactoring is to make adding new features or debugging problems with existing features easier. A common mistake is refactoring for the sake of making the code pretty or to use a trendy new framework. It does not make sense to refactor stable parts of the code for features that do not need to change, because the refactor introduces the risk of breaking the feature without the reward of enabling any new capability. As the saying goes, “if it’s not broken, don’t fix it.” Engineering time is limited, and usually there is more work and more customer requests than the team has time for. A refactor must demonstrate its value to deserve prioritization over other important work.

The time to refactor is right before implementing a new feature. The new feature provides a concrete guide to what to refactor and a justification of its value to management. When adding a new feature, I split my changes into two parts chained on top of each other. The first part is a refactoring of the existing code related to the new feature, and the second part is the new feature itself. The goal is for the implementation of the new feature to become simpler and easier to test as a result of the refactor. By writing both at once, I can be confident I’m making the right refactor in the right place at the right time. Additionally, the refactor is easier to review because the reviewer has the context of the following change, which shows how the refactor makes life easier and motivates its engineering decisions.

The other time to refactor is to make debugging an issue easier. When a bug is hard to reproduce because there isn’t adequate information, this is a sign the code needs more logging to provide the information necessary to reproduce or root-cause it when it recurs. When it is hard to reason about a buggy piece of code, this is a sign to consider refactoring the code to be simpler and easier to reason about.
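As a sketch of what "more logging" can look like, the hypothetical function below records every input needed to replay a failing call. The function, the logger name, and the fields are invented for illustration; only the standard library `logging` module is real.

```python
import logging

logger = logging.getLogger("orders")


def apply_discount(order_id: str, total_cents: int, percent: int) -> int:
    """Return the discounted total, logging the inputs when they are invalid."""
    if not 0 <= percent <= 100:
        # Log everything needed to reproduce this exact call later.
        logger.error(
            "invalid discount: order_id=%s total_cents=%d percent=%d",
            order_id, total_cents, percent,
        )
        raise ValueError(f"percent must be in [0, 100], got {percent}")
    return total_cents - (total_cents * percent) // 100
```

The point is that the log line carries the identifiers and values, not just the fact that something went wrong.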

Poor performance is another reason to refactor code. For example, the front end server could be requesting data from the backend or database that it doesn’t need anymore. Another example is unoptimized SQL queries that become unacceptably slow as the amount of data they process grows. A team may build a dashboard that queries telemetry from a new feature to show summary statistics about the feature’s usage and reliability. The dashboard starts out blazingly fast because the feature is new and has only produced ~1 month of telemetry logs. After a year there is 12x more data and the dashboard is unbearably slow. This is a sign that it is time to refactor the query. An example refactor is changing what data gets written to which tables to eliminate a slow join. Again, the approach is to use a concrete problem to guide the refactor to the right time and place.
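Here is a toy sketch of that kind of refactor using Python’s built-in `sqlite3` module. The schema is hypothetical: I assume the dashboard groups telemetry by a user’s region, which originally lives only in a `users` table. The refactored writer stores the region on each event row, so the dashboard query no longer needs the join. The data is far too small to show a speed difference; the sketch only demonstrates that both queries return the same summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, region TEXT);
    -- Before: events carry only user_id, so the dashboard must join to users.
    CREATE TABLE events (user_id INTEGER, ok INTEGER);
    -- After: the writer also stores region on each event row.
    CREATE TABLE events_with_region (user_id INTEGER, region TEXT, ok INTEGER);
""")

users = [(1, "eu"), (2, "us")]
events = [(1, 1), (1, 0), (2, 1), (2, 1)]  # (user_id, ok)
conn.executemany("INSERT INTO users VALUES (?, ?)", users)
conn.executemany("INSERT INTO events VALUES (?, ?)", events)

# The refactored write path looks up the region once, at write time.
region_of = dict(users)
conn.executemany(
    "INSERT INTO events_with_region VALUES (?, ?, ?)",
    [(user_id, region_of[user_id], ok) for user_id, ok in events],
)

JOIN_QUERY = """
    SELECT u.region, COUNT(*), SUM(e.ok)
    FROM events e JOIN users u ON u.user_id = e.user_id
    GROUP BY u.region ORDER BY u.region
"""
NO_JOIN_QUERY = """
    SELECT region, COUNT(*), SUM(ok)
    FROM events_with_region
    GROUP BY region ORDER BY region
"""

before = conn.execute(JOIN_QUERY).fetchall()
after = conn.execute(NO_JOIN_QUERY).fetchall()
```

The tradeoff in this sketch is that each event keeps the region as it was at write time, so it only fits if that is what the dashboard should report.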

How to refactor?

Refactoring should start by reviewing the existing test coverage and adding tests to fill in any gaps. Good test coverage is essential to catching the inevitable mistakes early and not breaking existing features for users. I separate increasing test coverage from making the refactor to increase my confidence, and the reviewer’s, in the correctness of the change. The first change only fills in gaps in test coverage. Because it does not change the implementation, we can be confident it won’t break anything. The second change does the actual refactor. Refactors simplify the implementation of the product without changing the user-visible behavior. This means the refactor should only need to modify the implementation and not any of the tests. If you need to modify the tests, that is a sign the tests are brittle and rely too heavily on internal implementation details rather than checking the observable public behavior. Because of the test coverage added before the refactor, you and the reviewer should be confident the refactor is a no-op that safely changes the implementation without changing the user-visible behavior.
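A small sketch of what this looks like in practice. The cart-total functions are hypothetical stand-ins for real code; the point is that the test calls only the public function and compares outputs, so the same test passes unchanged before and after the refactor.

```python
import unittest


def total_before(prices):
    """Original implementation: manual accumulation loop."""
    total = 0
    for price in prices:
        total = total + price
    return total


def total_after(prices):
    """Refactored implementation: same behavior, simpler body."""
    return sum(prices)


# The refactor only changes which implementation backs the public name.
cart_total = total_after


class CartTotalTest(unittest.TestCase):
    """Checks inputs and outputs only, so it needs no edits when the body changes."""

    def test_empty_cart(self):
        self.assertEqual(cart_total([]), 0)

    def test_sums_prices(self):
        self.assertEqual(cart_total([250, 100, 5]), 355)
```

If the test had instead asserted on how many loop iterations ran or on a private helper, the refactor would have forced a test change, which is the brittleness described above.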